Mailing List Archive

Parameterized queries in Lucene
Hi all,

In the world of relational databases and SQL, parameterized queries
(a.k.a. PreparedStatement) offer many advantages in terms of security and
performance.

I guess everybody is familiar with the idea: you prepare a statement once
and then execute it multiple times, changing only certain parameters.
A simple use case demonstrating the idea is shown below:

// An arbitrarily complex query, part of which has a single parameter of type int
Query q = ...
for (int i = 0; i < 100; i++) {
    int paramValue = i;
    q.visit(new ParameterSetter(paramValue));
    TopDocs docs = searcher.search(q, 10);
}

Note that this is a very simplistic use case; in reality, the construction
and the execution of the query are not done side by side.

I already implemented something to satisfy use-cases like the one shown
above by introducing a new subclass of Query. However, I was wondering if
there is already a mechanism to compile and execute queries with parameters
in Lucene and I am just reinventing the wheel.

Feedback is much appreciated!

Best,
Stamatis
Re: Parameterized queries in Lucene [ In reply to ]
I am curious: what use case are you trying to address here?

In the relational world, this is useful primarily because prepared
statements eliminate the need to re-plan the query, thus saving the
cost of iterating over a potentially large combinatorial space. However,
in Lucene there isn't much of a concept of a query plan (yet). Some
queries try to achieve that (IndexOrDocValuesQuery, for example), but it is
a far cry from what relational databases expose.

Atri

On Mon, 21 Oct 2019 at 17:42, Stamatis Zampetakis <zabetak@gmail.com> wrote:

--
Regards,

Atri
Apache Concerted
Re: Parameterized queries in Lucene [ In reply to ]
Hi Atri,

Let's assume that we have the following simple SQL query over a Lucene
index holding book authors.

SELECT *
FROM authors a
WHERE a.name = ? AND a.age > 15

The WHERE clause corresponds to a BooleanQuery combining a TermQuery and an
IntPoint query.
During the prepare phase of the SQL statement we cannot really construct
the BooleanQuery, since the parameter is not yet bound. We have to postpone
the creation of the query until the execution time of the statement, when all
parameters are bound. Running the same query many times with a different
parameter means recreating the Query every time.
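
A minimal sketch of this "build the query at execution time" pattern, in plain Java. The Author and PreparedAuthorQuery classes and the in-memory filtering are assumptions made for illustration; they are toy stand-ins, not Lucene classes. The fixed clause (age > 15) is created once, and the name clause is added only when the parameter is bound:

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy stand-in for an indexed document; not a Lucene class.
final class Author {
    final String name;
    final int age;

    Author(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

final class PreparedAuthorQuery {
    // Fixed clause (a.age > 15), built once at "prepare" time.
    private static final Predicate<Author> AGE_OVER_15 = a -> a.age > 15;

    // The "prepared statement": completes the query once the name is bound.
    static final Function<String, Predicate<Author>> BY_NAME =
        name -> AGE_OVER_15.and(a -> a.name.equals(name));

    // "Execution": bind the parameter, then filter an in-memory list.
    static List<Author> execute(List<Author> authors, String name) {
        return authors.stream()
            .filter(BY_NAME.apply(name))
            .collect(Collectors.toList());
    }
}
```

With real Lucene classes, the body of the factory would presumably build the BooleanQuery instead of a Predicate; the shape of the pattern stays the same.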

I admit that creating the Lucene query is not the most expensive part of
the planning process. Still, we can gain something by not creating the whole
query for every single parameter.

In terms of design it appears more natural to build the Lucene query during
the preparation phase and not during the execution phase.

Best,
Stamatis




On Mon, Oct 21, 2019 at 2:20 PM Atri Sharma <atri@apache.org> wrote:

Re: Parameterized queries in Lucene [ In reply to ]
I agree with Atri that it makes little sense to support parameterized
queries like this. Lucene not only uses field statistics but also term
statistics, so we couldn't make decisions about the right way to
execute this query before knowing what the value of `a.name` is.
Supporting something like that would send the message that it makes
things more efficient, while in practice it would only save a couple of
object creations.
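
A toy illustration of why the execution strategy cannot be fixed before the term is known. The ToyPlanner class, its cost model and the numbers used with it are assumptions for illustration only, not Lucene's actual logic: the clause expected to match fewer documents leads, and that choice changes with the bound value.

```java
import java.util.Map;

// Hypothetical planner; a stand-in to illustrate the point, not a Lucene class.
final class ToyPlanner {
    // Assumed per-term document frequencies for the "name" field.
    private final Map<String, Integer> docFreq;
    // Assumed number of documents matching the fixed range clause (age > 15).
    private final int rangeCost;

    ToyPlanner(Map<String, Integer> docFreq, int rangeCost) {
        this.docFreq = docFreq;
        this.rangeCost = rangeCost;
    }

    // Picks the clause that drives iteration: the one expected to match
    // fewer documents. Unknown terms are treated as matching nothing.
    String leadClause(String name) {
        int termCost = docFreq.getOrDefault(name, 0);
        return termCost <= rangeCost ? "term" : "range";
    }
}
```

For example, with a range cost of 5000, a rare name (frequency 3) makes the term clause lead, while a very common name (frequency 100000) makes the range clause lead, so the decision can only be made once the parameter is bound.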

On Wed, Oct 23, 2019 at 9:26 AM Stamatis Zampetakis <zabetak@gmail.com> wrote:

--
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Parameterized queries in Lucene [ In reply to ]
Hi Adrien,

Are there optimizations based on statistics that take place during the
construction of the query?

When the query is about to be executed (i.e., passed to the searcher), all
parameters must be bound.
So in terms of optimizations I was thinking that nothing would change.

To make sure that we are all on the same page, I created a simple gist [1]
showing how I had in mind to support parameterized queries in Lucene.
If you think that it is a bad idea, then I will go back to the option of
recreating the query every time.

Best,
Stamatis

[1] https://gist.github.com/zabetak/ac2a8320c72779dd646230278662fdd4


On Wed, Oct 23, 2019 at 11:41 AM Adrien Grand <jpountz@gmail.com> wrote:

Re: Parameterized queries in Lucene [ In reply to ]
I am still not sure what benefits we would get from the new query type. As
Adrien mentioned, Lucene queries depend on term-specific statistics.

I can imagine that this approach would save some GC cycles by reusing the
same object, but you would need a very large number of objects for
that to make a significant difference. In any case, for a user to emulate
this behaviour, it should be a simple matter of writing a custom
QueryVisitor.
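
To make the visitor idea concrete, here is a toy sketch in plain Java. All types below (ToyQuery, ToyVisitor, IntParam, AndQuery, and this ParameterSetter) are hypothetical stand-ins; they are not Lucene's Query or QueryVisitor, and a real implementation may look quite different:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; none of these are Lucene classes.
interface ToyVisitor {
    void visitParam(IntParam p);
}

interface ToyQuery {
    void visit(ToyVisitor v);
}

// Leaf holding a single int parameter that can be rebound between executions.
final class IntParam implements ToyQuery {
    private Integer value; // null until bound

    void bind(int v) {
        value = v;
    }

    int get() {
        if (value == null) {
            throw new IllegalStateException("parameter not bound");
        }
        return value;
    }

    @Override
    public void visit(ToyVisitor v) {
        v.visitParam(this);
    }
}

// Composite that forwards the visitor to each of its clauses.
final class AndQuery implements ToyQuery {
    private final List<ToyQuery> clauses = new ArrayList<>();

    AndQuery add(ToyQuery q) {
        clauses.add(q);
        return this;
    }

    @Override
    public void visit(ToyVisitor v) {
        for (ToyQuery q : clauses) {
            q.visit(v);
        }
    }
}

// Visitor that binds every parameter leaf it reaches to the same value.
final class ParameterSetter implements ToyVisitor {
    private final int value;

    ParameterSetter(int value) {
        this.value = value;
    }

    @Override
    public void visitParam(IntParam p) {
        p.bind(value);
    }
}
```

The query tree is built once; calling visit with a new ParameterSetter before each execution rebinds the leaf in place. Note that this relies on the query object being mutable.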

On Wed, 23 Oct 2019 at 17:52, Stamatis Zampetakis <zabetak@gmail.com> wrote:

--
Regards,

Atri
Apache Concerted
Re: Parameterized queries in Lucene [ In reply to ]
In terms of performance, I am not going to insist before running
some accurate benchmarks.
(I will do that and share the results here if I find some time.)

The main benefit I see in having such a new query type in Lucene core is
that it guides users in creating parameterized queries, and it also acts as
a documentation point for the things discussed in this thread.

However, I respect your point of view that it may be better not to give
users that much flexibility, since it would not fundamentally change the
way the query is evaluated in Lucene.

My question is completely answered. Thanks all for your time!

Best,
Stamatis

On Wed, Oct 23, 2019 at 2:29 PM Atri Sharma <atri@apache.org> wrote:
