Mailing List Archive

Serialization/Deserialization of Lucene objects like queries, sort fields etc
Hello All,

Recently we upgraded our Lucene core libraries from v2.9.4 to v8.7.0. Since
the Lucene classes are no longer serializable and we use RPC to invoke
search queries, we ran into a bunch of NotSerializableExceptions.

I went through various forums, which suggest calling toString() on the
queries and then using a query parser at the receiving end to convert them
back to Query objects. This fixed the NotSerializableException, but the
queries and filters no longer behaved correctly. While looking into these
issues we found that the likely cause is that toString() followed by query
parsing does not return equivalent Query objects.

So we started looking at other serialization options and found a reference
to using Kryo serializers for this purpose. But with Kryo we run into
buffer overflows and sometimes a ClassCastException for
BooleanClause$Occur.

Could someone please point me towards the correct way to serialize and
deserialize Lucene objects?



--
Sent from: https://lucene.472066.n3.nabble.com/Lucene-Java-Developer-f564358.html

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Serialization/Deserialization of Lucene objects like queries, sort fields etc
Hello,

> Recently we upgraded our Lucene core libraries from v2.9.4 to v8.7.0

Wow! That is the biggest version jump I have heard of in some time :) Did
you also have to migrate an index all that way? Or did you fully re-index
once you were on 8.7.0?

> as all the lucene classes have been changed to non-serializable and since
> we are using RPC for invoking the search queries we ran into bunch of
> NotSerializableException.

Long ago (I think perhaps in the 4.0 release) we decided to remove "implements
Serializable" from all Lucene classes. I think this is the (contentious!
its title suggests just the opposite!) issue:
https://issues.apache.org/jira/browse/LUCENE-1473. We did this because 1)
Lucene is meant to be a performant, feature-rich search engine for a
*single* JVM/machine, not (yet) a fully distributed search engine, and 2)
the backwards compatibility implications of truly supporting Serializable,
so that one could drop in a new major version of Lucene and expect it to
correctly/efficiently communicate over-the-wire with older Lucene versions,
were just too scary and too high a requirement for ongoing development.

So the Lucene committers long ago decided that it is better to leave such
serialization to the application or distributed search engine running on
top of Lucene. It is a non-feature for Lucene.
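
For illustration only, here is a minimal sketch of what "leaving
serialization to the application" could look like: the application defines
its own small, serializable description of a query, sends that over RPC, and
rebuilds the real Lucene Query on the receiving side. Every name below
(QuerySpecs, QuerySpec, TermSpec, BoolSpec, Occur, toBytes, fromBytes) is
hypothetical and not part of Lucene, and plain JDK serialization is used only
to keep the sketch self-contained; any wire format would do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Objects;

// Hypothetical application-level types; nothing here is part of Lucene.
final class QuerySpecs {
    private QuerySpecs() {}

    /** A node in the wire-format query tree. */
    interface QuerySpec extends Serializable {}

    /** Copies the constant names of BooleanClause.Occur so no Lucene class crosses the wire. */
    enum Occur { MUST, FILTER, SHOULD, MUST_NOT }

    /** Describes an exact term match on one field. */
    static final class TermSpec implements QuerySpec {
        private static final long serialVersionUID = 1L;
        final String field;
        final String text;

        TermSpec(String field, String text) {
            this.field = Objects.requireNonNull(field);
            this.text = Objects.requireNonNull(text);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TermSpec)) return false;
            TermSpec other = (TermSpec) o;
            return field.equals(other.field) && text.equals(other.text);
        }

        @Override
        public int hashCode() { return Objects.hash(field, text); }
    }

    /** Describes a boolean combination; clause i is children.get(i) with occurs.get(i). */
    static final class BoolSpec implements QuerySpec {
        private static final long serialVersionUID = 1L;
        final ArrayList<Occur> occurs = new ArrayList<>();
        final ArrayList<QuerySpec> children = new ArrayList<>();

        BoolSpec add(Occur occur, QuerySpec child) {
            occurs.add(Objects.requireNonNull(occur));
            children.add(Objects.requireNonNull(child));
            return this;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof BoolSpec)) return false;
            BoolSpec other = (BoolSpec) o;
            return occurs.equals(other.occurs) && children.equals(other.children);
        }

        @Override
        public int hashCode() { return Objects.hash(occurs, children); }
    }

    /** Sender side: turn the description into bytes for the RPC payload. */
    static byte[] toBytes(QuerySpec spec) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(spec);
        }
        return buffer.toByteArray();
    }

    /** Receiver side: read the description back before building Lucene objects from it. */
    static QuerySpec fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (QuerySpec) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        QuerySpec sent = new BoolSpec()
            .add(Occur.MUST, new TermSpec("title", "lucene"))
            .add(Occur.MUST_NOT, new TermSpec("status", "deleted"));
        QuerySpec received = fromBytes(toBytes(sent));
        System.out.println("round trip preserved the query: " + sent.equals(received));
    }
}
```

On the receiving side, a converter (not shown, and again hypothetical) would
walk this tree and build the Lucene objects, for example a TermQuery for each
TermSpec and one BooleanQuery.Builder clause per BoolSpec entry, mapping Occur
by name to BooleanClause.Occur. Because the application owns the wire format,
it does not have to change when Lucene is upgraded.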

> I went through various forums which suggest either to use toString()
> method of queries and then use query parser at the receiver end

Alas, this will also not work. Our Query.toString() implementations do not
guarantee that they will always produce a String which, when round-tripped
through a QueryParser (which QueryParser?), will return the same (according
to .equals()) Query object. This was also decided at one point to be a
hopelessly high bar to hold our .toString() methods to. That said, many
Query.toString() implementations do work like this, making the situation
feel trappy :( Maybe we should consistently add a disclaimer to all
Query.toString() making this non-feature clear? At least to Query.java's
toString, which currently seems to have no such warning:

/**
 * Prints a query to a string, with <code>field</code> assumed to be the
 * default field and omitted.
 */
public abstract String toString(String field);

These topics have been discussed many times over the years -- it is clearly
a big need for search applications! And I agree, it is missing now in Lucene.

> Could someone please point me towards correct way of serialization and
> deserialization of Lucene objects.

Perhaps look at how Solr or Elasticsearch (hmm, <= 7.10 sources, when
Elasticsearch was still open-licensed) do this, and borrow/fork/poach those
implementations?

Mike McCandless

http://blog.mikemccandless.com


On Thu, Mar 4, 2021 at 5:14 AM jitesh129 <jitesh129@gmail.com> wrote:

Re: Serialization/Deserialization of Lucene objects like queries, sort fields etc
Thanks Mike for the quick response.

Michael McCandless-2 wrote
> Hello,
>
>> Recently we upgraded our Lucene core libraries from v2.9.4 to v8.7.0
>
> Wow! That is the biggest version jump I have heard of in some time :) Did
> you also have to migrate an index all that way? Or did you fully re-index
> once you were on 8.7.0?

Yes, I agree it is one of the biggest jumps one could make in a library
upgrade. It happened mostly because the feature powered by Lucene had been
stable until now; with some new feature requests coming in, we decided to
upgrade the library.

We are re-indexing everything on 8.7.0 instead of migrating the index
incrementally.

>> I went through various forums which suggest either to use toString()
>> method of queries and then use query parser at the receiver end
>
> Alas, this will also not work. Our Query.toString() implementations do not
> guarantee that they will always produce a String which, when round-tripped
> through a QueryParser (which QueryParser?), will return the same (according
> to .equals()) Query object. This was also decided at one point to be a
> hopelessly high bar to hold our .toString() methods to. That said, many
> Query.toString() implementations do work like this, making the situation
> feel trappy :( Maybe we should consistently add a disclaimer to all
> Query.toString() making this non-feature clear? At least to Query.java's
> toString, which currently seems to have no such warning:
>
> /**
>  * Prints a query to a string, with <code>field</code> assumed to be the
>  * default field and omitted.
>  */
> public abstract String toString(String field);

We learned this the hard way.

>> Could someone please point me towards correct way of serialization and
>> deserialization of Lucene objects.
>
> Perhaps look at how Solr or Elasticsearch (hmm, <= 7.10 sources, when
> Elasticsearch was still open-licensed) and borrow/fork/poach those
> implementations?

Thanks for pointing me in this direction. I will have a look at the Solr and
Elasticsearch implementations.