Mailing List Archive

Storing Json field in Lucene
Hi,
I am currently storing indexed fields and stored fields in separate databases. The stored-field database holds the Document Id, the Type, and a JSON string of metadata, so I am basically using it as a key-value store. Every document to be indexed has three different metadata structures to store. That is why we keep both Document Id and Type: we can query and retrieve the stored fields by type. We have to depend on Lucene because we don't have any other database to store the data.

Is it a good idea to store the complete JSON as a single string in Lucene? If we store the values as separate fields, we have around 30 fields, and there would be 30 seeks to get the complete stored fields. If we store them as JSON, a single seek retrieves the data. But since it is JSON, every field name is stored along with its value for every record, which may bloat the index size.

Could you advise which is the better approach: storing the data as JSON or as individual fields?

Regards,
Ganesh
Re: Storing Json field in Lucene
During indexing, you can add the JSON string to each document as a
stored-only field (not indexed, no doc-values).

At query time you can then retrieve the JSON field's value for only the top
K results. This field should not be used for matching or scoring.

The point is that if you ever want to use Lucene for its strengths
(text/multidimensional indexing and search), you should extract those
values from your JSON document (like you extract Type and Id, I guess) and
_also_ add them as separate fields with indexing/doc-values enabled,
depending on the use cases for each field.
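
A minimal sketch of the layout described above, assuming the Lucene 8.x API that was current for this thread (later releases expose the stored-field lookup through `IndexSearcher.storedFields()` rather than `IndexSearcher.doc`). The field names `id`, `type`, and `metadata_json`, the class name, and the sample values are hypothetical, chosen only for illustration. The JSON goes into a stored-only field, while `id` and `type` are also indexed so they can be queried.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class JsonStoredFieldSketch {

    // Hypothetical field names. "metadata_json" is stored only, so it can be
    // returned with a hit but cannot be matched or scored against.
    static Document toDocument(String id, String type, String json) {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));     // indexed as one token
        doc.add(new StringField("type", type, Field.Store.YES)); // indexed as one token
        doc.add(new StoredField("metadata_json", json));         // stored, not indexed
        return doc;
    }

    // Runs the query first, then loads stored fields for the top K hits only.
    static List<String> fetchJsonByType(Directory dir, String type, int topK) throws IOException {
        List<String> result = new ArrayList<>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs top = searcher.search(new TermQuery(new Term("type", type)), topK);
            for (ScoreDoc hit : top.scoreDocs) {
                // Lucene 8.x call; one lookup returns all stored fields of the hit.
                result.add(searcher.doc(hit.doc).get("metadata_json"));
            }
        }
        return result;
    }

    // Builds a tiny in-memory index with made-up records and queries it.
    static List<String> demo() {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer =
                     new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                writer.addDocument(toDocument("doc-1", "audit", "{\"owner\":\"ganesh\"}"));
                writer.addDocument(toDocument("doc-1", "acl", "{\"readers\":[\"all\"]}"));
                writer.addDocument(toDocument("doc-2", "audit", "{\"owner\":\"erick\"}"));
            }
            return fetchJsonByType(dir, "audit", 10);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Any value that needs sorting or faceting would additionally get a doc-values field; that part is left out here to keep the sketch short.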

--
Aditya Varun Chadha | http://www.adichad.net | +49 (0) 152 25914008 (M)
Re: Storing Json field in Lucene
"Is it good idea to store complete Json as string to Lucene DB. If we store as separate fields then we have around 30 fields. There will be 30 seeks to get complete stored fields"

This is not true. Under the covers, all of a document's stored fields are kept together and compressed in blocks, and Lucene does the magic of decompressing the block and extracting the stored fields when you ask for them. Fetching 30 stored fields therefore does not cost 30 seeks.
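
To get a feel for why repeated JSON field names need not bloat compressed storage, here is a small stand-alone sketch. It does not use Lucene at all: `java.util.zip.Deflater` is only a stand-in for block compression of stored fields, and the class name, record layout, and field names are made up for illustration. Actual savings in a real index depend on the codec and the data.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class JsonCompressionSketch {

    // Compresses the input with DEFLATE and returns the compressed size in bytes.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // only the byte count matters here
        }
        deflater.end();
        return total;
    }

    // Builds a block of hypothetical JSON records that repeat the same field names.
    static byte[] sampleBlock(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("{\"documentId\":\"doc-").append(i)
              .append("\",\"type\":\"audit\",\"owner\":\"user-").append(i % 7)
              .append("\"}");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] block = sampleBlock(100);
        System.out.println("raw bytes:      " + block.length);
        System.out.println("deflated bytes: " + deflatedSize(block));
    }
}
```

Running `main` prints the raw and compressed sizes of 100 records, which is a quick way to compare the JSON layout against other encodings before committing to one.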

Further, while you’re right that storing lots of things will bloat the index, that’s not very important. Stored data is kept in separate files in each segment (*.fdt, with *.fdx as the index into it) and has little to no impact on search performance. That data is not accessed unless you ask for the field to be returned, i.e. it’s not part of the data used to get the top N documents. Say you have a search that has 10,000,000 hits and returns the top 10. _Only_ the stored data for those top 10 hits is accessed, and that only after all the scoring is done.

I think this is premature optimization. Try the least complex way of organizing your data and measure.

Best,
Erick

> On Apr 22, 2020, at 1:00 AM, ganesh m <emailgane@yahoo.co.in.INVALID> wrote:
>
> Is it good idea to store complete Json as string to Lucene DB. If we store as separate fields then we have around 30 fields. There will be 30 seeks to get complete stored fields

