Mailing List Archive

Help to understand the per-field formats in Lucene
Hi, Everyone:

I just started studying Lucene's source code.

I'm confused about the per-field formats in the package
"org.apache.lucene.codecs.perfield".

There are many formats in Lucene's codec, but only three of them have
per-field variants: "DocValues", "KnnVectors" and "Postings".

So, what is the purpose of those three per-field formats?
Why are "DocValues", "KnnVectors" and "Postings" so special that they need
per-field formats?

For example, I've studied "KnnVectors" a little.
The "PerFieldKnnVectorsFormat.FieldsWriter" actually uses the
"Lucene94HnswVectorsFormat".
But why do we have this kind of structure?

Thanks & Regards

MyCoy
Re: Help to understand the per-field formats in Lucene
Hello MyCoy.
"DocValues", "KnnVectors" and "Postings" are three core, fundamentally
different APIs and data structures: for example, doc values are a columnar
store, and postings are the inverted index.
By default, the codec defines one format of each of these three kinds for all
fields. The per-field wrappers let you configure a separate format for a
particular field by returning it based on the field name.
I usually need to experiment with one or a few fields while keeping the
remaining fields on the default format, so the per-field wrappers are quite
an intuitive thing to have.
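
To make the "return it by field name" idea concrete, here is a minimal sketch
in plain Java. It is only a model of the pattern, not Lucene code: "Format",
"PerFieldDispatch", "override" and "formatForField" are hypothetical names
made up for illustration. In Lucene itself the corresponding hooks are
methods such as PerFieldPostingsFormat.getPostingsFormatForField(String),
PerFieldDocValuesFormat.getDocValuesFormatForField(String) and
PerFieldKnnVectorsFormat.getKnnVectorsFormatForField(String).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Lucene format (postings, doc values, vectors).
// This is NOT a Lucene type; it only carries a name for the demonstration.
interface Format {
    String name();
}

// Models the per-field wrapper: one default format for all fields, plus
// optional overrides that are looked up by field name.
final class PerFieldDispatch {
    private final Format defaultFormat;
    private final Map<String, Format> overrides = new HashMap<>();

    PerFieldDispatch(Format defaultFormat) {
        this.defaultFormat = defaultFormat;
    }

    // Register a different format for a single field.
    PerFieldDispatch override(String field, Format format) {
        overrides.put(field, format);
        return this;
    }

    // Plays the role of the getXxxFormatForField(String) hooks:
    // use the override if one exists, otherwise fall back to the default.
    Format formatForField(String field) {
        return overrides.getOrDefault(field, defaultFormat);
    }

    public static void main(String[] args) {
        Format dflt = () -> "DefaultFormat";
        Format experimental = () -> "ExperimentalFormat";

        // Experiment with one field, keep everything else on the default.
        PerFieldDispatch dispatch =
            new PerFieldDispatch(dflt).override("title", experimental);

        System.out.println("title -> " + dispatch.formatForField("title").name());
        System.out.println("body  -> " + dispatch.formatForField("body").name());
    }
}
```

With this shape, only the field under experiment ("title" here) gets the
alternative format, and every other field keeps whatever the default is.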



On Tue, Oct 25, 2022 at 7:36 PM MyCoy Z <mycoy.zhang@gmail.com> wrote:

> [...]


--
Sincerely yours
Mikhail Khludnev