Mailing List Archive

When to use StringField and when to use FacetField for categorization?
Hi

I have found the following simple Facet Example

https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java

whereas for a simple categorization of documents I currently use
StringField, e.g.

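// Field.Store.NO is assumed here; StringField requires an explicit Store argument
doc1.add(new StringField("category", "book", Field.Store.NO));
doc1.add(new StringField("category", "quantum_physics", Field.Store.NO));
doc1.add(new StringField("category", "Neumann", Field.Store.NO));
doc1.add(new StringField("category", "Wheeler", Field.Store.NO));

doc2.add(new StringField("category", "magazine", Field.Store.NO));
doc2.add(new StringField("category", "astro_physics", Field.Store.NO));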

which works well, but would it be better to use Facets for this, e.g.

doc1.add(new FacetField("media-type", "book"));
doc1.add(new FacetField("topic", "physics", "quantum"));
doc1.add(new FacetField("author", "Neumann"));
doc1.add(new FacetField("author", "Wheeler"));

doc2.add(new FacetField("media-type", "magazine"));
doc2.add(new FacetField("topic", "physics", "astro"));

?

IIUC the StringField approach is more general, whereas the FacetField
approach allows a more specific categorization / search.
Or do I misunderstand this?

Thanks

Michael



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: When to use StringField and when to use FacetField for categorization?
There are some differences.

StringField is indexed into the inverted index (postings) so you can do
efficient filtering. You can also store the value in stored fields to
retrieve it later.

FacetField does everything StringField does (filtering, storing (maybe?)),
but in addition it stores data for faceting. I.e. you can compute facet
counts or simple aggregations at search time.

FacetField is also hierarchical: you can filter and facet by different
points/levels of your hierarchy.
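
To make the hierarchy point concrete, here is a minimal plain-Java sketch. It is a toy model, not the Lucene API: the class HierarchicalFacetModel and its methods are invented for illustration, and the paths mirror the FacetField examples from the original question (assuming the second group of fields was meant for doc2). It counts how many documents fall under each level of a facet path, which is the kind of per-level count that hierarchical faceting reports.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of hierarchical facet counting; hypothetical names, not Lucene code. */
final class HierarchicalFacetModel {

  /**
   * Counts, for every prefix of every facet path, how many documents carry it.
   * Each element of docs is one document, given as its list of facet paths,
   * e.g. ["topic", "physics", "quantum"].
   */
  static Map<String, Integer> countPrefixes(List<List<List<String>>> docs) {
    Map<String, Integer> counts = new TreeMap<>();
    for (List<List<String>> paths : docs) {
      // Collect each prefix once per document so a document is never counted twice.
      Set<String> seen = new TreeSet<>();
      for (List<String> path : paths) {
        for (int len = 1; len <= path.size(); len++) {
          seen.add(String.join("/", path.subList(0, len)));
        }
      }
      for (String prefix : seen) {
        counts.merge(prefix, 1, Integer::sum);
      }
    }
    return counts;
  }

  /** Builds the two documents from the question and counts their facet prefixes. */
  static Map<String, Integer> demoCounts() {
    List<List<String>> doc1 = List.of(
        List.of("media-type", "book"),
        List.of("topic", "physics", "quantum"),
        List.of("author", "Neumann"),
        List.of("author", "Wheeler"));
    List<List<String>> doc2 = List.of(
        List.of("media-type", "magazine"),
        List.of("topic", "physics", "astro"));
    return countPrefixes(List.of(doc1, doc2));
  }

  public static void main(String[] args) {
    demoCounts().forEach((path, count) -> System.out.println(path + " = " + count));
  }
}
```

In this model both documents count towards topic and topic/physics, while topic/physics/quantum and topic/physics/astro each match one document. With Lucene's taxonomy facets, a dimension with multi-level paths has to be marked via FacetsConfig.setHierarchical, and a dimension with several values per document (such as author here) via FacetsConfig.setMultiValued.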

Mike McCandless

http://blog.mikemccandless.com


On Fri, Oct 20, 2023 at 5:43 AM Michael Wechner <michael.wechner@wyona.com>
wrote:

> IIUC the StringField approach is more general, whereas the FacetField
> approach allows to do a more specific categorization / search.
> Or do I misunderstand this?
Re: When to use StringField and when to use FacetField for categorization?
FYI there is also KeywordField, which combines StringField and
SortedSetDocValuesField. It supports filtering, sorting, faceting and
retrieval. It's my go-to field for string values.
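
One way to picture why a single field can serve filtering, sorting and faceting is to think of it as two structures built from the same values: an inverted index (value to documents) and doc values (document to values). The sketch below is a toy model in plain Java under that assumption. It does not use Lucene and does not reflect how Lucene stores data internally; KeywordFieldModel and its methods are hypothetical names.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of a field that is both indexed and has doc values; not Lucene code. */
final class KeywordFieldModel {
  // Inverted index: value -> IDs of the documents containing it (used for filtering).
  private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
  // Doc values: document ID -> its sorted set of values (used for faceting and sorting).
  private final Map<Integer, SortedSet<String>> docValues = new TreeMap<>();

  /** Records one value of one document in both structures. */
  void add(int docId, String value) {
    postings.computeIfAbsent(value, v -> new TreeSet<>()).add(docId);
    docValues.computeIfAbsent(docId, d -> new TreeSet<>()).add(value);
  }

  /** Filtering: look the value up in the inverted index. */
  SortedSet<Integer> filter(String value) {
    return postings.getOrDefault(value, Collections.emptySortedSet());
  }

  /** Faceting: walk the doc values of the matching documents and count each value. */
  Map<String, Integer> facetCounts(Collection<Integer> hits) {
    Map<String, Integer> counts = new TreeMap<>();
    for (int docId : hits) {
      for (String value : docValues.getOrDefault(docId, Collections.emptySortedSet())) {
        counts.merge(value, 1, Integer::sum);
      }
    }
    return counts;
  }

  /** Sorting: use the smallest value of a known document as its sort key. */
  String sortKey(int docId) {
    return docValues.get(docId).first();
  }

  /** Builds the two documents from the original question. */
  static KeywordFieldModel demo() {
    KeywordFieldModel model = new KeywordFieldModel();
    for (String value : new String[] {"book", "quantum_physics", "Neumann", "Wheeler"}) {
      model.add(1, value);
    }
    for (String value : new String[] {"magazine", "astro_physics"}) {
      model.add(2, value);
    }
    return model;
  }

  public static void main(String[] args) {
    KeywordFieldModel model = demo();
    System.out.println("docs with 'book': " + model.filter("book"));
    System.out.println("facet counts for those docs: " + model.facetCounts(model.filter("book")));
    System.out.println("sort key of doc 2: " + model.sortKey(2));
  }
}
```

A field that only has the inverted index can answer filter but not facetCounts or sortKey, which is roughly the gap between a plain indexed string field and one that also carries doc values.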

On Fri, Oct 20, 2023 at 12:20, Michael McCandless <lucene@mikemccandless.com>
wrote:

> There are some differences.
>
> StringField is indexed into the inverted index (postings) so you can do
> efficient filtering. You can also store in stored fields to retrieve.
Re: When to use StringField and when to use FacetField for categorization?
Hi Mike

Thanks for your feedback!

IIUC in order to have the actual advantages of Facets one has to
"connect" it with a TaxonomyWriter

FacetsConfig config = new FacetsConfig();
DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
indexWriter.addDocument(config.build(taxoWriter, doc));

right?

Thanks

Michael




Re: When to use StringField and when to use FacetField for categorization?
Hi Adrien

Thank you very much for your feedback as well!

I just replaced the StringField with KeywordField :-)

Thanks

Michael

On 20.10.23 at 14:13, Adrien Grand wrote:
> FYI there is also KeywordField, which combines StringField and
> SortedSetDocValuesField. It supports filtering, sorting, faceting and
> retrieval. It's my go-to field for string values.
Re: When to use StringField and when to use FacetField for categorization?
You can use either the "doc values" implementation for facets
(SortedSetDocValuesFacetField), or the "taxonomy" implementation
(FacetField, in which case, yes, you need to create a TaxonomyWriter).

It used to be that the "doc values" based faceting did not support
arbitrary hierarchy, but I think that was fixed at some point.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Oct 20, 2023 at 9:03 AM Michael Wechner <michael.wechner@wyona.com>
wrote:

> Hi Mike
>
> Thanks for your feedback!
>
> IIUC in order to have the actual advantages of Facets one has to
> "connect" it with a TaxonomyWriter
>
> FacetsConfig config = new FacetsConfig();
> DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
> indexWriter.addDocument(config.build(taxoWriter, doc));
>
> right?
Re: When to use StringField and when to use FacetField for categorization?
cool, thank you very much!

Michael



On 20.10.23 at 15:44, Michael McCandless wrote:
> You can use either the "doc values" implementation for facets
> (SortedSetDocValuesFacetField), or the "taxonomy" implementation
> (FacetField, in which case, yes, you need to create a TaxonomyWriter).
>
> It used to be that the "doc values" based faceting did not support
> arbitrary hierarchy, but I think that was fixed at some point.
Re: When to use StringField and when to use FacetField for categorization?
Just following up on Mike's comment:


> It used to be that the "doc values" based faceting did not support
> arbitrary hierarchy, but I think that was fixed at some point.


Yeah, it was fixed a year or two ago: SortedSetDocValuesFacetField supports
hierarchical faceting; I think you just need to enable it in the
FacetsConfig. One thing to keep in mind is that even though SSDV
(SortedSetDocValues) faceting doesn't require a taxonomy index, it still
requires a SortedSetDocValuesReaderState to be maintained, which can be a
little expensive to create but only needs to be done once. This benchmark code
<https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/facets/BenchmarkFacets.java>
serves as a pretty basic example of SSDV/hierarchical SSDV faceting.
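
The "only needs to be done once" part is essentially a caching pattern: build the expensive state once per reader and hand the same instance to every request. The sketch below shows that pattern in plain Java under stated assumptions. ReaderStateCache is a hypothetical helper, not part of Lucene; in real code the reader type would presumably be an IndexReader and the state a SortedSetDocValuesReaderState, here both are left generic so the example runs on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: builds an expensive per-reader state once and reuses it. */
final class ReaderStateCache<R, S> {
  private final Map<R, S> states = new ConcurrentHashMap<>();
  private final Function<R, S> builder;
  // Counts how often the expensive builder actually ran (for illustration only).
  private final AtomicInteger builds = new AtomicInteger();

  ReaderStateCache(Function<R, S> builder) {
    this.builder = builder;
  }

  /** Returns the cached state for this reader, building it on first use. */
  S stateFor(R reader) {
    return states.computeIfAbsent(reader, r -> {
      builds.incrementAndGet();
      return builder.apply(r);
    });
  }

  /** Drops the state of a reader that is no longer in use, e.g. after a reopen. */
  void evict(R reader) {
    states.remove(reader);
  }

  int buildCount() {
    return builds.get();
  }

  public static void main(String[] args) {
    // Strings stand in for readers and states in this demo.
    ReaderStateCache<String, String> cache = new ReaderStateCache<>(r -> "state-for-" + r);
    cache.stateFor("reader-1");
    cache.stateFor("reader-1");
    System.out.println("builds after two lookups: " + cache.buildCount());
  }
}
```

Since the state belongs to one reader, a new reader obtained after reopening the index needs its own state, which is why the sketch keys the cache by reader and offers evict for readers that have been closed.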

On Fri, Oct 20, 2023 at 7:09 AM Michael Wechner <michael.wechner@wyona.com>
wrote:

> cool, thank you very much!
>
> Michael
Re: When to use StringField and when to use FacetField for categorization?
thanks very much for this additional information, Marc!

On 20.10.23 at 20:30, Marc D'Mello wrote:
> Yeah it was fixed a year or two ago, SortedSetDocValuesFacetField supports
> hierarchical faceting, I think you just need to enable it in the
> FacetsConfig. One thing to keep in mind is even though SSDV faceting
> doesn't require a taxonomy index, it still requires a
> SortedSetDocValuesReaderState to be maintained, which can be a little bit
> expensive to create, but only needs to be done once. This benchmark code
> <https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/facets/BenchmarkFacets.java>
> serves as a pretty basic example of SSDV/hierarchical SSDV faceting.
Re: When to use StringField and when to use FacetField for categorization?
Hey Michael-

You've gotten a lot of great information here already. I'll point you to
one more implementation as well: StringValueFacetCounts. This
implementation lets you do faceting over arbitrary "string-like" doc value
fields (SORTED and SORTED_SET). So if you already have a field of this type
you're using for other purposes, and you want to do faceting over it, you
can do it with this implementation.

The faceting-specific fields (there's a taxonomy-based approach and a
non-taxonomy-based approach, both with pros/cons) are also available, which
is what you've referenced here so far (and what others have pointed you
to). These are more "managed" fields with faceting in mind.

A high-level difference here is that faceting-specific fields tend to index
all the facet fields into a single doc values field in the index, which can
make faceting more efficient. StringValueFacetCounts can be less efficient
for faceting (if you have many different fields you want to individually
facet) but could be more flexible for you if you already have these fields
in your index for other purposes and don't want to duplicate the data into
these facet-specific fields.

Not sure if these details are helpful for you or not. If any of this is a
bit unclear, let me know and I'll try to describe things better or answer
specific questions. Honestly, we probably have too many ways to do the same
thing in the faceting module, and maybe our documentation could be a bit
more helpful.

Cheers,
-Greg

On Fri, Oct 20, 2023 at 2:54 PM Michael Wechner <michael.wechner@wyona.com>
wrote:

> thanks very much for this additional information, Marc!
> >>>>>>
> >>>>>> which works well, but would it be better to use Facets for this,
> e.g.
> >>>>>>
> >>>>>> doc1.add(new FacetField("media-type", "book"));
> >>>>>> doc1.add(new FacetField("topic", "physics", "quantum");
> >>>>>> doc1.add(new FacetField("author", "Neumann");
> >>>>>> doc1.add(new FacetField("author", "Wheeler");
> >>>>>>
> >>>>>> doc1.add(new FacetField("media-type", "magazine"));
> >>>>>> doc1.add(new FacetField("topic", "physics", "astro");
> >>>>>>
> >>>>>> ?
> >>>>>>
> >>>>>> IIUC the StringField approach is more general, whereas the
> FacetField
> >>>>>> approach allows to do a more specific categorization / search.
> >>>>>> Or do I misunderstand this?
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> Michael
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>>
> >>>>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
Re: When to use StringField and when to use FacetField for categorization? [ In reply to ]
Not all of your fields might be strings

Re: When to use StringField and when to use FacetField for categorization? [ In reply to ]
Yo, a facet can be a boolean or a coordinate or a currency, so maybe all your facets are the currency and then your fields would be an integer. And then you could say: I just want yen, or pesos, or whatever.

Re: When to use StringField and when to use FacetField for categorization? [ In reply to ]
Hi Greg

Thank you very much for your additional information, really very much
appreciated!

Yes, generally speaking I think Lucene has many great features, which
unfortunately are not so obvious for various reasons.

Documentation could of course always be better, but I guess part of the
reason is that many people do not use Lucene directly. They use Solr,
OpenSearch, Elasticsearch, etc. and do not need to know what Lucene itself
offers, so few people ask for these things and there is not really an
incentive to improve the documentation.

In the Python world there is huge hype around RAG / RAG-Fusion, and
many people are writing posts and documentation, see for example

https://medium.com/@murtuza753/using-llama-2-0-faiss-and-langchain-for-question-answering-on-your-own-data-682241488476

I do not mean to say Lucene should or has to jump on this bandwagon, but
I would argue that search algorithms are definitely evolving,
and I think it would be nice if more people knew what Lucene has
to offer and if it were more transparent where Lucene is heading.

But then again, it might be only me not being familiar enough with these
things :-)

Thanks

Michael





Re: When to use StringField and when to use FacetField for categorization? [ In reply to ]
As opposed to something like: I want to find everything less than 6.00
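
The contrast being drawn here, picking one facet label (say, a currency) versus filtering by a numeric bound (say, a price below 6.00), can be modeled roughly as below. This is only a plain-Java sketch of the semantics, not Lucene's API; the class name, the Item type and its currency/price fields are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of facet counting, drill-down and range filtering.
// NOT Lucene code: every name here is hypothetical.
class FacetVersusRangeSketch {

  // Hypothetical document: one categorical value and one numeric value.
  static final class Item {
    final String currency;
    final double price;

    Item(String currency, double price) {
      this.currency = currency;
      this.price = price;
    }
  }

  // Faceting: count how many items carry each distinct currency label.
  static Map<String, Integer> facetCounts(List<Item> items) {
    Map<String, Integer> counts = new TreeMap<>();
    for (Item item : items) {
      counts.merge(item.currency, 1, Integer::sum);
    }
    return counts;
  }

  // Drill-down: keep only the items with one chosen label ("I just want yen").
  static List<Item> drillDown(List<Item> items, String currency) {
    List<Item> hits = new ArrayList<>();
    for (Item item : items) {
      if (item.currency.equals(currency)) {
        hits.add(item);
      }
    }
    return hits;
  }

  // Range filter: keep only the items priced strictly below the bound.
  static List<Item> priceBelow(List<Item> items, double upperExclusive) {
    List<Item> hits = new ArrayList<>();
    for (Item item : items) {
      if (item.price < upperExclusive) {
        hits.add(item);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<Item> items = Arrays.asList(
        new Item("yen", 4.50), new Item("yen", 9.00), new Item("peso", 5.25));
    System.out.println(facetCounts(items));
    System.out.println(drillDown(items, "yen").size());
    System.out.println(priceBelow(items, 6.00).size());
  }
}
```

In Lucene itself the drill-down would typically go through DrillDownQuery and the price bound through a point range query such as DoublePoint.newRangeQuery; the sketch only shows that the first selects by label while the second selects by numeric comparison.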

Re: When to use StringField and when to use FacetField for categorization? [ In reply to ]
Hey man, don't advertise or push opinions. Lucene is just fine the way it is. You're just idolizing some Jesuit opinion to try to hurt people and spread disinformation.

Re: When to use StringField and when to use FacetField for categorization? [ In reply to ]
The Jesuit mind is always justifying itself and trying to make it seem like there is popularity and hype around their ultimately disgraceful deception.
