Hey folks-
I've been chatting with Marc D'Mello a bit about the SSDV (SortedSetDocValues) faceting
he's working on (LUCENE-10250) (disclaimer: we both work on Amazon's
Product Search engine). We're trying to figure out where
taxonomy-based faceting has a performance advantage over SSDV, and it
occurred to me that the way the two approaches resolve the paths for
given ordinals is a bit different. TaxonomyReader was recently updated
to support bulk ordinal resolution (LUCENE-9476), but SSDV faceting is
stuck looking up paths one-at-a-time via SSDV#lookupOrd(ord). This
results in a separate TermsEnum#seekExact() call down in
Lucene90DocValuesProducer for each ordinal being returned.
I don't know how the TermsDict behind an SSDV field is actually
represented, so I'm wondering if someone here can give a high-level
sense of whether looking up ordinals in bulk might be advantageous for
SSDV fields in general. I'm going to dig into the code anyway
(curious!), but thought I'd raise the question here as well. Any
thoughts?
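
To make the question concrete, here is a toy sketch of the kind of
saving a bulk lookup might offer. It assumes the terms dictionary is
stored in prefix-compressed blocks, which is an assumption for
illustration and not a description of the actual Lucene90 format.
ToyTermsDict, BLOCK_SIZE, lookupOrds and decodeSteps are all
hypothetical names; none of them exist in Lucene.

```java
/** Toy model only: a sorted term dictionary stored in prefix-compressed blocks. */
class ToyTermsDict {
    static final int BLOCK_SIZE = 4;  // hypothetical block size

    private final int[] prefixLens;   // chars shared with the previous term (0 at a block start)
    private final String[] suffixes;  // remainder of each term after the shared prefix
    long decodeSteps = 0;             // number of entries decoded so far

    ToyTermsDict(String[] sortedTerms) {
        prefixLens = new int[sortedTerms.length];
        suffixes = new String[sortedTerms.length];
        for (int i = 0; i < sortedTerms.length; i++) {
            int shared = 0;
            if (i % BLOCK_SIZE != 0) { // the first term of a block is stored in full
                String prev = sortedTerms[i - 1];
                int max = Math.min(prev.length(), sortedTerms[i].length());
                while (shared < max && prev.charAt(shared) == sortedTerms[i].charAt(shared)) {
                    shared++;
                }
            }
            prefixLens[i] = shared;
            suffixes[i] = sortedTerms[i].substring(shared);
        }
    }

    // Rebuild the term at ord from the previously decoded term in the same block.
    private String decodeNext(String prev, int ord) {
        decodeSteps++;
        return prev.substring(0, prefixLens[ord]) + suffixes[ord];
    }

    /** One-at-a-time lookup: jump to the block start, then decode forward to ord. */
    String lookupOrd(int ord) {
        int start = ord - (ord % BLOCK_SIZE);
        String term = "";
        for (int i = start; i <= ord; i++) {
            term = decodeNext(term, i);
        }
        return term;
    }

    /** Hypothetical bulk lookup: ords must be sorted ascending. */
    String[] lookupOrds(int[] sortedOrds) {
        String[] out = new String[sortedOrds.length];
        int pos = -1;      // ord of the last decoded entry
        String term = "";
        for (int k = 0; k < sortedOrds.length; k++) {
            int ord = sortedOrds[k];
            int start = ord - (ord % BLOCK_SIZE);
            if (pos < start || pos > ord) { // different block: restart at its first entry
                pos = start - 1;
                term = "";
            }
            while (pos < ord) {             // same block: keep decoding from where we stopped
                pos++;
                term = decodeNext(term, pos);
            }
            out[k] = term;
        }
        return out;
    }
}
```

In this model, resolving ordinals 1, 2, 3, 6 and 7 from an eight-term
dictionary costs 16 decode steps one at a time but 8 in bulk, because
the bulk path never re-decodes the start of a block it is already
positioned in. Whether anything similar holds for the real TermsDict
is exactly what I'm unsure about.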
Cheers,
-Greg
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org