Hi folks-
I'm trying to make sure I have a proper understanding of what
FacetResult#value is meant to represent, particularly in multi-valued
doc scenarios. Apologies if I'm missing something obvious, but it
seems that either my understanding is incorrect, or we have a bug in
how we count multi-value docs. This is particularly relevant to me at
the moment since I'm working on a couple of facet-related changes, and I
want to make sure I've got a proper understanding of this field.
Thanks!
From the Javadocs:
/**
 * Total value for this path (sum of all child counts, or sum of all child values), even those
 * not included in the topN.
 */
public final Number value;
So from the Javadocs, it seems this is simply the sum of all values
for the given dim+path. In the case of single-value docs, this would
also represent the total number of documents containing a value for
the given dim+path, which seems fairly useful (i.e., it might be nice
to know how many documents contain a value for a given facet
dim+path). On the other hand, if docs can be multi-valued, this seems
somewhat less useful. If this is truly the sum of the values for the
given dim+path, each document can contribute more than one count, so
the user can no longer interpret this as the number of documents that
have at least one value for the facet dim+path. It seems as though it
would be more useful to report the number of documents that have at
least one value for the dim+path rather than the summed count, but this
is where I'm probably just misunderstanding something.
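To make the distinction concrete, here is a minimal sketch in plain Java. It does not use any Lucene APIs; the class name, method names, and sample data are all made up for illustration, and it assumes each doc lists a given value at most once. It just contrasts "sum of child counts" with "docs that have at least one value" for one dim.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Lucene code.
public class FacetValueSketch {

  /** Sum of per-child counts: a doc with k values under the dim contributes k. */
  public static int sumOfChildCounts(List<List<String>> docs) {
    Map<String, Integer> childCounts = new HashMap<>();
    for (List<String> values : docs) {
      for (String v : values) {
        childCounts.merge(v, 1, Integer::sum);
      }
    }
    int total = 0;
    for (int c : childCounts.values()) {
      total += c;
    }
    return total;
  }

  /** Number of docs that have at least one value under the dim. */
  public static int docsWithAnyValue(List<List<String>> docs) {
    int n = 0;
    for (List<String> values : docs) {
      if (!values.isEmpty()) {
        n++;
      }
    }
    return n;
  }

  public static void main(String[] args) {
    // Each inner list holds one doc's values for a single (made-up) dim.
    List<List<String>> docs = List.of(
        List.of("red"),
        List.of("red", "blue"),
        List.of("blue", "green", "red"),
        List.of()); // doc with no value for the dim

    System.out.println(sumOfChildCounts(docs)); // 6 (red=3, blue=2, green=1)
    System.out.println(docsWithAnyValue(docs)); // 3
  }
}
```

If every doc had at most one value, the two numbers would be identical, which is why the difference only shows up in the multi-valued case.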
Finally, looking at the way taxonomy facets are counted, it looks like
this value is populated with the total number of documents in the
single-value case, and with -1 in multi-value cases where an accurate
doc count can't be provided (see IntTaxonomyFacets L:228 for example). This
isn't consistent with the implementation in LongValueFacetCounts
though, which will always populate the total of all values, ignoring
single- vs. multi-valued cases (see LongValueFacetCounts L:163). It
appears the implementation in SortedSetDocValuesFacetCounts will also
"double count" multi-value cases similar to LongValueFacetCounts.
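Roughly, the two behaviors I think I'm seeing look like the sketch below. This is my paraphrase in plain Java and not the actual Lucene code; the class name, method names, and the UNKNOWN constant are hypothetical.

```java
// Hypothetical paraphrase of the two conventions; not Lucene code.
public class AggregateConventionSketch {

  /** Sentinel meaning "an accurate doc count can't be provided". */
  public static final int UNKNOWN = -1;

  /** Convention A: report a doc count, or -1 when the dim is multi-valued. */
  public static int docCountOrUnknown(int[] childCounts, boolean multiValued) {
    if (multiValued) {
      // Summing child counts could count the same doc more than once.
      return UNKNOWN;
    }
    return sum(childCounts);
  }

  /** Convention B: always report the sum of child counts. */
  public static int sumAlways(int[] childCounts) {
    return sum(childCounts);
  }

  private static int sum(int[] counts) {
    int total = 0;
    for (int c : counts) {
      total += c;
    }
    return total;
  }

  public static void main(String[] args) {
    int[] childCounts = {3, 2, 1};
    System.out.println(docCountOrUnknown(childCounts, false)); // 6
    System.out.println(docCountOrUnknown(childCounts, true));  // -1
    System.out.println(sumAlways(childCounts));                // 6
  }
}
```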
So... which do we think it is? Is it meant to be the total number of
docs, or the total of all values? Can anyone shed some light on this?
Thanks a bunch!
Cheers,
-Greg
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org