Mailing List Archive

Feature request: Search facet counts in KinoSearch?
I'm working on a project that is using Solr for full text search.
The results include search facets with counts.

For an example see http://openlibrary.org/search?q=tom+sawyer

Is this a feature that would ever be considered for KinoSearch? I
understand that performance would be reduced.

--
Edward.

_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Feature request: Search facet counts in KinoSearch?
On Jun 12, 2008, at 7:37 AM, Edward Betts wrote:

> I'm working on a project that is using Solr for full text search.
> The results include search facets with counts.
>
> For an example see http://openlibrary.org/search?q=tom+sawyer
>
> Is this a feature that would ever be considered for KinoSearch?


Faceted search doesn't belong in core KS, but would be built on top of
it, just as Solr is built on top of Lucene. SVN trunk for KinoSearch
now has at least some of the features necessary to support such an
extension (e.g. a public API for Lexicon). I'd be interested in
adding others that are required (a public API for BitVector, the
equivalent of SortedVIntList, etc).

As for whether I'd write a faceted search tool myself: I think I could
best aid such an effort by focusing on the KS core. It would be
unreasonable for me as an individual to publish and expect to support
both KS and something like Solr. However, I'd be happy to help out in
a strong supporting role.

For more on faceted search, see
<http://people.apache.org/~hossman/apachecon2006us/faceted-searching-with-solr.pdf>.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


Re: Feature request: Search facet counts in KinoSearch?
On Thu, Jun 12, 2008 at 11:15 AM, Marvin Humphrey
<marvin@rectangular.com> wrote:
> For more on faceted search, see
> <http://people.apache.org/~hossman/apachecon2006us/faceted-searching-with-solr.pdf>.

I scanned through this quickly, but didn't get a good feel for the
architecture. Is the faceted approach layered on top of the search as
a post-processing filter, or are the facets being handled directly by
the search engine?

My instinct (and my similar attempts in the past) has been to do a
rough pass at filtering on the server, then push as much of the raw
data as possible to the browser and let most of the presentation
details happen on the client side. I don't yet see how it can be
done efficiently on the server side.

It's a topic that interests me, though.

Nathan Kurz
nate@verse.com

Re: Feature request: Search facet counts in KinoSearch?
On Jun 12, 2008, at 11:16 AM, Nathan Kurz wrote:

> Is the faceted approach layered on top of the search as
> a post-processing filter, or are the facets being handled directly by
> the search engine?

The main trick for obtaining the facet counts is massive server-side
caching.

You cache doc sets for each facet. A BitVector works well for facets
which match lots of documents; for more sparse sets, a SortedVIntList,
which encodes a set of integers using a compressed format, may use
less memory.
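
To make the memory trade-off concrete, here is a rough sketch in Python
(not KinoSearch or Lucene code). It assumes the sorted list stores the
gaps between successive doc numbers as variable-length integers, 7 bits
per byte; the function names are made up for illustration.

```python
def encode_sorted_vints(doc_ids):
    """Encode ascending doc numbers as delta-coded variable-length ints."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        delta = doc_id - prev  # gap to the previous doc number (never negative)
        prev = doc_id
        # Emit 7 bits at a time; a set high bit means "more bytes follow".
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_sorted_vints(data):
    """Recover the ascending doc numbers from the encoded bytes."""
    doc_ids = []
    prev = 0
    delta = 0
    shift = 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # continuation byte: keep accumulating
        else:
            prev += delta       # last byte of this gap: emit a doc number
            doc_ids.append(prev)
            delta = 0
            shift = 0
    return doc_ids


def bit_vector_size(max_doc):
    """Bytes needed for a bit vector with one bit per document."""
    return (max_doc + 7) // 8
```

With these definitions, a facet matching only four docs in a
million-doc index costs a handful of bytes as an encoded list, while a
bit vector costs the same 125000 bytes no matter how few bits are set.
Once a facet matches a large fraction of the index, the bit vector
becomes the smaller of the two.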

When you search, you use a dual-purpose HitCollector which wraps both
a TopDocCollector and a BitCollector. The TopDocCollector gets you
your standard search results ranked by score.

The BitCollector gets you a list of all the doc numbers that matched.
For each facet that you want a result for, you count the number of
docs in the intersection of the main result set with the facet's
cached result set.
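
Roughly, in Python (a sketch only, not the KinoSearch or Lucene API):
the class names mirror the ones mentioned above, but the collect()
signature, the use of a Python integer as the bit set, and the helper
names are assumptions made for illustration.

```python
import heapq


class TopDocCollector:
    """Keeps the `size` highest-scoring docs seen so far."""

    def __init__(self, size):
        self.size = size
        self._heap = []  # min-heap of (score, doc_id)

    def collect(self, doc_id, score):
        if len(self._heap) < self.size:
            heapq.heappush(self._heap, (score, doc_id))
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, (score, doc_id))

    def top_docs(self):
        ranked = sorted(self._heap, key=lambda hit: (-hit[0], hit[1]))
        return [doc_id for _score, doc_id in ranked]


class BitCollector:
    """Records every matching doc number as a set bit."""

    def __init__(self):
        self.bits = 0  # arbitrary-length Python int standing in for a BitVector

    def collect(self, doc_id, score):
        self.bits |= 1 << doc_id


class DualCollector:
    """Forwards each hit to both wrapped collectors."""

    def __init__(self, top_collector, bit_collector):
        self.top_collector = top_collector
        self.bit_collector = bit_collector

    def collect(self, doc_id, score):
        self.top_collector.collect(doc_id, score)
        self.bit_collector.collect(doc_id, score)


def to_bits(doc_ids):
    """Build a bit set from an iterable of doc numbers."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def facet_counts(result_bits, facet_cache):
    """Count docs in the intersection of the results with each cached facet."""
    return {
        name: bin(result_bits & facet_bits).count("1")
        for name, facet_bits in facet_cache.items()
    }
```

The search loop calls collect() once per matching doc. Afterwards
top_docs() supplies the ranked page of results, and facet_counts() is
run against the BitCollector's bits and the cached facet sets.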

The other problem is how to decide which facets to evaluate each query
against. I think most people use a sort of drill-down, where top-level
queries are compared against general categories, and once you select
one of those categories (e.g. by clicking on "DVDs", or "Books"), the
facet set changes. However, I don't believe that Solr constrains you
with regard to how you select facets.
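
One way to picture the drill-down, as a Python sketch under stated
assumptions: the facet tree, the category names below "Books" and
"DVDs", and the function names are all hypothetical, and this is not
how Solr itself is configured.

```python
def popcount(bits):
    """Number of set bits in a Python int used as a bit set."""
    return bin(bits).count("1")


def drill_down_counts(result_bits, facet_cache, facet_tree, selected=None):
    """Count hits for the facets that apply at the current drill-down level.

    facet_tree maps a selected category (None at the top level) to the
    child facets to evaluate; facet_cache maps facet names to bit sets.
    """
    if selected is not None:
        # Narrow the results to the category the user clicked on.
        result_bits &= facet_cache[selected]
    return {
        child: popcount(result_bits & facet_cache[child])
        for child in facet_tree.get(selected, [])
    }


# Hypothetical two-level hierarchy.
FACET_TREE = {
    None: ["Books", "DVDs"],
    "Books": ["Fiction", "Non-fiction"],
    "DVDs": ["Drama", "Documentary"],
}
```

Only the facets for the current level are intersected with the result
set, so the per-query cost grows with the number of facets displayed
rather than with the total number of facets cached.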

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

