Mailing List Archive

fast way to collect results? (KinoSearch .15)
I'm looking for the fastest way to collect the full set of results
from a search. Here's what I'm using currently:

my $hits = $index->search($query);
my $collector = KinoSearch::Search::BitCollector->new();
$hits->{searcher}->search_hit_collector(
    hit_collector => $collector,
    weight        => $hits->{weight},
);
my @result_ids = @{$collector->get_bit_vector()->to_arrayref};

What I'm finding is that calling search_hit_collector takes MUCH
longer than the initial search, more than I'd expect. The initial
search on my index takes something like 0.004s, while the
search_hit_collector call brings the processing time to 0.11s.

Is there any faster way to pull out all the ids for an executed search?

Thanks again and in advance!

Best,

Matthew

Re: fast way to collect results? (KinoSearch .15)
On Aug 22, 2007, at 12:48 PM, Matthew Berk wrote:

> I'm looking for the fastest way to collect the full set of results
> from a search. Here's what I'm using currently:
>
> my $hits = $index->search($query);
> my $collector = KinoSearch::Search::BitCollector->new();
> $hits->{searcher}->search_hit_collector(
>     hit_collector => $collector,
>     weight        => $hits->{weight},
> );
> my @result_ids = @{$collector->get_bit_vector()->to_arrayref};
>
> What I'm finding is that calling search_hit_collector takes MUCH
> longer than the initial search, more than I'd expect. The initial
> search on my index takes something like 0.004s, while the
> search_hit_collector call brings the processing time to 0.11s.

On first look, I was confused myself. However, now that I've had a
chance to peruse things more closely, I believe the slowdown is due
to the BitVector object continually reallocating as it stores
increasing document numbers. Try this:

my $collector = KinoSearch::Search::BitCollector->new(
    capacity => $searcher->max_doc,
);
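
To illustrate the idea only, here is a minimal pure-Perl sketch of a bit
vector that either grows on demand or is sized up front. It is not
KinoSearch's BitVector: collect_bits, to_array, and the document numbers
are hypothetical names and values made up for this example, and Perl's
vec() stands in for the real storage. Both paths produce the same set of
bits; the difference is only whether the buffer is extended as higher
document numbers arrive or allocated once at the start.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a growable bit vector (NOT KinoSearch's BitVector).
# Sets the bit for each document number in a Perl string via vec().
sub collect_bits {
    my ($doc_nums, $capacity) = @_;

    # With a capacity, allocate every byte up front; otherwise start empty
    # and let vec() extend the string whenever a higher bit is set.
    my $bits = defined $capacity ? "\0" x int( ($capacity + 7) / 8 ) : '';

    vec($bits, $_, 1) = 1 for @$doc_nums;
    return $bits;
}

# Return the positions of all set bits in ascending order.
sub to_array {
    my ($bits) = @_;
    return grep { vec($bits, $_, 1) } 0 .. length($bits) * 8 - 1;
}

my @doc_nums = (3, 17, 42, 1000);                 # made-up document numbers
my $grown    = collect_bits(\@doc_nums);          # buffer grows as needed
my $presized = collect_bits(\@doc_nums, 1001);    # analogous to passing max_doc

print join(',', to_array($grown)),    "\n";
print join(',', to_array($presized)), "\n";
```

Whether presizing is measurably faster depends on how the underlying
buffer grows, so it is worth timing both variants against a real index
before relying on it.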

PS: Just for the record, this is mostly private API we're accessing
here. Those document numbers may change with any index revision,
making them difficult to match up against external data.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch