Using v0.15 (still)
I have a pretty healthy document collection (around 15 million) that gets
moderate traffic (260k searches a day) and have been working on
improving performance as searches have crept into the >1s range.
My search server required the total number of hits to be returned
before seeking results ... mainly to short circuit some expensive
pre-processing, but we don't need to get into that here :-).
Anyway, I discovered that calling total_hits on a hits object BEFORE
calling seek on the hits object actually triggers the default 0,100
seek:
KinoSearch/Search/Hits.pm, line 67:
    sub total_hits {
        my $self = shift;
        $self->seek( 0, 100 )
            unless defined $self->{total_hits};
        return $self->{total_hits};
    }
My fix was to reorder the pre-processing so that total_hits isn't called
until after I've run my 0,10 seek. This cut total search time by more than
half (obviously).
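To make the ordering effect concrete, here is a rough, self-contained sketch. MockHits is a hypothetical stand-in I made up for illustration, not the real KinoSearch::Search::Hits class: it copies only the total_hits logic quoted above and counts how many seeks get run. The seek_count method and the matches argument are likewise my own inventions.

```perl
use strict;
use warnings;

# MockHits is a hypothetical stand-in for KinoSearch::Search::Hits. It only
# imitates the lazy total_hits behaviour quoted above and records each seek.
package MockHits;

sub new {
    my ( $class, %args ) = @_;
    return bless { matches => $args{matches}, seeks => [] }, $class;
}

sub seek {
    my ( $self, $start, $num_wanted ) = @_;
    push @{ $self->{seeks} }, [ $start, $num_wanted ];

    # A real seek would run the search; the mock just records the total.
    $self->{total_hits} = $self->{matches};
    return;
}

sub total_hits {
    my $self = shift;
    $self->seek( 0, 100 )
        unless defined $self->{total_hits};
    return $self->{total_hits};
}

# Hypothetical helper: number of seeks performed so far.
sub seek_count { return scalar @{ $_[0]{seeks} } }

package main;

# Original order: total_hits first, then the seek I actually want.
my $slow       = MockHits->new( matches => 1234 );
my $slow_total = $slow->total_hits;    # triggers the implicit seek( 0, 100 )
$slow->seek( 0, 10 );

# Reordered: seek first, so total_hits is already populated.
my $fast = MockHits->new( matches => 1234 );
$fast->seek( 0, 10 );
my $fast_total = $fast->total_hits;    # no extra seek

printf "total_hits first: %d seeks\n", $slow->seek_count;
printf "seek first:       %d seeks\n", $fast->seek_count;
```

In the mock, the first ordering runs two seeks and the second runs one, which lines up with the drop in search time I saw, though the mock says nothing about how expensive a real seek is.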
Just out of curiosity, why is a seek required to populate total hits?
--
Brett Paden
paden@multiply.com