On Oct 18, 2006, at 3:52 PM, Chris Nandor wrote:
> So, I am running my indexer every five minutes. Been running that
> way for
> a few weeks, and I have 6000+ files in my invindex directory.
>
> Is this ... bad? Should I "manually" optimize more often?
Something is wrong. Either segments are not being merged away often
enough, or old, unused segment files are not being deleted. Segment
growth is limited by the Fibonacci series. If you really have
several thousand segments in your index, get away from your box,
because a wormhole is about to open up.
What extensions are represented, and how many of each do you have?
Is this an NFS disk?
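
A quick way to get the per-extension counts is a sketch like the one below. It uses only core Perl. The sub name count_extensions and the '(none)' label for files without an extension are made up for illustration and are not part of KinoSearch.

```perl
use strict;
use warnings;

# Tally the plain files in a directory by extension.
# Returns a hash ref mapping extension (or '(none)') to a count.
sub count_extensions {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Can't open '$dir': $!";
    my %count;
    while ( defined( my $entry = readdir $dh ) ) {
        next unless -f "$dir/$entry";    # skip '.', '..', and subdirectories
        my ($ext) = $entry =~ /\.([^.]+)\z/;
        $count{ defined $ext ? $ext : '(none)' }++;
    }
    closedir $dh;
    return \%count;
}

# Run as: perl count_ext.pl /path/to/invindex
if (@ARGV) {
    my $count = count_extensions( $ARGV[0] );
    printf "%-8s %d\n", $_, $count->{$_} for sort keys %$count;
}
```

The script name count_ext.pl is likewise just a placeholder. Point it at the invindex directory and paste the output.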
Please try this code:
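    use KinoSearch::Searcher;

    # $analyzer and $invindex are the same ones your indexer uses.
    my $searcher = KinoSearch::Searcher->new(
        analyzer => $analyzer,
        invindex => $invindex,
    );

    # This pokes at the searcher's internals to list the live segments.
    my $seg_infos = $searcher->{reader}{seg_infos};
    my @seg_names = sort keys %{ $seg_infos->{infos} };
    print "NUM DOCS: " . $searcher->{reader}->num_docs . "\n";
    print "SEG NAMES: @seg_names\n";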
If there is a one-to-one correspondence between the segment names and
the _XX.cfs files in your invindex, then all those .cfs files are in
use. If you haven't noticed a significant slowdown, that's
probably not the case.
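
To check that correspondence without eyeballing 6000 files, something like this rough sketch could help. It assumes each segment name in @seg_names is exactly the base name of its .cfs file (leading underscore included), which I have not verified here. The sub name orphaned_cfs_files is hypothetical.

```perl
use strict;
use warnings;

# Given a directory and a list of segment names (e.g. @seg_names from
# the snippet above), return the .cfs files that match no listed segment.
# ASSUMPTION: a segment named "_1a" lives in a file named "_1a.cfs".
sub orphaned_cfs_files {
    my ( $dir, @seg_names ) = @_;
    my %in_use = map { $_ => 1 } @seg_names;

    opendir( my $dh, $dir ) or die "Can't open '$dir': $!";
    my @orphans = grep {
        my ($base) = /\A(.+)\.cfs\z/;    # base name of a .cfs file, if any
        defined $base && !$in_use{$base};
    } readdir $dh;
    closedir $dh;

    @orphans = sort @orphans;
    return @orphans;
}
```

Calling orphaned_cfs_files( $invindex, @seg_names ) with the invindex path should return the leftover files. An empty list would mean every .cfs file is still in use.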
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/