Mailing List Archive

Index size
So, I am running my indexer every five minutes. Been running that way for
a few weeks, and I have 6000+ files in my invindex directory.

Is this ... bad? Should I "manually" optimize more often?

--
Chris Nandor pudge@pobox.com http://pudge.net/
Open Source Technology Group pudge@ostg.com http://ostg.com/
Index size
On Oct 18, 2006, at 3:52 PM, Chris Nandor wrote:

> So, I am running my indexer every five minutes. Been running that way for
> a few weeks, and I have 6000+ files in my invindex directory.
>
> Is this ... bad? Should I "manually" optimize more often?

Something is wrong. Either segments are not being merged away often
enough, or old, unused segment files are not being deleted. Segment
growth is limited by the Fibonacci series. If you really have
several thousand segments in your index, get away from your box,
because a wormhole is about to open up.

What extensions are represented, and how many of each do you have?
Is this an NFS disk?

Please try this code:

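    use KinoSearch::Searcher;

    # $analyzer and $invindex are the same ones your indexing script uses.
    my $searcher = KinoSearch::Searcher->new(
        analyzer => $analyzer,
        invindex => $invindex,
    );

    # Peek at the reader's internal list of live segments.
    my $seg_infos = $searcher->{reader}{seg_infos};
    my @seg_names = sort keys %{ $seg_infos->{infos} };

    print "NUM DOCS: " . $searcher->{reader}->num_docs . "\n";
    print "SEG NAMES: @seg_names\n";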

If there is a one-to-one correspondence between the segment names and
the _XX.cfs files in your invindex, then all those .cfs files are in
use. If you haven't noticed a significant slowdown, that's
probably not the case.
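
The correspondence check can be scripted rather than done by eye. What
follows is only a rough sketch: it assumes a POSIX shell, and it assumes
that each name printed after "SEG NAMES:" matches the base name of its
.cfs file (for example, segment _1 and file _1.cfs). The function name
unused_cfs_files is made up for illustration.

```shell
# Print every _*.cfs file in an index directory whose base name is NOT
# among the live segment names passed as arguments.
# Usage: unused_cfs_files DIR SEGNAME...
unused_cfs_files() {
    dir="$1"
    shift
    for path in "$dir"/_*.cfs; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        if [ ! -e "$path" ]; then
            continue
        fi
        name=$(basename "$path" .cfs)
        live=no
        for seg in "$@"; do
            if [ "$seg" = "$name" ]; then
                live=yes
            fi
        done
        if [ "$live" = no ]; then
            echo "$path"
        fi
    done
}
```

Call it as "unused_cfs_files invindex" followed by the segment names
from the Perl output. Any path it prints is a .cfs file that the
searcher does not list as a live segment.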

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Index size
At 16:43 -0700 2006.10.18, Marvin Humphrey wrote:
>Something is wrong. Either segments are not being merged away often
>enough, or old, unused segment files are not being deleted.

I was thinking the latter was most likely.

$ ls -l invindex/ | wc -l
6973
$ ls -l invindex/_*.cfs | wc -l
12
$ ls -l invindex/_*.f* | wc -l
6948


I just realized I had code (which I found out Monday I didn't need) to
maintain two indexes, and I was copying files back and forth, which (duh!)
copied the deleted files back and forth. But then I discovered I didn't
need to update a non-live index, since it safely updates the live index.

So, rather than fixing my code, I'll just pull it and update the live index
instead.

--
Chris Nandor pudge@pobox.com http://pudge.net/
Open Source Technology Group pudge@ostg.com http://ostg.com/
Index size
On Oct 18, 2006, at 6:29 PM, Chris Nandor wrote:

> At 16:43 -0700 2006.10.18, Marvin Humphrey wrote:
>> Something is wrong. Either segments are not being merged away often
>> enough, or old, unused segment files are not being deleted.
>
> I was thinking the latter was most likely.
>
> $ ls -l invindex/ | wc -l
> 6973
> $ ls -l invindex/_*.cfs | wc -l
> 12
> $ ls -l invindex/_*.f* | wc -l
> 6948

The .f* files are temporary, one per indexed field, and they should
be cleaned up at the end of each session. I'm surprised that they are
still around.

I'd still like to know how many fields you have. It would be useful
to know if the problem is that .f* files in particular aren't being
deleted, or that you have a lot of fields and it's whole segments
that aren't being deleted.
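
One way to tell those two cases apart is to tally the directory by
extension. This is a minimal sketch, assuming a POSIX shell and an
index directory named invindex as in the listing quoted above; the
function name count_by_extension is made up for illustration.

```shell
# Count the files in an index directory, grouped by extension,
# most common extension first.
# Usage: count_by_extension [DIR]   (DIR defaults to "invindex")
count_by_extension() {
    dir="${1:-invindex}"
    # Keep only what follows the last dot in each name; names without
    # a dot are dropped by "sed -n ... p". Then tally and sort by count.
    ls -1 "$dir" \
        | sed -n 's/.*\.//p' \
        | sort \
        | uniq -c \
        | sort -rn
}
```

Running "count_by_extension invindex" prints one row per extension
with its file count. Each distinct .f* extension gets its own row, so
the output shows both how many different .f* extensions exist and how
many files carry each one.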

> I just realized I had code (which I found out Monday I didn't need) to
> maintain two indexes, and I was copying files back and forth, which (duh!)
> copied the deleted files back and forth. But then I discovered I didn't
> need to update a non-live index, since it safely updates the live index.
>
> So, rather than fixing my code, I'll just pull it and update the live index
> instead.

Glad you've found a fix.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/