On Mar 28, 2007, at 9:23 AM, Roger Dooley wrote:
> Marvin Humphrey (3/28/2007 12:03 PM) wrote:
>> On Mar 28, 2007, at 7:11 AM, Roger Dooley wrote:
>>> When indexing with 0.20_02,
>> What's your actual config? I know you're using the new Tokenizer,
>> but that's not in 0.20_02. Did you copy just Tokenizer.pm into
>> 0.20_02, or did you check out from subversion?
>
> New Tokenizer from the previous week, but the rest is from 0.20_02.
OK. Unfortunately, I can't duplicate this issue using either that
config, or subversion trunk. In both cases, memory usage plateaus at
33.8 MB on my box for the benchmarking script.
> I've commented that part out for this round of indexing. I can try
> setting this again and see what happens.
It would be better to leave it at its default. My only concern was
the remote possibility that it was set to a value that was causing
the problem.
> Anything else I can try to figure out what is going on?
Troubleshooting memory leaks isn't easy. Here's what I would do,
which is not the same as what I recommend you do:
1) Move the problem script to a Linux system if it's
not there already.
2) Compile a debugging perl from 5.8.8 sources.
3) Run devel/valgrind_test.plx using the debugging perl.
4) Examine the output for memory leaks.
If none show up, then the problem is script-specific.
5) Run the script under valgrind and debug perl. The
environment variable PERL_DESTRUCT_LEVEL has to be
set to 2 and the suppressions file devel/p588_valgrind.supp
has to be fed to valgrind. (Peek at the commands
that valgrind_test.plx runs.)
6) If nothing turns up after indexing a few documents
and exiting cleanly, invoke the script under valgrind
and debug perl again, but let it run for a long time
and then crash it intentionally so that Perl doesn't
run its cleanup routines. Then examine valgrind's
output looking for clues as to where the memory went.
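To make step 5 concrete, an invocation along these lines should do it -- a sketch only, since the authoritative flags are whatever valgrind_test.plx actually runs; the paths to the debugging perl and to your script are placeholders:

```shell
# Hypothetical command line assembled from the steps above.
# PERL_DESTRUCT_LEVEL=2 forces Perl to free everything at exit so
# true leaks stand out; the suppressions file hides known perl-5.8.8
# noise.  Substitute your own debugperl path and script name.
PERL_DESTRUCT_LEVEL=2 \
    valgrind --leak-check=full \
             --suppressions=devel/p588_valgrind.supp \
             /path/to/debugperl your_indexing_script.plx
```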
Hopefully at that point we'd be able to narrow down the search to
KS's perl code (not likely), KS's C code (likely), or your script
itself (quite possible -- could be a black hole hash, for example).
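For illustration, a "black hole" hash is simply a hash that picks up a fresh key on every iteration and is never pruned -- here is a contrived, hypothetical indexing loop showing the pattern (the names are made up, not from KS or your script):

```perl
use strict;
use warnings;

# Hypothetical cache keyed by a per-document ID.  Because every key
# is unique, nothing is ever reused or deleted, so the hash -- and
# the process footprint -- grows with every document indexed.
my %seen;
for my $doc_id ( 1 .. 100_000 ) {
    $seen{"doc_$doc_id"} = 1;    # new key every time; never evicted
}
print scalar( keys %seen ), "\n";    # 100000
```

The cure is usually a bounded cache, or keying on something that actually repeats.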
What I recommend you do is attempt to duplicate the problem so that I
can hunt it down. Create a script that I'll be able to run while
monitoring its memory usage with top. Use the us_constitution HTML
presentation if you can. If the footprint keeps growing well past 30
MB, send it my way.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/