Mailing List Archive

Another minimal test case: File::Find causes crash
Here is my code:

#!/usr/bin/perl
use strict;
use warnings;

package Schema;
use base qw( KinoSearch::Schema );
use KinoSearch::Analysis::PolyAnalyzer;

our %fields = ( title => 'KinoSearch::Schema::FieldSpec' );

sub analyzer { KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' ) }

package main;

use File::Find;
use KinoSearch::InvIndexer;

my $index = KinoSearch::InvIndexer->new(invindex => Schema->clobber('index'));

find(\&wanted, "en");

$index->finish();

sub wanted {
    /\.html$/ or return;
    my $filename = $_;

    my %field;
    open my $fh, '<', $filename or die "$filename: $!";
    while (<$fh>) {
        m!<body>! and last;
        if (m!<title>(.*)</title>!) {
            $field{title} = $1;
            last;
        }
    }
    close $fh;

    $index->add_doc(\%field);
}

I'm running this with KinoSearch-0.20_03 from CPAN. It needs a
reasonably big collection of files, like 50,000 of them. I've used a
static dump from Wikipedia. If you want to try that, you need to
install 7zip; on Debian the package name is p7zip-full.

Assuming you want to use the wiki dump and you've put the code in
index_wiki.pl, the steps to run look like this:

wget http://static.wikipedia.org/downloads/April_2007/en/wikipedia-en-html.0.7z
7z x wikipedia-en-html.0.7z
perl index_wiki.pl

The output I get is:

Error in function kino_FSFolder_open_outstream at
c_src/KinoSearch/Store/FSFolder.c:56: Can't open '_1.skip': No such
file or directory
at /home/edward/src/KinoSearch-0.20_03/blib/lib/KinoSearch/Index/SegWriter.pm
line 121
KinoSearch::Index::SegWriter::add_doc('KinoSearch::Index::SegWriter=HASH(0x816bdfc)',
'HASH(0x890e790)', 1) called at
/home/edward/src/KinoSearch-0.20_03/blib/lib/KinoSearch/InvIndexer.pm
line 114
KinoSearch::InvIndexer::add_doc('KinoSearch::InvIndexer=HASH(0x816b7c0)',
'HASH(0x890e790)') called at ./index_wiki.pl line 42
main::wanted() called at /usr/share/perl/5.8/File/Find.pm line 886
File::Find::_find_dir('HASH(0x816c00c)', 'en', 8) called at
/usr/share/perl/5.8/File/Find.pm line 700
File::Find::_find_opt('HASH(0x816c00c)', 'en') called at
/usr/share/perl/5.8/File/Find.pm line 1223
File::Find::find('CODE(0x8337cac)', 'en') called at
./index_wiki.pl line 23

The line numbers in index_wiki.pl are wrong because I took out the
'use lib' line in the sample above.

Let me know if you need any more info.
--
Edward Betts
Re: Another minimal test case: File::Find causes crash
On 05/06/07, Edward Betts <edwardbetts@gmail.com> wrote:
> I'm running this with KinoSearch-0.20_03 from CPAN.

I just tried it with KinoSearch from subversion. Same malfunction.
Re: Another minimal test case: File::Find causes crash
Edward,

Thanks for the report and for the test case. It allowed me to
isolate the problem quickly.

On Jun 5, 2007, at 8:25 AM, Edward Betts wrote:

> I'm running this with KinoSearch-0.20_03 from CPAN. It needs a
> reasonably big collection of files, like 50,000 of them.

The time to trigger the error was proportional to the amount of material
consumed. Since you were only capturing titles, it took a while to
reach the 16 MiB default threshold. The flushing process happened to
require opening a new file, and... kaboom.

> Error in function kino_FSFolder_open_outstream at
> c_src/KinoSearch/Store/FSFolder.c:56: Can't open '_1.skip': No such
> file or directory
> at /home/edward/src/KinoSearch-0.20_03/blib/lib/KinoSearch/Index/SegWriter.pm

This happened because File::Find had changed the working directory
and the relative path to the index was no longer valid. Relative
index paths get absolute-ified as of repository revision 2461,
resolving the issue.
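
For releases before that revision, the effect and one possible workaround can be sketched as below. This is only an illustration under assumptions: the temporary directory layout is made up for the demo, and resolving the path with File::Spec->rel2abs before calling find() is one reasonable approach, not necessarily what the revision itself does.

```perl
use strict;
use warnings;
use Cwd qw( getcwd );
use File::Find;
use File::Spec;
use File::Temp qw( tempdir );

# Throwaway tree standing in for the wiki dump (hypothetical layout):
# index/ and en/sub/page.html under a temporary root.
my $start = getcwd();
my $root  = tempdir( CLEANUP => 1 );
chdir $root or die "chdir $root: $!";
mkdir $_ or die "mkdir $_: $!" for qw( index en en/sub );
open my $fh, '>', 'en/sub/page.html' or die "page.html: $!";
print {$fh} "<title>Example</title>\n";
close $fh or die "close: $!";

# Resolve the index location once, before find() starts calling chdir.
my $relative = 'index';
my $absolute = File::Spec->rel2abs($relative);

my %result;
find(
    sub {
        /\.html$/ or return;
        # File::Find has already chdir'd into en/sub when this runs,
        # so the relative path no longer points at the index directory.
        $result{relative_ok} = -d $relative ? 1 : 0;
        $result{absolute_ok} = -d $absolute ? 1 : 0;
    },
    'en'
);

# Leave the temporary directory so it can be cleaned up at exit.
chdir $start or die "chdir $start: $!";
print "relative path visible: $result{relative_ok}\n";
print "absolute path visible: $result{absolute_ok}\n";
```

Inside the callback the relative path is not visible (0) while the absolute one is (1). In the test script, the equivalent change would be to pass the rel2abs result instead of the bare 'index' string.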

Cheers,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Re: Another minimal test case: File::Find causes crash
On 06/06/07, Marvin Humphrey <marvin@rectangular.com> wrote:
> This happened because File::Find had changed the working directory
> and the relative path to the index was no longer valid. Relative
> index paths get absolute-ified as of repository revision 2461,
> resolving the issue.

I guessed it was something to do with File::Find changing directory,
but I hadn't thought to use an absolute path for the index. Thanks for
your help.
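
For anyone else who hits this, another option is to stop File::Find from changing directory at all via its no_chdir option. The sketch below is a rough illustration, not something tested against KinoSearch; the temporary tree is invented for the demo. With no_chdir set, $_ inside the callback holds the full path (the same value as $File::Find::name), so it can be passed straight to open.

```perl
use strict;
use warnings;
use Cwd qw( getcwd );
use File::Find;
use File::Temp qw( tempdir );

# Hypothetical stand-in for the dump: en/sub/page.html under a temp root.
my $start = getcwd();
my $root  = tempdir( CLEANUP => 1 );
chdir $root or die "chdir $root: $!";
mkdir $_ or die "mkdir $_: $!" for qw( en en/sub );
open my $fh, '>', 'en/sub/page.html' or die "page.html: $!";
print {$fh} "<title>Example</title>\n";
close $fh or die "close: $!";

my $top         = getcwd();
my $cwd_changed = 0;
my @found;

find(
    {
        no_chdir => 1,    # keep the working directory fixed during traversal
        wanted   => sub {
            # With no_chdir, $_ is the path relative to where find() started.
            /\.html$/ or return;
            push @found, $_;
            $cwd_changed = 1 if getcwd() ne $top;
        },
    },
    'en'
);

# Leave the temporary directory so it can be cleaned up at exit.
chdir $start or die "chdir $start: $!";
print "$_\n" for @found;
print "cwd changed during find: $cwd_changed\n";
```

Because the working directory never moves, a relative index path stays valid for the whole run, at the cost of the callback having to work with longer paths.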

--
Edward Betts