Mailing List Archive

kinosearch and a mod perl web-app
Hello,

I'm using a KinoSearch index as part of a mod_perl web-app. I'm caching
the searcher to improve performance, and it does help. However, when I
load-test the application, after a few minutes of faultless execution
the script starts to throw the error detailed below. It's the call to
fetch_hit_hashref that throws the error. I'm caching the searcher by
putting a

use Smart::Search();

in the server startup scripts

and

$Smart::Search::Searcher = KinoSearch::Searcher->new(
    invindex => '/data01/index',
    analyzer => KinoSearch::Analysis::PolyAnalyzer->new(language => 'en'),
);

in the outermost scope of the perl module file that has the search
function. The search is then conducted thus: -

sub search {
    ...
    my $hits = $Smart::Search::Searcher->search(query => $query);
    ...
    while (my $hit = $hits->fetch_hit_hashref) {
        ...
    }
}

My guess is that the problem occurs when 2 threads use the searcher
simultaneously. So, I suppose my question is: should I be taking steps to
lock the searcher prior to use? And is there an example somewhere of how
to correctly cache a searcher that I might learn from?

The facts: -
KinoSearch 0.13
Perl 5.8.8
Suse Linux 10.1
linux kernel 2.6.16
gcc 4.1.0
mod_perl 2.02-14
apache2-prefork 2.2.0-21.7

The error: -
[Thu Oct 19 21:51:51 2006] [error] refill: tried to read 1024 bytes, got 0:
25 at
/usr/lib/perl5/site_perl/5.8.8/i586-linux-thread-multi/KinoSearch/Index/FieldsReader.pm
line 54

KinoSearch::Index::FieldsReader::fetch_raw('KinoSearch::Index::FieldsReader=HASH(0x806bf584)',
171) called at
/usr/lib/perl5/site_perl/5.8.8/i586-linux-thread-multi/KinoSearch/Index/FieldsReader.pm
line 68

KinoSearch::Index::FieldsReader::fetch_doc('KinoSearch::Index::FieldsReader=HASH(0x806bf584)',
171) called at
/usr/lib/perl5/site_perl/5.8.8/i586-linux-thread-multi/KinoSearch/Index/SegReader.pm
line 179

KinoSearch::Index::SegReader::fetch_doc('KinoSearch::Index::SegReader=HASH(0x8055b560)',
171) called at
/usr/lib/perl5/site_perl/5.8.8/i586-linux-thread-multi/KinoSearch/Search/Hit.pm
line 22

KinoSearch::Search::Hit::get_doc('KinoSearch::Search::Hit=HASH(0x8078183c)')
called at
/usr/lib/perl5/site_perl/5.8.8/i586-linux-thread-multi/KinoSearch/Search/Hit.pm
line 29

KinoSearch::Search::Hit::get_field_values('KinoSearch::Search::Hit=HASH(0x8078183c)')
called at
/usr/lib/perl5/site_perl/5.8.8/i586-linux-thread-multi/KinoSearch/Search/Hits.pm
line 92

KinoSearch::Search::Hits::fetch_hit_hashref('KinoSearch::Search::Hits=HASH(0x803fe064)')
called at /srv/www/perl-lib/Smart/Search.pm line 76

Smart::Search::group_by_proceeding('KinoSearch::Search::Hits=HASH(0x803fe064)')
called at /srv/www/perl-lib/Smart/Search.pm line 34
Smart::Search::run('HASH(0x804206f8)') called at
/srv/www/cgi-bin/trawlcit.pl line 9

ModPerl::ROOT::ModPerl::Registry::srv_www_cgi_2dbin_trawlcit_2epl::handler('Apache2::RequestRec=SCALAR(0x80762090)')
called at
/usr/lib/perl5/vendor_perl/5.8.8/i586-linux-thread-multi/ModPerl/RegistryCooker.pm
line 203
eval {...} called at
/usr/lib/perl5/vendor_perl/5.8.8/i586-linux-thread-multi/ModPerl/RegistryCooker.pm
line 203
ModPerl::RegistryCooker::run('ModPerl::Registry=HASH(0x807848f4)')
called at
/usr/lib/perl5/vendor_perl/5.8.8/i586-linux-thread-multi/ModPerl/RegistryCooker.pm
line 169

ModPerl::RegistryCooker::default_handler('ModPerl::Registry=HASH(0x807848f4)')
called at
/usr/lib/perl5/vendor_perl/5.8.8/i586-linux-thread-multi/ModPerl/Registry.pm
line 30
ModPerl::Registry::handler('ModPerl::Registry',
'Apache2::RequestRec=SCALAR(0x80762090)') called at -e line 0
eval {...} called at -e line 0

Any help greatly appreciated

Thank you

Dan
kinosearch and a mod perl web-app [ In reply to ]
On Oct 19, 2006, at 2:49 PM, Dan Chicot wrote:

> My guess is that the problem occurs when 2 threads use the searcher
> simultaneously. So, I suppose my question is: Should I be taking
> steps to lock the searcher prior to use?

That seems likely. KinoSearch is not thread-safe. I've tried to
make it fail sanely by making CLONE a fatal method for all KinoSearch
classes:

slothbear:~/perltest marvin$ cat thread_is_dead.plx
#!/usr/bin/perl
use strict;
use warnings;

use threads;
use KinoSearch::Analysis::Tokenizer;
my $t = KinoSearch::Analysis::Tokenizer->new;
my $thread = threads->create('do_stuff');
sub do_stuff { sleep 1 }

slothbear:~/perltest marvin$ perl thread_is_dead.plx
CLONE invoked by package 'KinoSearch::Util::CClass', indicating that
threads or Win32 fork were initiated, but KinoSearch is not thread-
safe at /Library/Perl/5.8.6/darwin-thread-multi-2level/KinoSearch/
Util/Class.pm line 97.
slothbear:~/perltest marvin$

Looking at the stack trace, it's FieldsReader::fetch_raw() that's
throwing the innermost Perl error. FieldsReader objects live inside
IndexReader objects, which live inside Searcher objects. If a
FieldsReader gets out of sync and thinks it's reading one part of the
index when it's actually reading another, you'd likely get an error
like the one you got, or possibly garbage if you were really unlucky.
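
One way a reader could get out of sync without any threads involved is a
file handle that was opened before the server forked. This is only an
assumption about the setup described above (the Searcher is built at
server startup under apache2-prefork); nothing in this thread confirms
it. The sketch below uses a scratch file as a stand-in for an index file
and shows that, on a Unix-like system, a read in a forked child moves
the file offset the parent sees too. The scratch file and variable names
are illustrative and have nothing to do with KinoSearch internals.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);
use File::Temp qw(tempfile);
use POSIX ();

# Create a 2048-byte scratch file to stand in for an index file.
my ( $out, $path ) = tempfile( UNLINK => 1 );
print {$out} 'x' x 2048;
close $out or die "close: $!";

# Open it once, before forking, the way a startup script would.
open my $fh, '<', $path or die "open: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: read 1024 bytes through the inherited descriptor, then
    # leave without running END blocks or temp-file cleanup.
    sysread( $fh, my $buf, 1024 );
    POSIX::_exit(0);
}

# Parent: it has not read anything itself, yet the offset has moved,
# because parent and child share one open file description.
waitpid( $pid, 0 );
my $offset_in_parent = sysseek( $fh, 0, SEEK_CUR );
print "offset seen by parent after the child's read: $offset_in_parent\n";
```

If several Apache children inherited the same descriptor this way, each
one's seeks and reads would disturb the others, which would fit a
"tried to read 1024 bytes, got 0" failure that only shows up under load.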

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
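
On the question of how to cache a searcher: a common mod_perl pattern,
offered here as a minimal sketch and not as a fix confirmed anywhere in
this thread, is to build the object lazily in each child process and to
rebuild it if the process ID changes. Smart::SearcherCache and its get()
function are hypothetical names. The stand-in constructor only exists so
the sketch runs without an index; in the real module the code ref would
call KinoSearch::Searcher->new with the same arguments as in the
original post.

```perl
use strict;
use warnings;

{
    package Smart::SearcherCache;    # hypothetical helper, not part of KinoSearch

    my ( $searcher, $owner_pid );

    # Return a searcher built by the current process. $build is a code
    # ref that constructs one; it runs on first use and again whenever
    # the PID differs from the one that built the cached object.
    sub get {
        my ($build) = @_;
        if ( !defined $searcher || $owner_pid != $$ ) {
            $searcher  = $build->();
            $owner_pid = $$;
        }
        return $searcher;
    }
}

# Stand-in constructor (an assumption for illustration). It counts how
# often it is called and returns a plain hash instead of a Searcher.
my $builds = 0;
my $build  = sub { $builds++; return { built_by => $$ } };

my $first  = Smart::SearcherCache::get($build);
my $second = Smart::SearcherCache::get($build);
print "built $builds time(s) in process $$\n";
```

This only keeps processes from sharing one object. It does nothing for
threads, which share a PID, so it would not help under a threaded MPM.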
kinosearch and a mod perl web-app [ In reply to ]
At 15:46 -0700 2006.10.19, Marvin Humphrey wrote:
>On Oct 19, 2006, at 2:49 PM, Dan Chicot wrote:
>
>> My guess is that the problem occurs when 2 threads use the searcher
>> simultaneously. So, I suppose my question is: Should I be taking
>> steps to lock the searcher prior to use?
>
>That seems likely. KinoSearch is not thread-safe. I've tried to
>make it fail sanely by making CLONE a fatal method for all KinoSearch
>classes:

Heh, that's the same error I just emailed about. But I am not using
threads. Using mod_perl 1.x with no threads.

--
Chris Nandor pudge@pobox.com http://pudge.net/
Open Source Technology Group pudge@ostg.com http://ostg.com/
kinosearch and a mod perl web-app [ In reply to ]
On Oct 19, 2006, at 4:28 PM, Chris Nandor wrote:

> Heh, that's the same error I just emailed about. But I am not using
> threads. Using mod_perl 1.x with no threads.

The "refill" error is a very general IO error. For you, it occurred
in SegTermEnum::scan_to. For Dan, it occurred in
FieldsReader::fetch_raw. It's possible that they are related, but
that's not necessarily the case.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
kinosearch and a mod perl web-app [ In reply to ]
On Oct 19, 2006, at 2:49 PM, Dan Chicot wrote:
> The error: -
> [Thu Oct 19 21:51:51 2006] [error] refill: tried to read 1024
> bytes, got 0: 25 at

In light of the non-errno errno we saw in Chris Nandor's message, I
looked this one up. On my system, it's ENOTTY "Inappropriate ioctl
for device". That doesn't make a lot of sense. So maybe we're
looking at the same PerlIO_read bug.

Just to confirm it's the same on your system, what do you get if you
compile and run the little app below?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

#include <stdio.h>
#include <errno.h>
#include <string.h>

int main(void) {
    /* Print this system's message for errno 25. */
    printf("%s\n", strerror(25));
    return 0;
}
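
The same lookup can be done from Perl without a compiler. This is a
small sketch, assuming a platform where the Errno module defines ENOTTY.
Assigning a number to $! and reading it back as a string gives the
system's message for that errno, and the ENOTTY constant shows which
number that name maps to locally.

```perl
use strict;
use warnings;
use Errno qw(ENOTTY);

my $code = 25;    # the number after "got 0:" in the log line

# Setting $! numerically and stringifying it yields the system's
# error message for that number.
my $text = do { local $! = $code; "$!" };

print "errno $code: $text\n";
print "ENOTTY on this system is ", ENOTTY, "\n";
```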