Mailing List Archive

Error in KinoSearch::Searcher::search
Hello,

I have been experimenting with KinoSearch 0.20_01. However, when I
tried to search my newly built indexes I received errors. I suspected that
the problems might arise from my documents, so I set up the constitution example
from the samples directory. The index gets built without problems, but
search does not work:

Error in function refill at c_src/KinoSearch/Store/InStream.c:100: Read past
EOF of /var/www/test/kslokal/constitution/uscon_invindex/_1.cf (start:
18446744073709545189 len 228)
at /usr/local/lib/perl/5.8.8/KinoSearch/Index/SegReader.pm line 120
KinoSearch::Index::SegReader::fetch_term_info('KinoSearch::Index::SegReader=HASH(0xe01990)',
'KinoSearch::Index::Term=SCALAR(0xe9c1c0)') called at
/usr/local/lib/perl/5.8.8/KinoSearch/Index/SegReader.pm line 125
KinoSearch::Index::SegReader::doc_freq('KinoSearch::Index::SegReader=HASH(0xe01990)',
'KinoSearch::Index::Term=SCALAR(0xe9c1c0)') called at
/usr/local/lib/perl/5.8.8/KinoSearch/Searcher.pm line 163
KinoSearch::Searcher::doc_freq('KinoSearch::Searcher=HASH(0xe275a0)',
'KinoSearch::Index::Term=SCALAR(0xe9c1c0)') called at
/usr/local/lib/perl/5.8.8/KinoSearch/Search/Similarity.pm line 31
KinoSearch::Search::Similarity::idf('KinoSearch::Search::Similarity=SCALAR(0xe8a520)',
'KinoSearch::Index::Term=SCALAR(0xe9c1c0)',
'KinoSearch::Searcher=HASH(0xe275a0)') called at
/usr/local/lib/perl/5.8.8/KinoSearch/Search/TermQuery.pm line 65
KinoSearch::Search::TermWeight::init_instance('KinoSearch::Search::TermWeight=HASH(0xe8a260)')
called at /usr/local/lib/perl/5.8.8/KinoSearch/Util/Class.pm line 40
KinoSearch::Util::Class::new('KinoSearch::Search::TermWeight', 'parent',
'KinoSearch::Search::TermQuery=HASH(0xe81530)', 'searcher',
'KinoSearch::Searcher=HASH(0xe275a0)') called at
/usr/local/lib/perl/5.8.8/KinoSearch/Search/TermQuery.pm line 26
KinoSearch::Search::TermQuery::create_weight('KinoSearch::Search::TermQuery=HASH(0xe81530)',
'KinoSearch::Searcher=HASH(0xe275a0)') called at
/usr/local/lib/perl/5.8.8/KinoSearch/Search/BooleanQuery.pm line 97
KinoSearch::Search::BooleanWeight::init_instance('KinoSearch::Search::BooleanWeight=HASH(0xe9cab0)')
called at /usr/local/lib/perl/5.8.8/KinoSearch/Util/Class.pm line 40
KinoSearch::Util::Class::new('KinoSearch::Search::BooleanWeight', 'parent',
'KinoSearch::Search::BooleanQuery=HASH(0xe81750)', 'searcher',
'KinoSearch::Searcher=HASH(0xe275a0)') called at
/usr/local/lib/perl/5.8.8/KinoSearch/Search/BooleanQuery.pm line 54
KinoSearch::Search::BooleanQuery::create_weight('KinoSearch::Search::BooleanQuery=HASH(0xe81750)',
'KinoSearch::Searcher=HASH(0xe275a0)') called at
/usr/local/lib/perl/5.8.8/KinoSearch/Search/Query.pm line 33
KinoSearch::Search::Query::to_weight('KinoSearch::Search::BooleanQuery=HASH(0xe81750)',
'KinoSearch::Searcher=HASH(0xe275a0)') called at
/usr/local/lib/perl/5.8.8/KinoSearch/Searcher.pm line 168
KinoSearch::Searcher::create_weight('KinoSearch::Searcher=HASH(0xe275a0)',
'KinoSearch::Search::BooleanQuery=HASH(0xe81750)') called at
/usr/local/lib/perl/5.8.8/KinoSearch/Searcher.pm line 86
KinoSearch::Searcher::search_top_docs('KinoSearch::Searcher=HASH(0xe275a0)',
'num_wanted', 10, 'query',
'KinoSearch::Search::BooleanQuery=HASH(0xe81750)', 'filter', 'undef',
'sort_spec', 'undef', ...) called at
/usr/local/lib/perl/5.8.8/KinoSearch/Search/Hits.pm line 47
KinoSearch::Search::Hits::seek(3) called at
/usr/local/lib/perl/5.8.8/KinoSearch/Searcher.pm line 70
KinoSearch::Searcher::search('KinoSearch::Searcher=HASH(0xe275a0)',
'query', 'President') called at search.cgi line 31

I cannot really see what to do. The Build itself seemed fine:

perl Build test
Skipping boilerplater.pl...
t/000-load....................ok
t/001-build_invindexes........ok
t/002-kinosearch..............ok
t/003-charmonizer.............ok
t/010-verify_args.............ok
t/011-class...................ok
t/012-priority_queue..........ok
t/013-bit_vector..............ok
t/015-sort_external...........ok
t/016-varray..................ok
t/017-hash....................ok
t/018-cclass..................ok
t/019-obj.....................ok
t/020-yaml....................ok
t/021-dyn_virtual_table.......ok
t/050-ramfile.................ok
t/051-fsfile..................ok
t/101-simple_template_io......ok
t/102-strings_template_io.....ok
t/103-repeats_template_io.....ok
t/104-parse_template_io.......ok
t/105-folder..................ok
t/106-locking.................ok
t/107-index_file_names........ok
t/108-invindex................ok
t/150-polyanalyzer............ok
t/151-analyzer................ok
t/152-token_batch.............ok
t/153-lc_normalizer...........ok
t/154-tokenizer...............ok
t/155-stopalizer..............ok
t/202-term....................ok
t/203-compound_file_reader....ok
t/204-doc_reader..............ok
t/205-seg_reader..............ok
t/206-seg_infos...............ok
t/207-seg_term_list...........ok
t/208-terminfo................ok
t/209-seg_term_list_heavy.....ok
t/210-deldocs.................ok
t/211-seg_term_docs...........ok
t/212-multi_term_docs.........ok
t/213-segment_merging.........ok
t/214-spec_field..............ok
t/215-term_vectors............ok
t/216-schema..................ok
t/217-multi_term_list.........ok
t/302-many_fields.............ok
t/303-highlighter.............ok
t/304-verify_utf8.............ok
t/305-invindexer..............ok
t/501-termquery...............ok
t/502-phrasequery.............ok
t/503-booleanquery............ok
t/504-similarity..............ok
t/505-hit_queue...............ok
t/506-hit_collector...........ok
t/507-query_filter............ok
t/508-hits....................ok
t/509-multi_searcher..........ok
t/510-remote_search...........ok
t/511-sort_spec...............ok
t/512-range_filter............ok
t/601-queryparser.............ok
t/602-boosts..................ok
t/603-query_boosts............ok
t/604-simple_search...........ok
t/605-store_pos_boost.........ok
t/606-proximity...............ok
t/701-uscon...................ok
t/999-remove_invindexes.......ok
t/pod-coverage................skipped
all skipped: Test::Pod::Coverage 1.04 required for testing POD
coverage
t/pod.........................skipped
all skipped: Test::Pod 1.14 required for testing POD
All tests successful, 2 tests skipped.

Any help would be appreciated.

Thanks,
Karel
Error in KinoSearch::Searcher::search
On Mar 3, 2007, at 7:59 AM, Karel K. wrote:

Karel,

Thanks for the report.

> Read past EOF of /var/www/test/kslokal/constitution/uscon_invindex/
> _1.cf

> (start: 18446744073709545189 len 228)

This is an internal error, which you should never see. :) KS is
trying to read from byte 18446744073709545189 in a "file" (actually a
virtual file within a compound file) which is only 228 bytes long.

Either the input stream has gotten out of sync and has read in
something wrong, or (less likely) there's been a memory error.
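
One arithmetic detail is worth noting: 18446744073709545189 equals 2**64 - 6427, which is how a signed 64-bit value of -6427 prints when it is treated as unsigned. That the stream position had actually gone negative is an assumption, not something confirmed in this thread. A minimal C sketch of the reinterpretation follows; the function name is hypothetical and not part of KinoSearch.

```c
#include <stdint.h>

/* Hypothetical helper: reinterpret the bits of an unsigned 64-bit value
 * as a two's-complement signed value, without relying on
 * implementation-defined conversion behavior. */
int64_t as_signed_offset(uint64_t raw)
{
    if (raw <= (uint64_t)INT64_MAX) {
        return (int64_t)raw;          /* fits as-is */
    }
    /* ~raw is (2**64 - 1 - raw), which is small enough to negate safely. */
    return -(int64_t)(~raw) - 1;
}
```

Passing the reported start value, 18446744073709545189, yields -6427.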

KS passes its test suite and the US constitution search works
correctly on Perl OS X 10.4, FreeBSD 5.3, and an old RedHat 9
system. What system are you on? What version of Perl?

Is the error 100% consistent?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Error in KinoSearch::Searcher::search
Dear Marvin,

Thanks for your reply. I just wanted to make sure first whether this
is a known problem.

This error occurred on a Debian 64-bit system (testing and
experimental combined) with Perl 5.8.8. I will try to reproduce the
error on similar Debian systems to narrow down the possible cause. I
want to make sure whether there is a difference between 64 and 32bit
systems, as Debian 64bit might be the cause of such errors. I
will also check whether the error is related to a specific Perl
version. As soon as I have isolated the problem I will send you
more information.

Thanks for your help,
karel

Error in KinoSearch::Searcher::search
On Mar 5, 2007, at 5:32 AM, Karel K. wrote:

> I just wanted to make sure first, whether this
> is a known problem.

It may have become one in between your first email and your second.

> This error occurred on a Debian 64-bit system (testing and
> experimental combined) with Perl 5.8.8.

OK, that's good to know. I've also heard that KS 0.20_01 "isn't
working" (no more specifics) on a 64-bit Fedora box.

> I will try to reproduce the
> error on similar Debian systems to narrow down the possible cause.

Your error, the one reported by Edward Betts, and another one I've
discovered myself during the course of investigating -- large indexes
occasionally returning too few and incorrect results -- all appear to
be related to the term list iterator, which was refactored just
before 0.20_01 was released. In at least some cases, it's scanning
past a target it should stop on.

If you haven't already done the work, I suggest you hold off until I
solve the failing test case supplied by Edward.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Error in KinoSearch::Searcher::search
The problem has disappeared with 0.20_02. Thank you very much!
Error in KinoSearch::Searcher::search
On Mar 5, 2007, at 1:50 PM, Karel K. wrote:

> And another Note:
> I guess, besides my specific problem, there is some 64bit related
> issue. Maybe in the mathfunctions library, because the length seems
> not to be reported correctly. (Filesize is equal) Should be more than
> 228.
>
> 32-bit:
> Error in function refill at c_src/KinoSearch/Store/InStream.c:100:
> Read past EOF of
> /var/www/kinosearch/KinoSearch-0.20_01/sample/uscon_invindex/_1.cf
> (start: 4294960869 len 4294967295)
>
> 64-bit:
> Error in function refill at c_src/KinoSearch/Store/InStream.c:100:
> Read past EOF of
> /var/www/wikipedia/kslokal/constitution/uscon_invindex/_1.cf (start:
> 18446744073709545189 len 228)

I am almost certain that this discrepancy arises as an artifact of my
using Perl's sprintf command -- or more specifically, the XS command
sv_vcatpvf(), to prepare error messages. (The relevant code is in
c_src/KinoSearch/Util/Carp.c.) The file pointers are 64 bit integers
in the KS C library; when they have to pass through Perl scalars,
they are turned to doubles, which can hold integers up to 2**53, more
than enough for any real file size. However, the conversion does not
happen for the error messages, and sv_vcatpvf() handles integers
differently depending on whether your Perl is 32-bit or 64-bit.
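
The two reported start values are consistent with this explanation: 4294960869 is exactly the low 32 bits of 18446744073709545189, so both messages may show the same underlying start value at different widths. The sketch below illustrates that truncation and the 2**53 limit for doubles. It is plain C with hypothetical function names, not code from Carp.c.

```c
#include <stdint.h>

/* Keep only the low 32 bits, as a 32-bit integer format would. */
uint32_t low32(uint64_t value)
{
    return (uint32_t)(value & 0xFFFFFFFFu);
}

/* Return 1 if a 64-bit integer survives a round trip through a double
 * unchanged, 0 otherwise. */
int roundtrips_through_double(int64_t value)
{
    volatile double d = (double)value;   /* volatile forces real double precision */
    if (d >= 9223372036854775808.0) {    /* 2**63: casting back would overflow */
        return 0;
    }
    return (int64_t)d == value;
}
```

With these helpers, 2**53 (9007199254740992) survives the round trip while 2**53 + 1 does not, which is why doubles are safe for any realistic file size but not for arbitrary 64-bit values.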

I have to use sv_vcatpvf() for error messages because the C sprintf()
command is vulnerable to buffer overflow attacks and error messages
could be manipulated by something as simple as a maliciously crafted
query string. snprintf() solves this problem when it's available,
but it isn't always -- I don't think MSVC provides it.
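
For reference, a bounded formatter along the lines described could look like the sketch below. It assumes a C99 snprintf() and the <inttypes.h> format macros are available, which as noted is not guaranteed on every compiler. The function name and message layout are illustrative only and are not taken from Carp.c.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a "read past EOF" message into buf without
 * ever writing more than size bytes. Returns the length the full message
 * would have had, so a return value >= size signals truncation. */
int format_read_error(char *buf, size_t size, const char *path,
                      int64_t start, int64_t len)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
                    "Read past EOF of %s (start: %" PRId64 " len %" PRId64 ")",
                    path, start, len);
}
```

Because snprintf() always NUL-terminates within the given size, an oversized path or query string gets truncated rather than overflowing the buffer.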

It might be worth working up alternative code for the routines in
Carp.c using snprintf() when it's detected. Another remedy might be
to always convert file pointers in error messages to doubles. That
would be kind of annoying, though, because the error messages are
scattered throughout the library rather than concentrated in one file.
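
The second remedy could be concentrated in a single helper, so that call sites pass file positions through one conversion point instead of each converting by hand. The sketch below uses snprintf() purely as a stand-in for whatever formatter is in use; the helper name is hypothetical, and the conversion is exact only for magnitudes up to 2**53.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: render a 64-bit file position via a double, so the
 * formatter only ever sees a floating-point argument. */
int format_file_pos(char *buf, size_t size, int64_t pos)
{
    double as_double = (double)pos;    /* exact for |pos| <= 2**53 */
    return snprintf(buf, size, "%.0f", as_double);
}
```

The output is the same whether the host is 32-bit or 64-bit, since the integer width never reaches the format string.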

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/