Mailing List Archive

help with query filter
Hello,

I am trying to debug an issue I am having with
KinoSearch::Search::QueryFilter (from KinoSearch version 0.162) and am
hoping I might get a pointer here.

What I am seeing is that when I create an index with the standard calls to
spec_field and add_doc, things work fine. But if I then re-index a
document (done in a different process and without calling spec_field),
my queries begin to fail. If I remove the filter from my query, the results
are as expected, which is why I'm focusing on the filter.

This is how I spec the filter field in the initial indexing:

    $invIndexer->spec_field (
        name       => '__group',
        analyzer   => undef,
        indexed    => 1,
        analyzed   => 0,
        vectorized => 0,
    );

What I _think_ is happening is that when I create my index, the filter field
is correctly not being analyzed, but when I do the re-index, it is being
analyzed, and this is causing the issue. The reason I think so is that if
the value I use for __group is "events" I see the issue, but if it is
"event" I do not.

Something else I noticed: if I dump the index right after creating it, I see
this:

Fields:
...
29: __group [a,i,s]

which is telling me that the field is to be analyzed. Perhaps that's the
issue?

Also, if I diff the dump from a clean index and a dump from after
re-indexing one document, I see entries like this:

> __group:event
> Doc 0 (2 occurrences)
> Doc 1 (2 occurrences)

Which seems to be showing the __group field now with an analyzed value. (I
checked my code a few times to make sure I wasn't switching the __group
value between indexings.)

BTW, my filter is created like this:

    my $groupQuery = KinoSearch::Search::TermQuery->new (
        term => KinoSearch::Index::Term->new ( '__group', 'events' )
    );

    my $filter = KinoSearch::Search::QueryFilter->new (
        query => $groupQuery
    );

So I'm hoping this rings a bell with anyone in terms of something I'm doing
wrong or what the issue might be. If not then I'll work on developing a
concise test case to hopefully reproduce what I'm seeing.

Thanks for your time,
Mike
Re: help with query filter
On Sep 2, 2008, at 1:06 PM, Mike Barborak wrote:

> if I re-index a document (done in a different process and without
> spec_field being called),

Is there a reason not to call spec_field() when you re-index? Calling
it would likely solve the problem.

Calling spec_field() multiple times is fine (and recommended) so long
as the field definition is always the same.

> What I _think_ is happening is that when I create my index, the
> filter field is correctly not being analyzed but that when I do the
> re-index, it is being analyzed and this then is causing an issue.

Yes, it looks like that's right. The 'analyzed' flag is not stored
with the index. It's defaulting to a true value when the fields
metadata is read in (FieldInfos->read_infos). This wouldn't cause
significant problems for most people because once the data is in the
index, it doesn't get re-analyzed. (I can see an esoteric bug with
Searcher->_prepare_simple_search, but it wouldn't be easy to tickle.)
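
To make the mechanism concrete, here is a toy model of a flag that is lost on a round trip to disk. It is only a sketch: write_info() and read_info() are hypothetical names invented for illustration and are not the actual FieldInfos code, and the comma-separated format is an assumption, not the real index format.

```perl
use strict;
use warnings;

# Toy model only: these subs are hypothetical and are NOT KinoSearch code.
# write_info() persists every flag except 'analyzed'.
sub write_info {
    my ($info) = @_;
    return join ',', $info->{name}, $info->{indexed}, $info->{vectorized};
}

# read_info() finds no stored 'analyzed' value, so it falls back to true.
sub read_info {
    my ($line) = @_;
    my ( $name, $indexed, $vectorized ) = split /,/, $line;
    return {
        name       => $name,
        indexed    => $indexed,
        vectorized => $vectorized,
        analyzed   => 1,    # default, because nothing was stored
    };
}

# Field as declared in the initial indexing run (analyzed => 0).
my $original = {
    name       => '__group',
    indexed    => 1,
    vectorized => 0,
    analyzed   => 0,
};

# Field as seen by a later process that only reads the stored metadata.
my $reloaded = read_info( write_info($original) );

printf "analyzed before: %d, after reload: %d\n",
    $original->{analyzed}, $reloaded->{analyzed};
```

In this model the second process sees analyzed as true unless something sets it back to 0 explicitly, which matches the behaviour described in the original post.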

The workaround should be to call spec_field(). A fix for maint would
involve storing the 'analyzed' flag, which would be a little tricky
for back-compat reasons.
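
One way to apply the workaround is to keep the field definition in a single place and call it from both the indexing and the re-indexing process. The following is a minimal sketch, not a definitive implementation: spec_group_field() is a hypothetical helper name, and MockIndexer is a stand-in so the example runs without an index. In real code you would pass your KinoSearch::InvIndexer object instead.

```perl
use strict;
use warnings;

# Single source of truth for the filter field definition.
# The values are copied from the spec_field call in the original post.
my %GROUP_FIELD_SPEC = (
    name       => '__group',
    analyzer   => undef,
    indexed    => 1,
    analyzed   => 0,
    vectorized => 0,
);

# Hypothetical helper: call it on every indexer object, both for the
# initial index build and for each re-index, before add_doc.
sub spec_group_field {
    my ($invindexer) = @_;
    $invindexer->spec_field(%GROUP_FIELD_SPEC);
    return $invindexer;
}

# Stand-in for KinoSearch::InvIndexer (an assumption for this sketch);
# it only records the arguments passed to spec_field.
package MockIndexer;

sub new {
    my ($class) = @_;
    return bless { specs => [] }, $class;
}

sub spec_field {
    my ( $self, %args ) = @_;
    push @{ $self->{specs} }, {%args};
    return;
}

package main;

# Both processes go through the same helper, so the definitions agree.
my $initial = spec_group_field( MockIndexer->new );
my $reindex = spec_group_field( MockIndexer->new );
```

Because both runs go through the same helper, the field definition cannot drift between the initial indexing and the re-indexing process.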

I know the devel branch is not an option for you, but for the record
and anyone who might be concerned, this problem would not affect devel
-- field definitions are determined by the FieldSpec class assigned to
the given field name in the Schema, and this conflict between loading from
disk and calling spec_field() wouldn't happen.

> So I'm hoping this rings a bell with anyone in terms of something
> I'm doing wrong or what the issue might be. If not then I'll work on
> developing a concise test case to hopefully reproduce what I'm seeing.

Good detective work.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch