Mailing List Archive

retrieving wrong categories
Hello,



I have an installation of lucene that is retrieving the wrong documents,
consistently. The code hasn't changed and works fine in other
installations. I have been using lucene successfully for a couple years
now and I haven't seen this problem since I was originally implementing
lucene.



Each item has a field "cat" (short for categories) listing the
categories it belongs to. It's a hierarchy, and the field looks like:



Bases[2]{0}



Or



Computer Cradles Docking Stations[7]{3}|Motorola[15]{1}|ML850[43]{1}|





That is: cat name[catID]{cat sequence number} | another category |
another category, and so on.



So if I search cat for "2" I should find anything that belongs in that
category. In the second example here, the item belongs to cat 43, and
by this ancestry belongs to cat 15 and cat 7 as well.
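
To make the format concrete, here is a minimal Python sketch (not the actual indexing code; the names parse_cat and category_ids are made up for illustration). It assumes every entry looks exactly like name[catID]{sequence}, as in the two examples above:

```python
import re

# One "name[catID]{sequence}" entry. The pattern is inferred from the two
# sample values above and is an assumption, not a documented format.
ENTRY = re.compile(r"([^|\[\]{}]+)\[(\d+)\]\{(\d+)\}")


def parse_cat(value):
    """Return (name, cat_id, sequence) tuples in the order they appear."""
    return [
        (m.group(1).strip(), int(m.group(2)), int(m.group(3)))
        for m in ENTRY.finditer(value)
    ]


def category_ids(value):
    """Return only the category IDs, i.e. the numbers inside [...]."""
    return [cat_id for _name, cat_id, _seq in parse_cat(value)]


if __name__ == "__main__":
    print(category_ids("Bases[2]{0}"))
    print(category_ids(
        "Computer Cradles Docking Stations[7]{3}|Motorola[15]{1}|ML850[43]{1}|"
    ))
```

A search for a category should only ever match the numbers this returns, never the sequence numbers in the braces.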



For the most part this works. However, in this problem install, there
are a couple of items that consistently appear outside their
category. If I search for cat 7, an item with the "Bases[2]{0}"
category keeps coming up. Also, if I look in "Bases", a couple of the
"Computer Cradles Docking Stations" items come up, among other problem
items. If I print out the category from Lucene right there on the
display, it shows that it is still in that category, so I don't feel
that it's an indexing issue or something before indexing. For some
reason a search of



+cat:("2") +(cartable:1)





is getting me mostly cat 2 items, plus a few others.



What can I do to start tracking down why this is?

Any thoughts?



Thanks for your help



-jN
Re: retrieving wrong categories
FYI: general@lucene is a very high-level, low-subscriber list for
discussing broad topics relating to the entire Lucene Top Level Project
(Lucene-Java, Nutch, Hadoop, Solr, Lucy, Lucene.Net, etc.). Your
question is probably best asked on the java-user list (unless it relates
to a port to another language, in which case you should use the
appropriate user list).

that said...

: I have an installation of lucene that is retrieving the wrong documents,
: consistently. The code hasn't changed and works fine in other
: installations. I have been using lucene successfully for a couple years
: now and I haven't seen this problem since I was originally implementing
: lucene.

: I have a field "cat" (short for categories) that this item belongs to.
: It's a hierarchy and this field looks like:

: Computer Cradles Docking Stations[7]{3}|Motorola[15]{1}|ML850[43]{1}|

: +cat:("2") +(cartable:1)

: Is getting me mostly cat2 items but a few others.

: What can I do to start tracking down why this is?

I would start by using Luke to inspect what is actually indexed for each
of these docs. If I had to guess, I would suppose that cat:("2") is
matching not only on category ID 2, but also on categories where "2" is
the sequence number.
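
To illustrate that guess, here is a rough Python simulation. It is not Lucene's analyzer: simple_tokens is a hypothetical stand-in that lowercases and splits on every non-alphanumeric character, and whether your analyzer behaves this way is an assumption that Luke can confirm or rule out. id_tokens is an equally hypothetical alternative that keeps only the bracketed IDs.

```python
import re


def simple_tokens(value):
    """Hypothetical tokenizer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def id_tokens(value):
    """Hypothetical alternative: keep only the numbers inside [...]."""
    return re.findall(r"\[(\d+)\]", value)


if __name__ == "__main__":
    doc = "Computer Cradles Docking Stations[7]{3}|Motorola[15]{1}|ML850[43]{1}|"

    # With the naive split, sequence numbers become searchable terms,
    # so a query for "3" hits this doc even though it is not in cat 3.
    print("3" in simple_tokens(doc))   # True

    # With ID-only terms, the same query no longer matches.
    print("3" in id_tokens(doc))       # False
    print(id_tokens(doc))              # ['7', '15', '43']
```

If the terms Luke shows for the problem docs include the sequence numbers, indexing the category IDs on their own would be one way to stop the false matches.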




-Hoss