I want to be able to search on a field which contains a numerical value,
specifying a range, such as 1-100. If my understanding of Lucene is
correct, all fields look essentially like strings, so a simple range
query won't work (after all, searching on the range "a"-"azz" should not
match "b", and by the same string ordering 1-100 would not match "2").
So my plan is to pad all numbers to a fixed length by
prefixing them with zeros at both indexing and search time, so the range then
becomes (e.g.) 000001-000100.
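
To make the padding idea concrete, here is a minimal sketch in plain Java. It does not touch the Lucene API at all; the class name PaddedNumbers, the WIDTH of 6 and the helper names are just assumptions for illustration. It only shows that zero-padded strings sort in the same order as the numbers they stand for, which is what a string-based range needs.

```java
public class PaddedNumbers {
    // Assumed fixed width; it must be at least as wide as the largest value indexed.
    static final int WIDTH = 6;

    // Left-pad a non-negative integer with zeros to WIDTH digits.
    static String pad(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String s = String.format("%0" + WIDTH + "d", n);
        if (s.length() > WIDTH) {
            throw new IllegalArgumentException("value too wide for " + WIDTH + " digits: " + n);
        }
        return s;
    }

    // True when the padded value lies in [lo, hi] under plain string comparison,
    // which is how a term range over string fields would see it.
    static boolean inRange(int value, int lo, int hi) {
        String v = pad(value);
        return v.compareTo(pad(lo)) >= 0 && v.compareTo(pad(hi)) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded, "2" sorts after "100", so it falls outside the range "1"-"100".
        System.out.println("2".compareTo("100") > 0);
        // Padded, the range endpoints and the value compare as expected.
        System.out.println(pad(1) + "-" + pad(100));
        System.out.println(inRange(2, 1, 100));
        System.out.println(inRange(101, 1, 100));
    }
}
```

The same pad function would have to be applied to the field value at indexing time and to both endpoints at search time, otherwise the comparison breaks down again.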
My one worry is that this will upset the rankings, as numbers which
happen to occur in more documents will get a lower IDF,
whereas all numbers really ought to receive equal treatment. So a
possible refinement is to include the clause for the number in my
overall boolean expression, but give it a boost of zero or some small
number, so that it has to match but does not contribute to the relevance
score.
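
As a rough illustration of the worry, the sketch below assumes the IDF is computed as 1 + ln(numDocs / (docFreq + 1)), which is how I understand the default similarity to work; treat that formula as an assumption. The clause contribution is deliberately simplified to boost * idf, which leaves out the other scoring factors. The class and method names are made up for the example.

```java
public class IdfSketch {
    // Assumed formula: idf = 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Simplified contribution of one clause to the score: boost * idf.
    // Real scoring involves further factors that are ignored here.
    static double contribution(double boost, int docFreq, int numDocs) {
        return boost * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 1000;
        // A number found in 9 documents gets a higher IDF than one found in 99.
        System.out.println(idf(9, numDocs));
        System.out.println(idf(99, numDocs));
        // With a boost of zero both clauses contribute the same amount: nothing.
        System.out.println(contribution(0.0, 9, numDocs));
        System.out.println(contribution(0.0, 99, numDocs));
    }
}
```

Under these assumptions the two numbers are scored differently at the default boost, and identically once the boost is zero, which is the effect I am after.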
Any comments? Does this seem like a reasonable way to do things? I
assume the internal handling of dates does something like it.
-- David Elworthy
--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>