Mailing List Archive

Breaking up words with a certain pattern and search by parts
In my project I would like to search for a product code such as
MEM12345 by either "MEM" or "12345". I can't do that right
now in Lucene 1.2. Prefix query doesn't do a prefix search when the
prefix is followed by numbers, and there is no "ends with" type of
search. How do I modify the HTMLParser to index MEM12345 as two words,
MEM and 12345, instead of one?



To unsubscribe, e-mail: <>
For additional commands, e-mail: <>
Re: Breaking up words with a certain pattern and search by parts
Hi Sheldon,

You will need to write a custom query parser. However, the current
QueryParser.jj is the right place to start.

You can easily modify it to have one pattern of _LETTER only and one of
_DIGIT only. Currently it allows numbers and letters to be combined as a term.

If you have more questions, just ask.


On 4/5/02 10:15 AM, "Sheldon Shi" <> wrote:

> In my project I would like to search for product code such as
> MEM12345 either by "MEM" or by "12345". I can't do that right
> now in Lucene 1.2. Prefix query doesn't do prefix search followed
> by numbers, and there is no "end with" type of search. How do I
> modify the HTMLParser to index MEM12345 as two words MEM and 12345
> instead of one?
> Thanks.
> Sheldon

Re: Breaking up words with a certain pattern and search by parts
Hi Sheldon,

My understanding is that you should parse the input text yourself,
since you understand its deeper semantics. When you see "MEM12345",
you can add "MEM12345", "MEM", and "12345" to the words to index.
This is similar to converting the words to lowercase or stripping
accents when analyzing text.
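
As a rough illustration of that idea, here is a minimal sketch in plain
Java (modern syntax, no Lucene classes) that expands one token into the
original plus its letter and digit runs. The class and method names are
made up for this example; in a real setup the logic would sit inside
whatever analyzer or token filter you feed to the indexer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class ProductCodeSplitter {

    // Returns the original token followed by each maximal run of
    // letters or digits inside it (runs of other characters are dropped).
    static List<String> expand(String token) {
        List<String> terms = new ArrayList<>();
        terms.add(token);
        int start = 0;
        for (int i = 1; i <= token.length(); i++) {
            // A run ends at the end of the token or where the character class changes.
            boolean boundary = i == token.length()
                    || kind(token.charAt(i)) != kind(token.charAt(i - 1));
            if (boundary) {
                String part = token.substring(start, i);
                // Skip punctuation runs, and skip a run equal to the whole token
                // so that a plain word is not indexed twice.
                if (kind(part.charAt(0)) != 0 && !part.equals(token)) {
                    terms.add(part);
                }
                start = i;
            }
        }
        return terms;
    }

    // 1 = letter, 2 = digit, 0 = anything else.
    private static int kind(char c) {
        if (Character.isLetter(c)) return 1;
        if (Character.isDigit(c)) return 2;
        return 0;
    }
}
```

Calling ProductCodeSplitter.expand("MEM12345") gives the three terms
MEM12345, MEM and 12345, while a plain word such as "memory" comes back
unchanged as a single term.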

If you wanted to get fancier, you could create one or more
separate fields for the product codes or fragments of product codes.
Your GUI could then search on those fields.
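
A sketch of how the values for such fields might be derived, again in
plain Java. The field names (code, codePrefix, codeNumber) and the
assumption that a code is always letters followed by digits are mine,
not something from Sheldon's data; each map entry would then be added
to the Lucene document as its own field.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene.
final class ProductCodeFields {

    // Assumed shape of a product code: letters then digits, e.g. MEM12345.
    private static final Pattern CODE = Pattern.compile("([A-Za-z]+)([0-9]+)");

    // Maps illustrative field names to the values to index for one code.
    static Map<String, String> fieldsFor(String code) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("code", code);
        Matcher m = CODE.matcher(code);
        if (m.matches()) {
            fields.put("codePrefix", m.group(1));
            fields.put("codeNumber", m.group(2));
        }
        // Codes that do not fit the assumed shape get only the "code" field.
        return fields;
    }
}
```

With this, a search box for the prefix can query only codePrefix and a
search box for the number can query only codeNumber, so neither needs a
prefix or "ends with" search.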

Does this make sense?

Sheldon Shi wrote:

> In my project I would like to search for product code such as
> MEM12345 either by "MEM" or by "12345". I can't do that right
> now in Lucene 1.2. Prefix query doesn't do prefix search followed
> by numbers, and there is no "end with" type of search. How do I
> modify the HTMLParser to index MEM12345 as two words MEM and 12345
> instead of one?
> Thanks.
> Sheldon

Neil Pitman
