I am facing a requirement change: the % sign must be retained in searches. For example:
Sample search docs:
1. Number of boys 50
2. My score was 50%
3. 40-50% for pass score
Search query: 50%
Expected results: Doc-2, Doc-3 i.e.
2. My score was 50%
3. 40-50% for pass score
Actual result: all 3 documents, because the tokenizer strips off the % both
during indexing and during searching, so the query matches every doc with 50 in
it.
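The symptom can be reproduced with a minimal pure-Java sketch. It is a toy model, not Lucene's StandardTokenizer: it assumes that, for these three sample docs, splitting on every non-alphanumeric character (and so dropping %) yields the same tokens. The names `PercentStripDemo`, `tokenize` and `matchingDocs` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of the current behavior; NOT Lucene's StandardTokenizer.
// Assumption: for these sample docs, splitting on every non-alphanumeric
// character (which drops '%') gives the same tokens StandardTokenizer would.
class PercentStripDemo {

    static final String[] DOCS = {
        "Number of boys 50",
        "My score was 50%",
        "40-50% for pass score"
    };

    // "50%" -> [50], "40-50%" -> [40, 50]; tokens are lowercased like LowerCaseFilter.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // 1-based numbers of the docs containing every query token
    // (AND semantics, a simplification of real query parsing).
    static List<Integer> matchingDocs(String query) {
        List<String> queryTokens = tokenize(query);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < DOCS.length; i++) {
            if (tokenize(DOCS[i]).containsAll(queryTokens)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("50%"));     // [50]
        System.out.println(matchingDocs("50%")); // [1, 2, 3]
    }
}
```

Since the query analyzes to the bare token 50, every doc containing 50 is a hit.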
On the implementation front, I am using a set of filters such as
LowerCaseFilter, EnglishPossessiveFilter, etc. in addition to the base
tokenizer, StandardTokenizer.
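In such a chain the filters only see the tokens the tokenizer emits, so no downstream filter can restore a % that was already dropped. The hypothetical sketch below shows that ordering; `AnalysisChainDemo` and its lambdas are illustrative stand-ins, not Lucene's LowerCaseFilter or EnglishPossessiveFilter, and possessive handling is simplified to stripping a trailing 's. It assumes Java 9+ for `List.of`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical toy analysis chain; these are NOT Lucene classes.
// A tokenizer runs first, then each filter is applied to every token in order.
class AnalysisChainDemo {

    // Stand-in tokenizer: keeps runs of letters, digits and apostrophes,
    // so '%' is dropped here, before any filter runs.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}']+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Rough stand-ins for LowerCaseFilter and EnglishPossessiveFilter.
    static final UnaryOperator<String> LOWERCASE = t -> t.toLowerCase(Locale.ROOT);
    static final UnaryOperator<String> POSSESSIVE =
        t -> t.endsWith("'s") ? t.substring(0, t.length() - 2) : t;

    // Filter order mirrors the chain: lowercase first, then possessive stripping.
    static final List<UnaryOperator<String>> FILTERS = List.of(LOWERCASE, POSSESSIVE);

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            for (UnaryOperator<String> filter : FILTERS) {
                token = filter.apply(token);
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Amitesh's score was 50%")); // [amitesh, score, was, 50]
    }
}
```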
Per my analysis, StandardTokenizer strips off the % sign, hence this
behavior. Has anyone faced a similar requirement? Any help/guidance is highly
appreciated.
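A possible direction, sketched here as a toy, is a tokenizer that keeps % attached to the preceding number. This is pure Java, not a Lucene Tokenizer; its rule (a token is a run of letters or digits optionally followed by one %) and the names `PercentKeepDemo`, `tokenize` and `matchingDocs` are assumptions for illustration. In Lucene the same idea would have to be expressed through a different tokenizer or a char filter placed in front of StandardTokenizer; which one fits is left open here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy sketch, NOT a Lucene Tokenizer. Assumed token rule: a run of
// letters/digits, optionally followed by a single '%', so "50%" stays one token.
class PercentKeepDemo {

    static final String[] DOCS = {
        "Number of boys 50",
        "My score was 50%",
        "40-50% for pass score"
    };

    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+%?");

    // "40-50% for pass score" -> [40, 50%, for, pass, score]
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // 1-based numbers of the docs containing every query token (AND semantics).
    static List<Integer> matchingDocs(String query) {
        List<String> queryTokens = tokenize(query);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < DOCS.length; i++) {
            if (tokenize(DOCS[i]).containsAll(queryTokens)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("40-50% for pass score")); // [40, 50%, for, pass, score]
        System.out.println(matchingDocs("50%"));               // [2, 3]
        System.out.println(matchingDocs("50"));                // [1]
    }
}
```

The last line shows the trade-off of this design: with % kept in the token, a bare 50 query no longer matches Doc-2 or Doc-3.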
Regards
Amitesh