Mailing List Archive

Finding number of occurrences of a given word in a document using LUCENE
I am currently facing a difficulty: how do I find the number of occurrences of a particular word in a certain document? Currently, I get a value called score, which is difficult to interpret. I was told to use an interface called TermDocs. Any more tips?

Thanks,
Ajit
Re: Finding number of occurrences of a given word in a document using LUCENE
I have never needed to use this, but it looks like you can call
IndexReader's termDocs(Term) method.
That will give you a TermDocs, which has a freq() method that may do what
you need.
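
To make the idea concrete, here is a minimal sketch of what a postings lookup does, written in plain Java with no Lucene dependency. ToyPostings, addDocument, termDocs and freq are hypothetical names for this illustration only; the real TermDocs is an enumeration you step through with next(), and the tokenization below (lowercase, split on non-alphanumerics) is an assumption, not what any particular analyzer does.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory postings table. This is NOT Lucene's API; the class and
// method names are hypothetical and only illustrate the idea.
class ToyPostings {
    // term -> (document number -> occurrences of the term in that document)
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();

    // Assumed tokenization: lowercase, then split on anything that is not a-z or 0-9.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Postings for one exact term, in document order; empty if the term is unknown.
    Map<Integer, Integer> termDocs(String term) {
        return postings.getOrDefault(term, new TreeMap<Integer, Integer>());
    }

    // Occurrences of one exact term in one document; 0 if absent.
    int freq(String term, int docId) {
        return termDocs(term).getOrDefault(docId, 0);
    }

    public static void main(String[] args) {
        ToyPostings index = new ToyPostings();
        index.addDocument(0, "Lucene is a search library. Lucene is written in Java.");
        index.addDocument(1, "Java is a programming language.");

        // Walk the postings for one term, printing each document and its count.
        for (Map.Entry<Integer, Integer> e : index.termDocs("java").entrySet()) {
            System.out.println("doc=" + e.getKey() + " freq=" + e.getValue());
        }
        System.out.println("lucene in doc 0: " + index.freq("lucene", 0));
    }
}
```

The point of the sketch is that the count is stored per term and per document, so no search or score is involved in reading it back.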

Otis


--- AJIT RAJWADE <ajit.rajwade@veritas.com> wrote:
> I am currently facing a difficulty...how do I find the number of
> occurrences of a particular word in a certain document. Currently, I
> get some value called as score which is difficult to interpret. I was
> told to use an interface called TermDocs...any more tips?
>
> Thanks,
> Ajit
>


__________________________________________________
Do You Yahoo!?
Yahoo! - Official partner of 2002 FIFA World Cup
http://fifaworldcup.yahoo.com

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Finding number of occurrences of a given word in a document using LUCENE
I tried the following code to find the number of occurrences of a word in a document using Lucene. It doesn't work. Can someone help me?


{
    IndexReader reader = IndexReader.open("index");
    Searcher searcher = new IndexSearcher("index");
    Analyzer analyzer = new StandardAnalyzer();

    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));

    // CODE TO TAKE A LINE FROM THE USER

    Query query = QueryParser.parse(line, "contents", analyzer);
    Hits hits = searcher.search(query);
    Term term = new Term("body", line); /* line is the actual text being searched */
    TermDocs help = reader.termDocs(term);

    /* THIS LOOP IS NEVER ENTERED WHEN THE CODE IS EXECUTED! */
    while (help.next())
    {
        System.out.println("\nFreq: " + help.freq());
    }

    int[] docs = new int[hits.length()];
    int[] frequency = new int[hits.length()];
    help.read(docs, frequency);

    // PRINT THE ARRAYS DOCS AND FREQUENCY. ALL VALUES ARE ZERO!
}


Can you help me please? The while loop is never entered, and the arrays of document numbers and word frequencies contain all zeroes. Why is this happening?
Re: Finding number of occurrences of a given word in a document using LUCENE
Hello,

Are you sure that "index" is the name of your index?
Is it really in the same directory as the one that you are invoking
this class from? Why not use a full path to be sure?
Are you sure that your search is matching anything in the first place?
Try printing out the number of matching documents.
Instead of 'line', why not hard-code a single search term, a single
word, in your testing code?
Are you sure your index has a field named 'contents'? How about
'body'?
It looks like you are not really making use of the IndexSearcher or Hits
instances. Do you really need to perform a search to get your
frequencies?
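
As a rough illustration of why the field name and the exact term text matter, here is a small plain-Java sketch with no Lucene dependency. ExactTermLookup, index and docFreq are hypothetical names for this toy model, and the lowercase-and-split tokenization is an assumption standing in for an analyzer. It shows two ways an exact lookup can come back empty: asking in a field the text was never indexed under (the posted code queries 'contents' but builds its Term on 'body'), and passing a raw input line where the index only holds individual normalized words.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model of exact (field, term) lookup. This is NOT Lucene's API; all
// names here are hypothetical and exist only for the illustration.
class ExactTermLookup {
    // "field:term" -> number of documents containing that term in that field
    private final Map<String, Integer> docCounts = new HashMap<>();

    // Index one document's text under a field. Assumed tokenization:
    // lowercase, then split on anything that is not a-z or 0-9.
    void index(String field, String text) {
        Set<String> seen = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            // Count each distinct token once per document.
            if (!token.isEmpty() && seen.add(token)) {
                docCounts.merge(field + ":" + token, 1, Integer::sum);
            }
        }
    }

    // Exact lookup: no lowercasing or splitting is applied to the argument.
    int docFreq(String field, String term) {
        return docCounts.getOrDefault(field + ":" + term, 0);
    }

    public static void main(String[] args) {
        ExactTermLookup idx = new ExactTermLookup();
        idx.index("contents", "Apache Lucene is a search library");

        System.out.println(idx.docFreq("body", "lucene"));        // 0: wrong field
        System.out.println(idx.docFreq("contents", "Lucene is")); // 0: raw line, not a single indexed word
        System.out.println(idx.docFreq("contents", "lucene"));    // 1: same field, normalized single word
    }
}
```

If the real index behaves like this model, a hard-coded lowercase word in the field that was actually indexed is the quickest way to check whether the loop can be entered at all.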

That's all I can think of. I don't know if any of this will help you,
but it may help you get started.

Otis



