Mailing List Archive

Test Message / Introduction
My name is Thomas Bizzell. I am just sending a test message.


I worked for 11 years as a Software Engineer/Architect for a company called
Synthesys Technologies, Inc. in Austin, Texas, until the company went
bankrupt early this year. We used a proprietary full-text database in a
medical records system that we called Emrx. That product was written in C.
I also developed a search engine in C++ that used an object-oriented
database (Objectivity) as a data store.

The search engines were designed to provide precision in searching. In
the C++ version, we were using XML documents.


A simple query could be:

w/section(w/title(Discharge Summary) wo/4(discharged home)))

The wo/4 means the terms must be found in the document in the order given
in the query, within a 4-word window.
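
To make the wo/N idea concrete, here is a minimal Java sketch of an ordered window check over a list of already tokenized words. The class and method names are hypothetical, and it assumes that "within an N-word window" means all query terms, in order, fall inside N consecutive words. It is not the Emrx implementation, only an illustration of the rule described above.

```java
import java.util.Arrays;
import java.util.List;

class OrderedWindow {
    /**
     * Returns true if the query terms occur in the given order inside
     * some run of `window` consecutive words of the document.
     * Assumption: the window is counted from the first matched term.
     */
    static boolean matchesOrderedWithin(List<String> doc, List<String> terms, int window) {
        if (terms.isEmpty()) {
            return false;
        }
        for (int start = 0; start < doc.size(); start++) {
            if (!doc.get(start).equals(terms.get(0))) {
                continue; // a match must begin at the first query term
            }
            int next = 1; // index of the next query term to find
            // Greedily take the earliest occurrence of each following term,
            // never looking past the end of the window.
            for (int i = start + 1;
                 i < doc.size() && i - start < window && next < terms.size();
                 i++) {
                if (doc.get(i).equals(terms.get(next))) {
                    next++;
                }
            }
            if (next == terms.size()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("patient", "was", "discharged", "to", "home", "today");
        List<String> terms = Arrays.asList("discharged", "home");
        System.out.println(matchesOrderedWithin(doc, terms, 4));
    }
}
```

Taking the earliest occurrence of each following term is enough here, because for a fixed starting word it leaves the most room for the remaining terms.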


I have been trying to build up my Java skills and learn EJB. I was
looking for information about Tomcat when I came across this project. I
am going to download the sources. I might be able to help out in the future.


--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
Re: Test Message / Introduction
Hi,
I am trying to implement the 'within' search you described in your email:
w/section(w/title(Discharge Summary) wo/4(discharged home)))

I think it would not be hard to add this feature to Lucene if someone with good knowledge of Lucene could give us some hints.
Please.
Looking at the code, I understood the main scenario when a search is fired.
What I don't understand are the PhraseScorer and ExactPhraseScorer classes.

How do they work? What happens inside those classes? What do they do?

I ask because I think the only difference between an ExactPhraseQuery and a WithinQuery is the way they calculate the score.
ExactPhraseQuery makes sure that the order of the terms in the query is the same inside the document, by checking the positions read from the SegmentReader (which also depend on the Analyzer).
WithinQuery should do the same, BUT with one small difference:
the terms may be separated by a maximum offset of x words (x comes from wo/x, e.g. buy wo/10 car => x=10).
I think I am on the right track, but I need to understand more about the Scorer (see the questions above), TermPositions, and some of the math used to calculate the score.
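
The difference between the two checks can be sketched in plain Java over per-term position lists, roughly the kind of data a scorer would read for each term in one document. This is not Lucene code: the class and method names are hypothetical, each position array is assumed to be sorted in ascending order, and the offset is assumed to be the position difference between the first and last matched term.

```java
import java.util.Arrays;

class PositionMatch {
    /** Exact phrase: term i must sit exactly at start + i. */
    static int exactPhraseFreq(int[][] positions) {
        int freq = 0;
        for (int start : positions[0]) {
            boolean ok = true;
            for (int i = 1; i < positions.length && ok; i++) {
                ok = Arrays.binarySearch(positions[i], start + i) >= 0;
            }
            if (ok) {
                freq++;
            }
        }
        return freq;
    }

    /** Within: terms in order, last term at most maxOffset positions after the first. */
    static int withinFreq(int[][] positions, int maxOffset) {
        int freq = 0;
        for (int start : positions[0]) {
            int prev = start;
            boolean ok = true;
            for (int i = 1; i < positions.length && ok; i++) {
                // Earliest occurrence of term i after the previous term.
                int next = firstGreaterThan(positions[i], prev);
                ok = next >= 0 && next - start <= maxOffset;
                prev = next;
            }
            if (ok) {
                freq++;
            }
        }
        return freq;
    }

    /** Smallest value in the sorted array strictly greater than value, or -1 if none. */
    static int firstGreaterThan(int[] sorted, int value) {
        for (int p : sorted) {
            if (p > value) {
                return p;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // "buy a new red car , buy car": buy at 0 and 6, car at 4 and 7.
        int[][] positions = { {0, 6}, {4, 7} };
        System.out.println(exactPhraseFreq(positions));
        System.out.println(withinFreq(positions, 10));
    }
}
```

Both methods return a match count per document, which a scorer could then use as the frequency input to its score. Only the position test differs between them.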

My idea (based on my knowledge of Lucene!) is:

1. The first thing to do is change QueryParser.jj so that it recognizes 'wo/10' and, when it occurs, instantiates a class
'WithinQuery' and sets an internal variable 'withinWords' to '10'.
This happens when you create your query object inside your servlet or JSP page:

Query query = QueryParser.parse("your query string", "your field name", analyzer);
I need help changing QueryParser.jj because I don't know JavaCC!

2. The class WithinQuery should have some kind of method to check the offset (words between, NOT characters between!) and build the weights:
- final float sumOfSquaredWeights(Searcher searcher)
- query.normalize(norm);

3. The Query object has a method 'Scorer scorer(Query query, Searcher searcher, IndexReader reader)'.
Inside it, the method 'scorer(IndexReader reader)' is called. Every XxxxQuery class must implement this method because it is abstract, so in our WithinQuery it will return a WithinQueryScorer object.
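
As a rough illustration of what step 1 has to extract from the query string, here is a plain Java sketch that uses a regular expression instead of a JavaCC grammar. It is only a stand-in for the real QueryParser.jj change: the class WithinSyntax and the holder WithinSpec are hypothetical, and only the infix form 'buy wo/10 car' is handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WithinSyntax {
    // Matches "<left> wo/<n> <right>", e.g. "buy wo/10 car".
    private static final Pattern WITHIN =
        Pattern.compile("^\\s*(.+?)\\s+wo/(\\d{1,9})\\s+(.+?)\\s*$");

    /** Hypothetical holder for the pieces a WithinQuery would need. */
    static final class WithinSpec {
        final String left;
        final String right;
        final int withinWords;

        WithinSpec(String left, String right, int withinWords) {
            this.left = left;
            this.right = right;
            this.withinWords = withinWords;
        }
    }

    /** Returns the parsed pieces, or null if the query has no wo/N operator. */
    static WithinSpec parse(String query) {
        Matcher m = WITHIN.matcher(query);
        if (!m.matches()) {
            return null;
        }
        return new WithinSpec(m.group(1), m.group(3), Integer.parseInt(m.group(2)));
    }

    public static void main(String[] args) {
        WithinSpec spec = parse("buy wo/10 car");
        System.out.println(spec.left + " / " + spec.right + " / " + spec.withinWords);
    }
}
```

In the real parser the same three pieces (left term, right term, withinWords) would come from grammar productions rather than a regular expression, and would be passed to the WithinQuery constructor.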

Am I on the right track?

Please let me know.
I appreciate any help.
Also, I am almost done with the highlight feature. How can I add it to the contribution list?

Thanks, bye.
