Mailing List Archive

bug with field retrieval?
I'm using Lucene to index a set of documents in which the same field names
are repeated many times.

A simple example of our typical document structure might look like this:

section_title:
the first section
text:
this is section 1 text
section_title:
the second section
text:
this is section 2 text

If I search section_title for the text "first", this document is a hit.
However, if I then call hits.doc(0).get("section_title"), the section title
that matched is *not* the one returned; the last occurrence of section_title
in the document is returned instead.
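
To make the reported behaviour concrete, here is a minimal plain-Java model
of it. RepeatedFieldDoc is a hypothetical stand-in, not Lucene's Document
class, and its get() is written only to mirror the behaviour described above;
it makes no claim about how Lucene stores fields internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document with repeated field names.
// NOT Lucene's Document class; it only models the behaviour reported above.
class RepeatedFieldDoc {
    // {name, value} pairs kept in the order they were added.
    private final List<String[]> fields = new ArrayList<>();

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // Models the reported behaviour: the last occurrence of the name wins.
    String get(String name) {
        for (int i = fields.size() - 1; i >= 0; i--) {
            if (fields.get(i)[0].equals(name)) {
                return fields.get(i)[1];
            }
        }
        return null; // no field with that name
    }

    // Builds the two-section example document from this message.
    static RepeatedFieldDoc example() {
        RepeatedFieldDoc doc = new RepeatedFieldDoc();
        doc.add("section_title", "the first section");
        doc.add("text", "this is section 1 text");
        doc.add("section_title", "the second section");
        doc.add("text", "this is section 2 text");
        return doc;
    }
}
```

With this model, RepeatedFieldDoc.example().get("section_title") yields the
second section's title even when the query term only occurs in the first.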

Unless I'm mistaken, this is a bug. Ideally, the section title that made the
document a hit would be returned. Less ideal, but still preferable to the
current behaviour, would be for the first section title to be returned, as
this is typically what a user sees first when they choose to view the entire
document after seeing it in the search results.
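
The two preferences above can be sketched as a small selection rule in plain
Java. TitlePicker and pickTitle are hypothetical names, and the sketch assumes
the caller already has every section_title value of the hit document in
document order. Matching here is a simple case-insensitive whole-word
comparison on whitespace, which is an assumption and will not necessarily
agree with whatever analyzer was used at index time.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
class TitlePicker {
    // Returns the first title that contains `term` as a whitespace-separated
    // word (case-insensitive). Falls back to the first title when none
    // matches, and to null when there are no titles at all.
    static String pickTitle(List<String> titles, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        for (String title : titles) {
            for (String word : title.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (word.equals(wanted)) {
                    return title; // preferred: the title that matched
                }
            }
        }
        return titles.isEmpty() ? null : titles.get(0); // fallback: first title
    }
}
```

For the example document, a search for "second" would pick "the second
section", while a term that appears in neither title would fall back to "the
first section".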

Was the current behaviour intentional/anticipated?

If no one has time to fix this but someone can point me at the best place to
start, I'm willing to attempt a fix and contribute a patch. At the moment I
use the result of doc().get("section_title") as part of my search results,
so the current behaviour is leading to slightly strange-looking results
pages.

Regards,

Lee Mallabone.