Mailing List Archive

Searching a hierarchical data set
All-

I have an implementation question for the group that is causing me some
headaches. The records I need indexed have the following form: there are a
couple of header fields, and then another field with an arbitrary number of
subfields. Something like a notebook...
Name: Tom Barrett
Date: 2/6/2002
Notes:
Note1: This is the first note
Note2: This is my second note
...
NoteN: This is my last note

Now I want to run searches on these documents that match in the header
fields and in the notes field, and that also report which note field
matched. For a query like...
name:"Tom" notes:"second note"
I want to return the document above, and also be told that the match
occurred in the Note2 field.

Originally, I thought I would just concatenate all of the NoteN fields into
one Notes field, so that a document would be <Name, Date, Notes>. But this
makes it impossible to know which note field a match occurred in. Also, if
I do a search for
name:"Tom" notes:"note"
I want a match for each note, since all of the NoteN fields would match this
query.
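
To make the problem concrete, here is a minimal sketch in plain Python (not Lucene code). The `record` dictionary and both helper functions are hypothetical names invented for this illustration, and matching is a simple case-insensitive substring test rather than real phrase search.

```python
# Hypothetical in-memory record mirroring the notebook example above.
record = {
    "name": "Tom Barrett",
    "date": "2/6/2002",
    "notes": {
        "Note1": "This is the first note",
        "Note2": "This is my second note",
        "NoteN": "This is my last note",
    },
}


def matches_concatenated(rec, phrase):
    """Search one merged Notes field: the answer is only yes or no."""
    merged = " ".join(rec["notes"].values())
    return phrase.lower() in merged.lower()


def matching_note_fields(rec, phrase):
    """Search each note separately: returns the names of the notes that match."""
    return [
        field
        for field, text in rec["notes"].items()
        if phrase.lower() in text.lower()
    ]


print(matches_concatenated(record, "second note"))  # True, but which note matched?
print(matching_note_fields(record, "second note"))  # ['Note2']
print(matching_note_fields(record, "note"))         # ['Note1', 'Note2', 'NoteN']
```

The merged field can only say that the record matched; keeping the notes separate is what allows one hit per matching note.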

The only way I can think of to do this is to have each document consist of
<Name, Date, Note>, where each Note is one of the NoteN fields. But the
problem I see is that there are about 2 million of these Note fields, so the
index would be huge if I duplicate the Name and Date fields for every
document (a normalization problem). And I want a search for
name:"Tom"
to return the above document only once (not N times).
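
A rough sketch of that flattened layout, again in plain Python rather than Lucene, shows both the duplication and the repeated hits. The `record_id` and `note_field` keys are assumptions added for the illustration (the layout described above has no such fields); collapsing by a record identifier is one possible way to get a single result per record.

```python
# Hypothetical flattened "index": one entry per note, header fields repeated.
flat_docs = [
    {"record_id": 1, "name": "Tom Barrett", "date": "2/6/2002",
     "note_field": "Note1", "note": "This is the first note"},
    {"record_id": 1, "name": "Tom Barrett", "date": "2/6/2002",
     "note_field": "Note2", "note": "This is my second note"},
    {"record_id": 1, "name": "Tom Barrett", "date": "2/6/2002",
     "note_field": "NoteN", "note": "This is my last note"},
]


def search_name(docs, term):
    """Return every flattened entry whose name contains the term."""
    return [d for d in docs if term.lower() in d["name"].lower()]


def collapse_by_record(hits):
    """Group hits by record_id so each record is reported once."""
    grouped = {}
    for hit in hits:
        grouped.setdefault(hit["record_id"], []).append(hit["note_field"])
    return grouped


raw_hits = search_name(flat_docs, "Tom")
print(len(raw_hits))                 # 3: the same record, once per note
print(collapse_by_record(raw_hits))  # {1: ['Note1', 'Note2', 'NoteN']}
```

Collapsing fixes the repeated results, but the header fields are still stored once per note, so the index size concern remains.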


I think the right answer is to have two indexes and normalize that way: one
in which each document is just <name, date, docID>, and a second in which
each document is <docID, Note>, with each note field being its own document.
Then, a search for
name:"Tom" notes:"second note"
would search the <name, date, docID> index to get the hits on "Tom", then
search the <docID, Note> index to get the hits on "second note", and then
match the two result sets on docID.
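
The docID matching step could look roughly like the following plain-Python sketch. It is not Lucene code: the two lists stand in for the two indexes, all function names are hypothetical, and the `note_field` key is an assumption added so the result can say which note matched. The second record ("Jane Example") is made-up data included only to show the join filtering something out.

```python
# Hypothetical stand-in for the <name, date, docID> index.
header_index = [
    {"doc_id": 1, "name": "Tom Barrett", "date": "2/6/2002"},
    {"doc_id": 2, "name": "Jane Example", "date": "1/1/2002"},  # made-up record
]

# Hypothetical stand-in for the <docID, Note> index, one entry per note.
note_index = [
    {"doc_id": 1, "note_field": "Note1", "text": "This is the first note"},
    {"doc_id": 1, "note_field": "Note2", "text": "This is my second note"},
    {"doc_id": 1, "note_field": "NoteN", "text": "This is my last note"},
    {"doc_id": 2, "note_field": "Note1", "text": "A second note from someone else"},
]


def search_headers(name_term):
    """Return the set of doc_ids whose name contains the term."""
    return {h["doc_id"] for h in header_index
            if name_term.lower() in h["name"].lower()}


def search_notes(phrase):
    """Return (doc_id, note_field) pairs for notes containing the phrase."""
    return [(n["doc_id"], n["note_field"]) for n in note_index
            if phrase.lower() in n["text"].lower()]


def joined_search(name_term, note_phrase):
    """Keep only note hits whose doc_id also matched in the header index."""
    header_ids = search_headers(name_term)
    result = {}
    for doc_id, note_field in search_notes(note_phrase):
        if doc_id in header_ids:
            result.setdefault(doc_id, []).append(note_field)
    return result


print(joined_search("Tom", "second note"))  # {1: ['Note2']}
print(joined_search("Tom", "note"))         # {1: ['Note1', 'Note2', 'NoteN']}
```

A name-only search would use `search_headers` alone and so return each record once. The cost of this design is that the join happens outside the index, so a note query with many hits means a large intermediate result to intersect.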


Anyone have any other thoughts on how to do this? I'm kind of stuck...

Thanks for the help,

Tom

