Mailing List Archive

document attributes
Hi all
I have already posted same message to java-user group yesterday, no
response yet, please help me out of problem
I have text docs similar to the TREC format some think like this

<file>
<author>xxxxxxxx</author>
<publisher>yyyyyyyy</publisher>
<text>
Full text search with one or more keywords with advanced search
operators to enhance search has to be implemented. Advanced search with
document attributes like author, title, type and Meta keyword in
addition to the search term should be implemented. </text> </file>

with this type of doc I can index text between "<text>" tags, but I want
to preserve the data in the other tags as well like
<author>xxxxxxxx</author> <publisher>yyyyyyyy</publisher>.

Because the user when searching for docs he will use author as attribute
for searching Doc,

Is there any possibility to index the author and publisher data and use
in the search. please tell me how can I index and use in search.

And also please guide me with methods or references in IR that use
attributes of DOC for search purpose.

Thanks in advance
Madhu




Madhu Satyanarayana. Panitini
PASS GCA Solution Centre Pvt Ltd.
601 Aditya Trade Centre, Ameerpet,
Hyderabad, India.
www.pass-consulting.com