Mailing List Archive

Lucene as xml store
hi,

I am a beginner with Lucene, so kindly excuse me if these questions are a
bit naive.
- Can I use Lucene as an XML store plus search engine?
- As I understand it, to search an XML document we need to parse it, build
an index from it, and then search on the basis of fields.
- So does this mean that even if we use Lucene as an XML store (IF WE CAN!!),
we still need to parse the document to build the index?

Please reply to this as soon as possible

Regards,
Namrata
RE: Lucene as xml store [ In reply to ]
Hi Otis,

But can Lucene be used as an XML store, i.e. to store the original XML
documents as they are?

Regards,
Namrata

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Friday, July 22, 2005 11:50 AM
To: general@lucene.apache.org
Subject: Re: Lucene as xml store

Hi Namrata,

Yes, you would need to parse the XML.
Here is one way to do it:
http://www-128.ibm.com/developerworks/java/library/j-lucene/

Otis


Re: Lucene as xml store [ In reply to ]
Hi Namrata,

Yes, you would need to parse the XML.
Here is one way to do it:
http://www-128.ibm.com/developerworks/java/library/j-lucene/

Otis
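
As a rough illustration of the parsing step described above, here is a minimal sketch that uses only the JDK's DOM parser to turn an XML string into flat "field name -> values" pairs, which is the shape an indexer would then consume. The class name XmlFieldExtractor, the method extractFields and the sample document are hypothetical and not taken from the linked article; each leaf element is assumed to become one field.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns an XML string into flat "field name -> values" pairs.
class XmlFieldExtractor {

    static Map<String, List<String>> extractFields(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, List<String>> fields = new LinkedHashMap<>();
            collect(doc.getDocumentElement(), fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Walk the tree; every element without child elements contributes one field value.
    private static void collect(Element element, Map<String, List<String>> fields) {
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                collect((Element) child, fields);
            }
        }
        if (!hasChildElement) {
            String text = element.getTextContent().trim();
            if (!text.isEmpty()) {
                fields.computeIfAbsent(element.getTagName(), k -> new ArrayList<>()).add(text);
            }
        }
    }

    public static void main(String[] args) {
        // Made-up sample document, for illustration only.
        String xml = "<doc><title>Sample title</title><tag>a</tag><tag>b</tag></doc>";
        System.out.println(extractFields(xml));
    }
}
```

Each entry in the resulting map could then be handed to the indexer as one field; how the fields are named and which ones are kept is an application decision.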


RE: Lucene as xml store [ In reply to ]
Hey Erik,

Thanks for the info.

- Well, the application I want to develop stores XML files, each of which
may have a different structure, and then searches them. The searches can
in turn depend on the structure of the XML document and on the user's
requirements.

- Moreover, I did not exactly understand how I can store the XML
document. I went through the Javadoc and could not figure out which
APIs could be used for this purpose. Can you guide me in this?

- But the biggest question is: is Lucene a good option? [Which I now doubt,
on the basis of what I have read so far :-(]

Regards,
Namrata


-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Friday, July 22, 2005 2:11 PM
To: general@lucene.apache.org
Subject: Re: Lucene as xml store


Re: Lucene as xml store [ In reply to ]
On Jul 22, 2005, at 1:07 AM, Namrata Kumari wrote:

>
> hi,
>
> I am a beginner with Lucene, so kindly excuse me if these questions are
> a bit naive.
> - Can I use Lucene as an XML store plus search engine?
> - As I understand it, to search an XML document we need to parse it,
> build an index from it, and then search on the basis of fields.
> - So does this mean that even if we use Lucene as an XML store (IF WE
> CAN!!), we still need to parse the document to build the index?

Lucene is a search engine and only deals with text (Strings
essentially). Lucene is also a flat document space and doing queries
for things hierarchical is not how it was designed, but it can be
done to a limited degree depending on how data is indexed.
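
One way to picture "a flat document space" with hierarchy handled "to a limited degree" is to encode each element's path in the field name. The following is only a sketch under that assumption, using the JDK's DOM parser; XmlPathFlattener, flatten and valuesUnder are hypothetical names, and this is not a description of how Lucene itself works internally.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: flattens an XML tree into "path -> text values" pairs.
class XmlPathFlattener {

    static Map<String, List<String>> flatten(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, List<String>> fields = new LinkedHashMap<>();
            walk(root, "", fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // The field name is the slash-separated path from the root to this element.
    private static void walk(Element element, String parentPath,
                             Map<String, List<String>> fields) {
        String path = parentPath.isEmpty()
                ? element.getTagName()
                : parentPath + "/" + element.getTagName();
        StringBuilder ownText = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, path, fields);
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                ownText.append(child.getNodeValue());
            }
        }
        String value = ownText.toString().trim();
        if (!value.isEmpty()) {
            fields.computeIfAbsent(path, k -> new ArrayList<>()).add(value);
        }
    }

    // A limited "hierarchical" lookup: all values at or below a path prefix.
    static List<String> valuesUnder(Map<String, List<String>> fields, String pathPrefix) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : fields.entrySet()) {
            String path = entry.getKey();
            if (path.equals(pathPrefix) || path.startsWith(pathPrefix + "/")) {
                result.addAll(entry.getValue());
            }
        }
        return result;
    }
}
```

Note what the flat form loses: if two sibling elements share a path, their values land in the same list, so the information about which value belonged to which parent is gone. That is the kind of limitation that makes general structural queries awkward on a flat index.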

Yes, Lucene can store text as well as make it searchable - so you
could store an XML document in it as well.

You have not provided any information on the types of queries you
need to support or what the user experience will be like. There are
many ways to use Lucene, and whether it is a suitable solution for your
application depends on that information. Tell us more about what
you're wanting to do and we can guide you further.

> Please reply to this as soon as possible

That's what they all say! :) No need to say such a thing - if you
have well-articulated questions that are straightforward enough to
answer, you'll get responses quickly here.

Erik
RE: Lucene as xml store [ In reply to ]
You are better off using an XML database like

http://xml.apache.org/xindice/
or
http://exist.sourceforge.net/

... which will allow you to perform fast XPath queries on your XML data.

-----Original Message-----
From: Namrata Kumari [mailto:Nkumari@saba.com]
Sent: 22 July 2005 10:37
To: general@lucene.apache.org
Subject: RE: Lucene as xml store




Re: Lucene as xml store [ In reply to ]
On Jul 22, 2005, at 4:37 AM, Namrata Kumari wrote:
> - Well, the application I want to develop stores XML files, each of
> which may have a different structure, and then searches them. The
> searches can in turn depend on the structure of the XML document and
> on the user's requirements.

That's still a pretty generic requirement. What type of queries?
XPath?

> - Moreover, I did not exactly understand how I can store the XML
> document. I went through the Javadoc and could not figure out which
> APIs could be used for this purpose. Can you guide me in this?

Look at the various types of fields. There is a "stored" attribute
on Field that allows the field to be stored.
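
To make the difference between "stored" and "indexed" concrete, here is a toy in-memory model written against the JDK only. It is deliberately NOT Lucene's API: TinyIndex, its Field class and all method names are hypothetical, and the exact Field constructors or factory methods in Lucene differ between releases, so the Javadoc for the version in use is the place to check. The sketch only shows the idea that one field can hold the raw XML for retrieval while another is made searchable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only; this is NOT Lucene's API.
class TinyIndex {

    // Hypothetical field: "stored" and "indexed" are independent choices.
    static final class Field {
        final String name;
        final String value;
        final boolean stored;
        final boolean indexed;

        Field(String name, String value, boolean stored, boolean indexed) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    // Per document: the verbatim values of its stored fields.
    private final List<Map<String, String>> storedValues = new ArrayList<>();
    // Inverted index: "field:term" -> ids of documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Adds one document and returns its id.
    int add(List<Field> fields) {
        int id = storedValues.size();
        Map<String, String> stored = new HashMap<>();
        for (Field f : fields) {
            if (f.stored) {
                stored.put(f.name, f.value); // kept verbatim for retrieval
            }
            if (f.indexed) {
                // Naive tokenization: lowercase, split on non-word characters.
                for (String term : f.value.toLowerCase().split("\\W+")) {
                    if (!term.isEmpty()) {
                        postings.computeIfAbsent(f.name + ":" + term, k -> new TreeSet<>()).add(id);
                    }
                }
            }
        }
        storedValues.add(stored);
        return id;
    }

    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(),
                Collections.<Integer>emptySet());
    }

    // Returns null when the field was not stored for that document.
    String storedValue(int id, String field) {
        return storedValues.get(id).get(field);
    }
}
```

In this model, a document could carry the whole XML file in a stored-but-not-indexed field named, say, "xml", plus an indexed-but-not-stored "title" field extracted during parsing: searching "title" finds the document, and storedValue(id, "xml") hands back the original text.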

> - But the biggest question is: is Lucene a good option? [Which I now
> doubt, on the basis of what I have read so far :-(]

It really all depends. I built a search engine for the Rossetti
Archive (http://www.rossettiarchive.org/rose/) which indexes XML
files like this:

http://www.rossettiarchive.org/docs/1-1847.s244.raw.xml

XPath queries are not possible into the XML, but that is also not a
use case for the system. Highly structured queries such as this one
are supported because the indexing process extracted detailed
information from the XML files:

http://www.rossettiarchive.org/rose/?query=%2Bgenre%3Asonnet+%2B%28author%3Arossetti+OR+author%3Adgr%29+%2Byear%3A%5B1850+TO+1870%5D
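
The query parameter in that URL is percent-encoded, which hides the structure of the query. A small sketch using the JDK's URLDecoder shows what it says once decoded; the class name QueryDecoder is hypothetical, and the encoded string is copied from the URL above with the line break removed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper that decodes a percent-encoded query parameter.
class QueryDecoder {

    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = "%2Bgenre%3Asonnet+%2B%28author%3Arossetti+OR+author%3Adgr%29"
                + "+%2Byear%3A%5B1850+TO+1870%5D";
        System.out.println(decode(encoded));
        // prints: +genre:sonnet +(author:rossetti OR author:dgr) +year:[1850 TO 1870]
    }
}
```

Read as Lucene query parser syntax, the decoded string has three required clauses (each prefixed with +): a term in the genre field, either of two terms in the author field, and an inclusive range on the year field. Each of those field names has to exist in the index, which is why the indexing process must extract them from the XML first.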

I still do not have a clear-cut understanding of your needs, and thus I am
still not sure whether Lucene is suitable. Certainly for full-text
searches it is a fine choice, but structured queries are a
different story.

Erik

