Mailing List Archive

XML search language...
Heyoo to all...

Simple question, I have to create a "service" for searching throughout
our database of articles (funny enough) on www.vnunet.com...

The idea is to send a search query in XML, and return the matched items in
XML again, and then styling them...

Question: is there a standard dtd/schema for those kinds of XML
"transactions" ??? Or should I just invent my own? Pointers, anyone?

Pier


--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
Re: XML search language... [ In reply to ]
There is no such standard that I know of, although somebody mentioned
inventing(?) Query Markup Language (QML, an abbreviation already used
for things other than Query Markup Language) the other day on either
-user or -dev.

Otis

Re: XML search language... [ In reply to ]
Oh, one more thing - depending on the nature of this service, you may
want to consider not using XML, and just writing to and reading from
streams.

Otis

Re: XML search language... [ In reply to ]
Take a look at the SOAP messaging protocol:

http://xml.apache.org/soap/docs/index.html
http://www.w3.org/TR/SOAP/
-Nathan

Re: XML search language... [ In reply to ]
On Friday, Nov 29, 2002, at 05:28 Europe/Zurich, Nathan Ander wrote:

> Take a look at the SOAP messaging protocol:

Or XML-RPC, if you don't feel like having a headache:

http://www.xmlrpc.com/spec

In any case, those are simply messaging protocols. They don't address
"searching" per se.

What about (ab)using XPath as a query language:

http://www.w3.org/TR/xpath

Or taking your chance with XML Query Language (XQL):

http://www.w3.org/TandS/QL/QL98/pp/xql.html

PA.


Re: XML search language... [ In reply to ]
I expressed the idea of formalizing an XML spec for defining search
queries. It would look something like the following:

<?xml version="1.0" encoding="UTF-8"?>
<query>
  <boolean type="and">
    <term field="character">Bird</term>
    <group>
      <term field="category">Cartoon</term>
      <boolean type="not">
        <term field="name">Roadrunner</term>
      </boolean>
    </group>
  </boolean>
</query>

This could be used to express the following Lucene query:

character:Bird AND (category:Cartoon NOT name:Roadrunner)

The reason for expressing the syntax in XML is to facilitate the use of
generic technologies for filtering and manipulating the query (SAX, XSLT,
etc.).

So, say I have 4 different "Index Services" that have similar
information stored under different fields. I might want to translate
the query above into the following:

Server 1:

character:Bird AND (category:Cartoon NOT name:Roadrunner)

Server 2:

character:Bird AND (grouping:Cartoon NOT title:Roadrunner)

Server 3:

character=Bird AND (grouping=Cartoon -name=Roadrunner)

Server 4:

(& character=Bird (grouping=Cartoon (!name=Roadrunner)))

Now I can write SAX Filters that will generate the appropriate query for
each "Index Service".

This makes XML the stepping stone to easily get from one syntax to
another. All that needs to be written to map each syntax is:
1.) A SAX parser to generate SAX events from the query string.
2.) A ContentHandler to generate a query string from SAX events.

These can then be combined with other parsers/handlers to translate from
any mapped syntax to any other mapped syntax.
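
A minimal sketch of step 2 (a ContentHandler that generates a query string
from SAX events), written in Python with the standard xml.sax module. The
names LuceneQueryHandler, translate and field_map are illustrative
assumptions, not part of Lucene or of the proposed schema, and the handler
only covers the elements used in the example query above.

```python
import xml.sax

QUERY_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<query>
  <boolean type="and">
    <term field="character">Bird</term>
    <group>
      <term field="category">Cartoon</term>
      <boolean type="not">
        <term field="name">Roadrunner</term>
      </boolean>
    </group>
  </boolean>
</query>"""


class LuceneQueryHandler(xml.sax.ContentHandler):
    """Builds a Lucene-style query string from the SAX events of the XML query."""

    def __init__(self, field_map=None):
        super().__init__()
        self.field_map = field_map or {}  # hypothetical per-server field renaming
        self.stack = []                   # one (name, type, parts) frame per open container
        self.in_term = False
        self.field = None
        self.text = []
        self.result = ""

    def startElement(self, name, attrs):
        if name == "term":
            self.in_term = True
            self.field = attrs.get("field")
            self.text = []
        else:  # <query>, <boolean> or <group>
            self.stack.append((name, attrs.get("type"), []))

    def characters(self, content):
        if self.in_term:
            self.text.append(content)

    def endElement(self, name):
        if name == "term":
            value = "".join(self.text).strip()
            field = self.field_map.get(self.field, self.field)
            rendered = f"{field}:{value}" if field else value
            self.in_term = False
        else:
            _, op, parts = self.stack.pop()
            if name == "group":
                rendered = "(" + " ".join(parts) + ")"
            elif op == "not":
                rendered = "NOT " + " ".join(parts)
            elif name == "boolean":
                rendered = f" {op.upper()} ".join(parts)
            else:  # <query>
                rendered = " ".join(parts)
        if self.stack:
            self.stack[-1][2].append(rendered)  # hand the text to the parent element
        else:
            self.result = rendered


def translate(xml_bytes, field_map=None):
    handler = LuceneQueryHandler(field_map)
    xml.sax.parseString(xml_bytes, handler)
    return handler.result


server1 = translate(QUERY_XML)
server2 = translate(QUERY_XML, {"category": "grouping", "name": "title"})
```

Under these assumptions, server1 and server2 reproduce the Server 1 and
Server 2 query strings listed earlier; the other syntaxes would need their
own rendering rules in endElement.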

I know this is probably outside the scope of Lucene itself as a project.
But it would be interesting if such a standard arose for query syntax;
if Lucene then supported such features, it would make it even easier for
others to integrate it into their present "legacy" Search Services.
Imagine: you could install a Lucene Service and write a Parser/Handler
to map your "legacy" system's queries directly into Lucene syntax, and
it could start acting just like your "legacy" service.

-Mark Diggory

p.s.

I've attached an example schema representation of the syntax (dubbed
QueryML).

Of course this does not explore other issues that may arise when trying
to collect a multitude of results from different sources (like merging...).


Re: XML search language... [ In reply to ]
On 29/11/02 4:09 "Otis Gospodnetic" <otis_gospodnetic@yahoo.com> wrote:

> Oh, one more thing - depending on the nature of this service, you may
> want to consider not using XML, and just writing to and reading from
> streams.

We have "tree-like" data to think about, hence XML (and our partners want
that as well :-)

Pier


Re: XML search language... [ In reply to ]
On 29/11/02 4:28 "Nathan Ander" <nathanander@yahoo.com> wrote:
>
> Take a look at the SOAP messaging protocol:
>
> http://xml.apache.org/soap/docs/index.html
> http://www.w3.org/TR/SOAP/

Never! :-) It's overkill... What I need to do can be done with ONE
servlet generating/reading SAX events! :-)

Pier


Re: XML search language... [ In reply to ]
On 29/11/02 8:15 "petite_abeille" <petite_abeille@mac.com> wrote:

>
> On Friday, Nov 29, 2002, at 05:28 Europe/Zurich, Nathan Ander wrote:
>
>> Take a look at the SOAP messaging protocol:
>
> Or XML-RPC, if you don't feel like having a headache:

Nanananana... Way out... :-)

Pier


Re: XML search language... [ In reply to ]
You're not providing enough info about what you're trying to do, so you
get generalized responses pointing to the basic messaging technologies
available.

1.) Does your Lucene query really have to be in XML, e.g.
<query><term name="foo">bar</term></query>? Or are you just wrapping the
query string in an XML wrapper, e.g. <query>foo:bar</query>?

Can you settle for HTTP params? http://host/servlet?query="foo:bar"

2.) If what you're looking for is just to generate an XML response from
the Lucene results, that's a pretty trivial SAX filtering strategy.

Lucene resultset --> SAXFilter -->SAX Events --> Serialize to response
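
The pipeline above can be sketched roughly as follows, in Python with the
standard xml.sax.saxutils.XMLGenerator standing in for the SAX filter and
serializer. The shape of a hit (a dict with "score" and "fields") and the
element names results/hit/field are assumptions made for illustration;
they are not Lucene's API and not an agreed response format.

```python
import io
from xml.sax.saxutils import XMLGenerator


def hits_to_xml(hits):
    """Serialize search hits as XML by emitting SAX events."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="UTF-8")
    gen.startDocument()
    gen.startElement("results", {"count": str(len(hits))})
    for hit in hits:
        gen.startElement("hit", {"score": f"{hit['score']:.2f}"})
        for field, value in hit["fields"].items():
            gen.startElement("field", {"name": field})
            gen.characters(value)  # XMLGenerator escapes &, < and >
            gen.endElement("field")
        gen.endElement("hit")
    gen.endElement("results")
    gen.endDocument()
    return out.getvalue()


# Hypothetical result set standing in for whatever the search returns.
xml_text = hits_to_xml([{"score": 0.5, "fields": {"title": "A & B"}}])
```

Because the output is produced from SAX events, the same events could be
fed through further filters (or an XSLT stage) before being written to the
response.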


-Mark

Re: XML search language... [ In reply to ]
On Sunday, Dec 8, 2002, at 02:10 Europe/Zurich, Pier Fumagalli wrote:

> Nanananana... Way out... :-)

XQuery 1.0: An XML Query Language

http://www.w3.org/TR/xquery/


Re: XML search language... [ In reply to ]
On 8/12/02 17:13 "Mark R. Diggory" <mdiggory@latte.harvard.edu> wrote:

> You're not providing enough info about what you're trying to do, so you
> get generalized responses pointing to the basic messaging technologies
> available.

Yeah, sorry Mark, but I'm overwhelmed by work ATM, and this project will
start early next January... So, I can only dedicate a few minutes a day...

A couple of days ago, I flagged a message in which you said:

> <?xml version="1.0" encoding="UTF-8"?>
> <query>
>   <boolean type="and">
>     <term field="character">Bird</term>
>     <group>
>       <term field="category">Cartoon</term>
>       <boolean type="not">
>         <term field="name">Roadrunner</term>
>       </boolean>
>     </group>
>   </boolean>
> </query>

That _is_ a beauty.. Plain, simple, and doing exactly what I need! :-)

No weird XML-RPC/SOAP/QUERY stuff, just one tiny little thing that does the
job I require... :-)

The only thing I don't "like" is how you group the terms. For example, I
don't quite get the distinction between "boolean / and" and group...

In theory, a binary operation can always be reduced to its minimal
configuration of two terms, depending on what precedence we give to the (for
instance) "and", "or" and "not" operations... So, I don't see why group is
actually there! :-)

Also, since we have the flexibility of XML, why not use specific tags,
such as <and/>, <or/> and <not/>...

That is because, if you process SAX events, you can easily trigger on the
element names, which are unique, whereas with attributes the whole thing
gets a bit messier in terms of parsing/checking, and slower, because you
have to analyze every single attribute to get the "type" of your boolean
operation....

I'm thinking about something like:

<?xml version="1.0"?>
<query index="Articles">
  <and>
    <term field="subject">Microsoft</term>
    <or>
      <term>Lawsuit</term>
      <term>Court</term>
    </or>
  </and>
</query>

Does it make sense????
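
To illustrate the point about triggering on element names, here is a
minimal sketch in Python (standard xml.sax). The handler decides what to do
from the element name alone and never inspects a "type" attribute. The
class QueryTreeHandler, the parse_query helper and the tuple layout of the
resulting tree are assumptions made for this example only.

```python
import xml.sax

PIER_XML = b"""<?xml version="1.0"?>
<query index="Articles">
  <and>
    <term field="subject">Microsoft</term>
    <or>
      <term>Lawsuit</term>
      <term>Court</term>
    </or>
  </and>
</query>"""


class QueryTreeHandler(xml.sax.ContentHandler):
    """Builds a nested (operator, children) tree, dispatching on element names."""

    OPERATORS = {"and", "or", "not"}

    def __init__(self):
        super().__init__()
        self.index = None
        self.stack = [("query", [])]  # root frame collects the top-level nodes
        self.term = None              # (field, text chunks) while inside <term>

    def startElement(self, name, attrs):
        if name in self.OPERATORS:    # the element name alone selects the operation
            self.stack.append((name, []))
        elif name == "term":
            self.term = (attrs.get("field"), [])
        elif name == "query":
            self.index = attrs.get("index")

    def characters(self, content):
        if self.term is not None:
            self.term[1].append(content)

    def endElement(self, name):
        if name in self.OPERATORS:
            node = self.stack.pop()
            self.stack[-1][1].append(node)  # attach the finished operator to its parent
        elif name == "term":
            field, chunks = self.term
            self.stack[-1][1].append(("term", field, "".join(chunks).strip()))
            self.term = None


def parse_query(xml_bytes):
    handler = QueryTreeHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.index, handler.stack[0][1]


index, tree = parse_query(PIER_XML)
```

Terms without a field attribute come back with field None, leaving the
choice of a default field to whatever consumes the tree.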

Pier


Re: XML search language... [ In reply to ]
Pier, I think some of this stuff, esp. the boolean operators, already
exists in Ant, so you can look there for ideas, if not code.

Otis

Re: XML search language... [ In reply to ]
Pier Fumagalli wrote:

> That _is_ a beauty.. Plain, simple, and doing exactly what I need! :-)
>
> No weird XML-RPC/SOAP/QUERY stuff, just one tiny little thing that does the
> job I require... :-)
>
> Only thing I don't "like" is how you group up the terms, for example, I
> don't quite get the distinction between "boolean / and" and group...
>
> In theory, a binary operation can always be reducible to its minimal
> configuration of two terms, depending on what precedence we give to the (for
> instance) "and" "or" and "not" operations... So, I don't see why group is
> actually there! :-)
>

I was trying to allow for precedence that may be determined somehow by
"()"'s. I think you're right, and I guess it could be simpler to base it on
nesting and just use stacks or queues, plus recursion, to deal with
precedence.

With precedence ordered (AND, OR), i.e. with AND binding tighter than OR:

The following *do* appear the same to me.

field1:foo AND field2:bar OR field3:bim AND field4:bam
(field1:foo AND field2:bar) OR (field3:bim AND field4:bam)

*last one in XML*
<?xml version="1.0" encoding="ISO-8859-1"?>
<query index="Articles">
  <or>
    <and>
      <term field="field1">foo</term>
      <term field="field2">bar</term>
    </and>
    <and>
      <term field="field3">bim</term>
      <term field="field4">bam</term>
    </and>
  </or>
</query>


The following *don't* appear the same to me.

field1:foo AND field2:bar OR field3:bim AND field4:bam
field1:foo AND (field2:bar OR field3:bim) AND field4:bam

*but the last one can still be captured in XML without a group tag*
<?xml version="1.0" encoding="ISO-8859-1"?>
<query index="Articles">
  <and>
    <term field="field1">foo</term>
    <and>
      <or>
        <term field="field2">bar</term>
        <term field="field3">bim</term>
      </or>
      <term field="field4">bam</term>
    </and>
  </and>
</query>
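
A minimal sketch of the recursion idea, in Python with the standard
xml.etree.ElementTree: nesting alone carries the grouping, and parentheses
are emitted only where a looser operator sits inside a tighter one. The
PRECEDENCE table encodes the (AND, OR) ordering assumed in this message,
which is not necessarily how Lucene's own parser treats unparenthesized
queries. The names render and query_to_string are illustrative, and <not>
is left out.

```python
import xml.etree.ElementTree as ET

# Higher number binds tighter; this ordering is the assumption stated above.
PRECEDENCE = {"or": 1, "and": 2}

GROUPED_XML = """<query index="Articles">
  <and>
    <term field="field1">foo</term>
    <and>
      <or>
        <term field="field2">bar</term>
        <term field="field3">bim</term>
      </or>
      <term field="field4">bam</term>
    </and>
  </and>
</query>"""


def render(node, parent_prec=0):
    """Render <and>/<or>/<term> as a query string with minimal parentheses."""
    if node.tag == "term":
        text = (node.text or "").strip()
        field = node.get("field")
        return f"{field}:{text}" if field else text
    prec = PRECEDENCE[node.tag]
    joined = f" {node.tag.upper()} ".join(render(child, prec) for child in node)
    # Parenthesize only when a looser operator is nested in a tighter one.
    return f"({joined})" if prec < parent_prec else joined


def query_to_string(xml_text):
    root = ET.fromstring(xml_text)  # the <query> element
    return " ".join(render(child) for child in root)


grouped = query_to_string(GROUPED_XML)
```

With this input, grouped comes out as the parenthesized query string shown
above the XML, with no <group> element involved.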





> And also, one other thing is that since we have the flexibility of XML, why
> not using specific tags, such as <and/> or <or/> and <not/>...
>

I was trying to make it more extensible and generic. But that may be
overkill as well. The idea is that services could define their own
operations without building "new XML tags", because the op was just an
attribute. Thus:

<and> could be <boolean type="AND|and|+|&&|&">
<or> could be <boolean type="OR|or|'||'|'|'">
<not> could be <boolean type="NOT|not|-|!">

Then we know it's a boolean relation; we just don't care what
characters are representing it in the long run. The only risk this
brings up (and thus the need for <group> tags) is the precedence of the
unknown character representations.
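
One way to read the attribute-based idea, sketched in Python: a lookup
table that folds every spelling listed above into one canonical operator
name, so downstream code only ever sees "and", "or" or "not". The names
OPERATOR_ALIASES and canonical_op are made up for this example.

```python
# Spellings listed in the message above, grouped by canonical operator.
OPERATOR_ALIASES = {
    "and": {"AND", "and", "+", "&&", "&"},
    "or": {"OR", "or", "||", "|"},
    "not": {"NOT", "not", "-", "!"},
}

# Invert into a flat lookup table: spelling -> canonical name.
CANONICAL = {
    alias: op for op, aliases in OPERATOR_ALIASES.items() for alias in aliases
}


def canonical_op(type_value):
    """Return 'and', 'or' or 'not' for the value of a boolean type attribute."""
    try:
        return CANONICAL[type_value.strip()]
    except KeyError:
        raise ValueError(f"unknown boolean type: {type_value!r}") from None
```

A service with its own operator spellings would extend the table instead
of introducing new element names.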

Also note that <term field="xxx">test</term> carries another attribute
that defines the operation being performed on that term. For example:

<term field="date" op="gt">1996</term>
