Mailing List Archive

Zones
Hi,

We use Verity, a commercial vendor, for our search, but we were in serious
trouble with its performance and were looking for a solid, more economical,
open source alternative, such as Lucene.

A prototype we built using Lucene compared favorably with Verity, but then
along came "zones". Verity tech support helped us reconfigure our indices
with "zones", giving us a fivefold increase in performance. Note: "zones" are
a separate, non-fielded word list with addressing maps (each word mapped to an
address/document).
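A "word list with addressing maps" is, in effect, an inverted index: each word points to the documents that contain it. The sketch below is only an illustration of that idea. It assumes integer document ids and a crude split on non-alphanumeric characters, the class and method names are hypothetical, and it says nothing about Verity's actual zone format.

```java
import java.util.*;

class ZoneWordList {
    // Each word maps to the sorted set of document ids ("addresses") containing it.
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    /** Splits text on runs of non-letter/non-digit characters and records docId per word. */
    void add(int docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) continue;   // split() can yield a leading empty string
            postings.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Document ids containing the word; empty if the word is not in the list. */
    SortedSet<Integer> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT),
                Collections.<Integer>emptySortedSet());
    }

    public static void main(String[] args) {
        ZoneWordList list = new ZoneWordList();
        list.add(1, "Verity zones and partitions");
        list.add(2, "Lucene fields and documents");
        System.out.println(list.lookup("and"));   // [1, 2]
        System.out.println(list.lookup("zones")); // [1]
    }
}
```

A lookup touches only the posting set for one word, which is why a dedicated word list like this can answer "which documents contain X?" quickly.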

Is anyone familiar with Verity "zones"? Does Lucene implement "zones" in its
own way? How?


-Joe




--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Zones
This sounds like a partitioning of the index; am I correct?

On Fri, 2002-01-25 at 14:50, Joe Lerner wrote:
> Is anyone familiar with Verity "zones"? Does Lucene implement "zones" in its
> own way? How?
--
www.superlinksoftware.com
www.sourceforge.net/projects/poi - port of Excel format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
- fix java generics!


The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh


Re: Zones
Verity uses partitions whether or not zones are configured, but for zones it
adds partitions that use the "special" addressing scheme.

--- Original ---

This sounds like a partitioning of the index; am I correct?

RE: Zones
We evaluated Verity for several weeks and had a consultant on site helping
us for a few days. We were favorably impressed with the product, but we
ended up choosing Lucene after a week or so of comparing the two
head-to-head. With different requirements, I can see Verity being the
better choice in many situations. For starters, it does a whole lot more
'out of the box'. However, we have had great success with Lucene thus far;
no regrets.

We are indexing a large corpus of XML documents (~10M). One thing Verity
does with XML is index each XML tag as a zone.* What's cool about it is
that the zones are nested, so they mirror the schema of your XML document.
You can limit your search to any part of the document by searching on
specific zones. A Verity zone is analogous to a Lucene field. Verity also
has 'field' indexes, but these are a different kind of index that Lucene
does not have: Verity fields allow you to index various numeric types,
date types, etc. side-by-side with your textual index.
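In Lucene terms, "one zone per kind of tag" would roughly become one field per tag name. The sketch below covers only the XML side, using the JDK's DOM parser to build a tag-name -> text map; adding each entry to a Lucene Document as a field is assumed and not shown, and TagFields/fieldsByTag are hypothetical names, not ISOGEN's or Verity's method.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TagFields {
    /** One "field" per tag name: all text directly inside any element with that name. */
    static Map<String, String> fieldsByTag(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
        Map<String, StringBuilder> acc = new LinkedHashMap<>();
        collect(doc.getDocumentElement(), acc);
        Map<String, String> fields = new LinkedHashMap<>();
        // Collapse runs of whitespace left over from concatenating text nodes.
        acc.forEach((tag, sb) -> fields.put(tag, sb.toString().replaceAll("\\s+", " ").trim()));
        return fields;
    }

    private static void collect(Element e, Map<String, StringBuilder> acc) {
        StringBuilder sb = acc.computeIfAbsent(e.getTagName(), k -> new StringBuilder());
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append(n.getNodeValue()).append(' ');   // own text only
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) n, acc);                   // child text goes to the child's field
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<article><title>Zones</title>"
                + "<para>Verity zones</para><para>Lucene fields</para></article>";
        System.out.println(fieldsByTag(xml));
        // {article=, title=Zones, para=Verity zones Lucene fields}
    }
}
```

Because text is attributed only to its innermost element, a search on the outer element (article here) finds nothing; that flat mapping is exactly what Verity's nested zones improve on.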


The edge that Verity zones have over Lucene fields is that they are nested.
However, nested fields can be simulated quite easily in Lucene by doing
redundant indexing. I have a hunch this is what Verity does anyway, because
their indexes are HUGE.
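The thread does not spell out what "redundant indexing" means in practice. One plausible reading is to record each word under its own element and again under every enclosing element. The sketch below assumes path-style zone names (e.g. article/section) and a whitespace tokenizer, both hypothetical, and keeps everything in an in-memory map instead of a Lucene index.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NestedZones {
    /** zone path -> words found anywhere inside that zone, including descendants. */
    static Map<String, Set<String>> index(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
        Map<String, Set<String>> zones = new TreeMap<>();
        walk(doc.getDocumentElement(), new ArrayList<>(), zones);
        return zones;
    }

    private static void walk(Element e, List<String> open, Map<String, Set<String>> zones) {
        // Path of this element, e.g. "article/section/title".
        String path = open.isEmpty() ? e.getTagName()
                : open.get(open.size() - 1) + "/" + e.getTagName();
        open.add(path);
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) n, open, zones);
            } else if (n.getNodeType() == Node.TEXT_NODE) {
                for (String word : n.getNodeValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                    if (word.isEmpty()) continue;
                    // Redundant indexing: post the word to every open (enclosing) zone.
                    for (String zone : open) {
                        zones.computeIfAbsent(zone, k -> new TreeSet<>()).add(word);
                    }
                }
            }
        }
        open.remove(open.size() - 1);
    }

    /** True if the word occurs anywhere inside the given zone. */
    static boolean inZone(Map<String, Set<String>> zones, String zone, String word) {
        return zones.getOrDefault(zone, Collections.<String>emptySet())
                .contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> zones = index("<article><section><title>Zones</title>"
                + "<para>nested text</para></section></article>");
        System.out.println(inZone(zones, "article/section", "zones"));      // true
        System.out.println(inZone(zones, "article/section/para", "zones")); // false
    }
}
```

The price is index size, since each word is stored once per nesting level. That would be consistent with the hunch about Verity's large indexes, though nothing in the thread confirms how Verity actually stores zones.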

Verity zones may mean different things for different kinds of indexed
documents.

Incidentally, we found that Lucene indexed much faster. The K2Spider could
spend days optimizing an index. Verity seemed to be faster at retrieval, but
the two compared well. We ran a lot of tests, but in the end our results
were sort of 'touchy feely'. We decided that Lucene was plenty fast for us.

Regards,
Philip

*Not each instance of a tag, but rather one zone for each kind of tag.

-----Original Message-----
From: Joe Lerner [mailto:lerner@nandomedia.com]
Sent: Friday, January 25, 2002 1:51 PM
To: lucene-user@jakarta.apache.org
Subject: Zones



Re: Zones
"Ogren, Philip V." wrote:
> We are indexing a large corpus of XML documents (~10M). One thing Verity
> does with XML is index each XML tag as a zone.* What's cool about it is
> that the zones are nested, so they mirror the schema of your XML document.
> You can limit your search to any part of the document by searching on
> specific zones. A Verity zone is analogous to a Lucene field. Verity also
> has 'field' indexes, but these are a different kind of index that Lucene
> does not have: Verity fields allow you to index various numeric types,
> date types, etc. side-by-side with your textual index.
>
> The edge that Verity zones have over Lucene fields is that they are nested.
> However, nested fields can be simulated quite easily in Lucene by doing
> redundant indexing. I have a hunch this is what Verity does anyway, because
> their indexes are HUGE.

The XML indexing scheme we developed for Lucene here at ISOGEN (and
posted about late last year) provides more complete XML indexing than
Verity can, because it is not limited by some of the constraints
inherent in Verity's zone mechanism. Our indexing approach is also
infinitely more flexible than Verity's (or any other commercial
system's) because relatively simple Java code can be used to extend the
default indexing to optimize for specific DTDs or types of queries.

Also, as far as I know, Verity is unable to index elements or
attributes that have "." (period) in their names, because its indexers
always treat "." as a word separator. Doh.
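The "." problem can be illustrated with a toy comparison: if element names pass through a tokenizer that treats "." as a separator, a name like doc.title can no longer be matched as a single unit. The two splitters below are hypothetical stand-ins, not Verity's tokenizer, and the element name is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class PeriodTokenizing {
    // Treats whitespace AND "." as separators, like the behavior described above.
    static List<String> splitOnPeriods(String s) {
        return tokens(s.split("[\\s.]+"));
    }

    // Treats only whitespace as a separator, so "." stays inside a token.
    static List<String> splitOnWhitespace(String s) {
        return tokens(s.split("\\s+"));
    }

    private static List<String> tokens(String[] parts) {
        List<String> out = new ArrayList<>();
        for (String p : parts) {
            if (!p.isEmpty()) out.add(p);   // drop the empty leading piece split() can produce
        }
        return out;
    }

    public static void main(String[] args) {
        String elementName = "doc.title";   // hypothetical element name containing a period
        System.out.println(splitOnPeriods(elementName));    // [doc, title]
        System.out.println(splitOnWhitespace(elementName)); // [doc.title]
    }
}
```

Using element names directly as field names, rather than running them through the text tokenizer, would sidestep the issue.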

Of the commercial full-text indexers that do XML indexing, Verity does the
best job by my analysis, but in my opinion its XML indexing is still not
complete or flexible enough to be useful in production. Otherwise, Verity
is a fine full-text indexing system.

Cheers,

Eliot Kimber
ISOGEN International, LLC
