Mailing List Archive: storing index in third party database.

Apr 2, 2002, 11:07 PM

Post #1 of 6 (1022 views)

Hi all

I want to index data that is already stored in a third-party database table and build a search facility on it using Lucene. I am thinking of storing the resulting index back in the database, in another table. I understand that for this we have to create a 'directory' that handles all the indexing operations,

for example

IndexWriter indwriter = new IndexWriter("dirStore", null, create);

where dirStore is the directory and create is a boolean.

But I don't know what format the directory (dirStore) has to follow. Please help if anybody has done something similar.
TIA
Amith
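
For context on the question above: in Lucene the index location is not a format you fill in by hand. IndexWriter writes through a storage abstraction (org.apache.lucene.store.Directory) that deals in named files holding bytes. The sketch below is not Lucene's API. SimpleDirectory and InMemoryDirectory are hypothetical names, used only to show in plain Java what such an abstraction roughly looks like; a database-backed version would keep the bytes in a table instead of a map.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical interface (not Lucene's): a "directory" is a flat set of named byte files.
interface SimpleDirectory {
    OutputStream createFile(String name) throws IOException;
    InputStream openFile(String name) throws IOException;
    Set<String> list();
    boolean deleteFile(String name);
}

// In-memory implementation; the map is a stand-in for whatever storage sits underneath.
class InMemoryDirectory implements SimpleDirectory {
    private final Map<String, byte[]> files = new TreeMap<>();

    @Override
    public OutputStream createFile(String name) {
        // The bytes become visible under the name when the stream is closed.
        return new ByteArrayOutputStream() {
            @Override
            public void close() throws IOException {
                super.close();
                files.put(name, toByteArray());
            }
        };
    }

    @Override
    public InputStream openFile(String name) throws IOException {
        byte[] data = files.get(name);
        if (data == null) {
            throw new FileNotFoundException(name);
        }
        return new ByteArrayInputStream(data);
    }

    @Override
    public Set<String> list() {
        return new TreeSet<>(files.keySet());
    }

    @Override
    public boolean deleteFile(String name) {
        return files.remove(name) != null;
    }
}
```

The point of the design is that the indexing code only ever sees named streams, so where the bytes end up (files, memory, a table) is a detail of the implementation behind the interface.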

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>

Re: storing index in third party database.

otis_gospodnetic at yahoo

Apr 3, 2002, 7:12 AM

Post #2 of 6 (1016 views)

If you want to store indices in a database, search the mailing list
archives for SqlDirectory.

Once I considered using it for one application at work, so I asked its
author about performance. The answer was that it doesn't perform all
that well when the index grows, if I recall correctly. Consequently,
we chose to use file-based indices instead.

Otis

RE: storing index in third party database.

dahe at lingomotors

Apr 3, 2002, 7:55 AM

Post #3 of 6 (1011 views)

> -----Original Message-----
> From: Karl Øie [mailto:karl@gan.no]
> Sent: Wednesday, April 03, 2002 10:00 AM
> To: Lucene Users List
> Subject: Re: storing index in third party database.
>
>
> Without having investigated the problem much, I would think that a SQL
> database would be a very bad match for Lucene, as most of Lucene's work
> is creating keys for words and documents and then creating indexes of
> these keys. For these purposes a SQL database is unnecessary overhead,
> not to mention the overhead of the SQL language parser.
>
> For these kinds of indexes a lower-level database would be better
> suited. I have had good experiences with BerkeleyDB
> (http://www.sleepycat.com), and a friend of mine uses gdbm successfully
> for such key-pair indexing tasks. The advantage of these low-level
> database systems is that they are more or less persistent
> B-tree/hashtable implementations, and thus made for key-pairing.
>
> They have no SQL layer, so you have to program against them, as they
> are more subroutines than applications. But for key-pair indexes I have
> found that BerkeleyDB runs circles around any SQL database (including
> DB2 and Oracle!!!).

I would agree with this based on my experiences in implementing the
ANVIL system at Canon. SQL server was far too slow for simple term
lookup. We started with gdbm and subsequently moved to Berkeley DB. BDB
was faster in general, and more importantly, has support for
multi-threading. Analysis with Purify suggested that gdbm has some
"uninitialized memory read" problems. The folks at Sleepycat were also
very helpful in getting us going.

-- David

Re: storing index in third party database.

karl at gan

Apr 3, 2002, 7:59 AM

Post #4 of 6 (1010 views)

Without having investigated the problem much, I would think that a SQL
database would be a very bad match for Lucene, as most of Lucene's work is
creating keys for words and documents and then creating indexes of these
keys. For these purposes a SQL database is unnecessary overhead, not to
mention the overhead of the SQL language parser.

For these kinds of indexes a lower-level database would be better suited. I
have had good experiences with BerkeleyDB (http://www.sleepycat.com), and a
friend of mine uses gdbm successfully for such key-pair indexing tasks. The
advantage of these low-level database systems is that they are more or less
persistent B-tree/hashtable implementations, and thus made for key-pairing.

They have no SQL layer, so you have to program against them, as they are
more subroutines than applications. But for key-pair indexes I have found
that BerkeleyDB runs circles around any SQL database (including DB2 and
Oracle!!!).

Berkeley DB has a Java API and a B-tree record type that could be a very good
match for a key-based search tree, and it's free. Take a look at it!

Regards, Karl Øie

(PS: I am not paid by Sleepycat to write this :-)
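
A small sketch of the key-pair lookup described above, in plain Java. A java.util.TreeMap stands in for a persistent B-tree of the kind Berkeley DB provides; it lives in memory only, and no Berkeley DB calls are used. The names TermIndex, add, lookup and termsWithPrefix are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical term index: sorted keys (terms) mapped to sorted document ids.
class TermIndex {
    private final TreeMap<String, TreeSet<Integer>> postings = new TreeMap<>();

    // Record each whitespace-separated, lower-cased word of the text under docId.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Exact key lookup: the operation a B-tree or hash store is built for.
    List<Integer> lookup(String term) {
        TreeSet<Integer> docs = postings.get(term.toLowerCase());
        return docs == null ? Collections.emptyList() : new ArrayList<>(docs);
    }

    // Range scan over the sorted keys, which a B-tree supports and a hash table does not.
    List<String> termsWithPrefix(String prefix) {
        String p = prefix.toLowerCase();
        return new ArrayList<>(
                postings.subMap(p, true, p + Character.MAX_VALUE, false).keySet());
    }
}
```

For example, after adding a few documents, lookup("stores") returns the ids of the documents containing that word, and termsWithPrefix("sto") lists every indexed term starting with "sto" in sorted order.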

Re: storing index in third party database.

karl at gan

Apr 3, 2002, 8:15 AM

Post #5 of 6 (1015 views)

Hm... putting Lucene on top of BDB would actually be quite cool. It would
provide Lucene with recovery and transaction handling.

But as far as I have seen, Lucene's implementation of Directory hands back an
input stream. For BDB this would require us to iterate over the keys and
generate this stream; equally, on insert we must accept the stream and break
it up into keys.

Is it possible to "intercept" Lucene's work at the key-handling point? Or
would this require a larger rewrite?

Regards, Karl Øie
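
The stream-to-keys mapping described above can be sketched in plain Java as follows. A TreeMap stands in for the key/value store, and no Berkeley DB calls are used. The key layout ("name:blockNumber"), the block size, and the names BlockStore, write and read are all assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical block store: each file is split into fixed-size blocks,
// and every block is saved under its own key.
class BlockStore {
    private final int blockSize;
    private final Map<String, byte[]> store = new TreeMap<>(); // stand-in for a key/value database

    BlockStore(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    // Zero-padded block numbers keep the keys of one file in block order when sorted.
    private static String key(String name, int block) {
        return String.format(Locale.ROOT, "%s:%08d", name, block);
    }

    // On insert: drop any old blocks, then consume the stream and break it
    // up into keyed blocks. Returns the number of blocks written.
    int write(String name, InputStream in) throws IOException {
        for (int old = 0; store.remove(key(name, old)) != null; old++) {
            // removing stale blocks of a previous version of this file
        }
        int block = 0;
        byte[] buf = new byte[blockSize];
        int n;
        while ((n = in.readNBytes(buf, 0, blockSize)) > 0) {
            store.put(key(name, block++), Arrays.copyOf(buf, n));
        }
        return block;
    }

    // On read: walk the block keys in order and stitch the blocks back together.
    byte[] read(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk;
        for (int block = 0; (chunk = store.get(key(name, block))) != null; block++) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

This only shows the bookkeeping involved. Whether it performs acceptably on a real store is a separate question, since every read and write of an index file turns into several key lookups.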

RE: storing index in third party database.

apache at lucene

Apr 3, 2002, 10:04 AM

Post #6 of 6 (1013 views)

> From: Karl Øie [mailto:karl.at.gan.no@apache.at.lucene.com]
>
> is it possible to "intercept" lucene's work at the
> key-handling point? or
> would this require a larger rewrite?

Not only would it be a large rewrite, but I think it would make indexing
slower. I have implemented full-text indexes on a B-tree, and adding documents
when the dictionary gets large is very slow. In Lucene terms, such an index is
always maintained in an optimized format, as a single segment. If you were
to use multiple B-trees, one per segment, then you would not take advantage
of the B-tree, and may as well use flat files, as Lucene already does.

Doug
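
To make the contrast concrete, here is a toy sketch in plain Java of the segment approach: each batch of documents becomes a small sorted term list, and two such lists are combined in one sequential pass instead of by many random inserts into one large tree. This is a simplified illustration, not Lucene's actual code or file format. Segments, build and merge are names invented for the example, and merge assumes the second segment holds only higher document ids than the first.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class Segments {
    // Build one small segment: sorted term -> ascending doc ids for a batch of documents.
    static SortedMap<String, List<Integer>> build(Map<Integer, String> docs) {
        SortedMap<String, List<Integer>> seg = new TreeMap<>();
        for (Map.Entry<Integer, String> d : new TreeMap<>(docs).entrySet()) {
            for (String term : d.getValue().toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                List<Integer> ids = seg.computeIfAbsent(term, t -> new ArrayList<>());
                // Skip the id if this document already listed the term.
                if (ids.isEmpty() || !ids.get(ids.size() - 1).equals(d.getKey())) {
                    ids.add(d.getKey());
                }
            }
        }
        return seg;
    }

    // Merge two segments in a single pass over both sorted term lists.
    // Assumes every doc id in 'b' is larger than every doc id in 'a'.
    static SortedMap<String, List<Integer>> merge(SortedMap<String, List<Integer>> a,
                                                  SortedMap<String, List<Integer>> b) {
        // A TreeMap is used for convenience only: entries arrive already sorted,
        // so a flat file could simply be appended to.
        SortedMap<String, List<Integer>> out = new TreeMap<>();
        Iterator<Map.Entry<String, List<Integer>>> ia = a.entrySet().iterator();
        Iterator<Map.Entry<String, List<Integer>>> ib = b.entrySet().iterator();
        Map.Entry<String, List<Integer>> ea = ia.hasNext() ? ia.next() : null;
        Map.Entry<String, List<Integer>> eb = ib.hasNext() ? ib.next() : null;
        while (ea != null || eb != null) {
            int cmp = ea == null ? 1 : eb == null ? -1 : ea.getKey().compareTo(eb.getKey());
            String term = cmp <= 0 ? ea.getKey() : eb.getKey();
            List<Integer> ids = new ArrayList<>();
            if (cmp <= 0) { // term comes from 'a' (or from both)
                ids.addAll(ea.getValue());
                ea = ia.hasNext() ? ia.next() : null;
            }
            if (cmp >= 0) { // term comes from 'b' (or from both)
                ids.addAll(eb.getValue());
                eb = ib.hasNext() ? ib.next() : null;
            }
            out.put(term, ids);
        }
        return out;
    }
}
```

Adding a batch costs only the work of building a small segment; the expensive step, the merge, reads and writes sequentially, which is what makes flat files a reasonable fit.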