Mailing List Archive

Database integration best practices ...
Hi!

As many others I want to use Lucene as a frontend for searching
content which is burried in a relational database. As far as I
can see this should be no problem, by building documents for
single rows in the tables. Since many of you have already done such
an approach I would appreciate any suggestions on the following
issues:

- Consistency
What is the best way to maintain consistency between the database
and the lucene index. I can think of two solutions:

- update index on every insert
- ignore index at insert and do full reindex after time
(e.g. nightly)


- Transactional issues
what is the best way to make a database insert + index insert
atomic!?


- Content Separation
My content in the database is spread across multiple tables.
But there are clusters of related tables. For example I have
3 tables describing authors of papers. My solution would be a
separate index for each of those clusters. When the user does
a search every index must be searched separately of course ...

Is maintaining a separate index for every "topic" a good idea?


One might ask why not searching against the database directly. Well,
I would have to build a search interface (think of boolean issues)
on my own, which is definitely something I do not have time for.
Additionally my database (Postgresql) doesn't support full-text
searches (yet).

Any additional input on your expiriences are very welcome!

Thx in advance,
Peter









--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Database integration best practices ... [ In reply to ]
I forgot one thing to ask:

search results should be anchored to a unique id which maps to a
serial in the database. If my search now results in multiple such
id's what is the best way to transform this into a row-fetching
SQL-statement? I think I would end up in something:

SELECT * FROM atable WHERE id = 12 AND id = 23 AND id = 34 AND ....

... and so on.

For this purpose it would be nice to limit lucene search results, so
that the SQL statement can be limited. Any better idea!?

Thx,
Peter


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE : Database integration best practices ... [ In reply to ]
Just a little idea, replace "AND" by "OR" in your Select statement
I used to store some fields in lucene index in order to show them in the
result page. Otherwise, I use :

Select * From atable Where id in (12, 23, 34, ...)

Elie


-----Message d'origine-----
De : Peter Sojan [mailto:ilikeunix@gmx.net]
Envoyé : mercredi 27 mars 2002 09:59
À : Lucene Users List
Objet : Re: Database integration best practices ...


I forgot one thing to ask:

search results should be anchored to a unique id which maps to a
serial in the database. If my search now results in multiple such
id's what is the best way to transform this into a row-fetching
SQL-statement? I think I would end up in something:

SELECT * FROM atable WHERE id = 12 AND id = 23 AND id = 34 AND ....

... and so on.

For this purpose it would be nice to limit lucene search results, so
that the SQL statement can be limited. Any better idea!?

Thx,
Peter


--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Database integration best practices ... [ In reply to ]
Peter Sojan wrote:
> I forgot one thing to ask:
>
> I think I would end up in something:
>
> SELECT * FROM atable WHERE id = 12 AND id = 23 AND id = 34 AND ....
>

Either:

SELECT * FROM atable WHERE id IN ( 12, 23, 34 ... )

OR

SELECT * FROM atable WHERE id = 12 OR id = 23 OR id = 34 OR

Would be more what you want to achieve!

However this would be quite slow as you would have to do a search on the
index
then retrieve data from the database. A nicer approach would be to store
summary data in the index itself and retrieve this from the index
directly after performing a search. Only when a user selects an entry
from this summary would you retrieve data from the database directly. In
essence the search returns a summary of data from the index itself.

Flow of retrieving an entry would be:

search index
-> present results (from index)
-> select desired result (from database)

You really don't want to be retrieving all columns from the database for
all matches in a search as this puts unnecessary overhead on the entire
system (unless of course you are batche processing ALL results of the
search).

geoff


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Database integration best practices ... [ In reply to ]
On Wed, Mar 27, 2002 at 10:53:30AM +0000, geoff webb wrote:

> Either:
>
> SELECT * FROM atable WHERE id IN ( 12, 23, 34 ... )
>
> OR
>
> SELECT * FROM atable WHERE id = 12 OR id = 23 OR id = 34 OR
>

Of course it has to be OR'ed. Must have been an "Freudian typo" :)

> Flow of retrieving an entry would be:
>
> search index
> -> present results (from index)
> -> select desired result (from database)
>

This should be the right way to go. I just don't want to let my index grow
that much, but as you mention going directly into the database for displaying
results would cause prohibitive bottlenecks in the backend ...

Thx
Peter


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>