Mailing List Archive

Lucene VS SQL Server
Hi,

I'm looking to get some idea of which full-text engine is faster and scales
better: SQL Server or Lucene. My index will contain millions of short
documents of approximately 100 words. Please advise of the pros and cons.

Thank You,
Greg
RE: Lucene VS SQL Server
This isn't even a contest, unless SQL Server has improved greatly since I
last tried it. I've also benchmarked MySQL. They cannot handle large
collections. That being said, they are very convenient, and your data set
isn't that large, so perhaps you should benchmark it and see if you can get
away with a "normal" database. If not, Lucene will have no problem at all
with a collection the size of yours.
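
A minimal sketch of the shape such a benchmark could take, assuming synthetic documents of about 100 words each. Nothing here calls SQL Server, MySQL, or Lucene: `scan_search` and `index_search` are hypothetical stand-ins that would be replaced with calls to the engines actually under test, and the document count and vocabulary size are arbitrary.

```python
import random
import time
from collections import defaultdict


def make_documents(count, words_per_doc=100, vocabulary_size=5000, seed=0):
    """Generate synthetic short documents (a stand-in for real data)."""
    rng = random.Random(seed)
    vocabulary = [f"w{i}" for i in range(vocabulary_size)]
    return [" ".join(rng.choices(vocabulary, k=words_per_doc)) for _ in range(count)]


def scan_search(documents, term):
    """Baseline: inspect every document on every query."""
    return [i for i, text in enumerate(documents) if term in text.split()]


def build_inverted_index(documents):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in text.split():
            index[term].add(doc_id)
    return index


def index_search(index, term):
    """Look the term up directly instead of scanning the collection."""
    return sorted(index.get(term, ()))


def time_queries(search, terms):
    """Return the wall-clock seconds needed to run `search` over all terms."""
    start = time.perf_counter()
    for term in terms:
        search(term)
    return time.perf_counter() - start


if __name__ == "__main__":
    docs = make_documents(20_000)
    terms = [f"w{i}" for i in range(50)]
    index = build_inverted_index(docs)
    print("scan :", time_queries(lambda t: scan_search(docs, t), terms))
    print("index:", time_queries(lambda t: index_search(index, t), terms))
```

Running the same query set against each candidate at a realistic collection size is the point; the absolute numbers from this toy say nothing about either product.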

Sincerely,
James

> -----Original Message-----
> From: lavafish@gmail.com [mailto:lavafish@gmail.com]
> Sent: Tuesday, February 14, 2006 11:54 PM
> To: general@lucene.apache.org
> Subject: Lucene VS SQL Server
>
> Hi,
>
> I'm looking to get some idea of which full-text engine is faster and
> scales
> better: SQL Server or Lucene. My index will contain millions of short
> documents of approximately 100 words. Please advise of the pros and cons.
>
> Thank You,
> Greg
Re: Lucene VS SQL Server
On a past project, we used SQL Server's Index Server capabilities to
full-text index and query Word and PDF documents quite successfully.

If you're already putting the data into SQL Server, I recommend
giving its capabilities a try to see if it suffices, as that would be
the easiest way to get full-text search. But if you're not using SQL
Server already and want to add it just for indexing, I recommend
Lucene for the job.

Erik


On Feb 15, 2006, at 12:53 AM, <lavafish@gmail.com>
<lavafish@gmail.com> wrote:

> Hi,
>
> I'm looking to get some idea of which full-text engine is faster
> and scales
> better: SQL Server or Lucene. My index will contain millions of short
> documents of approximately 100 words. Please advise of the pros and
> cons.
>
> Thank You,
> Greg
Re: Lucene VS SQL Server
The advantage of a database full-text index solution is that
administering the index is much easier than administering Lucene's index.
The coding is much easier too: you don't need to write your application
to support Lucene; the only thing you need to do is change your query.
On the other hand, Lucene has many more features than the SQL Server
full-text index function.
I'm developing a document management system, and I have chosen
Lucene to do the job. It means much more work for me, because I have to
build an indexing server that uses SOAP to communicate with my web application.
I needed to do all of this because my web application can be
clustered across many servers, but the Lucene index has to be in just one
place. As you can imagine, the problem then was index locking, etc.
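
The single-writer arrangement described above can be sketched as follows. This is an illustration under stated assumptions, not the system from this message: an in-process queue stands in for the SOAP calls, a plain dictionary stands in for the Lucene index, and the names `IndexingServer`, `submit`, and `flush` are hypothetical.

```python
import queue
import threading


class IndexingServer:
    """Owns the only writable index; clients enqueue updates instead of writing."""

    def __init__(self):
        self._updates = queue.Queue()
        self._index = {}  # doc_id -> text; stand-in for a real index
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, doc_id, text):
        """Called by any web node; never touches the index directly."""
        self._updates.put((doc_id, text))

    def _run(self):
        # Only this thread writes to the index, so writers cannot
        # contend for an index lock.
        while True:
            item = self._updates.get()
            try:
                if item is None:  # shutdown signal
                    return
                doc_id, text = item
                self._index[doc_id] = text
            finally:
                self._updates.task_done()

    def flush(self):
        """Block until every update submitted so far has been applied."""
        self._updates.join()

    def close(self):
        self._updates.put(None)
        self._worker.join()

    def document_count(self):
        return len(self._index)
```

Any number of application servers can call `submit` concurrently; because updates are applied by one worker in arrival order, the lock contention moves out of the index and into a queue that is designed for it.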
In this project, the reasons for choosing Lucene were the number of
features it gives my application, the great performance, and the
fact that my system runs on MySQL, Oracle, SQL Server, Informix,
PostgreSQL and Firebird databases, so for me it was impossible to
choose the database solution.
In my opinion, if you need a simple solution that you can build right away
with good performance, choose the database solution. If you want to build a
more complex system that will use more features to search the index
and needs GREAT performance, use Lucene. Lucene is the best solution for
larger projects.

I don't know if you have already worked with Oracle, but if you want
a database solution, try the Oracle one. In my opinion, the full-text
function of this database is the best.


Fernando Engelmann Junior



Erik Hatcher wrote:

> On a past project, we used SQL Server's Index Server capabilities to
> full-text index and query Word and PDF documents quite successfully.
>
> If you're already putting the data into SQL Server, I recommend
> giving its capabilities a try to see if it suffices, as that would be
> the easiest way to get full-text search. But if you're not using SQL
> Server already and want to add it just for indexing, I recommend
> Lucene for the job.
>
> Erik
RE: Lucene VS SQL Server
Hi,

> I don't know if you have already worked with Oracle, but
> if you want a database solution, try the Oracle one. In my
> opinion the full-text function of this database is the best.

Lucene can reindex and update data faster than Oracle.

Pasha Bizhan
Re: Lucene VS SQL Server
I was trying to say that if you want to choose a database solution, try Oracle instead of SQL Server. I was not saying that Oracle is faster or better than Lucene.

Fernando

Pasha Bizhan wrote:
> Hi,
>
>> I don't know if you have already worked with Oracle, but if you want
>> a database solution, try the Oracle one. In my opinion the full-text
>> function of this database is the best.
>
> Lucene can reindex and update data faster than Oracle.
>
> Pasha Bizhan


--
Regards,

Fernando Luiz Engelmann Jr.
Systems Development
SoftExpert Quality Software
+55 (47) 2101-9955
http://www.softexpert.com

Re: Lucene VS SQL Server
Thank you for all your comments. From what I researched on my own, the new
SQL Server 2005 full-text functionality is highly improved. Microsoft tested
it and said that it performs well on 2 billion records (although the size
and type of info in each record is unpublished). After using Lucene for a
previous project, I found myself having to do a lot of extra work and taking
a similar approach to Fernando's, where I implemented a web service that
encapsulates the Lucene.NET APIs. I had to worry about finding the right way to
update the index without disturbing the user experience, build some sort of
replication, and keep my index in sync with a PostgreSQL db. Lucene
requires building a lot of this functionality that is already in SQL
Server. But my primary concern is performance, and in terms of comparing
Lucene to PostgreSQL's TSearch2, it's not even close to a competition.
Lucene's performance is insanely awesome in comparison.
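
One common way to update an index without disturbing searches is to build the replacement off to the side and swap it in once it is complete. The sketch below illustrates that idea only; it is an assumption, not the approach used in the project described above, it does not use the Lucene.NET API, and `SearchService`, `build_index`, and `rebuild` are hypothetical names with a plain dictionary standing in for the index.

```python
import threading
from collections import defaultdict


def build_index(documents):
    """Build a term -> set of doc ids mapping from a {doc_id: text} dict."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


class SearchService:
    """Serves queries from the current index while a new one is being built."""

    def __init__(self, documents):
        self._index = build_index(documents)
        self._rebuild_lock = threading.Lock()  # one rebuild at a time

    def search(self, term):
        # Take a single reference so a swap during the query is harmless.
        index = self._index
        return sorted(index.get(term.lower(), ()))

    def rebuild(self, documents):
        with self._rebuild_lock:
            new_index = build_index(documents)  # slow part; readers are unaffected
            self._index = new_index             # swap in the finished index
```

Queries that started before the swap finish against the old index, and later queries see the new one. The cost is that both copies exist for a short time, and the source database still has to be polled or notified to decide when to rebuild.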

-Greg


On 2/15/06, Fernando Luiz Engelmann Junior <fernando@softexpert.com> wrote:
>
> I was trying to say that if you want to choose a database solution, try
> Oracle instead of SQL Server. I was not saying that Oracle is faster or
> better than Lucene.
>
> Fernando
>
> Pasha Bizhan wrote:
>
>> Hi,
>>
>>> I don't know if you have already worked with Oracle, but
>>> if you want a database solution, try the Oracle one. In my
>>> opinion the full-text function of this database is the best.
>>
>> Lucene can reindex and update data faster than Oracle.
>>
>> Pasha Bizhan
>
> --
> Regards,
> Fernando Luiz Engelmann Jr.
> Systems Development
> SoftExpert Quality Software
> +55 (47) 2101-9955
> http://www.softexpert.com