Mailing List Archive

OT: super fast MySQL full-text searching.
I have tried Senna, an embeddable full-text search engine.
It can be embedded into MySQL.

http://qwik.jp/senna/

I inserted 1,000,000 documents using INSERT INTO statements,
and I can search them using SELECT * FROM table
WHERE MATCH(field_name) AGAINST('search-words').
It is based on SQL, so it is easy to use and supports incremental
updates.

I haven't run a benchmark yet, but it's not slow.

I think Lucene will need to support incremental updates in the future.

--
Scott
Re: OT: super fast MySQL full-text searching.
If you're interested, we use the following pattern to do incremental
updates between a database and a Lucene index.

1) Add a field called "DateUpdate" to the database table you wish to
index. Update this date whenever a field in the table is changed.
2) Create a new database table to store the ID of any item that is
deleted from the table above. I'll refer to this as the "Deleted"
table.
3) Have an indexer application that runs every X minutes and does the
following:
    a) Load all the items from the "Deleted" table and remove them
       from the Lucene index.
    b) Load all the items from the main table with a "DateUpdate"
       date greater than the last time the indexer application ran.
       Delete these items from the Lucene index, and then reinsert
       them with the newer data.
    c) Purge all the items from the "Deleted" table.
    d) Save the "DateUpdate" value of the last item you processed,
       and use this date to load items the next time the indexer
       application runs.

This is an oversimplification, since you need to consider failover,
etc., and there may be other factors that dictate your search indexing
rules. But it gives you a general idea. I'd be curious to
hear/discuss other solutions to this.
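
For anyone who wants to see the steps above in code, here is a rough
sketch in Python. It is only an illustration under assumptions: SQLite
stands in for the real database, a plain dict stands in for the Lucene
index, and the names "items", "deleted", "date_update" and
run_indexer() are made up for the example, not taken from our actual
system.

```python
import sqlite3


def run_indexer(conn, index, last_seen):
    """Run one indexer pass and return the new high-water mark.

    `index` is a dict standing in for a Lucene index (assumption);
    `last_seen` is the date_update value saved by the previous pass.
    """
    cur = conn.cursor()

    # a) Remove every item listed in the "deleted" table from the index.
    deleted_ids = [row[0] for row in cur.execute("SELECT id FROM deleted")]
    for item_id in deleted_ids:
        index.pop(item_id, None)

    # b) Load rows changed since the last pass; delete, then reinsert.
    rows = cur.execute(
        "SELECT id, body, date_update FROM items "
        "WHERE date_update > ? ORDER BY date_update",
        (last_seen,),
    ).fetchall()
    for item_id, body, date_update in rows:
        index.pop(item_id, None)
        index[item_id] = body
        # d) Remember the date of the last item processed.
        last_seen = date_update

    # c) Purge only the ids handled in this pass.
    cur.executemany(
        "DELETE FROM deleted WHERE id = ?", [(i,) for i in deleted_ids]
    )
    conn.commit()
    return last_seen


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, body TEXT, date_update TEXT);
        CREATE TABLE deleted (id INTEGER PRIMARY KEY);
        INSERT INTO items VALUES (1, 'first', '2006-10-19 10:00:00');
        INSERT INTO items VALUES (2, 'second', '2006-10-19 10:05:00');
    """)
    index = {}
    mark = run_indexer(conn, index, "")  # first pass indexes everything

    # Simulate an update and a delete between passes.
    conn.execute(
        "UPDATE items SET body = 'first v2', "
        "date_update = '2006-10-19 11:00:00' WHERE id = 1"
    )
    conn.execute("DELETE FROM items WHERE id = 2")
    conn.execute("INSERT INTO deleted VALUES (2)")

    mark = run_indexer(conn, index, mark)  # second pass is incremental
    return index, mark
```

Calling demo() runs two passes: the first indexes both rows, and the
second only touches the updated row and the deleted one. A real
indexer would also have to persist the saved date between runs and
cope with rows that share the same timestamp.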

Monsur



On 10/19/06, Scott <m.scott.tiger@gmail.com> wrote:
> I have tried Senna, an embeddable full-text search engine.
> It can be embedded into MySQL.
>
> http://qwik.jp/senna/
>
> I inserted 1,000,000 documents using INSERT INTO statements,
> and I can search them using SELECT * FROM table
> WHERE MATCH(field_name) AGAINST('search-words').
> It is based on SQL, so it is easy to use and supports incremental
> updates.
>
> I haven't run a benchmark yet, but it's not slow.
>
> I think Lucene will need to support incremental updates in the future.
>
> --
> Scott
>