Hello everybody,
we recently kicked of an internal project to implement a Full Text
Search capability into DBMAIL.
It's basically centred about a few different things:
* A dedicated dbmail_mimeparts_fts table with (native MySQL) full text
indexes where only a "clean" (read cleartext) version of the message is
saved. The table size is approximately 1/100 of the size of the original
table (no attachments, no html, no headers. just the plain message).
* A different lookup method for the "BODY" of the messages that uses
MATCH AGAINST instead of LIKE
For the IMAPD part it's not a very common use to have a client search
for a BODY content - but it happens... The situation where this hits a
lot is when using a WebMail that relies on the servers for such
operations.
On our 400GB DB the BODY searches using LIKE %text% required more 10/15
minutes to complete (an were clearly timing out). With the Full Text and
the MATCH AGAINST they usually complete in 2 seconds.
Not too surprising: that's what Full Text Search are meant for. :-)
Now, for the fancy part: I'd like to upstream the fixes somehow, and
that poses a few problems. We have completely no experience of pgSQl or
ORACLE, but that may not be a problem is someone with such a DB is
willing to help up to nail down the query syntax (yeah, I know we can
RTFM but we need help with tests too).
Furthermore we intend to ad a switch to bypass the code and use the old
approach if the DB doesn't support the feature, or if you like slow
queries.
Anyhow interested in testing out the first phases?
For the first phase you need MySQL and PHP as we have one temporary hook
written in PHP.
It's even better if you're using our roundcube-dbmail wrapper as it
integrates perfectly.
--
Andrea Brancatelli
Schema31 S.p.a.
Chief Technology Officier
ROMA - FI - PA
ITALY
Tel: +39.06.98.358.472
Cell: +39.331.2488468
Fax: +39.055.71.880.466
Società del Gruppo OVIDIO TECH S.R.L.
we recently kicked of an internal project to implement a Full Text
Search capability into DBMAIL.
It's basically centred about a few different things:
* A dedicated dbmail_mimeparts_fts table with (native MySQL) full text
indexes where only a "clean" (read cleartext) version of the message is
saved. The table size is approximately 1/100 of the size of the original
table (no attachments, no html, no headers. just the plain message).
* A different lookup method for the "BODY" of the messages that uses
MATCH AGAINST instead of LIKE
For the IMAPD part it's not a very common use to have a client search
for a BODY content - but it happens... The situation where this hits a
lot is when using a WebMail that relies on the servers for such
operations.
On our 400GB DB the BODY searches using LIKE %text% required more 10/15
minutes to complete (an were clearly timing out). With the Full Text and
the MATCH AGAINST they usually complete in 2 seconds.
Not too surprising: that's what Full Text Search are meant for. :-)
Now, for the fancy part: I'd like to upstream the fixes somehow, and
that poses a few problems. We have completely no experience of pgSQl or
ORACLE, but that may not be a problem is someone with such a DB is
willing to help up to nail down the query syntax (yeah, I know we can
RTFM but we need help with tests too).
Furthermore we intend to ad a switch to bypass the code and use the old
approach if the DB doesn't support the feature, or if you like slow
queries.
Anyhow interested in testing out the first phases?
For the first phase you need MySQL and PHP as we have one temporary hook
written in PHP.
It's even better if you're using our roundcube-dbmail wrapper as it
integrates perfectly.
--
Andrea Brancatelli
Schema31 S.p.a.
Chief Technology Officier
ROMA - FI - PA
ITALY
Tel: +39.06.98.358.472
Cell: +39.331.2488468
Fax: +39.055.71.880.466
Società del Gruppo OVIDIO TECH S.R.L.