Mailing List Archive

Full Text Search
Hello everybody,

we recently kicked of an internal project to implement a Full Text
Search capability into DBMAIL.

It's basically centred about a few different things:

* A dedicated dbmail_mimeparts_fts table with (native MySQL) full text
indexes where only a "clean" (read cleartext) version of the message is
saved. The table size is approximately 1/100 of the size of the original
table (no attachments, no html, no headers. just the plain message).
* A different lookup method for the "BODY" of the messages that uses
MATCH AGAINST instead of LIKE

For the IMAPD part it's not a very common use to have a client search
for a BODY content - but it happens... The situation where this hits a
lot is when using a WebMail that relies on the servers for such
operations.

On our 400GB DB the BODY searches using LIKE %text% required more 10/15
minutes to complete (an were clearly timing out). With the Full Text and
the MATCH AGAINST they usually complete in 2 seconds.

Not too surprising: that's what Full Text Search are meant for. :-)

Now, for the fancy part: I'd like to upstream the fixes somehow, and
that poses a few problems. We have completely no experience of pgSQl or
ORACLE, but that may not be a problem is someone with such a DB is
willing to help up to nail down the query syntax (yeah, I know we can
RTFM but we need help with tests too).

Furthermore we intend to ad a switch to bypass the code and use the old
approach if the DB doesn't support the feature, or if you like slow
queries.

Anyhow interested in testing out the first phases?

For the first phase you need MySQL and PHP as we have one temporary hook
written in PHP.

It's even better if you're using our roundcube-dbmail wrapper as it
integrates perfectly.

--

Andrea Brancatelli
Schema31 S.p.a.
Chief Technology Officier

ROMA - FI - PA
ITALY
Tel: +39.06.98.358.472
Cell: +39.331.2488468
Fax: +39.055.71.880.466
Società del Gruppo OVIDIO TECH S.R.L.
Re: Full Text Search [ In reply to ]
Hi.

sounds interesting, but yeah mysql only is not really an option. I could
probably have a look at postgresq, but i am definitely not going to set
up oracle ;))

Regards

On 2018-02-27 18:49, Andrea Brancatelli wrote:

> Hello everybody,
>
> we recently kicked of an internal project to implement a Full Text Search capability into DBMAIL.
>
> It's basically centred about a few different things:
>
> * A dedicated dbmail_mimeparts_fts table with (native MySQL) full text indexes where only a "clean" (read cleartext) version of the message is saved. The table size is approximately 1/100 of the size of the original table (no attachments, no html, no headers. just the plain message).
> * A different lookup method for the "BODY" of the messages that uses MATCH AGAINST instead of LIKE
>
> For the IMAPD part it's not a very common use to have a client search for a BODY content - but it happens... The situation where this hits a lot is when using a WebMail that relies on the servers for such operations.
>
> On our 400GB DB the BODY searches using LIKE %text% required more 10/15 minutes to complete (an were clearly timing out). With the Full Text and the MATCH AGAINST they usually complete in 2 seconds.
>
> Not too surprising: that's what Full Text Search are meant for. :-)
>
> Now, for the fancy part: I'd like to upstream the fixes somehow, and that poses a few problems. We have completely no experience of pgSQl or ORACLE, but that may not be a problem is someone with such a DB is willing to help up to nail down the query syntax (yeah, I know we can RTFM but we need help with tests too).
>
> Furthermore we intend to ad a switch to bypass the code and use the old approach if the DB doesn't support the feature, or if you like slow queries.
>
> Anyhow interested in testing out the first phases?
>
> For the first phase you need MySQL and PHP as we have one temporary hook written in PHP.
>
> It's even better if you're using our roundcube-dbmail wrapper as it integrates perfectly.
>
> --
>
> Andrea Brancatelli
> Schema31 S.p.a.
> Chief Technology Officier
>
> ROMA - FI - PA
> ITALY
> Tel: +39.06.98.358.472
> Cell: +39.331.2488468
> Fax: +39.055.71.880.466
> Società del Gruppo OVIDIO TECH S.R.L.
>
> _______________________________________________
> DBmail mailing list
> DBmail@dbmail.org
> http://lists.nfg.nl/mailman/listinfo/dbmail
Re: Full Text Search [ In reply to ]
I've got access to some Oracle test boxes, and might help out with this if
it goes ahead. I'm also handy with Postgres & MySQL.

Dan

On Thu, Mar 1, 2018 at 6:12 PM, Thomas Raschbacher <lordvan@lordvan.com>
wrote:

> Hi.
>
> sounds interesting, but yeah mysql only is not really an option. I could
> probably have a look at postgresq, but i am definitely not going to set up
> oracle ;))
>
> Regards
>
> On 2018-02-27 18:49, Andrea Brancatelli wrote:
>
> Hello everybody,
>
> we recently kicked of an internal project to implement a Full Text Search
> capability into DBMAIL.
>
> It's basically centred about a few different things:
>
> - A dedicated dbmail_mimeparts_fts table with (native MySQL) full text
> indexes where only a "clean" (read cleartext) version of the message is
> saved. The table size is approximately 1/100 of the size of the original
> table (no attachments, no html, no headers. just the plain message).
> - A different lookup method for the "BODY" of the messages that uses
> MATCH AGAINST instead of LIKE
>
>
> For the IMAPD part it's not a very common use to have a client search for
> a BODY content - but it happens... The situation where this hits a lot is
> when using a WebMail that relies on the servers for such operations.
>
> On our 400GB DB the BODY searches using LIKE %text% required more 10/15
> minutes to complete (an were clearly timing out). With the Full Text and
> the MATCH AGAINST they usually complete in 2 seconds.
>
> Not too surprising: that's what Full Text Search are meant for. :-)
>
> Now, for the fancy part: I'd like to upstream the fixes somehow, and that
> poses a few problems. We have completely no experience of pgSQl or ORACLE,
> but that may not be a problem is someone with such a DB is willing to help
> up to nail down the query syntax (yeah, I know we can RTFM but we need help
> with tests too).
>
> Furthermore we intend to ad a switch to bypass the code and use the old
> approach if the DB doesn't support the feature, or if you like slow queries.
>
> Anyhow interested in testing out the first phases?
>
> For the first phase you need MySQL and PHP as we have one temporary hook
> written in PHP.
>
> It's even better if you're using our roundcube-dbmail wrapper as it
> integrates perfectly.
>
>
> --
>
>
> *Andrea Brancatelli
> Schema31 S.p.a.
> Chief Technology Officier*
> ROMA - FI - PA
> ITALY
> Tel: +39.06.98.358.472
> Cell: +39.331.2488468 <+39%20331%20248%208468>
> Fax: +39.055.71.880.466
> Società del Gruppo *OVIDIO TECH S.R.L.*
>
>
> _______________________________________________
> DBmail mailing list
> DBmail@dbmail.org
> http://lists.nfg.nl/mailman/listinfo/dbmail
>
>
>
> _______________________________________________
> DBmail mailing list
> DBmail@dbmail.org
> http://lists.nfg.nl/mailman/listinfo/dbmail
>
>