Mailing List Archive

Subject: Pure Perl milter for use with clamd.
Hi there,

The subject:

This is about scanning mail on a mailserver using clamd - specifically
about a milter for interfacing clamd to an MTA. If you've no interest
in such things, then this probably isn't for you.

Thanks:

ClamAV (specifically clamd, via clamav-milter) has been scanning small
volumes of mail on servers which I manage here for many years. Before
all else, please let me say thank you, to all who have contributed, in
whatever way. For the past couple of decades I've contributed in some
small ways. I hope this will become another contribution.

History:

For the past several years about 98 percent of attempts to send email
to us have been unwanted, and without some form of filtering, email
would no longer offer a useful means of communication. This is just
a matter of the volumes; it excludes the quite separate issue of the
potential for malicious email to pose a threat. A Linux-only shop like
us is immune to Windows malware, so it tends to be less of a worry, but
we wouldn't want to help it to propagate, so we still scan for it; and
third-party ClamAV databases have been valuable in weeding out things
like phishing and some other types of spam.

Impetus:

Until fairly recently I found myself using seven or more milters to
protect against unwanted mail. Other milters address things which
ClamAV doesn't, such as greylisting; rejecting mail sent to spam-traps
and mail from unwanted sources such as those identified by geolocation
and various DNSBL tests; SPF, DKIM and DMARC processing; and regex
scanning of message parts in general (for whitelisting, blacklisting,
and other purposes). My reward has been the quite insignificant level
of unwanted mail breaking through - ClamAV hasn't been called upon to
reject a message here since last September - but on the other hand...

Issues:

the use of many different milters introduced nearly as many problems
as it solved. Differing (let's say) design philosophies, implementation
details and limitations (even no IPv6!) and their support requirements
- not to mention some *very* different takes on configuration files -
have sometimes had me expending unreasonable effort to track down
failures of one sort or another. This led me to begin developing a
single milter of my own, with multiple goals: replace all seven of the
milters which I'd typically use; simplify configuration; eliminate a
few of the limitations and compromises (and their associated confusion
and frustration); and at the same time increase flexibility. That
work took almost three years, and is now substantially complete.

Development:

With the replacement of clamav-milter (the last of the milters to be
replaced) the work has reached something of a milestone, but much
remains to be done to assess, e.g., the reliability and scalability of
both the milter and the Sendmail interface, especially at higher mail
volumes.

That's where you, gentle reader, might come in.

The milter is pure Perl, and I can easily produce a "cut-down" version
of the script which only replaces clamav-milter. I do not mean in any
way to suggest that there is anything wrong with clamav-milter, but it
could be that there are some tradeoffs.

Tradeoffs - minus:

1. On the small-volume servers I manage I can't remember the last time
that clamav-milter failed. The Perl milter is not as well exercised
as its 'C' counterpart, and it might break - although it's unusual for
that to happen now, except when I'm developing on production (which is
mostly how I do it:).

2. The Perl milter may be slower. I do not know how much that will be
an issue in higher volume settings than my own, but, given that clamd
typically takes at least tens of milliseconds to scan a short message,
I guess that it isn't going to be serious. I'd like to know; there's
still the option of using XS for some parts of the milter. It has to
be said that the way in which Sendmail presents data to milters isn't
exactly streamlined, but that's out of my hands for the foreseeable future.

3. Sendmail's milter interface may perform some sanity tests which as
yet the Perl interface doesn't do. That's a work in progress. At the
moment it doesn't appear to present any problems but one needs to be
prepared for surprises.

4. It's a Perl milter. Obviously you'll need Perl on the system, and
it should be 5.16 or later (think UTF-8).

Tradeoffs - plus:

1. The Perl milter can easily be customized for specific purposes.
For example, things like adding headers, logging, whitelisting (also
other custom short-circuits), custom reply codes, talking to multiple
clamd daemons, tailored responses and similar can, even if you're not
a Perl guru, easily be configured using the Perl milter script as a
kind of template.

2. Control is more fine-grained. For example: (1) the milter can pass
the message headers and body to clamd separately - thanks to clamd's
nifty cache of md5sums, when several messages have identical bodies
the body need only be scanned by the engine once; (2) choice between
ACCEPT, REJECT, TEMPFAIL, DISCARD and QUARANTINE can be more flexible,
as can response codes etc. returned to the client; (3) operating system
tools and facilities are available to the milter. If you want, say,
to reply with "5.7.26" to mail scanned under particular circumstances,
or even TARPIT the blighters, it's very easy to do that. Express the
circumstances in Perl code (how many lines of code it takes to do that
doesn't really matter), and then call a couple of functions.

3. The milter might enable you to respond more quickly for example to
attacks, or the odd issues which crop up in other parts of the system.
Writing a one-line statement with a Perl regex filter might be quicker
than, e.g., waiting for a vendor to write, test and publish a patch.

4. The Perl milter uses its own Sendmail interface, and this will talk
to all reasonably recent versions of Sendmail. You'd be crazy to run
a Sendmail that's so old that the milter can't talk to it. There's no
need to build Sendmail "libmilter", nor for Sendmail to be recompiled.
You don't even need a compiler on the system. You do need to be able
to configure the MTA to use the milter, but that's very easy; insert a
suitable 'X' line in sendmail.cf - either by using the m4 preprocessor
to rebuild your sendmail.cf file (like you're supposed to), or edit a
line in that file (that's what I usually do).
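By way of illustration of items 1 to 3, here is a minimal sketch (not the milter's own code) of sending the headers and the body to clamd as two separate INSTREAM requests, and then choosing an action. It assumes clamd is listening on a TCP socket. The subroutine names, the whitelisted domain `trusted.example` and the particular reply codes are hypothetical, and the wiring into Sendmail::PMilter callbacks is left out.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Frame a buffer for clamd's INSTREAM command. The 'z' prefix selects
# NUL-terminated commands; each chunk is preceded by its length as a
# 4-byte big-endian (network order) integer, and a zero-length chunk
# marks the end of the stream.
sub instream_payload {
    my ($data, $chunk_size) = @_;
    $chunk_size ||= 8192;
    my $out = "zINSTREAM\0";
    for (my $pos = 0; $pos < length $data; $pos += $chunk_size) {
        my $chunk = substr($data, $pos, $chunk_size);
        $out .= pack('N', length $chunk) . $chunk;
    }
    return $out . pack('N', 0);
}

# Reduce a clamd reply to 'OK', the name of the signature found, or
# undef for anything else (e.g. a size-limit ERROR reply).
sub parse_reply {
    my ($reply) = @_;
    $reply =~ s/\0\z//;
    return 'OK' if $reply eq 'stream: OK';
    return $1   if $reply =~ /\Astream: (.+) FOUND\z/;
    return undef;    # explicit scalar so list callers still get one value
}

# Scan one buffer (just the headers, or just the body) over clamd's
# TCP socket; returns the same values as parse_reply.
sub scan_buffer {
    my ($host, $port, $data) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host, PeerPort => $port, Proto => 'tcp',
        Timeout  => 30,              # covers the connect only
    ) or return undef;
    print {$sock} instream_payload($data);
    local $/ = "\0";                 # replies end with a NUL in 'z' mode
    my $reply = <$sock>;
    close $sock;
    return defined $reply ? parse_reply($reply) : undef;
}

# Hypothetical short-circuit: a one-line regex whitelist, checked
# before any scanning is done.
sub whitelisted {
    my ($sender) = @_;
    return $sender =~ /\@trusted\.example>?\z/i;
}

# Hypothetical policy: map the header and body verdicts to an action
# plus SMTP reply code, enhanced status code and text.
sub verdict_action {
    for my $v (@_) {
        return ('TEMPFAIL', '451', '4.3.0', 'Virus scanner unavailable, try again later')
            unless defined $v;
        return ('REJECT', '550', '5.7.1', "Message rejected: $v found")
            unless $v eq 'OK';
    }
    return ('ACCEPT');
}
```

A caller might skip scanning altogether when `whitelisted($sender)` is true, and otherwise pass `scan_buffer($host, $port, $headers)` and `scan_buffer($host, $port, $body)` to `verdict_action`. The host and port are whatever `TCPAddr` and `TCPSocket` say in clamd.conf (3310 is the usual port). Because the body goes to clamd on its own, identical bodies present clamd with identical data, which is what gives its cache a chance to help.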

More on the Sendmail interface:

The Sendmail interface is a Perl module, which is published on CPAN.
It's called Sendmail::PMilter. It replaces Sendmail's "libmilter"
library which is normally used for milters which are written in C.
Working with the Perl interface is very much easier than working with
the C interface; you can actually concentrate on what you want to get
done, rather than how you're going to do it. The interface is rather
old, and when I found it development had stalled with some rather
nasty issues outstanding, so I took on its maintainership to fix them.
I've been using it for about three years. You should use the latest
*development* version, which at present is 1.20_03. To fix bugs, and
to support the latest Milter specifications, I wrote and/or modified
much of the code. Currently I use the pre-fork mechanism to handle
simultaneous connections. The threaded versions of the parts which
provide concurrency need more exercise; anyone willing to give them a
try is both welcome and encouraged to do so. The CPAN distribution
includes some old example milters. My plan, if (a) there's interest
here and (b) the thing doesn't just crash and burn, is to add
'clamav-perl-milter' (if that's eventually what it gets called) as
another example packaged with the CPAN distribution. I would have no
objection to it also being included in the ClamAV distro. As I wrote
the milter I have the right to say that; being no lawyer I'm not sure
what the position is with the interface Perl module, but in any case
it's freely available on CPAN and can be installed easily from there.
Installation involves little more than extracting a tarball. The CPAN
install mechanism (a one-line shell command) can do that and then run
the module tests. At the time of writing that has been done on at
least 45 different configurations on over 80 systems.

Why a "cut-down" milter?

Bigger target audience. Up and running much more easily, no need for
a database for example. And less risk. I'll be publishing the full
monty later on, but I'm not ready for that yet. And I really want to
test the support module thoroughly before perching much atop it, and I
did promise a guy in New York that he can be the first guinea-pig.

Other MTAs:

The milter might also work with Postfix, although many assumptions may
need to be revisited. I'd especially like to know about that too.

Over to you:

Is anyone interested enough to have a go?

Please reply on-list. Non-list mail to my list address is rejected.

Sorry for the length of all this. There was a lot more I wanted to
say but I had to draw the line somewhere.

If there's no objection I'll drop a link to this post over on the
ClamAV Users' List.

--

73,
Ged.
_______________________________________________

clamav-devel mailing list
clamav-devel@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-devel

Please submit your patches to our Bugzilla: http://bugzilla.clamav.net

Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml
Re: Subject: Pure Perl milter for use with clamd.
On Monday 19 August 2019, G.W. Haywood wrote:

> Tradeoffs - plus:
>
> 1. The Perl milter can easily be customized for specific purposes.

> 2. Control is more fine-grained.

> 3. The milter might enable you to respond more quickly for example to
> attacks, or the odd issues which crop up in other parts of the system.

Try looking at mailfromd: http://puszcza.gnu.org.ua/software/mailfromd/

http://puszcza.gnu.org.ua/software/mailfromd/manual/html_section/Interfaces-to-Third_002dParty-Programs.html#ClamAV

--
Regards,
Sergey
Re: Subject: Pure Perl milter for use with clamd.
Ged,

This project sounds pretty cool. It wouldn't be something I would want to maintain as a part of the clamav repository. Personally I'm well versed in Python but I have almost no Perl experience. I suspect that you'll want to maintain ownership anyway for the freedom it permits to add and change features as needed.

If you do plan to maintain a "full" version with all the milters, as well as a cut-down version for just clam, you may want to make each milter into a separate module where the "full" one imports all of the modules (code reuse, vs duplication).

I do think it's a great idea to share it on CPAN for other users. If you host your code in a GitHub repository, you can make a pretty slick documentation site for the full documentation, using github.io. github.io has nice templates and does not require too much effort.

In terms of ClamAV's involvement in the project, we would be happy to host documentation that provides basic instructions for how to use your milter and perhaps other 3rd party milters for other mail applications. As you are no doubt aware, we migrated all of our documentation into Markdown hosted on github, here: https://github.com/Cisco-Talos/clamav-faq/tree/master/manual This documentation is rendered on clamav.net, here: https://www.clamav.net/documents/

On a related topic, we have been discussing the idea of phasing in an HTTP server as a replacement for the TCP server in clamd. Offloading the socket and string parsing code to a 3rd party HTTP server library and the metadata to the libjson-c library has a lot of appeal for various reasons. The idea is still in early stages though and if we do ever find time to work on it, it would no doubt be an optional default-off replacement for the existing TCP server / protocol in at least the first feature release while projects such as this add support for it.

Regards,
Micah


On 8/19/19, 1:05 PM, "clamav-devel on behalf of G.W. Haywood" <clamav-devel-bounces@lists.clamav.net on behalf of clamav-devel@jubileegroup.co.uk> wrote:

Re: Subject: Pure Perl milter for use with clamd.
Hi Micah,

On Fri, 23 Aug 2019, Micah Snyder wrote:

> This project sounds pretty cool. It wouldn't be something I would
> want to maintain as a part of the clamav repository. Personally I'm
> well versed in Python but I have almost no Perl experience. I
> suspect that you'll want to maintain ownership anyways for the
> freedom it permits to add and change features as needed.

Thanks! The main thing I'd want to avoid is having umpteen different
versions all over the place all getting out of step with each other.
I've seen that many times with other projects, including a couple of
milters. It's messy and confusing for anybody who wants to use them.

Obviously you already have your hands full with the existing ClamAV
codebase; I wouldn't want to add to the burden. I still lean towards
keeping the masters on CPAN, as examples with the source for the
interface module. If it only serves to get more people started with
their own ideas it will be useful. If it becomes popular I'll think
about other ways of doing it, including some form of support space.

> If you do plan to maintain a "full" version with all the milters, as
> well as a cut-down version for just clam, you may want to make each
> milter into a separate module where the "full" one imports all of
> the modules (code reuse, vs duplication).

The "full" version is what I expect I'll be using for the foreseeable
future, so I'll maintain that at the very least. The clamav milter
version won't need much work once it's settled in and I can backport
the odd improvement to it from the full version, which I'm doing now.

On the subject of modules you've touched on something that has been
bothering me for a while. It would be great if you could just pick
which bits you wanted to use and then write something like

use xm_IPC;
use xm_GeoIP;
use xm_ASN;
use xm_DNSBL;
use xm_SPF;
use xm_greylist;
use xm_DKIM;
use xm_ARC;
use xm_tarpit;

but that's pie in the sky at the moment. It would cost months of pain
at the very best, and with the amount of interdependence there is (both
between different callbacks and, within each callback, between what
would be the different modules) I think I'd spend the rest of my life
ironing out the surprises. At the moment most of the functions can be
selected by command-line options and it's very likely to stay that way
unless someone (someone younger?) steps up. Incidentally 'xm' stands
for "extensible milter", which means it will do more or less anything
you might want to do with mail.

> ... If you host your code in a Github repository, you can make a
> pretty slick documentation site ... we migrated all of our
> documentation into Markdown hosted on github...

Thanks, I'll take a look at that. There's a lot of documentation and
I intend to write more, and I don't have a really good way to present
it all at the moment.

> On a related topic, we have been discussing the idea of phasing in
> an HTTP server as a replacement for the TCP server in clamd.

Hmmmmmm. Given the pressures on other development I wonder if you'll
have enough hands. I'm in the "if it ain't broke, don't fix it" camp.
While I can see the attraction of off-loading some of the complexity
and maintenance, you have many outstanding issues, and not only is the
existing interface nice and simple, it also seems to be very reliable.
As a clamd user, I'm not sure what HTTP offers me that I'd especially
want. About the only thing I'd ask for is a better grip on the state
of the databases used by clamd - e.g. something to load each one when
I wanted to load it, rather than all at once, plus maybe some kind of
an extended 'VERSIONCOMMANDS' instruction which would tell me the name
and timestamp of all the currently loaded database files. But the fix
for #10979 is very much more important than these niceties, and, if I
may say so, long overdue. I've merged the patch in attachment #7196
into 0.101.4, and I'm currently running both the unpatched and patched
versions of clamd side-by-side, scanning with both. I'll let you know
if I find anything really interesting, but as I've mentioned it would
need bigger volumes of genuine mail than we see here to test it well.
I suppose I could let the spammers get further along the milter chain,
but that goes against the grain a bit. }:-) Anyway, the patch *seems*
to be doing the right things; here's an UNpatched daemon getting PINGs
at the top of every minute on its TCP interface, around the time it's
reloading its databases:

Aug 23 10:09:01 mail6 root: PONG
Aug 23 10:10:01 mail6 root: PONG
Aug 23 10:11:01 mail6 clamd[32258]: SelfCheck: Database modification detected. Forcing reload.
Aug 23 10:11:03 mail6 clamd[32258]: Reading databases from /etc/mail/clamav
Aug 23 10:14:41 mail6 clamd[32258]: Database correctly reloaded (8905170 signatures)
Aug 23 10:14:01 mail6 root: PONG
Aug 23 10:12:01 mail6 root: PONG
Aug 23 10:13:01 mail6 root: PONG
Aug 23 10:11:01 mail6 root: PONG
Aug 23 10:15:01 mail6 root: PONG

Note that the timestamps of the database reload span more than one minute,
and see the jumble of PONG replies after the reload completes. This
jumble was a surprise - at least it was to me. This daemon holds up
mail for three or four minutes while it's reloading its databases.
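A check like the once-a-minute PING above can be sketched in a few lines. This is an illustration, not the script that produced those log entries; it assumes clamd's TCP socket, and it adds a read timeout so that a daemon which has stopped answering (as the unpatched one appears to do while reloading) can't hang the probe. The names `clamd_ping` and `probe` are hypothetical.

```perl
use strict;
use warnings;
use IO::Select;
use IO::Socket::INET;

# Send clamd's PING on a connected handle and return the reply without
# its NUL terminator ('PONG' from a healthy daemon). Returns undef on
# EOF, on error, or if the daemon stays silent for $timeout seconds,
# so a daemon that stops answering cannot hang the probe.
sub clamd_ping {
    my ($fh, $timeout) = @_;
    $timeout ||= 10;
    syswrite($fh, "zPING\0") or return undef;
    my $sel   = IO::Select->new($fh);
    my $reply = '';
    while ($sel->can_read($timeout)) {
        my $n = sysread($fh, my $byte, 1);
        return undef unless $n;          # EOF (0) or error (undef)
        return $reply if $byte eq "\0";
        $reply .= $byte;
    }
    return undef;                        # timed out
}

# Connect to clamd's TCP socket and report the outcome as a short string.
sub probe {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host, PeerPort => $port, Proto => 'tcp',
        Timeout  => $timeout,
    ) or return 'CONNECT FAILED';
    my $reply = clamd_ping($sock, $timeout);
    close $sock;
    return defined $reply ? $reply : 'NO REPLY';
}
```

Something like `print probe('127.0.0.1', 3310, 10), "\n";` (the address and port being assumptions), run from cron once a minute with its output sent to syslog, would give log lines comparable to the ones shown.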

Here's the patched daemon doing the same thing:

Aug 24 09:32:01 mail6 root: PONG
Aug 24 09:33:01 mail6 root: PONG
Aug 24 09:34:01 mail6 root: PONG
Aug 24 09:34:01 mail6 clamd[17521]: SelfCheck: Database modification detected. Forcing reload.
Aug 24 09:34:01 mail6 clamd[17521]: Reading databases from /etc/mail/clamav
Aug 24 09:35:01 mail6 root: PONG
Aug 24 09:36:01 mail6 root: PONG
Aug 24 09:37:01 mail6 root: PONG
Aug 24 09:37:46 mail6 clamd[17521]: Database correctly reloaded (8903969 signatures)
Aug 24 09:38:01 mail6 root: PONG
Aug 24 09:39:01 mail6 root: PONG

The patched daemon replies to PINGs and will scan messages while it's
reloading its databases. I haven't looked at what happens if, while
it's reloading, you run something like a recursive directory scan, but
then it doesn't normally do things like that here; it just scans mail.
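The jumble is easy to see in ten lines, but harder in a long log. A small check (a sketch that assumes traditional syslog timestamps and ignores midnight rollover; the subroutine names are hypothetical) can flag any entry whose timestamp is earlier than that of an entry logged before it. On the unpatched excerpt it picks out the four out-of-order PONG lines, all logged after the reload completed; on the patched excerpt it finds nothing.

```perl
use strict;
use warnings;

# Seconds since midnight from a traditional syslog line
# ("Mon DD HH:MM:SS host ..."); returns undef if the line doesn't match.
sub line_seconds {
    my ($line) = @_;
    my ($h, $m, $s) = $line =~ /^\w{3}\s+\d+\s+(\d\d):(\d\d):(\d\d)\s/
        or return undef;
    return $h * 3600 + $m * 60 + $s;
}

# Return the HH:MM:SS stamps of entries logged after an entry with a
# later timestamp, i.e. the lines that arrived out of order.
# Midnight rollover is deliberately ignored in this sketch.
sub out_of_order {
    my $latest = -1;
    my @late;
    for my $line (@_) {
        my $t = line_seconds($line);
        next unless defined $t;
        if ($t < $latest) {
            push @late, $line =~ /(\d\d:\d\d:\d\d)/;
        } else {
            $latest = $t;
        }
    }
    return @late;
}

# The unpatched daemon's log excerpt from above.
my @unpatched = (
    'Aug 23 10:09:01 mail6 root: PONG',
    'Aug 23 10:10:01 mail6 root: PONG',
    'Aug 23 10:11:01 mail6 clamd[32258]: SelfCheck: Database modification detected. Forcing reload.',
    'Aug 23 10:11:03 mail6 clamd[32258]: Reading databases from /etc/mail/clamav',
    'Aug 23 10:14:41 mail6 clamd[32258]: Database correctly reloaded (8905170 signatures)',
    'Aug 23 10:14:01 mail6 root: PONG',
    'Aug 23 10:12:01 mail6 root: PONG',
    'Aug 23 10:13:01 mail6 root: PONG',
    'Aug 23 10:11:01 mail6 root: PONG',
    'Aug 23 10:15:01 mail6 root: PONG',
);

print join(' ', out_of_order(@unpatched)), "\n";   # 10:14:01 10:12:01 10:13:01 10:11:01
```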

--

73,
Ged.