Mailing List Archive

local_scan interface discussion
--

[
If this should be discussed in a separate (new) forum, let me know
and we'll set something up. Otherwise please keep the discussion
properly threaded so people can follow or ignore it as they wish.
]


My reason for writing this is to facilitate collaboration to design a
"perfect" local_scan interface. Philip has the final say regarding
what will or will not be included in the official exim release, of
course. I want to help make the interface meet my desires, and also
to help implement it (now :-)) as I am able.

Please contribute your $0.02 (in whatever monetary unit you deem most
appropriate).



The Problem :
The current problem with the local_scan API is that it requires exim
to be recompiled whenever the scanner is changed. This is annoying
when the scanner changes often or for experimenting with different
scanners, and runs directly counter to distributions of pre-packaged
software. A better implementation will decouple the scanner from the
rest of exim.

The Goal :
The goal of this discussion is to develop a design for the
implementation of the local_scan function which decouples the scanner
from exim proper. The goal of this decoupling is
1) increased flexibility wrt choosing a scanner
2) allow scanners to be (re)compiled without rebuilding the rest of exim
3) allow distributions (eg RedHat or Debian) to provide separate
packages for exim and each of the available scanners
4) allow users of a distribution to use a scanner while still
using the packaged version of exim

Another goal is for the design (and implementation) to be acceptable
to Philip and included in the standard exim release. This is
important for supporting goals #3 and #4 above.



I think some basic premises must be agreed upon :

o Incompatible changes will occur from time to time. Preventing
these changes from disrupting anything is not realistic. (eg
exim4 breaks compatibility with all exim3 config files)

o When (not if) an incompatibility arises, it is desirable to
detect, report, and gracefully handle it.

o Following the XP philosophy, the simplest method that works is
best. I would like to keep the implementation as small and as
simple as possible, while still meeting the goals.



To start with, I'll list some ideas I have which I don't think are
very realistic. I am doing this so that they can be "vetoed" right
from the beginning.


o Use CORBA or XML-RPC or some such middleware to communicate
between exim and the scanner.

Pros:
o on-the-wire protocols are standard
o implementation libraries are available
o allows scanners to be written in any language
o eliminates C-level binary compatibility concerns

Cons:
o would require too much complexity in exim itself to
implement its side of the communication


o Embed a high-level language interpreter (eg perl, python, or
java), and let it dynamically load modules and whatnot

Pros:
o eliminates C-level binary compatibility concerns
o allows (forces, rather) local_scan functions to be written
in a language other than C

Cons:
o complexity
o increases the size of "exim" since it would contain an
extra interpreter
o increases performance overhead due to startup requirements
of the extra interpreter
o stirs up language wars


o Same as above, but implement the interpreter ourselves

Pros:
o eliminates C-level binary compatibility concerns
o allows (forces, rather) local_scan functions to be written
in a language other than C
o avoids existing language wars

Cons:
o way too much complexity
o who has time to create or learn Yet Another high level
language anyways?
o could create a new language war


On a more practical level, I have these ideas :

o Treat the local_scan the same way other (dynamic) libraries, such
as libldap or libpg, are treated. Let the system's dynamic linker
deal with loading the local scan library at runtime.

Pros:
o eliminates C-level binary compatibility concerns
o eliminates the need to write code dealing with dlopen(), etc.

Cons:
o only allows a single liblocal_scan to be installed at any
time (AFAIK)

Questions :
o How would a system with multiple local_scan libraries
installed behave?
o How would the admin specify which one to use?



These last two ideas are the most practical, I think.


o Create an interface that leverages existing IPC mechanisms such as
pipes, UNIX Domain Sockets (these are the same as fifos and named
pipes, right?), or TCP sockets to communicate with a scanner. The
scanner would be a separate, complete, application.

Pros:
o eliminates C-level binary compatibility concerns
o allows local_scan functions to be written in any language
o prevents language wars (as embedding an interpreter would create)

Cons:
o Requires creating a new protocol.
(or beating an existing one (maybe LMTP or BSMTP) into a
shape suitable for this use)
o Could be complex.

Comments:
o The complexity of creating and implementing a new protocol
can be minimized by devising a sufficiently simple
protocol.
o If this mechanism is chosen, then additional discussion on
the merits of each IPC mechanism and protocol choices will
need to follow.

Additional Data:
One idea I had for this is using a pipe. exim would open a
pipe to the specified scanner program. The message would be
passed to the scanner on stdin. The exit code from the
scanner program would determine what exim should do with it --
accept, tempreject, permreject. If the scanner rejects the
message, its output would be the message to return to the
other server. Otherwise its output would specify how the
message should be modified (namely adding or modifying
headers). The format of the header modification text is a
detail that can be worked out later.
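
A minimal sketch of the pipe idea, under stated assumptions: the names
run_scanner and verdict_from_status, the verdict enum, and the exit-code
mapping (0 = accept, 1 = permreject, anything else = tempreject) are all
hypothetical, since the thread does not fix any of them. The scanner's
stdout is collected verbatim; its format is left open, as above.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical verdicts and exit codes; the thread does not fix either. */
enum scan_verdict { SCAN_ACCEPT, SCAN_TEMPREJECT, SCAN_PERMREJECT };

/* Map a wait() status to a verdict. Death by signal or an unknown exit
   code is treated as a temporary failure so that no mail is lost. */
static enum scan_verdict verdict_from_status(int status)
{
    if (!WIFEXITED(status))
        return SCAN_TEMPREJECT;
    switch (WEXITSTATUS(status)) {
    case 0:  return SCAN_ACCEPT;
    case 1:  return SCAN_PERMREJECT;
    default: return SCAN_TEMPREJECT;
    }
}

/* Run argv[0], feed it msg on stdin, keep up to outlen-1 bytes of its
   stdout in out, and return the verdict derived from its exit code.
   Simplification: the whole message is written before any output is
   read, so a scanner that emits a lot of output before draining stdin
   could stall; a real implementation would multiplex with poll(). */
static enum scan_verdict run_scanner(char *const argv[], const char *msg,
                                     size_t msglen, char *out, size_t outlen)
{
    int to_child[2], from_child[2], status;
    size_t used = 0;
    ssize_t n;
    pid_t pid;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    if (pipe(to_child) < 0)
        return SCAN_TEMPREJECT;
    if (pipe(from_child) < 0) {
        close(to_child[0]); close(to_child[1]);
        return SCAN_TEMPREJECT;
    }
    signal(SIGPIPE, SIG_IGN);   /* a scanner that exits early must not kill us */

    pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return SCAN_TEMPREJECT;
    }
    if (pid == 0) {             /* child: wire up stdin/stdout, then exec */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execvp(argv[0], argv);
        _exit(75);              /* exec failed: ends up as a tempreject */
    }

    close(to_child[0]);
    close(from_child[1]);
    while (msglen > 0 && (n = write(to_child[1], msg, msglen)) > 0) {
        msg += n;
        msglen -= (size_t)n;
    }
    close(to_child[1]);         /* EOF tells the scanner the message is complete */

    while (used + 1 < outlen &&
           (n = read(from_child[0], out + used, outlen - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(from_child[0]);

    if (waitpid(pid, &status, 0) < 0)
        return SCAN_TEMPREJECT;
    return verdict_from_status(status);
}
```

Treating every unexpected outcome as a tempreject is a design choice of
this sketch: a crashed or missing scanner then delays mail rather than
bouncing or silently accepting it.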



o Use libdl (dlopen, dlsym) to load an admin-specified .so.

Pros:
o doesn't require a lot of code
o an initial implementation is already available
o the scanner API is almost identical to the current one
o no new protocols need to be devised

Cons:
o C is (very apparently) not well suited for dynamic programs
o the libdl API doesn't provide any type checking the way
the C compiler does (or the way python does for "dynamic"
modules)
o This makes it easy for an admin to shoot a 3-sided hole in
exim. If a bad .so is specified (accidentally or
maliciously), exim _could_ have a hard time handling it
gracefully. It will more than likely crash if the ABI
checking doesn't catch the mismatch.
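
A rough sketch of the libdl approach with an explicit ABI check. This is
not the initial implementation mentioned above. The exported symbol
local_scan_abi_version, the SCAN_ABI_VERSION constant, and the
load_scanner helper are assumptions made for illustration; the function
pointer type is modelled on the current local_scan entry point.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

#define SCAN_ABI_VERSION 1      /* hypothetical: bumped on incompatible changes */

/* Modelled on the existing entry point: a file descriptor for the message
   plus a pointer through which rejection text is returned. */
typedef int (*scan_fn)(int fd, unsigned char **return_text);

struct scanner {
    void   *handle;             /* from dlopen(), needed later for dlclose() */
    scan_fn scan;
};

/* Load the shared object at path. Returns 0 on success; on failure returns
   -1 with a reason in err, and nothing is left loaded. */
static int load_scanner(const char *path, struct scanner *s,
                        char *err, size_t errlen)
{
    const char *why;
    const int *abi;

    assert(path != NULL && s != NULL && err != NULL && errlen > 0);
    s->scan = NULL;
    s->handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (s->handle == NULL) {
        why = dlerror();
        snprintf(err, errlen, "dlopen: %s", why ? why : "unknown error");
        return -1;
    }

    /* dlsym() does no type checking, so the module is asked to declare
       which ABI it was built for (hypothetical exported int). */
    abi = (const int *)dlsym(s->handle, "local_scan_abi_version");
    if (abi == NULL || *abi != SCAN_ABI_VERSION) {
        snprintf(err, errlen, "%s: missing or mismatched ABI version", path);
        dlclose(s->handle);
        s->handle = NULL;
        return -1;
    }

    /* Idiom for storing dlsym()'s void * result in a function pointer. */
    *(void **)(&s->scan) = dlsym(s->handle, "local_scan");
    if (s->scan == NULL) {
        snprintf(err, errlen, "%s: no local_scan symbol", path);
        dlclose(s->handle);
        s->handle = NULL;
        return -1;
    }
    return 0;
}
```

A check like this only catches honest mismatches. A module that lies
about its version, or that misbehaves as soon as it is loaded, can still
take the process down, which is the risk described in the last con above.
On some systems the program must also be linked with -ldl.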


-D

--
If we claim we have not sinned, we make Him out to be a liar and His
Word has no place in our lives.
I John 1:10

http://dman.ddts.net/~dman/
--
Re: local_scan interface discussion [ In reply to ]
At 20:43 -0500 14/7/02, Derrick 'dman' Hudson wrote:

>o Treat the local_scan the same way other (dynamic) libraries, such
> as libldap or libpg, are treated. Let the system's dynamic linker
> deal with loading the local scan library at runtime.
>
> Cons:
> o only allows a single liblocal_scan to be installed at any
> time (AFAIK)

One simple way to deal with this is to allow a small fixed number of
these (more than one), ie:

liblocal_scan1
liblocal_scan2
liblocal_scan3
liblocal_scan4

It could even be a make option as to how many you want with a
reasonable default of 2-4.

It adds a little user complexity in installing them, but a tool to
"install" a local_scan which copies it in to a spare "slot" and
remembers the names of the items installed in each slot for easy
removal/overwrite would be pretty trivial.

>o Create an interface that leverages existing IPC mechanisms such as
> pipes, UNIX Domain Sockets (these are the same as fifos and named
> pipes, right?), or TCP sockets to communicate with a scanner. The
> scanner would be a separate, complete, application.

> Cons:
> o Requires creating a new protocol.
> (or beating an existing one (maybe LMTP or BSMTP) into a
> shape suitable for this use)
> o Could be complex.

Very complex for the local scan author - a quick look at the
local_scan.h shows the kinds of things you'd have to potentially
write:

Header & Recipient list processing
Child handling
logging and debugging
memory management for strings

> Comments:
> o The complexity of creating and implementing a new protocol
> can be minimized by devising a sufficiently simple
> protocol.
> o If this mechanism is chosen, then additional discussion on
> the merits of each IPC mechanism and protocol choices will
> need to follow.

The protocol could be simple enough, just dump down tagged data, but
handling that for the local_scan author would be a pain. Plus what
are you going to do with the actual message data - send that whole
lot down through the pipe and back again?
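
To make "tagged data" concrete, here is one possible shape, purely as an
illustration: the tag names "sender" and "rcpt", the blank-line
terminator, and the format_envelope helper are invented for this sketch
and are not part of any existing exim interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format envelope data as "tag: value" lines followed by a blank line.
   Returns the number of bytes written (excluding the terminating NUL),
   or -1 if buf is too small. */
static int format_envelope(char *buf, size_t len, const char *sender,
                           const char *const rcpts[], size_t nrcpt)
{
    size_t used, i;
    int n;

    assert(buf != NULL && len > 0 && sender != NULL);
    n = snprintf(buf, len, "sender: %s\n", sender);
    if (n < 0 || (size_t)n >= len)
        return -1;
    used = (size_t)n;

    for (i = 0; i < nrcpt; i++) {
        n = snprintf(buf + used, len - used, "rcpt: %s\n", rcpts[i]);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }

    if (len - used < 2)         /* room for the blank line and the NUL */
        return -1;
    buf[used++] = '\n';         /* blank line ends the envelope section */
    buf[used] = '\0';
    return (int)used;
}
```

Writing the tags is the easy half. The scanner author still has to parse
them, and the message itself still has to travel through the pipe.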

> Additional Data:
> One idea I had for this is using a pipe. exim would open a
> pipe to the specified scanner program. The message would be
> passed to the scanner on stdin. The exit code from the
> scanner program would determine what exim should do with it --
> accept, tempreject, permreject. If the scanner rejects the
> message, its output would be the message to return to the
> other server. Otherwise its output would specify how the
> message should be modified (namely adding or modifying
> headers). The format of the header modification text is a
> detail that can be worked out later.

You'd also need to send the extra data like recipients and senders
and protocol and authentication and such. The local_scan author
would potentially have to parse the headers, and then figure out a
way to support debugging and logging and such. Plus it would be
rather a large extra amount of processing power, especially for large
messages.

>o Use libdl (dlopen, dlsym) to load an admin-specified .so.
>
> Pros:
> o doesn't require a lot of code
> o an initial implementation is already available
> o the scanner API is almost identical to the current one
> o no new protocols need to be devised
>
> Cons:
> o C is (very apparently) not well suited for dynamic programs
> o the libdl API doesn't provide any type checking the way
> the C compiler does (or the way python does for "dynamic"
> modules)
> o This makes it easy for an admin to shoot a 3-sided hole in
> exim. If a bad .so is specified (accidentally or
> maliciously), exim _could_ have a hard time handling it
> gracefully. It will more than likely crash if the ABI
> checking doesn't catch the mismatch.

Agreed, although the simplicity of this approach would seem to make
it the best solution (or alternatively the approach of just using a
fixed number of local_scans, which would have the advantage of
working without dynamic loading on systems that might not support
this).

Enjoy,
Peter.

--
<http://www.interarchy.com/> <http://download.interarchy.com/>
Re: local_scan interface discussion [ In reply to ]
On Jul 14 Derrick 'dman' Hudson wrote:

>Another goal is for the design (and implementation) to be acceptable
>to Philip and included in the standard exim release. This is
>important for supporting goals #3 and #4 above.

This is the more important goal, especially the bit about preventing
Philip receiving floods of e-mail asking for support from incompatible
binary modules.

Matt

(I like the current API!)
Re: local_scan interface discussion [ In reply to ]
Something over in the corner behind me (not the cat, who is illegally on
the table beside me at the moment) keeps muttering:
local_scan has to be fast
local_scan has to be fast
...

The argument about having to recompile Exim when local_scan changes is less
important when one is doing development on a sandbox machine (even if the
sandbox is one of a cluster, temporarily removed from duty, as we're
presently doing for other scanners), and only putting "working" code into
service.

We won't be using local_scan until you pioneers have taken the arrows.

--John ("may all the arrows be small, and tipped with suction cups")

--
John Baxter jwblist@olympus.net Port Ludlow, WA, USA
Re: local_scan interface discussion [ In reply to ]
On Mon, 15 Jul 2002, Matt Bernstein wrote:

> This is the more important goal, especially the bit about preventing
> Philip receiving floods of e-mail asking for support from incompatible
> binary modules.

Thank you for your concern. :-)

> (I like the current API!)

It was always intended to provoke experimentation and discussion, and it
seems to have succeeded. I will get to this eventually, but not quickly,
I'm afraid.


--
Philip Hazel University of Cambridge Computing Service,
ph10@cus.cam.ac.uk Cambridge, England. Phone: +44 1223 334714.
Re: local_scan interface discussion [ In reply to ]
On Mon, 15 Jul 2002, Peter N Lewis wrote:

> Very complex for the local scan author - a quick look at the
> local_scan.h shows the kinds of things you'd have to potentially
> write:
>
> Header & Recipient list processing
> Child handling
> logging and debugging
> memory management for strings

This makes me wonder whether we need exim to provide a library which
modules could use.
If so we need two sets of version numbers, one for the version of the
exim support library, and one for the version of the module interface.
(Is the support library statically linked to the module, part of the exim
binary, or another module ? Which bits can be upgraded independently of
which others ?).
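
One way the two sets of version numbers could be checked, as a sketch
only. The struct layout, the major/minor split of the interface version,
and the compatibility policy are assumptions for illustration; they do
not answer the linking questions above.

```c
#include <assert.h>
#include <stddef.h>

struct scan_versions {
    int iface_major;   /* module interface: incompatible changes */
    int iface_minor;   /* module interface: backward-compatible additions */
    int support_lib;   /* support library revision */
};

enum compat { COMPAT_OK, COMPAT_IFACE_MISMATCH, COMPAT_LIB_TOO_OLD };

/* host is what the running exim provides; needed is what the module was
   built against. Assumed policy: the interface major number must match
   exactly, and the host must be at least as new as the module in both
   the interface minor number and the support library revision. */
static enum compat check_compat(const struct scan_versions *host,
                                const struct scan_versions *needed)
{
    assert(host != NULL && needed != NULL);
    if (needed->iface_major != host->iface_major ||
        needed->iface_minor > host->iface_minor)
        return COMPAT_IFACE_MISMATCH;
    if (needed->support_lib > host->support_lib)
        return COMPAT_LIB_TOO_OLD;
    return COMPAT_OK;
}
```

Keeping the two checks separate lets the error message say which piece
needs upgrading.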


At 20:43 -0500 14/7/02, Derrick 'dman' Hudson wrote:
> Additional Data:
> One idea I had for this is using a pipe. exim would open a
> pipe to the specified scanner program. The message would be
> passed to the scanner on stdin. The exit code from the
> scanner program would determine what exim should do with it --
> accept, tempreject, permreject. If the scanner rejects the
> message, its output would be the message to return to the
> other server. Otherwise its output would specify how the
> message should be modified (namely adding or modifying
> headers). The format of the header modification text is a
> detail that can be worked out later.

I'm worried that a pipe would push more data around.
Would a file handle, or even a filename make better use of memory
such as filesystem caches ?
On many OSes, the -D and -H files will already be in memory, although
if we use those we need to indicate changes in format, perhaps by
passing the exim version number to the scanner.

We may wish to allow multiple scans to be done in parallel, to reduce
latency. If so might we need to ensure that scanners only have read
access to the spool files ?

--
Dr. Andrew C. Aitchison Computer Officer, DPMMS, Cambridge
A.C.Aitchison@dpmms.cam.ac.uk http://www.dpmms.cam.ac.uk/~werdna
Re: local_scan interface discussion [ In reply to ]
>My reason for writing this is to facilitate collaboration to design a
>"perfect" local_scan interface. Philip has the final say regarding
>what will or will not be included in the official exim release, of
>course. I want to help make the interface meet my desires, and also
>to help implement it (now :-)) as I am able.
>
>Please contribute your $0.02 (in whatever monetary unit you
>deem most appropriate).


>The Problem :

I agree with the problem statement & goals for this discussion.


>I think some basic premises must be agreed upon :

I agree with Derrick; and, I would add a premise, which is:

The exposed interface should be "robust," in the sense that it should
support as much flexibility in scanner function as possible, while (more
importantly) protecting Exim itself from being corrupted or damaged by
faulty scanner operation.

To this end, I would recommend an approach like Derrick's "pipe" idea.
It is well-understood (stdin, stdout), and can evolve without modifying
the underlying communication method (e.g. sort of like the way HELO
evolved into EHLO). If people want to add extra functionality to the
interface, (e.g. TCP/IP to a remote machine), they can use a separate
package (e.g. Stunnel) to provide it. That is, using a pipe, I think,
maintains the maximum flexibility for scanner designers and users, with
a small footprint as well.

> Cons:
> o Requires creating a new protocol.
> (or beating an existing one (maybe LMTP or BSMTP) into a
> shape suitable for this use)
> o Could be complex.

First, I want to say that, with the rising tide of spam and viruses in
the world, creating such a protocol is a VERY GOOD IDEA, which will
likely have wide appeal. It could be the enabling technology to make
spam and virus filtering dramatically more common. Good Standards are
very important things, and I commend this group for having the
discussion.

I think we can address the first issue by leveraging the existing
local_scan "protocol." Also, Derrick has a good start on the types of
options required:

> Additional Data:
> One idea I had for this is using a pipe. exim would open a
> pipe to the specified scanner program. The message would be
> passed to the scanner on stdin. The exit code from the
> scanner program would determine what exim should do with it --
> accept, tempreject, permreject. If the scanner rejects the
> message, its output would be the message to return to the
> other server. Otherwise its output would specify how the
> message should be modified (namely adding or modifying
> headers). The format of the header modification text is a
> detail that can be worked out later.

As for Exim's output to the scanner, I suggest adding a single text
string, to be specified in the Exim config file (sent just prior to
sending the actual email message). Wonderful would be the capability to
specify a "general" string that can be expanded to include the contents
of available Exim variables (at the time local_scan is called, of
course). This string should not be global in scope, but specific to the
instance of local_scan called (and any expansion done immediately prior
to calling local_scan). This string would be scanner-unique, and could
be used for any purpose devised by the scanner creator. One could then
even pass multiple commands, options, etc. by designing the scanner to
recognize a separator string/character. A null string would be
permissible (and in fact should be the default). With such a scheme,
one could even create a "parent" local_scan function that could parse
the options string, and call any number of different scanners, and feed
the results back into Exim.

To harp the point, with the options string, one could call a virus
scanner with one call to local_scan, and then call SpamAssassin with
another call to local_scan. Imagine:

require=local_scan("exim-sa;$domain;$local_part")
deny=local_scan("antivirus;killall;log")

I didn't think about the contents of the options string, just tossing a
wild example out there.
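
On the scanner side, splitting such an options string could look roughly
like this. It is a sketch: split_options is a hypothetical helper, the
';' separator is taken from the example above, and the string is assumed
to have been expanded by Exim already.

```c
#include <assert.h>
#include <string.h>

/* Split s in place on sep, storing pointers to at most max fields.
   Empty fields are kept ("a;;b" yields three fields); an empty string
   yields zero fields. Once max fields have been stored, any remaining
   text is ignored. Returns the number of fields stored. */
static size_t split_options(char *s, char sep, char *fields[], size_t max)
{
    size_t n = 0;

    assert(s != NULL && fields != NULL);
    if (*s == '\0')
        return 0;               /* a null options string is permitted */
    while (n < max) {
        fields[n++] = s;
        s = strchr(s, sep);
        if (s == NULL)
            break;
        *s++ = '\0';            /* terminate this field, step to the next */
    }
    return n;
}
```

A "parent" scanner could dispatch on the first field and hand the rest
to whichever scanner it names.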

I am thinking that it should behave as much like existing ACL commands
(tests) as possible, and be callable from any ACL. This would maintain
the "flavor" of Exim, and might be easiest to incorporate in the code.

The only real problem I see, with such a pipe scheme ("pipe dream"?
*grins*), is the unknown time delay introduced by the scanner. I
believe we would need a timeout on the connection, or perhaps a
"keepalive" scheme? Do we keep processing other messages, while waiting
for the scanner to return (parallel processing)? Or is it to be an
"inline" (serial processing) type function?? How are current ACL tests
handled? I defer to other experts in these areas.

Just another thought, could we make the timeout be a function of the
original message size? That way, we can have shorter, more reasonable
timeouts for ordinary messages, but let the scanner crunch a bit longer
on larger messages.
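
A size-dependent timeout could be as simple as the following sketch.
The constants (a 30 second base, one extra second per 64 KiB, a 300
second cap) and the name scan_timeout are made up for illustration;
sensible values would need measuring.

```c
#include <assert.h>
#include <stddef.h>

#define SCAN_TIMEOUT_BASE  30u            /* seconds granted to every message */
#define SCAN_BYTES_PER_SEC (64u * 1024u)  /* one extra second per 64 KiB */
#define SCAN_TIMEOUT_MAX   300u           /* never wait longer than this */

/* Return the scanner timeout in seconds for a message of msg_size bytes:
   the base allowance plus time proportional to size, capped at the
   maximum. */
static unsigned int scan_timeout(size_t msg_size)
{
    size_t extra = msg_size / SCAN_BYTES_PER_SEC;

    assert(SCAN_TIMEOUT_BASE <= SCAN_TIMEOUT_MAX);
    if (extra > SCAN_TIMEOUT_MAX - SCAN_TIMEOUT_BASE)
        return SCAN_TIMEOUT_MAX;
    return SCAN_TIMEOUT_BASE + (unsigned int)extra;
}
```

The cap matters: without it, one very large message could hold up
processing for an unbounded time.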

Just my two pennies...

Jim Roberts
Punster Productions, Inc.
Re: local_scan interface discussion [ In reply to ]
On Mon, 15 Jul 2002, James P. Roberts wrote:

> To this end, I would recommend an approach like Derrick's "pipe" idea.

One of the things people were telling me when I was thinking about
local_scan() was that they wanted an efficient way of scanning messages,
without having to run an additional process. If you start using pipes,
you lose this efficiency.

Sure, you get better protection of Exim itself, but that wasn't the
point.

> require=local_scan("exim-sa;$domain;$local_part")
> deny=local_scan("antivirus;killall;log")

You can do that kind of thing already, using ${run or ${perl. The idea
of local_scan() was to be more efficient for those that couldn't afford
even the cost of ${run.

--
Philip Hazel University of Cambridge Computing Service,
ph10@cus.cam.ac.uk Cambridge, England. Phone: +44 1223 334714.
Re: local_scan interface discussion [ In reply to ]
----- Original Message -----
From: "Philip Hazel" <ph10@cus.cam.ac.uk>
To: "James P. Roberts" <punster@punsterproductions.com>
Cc: <exim-users@exim.org>
Sent: Monday, July 15, 2002 4:15 PM
Subject: Re: [Exim] local_scan interface discussion



Thanks for the clarification of the motivation, Philip.

So it would seem there are two needs... (1) A way to scan FAST, and (2)
a way to scan with minimal impact/risk to Exim itself, minimum
administrative effort level, and maximum flexibility in choice of
scanners.

I agree these two things are pretty much mutually exclusive.

If I understand correctly, Exim already provides for number (2). Nifty!
(I think it is time for me to Re-RTFM...)

At any rate, I see your point, Philip. As the bomb said to the Dark
Star crew, "I must think on this further."

Thanks again.

Jim Roberts