For a very long time, there have been two things I have wanted to do
differently in qmail:
1. Avoid scanning the todo directory for new email. qmail-queue already has
the information about a new email. Can this be passed to qmail-send
efficiently, without scanning the todo directory?
2. Send multiple emails over as few connections as possible for remote
deliveries, rather than making a new connection for every email
delivered to remote hosts. This will be my next project, and I
think it will be challenging because there is a lot to deal with: TCP/IP
sockets, timeouts, reconnection handling and so on. I am not sure I will
succeed, but I will give it a try.

Eliminating the lock/trigger mechanism

In my experience, this mechanism is costly in environments with very high
injection rates. I have recently done a lot of tests, and high injection
rates always have a drastic impact on the qtime value (see the zoverall
script from qmailanalog). However, with current hardware I haven't
encountered the Silly Qmail Syndrome, which was often seen on the
hardware of the 90s.

So I thought of a way to do away with the lock/trigger mechanism, and
hence avoid scanning the todo directory for newly injected mail, by
using POSIX message queues in qmail-queue to communicate
with qmail-send.
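
A minimal sketch of the notification idea, assuming Linux POSIX message queues: the injector sends the new message's inode number as one fixed-size record, and the sender blocks in mq_receive() instead of scanning todo. The record layout `struct todo_note` (inode plus a queue number) and the helper names are hypothetical illustrations, not indimail-mta's actual code; older glibc versions need `-lrt` at link time.

```c
#include <fcntl.h>
#include <mqueue.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical notification record (not indimail-mta's actual format):
 * the queue number and the inode number that names the new message file. */
struct todo_note {
    int   queue_no;
    ino_t inode;
};

/* Open, creating if needed, a queue whose messages are exactly one
 * struct todo_note. Returns (mqd_t) -1 on error. */
mqd_t notify_open(const char *name)
{
    struct mq_attr attr;

    memset(&attr, 0, sizeof attr);
    attr.mq_maxmsg = 10;                      /* within Linux's default msg_max for unprivileged users */
    attr.mq_msgsize = sizeof(struct todo_note);
    return mq_open(name, O_RDWR | O_CREAT, 0600, &attr);
}

/* Injector side: announce a newly queued message instead of writing a
 * byte to the lock/trigger fifo. Returns 0 on success, -1 on error. */
int notify_send(mqd_t mq, int queue_no, ino_t inode)
{
    struct todo_note n;

    memset(&n, 0, sizeof n);                  /* zero padding bytes before sending */
    n.queue_no = queue_no;
    n.inode = inode;
    return mq_send(mq, (const char *) &n, sizeof n, 0);
}

/* Sender side: block until a message is announced; the todo directory is
 * never read with opendir()/readdir(). Returns 0 on success, -1 on error. */
int notify_recv(mqd_t mq, struct todo_note *out)
{
    ssize_t r = mq_receive(mq, (char *) out, sizeof *out, NULL);
    return r == (ssize_t) sizeof *out ? 0 : -1;
}
```

Because each record carries the inode number, the receiver can locate the message file directly instead of discovering it by listing a directory.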

To achieve this, qmail-queue was rewritten to use POSIX message queues and
POSIX shared memory to communicate with qmail-send. qmail-queue simply
communicates the inode number of the new message to qmail-send, and
qmail-send classifies it as local or remote for delivery. The entire todo
scan using opendir() and readdir() is avoided. Additionally, qmail-send was
modified to write the current local and remote concurrency to shared
memory. This enables a separate process to look at the shared memory
segment and dynamically increase the number of queues based on the
incoming injection rate. Having the current concurrency in shared memory
also makes it possible to look at the queue without scanning the queue
directories (mess, local, remote and todo). Setting the environment
variable DYNAMIC_QUEUE to 1 is all it takes to enable points 1 and 2.
Initially, this method gave worse performance than the lock/trigger
method, but after ironing out the issues with using IPC, I'm getting at
least a 40% improvement.

To measure performance, a script was written using qmailanalog that fires
multiple processes to inject mail with qmail-inject. I tested various
distributions: netqmail, notqmail, s/qmail, indimail-mta, and netqmail
with the exttodo patch. In the process, I discovered that having a
separate todo processor is a huge benefit: netqmail with the exttodo
patch, s/qmail and indimail-mta all benefit a lot from it in keeping qtime
very low. However, the biggest impact on delivery rate comes from the
fsync() calls made in qmail-queue.c, qmail-send.c and qmail-local.c. In my
tests, fsync made deliveries 8x slower. So if you have a stable power
supply and an OS like Linux (which I haven't seen crash in more than 10
years and counting), you can disable fsync.
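
The trade-off can be sketched as a queue-file write path in which fsync() is optional. The environment variable `SKIP_FSYNC` and the helper `queue_write_file()` are hypothetical names for illustration, not what indimail-mta uses. When fsync is skipped, write() returning only means the data reached the page cache, which is why this is only reasonable with reliable power and a stable OS.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical switch: fsync() is skipped when SKIP_FSYNC is set to a
 * non-empty value. This is not indimail-mta's actual variable name. */
int fsync_enabled(void)
{
    const char *v = getenv("SKIP_FSYNC");
    return !(v && *v);
}

/* Write len bytes of buf to a new file at path (O_EXCL: never overwrite),
 * calling fsync() before close() only when enabled.
 * Returns 0 on success, -1 on error with errno set. */
int queue_write_file(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return -1;
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w == -1) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            goto fail;
        }
        buf += w;
        len -= (size_t) w;
    }
    /* Without fsync() the data may still be only in the page cache here. */
    if (fsync_enabled() && fsync(fd) == -1)
        goto fail;
    return close(fd);
fail:
    {
        int saved = errno;             /* close() must not clobber the error */
        close(fd);
        errno = saved;
    }
    return -1;
}
```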

Having the local and remote concurrency in shared memory also makes
possible a tmux script named 'qtop', which displays the local and remote
concurrency for each queue in the top window and the qmail-send logs in
the bottom window. This is done without any disk IO to list files in the
todo, local and remote subdirectories. In contrast, using qmail-qread
drastically reduces injection and delivery rates when emails are being
injected at a high rate.
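
A sketch of how a qtop-like reader can get per-queue concurrency from POSIX shared memory without touching the queue directories. The segment name and the `struct qcount` layout are hypothetical; in this sketch the writer (standing in for qmail-send) creates and updates the segment and the reader maps it read-only. Older glibc versions need `-lrt`.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical layout that each qmail-send would write to its own segment. */
struct qcount {
    int local_cur, local_max;     /* current and maximum local concurrency */
    int remote_cur, remote_max;   /* current and maximum remote concurrency */
};

/* Writer side: create the segment and return a read-write mapping, or NULL. */
struct qcount *qcount_create(const char *name)
{
    int fd = shm_open(name, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return NULL;
    if (ftruncate(fd, sizeof(struct qcount)) == -1) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct qcount), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : (struct qcount *) p;
}

/* Reader side: copy the counters out of the segment. No opendir()/readdir()
 * on todo, local or remote is needed. Returns 0 on success, -1 on error. */
int qcount_read(const char *name, struct qcount *out)
{
    int fd = shm_open(name, O_RDONLY, 0);
    if (fd == -1)
        return -1;
    void *p = mmap(NULL, sizeof *out, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    *out = *(const struct qcount *) p;   /* a snapshot, not synchronised with the writer */
    munmap(p, sizeof *out);
    return 0;
}

/* Print one line per queue, in the spirit of a top-style display. */
void qcount_print(const char *queue, const struct qcount *q)
{
    printf("%-8s local %3d/%-3d remote %3d/%-3d\n", queue,
           q->local_cur, q->local_max, q->remote_cur, q->remote_max);
}
```

Reading an unsynchronised snapshot is acceptable for a display tool, since a value that is off by one for a refresh interval does no harm.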

There are a few other things that this release achieves:
1. Dynamically increase the number of queues based on the injection
rate.
2. Rate limit remote deliveries. This used to be the biggest challenge for
me when dealing with high-volume delivery to sites like Yahoo, AOL and
Roadrunner.
3. A much smaller MTA: just one binary that handles both local and remote
delivery, for small board computers. A downside is that it lacks the
trust partitioning that comes from having different processes handle
different jobs.

To achieve 1, a program, qmonitor, was written to look at the concurrency
values stored in shared memory. When a queue reaches its concurrency
limit, a new queue is created with its own qmail-send process to process
it. The current number of queues is also maintained in shared memory,
which lets qmail-queue know that a new queue is available for use.
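
The qmonitor decision can be sketched as a pure function over counters read from shared memory. The policy shown (add one queue when any queue is at its local or remote limit, up to a hypothetical ceiling `max_queues`) is an assumption for illustration; the real qmonitor may use a different rule. `pick_queue()`, which spreads new messages by pid modulo the queue count, is likewise hypothetical.

```c
/* Hypothetical per-queue counters, as read from shared memory. */
struct qcount {
    int local_cur, local_max;
    int remote_cur, remote_max;
};

/* 1 if the queue has reached its local or remote concurrency limit. */
static int queue_saturated(const struct qcount *q)
{
    return q->local_cur >= q->local_max || q->remote_cur >= q->remote_max;
}

/* Return the number of queues that should exist: one more than nqueues
 * if any queue is saturated and the ceiling allows it, else nqueues. */
int qmonitor_next_count(const struct qcount *qs, int nqueues, int max_queues)
{
    if (nqueues >= max_queues)
        return nqueues;
    for (int i = 0; i < nqueues; i++)
        if (queue_saturated(&qs[i]))
            return nqueues + 1;
    return nqueues;
}

/* Injector side: choose one of the currently available queues. */
int pick_queue(long pid, int nqueues)
{
    return nqueues > 0 ? (int) (pid % nqueues) : 0;
}
```

Keeping the decision free of I/O makes the policy easy to test; the monitor loop would only read the counters, call this, and start a new qmail-send when the count grows.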

To achieve 2, prioq.c and qmail-send were modified and a new MTA named
slowq-send was written. This special qmail-send looks in the
queue/ratelimit directory for a file named after the destination domain.
The file contains a simple expression; e.g. 10/3600 means allow only
10 emails per hour (3600 seconds) to be delivered to that domain.
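
A sketch of parsing and applying an expression such as 10/3600. The fixed-window counting (the count resets once the window has elapsed) is an assumed policy; slowq-send's actual accounting may differ, and `struct ratelimit` and the function names are hypothetical.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical per-domain limiter built from an expression "N/SECONDS". */
struct ratelimit {
    long   max;       /* deliveries allowed per window */
    long   window;    /* window length in seconds */
    long   used;      /* deliveries made in the current window */
    time_t start;     /* when the current window began (0 = not started) */
};

/* Parse e.g. "10/3600" into rl. Returns 0 on success, -1 if malformed;
 * rl is left untouched on failure. */
int ratelimit_parse(const char *expr, struct ratelimit *rl)
{
    char *end;
    long n = strtol(expr, &end, 10);
    if (end == expr || *end != '/' || n < 0)
        return -1;
    const char *p = end + 1;
    long secs = strtol(p, &end, 10);
    if (end == p || secs <= 0)
        return -1;
    while (*end == ' ' || *end == '\t' || *end == '\n')
        end++;        /* tolerate a trailing newline from the file */
    if (*end != '\0')
        return -1;
    rl->max = n;
    rl->window = secs;
    rl->used = 0;
    rl->start = 0;
    return 0;
}

/* Returns 1 and counts the delivery if one more is allowed at time now,
 * otherwise 0 (the delivery should be deferred). */
int ratelimit_allow(struct ratelimit *rl, time_t now)
{
    if (rl->start == 0 || now - rl->start >= rl->window) {
        rl->start = now;   /* start a fresh window */
        rl->used = 0;
    }
    if (rl->used >= rl->max)
        return 0;
    rl->used++;
    return 1;
}
```

A fixed window is the simplest reading of "N per period"; it permits a burst of up to 2N across a window boundary, which a sliding window would avoid at the cost of remembering individual delivery times.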

To achieve 3, a new MTA was written combining the code of qmail-send.c,
qmail-lspawn.c and qmail-rspawn.c. This binary, known as qmta-send, has no
trust partitioning, but it lets one run just one systemd service and one
binary for both local and remote deliveries.

The code has been released as part of the indimail-mta 3.0.0 release. It
has been in the works since Oct 2021. It has been difficult, so I expect
to keep finding bugs as I go along, but I wanted to let you folks know
about it.
https://github.com/mbhangui/indimail-mta/releases/tag/v3.0.0

A few other things in this release:
1. envdir can use multiple directories. You just need to create
soft links. This allows one to have a directory with global
variables for all supervised services.
2. softlimit now has a -q option to set message queue limits.
3. ucspi-tcp, qmail-smtpd and qmail-remote are OpenSSL 3.0.0 ready.
4. djb has this comment on line 248, in the function comm_canwrite()
in qmail-send.c:
/* XXX: could allow a bigger buffer; say 10 recipients */
I tried this with an environment variable, TODO_CHUNK_SIZE, that allows
a variable-sized buffer. This does have a positive impact on delivery
rates, at the cost of a delayed start.
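
djb's XXX comment can be illustrated with a sketch: instead of refusing new work while any recipient is still buffered for a delivery channel, the sender keeps accepting until a configurable number of recipients is pending. Only the variable name TODO_CHUNK_SIZE comes from the text above; the default of 1 (which mirrors stock qmail's one-at-a-time behaviour), the cap, and both function names are hypothetical, and indimail-mta's implementation may differ.

```c
#include <stdlib.h>

#define CHUNK_DEFAULT 1      /* stock qmail behaviour: one recipient buffered */
#define CHUNK_MAX     1000   /* hypothetical sanity cap */

/* Read TODO_CHUNK_SIZE from the environment, falling back to the default
 * for missing, non-numeric or out-of-range values. */
int todo_chunk_size(void)
{
    const char *v = getenv("TODO_CHUNK_SIZE");
    char *end;
    long n;

    if (!v || !*v)
        return CHUNK_DEFAULT;
    n = strtol(v, &end, 10);
    if (*end != '\0' || n < 1 || n > CHUNK_MAX)
        return CHUNK_DEFAULT;
    return (int) n;
}

/* Hypothetical variant of comm_canwrite(): pending is the number of
 * recipients already buffered for this channel. */
int comm_canwrite_chunked(int pending, int chunk)
{
    return pending < chunk;
}
```

A larger chunk lets more recipients be handed over per write, which fits the observed gain in delivery rate; it also means work is batched before it starts flowing, which fits the delayed start.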

All of this was benchmarked and documented here:
https://github.com/mbhangui/indimail-mta/tree/master/indimail-mta-x/qmail-perf
The Google Sheet also has the raw data, but those sheets are hidden.
One can show them using the unhide option in the Google Sheets menu.

The tests were done on an aging 2012 Sony VAIO laptop, but this is what I
have; better hardware could show different results. Also, FreeBSD showed
very good inject speeds for netqmail, notqmail and s/qmail, but my
FreeBSD runs on VirtualBox, and most probably the fsync() calls there do
not actually sync to disk. If someone has a spare box available and
can give me login access for a month, I can redo the tests on FreeBSD.

A few of the things I discovered:
1. The conf-split setting doesn't have much impact. Increasing
conf-split gives only a very marginal improvement in
inject and delivery speeds.
2. A dynamically linked qmail-queue reduces injection speed.
This was a big reason why indimail-mta had the worst
injection rates. qmail-queue is now statically
linked in indimail-mta.
3. For file creation speed, ZFS for the queue directory gives
the best performance if fsync is enabled, and ext4 gives the
best performance if fsync is disabled.
4. ZFS gives the worst performance for file deletion.
5. File access times are more or less similar for ext4
and ZFS.
6. The access and deletion times in the tests don't have
high granularity, because I used the Unix time command,
which reports times only to two decimal places.
7. I did the tests on FreeBSD too and got good
speeds, but my FreeBSD runs on VirtualBox,
and I think VirtualBox disk IO is cheating on
fsync(). One of these days I will redo them by
installing FreeBSD natively on a separate hard
disk.
8. Having a separate process scan the todo directory drastically
improves qtime. This is instantly visible if you use the zoverall
script from qmailanalog.
9. An external todo processor has a remarkable impact on
local concurrency: the concurrency never reaches high
values, even with high inject rates.
10. For a large amount of mail injection, it is almost impossible to
beat the speed of netqmail.
--
Regards Manvendra - http://www.indimail.org
GPG Pub Key
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC7CBC760014D250C