[Bug 517] New: Load average variable stuck during long connections

http://www.exim.org/bugzilla/show_bug.cgi?id=517

Summary: Load average variable stuck during long connections
Product: Exim
Version: 4.66
Platform: x86
OS/Version: Linux
Status: NEW
Severity: bug
Priority: low
Component: General execution
AssignedTo: ph10@hermes.cam.ac.uk
ReportedBy: moseleymark@gmail.com
QAContact: exim-dev@exim.org


Hi all. We've been working with a new commercial spam filter. It sits on
the edge in front of our Exim boxes (4.66, as we haven't gotten around to
upgrading, running on Debian Sarge on Dell 750/850s). The spam filter box keeps
connections open for a long time to save on session startup (it keeps a
connection open for a tunable number of emails or until Exim times it out).

I'm seeing something odd: Exim is queueing incoming messages from the spam
filter box due to high load even when the load is low. My best guess is that
the initial load average gets reused over and over, even though
/proc/loadavg is reread for each message. Or perhaps once the load hit the
'queue_only_load' threshold, some flag was set that has never been cleared.

Here are some bits extracted from an strace of one of the exim procs on a
long-running connection:


19:21:22.044329 open("/proc/loadavg", O_RDONLY|O_LARGEFILE) = 7
19:21:22.044466 read(7, "29.97 24.32 17.25 17/149 30502\n", 40) = 31
...
19:21:22.103730 write(3, "2007-06-14 19:21:22 1Hyycz-0002rS-VA no immediate delivery: load average 45.82\n", 79) = 79
...
19:21:45.552132 open("/proc/loadavg", O_RDONLY|O_LARGEFILE) = 7
19:21:45.552305 read(7, "22.91 23.15 17.05 1/127 30926\n", 40) = 30
...
19:21:45.598339 write(3, "2007-06-14 19:21:45 1HyydN-0002rS-FB no immediate delivery: load average 45.82\n", 79) = 79
...
19:22:05.841367 open("/proc/loadavg", O_RDONLY|O_LARGEFILE) = 7
19:22:05.841599 read(7, "17.00 21.78 16.73 3/131 31091\n", 40) = 30
...
19:22:05.896607 write(3, "2007-06-14 19:22:05 1Hyydh-0002rS-Oj no immediate delivery: load average 45.82\n", 79) = 79
...
19:22:46.319279 open("/proc/loadavg", O_RDONLY|O_LARGEFILE) = 7
19:22:46.319495 read(7, "9.28 19.19 16.07 1/128 31557\n", 40) = 29
...
19:22:46.373883 write(3, "2007-06-14 19:22:46 1HyyeM-0002rS-7q no immediate delivery: load average 45.82\n", 79) = 79


... and so on. Note that the log message about 'no immediate delivery' reports
a static load average (45.82) that bears no resemblance to what's in
/proc/loadavg. The connection in question has been open for about 6 hours!
I've also observed this on a number of other processes on other boxes behind
the spam filter box.
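
To make the mismatch easier to see, here is a rough Python sketch that pairs each /proc/loadavg read in the strace excerpt with the load average Exim logs right after it. The regexes and the pair_loads helper are my own illustration for this excerpt (with the wrapped write lines rejoined), not part of Exim or strace, and they assume the line layout shown above.

```python
import re

# The read/write lines from the strace excerpt above, wrapped lines rejoined.
# A raw string keeps the literal "\n" that strace prints inside the quotes.
STRACE = r'''
19:21:22.044466 read(7, "29.97 24.32 17.25 17/149 30502\n", 40) = 31
19:21:22.103730 write(3, "2007-06-14 19:21:22 1Hyycz-0002rS-VA no immediate delivery: load average 45.82\n", 79) = 79
19:21:45.552305 read(7, "22.91 23.15 17.05 1/127 30926\n", 40) = 30
19:21:45.598339 write(3, "2007-06-14 19:21:45 1HyydN-0002rS-FB no immediate delivery: load average 45.82\n", 79) = 79
19:22:05.841599 read(7, "17.00 21.78 16.73 3/131 31091\n", 40) = 30
19:22:05.896607 write(3, "2007-06-14 19:22:05 1Hyydh-0002rS-Oj no immediate delivery: load average 45.82\n", 79) = 79
19:22:46.319495 read(7, "9.28 19.19 16.07 1/128 31557\n", 40) = 29
19:22:46.373883 write(3, "2007-06-14 19:22:46 1HyyeM-0002rS-7q no immediate delivery: load average 45.82\n", 79) = 79
'''

# First field of /proc/loadavg is the 1-minute load average.
READ_RE = re.compile(r'read\(\d+, "(\d+\.\d+) \d+\.\d+ \d+\.\d+ ')
# Load average as it appears in the Exim log line.
LOG_RE = re.compile(r'load average (\d+\.\d+)')


def pair_loads(text):
    """Return (load read from /proc/loadavg, load logged by Exim) pairs."""
    pairs = []
    last_read = None
    for line in text.splitlines():
        m = READ_RE.search(line)
        if m:
            last_read = float(m.group(1))
            continue
        m = LOG_RE.search(line)
        if m and last_read is not None:
            pairs.append((last_read, float(m.group(1))))
            last_read = None
    return pairs


for fresh, logged in pair_loads(STRACE):
    print(f"read {fresh:6.2f}  logged {logged:6.2f}  diff {logged - fresh:+.2f}")
```

The value read drops from 29.97 to 9.28 over the four messages while the logged value stays at 45.82 every time.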

The load settings I'm using are:

smtp_load_reserve = 34
queue_only_load = 35
deliver_queue_load_max = 150

The boxes triggering this are *not* smtp_reserve_hosts.
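
For reference, here is a small Python sketch of what I'd expect the queue_only_load decision to look like for the readings above. This is only an illustration of the documented behaviour (queue when the load average is higher than queue_only_load), not Exim's actual code; the function name should_queue is made up.

```python
QUEUE_ONLY_LOAD = 35.0  # queue_only_load from the settings above


def should_queue(load, threshold=QUEUE_ONLY_LOAD):
    """Hypothetical check: True means 'no immediate delivery', i.e. queue."""
    return load > threshold


# 1-minute load averages actually read from /proc/loadavg in the strace.
fresh_readings = [29.97, 22.91, 17.00, 9.28]
# The value Exim kept writing to the log.
logged_value = 45.82

for load in fresh_readings:
    print(f"fresh {load:5.2f}: queue={should_queue(load)}  "
          f"logged {logged_value:5.2f}: queue={should_queue(logged_value)}")
```

Every fresh reading is below 35, so none of these messages should have been queued on load grounds; only the stale 45.82 is above the threshold.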
