Mailing List Archive

[Bug 516] New: Some queue process never finish execution
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.

http://www.exim.org/bugzilla/show_bug.cgi?id=516

Summary: Some queue process never finish execution
Product: Exim
Version: 4.60
Platform: x86
OS/Version: Linux
Status: NEW
Severity: bug
Priority: critical
Component: Queues
AssignedTo: ph10@hermes.cam.ac.uk
ReportedBy: borja.pacheco@juntadeandalucia.es
QAContact: exim-dev@exim.org


We have a very long exim infrastructure running and nowadays very often we are
watching some exim queue process that are "pseudo" hanged-up.
They are running, but for a very, very long time. They are becoming a problem
because they are using a lot of memory and servers are starting to use swap.
This issue make us to restart server every time.

Some useful information bellow...

1. Process details:
Debian- 1019 0.0 0.1 8248 1500 ? Ss Jun18 0:29 /usr/sbin/exim4
-bd -q30m -C /etc/exim4/exim4.conf
Debian- 19704 0.0 0.1 6248 1300 ? S 06:03 0:00 \_
/usr/sbin/exim4 -q
Debian- 19712 18.4 16.4 804060 170348 ? D 06:03 63:16 | \_
/usr/sbin/exim4 -q
Debian- 5932 0.0 0.1 6248 1300 ? S 06:33 0:00 \_
/usr/sbin/exim4 -q
Debian- 5933 30.9 28.4 600140 294712 ? R 06:33 96:36 \_
/usr/sbin/exim4 -q
Debian- 31679 0.0 0.0 0 0 ? R 11:45 0:00 \_
[exim4]

>> Our waster process is 19712.
>> The status means that the process is awaiting IO, doesn't it?

2. Swap usage....
top - 11:46:44 up 1 day, 1:43, 6 users, load average: 5.30, 4.85, 4.63
Tasks: 189 total, 4 running, 184 sleeping, 0 stopped, 1 zombie
Cpu(s): 11.1% us, 44.4% sy, 0.0% ni, 6.1% id, 37.8% wa, 0.5% hi, 0.2% si
Mem: 1036036k total, 1026176k used, 9860k free, 2480k buffers
Swap: 1466780k total, 1257444k used, 209336k free, 139832k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ SWAP COMMAND
19712 Debian-e 15 0 785m 190m 5848 D 0.3 18.9 63:16.72 594m exim4
5933 Debian-e 16 0 587m 288m 5848 S 28.6 28.6 96:56.38 298m exim4

3. Strace (strace -p 19712) looks like the exim just write messages to log
file, but anything else...

close(4) = 0
open("/var/log/exim4/mainlog", O_WRONLY|O_APPEND|O_LARGEFILE) = 4
fcntl64(4, F_GETFD) = 0
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
fstat64(4, {st_mode=S_IFREG|0640, st_size=120965160, ...}) = 0
write(4, "2007-06-19 11:17:54 1I0BG7-0003d"..., 464) = 464
time(NULL) = 1182244674
write(8, "0y\2\3\3\360\333cr\0042ou=centralizado,o=rec"..., 123) = -1 EPIPE
(Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
time(NULL) = 1182244674
time(NULL) = 1182244674
write(7, "2007-06-19 11:17:54 aurora.gonza"..., 446) = 446
time(NULL) = 1182244674
time([1182244674]) = 1182244674
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=946, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=946, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=946, ...}) = 0
send(3, "<22>Jun 19 11:17:54 exim[19712]:"..., 498, MSG_NOSIGNAL) = 498
time(NULL) = 1182244674
close(4) = 0
open("/var/log/exim4/mainlog", O_WRONLY|O_APPEND|O_LARGEFILE) = 4
fcntl64(4, F_GETFD) = 0
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
fstat64(4, {st_mode=S_IFREG|0640, st_size=120965624, ...}) = 0
write(4, "2007-06-19 11:17:54 1I0BG7-0003d"..., 466) = 466
time(NULL) = 1182244674
write(8, "0v\2\3\3\360\334co\0042ou=centralizado,o=rec"..., 120) = -1 EPIPE
(Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
time(NULL) = 1182244674
time(NULL) = 1182244674
write(7, "2007-06-19 11:17:54 amaria.vivas"..., 440) = 440
time(NULL) = 1182244674
time([1182244674]) = 1182244674
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=946, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=946, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=946, ...}) = 0
send(3, "<22>Jun 19 11:17:54 exim[19712]:"..., 492, MSG_NOSIGNAL) = 492
time(NULL) = 1182244674
close(4) = 0
open("/var/log/exim4/mainlog", O_WRONLY|O_APPEND|O_LARGEFILE) = 4
fcntl64(4, F_GETFD) = 0
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
fstat64(4, {st_mode=S_IFREG|0640, st_size=120966090, ...}) = 0
write(4, "2007-06-19 11:17:54 1I0BG7-0003d"..., 460) = 460
time(NULL) = 1182244674
write(8, "0y\2\3\3\360\335cr\0042ou=centralizado,o=rec"..., 123) = -1 EPIPE
(Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
time(NULL) = 1182244674
time(NULL) = 1182244674
write(7, "2007-06-19 11:17:54 aurora.gonza"..., 446) = 446
time(NULL) = 1182244674
time([1182244674]) = 1182244674
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=946, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=946, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=946, ...}) = 0
send(3, "<22>Jun 19 11:17:54 exim[19712]:"..., 498, MSG_NOSIGNAL) = 498
time(NULL) = 1182244674
close(4) = 0
open("/var/log/exim4/mainlog", O_WRONLY|O_APPEND|O_LARGEFILE) = 4
fcntl64(4, F_GETFD) = 0

4. GDB to see where it is interrupted...
gdb /usr/sbin/exim4 19712
GNU gdb 6.3-debian
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-linux"...Using host libthread_db library
"/lib/tls/libthread_db.so.1".

Attaching to program: /usr/sbin/exim4, process 19712
`system-supplied DSO at 0xffffe000' has disappeared; keeping its symbols.
Reading symbols from /lib/tls/libresolv.so.2...done.
Loaded symbols for /lib/tls/libresolv.so.2
Reading symbols from /lib/tls/libnsl.so.1...done.
Loaded symbols for /lib/tls/libnsl.so.1
Reading symbols from /lib/tls/libcrypt.so.1...done.
Loaded symbols for /lib/tls/libcrypt.so.1
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /usr/lib/libdb-4.2.so...done.
Loaded symbols for /usr/lib/libdb-4.2.so
Reading symbols from /usr/lib/libldap_r.so.2...done.
Loaded symbols for /usr/lib/libldap_r.so.2
Reading symbols from /usr/lib/liblber.so.2...done.
Loaded symbols for /usr/lib/liblber.so.2
Reading symbols from /usr/lib/libperl.so.5.8...done.
Loaded symbols for /usr/lib/libperl.so.5.8
Reading symbols from /lib/tls/libdl.so.2...done.
Loaded symbols for /lib/tls/libdl.so.2
Reading symbols from /lib/tls/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 1079077088 (LWP 19712)]
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libsasl2.so.2...done.
Loaded symbols for /usr/lib/libsasl2.so.2
Reading symbols from /usr/lib/libgnutls.so.13...done.
Loaded symbols for /usr/lib/libgnutls.so.13
Reading symbols from /usr/lib/libtasn1.so.3...done.
Loaded symbols for /usr/lib/libtasn1.so.3
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libgcrypt.so.11...done.
Loaded symbols for /usr/lib/libgcrypt.so.11
Reading symbols from /usr/lib/libgpg-error.so.0...done.
Loaded symbols for /usr/lib/libgpg-error.so.0
Reading symbols from /lib/tls/libnss_compat.so.2...done.
Loaded symbols for /lib/tls/libnss_compat.so.2
Reading symbols from /lib/tls/libnss_nis.so.2...done.
Loaded symbols for /lib/tls/libnss_nis.so.2
Reading symbols from /lib/tls/libnss_files.so.2...done.
Loaded symbols for /lib/tls/libnss_files.so.2
Reading symbols from /usr/lib/sasl2/libsasldb.so.2...done.
Loaded symbols for /usr/lib/sasl2/libsasldb.so.2
0x080a4779 in store_release_3 ()
(gdb) where
#0 0x080a4779 in store_release_3 ()
#1 0x080a5738 in string_cat ()
#2 0x080a57bc in string_append ()
#3 0x0805ff6e in post_process_one ()
#4 0x08065d03 in deliver_message ()
#5 0x080870f0 in queue_run ()
#6 0x0806ef8a in main ()
(gdb)

--
Configure bugmail: http://www.exim.org/bugzilla/userprefs.cgi?tab=email

--
## List details at http://www.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##