Mailing List Archive

[Bug 3446] spamd locks up randomly
http://bugzilla.spamassassin.org/show_bug.cgi?id=3446

felicity@kluge.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.1.0                       |3.0.0





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

------- Additional Comments From spamassassin-contrib@msquadrat.de 2004-05-30 10:11 -------
The cute "futex" seems to be threading-related. Did you (or your packager)
build your Perl with threading enabled? We had some problems with threading-
enabled Perls before which in the end always led to a borked Perl. (AFAIK
threading can be considered broken in Perl, at least in 5.6.x.)



------- Additional Comments From felicity@kluge.net 2004-05-30 10:30 -------
Subject: Re: spamd locks up randomly

On Sun, May 30, 2004 at 10:11:37AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> The cute "futex" seems to be threading-related. Did you (or your packager)
> build your Perl with threading enabled? We had some problems with threading-
> enabled Perls before which in the end always led to a borked Perl. (AFAIK
> threading can be considered broken in Perl, at least in 5.6.x.)

Hrm. Well, a perl -V results in:

config_args='-des -Doptimize=-O2 -march=i386 -mcpu=i686 -Dcc=gcc -Dcf_by=Red Hat, Inc.
-Dcccdlflags=-fPIC -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr
-Dsiteprefix=/usr -Uusethreads -Uuseithreads -Uuselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Di_ndbm
-Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Dinc_version_list=5.6.0/i386-linux 5.6.0'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=undef d_sfio=undef uselargefiles=undef usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef

there's the -Uusethreads which leads to usethreads=undef, so I would
guess no threads.
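
A minimal sketch for pulling just the threading flags out of a summary like the one above. The helper name thread_flags is made up for illustration; it only filters "perl -V"-style text on stdin and assumes a grep that supports -o and -E.

```sh
#!/bin/sh
# Print each threading-related flag from `perl -V`-style summary text on
# stdin, one per line (for example "usethreads=undef").
# thread_flags is a hypothetical helper name, not part of Perl or spamd.
thread_flags() {
  # Matches usethreads=, use5005threads= and useithreads=; the -U switches
  # in config_args carry no "=" and are therefore skipped.
  grep -oE 'use(5005|i)?threads=[^ ,]*'
}
```

Usage would be "perl -V | thread_flags". For a single variable, "perl -V:usethreads" asks Perl's own configuration directly.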





------- Additional Comments From felicity@kluge.net 2004-05-30 11:16 -------
hrm, interesting. if I run lsof on a working spamd process, there are a few things I didn't expect to see:

spamd 16002 root mem REG 8,2 24788 25485953 /usr/lib/perl5/5.6.1/i386-linux/auto/Data/Dumper/Dumper.so
spamd 16002 root mem REG 8,2 97424 25667875 /lib/tls/libpthread-0.60.so

I don't see any of our code which uses Data::Dumper (the only reference to it is commented out), and
perl says it doesn't do threading, but then loads the pthread library?



------- Additional Comments From spamassassin-contrib@msquadrat.de 2004-05-30 12:49 -------
Maybe Data::Dumper is use'd by some helper module of ours?

And to give you a clue about which module links to the threading lib, this
might help:

find /usr/lib/perl5 -name '*.so' | \
xargs ldd | \
awk '/^[/]/{ l=$0; next }; { print l $0 }' | \
grep thread
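
The awk step is what ties each dependency back to its module. For several files, ldd prints a "file:" header line starting with "/" followed by indented dependency lines; the filter copies the header onto every dependency line so the final grep still shows which .so pulled in the thread library. A small sketch of just that step, with label_ldd as a made-up name:

```sh
#!/bin/sh
# Prefix every dependency line of multi-file `ldd` output with the name of
# the file it belongs to. Header lines start with "/"; dependency lines are
# indented. label_ldd is a hypothetical name used only for this sketch.
label_ldd() {
  awk '/^\// { file = $0; next } { print file $0 }'
}
```

With that in place the pipeline above could be written as "find /usr/lib/perl5 -name '*.so' | xargs ldd | label_ldd | grep thread".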




------- Additional Comments From felicity@kluge.net 2004-05-30 14:22 -------
Subject: Re: spamd locks up randomly

On Sun, May 30, 2004 at 12:49:06PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> And to give you a clue about which module links to the threading lib, this
> might help:

ooh. nice one. thanks.

of course the answer isn't surprising to me:

/usr/lib/perl5/5.6.1/i386-linux/auto/DB_File/DB_File.so: libpthread.so.0 => /lib/tls/libpthread.so.0 (0x40206000)

which seems to be because libdb is linked against pthread:

libpthread.so.0 => /lib/tls/libpthread.so.0 (0x400cd000)

<sigh>





------- Additional Comments From felicity@kluge.net 2004-05-31 21:37 -------
spamd lasted a day and is now having issues again. from some debugging:

May 31 23:53:36 eclectic spamd[4667]: Attempt to free unreferenced scalar at /usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/Dns.pm line 724, <DCC> line 1.

May 31 23:30:27 eclectic spamd[6866]: Attempt to free unreferenced scalar at /usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/Dns.pm line 724, <DCC> line 1.

line 724 is the opening of the DCC eval. so for right now, I've disabled dcc in my config.

from the original report:

May 30 02:58:45 eclectic spamd[821]: Attempt to free unreferenced scalar at /usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/Dns.pm line 724, <DCC> line 1.
May 30 03:07:54 eclectic spamd[1332]: Attempt to free unreferenced scalar at /usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/Dns.pm line 724, <DCC> line 1.
May 30 02:51:17 eclectic spamd[2669]: Attempt to free unreferenced scalar at /usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/Dns.pm line 724, <DCC> line 1.




------- Additional Comments From felicity@kluge.net 2004-06-01 21:41 -------
ok, it happened again. no messages in syslog.

I've now reenabled dcc and disabled flock.



------- Additional Comments From felicity@kluge.net 2004-06-02 23:52 -------
ok...

disabled flock and enabled dcc, locked up again.

lsof reports that there are a large number of UDP sockets open on the locked up processes. and each
one has:

Jun 3 01:19:51 eclectic spamd[3892]: Attempt to free unreferenced scalar at /usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/Dns.pm line 724, <DCC> line 1.
Jun 3 00:22:36 eclectic spamd[5005]: Attempt to free unreferenced scalar at /usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/Dns.pm line 724, <DCC> line 1.
Jun 2 20:33:26 eclectic spamd[6349]: Attempt to free unreferenced scalar at /usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/Dns.pm line 724, <DCC> line 1.


gdb reports that the process is in __lll_mutex_lock_wait, and if I go pushing things in gdb, I get
Perl_sv_clear ... the problem seems to occur only after ~24 hours btw.

I'm now disabling dcc again and we'll see.



------- Additional Comments From felicity@kluge.net 2004-06-03 00:12 -------
I was also just noticing:

Jun 2 23:19:19 eclectic spamd[5005]: server hit by SIGCHLD, pid 1017
Jun 2 23:28:10 eclectic spamd[5005]: server hit by SIGCHLD, pid 1882
Jun 2 23:34:49 eclectic spamd[5005]: server hit by SIGCHLD, pid 2458
Jun 2 23:40:19 eclectic spamd[5005]: server hit by SIGCHLD, pid 2941
Jun 2 23:47:35 eclectic spamd[5005]: server hit by SIGCHLD, pid 3723
Jun 2 23:53:56 eclectic spamd[5005]: server hit by SIGCHLD, pid 4534
Jun 2 23:58:02 eclectic spamd[5005]: server hit by SIGCHLD, pid 5086
Jun 3 00:04:28 eclectic spamd[5005]: server hit by SIGCHLD, pid 5569
Jun 3 00:14:25 eclectic spamd[5005]: server hit by SIGCHLD, pid 6486
Jun 3 00:20:05 eclectic spamd[5005]: server hit by SIGCHLD, pid 7058
Jun 3 00:22:18 eclectic spamd[5005]: server hit by SIGCHLD, pid 7451

the spamd children ought not to get SIGCHLD from anywhere. they're not set up to deal with that.
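
To see at a glance which spamd processes are logging these, the lines can be tallied per receiving pid. This is only a sketch: sigchld_by_pid is a made-up helper name, and it assumes the syslog line layout shown above.

```sh
#!/bin/sh
# Count "server hit by SIGCHLD" syslog lines per receiving spamd pid.
# Reads syslog text on stdin and prints "<spamd pid> <count>" per pid.
# sigchld_by_pid is a hypothetical helper name, not part of spamd.
sigchld_by_pid() {
  # Keep only the pid inside "spamd[...]" on matching lines, then tally.
  sed -n 's/.*spamd\[\([0-9][0-9]*\)]: server hit by SIGCHLD.*/\1/p' |
    sort -n | uniq -c | awk '{ print $2, $1 }'
}
```

Fed the eleven lines above, it would report a single pid, 5005, with a count of 11.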



------- Additional Comments From felicity@kluge.net 2004-06-03 12:41 -------
Created an attachment (id=1998)
--> (http://bugzilla.spamassassin.org/attachment.cgi?id=1998&action=view)
current patch in testing

I don't know if this is related to the futex issue or not, I guess I'll find
out tomorrow (although I'm going to set max-conn-per-child to something lower
so this hopefully triggers faster if it's going to...)

I applied this patch to my installed spamd and will see how it works.
Basically it does 2 things:

1) don't bother making a backup copy of the configuration if
--max-conn-per-child == 1 -- this behavior is basically the "process a message
then die", so copying the configuration around is just wasteful.

2) when spawning children, reset the signal handlers to DEFAULT. what was
happening is that when initially spawned, each child had sighup set (due to the
parent) and everything else default, but when the child finally dies and gets
restarted, it would inherit the sigchld, sigterm, sigint, etc handlers from the
parent.
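
One way to check point 2 from outside the process, sketched under the assumption of a Linux /proc: the SigCgt field in /proc/PID/status is a hex bitmask of the signals a process has handlers installed for, with bit N-1 standing for signal N. The helper names sig_in_mask and caught_mask are made up, and the signal numbers mentioned here (HUP 1, INT 2, TERM 15, CHLD 17) are the Linux x86 values, so they would need checking on other platforms.

```sh
#!/bin/sh
# Succeed if signal number $2 is set in the hex bitmask $1, as found in the
# SigCgt (caught) or SigIgn (ignored) fields of /proc/PID/status on Linux.
# sig_in_mask is a hypothetical helper, not part of spamd.
sig_in_mask() {
  [ $(( (0x$1 >> ($2 - 1)) & 1 )) -eq 1 ]
}

# Print the caught-signal mask of the process whose pid is $1.
# caught_mask is likewise a hypothetical helper.
caught_mask() {
  awk '$1 == "SigCgt:" { print $2 }' "/proc/$1/status"
}
```

For a respawned child with pid 5005, something like 'sig_in_mask "$(caught_mask 5005)" 17 && echo "SIGCHLD handler installed"' would show whether the parent's SIGCHLD handler was inherited.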



------- Additional Comments From jm@jmason.org 2004-06-03 12:48 -------
ick! that "inheriting SIGCHLD" thing is bad, probably caused trouble, and may
even be related to this bug, then.

max-conn-per-child tweak: makes sense.

+1

(BTW, if you have to do a respin of this patch, I have a suggestion: move the
conf-copying code into a Mail::SA module, possibly Mail::SpamAssassin itself --
since it's possible other callers may find it useful, such as amavisd. and move
the sig-twiddling into 2 functions for clarity.)



------- Additional Comments From felicity@kluge.net 2004-06-03 13:23 -------
Subject: Re: spamd locks up randomly

On Thu, Jun 03, 2004 at 12:48:57PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> ick! that "inheriting SIGCHLD" thing is bad, probably caused trouble, and may
> even be related to this bug, then.

Yeah. I can't say for sure whether or not it is related, but I figured
I should fix it and see what happens.

> max-conn-per-child tweak: makes sense.
>
> +1

:)

> (BTW, if you have to do a respin of this patch, I have a suggestion: move the
> conf-copying code into a Mail::SA module, possibly Mail::SpamAssassin itself --
> since it's possible other callers may find it useful, such as amavisd. and move
> the sig-twiddling into 2 functions for clarity.)

The conf code already is. M::SA::copy_config() ... :)

The sig twiddling stuff is related specifically to the daemon, so I'm
not worried about making those "generic".





------- Additional Comments From sidney@sidney.com 2004-06-03 14:23 -------
I don't have time to check over the patch carefully right now, but make sure it
doesn't undo the last fix I checked in by moving the setting of the signal
handlers to before the pid file and the log message with the pid are written
out. The signal handlers must be in place before the pid is visible to waiting
processes.




------- Additional Comments From felicity@kluge.net 2004-06-03 14:43 -------
Subject: Re: spamd locks up randomly

On Thu, Jun 03, 2004 at 02:23:36PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> I don't have time to check over the patch carefully right now, but make sure it

'tis ok, I put the patch up more so it would exist somewhere rather than
looking for review. I want to see if there's anything else that needs
poking before it becomes "for review".

> doesn't undo the last fix I checked in by moving the setting of the signal
> handlers to before the pid file and the log message with the pid are written
> out. The signal handlers must be in place before the pid is visible to waiting
> processes.

yeah, I was trying to find out the ticket/issue to make sure I didn't
undo that. mss pointed me at you/3443.





felicity@kluge.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1998 is|0                           |1
           obsolete|                            |



------- Additional Comments From felicity@kluge.net 2004-06-03 22:32 -------
Created an attachment (id=1999)
--> (http://bugzilla.spamassassin.org/attachment.cgi?id=1999&action=view)
patch for review

running with this patch, I haven't had any problem all day, even after changing
the max-connections per child to 20 and letting the children "rotate" a few
times.

so please review this one, it's basically the same as the last except I changed
some comments, etc. I'm running spamd with flock and dcc, and the default
connections per child now, and will watch to make sure things look ok going
forward.



------- Additional Comments From jm@jmason.org 2004-06-04 18:20 -------
+1



------- Additional Comments From felicity@kluge.net 2004-06-04 20:12 -------
as far as I can tell, spamd's still working fine with this patch -- all 8 children have been respawned, no
futex locks, no M::SA::Dns issues... fyi. :)



------- Additional Comments From jm@jmason.org 2004-06-04 20:15 -------
let's hope a few more people +1 it so I can start dogfooding on *my* server,
hint hint ;)

BTW, what do you think about starting mass-checks soon?



------- Additional Comments From felicity@kluge.net 2004-06-04 20:26 -------
Subject: Re: spamd locks up randomly

On Fri, Jun 04, 2004 at 08:15:59PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> let's hope a few more people +1 it so I can start dogfooding on *my* server,
> hint hint ;)

you can't run "patch" ? ;)

> BTW, what do you think about starting mass-checks soon?

we should get pre1 out first imho. the rest of the tickets can be done
pretty quickly, I think, but we need to announce mass-checks so people
can prep.





------- Additional Comments From spamassassin-contrib@msquadrat.de 2004-06-05 01:51 -------
+1 on 1999 -- I still find it weird that signal handlers jump into
threading-related routines on a Perl without any threading, but the patch is
definitely the right thing to do. And if it helps, all the better :)



felicity@kluge.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



------- Additional Comments From felicity@kluge.net 2004-06-05 12:08 -------
ok, I've committed the patch. the children are still running strong for me, so hopefully that's it. :)
r20839


