Mailing List Archive

rsyslog segfaults?
Hi all,

may I ask if those of you that experienced the segfault problem could
re-produce it with 1.19.6? Any feedback would be deeply appreciated.

Thanks,
Rainer
rsyslog segfaults? [ In reply to ]
Patrick,

without answering the question directly: could you please try out

http://download.rsyslog.com/rsyslog/rsyslog-1.19.7.tar.gz

[**still not "real" 1.19.7 **]

I've possibly seen one thing that could be the problem cause. Not sure, though. I'd deeply appreciate feedback if this version (uploaded right NOW) causes the problem to disappear. I am not sure if I have found it, but I'd like to conduct a quick test before going any further. Thus I've no detailed analysis.

Thanks,
Rainer

> -----Original Message-----
> From: Patrick von der Hagen [mailto:hagen at rz.uni-karlsruhe.de]
> Sent: Monday, September 17, 2007 5:48 PM
> To: Rainer Gerhards
> Cc: theinric
> Subject: Re: [rsyslog] rsyslog segfaults?
>
> On Mon, 2007-09-17 at 14:51 +0200, Rainer Gerhards wrote:
> > Hi all,
> >
> > may I ask if those of you that experienced the segfault problem could
> > re-produce it with 1.19.6? Any feedback would be deeply appreciated.
> Still dumps core, haven't tried it single-threaded yet.
>
> I attach some gdb information I got when examining a core-dump.
>
> I'm no programmer and don't even understand half of it, but I'm
> troubled by ' ip = "129.13.185.81\000\000
> $a196a97a at akstcacsdatarecoverymnsdgs>,bayes=1.000000,autolearn=spam
> \00090$88ab8c55 at 082d6f5a>,bayes=1.000000,autolearn=spam \000n=spam
> \000id=<01c7f6df$57aeec90$6a49cd18 at 0aconingham>,bayes=1.000"'.
>
> It looks like several log-lines merged into one, which might hint at
> threading-issues.
>
>
>
> [root@mail11 ~]# cd /opt/rsyslog
> [root@mail11 rsyslog]# ls
> core.14004 lib sbin share
> [root@mail11 rsyslog]# gdb -c core.14004 /opt/rsyslog/sbin/rsyslogd
> GNU gdb Red Hat Linux (6.5-16.el5rh)
> Copyright (C) 2006 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host
> libthread_db library "/lib64/libthread_db.so.1".
>
> Reading symbols from /usr/lib64/libz.so.1...done.
> Loaded symbols for /usr/lib64/libz.so.1
> Reading symbols from /lib64/libpthread.so.0...done.
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libdl.so.2...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/librt.so.1...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libc.so.6...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...done.
> Loaded symbols for /lib64/libnss_files.so.2
> Reading symbols from /lib64/libnss_dns.so.2...done.
> Loaded symbols for /lib64/libnss_dns.so.2
> Reading symbols from /lib64/libresolv.so.2...done.
> Loaded symbols for /lib64/libresolv.so.2
> Reading symbols from /lib64/libgcc_s.so.1...done.
> Loaded symbols for /lib64/libgcc_s.so.1
> Core was generated by `/opt/rsyslog/sbin/rsyslogd -r514'.
> Program terminated with signal 6, Aborted.
> #0 0x0000003627630015 in raise () from /lib64/libc.so.6
> (gdb) backtrace
> #0 0x0000003627630015 in raise () from /lib64/libc.so.6
> #1 0x0000003627631980 in abort () from /lib64/libc.so.6
> #2 0x00000036276674db in __libc_message () from /lib64/libc.so.6
> #3 0x000000362766cb43 in malloc_consolidate () from /lib64/libc.so.6
> #4 0x000000362766eea2 in _int_malloc () from /lib64/libc.so.6
> #5 0x00000036276706dd in malloc () from /lib64/libc.so.6
> #6 0x000000362765eb4a in __fopen_internal () from /lib64/libc.so.6
> #7 0x00002aaaaaaf545a in internal_setent ()
> from /lib64/libnss_files.so.2
> #8 0x00002aaaaaaf5b47 in _nss_files_gethostbyaddr_r ()
> from /lib64/libnss_files.so.2
> #9 0x00000036276e2b42 in gethostbyaddr_r@@GLIBC_2.2.5 ()
> from /lib64/libc.so.6
> #10 0x00000036276eb07d in getnameinfo () from /lib64/libc.so.6
> #11 0x00000000004155ee in cvthname (f=0x7ffff040b0d0,
> pszHost=0x7ffff040acc0 "mailin3",
> pszHostFQDN=0x7ffff040a8b0 "mailin3.rz.uni-karlsruhe.de") at
> net.c:137
> #12 0x000000000040e49d in processSelectAfter (maxfds=5, nfds=1,
> pReadfds=0x7ffff040ba60, pWritefds=0x7ffff040b9c0) at
> syslogd.c:5619
> #13 0x000000000040ee02 in mainloop () at syslogd.c:5869
> #14 0x000000000040fc9c in main (argc=0, argv=0x7ffff040bd08) at
> syslogd.c:6315
> (gdb) where full
> #0 0x0000003627630015 in raise () from /lib64/libc.so.6
> No symbol table info available.
> #1 0x0000003627631980 in abort () from /lib64/libc.so.6
> No symbol table info available.
> #2 0x00000036276674db in __libc_message () from /lib64/libc.so.6
> No symbol table info available.
> #3 0x000000362766cb43 in malloc_consolidate () from /lib64/libc.so.6
> No symbol table info available.
> #4 0x000000362766eea2 in _int_malloc () from /lib64/libc.so.6
> No symbol table info available.
> #5 0x00000036276706dd in malloc () from /lib64/libc.so.6
> No symbol table info available.
> #6 0x000000362765eb4a in __fopen_internal () from /lib64/libc.so.6
> No symbol table info available.
> #7 0x00002aaaaaaf545a in internal_setent ()
> from /lib64/libnss_files.so.2
> No symbol table info available.
> #8 0x00002aaaaaaf5b47 in _nss_files_gethostbyaddr_r ()
> from /lib64/libnss_files.so.2
> No symbol table info available.
> #9 0x00000036276e2b42 in gethostbyaddr_r@@GLIBC_2.2.5 ()
> from /lib64/libc.so.6
> No symbol table info available.
> #10 0x00000036276eb07d in getnameinfo () from /lib64/libc.so.6
> No symbol table info available.
> #11 0x00000000004155ee in cvthname (f=0x7ffff040b0d0,
> pszHost=0x7ffff040acc0 "mailin3",
> pszHostFQDN=0x7ffff040a8b0 "mailin3.rz.uni-karlsruhe.de") at
> net.c:137
> p = (uchar *) 0x7ffff040acc7 ""
> count = -264195888
> error = 0
> omask = {__val = {0, 4251742, 0, 140737224155248,
> 140737224155240,
> 14083552, 0, 18446735427676189008, 140737224155264, 4336648,
> 140737224155264, 232601472906, 0, 64, 140737224159584, 2047}}
> nmask = {__val = {1, 0 <repeats 15 times>}}
> ip = "129.13.185.81\000\000
> $a196a97a at akstcacsdatarecoverymnsdgs>,bayes=1.000000,autolearn=spam
> \00090$88ab8c55 at 082d6f5a>,bayes=1.000000,autolearn=spam \000n=spam
> \000id=<01c7f6df$57aeec90$6a49cd18 at 0aconingham>,bayes=1.000"...
> hints = {ai_flags = 4, ai_family = 0, ai_socktype = 2,
> ai_protocol = 0, ai_addrlen = 0, ai_addr = 0x0, ai_canonname = 0x0,
> ai_next = 0x0}
> res = (struct addrinfo *) 0x2e302e3732313d72
> __PRETTY_FUNCTION__ = "cvthname"
> #12 0x000000000040e49d in processSelectAfter (maxfds=5, nfds=1,
> pReadfds=0x7ffff040ba60, pWritefds=0x7ffff040b9c0) at
> syslogd.c:5619
> iRet = RS_RET_OK
> iRetLL = RS_RET_OK
> i = 1
> fd = 0
> line = "<22>exim[32680]: 2007-09-14 17:02:19 1IWCgS-0008Tb-B5
> Completed\n", '\0' <repeats 1984 times>
> writeFDSInfo = {pWritefds = 0x7ffff040b9c0, pMaxfds =
> 0x7ffff040a0a8}
> f = (selector_t *) 0x0
> frominet = {ss_family = 2, __ss_align = 0,
> __ss_padding = "@\023\000??*\000\000@\023\000??*\000\000@\023\000??*
> \000\000B\023\000??*\000\000O\023\000??*\000\000@\023\000??*\000\000O
> \023\000??*", '\0' <repeats 42 times>, "\017\201B\000\000\000\000\000?
> \230\aF\000\000\000"}
> socklen = 16
> fromHost = "mailin3\000rz.uni-karlsruhe.de", '\0' <repeats 125
> times>, "\210_ '6", '\0' <repeats 27 times>, "?? '6", '\0' <repeats 11
> times>, "??A'6\000\000\000o\b?'6\000\000\000\220{ '6\000\000\000\000
> \000`'6\000\000\000\000 at t'6\000\000\000\220:t'6\000\000\000\220:t'6",
> '\0' <repeats 11 times>, "\005\000\000\000\000\000\000\000\000@\224'6",
> '\0' <repeats 11 times>, "\001\000\000\000\000\000\000\000\230?\224'6
> \000\000\000\000@\024\000\000\000\000\000\003\000\000\000\000\000\000
> \000\000\000 -6\000\000\000\000p -6\000\000"...
> fromHostFQDN = "mailin3.rz.uni-karlsruhe.de", '\0' <repeats 245
> times>, "\001\000\000\000\000\000\000\000?@??\177\000\000?*q'6", '\0'
> <repeats 19 times>, "??e'6", '\0' <repeats 27 times>, "??@??\177\000
> \000/\201B\000\000\000\000\000/\201B", '\0' <repeats 13 times>,
> "?4d'6",
> '\0' <repeats 11 times>, "\200?@??\177\000\000\000\000\000\000\000\000
> \000\000??@??\177\000\0000?@??\177\000\000*\201B\000\000\000\000\000?@?
> ?
> \177", '\0' <repeats 34 times>, "????????\000\000\000
> \000\002\000\000\000*"...
> iTCPSess = 0
> l = 64
> #13 0x000000000040ee02 in mainloop () at syslogd.c:5869
> readfds = {fds_bits = {32, 0 <repeats 15 times>}}
> i = 2
> maxfds = 5
> nfds = 1
> writeFDSInfo = {pWritefds = 0x7ffff040b9c0, pMaxfds =
> 0x7ffff040ba5c}
> writefds = {fds_bits = {0 <repeats 16 times>}}
> f = (selector_t *) 0x0
> iTCPSess = 14003
> #14 0x000000000040fc9c in main (argc=0, argv=0x7ffff040bd08) at
> syslogd.c:6315
> i = 1024
> p = 0x63053a ""
> num_fds = 1024
> iRet = RS_RET_OK
> ppid = 14003
> ch = -1
> hent = (struct hostent *) 0x362794bf80
> pTmp = (uchar *) 0x62fe95 ""
> sigAct = {__sigaction_handler = {sa_handler = 0x1,
> sa_sigaction = 0x1}, sa_mask = {__val = {0 <repeats 16 times>}},
> sa_flags = 0, sa_restorer = 0}
> (gdb)
>
>
> >
> > Thanks,
> > Rainer
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> --
> Patrick von der Hagen
> RZ Universität Karlsruhe (TH)
> Postmaster
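
A note on the `ip` buffer shown in the dump above: by default gdb prints a fixed-size char array in full, including whatever bytes follow the first NUL terminator (the gdb setting `set print null-stop on` changes that). The fragments of older log lines after "129.13.185.81\000" might therefore be leftover stack contents rather than merged messages. This is one possible reading, not a diagnosis. The Python sketch below only simulates the effect; the buffer size and the helper names are hypothetical and not taken from rsyslog's net.c.

```python
# Simulate a fixed-size C char array that still holds data from an earlier use.
BUF_SIZE = 64  # hypothetical size, NOT the real size used in rsyslog


def write_c_string(buf: bytearray, text: bytes) -> None:
    """Copy text plus a terminating NUL into buf, as strcpy() would.

    Bytes after the terminator are left untouched.
    """
    if len(text) + 1 > len(buf):
        raise ValueError("text does not fit into the buffer")
    buf[:len(text)] = text
    buf[len(text)] = 0


def c_string_view(buf: bytearray) -> bytes:
    """What C string functions see: everything up to the first NUL."""
    end = buf.find(b"\x00")
    return bytes(buf) if end < 0 else bytes(buf[:end])


def raw_view(buf: bytearray) -> bytes:
    """What a debugger sees when it dumps the whole array."""
    return bytes(buf)


# The buffer starts out holding stale data from some earlier log line.
stale = b"bayes=1.000000,autolearn=spam "
buf = bytearray(stale.ljust(BUF_SIZE, b"\x00"))

# A shorter string is then written over the beginning of the buffer.
write_c_string(buf, b"129.13.185.81")

print(c_string_view(buf))  # only the address
print(raw_view(buf))       # address, NUL, then the stale tail
```

If this reading applies, the string seen by the program is intact and the trailing bytes are harmless by themselves; it would not explain the abort inside malloc().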
rsyslog segfaults? [ In reply to ]
Excellent, thx

> -----Original Message-----
> From: Patrick von der Hagen [mailto:hagen at rz.uni-karlsruhe.de]
> Sent: Monday, September 17, 2007 6:27 PM
> To: Rainer Gerhards
> Cc: rsyslog-users
> Subject: RE: [rsyslog] rsyslog segfaults?
>
> On Mon, 2007-09-17 at 18:21 +0200, Rainer Gerhards wrote:
> > Patrick,
> >
> > without answering the question directly: could you please try out
> >
> > http://download.rsyslog.com/rsyslog/rsyslog-1.19.7.tar.gz
> >
> > [**still not "real" 1.19.7 **]
> >
> > I've possibly seen one thing that could be the problem cause. Not
> > sure, though. I'd deeply appreciate feedback if this version (uploaded
> > right NOW) causes the problem to disappear. I am not sure if I have
> > found it, but I'd like to conduct a quick test before going any
> > further. Thus I've no detailed analysis.
> I'm currently giving it a try, but don't expect any results today.
> --
> Patrick von der Hagen
> RZ Universität Karlsruhe (TH)
> Postmaster
rsyslog segfaults? [ In reply to ]
On Mon, Sep 17, 2007 at 02:51:10PM +0200, Rainer Gerhards wrote:
> Hi all,
>
> may I ask if those of you that experienced the segfault problem could
> re-produce it with 1.19.6? Any feedback would be deeply appreciated.

The box is in production, so I can't do too much debugging. If it
may help, I can recompile rsyslogd with debug flags set and run
gdb against it.

root@apollo# rsyslogd -v
/usr/home/mob/www
rsyslogd 1.19.6, compiled with:
FEATURE_PTHREADS (dual-threading): Yes
FEATURE_REGEXP: Yes
FEATURE_DB (MySQL): Yes
FEATURE_LARGEFILE: Yes
FEATURE_NETZIP (message compression): Yes
SYSLOG_INET (Internet/remote support): Yes

sat@apollo% gdb =rsyslogd rsyslogd.core
<...>
#0 0x000000080078189c in pthread_testcancel () from
/lib/libpthread.so.2
[New Thread 0x537000 (LWP 100117)]
[New Thread 0x52e400 (LWP 100112)]
[New Thread 0x52e000 (runnable)]
(gdb) bt
#0 0x000000080078189c in pthread_testcancel () from
/lib/libpthread.so.2
#1 0x000000080076f5c3 in sigaction () from /lib/libpthread.so.2
#2 0x00000008007710e2 in sigaction () from /lib/libpthread.so.2
#3 0x000000080076adb6 in pthread_kill () from
/lib/libpthread.so.2
#4 0x000000080076a633 in raise () from /lib/libpthread.so.2
#5 0x000000080095b16d in abort () from /lib/libc.so.6
#6 0x00000008008f4b45 in _UTF8_init () from /lib/libc.so.6
#7 0x00000008008f4b7c in _UTF8_init () from /lib/libc.so.6
#8 0x00000008008f5b1d in _UTF8_init () from /lib/libc.so.6
#9 0x000000000040fa1d in MsgDestruct ()
#10 0x00000000004066bd in dbgprintf ()
#11 0x00000000004176b1 in llExecFunc ()
#12 0x0000000000406f41 in shouldProcessThisMessage ()
#13 0x0000000000409a9e in logerrorSz ()
#14 0x0000000800772a99 in pthread_create () from
/lib/libpthread.so.2
#15 0x00000008008cfcd4 in makecontext () from /lib/libc.so.6
#16 0x0000000000000000 in ?? ()
#17 0x0000000000537000 in ?? ()
#18 0x00000000004099c0 in logerrorSz ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
Cannot access memory at address 0x7fffff5fc000
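
When comparing several backtraces like the one above, it can help to extract the frames mechanically and look for the first frame that belongs to the program itself rather than to a shared library. The following Python sketch does that for gdb's plain "bt" output. It is only an illustration: the regular expression covers the frame layouts seen in this thread, and treating a frame without a "from <library>" part as application code is an assumption. Without debug symbols the function names gdb reports may also be inexact.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Excerpt of the backtrace above, including one mail-wrapped frame.
SAMPLE_BT = """\
#3 0x000000080076adb6 in pthread_kill () from
/lib/libpthread.so.2
#4 0x000000080076a633 in raise () from /lib/libpthread.so.2
#5 0x000000080095b16d in abort () from /lib/libc.so.6
#6 0x00000008008f4b45 in _UTF8_init () from /lib/libc.so.6
#7 0x00000008008f4b7c in _UTF8_init () from /lib/libc.so.6
#8 0x00000008008f5b1d in _UTF8_init () from /lib/libc.so.6
#9 0x000000000040fa1d in MsgDestruct ()
#10 0x00000000004066bd in dbgprintf ()
"""

# "#N [0xADDR in] FUNC (ARGS) [from LIB | at FILE:LINE]"
FRAME_RE = re.compile(
    r"#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s+\(.*?\)"
    r"(?:\s+from\s+(\S+)|\s+at\s+(\S+))?"
)


@dataclass
class Frame:
    number: int
    function: str
    library: Optional[str]  # set when gdb printed "from <library>"
    source: Optional[str]   # set when gdb printed "at <file>:<line>"


def parse_frames(backtrace: str) -> List[Frame]:
    """Parse gdb 'bt' output into Frame objects, tolerating wrapped lines."""
    # Collapse all whitespace so frames split across lines become one unit.
    text = " ".join(backtrace.split())
    return [
        Frame(int(m.group(1)), m.group(2), m.group(3), m.group(4))
        for m in FRAME_RE.finditer(text)
    ]


def first_app_frame(frames: List[Frame]) -> Optional[Frame]:
    """Return the innermost frame that is not attributed to a shared library."""
    for frame in frames:
        if frame.library is None and frame.function != "??":
            return frame
    return None


frames = parse_frames(SAMPLE_BT)
culprit = first_app_frame(frames)
print(culprit)
```

On this excerpt the first such frame is #9 (MsgDestruct), the point where control passes from rsyslogd into libc before the abort.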
rsyslog segfaults? [ In reply to ]
Andrew,

this is quite helpful. While it does not pinpoint the actual source of
the trouble, it shows a different abort location. All previous segfaults
pointed to one other location (which I have now checked multiple times
without finding any notable problems, just cosmetic things). This
strengthens the suspicion that the problem is related to threading,
which of course makes it hard to troubleshoot.

I'll see how to proceed from here. Again, if some of you who experience
the problem could run rsyslogd compiled for single threading, that would
be most helpful.

Rainer

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of Andrew Pantyukhin
> Sent: Wednesday, September 19, 2007 10:55 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] rsyslog segfaults?