Mailing List Archive

rsyslog still crashes
I've just tried again rsyslog on my 8 core mail server, and got the very
same crash from september/october. I've restarted the server under
valgrind control, and all seems to be running well...

A good 2009 to all!

Yours,

lorenzo



+-------------------------+----------------------------------------------+
| Lorenzo M. Catucci | Centro di Calcolo e Documentazione |
| catucci@ccd.uniroma2.it | Università degli Studi di Roma "Tor Vergata" |
| | Via O. Raimondo 18 ** I-00173 ROMA ** ITALY |
| Tel. +39 06 7259 2255 | Fax. +39 06 7259 2125 |
+-------------------------+----------------------------------------------+
Re: rsyslog still crashes
On Thu, Jan 15, 2009 at 12:58 PM, Lorenzo M. Catucci
<lorenzo@sancho.ccd.uniroma2.it> wrote:
> I've just tried again rsyslog on my 8 core mail server, and got the very
> same crash from september/october. I've restarted the server under valgrind
> control, and all seems to be running well...
>
> A good 2009 to all!
>
> Yours,
>
> lorenzo


Version you're using?

-HKS
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Re: rsyslog still crashes
On Fri, 2009-01-16 at 01:20 +0100, Michael Biebl wrote:
> Given the -c4 command line argument, I'd expect it to be 4.1.3.
>
> Sounds similar to
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=509292 (which is
> 3.18.6).
>
> It seems to be a more general problem with multi core (= very fast??) systems.

Yes, that is what my analysis so far points to. It's also part of the
problem, because I do not have very fast hardware to reproduce the issue
(and it is also not easy to reliably reproduce if you have...).

I've gotten a couple of reports (I think most on the mailing list) on
such problems and what they all have in common is 4+ core machines.

I'll try to get hold of it based on what Lorenzo submits. In his environment,
the problem seems to occur most reliably (he probably has the fastest
machine...).

Lorenzo: details follow soon.

Rainer

Re: rsyslog still crashes
On Thu, 2009-01-15 at 18:58 +0100, Lorenzo M. Catucci wrote:
> I've just tried again rsyslog on my 8 core mail server, and got the very
> same crash from september/october.

So, without valgrind, can you reproduce the issue each time you start
it? That would be very useful.

> I've restarted the server under
> valgrind control, and all seems to be running well...

I guess the issue here is that valgrind slows down things and also
simulates (I think) 2 CPUs only.

> A good 2009 to all!
same to you! Thanks for being persistent with this issue (it begins to
drive me crazy).

From what I have learned so far we seem to have a race condition that
causes memory corruption. The backtrace you included also points in that
direction. Those few cases where I got a usable backtrace all point to
the very same location. However, that does not mean this location has
the bug. It seems to occur some time earlier, and manifests when the
message is destructed. It could be a double-free or even some wild
memory access that accidentally overwrites some structures.
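
One classic way a race ends in a double-free at destruction time is an unsynchronised reference count. The following is a minimal sketch of that general pattern and of a safe variant using C11 atomics. It is not rsyslog's message code, and nothing in this thread confirms that a reference count is the culprit; `demo_msg_t`, `demo_msg_new`, `demo_msg_release` and `demo_destroy_count` are hypothetical names chosen for illustration.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted message object. */
typedef struct {
    atomic_int refcount;   /* number of owners still holding the object */
    char payload[64];
} demo_msg_t;

/* Counts destructions so a caller can check the object died exactly once. */
static atomic_int demo_destroy_count;

demo_msg_t *demo_msg_new(int owners)
{
    demo_msg_t *m = calloc(1, sizeof *m);
    if (m != NULL)
        atomic_init(&m->refcount, owners);
    return m;
}

/*
 * Drop one reference. atomic_fetch_sub returns the value held *before*
 * the decrement, so exactly one caller observes 1 and frees the object.
 *
 * With a plain `m->refcount--; if (m->refcount == 0) free(m);`, two
 * threads can both decrement before either of them checks, both see 0,
 * and both call free(): a double-free that only shows up at destruction.
 */
void demo_msg_release(demo_msg_t *m)
{
    if (atomic_fetch_sub(&m->refcount, 1) == 1) {
        atomic_fetch_add(&demo_destroy_count, 1);
        free(m);
    }
}
```

Each owner calls `demo_msg_release()` once when it is done with the object; the memory is freed by whichever owner happens to be last.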

If we are able to get a stable repro, and we are able to run with at
least some minimal diagnostics, we may be much better off tackling that
beast.

First step is to see that we get a stable repro. If we do, I need to
think about minimal debug. The full debugging system makes the bug
disappear, I think because it changes the timing.

Rainer

Re: rsyslog still crashes
2009/1/15 (private) HKS <hks.private@gmail.com>:
> On Thu, Jan 15, 2009 at 12:58 PM, Lorenzo M. Catucci
> <lorenzo@sancho.ccd.uniroma2.it> wrote:
>> I've just tried again rsyslog on my 8 core mail server, and got the very
>> same crash from september/october. I've restarted the server under valgrind
>> control, and all seems to be running well...
>>
>> A good 2009 to all!
>>
>> Yours,
>>
>> lorenzo
>
>
> Version you're using?

Given the -c4 command line argument, I'd expect it to be 4.1.3.

Sounds similar to
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=509292 (which is
3.18.6).

It seems to be a more general problem with multi core (= very fast??) systems.


Cheers,
Michael

--
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?
Re: rsyslog still crashes
On Thu, 15 Jan 2009, (private) HKS wrote:

pH>
pH> Version you're using?
pH>

git origin/master branch as of today. Sorry for forgetting to mention!


Re: rsyslog still crashes
On Thu, 15 Jan 2009, Rainer Gerhards wrote:

RG> On Thu, 2009-01-15 at 18:58 +0100, Lorenzo M. Catucci wrote:
RG> > I've just tried again rsyslog on my 8 core mail server, and got the very
RG> > same crash from september/october.
RG>
RG> So, without valgrind, can you reproduce the issue each time you start
RG> it? That would be very useful.
RG>

Yes: any time I start a free-running instance, I get the very same
segmentation fault and core-file to backtrace.

RG>
RG> > I've restarted the server under
RG> > valgrind control, and all seems to be running well...
RG>
RG> I guess the issue here is that valgrind slows down things and also
RG> simulates (I think) 2 CPUs only.
RG>

Right, I didn't know valgrind limited both the CPU bandwidth and the
(v)CPU number, but either of them would hide the existing race condition.

RG>
RG> From what I have learned so far we seem to have a race condition that
RG> causes memory corruption. The backtrace you included also points in that
RG> direction. Those few cases where I got a usable backtrace all point to
RG> the very same location. However, that does not mean this location has
RG> the bug. It seems to occur some time earlier, and manifests when the
RG> message is destructed. It could be a double-free or even some wild
RG> memory access that accidentally overwrites some structures.
RG>
RG> If we are able to get a stable repro, and we are able to run with at
RG> least some minimal diagnostics, we may be much better off tackling that
RG> beast.
RG>
RG> First step is to see that we get a stable repro. If we do, I need to
RG> think about minimal debug. The full debugging system makes the bug
RG> disappear, I think because it changes the timing.
RG>

I don't think we could hope for a stable reproducer for a heisenbug...
all I can provide is a very high throughput system generating a very high
local message rate. As a matter of fact, this rsyslog instance is
acting as a forwarder to a remote instance that didn't suffer any crash.

The only differences between the engines' configurations are:
1. the remote logs to a postgres instance instead of spool files,
2. the remote runs only the postgresql instance and the logger.

My gut feeling is that the different behaviour doesn't come from any of
these differences, but from the different memory path taken by the
messages, which in the remote case are serialised from the underlying
network transport.

We'll see! Yours,

lorenzo



Re: rsyslog still crashes
> -----Original Message-----
> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
> bounces@lists.adiscon.com] On Behalf Of Lorenzo M. Catucci
> Sent: Friday, January 16, 2009 12:29 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] rsyslog still crashes
>
> On Thu, 15 Jan 2009, Rainer Gerhards wrote:
>
> RG> On Thu, 2009-01-15 at 18:58 +0100, Lorenzo M. Catucci wrote:
> RG> > I've just tried again rsyslog on my 8 core mail server, and got the very
> RG> > same crash from september/october.
> RG>
> RG> So, without valgrind, can you reproduce the issue each time you start
> RG> it? That would be very useful.
> RG>
>
> Yes: any time I start a free-running instance, I get the very same
> segmentation fault and core-file to backtrace.
>
> RG>
> RG> > I've restarted the server under
> RG> > valgrind control, and all seems to be running well...
> RG>
> RG> I guess the issue here is that valgrind slows down things and also
> RG> simulates (I think) 2 CPUs only.
> RG>
>
> Right, I didn't know valgrind limited both the CPU bandwidth and the
> (v)CPU number, but either of them would hide the existing race condition.

Actually, valgrind executes the app in a virtual CPU/memory environment.
So this is *quite different* from the real machine, but nevertheless
extremely useful in most cases. While in theory the actual hardware
should not affect the valgrind outcome, my former debugging has shown it
does. Thus my first try is always valgrind. But it seems not to help
here as we have seen...

> RG>
> RG> From what I have learned so far we seem to have a race condition that
> RG> causes memory corruption. The backtrace you included also points in that
> RG> direction. Those few cases where I got a usable backtrace all point to
> RG> the very same location. However, that does not mean this location has
> RG> the bug. It seems to occur some time earlier, and manifests when the
> RG> message is destructed. It could be a double-free or even some wild
> RG> memory access that accidentally overwrites some structures.
> RG>
> RG> If we are able to get a stable repro, and we are able to run with at
> RG> least some minimal diagnostics, we may be much better off tackling that
> RG> beast.
> RG>
> RG> First step is to see that we get a stable repro. If we do, I need to
> RG> think about minimal debug. The full debugging system makes the bug
> RG> disappear, I think because it changes the timing.
> RG>
>
> I don't think we could hope for a stable reproducer for a heisenbug...

Of course not 100%. But what you have sounds good enough. I must now see
that/how I can change the system so that we have some additional
instrumentation while the bug is still there. I'll first look at some
compile options. Is it OK for you if I just send some messages to
stdout?

> all I can provide is a very high throughput system generating a very high
> local message rate. As a matter of fact, this rsyslog instance is
> acting as a forwarder to a remote instance that didn't suffer any crash.
>
> The only differences between the engines' configurations are:
> 1. the remote logs to a postgres instance instead of spool files,
> 2. the remote runs only the postgresql instance and the logger.
>
> My gut feeling is that the different behaviour doesn't come from any of
> these differences, but from the different memory path taken by the
> messages, which in the remote case are serialised from the underlying
> network transport.

This may be...

Rainer
Re: rsyslog still crashes
On Fri, 16 Jan 2009, Rainer Gerhards wrote:
RG>
RG> Of course not 100%. But what you have sounds good enough. I must now see
RG> that/how I can change the system so that we have some additional
RG> instrumentation while the bug is still there. I'll first look at some
RG> compile options. Is it OK for you if I just send some messages to
RG> stdout?
RG>

Yes, be it stdout... I'm eager to have an rsyslog instance running well,
since I've really liked what I've seen (with the small exception of the
crashes!)

See you soon,

lorenzo


Re: rsyslog still crashes
Lorenzo,

I have created a new branch "raceDebug" and done a first commit to it. The change is very lightweight. Please pull, compile as usual and give it a try. It spits out some info to stdout from time to time (hopefully). I am not sure if it aborts, depending on the output it may or may not. Even if we get messages, they are probably not enough to pinpoint the bug, but I wanted to do something very light to see if the bug stays.

Feedback appreciated.

Rainer

Re: rsyslog still crashes
On Fri, 16 Jan 2009, Rainer Gerhards wrote:

RG> Lorenzo,
RG>
RG> I have created a new branch "raceDebug" and done a first commit to it.
RG> The change is very lightweight. Please pull, compile as usual and give
RG> it a try. It spits out some info to stdout from time to time
RG> (hopefully). I am not sure if it aborts, depending on the output it
RG> may or may not. Even if we get messages, they are probably not enough
RG> to pinpoint the bug, but I wanted to do something very light to see if
RG> the bug stays.
RG>
RG> Feedback appreciated.
RG>

Rainer, I've just checked-out the branch; I've run configure with the
following command line:

./configure --prefix=/usr --enable-mysql --enable-pgsql --enable-mail \
    --enable-imfile --enable-debug --enable-rtinst --enable-valgrind \
    --no-create --no-recursion

From "git diff -r HEAD^ HEAD" I've seen an #if 0 section in the commit.
Let me know if you'd prefer if I change it to #if 1.

I've just started rsyslogd with rsyslogd -c4 -n on a screen session, with
the same configuration files I've been using since September.

Since both the "rsyslogd -c4 -n" and the later "rsyslogd -c4 -d"
invocations crashed very quickly, I've restarted it once more with stdout
redirected to a logfile, and now it's running. Will let you know if it
crashes once more.

Yours,

lorenzo


Re: rsyslog still crashes
> -----Original Message-----
> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
> bounces@lists.adiscon.com] On Behalf Of Lorenzo M. Catucci
> Sent: Friday, January 16, 2009 4:23 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] rsyslog still crashes
>
> On Fri, 16 Jan 2009, Rainer Gerhards wrote:
>
> RG> Lorenzo,
> RG>
> RG> I have created a new branch "raceDebug" and done a first commit to it.
> RG> The change is very lightweight. Please pull, compile as usual and give
> RG> it a try. It spits out some info to stdout from time to time
> RG> (hopefully). I am not sure if it aborts, depending on the output it
> RG> may or may not. Even if we get messages, they are probably not enough
> RG> to pinpoint the bug, but I wanted to do something very light to see if
> RG> the bug stays.
> RG>
> RG> Feedback appreciated.
> RG>
>
> Rainer, I've just checked-out the branch; I've run configure with the
> following command line:
>
> ./configure --prefix=/usr --enable-mysql --enable-pgsql --enable-mail
> --enable-imfile --enable-debug --enable-rtinst --enable-valgrind
> --no-create --no-recursion
>
> From "git diff -r HEAD^ HEAD" I've seen an #if 0 section in the commit.
> Let me know if you'd prefer if I change it to #if 1.

Mmmhh... you can use debug. Yes, please then change it to 1.
>
> I've just started rsyslogd with rsyslogd -c4 -n on a screen session, with
> the same configuration files I've been using since September.
>
> Since both the "rsyslogd -c4 -n" and the later "rsyslogd -c4 -d"
> invocations crashed very quickly, I've restarted it once more with stdout
> redirected to a logfile, and now it's running. Will let you know if it
> crashes once more.

That sounds good. Do you happen to have the output from those crashes? Anyway, I will be interested in what it now comes up with. As a side-note, I have introduced another race by calling the library functions. There is always some good and bad. The regular debugging system prevents this problem by protecting the writes with mutexes. That, however, affects the timing and thus we do not see the real issue. So what I have done is bad, but may be useful. I forgot to mention that with my last post...
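
The trade-off described here can be sketched as follows. This is a minimal illustration under assumptions, not rsyslog's debug system: `demo_dbg_locked` and `demo_dbg_unlocked` are hypothetical names. The first variant serialises all callers on one mutex, which keeps output tidy but makes threads wait on each other and so changes timing; the second adds no lock of its own.

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

static pthread_mutex_t demo_dbg_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Serialised variant: only one thread at a time may format and write.
 * Every caller queues on the same lock, which perturbs thread timing.
 * Returns the number of characters written, or a negative value on error.
 */
int demo_dbg_locked(FILE *out, const char *fmt, ...)
{
    va_list ap;
    int n;

    pthread_mutex_lock(&demo_dbg_mutex);
    va_start(ap, fmt);
    n = vfprintf(out, fmt, ap);
    va_end(ap);
    pthread_mutex_unlock(&demo_dbg_mutex);
    return n;
}

/*
 * Lightweight variant: takes no lock of its own. A message assembled
 * from several calls can interleave with output from other threads.
 */
int demo_dbg_unlocked(FILE *out, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vfprintf(out, fmt, ap);
    va_end(ap);
    return n;
}
```

Both functions are called like `fprintf`, for example `demo_dbg_locked(stdout, "msg %d\n", 7)`.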

Rainer
Re: rsyslog still crashes
On Fri, 16 Jan 2009, Rainer Gerhards wrote:

RG>
RG> That sounds good. Do you happen to have the output from those crashes?
RG>

The -n crash was completely silent; the -d run was chatty (as expected);
with stdout redirected, it took a lot more time to crash, but here are
both the logfile and the gdb backtrace.

Yours,

lorenzo


Re: rsyslog still crashes
On Fri, 16 Jan 2009, Lorenzo M. Catucci wrote:

LMC>
LMC> The -n crash was completely silent; the -d run was chatty (as expected);
LMC> with stdout redirected, it took a lot more time to crash, but here are
LMC> both the logfile and the gdb backtrace.
LMC>

As for the last crash, I found on the screen session the line:

rsyslogd: queue.c:1393: queueChkDiscardMsg: Assertion `(unsigned)
((obj_t*)(pUsr))->iObjCooCKiE == (unsigned) 0xBADEFEE' failed.

since I forgot to redirect stderr too.

Yours,

lorenzo

Re: rsyslog still crashes
Ok, this together with the others is evidence that something runs really wild and overwrites memory blocks. The reason this message did not appear earlier is that I disable the check in DestroyMsg() and permit it to return even though I then know memory is corrupted. So what you see here is a follow-up error.
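
The failed assertion compares an object's `iObjCooCKiE` field against 0xBADEFEE. The general technique behind such a check can be sketched as below. This is an illustration under assumptions rather than rsyslog's actual `obj_t` layout: `demo_obj_t`, `demo_obj_init` and `demo_obj_is_valid` are hypothetical names, and only the cookie value is taken from the assertion message.

```c
#include <string.h>

#define DEMO_OBJ_COOKIE 0xBADEFEEu   /* value taken from the assertion message */

/* Hypothetical object carrying a magic cookie next to its data. */
typedef struct {
    unsigned cookie;      /* set at construction, checked on every use */
    char     data[32];
} demo_obj_t;

void demo_obj_init(demo_obj_t *o)
{
    memset(o, 0, sizeof *o);
    o->cookie = DEMO_OBJ_COOKIE;
}

/*
 * Returns 1 if the object still carries its cookie, 0 if the pointer is
 * NULL or the memory was overwritten (or never constructed).
 */
int demo_obj_is_valid(const demo_obj_t *o)
{
    return o != NULL && o->cookie == DEMO_OBJ_COOKIE;
}
```

A check like `assert(demo_obj_is_valid(p))` at the top of each function that receives `p` fails as soon as something has scribbled over the object, which is why it tends to report the damage rather than the place that caused it.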

The good news, I think, is that it looks (though I may be fooled) as if the corruption happens in close temporal proximity to the abort. That would be really good news. Let me think a bit about the situation; I'll probably come up with some more instrumentation. The difficulty is that I'd potentially need to output one or even two log lines per message, and that creates other sync issues. Plus, I don't know whether I would overrun your disk with that (depending on workload, which seems to be quite high).

Rainer

Re: rsyslog still crashes
Lorenzo,

One thing: can you change the actionqueuemode to "direct" just for a short period? I would be very interested to see what happens.

Rainer

Re: rsyslog still crashes
Lorenzo and others:

I hopefully got a system today where I can reproduce. I am setting it up right now. I also have written a stub wiki page with information useful to hunt this bug:

http://wiki.rsyslog.com/index.php/V3_Race_Condition_Hunt_Page

Lorenzo, can you please double-check that I have indeed used the right config?

All others: if you can add scenarios/information, please do. I'll try to repro the problem as soon as the system is ready. Hope it will work...

Rainer

Re: rsyslog still crashes
On Fri, 16 Jan 2009, Rainer Gerhards wrote:

RG> Lorenzo,
RG>
RG> one thing: can you change the actionqueuemode to "direct" just for a
RG> short period. I would be very interested to see what happens.
RG>

Very short period... it crashed about as soon as it started... I'm enclosing
both the log and the backtrace.

See you soon,

lorenzo


Re: rsyslog still crashes
OK, maybe we can simplify the config; that would remove code paths from the potential bug candidate list. Could you comment out all the $ActionQueue* settings?

Rainer

Re: rsyslog still crashes
On Fri, 16 Jan 2009, Rainer Gerhards wrote:

RG> OK, maybe we can simplify the config; that would remove code paths
RG> from the potential bug candidate list. Could you comment out all the
RG> $ActionQueue* settings?
RG>

Done, it's still crashing immediately! Here are the logs.

lorenzo


Re: rsyslog still crashes
On Fri, 16 Jan 2009, Rainer Gerhards wrote:

RG> OK, maybe we can simplify the config; that would remove code paths
RG> from the potential bug candidate list. Could you comment out all the
RG> $ActionQueue* settings?
RG>

I've just restored the #if 0 in runtime/msg.c; it seems the immediate
crashes came from those two lines. Now logging.

Servus,

lorenzo


Re: rsyslog still crashes
Ah, ok. Side-note: I got my machine up and it is running some tests. Unfortunately no aborts so far, but it has only 4 cores... I hope something turns out...

Rainer

Re: rsyslog still crashes
On Fri, 16 Jan 2009, Rainer Gerhards wrote:

RG> Ah, ok. Side-note: I got my machine up and it is running some tests.
RG> Unfortunately no aborts so far, but it has only 4 cores... I hope
RG> something turns up...
RG>

I think the real problem is in keeping those cores very busy... I'd try to
spawn something like 20 loggers each spawning a couple "workers" per
second and logging startup/shutdown of any child. Maybe make each worker
sleep for a random time before exiting.
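
A minimal sketch of such a load generator, assuming a POSIX system and Python's standard syslog module. The function names (run_worker, run_logger, run_load) and the default rates are made up for illustration; nothing here comes from rsyslog or its test suite.

```python
import os
import random
import syslog
import time


def run_worker(logger_id, max_sleep):
    """Worker child: log startup, sleep a random time, log shutdown."""
    pid = os.getpid()
    syslog.syslog(syslog.LOG_INFO, f"logger {logger_id}: worker {pid} started")
    time.sleep(random.uniform(0.0, max_sleep))
    syslog.syslog(syslog.LOG_INFO, f"logger {logger_id}: worker {pid} exiting")


def reap(block=False):
    """Collect exited children; return how many were reaped."""
    reaped = 0
    while True:
        try:
            pid, _ = os.waitpid(-1, 0 if block else os.WNOHANG)
        except ChildProcessError:  # no children left at all
            return reaped
        if pid == 0:  # children exist, but none has exited yet
            return reaped
        reaped += 1


def run_logger(logger_id, duration, workers_per_second=2, max_sleep=1.0):
    """Spawn workers at a steady rate for `duration` seconds; return the count."""
    spawned = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        if os.fork() == 0:
            random.seed()  # make sure siblings do not share RNG state
            run_worker(logger_id, max_sleep)
            os._exit(0)
        spawned += 1
        syslog.syslog(syslog.LOG_INFO, f"logger {logger_id}: spawned worker #{spawned}")
        reap()  # avoid piling up zombies while the loop runs
        time.sleep(1.0 / workers_per_second)
    reap(block=True)  # wait for the workers that are still sleeping
    return spawned


def run_load(loggers=20, duration=60.0):
    """Start `loggers` logger processes; return how many of them were reaped."""
    for logger_id in range(loggers):
        if os.fork() == 0:
            run_logger(logger_id, duration)
            os._exit(0)
    return reap(block=True)
```

Calling run_load() with its defaults starts 20 loggers for one minute, each spawning two workers per second. Whether that is enough pressure to trigger the crash is an open question; the rates are only a starting point to tune.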

I don't have any Fedora/RedHat system; if nothing else, I'd suggest doing
your tests on a debian/testing system too.

Yours,

lorenzo

PS still running...


Re: rsyslog still crashes [ In reply to ]
> -----Original Message-----
> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
> bounces@lists.adiscon.com] On Behalf Of Lorenzo M. Catucci
> Sent: Friday, January 16, 2009 6:29 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] rsyslog still crashes
>
> I think the real problem is in keeping those cores very busy... I'd try to
> spawn something like 20 loggers each spawning a couple "workers" per
> second and logging startup/shutdown of any child. Maybe make each worker
> sleep for a random time before exiting.

Good suggestion, thanks.

>
> I don't have any Fedora/RedHat system; if nothing else, I'd suggest doing
> your tests on a debian/testing system too.

That's what I am running on that machine - with components downloaded today.

Rainer
Re: rsyslog still crashes [ In reply to ]
On Fri, 16 Jan 2009, Rainer Gerhards wrote:

> Lorenzo and others:
>
> I hopefully got a system today where I can reproduce. I am setting it up right now. I also have written a stub wiki page with information useful to hunt this bug:

one other thing that you can do for this sort of thing is to use the
amazon cloud.

to quote a message from Rob Landley to the linux-kernel mailing list

> My friend Mark's been experimenting with the amazon "cloud" thing,
> feeding in an image with a qemu instance and distcc+cross-compiler, and
> running builds under that. Renting an 8-way ~2.5 ghz server with 7
> gigabytes of ram and 1.6 terabytes of disk is 80 cents/hour through them
> plus another few cents/day for bandwidth and persistent storage and
> such. That's likely to get cheaper as time goes on.
>
> We're still planning to buy a build server of our own to have something
> in-house, but for running nightly builds it's almost to the point where
> depreciation on the hardware is more than buying time from a server
> farm. Just _one_ of those 8-way servers is enough hardware to build an
> entire distro in an hour or so.
>
> What this really allows us to do is experiment with "how parallel can we
> get our build"? Because renting ten 8-way servers in a cluster is
> $8/hour, and distcc already scales trivially over that. Down the road
> what Firmware Linux is working towards is multiple qemu instances
> running in parallel with a central instance distributing builds to each
> one, so each can do its own ./configure in parallel, distribute
> compilation to the distccd instances as it has stuff to compile, and
> then package up the resulting binary into one of those portage tarballs
> and send it back to the central node to install on a network mount that
> the lot of 'em can mount as build context, so the packages can get their
> dependencies right. (You don't want your build taking place in a
> network mount, but your OS being on one you never write to isn't so bad
> as long as you have local storage to build in.)
>
> We'll probably leverage the heck out of Portage for this, and might wind
> up modifying it heavily. Dunno yet. (We can even force dependencies on
> portage so it doesn't need to calculate 'em, the central node can do
> that and then say "you have these packages, _build_"...)
>
> But yeah, hobbyists with a laptop, network access, and a monthly budget
> of $20 can do cluster builds these days.

would it make sense to start a fund to pay for some time for you to use
like this?

David Lang




> http://wiki.rsyslog.com/index.php/V3_Race_Condition_Hunt_Page
>
> Lorenzo, can you please double-check that I have used the right config?
>
> All others: if you can add scenarios/information, please do. I'll try to repro the problem as soon as the system is ready. Hope it will work...
>
> Rainer
>
>> -----Original Message-----
>> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
>> bounces@lists.adiscon.com] On Behalf Of Rainer Gerhards
>> Sent: Friday, January 16, 2009 5:20 PM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] rsyslog still crashes
>>
>> Lorenzo,
>>
>> one thing: can you change the actionqueuemode to "direct" just for a
>> short period. I would be very interested to see what happens.
>>
>> Rainer
>>
>>> -----Original Message-----
>>> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
>>> bounces@lists.adiscon.com] On Behalf Of Lorenzo M. Catucci
>>> Sent: Friday, January 16, 2009 5:10 PM
>>> To: rsyslog-users
>>> Subject: Re: [rsyslog] rsyslog still crashes
>>>
>>> On Fri, 16 Jan 2009, Lorenzo M. Catucci wrote:
>>>
>>> LMC>
>>> LMC> The -n crash was completely silent; the -d run was chatty (as expected);
>>> LMC> with stdout redirected, it took a lot more time to crash, but here are
>>> LMC> both the logfile and the gdb backtrace.
>>> LMC>
>>>
>>> As for the last crash, I found on the screen session the line:
>>>
>>> rsyslogd: queue.c:1393: queueChkDiscardMsg: Assertion `(unsigned)
>>> ((obj_t*)(pUsr))->iObjCooCKiE == (unsigned) 0xBADEFEE' failed.
>>>
>>> since I forgot to redirect stderr too.
>>>
>>> Yours,
>>>
>>> lorenzo