Mailing List Archive: rsyslog still crashes

Re: rsyslog still crashes

Jan 16, 2009, 6:20 PM

Post #26 of 48

On Thu, 15 Jan 2009, Rainer Gerhards wrote:

> On Fri, 2009-01-16 at 01:20 +0100, Michael Biebl wrote:
>> Given the -c4 command line argument, I'd expect it to be 4.1.3.
>>
>> Sounds familiar to
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=509292 (which is
>> 3.18.6).
>>
>> It seems to be a more general problem with multi core (= very fast??) systems.
>
> Yes, that is what my analysis so far points to. It's also part of the
> problem, because I do not have very fast hardware to reproduce the issue
> (and it is also not easy to reliably reproduce if you have...).
>
> I've gotten a couple of reports (I think most on the mailing list) on
> such problems and all they have in common is 4+ core machines.
>
> I'll try to get hold of it based on what Lorenzo submits. In his environment,
> the problem seems to occur most reliably (he probably has the fastest
> machine...).
>
> Lorenzo: details follow soon.

I just got some time to work on this sort of thing again. My test system
is a 4-socket (dual-core) Opteron system with 16 GB of RAM.

I've done a fair amount of stress testing on that system without lockups
(around the time the 4.1 branch started). If you can describe a test setup,
I can see about reproducing it.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

Re: rsyslog still crashes

rgerhards at hq

Jan 18, 2009, 7:45 AM

Post #27 of 48

Hi Lorenzo,

I've gone through the material once more. Indeed, it looks like the
previous tests (with the #if 1) were not really useful. Sorry for that.
Please let me know the outcome of this run here.

Also, I thought of one thing we could try in order to reduce complexity. I
am not sure it will work out, but if it does, that would be a big
benefit. Could you please try the following:

Use the master branch (the one you previously used). Reduce rsyslog.conf
to just the necessary inputs (ideally only imuxsock) and a SINGLE file
writer, no further actions. Let that run and tell us if it aborts, too.
If it does, we have ruled out a lot of code and can focus our
troubleshooting much better.
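
A reduced configuration along those lines might look like the sketch below. It assumes the legacy directive syntax of the v3/v4 series, and the output path is only a placeholder, so adjust both to the actual setup.

```
# Minimal rsyslog.conf for narrowing down the crash (sketch only).
# Load just the local log socket as the single input.
$ModLoad imuxsock

# One file writer, no $ActionQueue* settings, no further actions.
# The path is a placeholder; any local file will do.
*.* /var/log/rsyslog-test.log
```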

On my box, I have unfortunately had no success yet in reproducing the issue,
even though I put a lot of stress on the machine. I will be trying more
today; hopefully that brings some results...

Rainer

On Fri, 2009-01-16 at 18:17 +0100, Lorenzo M. Catucci wrote:
> On Fri, 16 Jan 2009, Rainer Gerhards wrote:
>
> RG> OK, maybe we can simplify the config, that would remove code paths
> RG> from the potential bug candidate list. Could you comment out all the
> RG> $ActionQueue* settings?
> RG>
>
> I've just restored the #if 0 in runtime/msg.c; it seems the immediate
> crashes came from those two lines. Now logging.
>
> Servus,
>
> lorenzo
>
>
> +-------------------------+----------------------------------------------+
> | Lorenzo M. Catucci | Centro di Calcolo e Documentazione |
> | catucci@ccd.uniroma2.it | Università degli Studi di Roma "Tor Vergata" |
> | | Via O. Raimondo 18 ** I-00173 ROMA ** ITALY |
> | Tel. +39 06 7259 2255 | Fax. +39 06 7259 2125 |
> +-------------------------+----------------------------------------------+

Re: rsyslog still crashes

rgerhards at hq

Jan 19, 2009, 1:17 AM

Post #28 of 48

Hi David,

On Fri, 2009-01-16 at 18:40 -0800, david@lang.hm wrote:
> On Fri, 16 Jan 2009, Rainer Gerhards wrote:
>
> > Lorenzo and others:
> >
> > I hopefully got a system today where I can reproduce. I am setting it up right now. I also have written a stub wiki page with information useful to hunt this bug:
>
> one other thing that you can do for this sort of thing is to use the
> amazon cloud.
>
> to quote a message from Rob Landley to the linux-kernel mailing list
>
> > My friend Mark's been experimenting with the amazon "cloud" thing,
> > feeding in an image with a qemu instance and distcc+cross-compiler, and
> > running builds under that. Renting an 8-way ~2.5 ghz server with 7
> > gigabytes of ram and 1.6 terabytes of disk is 80 cents/hour through them
> > plus another few cents/day for bandwidth and persistent storage and
> > such. That's likely to get cheaper as time goes on.
> >
> > We're still planning to buy a build server of our own to have something
> > in- house, but for running nightly builds it's almost to the point where
> > depreciation on the hardware is more than buying time from a server
> > farm. Just _one_ of those 8-way servers is enough hardware to build an
> > entire distro in an hour or so.
> >
> > What this really allows us to do is experiment with "how parallel can we
> > get our build"? Because renting ten 8-way servers in a cluster is
> > $8/hour, and distcc already scales trivially over that. Down the road
> > what Firmware Linux is working towards is multiple qemu instances
> > running in parallel with a central instance distributing builds to each
> > one, so each can do its own ./configure in parallel, distribute
> > compilation to the distccd instances as it has stuff to compile, and
> > then package up the resulting binary into one of those portage tarballs
> > and send it back to the central node to install on a network mount that
> > the lot of 'em can mount as build context, so the packages can get their
> > dependencies right. (You don't want your build taking place in a
> > network mount, but your OS being on one you never write to isn't so bad
> > as long as you have local storage to build in.)
> >
> > We'll probably leverage the heck out of Portage for this, and might wind
> > up modifying it heavily. Dunno yet. (We can even force dependencies on
> > portage so it doesn't need to calculate 'em, the central node can do
> > that and then say "you have these packages, _build_"...)
> >
> > But yeah, hobbyists with a laptop, network access, and a monthly budget
> > of $20 can do cluster builds these days.
>
> would it make sense to start a fund to pay for some time for you to use
> like this?

That's a very interesting idea, thanks for sharing. At present, however,
I think I'll try to stick with Lorenzo's system, because it seems to be
able to reproduce the issue somewhat reliably. My 4-core machine
unfortunately runs flawlessly, so I suspect that it really depends on
the mix of components, where a fast machine is a necessary prerequisite,
but not a sufficient one. Some other things seem to need to go into the mix
and I've unfortunately not yet identified them...

But the cloud sounds like an interesting long-term idea; it would
definitely be useful to be able to conduct some testing on high-end
machines.

Rainer

Re: rsyslog still crashes

rgerhards at hq

Jan 22, 2009, 7:58 AM

Post #29 of 48

Hi folks,

just an update on this matter. Lorenzo needed to change his system setup after some problems. We are in contact and expect to conduct further testing soon (hopefully the bug will reappear).

Even better news is that I have been able to reproduce the bug 4 times in my lab today. It's not as easy as I would hope, but at least I can get results with some patience. I am also experimenting a bit with Twitter and actually found it useful to keep track of the troubleshooting process. Those of you interested can follow it at

http://twitter.com/rgerhards

I don't promise (yet) to keep it current at all times, but I will use it during the troubleshooting effort.

Rainer

> -----Original Message-----
> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
> bounces@lists.adiscon.com] On Behalf Of Lorenzo M. Catucci
> Sent: Friday, January 16, 2009 6:29 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] rsyslog still crashes
>
> On Fri, 16 Jan 2009, Rainer Gerhards wrote:
>
> RG> Ah, ok. Side-note: I got my machine up and it is running some tests.
> RG> Unfortunately no aborts so far, but it has only 4 cores... I hope
> RG> something turns up...
> RG>
>
> I think the real problem is in keeping those cores very busy... I'd try to
> spawn something like 20 loggers, each spawning a couple of "workers" per
> second and logging the startup/shutdown of every child. Maybe make each
> worker sleep for a random time before exiting.
>
> I don't have any Fedora/RedHat system; if nothing else, I'd suggest doing
> your tests on a debian/testing system too.
>
> Yours,
>
> lorenzo
>
> PS still running...

Re: rsyslog still crashes

lorenzo at sancho

Jan 22, 2009, 8:19 AM

Post #30 of 48

On Thu, 22 Jan 2009, Rainer Gerhards wrote:

RG> Hi folks,
RG>
RG> just an update on this matter. Lorenzo needed to change his system
RG> setup after some problems. We are in contact and expect to conduct
RG> further testing soon (hopefully the bug will reappear).
RG>

Some administration chores the last couple of days; almost finished,
big hopes for the week-end!!!

RG>
RG> Even better news is that I have been able to reproduce the bug 4 times
RG> in my lab today. It's not as easy as I would hope, but at least I can
RG> get results with some patience. I am also experimenting a bit with
RG> Twitter and actually found it useful to keep track of the
RG> troubleshooting process. Those of you interested can follow it at
RG>

This is really great news! Really, since rsyslog has been running this well
for a long time on "normal" systems, and I've been (almost) alone in
experiencing the crashes, the critters must have been hiding very well!

See you soon,

lorenzo

+-------------------------+----------------------------------------------+
| Lorenzo M. Catucci | Centro di Calcolo e Documentazione |
| catucci@ccd.uniroma2.it | Università degli Studi di Roma "Tor Vergata" |
| | Via O. Raimondo 18 ** I-00173 ROMA ** ITALY |
| Tel. +39 06 7259 2255 | Fax. +39 06 7259 2125 |
+-------------------------+----------------------------------------------+

Re: rsyslog still crashes

rgerhards at hq

Jan 22, 2009, 9:53 AM

Post #31 of 48

OK, an update, full history at http://twitter.com/rgerhards

It looks like there is some trouble with GCC atomic operation support. Has anyone seen this race on a non-Debian platform? I am asking because that may narrow down (or not ;)) the issue. Of course, I am not sure if atomic operations are really the root cause. However, replacing them is not very practical at some places and definitely time-consuming. So I'd like to have some feedback before I take that route.

Does anyone know if there is a problem with atomic operation support in Debian (no bashing, honest question ;))?

Rainer

Re: rsyslog still crashes

mbiebl at gmail

Jan 22, 2009, 10:46 AM

Post #32 of 48

2009/1/22 Rainer Gerhards <rgerhards@hq.adiscon.com>:
> OK, an update, full history at http://twitter.com/rgerhards
>
> It looks like there is some trouble with GCC atomic operation support. Has anyone seen this race on a non-Debian platform? I am asking because that may narrow down (or not ;)) the issue. Of course, I am not sure if atomic operations are really the root cause. However, replacing them is not very practical at some places and definitely time-consuming. So I'd like to have some feedback before I take that route.
>
> Does anyone know if there is a problem with atomic operation support in Debian (no bashing, honest question ;))?

This would be a compiler (GCC) problem then, right?

I'm not aware of any such problem. FWIW, Debian is using GCC 4.3 in lenny/sid.
I've checked the bugs reported against the Debian gcc package [1] and
the Debian-specific patches on top of gcc [2],
but I didn't find anything obvious.

Rainer, if you have a more specific question, I could forward that
question to the Debian GCC maintainers.

Cheers,
Michael

[1] http://bugs.debian.org/cgi-bin/pkgreport.cgi?src=gcc-4.3&repeatmerged=no
[2] http://patch-tracking.debian.net/package/gcc-4.3/4.3.2-1.1

--
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

Re: rsyslog still crashes

rgerhards at hq

Jan 22, 2009, 12:18 PM

Post #33 of 48

> -----Original Message-----
> From: rsyslog-bounces@lists.adiscon.com
> [mailto:rsyslog-bounces@lists.adiscon.com] On Behalf Of Michael Biebl
> Sent: Thursday, January 22, 2009 7:47 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] rsyslog still crashes
>
> 2009/1/22 Rainer Gerhards <rgerhards@hq.adiscon.com>:
> > OK, an update, full history at http://twitter.com/rgerhards
> >
> > It looks like there is some trouble with GCC atomic operation
> > support. Has anyone seen this race on a non-Debian platform? I am
> > asking because that may narrow down (or not ;)) the issue. Of course,
> > I am not sure if atomic operations are really the root cause. However,
> > replacing them is not very practical at some places and definitely
> > time-consuming. So I'd like to have some feedback before I take that
> > route.
> >
> > Does anyone know if there is a problem with atomic operation support
> > in Debian (no bashing, honest question ;))?
>
> This would be a compiler (GCC) problem then, right?

Exactly.

>
> I'm not aware of any such problem. FWIW Debian is using GCC
> 4.3 in lenny/sid
> I've checked the bugs reported against the Debian gcc package [1] and
> the Debian specific patches on top of gcc [2],
> but I didn't find anything obvious.
>
> Rainer, if you have a more specific question, I could forward that
> question to the Debian GCC maintainers.

Thanks, Michael. But I think before we ask others for their time, I'll
try to do my homework. So far, I am just guessing. As I now seem to be
able to repro the problem, I can look further into it. Tomorrow, I'll
first check what it takes to replace the atomic operations by mutex
calls. I think that's quite some work, but hopefully I am wrong. Thanks
to the info you provided, this seems to be useful work.

I'll keep you posted.

Rainer

Re: rsyslog still crashes

rgerhards at hq

Jan 28, 2009, 9:32 AM

Post #34 of 48

Hi all,

thanks to Lorenzo's help, we made good progress. It is too much to post
inside a mail, please have a look at my analysis of the bug:

http://blog.gerhards.net/2009/01/rsyslog-data-race-analysis.html

The short story is that we have at least improved the situation very
much and I hope to have fixes for all branches within the next couple of
days.

Rainer

> -----Original Message-----
> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
> bounces@lists.adiscon.com] On Behalf Of Rainer Gerhards
> Sent: Friday, January 16, 2009 3:22 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] rsyslog still crashes
>
> Lorenzo,
>
> I have created a new branch "raceDebug" and done a first commit to it.
> The change is very lightweight. Please pull, compile as usual and give
> it a try. It spits out some info to stdout from time to time
> (hopefully). I am not sure if it aborts, depending on the output it
> may or may not. Even if we get messages, they are probably not enough to
> pinpoint the bug, but I wanted to do something very light to see if
> the bug stays.
>
> Feedback appreciated.
>
> Rainer

Re: rsyslog still crashes

david at lang

Jan 29, 2009, 12:20 AM

Post #35 of 48

On Wed, 28 Jan 2009, Rainer Gerhards wrote:

> Hi all,
>
> thanks to Lorenzo's help, we made good progress. It is too much to post
> inside a mail, please have a look at my analysis of the bug:
>
> http://blog.gerhards.net/2009/01/rsyslog-data-race-analysis.html
>
> The short story is that we have at least improved the situation very
> much and I hope to have fixes for all branches within the next couple of
> days.

I just finished reading through this excellent write-up.

One small thing: you quote the spec,

  "Accesses to cacheable memory that are split across bus widths, cache
  lines, and page boundaries are not guaranteed to be atomic"

and then conclude that

  "So aligned word-access does not guarantee (not even enhance the chance) of
  atomicity."

I read that to mean that the alignment requirements are more complicated,
not that alignment is useless.
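
The "split across cache lines" condition in the quoted passage can be made concrete with a small check. This is only a sketch: `CACHE_LINE_SIZE` is assumed to be 64 bytes (common on x86, but not guaranteed), and `crosses_cache_line` is a hypothetical helper written for illustration, not something taken from rsyslog or from the Intel manual.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; verify for the actual CPU. */
#define CACHE_LINE_SIZE 64u

/*
 * Returns 1 if an object of `size` bytes starting at address `addr`
 * spans more than one cache line, 0 otherwise.
 * A zero-sized object never spans a line.
 */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0)
        return 0;

    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line = (addr + size - 1) / CACHE_LINE_SIZE;

    return first_line != last_line;
}
```

Under that assumption, a naturally aligned 4-byte integer (address a multiple of 4) never straddles a line, while the same integer placed at offset 62 within a line does. So alignment still matters for the "split access" case, even if it is not the whole story.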

You should also look at the code that's generated by -Os. With the heavily
cached systems that we have nowadays, it's common that smaller code
(and therefore more of it fitting into the L1 cache) is more
of an advantage than the optimizations that -O3 provides.

Congratulations on tracking down a nasty and subtle issue.

David Lang

Re: rsyslog still crashes

rgerhards at hq

Jan 29, 2009, 1:42 AM

Post #36 of 48

Hi all,

I had another interesting discussion with Lorenzo today. Those of you
interested in details may find the chatlog interesting:

http://blog.gerhards.net/2009/01/some-more-on-rsyslog-data-race.html

Rainer

Re: rsyslog still crashes

rgerhards at hq

Jan 29, 2009, 2:08 AM

Post #37 of 48

A full answer follows soon, but in essence you got it :) I will be
working on the 4.1 version today, thus the brief reply ;)

> -----Original Message-----
> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
> bounces@lists.adiscon.com] On Behalf Of david@lang.hm
> Sent: Thursday, January 29, 2009 12:06 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] rsyslog still crashes
>
> On Thu, 29 Jan 2009, Rainer Gerhards wrote:
>
> > Hi all,
> >
> > I had another interesting discussion with Lorenzo today. Those of you
> > interested in details may find the chatlog interesting:
> >
> > http://blog.gerhards.net/2009/01/some-more-on-rsyslog-data-race.html
>
> So, distilling this down, I think I am reading the following.
>
> 1. Mixing mutex and atomic operations is a problem; one or the other is
>    safe.
>
> 2. Reliable duplication of the problem requires:
>
>    - a fast machine
>    - multiple cores _not_ sharing L1 cache (early Intel 4-core machines or
>      multi-socket machines)
>    - a complex rsyslog config that uses multiple threads heavily
>    - high-traffic log volume to heavily load rsyslog
>
>    High system load external to rsyslog increases the chances of the race.
>
> Question: have you tried enabling/disabling preemption in the kernel on
> these systems to see if that affects the probability of having a
> problem?
>
> I'm eagerly waiting for the fixes to appear in the 4.1 branch to test
> them out.
>
> David Lang

Re: rsyslog still crashes

david at lang

Jan 29, 2009, 2:20 AM

Post #38 of 48

On Thu, 29 Jan 2009, Rainer Gerhards wrote:

> Hi all,
>
> I had another interesting discussion with Lorenzo today. Those of you
> interested in details may find the chatlog interesting:
>
> http://blog.gerhards.net/2009/01/some-more-on-rsyslog-data-race.html

So, distilling this down, I think I am reading the following.

1. Mixing mutex and atomic operations is a problem; one or the other is
   safe.

2. Reliable duplication of the problem requires:

   - a fast machine
   - multiple cores _not_ sharing L1 cache (early Intel 4-core machines or
     multi-socket machines)
   - a complex rsyslog config that uses multiple threads heavily
   - high-traffic log volume to heavily load rsyslog

   High system load external to rsyslog increases the chances of the race.
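
Point 1 can be illustrated with a small sketch in which every access to a shared counter goes through the same mutex, rather than some code paths taking a mutex while others use an atomic builtin on the same variable. The names (`shared_ref`, `ref_add`, `ref_get`, `run_workers`) are made up for illustration and are not rsyslog's; the worker limit is arbitrary.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* Hypothetical reference counter; all access goes through `lock`. */
struct shared_ref {
    pthread_mutex_t lock;
    long count;
};

struct worker_arg {
    struct shared_ref *ref;
    long iterations;
};

/* Add `delta` under the mutex. No code path touches `count` without it. */
static void ref_add(struct shared_ref *ref, long delta)
{
    pthread_mutex_lock(&ref->lock);
    ref->count += delta;
    pthread_mutex_unlock(&ref->lock);
}

/* Read under the same mutex, so readers and writers follow one protocol. */
static long ref_get(struct shared_ref *ref)
{
    pthread_mutex_lock(&ref->lock);
    long value = ref->count;
    pthread_mutex_unlock(&ref->lock);
    return value;
}

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    for (long i = 0; i < arg->iterations; i++)
        ref_add(arg->ref, 1);
    return NULL;
}

/* Runs `nworkers` threads and returns the final count, or -1 on error. */
static long run_workers(int nworkers, long iterations)
{
    struct shared_ref ref;
    struct worker_arg arg;
    pthread_t threads[MAX_WORKERS];
    int started = 0;
    int failed = 0;

    if (nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;
    if (pthread_mutex_init(&ref.lock, NULL) != 0)
        return -1;
    ref.count = 0;
    arg.ref = &ref;
    arg.iterations = iterations;

    for (int i = 0; i < nworkers; i++) {
        if (pthread_create(&threads[i], NULL, worker, &arg) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    long result = failed ? -1 : ref_get(&ref);
    pthread_mutex_destroy(&ref.lock);
    return result;
}
```

With this discipline, `run_workers(4, 10000)` ends with a count of 40000, because no increment can interleave with another. The trouble described in the chatlog starts when a second code path updates the same field without taking `lock`.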

question, have you tried enabling/disabling preemption in the kernel on
these systems to see if that affects the probability of having a problem?

I'm eagerly waiting for the fixes to appear in the 4.1 branch to test them
out.

David Lang

Re: rsyslog still crashes

mrdemeanour at jackpot

Jan 29, 2009, 3:12 AM

Post #39 of 48

Rainer Gerhards wrote:
> Hi all,
>
> thanks to Lorenzo's help, we made good progress. It is too much to post
> inside a mail, please have a look at my analysis of the bug:
>
> http://blog.gerhards.net/2009/01/rsyslog-data-race-analysis.html
>
> The short story is that we have at least improved the situation very
> much and I hope to have fixes for all branches within the next couple of
> days.

Bravo, Rainer! That is the most challenging and tricky to nail of all
kinds of bug, and I'm very impressed.

--
Jack.

Re: rsyslog still crashes

rgerhards at hq

Jan 29, 2009, 8:36 AM

Post #40 of 48

On Thu, 2009-01-29 at 00:36 -0800, david@lang.hm wrote:
> On Wed, 28 Jan 2009, Rainer Gerhards wrote:
>
> > Hi all,
> >
> > thanks to Lorenzo's help, we made good progress. It is too much to post
> > inside a mail, please have a look at my analysis of the bug:
> >
> > http://blog.gerhards.net/2009/01/rsyslog-data-race-analysis.html
> >
> > The short story is that we have at least improved the situation very
> > much and I hope to have fixes for all branches within the next couple of
> > days.
>
> I just finished reading through this excellent write-up
>
> one small thing.
>
> you quote the spec
>
> Accesses to cacheable memory that are split across bus widths, cache
> lines, and page boundaries are not guaranteed to be atomic
>
> and then conclude that
>
> So aligned word-access does not guarantee (not even enhance the chance) of
> atomicity.
>
> I read that to mean that the alignment requirements are more complicated,
> not that alignment is useless.

I should probably have quoted more of Intel's manual. But in essence you
need to read at least the first full two pages to get the in-depth idea.
The issue is not alignment requirements. As hardware gets more and more
parallel, and caches get to more and more levels, and on-chip cores
coexist with those from other sockets ... keeping memory coherent is a
costly job.

In early CPUs, Intel made memory access atomic if some alignment
requirements were met. That was cheap. In new CPUs that atomicity is
expensive. On the other hand, most data accesses do not need atomicity. So
why incur the cost for many operations when only a few need it? As a
result, Intel has removed guaranteed atomicity from those memory
accesses. In order to get atomicity, the program must tell the CPU
*explicitly* that it wants that feature. To do so, a "LOCK" prefix
(opcode) must be placed before the actual opcode (note that this is only
supported for some operations). So you get the best of both worlds: fast
execution time for the majority of code and atomicity where you need it
(but it then incurs the cost).

The bottom line is that what was an atomic operation on an old CPU is no
longer an atomic operation on a new CPU. If you need that, you need to
include that extra "LOCK" opcode.
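
As an illustration of asking for atomicity explicitly from C, here is a
minimal sketch using the GCC builtin __sync_fetch_and_add, which on x86 is
typically compiled to a LOCK-prefixed instruction. The counter, the worker
function and run_counter_demo are hypothetical names made up for this
example; this is not rsyslog's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared counter, standing in for any word shared by threads. */
static long shared_counter = 0;

struct worker_args {
    long iterations;
};

/* Each worker adds 1 to the counter `iterations` times.
 * __sync_fetch_and_add is a GCC builtin that performs the whole
 * read-modify-write as one atomic operation. */
static void *worker(void *p)
{
    const struct worker_args *args = p;
    for (long i = 0; i < args->iterations; ++i)
        __sync_fetch_and_add(&shared_counter, 1);
    return NULL;
}

/* Runs `nthreads` workers (at most 16) and returns the final count. */
static long run_counter_demo(int nthreads, long iterations)
{
    pthread_t tids[16];
    struct worker_args args = { iterations };

    assert(nthreads > 0 && nthreads <= 16);
    shared_counter = 0;
    for (int i = 0; i < nthreads; ++i) {
        int rc = pthread_create(&tids[i], NULL, worker, &args);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; ++i)
        pthread_join(tids[i], NULL);
    return shared_counter;
}
```

Calling run_counter_demo(4, 100000) should always return 400000. With a
plain `shared_counter++` in the worker instead, the result may fall short
on a multi-core machine, because the increment is then a separate load,
add and store.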

As I briefly said in the blogpost, I have not checked old Intel manuals.
So I do not know if they formerly guaranteed, as part of the instruction
set architecture, that these operations were atomic. I guess they did
not. If so, I as a programmer made some assumptions about the
micro-architecture that no longer hold true. My fault... But even if it
is Intel's fault, the C programming language does not guarantee
atomicity nor does the compiler guarantee a specific translation to
machine code. So I, working on the C level, used assumptions that were
not valid (and as I said I knew it was dangerous, but it worked too well
for too long... ;))
>
> you should also look at the code that's generated by -Os, with the heavily
> cached systems that we have nowadays it's common that the code being
> smaller (and therefore more of the code fitting into the L1 cache) is more
> of an advantage than the optimizations that -O3 provides.

That's a good reminder. I've just checked the gcc docs. There are some
things that I do not like about -Os, especially as it disables proper
alignment of many structures, including code. That can lead to
sub-optimal cache performance.

On the other hand -O3 does things like loop unrolling, which definitely
is a bad idea with modern cache systems.

My preliminary conclusion is that -O2 is probably best, and may be
tuned by turning on and off specific optimizations via their specific
compiler switches.
>
> congratulations on tracking down a nasty and subtle issue.

Thanks - but let's first see if this was the only issue and if things
run smooth everywhere. But it looks very promising.

Rainer
>
> David Lang
>
>
> > Rainer
> >
> >> -----Original Message-----
> >> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
> >> bounces@lists.adiscon.com] On Behalf Of Rainer Gerhards
> >> Sent: Friday, January 16, 2009 3:22 PM
> >> To: rsyslog-users
> >> Subject: Re: [rsyslog] rsyslog still crashes
> >>
> >> Lorenzo,
> >>
> >> I have created a new branch "raceDebug" and done a first commit to it.
> >> The change is very lightweight. Please pull, compile as usual and give
> >> it a try. It spits out some info to stdout from time to time
> >> (hopefully). I am not sure if it aborts, depending on the output it
> > may
> >> or may not. Even if we get messages, they are probably not enough to
> >> pinpoint the bug, but I wanted to do something very light to see if
> > the
> >> bug stays.
> >>
> >> Feedback appreciated.
> >>
> >> Rainer

Re: rsyslog still crashes [ In reply to ]

rgerhards at hq

Jan 29, 2009, 10:34 AM

Post #41 of 48 (4707 views)

Permalink

On Thu, 2009-01-29 at 21:13 -0800, david@lang.hm wrote:
> On Thu, 29 Jan 2009, david@lang.hm wrote:
> interesting note on memory usage.
>
> I'm using the default fixed array queue type on this box with a 1K max
> message length. if I hammer the box with a steady ~120K messages/sec
> (while it can write 93K/sec) the queue builds up to where it takes ~12G of
> ram. at this point the throughput takes a nose dive (not just dropping
> inbound packets, but also the number of packets written is much less)
>
> if I kill the sender, it starts emptying its queue (interestingly, not
> quite as fast as if it is also receiving some messages), but the memory
> isn't freed up until I start sending it messages again.

This actually is expected behavior - and it has lots to do with "last
message repeated n times".

In order to implement that functionality, I need to hold on to the last
message until a new one comes in (so that I can compare new to old). As
such, a message that is fully processed cannot be freed immediately.
It is freed when the next message comes in - whenever that may be. Note
that each output has separate "last message..." status, so each action
keeps a copy of the previous message until a new one arrives.
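
The per-action bookkeeping described above can be sketched roughly as
follows. This is a simplified illustration under assumptions, not
rsyslog's implementation: struct action and action_process are
hypothetical names, and messages are plain C strings here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-action state: each action remembers its own
 * previous message. */
struct action {
    char *prev;        /* copy of the last message seen, or NULL */
    unsigned repeats;  /* how many times prev has been repeated */
};

/* Returns 1 if msg repeats the previous one (so it is only counted),
 * 0 if it is new. The previous message is freed only when a different
 * message arrives. */
static int action_process(struct action *a, const char *msg)
{
    if (a->prev != NULL && strcmp(a->prev, msg) == 0) {
        a->repeats++;
        return 1;
    }
    /* A real implementation would emit "last message repeated n times"
     * at this point when a->repeats > 0. */
    free(a->prev);
    size_t len = strlen(msg) + 1;
    a->prev = malloc(len);
    assert(a->prev != NULL);
    memcpy(a->prev, msg, len);
    a->repeats = 0;
    return 0;
}
```

The point to note is that free(a->prev) only runs inside a later call,
so the last message of a burst stays allocated for as long as the
action is idle.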

What now happens is that when the queue builds up, malloc extends the
data segment size. It is fair to assume that the last message received
on a very busy system will probably end up at a high location in the
data segment (but note it is just a probability - it may even receive a
very low location, if that was just freed immediately before).

When the queue is now drained, we free everything but this message. As
the message is still referenced for "last m...", it cannot be freed. As
it has a high address, the data segment size cannot be reduced. As
such, rsyslog still holds the whole data segment, even though it contains
almost no actually allocated memory. I do not know if the runtime system
has a way to tell the OS it now uses a "sparse data segment", but I
guess it doesn't do that.

When the next message comes in (hours later?), the previous message can
be freed, and the runtime can then reduce the data segment size (which
should result in a sharp decrease of memory usage seen).
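
A toy model may make the effect easier to see. It assumes a data
segment that can only shrink from the top, down to just above the
highest slot still in use; struct toy_heap and toy_trim are hypothetical
names, and a real malloc is considerably more involved than this.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 8

/* Toy model of a data segment: slot i is in use when live[i] != 0.
 * The segment can only shrink from the top. */
struct toy_heap {
    int live[SLOTS];
    size_t top;   /* number of slots the segment currently spans */
};

/* Shrinks the segment as far as the highest live slot allows and
 * returns the new top. */
static size_t toy_trim(struct toy_heap *h)
{
    while (h->top > 0 && !h->live[h->top - 1])
        h->top--;
    return h->top;
}
```

If slots 0 to 6 are freed but slot 7 (the retained last message) is
still live, toy_trim leaves the top at 8. Once slot 7 is freed and the
next message lands in slot 0, toy_trim brings the top down to 1.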

This is one of the reasons I don't like "last message...".

I hope this clarifies.

Rainer


Re: rsyslog still crashes [ In reply to ]

rgerhards at hq

Jan 29, 2009, 11:40 AM

Post #42 of 48 (4701 views)

Permalink

Hi David,

thanks for this note, but I think it is not related to the fix (I'll
think a bit harder about that, but so far I can not find any connection
between the two).

The way the HUP is done is sub-optimal. Under typical load (one hup a
day), you don't see any issue. If you hup very frequently (like the once
a min you do) and have heavy traffic, that's another story. To solve
that case, some rework on the hup internals, actually even on the
interface definition, is needed. I held off on all such work until I had
found a solution to the race bug - because it would have changed the
environment even more. Now that I have tracked down at least one issue, I
think I can go ahead and begin to introduce more intrusive changes again.

In any case, I'll have a more in-depth look at the hup handlers. The new
non-restart type of hup should be almost resistant against the issue you
report.
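
For readers unfamiliar with the idea, one common way to handle a HUP
without restarting is to let the signal handler only set a flag and do
the actual work (such as reopening output files) from the main loop.
The sketch below shows that general pattern under assumptions; the
names hup_pending, reopen_count and handle_pending_hup are hypothetical
and this is not a copy of rsyslog's hup handlers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the signal handler, consumed by the main loop. */
static volatile sig_atomic_t hup_pending = 0;

/* Hypothetical counter standing in for "close and reopen output files". */
static int reopen_count = 0;

static void on_hup(int signo)
{
    (void)signo;
    hup_pending = 1;   /* only async-signal-safe work in the handler */
}

static void install_hup_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_hup;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGHUP, &sa, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Called from the main loop between messages; returns 1 if a HUP
 * was pending and has now been handled, 0 otherwise. */
static int handle_pending_hup(void)
{
    if (!hup_pending)
        return 0;
    hup_pending = 0;
    reopen_count++;    /* placeholder for reopening the files */
    return 1;
}
```

Because the handler itself does almost nothing, frequent HUPs under heavy
traffic only cost one flag check per loop iteration in this design.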

Rainer

On Thu, 2009-01-29 at 20:56 -0800, david@lang.hm wrote:
> On Thu, 29 Jan 2009, Rainer Gerhards wrote:
>
> >>
> >> congradulations on tracking down a nasty and subtle issue.
> >
> > Thanks - but let's first see if this was the only issue and if things
> > run smooth everywhere. But it looks very promising.
> >
>
> bad news, on my system the HUP doesn't always reopen the files now.
>
> high speed box receiving messages via UDP, idle except for a gzip
> compressing the files (which are rotated once a min), the system runs fine
> for a few min (higher performance than before, it's now writing ~93,000
> messages/sec instead of ~78,000 messages/sec), but it sometimes mangles
> handling a HUP and gets stuck. I have to do a kill -9 to kill and restart
> it.
>
> this is with the new HUP behavior.
>
> David Lang

Re: rsyslog still crashes [ In reply to ]

rgerhards at hq

Jan 29, 2009, 12:25 PM

Post #43 of 48 (4717 views)

Permalink

On Thu, 2009-01-29 at 19:51 -0800, david@lang.hm wrote:
> the new C0x standard will add atomic ops and guarantees (some of which are
> not necessarily provided by the chip, but have to be provided by the
> compiler/library instead), so watch for it, but test the performance of
> them before you trust them

This is very important work, especially if you think about future
advances in hardware design. However, I think we will be years away from
the point where one can actually use this and hope to be somewhat
portable. Same for performance: early implementations will probably be
sub-optimal (though it should be fairly simple to map current
compiler-specific options for atomic ops to the new standard once... but
we know what happens when new standards come out...).
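
For reference, standard atomics did eventually become part of C11 as
<stdatomic.h>. A minimal sketch of what portable code looks like with
that header follows; struct shared_msg and its functions are
hypothetical names for a reference-counted object, chosen only for
illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object, e.g. a message shared by
 * several actions. */
struct shared_msg {
    atomic_int refs;
};

static void msg_init(struct shared_msg *m)
{
    atomic_init(&m->refs, 1);
}

static void msg_addref(struct shared_msg *m)
{
    atomic_fetch_add(&m->refs, 1);
}

/* Drops one reference; returns true when the caller released the last
 * one and may destroy the object. */
static bool msg_release(struct shared_msg *m)
{
    /* atomic_fetch_sub returns the value held before the subtraction. */
    return atomic_fetch_sub(&m->refs, 1) == 1;
}
```

Whether these calls are as fast as compiler-specific builtins on a given
platform is something to measure rather than assume.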

> > On the other hand -O3 does things like loop unrolling, which definitely
> > is a bad idea with modern cache systems.
> >
> > My preliminarily conclusion is that -O2 is probably best, and may be
> > tuned by turning on and off specific optimizations via their specific
> > compiler switches.
>
> this has been the prevailing wisdom for many years, but I've seen myself
> many cases where -Os has ended up being faster in the real world, in spite
> of the various things that -O2 does 'better'

I think the phrase "it depends on the scenario" is very important here.

> is it the case that -Os would break things? or just that you think its
> alignment may not be as good?

It does not break things. The alignment for any structures that are
passed as part of the API should be properly contained in the header
files. However, I have not specifically tested this.

The point is just that, at least on some machines, non-aligned addresses
severely hit cache performance. So optimizing for size, and as a
side-effect generating unaligned data accesses, can be a real
performance drawback. It may well cost more performance than the
improved L1 (or trace cache) performance offers.

In any case, if we go down to that level, I think there are better
places to test and optimize - not to mention that on the upper layer (OS
calls!) there is still room for improvement. One of my favorite CPU-level
optimizations is the "exception system" that is currently in use in
rsyslog. Thanks to your message, I've finally written down some
information on it. I've done that on the forum, so that I can easily
keep a permanent record of the discussion (and in an easier-to-follow
form than with the mail archive):

http://kb.monitorware.com/optimizing-exception-handling-t8911.html

Feedback is appreciated.

Rainer


Re: rsyslog still crashes [ In reply to ]

david at lang

Jan 29, 2009, 7:51 PM

Post #44 of 48 (4703 views)

Permalink

On Thu, 29 Jan 2009, Rainer Gerhards wrote:

> On Thu, 2009-01-29 at 00:36 -0800, david@lang.hm wrote:
>> On Wed, 28 Jan 2009, Rainer Gerhards wrote:
>>
>>> Hi all,
>>>
>>> thanks to Lorenzo's help, we made good progress. It is too much to post
>>> inside a mail, please have a look at my analysis of the bug:
>>>
>>> http://blog.gerhards.net/2009/01/rsyslog-data-race-analysis.html
>>>
>>> The short story is that we have at least improved the situation very
>>> much and I hope to have fixes for all branches within the next couple of
>>> days.
>>
>> I just finished reading through this excellant write-up
>>
>> one small thing.
>>
>> you quote the spec
>>
>> Accesses to cacheable memory that are split across bus widths, cache
>> lines, and page boundaries are not guaranteed to be atomic
>>
>> and then conclude that
>>
>> So aligned word-access does not guarantee (not even enhance the chance) of
>> atomicity.
>>
>> I read that to mean that the alignment requirements are more complicated,
>> not that alignment is useless.
>
> I should probably have quoted more of Intel's manual. But in essence you
> need to read at least the first full two pages to get the in-depth idea.
> The issue is not alignment requirements. As hardware gets more and more
> parallel, and caches get to more and more levels, and on-chip cores
> coexist with those from other sockets ... keeping memory coherent is a
> costly job.
>
> In early CPUs, Intel made memory access atomic if some alignment
> requirements were met. That was cheap. In new CPUs that atomicity is
> expensive. On the other hand, most data accesses do not need atomicity. So
> why incur the cost for many operations when only a few need it? As a
> result, Intel has removed guaranteed atomicity from those memory
> accesses. In order to get atomicity, the program must tell the CPU
> *explicitly* that it wants that feature. To do so, a "LOCK" prefix
> (opcode) must be placed before the actual opcode (note that this is only
> supported for some operations). So you get the best of both worlds: fast
> execution time for the majority of code and atomicity where you need it
> (but it then incurs the cost).
>
> The bottom line is that what was an atomic operation on an old CPU is no
> longer an atomic operation on a new CPU. If you need that, you need to
> include that extra "LOCK" opcode.
>
> As I briefly said in the blogpost, I have not checked old Intel manuals.
> So I do not know if they formerly guaranteed, as part of the instruction
> set architecture, that these operations were atomic. I guess they did
> not. If so, I as a programmer made some assumptions about the
> micro-architecture that no longer hold true. My fault... But even if it
> is Intel's fault, the C programming language does not guarantee
> atomicity nor does the compiler guarantee a specific translation to
> machine code. So I, working on the C level, used assumptions that were
> not valid (and as I said I knew it was dangerous, but it worked too well
> for too long... ;))

the new C0x standard will add atomic ops and guarantees (some of which are
not necessarily provided by the chip, but have to be provided by the
compiler/library instead), so watch for it, but test the performance of
them before you trust them

>>
>> you should also look at the code that's generated by -Os, with the heavily
>> cached systems that we have nowdays it's common that the code being
>> smaller (and therefor more of the code fitting into the L1 cache) is more
>> of an advantage than the optimizations that -O3 provides.
>
> That's a good reminder. I've just checked the gcc docs. There are some
> things that I do not like about -Os, especially as it disables proper
> alignment of many structures, including code. That can lead to
> sub-optimal cache performance.

I know the linux kernel has many things where the alignment is critical
for proper functioning, but they are still able to support -Os, so there
is some way to specify alignment even for -Os
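
As far as the GCC documentation describes it, the alignment options that
-Os turns off concern code (functions, loops, jumps, labels), while data
alignment can still be requested explicitly in the source. A minimal
sketch using GCC's aligned attribute; the 64-byte cache line size and the
struct are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* 64 bytes is an assumed cache-line size. */
#define CACHE_LINE 64

/* Hypothetical per-thread counter, aligned (and thereby padded) to a
 * full cache line regardless of the optimization level. */
struct per_thread_stats {
    unsigned long msgs;
} __attribute__((aligned(CACHE_LINE)));

static struct per_thread_stats stats[4];

/* Returns nonzero when p sits on a cache-line boundary. */
static int is_cache_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

With this declaration every element of stats starts on its own cache
line, which is the kind of guarantee a project can keep while still
building with -Os.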

> On the other hand -O3 does things like loop unrolling, which definitely
> is a bad idea with modern cache systems.
>
> My preliminary conclusion is that -O2 is probably best, and may be
> tuned by turning on and off specific optimizations via their specific
> compiler switches.

this has been the prevailing wisdom for many years, but I've seen myself
many cases where -Os has ended up being faster in the real world, in spite
of the various things that -O2 does 'better'

is it the case that -Os would break things? or just that you think its
alignment may not be as good?

David Lang


Re: rsyslog still crashes [ In reply to ]

david at lang

Jan 29, 2009, 8:20 PM

Post #45 of 48 (4716 views)

Permalink

On Thu, 29 Jan 2009, Rainer Gerhards wrote:

>>
>> congradulations on tracking down a nasty and subtle issue.
>
> Thanks - but let's first see if this was the only issue and if things
> run smooth everywhere. But it looks very promising.
>

bad news, on my system the HUP doesn't always reopen the files now.

high speed box receiving messages via UDP, idle except for a gzip
compressing the files (which are rotated once a min), the system runs fine
for a few min (higher performance than before, it's now writing ~93,000
messages/sec instead of ~78,000 messages/sec), but it sometimes mangles
handling a HUP and gets stuck. I have to do a kill -9 to kill and restart
it.

this is with the new HUP behavior.

David Lang

Re: rsyslog still crashes [ In reply to ]

david at lang

Jan 29, 2009, 8:20 PM

Post #46 of 48 (4707 views)

Permalink

On Thu, 29 Jan 2009, david@lang.hm wrote:

> On Thu, 29 Jan 2009, Rainer Gerhards wrote:
>
>>>
>>> congradulations on tracking down a nasty and subtle issue.
>>
>> Thanks - but let's first see if this was the only issue and if things
>> run smooth everywhere. But it looks very promising.
>>
>
> bad news, on my system the HUP doesn't always reopen the files now.
>
> high speed box receiving messages via UDP, idle except for a gzip
> compressing the files (which are rotated once a min), the system runs fine
> for a few min (higher performance than before, it's now writing ~93,000
> messages/sec instead of ~78,000 messages/sec), but it sometimes mangles
> handling a HUP and gets stuck. I have to do a kill -9 to kill and restart
> it.
>
> this is with the new HUP behavior.

interesting note on memory usage.

I'm using the default fixed array queue type on this box with a 1K max
message length. if I hammer the box with a steady ~120K messages/sec
(while it can write 93K/sec) the queue builds up to where it takes ~12G of
ram. at this point the throughput takes a nose dive (not just dropping
inbound packets, but also the number of packets written is much less)

if I kill the sender, it starts emptying its queue (interestingly, not
quite as fast as if it is also receiving some messages), but the memory
isn't freed up until I start sending it messages again.

David Lang

Re: hang on HUP - was: rsyslog still crashes [ In reply to ]

rgerhards at hq

Jan 30, 2009, 7:47 AM

Post #47 of 48 (4756 views)

Permalink

On Thu, 2009-01-29 at 20:56 -0800, david@lang.hm wrote:
> high speed box receiving messages via UDP, idle except for a gzip
> compressing the files (which are rotated once a min), the system runs fine
> for a few min (higher performance than before, it's now writing ~93,000
> messages/sec instead of ~78,000 messages/sec), but it sometimes mangles
> handling a HUP and gets stuck. I have to do a kill -9 to kill and restart
> it.
>
> this is with the new HUP behavior.

I cross-checked the HUP processing. So far, I do not see why it hangs
(or whether the hang is related to the HUP processing at all). Can you
reproduce it with the debug log running? I guess not, but if so, could you
provide me a log with ~1000 log lines before the hang? If a debug log is
not an option, a stack trace from the abort would be great.

Rainer


Re: rsyslog still crashes [ In reply to ]

david at lang

Jan 30, 2009, 8:20 AM

Post #48 of 48 (4708 views)

Permalink

On Thu, 29 Jan 2009, Rainer Gerhards wrote:

> Hi David,
>
> thanks for this note, but I think it is not related to the fix (I'll
> think a bit harder about that, but so far I can not find any connection
> between the two).
>
> The way the HUP is done is sub-optimal. Under typical load (one hup a
> day), you don't see any issue. If you hup very frequently (like the once
> a min you do) and have heavy traffic, that's another story. To solve
> that case, some rework on the hup internals, actually even on the
> interface definition, is needed. I'd hold all such work unless I found a
> solution to the race bug - because it would have made the environment
> even more different. Now that I have at least one issue, I think I can
> go ahead and begin to introduce more intrusive changes again.
>
> In any case, I'll have a more in-depth look at the hup handlers. The new
> non-restart type of hup should be almost resistant against the issue you
> report.

I was using the new non-restart type. I'll be doing more testing today and
over the weekend. it's possible that I ended up with mixed versions with
the modules again (just before going home last night I deleted them all
and then did the install to make sure)

David Lang
