Mailing List Archive

Problems with spooling to disk
hi,

i yesterday deployed 3.18.3 and verified that the fix you told specified
below is in the source.

unfortunatly, this does not help and rsyslog hangs. please follow the
steps below to reproduce the error:

1) setup two servers. A is the rsyslog "client" and B is the rsyslog
server logging to mysql.

A is configured with the debian default config and:
> $WorkDirectory /var/spool/rsyslog/ # where to place spool files
> $ActionQueueFileName remote # unique name prefix for spool files
> $ActionQueueMaxDiskSpace 100M # 100mb space limit (use as much as possible)
> $ActionQueueSaveOnShutdown on # save messages to disk on shutdown
> $ActionQueueType LinkedList # run asynchronously
> $ActionResumeRetryCount -1 # infinite retries if host is down
> # remote host is: name/ip:port, e.g. 192.168.0.1:514, port optional
> *.* @@b.ipax.at:514

B is configured with the debian default config and:
> # provides UDP syslog reception
> $ModLoad imudp
> $UDPServerRun 514
>
> # provides TCP syslog reception
> $ModLoad imtcp
> $InputTCPServerRun 514
> ...
> $ModLoad ommysql
> :programname, isequal, "prog1" ~
> :programname, isequal, "prog2" ~
> *.* :ommysql:localhost,Syslog,dbuser,dbpass

2) we start A and B and verify that it works correctly:
messages on A are logged to /var/log/ and sent to B

messages on B (= originating from A and B) are logged to /var/log/ and
are sent to the mysql db.

3) i connect to the mysql db and run some sql statements:
> delete from SystemEvents where SysLogTag like "prog1[.%";
> delete from SystemEvents where SysLogTag like "prog2[%";
> delete from SystemEvents where SysLogTag like "pam_limits[%";

4) whilst the statement is running, issue the following command on B:
> logger testt

5) observe that testt is neither logged to the mysql db (thats expected
as delete locks the myisam table) but also not logged to /var/log/

this effectivly blocks something causing server A to not function
properly. e.g. apache2, sshd, bind, etc are not working properly.

please help :)

cheers,
raoul


Rainer Gerhards wrote:
> This is the v3-stable fix:
>
> http://git.adiscon.com/?p=rsyslog.git;a=blobdiff;f=plugins/imuxsock/imux
> sock.c;h=60ccaffbc526a03011be3ce5869e40bc3e32c319;hp=f8798039c5d53e58fc7
> fa17807547862e08c999e;hb=8eb888d049da12e1294a7688432b6325794ade32;hpb=02
> f768c37dac9dde424bbd31e378482750fc276c
>
> You can also easily apply it to 3.16 - I just won't do it that version
> has been superseded.
>
> Rainer
>
>> -----Original Message-----
>> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
>> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards
>> Sent: Wednesday, August 13, 2008 2:33 PM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] Problems with spooling to disk
>>
>> On Wed, 2008-08-13 at 13:04 +0200, Raoul Bhatia [IPAX] wrote:
>>> hi,
>>>
>>> Rainer Gerhards wrote:
>>>> Thanks, I now received the log files. And, of course, they made
>>>> immediately clear what is going on ;) What you experience is flow
>>>> control kicking in. Let me first elaborate a bit:
>>>>
>>>> Rsyslog knows inputs which can be delayed a bit (like tcp receiver
>> AND
>>>> the local socket) and those who can not (like UDP). The idea is
>> that we
>>>> can slow down senders where this does not result in loss of
>> messages,
>>>> leaving room inside the queue on busy systems for those where we
>> can not
>>>> do that. We had cases where not delaying caused a lot of grief.
>>>>
>>>> There are also two watermarks, the low and high delay mark. If the
>> low
>>>> delay mark is reached, a delayable source is delayed for one
>> second. It
>>>> is at 70% of the (in-memory) queue capacity. Again, this is all
>> done for
>>>> some very good reasons.
>>>>
>>>> However, when I introduced these features, I did not make them
>>>> configurable. Also, I did classify the local unix socket as one
> who
>> can
>>>> be delayed. In your scenario, this does not stop the system, but
>> once
>>>> the local log socket is filled up, it needs to wait one second for
>> each
>>>> message before it is pulled.
>>>>
>>>> Again, the rate limiting logic is there for good reasons. So what
> I
>> now
>>>> see is that I need to add a few config statements to allow to
>> control it
>>>> according to your needs. I am also unsure if I should re-classify
>>>> imuxsock as an input which cannot be delayed - but that also can
>> have
>>>> some drawbacks. Maybe the best thing to do is make its state
>>>> configurable and use "not to be delayed" as default.
>>> ok, after reading through your explanation, i (almost) understand
> the
>>> delay mechanism and that at present it is not configurable.
>>>
>>> almost, because i do not get the part where you talk about the "70%
>> of
>>> the (in-memory) queue capacity".
>> An example: if the configured queue capacity is 10,000 (the default
>> value for the main message queue), than no flow control at all happens
>> until 7,000 messages are inside the queue. Once they are (and only as
>> long as they are), flow control is applied to those inputs that permit
>> it (imuxsock permits it).
>>
>> It doesn't matter here if the queue is disk-assisted or not. Even with
>> disk-assisted queues, this setting is very important to prevent an
>> input
>> (e.g. imfile!) to send massive amounts of data to the queue when there
>> is no need to (because the input can be flow-controlled).
>>
>>> anyways, how does this relate to the original problem, that the
>>> following code stalls the logging process.
>>>
>>>>>> #! /bin/sh
>>>>>> for ((i=1;i<=10000;i+=1)); do
>>>>>> logger Log line $i
>>>>>> done
>>> is this a capacity problem? is this another problem?
>> It is going over the flow control watermark, flow control kicks in and
>> delays imuxsock writes by one second each as long as there are more
>> than
>> 7,000 messages. No problem, intended behavior, but obviously
>> questionable. In the imuxsock case, there are also some bad cases.
>>
>>> i really consider this a critical bug, as i yesterday ran out of
>>> diskspace on my remote-rsyslog-and-mysql-on-one-host-testbed and the
>>> rsyslog-client, which logs locally and to the remote destination,
>>> stalled completely.
>> We'll - not completely, one message per second ;) But that's probably
>> complete enough ;)
>>
>> As a quick fix, I think I'll make imuxsock a non-flow-controllable
>> input
>> in the v3-stable (3.18.x). The devel will have a switch to make it
> flow
>> controlled if one desires. I'll also add some other knobs to fine-tune
>> the flow control algo, but that probably needs to wait until after my
>> summer break. These tweaks could easily break things and I don't like
>> the idea of doing them under time pressure ;)
>>
>> HTH
>> Rainer
>>> cheers,
>>> raoul
>>> ps: i am on 3.16.1-1~bpo40+1
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog


--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc. email. r.bhatia at ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
Barawitzkagasse 10/2/2/11 email. office at ipax.at
1190 Wien tel. +43 1 3670030
FN 277995t HG Wien fax. +43 1 3670030 15
____________________________________________________________________
Problems with spooling to disk [ In reply to ]
Hi Raoul,

could you quickly mail me a debug log of the client? I know I can set up
the lab per your instructions, but that will need some time. Depending
on what I see in the debug log, we could come much quicker to a solution
[.I am currently quite busy after my vacation, so I will probably not be
able to set up the lab this week] ;). You can mail the log to me
privately.

Thanks,
Rainer

> -----Original Message-----
> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> bounces at lists.adiscon.com] On Behalf Of Raoul Bhatia [IPAX]
> Sent: Thursday, September 04, 2008 10:22 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] Problems with spooling to disk
>
> hi,
>
> i yesterday deployed 3.18.3 and verified that the fix you told
> specified
> below is in the source.
>
> unfortunatly, this does not help and rsyslog hangs. please follow the
> steps below to reproduce the error:
>
> 1) setup two servers. A is the rsyslog "client" and B is the rsyslog
> server logging to mysql.
>
> A is configured with the debian default config and:
> > $WorkDirectory /var/spool/rsyslog/ # where to place spool files
> > $ActionQueueFileName remote # unique name prefix for spool files
> > $ActionQueueMaxDiskSpace 100M # 100mb space limit (use as much as
> possible)
> > $ActionQueueSaveOnShutdown on # save messages to disk on shutdown
> > $ActionQueueType LinkedList # run asynchronously
> > $ActionResumeRetryCount -1 # infinite retries if host is down
> > # remote host is: name/ip:port, e.g. 192.168.0.1:514, port optional
> > *.* @@b.ipax.at:514
>
> B is configured with the debian default config and:
> > # provides UDP syslog reception
> > $ModLoad imudp
> > $UDPServerRun 514
> >
> > # provides TCP syslog reception
> > $ModLoad imtcp
> > $InputTCPServerRun 514
> > ...
> > $ModLoad ommysql
> > :programname, isequal, "prog1" ~
> > :programname, isequal, "prog2" ~
> > *.* :ommysql:localhost,Syslog,dbuser,dbpass
>
> 2) we start A and B and verify that it works correctly:
> messages on A are logged to /var/log/ and sent to B
>
> messages on B (= originating from A and B) are logged to /var/log/ and
> are sent to the mysql db.
>
> 3) i connect to the mysql db and run some sql statements:
> > delete from SystemEvents where SysLogTag like "prog1[.%";
> > delete from SystemEvents where SysLogTag like "prog2[.%";
> > delete from SystemEvents where SysLogTag like "pam_limits[%";
>
> 4) whilst the statement is running, issue the following command on B:
> > logger testt
>
> 5) observe that testt is neither logged to the mysql db (thats
expected
> as delete locks the myisam table) but also not logged to /var/log/
>
> this effectivly blocks something causing server A to not function
> properly. e.g. apache2, sshd, bind, etc are not working properly.
>
> please help :)
>
> cheers,
> raoul
>
>
> Rainer Gerhards wrote:
> > This is the v3-stable fix:
> >
> >
>
http://git.adiscon.com/?p=rsyslog.git;a=blobdiff;f=plugins/imuxsock/imu
> x
> >
>
sock.c;h=60ccaffbc526a03011be3ce5869e40bc3e32c319;hp=f8798039c5d53e58fc
> 7
> >
>
fa17807547862e08c999e;hb=8eb888d049da12e1294a7688432b6325794ade32;hpb=0
> 2
> > f768c37dac9dde424bbd31e378482750fc276c
> >
> > You can also easily apply it to 3.16 - I just won't do it that
> version
> > has been superseded.
> >
> > Rainer
> >
> >> -----Original Message-----
> >> From: rsyslog-bounces at lists.adiscon.com [mailto:rsyslog-
> >> bounces at lists.adiscon.com] On Behalf Of Rainer Gerhards
> >> Sent: Wednesday, August 13, 2008 2:33 PM
> >> To: rsyslog-users
> >> Subject: Re: [rsyslog] Problems with spooling to disk
> >>
> >> On Wed, 2008-08-13 at 13:04 +0200, Raoul Bhatia [IPAX] wrote:
> >>> hi,
> >>>
> >>> Rainer Gerhards wrote:
> >>>> Thanks, I now received the log files. And, of course, they made
> >>>> immediately clear what is going on ;) What you experience is flow
> >>>> control kicking in. Let me first elaborate a bit:
> >>>>
> >>>> Rsyslog knows inputs which can be delayed a bit (like tcp
receiver
> >> AND
> >>>> the local socket) and those who can not (like UDP). The idea is
> >> that we
> >>>> can slow down senders where this does not result in loss of
> >> messages,
> >>>> leaving room inside the queue on busy systems for those where we
> >> can not
> >>>> do that. We had cases where not delaying caused a lot of grief.
> >>>>
> >>>> There are also two watermarks, the low and high delay mark. If
the
> >> low
> >>>> delay mark is reached, a delayable source is delayed for one
> >> second. It
> >>>> is at 70% of the (in-memory) queue capacity. Again, this is all
> >> done for
> >>>> some very good reasons.
> >>>>
> >>>> However, when I introduced these features, I did not make them
> >>>> configurable. Also, I did classify the local unix socket as one
> > who
> >> can
> >>>> be delayed. In your scenario, this does not stop the system, but
> >> once
> >>>> the local log socket is filled up, it needs to wait one second
for
> >> each
> >>>> message before it is pulled.
> >>>>
> >>>> Again, the rate limiting logic is there for good reasons. So what
> > I
> >> now
> >>>> see is that I need to add a few config statements to allow to
> >> control it
> >>>> according to your needs. I am also unsure if I should re-classify
> >>>> imuxsock as an input which cannot be delayed - but that also can
> >> have
> >>>> some drawbacks. Maybe the best thing to do is make its state
> >>>> configurable and use "not to be delayed" as default.
> >>> ok, after reading through your explanation, i (almost) understand
> > the
> >>> delay mechanism and that at present it is not configurable.
> >>>
> >>> almost, because i do not get the part where you talk about the
"70%
> >> of
> >>> the (in-memory) queue capacity".
> >> An example: if the configured queue capacity is 10,000 (the default
> >> value for the main message queue), than no flow control at all
> happens
> >> until 7,000 messages are inside the queue. Once they are (and only
> as
> >> long as they are), flow control is applied to those inputs that
> permit
> >> it (imuxsock permits it).
> >>
> >> It doesn't matter here if the queue is disk-assisted or not. Even
> with
> >> disk-assisted queues, this setting is very important to prevent an
> >> input
> >> (e.g. imfile!) to send massive amounts of data to the queue when
> there
> >> is no need to (because the input can be flow-controlled).
> >>
> >>> anyways, how does this relate to the original problem, that the
> >>> following code stalls the logging process.
> >>>
> >>>>>> #! /bin/sh
> >>>>>> for ((i=1;i<=10000;i+=1)); do
> >>>>>> logger Log line $i
> >>>>>> done
> >>> is this a capacity problem? is this another problem?
> >> It is going over the flow control watermark, flow control kicks in
> and
> >> delays imuxsock writes by one second each as long as there are more
> >> than
> >> 7,000 messages. No problem, intended behavior, but obviously
> >> questionable. In the imuxsock case, there are also some bad cases.
> >>
> >>> i really consider this a critical bug, as i yesterday ran out of
> >>> diskspace on my remote-rsyslog-and-mysql-on-one-host-testbed and
> the
> >>> rsyslog-client, which logs locally and to the remote destination,
> >>> stalled completely.
> >> We'll - not completely, one message per second ;) But that's
> probably
> >> complete enough ;)
> >>
> >> As a quick fix, I think I'll make imuxsock a non-flow-controllable
> >> input
> >> in the v3-stable (3.18.x). The devel will have a switch to make it
> > flow
> >> controlled if one desires. I'll also add some other knobs to fine-
> tune
> >> the flow control algo, but that probably needs to wait until after
> my
> >> summer break. These tweaks could easily break things and I don't
> like
> >> the idea of doing them under time pressure ;)
> >>
> >> HTH
> >> Rainer
> >>> cheers,
> >>> raoul
> >>> ps: i am on 3.16.1-1~bpo40+1
> >> _______________________________________________
> >> rsyslog mailing list
> >> http://lists.adiscon.net/mailman/listinfo/rsyslog
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
>
>
> --
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc. email. r.bhatia at ipax.at
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
> Barawitzkagasse 10/2/2/11 email. office at ipax.at
> 1190 Wien tel. +43 1 3670030
> FN 277995t HG Wien fax. +43 1 3670030 15
> ____________________________________________________________________
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog