Mailing List Archive

Re: Development of failsafe disk based queue
On Wed, 1 Oct 2008, RB wrote:

>> I guess you've seen in the discussion (well, if you read all these
>> mails, I just noticed how many they were...)
>
> ;-) I was wondering why my hip was rattling during the entire commute today.
>
>> persisted (because that completes the "queue transaction"). I will
>> verify with the code, but my current guess is that no more than 10 lines
>> of code will be needed to support this functionality. If so, I think it
>> is worth it.
>
> If it's something you can solve without worrying about block-level
> writes and whether the underlying drive (if indeed there is even a
> "drive") has a battery-backed cache or sufficient capacitor charge to
> write your data, I'm all for it. Even better if it's POSIX.
>
> Although a block driver and other filesystem-bypassing solutions may
> be interesting in limited cases, I'd rather not see anyone stab their
> wife over it.

I agree that trying to bypass the filesystem is highly questionable, and
not something for a core change (as always, a contributed version can be
tested to see if it makes a difference).

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Re: Development of failsafe disk based queue
On Wed, 2008-10-01 at 08:48 -0700, david@lang.hm wrote:
> I agree that trying to bypass the filesystem is highly questionable, and
> not something for a core change (as always, a contributed version can be
> tested to see if it makes a difference).

Just FYI: preliminary analysis indicates that probably around 10 lines
of code need to be added in stream.c, maybe a few more. The idea is that
I can set a flag similar to O_SYNC on stream creation, but then sync only
when the "atomic" writes are done. This may save a few cycles over an
O_SYNC open(). However, a few config settings are needed, which in turn
need to be passed down to the queue and stream classes. That adds more
code, maybe around 100 lines (the config interface needs to be redone,
hence the many LOC required; it's on the todo list...). An alternative is
to use a simple global $AllWritesSync on/off option, which would probably
be sufficient and cut down the changes required.
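
To illustrate the difference between the two approaches, here is a rough
Python sketch for a POSIX system. It is not the stream.c code; the
function names and the 0o600 file mode are made up for the example. The
first function writes a whole batch and calls fsync() once at the end,
the second opens the file with O_SYNC so that each write() is synchronous.

```python
import os


def _write_all(fd, data):
    """Write all of data to fd; os.write() may write fewer bytes than asked."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def write_batch_then_sync(path, records):
    """Append every record, then force the batch to disk with one fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for rec in records:
            _write_all(fd, rec)
        os.fsync(fd)  # single sync once the "atomic" group of writes is done
    finally:
        os.close(fd)


def write_batch_o_sync(path, records):
    """Append every record through an O_SYNC descriptor (sync on each write)."""
    # O_SYNC does not exist on every platform; where it is missing this
    # falls back to 0, i.e. plain buffered writes without any sync.
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        for rec in records:
            _write_all(fd, rec)
    finally:
        os.close(fd)
```

Both variants leave the same bytes in the file; they differ only in how
often the kernel is asked to flush, which is where the saving would come
from when a queue transaction consists of several writes.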

For a robust implementation, some more analysis is required (including
thinking about the implications of fsync()). So it is not totally
trivial, but quite doable. I am just not sure whether I'll do it
immediately; there are many things in the work queue. The testing effort
is probably much bigger than the implementation effort, as there are so
many cases to check...

Rainer

Re: Development of failsafe disk based queue
> I'm not understanding the point you are trying to make.
>
> are you saying that it's a bad idea to try and have an option to do the
> syncing we are talking about for the queue?

Sync: not as long as it's optional. Worrying about sector writes and
capacitors/battery-backed cache on the underlying drives: yes. Unless
rsyslog starts writing to raw devices, it is my opinion that once it
reasonably hands the data off to the filesystem, it becomes a kernel
and/or hardware problem and unnecessarily complex for a userspace
application to govern. I'm all for anything rsyslog can do to
encourage proper behavior (like calling sync()) without getting into
the kernel/hardware space.


RB
Re: Development of failsafe disk based queue
On Wed, 1 Oct 2008, RB wrote:

>> I'm not understanding the point you are trying to make.
>>
>> are you saying that it's a bad idea to try and have an option to do the
>> syncing we are talking about for the queue?
>
> Sync: not as long as it's optional. Worrying about sector writes and
> capacitors/battery-backed cache on the underlying drives: yes. Unless
> rsyslog starts writing to raw devices, it is my opinion that once it
> reasonably hands the data off to the filesystem, it becomes a kernel
> and/or hardware problem and unnecessarily complex for a userspace
> application to govern. I'm all for anything rsyslog can do to
> encourage proper behavior (like calling sync()) without getting into
> the kernel/hardware space.

Ok, we are on the same page then.

In case I confused anyone, the reason I went into detail on the hardware
side of things was to explain how proper hardware selection could result
in good performance while poor hardware selection would result in dismal
performance (and to give people who aren't familiar with the options some
hints as to what they could do).

David Lang
Re: Development of failsafe disk based queue
Hi,

3.21.5 created an assertion error.

rsyslogd: msg.c:208: MsgUnlockLockingCase: Assertion `pThis != ((void
*)0)' failed.

close to the end, probably shortly before the abort signal.

I mounted /rsyslog (ext3) with the option: sync

Here is a copy of my rsyslog.conf:
#---------------------------------------
$ModLoad imuxsock.so
$ModLoad imklog.so

$WorkDirectory /rsyslog
$ActionQueueFileName buffer
$ActionQueueMaxDiskSpace 1g
$ActionQueueSaveOnShutdown on
$ActionQueueType Disk
$ActionQueueMaxFileSize 1m
$ActionQueueCheckpointInterval 1
$ActionResumeRetryCount -1
*.* @@10.8.0.1:514
#---------------------------------------

I attached the output from:

rsyslogd -c 3 -f /etc/rsyslog.conf > error.txt 2>&1

Actually, only one message file was written; no .qi file was created.

bye
David Ecker


Rainer Gerhards wrote:
> On Wed, 2008-10-01 at 08:43 -0700, david@lang.hm wrote:
>
>> On Wed, 1 Oct 2008, David Ecker wrote:
>>
>>
>>> Already did both with 2.18.3 but I'll try again with 3.21.5 and 3.18.4. My
>>> guess is that O_DIRECT in combination with the O_SYNC flag (turning
>>> off the cache) will have an impact.
>>>
>> O_DIRECT does very different things. I don't think you need to worry
>> about those things; having the data not go into the OS cache is a drawback,
>> not an advantage, because it means that when you go to pull the data back
>> out of the file it will need to actually touch disk. It also imposes
>> significant alignment issues on the application that I don't think you
>> want to have to deal with.
>>
>
> plus rsyslog does not care about the alignment (at this time), so I think it is dangerous...
>
> Rainer
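
On the alignment point quoted above: with O_DIRECT on Linux, the transfer
length (and typically the buffer address and file offset as well) has to
be a multiple of the device's logical block size. The exact rules vary by
filesystem and kernel version. A minimal Python sketch of the arithmetic
this would force on the application follows; the 512-byte block size and
the helper names are assumptions for illustration only, and rsyslog does
nothing like this today.

```python
def align_up(n, block):
    """Round n up to the next multiple of block (block must be a power of two)."""
    if block <= 0 or block & (block - 1):
        raise ValueError("block must be a positive power of two")
    return (n + block - 1) & ~(block - 1)


def pad_record(data, block=512):
    """Pad data with zero bytes so that its length is a multiple of block."""
    return data + b"\x00" * (align_up(len(data), block) - len(data))
```

Every short syslog message would be padded out to a full block this way,
which is part of why O_DIRECT looks unattractive for a message queue.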
Re: Development of failsafe disk based queue
Please use the version from git. I didn't realize that the bug affects
normal operations, but obviously it does. This is fixed and I'll try to
release 3.21.6 ASAP, but I am not sure whether I'll manage to do so
today.

Gitweb available at http://git.adiscon.com

Rainer

> -----Original Message-----
> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
> bounces@lists.adiscon.com] On Behalf Of David Ecker
> Sent: Monday, October 06, 2008 2:41 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] Development of failsafe disk based queue
Re: Development of failsafe disk based queue
Hi Rainer,

the assert error seemed to be fixed in HEAD.

Mounting the ext3 filesystem with (noatime,sync,dirsync,rw) seemed to
work a lot better.
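
For comparison, an application can get a similar effect for individual
files without the sync/dirsync mount options by calling fsync() on the
file and then on its parent directory. This is a minimal Python sketch
assuming a POSIX system where a directory can be opened read-only and
fsync()ed (as on Linux); the function name is hypothetical and this is
not what rsyslog currently does.

```python
import os


def create_durable(path, data):
    """Create path with data, then fsync both the file and its directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:  # os.write() may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # file contents and metadata (roughly what "sync" covers)
    finally:
        os.close(fd)

    # The new directory entry lives in the parent directory, so that has to
    # be flushed as well (roughly what "dirsync" covers).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The mount options apply this to everything on the filesystem, whereas the
explicit calls only cost time for the files that really need it.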

/etc/rsyslog.conf
----------------
$WorkDirectory /rsyslog/
$ActionQueueFileName buffer
$ActionQueueMaxDiskSpace 1g
$ActionQueueSaveOnShutdown on
$ActionQueueType Disk
$ActionQueueMaxFileSize 100m
$ActionQueueSize 10000000
$ActionQueueCheckpointInterval 1
$ActionResumeRetryCount -1
*.* @@10.8.0.1:514
-----------------

I found out that the queue was limited to 1000 elements if you do not
define ActionQueueSize inside the config.

One difference was that version 3.18.3 actually created one file per msg
if I did set ActionQueueCheckpointInterval to 1. Right now only one
file in addition to the .qi file is created containing all messages. I
haven't waited to see what will happen if I reach 100MB on the data file.

If I shut down rsyslog normally, some status information seems to be
written to the .qi file (508 bytes -> 1024 bytes). After restarting, the
.qi file shrinks to 508 bytes again.

If I kill rsyslogd (SIGKILL), the .qi file is not updated with this
information (as expected). But it looks like the queue still works
correctly after restarting rsyslogd; at least it doesn't invalidate the
queue or lose messages.

Turning the system off immediately seems to work most of the time.
After 11 tries, the last one failed. It looked as if I had turned the
system off during a write.

bye
David Ecker

Re: Development of failsafe disk based queue
Hi David,

On Thu, 2008-10-09 at 10:53 +0200, David Ecker wrote:
> Hi Rainer,
>
> the assert error seemed to be fixed in HEAD.

It is good to hear this. I began to have some doubts when I reviewed the
code. I would really appreciate it if you could download and test this
version here:

http://download.rsyslog.com/rsyslog/rsyslog-3.21.6-Test2.tar.gz

I will probably release that tomorrow, so some indication of whether the
problem is actually gone would be very good. The final 3.21.6 will see
one more patch, but nothing that affects the assert in question.

>
> Mounting the ext3 filesystem with (noatime,sync,dirsync,rw) seemed to
> work a lot better.
>
> /etc/rsyslog.conf
> ----------------
> $WorkDirectory /rsyslog/
> $ActionQueueFileName buffer
> $ActionQueueMaxDiskSpace 1g
> $ActionQueueSaveOnShutdown on
> $ActionQueueType Disk
> $ActionQueueMaxFileSize 100m
> $ActionQueueSize 10000000
> $ActionQueueCheckpointInterval 1
> $ActionResumeRetryCount -1
> *.* @@10.8.0.1:514
> -----------------
>
> I found out that the queue was limited to 1000 elements if you do not
> define ActionQueueSize inside the config.

Yes, that's the default for *action* queues (the main message queue
default is different, I think 10,000).

> One difference was that version 3.18.3 actually created one file per msg
> if I did set ActionQueueCheckpointInterval to 1. Right now only one
> file in addition to the .qi file is created containing all messages. I
> haven't waited to see what will happen if I reach 100MB on the data file.

That's probably the result of a bug fixed in the repo but not yet
released. But I have not checked whether it is gone now. 3.18.x will only
see one more release, then we move on to 3.20.x.

> If I shut down rsyslog normally, some status information seems to be
> written to the .qi file (508 bytes -> 1024 bytes). After restarting, the
> .qi file shrinks to 508 bytes again.

That's right.

>
> If I kill rsyslogd (SIGKILL), the .qi file is not updated with this
> information (as expected). But it looks like the queue still works
> correctly after restarting rsyslogd; at least it doesn't invalidate the
> queue or lose messages.

I'd say it depends on what exactly it is doing when it is killed.

>
> Turning the system off immediately seems to work most of the time.
> After 11 tries, the last one failed. It looked as if I had turned the
> system off during a write.

This is also within what I expect. If you hit it during a write, things
are really bad.
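
One general way to at least detect a write that was cut short is to frame
each record with its length and a checksum, and on restart to stop at the
first record that does not verify. The Python sketch below shows that
idea only; it is not the format of rsyslog's queue files, and the header
layout (two little-endian 32-bit fields) and function names are
assumptions made for the example.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def frame(payload):
    """Prefix payload with its length and CRC32."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(buf):
    """Return the intact records at the start of buf.

    Scanning stops at the first record that is truncated or whose checksum
    does not match, which is what a torn write would leave behind.
    """
    records = []
    pos = 0
    while pos + HEADER.size <= len(buf):
        length, crc = HEADER.unpack_from(buf, pos)
        start = pos + HEADER.size
        payload = buf[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break
        records.append(payload)
        pos = start + length
    return records
```

This does not save the message that was being written when the power went
away, but it lets the reader discard the damaged tail instead of treating
the whole queue as invalid.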

Re: Development of failsafe disk based queue
Hi Rainer,

With the same config it behaves similarly to the HEAD version I tested in
the last few days. The assert message did not appear.

bye
David Ecker

Re: Development of failsafe disk based queue
Hi David,

thanks for the quick feedback, much appreciated.

Rainer

> -----Original Message-----
> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
> bounces@lists.adiscon.com] On Behalf Of David Ecker
> Sent: Thursday, October 09, 2008 2:44 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] Development of failsafe disk based queue