Mailing List Archive: Development of failsafe disk based queue

Development of failsafe disk based queue

david at ecker-software

Oct 1, 2008, 3:00 AM

Post #1 of 60 (8511 views)

Hi,

I am looking for a failsafe way to store syslog messages locally
until they can be sent later. I have already looked at the disk-based
memory queue and the disk-based queue. Neither queue survives if you
simply power down the system abruptly; the whole queue is lost.
I also looked at queue.c, and it seemed to me that neither queue was
designed for that kind of failure, but I could be wrong there. Since
an abrupt power-down is the major failure mode and will
occur fairly often, I need to create a solution for it.

Have you already started developing something that addresses this problem?
Could you help me extend rsyslog (3.18.4) so that I can develop a new
queue myself? I would contribute the code to the rsyslog project
afterwards, if you would like.

bye
David Ecker

Re: Development of failsafe disk based queue [ In reply to ]

rgerhards at hq

Oct 1, 2008, 3:32 AM

Post #2 of 60 (8437 views)

> I am looking for a failsafe way to store syslog messages locally
> until they can be sent later. I have already looked at the disk-based
> memory queue and the disk-based queue. Neither queue survives if you
> simply power down the system abruptly; the whole queue is lost.
> I also looked at queue.c, and it seemed to me that neither queue was
> designed for that kind of failure, but I could be wrong there. Since
> an abrupt power-down is the major failure mode and will
> occur fairly often, I need to create a solution for it.

I doubt there is a software solution for this (one that does not
depend on a transactional file system, of course). What prevents you
from using a UPS? I'd say that a sudden power loss is by far the least
probable cause of failure for a system that is configured to do any
serious work.

Please elaborate why you (or others ;)) consider this case important.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

Re: Development of failsafe disk based queue [ In reply to ]

rgerhards at hq

Oct 1, 2008, 3:50 AM

Post #3 of 60 (8424 views)

One thing I forgot to mention: a pure disk queue (not a disk-assisted
one) gets you as close to your goal as possible (well, mostly - we
could, at considerable performance expense, require synced writes).
In that case, all data is immediately stored on disk. You can
configure it to also write the metadata out immediately (again, doing
so with sync is not yet supported). However, you still have a window of
exposure, for example if the power loss happens right while the
disk is writing data to a disk sector.

I still wonder why this scenario would be useful to address...

Rainer


Re: Development of failsafe disk based queue [ In reply to ]

david at ecker-software

Oct 1, 2008, 4:02 AM

Post #4 of 60 (8402 views)

Rainer Gerhards wrote:
> I doubt there is a software solution for this (one that does not
> depend on a transactional file system, of course). What prevents you
> from using a UPS? I'd say that a sudden power loss is by far the least
> probable cause of failure for a system that is configured to do any
> serious work.
>
> Please elaborate why you (or others ;)) consider this case important.
>
The client systems (about 200 are planned) are stationed in public
places around the world and connected to centralized servers through VPN
connections over an unreliable network. Since space and appearance
requirements are important, a UPS won't fit there; there is actually no
space for a UPS. The main problem is that customers actually
pull the plug to restart the system, to charge their laptops or
mobile phones, or just for the fun of it.

The client base image is a read-only system (Knoppix-like) with an extra
hard disk for swap and other data such as syslog messages. Since
there are no administrators close to the client systems, the client itself
needs to be able to send all the log information that accumulated
between a network failure and an abrupt power-down to the central
server for error analysis, since those messages are usually the most
important ones for pinpointing the cause of the initial error.

My approach would be to use a block device directly, since a file system
is fault-prone if you shut down the system abruptly. Each entry,
including its header information, would be guarded by a checksum value.
It would actually be something like a fixed-array-based queue, except
that it would store the information on a block device. But this is just
an initial thought.
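
To make the idea of checksum-guarded, fixed-size entries concrete, here is a minimal sketch in C. It is not rsyslog code and not a design anyone in this thread proposed in detail. The 512-byte slot size, the magic value, the field layout, and the use of FNV-1a as the checksum are all assumptions chosen for illustration; a real design might prefer a CRC and would still have to deal with drive write caching.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE        512u          /* assumed: one slot per 512-byte sector */
#define SLOT_MAGIC       0x51554555u   /* hypothetical marker for a used slot */
#define SLOT_PAYLOAD_MAX (SLOT_SIZE - 16u)

/* One fixed-size queue slot: a 16-byte header followed by the payload. */
struct slot {
    uint32_t magic;     /* SLOT_MAGIC if the slot was ever written */
    uint32_t seq;       /* sequence number, used to order slots after a crash */
    uint32_t len;       /* number of valid payload bytes */
    uint32_t checksum;  /* covers magic, seq, len and the payload */
    unsigned char payload[SLOT_PAYLOAD_MAX];
};

/* FNV-1a over a byte range, continuing from hash state h. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t n)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Checksum over the header fields (except the checksum itself) and payload. */
static uint32_t slot_checksum(const struct slot *s)
{
    uint32_t h = 2166136261u;
    h = fnv1a(h, &s->magic, sizeof s->magic);
    h = fnv1a(h, &s->seq, sizeof s->seq);
    h = fnv1a(h, &s->len, sizeof s->len);
    return fnv1a(h, s->payload, s->len);
}

/* Fill a slot with one message. Returns 0 on success, -1 if it does not fit. */
static int slot_fill(struct slot *s, uint32_t seq, const void *msg, size_t len)
{
    if (len > SLOT_PAYLOAD_MAX)
        return -1;
    memset(s, 0, sizeof *s);
    s->magic = SLOT_MAGIC;
    s->seq = seq;
    s->len = (uint32_t)len;
    memcpy(s->payload, msg, len);
    s->checksum = slot_checksum(s);
    return 0;
}

/* A slot read back after a crash is trusted only if it passes all checks. */
static int slot_valid(const struct slot *s)
{
    return s->magic == SLOT_MAGIC
        && s->len <= SLOT_PAYLOAD_MAX
        && s->checksum == slot_checksum(s);
}
```

On restart, a reader would scan the slots, discard any that fail slot_valid() (for example a slot that was only partially written when the plug was pulled), and replay the rest in sequence order.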


Re: Development of failsafe disk based queue [ In reply to ]

david at ecker-software

Oct 1, 2008, 4:15 AM

Post #5 of 60 (8395 views)

Rainer Gerhards wrote:
> One thing I forgot to mention: a pure disk queue (not a disk-assisted
> one) gets you as close to your goal as possible (well, mostly - we
> could, at a considerable performance expense, require synced writing).
> With that case, all data is immediately stored on disk. You can
> configure it to also write the meta data out immediately (and again with
> sync, not yet supported). However, you still have a window of exposure,
> for example if the power loss happens right in the middle of when the
> disk actually writes data to the disk sector.
>
>
If that works, it would be perfect. For testing, I could simply
hardcode the correct fcntl flag inside queue.c, if that is all
that is needed.

> I still wonder why this scenario would be useful to address...
>
>
>
David

Re: Development of failsafe disk based queue [ In reply to ]

rgerhards at hq

Oct 1, 2008, 4:48 AM

Post #6 of 60 (8438 views)

On Wed, 2008-10-01 at 13:15 +0200, David Ecker wrote:
> Rainer Gerhards wrote:
> > One thing I forgot to mention: a pure disk queue (not a disk-assisted
> > one) gets you as close to your goal as possible (well, mostly - we
> > could, at a considerable performance expense, require synced writing).
> > With that case, all data is immediately stored on disk. You can
> > configure it to also write the meta data out immediately (and again with
> > sync, not yet supported). However, you still have a window of exposure,
> > for example if the power loss happens right in the middle of when the
> > disk actually writes data to the disk sector.
> >
> >
> If that would work it would be perfect. For testing I could actually
> pass the correct fctl flag inside of queue.c hardcoded if that is all
> that is needed.

That works ;) You just need to fine-tune the queue parameters. If you
run into problems with that, I am more than happy to help.

> > I still wonder why this scenario would be useful to address...

A very interesting scenario; I never thought about one like it :) If you
need to enhance the queue (for block devices), I can provide some hints.
But given the other priorities, which are required by a much broader
user base, involving me for more than a hint here or there would probably
require a consulting contract. Sorry for being so blunt, but I think
it is always good to set the right level of expectations.

Rainer


Re: Development of failsafe disk based queue [ In reply to ]

Oct 1, 2008, 4:55 AM

Post #7 of 60 (8410 views)

On Wed, 1 Oct 2008, David Ecker wrote:

> Hi,
>
> I am looking for a failsafe way to store syslog messages locally
> until they can be sent later. I have already looked at the disk-based
> memory queue and the disk-based queue. Neither queue survives if you
> simply power down the system abruptly; the whole queue is lost.

Are you sure about the disk-based queue?

According to file:///usr/src/rsyslog-3.21.4/doc/queues.html, the
disk-based queue can be set to commit the metadata after each message.

Disk Queues

Disk queues use disk drives for buffering. The important fact is that they
always use the disk and do not buffer anything in memory. Thus, the queue
is ultra-reliable, but by far the slowest mode. For regular use cases,
this queue mode is not recommended. It is useful if log data is so
important that it must not be lost, even in extreme cases.

When a disk queue is written, it is done in chunks. Each chunk receives
its individual file. Files are named with a prefix (set via the
"$<object>QueueFilename" config directive) followed by a 7-digit
number (starting at one and incremented for each file). Chunks are 10 MB by
default; a different size can be set via the "$<object>QueueMaxFileSize"
config directive. Note that the size limit is not a sharp one: rsyslog
always writes one complete queue entry, even if it violates the size
limit. So chunks are actually a little bit (usually less than 1 KB) larger
than the configured size. Each chunk also has a different size for the
same reason. If you observe different chunk sizes, you can relax: this is
not a problem.
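
The naming scheme and the soft size limit described in the paragraph above can be sketched in C as follows. This is an illustration only, not rsyslog's implementation: the "." separator between prefix and number, the struct, and the function names are assumptions, and the prefix "mainq" in the usage note is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Build a chunk file name as "<prefix>.<7-digit number>".
 * The "." separator is an assumption; the documentation excerpt only
 * says that a prefix is followed by a 7-digit number.
 * Returns 0 on success, -1 if the name does not fit into out. */
static int chunk_name(char *out, size_t outlen, const char *prefix, unsigned num)
{
    int n = snprintf(out, outlen, "%s.%07u", prefix, num);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Bookkeeping for the chunk currently being written. */
struct chunk_state {
    unsigned num;       /* number of the current chunk, starting at 1 */
    size_t bytes;       /* bytes written to the current chunk so far */
    size_t max_bytes;   /* configured (soft) chunk size limit */
};

/* Account for one queue entry of len bytes. The entry always goes into
 * the current chunk as a whole, even if that pushes the chunk over the
 * limit; only afterwards is a new chunk started. Returns the number of
 * the chunk that received the entry. */
static unsigned chunk_account(struct chunk_state *c, size_t len)
{
    unsigned used = c->num;
    c->bytes += len;
    if (c->bytes >= c->max_bytes) {
        c->num++;
        c->bytes = 0;
    }
    return used;
}
```

With a prefix of "mainq", chunk_name() yields "mainq.0000001" for the first chunk. Because the limit is checked only after an entry has been written in full, each chunk can exceed the configured size by at most one entry, which is why the chunk sizes differ slightly.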

Writing in chunks is used so that processed data can quickly be deleted
and is free for other uses - while at the same time keeping no artificial
upper limit on disk space used. If a disk quota is set (instructions
further below), be sure that the quota/chunk size allows at least two
chunks to be written. Rsyslog currently does not check that and will fail
miserably if a single chunk is over the quota.

Creating new chunks costs performance but provides quicker ability to free
disk space. The 10 MB default is considered a good compromise between these
two. However, it may make sense to adapt these settings to local policies.
For example, if a disk queue is written on a dedicated 200 GB disk, it may
make sense to use a 2 GB (or even larger) chunk size.

Please note, however, that the disk queue by default does not update its
housekeeping structures every time it writes to disk. This is for
performance reasons. In the event of failure, data will still be lost
(unless the file structures are fixed up manually). However, disk
queues can be set to write bookkeeping information on checkpoints (every n
records), so that this can be made ultra-reliable, too. If the checkpoint
interval is set to one, no data can be lost, but the queue is
exceptionally slow.

Each queue can be placed on a different disk for best performance and/or
isolation. This is currently selected by specifying different
$WorkDirectory config directives before the queue creation statement.

To create a disk queue, use the "$<object>QueueType Disk" config
directive. Checkpoint intervals can be specified via
"$<object>QueueCheckpointInterval", with 0 meaning no checkpoints.

You also need to specifically enable syncing (from
http://www.rsyslog.com/doc-v3compatibility.html )

Output File Syncing
Rsyslogd tries to stay as compatible with stock syslogd as possible. As
such, it retained stock syslogd's default of syncing every file write unless
specified otherwise (by placing a dash in front of the output file
name). While this was a useful feature in past days, when hardware was
much less reliable and UPSes were rare, it is no longer useful in today's
world. Instead, syncing is a severe performance hit. With it, rsyslogd
writes files around 50 *times* slower than without it. It also affects
overall system performance due to the high I/O activity. In rsyslog v3,
syncing has been turned off by default. This is done via a specific
configuration directive, "$ActionFileEnableSync on/off", which is off by
default. So even if rsyslogd finds sync selector lines, it ignores them by
default. To enable file syncing, the administrator must specify
"$ActionFileEnableSync on" at the top of rsyslog.conf. This ensures that
syncing only happens in installations where the administrator
actually wants that (performance-intensive) feature. In the vast majority
of cases (if not all), this dramatically increases rsyslogd performance
without any negative effects.

> I also looked at queue.c, and it seemed to me that neither queue was
> designed for that kind of failure, but I could be wrong there. Since
> an abrupt power-down is the major failure mode and will
> occur fairly often, I need to create a solution for it.

With the checkpoint interval set to 1 and syncing enabled, the data should
be safely on disk (assuming you have hardware that supports this), and
a power-off won't affect it.

David Lang


Re: Development of failsafe disk based queue [ In reply to ]

rgerhards at hq

Oct 1, 2008, 4:57 AM

Post #8 of 60 (8412 views)

David,

the file syncing mentioned in the compatibility doc applies only to the
output action.

The queue never does synchronous writes - I always assumed that a
critical system would have a UPS, and so far I could not think of a
valid reason for not having one. So the queue would need an extra
option to do sync writes. Obviously, that's not a big deal.

Performance, of course, will be terrible with such a setup...

Rainer


Re: Development of failsafe disk based queue [ In reply to ]

Oct 1, 2008, 5:02 AM

Post #9 of 60 (8409 views)

On Wed, 1 Oct 2008, David Ecker wrote:

> The client systems (about 200 are planned) are stationed in public
> places around the world and connected to centralized servers through VPN
> connections over an unreliable network. Since space and appearance
> requirements are important, a UPS won't fit there; there is actually no
> space for a UPS. The main problem is that customers actually
> pull the plug to restart the system, to charge their laptops or
> mobile phones, or just for the fun of it.

You can get UPS systems that are PCI cards, completely internal. They still
may not fit, but you at least have a chance.

> The client base image is a read-only system (Knoppix-like) with an extra
> hard disk for swap and other data such as syslog messages. Since
> there are no administrators close to the client systems, the client itself
> needs to be able to send all the log information that accumulated
> between a network failure and an abrupt power-down to the central
> server for error analysis, since those messages are usually the most
> important ones for pinpointing the cause of the initial error.
>
> My approach would be to use a block device directly, since a file system
> is fault-prone if you shut down the system abruptly. Each entry,
> including its header information, would be guarded by a checksum value.
> It would actually be something like a fixed-array-based queue, except
> that it would store the information on a block device. But this is just
> an initial thought.

You are inventing a new filesystem here. It's not that easy to be reliable,
because the disk can lie to you. Unless you are doing interesting things
at the ATA/SCSI command level, the disk may reorder your writes and may
cache them in memory on the drive for an unknown time before actually
writing them.

If you need reliable writes at anything close to a reasonable speed, you
need a battery-backed cache or a solid-state drive in your machine
(and not all solid-state drives are fast to write to).

David Lang


Re: Development of failsafe disk based queue [ In reply to ]

david at ecker-software

Oct 1, 2008, 5:05 AM

Post #10 of 60 (8405 views)

Hi,

Should I use 3.18.4 (the latest stable release, which I would prefer) or do I
need the latest development version? I would actually alter queue.c
directly, changing the fcntl flags in the disk-based queue
(O_DIRECT, O_SYNC, O_NOATIME).
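
For reference, a minimal sketch in C of what opening a file for synchronous writes looks like at the system-call level. This is not the code in queue.c; the function names are made up for illustration. Only O_SYNC is used here: on Linux, O_DIRECT and O_NOATIME are non-portable extensions, and O_DIRECT additionally places alignment requirements on buffers and write sizes, so they are left out of the sketch.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Open (or create) a file for appending with O_SYNC, so that each
 * write() returns only after the kernel has passed the data on to the
 * storage device. Returns a file descriptor, or -1 on error. */
static int open_synced(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0600);
}

/* Write one complete record, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is set by write()). */
static int write_record(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Even with O_SYNC, the guarantee ends at the drive: as noted elsewhere in this thread, a disk with a volatile write cache may still hold the data in its own memory when the power is cut.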

Performance is not really an issue. There will be only 1000 to 2000
messages per hour at peak times.

bye
David Ecker

Rainer Gerhards schrieb:
> David,
>
> the file syncing mentioned in the compatibility doc applies to the
> output action, only.
>
> The queue does never do synchronous writes - I always assumed that a
> critical system would have a UPS and could never think (so far) about a
> valid reason for not having it. So the queue would need to have an extra
> option to do sync writes. Obviously, that's not a big deal.
>
> Performance, of course, will be extremely terrible with such a setup...
>
> Rainer
>
> On Wed, 2008-10-01 at 04:55 -0700, david@lang.hm wrote:
>
>> On Wed, 1 Oct 2008, David Ecker wrote:
>>
>>
>>> Hi,
>>>
>>> I am looking for a failsafe solution to store syslog messages localy
>>> until they could be send later. I already looked at the disk based
>>> memory queue and the disk based queue. Both queue's don't work if you
>>> just power down the system immediatly actually loosing the whole queue.
>>>
>> are you sure about the disk based queue?
>>
>> per file:///usr/src/rsyslog-3.21.4/doc/queues.html the disk based queue
>> can be set to do a commit of the metadata after each message.
>>
>> Disk Queues
>>
>> Disk queues use disk drives for buffering. The important fact is that the
>> always use the disk and do not buffer anything in memory. Thus, the queue
>> is ultra-reliable, but by far the slowest mode. For regular use cases,
>> this queue mode is not recommended. It is useful if log data is so
>> important that it must not be lost, even in extreme cases.
>>
>> When a disk queue is written, it is done in chunks. Each chunk receives
>> its individual file. Files are named with a prefix (set via the
>> "$<object>QueueFilename" config directive) and followed by a 7-digit
>> number (starting at one and incremented for each file). Chunks are 10mb by
>> default, a different size can be set via the"$<object>QueueMaxFileSize"
>> config directive. Note that the size limit is not a sharp one: rsyslog
>> always writes one complete queue entry, even if it violates the size
>> limit. So chunks are actually a little but (usually less than 1k) larger
>> then the configured size. Each chunk also has a different size for the
>> same reason. If you observe different chunk sizes, you can relax: this is
>> not a problem.
>>
>> Writing in chunks is used so that processed data can quickly be deleted
>> and is free for other uses - while at the same time keeping no artificial
>> upper limit on disk space used. If a disk quota is set (instructions
>> further below), be sure that the quota/chunk size allows at least two
>> chunks to be written. Rsyslog currently does not check that and will fail
>> miserably if a single chunk is over the quota.
>>
>> Creating new chunks costs performance but provides quicker ability to free
>> disk space. The 10mb default is considered a good compromise between these
>> two. However, it may make sense to adapt these settings to local policies.
>> For example, if a disk queue is written on a dedicated 200gb disk, it may
>> make sense to use a 2gb (or even larger) chunk size.
>>
>> Please note, however, that the disk queue by default does not update its
>> housekeeping structures every time it writes to disk. This is for
>> performance reasons. In the event of failure, data will still be lost
>> (except when manually is mangled with the file structures). However, disk
>> queues can be set to write bookkeeping information on checkpoints (every n
>> records), so that this can be made ultra-reliable, too. If the checkpoint
>> interval is set to one, no data can be lost, but the queue is
>> exceptionally slow.
>>
>> Each queue can be placed on a different disk for best performance and/or
>> isolation. This is currently selected by specifying different
>> $WorkDirectory config directives before the queue creation statement.
>>
>> To create a disk queue, use the "$<object>QueueType Disk" config
>> directive. Checkpoint intervals can be specified via
>> "$<object>QueueCheckpointInterval", with 0 meaning no checkpoints.
>>
>>
>>
>>
>>
>> you also need to specificly enable syncing (from
>> http://www.rsyslog.com/doc-v3compatibility.html )
>>
>> Output File Syncing
>> Rsyslogd tries to stay as compatible with stock syslogd as possible. As
>> such, it retained stock syslogd's default of syncing every file write if
>> not specified otherwise (by placing a dash in front of the output file
>> name). While this was a useful feature in past days, when hardware was
>> much less reliable and UPSes were rare, it is no longer useful in today's
>> world. Instead, syncing is a big performance hit. With it, rsyslogd
>> writes files around 50 *times* slower than without it. It also affects
>> overall system performance due to the high IO activity. In rsyslog v3,
>> syncing has been turned off by default. This is done via a specific
>> configuration directive "$ActionFileEnableSync on/off" which is off by
>> default. So even if rsyslogd finds sync selector lines, it ignores them by
>> default. In order to enable file syncing, the administrator must specify
>> "$ActionFileEnableSync on" at the top of rsyslog.conf. This ensures that
>> syncing only happens in installations where the administrator
>> actually wanted that (performance-intensive) feature. In the vast majority
>> of cases (if not all), this dramatically increases rsyslogd performance
>> without any negative effects.
>>
>>
>>
>>
>>> I already looked at queue.c and it seemed to me that both queues were
>>> not designed for that kind of failure, but I could be wrong there. Since
>>> an immediate power down of the system is the major failure which will
>>> occur pretty often I need to create a solution there.
>>>
>> with checkpoint interval set to 1 and syncing enabled the data should be
>> on the disk safely (assuming you have hardware that supports this) and
>> a power-off won't affect it.
>>
>> David Lang
>>
>>
>>
>>
>>> Did you already start to develop something addressing that problem?
>>> Could you help me extend rsyslog (3.18.4) so that I can develop a new
>>> queue myself? I would contribute the code to the rsyslog project if you
>>> would like afterwards.
>>>
>>> bye
>>> David Ecker
>>>
>>>
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com
>

Re: Development of failsafe disk based queue [ In reply to ]

Oct 1, 2008, 5:07 AM

Post #11 of 60 (8397 views)

On Wed, 1 Oct 2008, Rainer Gerhards wrote:

> One thing I forgot to mention: a pure disk queue (not a disk-assisted
> one) gets you as close to your goal as possible (well, mostly - we
> could, at a considerable performance expense, require synced writing).
> With that case, all data is immediately stored on disk. You can
> configure it to also write the meta data out immediately (and again with
> sync, not yet supported). However, you still have a window of exposure,
> for example if the power loss happens right in the middle of when the
> disk actually writes data to the disk sector.
>
> I still wonder why this scenario would be useful to address...

not all uses of rsyslog are for simple system logs. it's a good general
purpose log tool, and there are some cases where you want to be as sure as
you possibly can be that once a message has been acknowledged it has no
chance of being lost.

using some form of solid-state reliable storage (battery backed ram on a
raid controller, a battery backed ram disk, a flash disk) it is possible
(but not necessarily cheap) to get the ability to do tens to hundreds of
thousands of writes + syncs per second

David Lang


Re: Development of failsafe disk based queue [ In reply to ]

david at ecker-software

Oct 1, 2008, 5:07 AM

Post #12 of 60 (8411 views)

Hi David,

the actual problem was that the qi file was not correct after an
immediate restart. All messages were written correctly; only the qi
file was wrong.
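
One common way to keep a small state file such as the qi file consistent across a sudden power loss is to write the new contents to a temporary file, sync it, and then rename it over the old one. The following is only a sketch of that general technique, not how rsyslog's queue.c handles its qi file; the function name save_state_atomically and its parameters are made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace the file at `path` with `len` bytes of
 * `state` so that a crash leaves either the old or the new contents on
 * disk, never a half-written file. Returns 0 on success, -1 on error. */
int save_state_atomically(const char *path, const char *tmp_path,
                          const void *state, size_t len)
{
    assert(path != NULL && tmp_path != NULL && state != NULL);

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may store fewer bytes than requested, so loop until done */
    const char *p = state;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* force the new contents to stable storage before exposing them */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* rename() replaces the old file in a single step on POSIX systems */
    if (rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

For full durability the containing directory would also need an fsync after the rename, and the hardware must honour the sync request; both are left out here to keep the sketch short.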

bye
David Ecker

david@lang.hm schrieb:
> On Wed, 1 Oct 2008, David Ecker wrote:
>
>> Hi,
>>
>> I am looking for a failsafe solution to store syslog messages locally
>> until they can be sent later. I already looked at the disk based
>> memory queue and the disk based queue. Both queues don't work if you
>> just power down the system immediately, actually losing the whole queue.
>
> are you sure about the disk based queue?
>
> per file:///usr/src/rsyslog-3.21.4/doc/queues.html the disk based
> queue can be set to do a commit of the metadata after each message.
>
> Disk Queues
>
> Disk queues use disk drives for buffering. The important fact is that
> they always use the disk and do not buffer anything in memory. Thus,
> the queue is ultra-reliable, but by far the slowest mode. For regular
> use cases, this queue mode is not recommended. It is useful if log
> data is so important that it must not be lost, even in extreme cases.
>
> When a disk queue is written, it is done in chunks. Each chunk
> receives its individual file. Files are named with a prefix (set via
> the "$<object>QueueFilename" config directive) and followed by a
> 7-digit number (starting at one and incremented for each file). Chunks
> are 10mb by default, a different size can be set via
> the "$<object>QueueMaxFileSize" config directive. Note that the size
> limit is not a sharp one: rsyslog always writes one complete queue
> entry, even if it violates the size limit. So chunks are actually a
> little bit (usually less than 1k) larger than the configured size.
> Each chunk also has a different size for the same reason. If you
> observe different chunk sizes, you can relax: this is not a problem.
>
> Writing in chunks is used so that processed data can quickly be
> deleted and is free for other uses - while at the same time keeping no
> artificial upper limit on disk space used. If a disk quota is set
> (instructions further below), be sure that the quota/chunk size allows
> at least two chunks to be written. Rsyslog currently does not check
> that and will fail miserably if a single chunk is over the quota.
>
> Creating new chunks costs performance but provides quicker ability to
> free disk space. The 10mb default is considered a good compromise
> between these two. However, it may make sense to adapt these settings
> to local policies. For example, if a disk queue is written on a
> dedicated 200gb disk, it may make sense to use a 2gb (or even larger)
> chunk size.
>
> Please note, however, that the disk queue by default does not update
> its housekeeping structures every time it writes to disk. This is for
> performance reasons. In the event of failure, data will still be lost
> (unless the file structures are repaired manually). However,
> disk queues can be set to write bookkeeping information on checkpoints
> (every n records), so that this can be made ultra-reliable, too. If
> the checkpoint interval is set to one, no data can be lost, but the
> queue is exceptionally slow.
>
> Each queue can be placed on a different disk for best performance
> and/or isolation. This is currently selected by specifying different
> $WorkDirectory config directives before the queue creation statement.
>
> To create a disk queue, use the "$<object>QueueType Disk" config
> directive. Checkpoint intervals can be specified via
> "$<object>QueueCheckpointInterval", with 0 meaning no checkpoints.
>
>
>
>
>
> you also need to specifically enable syncing (from
> http://www.rsyslog.com/doc-v3compatibility.html )
>
> Output File Syncing
> Rsyslogd tries to stay as compatible with stock syslogd as possible. As
> such, it retained stock syslogd's default of syncing every file write
> if not specified otherwise (by placing a dash in front of the output
> file name). While this was a useful feature in past days, when
> hardware was much less reliable and UPSes were rare, it is no longer
> useful in today's world. Instead, syncing is a big performance hit.
> With it, rsyslogd writes files around 50 *times* slower than without
> it. It also affects overall system performance due to the high IO
> activity. In rsyslog v3, syncing has been turned off by default. This
> is done via a specific configuration directive "$ActionFileEnableSync
> on/off" which is off by default. So even if rsyslogd finds sync
> selector lines, it ignores them by default. In order to enable file
> syncing, the administrator must specify "$ActionFileEnableSync on" at
> the top of rsyslog.conf. This ensures that syncing only happens in
> installations where the administrator actually wanted that
> (performance-intensive) feature. In the vast majority of cases (if not
> all), this dramatically increases rsyslogd performance without any
> negative effects.
>
>
>
>> I already looked at queue.c and it seemed to me that both queues were
>> not designed for that kind of failure, but I could be wrong there. Since
>> an immediate power down of the system is the major failure which will
>> occur pretty often I need to create a solution there.
>
> with checkpoint interval set to 1 and syncing enabled the data should
> be on the disk safely (assuming you have hardware that supports
> this) and a power-off won't affect it.
>
> David Lang
>
>
>
>> Did you already start to develop something addressing that problem?
>> Could you help me extend rsyslog (3.18.4) so that I can develop a new
>> queue myself? I would contribute the code to the rsyslog project if you
>> would like afterwards.
>>
>> bye
>> David Ecker
>>

Re: Development of failsafe disk based queue [ In reply to ]

rgerhards at hq

Oct 1, 2008, 5:09 AM

Post #13 of 60 (8430 views)

I am currently working on the queue engine in the devel branch. In that
light, it could make sense to start with that branch. Also, I won't make
any non-bug related changes in the stable version, so if you run into
anything I can quickly change, that will happen in devel.

Rainer

On Wed, 2008-10-01 at 14:05 +0200, David Ecker wrote:
> Hi,
>
> should I use 3.18.4 (latest stable, I would prefer that one) or do I
> need the latest development version? I would actually alter queue.c
> directly, changing the fcntl flags in the disk based queue
> (O_DIRECT, O_SYNC, O_NOATIME).
>
> Performance is not really an issue. There will be only 1000 to 2000
> messages per hour in peak times.
>
> bye
> David Ecker
>
> Rainer Gerhards schrieb:
> > David,
> >
> > the file syncing mentioned in the compatibility doc applies to the
> > output action, only.
> >
> > The queue never does synchronous writes - I always assumed that a
> > critical system would have a UPS and could never think (so far) about a
> > valid reason for not having it. So the queue would need to have an extra
> > option to do sync writes. Obviously, that's not a big deal.
> >
> > Performance, of course, will be extremely terrible with such a setup...
> >
> > Rainer

Re: Development of failsafe disk based queue [ In reply to ]

rgerhards at hq

Oct 1, 2008, 5:12 AM

Post #14 of 60 (8421 views)

On Wed, 2008-10-01 at 05:07 -0700, david@lang.hm wrote:
> On Wed, 1 Oct 2008, Rainer Gerhards wrote:
>
> > One thing I forgot to mention: a pure disk queue (not a disk-assisted
> > one) gets you as close to your goal as possible (well, mostly - we
> > could, at a considerable performance expense, require synced writing).
> > With that case, all data is immediately stored on disk. You can
> > configure it to also write the meta data out immediately (and again with
> > sync, not yet supported). However, you still have a window of exposure,
> > for example if the power loss happens right in the middle of when the
> > disk actually writes data to the disk sector.
> >
> > I still wonder why this scenario would be useful to address...
>
> not all uses of rsyslog are for simple system logs. it's a good general
> purpose log tool, and there are some cases where you want to be as sure as
> you possibly can be that once a message has been acknowledged it has no
> chance of being lost.

I designed the engine for audit-class reliability. However, I assumed
that the rest of the system is also playing in that class. Doing
everything with a potential power failure in mind creates a lot of extra
demands. And I have never heard of anybody doing serious datacenter work
without a proper UPS. Is this *really* an issue?

Rainer

Re: Development of failsafe disk based queue [ In reply to ]

Oct 1, 2008, 5:25 AM

Post #15 of 60 (8400 views)

On Wed, 1 Oct 2008, Rainer Gerhards wrote:

> David,
>
> the file syncing mentioned in the compatibility doc applies to the
> output action, only.

ouch.

> The queue never does synchronous writes - I always assumed that a
> critical system would have a UPS and could never think (so far) about a
> valid reason for not having it. So the queue would need to have an extra
> option to do sync writes. Obviously, that's not a big deal.

good

> Performance, of course, will be extremely terrible with such a setup...

only if you have to wait for a spinning disk to do the write.

this is the same problem that databases have. they need to guarantee that
once the database tells the writing program that the data is written it
will be there even if the system loses power immediately.

if you run a database on standard desktop hardware (and it doesn't have
this safety disabled) you cannot do more than about 80 writes/second. If
you upgrade to the super speedy 15K rpm drives you can do ~160
writes/second.

given that you need to write the data + metadata it gets even uglier, so
what the databases do (and some journaling filesystems) is to write a log
that says what they are going to do, sync that, and then later write the
data to the actual files (updating the journal when they complete the
write)

it sounds like you order your writes correctly for a disk-based queue, but
you would need the option of issuing the syncs (probably when you do the
checkpoints)
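
To make the "sync when you do the checkpoints" idea concrete, here is a minimal sketch of an append-only file that calls fsync() after every N records. It is an illustration only and not rsyslog's queue code; the sync_queue type and the sq_* functions are hypothetical names. A checkpoint interval of 1 syncs after every record, 0 never syncs.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical append-only queue file that syncs on checkpoints. */
typedef struct {
    int fd;
    unsigned checkpoint_interval; /* sync after this many records; 0 = never */
    unsigned unsynced;            /* records written since the last sync */
    unsigned long syncs;          /* number of fsync() calls issued so far */
} sync_queue;

/* Open (or create) the queue file. Returns 0 on success, -1 on error. */
int sq_open(sync_queue *q, const char *path, unsigned checkpoint_interval)
{
    assert(q != NULL && path != NULL);
    q->fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    q->checkpoint_interval = checkpoint_interval;
    q->unsynced = 0;
    q->syncs = 0;
    return q->fd < 0 ? -1 : 0;
}

/* Append one record; fsync() once the checkpoint interval is reached. */
int sq_append(sync_queue *q, const void *rec, size_t len)
{
    assert(q != NULL && rec != NULL);
    ssize_t n = write(q->fd, rec, len);
    if (n < 0 || (size_t)n != len)
        return -1;
    q->unsynced++;
    if (q->checkpoint_interval != 0 &&
        q->unsynced >= q->checkpoint_interval) {
        if (fsync(q->fd) != 0)
            return -1;
        q->unsynced = 0;
        q->syncs++;
    }
    return 0;
}

/* Close the queue file. Records written since the last sync are not
 * guaranteed to be on stable storage. */
int sq_close(sync_queue *q)
{
    assert(q != NULL);
    return close(q->fd);
}
```

With an interval of 1 every sq_append() pays for a full sync, which is where the writes/second limits mentioned above come from.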

if you do this on the wrong hardware (say a laptop 5200 rpm drive or the
wrong flash drive), the fact that you need to do four writes per log entry
(data to queue, metadata to queue, data to output, update metadata for
queue) could drop you to below 15 logs/sec (60 synced writes/sec divided
by 4, but then you lose time to seeking as well)

however, with the correct drive to write to (say a $2,400 80G fusion-io
flash card that can do ~100k IO ops/sec) you should be able to sustain
20,000 logs/sec.
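
The arithmetic behind these numbers is just the device's synced write rate divided by the writes needed per log entry. A tiny sketch, with max_logs_per_sec as a made-up helper name and the four-writes-per-entry figure taken from the paragraph above:

```c
#include <assert.h>

/* Upper bound on log entries per second when every entry costs
 * `writes_per_entry` synced writes on a device that can complete
 * `synced_writes_per_sec` of them. Seek time is ignored. */
double max_logs_per_sec(double synced_writes_per_sec,
                        unsigned writes_per_entry)
{
    assert(writes_per_entry > 0);
    return synced_writes_per_sec / writes_per_entry;
}
```

With 60 synced writes/sec and four writes per entry this gives 15 logs/sec; with 100,000 IO ops/sec it gives an upper bound of 25,000, so the 20,000 logs/sec estimate leaves some headroom.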

realistically very few people need the sustained write capacity that you
would get from such a setup. but if you go with a $500-$700 raid card with
a battery-backed cache you get very similar performance, but with some
possibility that you can't sustain it forever.

David Lang


Re: Development of failsafe disk based queue [ In reply to ]

rgerhards at hq

Oct 1, 2008, 5:29 AM

Post #16 of 60 (8398 views)

On Wed, 2008-10-01 at 05:25 -0700, david@lang.hm wrote:
> On Wed, 1 Oct 2008, Rainer Gerhards wrote:
>
> > David,
> >
> > the file syncing mentioned in the compatibility doc applies to the
> > output action, only.
>
> ouch.
>
> > The queue never does synchronous writes - I always assumed that a
> > critical system would have a UPS and could never think (so far) about a
> > valid reason for not having it. So the queue would need to have an extra
> > option to do sync writes. Obviously, that's not a big deal.
>
> good
>
> > Performance, of course, will be extremely terrible with such a setup...
>
> only if you have to wait for a spinning disk to do the write.

I agree with the rest of your argument below. But the question raised here
was in regard to a system without any battery backup. So I would need to
wait.

Even then, in the worst case, I think it would be possible that the disk
performs only a partial write. I am not sure if that's really the case with
today's disk drives (which I think have capacitors to prevent this
scenario), but with past drives this could happen (I know all too well -
a few years ago that cost me a weekend ;)).
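
A partial write cannot be prevented in software, but it can at least be detected on restart if each record carries its own length and a checksum. The sketch below shows that general idea with a Fletcher-16 checksum; the record layout (2-byte length, payload, 2-byte checksum) and the function names are assumptions for illustration and have nothing to do with rsyslog's actual on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fletcher-16 checksum over `len` bytes. */
static uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (uint16_t)((a + data[i]) % 255);
        b = (uint16_t)((b + a) % 255);
    }
    return (uint16_t)((b << 8) | a);
}

/* Frame a payload as: 2-byte big-endian length, payload, 2-byte checksum.
 * Returns the framed size, or 0 if `out` is too small. */
size_t frame_record(const uint8_t *payload, uint16_t len,
                    uint8_t *out, size_t out_cap)
{
    assert(payload != NULL && out != NULL);
    size_t need = (size_t)len + 4;
    if (out_cap < need)
        return 0;
    out[0] = (uint8_t)(len >> 8);
    out[1] = (uint8_t)(len & 0xff);
    memcpy(out + 2, payload, len);
    uint16_t sum = fletcher16(payload, len);
    out[2 + len] = (uint8_t)(sum >> 8);
    out[3 + len] = (uint8_t)(sum & 0xff);
    return need;
}

/* Returns 1 if `buf` starts with a complete record whose checksum
 * matches, 0 if the record is truncated or corrupted. */
int record_is_intact(const uint8_t *buf, size_t avail)
{
    assert(buf != NULL);
    if (avail < 4)
        return 0;
    size_t len = ((size_t)buf[0] << 8) | buf[1];
    if (avail < len + 4)
        return 0; /* the tail of the record never reached the disk */
    uint16_t stored = (uint16_t)((buf[2 + len] << 8) | buf[3 + len]);
    return stored == fletcher16(buf + 2, len);
}
```

On restart, a reader would walk the file record by record and stop at the first one that fails this check, treating everything before it as safely written.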

Rainer

>
> this is the same problem that databases have. they need to guarantee that
> once the database tells the writing program that the data is written it
> will be there even if the system loses power immediately.
>
> if you run a database on standard desktop hardware (and it doesn't have
> this safety disabled) you cannot do more then about 80 writes/second. If
> you upgrade to the super speedy 15K rpm drives you can do ~160
> writes/second.
>
> given that you need to write the data + metadata it gets even uglier, so
> what the databases do (and some journaling filesystems) is to write a log
> that says what they are going to do, sync that, and then later write the
> data to the actual files (updating the journal when they complete the
> write)
>
> it sounds like you order your write correctly for a disk-based queue, but
> you would need the option of issuing the syncs (probably when you do the
> checkpoints)
>
> if you do this on the wrong hardware (say a laptop 5200 rpm drive or the
> wrong flash drive), the fact that you need to do four writes per log entry
> (data to queue, metadata to queue, data to output, update metadata for
> queue) could drop you to below 15 logs/sec (60/4 but then you loose time
> to seeking as well)
>
> however, with the correct drive to write to (say a $2,400 80G fusion-io
> flash card that can do ~100k IO ops/sec) you should be able to sustain
> 20,000 logs/sec.
>
> realisticly very few people need the sustained write capacity that you
> would get from such a setup. but if you go with a $500-$700 raid card with
> a battery-backed cache you get very similar performance, but with some
> possibility that you can't sustain it forever.
>
> David Lang
>
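
The estimate quoted above can be checked with a few lines of arithmetic.
This is only a sketch: the four-writes-per-entry figure and the sync rates
(60, 80, 160 and ~100k per second) are taken from the quoted message, while
the function name is made up for illustration and seek overhead is ignored.

```python
# Rough upper bound on log throughput when every queue write is synced.
# WRITES_PER_ENTRY = 4 follows the quoted message: data to queue, metadata
# to queue, data to output, metadata update for the queue.
WRITES_PER_ENTRY = 4


def max_logs_per_second(synced_writes_per_second):
    """Best case only: ignores seek time and any batching of syncs."""
    return synced_writes_per_second / WRITES_PER_ENTRY


# Sync rates quoted in the thread (synced writes per second).
devices = {
    "laptop drive": 60,
    "desktop drive": 80,
    "15K rpm drive": 160,
    "flash card (~100k IO ops/sec)": 100_000,
}

for name, rate in devices.items():
    print(f"{name}: at most {max_logs_per_second(rate):.0f} logs/sec")
```

For the flash card this gives an upper bound of 25,000 logs/sec, which is
consistent with the "20,000 logs/sec" sustained figure once some overhead
is allowed for.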

Re: Development of failsafe disk based queue [ In reply to ]

Oct 1, 2008, 5:35 AM

Post #17 of 60 (8421 views)

On Wed, 1 Oct 2008, Rainer Gerhards wrote:

> On Wed, 2008-10-01 at 05:07 -0700, david@lang.hm wrote:
>> On Wed, 1 Oct 2008, Rainer Gerhards wrote:
>>
>>> One thing I forgot to mention: a pure disk queue (not a disk-assisted
>>> one) gets you as close to your goal as possible (well, mostly - we
>>> could, at a considerable performance expense, require synced writing).
>>> With that case, all data is immediately stored on disk. You can
>>> configure it to also write the meta data out immediately (and again with
>>> sync, not yet supported). However, you still have a window of exposure,
>>> for example if the power loss happens right in the middle of when the
>>> disk actually writes data to the disk sector.
>>>
>>> I still wonder why this scenario would be useful to address...
>>
>> not all uses of rsyslog are for simple system logs. it's a good general
>> purpose log tool, and there are some cases where you want to be as sure as
>> you possibly can be that once a message has been acknowledged it has no
>> chance of being lost.
>
> I designed the engine for audit-class reliability. However, I assumed
> that the rest of the system is also playing in that class. Doing
> everything with a potential power failure in mind creates a lot of extra
> demands. And I have never heard of anybody doing serious datacenter work
> without a proper UPS. Is this *really* an issue?

Yes.

UPSs fail.
generators fail.
power cords come loose.
power cords get unplugged by someone who thinks they are unplugging a
different system.
people bump power switches on power strips.
power supplies are defective.

I had one production outage where a visiting tech pulled a power cord from
an overhead plug and dropped it on the ground, where it happened to hit
the power switch on a power strip.

I've had high-end systems with redundant power supplies go down because of
faulty hardware that decided to disable both power supplies at once (it
turned out that there was a defect in the whole batch of servers, but it
took IBM several weeks to figure out what was going on).

I've had UPS systems blow up (literally).

I've had a datacenter go down because it was running on generator
power (due to other issues), and the refueling guy filled the tank
incorrectly and got air bubbles into the fuel system. a few minutes later
the 500 kW diesel generator couldn't maintain constant speed and the safety
triggers kicked in and disabled it.

it's amazing the things that happen in real life.

David Lang


Re: Development of failsafe disk based queue [ In reply to ]

Oct 1, 2008, 5:39 AM

Post #18 of 60 (8432 views)

On Wed, 1 Oct 2008, Rainer Gerhards wrote:

> On Wed, 2008-10-01 at 05:25 -0700, david@lang.hm wrote:
>> On Wed, 1 Oct 2008, Rainer Gerhards wrote:
>>
>>> David,
>>>
>>> the file syncing mentioned in the compatibility doc applies to the
>>> output action, only.
>>
>> ouch.
>>
>>> The queue does never do synchronous writes - I always assumed that a
>>> critical system would have a UPS and could never think (so far) about a
>>> valid reason for not having it. So the queue would need to have an extra
>>> option to do sync writes. Obviously, that's not a big deal.
>>
>> good
>>
>>> Performance, of course, will be extremely terrible with such a setup...
>>
>> only if you have to wait for a spinning disk to do the write.
>
> I agree to the rest of your argument below. But the question raised here
> was in regard to a system without any battery backup. So I would need to
> wait.

no UPS is not necessarily the same as no battery backup.

you could use a compact flash drive and probably get better
performance/reliability than spinning disks with no battery at all.

> Even then, in the worst case, I think it would be possible that the disk
> does only a partial write. I am not sure if that's really the case with
> today's disk drives (which I think have capacitors to prevent this
> scenario), but with past drives this could happen (I know all too well -
> a few years ago that cost me a weekend ;)).

current disks do not have capacitors to prevent partial writes or to flush
their caches. but options like the linux ext3 data-journaled mode make it
so that your data is safely in the journal, and the various solid-state
options solve that problem.
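
Whichever storage ends up underneath, the queue still has to request the
flush itself. A minimal sketch of that idea, in Python rather than C and
not rsyslog's actual code: the class, the file name and the
`checkpoint_interval` parameter are hypothetical. Records are appended and
fsync is called every n records, so an interval of 1 syncs after every
message. fsync only asks the OS to push the data to the device; whether the
drive's own write cache honours that depends on the hardware.

```python
import os
import tempfile


class SyncedAppendLog:
    """Append-only file that calls fsync every `checkpoint_interval` records."""

    def __init__(self, path, checkpoint_interval=1):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.checkpoint_interval = checkpoint_interval
        self.unsynced = 0      # records written since the last fsync
        self.sync_count = 0    # number of fsync calls issued so far

    def append(self, record: bytes):
        view = memoryview(record + b"\n")
        while len(view) > 0:   # os.write may write fewer bytes than asked
            written = os.write(self.fd, view)
            view = view[written:]
        self.unsynced += 1
        if self.unsynced >= self.checkpoint_interval:
            self.sync()

    def sync(self):
        os.fsync(self.fd)
        self.unsynced = 0
        self.sync_count += 1

    def close(self):
        if self.unsynced:
            self.sync()
        os.close(self.fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        log = SyncedAppendLog(os.path.join(workdir, "queue.dat"),
                              checkpoint_interval=1)
        log.append(b"<13>Oct  1 05:39:00 host app: test message")
        log.close()
```

With `checkpoint_interval=1` every append pays for one sync, which is where
the low writes/second figures for spinning disks come from.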

David Lang


Re: Development of failsafe disk based queue [ In reply to ]

david at ecker-software

Oct 1, 2008, 5:45 AM

Post #19 of 60 (8431 views)

Rainer Gerhards wrote:
> On Wed, 2008-10-01 at 05:25 -0700, david@lang.hm wrote:
>
>> On Wed, 1 Oct 2008, Rainer Gerhards wrote:
>>
>>
>>> David,
>>>
>>> the file syncing mentioned in the compatibility doc applies to the
>>> output action, only.
>>>
>> ouch.
>>
>>
>>> The queue does never do synchronous writes - I always assumed that a
>>> critical system would have a UPS and could never think (so far) about a
>>> valid reason for not having it. So the queue would need to have an extra
>>> option to do sync writes. Obviously, that's not a big deal.
>>>
>> good
>>
>>
>>> Performance, of course, will be extremely terrible with such a setup...
>>>
>> only if you have to wait for a spinning disk to do the write.
>>
>
> I agree to the rest of your argument below. But the question raised here
> was in regard to a system without any battery backup. So I would need to
> wait.
>
> Even then, in the worst case, I think it would be possible that the disk
> does only a partial write. I am not sure if that's really the case with
> today's disk drives (which I think have capacitors to prevent this
> scenario), but with past drives this could happen (I know all too well -
> a few years ago that cost me a weekend ;)).
>
> Rainer
>
Hi,

as long as you do sector-based writes (usually 512 bytes per sector) you
can be sure that the write wasn't partial. Writing more than one sector,
or not starting at a sector-aligned offset (n*512, n=0,1,2,...,x), might
result in a partial write. I've already tested that with my devel client
here. So fencing each sector with a crc32 value would help detect errors
during a write operation. This is actually only a problem if you are
writing directly to a block device, as any filesystem does, and yes,
reordering is definitely a problem. So validating the content written to
the disk afterwards is important.
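
A sketch of the fencing idea, in Python rather than C and under
assumptions that are not from this thread: each 512-byte sector holds a
2-byte length, the payload, zero padding and a trailing CRC32
(zlib.crc32) over the first 508 bytes. The layout and the function names
are illustrative only.

```python
import struct
import zlib

SECTOR_SIZE = 512
CRC_SIZE = 4
LEN_SIZE = 2
MAX_PAYLOAD = SECTOR_SIZE - CRC_SIZE - LEN_SIZE  # 506 bytes


def pack_sector(payload: bytes) -> bytes:
    """Build one 512-byte sector: length, payload, padding, CRC32."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload does not fit in one sector")
    body = struct.pack("<H", len(payload)) + payload
    body = body.ljust(SECTOR_SIZE - CRC_SIZE, b"\x00")
    return body + struct.pack("<I", zlib.crc32(body))


def unpack_sector(sector: bytes):
    """Return the payload, or None if the sector fails validation."""
    if len(sector) != SECTOR_SIZE:
        return None
    body = sector[:-CRC_SIZE]
    (stored_crc,) = struct.unpack("<I", sector[-CRC_SIZE:])
    if zlib.crc32(body) != stored_crc:
        return None
    (length,) = struct.unpack_from("<H", body)
    if length > MAX_PAYLOAD:
        return None
    return body[LEN_SIZE:LEN_SIZE + length]
```

On recovery, a reader would walk the sectors in order and stop at (or
skip) the first one for which `unpack_sector` returns None.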

When writing through a filesystem, reserving space in the destination
file beforehand minimizes errors, since the file system table doesn't
have to be updated (you should also use the flag O_NOATIME in that
case). See, for example, VMware ESX VMDK file handling.
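
A sketch of the preallocation step, again in Python for brevity.
os.posix_fallocate and os.O_NOATIME exist but are platform-specific
(O_NOATIME is Linux-only); the helper name, file name and size are made
up. O_NOATIME can be refused with EPERM when the caller does not own the
file, so the sketch falls back to a plain open, and it falls back to
writing zeros if the filesystem rejects posix_fallocate.

```python
import os
import tempfile


def open_preallocated(path, size):
    """Create `path`, reserve `size` bytes up front, return an open fd."""
    flags = os.O_RDWR | os.O_CREAT
    noatime = getattr(os, "O_NOATIME", 0)  # 0 where the flag is missing
    try:
        fd = os.open(path, flags | noatime, 0o600)
    except PermissionError:
        fd = os.open(path, flags, 0o600)
    try:
        try:
            os.posix_fallocate(fd, 0, size)
        except (AttributeError, OSError):
            # Fallback: allocate the blocks by writing zeros explicitly.
            os.lseek(fd, 0, os.SEEK_SET)
            remaining = size
            while remaining > 0:
                chunk = b"\x00" * min(remaining, 65536)
                remaining -= os.write(fd, chunk)
        os.fsync(fd)  # persist the new size before the queue uses the file
    except OSError:
        os.close(fd)
        raise
    return fd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        fd = open_preallocated(os.path.join(workdir, "queue.dat"), 1 << 20)
        os.pwrite(fd, b"x" * 512, 0)  # overwrite in place, size unchanged
        os.fsync(fd)
        os.close(fd)
```

Later writes that stay inside the reserved range overwrite existing blocks,
so the file size does not change with each write.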

David

>
>> this is the same problem that databases have. they need to guarentee that
>> once the database tells the writing program that the data is written it
>> will be there even if the system looses power immediatly.
>>
>> if you run a database on standard desktop hardware (and it doesn't have
>> this safety disabled) you cannot do more then about 80 writes/second. If
>> you upgrade to the super speedy 15K rpm drives you can do ~160
>> writes/second.
>>
>> given that you need to write the data + metadata it gets even uglier, so
>> what the databases do (and some journaling filesystems) is to write a log
>> that says what they are going to do, sync that, and then later write the
>> data to the actual files (updating the journal when they complete the
>> write)
>>
>> it sounds like you order your write correctly for a disk-based queue, but
>> you would need the option of issuing the syncs (probably when you do the
>> checkpoints)
>>
>> if you do this on the wrong hardware (say a laptop 5200 rpm drive or the
>> wrong flash drive), the fact that you need to do four writes per log entry
>> (data to queue, metadata to queue, data to output, update metadata for
>> queue) could drop you to below 15 logs/sec (60/4 but then you loose time
>> to seeking as well)
>>
>> however, with the correct drive to write to (say a $2,400 80G fusion-io
>> flash card that can do ~100k IO ops/sec) you should be able to sustain
>> 20,000 logs/sec.
>>
>> realisticly very few people need the sustained write capacity that you
>> would get from such a setup. but if you go with a $500-$700 raid card with
>> a battery-backed cache you get very similar performance, but with some
>> possibility that you can't sustain it forever.
>>
>> David Lang
>>
>>
>>> Rainer
>>>
>>> On Wed, 2008-10-01 at 04:55 -0700, david@lang.hm wrote:
>>>
>>>> On Wed, 1 Oct 2008, David Ecker wrote:
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> I am looking for a failsafe solution to store syslog messages localy
>>>>> until they could be send later. I already looked at the disk based
>>>>> memory queue and the disk based queue. Both queue's don't work if you
>>>>> just power down the system immediatly actually loosing the whole queue.
>>>>>
>>>> are you sure about the disk based queue?
>>>>
>>>> per file:///usr/src/rsyslog-3.21.4/doc/queues.html the disk based queue
>>>> can be set to do a commit of the metadata after each message.
>>>>
>>>> Disk Queues
>>>>
>>>> Disk queues use disk drives for buffering. The important fact is that the
>>>> always use the disk and do not buffer anything in memory. Thus, the queue
>>>> is ultra-reliable, but by far the slowest mode. For regular use cases,
>>>> this queue mode is not recommended. It is useful if log data is so
>>>> important that it must not be lost, even in extreme cases.
>>>>
>>>> When a disk queue is written, it is done in chunks. Each chunk receives
>>>> its individual file. Files are named with a prefix (set via the
>>>> "$<object>QueueFilename" config directive) and followed by a 7-digit
>>>> number (starting at one and incremented for each file). Chunks are 10mb by
>>>> default, a different size can be set via the"$<object>QueueMaxFileSize"
>>>> config directive. Note that the size limit is not a sharp one: rsyslog
>>>> always writes one complete queue entry, even if it violates the size
>>>> limit. So chunks are actually a little but (usually less than 1k) larger
>>>> then the configured size. Each chunk also has a different size for the
>>>> same reason. If you observe different chunk sizes, you can relax: this is
>>>> not a problem.
>>>>
>>>> Writing in chunks is used so that processed data can quickly be deleted
>>>> and is free for other uses - while at the same time keeping no artificial
>>>> upper limit on disk space used. If a disk quota is set (instructions
>>>> further below), be sure that the quota/chunk size allows at least two
>>>> chunks to be written. Rsyslog currently does not check that and will fail
>>>> miserably if a single chunk is over the quota.
>>>>
>>>> Creating new chunks costs performance but provides quicker ability to free
>>>> disk space. The 10mb default is considered a good compromise between these
>>>> two. However, it may make sense to adapt these settings to local policies.
>>>> For example, if a disk queue is written on a dedicated 200gb disk, it may
>>>> make sense to use a 2gb (or even larger) chunk size.
>>>>
>>>> Please note, however, that the disk queue by default does not update its
>>>> housekeeping structures every time it writes to disk. This is for
>>>> performance reasons. In the event of failure, data will still be lost
>>>> (except when manually is mangled with the file structures). However, disk
>>>> queues can be set to write bookkeeping information on checkpoints (every n
>>>> records), so that this can be made ultra-reliable, too. If the checkpoint
>>>> interval is set to one, no data can be lost, but the queue is
>>>> exceptionally slow.
>>>>
>>>> Each queue can be placed on a different disk for best performance and/or
>>>> isolation. This is currently selected by specifying different
>>>> $WorkDirectory config directives before the queue creation statement.
>>>>
>>>> To create a disk queue, use the "$<object>QueueType Disk" config
>>>> directive. Checkpoint intervals can be specified via
>>>> "$<object>QueueCheckpointInterval", with 0 meaning no checkpoints.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> you also need to specificly enable syncing (from
>>>> http://www.rsyslog.com/doc-v3compatibility.html )
>>>>
>>>> Output File Syncing
>>>> Rsyslogd tries to stay as compatible with stock syslogd as possible. As
>>>> such, it retained stock syslogd's default of syncing every file write if
>>>> not specified otherwise (by placing a dash in front of the output file
>>>> name). While this was a useful feature in past days, when hardware was
>>>> much less reliable and UPSs were rare, it is no longer useful in today's
>>>> world. Instead, syncing is a severe performance hit. With it, rsyslogd
>>>> writes files around 50 *times* slower than without it. It also affects
>>>> overall system performance due to the high IO activity. In rsyslog v3,
>>>> syncing has been turned off by default. This is done via a specific
>>>> configuration directive "$ActionFileEnableSync on/off" which is off by
>>>> default. So even if rsyslogd finds sync selector lines, it ignores them by
>>>> default. In order to enable file syncing, the administrator must specify
>>>> "$ActionFileEnableSync on" at the top of rsyslog.conf. This ensures that
>>>> syncing only happens in installations where the administrator
>>>> actually wants that (performance-intensive) feature. In the vast majority
>>>> of cases (if not all), this dramatically increases rsyslogd performance
>>>> without any negative effects.
>>>>
>>>>
>>>>
>>>>
>>>>> I already looked at queue.c and it seemed to me that both queues were
>>>>> not designed for that kind of failure, but I could be wrong there. Since
>>>>> an immediate power down of the system is the major failure which will
>>>>> occur pretty often I need to create a solution there.
>>>>>
>>>> with checkpoint interval set to 1 and syncing enabled the data should be
>>>> on the disk safely (assuming you have hardware that supports this) and
>>>> a power-off won't affect it.
>>>>
>>>> David Lang
>>>>
>>>>
>>>>
>>>>
>>>>> Did you already start to develop something addressing that problem?
>>>>> Could you help me extend rsyslog (3.18.4) so that I can develop a new
>>>>> queue myself? I would contribute the code to the rsyslog project if you
>>>>> would like afterwards.
>>>>>
>>>>> bye
>>>>> David Ecker
>>>>>
>>>>>

Re: Development of failsafe disk based queue [ In reply to ]

rgerhards at hq

Oct 1, 2008, 5:55 AM

Post #20 of 60 (8412 views)

On Wed, 2008-10-01 at 14:45 +0200, David Ecker wrote:

[snip]

> as long as you do sector based writes (usually 512 bytes per sector) you
> can be sure that the write wasn't partial. Writing more than one sector
> or not starting at a correct offset (n*512, n=0,1,2,...x) might result in
> a partial write. I already tested that with my devel client here. So
> fencing each sector with a crc32 value would help detect errors
> during a write operation. This is actually only a problem if you are
> writing directly to a block device like any filesystem does and yes,
> reordering is definitely a problem. So validating the content written to
> the disk afterwards is important.
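
The crc32 fencing idea from the quoted paragraph can be sketched roughly as follows. This is only an illustration, not code from rsyslog or from David's client: the sector layout (4-byte crc32, 2-byte length, payload, zero padding) and the names pack_sector and unpack_sector are assumptions made up for this example.

```python
import struct
import zlib

SECTOR_SIZE = 512                  # bytes per sector, as assumed in the message
MAX_PAYLOAD = SECTOR_SIZE - 4 - 2  # room left after crc32 and length fields


def pack_sector(payload: bytes) -> bytes:
    """Build one 512-byte sector: crc32 | length | payload | zero padding."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload does not fit into one sector")
    body = struct.pack("<H", len(payload)) + payload
    body = body.ljust(SECTOR_SIZE - 4, b"\x00")
    # The checksum covers everything after itself, including the padding.
    return struct.pack("<I", zlib.crc32(body)) + body


def unpack_sector(sector: bytes):
    """Return the payload, or None if the sector looks torn or corrupted."""
    if len(sector) != SECTOR_SIZE:
        return None
    (stored_crc,) = struct.unpack_from("<I", sector, 0)
    body = sector[4:]
    if zlib.crc32(body) != stored_crc:
        return None
    (length,) = struct.unpack_from("<H", body, 0)
    if length > MAX_PAYLOAD:
        return None
    return body[2:2 + length]
```

A reader would call unpack_sector on every sector it loads and treat None as "this write did not complete". The checksum only detects a torn write; it cannot repair one.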
>
> If writing through a filesystem, reserving space in the destination file
> beforehand actually minimizes errors since the file system table doesn't
> have to be updated (you should also use the flag O_NOATIME in that
> case). See for example VMware ESX VMDK file handling.
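
A minimal sketch of the preallocation idea, assuming a Linux system: the file is created once at its final size, so later writes inside that range do not need to grow it. The function name open_preallocated is hypothetical. O_NOATIME is Linux-specific and only allowed for the file owner (or root), so the sketch uses it only where the platform offers it.

```python
import os


def open_preallocated(path: str, size: int) -> int:
    """Create `path`, reserve `size` bytes up front, return a file descriptor."""
    # O_NOATIME (where available) avoids access-time updates on reads.
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_NOATIME", 0)
    fd = os.open(path, flags, 0o600)
    try:
        try:
            os.posix_fallocate(fd, 0, size)   # reserve the blocks now
        except (AttributeError, OSError):
            # Fallback for this sketch: sets the size, but the file may be
            # sparse, so blocks could still be allocated at write time.
            os.ftruncate(fd, size)
        os.fsync(fd)                          # persist size and allocation
    except OSError:
        os.close(fd)
        raise
    return fd
```

Writes with os.pwrite(fd, data, offset) inside the reserved range then leave the file size unchanged. Whether the file system still touches other metadata on such writes depends on the file system and its mount options.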

Well, first of all let me reiterate that I do not intend to write a block
device driver for rsyslog (but I definitely do not object to getting one
contributed ;)).

Still thinking about the case of a non-solid-state,
non-internal-battery-backed-up disk, I can't see how you can be sure the
data will be written. David just told me there are no capacitors. So if
power fails, it fails rather quickly. So how can you be sure the disk
will be able to finish writing that sector? Let's say the drive has
begun to write the sector and been able to write the first 5 bytes. Now
power fails. No capacitors, no battery backup, so why should there be
enough power to drive the disk write head for another 507 bytes? If the
drive assures it can do that, it needs capacitors - doesn't it?

Am I overlooking something obvious?

Rainer


Re: Development of failsafe disk based queue [ In reply to ]

rgerhards at hq

Oct 1, 2008, 5:57 AM

Post #21 of 60 (8395 views)

David,

going back to the higher layer: are you saying that immediate power failure
is a case that needs to be addressed in an enterprise
logging system?

Anybody else with an opinion?
Rainer

On Wed, 2008-10-01 at 05:39 -0700, david@lang.hm wrote:
> On Wed, 1 Oct 2008, Rainer Gerhards wrote:
>
> > On Wed, 2008-10-01 at 05:25 -0700, david@lang.hm wrote:
> >> On Wed, 1 Oct 2008, Rainer Gerhards wrote:
> >>
> >>> David,
> >>>
> >>> the file syncing mentioned in the compatibility doc applies to the
> >>> output action, only.
> >>
> >> ouch.
> >>
> >>> The queue never does synchronous writes - I always assumed that a
> >>> critical system would have a UPS and could never think (so far) of a
> >>> valid reason for not having one. So the queue would need to have an extra
> >>> option to do sync writes. Obviously, that's not a big deal.
> >>
> >> good
> >>
> >>> Performance, of course, will be extremely terrible with such a setup...
> >>
> >> only if you have to wait for a spinning disk to do the write.
> >
> > I agree to the rest of your argument below. But the question raised here
> > was in regard to a system without any battery backup. So I would need to
> > wait.
>
> no UPS is not necessarily the same as no battery backup.
>
> you could use a compact flash drive and probably get better
> performance/reliability than spinning disks with no battery at all.
>
> > Even then, in the worst case, I think it would be possible that the disk
> > does only a partial write. I am not sure if that's really the case with
> > today's disk drives (which I think have capacitors to prevent this
> > scenario), but with past drives this could happen (I know all too well -
> > a few years ago that cost me a weekend ;)).
>
> current disks do not have capacitors to prevent partial writes or to flush
> their caches. but options like Linux ext3's data-journaling mode
> (data=journal) make it so that you have your data in the journal safely,
> and the various solid-state options solve that problem.
>
> David Lang
>
> > Rainer
> >
> >>
> >> this is the same problem that databases have. they need to guarantee that
> >> once the database tells the writing program that the data is written it
> >> will be there even if the system loses power immediately.
> >>
> >> if you run a database on standard desktop hardware (and it doesn't have
> >> this safety disabled) you cannot do more than about 80 writes/second. If
> >> you upgrade to the super speedy 15K rpm drives you can do ~160
> >> writes/second.
> >>
> >> given that you need to write the data + metadata it gets even uglier, so
> >> what the databases do (and some journaling filesystems) is to write a log
> >> that says what they are going to do, sync that, and then later write the
> >> data to the actual files (updating the journal when they complete the
> >> write)
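
The journal-first ordering described in the quoted paragraph can be sketched as below. This is a deliberately simplified illustration with a single-entry journal; it is not how any particular database or rsyslog implements it, and the names journaled_append and replay_journal as well as the JSON journal format are assumptions for this example.

```python
import json
import os


def journaled_append(journal_path: str, data_path: str, record: bytes) -> None:
    """Append `record` to the data file, announcing it in the journal first."""
    offset = os.path.getsize(data_path) if os.path.exists(data_path) else 0

    # 1. Write down what we are about to do and sync it.
    intent = {"offset": offset, "data": record.hex()}
    with open(journal_path, "w") as journal:
        journal.write(json.dumps(intent))
        journal.flush()
        os.fsync(journal.fileno())

    # 2. Only now write to the actual data file.
    with open(data_path, "ab") as data:
        data.write(record)
        data.flush()
        os.fsync(data.fileno())

    # 3. Mark the write as complete by emptying the journal.
    with open(journal_path, "w") as journal:
        os.fsync(journal.fileno())


def replay_journal(journal_path: str, data_path: str) -> bool:
    """After a crash, redo a pending write. Returns True if one was redone."""
    try:
        with open(journal_path) as journal:
            intent = json.loads(journal.read())
    except (FileNotFoundError, ValueError):
        return False  # no journal, or an empty/torn entry: nothing was promised
    mode = "r+b" if os.path.exists(data_path) else "w+b"
    with open(data_path, mode) as data:
        data.truncate(intent["offset"])  # drop a possibly partial write
        data.seek(intent["offset"])
        data.write(bytes.fromhex(intent["data"]))
        data.flush()
        os.fsync(data.fileno())
    with open(journal_path, "w") as journal:
        os.fsync(journal.fileno())
    return True
```

The cost is visible in the code: every record needs three synced writes instead of one, which is exactly the performance trade-off discussed in this thread.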
> >>
> >> it sounds like you order your writes correctly for a disk-based queue, but
> >> you would need the option of issuing the syncs (probably when you do the
> >> checkpoints)
> >>
> >> if you do this on the wrong hardware (say a laptop 5200 rpm drive or the
> >> wrong flash drive), the fact that you need to do four writes per log entry
> >> (data to queue, metadata to queue, data to output, update metadata for
> >> queue) could drop you to below 15 logs/sec (60/4 but then you lose time
> >> to seeking as well)
> >>
> >> however, with the correct drive to write to (say a $2,400 80G fusion-io
> >> flash card that can do ~100k IO ops/sec) you should be able to sustain
> >> 20,000 logs/sec.
> >>
> >> realistically very few people need the sustained write capacity that you
> >> would get from such a setup. but if you go with a $500-$700 raid card with
> >> a battery-backed cache you get very similar performance, but with some
> >> possibility that you can't sustain it forever.
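
The arithmetic behind these numbers is simply "synced writes per second divided by four writes per log entry". The small sketch below only restates the figures given in the quoted message; it ignores seek time and any batching, so the results are upper bounds, and the function name is made up for this example.

```python
WRITES_PER_LOG_ENTRY = 4  # data + metadata into the queue, data out, metadata update


def max_logs_per_second(sync_writes_per_second: float) -> float:
    """Upper bound on log entries per second for a given synced-write rate."""
    return sync_writes_per_second / WRITES_PER_LOG_ENTRY


# Write rates as quoted in the message above.
for label, rate in [
    ("slow laptop or flash drive", 60),
    ("standard desktop drive", 80),
    ("15K rpm drive", 160),
    ("flash card at ~100k IO ops/sec", 100_000),
]:
    print(f"{label}: at most {max_logs_per_second(rate):.0f} logs/sec")
```

For the flash card this gives a bound of 25,000 logs/sec, so the 20,000 logs/sec quoted as sustainable stays below it.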
> >>
> >> David Lang
> >>
> >>> Rainer
> >>>
> >>> On Wed, 2008-10-01 at 04:55 -0700, david@lang.hm wrote:
> >>>> On Wed, 1 Oct 2008, David Ecker wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I am looking for a failsafe solution to store syslog messages locally
> >>>>> until they can be sent later. I already looked at the disk based
> >>>>> memory queue and the disk based queue. Both queues don't work if you
> >>>>> just power down the system immediately, actually losing the whole queue.
> >>>>
> >>>> are you sure about the disk based queue?
> >>>>
> >>>> per file:///usr/src/rsyslog-3.21.4/doc/queues.html the disk based queue
> >>>> can be set to do a commit of the metadata after each message.
> >>>>
> >>>> Disk Queues
> >>>>
> >>>> Disk queues use disk drives for buffering. The important fact is that they
> >>>> always use the disk and do not buffer anything in memory. Thus, the queue
> >>>> is ultra-reliable, but by far the slowest mode. For regular use cases,
> >>>> this queue mode is not recommended. It is useful if log data is so
> >>>> important that it must not be lost, even in extreme cases.
> >>>>
> >>>> When a disk queue is written, it is done in chunks. Each chunk receives
> >>>> its individual file. Files are named with a prefix (set via the
> >>>> "$<object>QueueFilename" config directive) followed by a 7-digit
> >>>> number (starting at one and incremented for each file). Chunks are 10 MB by
> >>>> default; a different size can be set via the "$<object>QueueMaxFileSize"
> >>>> config directive. Note that the size limit is not a sharp one: rsyslog
> >>>> always writes one complete queue entry, even if it violates the size
> >>>> limit. So chunks are actually a little bit (usually less than 1k) larger
> >>>> than the configured size. Each chunk also has a different size for the
> >>>> same reason. If you observe different chunk sizes, you can relax: this is
> >>>> not a problem.
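
The chunk naming and the "soft" size limit described in the quoted documentation can be modelled like this. It is an in-memory model only, not rsyslog's code: the class name ChunkPlanner and the dot between prefix and number are assumptions, since the text only says "prefix followed by a 7-digit number".

```python
class ChunkPlanner:
    """Decides which chunk file each queue entry goes into."""

    def __init__(self, prefix: str, max_file_size: int = 10 * 1024 * 1024):
        self.prefix = prefix
        self.max_file_size = max_file_size  # 10 MB default, as in the docs
        self.number = 1                     # chunk numbers start at one
        self.current_size = 0

    def chunk_name(self) -> str:
        # Separator "." is an assumption made for this sketch.
        return f"{self.prefix}.{self.number:07d}"

    def place(self, entry_size: int) -> str:
        """Return the chunk file name the next entry is written to."""
        # The limit is checked before writing, and an entry is never split,
        # so a chunk can exceed the limit by up to one entry.
        if self.current_size >= self.max_file_size:
            self.number += 1
            self.current_size = 0
        self.current_size += entry_size
        return self.chunk_name()
```

With a limit of 100 bytes and three 60-byte entries, the first two entries land in chunk 1 (which ends up at 120 bytes) and the third opens chunk 2.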
> >>>>
> >>>> Writing in chunks is used so that processed data can quickly be deleted
> >>>> and is free for other uses - while at the same time keeping no artificial
> >>>> upper limit on disk space used. If a disk quota is set (instructions
> >>>> further below), be sure that the quota/chunk size allows at least two
> >>>> chunks to be written. Rsyslog currently does not check that and will fail
> >>>> miserably if a single chunk is over the quota.
> >>>>
> >>>> Creating new chunks costs performance but provides quicker ability to free
> >>>> disk space. The 10 MB default is considered a good compromise between these
> >>>> two. However, it may make sense to adapt these settings to local policies.
> >>>> For example, if a disk queue is written on a dedicated 200 GB disk, it may
> >>>> make sense to use a 2 GB (or even larger) chunk size.
> >>>>
> >>>> Please note, however, that the disk queue by default does not update its
> >>>> housekeeping structures every time it writes to disk. This is for
> >>>> performance reasons. In the event of failure, data will still be lost
> >>>> (unless the file structures are repaired manually). However, disk
> >>>> queues can be set to write bookkeeping information at checkpoints (every n
> >>>> records), so that this can be made ultra-reliable, too. If the checkpoint
> >>>> interval is set to one, no data can be lost, but the queue is
> >>>> exceptionally slow.
> >>>>
> >>>> Each queue can be placed on a different disk for best performance and/or
> >>>> isolation. This is currently selected by specifying different
> >>>> $WorkDirectory config directives before the queue creation statement.
> >>>>
> >>>> To create a disk queue, use the "$<object>QueueType Disk" config
> >>>> directive. Checkpoint intervals can be specified via
> >>>> "$<object>QueueCheckpointInterval", with 0 meaning no checkpoints.
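
To make the effect of the checkpoint interval concrete, here is a toy model of a disk queue that writes its bookkeeping every n records. It is not rsyslog's implementation: the class, the one-record-per-line data format, and the file names are assumptions, and unlike the queue discussed in this thread the sketch syncs every write so that only the checkpoint interval decides what can be recovered.

```python
import os


class CheckpointedQueue:
    def __init__(self, data_path: str, info_path: str, checkpoint_interval: int):
        self.data_path = data_path
        self.info_path = info_path
        self.interval = checkpoint_interval  # 0 means: no checkpoints
        self.written = 0

    def enqueue(self, record: bytes) -> None:
        """Append one record (must not contain a newline) to the data file."""
        with open(self.data_path, "ab") as data:
            data.write(record + b"\n")
            data.flush()
            os.fsync(data.fileno())
        self.written += 1
        if self.interval and self.written % self.interval == 0:
            self._checkpoint()

    def _checkpoint(self) -> None:
        """Persist the bookkeeping: how many records the data file holds."""
        with open(self.info_path, "w") as info:
            info.write(str(self.written))
            info.flush()
            os.fsync(info.fileno())


def recoverable_records(data_path: str, info_path: str) -> list:
    """Records a restart can trust: those covered by the last checkpoint."""
    try:
        with open(info_path) as info:
            count = int(info.read())
    except (FileNotFoundError, ValueError):
        return []
    with open(data_path, "rb") as data:
        return data.read().split(b"\n")[:count]
```

With an interval of 3 and five records enqueued before a crash, only the first three are covered by bookkeeping. With an interval of 1 every record is covered, at the price of one extra synced write per record.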
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> you also need to specifically enable syncing (from
> >>>> http://www.rsyslog.com/doc-v3compatibility.html )
> >>>>
> >>>> Output File Syncing
> >>>> Rsyslogd tries to stay as compatible with stock syslogd as possible. As
> >>>> such, it retained stock syslogd's default of syncing every file write if
> >>>> not specified otherwise (by placing a dash in front of the output file
> >>>> name). While this was a useful feature in past days, when hardware was
> >>>> much less reliable and UPSs were rare, it is no longer useful in today's
> >>>> world. Instead, syncing is a severe performance hit. With it, rsyslogd
> >>>> writes files around 50 *times* slower than without it. It also affects
> >>>> overall system performance due to the high IO activity. In rsyslog v3,
> >>>> syncing has been turned off by default. This is done via a specific
> >>>> configuration directive "$ActionFileEnableSync on/off" which is off by
> >>>> default. So even if rsyslogd finds sync selector lines, it ignores them by
> >>>> default. In order to enable file syncing, the administrator must specify
> >>>> "$ActionFileEnableSync on" at the top of rsyslog.conf. This ensures that
> >>>> syncing only happens in installations where the administrator
> >>>> actually wants that (performance-intensive) feature. In the vast majority
> >>>> of cases (if not all), this dramatically increases rsyslogd performance
> >>>> without any negative effects.
> >>>>
> >>>>
> >>>>
> >>>>> I already looked at queue.c and it seemed to me that both queues were
> >>>>> not designed for that kind of failure, but I could be wrong there. Since
> >>>>> an immediate power down of the system is the major failure which will
> >>>>> occur pretty often I need to create a solution there.
> >>>>
> >>>> with checkpoint interval set to 1 and syncing enabled the data should be
> >>>> on the disk safely (assuming you have hardware that supports this) and
> >>>> a power-off won't affect it.
> >>>>
> >>>> David Lang
> >>>>
> >>>>
> >>>>
> >>>>> Did you already start to develop something addressing that problem?
> >>>>> Could you help me extend rsyslog (3.18.4) so that I can develop a new
> >>>>> queue myself? I would contribute the code to the rsyslog project if you
> >>>>> would like afterwards.
> >>>>>
> >>>>> bye
> >>>>> David Ecker
> >>>>>

Re: Development of failsafe disk based queue [ In reply to ]

david at ecker-software

Oct 1, 2008, 6:09 AM

Post #22 of 60 (8402 views)

Hi,

[quote]
Depending on the media and the block device driver design, individual
sector writes may not be atomic. A physical sector in typical devices is
512 bytes. In most cases, physical sector writes are atomic (either
completely written, or not modified at all). A truly reliable file
system, however, cannot count on this.
[/quote]

In most cases it works, but some way of validating the data is needed if
you want ultra reliability, which I don't need. If the last few messages
a few seconds before an immediate shutdown are lost but all other
messages are sent correctly afterwards, then that would be OK in my case.
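
One way to get the "lose only the tail" behaviour described above is to validate records on startup and keep everything up to the first damaged one. The sketch below assumes a made-up record format (4-byte length, 4-byte crc32, payload); it is an illustration of the idea, not the on-disk format of rsyslog or of my queue-array.

```python
import struct
import zlib

RECORD_HEADER = struct.Struct("<II")  # payload length, crc32 of the payload


def encode_record(payload: bytes) -> bytes:
    """Frame one non-empty message for appending to the queue file."""
    if not payload:
        raise ValueError("empty payloads are reserved as an end marker")
    return RECORD_HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover_valid_prefix(blob: bytes):
    """Return (records, valid_length) for the intact records at the start."""
    records = []
    offset = 0
    while offset + RECORD_HEADER.size <= len(blob):
        length, crc = RECORD_HEADER.unpack_from(blob, offset)
        if length == 0:
            break  # unused (for example zero-filled, preallocated) space
        start = offset + RECORD_HEADER.size
        payload = blob[start:start + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # torn or corrupted record: stop here
        records.append(payload)
        offset = start + length
    return records, offset
```

After a power loss, the file would be truncated to valid_length and the queue continues from there. Messages written after the last intact record are lost, which matches the trade-off I said I can accept.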

I'll just test version 2.21.5 with the altered open behavior. The disk
based queue-array I developed myself is just a fallback solution if the
disk-based queue doesn't work with an immediate shutdown.

David


Re: Development of failsafe disk based queue [ In reply to ]

rgerhards at hq

Oct 1, 2008, 6:11 AM

Post #23 of 60 (8414 views)

Sorry, I overlooked this mail in the big bunch of messages. That's good
reasoning.

To cover these scenarios, we need to do everything with syncing. This
also means that you cannot use any of the disk-assisted modes, because
in these modes we always try to keep things in memory in order to save
writes.

So while you have convinced me things can go wrong, I'd still say that
it is very unusual (at least very costly) to care for all these things.
But, of course, there are situations where it is needed. I'll probably
see that I provide a facility to open files in "always sync" mode, but
that for sure will not be the default setting ;)
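
At the OS level, an "always sync" mode would roughly mean opening the file with O_SYNC, so that each write() returns only once the OS reports the data as written to the device. The following is a sketch of that idea under POSIX, not the facility as it might later appear in rsyslog; the function names are made up.

```python
import os


def open_always_sync(path: str) -> int:
    """Open `path` for appending with synchronous writes (POSIX O_SYNC)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_SYNC
    return os.open(path, flags, 0o600)


def append_record(fd: int, record: bytes) -> None:
    """Write the whole record, retrying on short writes."""
    view = memoryview(record)
    while view:
        written = os.write(fd, view)
        view = view[written:]
```

Note that O_SYNC only covers what the OS can see. A drive with a volatile write cache may still report completion before the data is on stable media.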

But even with the fast solid state disks (and similar methods) you
mention, I think there will be a severe impact on performance because
everything now needs to go through two write (data+metadata) and two
read (again, data+metadata) OS calls where we currently simply update an
in-memory structure.

Just out of curiosity: do you expect the majority of your rollouts to be
using such methods?

Rainer

On Wed, 2008-10-01 at 05:35 -0700, david@lang.hm wrote:
> > ... And I have never heard of anybody doing serious datacenter work
> > without a proper UPS. Is this *really* an issue?
>
> Yes.
>
> UPSs fail.
> generators fail
> power cords come loose.
> power cords get unplugged by someone who thinks they are unplugging a
> different system
> people bump power switches on power strips.
> power supplies are defective
>
> I had one production outage where a visiting tech pulled a power cord from
> an overhead plug and dropped it on the ground, where it happened to hit
> the power switch on a power strip.
>
> I've had high-end systems with redundant power supplies go down because of
> faulty hardware that decided to disable both power supplies at once (it
> turned out that there was a defect in the whole batch of servers, but it
> took IBM several weeks to figure out what was going on)
>
> I've had UPS systems blow up (literally)
>
> I've had a datacenter go down because it was running on generator
> power (due to other issues), and the refueling guy filled the tank
> incorrectly and got air bubbles into the fuel system, a few min later the
> 500 kW diesel generator couldn't maintain constant speed and the safety
> triggers kicked in and disabled it.
>
> it's amazing the things that happen in real-life


Re: Development of failsafe disk based queue [ In reply to ]

rgerhards at hq

Oct 1, 2008, 6:17 AM

Post #24 of 60 (8424 views)

On Wed, 2008-10-01 at 15:09 +0200, David Ecker wrote:
> Hi,
>
> [quote]
> Depending on the media and the block device driver design, individual
> sector writes may not be atomic. A physical sector in typical devices is
> 512 bytes. In most cases, physical sector writes are atomic (either
> completely written, or not modified at all). A truly reliable file
> system, however, cannot count on this.
> [/quote]
>
> In most cases it works

exactly - in most cases. That means it does not always work.

> but some way of validating the data

how can you validate if there is no power and the machine is off?

> is needed if
> you want ultra reliability, which I don't need. If the last few messages
> a few seconds before an immediate shutdown are lost but all other
> messages are sent correctly afterwards then that would be OK in my case.

but we can not guarantee that, at least not in all cases. Let's assume
the disk died in the middle of the write access. Chances are good you'll
never be able to read that sector again. Using a journaling file system
will help, but without it, you may just have destroyed the sector that
contained the .qi file. So on next startup the .qi is either not
readable at all or not pointing at the correct information. The end
result can be total loss of information.

This scenario is probably acceptable in your case, because it is really,
really highly unlikely. But it still exists.
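
A common way to reduce the risk of a half-written state file is to never overwrite it in place: write the new state to a temporary file, sync it, and rename it over the old one. The sketch below shows that generic POSIX technique. It is not what the queue currently does with its .qi file, and the function name and the ".tmp" suffix are assumptions.

```python
import os


def write_state_atomically(path: str, state: bytes) -> None:
    """Replace the file at `path` with `state` without overwriting in place."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"

    # 1. Write the complete new state to a separate file and sync it.
    with open(tmp_path, "wb") as tmp:
        tmp.write(state)
        tmp.flush()
        os.fsync(tmp.fileno())

    # 2. Atomically switch the name over to the new file.
    os.replace(tmp_path, path)

    # 3. Sync the directory so the rename itself survives a power loss.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

After a crash, a reader then finds either the old or the new state under the final name. This still depends on the file system and the drive honoring the sync requests, so it narrows the window rather than closing it for every hardware failure.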

> I'll just test version 2.21.5 with the altered open behavior. The disk
> based queue-array developed by myself is just a fallback solution if the
> disk-based queue doesn't work with an immediate shutdown.

If it does not work under the constraints described here, this would
point to a problem in the queue implementation (I have to admit the
reason to provide a capability to write periodic qi file updates was
related to a scenario like this, though I had not thought of one this
extreme ;)).

Rainer
>
> David

Re: Development of failsafe disk based queue [ In reply to ]

Oct 1, 2008, 6:17 AM

Post #25 of 60 (8405 views)

On Wed, 1 Oct 2008, Rainer Gerhards wrote:

> Sorry, I overlooked this mail in the big bunch of messages. That's good
> reasoning.

I'm replying out of order as I see things anyway

> To cover these scenarios, we need to do everything with syncing. This
> also means that you can not use any of the disk-assisted modes, because
> in these modes we always try to keep things in memory in order to save
> writes.

I think you are saying that we must use the disk-only mode, which is
correct.

> So while you have convinced me things can go wrong, I'd still say that
> it is very unusual (at least very costly) to care for all these things.

absolutely!!

> But, of course, there are situations where it is needed. I'll probably
> see that I provide a facility to open files in "always sync" mode, but
> that for sure will not be the default setting ;)

thanks.

> But even with the fast solid state disks (and similar methods) you
> mention, I think there will be a severe impact on performance because
> everything now needs to go through two write (data+metadata) and two
> read (again, data+metadata) OS calls where we currently simply update an
> in-memory structure.

given the performance gains that we have seen by eliminating syscalls, it
will hurt to add these back in, even with solid-state disks. that being
said, it looks like the output module is nowhere close to being the limit
(when I could get a good, stable reading on it, it looked like it was
eating ~15% cpu compared to the input module at 100%) so it may not make
much of a difference.

> Just out of curiosity: do you expect the majority of your rollouts to be
> using such methods?

absolutely not.

I have one case I am considering (the one I am talking to you about, more
efficient database writes) that would be this paranoid, but the rest of it
will be optimized for speed (battery-backed disk caches on the final
server, but everything else can just use ram)

David Lang

> Rainer