Mailing List Archive

Backlogged files to disk are pretty slow
I have been dealing with this problem for a few days now and perhaps I
will be able to solicit some advice or help. Here is the issue. I have
an rsyslog relay writing to a remote database server and caching to
disk. The write to the database uses a MySQL stored procedure that can
write about 4000 records per second. The rsyslog.conf parts are set up
like so:

$ModLoad immark
$ModLoad imudp
$UDPServerAddress 172.16.12.138
$UDPServerRun 514
$ModLoad imtcp
$ModLoad imuxsock
$ModLoad imklog
$ModLoad ommysql.so

$template template1,"CALL
SAT2_RSYSLOG_EVENT_INSERT('%timestamp:::date-mysql%',
'%timegenerated:::date-mysql%', '%syslogfacility%', '%syslogpriority%',
'%hostname%', '%syslogtag%', '%msg%')", sql

$WorkDirectory /rsyslog/work
$ActionQueueType LinkedList # use asynchronous processing
$ActionQueueFileName dbq # set file name, also enables disk mode
$ActionResumeRetryCount -1 # infinite retries on insert failure

*.* >172.16.2.238,rsyslog,syslogwriter,topsecret;template1

If I turn off the database, in this case I turned it off for almost a
day, it backlogs nearly 1 GB worth of information. The problem is
that it takes nearly 6 hours to catch back up from this. While catching
up, it only uses about 1% of the proc. Bandwidth is not an issue as the
fibre link is only about 50% saturated. Is there a way to force
rsyslogd to consume more of the proc and move faster? I have placed a
-20 nice value on the process in hopes that would help, but it really has
not. Is there a way to force rsyslogd to use a pool of MySQL
connections or initiate a new connection each time a record is written?


Daniel M. Anson
Linux Systems Engineer
Rackspace Managed Hosting
danson@rackspace.com




Confidentiality Notice: This e-mail message (including any attached or
embedded documents) is intended for the exclusive and confidential use of the
individual or entity to which this message is addressed, and unless otherwise
expressly indicated, is confidential and privileged information of Rackspace.
Any dissemination, distribution or copying of the enclosed material is prohibited.
If you receive this transmission in error, please notify us immediately by e-mail
at abuse@rackspace.com, and delete the original message.
Your cooperation is appreciated.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Re: Backlogged files to disk are pretty slow [ In reply to ]
Mhhh... with the current design, it submits messages individually to the
database. I think what you experience is simply the turn-around time of the
database call (I have no other idea what it could be). It doesn't use more
CPU because the database layer simply does not return any faster.

There has been some discussion about batching multiple statements together,
but this is non-trivial. I lost funding, and things like this now need a
corporate sponsor (they are of little importance to the non-commercial
user base...).

You could try to run the action on its own queue and with multiple
workers. That could (could!) improve performance. But it is just a
guess. Do you have any chance to see how long the query takes inside the
SQL engine?
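
In legacy config terms, that means giving the ommysql action its own queue
via the $ActionQueue... directives you already use, plus a worker-thread
count. A rough, untested sketch (the thread count is an arbitrary example):

```
$WorkDirectory /rsyslog/work
$ActionQueueType LinkedList        # asynchronous, in-memory linked list
$ActionQueueFileName dbq           # file name; enables disk assistance
$ActionQueueWorkerThreads 4        # several parallel database writers
$ActionResumeRetryCount -1         # infinite retries on insert failure
*.* >172.16.2.238,rsyslog,syslogwriter,topsecret;template1
```

Whether more workers actually help depends on whether MySQL can execute
the inserts in parallel.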

Rainer

> -----Original Message-----
> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
> bounces@lists.adiscon.com] On Behalf Of Daniel Anson
> Sent: Thursday, January 15, 2009 6:29 PM
> To: rsyslog@lists.adiscon.com
> Subject: [rsyslog] Backlogged files to disk are pretty slow
Re: Backlogged files to disk are pretty slow [ In reply to ]
Many more measurements are needed before declaring a conclusive cause,
but on the surface it seems that your bottleneck is not rsyslog or the
sending server but the database itself. Comments below.

On Thu, Jan 15, 2009 at 10:28, Daniel Anson <danson@rackspace.com> wrote:
> I have been dealing with this problem for a few days now and perhaps I
> will be able to solicit some advice or help. Here is the issue. I have
> an rsyslog relay writing to a remote database server and caching to
> disk. The write to the database uses a MySQL stored procedure that can
> write about 4000 records per second. The rsyslog.conf parts are set up

Is that 4000 TPS burst or sustained speed?

> If I turn off the database, in this case I turned it off for almost a
> day, it backlogs nearly a 1 GB worth of information. The problem is

Roughly how many records?

> that it takes nearly 6 hours to catch back up from this. While catching
> up, it only uses about 1% of the proc. Bandwidth is not an issue as the

What's the processor and disk load look like on your MySQL server?

> fibre link is only about 50% saturated. Is there a way to force

Presuming that 50% is your bps, what was your PPS? Depending on how large
your average events/transactions are, you may never see 100% due to
small packets.

> not. Is there a way to force rsyslogd to use a pool of MySQL
> connections or intiate a new connection each time a record is written?

Rainer confirmed my suspicion that rsyslog executes a single
transaction per event, which is (as he also notes) sub-optimal for
performance. Batching really should use about the same logic as the
MARK functionality: every N foo, output "bar".

Multiple actions per transaction (batching) is a classic query tuning
technique and can be approached many ways, but you probably need to
verify your database I/O is indeed the bottleneck.
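
To illustrate the batching idea outside of rsyslog, here is a stand-alone
Python sketch using sqlite3 as a stand-in for MySQL (table and column names
are invented): inserting many rows per statement/transaction amortizes the
per-transaction overhead that dominates row-at-a-time inserts.

```python
import sqlite3

# Stand-in for the remote MySQL server: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (host TEXT, tag TEXT, msg TEXT)")

events = [("web%d" % i, "sshd", "login %d" % i) for i in range(1000)]

# Row-at-a-time: one statement (and, on a real server, one round trip
# plus potentially one fsync) per event.
for ev in events[:500]:
    conn.execute("INSERT INTO events VALUES (?, ?, ?)", ev)
conn.commit()

# Batched: many rows per statement and one commit -- the "every N foo,
# output bar" logic described above.
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events[500:])
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 1000
```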
Re: Backlogged files to disk are pretty slow [ In reply to ]
A few things about the MySQL server itself: I have eliminated bandwidth,
processor speed, and disk I/O as potential bottlenecks, so the obvious
remaining bottleneck is MySQL itself.

For a temporary solution, I have placed an rsyslog relay on the MySQL
server. So:

Client_message -> local_datacenter_relay -> remote_datacenter_relay ->
MySQL_server

The messages are traveling much faster (kudos to the socket programming
there) as the remote relay writes to a local MySQL server. I do not
believe this to be an optimal solution. In an earlier email, Rainer
mentioned, and I quote:

"You could try to run the action on its own queue and with multiple
workers. That could (could!) improve performance. But it is just a
guess. Do you have any chance to see how long the query takes inside the
SQL engine?"

MySQL will run about 4000 inserts per second (constant speed). I am
willing to try what Rainer suggests; however, I am unsure how to direct
specific actions to act on a queue. Any help is appreciated. I know I
could add the two following lines and create worker threads:

$ActionQueueWorkerThreads 20
$MainMsgQueueWorkerThreads 20

Would I have to add additional lines to the config? My config once
again looks like so:

$ModLoad immark
$ModLoad imudp
$UDPServerAddress 172.16.12.138
$UDPServerRun 514
$ModLoad imtcp
$ModLoad imuxsock
$ModLoad imklog
$ModLoad ommysql.so

$template template1,"CALL
SAT2_RSYSLOG_EVENT_INSERT('%timestamp:::date-mysql%',
'%timegenerated:::date-mysql%', '%syslogfacility%', '%syslogpriority%',
'%hostname%', '%syslogtag%', '%msg%')", sql

$WorkDirectory /rsyslog/work
$ActionQueueType LinkedList # use asynchronous processing
$ActionQueueFileName dbq # set file name, also enables disk mode
$ActionResumeRetryCount -1 # infinite retries on insert failure

*.* >172.16.2.238,rsyslog,syslogwriter,topsecret;template1


I would hope that there is an easy solution, as my next idea is to write
some type of daemonized process that can insert messages from a pool of
MySQL connections. I could achieve this in C but would rather find a
solution inside the configuration.
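
For what it's worth, the core of such a daemonized writer can be sketched in
a few lines. This is Python with sqlite3 standing in for MySQL, and it uses a
single batching writer thread rather than a true connection pool; all names,
the batch size, and the shutdown sentinel are invented for illustration.

```python
import queue
import sqlite3
import threading

BATCH = 100
q = queue.Queue()

def writer():
    # The writer thread owns its own connection (sqlite3 stands in
    # for MySQL here).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (msg TEXT)")
    done = False
    while not done:
        batch = [q.get()]            # block for at least one message
        while len(batch) < BATCH:    # then drain up to BATCH without blocking
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        if None in batch:            # None is the shutdown sentinel
            done = True
            batch = [m for m in batch if m is not None]
        if batch:
            conn.executemany("INSERT INTO events VALUES (?)", batch)
            conn.commit()            # one commit per batch, not per row
    global total
    total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

t = threading.Thread(target=writer)
t.start()
for i in range(1000):
    q.put(("msg %d" % i,))
q.put(None)                          # signal shutdown
t.join()
print(total)  # 1000
```

The same shape would work against a real MySQL client library; the point is
simply that a queue decouples message arrival from commit frequency.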

Daniel M. Anson
Linux Systems Engineer
Rackspace Managed Hosting
danson@rackspace.com

-----Original Message-----
From: rsyslog-bounces@lists.adiscon.com
[mailto:rsyslog-bounces@lists.adiscon.com] On Behalf Of RB
Sent: Thursday, January 15, 2009 1:11 PM
To: rsyslog-users
Subject: Re: [rsyslog] Backlogged files to disk are pretty slow



Re: Backlogged files to disk are pretty slow [ In reply to ]
On Thu, Jan 15, 2009 at 16:01, Daniel Anson <danson@rackspace.com> wrote:
> I would hope that there is an easy solution as my next idea is to write
> some type of daemonized process that can insert messages from a pool of
> MySQL connections. I can achieve this in C but would rather hopefully
> find a solution inside of the configuration.

Short of implementing the queue/worker configuration (no idea how), it
seems the only current option would be to implement something of the
sort, either by an update to the ommysql module (optimal, as it gets
your code supported by someone else for its lifetime) or by some
external program.

I'd think an optimal external solution would be some sort of
relp2mysql bridge, but suspect that would end up reimplementing a good
chunk of rsyslog.
Re: Backlogged files to disk are pretty slow [ In reply to ]
On Fri, 16 Jan 2009, RB wrote:

> On Thu, Jan 15, 2009 at 16:01, Daniel Anson <danson@rackspace.com> wrote:
>> I would hope that there is an easy solution as my next idea is to write
>> some type of daemonized process that can insert messages from a pool of
>> MySQL connections. I can achieve this in C but would rather hopefully
>> find a solution inside of the configuration.
>
> Short of implementing the queue/worker configuration (no idea how), it
> seems the only current option would be to implement something of the
> sort, either by an update to the ommysql module (optimal, as it gets
> your code supported by someone else for its lifetime) or by some
> external program.

multiple workers will help if MySQL can handle more transactions at a time
when they hit in parallel.

the fact that you are doing 4000/sec indicates that you are not doing an
fsync for each insert, so it is unlikely to help (if you were fsync-limited
the data rates would probably be closer to 100-200/sec depending on
your drives)

> I'd think an optimal external solution would be some sort of
> relp2mysql bridge, but suspect that would end up reimplementing a good
> chunk of rsyslog.

actually, the optimal solution is to modify rsyslog to be able to handle
multiple messages at once in the output queues. that is a major effort
(2-4 man-weeks) that will require a sponsor. Once this is implemented I
would expect the throughput to go up by 2-3 orders of magnitude for
database inserts.

David Lang