Mailing List Archive: Higher than High performance Rsyslog advice/suggestions?

Jul 14, 2023, 7:05 AM

Post #1 of 5 (215 views)

Ubuntu 22.04LTS
Rsyslog 8.2112.0

This server is set up to receive syslog data from up to 13 sources, mostly networking gear such as Cisco and Meraki. I have recently been troubleshooting an issue where the rsyslog daemon quits after 4-5 days, and I have not been able to determine the cause. While looking at the server I began to tail the rsyslog_stats.log file. Today, over the course of ~7 hours, the enqueued value for the `firewall` log, for example, rose from 0 to 3.8M. There were no signs of it ever emptying. The same goes for Meraki and VCSA.

So I found the document here: https://www.rsyslog.com/doc/master/examples/high_performance.html and made some changes that I thought might help, but so far it's been ~4 hours and the stats log is showing the same behavior as before.

Is this a valid way to determine the performance of Rsyslog? If not, is there a better way?
Am I understanding queues correctly in that they should not just increase in count forever?

Yesterday at around 3pm EST I restarted rsyslog. Checking this morning, the `enqueued` values are 37.5M for meraki, 7.4M for vcsa and 3.4M for firewalls. I feel like I'm doing something wrong here.

Below are the related conf files:

Rsyslog.conf:
$ModLoad imuxsock # provides support for local system logging
$ModLoad imklog # provides kernel logging support (previously done by rklogd)
$ModLoad immark # provides --MARK-- message capability

module(load="imudp" threads="2" timeRequery="8" batchSize="128")
input(type="imudp" port=["514","20514","20515","20516","20517","20518","20519","20520","20525","20526","20527","20528","20529","20530"]
name="" name.appendPort="on")

module(load="impstats" interval="10" log.file="/var/log/rsyslog_stats.log" log.syslog="off")
module(load="imtcp" MaxSessions="500")
input(type="imtcp" port="514")

/etc/rsyslog.d/05-remote-syslog.conf:

ruleset(name="switches20514" queue.type="linkedlist" queue.workerThreads="4" queue.workerThreadMinimumMessages="3000"){
action(type="omfile" file="/var/log/remote-syslog/switches.log")
}
ruleset(name="routers20515" queue.type="linkedlist" queue.workerThreads="2" queue.workerThreadMinimumMessages="3000"){
action(type="omfile" file="/var/log/remote-syslog/routers.log")
}

ruleset(name="wlan20516" queue.type="linkedlist" queue.workerThreads="1" queue.workerThreadMinimumMessages="5000"){
action(type="omfile" file="/var/log/remote-syslog/wlan.log")
}

ruleset(name="firewalls20517" queue.type="fixedArray" queue.size="250000" queue.dequeueBatchSize="4096" queue.workerThreads="6" queue.workerThreadMinimumMessages="60000"){
action(type="omfile" file="/var/log/remote-syslog/firewalls.log" ioBufferSize="64K" flushOnTXEnd="off")
}

ruleset(name="stealth20518" queue.type="linkedlist" queue.workerThreads="2" queue.workerThreadMinimumMessages="5000"){
action(type="omfile" file="/var/log/remote-syslog/stealth.log")
}

ruleset(name="nexus20519" queue.type="linkedlist" queue.workerThreads="2" queue.workerThreadMinimumMessages="5000"){
action(type="omfile" file="/var/log/remote-syslog/nexus.log")
}

ruleset(name="lomsmx20521" queue.type="linkedlist" queue.workerThreads="1" queue.workerThreadMinimumMessages="6000"){
action(type="omfile" file="/var/log/remote-syslog/lom_smx11.log")
}

ruleset(name="vcsa20525" queue.type="linkedlist" queue.workerThreads="4" queue.workerThreadMinimumMessages="3000"){
action(type="omfile" file="/var/log/remote-syslog/vcsa.log")
}

ruleset(name="ciscoasa20526" queue.type="linkedlist" queue.workerThreads="2" queue.workerThreadMinimumMessages="3000"){
action(type="omfile" file="/var/log/remote-syslog/asa.log")
}

ruleset(name="pwrapc20527" queue.type="linkedlist" queue.workerThreads="1" queue.workerThreadMinimumMessages="3000"){
action(type="omfile" file="/var/log/remote-syslog/power_apc.log")
}

ruleset(name="pwrraritan20528" queue.type="linkedlist" queue.workerThreads="4" queue.workerThreadMinimumMessages="6000"){
action(type="omfile" file="/var/log/remote-syslog/power_raritan.log")
}
ruleset(name="ise20529" queue.type="linkedlist" queue.workerThreads="4" queue.workerThreadMinimumMessages="5000"){
action(type="omfile" file="/var/log/remote-syslog/ise.log")
}

ruleset(name="meraki20530" queue.type="fixedArray" queue.size="250000" queue.dequeueBatchSize="4096" queue.workerThreads="4" queue.workerThreadMinimumMessages="60000"){
action(type="omfile" file="/var/log/remote-syslog/meraki.log" ioBufferSize="64K" flushOnTXEnd="off")
}

input(type="imudp" port="20514" ruleset="switches20514")
input(type="imudp" port="20515" ruleset="routers20515")
input(type="imudp" port="20516" ruleset="wlan20516")
input(type="imudp" port="20517" ruleset="firewalls20517")
input(type="imudp" port="20518" ruleset="stealth20518")
input(type="imudp" port="20519" ruleset="nexus20519")
input(type="imudp" port="20521" ruleset="lomsmx20521")
input(type="imudp" port="20525" ruleset="vcsa20525")
input(type="imudp" port="20526" ruleset="ciscoasa20526")
input(type="imudp" port="20527" ruleset="pwrapc20527")
input(type="imudp" port="20528" ruleset="pwrraritan20528")
input(type="imudp" port="20529" ruleset="ise20529")
input(type="imudp" port="20530" ruleset="meraki20530")

Ben Hart
IT Systems Administrator II

Re: Higher than High performance Rsyslog advice/suggestions? [ In reply to ]

rsyslog at lists

Jul 14, 2023, 9:25 AM

Post #2 of 5 (215 views)

enqueued is a running total of how many messages have been put in the queue
since you restarted (unless you configure impstats to reset its counters each
run, but that can lose some data due to race conditions)

it's sad but true that most attempts to optimize rsyslog actually end up hurting
performance more than they help, and rsyslog with simple configs is frequently
fast enough to not need any optimization.

having too many threads and too many queues can actually slow you down.

with omfile for example, the overhead of locking the queue with one thread,
inserting the message, unlocking the queue, and then locking the queue with a
different thread, marking that you are starting to work on the message,
unlocking the queue, locking the queue, marking that you processed the message
and unlocking the queue absolutely dwarfs the cost of just writing the log to
disk

multiple threads can also cause more locking overhead. you should only increase
threads if your measurements show that you have a thread maxing out a core (top,
then hit H to show threads, see if any thread is hitting 100% cpu)

multiple threads when you are using omfile are even worse, as omfile then has
to do locking itself to prevent the multiple threads from writing at the same
time.

you only want to use threads when you have expensive processing (which can be a
bad template, but there are ways to improve that)

now, a queue on a ruleset that is being tied to an input is a bit different,
that queue then replaces the use (and locking) of the main queue and can be a
win.

the bigger win is usually just increasing the batch size, but increasing the
size produces diminishing returns; going above a few hundred to a few thousand
is seldom useful

What is the volume of logs you are trying to process? what is making you think
you need to change things to improve performance?

please show a couple rounds of impstats output under load, and ideally a
snapshot of top (with H to show the threads), and iostat -cdtyz 10 or something
similar to show the disk activity during this time.

David Lang

Re: Higher than High performance Rsyslog advice/suggestions? [ In reply to ]

rsyslog at lists

Jul 15, 2023, 5:30 AM

Post #3 of 5 (213 views)

Let me add some maths to this.

Firstly, 3.8 million events over 7 hours is roughly 150 EPS. It's not
even close to high performance syslog. OK. Assuming you aggregate
several such sources, you're still in the range of "low thousands". It's
something rsyslog with a pretty default config handles relatively easily in
terms of processing the events. (I have instances of rsyslog which
process about 30-35k EPS receiving from multiple sources and forwarding
to specified destinations and they don't even break a sweat).

But it can of course all go to pieces if your storage (understood as a
whole, from the underlying device up to the OS-level filesystem
parameters) can't keep up. But again, a thousand EPS at, let's assume,
1 kB per event is just 1 MB/s of constant data stream. That is not
something modern systems can't handle (unless of course you're trying
to write over CIFS to a remote share with write-through caching).
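
For anyone who wants to redo this arithmetic with their own numbers, here is a minimal Python sketch. The function names are made up for illustration, and the 1 kB per event figure is only the assumption stated above, not a measurement.

```python
# Back-of-the-envelope rates using the figures quoted in this thread.

def events_per_second(events: float, hours: float) -> float:
    """Average event rate over a period given in hours."""
    return events / (hours * 3600)


def stream_mb_per_second(eps: float, bytes_per_event: int) -> float:
    """Sustained write rate in MB/s (1 MB taken as 1,000,000 bytes)."""
    return eps * bytes_per_event / 1_000_000


# 3.8M enqueued messages over ~7 hours (the firewall figure from the first post)
firewall_eps = events_per_second(3_800_000, 7)

# 1000 EPS at an assumed 1 kB (1000 bytes) per event
assumed_stream = stream_mb_per_second(1000, 1000)

print(f"firewall: {firewall_eps:.0f} EPS")
print(f"1000 EPS at 1 kB each: {assumed_stream:.1f} MB/s")
```

Plugging in the meraki or vcsa totals the same way gives a quick feel for whether a source is anywhere near the tens of thousands of EPS mentioned above.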

_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

Re: Higher than High performance Rsyslog advice/suggestions? [ In reply to ]

rsyslog at lists

Jul 17, 2023, 9:41 AM

Post #4 of 5 (210 views)

in terms of rsyslog scale, I've run rsyslog where it saturated Gb ethernet and
had it keep up without a problem on reasonably modest hardware (300k logs/sec)
and in stripped down configs on faster networks, others have hit 1m logs/sec

500kb of logs every hour is pretty trivial, even 500k messages/hour is ~8k
logs/min or ~140 logs/sec.

and since a queue on the ruleset with an input bound to that ruleset is almost
the same as running a completely separate instance of rsyslog for that ruleset,
things become even more trivial

so there should be no need to configure a thread count >1 anywhere with this
volume

Mon Jul 17 09:08:11 2023: f_all: origin=core.queue size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0

enqueued means 'added to the queue'

size means 'number of items in the queue at the time of sample'

maxqsize means 'max number of items in the queue since startup'

full means 'the number of times things could not be added to the queue because it was full'

discarded* means 'the number of logs thrown away because the queue was too full'
(this is based on the watermark settings, not something that happens by default)

everything but size is a running total
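
To turn those running totals into something easier to read, a small script can diff two consecutive impstats samples. The following is a rough Python sketch, not an official tool: the two sample lines are invented for illustration (the queue name `firewalls20517` and all values are assumptions), and the parser only expects the `timestamp: name: key=value ...` layout shown in the line above.

```python
import re
from datetime import datetime

# 24-character ctime-style timestamp, then the counter name, then key=value pairs
LINE_RE = re.compile(r"^(.{24}): (.*?): (origin=.*)$")


def parse_impstats(line):
    """Split one impstats log line into (timestamp, name, fields)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, name, rest = match.groups()
    fields = {}
    for token in rest.split():
        key, _, value = token.partition("=")
        # Counters are integers; 'origin' stays a string
        fields[key] = int(value) if value.isdigit() else value
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"), name, fields


def interval_rate(prev, curr, counter="enqueued"):
    """Messages per second between two samples of a running-total counter."""
    (t0, _, f0), (t1, _, f1) = prev, curr
    return (f1[counter] - f0[counter]) / (t1 - t0).total_seconds()


# Hypothetical samples taken 10 seconds apart (values are made up)
earlier = parse_impstats(
    "Mon Jul 17 09:08:01 2023: firewalls20517: origin=core.queue "
    "size=0 enqueued=3400000 full=0 discarded.full=0 discarded.nf=0 maxqsize=212"
)
later = parse_impstats(
    "Mon Jul 17 09:08:11 2023: firewalls20517: origin=core.queue "
    "size=3 enqueued=3401500 full=0 discarded.full=0 discarded.nf=0 maxqsize=212"
)

rate = interval_rate(earlier, later)
backlog = later[2]["size"]  # size is the only value that shows a real backlog
print(f"{later[1]}: {rate:.0f} msg/s enqueued, {backlog} waiting in queue")
```

The point of the split is that a growing `enqueued` only tells you the arrival rate, while a `size` that keeps climbing (or non-zero `full` and `discarded.*`) is what would indicate the queue is not draining.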

Mon Jul 17 09:08:11 2023: imudp(*/20525/IPv4): origin=imudp submitted=35802111 disallowed=0

submitted means 'the number of log messages that arrived via this port'

Mon Jul 17 09:08:11 2023: dynafile cache d_wlc: origin=omfile requests=0 level0=0 missed=0 evicted=0 maxused=0 closetimeouts=0

watch these lines for missed/evicted to become large. If they shoot up you need
to set dynaFileCacheSize larger. (If you have something like dates in your
template, each time the date changes you will see a miss and eventually an
eviction. But if the cache size is smaller than the working set that you will be
actively writing to, these will be huge and performance will plummet.)
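
One way to keep an eye on this is to compare two samples of the dynafile cache line and look at how many of the requests in that interval were misses. This is a minimal Python sketch under assumptions: the sample lines and their values are invented, the helper names are hypothetical, and treating missed/requests as the ratio to watch is my reading of the counters rather than anything documented in this thread.

```python
import re


def counters(line):
    """Return the numeric key=value pairs from one impstats line."""
    pairs = re.findall(r"([\w.]+)=(\d+)(?=\s|$)", line)
    return {key: int(value) for key, value in pairs}


def miss_ratio(prev_line, curr_line):
    """Fraction of dynafile requests in the interval that missed the cache."""
    prev, curr = counters(prev_line), counters(curr_line)
    requests = curr["requests"] - prev["requests"]
    missed = curr["missed"] - prev["missed"]
    return missed / requests if requests else 0.0


# Hypothetical consecutive samples (values are made up for illustration)
before = ("Mon Jul 17 09:08:01 2023: dynafile cache d_wlc: origin=omfile "
          "requests=1000 level0=900 missed=10 evicted=0 maxused=5 closetimeouts=0")
after = ("Mon Jul 17 09:08:11 2023: dynafile cache d_wlc: origin=omfile "
         "requests=2000 level0=1300 missed=510 evicted=490 maxused=10 closetimeouts=0")

ratio = miss_ratio(before, after)
print(f"dynafile miss ratio over the interval: {ratio:.0%}")
```

An occasional miss when a date rolls over is expected; a ratio that stays high from sample to sample is the case where raising the cache size is worth trying.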

Mon Jul 17 09:08:11 2023: action-12-builtin:omfile: origin=core.action processed=143880635 failed=0 suspended=0 suspended.duration=0 resumed=0

processed means 'the number of log messages it's handled'
suspended means 'the number of times it's stopped processing'
failed means 'the number of times the connection has just failed'

your iostat output didn't include the extended information from -x; one of those
items is the percent utilization of the disk. your numbers look low enough that
I wouldn't expect there to be any significant problem.

cpu utilization looks trivial

David Lang

On Mon, 17 Jul 2023, Ben Hart wrote:

> Much appreciated David!
>
> I had been searching for this `enqueued` term and found almost nothing. I'm glad to hear that's just a running tally of items queued and not indicative of queued-but-unprocessed items.
> Glad to hear I was on the right track in the beginning by going with the ruleset with individual queues.
>
> So here's the situation: This UF host receives log data from networking devices that are unable to communicate with an HTTP SplunkCloud listener and forwards it to Splunk Cloud.
> Networking reported data missing from SplunkCloud, so I headed off to this host and started poking around. The Rsyslog daemon was running, with no obvious errors that I could see. The Universal Forwarder was the same, although I admit it's harder to find potential performance issues in the UF, especially when you only have visibility from one side (I have no access to SC directly).
>
> Anyway, the data coming into SC was kinda sporadic, and I did not know what enqueued meant. To me en-queued would mean "in the queue", you know? Since that figure kept growing and growing, I went looking for high performance tips for rsyslog.
> The two largest (and most important) log files grow by roughly 50k (firewall.log) and 500k (meraki.log) every hour. To me that's pretty high; to those more experienced with Rsyslog, possibly not.
>
> In any case I just wanted to make sure I had the best-performing Rsyslog config I could get. The info you requested is attached; maybe it shows that I'm worried over nothing, or maybe it shows I have resources for improvement.
>
> Thanks!

Re: Higher than High performance Rsyslog advice/suggestions? [ In reply to ]

rsyslog at lists

Jul 20, 2023, 7:31 AM

Post #5 of 5 (201 views)

Thanks for the clarification and additional info David.
