Mailing List Archive

Basic Rsyslog Troubleshooting
Greetings list

New to rsyslog list, not new to logging. We're experiencing an odd issue where TCP syslog messages are being dropped at seemingly random intervals...hoping to get some input.

The TLDR on our architecture is we have set up a couple rsyslog receivers behind a Netscaler Load balancer. Multiple platforms/devices are configured to send syslog to the load balancer, which distributes to the receivers. Receivers are running RHEL v8 and rsyslog v8.1911. Receivers write files to disk, which we then read with a SIEM agent.

We've got a modestly sized environment with a syslog client base of 200-300 servers, 30 networking devices (including firewalls) and some applications all directing logging to the load balancer.

Our config file is pretty vanilla, no cache, or advanced tweaks. Just using the "imtcp" and "imudp" modules and rulesets to write files to disk based on the sending host IP/port.

The first problem we're seeing is that hosts sending via TCP have log messages missed (never written to disk), where UDP seems more reliable. When switching the firewalls to UDP, throughput nearly doubles and message loss is less noticeable (yeah I know it's still UDP).

Possibly related: we've noticed that each receiver also holds a lot of "Established" connections back to the same clients, but on different ports. (Possible session/connection exhaustion?)

Any guidance on how we can approach and troubleshoot this issue would be appreciated. Commands, dummy guides, sarcasm all welcome.

Thanks much

Regards,

Steven.

_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: Basic Rsyslog Troubleshooting [ In reply to ]
One problem with TCP load balancing of syslog messages is that the load
balancers do not understand the syslog protocol, so they can't rebalance at a
message boundary.

A second problem is that when a firewall or load balancer drops a connection,
the sender doesn't know that it's dropped until the next time it tries to
deliver a message. TCP doesn't have any way for the OS TCP stack to tell
software "that message that you submitted to an open connection, and I accepted,
it can't be delivered" (once the OS accepts the message, the sender has to
assume that it will be delivered).

As a result, it's very easy for TCP syslog to be less reliable than UDP syslog.
The 'common sense wisdom' is that TCP is reliable because dropped packets inside
an ongoing connection will get retried, but dropped packets are actually very
uncommon inside a datacenter. They may happen when a firewall/router is
overloaded, but it's not very common. Back in 2006 or so I did testing and found
that within a local network, UDP was almost perfectly reliable (as long as the
receiver could keep up and not overflow the OS buffers)

Rsyslog has the rebindinterval feature, which tells the sender to disconnect and
reconnect periodically so that the load balancer has a chance to make a new
balancing decision.
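
A minimal sketch of what that might look like, assuming the sender is itself
rsyslog forwarding with omfwd. The target host name and the interval value are
placeholders, not values from this thread:

```
# Sender side: forward over plain TCP and periodically drop and re-open
# the connection so the load balancer can make a fresh balancing decision.
# "syslog-lb.example.com" and "10000" are illustrative assumptions.
action(type="omfwd"
       target="syslog-lb.example.com"
       port="10514"
       protocol="tcp"
       RebindInterval="10000")
```

As far as I recall from the omfwd documentation, the interval is counted in
messages/batches sent rather than in seconds, so check the docs for your
version. Senders that are not rsyslog (firewalls, appliances) would need their
own equivalent setting, if they have one.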

You also want to make sure that the log stream is not idle for too long ('mark'
was the historical method of doing that; I prefer vmstat 60 | logger -t vmstat, as
its output is not much larger and is an extremely dense set of information that
can be very useful when troubleshooting).
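
For the historical 'mark' approach, a sketch of the sender-side config might
look like this, assuming the sender runs rsyslog and forwards its local
messages. The 60-second interval is an arbitrary choice to match the vmstat
example:

```
# Sender side: emit a periodic "-- MARK --" message so the forwarding
# connection never sits idle for long. The interval is in seconds.
module(load="immark" interval="60")
```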

The other thing to look at is the RELP protocol. It was developed specifically
because TCP was designed to be reliable over an unreliable wire, but assumes
that both ends will remain up and the connection will not be cut by a middlebox.
RELP does full application-level acks so that the sender knows that the receiver
rsyslog actually processed the message.
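
A rough sketch of a RELP pair, assuming both ends are rsyslog with the RELP
modules installed. The port number and host name are placeholders:

```
# Receiver: accept RELP on a dedicated port (2514 is an arbitrary choice).
module(load="imrelp")
input(type="imrelp" port="2514")

# Sender (a separate host; it must also speak RELP):
module(load="omrelp")
action(type="omrelp" target="syslog-rx.example.com" port="2514")
```

The two halves belong in different machines' configs; they are shown together
only to make the pairing visible.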

With plain TCP, once the sending software submits data to the OS stack and the
OS stack says it's accepted the data, the data then sits in a buffer on the
sending machine, then gets sent over the wire (with retries), then sits in a
buffer on the receiving machine until the receiving software reads it. If
anything causes the connection to be terminated (firewall, load balancer, crash
on the receiving machine, etc) the data will be lost and the sending software
has no way of learning about it.

David Lang


On Sun, 24 Apr 2022, Steven D via rsyslog wrote:

> Date: Sun, 24 Apr 2022 12:14:35 +0000
> From: Steven D via rsyslog <rsyslog@lists.adiscon.com>
> To: "rsyslog@lists.adiscon.com" <rsyslog@lists.adiscon.com>
> Cc: Steven D <pheerless@hotmail.com>
> Subject: [rsyslog] Basic Rsyslog Troubleshooting
Re: Basic Rsyslog Troubleshooting [ In reply to ]
David

Thanks for the depth of this reply. Let me feed back in some additional info.

We've removed the load balancer from the syslog path as part of troubleshooting and the behavior didn't change. We continued to see log loss using TCP (most notably for our firewalls), even when directing to a single receiver server.

So far as the network goes there is a clean path from sending log source (firewall) to the rsyslog receiver. So I guess I'm looking for guidance on what knobs I should look at turning on the RHEL/rsyslog side.

As far as RELP, will that receive standard inbound TCP syslog on the receiver server? I was under the impression (possibly mistaken) both sender/receiver needed to use RELP. I'm happy to test using that module versus imtcp if I'm wrong.

Thanks again.

Regards,

Steven.



-------- Original message --------
From: David Lang <david@lang.hm>
Date: 4/24/22 8:27 AM (GMT-05:00)
To: Steven D via rsyslog <rsyslog@lists.adiscon.com>
Cc: Steven D <pheerless@hotmail.com>
Subject: Re: [rsyslog] Basic Rsyslog Troubleshooting

Re: Basic Rsyslog Troubleshooting [ In reply to ]
Yes, both sender and receiver need to use RELP.

Are you trying to use encryption on the connection?
What version of rsyslog are you running?

You mention a couple hundred senders writing to files based on the sender. Did
you change dynaFileCacheSize from the default? If not, this is something you
need to do (it uses more memory if too large, but cripples performance if even
slightly too small).
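
As a sketch, the setting goes on the omfile action that uses the dynamic file
name. The template name "per_host_file" and the value 500 are assumptions for
illustration (sized with headroom over a few hundred senders), not
recommendations from this thread; the documented default is 10:

```
# Receiver side: keep enough dynafile handles open that every active
# sender's file stays in the cache instead of being closed and reopened.
action(type="omfile"
       DynaFile="per_host_file"
       dynaFileCacheSize="500")
```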

When troubleshooting, it's a good idea to set up impstats writing to a file
directly. It gives a lot of info about what's going on: queue sizes, etc.
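
A minimal sketch of that, assuming a 60-second reporting interval and a
hypothetical output path:

```
# Write rsyslog's internal counters straight to a file, bypassing the
# normal syslog stream so the stats survive even if queues are backed up.
# The interval and the file path are illustrative assumptions.
module(load="impstats"
       interval="60"
       severity="7"
       log.syslog="off"
       log.file="/var/log/rsyslog-stats.log")
```

This is usually loaded near the top of rsyslog.conf, before the other modules.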

When you are bypassing the load balancer, do you still have the extra stuck
connections? I think I saw some discussion on reducing those recently, so the
just-released v8.2204 may include a fix to help that (I'd have to go digging
back through the mail archives), but if we fix the other things, I don't expect
it to be an issue in any case.

David Lang


On Sun, 24 Apr 2022, Steven D wrote:

> Date: Sun, 24 Apr 2022 12:57:43 +0000
> From: Steven D <pheerless@hotmail.com>
> To: David Lang <david@lang.hm>,
> Steven D via rsyslog <rsyslog@lists.adiscon.com>
> Subject: RE: [rsyslog] Basic Rsyslog Troubleshooting
Re: Basic Rsyslog Troubleshooting [ In reply to ]
No sir, no encryption is in play. Just plain ol' TCP syslog. We're running rsyslog v8.1911.

I don't think we've enforced any additional dynafile settings, but I'll double check there's not an override I don't know about. Do you have rough order-of-magnitude guidance for dynaFileCacheSize settings?

After getting thoroughly lost in the Google rabbit hole, I had other questions:

Would using imptcp over imtcp help?

* I saw mentions that imptcp handles connections better/performs better.

Would setting the KeepAlives in the rsyslog config on the server side help to manage the (zombie?) TCP connections?

* The load balancer being in the middle feels like it's the cause of repeated ESTABLISHED connections, but to keep HA/redundancy it's kind of a necessary evil.
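
For reference, a sketch of the server-side keep-alive settings being asked
about, using imtcp's module-level parameters. The timing values are
placeholders rather than tuned recommendations, and since a module can only be
loaded once, these parameters would be merged into the existing imtcp module()
line:

```
# Receiver side: have the kernel probe idle TCP connections so that dead
# peers are detected and their sockets released.
# Time/Interval are in seconds; all three values are assumptions.
module(load="imtcp"
       KeepAlive="on"
       KeepAlive.Time="60"
       KeepAlive.Interval="10"
       KeepAlive.Probes="3")
```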

For reference, our main input config looks like this. There are a number of other input entries, but they're all variations on this for different log source types (servers, apps, etc).

module(load="imudp")
module(load="imtcp" MaxListeners="100" AddtlFrameDelimiter="000")

input(type="imudp" port="10514" ruleset="firewall_rule")
input(type="imtcp" port="10514" ruleset="firewall_rule")

template(name="firewall_logs" type="string" string="/data/logs/firewall/10514/%fromhost-ip%/syslog.log")

ruleset(name="firewall_rule") {
    action(type="omfile"
           FileCreateMode="0744"
           DirCreateMode="0755"
           FileOwner="loguser"
           FileGroup="loguser"
           DirOwner="loguser"
           DirGroup="loguser"
           DynaFile="firewall_logs")
}
.
.
.
[snip]

Thanks again, really appreciate the insight.
Re: Basic Rsyslog Troubleshooting [ In reply to ]
In a RHEL environment, the imptcp module performs better than the imtcp module IF encryption is not required. The imtcp module is required if encryption is required.
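
A minimal sketch of the swap, reusing the port and ruleset name from the config
posted earlier in this thread. It assumes the imptcp module is available in the
installed rsyslog build:

```
# Receiver side: plain (unencrypted) TCP input via imptcp instead of imtcp.
module(load="imptcp")
input(type="imptcp" port="10514" ruleset="firewall_rule")
```

The existing imtcp input on the same port would have to be removed, and any
imtcp module-level options (such as the frame delimiter) would need their
imptcp equivalents, which are set per input rather than per module.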

Regards,



> On Apr 24, 2022, at 09:12, Steven D via rsyslog <rsyslog@lists.adiscon.com> wrote:
>
> No sir, no encryption is in play. Just plain ol TCP syslog. We're running rsyslog v8.1911
>
> I don't think we've enforced any additional dynafile settings, but i'll double check there's not an override I don't know about. Do youz have a rough order guidance for dynafilecachesize settings?
>
> After getting thoroughly lost in the Google rabbit hole, I had other questions;
>
> Would using imptcp over imtcp help?
>
> * I saw mentions that imptcp handles connections better/performs better.
>
> Would setting the KeepAlives in the rsyslog config on the server-side help to manage the (zombie?) TCP connections.?
>
> * The load balancer being in the middle feels like it's the cause of repeated ESTABLISHED connections, but to keep HA/redundancy it's kind of a necessary evil.
>
> For reference, our main input config looks like this. There are a number of other input entries, but they're all variations on this for different log source types (servers, apps, etc).
>
> module(load="imudp")
> module(load="imtcp" MaxListeners="100" AddtlFrameDelimiter="000")
>
> input(type="imudp" port="10514" ruleset="firewall_rule")
> input(type="imtcp" port="10514" ruleset="firewall_rule")
> template(name="firewall_logs" type="string" string="/data/logs/firewall/10514/%fromhost-ip%/syslog.log")
> ruleset(name="firewall_rule") {
> action(type="omfile"
> FileCreateMode="0744"
> DirCreateMode="0755"
> FileOwner="loguser"
> FileGroup="loguser"
> DirOwner="loguser"
> DirGroup="loguser"
> DynaFile="firewall_logs")
> }
> .
> .
> .
> [snip]
>
> Thank again, really appreciate the insight.
> ________________________________
> From: rsyslog <rsyslog-bounces@lists.adiscon.com> on behalf of Steven D via rsyslog <rsyslog@lists.adiscon.com>
> Sent: Sunday, April 24, 2022 8:57 AM
> To: David Lang <david@lang.hm>; Steven D via rsyslog <rsyslog@lists.adiscon.com>
> Cc: Steven D <pheerless@hotmail.com>
> Subject: Re: [rsyslog] Basic Rsyslog Troubleshooting
>
> David
>
> Thanks for the depth of this reply. Let me feed back in some additional info.
>
> We've removed the load balancer from the syslog path as part of troubleshooting and the behavior didn't change. We continued to see log loss using TCP(most notably for our firewalls), even when directing to a single receiver server.
>
> So far as the network goes there is a clean path from sending log source (firewall) to the rsyslog receiver. So I guess I'm looking for guidance on what knobs I should look at turning on the RHEL/rsyslog side.
>
> As far as RELP, will that receive standard inbound TCP syslog on the receiver server? I was under the impression (possibly mistaken) both sender/receiver needed to use RELP. I'm happy to test using that module versus imtcp if I'm wrong.
>
> Thanks again.
>
> Regards,
>
> Steven.
>
>
>
> -------- Original message --------
> From: David Lang <david@lang.hm>
> Date: 4/24/22 8:27 AM (GMT-05:00)
> To: Steven D via rsyslog <rsyslog@lists.adiscon.com>
> Cc: Steven D <pheerless@hotmail.com>
> Subject: Re: [rsyslog] Basic Rsyslog Troubleshooting
>
> One problem with TCP load balancing of syslog messages is that the load
> balancers do not understand the syslog protocol, so they can't rebalance at a
> message boundary.
>
> A second problem is that when a firewall or load balancer drops a connection,
> the sender doesn't know that it's dropped until the next time it tries to
> deliver a message. TCP doesn't have any way for the OS TCP stack to tell
> software "that message that you submitted to an open connection, and I accepted,
> it can't be delivered" (once the OS accepts the message, the sender has to
> assume that it will be delivered).
>
> As a result, it's very easy for TCP syslog to be less reliable than UDP syslog.
> The 'common sense wisdom' is that TCP is reliable because dropped packets inside
> an ongoing connection will get retried, but dropped packets are actually very
> uncommon inside a datacenter. They may happen when a firewall/router is
> overloaded, but it's not very common. Back in 2006 or so I did testing and found
> that within a local network, UDP was almost perfectly reliable (as long as the
> receiver could keep up and not overflow the OS buffers)
>
> Rsyslog has the rebindinterval feature, which tells the sender to disconnect and
> reconnect periodically so that the load balancer has a chance to make a new
> balancing decision.
>
> You also want to make sure that the log stream is not idle for too long ('mark'
> was the historical method of doing that; I prefer vmstat 60 | logger -t vmstat as
> it's not much larger and gives an extremely dense set of information that can be
> very useful when troubleshooting).
>
> The other thing to look at is the RELP protocol, it was developed specifically
> because TCP was designed to be reliable over an unreliable wire, but assumes
> that both ends will remain up and the connection will not be cut by a middlebox.
> RELP does full application level acks so that the sender knows that the receiver
> rsyslog actually processed the message
>
> with plain TCP, once the sending software submits data to the OS stack and the
> OS stack says it's accepted the data, the data then sits in a buffer on the
> sending machine, then gets sent over the wire (with retries), then sits in a
> buffer on the receiving machine until the receiving software reads it. If
> anything causes the connection to be terminated (firewall, load balancer, crash
> on the receiving machine, etc) the data will be lost and the sending software
> has no way of learning about it.
>
> David Lang
>
>
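
On the RELP question quoted above: RELP is its own protocol, so both ends have to speak it; imrelp will not accept plain TCP syslog. That makes it an option for rsyslog-to-rsyslog hops (the servers), not for devices that can only emit plain TCP or UDP syslog. A minimal sketch, assuming the RELP modules are installed on both sides; port 2514 and the receiver hostname are placeholders, not values from this thread:

```rsyslog
# --- Receiver side (assumed port 2514, reusing the ruleset from the post) ---
module(load="imrelp")
input(type="imrelp" port="2514" ruleset="firewall_rule")

# --- Sender side (an rsyslog client; hostname is hypothetical) ---
module(load="omrelp")
action(type="omrelp" target="syslog-rx1.example.com" port="2514")
```

The existing imtcp and imudp inputs would stay in place for senders that cannot use RELP.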
Re: Basic Rsyslog Troubleshooting [ In reply to ]
On Sun, 24 Apr 2022, Steven D wrote:

> No sir, no encryption is in play. Just plain ol TCP syslog. We're running rsyslog v8.1911
>
> I don't think we've enforced any additional dynafile settings, but I'll double check there's not an override I don't know about. Do you have a rough order guidance for dynafilecachesize settings?

It should be larger than the number of files that will be open at any time. If
it is smaller than the number of files being actively written to, every time you
get a log for a new file, rsyslog has to first close (and flush writes to) an old
file, and performance utterly collapses.
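
As a rough illustration of that sizing rule, the firewall ruleset from the original post could carry an explicit cache size. The value 500 is only an assumption, picked to sit above the roughly 330 senders described earlier (one file per sending IP); the documented omfile default for dynaFileCacheSize is 10, well below that.

```rsyslog
template(name="firewall_logs" type="string"
         string="/data/logs/firewall/10514/%fromhost-ip%/syslog.log")

# DynaFileCacheSize="500" is an assumed value: it should exceed the number of
# dynamic files being written to at the same time.
ruleset(name="firewall_rule") {
    action(type="omfile"
           DynaFile="firewall_logs"
           DynaFileCacheSize="500"
           FileCreateMode="0744"
           DirCreateMode="0755"
           FileOwner="loguser"
           FileGroup="loguser"
           DirOwner="loguser"
           DirGroup="loguser")
}
```

The setting is per action, so each of the other rulesets that write dynamic files would need its own value.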

> After getting thoroughly lost in the Google rabbit hole, I had other questions;
>
> Would using imptcp over imtcp help?
>
> * I saw mentions that imptcp handles connections better/performs better.

should not be a significant difference for the volume you describe, but won't
hurt to try
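
If you do want to try it, a minimal sketch of the swap might look like the following. It assumes the existing imtcp input on port 10514 is removed first, since two inputs cannot listen on the same TCP port. imptcp is Linux-only and handles plain TCP only, which matches the "no encryption" setup described here.

```rsyslog
# Replaces: input(type="imtcp" port="10514" ruleset="firewall_rule")
module(load="imptcp")

input(type="imptcp"
      port="10514"
      ruleset="firewall_rule")
```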

> Would setting the KeepAlives in the rsyslog config on the server-side help to manage the (zombie?) TCP connections?
>
> * The load balancer being in the middle feels like it's the cause of repeated ESTABLISHED connections, but to keep HA/redundancy it's kind of a necessary evil.

Yes, that should help avoid the connection being idle long enough for the load
balancer to break the connection (and each break probably causes a log loss). I
would also enable the rebind interval. I like to set it to reconnect every few
seconds under high log volume to give the load balancers the best chance to work.
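
A sketch of both settings, with assumed values throughout. On the receiver, keep-alive for imtcp is set on the module() statement in this rsyslog generation. On the sender, RebindInterval belongs to omfwd, so it only applies where the sender is itself rsyslog (the servers, not the firewalls), and it counts messages rather than seconds, so the number has to be derived from the expected message rate. The target hostname is a placeholder.

```rsyslog
# --- Receiver: enable TCP keep-alive on the imtcp listeners ---
# Time/Interval/Probes values are assumptions; tune them to stay below the
# load balancer's idle timeout.
module(load="imtcp"
       MaxListeners="100"
       AddtlFrameDelimiter="000"
       KeepAlive="on"
       KeepAlive.Time="60"
       KeepAlive.Interval="10"
       KeepAlive.Probes="3")

# --- Sender (rsyslog client only): reconnect periodically ---
# RebindInterval is a message count; 10000 is an assumed value.
action(type="omfwd"
       target="syslog-vip.example.com"
       port="10514"
       protocol="tcp"
       RebindInterval="10000")
```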

you can use a different method for failover. I like using pacemaker/corosync to
move an IP between the two systems (it has the added advantage that you can use
the CLUSTERIP feature to do rough load balancing between systems without an
external loadbalancer)

David Lang

Re: Basic Rsyslog Troubleshooting [ In reply to ]
Also, run top and press H to get the per-thread view and see if you have an rsyslog
thread that's hitting 100% CPU; that can also cripple performance. If you have
dynafilecachesize set too small you will see lots of wait time and probably low CPU
utilization (but impstats would show large queues developing).

David Lang

Re: Basic Rsyslog Troubleshooting [ In reply to ]
On Sun, 24 Apr 2022, Steven D wrote:

> Would setting the KeepAlives in the rsyslog config on the server-side help to manage the (zombie?) TCP connections?
>
> * The load balancer being in the middle feels like it's the cause of repeated ESTABLISHED connections, but to keep HA/redundancy it's kind of a necessary evil.

by the way, I think the fact that the load balancer cuts the connection and the
server doesn't know it's cut and has to wait for it to time out (a very long
time) is the cause of the large number of ESTABLISHED connections

David Lang
Re: Basic Rsyslog Troubleshooting [ In reply to ]
Re: Load balancer - that makes sense to me as well.

I've added this line to our config, does it seem appropriate for pstats? Our Linux team keeps a tight grip on rights, so I'm pretty limited in what I can do/access outside of rsyslog and the SIEM agent configs... I'll have to write the file out where I can actually access it (rolleyes)

module(load="impstats" interval="30" ruleset="pstats_rule")

ruleset(name="pstats_rule") {
    action(type="omfile"
           File="/var/log/rsyslog_pstats.log"
           FileCreateMode="0744"
           FileOwner="loguser"
           FileGroup="loguser")
}

Running top with H now to get a feel for resource usage, but at first glance nothing is really above 1~2%.

Re: Basic Rsyslog Troubleshooting [ In reply to ]
On Sun, 24 Apr 2022, Steven D wrote:

> Re: Load balancer - that makes sense to me as well.
>
> I've added this line to our config, does it seem appropriate for pstats? Our Linux team keeps a tight grip on rights, so I'm pretty limited in what I can do/access outside of rsyslog and the SIEM agent configs... I'll have to write the file out where I can actually access it (rolleyes)
>
> module(load="impstats" interval="30" ruleset="pstats_rule")

when things are working normally this is good, when they aren't, it's best to
have the module write to a file directly (see the module options)
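
For reference, a minimal sketch of the direct-to-file variant using the impstats log.file and log.syslog module options. The path is carried over from the quoted config and the interval is unchanged; with log.syslog="off" the counters no longer pass through the main queue or a ruleset, so they keep being written even when the queues are backed up.

```rsyslog
# Write statistics straight to a file instead of injecting them into the
# normal message flow. The path is taken from the config quoted in this thread.
module(load="impstats"
       interval="30"
       log.syslog="off"
       log.file="/var/log/rsyslog_pstats.log")
```

The omfile ownership and mode settings from pstats_rule do not apply to this file, so read access for the SIEM agent would have to be arranged separately.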

> ruleset(name="pstats_rule") {
> action(type="omfile"
> File="/var/log/rsyslog_pstats.log"
> FileCreateMode="0744"
> FileOwner="loguser"
> FileGroup="loguser")
> }
>
> Running top with H now to get a feel for resource usage, but at first glance nothing is really above 1~2%.

what does wait time look like?

David Lang

Re: Basic Rsyslog Troubleshooting [ In reply to ]
and if you can post a couple cycles of the pstats output I can help explain
what's what there and see if there's anything obvious.

David Lang

On Sun, 24 Apr 2022, David Lang wrote:

> Date: Sun, 24 Apr 2022 08:05:22 -0700 (PDT)
> From: David Lang <david@lang.hm>
> To: Steven D <pheerless@hotmail.com>
> Cc: David Lang <david@lang.hm>,
> Steven D via rsyslog <rsyslog@lists.adiscon.com>
> Subject: Re: [rsyslog] Basic Rsyslog Troubleshooting
>
> On Sun, 24 Apr 2022, Steven D wrote:
>
>> Re: Load balancer - that makes sense to me as well.
>>
>> I've added this line to our config, does it seem appropriate for pstats?
>> Our Linux team keeps a tight grip on rights, so i'm pretty limited in what
>> I can do/access outside of rsyslog and the SIEM agent configs... I'll have
>> to write the file out where I can actually access it (rolleyes)
>>
>> module(load="impstats" interval="30" ruleset="pstats_rule")
>
> when things are working normally this is good, when they aren't, it's best to
> have the module write to a file directly (see the module options)
>
>> ruleset(name="pstats_rule") {
>> action(type="omfile"
>> File="/var/log/rsyslog_pstats.log"
>> FileCreateMode="0744"
>> FileOwner="loguser"
>> FileGroup="loguser")
>> }
>>
>> Running top + H (per-thread view) now to get a feel for resource usage, but at
>> first glance nothing is really above 1~2%
>
> what does wait time look like?
>
> David Lang
>
>> ________________________________
>> From: David Lang <david@lang.hm>
>> Sent: Sunday, April 24, 2022 10:39 AM
>> To: Steven D <pheerless@hotmail.com>
>> Cc: David Lang <david@lang.hm>; Steven D via rsyslog
>> <rsyslog@lists.adiscon.com>
>> Subject: Re: [rsyslog] Basic Rsyslog Troubleshooting
>>
>> On Sun, 24 Apr 2022, Steven D wrote:
>>
>>> Would setting the KeepAlives in the rsyslog config on the server-side help
>>> to manage the (zombie?) TCP connections.?
>>>
>>> * The load balancer being in the middle feels like it's the cause of
>>> repeated ESTABLISHED connections, but to keep HA/redundancy it's kind of a
>>> necessary evil.
>>
>> by the way, I think the fact that the load balancer cuts the connection and
>> the
>> server doesn't know it's cut and has to wait for it to time out (a very
>> long
>> time) is the cause of the large number of ESTABLISHED connections
>>
>> David Lang
>>
>
Re: Basic Rsyslog Troubleshooting [ In reply to ]
Great, I'll eyeball the impstats module options a lil more closely.

Attached are a few cycles with the current settings; I sanitized some of the rule names.
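
For anyone following along: the module option David is pointing at appears to be log.file, with log.syslog controlling whether the stats are also injected as normal syslog messages. A minimal sketch, assuming the same 30-second interval; the path is only an example, and since impstats writes this file itself, the FileOwner/FileGroup settings of an omfile action do not apply to it:

```
# Sketch only: impstats writes its counters straight to a file,
# bypassing the main queue and rulesets, so the stats still show up
# when the queues themselves are the problem.
# interval   = seconds between stats dumps
# log.syslog = "off" stops the stats being emitted as syslog messages too
# log.file   = example path (assumption), pick one you can read
module(load="impstats"
       interval="30"
       log.syslog="off"
       log.file="/var/log/rsyslog_pstats.log")
```

With log.syslog off there is no need for the ruleset= parameter or the pstats ruleset.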
Re: Basic Rsyslog Troubleshooting [ In reply to ]
you definitely need to increase the DynaFileCacheSize for the firewall logs

also, if you add name= to the action, the pstats lines will be named by that
rather than action #

bump up the cache size so that it can keep track of all the files that will be
getting logs at the same time (plus a bit to be on the safe side, it REALLY
hurts to have it below the working set size) and see what that does to things.
I'll bet that CPU utilization increases and you have fewer problems with losing
logs.

if you continue to have problems, try to get a pstats dump of the period where
you lose some logs so we can see what it looks like.

having the max main queue size hit almost 4k seems likely to be an indication of
a problem as well, but that may go away once we get the cache size reasonable

David Lang

Re: Basic Rsyslog Troubleshooting [ In reply to ]
Just to make sure

* DynaFileCacheSize is a global setting, so I don't need to specify it per ruleset/action?
* Assuming so, and with ~300 unique hosts writing to files, would DynaFileCacheSize="500" be too much?

Regards,
Steven
Re: Basic Rsyslog Troubleshooting [ In reply to ]
On Sun, 24 Apr 2022, Steven D wrote:

> * DynaFileCacheSize is a global setting, I don't need to specify it per ruleset/action?
I believe that it's per action (with the action() syntax, there are no global
settings except encryption)

> * Assuming so and I have ~300 unique hosts writing to files, would DynaFileCacheSize="500" be too much?

no, that sounds very reasonable. The current pstats output shows hundreds of
thousands of cache evictions, that should drop to near zero (pretty much only
showing up if you have date as part of it and the date changes). A small number
is fine, thousands is bad, hundreds of thousands very bad

David Lang
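
To make the per-action point concrete, here is a rough sketch of an omfile action that carries its own name and cache size. The ruleset, template, action and path names are made up for illustration (they are not taken from the config in this thread), and 500 is simply the figure discussed above for ~300 sending hosts:

```
# Hypothetical names throughout; adjust to your own layout.
template(name="server_logs" type="string"
         string="/data/logs/servers/%fromhost-ip%/syslog.log")

ruleset(name="server_rule") {
    # name= makes the impstats lines show up under this label
    # instead of a generic "action N".
    # dynaFileCacheSize applies to this action only; other dynafile
    # actions keep their own value (the default is 10).
    action(name="server_files"
           type="omfile"
           dynaFile="server_logs"
           dynaFileCacheSize="500")
}
```

Each distinct expansion of the template (here, one per sending IP) takes one cache slot, so the size should sit a bit above the number of files receiving logs at the same time.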

Re: Basic Rsyslog Troubleshooting [ In reply to ]
David,

Thanks for all your help today. I committed a few changes to our config today and I'll keep an eye out for changes over the next day or so.

Mind if I drop another pstats here after some bake in for a re-review?


Here's what I updated it to based on your pointers.

module(load="imudp")
module(load="imtcp" MaxListeners="100" AddtlFrameDelimiter="000" KeepAlive="on" KeepAlive.Probes="1" KeepAlive.Time="10")
module(load="impstats" interval="30" ruleset="pstats_rule")

ruleset(name="pstats_rule") {
    action(name="pstats_rule"
           type="omfile"
           File="/var/log/rsyslog_pstats.log"
           FileCreateMode="0744"
           FileOwner="loguser"
           FileGroup="loguser")
}


input(type="imudp" port="10514" ruleset="firewall_rule")
input(type="imtcp" port="10514" ruleset="firewall_rule")
template(name="firewall_logs" type="string" string="/data/logs/pan/10514/%fromhost-ip%/syslog.log")
ruleset(name="firewall_rule") {
    action(name="firewall_rule"
           type="omfile"
           FileCreateMode="0744"
           DirCreateMode="0755"
           FileOwner="loguser"
           FileGroup="loguser"
           DirOwner="loguser"
           DirGroup="loguser"
           DynaFile="firewall_logs"
           DynaFileCacheSize="50")
}
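
A note on the keepalive parameters above, hedged because the right numbers depend on the load balancer's idle timeout, which isn't stated in this thread. KeepAlive.Time is the idle time in seconds before the first probe goes out, KeepAlive.Interval is the gap between probes, and KeepAlive.Probes is how many unanswered probes are tolerated before the connection is dropped. With Probes="1", a single lost probe closes the session. A somewhat more forgiving sketch, with purely illustrative values and the other imtcp parameters left out:

```
# Illustrative values only (assumptions, not tested against this setup).
# KeepAlive.Time     = seconds of idle before the first keepalive probe
# KeepAlive.Interval = seconds between probes once probing has started
# KeepAlive.Probes   = unanswered probes allowed before the connection is dropped
module(load="imtcp"
       MaxListeners="100"
       KeepAlive="on"
       KeepAlive.Time="60"
       KeepAlive.Interval="10"
       KeepAlive.Probes="3")
```

The idea is to keep KeepAlive.Time below whatever idle timeout the load balancer enforces, so dead sessions are noticed and cleaned up rather than lingering as ESTABLISHED.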

Re: Basic Rsyslog Troubleshooting [ In reply to ]
On Mon, 25 Apr 2022, Steven D wrote:

> David,
>
> Thanks for all your help today. I committed a few changes to our config today and I'll keep an eye out for changes over the next day or so.
>
> Mind if I drop another pstats here after some bake in for a re-review?

not a problem

>
> Here's what I updated it to based on your pointers.
>
> module(load="imudp")
> module(load="imtcp" MaxListeners="100" AddtlFrameDelimiter="000" KeepAlive="on" KeepAlive.Probes="1" KeepAlive.Time="10")
> module(load="impstats" interval="30" ruleset="pstats_rule")
>
> ruleset(name="pstats_rule") {
> action(name="pstats_rule"
> type="omfile"
> File="/var/log/rsyslog_pstats.log"
> FileCreateMode="0744"
> FileOwner="loguser"
> FileGroup="loguser")
> }
>
>
> input(type="imudp" port="10514" ruleset="firewall_rule")
> input(type="imtcp" port="10514" ruleset="firewall_rule")
> template(name="firewall_logs" type="string" string="/data/logs/pan/10514/%fromhost-ip%/syslog.log")
> ruleset(name="firewall_rule") {
> action(name="firewall_rule"
> type="omfile"
> FileCreateMode="0744"
> DirCreateMode="0755"
> FileOwner="loguser"
> FileGroup="loguser"
> DirOwner="loguser"
> DirGroup="loguser"
> DynaFile="firewall_logs"
> DynaFileCacheSize = "50")

hopefully this was 500 not 50 based on your prior comment (but the pstats will
be clear once you get this large enough)

David Lang

> }
Re: Basic Rsyslog Troubleshooting [ In reply to ]
Since it was action specific, I changed the DynaFileCacheSize to the appropriate amount for each action. We only have about 30ish firewalls.

We've got 200ish RHEL servers sending syslog though, so I adjusted accordingly for that action. (Not shown)
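
A rough sketch of what per-action sizing could look like. The firewall names and path come from the config quoted below; the server ruleset, its template, its path, and the value of 300 are hypothetical, since that part of the config was not posted.

```
# Sketch only: DynaFileCacheSize is set on each omfile action and sized a bit
# above the number of distinct files that action writes to at the same time.

# ~30 firewalls, one file per source IP
template(name="firewall_logs" type="string"
         string="/data/logs/pan/10514/%fromhost-ip%/syslog.log")
ruleset(name="firewall_rule") {
    action(name="firewall_rule" type="omfile"
           DynaFile="firewall_logs"
           DynaFileCacheSize="50")
}

# ~200 RHEL servers (ruleset name, template name, and path are hypothetical)
template(name="server_logs" type="string"
         string="/data/logs/rhel/%fromhost-ip%/syslog.log")
ruleset(name="server_rule") {
    action(name="server_rule" type="omfile"
           DynaFile="server_logs"
           DynaFileCacheSize="300")
}
```

With a named action, the impstats output should show that action's cache eviction counter under the same name, which makes it easy to check whether the chosen size is large enough.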



Regards,

Steven.



-------- Original message --------
From: David Lang <david@lang.hm>
Date: 4/24/22 8:21 PM (GMT-05:00)
To: Steven D <pheerless@hotmail.com>
Cc: David Lang <david@lang.hm>, Steven D via rsyslog <rsyslog@lists.adiscon.com>
Subject: Re: [rsyslog] Basic Rsyslog Troubleshooting

On Mon, 25 Apr 2022, Steven D wrote:

> David,
>
> Thanks for all your help today, I committed a few changes to our config today and I'll keep an eye out for changes over the next day or so.
>
> Mind if I drop another pstats here after some bake in for a re-review?

not a problem

>
> Here's what I updated it to based on your pointers.
>
> module(load="imudp")
> module(load="imtcp" MaxListeners="100" AddtlFrameDelimiter="000" KeepAlive="on" KeepAlive.Probes="1" KeepAlive.Time="10")
> module(load="impstats" interval="30" ruleset="pstats_rule")
>
> ruleset(name="pstats_rule") {
> action(name="pstats_rule"
> type="omfile"
> File="/var/log/rsyslog_pstats.log"
> FileCreateMode="0744"
> FileOwner="loguser"
> FileGroup="loguser")
> }
>
>
> input(type="imudp" port="10514" ruleset="firewall_rule")
> input(type="imtcp" port="10514" ruleset="firewall_rule")
> template(name="firewall_logs" type="string" string="/data/logs/pan/10514/%fromhost-ip%/syslog.log")
> ruleset(name="firewall_rule") {
> action(name="firewall_rule"
> type="omfile"
> FileCreateMode="0744"
> DirCreateMode="0755"
> FileOwner="loguser"
> FileGroup="loguser"
> DirOwner="loguser"
> DirGroup="loguser"
> DynaFile="firewall_logs"
> DynaFileCacheSize = "50")

hopefully this was 500 not 50 based on your prior comment (but the pstats will
be clear once you get this large enough)

David Lang

> }
>
> ________________________________
> From: David Lang <david@lang.hm>
> Sent: Sunday, April 24, 2022 12:32 PM
> To: Steven D <pheerless@hotmail.com>
> Cc: David Lang <david@lang.hm>; Steven D via rsyslog <rsyslog@lists.adiscon.com>
> Subject: Re: [rsyslog] Basic Rsyslog Troubleshooting
>
> On Sun, 24 Apr 2022, Steven D wrote:
>
>> * dynaFileCacheSize is a global setting, so I don't need to specify it per ruleset/action?
> I believe that it's per action (with the action() syntax, there are no global
> settings except encryption)
>
>> * Assuming so and I have ~300 unique hosts writing to files, would "dynaFileCacheSize = 500" be too much?
>
> no, that sounds very reasonable. The current pstats output shows hundreds of
> thousands of cache evictions, that should drop to near zero (pretty much only
> showing up if you have date as part of it and the date changes). A small number
> is fine, thousands is bad, hundreds of thousands very bad
>
> David Lang
>
>> Regards,
>> Steven
>> ________________________________
>> From: David Lang <david@lang.hm>
>> Sent: Sunday, April 24, 2022 11:37 AM
>> To: Steven D <pheerless@hotmail.com>
>> Cc: David Lang <david@lang.hm>; Steven D via rsyslog <rsyslog@lists.adiscon.com>
>> Subject: Re: [rsyslog] Basic Rsyslog Troubleshooting
>>
>> you definitely need to increase the DynaFileCacheSize for the firewall logs
>>
>> also, if you add name= to the action, the pstats lines will be named by that
>> rather than action #
>>
>> bump up the cache size so that it can keep track of all the files that will be
>> getting logs at the same time (plus a bit to be on the safe side, it REALLY
>> hurts to have it below the working set size) and see what that does to things.
>> I'll bet that cpu utilization increases and you have fewer problems with losing
>> logs.
>>
>> if you continue to have problems, try to get a pstats dump of the period where
>> you lose some logs so we can see what it looks like.
>>
>> having the max main queue size hit almost 4k seems likely to be an indication of
>> a problem as well, but that may go away once we get the cache size reasonable
>>
>> David Lang
>>
>> On Sun, 24 Apr 2022, Steven D wrote:
>>
>>> Date: Sun, 24 Apr 2022 15:27:47 +0000
>>> From: Steven D <pheerless@hotmail.com>
>>> To: David Lang <david@lang.hm>
>>> Cc: Steven D via rsyslog <rsyslog@lists.adiscon.com>
>>> Subject: Re: [rsyslog] Basic Rsyslog Troubleshooting
>>>
>>> Great, I'll eyeball the impstats module options a lil more closely.
>>>
>>> Attached is a few cycles with the current settings, sanitized some of the rule names.
>>> ________________________________
>>> From: David Lang <david@lang.hm>
>>> Sent: Sunday, April 24, 2022 11:06 AM
>>> To: David Lang <david@lang.hm>
>>> Cc: Steven D <pheerless@hotmail.com>; Steven D via rsyslog <rsyslog@lists.adiscon.com>
>>> Subject: Re: [rsyslog] Basic Rsyslog Troubleshooting
>>>
>>> and if you can post a couple cycles of the pstats output I can help explain
>>> what's what there and see if there's anything obvious.
>>>
>>> David Lang
>>>
Re: Basic Rsyslog Troubleshooting [ In reply to ]
As a slightly unrelated side question - what is this SIEM you're gonna
read the files with and send the events to? Can't you use some solution
to send events directly to SIEM, without intermediate files?

MK

Re: Basic Rsyslog Troubleshooting [ In reply to ]
On Mon, 25 Apr 2022, Mariusz Kruk via rsyslog wrote:

> As a slightly unrelated side question - what is this SIEM you're gonna read
> the files with and send the events to? Can't you use some solution to send
> events directly to SIEM, without intermediate files?

As surprising as it seems, a lot of the SIEM tools do a pretty lousy job of
processing network syslog.

David Lang
Re: Basic Rsyslog Troubleshooting [ In reply to ]
On 25.04.2022 07:35, David Lang wrote:
> On Mon, 25 Apr 2022, Mariusz Kruk via rsyslog wrote:
>
>> As a slightly unrelated side question - what is this SIEM you're
>> gonna read the files with and send the events to? Can't you use some
>> solution to send events directly to SIEM, without intermediate files?
>
> As surprising as it seems, a lot of the SIEM tools do a pretty lousy
> job of processing network syslog.


I agree, but for some of them you can do much better by processing the
events internally and sending the properly formatted event directly to the
SIEM (with omhttp, for example) than by dumping events to files and making
the agent burn even more IOPS to read and process them. I know that
mapping fields to CEF for ArcSight might not be very easy within
rsyslog itself, but sending events to Splunk's HEC works like a charm.
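
A minimal sketch of that idea, assuming the omhttp module is installed (it is a contrib module and may not ship with every distribution package). The host name, port, token, and template name are placeholders, and the event layout shown is just the simplest JSON shape HEC's event endpoint takes.

```
# Sketch only: send each event to Splunk's HTTP Event Collector via omhttp.
module(load="omhttp")

# Build a small JSON object by hand; format="json" escapes the property
# value so it is safe inside a JSON string.
template(name="hec_event" type="list") {
    constant(value="{\"host\":\"")
    property(name="hostname" format="json")
    constant(value="\",\"event\":\"")
    property(name="msg" format="json")
    constant(value="\"}")
}

action(name="to_splunk_hec" type="omhttp"
       server="splunk.example.com"             # placeholder host
       serverport="8088"                       # placeholder port
       usehttps="on"
       restpath="services/collector/event"
       httpheaderkey="Authorization"
       httpheadervalue="Splunk 00000000-0000-0000-0000-000000000000"   # placeholder token
       template="hec_event")
```

This posts one request per message; omhttp also has batching options that would be worth a look at higher volumes.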

MK

Re: Basic Rsyslog Troubleshooting [ In reply to ]
David's not wrong... lol

Also, buffering / caching the syslog down to disk and then reading it off to the SIEM provides a bit of data loss resiliency. It also gives us a small margin (a few days) of raw logging to reference in the event of some other need: SIEM outage, forensic needs, random app owner audit need, etc.
Re: Basic Rsyslog Troubleshooting [ In reply to ]
Also, since most SIEM tools are rather pricey, and priced based on the volume of
logs, sending the logs to a syslog server gives you the option of saving all
the logs but only ingesting a portion of them into the SIEM (while retaining
the ability to import them all if needed)

with the price of some SIEM tools, this can be a 6 figure per year savings on a
medium size environment. And even big companies notice this sort of cost.
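
One way this split might be sketched in rsyslog. All names, paths, and the severity cut-off are hypothetical; the idea is only that the SIEM agent is pointed at the second directory.

```
# Sketch only: archive every message, but copy just part of the stream into
# the directory the SIEM agent reads.
template(name="archive_file" type="string"
         string="/data/logs/archive/%fromhost-ip%/syslog.log")
template(name="siem_file" type="string"
         string="/data/logs/siem/%fromhost-ip%/syslog.log")

ruleset(name="split_rule") {
    # full copy, kept for forensics or a later bulk import
    action(name="archive_all" type="omfile"
           DynaFile="archive_file" DynaFileCacheSize="500")

    # warning and more severe only (a lower number means more severe)
    if ($syslogseverity <= 4) then {
        action(name="siem_subset" type="omfile"
               DynaFile="siem_file" DynaFileCacheSize="500")
    }
}
```

The filter could just as well key on program name or message content; severity is used here only because it keeps the example short.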

David Lang
Re: Basic Rsyslog Troubleshooting [ In reply to ]
Sure. I work with them, I know ;-)

It's just that for some, you can do the same but use rsyslog to
process the messages (even filter some events out, trim them, or do lots
of other fancy stuff) and send them directly to the SIEM (by means of the
native SIEM API, not by syslog) instead of killing the server with IOPS.
That's all.

MK

