Mailing List Archive

Re: Upgraded receiver from Ubuntu 16.04 to 18.04, main queue filling up, imrelp-related errors
Unfortunately the system is still having issues.

I enabled debug logging earlier, copied the debug log aside and *then* disabled debug-on-demand logging (last time I forgot that not copying the file elsewhere first complete wipes the log file).

I have 18 MB worth of captured details for a short timeframe.

Out of that log file I'm seeing a lot of these entries:

7316.804057780:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
7316.804062774:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
7316.804232359:imuxsock.c : imuxsock.c: Message from UNIX socket: #3, size 221
7316.804258678:imuxsock.c : datetime.c: ParseTIMESTAMP3339: invalid year: 0, pszTS: 'e'
7316.804264073:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
7316.804267610:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
7317.803939077:imrelp.c : main Q: queue.c: EnqueueMsg advised worker start
7317.803994833:imrelp.c : imrelp.c: relpTcpSend: send data: 14916 rsp 6 200 OK

7317.804025407:imrelp.c : imrelp.c: relpTcpSend: sock 80, lenbuf 19, send returned -1 [errno 32]
7317.804038885:imrelp.c : imrelp.c: librelp: generic error: ecode 10014, emsg 'error sending relp: Broken pipe'
7317.804049421:imrelp.c : errmsg.c: Called LogMsg, msg: imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended
7317.804054526:imrelp.c : operatingstate.c: osf: MSG imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended: signaling new internal message via SIGTTOU: 'imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]'
7317.804097763:main thread : janitor.c: janitorRun() called

The pattern seems to be that a remote target is interrupted and then the main queue (main Q) starts filling up. All omrelp actions have queues attached and none of them ever filled (going by impstats log file which was regularly receiving updates).

Going by the local log file on the primary receiver it noted these details:

2020-09-01T12:27:33.306418-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 1 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
2020-09-01T12:27:33.429855-05:00 woodchuck1 rsyslogd: omfwd: TCPSendBuf error -2027, destruct TCP Connection to graylog1.example.com:514 [v8.2006.0 try https://www.rsyslog.com/e/2027 ]
2020-09-01T12:27:33.469576-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
2020-09-01T12:27:33.470212-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' resumed (module 'builtin:omfwd') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
2020-09-01T12:33:02.644015-05:00 woodchuck1 rsyslogd: -- MARK --
2020-09-01T12:53:02.711834-05:00 woodchuck1 rsyslogd: -- MARK --
2020-09-01T13:13:02.802190-05:00 woodchuck1 rsyslogd: -- MARK --
2020-09-01T13:33:02.856874-05:00 woodchuck1 rsyslogd: -- MARK --
2020-09-01T13:45:10.206220-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 2 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
2020-09-01T13:45:10.222580-05:00 woodchuck1 rsyslogd: librelp error 10008 forwarding to server woodchuck2.example.com:2514 - suspending [v8.2006.0 try https://www.rsyslog.com/e/2291 ]
2020-09-01T13:45:10.222794-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' suspended (module 'omrelp'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
2020-09-01T13:45:11.232309-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' resumed (module 'omrelp') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
According to the impstats log, the main queue started filling up around 11:21 am, about the time that several of our mail relays were rebooting from maintenance.

This is an Ubuntu 18.04 box that was in-place upgraded from Ubuntu 16.04.

Not sure if it is relevant, but the existing rsyslog packages were not swapped out with replacement 18.04-based packages as part of the upgrade as expected.

dpkg -l | grep rsyslog
ii rsyslog 8.2006.0-0adiscon2xenial1 amd64 a rocket-fast system for log processing
ii rsyslog-mmjsonparse 8.2006.0-0adiscon2xenial1 amd64 Parsing/handling of CEE/Lumberjack JSON messages in rsyslog
ii rsyslog-mmnormalize 8.2006.0-0adiscon2xenial1 amd64 The rsyslog-mmnormalize package provides log normalization
ii rsyslog-mmrm1stspace 8.2006.0-0adiscon2xenial1 amd64 The mmrm1stspace module permits to strip the leading space from
ii rsyslog-relp 8.2006.0-0adiscon2xenial1 amd64 RELP protocol support for rsyslog

I was planning to wait for the next rsyslog packages to be released for the Ubuntu PPA to see if that was enough to trigger a switchover to "bionic" based packages. I suspect this is completely unrelated to the problem experienced, but wanted to make sure and note it here.

Recent changes (not *exactly* flailing around, but not very confident) from a few days ago (which didn't help today):

* Modified forward queues (attached to actions within a ruleset)
** increase queue size from 10K to 100K
** increase worker threads from 1 to 4
** disable explicit high water mark setting, allow default setting to apply
** disable explicit low water mark setting, allow default setting to apply

* Modified 'main Q' (as impstats lists it)
** adjust worker threads from 1 to 4
** adjust queue size from 10K to 500K

Changes made tonight after restarting rsyslog:

* Modified ruleset used to write out messages to local log files on the receiver
** increase queue size to 250K
** increase worker threads to 4

* Disable lightDelayMark on main queue (as discussed on this thread)


I'm not sure what's going on. This box was quite stable up until we upgraded the OS from Ubuntu 16.04 to 18.04.

Any ideas/suggestions are welcome.

Thanks.

-----Original Message-----
From: Adam Chalkley
Sent: Friday, August 28, 2020 4:15 PM
To: rsyslog-users <rsyslog@lists.adiscon.com>
Cc: alorbach@adiscon.com
Subject: RE: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver

Hi Andre,

Thank you for the additional feedback.

As you suggested, the problem is likely tied back to the TCP probe. We've not had it enabled since last Sunday and rsyslog has been running fine without making the changes we previously discussed (I'm still interested in making them, I've just been pulled in other directions).

Do you happen to know of a safe way to check that the port is open remotely without triggering a failure from rsyslog's perspective? I'm guessing that a minimal RELP-compatible client would be the best approach. Is there such a tool that you're aware of that could be called periodically to confirm that a rsyslog receiver (RELP-enabled port) is functioning properly?

Just thought I would ask.

Thanks!

-----Original Message-----
From: Andre Lorbach <alorbach@adiscon.com>
Sent: Monday, August 24, 2020 4:06 AM
To: rsyslog-users <rsyslog@lists.adiscon.com>
Cc: Adam Chalkley <atc0005@auburn.edu>
Subject: AW: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver

I think those errors were there all the time but not reported in older
librelp version.
I reviewed the code and we added this error output about 2 years ago in
librelp.
Ubuntu 16.04 most likely is using an older librelp version, so you did not
see the error there.

The problem is caused by the TCP Probe, it may helps if you try to receive
data before you drop the connection.

Best regards,
Andre Lorbach
--
Adiscon GmbH
Mozartstr. 21
97950 Großrinderfeld, Germany
Ph. +49-9349-9298530
Geschäftsführer/President: Rainer Gerhards Reg.-Gericht Mannheim, HRB
560610
Ust.-IDNr.: DE 81 22 04 622
Web: www.adiscon.com - Mail: info@adiscon.com

Informations regarding your data privacy policy can be found here:
https://www.adiscon.com/data-privacy-policy/

This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient or have received this e-mail in error
please notify the sender immediately and delete this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.

Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
vernichten Sie diese E-Mail. Das unerlaubte Kopieren und die unbefugte
Weitergabe dieser E-Mail sind nicht gestattet.



> -----Ursprüngliche Nachricht-----
> Von: rsyslog <rsyslog-bounces@lists.adiscon.com> Im Auftrag von Adam
> Chalkley via rsyslog
> Gesendet: Mittwoch, 19. August 2020 18:38
> An: rsyslog-users <rsyslog@lists.adiscon.com>
> Cc: Adam Chalkley <atc0005@auburn.edu>
> Betreff: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04,
connections
> from clients failing with a high number of CLOSE_WAIT connections on
> receiver
>
> Hi,
>
> We upgraded the OS on our central receiver yesterday from Ubuntu 16.04
> (4.4 kernel) to 18.04 (4.15 kernel).
>
> We are using the upstream PPA, so running 8.2006.0 on receivers and
> endpoints.
>
> When we started getting reports from our Nagios instance that the
rsyslog
> forward queues endpoints were beginning to fill we checked our receiver
> (sawmill1) and saw 94 open TCP connections with 40 of them in CLOSE_WAIT
> from our Nagios server, most of them I suspect from the TCP port
connection
> test performed every 5 minutes.
>
> Log samples from the receiver system (which are related to port probes
from
> our Nagios instance):
>
> 2020-08-19T10:05:01.279416-05:00 lincoln rsyslogd: -- MARK --
> 2020-08-19T10:05:08.249358-05:00 lincoln rsyslogd: imrelp[2514]: error
'server
> closed relp session, session broken', object 'lstn 2514: conn to clt
> 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
try
> https://www.rsyslog.com/e/2353 ]
> 2020-08-19T10:05:08.249626-05:00 lincoln rsyslogd: imrelp[2514]: error
'error
> sending relp: Bad file descriptor', object 'lstn 2514: conn to clt
> 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
try
> https://www.rsyslog.com/e/2353 ]
> 2020-08-19T10:08:08.020625-05:00 lincoln rsyslogd: imrelp[2514]: error
'server
> closed relp session, session broken', object 'lstn 2514: conn to clt
> 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
try
> https://www.rsyslog.com/e/2353 ]
> 2020-08-19T10:08:08.021253-05:00 lincoln rsyslogd: imrelp[2514]: error
'error
> sending relp: Bad file descriptor', object 'lstn 2514: conn to clt
> 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
try
> https://www.rsyslog.com/e/2353 ]
> 2020-08-19T10:11:08.074712-05:00 lincoln rsyslogd: imrelp[2514]: error
'server
> closed relp session, session broken', object 'lstn 2514: conn to clt
> 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
try
> https://www.rsyslog.com/e/2353 ]
>
> Log samples from the Nagios instance:
>
> 2020-08-19T11:19:53.444953-05:00 nagios rsyslogd:
> omrelp[lincoln.lib.auburn.edu:2514]: error 'error waiting on required
session
> state, session broken', object 'conn to srvr
lincoln.lib.auburn.edu:2514' -
> action may not work as intended [v8.2006.0 try
> https://www.rsyslog.com/e/2353 ]
> 2020-08-19T11:19:53.445260-05:00 nagios rsyslogd:
> omrelp[lincoln.lib.auburn.edu:2514]: error 'error opening connection to
> remote peer', object 'conn to srvr lincoln.lib.auburn.edu:2514' -
action may
> not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
>
> Is there a setting I can apply to rsyslog to help resolve this?
>
> Is this a known bug?
>
> We didn't have the issue with v8.2006.0 on our receiver when it was
running
> Ubuntu 16.04 (the prior OS release), even though it made the same
> complaints about the TCP port probes from Nagios.
>
> Thanks in advance.
>
> _______________________________________________
> rsyslog mailing list
> https://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL:
> This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites
beyond
> our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: Upgraded receiver from Ubuntu 16.04 to 18.04, main queue filling up, imrelp-related errors [ In reply to ]
I don't see the rest of this thread, can you (re?) post you config?

Rainer

El mié., 2 sept. 2020 a las 3:13, Adam Chalkley via rsyslog
(<rsyslog@lists.adiscon.com>) escribió:
>
> Unfortunately the system is still having issues.
>
> I enabled debug logging earlier, copied the debug log aside and *then* disabled debug-on-demand logging (last time I forgot that not copying the file elsewhere first complete wipes the log file).
>
> I have 18 MB worth of captured details for a short timeframe.
>
> Out of that log file I'm seeing a lot of these entries:
>
> 7316.804057780:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
> 7316.804062774:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
> 7316.804232359:imuxsock.c : imuxsock.c: Message from UNIX socket: #3, size 221
> 7316.804258678:imuxsock.c : datetime.c: ParseTIMESTAMP3339: invalid year: 0, pszTS: 'e'
> 7316.804264073:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
> 7316.804267610:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
> 7317.803939077:imrelp.c : main Q: queue.c: EnqueueMsg advised worker start
> 7317.803994833:imrelp.c : imrelp.c: relpTcpSend: send data: 14916 rsp 6 200 OK
>
> 7317.804025407:imrelp.c : imrelp.c: relpTcpSend: sock 80, lenbuf 19, send returned -1 [errno 32]
> 7317.804038885:imrelp.c : imrelp.c: librelp: generic error: ecode 10014, emsg 'error sending relp: Broken pipe'
> 7317.804049421:imrelp.c : errmsg.c: Called LogMsg, msg: imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended
> 7317.804054526:imrelp.c : operatingstate.c: osf: MSG imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended: signaling new internal message via SIGTTOU: 'imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]'
> 7317.804097763:main thread : janitor.c: janitorRun() called
>
> The pattern seems to be that a remote target is interrupted and then the main queue (main Q) starts filling up. All omrelp actions have queues attached and none of them ever filled (going by impstats log file which was regularly receiving updates).
>
> Going by the local log file on the primary receiver it noted these details:
>
> 2020-09-01T12:27:33.306418-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 1 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> 2020-09-01T12:27:33.429855-05:00 woodchuck1 rsyslogd: omfwd: TCPSendBuf error -2027, destruct TCP Connection to graylog1.example.com:514 [v8.2006.0 try https://www.rsyslog.com/e/2027 ]
> 2020-09-01T12:27:33.469576-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
> 2020-09-01T12:27:33.470212-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' resumed (module 'builtin:omfwd') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
> 2020-09-01T12:33:02.644015-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T12:53:02.711834-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:13:02.802190-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:33:02.856874-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:45:10.206220-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 2 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> 2020-09-01T13:45:10.222580-05:00 woodchuck1 rsyslogd: librelp error 10008 forwarding to server woodchuck2.example.com:2514 - suspending [v8.2006.0 try https://www.rsyslog.com/e/2291 ]
> 2020-09-01T13:45:10.222794-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' suspended (module 'omrelp'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
> 2020-09-01T13:45:11.232309-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' resumed (module 'omrelp') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
> According to the impstats log, the main queue started filling up around 11:21 am, about the time that several of our mail relays were rebooting from maintenance.
>
> This is an Ubuntu 18.04 box that was in-place upgraded from Ubuntu 16.04.
>
> Not sure if it is relevant, but the existing rsyslog packages were not swapped out with replacement 18.04-based packages as part of the upgrade as expected.
>
> dpkg -l | grep rsyslog
> ii rsyslog 8.2006.0-0adiscon2xenial1 amd64 a rocket-fast system for log processing
> ii rsyslog-mmjsonparse 8.2006.0-0adiscon2xenial1 amd64 Parsing/handling of CEE/Lumberjack JSON messages in rsyslog
> ii rsyslog-mmnormalize 8.2006.0-0adiscon2xenial1 amd64 The rsyslog-mmnormalize package provides log normalization
> ii rsyslog-mmrm1stspace 8.2006.0-0adiscon2xenial1 amd64 The mmrm1stspace module permits to strip the leading space from
> ii rsyslog-relp 8.2006.0-0adiscon2xenial1 amd64 RELP protocol support for rsyslog
>
> I was planning to wait for the next rsyslog packages to be released for the Ubuntu PPA to see if that was enough to trigger a switchover to "bionic" based packages. I suspect this is completely unrelated to the problem experienced, but wanted to make sure and note it here.
>
> Recent changes (not *exactly* flailing around, but not very confident) from a few days ago (which didn't help today):
>
> * Modified forward queues (attached to actions within a ruleset)
> ** increase queue size from 10K to 100K
> ** increase worker threads from 1 to 4
> ** disable explicit high water mark setting, allow default setting to apply
> ** disable explicit low water mark setting, allow default setting to apply
>
> * Modified 'main Q' (as impstats lists it)
> ** adjust worker threads from 1 to 4
> ** adjust queue size from 10K to 500K
>
> Changes made tonight after restarting rsyslog:
>
> * Modified ruleset used to write out messages to local log files on the receiver
> ** increase queue size to 250K
> ** increase worker threads to 4
>
> * Disable lightDelayMark on main queue (as discussed on this thread)
>
>
> I'm not sure what's going on. This box was quite stable up until we upgraded the OS from Ubuntu 16.04 to 18.04.
>
> Any ideas/suggestions are welcome.
>
> Thanks.
>
> -----Original Message-----
> From: Adam Chalkley
> Sent: Friday, August 28, 2020 4:15 PM
> To: rsyslog-users <rsyslog@lists.adiscon.com>
> Cc: alorbach@adiscon.com
> Subject: RE: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
>
> Hi Andre,
>
> Thank you for the additional feedback.
>
> As you suggested, the problem is likely tied back to the TCP probe. We've not had it enabled since last Sunday and rsyslog has been running fine without making the changes we previously discussed (I'm still interested in making them, I've just been pulled in other directions).
>
> Do you happen to know of a safe way to check that the port is open remotely without triggering a failure from rsyslog's perspective? I'm guessing that a minimal RELP-compatible client would be the best approach. Is there such a tool that you're aware of that could be called periodically to confirm that a rsyslog receiver (RELP-enabled port) is functioning properly?
>
> Just thought I would ask.
>
> Thanks!
>
> -----Original Message-----
> From: Andre Lorbach <alorbach@adiscon.com>
> Sent: Monday, August 24, 2020 4:06 AM
> To: rsyslog-users <rsyslog@lists.adiscon.com>
> Cc: Adam Chalkley <atc0005@auburn.edu>
> Subject: AW: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
>
> I think those errors were there all the time but not reported in older
> librelp version.
> I reviewed the code and we added this error output about 2 years ago in
> librelp.
> Ubuntu 16.04 most likely is using an older librelp version, so you did not
> see the error there.
>
> The problem is caused by the TCP Probe, it may helps if you try to receive
> data before you drop the connection.
>
> Best regards,
> Andre Lorbach
> --
> Adiscon GmbH
> Mozartstr. 21
> 97950 Großrinderfeld, Germany
> Ph. +49-9349-9298530
> Geschäftsführer/President: Rainer Gerhards Reg.-Gericht Mannheim, HRB
> 560610
> Ust.-IDNr.: DE 81 22 04 622
> Web: www.adiscon.com - Mail: info@adiscon.com
>
> Informations regarding your data privacy policy can be found here:
> https://www.adiscon.com/data-privacy-policy/
>
> This e-mail may contain confidential and/or privileged information. If you
> are not the intended recipient or have received this e-mail in error
> please notify the sender immediately and delete this e-mail. Any
> unauthorized copying, disclosure or distribution of the material in this
> e-mail is strictly forbidden.
>
> Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
> Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
> irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
> vernichten Sie diese E-Mail. Das unerlaubte Kopieren und die unbefugte
> Weitergabe dieser E-Mail sind nicht gestattet.
>
>
>
> > -----Ursprüngliche Nachricht-----
> > Von: rsyslog <rsyslog-bounces@lists.adiscon.com> Im Auftrag von Adam
> > Chalkley via rsyslog
> > Gesendet: Mittwoch, 19. August 2020 18:38
> > An: rsyslog-users <rsyslog@lists.adiscon.com>
> > Cc: Adam Chalkley <atc0005@auburn.edu>
> > Betreff: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04,
> connections
> > from clients failing with a high number of CLOSE_WAIT connections on
> > receiver
> >
> > Hi,
> >
> > We upgraded the OS on our central receiver yesterday from Ubuntu 16.04
> > (4.4 kernel) to 18.04 (4.15 kernel).
> >
> > We are using the upstream PPA, so running 8.2006.0 on receivers and
> > endpoints.
> >
> > When we started getting reports from our Nagios instance that the
> rsyslog
> > forward queues endpoints were beginning to fill we checked our receiver
> > (sawmill1) and saw 94 open TCP connections with 40 of them in CLOSE_WAIT
> > from our Nagios server, most of them I suspect from the TCP port
> connection
> > test performed every 5 minutes.
> >
> > Log samples from the receiver system (which are related to port probes
> from
> > our Nagios instance):
> >
> > 2020-08-19T10:05:01.279416-05:00 lincoln rsyslogd: -- MARK --
> > 2020-08-19T10:05:08.249358-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'server
> > closed relp session, session broken', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:05:08.249626-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'error
> > sending relp: Bad file descriptor', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:08:08.020625-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'server
> > closed relp session, session broken', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:08:08.021253-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'error
> > sending relp: Bad file descriptor', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:11:08.074712-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'server
> > closed relp session, session broken', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> >
> > Log samples from the Nagios instance:
> >
> > 2020-08-19T11:19:53.444953-05:00 nagios rsyslogd:
> > omrelp[lincoln.lib.auburn.edu:2514]: error 'error waiting on required
> session
> > state, session broken', object 'conn to srvr
> lincoln.lib.auburn.edu:2514' -
> > action may not work as intended [v8.2006.0 try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T11:19:53.445260-05:00 nagios rsyslogd:
> > omrelp[lincoln.lib.auburn.edu:2514]: error 'error opening connection to
> > remote peer', object 'conn to srvr lincoln.lib.auburn.edu:2514' -
> action may
> > not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> >
> > Is there a setting I can apply to rsyslog to help resolve this?
> >
> > Is this a known bug?
> >
> > We didn't have the issue with v8.2006.0 on our receiver when it was
> running
> > Ubuntu 16.04 (the prior OS release), even though it made the same
> > complaints about the TCP port probes from Nagios.
> >
> > Thanks in advance.
> >
> > _______________________________________________
> > rsyslog mailing list
> > https://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL:
> > This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites
> beyond
> > our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
> _______________________________________________
> rsyslog mailing list
> https://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: Upgraded receiver from Ubuntu 16.04 to 18.04, main queue filling up, imrelp-related errors [ In reply to ]
Rainer,

Thanks for the reply.

I've not shared the config, but I can work on doing so. It's rather long and likely over complicated, but it would be great to have other eyes on it.

It may take some time to sanitize the contents before sharing, but when I'm able, should I share it here as attachments, inline as plaintext or via a GitHub repo? I'm leaning towards the latter if you don't have a specific preference.

Thank you for offering to take a look at the configuration!

-----Original Message-----
From: rsyslog <rsyslog-bounces@lists.adiscon.com> On Behalf Of Rainer Gerhards via rsyslog
Sent: Wednesday, September 2, 2020 3:14 AM
To: rsyslog-users <rsyslog@lists.adiscon.com>
Cc: Rainer Gerhards <rgerhards@hq.adiscon.com>
Subject: Re: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, main queue filling up, imrelp-related errors

I don't see the rest of this thread, can you (re?) post you config?

Rainer

El mié., 2 sept. 2020 a las 3:13, Adam Chalkley via rsyslog
(<rsyslog@lists.adiscon.com>) escribió:
>
> Unfortunately the system is still having issues.
>
> I enabled debug logging earlier, copied the debug log aside and *then* disabled debug-on-demand logging (last time I forgot that not copying the file elsewhere first complete wipes the log file).
>
> I have 18 MB worth of captured details for a short timeframe.
>
> Out of that log file I'm seeing a lot of these entries:
>
> 7316.804057780:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
> 7316.804062774:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
> 7316.804232359:imuxsock.c : imuxsock.c: Message from UNIX socket: #3, size 221
> 7316.804258678:imuxsock.c : datetime.c: ParseTIMESTAMP3339: invalid year: 0, pszTS: 'e'
> 7316.804264073:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
> 7316.804267610:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
> 7317.803939077:imrelp.c : main Q: queue.c: EnqueueMsg advised worker start
> 7317.803994833:imrelp.c : imrelp.c: relpTcpSend: send data: 14916 rsp 6 200 OK
>
> 7317.804025407:imrelp.c : imrelp.c: relpTcpSend: sock 80, lenbuf 19, send returned -1 [errno 32]
> 7317.804038885:imrelp.c : imrelp.c: librelp: generic error: ecode 10014, emsg 'error sending relp: Broken pipe'
> 7317.804049421:imrelp.c : errmsg.c: Called LogMsg, msg: imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended
> 7317.804054526:imrelp.c : operatingstate.c: osf: MSG imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended: signaling new internal message via SIGTTOU: 'imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]'
> 7317.804097763:main thread : janitor.c: janitorRun() called
>
> The pattern seems to be that a remote target is interrupted and then the main queue (main Q) starts filling up. All omrelp actions have queues attached and none of them ever filled (going by impstats log file which was regularly receiving updates).
>
> Going by the local log file on the primary receiver it noted these details:
>
> 2020-09-01T12:27:33.306418-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 1 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> 2020-09-01T12:27:33.429855-05:00 woodchuck1 rsyslogd: omfwd: TCPSendBuf error -2027, destruct TCP Connection to graylog1.example.com:514 [v8.2006.0 try https://www.rsyslog.com/e/2027 ]
> 2020-09-01T12:27:33.469576-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
> 2020-09-01T12:27:33.470212-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' resumed (module 'builtin:omfwd') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
> 2020-09-01T12:33:02.644015-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T12:53:02.711834-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:13:02.802190-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:33:02.856874-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:45:10.206220-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 2 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> 2020-09-01T13:45:10.222580-05:00 woodchuck1 rsyslogd: librelp error 10008 forwarding to server woodchuck2.example.com:2514 - suspending [v8.2006.0 try https://www.rsyslog.com/e/2291 ]
> 2020-09-01T13:45:10.222794-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' suspended (module 'omrelp'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
> 2020-09-01T13:45:11.232309-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' resumed (module 'omrelp') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
> According to the impstats log, the main queue started filling up around 11:21 am, about the time that several of our mail relays were rebooting from maintenance.
>
> This is an Ubuntu 18.04 box that was in-place upgraded from Ubuntu 16.04.
>
> Not sure if it is relevant, but the existing rsyslog packages were not swapped out with replacement 18.04-based packages as part of the upgrade as expected.
>
> dpkg -l | grep rsyslog
> ii rsyslog 8.2006.0-0adiscon2xenial1 amd64 a rocket-fast system for log processing
> ii rsyslog-mmjsonparse 8.2006.0-0adiscon2xenial1 amd64 Parsing/handling of CEE/Lumberjack JSON messages in rsyslog
> ii rsyslog-mmnormalize 8.2006.0-0adiscon2xenial1 amd64 The rsyslog-mmnormalize package provides log normalization
> ii rsyslog-mmrm1stspace 8.2006.0-0adiscon2xenial1 amd64 The mmrm1stspace module permits to strip the leading space from
> ii rsyslog-relp 8.2006.0-0adiscon2xenial1 amd64 RELP protocol support for rsyslog
>
> I was planning to wait for the next rsyslog packages to be released for the Ubuntu PPA to see if that was enough to trigger a switchover to "bionic" based packages. I suspect this is completely unrelated to the problem experienced, but wanted to make sure and note it here.
>
> Recent changes (not *exactly* flailing around, but not very confident) from a few days ago (which didn't help today):
>
> * Modified forward queues (attached to actions within a ruleset)
> ** increase queue size from 10K to 100K
> ** increase worker threads from 1 to 4
> ** disable explicit high water mark setting, allow default setting to apply
> ** disable explicit low water mark setting, allow default setting to apply
>
> * Modified 'main Q' (as impstats lists it)
> ** adjust worker threads from 1 to 4
> ** adjust queue size from 10K to 500K
>
> Changes made tonight after restarting rsyslog:
>
> * Modified ruleset used to write out messages to local log files on the receiver
> ** increase queue size to 250K
> ** increase worker threads to 4
>
> * Disable lightDelayMark on main queue (as discussed on this thread)
>
>
> I'm not sure what's going on. This box was quite stable up until we upgraded the OS from Ubuntu 16.04 to 18.04.
>
> Any ideas/suggestions are welcome.
>
> Thanks.
>
> -----Original Message-----
> From: Adam Chalkley
> Sent: Friday, August 28, 2020 4:15 PM
> To: rsyslog-users <rsyslog@lists.adiscon.com>
> Cc: alorbach@adiscon.com
> Subject: RE: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
>
> Hi Andre,
>
> Thank you for the additional feedback.
>
> As you suggested, the problem is likely tied back to the TCP probe. We've not had it enabled since last Sunday and rsyslog has been running fine without making the changes we previously discussed (I'm still interested in making them, I've just been pulled in other directions).
>
> Do you happen to know of a safe way to check that the port is open remotely without triggering a failure from rsyslog's perspective? I'm guessing that a minimal RELP-compatible client would be the best approach. Is there such a tool that you're aware of that could be called periodically to confirm that a rsyslog receiver (RELP-enabled port) is functioning properly?
>
> Just thought I would ask.
>
> Thanks!
>
> -----Original Message-----
> From: Andre Lorbach <alorbach@adiscon.com>
> Sent: Monday, August 24, 2020 4:06 AM
> To: rsyslog-users <rsyslog@lists.adiscon.com>
> Cc: Adam Chalkley <atc0005@auburn.edu>
> Subject: AW: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
>
> I think those errors were there all the time but not reported in older
> librelp version.
> I reviewed the code and we added this error output about 2 years ago in
> librelp.
> Ubuntu 16.04 most likely is using an older librelp version, so you did not
> see the error there.
>
> The problem is caused by the TCP Probe, it may helps if you try to receive
> data before you drop the connection.
>
> Best regards,
> Andre Lorbach
> --
> Adiscon GmbH
> Mozartstr. 21
> 97950 Großrinderfeld, Germany
> Ph. +49-9349-9298530
> Geschäftsführer/President: Rainer Gerhards Reg.-Gericht Mannheim, HRB
> 560610
> Ust.-IDNr.: DE 81 22 04 622
> Web: www.adiscon.com - Mail: info@adiscon.com
>
> Informations regarding your data privacy policy can be found here:
> https://www.adiscon.com/data-privacy-policy/
>
> This e-mail may contain confidential and/or privileged information. If you
> are not the intended recipient or have received this e-mail in error
> please notify the sender immediately and delete this e-mail. Any
> unauthorized copying, disclosure or distribution of the material in this
> e-mail is strictly forbidden.
>
> Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
> Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
> irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
> vernichten Sie diese E-Mail. Das unerlaubte Kopieren und die unbefugte
> Weitergabe dieser E-Mail sind nicht gestattet.
>
>
>
> > -----Ursprüngliche Nachricht-----
> > Von: rsyslog <rsyslog-bounces@lists.adiscon.com> Im Auftrag von Adam
> > Chalkley via rsyslog
> > Gesendet: Mittwoch, 19. August 2020 18:38
> > An: rsyslog-users <rsyslog@lists.adiscon.com>
> > Cc: Adam Chalkley <atc0005@auburn.edu>
> > Betreff: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04,
> connections
> > from clients failing with a high number of CLOSE_WAIT connections on
> > receiver
> >
> > Hi,
> >
> > We upgraded the OS on our central receiver yesterday from Ubuntu 16.04
> > (4.4 kernel) to 18.04 (4.15 kernel).
> >
> > We are using the upstream PPA, so running 8.2006.0 on receivers and
> > endpoints.
> >
> > When we started getting reports from our Nagios instance that the
> rsyslog
> > forward queues endpoints were beginning to fill we checked our receiver
> > (sawmill1) and saw 94 open TCP connections with 40 of them in CLOSE_WAIT
> > from our Nagios server, most of them I suspect from the TCP port
> connection
> > test performed every 5 minutes.
> >
> > Log samples from the receiver system (which are related to port probes
> from
> > our Nagios instance):
> >
> > 2020-08-19T10:05:01.279416-05:00 lincoln rsyslogd: -- MARK --
> > 2020-08-19T10:05:08.249358-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'server
> > closed relp session, session broken', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:05:08.249626-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'error
> > sending relp: Bad file descriptor', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:08:08.020625-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'server
> > closed relp session, session broken', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:08:08.021253-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'error
> > sending relp: Bad file descriptor', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:11:08.074712-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'server
> > closed relp session, session broken', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> >
> > Log samples from the Nagios instance:
> >
> > 2020-08-19T11:19:53.444953-05:00 nagios rsyslogd:
> > omrelp[lincoln.lib.auburn.edu:2514]: error 'error waiting on required
> session
> > state, session broken', object 'conn to srvr
> lincoln.lib.auburn.edu:2514' -
> > action may not work as intended [v8.2006.0 try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T11:19:53.445260-05:00 nagios rsyslogd:
> > omrelp[lincoln.lib.auburn.edu:2514]: error 'error opening connection to
> > remote peer', object 'conn to srvr lincoln.lib.auburn.edu:2514' -
> action may
> > not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> >
> > Is there a setting I can apply to rsyslog to help resolve this?
> >
> > Is this a known bug?
> >
> > We didn't have the issue with v8.2006.0 on our receiver when it was
> running
> > Ubuntu 16.04 (the prior OS release), even though it made the same
> > complaints about the TCP port probes from Nagios.
> >
> > Thanks in advance.
> >
> > _______________________________________________
> > rsyslog mailing list
> > https://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL:
> > This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites
> beyond
> > our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
> _______________________________________________
> rsyslog mailing list
> https://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: Upgraded receiver from Ubuntu 16.04 to 18.04, main queue filling up, imrelp-related errors [ In reply to ]
all is fine - maybe you start with a snippet of just some of the
affected actions.

Rainer

El jue., 3 sept. 2020 a las 15:13, Adam Chalkley
(<atc0005@auburn.edu>) escribió:
>
> Rainer,
>
> Thanks for the reply.
>
> I've not shared the config, but I can work on doing so. It's rather long and likely over complicated, but it would be great to have other eyes on it.
>
> It may take some time to sanitize the contents before sharing, but when I'm able, should I share it here as attachments, inline as plaintext or via a GitHub repo? I'm leaning towards the latter if you don't have a specific preference.
>
> Thank you for offering to take a look at the configuration!
>
> -----Original Message-----
> From: rsyslog <rsyslog-bounces@lists.adiscon.com> On Behalf Of Rainer Gerhards via rsyslog
> Sent: Wednesday, September 2, 2020 3:14 AM
> To: rsyslog-users <rsyslog@lists.adiscon.com>
> Cc: Rainer Gerhards <rgerhards@hq.adiscon.com>
> Subject: Re: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, main queue filling up, imrelp-related errors
>
> I don't see the rest of this thread, can you (re?) post you config?
>
> Rainer
>
> El mié., 2 sept. 2020 a las 3:13, Adam Chalkley via rsyslog
> (<rsyslog@lists.adiscon.com>) escribió:
> >
> > Unfortunately the system is still having issues.
> >
> > I enabled debug logging earlier, copied the debug log aside and *then* disabled debug-on-demand logging (last time I forgot that not copying the file elsewhere first complete wipes the log file).
> >
> > I have 18 MB worth of captured details for a short timeframe.
> >
> > Out of that log file I'm seeing a lot of these entries:
> >
> > 7316.804057780:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
> > 7316.804062774:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
> > 7316.804232359:imuxsock.c : imuxsock.c: Message from UNIX socket: #3, size 221
> > 7316.804258678:imuxsock.c : datetime.c: ParseTIMESTAMP3339: invalid year: 0, pszTS: 'e'
> > 7316.804264073:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
> > 7316.804267610:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
> > 7317.803939077:imrelp.c : main Q: queue.c: EnqueueMsg advised worker start
> > 7317.803994833:imrelp.c : imrelp.c: relpTcpSend: send data: 14916 rsp 6 200 OK
> >
> > 7317.804025407:imrelp.c : imrelp.c: relpTcpSend: sock 80, lenbuf 19, send returned -1 [errno 32]
> > 7317.804038885:imrelp.c : imrelp.c: librelp: generic error: ecode 10014, emsg 'error sending relp: Broken pipe'
> > 7317.804049421:imrelp.c : errmsg.c: Called LogMsg, msg: imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended
> > 7317.804054526:imrelp.c : operatingstate.c: osf: MSG imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended: signaling new internal message via SIGTTOU: 'imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]'
> > 7317.804097763:main thread : janitor.c: janitorRun() called
> >
> > The pattern seems to be that a remote target is interrupted and then the main queue (main Q) starts filling up. All omrelp actions have queues attached and none of them ever filled (going by impstats log file which was regularly receiving updates).
> >
> > Going by the local log file on the primary receiver it noted these details:
> >
> > 2020-09-01T12:27:33.306418-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 1 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> > 2020-09-01T12:27:33.429855-05:00 woodchuck1 rsyslogd: omfwd: TCPSendBuf error -2027, destruct TCP Connection to graylog1.example.com:514 [v8.2006.0 try https://www.rsyslog.com/e/2027 ]
> > 2020-09-01T12:27:33.469576-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
> > 2020-09-01T12:27:33.470212-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' resumed (module 'builtin:omfwd') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
> > 2020-09-01T12:33:02.644015-05:00 woodchuck1 rsyslogd: -- MARK --
> > 2020-09-01T12:53:02.711834-05:00 woodchuck1 rsyslogd: -- MARK --
> > 2020-09-01T13:13:02.802190-05:00 woodchuck1 rsyslogd: -- MARK --
> > 2020-09-01T13:33:02.856874-05:00 woodchuck1 rsyslogd: -- MARK --
> > 2020-09-01T13:45:10.206220-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 2 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> > 2020-09-01T13:45:10.222580-05:00 woodchuck1 rsyslogd: librelp error 10008 forwarding to server woodchuck2.example.com:2514 - suspending [v8.2006.0 try https://www.rsyslog.com/e/2291 ]
> > 2020-09-01T13:45:10.222794-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' suspended (module 'omrelp'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
> > 2020-09-01T13:45:11.232309-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' resumed (module 'omrelp') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
> > According to the impstats log, the main queue started filling up around 11:21 am, about the time that several of our mail relays were rebooting from maintenance.
> >
> > This is an Ubuntu 18.04 box that was in-place upgraded from Ubuntu 16.04.
> >
> > Not sure if it is relevant, but the existing rsyslog packages were not swapped out with replacement 18.04-based packages as part of the upgrade as expected.
> >
> > dpkg -l | grep rsyslog
> > ii rsyslog 8.2006.0-0adiscon2xenial1 amd64 a rocket-fast system for log processing
> > ii rsyslog-mmjsonparse 8.2006.0-0adiscon2xenial1 amd64 Parsing/handling of CEE/Lumberjack JSON messages in rsyslog
> > ii rsyslog-mmnormalize 8.2006.0-0adiscon2xenial1 amd64 The rsyslog-mmnormalize package provides log normalization
> > ii rsyslog-mmrm1stspace 8.2006.0-0adiscon2xenial1 amd64 The mmrm1stspace module permits to strip the leading space from
> > ii rsyslog-relp 8.2006.0-0adiscon2xenial1 amd64 RELP protocol support for rsyslog
> >
> > I was planning to wait for the next rsyslog packages to be released for the Ubuntu PPA to see if that was enough to trigger a switchover to "bionic" based packages. I suspect this is completely unrelated to the problem experienced, but wanted to make sure and note it here.
> >
> > Recent changes (not *exactly* flailing around, but not very confident) from a few days ago (which didn't help today):
> >
> > * Modified forward queues (attached to actions within a ruleset)
> > ** increase queue size from 10K to 100K
> > ** increase worker threads from 1 to 4
> > ** disable explicit high water mark setting, allow default setting to apply
> > ** disable explicit low water mark setting, allow default setting to apply
> >
> > * Modified 'main Q' (as impstats lists it)
> > ** adjust worker threads from 1 to 4
> > ** adjust queue size from 10K to 500K
> >
> > Changes made tonight after restarting rsyslog:
> >
> > * Modified ruleset used to write out messages to local log files on the receiver
> > ** increase queue size to 250K
> > ** increase worker threads to 4
> >
> > * Disable lightDelayMark on main queue (as discussed on this thread)
> >
> >
> > I'm not sure what's going on. This box was quite stable up until we upgraded the OS from Ubuntu 16.04 to 18.04.
> >
> > Any ideas/suggestions are welcome.
> >
> > Thanks.
> >
> > -----Original Message-----
> > From: Adam Chalkley
> > Sent: Friday, August 28, 2020 4:15 PM
> > To: rsyslog-users <rsyslog@lists.adiscon.com>
> > Cc: alorbach@adiscon.com
> > Subject: RE: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
> >
> > Hi Andre,
> >
> > Thank you for the additional feedback.
> >
> > As you suggested, the problem is likely tied back to the TCP probe. We've not had it enabled since last Sunday and rsyslog has been running fine without making the changes we previously discussed (I'm still interested in making them, I've just been pulled in other directions).
> >
> > Do you happen to know of a safe way to check that the port is open remotely without triggering a failure from rsyslog's perspective? I'm guessing that a minimal RELP-compatible client would be the best approach. Is there such a tool that you're aware of that could be called periodically to confirm that a rsyslog receiver (RELP-enabled port) is functioning properly?
> >
> > Just thought I would ask.
> >
> > Thanks!
> >
> > -----Original Message-----
> > From: Andre Lorbach <alorbach@adiscon.com>
> > Sent: Monday, August 24, 2020 4:06 AM
> > To: rsyslog-users <rsyslog@lists.adiscon.com>
> > Cc: Adam Chalkley <atc0005@auburn.edu>
> > Subject: AW: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
> >
> > I think those errors were there all the time but not reported in older
> > librelp version.
> > I reviewed the code and we added this error output about 2 years ago in
> > librelp.
> > Ubuntu 16.04 most likely is using an older librelp version, so you did not
> > see the error there.
> >
> > The problem is caused by the TCP Probe, it may helps if you try to receive
> > data before you drop the connection.
> >
> > Best regards,
> > Andre Lorbach
> > --
> > Adiscon GmbH
> > Mozartstr. 21
> > 97950 Großrinderfeld, Germany
> > Ph. +49-9349-9298530
> > Geschäftsführer/President: Rainer Gerhards Reg.-Gericht Mannheim, HRB
> > 560610
> > Ust.-IDNr.: DE 81 22 04 622
> > Web: www.adiscon.com - Mail: info@adiscon.com
> >
> > Informations regarding your data privacy policy can be found here:
> > https://www.adiscon.com/data-privacy-policy/
> >
> > This e-mail may contain confidential and/or privileged information. If you
> > are not the intended recipient or have received this e-mail in error
> > please notify the sender immediately and delete this e-mail. Any
> > unauthorized copying, disclosure or distribution of the material in this
> > e-mail is strictly forbidden.
> >
> > Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
> > Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
> > irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
> > vernichten Sie diese E-Mail. Das unerlaubte Kopieren und die unbefugte
> > Weitergabe dieser E-Mail sind nicht gestattet.
> >
> >
> >
> > > -----Ursprüngliche Nachricht-----
> > > Von: rsyslog <rsyslog-bounces@lists.adiscon.com> Im Auftrag von Adam
> > > Chalkley via rsyslog
> > > Gesendet: Mittwoch, 19. August 2020 18:38
> > > An: rsyslog-users <rsyslog@lists.adiscon.com>
> > > Cc: Adam Chalkley <atc0005@auburn.edu>
> > > Betreff: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04,
> > connections
> > > from clients failing with a high number of CLOSE_WAIT connections on
> > > receiver
> > >
> > > Hi,
> > >
> > > We upgraded the OS on our central receiver yesterday from Ubuntu 16.04
> > > (4.4 kernel) to 18.04 (4.15 kernel).
> > >
> > > We are using the upstream PPA, so running 8.2006.0 on receivers and
> > > endpoints.
> > >
> > > When we started getting reports from our Nagios instance that the
> > rsyslog
> > > forward queues endpoints were beginning to fill we checked our receiver
> > > (sawmill1) and saw 94 open TCP connections with 40 of them in CLOSE_WAIT
> > > from our Nagios server, most of them I suspect from the TCP port
> > connection
> > > test performed every 5 minutes.
> > >
> > > Log samples from the receiver system (which are related to port probes
> > from
> > > our Nagios instance):
> > >
> > > 2020-08-19T10:05:01.279416-05:00 lincoln rsyslogd: -- MARK --
> > > 2020-08-19T10:05:08.249358-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'server
> > > closed relp session, session broken', object 'lstn 2514: conn to clt
> > > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try
> > > https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T10:05:08.249626-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'error
> > > sending relp: Bad file descriptor', object 'lstn 2514: conn to clt
> > > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try
> > > https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T10:08:08.020625-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'server
> > > closed relp session, session broken', object 'lstn 2514: conn to clt
> > > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try
> > > https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T10:08:08.021253-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'error
> > > sending relp: Bad file descriptor', object 'lstn 2514: conn to clt
> > > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try
> > > https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T10:11:08.074712-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'server
> > > closed relp session, session broken', object 'lstn 2514: conn to clt
> > > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try
> > > https://www.rsyslog.com/e/2353 ]
> > >
> > > Log samples from the Nagios instance:
> > >
> > > 2020-08-19T11:19:53.444953-05:00 nagios rsyslogd:
> > > omrelp[lincoln.lib.auburn.edu:2514]: error 'error waiting on required
> > session
> > > state, session broken', object 'conn to srvr
> > lincoln.lib.auburn.edu:2514' -
> > > action may not work as intended [v8.2006.0 try
> > > https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T11:19:53.445260-05:00 nagios rsyslogd:
> > > omrelp[lincoln.lib.auburn.edu:2514]: error 'error opening connection to
> > > remote peer', object 'conn to srvr lincoln.lib.auburn.edu:2514' -
> > action may
> > > not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > >
> > > Is there a setting I can apply to rsyslog to help resolve this?
> > >
> > > Is this a known bug?
> > >
> > > We didn't have the issue with v8.2006.0 on our receiver when it was
> > running
> > > Ubuntu 16.04 (the prior OS release), even though it made the same
> > > complaints about the TCP port probes from Nagios.
> > >
> > > Thanks in advance.
> > >
> > > _______________________________________________
> > > rsyslog mailing list
> > > https://lists.adiscon.net/mailman/listinfo/rsyslog
> > > http://www.rsyslog.com/professional-services/
> > > What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL:
> > > This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites
> > beyond
> > > our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
> > _______________________________________________
> > rsyslog mailing list
> > https://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards
> > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
> _______________________________________________
> rsyslog mailing list
> https://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: Upgraded receiver from Ubuntu 16.04 to 18.04, main queue filling up, imrelp-related errors [ In reply to ]
Follow-up:

Of all things, I found that ommail is configured on our primary receiver without a queue attached and this indirectly led to the problems we were encountering after the Ubuntu 16.04 to 18.04 OS upgrade. The local Postfix daemon was setup as a null-client/forwarding instance, relying on our relays to handle all cleanup and final forwarding of notifications. The Postfix version used by 18.04 introduced new requirements that our old config didn't have and we didn't initially catch this. I happened to stumble upon it after checking what feels like nearly everything else.

As I restarted Postfix to apply the updated config, rsyslog logged this:

2020-09-16T17:18:58.104710-05:00 woodchuck1 rsyslogd: action 'email-alert-test' suspended (module 'ommail'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2008.0 try https://www.rsyslog.com/e/2007 ]
2020-09-16T17:18:58.104895-05:00 woodchuck1 rsyslogd: action 'email-alert-test' suspended (module 'ommail'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2008.0 try https://www.rsyslog.com/e/2007 ]
2020-09-16T17:18:58.104968-05:00 woodchuck1 rsyslogd: action 'email-alert-test' suspended (module 'ommail'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2008.0 try https://www.rsyslog.com/e/2007 ]


Immediately after the queue associated with a ruleset (that in turn called the ruleset that action was part of) immediately cleared.

Unfortunately the impstats log file didn't help much, perhaps just due to ignorance of what it is trying to tell me.

Wed Sep 16 16:25:35 2020: email-alert-test: origin=core.action processed=3 failed=0 suspended=0 suspended.duration=0 resumed=0

After I send this I will go back and add a queue to each ommail action in the hope that it will help surface any future issues related to that module.

-----Original Message-----
From: rsyslog <rsyslog-bounces@lists.adiscon.com> On Behalf Of Adam Chalkley via rsyslog
Sent: Thursday, September 3, 2020 8:14 AM
To: Rainer Gerhards <rgerhards@hq.adiscon.com>
Cc: Adam Chalkley <atc0005@auburn.edu>; rsyslog-users <rsyslog@lists.adiscon.com>
Subject: Re: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, main queue filling up, imrelp-related errors

Rainer,

Thanks for the reply.

I've not shared the config, but I can work on doing so. It's rather long and likely over complicated, but it would be great to have other eyes on it.

It may take some time to sanitize the contents before sharing, but when I'm able, should I share it here as attachments, inline as plaintext or via a GitHub repo? I'm leaning towards the latter if you don't have a specific preference.

Thank you for offering to take a look at the configuration!

-----Original Message-----
From: rsyslog <rsyslog-bounces@lists.adiscon.com> On Behalf Of Rainer Gerhards via rsyslog
Sent: Wednesday, September 2, 2020 3:14 AM
To: rsyslog-users <rsyslog@lists.adiscon.com>
Cc: Rainer Gerhards <rgerhards@hq.adiscon.com>
Subject: Re: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, main queue filling up, imrelp-related errors

I don't see the rest of this thread, can you (re?) post you config?

Rainer

El mié., 2 sept. 2020 a las 3:13, Adam Chalkley via rsyslog
(<rsyslog@lists.adiscon.com>) escribió:
>
> Unfortunately the system is still having issues.
>
> I enabled debug logging earlier, copied the debug log aside and *then* disabled debug-on-demand logging (last time I forgot that not copying the file elsewhere first complete wipes the log file).
>
> I have 18 MB worth of captured details for a short timeframe.
>
> Out of that log file I'm seeing a lot of these entries:
>
> 7316.804057780:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
> 7316.804062774:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
> 7316.804232359:imuxsock.c : imuxsock.c: Message from UNIX socket: #3, size 221
> 7316.804258678:imuxsock.c : datetime.c: ParseTIMESTAMP3339: invalid year: 0, pszTS: 'e'
> 7316.804264073:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
> 7316.804267610:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
> 7317.803939077:imrelp.c : main Q: queue.c: EnqueueMsg advised worker start
> 7317.803994833:imrelp.c : imrelp.c: relpTcpSend: send data: 14916 rsp 6 200 OK
>
> 7317.804025407:imrelp.c : imrelp.c: relpTcpSend: sock 80, lenbuf 19, send returned -1 [errno 32]
> 7317.804038885:imrelp.c : imrelp.c: librelp: generic error: ecode 10014, emsg 'error sending relp: Broken pipe'
> 7317.804049421:imrelp.c : errmsg.c: Called LogMsg, msg: imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended
> 7317.804054526:imrelp.c : operatingstate.c: osf: MSG imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended: signaling new internal message via SIGTTOU: 'imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]'
> 7317.804097763:main thread : janitor.c: janitorRun() called
>
> The pattern seems to be that a remote target is interrupted and then the main queue (main Q) starts filling up. All omrelp actions have queues attached and none of them ever filled (going by impstats log file which was regularly receiving updates).
>
> Going by the local log file on the primary receiver it noted these details:
>
> 2020-09-01T12:27:33.306418-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 1 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> 2020-09-01T12:27:33.429855-05:00 woodchuck1 rsyslogd: omfwd: TCPSendBuf error -2027, destruct TCP Connection to graylog1.example.com:514 [v8.2006.0 try https://www.rsyslog.com/e/2027 ]
> 2020-09-01T12:27:33.469576-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
> 2020-09-01T12:27:33.470212-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' resumed (module 'builtin:omfwd') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
> 2020-09-01T12:33:02.644015-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T12:53:02.711834-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:13:02.802190-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:33:02.856874-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:45:10.206220-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 2 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> 2020-09-01T13:45:10.222580-05:00 woodchuck1 rsyslogd: librelp error 10008 forwarding to server woodchuck2.example.com:2514 - suspending [v8.2006.0 try https://www.rsyslog.com/e/2291 ]
> 2020-09-01T13:45:10.222794-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' suspended (module 'omrelp'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
> 2020-09-01T13:45:11.232309-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' resumed (module 'omrelp') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
> According to the impstats log, the main queue started filling up around 11:21 am, about the time that several of our mail relays were rebooting from maintenance.
>
> This is an Ubuntu 18.04 box that was in-place upgraded from Ubuntu 16.04.
>
> Not sure if it is relevant, but the existing rsyslog packages were not swapped out with replacement 18.04-based packages as part of the upgrade as expected.
>
> dpkg -l | grep rsyslog
> ii rsyslog 8.2006.0-0adiscon2xenial1 amd64 a rocket-fast system for log processing
> ii rsyslog-mmjsonparse 8.2006.0-0adiscon2xenial1 amd64 Parsing/handling of CEE/Lumberjack JSON messages in rsyslog
> ii rsyslog-mmnormalize 8.2006.0-0adiscon2xenial1 amd64 The rsyslog-mmnormalize package provides log normalization
> ii rsyslog-mmrm1stspace 8.2006.0-0adiscon2xenial1 amd64 The mmrm1stspace module permits to strip the leading space from
> ii rsyslog-relp 8.2006.0-0adiscon2xenial1 amd64 RELP protocol support for rsyslog
>
> I was planning to wait for the next rsyslog packages to be released for the Ubuntu PPA to see if that was enough to trigger a switchover to "bionic" based packages. I suspect this is completely unrelated to the problem experienced, but wanted to make sure and note it here.
>
> Recent changes (not *exactly* flailing around, but not very confident) from a few days ago (which didn't help today):
>
> * Modified forward queues (attached to actions within a ruleset)
> ** increase queue size from 10K to 100K
> ** increase worker threads from 1 to 4
> ** disable explicit high water mark setting, allow default setting to apply
> ** disable explicit low water mark setting, allow default setting to apply
>
> * Modified 'main Q' (as impstats lists it)
> ** adjust worker threads from 1 to 4
> ** adjust queue size from 10K to 500K
>
> Changes made tonight after restarting rsyslog:
>
> * Modified ruleset used to write out messages to local log files on the receiver
> ** increase queue size to 250K
> ** increase worker threads to 4
>
> * Disable lightDelayMark on main queue (as discussed on this thread)
>
>
> I'm not sure what's going on. This box was quite stable up until we upgraded the OS from Ubuntu 16.04 to 18.04.
>
> Any ideas/suggestions are welcome.
>
> Thanks.
>
> -----Original Message-----
> From: Adam Chalkley
> Sent: Friday, August 28, 2020 4:15 PM
> To: rsyslog-users <rsyslog@lists.adiscon.com>
> Cc: alorbach@adiscon.com
> Subject: RE: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
>
> Hi Andre,
>
> Thank you for the additional feedback.
>
> As you suggested, the problem is likely tied back to the TCP probe. We've not had it enabled since last Sunday and rsyslog has been running fine without making the changes we previously discussed (I'm still interested in making them, I've just been pulled in other directions).
>
> Do you happen to know of a safe way to check that the port is open remotely without triggering a failure from rsyslog's perspective? I'm guessing that a minimal RELP-compatible client would be the best approach. Is there such a tool that you're aware of that could be called periodically to confirm that a rsyslog receiver (RELP-enabled port) is functioning properly?
>
> Just thought I would ask.
>
> Thanks!
>
> -----Original Message-----
> From: Andre Lorbach <alorbach@adiscon.com>
> Sent: Monday, August 24, 2020 4:06 AM
> To: rsyslog-users <rsyslog@lists.adiscon.com>
> Cc: Adam Chalkley <atc0005@auburn.edu>
> Subject: AW: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
>
> I think those errors were there all the time but not reported in older
> librelp version.
> I reviewed the code and we added this error output about 2 years ago in
> librelp.
> Ubuntu 16.04 most likely is using an older librelp version, so you did not
> see the error there.
>
> The problem is caused by the TCP Probe, it may helps if you try to receive
> data before you drop the connection.
>
> Best regards,
> Andre Lorbach
> --
> Adiscon GmbH
> Mozartstr. 21
> 97950 Großrinderfeld, Germany
> Ph. +49-9349-9298530
> Geschäftsführer/President: Rainer Gerhards Reg.-Gericht Mannheim, HRB
> 560610
> Ust.-IDNr.: DE 81 22 04 622
> Web: www.adiscon.com - Mail: info@adiscon.com
>
> Informations regarding your data privacy policy can be found here:
> https://www.adiscon.com/data-privacy-policy/
>
> This e-mail may contain confidential and/or privileged information. If you
> are not the intended recipient or have received this e-mail in error
> please notify the sender immediately and delete this e-mail. Any
> unauthorized copying, disclosure or distribution of the material in this
> e-mail is strictly forbidden.
>
> Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
> Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
> irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
> vernichten Sie diese E-Mail. Das unerlaubte Kopieren und die unbefugte
> Weitergabe dieser E-Mail sind nicht gestattet.
>
>
>
> > -----Ursprüngliche Nachricht-----
> > Von: rsyslog <rsyslog-bounces@lists.adiscon.com> Im Auftrag von Adam
> > Chalkley via rsyslog
> > Gesendet: Mittwoch, 19. August 2020 18:38
> > An: rsyslog-users <rsyslog@lists.adiscon.com>
> > Cc: Adam Chalkley <atc0005@auburn.edu>
> > Betreff: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04,
> connections
> > from clients failing with a high number of CLOSE_WAIT connections on
> > receiver
> >
> > Hi,
> >
> > We upgraded the OS on our central receiver yesterday from Ubuntu 16.04
> > (4.4 kernel) to 18.04 (4.15 kernel).
> >
> > We are using the upstream PPA, so running 8.2006.0 on receivers and
> > endpoints.
> >
> > When we started getting reports from our Nagios instance that the
> rsyslog
> > forward queues endpoints were beginning to fill we checked our receiver
> > (sawmill1) and saw 94 open TCP connections with 40 of them in CLOSE_WAIT
> > from our Nagios server, most of them I suspect from the TCP port
> connection
> > test performed every 5 minutes.
> >
> > Log samples from the receiver system (which are related to port probes
> from
> > our Nagios instance):
> >
> > 2020-08-19T10:05:01.279416-05:00 lincoln rsyslogd: -- MARK --
> > 2020-08-19T10:05:08.249358-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'server
> > closed relp session, session broken', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:05:08.249626-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'error
> > sending relp: Bad file descriptor', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:08:08.020625-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'server
> > closed relp session, session broken', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:08:08.021253-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'error
> > sending relp: Bad file descriptor', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:11:08.074712-05:00 lincoln rsyslogd: imrelp[2514]: error
> 'server
> > closed relp session, session broken', object 'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> try
> > https://www.rsyslog.com/e/2353 ]
> >
> > Log samples from the Nagios instance:
> >
> > 2020-08-19T11:19:53.444953-05:00 nagios rsyslogd:
> > omrelp[lincoln.lib.auburn.edu:2514]: error 'error waiting on required
> session
> > state, session broken', object 'conn to srvr
> lincoln.lib.auburn.edu:2514' -
> > action may not work as intended [v8.2006.0 try
> > https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T11:19:53.445260-05:00 nagios rsyslogd:
> > omrelp[lincoln.lib.auburn.edu:2514]: error 'error opening connection to
> > remote peer', object 'conn to srvr lincoln.lib.auburn.edu:2514' -
> action may
> > not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> >
> > Is there a setting I can apply to rsyslog to help resolve this?
> >
> > Is this a known bug?
> >
> > We didn't have the issue with v8.2006.0 on our receiver when it was
> running
> > Ubuntu 16.04 (the prior OS release), even though it made the same
> > complaints about the TCP port probes from Nagios.
> >
> > Thanks in advance.
> >
> > _______________________________________________
> > rsyslog mailing list
> > https://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL:
> > This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites
> beyond
> > our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
> _______________________________________________
> rsyslog mailing list
> https://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.