Mailing List Archive

imfile readTimeout behavior question
Hello rsyslog community,

I am working on a Centos 7 system that has rsyslog 8.2202.0 configured to monitor and process perhaps 2 dozen log files with the imfile module.
The module is loaded with the module directive like this:

module(load="imfile" mode="inotify")

And the file references are set up using several config files in the /etc/rsyslog.d directory containing entries similar to the following:

input(type="imfile"
File="/var/log/activity*.log"
Severity="info"
Tag="pfx:activity"
readTimeout="2"
reopenOnTruncate="on"
startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
)

Some of the referenced files normally receive one or more entries every second.
Other referenced files grow very slowly, receiving maybe one or two entries a day.

We must be able to handle that some subset of the entries will be multi-line entries.
Every entry begins with a date stamp. We use the startmsg.regex capability to detect the date stamp at the start of a line so that imfile can correctly detect the start of a new record.
We expect that if no new line is added to any specific file within 2 seconds, then any existing line(s) of unprocessed data in that file can be considered a complete record, so we have configured the "readTimeout" to be "2".

What we expect is that after 2 seconds of no additional content being added to a file, the readTimeout timer will expire and the last line of that file will be considered a complete entry and be processed.
What we are actually seeing is that as long as ANY monitored file has new data added to it within 2 seconds, then no timer expires.

Because some files receive data every second, the last line in each slow growing files is not processed until hours, maybe days after it is written, when another entry is added and the date stamp on the new line matches the startmsg.regex.

When the software on the host is running normally, the continuous stream of log records being added to the frequently updated files prevents the readTimeout from ever occurring. This makes it appear as if the readTimeout never expires. I originally thought this was a recurrence of #1445 (https://github.com/rsyslog/rsyslog/issues/1445), which was fixed in 8.28 by #1598 (https://github.com/rsyslog/rsyslog/pull/1598). Our version (8.2202.0) should contain that fix (I do see it in the release note history). However, the behavior is not exactly the same as described in #1445, because I can cause the final lines to be processed by stopping the processes that are generating the frequent log records. Once there are no new lines to process in any monitored file, then the readTimeout occurs and the final lines of the slow growing files are processed.

Is this the intended behavior? Am I misunderstanding the configuration of the imfile module? Is there any way to get the timeouts to occur on a file-by-file basis instead of requiring all of the monitored files to be unchanged for the readTimeout?

I really wish I could disable the multi-line handling entirely, but as you might guess, the reason we had to configure it this way is that some of the files are generated by software that is not under our control.
I would prefer not to try to write "dummy" records into the files from another process to push out the "straggler" final lines from the slow growing files. Even though they aren't written very often, I just know that someday the external process and the software we're trying to collect the data from will both try to write at nearly the same time and the result will be a mangled mix of the two records.

Thank you for any suggestions,

Russ Miranda

_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: imfile readTimeout behavior question [ In reply to ]
I remember hearing about this sort of problem before, the version you are using
is 2 years old, please update to a current version and check again.

note that the version you are running is different from the 8.2202 that the
rsyslog project shipped, and is entirely supported by redhat as a result.

David Lang

On Tue, 6 Feb 2024, russel.miranda--- via rsyslog wrote:

> Hello rsyslog community,
>
> I am working on a Centos 7 system that has rsyslog 8.2202.0 configured to monitor and process perhaps 2 dozen log files with the imfile module.
> The module is loaded with the module directive like this:
>
> module(load="imfile" mode="inotify")
>
> And the file references are set up using several config files in the /etc/rsyslog.d directory containing entries similar to the following:
>
> input(type="imfile"
> File="/var/log/activity*.log"
> Severity="info"
> Tag="pfx:activity"
> readTimeout="2"
> reopenOnTruncate="on"
> startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
> )
>
> Some of the referenced files normally receive one or more entries every second.
> Other referenced files grow very slowly, receiving maybe one or two entries a day.
>
> We must be able to handle that some subset of the entries will be multi-line entries.
> Every entry begins with a date stamp. We use the startmsg.regex capability to detect the date stamp at the start of a line so that imfile can correctly detect the start of a new record.
> We expect that if no new line is added to any specific file within 2 seconds, then any existing line(s) of unprocessed data in that file can be considered a complete record, so we have configured the "readTimeout" to be "2".
>
> What we expect is that after 2 seconds of no additional content being added to a file, the readTimeout timer will expire and the last line of that file will be considered a complete entry and be processed.
> What we are actually seeing is that as long as ANY monitored file has new data added to it within 2 seconds, then no timer expires.
>
> Because some files receive data every second, the last line in each slow growing files is not processed until hours, maybe days after it is written, when another entry is added and the date stamp on the new line matches the startmsg.regex.
>
> When the software on the host is running normally, the continuous stream of log records being added to the frequently updated files prevents the readTimeout from ever occurring. This makes it appear as if the readTimeout never expires. I originally thought this was a recurrence of #1445 (https://github.com/rsyslog/rsyslog/issues/1445), which was fixed in 8.28 by #1598 (https://github.com/rsyslog/rsyslog/pull/1598). Our version (8.2202.0) should contain that fix (I do see it in the release note history). However, the behavior is not exactly the same as described in #1445, because I can cause the final lines to be processed by stopping the processes that are generating the frequent log records. Once there are no new lines to process in any monitored file, then the readTimeout occurs and the final lines of the slow growing files are processed.
>
> Is this the intended behavior? Am I misunderstanding the configuration of the imfile module? Is there any way to get the timeouts to occur on a file-by-file basis instead of requiring all of the monitored files to be unchanged for the readTimeout?
>
> I really wish I could disable the multi-line handling entirely, but as you might guess, the reason we had to configure it this way is that some of the files are generated by software that is not under our control.
> I would prefer not to try to write "dummy" records into the files from another process to push out the "straggler" final lines from the slow growing files. Even though they aren't written very often, I just know that someday the external process and the software we're trying to collect the data from will both try to write at nearly the same time and the result will be a mangled mix of the two records.
>
> Thank you for any suggestions,
>
> Russ Miranda
>
> _______________________________________________
> rsyslog mailing list
> https://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
>
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: imfile readTimeout behavior question [ In reply to ]
Hello again, rsyslog community!

I did not see any responses to my earlier question (2024-02-05) about imfile timeouts, so I have created an example configuration and some shell scripting that demonstrates the problem.

Here is an abbreviated version of the configuration file I created for this experiment:
=====================================================

module(load="imfile" mode="inotify")

input(type="imfile" File="/var/log/imfile-issue/input/proc00.log" Severity="info" Tag="pfx:proc00"
readTimeout="2" reopenOnTruncate="on" startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
)

input(type="imfile" File="/var/log/imfile-issue/input/proc01.log" Severity="info" Tag="pfx:proc01"
readTimeout="2" reopenOnTruncate="on" startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
)

input(type="imfile" File="/var/log/imfile-issue/input/proc02.log" Severity="info" Tag="pfx:proc02"
readTimeout="2" reopenOnTruncate="on" startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
)

=== I think you can see the pattern here, so I'm going to skip to the bottom, because there are 50 of them in total ===

input(type="imfile" File="/var/log/imfile-issue/input/proc49.log" Severity="info" Tag="pfx:proc49"
readTimeout="2"
reopenOnTruncate="on"
startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
)

input(type="imfile" File="/var/log/imfile-issue/input/slowLogger.log" Severity="info" Tag="pfx:slowLogger"
readTimeout="2"
reopenOnTruncate="on"
startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
)

# Fowarding to outlog
if ( $syslogtag startswith 'pfx:' ) then {
*.info action(type="omfile"
dirCreateMode="0755"
FileCreateMode="0644"
File="/var/log/imfile-issue/output/output.log")
stop
}

=====================================================

I then created the input and output directories:
$ sudo mkdir -p /var/log/imfile-issue/input /var/log/imfile-issue/output

Made the input directory writable so I could generate lots of sample entries (do not try this at home! I wouldn't suggest anyone leave a directory in /var/log open for writing by everyone):
$ sudo chmod o+w /var/log/imfile-issue/input

I then ran the following bash snippet to produce non-stop output into the 50 "procXX.log" files named /var/log/imfile-issue/input/proc00.log through /var/log/imfile-issue/input/proc49.log:
$ while true; do for x in $(seq 0 49); do procId=$(printf "proc%02d" "${x}"); date +"%F %T - Test record (${procId})" | tee -a "/var/log/imfile-issue/input/${procId}.log"; done; done

And finally, from another shell, I ran this this snippet to write into the "slowLogger.log" file every 15 seconds:
$ while true; do sleep 15; date +"%F %T - slow log file" | tee -a /var/log/imfile-issue/input/slowLogger.log; done

The resulting output in /var/log/imfile-issue/output/output.log looks like this:

...
Feb 23 13:19:15 carme5 pfx:proc18 2024-02-23 13:19:15 - Test record (proc18)
Feb 23 13:19:15 carme5 pfx:proc19 2024-02-23 13:19:15 - Test record (proc19)
Feb 23 13:19:15 carme5 pfx:proc20 2024-02-23 13:19:15 - Test record (proc20)
Feb 23 13:19:15 carme5 pfx:proc21 2024-02-23 13:19:15 - Test record (proc21)
...

The timestamp on the left (MON DD HH:MM:SS) is when each record was written into output.log. The timestamp in the middle is the timestamp that was in the record when it was written into the input file.
For the rapidly updated logs, the timestamps are near-identical (written into output file within a second of being generated).
However, the records written into the "slow logger", show a delay:

...
Feb 23 13:19:07 carme5 pfx:proc33 2024-02-23 13:19:06 - Test record (proc33)
Feb 23 13:19:07 carme5 pfx:slowLogger 2024-02-23 13:18:52 - slow log file
Feb 23 13:19:07 carme5 pfx:proc34 2024-02-23 13:19:06 - Test record (proc34)
...

Note that the record from the slow log file took 15 seconds to get into the output log - in fact, it was put into the output stream exactly when the next record was written into "slowLogger.log".

The readTimeout, which I expected to occur at 13:18:54 (two seconds after 13:18:52 when the record was written) never occurred.
It wasn't until the startMsg.regex matched the beginning of the next record at 13:19:07 that the previous record (from 13:18:52) was processed.

I then stopped the "slowLogger" shell loop and just wrote individual log records into "slowLogger.log" at longer intervals, to see if the timer would timeout. Here's an example of 1m4s between when a record was put into the input file, and when it hit the output log:

Feb 23 13:31:00 carme5 pfx:proc02 2024-02-23 13:30:59 - Test record (proc02)
Feb 23 13:31:00 carme5 pfx:slowLogger 2024-02-23 13:29:56 - slow log file
Feb 23 13:31:00 carme5 pfx:proc03 2024-02-23 13:30:59 - Test record (proc03)

As I was seeing in our production system, the only way to get the final log record from "slowLogger.log" to show up in the output.log was to either write another log entry into slowLogger.log, or to stop the script that was writing into the procXX.log files.
If I stopped the loop writing into the procXX.log files, the last log entry from slowLogger.txt would pop out, followed 2 seconds (readTimeout) later by the last records that were written into the procXX.log files.

(Sorry for how long this message is; my thanks if you've made it this far!)

The result of this behavior is that important log records written into slow growing log files are sometimes not showing up in our Graylog stream for hours, or even days after they've been written to the local files, because rsyslog imfile sits on the records, waiting for another record to be written, or for a timeout that will not come unless / until all our application software is shut down and writing ceases to the other log files.

My question is:

Is there any way to make that read timeout to occur on the slow log source inputs while data is being written regularly into the other log files that are being monitored?

A million thanks for any help you can provide.

Sincerely,
Russ Miranda

From: Miranda, Russel (US)
Sent: Monday, February 5, 2024 11:50 PM
To: rsyslog@lists.adiscon.com
Subject: imfile readTimeout behavior question

Hello rsyslog community,

I am working on a Centos 7 system that has rsyslog 8.2202.0 configured to monitor and process perhaps 2 dozen log files with the imfile module.
The module is loaded with the module directive like this:

module(load="imfile" mode="inotify")

And the file references are set up using several config files in the /etc/rsyslog.d directory containing entries similar to the following:

input(type="imfile"
File="/var/log/activity*.log"
Severity="info"
Tag="pfx:activity"
readTimeout="2"
reopenOnTruncate="on"
startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
)

Some of the referenced files normally receive one or more entries every second.
Other referenced files grow very slowly, receiving maybe one or two entries a day.

We must be able to handle that some subset of the entries will be multi-line entries.
Every entry begins with a date stamp. We use the startmsg.regex capability to detect the date stamp at the start of a line so that imfile can correctly detect the start of a new record.
We expect that if no new line is added to any specific file within 2 seconds, then any existing line(s) of unprocessed data in that file can be considered a complete record, so we have configured the "readTimeout" to be "2".

What we expect is that after 2 seconds of no additional content being added to a file, the readTimeout timer will expire and the last line of that file will be considered a complete entry and be processed.
What we are actually seeing is that as long as ANY monitored file has new data added to it within 2 seconds, then no timer expires.

Because some files receive data every second, the last line in each slow growing files is not processed until hours, maybe days after it is written, when another entry is added and the date stamp on the new line matches the startmsg.regex.

When the software on the host is running normally, the continuous stream of log records being added to the frequently updated files prevents the readTimeout from ever occurring. This makes it appear as if the readTimeout never expires. I originally thought this was a recurrence of #1445 (https://github.com/rsyslog/rsyslog/issues/1445), which was fixed in 8.28 by #1598 (https://github.com/rsyslog/rsyslog/pull/1598). Our version (8.2202.0) should contain that fix (I do see it in the release note history). However, the behavior is not exactly the same as described in #1445, because I can cause the final lines to be processed by stopping the processes that are generating the frequent log records. Once there are no new lines to process in any monitored file, then the readTimeout occurs and the final lines of the slow growing files are processed.

Is this the intended behavior? Am I misunderstanding the configuration of the imfile module? Is there any way to get the timeouts to occur on a file-by-file basis instead of requiring all of the monitored files to be unchanged for the readTimeout?

I really wish I could disable the multi-line handling entirely, but as you might guess, the reason we had to configure it this way is that some of the files are generated by software that is not under our control.
I would prefer not to try to write "dummy" records into the files from another process to push out the "straggler" final lines from the slow growing files. Even though they aren't written very often, I just know that someday the external process and the software we're trying to collect the data from will both try to write at nearly the same time and the result will be a mangled mix of the two records.

Thank you for any suggestions,

Russ Miranda

_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.