Hello Rainer,
Could you check if this patch is correct and suitable for submission?
I came across an annoying bug using ommysql but I think it would be
happening to any action with a queue that returns RS_RET_DATAFAIL -- or
maybe only for transactional actions?
# rsyslog-fix-infinite-loop-on-action-datafail.patch
I believe this commit has introduced the issue:
59627f23bee26f3acec19d491d5884bcd1fb672e
core: fix message loss on target unavailibility during shutdown
closes https://github.com/rsyslog/rsyslog/issues/2760
Elements of configuration to reproduce the bug are attached (not the
easiest because it is using ommysql to demonstrate):
# rsyslog.conf
# rsyslog-db.sql
To trigger the RS_RET_DATAFAIL, I do an INSERT with a value that do not
fit the column size (it returns mysql error code 1406).
echo "<1>test: hop $((++i))" > /dev/udp/127.0.0.1/5514
With MariaDB strict mode it produces:
rsyslogd: ommysql: db error (1406): Data too long for column 'service'
at row 1 [v8.2104.0]
In ommysql this returns RS_RET_DATAFAIL.
rsyslogd: action 'out-sql-test' (module 'ommysql') message lost, could
not be processed. Check for additional error messages before this one.
[v8.2104.0 try https://www.rsyslog.com/e/2218 ]
Now generate a second RS_RET_DATAFAIL.
echo "<1>test: hop $((++i))" > /dev/udp/127.0.0.1/5514
Function processMsgMain() called from processBatchMain() returns
RS_RET_DATAFAIL and then starts an infinite loop, the message is never
removed from the queue.
First time:
4820.707506050:nagiosevents-comment queue:Reg/w0: queue.c:
DeleteProcessedBatch: etry 0 state 3
4820.707525035:nagiosevents-comment queue:Reg/w0: queue.c:
DeleteProcessedBatch: we deleted 1 objects and enqueued 0 objects
4820.707538508:nagiosevents-comment queue:Reg/w0: queue.c: rger:
deleteBatchFromQStore, nElem 1
4820.707552043:nagiosevents-comment queue:Reg/w0: queue.c:
doDeleteBatch: delete batch from store, new sizes: log 0, phys 0
Second time, loop, message is always re-enqueued:
4829.519359069:nagiosevents-comment queue:Reg/w0: queue.c:
DeleteProcessedBatch: etry 0 state 0
4829.519382642:nagiosevents-comment queue:Reg/w0: queue.c:
DeleteProcessedBatch: we deleted 0 objects and enqueued 1 objects
4829.519393587:nagiosevents-comment queue:Reg/w0: queue.c: rger:
deleteBatchFromQStore, nElem 1
4829.519411018:nagiosevents-comment queue:Reg/w0: queue.c:
doDeleteBatch: delete batch from store, new sizes: log 1, phys 1
Kind regards,
Julien
Could you check if this patch is correct and suitable for submission?
I came across an annoying bug using ommysql but I think it would be
happening to any action with a queue that returns RS_RET_DATAFAIL -- or
maybe only for transactional actions?
# rsyslog-fix-infinite-loop-on-action-datafail.patch
I believe this commit has introduced the issue:
59627f23bee26f3acec19d491d5884bcd1fb672e
core: fix message loss on target unavailibility during shutdown
closes https://github.com/rsyslog/rsyslog/issues/2760
Elements of configuration to reproduce the bug are attached (not the
easiest because it is using ommysql to demonstrate):
# rsyslog.conf
# rsyslog-db.sql
To trigger the RS_RET_DATAFAIL, I do an INSERT with a value that do not
fit the column size (it returns mysql error code 1406).
echo "<1>test: hop $((++i))" > /dev/udp/127.0.0.1/5514
With MariaDB strict mode it produces:
rsyslogd: ommysql: db error (1406): Data too long for column 'service'
at row 1 [v8.2104.0]
In ommysql this returns RS_RET_DATAFAIL.
rsyslogd: action 'out-sql-test' (module 'ommysql') message lost, could
not be processed. Check for additional error messages before this one.
[v8.2104.0 try https://www.rsyslog.com/e/2218 ]
Now generate a second RS_RET_DATAFAIL.
echo "<1>test: hop $((++i))" > /dev/udp/127.0.0.1/5514
Function processMsgMain() called from processBatchMain() returns
RS_RET_DATAFAIL and then starts an infinite loop, the message is never
removed from the queue.
First time:
4820.707506050:nagiosevents-comment queue:Reg/w0: queue.c:
DeleteProcessedBatch: etry 0 state 3
4820.707525035:nagiosevents-comment queue:Reg/w0: queue.c:
DeleteProcessedBatch: we deleted 1 objects and enqueued 0 objects
4820.707538508:nagiosevents-comment queue:Reg/w0: queue.c: rger:
deleteBatchFromQStore, nElem 1
4820.707552043:nagiosevents-comment queue:Reg/w0: queue.c:
doDeleteBatch: delete batch from store, new sizes: log 0, phys 0
Second time, loop, message is always re-enqueued:
4829.519359069:nagiosevents-comment queue:Reg/w0: queue.c:
DeleteProcessedBatch: etry 0 state 0
4829.519382642:nagiosevents-comment queue:Reg/w0: queue.c:
DeleteProcessedBatch: we deleted 0 objects and enqueued 1 objects
4829.519393587:nagiosevents-comment queue:Reg/w0: queue.c: rger:
deleteBatchFromQStore, nElem 1
4829.519411018:nagiosevents-comment queue:Reg/w0: queue.c:
doDeleteBatch: delete batch from store, new sizes: log 1, phys 1
Kind regards,
Julien