Mailing List Archive

Re: [rsyslog/rsyslog-doc] documentation is hard to use and badly structured (#394)
We have received complaints about rsyslog documentation repeatedly. We have
a lot of detail, but it's all written for someone already fairly familiar
with things.

Here is a 3am first pass from me at writing an overview of how rsyslog works,
with the idea that this could be made pretty with diagrams, click through links
to more specific pages with detail, etc.

I'm replying to the github issue to see if the user who complained about the
documentation and RainerScript would find this more useful, but also to
rsyslog-users to get feedback from others on this.

Some of the sections here should possibly be broken into sub-pages (some
sub-pages already exist that cover some of these and can/should be simplified),
or it may make sense to have a simple version on an overview page with the
ability to click down for the gory details.

David Lang



Rsyslog architecture is very straightforward, but in its simplicity it hides a lot of flexibility.

Rsyslog has one or more inputs that each receive one or more messages and pass the batch of messages to a ruleset.

Each input runs the incoming log through a stack of possible parser modules
until it hits one that reports success in parsing the log (pointer to parser
module documentation and the default stack)

Multiple inputs can feed to the same ruleset (by default, all inputs feed to
the Default ruleset which uses the 'main' queue) [1]

Worker threads pull batches of logs from a queue, then process the logs in the
batch using the statements in a ruleset

Conceptually, it really is that trivial. As always, looking at details makes it
seem more complicated.
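
A minimal sketch of that flow as a config, in case it helps to see it written down. The ruleset name "remote", the port and the file path are made up for illustration; only the module and parameter names are real rsyslog ones.

```
# The ruleset gets its own queue. Worker threads pull batches of messages
# from that queue and run the statements below against each message.
ruleset(name="remote" queue.type="LinkedList") {
    action(type="omfile" file="/var/log/remote.log")
}

# Load the UDP input module and bind one listener to the ruleset above.
# Without the ruleset= parameter the input would feed the default ruleset.
module(load="imudp")
input(type="imudp" port="514" ruleset="remote")
```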



Rsyslog config file(s)

Rsyslog reads in the config file and all included files and combines them before
evaluating anything (see the -o option for how files are combined). Which file a
statement is in has no impact (other than as part of the ordering of
statements).
(insert link to Rainer's recent post on mis-use of config includes??)

At startup time, Rsyslog evaluates the combined config and implements all module
loading, input definition, template definitions and other global settings.

All other statements get put into the default ruleset unless a ruleset is
specified. None of these statements are evaluated (beyond syntax checking) at
startup.

The Rsyslog team believes very strongly in maintaining backwards compatibility
(a config that works should never break or change behavior when rsyslog is
updated to a new version). As such, there are multiple ways of doing the same
thing, and some ways are no longer recommended. When you see that something
is deprecated, that means it is recommended not to use it in a new
configuration for confusion/feature reasons, not that it is scheduled to go
away/break in a new version.

The config statements that existed prior to v6 of rsyslog were an evolution of
the syslog format from the 90's, doing complex things by setting a bunch of
values that then got used by later statements. By v5 of rsyslog, this was
resulting in such complex interactions that even core developers were having
trouble understanding what complex configs did. V6 introduced RainerScript,
which deliberately requires you to specify all options rather than 'inheriting'
them from prior statements. This can be significantly more verbose as it
requires you to specify all options each time, but makes it much clearer exactly
what is happening. There are times when the old syntax is shorter and more
obvious to use than the new syntax, and in those cases, it's recommended to use
the old syntax. But if the old syntax requires multiple lines to do something,
you are probably better off using the new syntax.



Rulesets are the heart of log processing, defining what happens with each log
message. The statements in a ruleset are evaluated for every log message as it
is processed.

Rulesets and Actions can have a queue defined for them (insert link to queue
turn lane post, possibly with updates). The 'default' ruleset uses the 'main'
queue.

The contents of a ruleset are a series of statements, which can be:
1. call an action to use an output module
   1a. legacy formats:
       /var/log/messages (write to a file)
       @1.2.3.4 (send to a remote system via UDP)
       @@1.2.3.4 (send to a remote system via TCP)
   1b. action() format
2. set/clear variables (link to functions)
3. call a message modification module (can modify the log message being processed and set variables), commonly used to parse messages
4. call another ruleset and then return
5. statement block
   { statement statement }
       usually used after a filter to have the filter apply to multiple statements
   & <statement>
       apply the last filter to this statement [2]
6. stop processing this log message
   6a. ~ [2]
   6b. stop
   ignore all following statements in the ruleset. Rsyslog will warn you if you have statements after an unconditional stop
7. apply a filter
   7a. legacy syslog facility.severity filters
       e.g. mail.info /var/log/mail
   7b. rsyslog property-based filters [2]
       e.g. :msg, contains, "foo" <statement>
   7c. expression-based filters (if..then..else with continue) (link to functions and conditionals)
       e.g. if $msg contains "foo" then <statement>
8. atomic stats update (see impstats module)
9. foreach: execute a block of statements on each value in a variable array
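
To make a few of the statement types above concrete, here is a rough sketch of a ruleset that uses them together. The ruleset names, file paths and the 1.2.3.4 target are placeholders, not a recommended setup.

```
ruleset(name="archive") {
    action(type="omfile" file="/var/log/archive.log")
}

ruleset(name="example") {
    # 4: run another ruleset, then continue here
    call archive

    # 7c: expression-based filter applied to a statement block (5)
    if $msg contains "foo" then {
        set $.matched = "yes";                          # 2: set a variable
        action(type="omfile" file="/var/log/foo.log")   # 1b: action() format
        stop                                            # 6b: done with this message
    }

    # 1b: everything not stopped above is sent to a remote system via TCP
    # (the action() form of the legacy @@1.2.3.4)
    action(type="omfwd" target="1.2.3.4" port="514" protocol="tcp")
}
```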


Variable types:
built-in/legacy properties start with $ or $$ (link to property page)
user-modifiable variables exist as a tree, internally represented as a JSON structure. There are three trees that can be used:
  "normal variables" start with $!
  "local variables" start with $. (exist so that you can include all $! variables in a template without including everything)
  "global variables" start with $/ (persist past the log message where they were set, performance pigs)

Templates are used by output modules. They are used to create larger strings that use variable values for use with the module. These allow you to change the format of the output, what file or database table the log gets written to and other similar things. The details of what the result of the template means vary from one module to another. There is a common misconception that a template can be used to match and parse a log message. Templates are output only.
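
As an illustration of "the meaning depends on the module", here is a small sketch in which the file output module uses one template as a file name and another as the line format. The template names and the path are invented for the example.

```
# String-type template: %...% is replaced by the property value.
template(name="PerHostFile" type="string" string="/var/log/hosts/%hostname%.log")

# List-type template spelling out each piece explicitly.
template(name="SimpleLine" type="list") {
    property(name="timestamp" dateFormat="rfc3339")
    constant(value=" ")
    property(name="hostname")
    constant(value=" ")
    property(name="msg")
    constant(value="\n")
}

# omfile treats the dynaFile template result as the file to write to
# and the template result as the text of the line.
action(type="omfile" dynaFile="PerHostFile" template="SimpleLine")
```

Neither template inspects or parses the incoming message; both only build output strings from values that already exist.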



[1] It may make sense to have an 'are you sure' message at startup about inputs
that feed to rulesets that don't have a queue defined for the ruleset. I don't
think a new warning would be a breaking change.

[2] supported for backwards compatibility, use is discouraged

(note, there should possibly be two versions of this, one showing the
straightforward, single-message-at-a-time process, and a second one that shows
the advanced, batch-supporting version that includes showing where locking
happens; the atomic stats and foreach would be in the advanced version)


On Wed, 1 Nov 2023, computerquip-work wrote:

> The documentation is so poor that it's almost unusable. Using RainerScript is hands down the most painful thing I've ever used in software.
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog/rsyslog-doc] documentation is hard to use and badly structured (#394)
On Thu, 2 Nov 2023, computerquip-work wrote:

> This is a bit unorganized of a take so I'm going to apologize ahead of time. These are the things I could think of off the top of my head.
>
> 1. Documentation is unclear and doesn't take itself seriously.
>
> What I mean by this is that it states things that you can't take at face
> value. For example, in your overview, you state that `[2]`, or legacy syntax,
> is discouraged but the documentation says `this is the format best used to
> express basic things`. People take that comment seriously and I've seen a lot
> of mixing and matching of both formats, where it then ends up as two things in
> the same configuration file that get expressed in two different ways. As a
> result, you can't just know RainerScript or legacy syntax, you have to
> understand both if you want to read a configuration file. Even the sample
> configuration often used as a default doesn't use RainerScript, it uses the
> legacy syntax. https://github.com/rsyslog/rsyslog/blob/master/sample.conf

Yes, this is true. RainerScript is a recent addition because attempts to graft
more functionality into the old syslog syntax got so ugly that even the rsyslog
developers were having trouble reading configs and understanding what they do.

Initially there was talk about phasing out the old syntax, but to maintain
backwards compatibility (avoid breaking existing configs) we decided to maintain
support for both.

> You give a solid overview that matches how I view the legacy vs RainerScript
> situation... but also, while RainerScript is more verbose, it's incredibly
> confusing to mix and match several syntax together. It is not clear to me at
> all what's "recommended" anymore and rsyslog (both as a community and a
> product) itself seems unclear on the topic.

We have to support both to avoid breaking existing configs. The recommendation
is to use whichever is the clearest to the team maintaining the config, but if
you need to use multiple lines to configure something in the legacy format, you
are probably better off using the new format.

> 2. Variables and their use are a mess.
>
> I'm still not sure how to express variables in RainerScript. For examples that are used in the documentation:
> * `property(name="$!usr!msgnum")`
> * `constant(outname="@version" value="1" format="jsonf")` (Actually isn't a variable at all)
> * `set $!usr!tpl2!dataflow = field($msg, 58, 2);`
> * `property(name="$!")`
> * `set $.tnow = $$now-unixtimestamp`
>
> Where am I supposed to look in the documentation to interpret these? There is
> some explanation
> [here](https://www.rsyslog.com/doc/master/rainerscript/variable_property_types.html).
> But notice that it's not comprehensive. It doesn't mention all of the formats
> above at all. I'm basically on my own for anything not documented for the
> examples above. I've ended up using `$.` for most everything since I don't
> have any idea why I'd used `$!` and I still to this day have no clue what `$$`
> means (the best I can figure is that the actual variable name is
> `$now-unixtimestamp` and it's just stuck like that). There's no mention on
> scoping (or lack thereof), there's no real mention on how to set your own
> variables, only that you can do it.
>
> 3. Templates are split into different formats.
> Similar to `1`, templates have several different ways to express themselves and it's not clear why you'd use one over the other. For the most part, I've just used the more expressive version with explicit `constant`, `property`, etc. in a list. There are a couple of instances where I couldn't figure out how to express that in a list so I did use string.

These are both the legacy of how things were added to rsyslog (along with the
implementation details), and can't be cleaned up without breaking backwards
compatibility. Yes, in retrospect it's bad and ugly and should have been done
differently back in the really early days, but we don't see a way to get out of
it. I can give you an explanation of what is and why it got this way. I'd
appreciate any suggestions on how we can better document this (as I said before,
the people who wrote the documentation are too close to the code).

Initially there were 'message properties' such as timestamp and hostname.
Then system properties were added, such as $myhostname:
https://www.rsyslog.com/doc/v7-stable/configuration/properties.html

These were referenced in templates as
$template foo, "this uses a variable %timestamp% or %$myhostname%"
When RainerScript was added, they were referenced as $timestamp and $$myhostname
in an if statement.
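
Side by side, the two ways of referring to the same properties look roughly like this. The comparison and the file path are only there to give the properties something to do.

```
# Legacy template definition: properties are wrapped in % signs,
# and a system property keeps its own leading $.
$template foo,"this uses a variable %timestamp% or %$myhostname%\n"

# RainerScript expression: message properties take one $,
# system properties take $$.
if $hostname == $$myhostname then {
    action(type="omfile" file="/var/log/local.log" template="foo")
}
```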

RFC-5424 was written to standardize syslog formats better than the prior
RFC-3164, and it included an ability to add structured data to log messages.
Pretty much nobody used it. A few years later, the various logging projects got
together to try and define a standard for structuring logs in messages. The only
part of it that survived was the idea to encode messages as JSON in the body of
the message, and then have the logging systems parse the messages with ! as a
reserved character so:
{'a': 'foo', 'b': {'c': 'bar', 'd':'baz'}}
would let you use
$! (returning "{'a': 'foo', 'b': {'c': 'bar', 'd':'baz'}}")
$!a (returning 'foo')
$!b (returning "{'c': 'bar', 'd':'baz'}")
$!b!c (returning 'bar')
This is when user definable variables were added to rsyslog (initially just as
the result of a message modification module parsing messages, but then the
set/unset statements were added allowing manipulation of variables in the
config)
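
For illustration, the same tree can be built by hand with set statements instead of coming from a parsed message. This is only a sketch; the file path is a placeholder.

```
# Build the example tree by hand; "!" separates the levels.
set $!a = "foo";
set $!b!c = "bar";
set $!b!d = "baz";

# Branch on a leaf value.
if $!b!c == "bar" then {
    action(type="omfile" file="/var/log/bar.log")
}

# Remove a single leaf again.
unset $!b!d;
```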

I am responsible for us adding the $. namespace so that we could have a place to
put variables that we don't want to include when we refer to $!. These are things
like variables that you use for conditions, things you will use in file path
templates, etc. Other than the fact that parsing message modification modules
default to populating $!, there is no technical difference in how $! and $.
variables can be used; they are simply two different namespaces. (Sometimes $. is
referred to as 'local' variables, reflecting the history of using it for
internal processing, while $! is historically used for things that will end up in
an outbound message.)

If you log a message using the RSYSLOG_DebugFormat you will see these variable
namespaces down at the bottom of the message block.
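
A small sketch of how the two namespaces are typically split, assuming made-up variable names, template name and file paths: the $. variable drives a condition, while only the $! tree ends up in the output.

```
# Writes the whole $! tree as JSON; nothing from $. is part of it.
template(name="AllJson" type="list") {
    property(name="$!")
    constant(value="\n")
}

set $!user = "alice";      # lands in the $! tree
set $.route = "audit";     # lands in the $. tree, kept out of $!

if $.route == "audit" then {
    action(type="omfile" file="/var/log/audit.json" template="AllJson")
}

# Log every message with the built-in debug template to inspect the trees.
action(type="omfile" file="/var/log/rsyslog-debug.log" template="RSYSLOG_DebugFormat")
```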

$/ was added at the same time as $. so that there is a way to set a variable
that will persist past the processing of a single message. These aren't used
much, and the cost of the locking needed to make them reasonably reliable
makes them something to avoid if you can.

The simple template definition doesn't work well when complex escaping is
needed, when things need to be formatted into JSON structures, etc., and so new ways
of defining a template were added. I'm not sure the new string format should
have been added (it's just more syntactic sugar around the old way of defining
templates), but that was in the days when doing a break with the existing config
format was being considered.

personally, I almost always use the legacy format for template definitions.

Not doing a break with the old config ended up being a significant advantage, it
is what allowed the distros to switch from sysklogd (which wasn't being
maintained) to rsyslog with minimal disruption. If we had made that change
require the new syntax, I think odds are good that syslog-ng would have been
selected and rsyslog may have faded away (syslog-ng has now gone the freemium
route where you have to pay to get the full feature set)

the documentation for all of this was mostly written one page at a time as
things changed, grafting the pages into the existing documentation


Now that I have given you the 'what is' and the history behind it, do you have
suggestions for how we can update the documentation to better show and explain
this? The docs tend to be a very dry reference material structure, but it may be
that we need to give this history somewhere in there to explain the 'why' around
this.

And if you can suggest changes that we can make to make things more consistent,
please do (but keep in mind that for backwards compatibility, we aren't going to
be able to remove support for the existing stuff).


> 4. Text is displayed in a not-friendly manner.
> Some parts of the online documentation requires you scroll over a ridiculous amount to actually read it: https://i.imgur.com/Ujl289L.png

Do you mean horizontal scrolling? We thought we had fixed this.

> 6. The index is too empty.

> Not sure what's up with the index but there's basically nothing in there. No
> reference to `global()`, `input()`, or various other keywords and terms that
> would be very useful. For example, if I want to see how the `contains`
> expression work, I'd imagine I could go to the index to find a page related to
> it.

good point, thanks
I have been tripped up myself looking for global() a time or two

> 7. There is no search function.
> The search function for the site doesn't appear to pertain to the
> documentation unless I'm misunderstanding. If I want to search for the
> expression `contains` or `global`, there's no way to do so. Even if I search
> for something very specific such as `RuleSetCreateMainQueue`, I get no useful
> results.

this is actually designed to be packaged and shipped with your distro. But I
agree that it would be good to add a specific search the docs capability (I
mostly use google and look for hits on rsyslog.com but I know enough of what I'm
looking for to find it)

I think it would also be fantastic if it was possible to get sponsorship for the
doc site and eliminate the advertising there (I don't know how much adiscon gets
from those ads, so I don't know how much sponsorship money would be needed to
eliminate them)

> For a practical example, let's say I see `$Ruleset RSYSLOG_DefaultRuleset` and
> I want to figure out what exactly that does. Where do I even begin? This
> *looks* like legacy but if I look over in [Legacy Configuration
> Directives](https://www.rsyslog.com/doc/master/configuration/index_directives.html),
> there's no mention of it. There's no mention of it on the [conversion
> page](https://www.rsyslog.com/doc/master/configuration/converting_to_new_format.html).
> I see documentation for rulesets over in [basic
> structure](https://www.rsyslog.com/doc/master/configuration/basic_structure.html)
> but still no mention of $Ruleset although it *does* mention
> RSYSLOG_DefaultRuleset. Search doesn't work so I can't do that. It's not
> listed in the index. At the bottom of the Table of Contents, there's a page
> named [Multiple Rulesets in
> rsyslog](https://www.rsyslog.com/doc/master/concepts/multi_ruleset.html) where
> it lists what it does and what that particular ruleset means but I have to
> know to look there.
>
> I think the example is on the ridiculous side because I think most people
> should be able to assume that $Ruleset just changes the current ruleset. But
> there are parts in the example that should have worked, such as search or
> index, that failed. `$Ruleset` *is* legacy syntax but there's nowhere it's
> listed as such. If you apply this to other things you might find in an older
> configuration like `$RuleSetCreateMainQueue`, each time you have to search
> through the documentation is a different path in the maze to finally get to
> where you need to be.

That's a good example, and it perfectly shows the problem we have. Rulesets
weren't initially in rsyslog. When they were added, the concepts page was written
to explain them, but the rest of the documentation wasn't significantly changed
(other than to add the 'call' capability and the ability to tie a ruleset to an
input). Years later, when the page on legacy statements was added, that one was
missed.

Rainer, is there a relatively easy way to search the code for legacy type
statements to make sure they are all documented on the legacy config page?

David Lang
Re: [rsyslog/rsyslog-doc] documentation is hard to use and badly structured (#394)
Hi David,

Documentation is definitely a sore point of rsyslog (and most other
open source projects for that matter).
Writing good documentation is hard.
A notable exception I remember (even if it is obsolete by now) is
upstart, which shipped its upstart cookbook.
The original documentation is unfortunately no longer available, only
a wayback snapshot:
https://web.archive.org/web/20230322064449/https://upstart.ubuntu.com/cookbook/

I personally like this style of documentation.

On Thu, 2 Nov 2023 at 12:41, David Lang via rsyslog
<rsyslog@lists.adiscon.com> wrote:
>
> We have received complaints about rsyslog documentation repeatedly. We have
> a lot of detail, but it's all written for someone already fairly familiar
> with things.