Mailing List Archive: Running hashes

Running hashes

Aug 23, 2008, 12:45 AM

Post #1 of 3 (785 views)

Acting on an old comment from a friend, I decided to look into what
it'd take to get rsyslog to perform running hashes of logs.
Conceptually, it's pretty simple - every Nth message inject one
message containing the hash of the previous N messages (including the
previous hash message). It also gave me an excuse to start digging
into the rsyslog code.
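
The scheme described above can be sketched roughly as follows. This is an illustration only, not rsyslog code: the function name `with_running_hashes`, the `running-hash: ` prefix, the newline separator, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def with_running_hashes(messages, n=3):
    """Yield every message, injecting a hash message after each block of n.

    Each hash message covers the n messages before it. The previous hash
    message counts as one of those n, which chains the blocks together.
    """
    if n < 2:
        raise ValueError("n must be at least 2 so a block can hold a hash message")
    digest = hashlib.sha256()
    count = 0
    for msg in messages:
        yield msg
        digest.update(msg.encode("utf-8") + b"\n")
        count += 1
        if count == n:
            # Hypothetical format for the injected message.
            hash_msg = "running-hash: " + digest.hexdigest()
            yield hash_msg
            # Start the next block with the hash message itself.
            digest = hashlib.sha256()
            digest.update(hash_msg.encode("utf-8") + b"\n")
            count = 1


if __name__ == "__main__":
    for line in with_running_hashes(["a", "b", "c", "d", "e"], n=3):
        print(line)
```

Because each block starts with the previous hash message, altering or dropping any earlier message changes every hash that follows it.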

At first I thought I could do it with a property replacer, but that
seems a wash since those are wholly message-based and don't [seem to]
give the opportunity to store information (even a running hash) of
prior messages. A plugin was my next hope, but there doesn't seem to
be a good mechanism to pipeline those together - AFAICT they're
expected to be single ingress/egress points, with no interstitial
stages. I see the code for loading other objects as Rainer mentioned
in April, but that seems more for central functionality than for
chaining modules together.

This all brings me back to one of my original questions for rsyslog -
is module chaining something that is even on your radar? I'm thinking
normalization, hashing, encryption, etc. Almost feels like there
should be another layer here, maybe a "mangle" plugin interface that
could stack in after im* and before om*?

RB

Running hashes [ In reply to ]

rgerhards at hq

Sep 1, 2008, 7:56 AM

Post #2 of 3 (765 views)

On Sat, 2008-08-23 at 01:45 -0600, RB wrote:
> Acting on an old comment from a friend, I decided to look into what
> it'd take to get rsyslog to perform running hashes of logs.
> Conceptually, it's pretty simple - every Nth message inject one
> message containing the hash of the previous N messages (including the
> previous hash message).

Yes - this is the way IETF's upcoming syslog-sign is using hashes. It
may be tempting to use that mode for the logs, too...

> It also gave me an excuse to start digging
> into the rsyslog code.
>
> At first I thought I could do it with a property replacer, but that
> seems a wash since those are wholly message-based and don't [seem to]
> give the opportunity to store information (even a running hash) of
> prior messages.

That's right and that's by design. The property replacer is a one-way
road.

> A plugin was my next hope, but there doesn't seem to
> be a good mechanism to pipeline those together - AFAICT they're
> expected to be single ingress/egress points, with no interstitial
> stages.

That's definitely true for input and output plugins. They, by very
definition, are at either end of the processing flow.

> I see the code for loading other objects as Rainer mentioned
> in April, but that seems more for central functionality than for
> chaining modules together.

You are absolutely correct. These objects are (mostly) for internal
things: they save us from static binding and give us the ability to use
different drivers for different needs, e.g. for different network
transport protocols (and maybe later for different, even smaller and
more abstracted, database drivers).

> This all brings me back to one of my original questions for rsyslog -
> is module chaining something that is even on your radar? I'm thinking
> normalization, hashing, encryption, etc. Almost feels like there
> should be another layer here, maybe a "mangle" plugin interface that
> could stack in after im* and before om*?

You are definitely on the right route. If you look at the plugin
definitions, there is another class, not yet implemented, of plugins:
they are called "filter plugins". They will have the ability to take a
message, modify it (or inject a new one based on previous messages) and
so on - at least this is how it was designed roughly a year ago.
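
To make the idea concrete, here is a rough sketch of what a stackable filter stage between input and output could look like. It is not the rsyslog plugin interface (which, as said above, is not implemented); `run_pipeline`, `normalize`, and `CountEvery` are hypothetical names, and the only assumption is that a filter takes one message and returns zero, one, or several messages.

```python
from typing import Callable, Iterable, List

# Assumed shape of a filter: one message in, a list of messages out.
Filter = Callable[[str], List[str]]


def run_pipeline(inputs: Iterable[str], filters: List[Filter]) -> List[str]:
    """Pass each input message through every filter, in order."""
    out: List[str] = []
    for msg in inputs:
        batch = [msg]
        for flt in filters:
            # A filter may modify, drop, or add messages.
            batch = [result for m in batch for result in flt(m)]
        out.extend(batch)
    return out


def normalize(msg: str) -> List[str]:
    """Stateless filter: collapse whitespace and lowercase the message."""
    return [" ".join(msg.split()).lower()]


class CountEvery:
    """Stateful filter: inject a marker message after every n messages."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.seen = 0

    def __call__(self, msg: str) -> List[str]:
        self.seen += 1
        if self.seen % self.n == 0:
            return [msg, f"-- {self.seen} messages seen --"]
        return [msg]


if __name__ == "__main__":
    print(run_pipeline(["A  b", "C"], [normalize, CountEvery(2)]))
```

The stateful filter is the part a property replacer cannot do: it keeps information across messages and can inject a new message based on what it has seen.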

With the arrival of the scripting language (in February, if I remember
correctly), new players have entered the field: library plugins that
expose function calls to the script engine. So far, they do NOT yet
exist, but they are pretty well thought out. They are probably the first
thing I will implement when I start bringing the script engine to its
full power. These functions may also be a good place to provide a
"mangling" interface. I am still a bit undecided, maybe I will not 100%
design this, but let the code evolve (sometimes, I think, it is good to
look at code written and see if that "extra bit" just naturally fits
in...).

The order of events (and of message modification) is a bit problematic
when the message object is run asynchronously through multiple action
queues. The message is forked for each queue, so deciding when to modify
which of the forked messages requires careful design. This can be very
powerful, but also quite complex to configure (where the scripting
language hopefully comes in handy).
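
A small sketch of the forking problem, assuming a message is a plain dictionary and each action queue gets its own deep copy. The names `fan_out`, `file`, and `remote` are made up for the example, and the queues run one after another here rather than asynchronously, to keep the sketch short.

```python
import copy
from typing import Callable, Dict


def fan_out(message: dict, actions: Dict[str, Callable[[dict], None]]) -> Dict[str, dict]:
    """Fork the message once per action queue and modify only that fork."""
    results = {}
    for name, modify in actions.items():
        forked = copy.deepcopy(message)  # each queue sees its own copy
        modify(forked)
        results[name] = forked
    return results


if __name__ == "__main__":
    original = {"msg": "login failed", "tags": []}
    results = fan_out(
        original,
        {
            "file": lambda m: m["tags"].append("hashed"),
            "remote": lambda m: m.update(msg=m["msg"].upper()),
        },
    )
    print(original)
    print(results)
```

With forks, a modification made for one queue is invisible to the others. A modification meant for all queues would have to happen before the fork, and that ordering is exactly what the configuration has to express.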

Another, internal, thing is message synchronization. So far, property
creation and modification never require sync objects (like mutexes),
because they always happen in a single thread (for ONE specific message,
of course). With mangling capabilities, things change considerably:
property set methods then need sync capability. This must be implemented
in a way that does not hurt the (far more frequent) case of message
object creation.
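
One possible way to keep creation cheap, sketched under the assumption that a message is only handed to other threads through one explicit call. The class `Message` and its methods `share`, `update_prop`, `set_prop`, and `get_prop` are hypothetical and do not mirror the rsyslog message object.

```python
import threading


class Message:
    """Message whose properties are locked only after it has been shared."""

    __slots__ = ("_props", "_lock")

    def __init__(self, **props):
        self._props = dict(props)
        self._lock = None  # no mutex on the frequent creation path

    def share(self):
        """Call once, from the creating thread, before other threads see it."""
        if self._lock is None:
            self._lock = threading.Lock()
        return self

    def update_prop(self, name, fn):
        """Replace a property with fn(old value), locking only if shared."""
        lock = self._lock
        if lock is None:
            self._props[name] = fn(self._props.get(name))
        else:
            with lock:
                self._props[name] = fn(self._props.get(name))

    def set_prop(self, name, value):
        self.update_prop(name, lambda _old: value)

    def get_prop(self, name):
        lock = self._lock
        if lock is None:
            return self._props.get(name)
        with lock:
            return self._props.get(name)


if __name__ == "__main__":
    m = Message(counter=0).share()

    def work():
        for _ in range(1000):
            m.update_prop("counter", lambda v: v + 1)

    threads = [threading.Thread(target=work) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(m.get_prop("counter"))
```

The cost of this choice is a rule the caller must follow: `share()` has to run before any second thread can reach the message, otherwise the unlocked path is used concurrently.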

Of course, all of this is doable and the base design is there :) I just
wanted to provide an idea why it may take a little while to implement.
Comments and suggestions are most welcome, especially as this is among
the next things on my todo list (if priorities don't change).

Rainer
>
>
> RB
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog

Running hashes [ In reply to ]

rgerhards at hq

Sep 1, 2008, 8:00 AM

Post #3 of 3 (776 views)

I forgot one thing. If you did not already find it, have a look at this:

http://www.rsyslog.com/doc-generic_design.html

The filter/function plugins are conceptually on the "PLG Ext" et al
layer in this chart. This is not specifically rsyslog design, but it
comes from the same source and conveys the same basic idea ;) [I
initially did that description for some generic IETF work].

Maybe it is useful.

Rainer
