Mailing List Archive

Re: RFC: On rsyslog output modules and support for batchoperations
> -----Original Message-----
> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
> bounces@lists.adiscon.com] On Behalf Of Luis Fernando Muñoz Mejías
> Sent: Wednesday, April 01, 2009 6:02 PM
> To: rsyslog-users
> Subject: [rsyslog] RFC: On rsyslog output modules and support for
> batchoperations
>
> Hello, world.
>
> I discussed this in private with Rainer, and he suggested I bring
> the discussion here.
>
> I'm already developing an output module for feeding an Oracle database
> with rsyslog input. Rainer already committed some patches to the
> "oracle" branch, in git. Let me remind you that this is highly
> experimental, and I'm sending a big semantic change today. But, in
> principle, the module does what you'd expect from it: it connects to a
> DB, receives a SQL statement via doAction, prepares that statement,
> runs it, and commits.

If I didn't screw up, everything should be committed now.

> It works, but it's way too slow for my needs. As I said when I started
> this project, I need it to be very fast: prepare the statement at
> connection time, run it many times, and definitely use batch
> operations. Say, I want to insert 1000 entries with a single call to
> the Oracle interface, then commit.
>
> With what I know now of rsyslog, I can do it more or less like this:
>
> $OmoracleStatementTemplate,"insert into foo(field1, field2, field3)
> values(:val1, :val2, :val3)"
>
> which is the statement to be prepared by Oracle. This way, I can prepare
> the statement at createInstance() time. Then, I can specify the batch
> size with something like
>
> $OmoracleBatchSize 1000
>
> With this, also at createInstance() time I can specify that doAction is
> called only if there are 1000 entries pending for this selector, like
> this:
>
> CODE_STD_STRING_REQUESTparseSelectorAct(batch_size);
>
> The bad part is that rsyslog will deliver to the output module a single
> string per entry. So, I'd have to split each entry into its fields as
> part of the doAction() code. I'd need some funny separator for each
> field, to avoid problems. So far, it can be done. But the configuration
> would look like this:
>
> $OmoracleDB logdb
> $OmoracleDBUser dbuser
> $OmoracleDBPassword dbpassword
> $OmoracleStatement "insert into foo(col1, col2) values (:field1,
> :field2)"
> $OmoracleBatchSize 1000
> $OmoracleFieldSeparator ****
>
> *.* :omoracle:;"%field1%****%field2%"
>
> and make doAction split the fields appropriately.
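
A minimal sketch of the split-and-batch idea described above, written in Python with sqlite3 standing in for Oracle. The `BatchWriter` class, its method names and the in-memory table are hypothetical illustrations; only the `****` separator, the named placeholders and the batch-size idea come from the message. It assumes the separator never occurs inside a field.

```python
import sqlite3

FIELD_SEPARATOR = "****"   # mirrors $OmoracleFieldSeparator in the config above
BATCH_SIZE = 1000          # mirrors $OmoracleBatchSize in the config above
STATEMENT = "insert into foo(col1, col2) values (:field1, :field2)"


class BatchWriter:
    """Hypothetical stand-in for the buffering a doAction() could do."""

    def __init__(self, conn, batch_size=BATCH_SIZE):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []

    def do_action(self, rendered):
        # Split the single template string back into its two fields.
        # Raises ValueError if the separator also occurs inside a field.
        field1, field2 = rendered.split(FIELD_SEPARATOR)
        self.pending.append({"field1": field1, "field2": field2})
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # One call for the whole batch, followed by a single commit.
        self.conn.executemany(STATEMENT, self.pending)
        self.conn.commit()
        self.pending.clear()


conn = sqlite3.connect(":memory:")
conn.execute("create table foo(col1 text, col2 text)")
writer = BatchWriter(conn, batch_size=3)
for i in range(7):
    writer.do_action(f"host{i}{FIELD_SEPARATOR}message {i}")
rows_before_flush = conn.execute("select count(*) from foo").fetchone()[0]
writer.flush()  # push out the incomplete last batch
rows_after_flush = conn.execute("select count(*) from foo").fetchone()[0]
```

The final explicit `flush()` matters: with a fixed batch size, the last partial batch would otherwise sit in memory until enough new entries arrive.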

There are a couple of subtleties, but I think it can work. In essence, you
need a template that feeds into the values via ($template!) and also a config
string for the prepared statement. It's actually not even that hard to do. It
may be useful (and of course doable) to enable the property replacer to escape
special characters, so that, for example, we could use CSV and replace each
comma with two of them.
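
A rough sketch of what comma-doubling could look like on both sides, in Python. The function names are hypothetical, and this is not rsyslog's property replacer; it only illustrates the escaping rule mentioned above and one of its limits.

```python
SEP = ","


def escape_field(value):
    # Escape the separator inside a field by doubling it.
    return value.replace(SEP, SEP + SEP)


def join_fields(fields):
    return SEP.join(escape_field(f) for f in fields)


def split_fields(line):
    # A lone comma ends a field; a doubled comma is one literal comma.
    fields = []
    current = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == SEP:
            if i + 1 < len(line) and line[i + 1] == SEP:
                current.append(SEP)
                i += 2
                continue
            fields.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


encoded = join_fields(["host1", "disk full, check"])
decoded = split_fields(encoded)

# Limitation: an empty field produces two adjacent commas, which the
# parser reads back as one literal comma.
ambiguous = split_fields(join_fields(["a", "", "b"]))
```

With doubling alone, empty fields (and fields that begin with a comma) cannot be told apart from an escaped comma, so a real implementation would need quoting or a rule that such fields never occur.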
>
> I bet it works. But it's probably too ugly for users. Cleaner ways may
> need deeper changes into rsyslog's API so that the module gets direct
> access to each field. That's probably a lot of work and I can't wait
> for that.

I need to check if there are actually larger changes required. The main
reason for this interface initially was security (do not pass to the module
the full object). Assuming that I have the object available at the time of
the plugin call, I could use a different entry point to pass that data in. If
so, that would not be too much effort. Security concerns could be (somewhat)
addressed by a config statement which enables such object access for the next
action, so one could specifically grant that privilege.

What is the overall opinion on this list? Should we look further into that
direction?

Rainer
>
> So, my questions (at last!): Are there any other alternatives? Is this
> "ugly" way of working good for other users? Should I keep it for
> internal use?
>
> Thanks a lot.
> --
> Luis Fernando Muñoz Mejías
> Luis.Fernando.Munoz.Mejias@cern.ch
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Re: RFC: On rsyslog output modules and support for batchoperations
> > I'm already developing an output module for feeding an Oracle
> > database with rsyslog input. [...] But, in principle, the module
> > does what you'd expect from it: [...]
>
> If I didn't screw up, everything should be committed now.

I just checked. It is. I've also tested the changes you applied and they
work perfectly. Thanks a lot for the reviews!! :)

> There are a couple of subtleties, but I think it can work. In essence,
> you need a template that feeds into the values via ($template!) and
> also a config string for the prepared statement. It's actually not
> even that hard to do. It may be useful (and of course doable) to
> enable the property replacer to escape special characters, so that,
> for example, we could use CSV and replace each comma with two of them.

Making properties in CSV format is indeed a good idea.

> > I bet it works. But it's probably too ugly for users. Cleaner ways
> > may need deeper changes into rsyslog's API so that the module gets
> > direct access to each field. That's probably a lot of work and I
> > can't wait for that.

> I need to check if there are actually larger changes required. The
> main reason for this interface initially was security (do not pass to
> the module the full object).

It's a good reason. If it's easy to generate and pass a deep copy of the
object (and it's not a performance killer, which it shouldn't be), we can
discuss it. Otherwise, I don't think this is worth the effort.

> Assuming that I have the object available
> at the time of the plugin call, I could use a different entry point to
> pass that data in. If so, that would not be too much effort. Security
> concerns could be (somewhat) addressed by a config statement which
> enables such object access for the next action, so one could
> specifically grant that privilege.

I'm not quite sure about this: if two entries request direct access to
the same object and one of them is buggy and modifies it, the second one
can suffer unpredictable consequences. I think it's better to pass a deep
copy, free it once the module call returns, and do it only for modules
that actually need that new entry point. If such deep copies are
expensive, then we are just fine the way we are now.
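
To illustrate the concern, here is a small Python sketch. The message layout, the two "modules" and both dispatch functions are hypothetical; this is not rsyslog's message object or plugin interface.

```python
import copy


def buggy_module(msg):
    # Hypothetical faulty plugin: it overwrites a field it should only read.
    msg["fields"]["hostname"] = "corrupted"


def well_behaved_module(msg):
    return msg["fields"]["hostname"]


def dispatch_shared(msg, modules):
    # Every module sees the very same object.
    return [module(msg) for module in modules]


def dispatch_with_copies(msg, modules):
    results = []
    for module in modules:
        private = copy.deepcopy(msg)     # each module gets its own copy
        results.append(module(private))
        # 'private' goes out of scope here, i.e. it is freed after the call
    return results


message = {"fields": {"hostname": "web01", "msg": "disk full"}}
modules = [buggy_module, well_behaved_module]

shared_result = dispatch_shared(copy.deepcopy(message), modules)
copied_result = dispatch_with_copies(message, modules)
```

With the shared object, the second module reads the value the first one damaged; with per-call deep copies, it still sees the original hostname. The cost is one full copy per module call, which is the performance question raised above.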

Cheers.


--
Luis Fernando Muñoz Mejías
Luis.Fernando.Munoz.Mejias@cern.ch
Re: RFC: On rsyslog output modules and support for batchoperations
Just a partial response (but quick ;))

> I confess to being a bit confused as to why the existing output module
> interface wasn't readily extending to batching,

That would be the real solution (and David Lang suggested it long ago). The
only problem is that it takes quite some effort, as we need to make sure we
do not lose messages along the way. It is still on my agenda, but without a
sponsor I fear it'll stay there for quite a while. In most cases, rsyslog is
simply too fast to see a bottleneck.

> since I've tended to
> see the output modules as more of thin, final-hop proxies. IMHO,
> database output modules should still pretty much blindly execute
> whatever SQL rsyslog hands them, be that wrapped in a transaction or
> not.
>
> That said (and more a question for Rainer), do rsyslog templates have
> support for a null character? If so, it may be a more viable approach
> for delimiting simple fields than changing the output module API. Of
> course the CSV approach works too, but seems easier to break out of
> than null-delimiting.

Nope, also on the agenda. Here sysklogd legacy bites. All C-strings
internally, so while not complex, a *lot* of work is required to change that
(basically all string operations must be touched, and that in a program that
mostly does string operations...).
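
A small Python illustration of why NUL-delimited fields do not survive C-string handling. It uses `ctypes` only to mimic a NUL-terminated view of a buffer; the field values are made up.

```python
import ctypes

SEPARATOR = b"\x00"
record = SEPARATOR.join([b"field1", b"field2"])

# create_string_buffer() copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(record)

# .value reads the buffer the way a C string function would:
# it stops at the first NUL, so the second field is invisible.
as_c_string = buf.value

# .raw exposes the whole buffer; with an explicit length the
# embedded NUL can be used as a delimiter.
full_bytes = buf.raw[:len(record)]
fields = full_bytes.split(SEPARATOR)
```

Anything that treats the data as a C string sees only `field1`, which is why every string operation would have to carry an explicit length instead.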

Rainer
Re: RFC: On rsyslog output modules and support for batchoperations
> > context. So I'm looking here for the balance between rsyslog doing
> > work for me and rsyslog performing as well as I need it. Perhaps
> > exposing the structures is not a good idea, either.
>
> Perhaps you could [ab]use the fact that ppString is an array and do
> something like ommail does, using more than one string/template when
> using a custom subject. What I don't know off the top of my head is
> whether this would limit the number of different Oracle outputs you
> could connect to.

The problem is that this number is expected to be fixed at compile time. I
think it is possible that it is dynamically changed upon action creation, but
it is a very "creative" use of this facility and I am not sure how well it
will work.

It may be worth considering a linked list of strings to pass, but on the
other hand CSV parsing should not involve much overhead (it is just not as
nice as a generic solution...).

Rainer
Re: RFC: On rsyslog output modules and support for batchoperations
On Wed, 1 Apr 2009, Rainer Gerhards wrote:

>> -----Original Message-----
>> From: rsyslog-bounces@lists.adiscon.com [mailto:rsyslog-
>> bounces@lists.adiscon.com] On Behalf Of Luis Fernando Muñoz Mejías
>>
>> I bet it works. But it's probably too ugly for users. Cleaner ways may
>> need deeper changes into rsyslog's API so that the module gets direct
>> access to each field. That's probably a lot of work and I can't wait
>> for that.
>
> I need to check if there are actually larger changes required. The main
> reason for this interface initially was security (do not pass to the module
> the full object).

given that rsyslog is multi-threaded, not multi-process, any thread can
get at the memory of any other thread. this significantly limits the
amount of security that you can get by not passing a direct pointer to the
full object.

while I am a security person (it's my full time job), I'm not sure that
it's worth it to limit the official module interface like this.

David Lang
Re: RFC: On rsyslog output modules and support for batchoperations
On Thu, 2 Apr 2009, Rainer Gerhards wrote:

> Just a partial response (but quick ;))
>
>> I confess to being a bit confused as to why the existing output module
>> interface wasn't readily extending to batching,
>
> That would be the real solution (and David Lang suggested it long ago). The
> only problem is it takes quite some effort, as we need to make sure we do not
> lose messages along that way.

for those who are interested, what I proposed was to shift completely away
from the idea that the output module processes a fixed number of records,
and instead have a loop something like the following.

    while (events)
        if (# events > N)
            grab first N events
        else
            grab all events
        create sql string
        insert to database
        mark the events grabbed as written


with the create sql string being something like the following perlish code
    $sql=$header.join($mid,@events).$footer;

so you could say
    $header='insert into table logs values (';
    $mid='),(';
    $footer=');';

and if you pass it three events you get
insert into table logs values (msg1),(msg2),(msg3);

and with five events you would get
insert into table logs values (msg1),(msg2),(msg3),(msg4),(msg5);
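
The same header/mid/footer join, as a Python sketch. The function name `build_statement` is hypothetical, and the sketch assumes each event is already a properly escaped value list; it does no quoting of its own.

```python
HEADER = "insert into table logs values ("
MID = "),("
FOOTER = ");"


def build_statement(events, header=HEADER, mid=MID, footer=FOOTER):
    # Equivalent of: $sql = $header . join($mid, @events) . $footer
    if not events:
        raise ValueError("need at least one event to build a statement")
    return header + mid.join(events) + footer


three = build_statement(["msg1", "msg2", "msg3"])
five = build_statement(["msg1", "msg2", "msg3", "msg4", "msg5"])
```

The statement text grows with the batch, so the same three strings cover any batch size without the module needing to know the table layout.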

I was not concerned about the command parsing time, due to the fact that
if it takes a little longer, it just means that there are more events in
the queue for the next pass to handle. you could reach a point where you
have so many events that it matters, but since this process could easily
insert hundreds or thousands of messages in one statement the overhead is
pretty low.
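
A sketch of that loop in Python, under the assumption that the queue is a simple in-memory `deque` and that `write_batch` is whatever callable does the database insert (both hypothetical). Events are removed from the queue only after the write returns, which corresponds to "mark the events grabbed as written".

```python
from collections import deque


def drain(queue, max_batch, write_batch):
    """Write queued events in batches of at most max_batch."""
    batch_sizes = []
    while queue:
        take = min(max_batch, len(queue))
        batch = [queue[i] for i in range(take)]   # peek, do not remove yet
        write_batch(batch)                        # may raise on failure
        for _ in range(take):                     # written: now drop them
            queue.popleft()
        batch_sizes.append(take)
    return batch_sizes


written = []
pending = deque(f"msg{i}" for i in range(1, 8))
sizes = drain(pending, 3, written.append)
```

If `write_batch` raises, the batch stays at the head of the queue, so a failed insert does not lose messages; the batch size simply adapts to however many events have piled up, up to the limit.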


> It is still on my agenda, but without a sponsor I fear it'll stay there
> for quite a while.

still hoping

> In most cases, rsyslog is simply too fast to see a bottleneck.

true

David Lang