Mailing List Archive

Log file and databases
I've been working my way up to some changes that are badly needed
for my site regarding logging. I would like to throw out a few
thoughts and get some feedback regarding the direction that the
group would like to go with logging in general.

Thoughts....

CLF is too restrictive (we have said this before)

It would be nice to be able to log *all* of the server
request info into a database format.

I have implemented a Perl-CGI approach and can log the CGI
environment to dbm logfiles. However this requires the
index.cgi approach in every directory.

Questions....

Some new members of this group mentioned that they were working
on a Sybase interface. Did this include logging?

With some recent discussion regarding logging and security issues,
would it not make sense to let the parent process running as root
handle all of the logging? (non-forker)

Any comments on the system load that could be generated by the
above mentioned index.cgi approach?

Related to the Perl-CGI approach, there is an interesting API
developing that can spawn "Minisvr" processes to provide some
statefulness to the session. Comments?

There are several public domain databases out there that would
be relatively easy to add support for, e.g. Msql, Postgres95, DBM.
Any comments on preferences?

Should the logger be a separate program?


I am at the crossroads on this, and there are many options.
Comments welcome.

-Randy
Re: Log file and databases
On Mon, 8 May 1995, Randy Terbush wrote:

> Questions....
>
> Some new members of this group mentioned that they were working
> on a Sybase interface. Did this include logging?

We currently have all authentication working via virtual passwd and group
files in a Sybase database, but have not pushed logging to the database
*yet*. Our final round of testing is going on right now on the Apache
server with Sybase'd authentication, and after that is deemed stable we
will begin work on other goodies like logging directly to the database.

The one big bonus we saw with the Sybase authentication (aside from
excellent response time, since we will have several thousand accounts
shortly) was security. We have the database on the other side of
an Eagle Raptor firewall, and the www daemon makes queries through it for
authentication.

-r
Re: Log file and databases
> I have implemented a Perl-CGI approach and can log the CGI
> environment to dbm logfiles. However this requires the
> index.cgi approach in every directory.
>
> Hmmm... if external Perl code is involved, I'm not sure how requests
> which don't go through some external CGI script get logged.

Exactly, which is why the overhead could be severe. The Perl "Minisvr"
app mentioned below would be a way to have a "small" server process
following the session and logging requests, etc.. Here again, not
cheap.


> With some recent discussion regarding logging and security issues,
> would it not make sense to let the parent process running as root
> handle all of the logging? (non-forker)
>
> This would be a potential bottleneck. (Besides which, one nice
> feature of the current non-forking code is that the parent process,
> which runs as root, is not involved in handling transactions at all;
> this makes it simply impossible for skillfully constructed requests to
> abuse its privileges).

Naive question warning:
Would it not be possible for the children to open a socket back to the
parent for logging transactions?

It has been mentioned before that the parent doesn't really have much
to do aside from handing jobs to the children.

IF this could be done, this would possibly solve some of the current
security issues. It would be relatively easy to write the logging
modules for different database formats.


> Related to the Perl-CGI approach, there is an interesting API
> developing that can spawn "Minisvr" processes to provide some
> statefulness to the session. Comments?
>
> Don't know about this. References?

This is a Perl5 dynamic module. There has been some reference
to it on Roy's cgi-lib mailing list. I understand that the current
source can be found at http://web.nexor.co.uk/contrib/perl/.
Nexor is unreachable so I cannot verify that.

> Should the logger be a separate program?
>
> I would prefer for it not to be --- it's more efficient to have the
> processes serving requests write whatever they would have written to
> the "separate logger" to a flat file instead, and to process the
> contents of that file off-line.

Which might be the argument for feeding the data to the database
through a named pipe or something.
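Something along these lines is what I'm picturing; a minimal Perl sketch
with a made-up path, assuming the server will write to a FIFO exactly as
it would to a flat file:

#!/usr/bin/perl
# Sketch: create a FIFO where the transfer log normally lives, so the
# server writes its log lines into the pipe instead of a flat file and a
# separate process can feed them to a database.  The path is hypothetical.
use POSIX qw(mkfifo);

my $pipe = "/usr/local/etc/httpd/logs/access_log.pipe";

unless (-p $pipe) {
    mkfifo($pipe, 0600) or die "mkfifo $pipe: $!";
}

# point the server's xfer_log/TransferLog at $pipe; the database loader
# then opens $pipe for reading and blocks until entries arrive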
Re: Log file and databases
Date: Mon, 08 May 1995 10:08:02 -0500
From: Randy Terbush <randy@dsndata.com>
Precedence: bulk
Reply-To: new-httpd@hyperreal.com

CLF is too restrictive (we have said this before)

It would be nice to be able to log *all* of the server
request info into a database format.

I have implemented a Perl-CGI approach and can log the CGI
environment to dbm logfiles. However this requires the
index.cgi approach in every directory.

Hmmm... if external Perl code is involved, I'm not sure how requests
which don't go through some external CGI script get logged.

With some recent discussion regarding logging and security issues,
would it not make sense to let the parent process running as root
handle all of the logging? (non-forker)

This would be a potential bottleneck. (Besides which, one nice
feature of the current non-forking code is that the parent process,
which runs as root, is not involved in handling transactions at all;
this makes it simply impossible for skillfully constructed requests to
abuse its privileges).

Any comments on the system load that could be generated by the
above mentioned index.cgi approach?

If every transaction involves a CGI hit, it's *quite* severe,
involving a fork *and exec* on each transaction.

Related to the Perl-CGI approach, there is an interesting API
developing that can spawn "Minisvr" processes to provide some
statefulness to the session. Comments?

Don't know about this. References?

Should the logger be a separate program?

I would prefer for it not to be --- it's more efficient to have the
processes serving requests write whatever they would have written to
the "separate logger" to a flat file instead, and to process the
contents of that file off-line.

rst
Re: Log file and databases
> Date: Mon, 08 May 1995 14:53:19 -0500
> From: Randy Terbush <randy@dsndata.com>
>
> Naive question warning:
> Would it not be possible for the children to open a socket back to the
> parent for logging transactions?
>
> You don't even need that much --- they could use pipes (as in the NCSA
> 1.4 code, in which children use pipes back to the parent process to
> let it know when they're ready for another transaction).

This requires IPC support?


> IF this could be done, this would possibly solve some of the current
> security issues. It would be relatively easy to write the logging
> modules for different database formats.
>
> It's certainly true that you can't seek a pipe. Using a named pipe as
> the logfile, as you suggest below, might be a good way to prototype
> something like this --- you don't have to hack the server code at all.
> Alternatively, I can imagine config file entries like
>
> TransferLog "| /etc/xfer_log_maint -mode count -db /var/logs/xfer.dbm"
>
> Of course, this begs the question of exactly *what* gets sent to this
> process over the pipe. CLF entries are a start for the xfer_log, but
> they aren't completely satisfactory to anybody, and the error_log is
> at this point completely unstructured.

I played with this a bit more last night, and this is the direction I
am heading.

It seems that we could leave the basic CLF code unaltered in the server,
and change the xfer_log, etc. to a named pipe. This *should* essentially
be the same as writing to a flat file in terms of server load and allows
some sites to continue using the existing methods.

Different modules could then be written to read from these pipes and do
the right thing. I envision the following additions to the logging
code to allow the flexibility that we wish for.

An entry in httpd.conf:

<Logfile logs/access_log.pipe>
USER_AGENT
DATE "%T %d %Y"
BYTES_SENT
URL
REFERRER
</Logfile>

<Logfile logs/error_log.pipe>
REFERRER
ERROR_*
DATE "%T %d %Y"
URL
</Logfile>

....

These variables are sketchy, I know... but you get the picture.

The logging modules could parse the .conf file and go nuts.
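As a rough Perl sketch of what that parsing might look like (the
<Logfile> syntax is only the proposal above, and the conf path is
made up):

#!/usr/bin/perl
# Sketch: read the proposed <Logfile pipe-path> ... </Logfile> sections
# out of httpd.conf and remember which fields each pipe should carry.
# The <Logfile> syntax is hypothetical; it exists only in this proposal.

my %logfile;            # pipe path => list of field specs
my $conf = "/usr/local/etc/httpd/conf/httpd.conf";

open(CONF, "< $conf") or die "open $conf: $!";
my $current;
while (<CONF>) {
    chomp;
    if (/^\s*<Logfile\s+(\S+)>/i) {          # start of a section
        $current = $1;
        $logfile{$current} = [];
    } elsif (/^\s*<\/Logfile>/i) {           # end of a section
        undef $current;
    } elsif (defined $current && /\S/) {
        (my $spec = $_) =~ s/^\s+//;         # e.g. 'DATE "%T %d %Y"'
        push @{ $logfile{$current} }, $spec;
    }
}
close(CONF);

# a real module would go on to open each pipe and emit only the listed
# fields for every request; for now just show what was found
for my $pipe (keys %logfile) {
    print "$pipe: @{ $logfile{$pipe} }\n";
}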
Re: Log file and databases
rst writes...

> But it may
> well be just as easy for the parent process to fork off yet *another*
> child at startup time solely to handle the logging, and to lay in the
> pipe-work for that.

Hmm, I remember suggesting this way back when we started this project...
it wasn't a popular idea then, but I still like it.

> (*Most* of the code probably wouldn't have to
> know about either of these arrangements --- the existing xfer_log and
> error_log handles could just be arranged to point at the pipes).
>
> The danger, either way, is that if the logging process doesn't get
> scheduled often enough, these pipes could back up, resulting in
> unnecessary delays before a child process is back in service. We
> already know this is a problem at the head end (accepting connections)
> --- it *might* be a problem at the back end as well.

This isn't such a big danger; the children can write to the parent
using a non-blocking local socket. They require absolutely no
acknowledgement (unlike, say, the NCSA 1.4 accept scheduling system,
and even that can keep up with a heavy workload).
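
To make that concrete, here is a rough sketch of the child's side in
Perl, just to show the idea, using the named pipe from elsewhere in the
thread rather than a socket; the path is made up, and the logging
process is assumed to already have the pipe open for reading:

#!/usr/bin/perl
# Sketch: a child marks its log descriptor non-blocking, so a backed-up
# logger makes writes fail fast (EAGAIN) instead of stalling the child.
# The pipe path is hypothetical.
use Fcntl;
use POSIX qw(EAGAIN);

my $pipe = "/usr/local/etc/httpd/logs/access_log.pipe";
open(LOG, ">> $pipe") or die "open $pipe: $!";

my $flags = fcntl(LOG, F_GETFL, 0);
defined $flags or die "fcntl F_GETFL: $!";
fcntl(LOG, F_SETFL, $flags | O_NONBLOCK) or die "fcntl F_SETFL: $!";

sub log_entry {
    my ($entry) = @_;
    # log lines are well under PIPE_BUF, so a successful write is
    # all-or-nothing on a pipe
    my $wrote = syswrite(LOG, $entry, length $entry);
    if (!defined $wrote && $! == EAGAIN) {
        return 0;    # pipe full: drop or queue the entry, don't block
    }
    return 1;
}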

> Having a separate logging process also raises at least one other
> fairly knotty issue --- how to handle cases where the logging process
> dies an untimely death perhaps through no fault of its own.

How many of us have seen a parent die in this way? Anyone?
I think this is an unimportant point.


Can we revive the discussion about the CLF replacement that
we briefly talked about before?

Have something like

$ENV_VAR [%h:%m:%s] $BLAH $FOO


One objection to this idea was that it'd slow things down... well,
if we have a logging process, it could cache the translation of these
strings (yes, I'd like to see it configurable on a directory basis),
e.g.
$REMOTE_USER [%h:%m:%s] $ORIGINAL_URL $STATUS $BYTES
could be cached as

"%s [%s] %s %s %s", remote_user, strftime(..etc), original_url, status, bytes

This is all thinking in real-time, so it's not meant to be anything
other than a catalyst to better ideas.

As for the load this would introduce... well, on the scale of things,
log formatting is pretty cheap, and a non-forking Apache will keep the
load very low anyway.

Let's at least prototype something along these lines.
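
Something like this, maybe; a rough Perl sketch of the caching idea,
where the directive syntax and field names are just the strawman above,
and I've sidestepped the strftime escapes by treating the timestamp as
one more pre-formatted variable:

#!/usr/bin/perl
# Sketch of the caching idea: compile a directive such as
#   $REMOTE_USER [$TIME] $ORIGINAL_URL $STATUS $BYTES
# into a plain sprintf format plus an ordered list of field names, once,
# and reuse that pair for every entry.  Directive syntax and field names
# are only the strawman from this thread, not real server directives.

my %cache;    # directive string => [ sprintf format, ordered field names ]

sub log_line {
    my ($directive, $request) = @_;   # $request: hash ref of request info
    unless ($cache{$directive}) {
        my @fields;
        my $fmt = $directive;
        $fmt =~ s/\$(\w+)/do { push @fields, $1; '%s' }/ge;
        $cache{$directive} = [ $fmt, \@fields ];
    }
    my ($fmt, $fields) = @{ $cache{$directive} };
    return sprintf("$fmt\n", map { $request->{$_} } @$fields);
}

# hypothetical usage; TIME would come from strftime() on the request time
print log_line('$REMOTE_USER [$TIME] $ORIGINAL_URL $STATUS $BYTES',
               { REMOTE_USER  => 'randy',
                 TIME         => '10:08:02 08 May 1995',
                 ORIGINAL_URL => '/index.html',
                 STATUS       => 200,
                 BYTES        => 1042 });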

--
Rob Hartill
http://nqcd.lanl.gov/~hartill/
Re: Log file and databases
Date: Mon, 08 May 1995 14:53:19 -0500
From: Randy Terbush <randy@dsndata.com>

Naive question warning:
Would it not be possible for the children to open a socket back to the
parent for logging transactions?

You don't even need that much --- they could use pipes (as in the NCSA
1.4 code, in which children use pipes back to the parent process to
let it know when they're ready for another transaction). But it may
well be just as easy for the parent process to fork off yet *another*
child at startup time solely to handle the logging, and to lay in the
pipe-work for that. (*Most* of the code probably wouldn't have to
know about either of these arrangements --- the existing xfer_log and
error_log handles could just be arranged to point at the pipes).

The danger, either way, is that if the logging process doesn't get
scheduled often enough, these pipes could back up, resulting in
unnecessary delays before a child process is back in service. We
already know this is a problem at the head end (accepting connections)
--- it *might* be a problem at the back end as well.

Having a separate logging process also raises at least one other
fairly knotty issue --- how to handle cases where the logging process
dies an untimely death, perhaps through no fault of its own. (There
are some Unices which simply zing a process at random when resource
contention gets severe).

IF this could be done, this would possibly solve some of the current
security issues. It would be relatively easy to write the logging
modules for different database formats.

It's certainly true that you can't seek a pipe. Using a named pipe as
the logfile, as you suggest below, might be a good way to prototype
something like this --- you don't have to hack the server code at all.
Alternatively, I can imagine config file entries like

TransferLog "| /etc/xfer_log_maint -mode count -db /var/logs/xfer.dbm"

Of course, this begs the question of exactly *what* gets sent to this
process over the pipe. CLF entries are a start for the xfer_log, but
they aren't completely satisfactory to anybody, and the error_log is
at this point completely unstructured.

This is a Perl5 dynamic module. There has been some reference
to it on Roy's cgi-lib mailing list. I understand that the current
source can be found at http://web.nexor.co.uk/contrib/perl/.
Nexor is unreachable so I cannot verify that.

Thanks, I'll look it up.

rst
Re: Log file and databases
Date: Mon, 08 May 1995 14:53:19 -0500
From: Randy Terbush <randy@dsndata.com>

This is a Perl5 dynamic module. There has been some reference
to it on Roy's cgi-lib mailing list. I understand that the current
source can be found at http://web.nexor.co.uk/contrib/perl/.
Nexor is unreachable so I cannot verify that.

Hmmm... Nexor is reachable from here, but the above URL gives me a
"file not found"...

rst
Re: Log file and databases
Date: Tue, 09 May 1995 08:58:26 -0500
From: Randy Terbush <randy@dsndata.com>
Precedence: bulk
Reply-To: new-httpd@hyperreal.com


> Date: Mon, 08 May 1995 14:53:19 -0500
> From: Randy Terbush <randy@dsndata.com>
>
> Naive question warning:
> Would it not be possible for the children to open a socket back to the
> parent for logging transactions?
>
> You don't even need that much --- they could use pipes (as in the NCSA
> 1.4 code, in which children use pipes back to the parent process to
> let it know when they're ready for another transaction).

This requires IPC support?

Not for the logging application --- plain ol' Unix pipes work fine for
that. Where NCSA 1.4 gets into real arcana is that it requires the
ability to transfer a *file descriptor* from one process to another
--- a feature which is present in most Unix variants these days, but
not all, and to which there is no single standard interface which can
be relied on to work...

rst
Re: Log file and databases
> Date: Mon, 08 May 1995 14:53:19 -0500
> From: Randy Terbush <randy@dsndata.com>
>
> This is a Perl5 dynamic module. There has been some reference
> to it on Roy's cgi-lib mailing list. I understand that the current
> source can be found at http://web.nexor.co.uk/contrib/perl/.
> Nexor is unreachable so I cannot verify that.
>
> Hmmm... Nexor is reachable from here, but the above URL gives me a
> "file not found"...
>
> rst

The mailing list archive for the modules I mentioned is at:

http://www.webstorm.com/local/cgi-perl/

I can't clarify that nexor archive location at this time.
Re: Log file and databases
> Date: Tue, 09 May 1995 19:47:53 -0500
> From: Randy Terbush <randy@zyzzyva.com>
>
> The mailing list archive for the modules I mentioned is at:
>
> http://www.webstorm.com/local/cgi-perl/
>
> I can't clarify that nexor archive location at this time.
>
> OK... I looked over the messages in that archive related to the
> "Mini-server", and it looked like it was oriented more towards
> management of connection state than at logging per se.
>
> Randy, since you're following this particular discussion more closely
> than I am, do you know much about how it relates to some of the other
> approaches to connection state that are floating around (Netscape
> "cookies", server-side java applets implementing private protocols,
> etc.)?

Can't really comment. I have been reading some of Brian's comments
regarding this and just stumbled upon the Mini-server. My first
thought was that it could be used for logging instead of the other
separate logging process that has been mentioned. LOT's of logging
processes! NOT! It does appear that this site (webstorm.com)
has some sort of stateful connection setup. There are some
interesting things being discussed in the above mentioned group.

I'm at the point of figuring out how to use some of the tools
and seeing what people are doing with them.

As for the logging question, I have a Perl client listening on
the named pipe. The more I play with this, the more
I like it. This outside process could handle all of the splitting
into various logfiles, reporting to databases, etc. At its
simplest, we can change the logging in Apache
all we want, and the module can massage it back into CLF if that's
what people want.
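
Roughly, the client looks like this; paths and the split rule are made
up, and the CLF parsing is deliberately crude:

#!/usr/bin/perl
# Sketch of the outside logging process: block on the named pipe the
# server's transfer log points at, and split each CLF line out into
# separate logfiles.  Paths and the split rule are hypothetical.

my $pipe = "/usr/local/etc/httpd/logs/access_log.pipe";
my %out;    # cache of open output filehandles

open(PIPE, "< $pipe") or die "open $pipe: $!";
while (my $line = <PIPE>) {
    # CLF: host ident user [date] "request" status bytes
    my ($host) = $line =~ /^(\S+)/;
    next unless defined $host;

    # crude split rule: one logfile per top-level directory in the URL
    my ($url) = $line =~ /"(?:GET|POST|HEAD)\s+(\S+)/;
    my $dir = ($url && $url =~ m{^/(\w+)}) ? $1 : "root";

    unless ($out{$dir}) {
        open($out{$dir}, ">> /usr/local/etc/httpd/logs/access_log.$dir")
            or die "open access_log.$dir: $!";
    }
    print { $out{$dir} } $line;
}
# a real daemon would reopen the pipe on EOF; this is just the concept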

This has lots of possibilities. Warning: This is my learning Perl
project. It may take a bit before I am ready to show this. As
a "concept" example, I will have a module that writes CLF to dbm
format in the next couple of days. (err, family visit this weekend...)
Make that next week.
Re: Log file and databases
Date: Tue, 09 May 1995 19:47:53 -0500
From: Randy Terbush <randy@zyzzyva.com>

The mailing list archive for the modules I mentioned is at:

http://www.webstorm.com/local/cgi-perl/

I can't clarify that nexor archive location at this time.

OK... I looked over the messages in that archive related to the
"Mini-server", and it looked like it was oriented more towards
management of connection state than at logging per se.

Randy, since you're following this particular discussion more closely
than I am, do you know much about how it relates to some of the other
approaches to connection state that are floating around (Netscape
"cookies", server-side java applets implementing private protocols,
etc.)?

rst
Re: Log file and databases
On Wed, 10 May 1995, Randy Terbush wrote:
> This has lots of possibilities. Warning: This is my learning Perl
> project.

*Every* project is a learning Perl project. It's a feature that its
learning curve is a straight diagonal line :)

> It may take a bit before I am ready to show this. As
> a "concept" example, I will have a module that writes CLF to dbm
> format in the next couple of days. (err, family visit this weekend...)
> Make that next week.

Quick question - what's the "key" in the DBM file format? DBM files are
just simple hash tables remember, so logging to a DBM file would be
useful in a "bean count" system like Rob Mc alluded to a while back, but
not to a format ready for entry into a relational analysis program.

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/
Re: Log file and databases
> > It may take a bit before I am ready to show this. As
> > a "concept" example, I will have a module that writes CLF to dbm
> > format in the next couple of days. (err, family visit this weekend...)
> > Make that next week.
>
> Quick question - what's the "key" in the DBM file format? DBM files are
> just simple hash tables remember, so logging to a DBM file would be
> useful in a "bean count" system like Rob Mc alluded to a while back, but
> not to a format ready for entry into a relational analysis program.

The key() I have used for this concept test is an integer conversion of
the date entry. A relational database would make this *much* more
useable and is definitely the direction I want to go with this.
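
For the record, the concept test amounts to roughly the following Perl;
the filenames are made up, and same-second collisions are handled
crudely:

#!/usr/bin/perl
# Sketch of the concept test: append each CLF line to a DBM file keyed on
# the request time as an integer (seconds since the epoch).  Filenames
# are hypothetical; the timezone offset in the CLF date is ignored.
use Fcntl;
use SDBM_File;
use Time::Local;

my %log;
tie(%log, 'SDBM_File', '/usr/local/etc/httpd/logs/access_db',
    O_RDWR | O_CREAT, 0640) or die "tie: $!";

my %mon = (Jan=>0,Feb=>1,Mar=>2,Apr=>3,May=>4,Jun=>5,
           Jul=>6,Aug=>7,Sep=>8,Oct=>9,Nov=>10,Dec=>11);

while (my $line = <STDIN>) {            # CLF on stdin (or the named pipe)
    # [08/May/1995:10:08:02 -0500]  ->  integer key
    my ($d,$m,$y,$H,$M,$S) =
        $line =~ /\[(\d+)\/(\w+)\/(\d+):(\d+):(\d+):(\d+)/;
    next unless defined $S;
    my $key = timegm($S, $M, $H, $d, $mon{$m}, $y);
    $key++ while exists $log{$key};     # crude: bump on collisions
    $log{$key} = $line;
}
untie %log;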

-Randy
Re: Log file and databases
On Wed, 10 May 1995, Randy Terbush wrote:
> > > It may take a bit before I am ready to show this. As
> > > a "concept" example, I will have a module that writes CLF to dbm
> > > format in the next couple of days. (err, family visit this weekend...)
> > > Make that next week.
> >
> > Quick question - what's the "key" in the DBM file format? DBM files are
> > just simple hash tables remember, so logging to a DBM file would be
> > useful in a "bean count" system like Rob Mc alluded to a while back, but
> > not to a format ready for entry into a relational analysis program.
>
> The key() I have used for this concept test is an integer conversion of
> the date entry. A relational database would make this *much* more
> useable and is definitely the direction I want to go with this.

Hmm... I guess I'm having trouble figuring out why you want to use a DBM
file then, as all the DBM file you're creating will be good for is
looking up accesses on a particular second.

Brian


--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/
Re: Log file and databases
> > > Quick question - what's the "key" in the DBM file format? DBM files are
> > > just simple hash tables remember, so logging to a DBM file would be
> > > useful in a "bean count" system like Rob Mc alluded to a while back, but
> > > not to a format ready for entry into a relational analysis program.
> >
> > The key() I have used for this concept test is an integer conversion of
> > the date entry. A relational database would make this *much* more
> > useable and is definitely the direction I want to go with this.
>
> Hmm... I guess I'm having trouble figuring out why you want to use a DBM
> file then, as all the DBM file you're creating will be good for is
> looking up accesses on a particular second.

True, but I figured I could query accesses in a specific time range,
then pull out of that the information I want regarding data transfer, etc.
I am under the impression that I can get this information out of a DBM
format faster than I can from a *large* flat file.

I agree that this is not the ideal setup. Mainly just an exercise to
get a better idea of what can be done. I have two database engines
waiting on input from this logger (Msql and Postgres95). The only real snag
to making the database thing work is me finding time to learn how to
write a useable SQL schema for the data we will be logging. I have
been playing with the Perl APIs for both of these databases. If someone
with SQL knowledge could design a schema for the information we would
be logging from Apache, we're off and running.
Re: Log file and databases
> Hmm... I guess I'm having trouble figuring out why you want to use a DBM
> file then, as all the DBM file you're creating will be good for is
> looking up accesses on a particular second.

One thing you can do with a Perl log writer is
maintain up-to-date statistics, e.g.

$URL_COUNT{$current_url}++
$HOST_COUNT{$current_host}++
$HOUR_COUNT{$current_hour}++
$DAY_COUNT{$current_date}++

For many people, it's these counts which are the only
thing of interest that come out of a logfile. Most of us probably
do some statistical analysis like this on a nightly/weekly basis.
Having the data available in real-time would be nice.

You can also abbreviate the logfile by assigning keys to each unique
URL and client address - this makes the log much smaller. A simple
filter expands the abbreviated log to CLF or whatever you prefer. I
did this kind of log abbreviation for a while in order to reduce the
size of old logfiles.
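
A rough sketch of that abbreviation filter in Perl (the dictionary
filename is made up, and the CLF parsing is crude); a companion filter
would read the dictionary back in to expand the abbreviated log to CLF:

#!/usr/bin/perl
# Sketch: replace each unique client address and URL in a CLF stream with
# a short numeric key, and emit a dictionary that a companion filter can
# use to expand the log back to CLF later.  Filenames are hypothetical.

my (%host_key, %url_key);
open(DICT, "> access_log.dict") or die "open dict: $!";

while (my $line = <STDIN>) {
    chomp $line;
    # CLF: host ident user [date] "METHOD url PROTO" status bytes
    if ($line =~ /^(\S+)(.*?")(\S+) (\S+)( \S+".*)$/) {
        my ($host, $mid, $method, $url, $rest) = ($1, $2, $3, $4, $5);
        unless (exists $host_key{$host}) {
            $host_key{$host} = "h" . keys(%host_key);
            print DICT "$host_key{$host} $host\n";
        }
        unless (exists $url_key{$url}) {
            $url_key{$url} = "u" . keys(%url_key);
            print DICT "$url_key{$url} $url\n";
        }
        print "$host_key{$host}$mid$method $url_key{$url}$rest\n";
    } else {
        print "$line\n";    # pass through anything we don't recognize
    }
}
close(DICT);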


--
Rob Hartill
http://nqcd.lanl.gov/~hartill/
Re: Log file and databases
On Wed, 10 May 1995, Randy Terbush wrote:
> > Hmm... I guess I'm having trouble figuring out why you want to use a DBM
> > file then, as all the DBM file you're creating will be good for is
> > looking up accesses on a particular second.
>
> True, but I figured I could query accesses in a specific time range,
> then pull out of that the information I want regarding data transfer, etc.
> I am under the impression that I can get this information out of a DBM
> format faster than I can from a *large* flat file.

You'd have to search for every second (or whatever granularity you're
using) within that time range, as there's no sorting implied on the
keys...

Could you provide pointers to Msql and Postgres95?

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/
Re: Log file and databases
> Could you provide pointers to Msql and Postgres95?
>

Msql is available at: ftp://bond.edu.au/pub/Minerva/msql/

As for Postgres95, I can push a copy over to hyperreal later
if you're interested. The site where I got it offered it
temporarily.