Mailing List Archive

LogFormat in analog.cfg broken
Hi folks:

I could use some help figuring out the correct syntax for a new
LogFormat line in my analog.cfg file after changing the LogFormat
settings for Apache a few weeks ago.

In my httpd.conf I have the following line:

LogFormat "%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

In analog.cfg I use:

LOGFORMAT (%h "%u" [%d/%M/%Y:%h:%n:%j] "%f" %c %b)

which doesn't work (resulting in corrupt logfile lines)

Thanks!

tallbiker66



+------------------------------------------------------------------------
| TO UNSUBSCRIBE from this list:
| http://lists.meer.net/mailman/listinfo/analog-help
|
| Analog Documentation: http://analog.cx/docs/Readme.html
| List archives: http://www.analog.cx/docs/mailing.html#listarchives
| Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
+------------------------------------------------------------------------
Re: LogFormat in analog.cfg broken
Ulf Hofemeier <ulf@ladb.unm.edu> wrote:
> Hi folks:
>
> I could use some help figuring out the correct syntax for a new
> LogFormat line in my analog.cfg file after changing the LogFormat
> settings for Apache a few weeks ago.
>
> In my httpd.conf I have the following line:
>
> LogFormat "%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\"
> \"%{User-Agent}i\"" combined
>
> In analog.cfg I use:
>
> LOGFORMAT (%h "%u" [%d/%M/%Y:%h:%n:%j] "%f" %c %b)
>
> which doesn't work (resulting in corrupt logfile lines)


Have you tried just using your Apache command directly?

APACHELOGFORMAT ("%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"")

If that doesn't work, can you post a couple of actual lines from your logfile? Your Apache statement has bytes before the Referrer, but your Analog statement has the bytes at the end, and doesn't even mention the request.

Aengus

Re: LogFormat in analog.cfg broken
Quoting Aengus <analog07@eircom.net>:

> Ulf Hofemeier <ulf@ladb.unm.edu> wrote:
>> Hi folks:
>>
>> I could use some help figuring out the correct syntax for a new
>> LogFormat line in my analog.cfg file after changing the LogFormat
>> settings for Apache a few weeks ago.
>>
>> In my httpd.conf I have the following line:
>>
>> LogFormat "%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\"
>> \"%{User-Agent}i\"" combined
>>
>> In analog.cfg I use:
>>
>> LOGFORMAT (%h "%u" [%d/%M/%Y:%h:%n:%j] "%f" %c %b)
>>
>> which doesn't work (resulting in corrupt logfile lines)
>
>
> Have you tried just using your Apache command directly?
>
> APACHELOGFORMAT ("%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\"
> \"%{User-Agent}i\"")
>
> If that doesn't work, can you post a couple of actual lines from
> your logfile? Your Apache statement has bytes before the Referrer,
> but your Analog statement has the bytes at the end, and doesn't even
> mention the request.
>
> Aengus


Hi Aengus:

I tried APACHELOGFORMAT in my analog.cfg without success (still
corrupt logfile lines).

My Apache access_log looks like this:

1.2.3.4 - "-" [02/Oct/2008:09:31:33 -0600] "GET /images/black.gif HTTP/1.1" 200 43 "http://www.unm.edu/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3"
1.2.3.4 - "-" [02/Oct/2008:09:31:33 -0600] "GET /images/white.gif HTTP/1.1" 200 43 "http://www.unm.edu/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3"

Thank you for helping me.

Re: LogFormat in analog.cfg broken
Ulf Hofemeier <ulf@ladb.unm.edu> wrote:
> Quoting Aengus <analog07@eircom.net>:
>
>> Ulf Hofemeier <ulf@ladb.unm.edu> wrote:
>>> Hi folks:
>>>
>>> I could use some help figuring out the correct syntax for a new
>>> LogFormat line in my analog.cfg file after changing the LogFormat
>>> settings for Apache a few weeks ago.
>>>
>>> In my httpd.conf I have the following line:
>>>
>>> LogFormat "%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\"
>>> \"%{User-Agent}i\"" combined
>>>
>>
>> Have you tried just using your Apache command directly?
>>
>> APACHELOGFORMAT ("%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\"
>> \"%{User-Agent}i\"")
>>
>> Aengus
>
>
> Hi Aengus:
>
> I tried APACHELOGFORMAT in my analog.cfg without success (still
> corrupt logfile lines).
>
> My Apache access_log looks like this:
>
> 1.2.3.4 - "-" [02/Oct/2008:09:31:33 -0600] "GET /images/black.gif
> HTTP/1.1" 200 43 "http://www.unm.edu/" "Mozilla/5.0 (Windows; U;
> Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3"
> 1.2.3.4 - "-" [02/Oct/2008:09:31:33 -0600] "GET /images/white.gif
> HTTP/1.1" 200 43 "http://www.unm.edu/" "Mozilla/5.0 (Windows; U;
> Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3"

My mistake - I forgot to strip the double-quotes from the start and end of your line from the Apache entry.

APACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\")

This works for the sample lines that you posted.

Aengus

Re: LogFormat in analog.cfg broken
Quoting Aengus <analog07@eircom.net>:

> Ulf Hofemeier <ulf@ladb.unm.edu> wrote:
>> Quoting Aengus <analog07@eircom.net>:
>>
>>> Ulf Hofemeier <ulf@ladb.unm.edu> wrote:
>>>> Hi folks:
>>>>
>>>> I could use some help figuring out the correct syntax for a new
>>>> LogFormat line in my analog.cfg file after changing the LogFormat
>>>> settings for Apache a few weeks ago.
>>>>
>>>> In my httpd.conf I have the following line:
>>>>
>>>> LogFormat "%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\"
>>>> \"%{User-Agent}i\"" combined
>>>>
>>>
>>> Have you tried just using your Apache command directly?
>>>
>>> APACHELOGFORMAT ("%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\"
>>> \"%{User-Agent}i\"")
>>>
>>> Aengus
>>
>>
>> Hi Aengus:
>>
>> I tried APACHELOGFORMAT in my analog.cfg without success (still
>> corrupt logfile lines).
>>
>> My Apache access_log looks like this:
>>
>> 1.2.3.4 - "-" [02/Oct/2008:09:31:33 -0600] "GET /images/black.gif
>> HTTP/1.1" 200 43 "http://www.unm.edu/" "Mozilla/5.0 (Windows; U;
>> Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3"
>> 1.2.3.4 - "-" [02/Oct/2008:09:31:33 -0600] "GET /images/white.gif
>> HTTP/1.1" 200 43 "http://www.unm.edu/" "Mozilla/5.0 (Windows; U;
>> Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3"
>
> My mistake - I forgot to strip the double-quotes from the start and
> end of your line from the Apache entry.
>
> APACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\"
> \"%{User-Agent}i\")
>
> This works for the sample lines that you posted.
>
> Aengus

I have to admit that the analog configuration is getting quite
confusing to me. Unfortunately the APACHELOGFORMAT line doesn't solve
my problem, so please allow me to provide you with a little more
information regarding the purpose of the updated analog.cfg, as well
as what I'm doing before the problem occurs.

1. Copy the previous month Apache log to a temporary location
2. Run a script to extract page visitor data from the general Apache
log file and store it in a separate file
3. Run a bash 'for i' loop on the new log files and store the data in
page visitor subdirectories

Unfortunately I decided that Apache has to write more information to
its access_log log file, which is ultimately the reason why there are
issues with analog now. According to the analog documentation there is
a way to set up a hierarchy so that it will understand a log file
syntax even if it changes from old to new over time, but I haven't
been able to figure out how to make it work.

The question is how do I let analog know that the log file syntax for
the reports has changed over time. If it's not possible, I can
probably incorporate an if statement in the scripts that checks for
the date the change occurs and feeds analog a new configuration file
that will work, but I would love to avoid that.
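A minimal sketch of that fallback, for illustration only: it picks a configuration file from the month in the logfile name. The cutoff month and both config file names are hypothetical placeholders, not values from this thread. It also shows the weakness of the approach: a monthly file in which the format changed partway through would contain both formats, and a per-file switch cannot handle that.

```shell
#!/bin/sh
# Pick an Analog config file based on the month encoded in a logfile name.
# CUTOFF and the two config names below are placeholders (assumptions).
CUTOFF=200809

cfg_for_month() {
    month=$1                      # e.g. 2008-06
    num=${month%-*}${month#*-}    # 2008-06 -> 200806
    if [ "$num" -lt "$CUTOFF" ]; then
        echo analog-old.cfg
    else
        echo analog-new.cfg
    fi
}

# Possible call (not run here), with +g naming the config file:
# /data/stats/analog/analog -G +g"$(cfg_for_month 2008-06)" access_log.2008-06.gz
```

The month is turned into a plain number so that an integer comparison is enough to decide which side of the cutoff a file falls on.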

My old Apache log file lines look like this:

1.2.3.4 - "-" [19/Jun/2008:23:36:03 +0000] "GET /images/black.gif HTTP/1.1" 200 43
1.2.3.4 - "-" [19/Jun/2008:23:36:03 +0000] "GET /images/white.gif HTTP/1.1" 200 43
1.2.3.4 - "-" [19/Jun/2008:23:36:03 +0000] "GET /images/headlines.gif HTTP/1.1" 200 4352
1.2.3.4 - "-" [19/Jun/2008:23:36:03 +0000] "GET /images/gray.gif HTTP/1.1" 200 43
1.2.3.4 - "-" [19/Jun/2008:23:36:03 +0000] "GET /images/red.gif HTTP/1.1" 200 45

My new Apache log file lines look like this:

1.2.3.4 - "-" [10/Sep/2008:15:12:11 -0600] "GET /images/header.jpg HTTP/1.1" 200 38750 "http://ladb.unm.edu/" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16"
1.2.3.4 - "-" [10/Sep/2008:15:12:12 -0600] "GET /images/black.gif HTTP/1.1" 200 43 "http://ladb.unm.edu/" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16"
1.2.3.4 - "-" [10/Sep/2008:15:12:12 -0600] "GET /images/white.gif HTTP/1.1" 200 43 "http://ladb.unm.edu/" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16"

My analog.cfg looks like this:

APACHELOGFORMAT (%S %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\")

DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b)
DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%r" %c %b)

LOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
LOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b)
LOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%r" %c %b)
LOGFORMAT (%S %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\")

DEBUG ON

LOGFILE log-????-??.gz

# OUTFILE Report.html

REQINCLUDE pages


Thank you.

Re: LogFormat in analog.cfg broken
Ulf Hofemeier <ulf@ladb.unm.edu> wrote:

> I have to admit that the analog configuration is getting quite
> confusing to me. Unfortunately the APACHELOGFORMAT line doesn't solve
> my problem, so please allow me to provide you with a little more
> information regarding the purpose of the updated analog.cfg, as well
> as what I'm doing before the problem occurs.
>
> 1. Copy the previous month Apache log to a temporary location
> 2. Run a script to extract page visitor data from the general Apache
> log file and store it in a separate file
> 3. Run a bash 'for i' loop on the new log files and store the data in
> page visitor subdirectories
>
> Unfortunately I decided that Apache has to write more information to
> its access_log log file, which is ultimately the reason why there are
> issues with analog now. According to the analog documentation there is
> a way to set up a hierarchy so that it will understand a log file
> syntax even if it changes from old to new over time, but I haven't
> been able to figure out how to make it work.

If you have multiple LOGFORMAT statements, Analog will try them each in turn until it finds one that matches the entries in each of your logfiles. That means that if you have multiple logfiles, and they aren't all the same format, Analog can still create a single report from these different logfiles. Obviously the report may understate items that weren't recorded in some of the logfiles - for example, you might have a million requests, but only 200,000 Browser strings if you only added that field in later log files.

LOGFORMAT commands apply to LOGFILEs that are specified after the LOGFORMAT in the .cfg file. DEFAULTLOGFORMAT commands apply to logfiles that are specified on the command line.

It's not clear from your description whether your script calls Analog and passes it the name of the logfile as a parameter, or whether Analog picks up the logfile from the LOGFILE log-????-??.gz statement in your .cfg file.

If you're specifying the LOGFILEs in the .cfg file, then these lines should do the job:
APACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b)
APACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\")


If you're calling Analog with the logfiles specified on the command line, then these lines should work:
DEFAULTAPACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b)
DEFAULTAPACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\")

Aengus

Re: LogFormat in analog.cfg broken
Quoting Aengus <analog07@eircom.net>:

> Ulf Hofemeier <ulf@ladb.unm.edu> wrote:
>
>> I have to admit that the analog configuration is getting quite
>> confusing to me. Unfortunately the APACHELOGFORMAT line doesn't solve
>> my problem, so please allow me to provide you with a little more
>> information regarding the purpose of the updated analog.cfg, as well
>> as what I'm doing before the problem occurs.
>>
>> 1. Copy the previous month Apache log to a temporary location
>> 2. Run a script to extract page visitor data from the general Apache
>> log file and store it in a separate file
>> 3. Run a bash 'for i' loop on the new log files and store the data in
>> page visitor subdirectories
>>
>> Unfortunately I decided that Apache has to write more information to
>> its access_log log file, which is ultimately the reason why there are
>> issues with analog now. According to the analog documentation there is
>> a way to set up a hierarchy so that it will understand a log file
>> syntax even if it changes from old to new over time, but I haven't
>> been able to figure out how to make it work.
>
> If you have multiple LOGFORMAT statements, Analog will try them each
> in turn until it finds one that matches the entries in each of your
> logfiles. That means that if you have multiple logfiles, and they
> aren't all the same format, Analog can still create a single report
> from these different logfiles. Obviously the report may understate
> items that weren't recorded in some of the logfiles - for
> example, you might have a million requests, but only 200,000
> Browser strings if you only added that field in later log files.
>
> LOGFORMAT commands apply to LOGFILEs that are specified after the
> LOGFORMAT in the .cfg file. DEFAULTLOGFORMAT commands apply to
> logfiles that are specified on the command line.
>
> It's not clear from your description whether your script calls
> Analog and passes it the name of the logfile as a parameter, or
> whether Analog picks up the logfile from the LOGFILE log-????-??.gz
> statement in your .cfg file.
>
> If you're specifying the LOGFILEs in the .cfg file, then these lines
> should do the job:
> APACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b)
> APACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\"
> \"%{User-Agent}i\")

My script calls analog like this:

cd $datadir/$domain

# Determine the range of months from the list of log files that were not empty.
first=`ls log*|sort|head -1|cut -b5-`
first=`echo $first |rev |cut -b4- |rev` ; # YEAR-MO
last=`ls access_log*|sort|tail -1|cut -b12-`
last=`echo $last |rev |cut -b4- |rev` ; # YEAR-MO
range="$first--$last"; # YEAR-MO--YEAR-MO

# Collect summary information from all the log files.
/data/stats/analog/analog access_log.????-??.gz > $analogdir/$domain/$range.html

# Collect information by month in separate files.
for i in access_log.????-??.gz ;
do
file=`echo $i |cut -b12-` # YEAR-MO.gz
file=`echo $file |rev |cut -b4- |rev` ; # YEAR-MO
/data/stats/analog/analog $i > $analogdir/$domain/$file.html
done

So I pass the log file to analog as a parameter on the command line
rather than using analog.cfg.
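As an aside, the YEAR-MO extraction in the loop above could probably be done with shell parameter expansion instead of the cut/rev chains. This is only a sketch: the helper name month_of is made up for illustration, and it assumes every file is named access_log.YEAR-MO.gz as in the loop.

```shell
#!/bin/sh
# Strip the fixed prefix and the .gz suffix from a logfile name.
# Assumes names of the form access_log.YEAR-MO.gz (an assumption).
month_of() {
    name=${1##*/}              # drop any leading directory
    name=${name#access_log.}   # drop the prefix
    printf '%s\n' "${name%.gz}"
}

# Show which report file each monthly log would map to.
for i in access_log.????-??.gz; do
    [ -e "$i" ] || continue    # skip the unexpanded pattern when nothing matches
    echo "$i -> $(month_of "$i").html"
done
```

Working from the name rather than from byte offsets means the result does not depend on the length of the prefix, which is where the log* and access_log* lines in the script differ.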

> If you're calling Analog with the logfiles specified on the command
> line, then these lines should work:
> DEFAULTAPACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b)
> DEFAULTAPACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b
> \"%{Referer}i\" \"%{User-Agent}i\")
>

I will give these two lines a try in my analog.cfg.

Thank you.

Re: LogFormat in analog.cfg broken
Quoting "Ulf Hofemeier" <ulf@ladb.unm.edu>:

> Quoting Aengus <analog07@eircom.net>:
>
>> Ulf Hofemeier <ulf@ladb.unm.edu> wrote:
>>
>>> I have to admit that the analog configuration is getting quite
>>> confusing to me. Unfortunately the APACHELOGFORMAT line doesn't solve
>>> my problem, so please allow me to provide you with a little more
>>> information regarding the purpose of the updated analog.cfg, as well
>>> as what I'm doing before the problem occurs.
>>>
>>> 1. Copy the previous month Apache log to a temporary location
>>> 2. Run a script to extract page visitor data from the general Apache
>>> log file and store it in a separate file
>>> 3. Run a bash 'for i' loop on the new log files and store the data in
>>> page visitor subdirectories
>>>
>>> Unfortunately I decided that Apache has to write more information to
>>> its access_log log file, which is ultimately the reason why there are
>>> issues with analog now. According to the analog documentation there is
>>> a way to set up a hierarchy so that it will understand a log file
>>> syntax even if it changes from old to new over time, but I haven't
>>> been able to figure out how to make it work.
>>
>> If you have multiple LOGFORMAT statements, Analog will try them
>> each in turn until it finds one that matches the entries in each of
>> your logfiles. That means that if you have multiple logfiles, and
>> they aren't all the same format, Analog can still create a single
>> report from these different logfiles. Obviously the report may
>> understate items that weren't recorded in some of the logfiles
>> - for example, you might have a million requests, but only
>> 200,000 Browser strings if you only added that field in later log
>> files.
>>
>> LOGFORMAT commands apply to LOGFILEs that are specified after the
>> LOGFORMAT in the .cfg file. DEFAULTLOGFORMAT commands apply to
>> logfiles that are specified on the command line.
>>
>> It's not clear from your description whether your script calls
>> Analog and passes it the name of the logfile as a parameter, or
>> whether Analog picks up the logfile from the LOGFILE log-????-??.gz
>> statement in your .cfg file.
>>
>> If you're specifying the LOGFILEs in the .cfg file, then these lines
>> should do the job:
>> APACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b)
>> APACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\"
>> \"%{User-Agent}i\")
>
> My script calls analog like this:
>
> cd $datadir/$domain
>
> # Determine the range of months from the list of log files that were
> not empty.
> first=`ls log*|sort|head -1|cut -b5-`
> first=`echo $first |rev |cut -b4- |rev` ; # YEAR-MO
> last=`ls access_log*|sort|tail -1|cut -b12-`
> last=`echo $last |rev |cut -b4- |rev` ; # YEAR-MO
> range="$first--$last"; # YEAR-MO--YEAR-MO
>
> # Collect summary information from all the log files.
> /data/stats/analog/analog access_log.????-??.gz >
> $analogdir/$domain/$range.html
>
> # Collect information by month in separate files.
> for i in access_log.????-??.gz ;
> do
> file=`echo $i |cut -b12-` # YEAR-MO.gz
> file=`echo $file |rev |cut -b4- |rev` ; # YEAR-MO
> /data/stats/analog/analog $i > $analogdir/$domain/$file.html
> done
>
> So I pass the log file to analog as a parameter on the command line
> rather than using analog.cfg.
>
>> If you're calling Analog with the logfiles specified on the command
>> line, then these lines should work:
>> DEFAULTAPACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b)
>> DEFAULTAPACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b
>> \"%{Referer}i\" \"%{User-Agent}i\")
>>
>
> I will give these two lines a try in my analog.cfg.
>
> Thank you.

Adding DEFAULTAPACHELOGFORMAT didn't work (or my analog.cfg is still broken)

analog.cfg looks like this:

# If you need a LOGFORMAT command (most people don't -- try it without first!),
# it must go here, above the LOGFILE commands.
APACHELOGFORMAT (%S %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\")

DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b)
DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%r" %c %b)

DEFAULTAPACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b)
DEFAULTAPACHELOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\")

LOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
LOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b)
LOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%r" %c %b)
LOGFORMAT (%S %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\")

DEBUG ON

LOGFILE log-????-??.gz

The output after running analog on the latest apache log file looks like this:

/data/stats/analog/analog -G +g../../analog.cfg access_log.2008-09.gz >test.html

C: 1.2.3.4 - "-" [17/Sep/2008:13:12:52 -0600] "GET
/images/sandiamountains.jpg HTTP/1.1" 200 61671 "http://ladb.unm.edu/"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13
(KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13"
C:
*
C: 1.2.3.4 - "-" [17/Sep/2008:13:12:53 -0600] "GET /favicon.ico
HTTP/1.1" 200 437 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)
AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13"
C:
*
C: 1.2.3.4 - "-" [27/Sep/2008:17:55:25 -0600] "GET / HTTP/1.1" 200
6576
"http://www.google.com/search?q=latin+america+databases&ie=utf-8&oe=utf-8&aq=t&rls=com.yahoo:en-US:official&client=firefox" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 YFF3
Firefox/3.0.1"
C:
*
F: Closing logfile access_log.2008-09.gz
S: Successful requests: 0
S: Redirected requests: 0
S: Failed requests: 0
S: Requests returning informational status code: 0
S: Status code not given: 0
S: Unwanted lines: 0
S: Corrupt lines: 23
/data/stats/analog/analog: Warning L: Large number of corrupt lines in logfile
access_log.2008-09.gz: turn debugging on or try different LOGFORMAT
Current logfile format:
%S %j "%j" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b\n
%S %j "%j" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b\n
%S %j "%j" [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b\n
%S %j "%j" [%d/%M/%Y:%h:%n:%j] "%r" %c %b\n
F: Opening stdout as output file
F: Opening /data/stats/analog/requireanalogheader as header file
F: Closing header file /data/stats/analog/requireanalogheader
F: Opening /data/stats/analog/requireanalogfooter as footer file
F: Closing footer file /data/stats/analog/requireanalogfooter


Re: LogFormat in analog.cfg broken
Ulf Hofemeier <ulf@ladb.unm.edu> wrote:

> DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
> DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
> DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b)
> DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%r" %c %b)

Are these lines in your .cfg file for a reason?

Aengus

Re: LogFormat in analog.cfg broken
I inherited the ancient analog.cfg configuration file and the scripts
that trigger analog every month, so my answer is that the
DEFAULTLOGFORMAT lines are supposed to cover every Apache access_log
version there was going back to 2003. Since I learned from you that
DEFAULTLOGFORMAT is what analog uses for log files that are handed
over as a parameter on the command line, they should work fine. Analog
did run without reporting corrupt log file lines, at least until I
changed the Apache LogFormat output.



Quoting Aengus <analog07@eircom.net>:

> Ulf Hofemeier <ulf@ladb.unm.edu> wrote:
>
>> DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
>> DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
>> DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r" %c %b)
>> DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%r" %c %b)
>
> Are these lines in your .cfg file for a reason?
>
> Aengus
>


Re: LogFormat in analog.cfg broken
Ulf Hofemeier <ulf@ladb.unm.edu> wrote:
> I inherited the ancient analog.cfg configuration file and the scripts
> that trigger analog every month so my answer is that the
> DEFAULTLOGFORMAT lines are supposed to cover every Apache access_log
> version there was going back to 2003. As I learned from you that
> DEFAULTLOGFORMAT is what analog uses for log files that are handed
> over as a parameter on the command line they should work fine. Analog
> did run without reporting corrupt log file lines until I changed the
> Apache logformat output to something else at least.

I just checked the documentation, and the correct directive is
APACHEDEFAULTLOGFORMAT, rather than DEFAULTAPACHELOGFORMAT.

APACHELOGFORMAT is meant to be a convenience for those who have Apache configured with custom log formats: instead of you translating the Apache configuration into Analog syntax, Analog will do it for you (most of the time - some complex Apache statements won't translate). It's probably worth taking a few minutes to look at the LOGFORMAT documentation to see how a LOGFORMAT is built. It's a fairly straightforward substitution of letter codes for fields (%S for IP address, %b for bytes, %B for Browser, %c for status code, etc.), so taking this line as an example:

1.2.3.4 - "-" [17/Sep/2008:13:12:52 -0600] "GET /images/sandiamountains.jpg HTTP/1.1" 200 61671 "http://ladb.unm.edu/" "Mozilla/5.0 ..."

1.2.3.4 is %S.

"GET /images/sandiamountains.jpg HTTP/1.1" is "%j %r %j"
(I don't care about the GET or the HTTP/1.1, so they are coded as %j for junk).

The timestamp has a day (%d), month (%M), 4-digit year (%Y), hour (%h), minutes (%n) and more junk (seconds and GMT offset), so
[17/Sep/2008:13:12:52 -0600] is coded as [%d/%M/%Y:%h:%n:%j]

Note that case and spacing are important.

Put it all together, and you end up with a LOGFORMAT like this:

%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b "%f" "%B"
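To sanity-check a format against real log lines outside Analog, the field-by-field mapping above can be approximated with regular expressions. This is a rough sketch only: the regexes are hand-written stand-ins for the two formats, Analog's own matching rules may well differ, and the function name classify_line is made up for illustration.

```shell
#!/bin/sh
# Approximate regex for: %S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b
# (a hand-written approximation, not Analog's actual matcher)
base_re='^[^ ]+ [^ ]+ "[^"]*" \[[0-9]{1,2}/[A-Za-z]{3}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[^]]*\] "[^ ]+ [^ ]+ [^"]*" [0-9]{3} [0-9-]+'

old_re="${base_re}\$"                        # line ends after the byte count
new_re="${base_re} \"[^\"]*\" \"[^\"]*\"\$"  # followed by "%f" "%B"

# Print "new", "old", or "corrupt" for a single log line.
classify_line() {
    if printf '%s\n' "$1" | grep -Eq "$new_re"; then
        echo new
    elif printf '%s\n' "$1" | grep -Eq "$old_re"; then
        echo old
    else
        echo corrupt
    fi
}

# Possible use on a real file (not run here):
# zcat access_log.2008-09.gz | while IFS= read -r line; do
#     classify_line "$line"
# done | sort | uniq -c
```

Lines that come out as "corrupt" here are the ones to compare against the format character by character, since a single misplaced space or quote is enough to break the match.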

Your existing .cfg file uses "analog syntax", not Apache syntax, so when you added two extra fields to your logfile (referrer and User Agent), you could have just copied the existing entries and added "%f" "%B" to the end (though there's a lot of redundancy in your existing setup - you really only need the first of the 4 DEFAULTLOGFORMAT lines).

Or you could copy the modified LogFormat command from your httpd.conf file and add it to the analog.cfg file with

APACHEDEFAULTLOGFORMAT (%h %l \"%u\" %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\")


Based on what you've posted, you only need


# If you need a LOGFORMAT command (most people don't -- try it without first!),
# it must go here, above the LOGFILE commands.

DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b)
DEFAULTLOGFORMAT (%S %j "%u" [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b "%f" "%B")
DEBUG ON

You should be able to delete the other 12 lines from your .cfg file, as they don't appear to be doing anything useful. (I'd be particularly concerned about that LOGFILE line - are you counting both access_log.????-??.gz and log-????-??.gz?)

Aengus

Re: LogFormat in analog.cfg broken
2008/10/6 Ulf Hofemeier <ulf@ladb.unm.edu>:
> I inherited the ancient analog.cfg configuration file and the scripts that
> trigger analog every month so my answer is that the DEFAULTLOGFORMAT lines
> are supposed to cover every Apache access_log version there was going back
> to 2003. As I learned from you that DEFAULTLOGFORMAT is what analog uses for
> log files that are handed over as a parameter on the command line they
> should work fine. Analog did run without reporting corrupt log file lines
> until I changed the Apache logformat output to something else at least.
>

There's no point in having DEFAULTLOGFORMATs and LOGFORMATs. Well,
unless you have some logfiles on the command line and some in the
config file. DEFAULTLOGFORMAT is what it uses if there is no
LOGFORMAT.
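A quick way to see which family of format commands a config file actually contains is to count them. This is a sketch under assumptions: the function name format_summary is invented for illustration, and it assumes the commands are written in upper case at the start of a line, as in the configs posted in this thread.

```shell
#!/bin/sh
# Count the log-format commands in an Analog configuration file.
# Usage: format_summary path/to/analog.cfg
format_summary() {
    cfg=$1
    # LOGFORMAT / APACHELOGFORMAT: for LOGFILE commands in the config file.
    in_cfg=$(grep -Ec '^[[:space:]]*(APACHE)?LOGFORMAT[[:space:]]' "$cfg" || true)
    # DEFAULTLOGFORMAT / APACHEDEFAULTLOGFORMAT: for logfiles on the command line.
    on_cli=$(grep -Ec '^[[:space:]]*(APACHE)?DEFAULTLOGFORMAT[[:space:]]' "$cfg" || true)
    echo "LOGFORMAT-style: $in_cfg, DEFAULTLOGFORMAT-style: $on_cli"
}
```

If both counts are non-zero and every logfile is passed on the command line, the LOGFORMAT-style lines are probably dead weight. A misspelled command such as DEFAULTAPACHELOGFORMAT is counted in neither group, which makes it easy to spot.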

--
Stephen Turner