Mailing List Archive

Re: Need help to retrieve (and correct) all reports (need help on LOGFORMAT)
Leung, Michael <Michael.Leung@networksolutions.com> wrote:
> Aengus,
>
> The below is what we see for the Domain Report, but it is not what we
> are expecting.
>
> Listing domains, sorted by the amount of traffic.
>
> reqs %bytes domain
> 655193 100% [unresolved numerical addresses]
>
> Even if it is entirely based on IP numbers, I should see a list of
> several IP addresses instead of what we have now.

My mistake - it's actually the Organization Report that shows a breakdown by IP address when DNS resolution isn't enabled. The Domain Report reports on Top Level Domains (.com, .org, .co.uk, etc.), so it requires host names, not IP numbers.

The Organization Report lists the organizations (companies, institutions, ISPs, etc.) that the IP addresses are registered to. When you only have IP numbers, the Organization Report basically breaks the addresses down by "Class", so anything from 12.x.y.z will be listed under 12 (a Class A address), but higher addresses will typically be listed with 2 or more octets.
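A minimal Python sketch of that grouping follows. It assumes a simplified rule: keep one octet for Class A addresses (first octet 0-127) and two octets for everything else. This is not Analog's code, and Analog's real rules may use more octets for some ranges. The names `org_prefix` and `org_report` are hypothetical. Under this rule, 71.76.205.178 is counted under "71" together with every other 71.x.y.z address. That matches the shape of the Organization Report listing later in this thread (71, 76, 205.178, 210.5, 12).

```python
from collections import Counter
from typing import Iterable


def org_prefix(ip: str) -> str:
    """Approximate Organization Report key for a dotted-quad IPv4 address.

    Assumed rule (not Analog's actual code): keep one octet for Class A
    addresses (first octet 0-127) and two octets for everything else.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    keep = 1 if int(octets[0]) < 128 else 2
    return ".".join(octets[:keep])


def org_report(ips: Iterable[str]) -> Counter:
    """Tally requests per approximate organization prefix."""
    return Counter(org_prefix(ip) for ip in ips)
```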

> But when I am using the above, instead of letting analog use its
> auto-detection, I got the following error message in the output:
>
> analog: Warning L: Large number of corrupt lines in logfile
> /source_data1/weblog/datafiles/1.log: turn debugging on or try
> different LOGFORMAT
> (For help on all errors and warnings, see docs/errors.html)
> Current logfile format:
> %S - %j [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b "%f" "%A" "%j" "%j" "-"\n
>
>
> What does it mean? Does it mean that I should use this suggested format?

It means that not all of the lines in your logfile match the LOGFORMAT that you told Analog to use. At a guess, the last field in your log lines isn't always "-", so you might just want:

LOGFORMAT (%S - %j [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b "%f" "%A" %j)
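One way to check the guess that the last field isn't always "-" is to tally the values of the final quoted field across the logfile. The Python sketch below assumes that field is the last double-quoted string on each line. `last_field_counts` is a hypothetical helper and is not part of Analog.

```python
import re
import sys
from collections import Counter

# The last double-quoted field at the end of a line (trailing whitespace allowed).
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def last_field_counts(lines):
    """Count the values of the final quoted field across log lines."""
    counts = Counter()
    for line in lines:
        m = LAST_QUOTED.search(line)
        counts[m.group(1) if m else "<no quoted field>"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python lastfield.py /source_data1/weblog/datafiles/1.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts = last_field_counts(f)
    total = sum(counts.values())
    print(f'{counts.get("-", 0)} of {total} lines end with "-"')
    for value, n in counts.most_common(10):
        print(f"{n:>8}  {value}")
```

Suppose most lines end with "-" but a noticeable number end with something else. That would support ending the LOGFORMAT with a plain %j, as suggested above.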

> The Domain report is one issue. And then, some of the "search"
> reports are turn off.
>
> analog: Warning R: Turning off empty Search Query Report
> analog: Warning R: Turning off empty Search Word Report
> analog: Warning R: Turning off empty Internal Search Query Report
> analog: Warning R: Turning off empty Internal Search Word Report
>
> how do I verify if we have any data for these reports?

The Search Word and Query Reports rely on the Referrer field, and on the relevant search engine being defined in your analog.cfg (there are a couple of dozen of the more common ones listed in the default analog.cfg).

If you have any referrers from Google or Yahoo, then your Search Word Reports should not be empty.

The Internal Search Reports need you to define a particular URL on your web server as a "search engine", and specify which field in the Query String is the search term. The Internal Search Engine is not defined by default, so its reports will always be empty unless you've defined an Internal Search Engine.
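As a rough illustration of what those definitions do, the Python sketch below pulls the search query out of a URL. It assumes an engine is identified by a host substring, a path prefix and a query-string parameter. The engine tables are hypothetical examples and are not copied from the default analog.cfg:

- Google's "q" and Yahoo's "p" cover external referrers.
- An assumed site search at /search with parameter "q" covers the internal case.

External engines are matched against the referrer, and the internal engine is matched against the requested URL.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical engine tables: (host substring, path prefix, query parameter).
EXTERNAL_ENGINES = [("google.", "/", "q"), ("yahoo.", "/", "p")]
# Assumed internal site search at /search?q=...; request URLs have no host part.
INTERNAL_ENGINES = [("", "/search", "q")]


def search_query(url, engines):
    """Return the search query carried by url, or None if no engine matches."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    for host_part, path_prefix, param in engines:
        if host_part in host and parts.path.startswith(path_prefix):
            values = parse_qs(parts.query).get(param)
            if values:
                return values[0]  # parse_qs already decodes "+" and %XX
    return None


def search_words(query):
    """Split a query into lower-case words (a rough stand-in for word counting)."""
    return query.lower().split()
```

A referrer such as https://www.networksolutions.com/manage-it/private-registration-splash.jsp matches no engine in this table, so under these assumptions it would not contribute to the Search Word Report.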

Aengus

+------------------------------------------------------------------------
| TO UNSUBSCRIBE from this list:
| http://lists.meer.net/mailman/listinfo/analog-help
|
| Analog Documentation: http://analog.cx/docs/Readme.html
| List archives: http://www.analog.cx/docs/mailing.html#listarchives
| Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
+------------------------------------------------------------------------
RE: Need help to retrieve (and correct) reports (need help on LOGFORMAT)
Aengus,

When I used your suggestion:

LOGFORMAT (%S - %j [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b "%f" "%A" %j)

It is working fine. At least, analog doesn't complain anymore.



From what I can tell, neither the Domain Report nor the Organization Report
gives meaningful results yet.


Domain Report
-------------
This report lists the countries of the computers which requested files.

Listing domains, sorted by the amount of traffic.

reqs: %bytes: domain
------: ------: ------
806393: 100%: [unresolved numerical addresses]
------------------------------------------------------------------------

Organization Report
-------------------
This report lists the organizations of the computers which requested files.

Listing the top 20 organizations by the number of requests, sorted by the number of requests.

reqs: %bytes: organization
------: ------: ------------
57717: 6.97%: 71
46858: 6.41%: 76
46018: 4.81%: 205.178
35415: 4.90%: 72
34912: 4.33%: 69
34499: 4.33%: 75
30822: 3.55%: 70
30050: 3.93%: 98
20957: 2.62%: 74
15863: 1.90%: 210.5
13839: 1.74%: 99
12744: 1.51%: 96
6666: 0.98%: 12


So, how do we resolve the issue for both Domain and Organization
Reports?




>The Search Word and Query Reports rely on the Referrer field, and on the
>relevant search engine being defined in your analog.cfg (there are a
>couple of dozen of the more common ones listed in the default
>analog.cfg).

>If you have any referrers from Google or Yahoo, then your Search Word
>Reports should not be empty.

>The Internal Search Reports need you to define a particular URL on your
>web server as a "search engine", and specify which field in the Query
>String is the search term. The Internal Search Engine is not defined by
>default, so its reports will always be empty unless you've defined an
>Internal Search Engine.


Based on my LOGFORMAT, my Referrer field shouldn't be empty. Right?



Thanks

Michael





-----Original Message-----
From: analog-help-bounces@lists.meer.net
[mailto:analog-help-bounces@lists.meer.net] On Behalf Of Aengus
Sent: Monday, December 01, 2008 4:53 PM
To: Support for analog web log analyzer
Subject: Re: [analog-help] Need help to retrieve (and correct) all reports (need help on LOGFORMAT)

RE: Need help to retrieve (and correct) reports (need help on LOGFORMAT)
At 6:55 PM -0500 12/1/08, Leung, Michael wrote:
>From what I can tell, neither the Domain Report nor the Organization
>Report gives meaningful results yet.
>
>
>Domain Report
>-------------
>This report lists the countries of the computers which requested files.
>
>Listing domains, sorted by the amount of traffic.
>
> reqs: %bytes: domain
>------: ------: ------
>806393: 100%: [unresolved numerical addresses]


Looks exactly correct to me. If you want the domain names, then you need to process your logfile to convert the IP addresses to Domain Names. You can do this with an external program before running Analog on the files (I use a script called "dnstrans"). Or you can tell Analog to do this using the DNS option in the Analog config files (see http://www.analog.cx/docs/dns.html ).
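For the external-program route, here is a minimal Python sketch of a DNS pre-processor. It assumes the client address is the first space-separated field of each line, as in the LOGFORMAT earlier in this thread. It is not dnstrans and not Analog's own lookup code. `resolve_lines` and `reverse_lookup` are hypothetical names built on Python's standard `socket.gethostbyaddr`. Each distinct address is looked up only once, which matters when the same clients appear thousands of times.

```python
import ipaddress
import socket
import sys
from typing import Callable, Dict, Iterable, Iterator


def reverse_lookup(ip: str) -> str:
    """Return the host name for ip, or ip itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip


def is_ip(text: str) -> bool:
    """True if text is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def resolve_lines(lines: Iterable[str],
                  resolver: Callable[[str], str] = reverse_lookup) -> Iterator[str]:
    """Yield log lines with a leading IP address replaced by its host name."""
    cache: Dict[str, str] = {}
    for line in lines:
        first, sep, rest = line.partition(" ")
        if not is_ip(first):
            yield line  # already a name (or unparseable): leave it alone
            continue
        if first not in cache:
            cache[first] = resolver(first)
        yield cache[first] + sep + rest


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python resolve.py access.log resolved.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as src, \
         open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.writelines(resolve_lines(src))
```

Analog would then be run on the resolved file. The lookups here run one at a time with the system resolver's default timeouts, so a large log will still be slow. That is the same trade-off raised about Analog's built-in lookups further down the thread.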


>Organization Report
>-------------------
>This report lists the organizations of the computers which requested
>files.
>
>Listing the top 20 organizations by the number of requests, sorted by
>the number of requests.
>
> reqs: %bytes: organization
>------: ------: ------------
> 57717: 6.97%: 71
> 46858: 6.41%: 76
> 46018: 4.81%: 205.178

Again - this looks quite normal for a default configuration analyzing a log file with no domain names, just IP addresses. Activate DNS processing and this will change (although DNS processing slows things down quite a bit).


>So, how do we resolve the issue for both Domain and Organization
>Reports?

Perform DNS processing on your log files - either letting Analog do it with its slower code, or using a different tool to pre-process the logfile before triggering Analog (see http://www.analog.cx/helpers/#dns )


>Based on my LOGFORMAT, my Referrer field shouldn't be empty. Right?


I'm not sure about this, but you could always look at some lines in the logfile and see whether the referrer is there or not. :)
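The Python sketch below does that check in bulk. It assumes lines follow the combined-style layout of the LOGFORMAT above: host, "-", user, [date], "request", status, bytes, "referrer", "user agent", and so on. The regex is an approximation written for illustration and is not Analog's parser. `referrer_summary` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable

# Approximate layout: host ident user [date] "request" status bytes "referrer" "agent" ...
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<date>[^\]]*)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referrer_summary(lines: Iterable[str]) -> Counter:
    """Classify lines as having a referrer, having "-", or not parsing at all."""
    counts: Counter = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            counts["unparsed"] += 1
        elif m.group("referrer") in ("", "-"):
            counts["no referrer"] += 1
        else:
            counts["has referrer"] += 1
    return counts
```

Under these assumptions, the counts can be read in two ways:

- A high "no referrer" count with few "has referrer" lines would be consistent with empty Search Word and Query Reports.
- A high "unparsed" count would suggest the assumed layout does not match the file.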

-Spode

--
Edward F Spodick, Information Technology Manager
Hong Kong University of Science & Technology Library
lbspodic@ust.hk tel:852-2358-6743 fax:852-2358-1043
Re: Need help to retrieve (and correct) reports (need help on LOGFORMAT)
I think you're looking for the Host Report, rather than the Domain
Report or Organization Report.
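To get a quick look at a per-address breakdown outside Analog, the Python sketch below tallies requests per client address. It assumes the address is the first space-separated field of each line. `host_report` and `format_report` are hypothetical helpers. They are not how Analog builds its Host Report.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def host_report(lines: Iterable[str], top: int = 20) -> List[Tuple[str, int]]:
    """Return the top client addresses by number of requests."""
    counts = Counter(line.split(" ", 1)[0] for line in lines if line.strip())
    return counts.most_common(top)


def format_report(rows: List[Tuple[str, int]]) -> str:
    """Lay out rows in the reqs/name style of the listings quoted in this thread."""
    out = [" reqs: host", "------: ----"]
    out += [f"{reqs:>6}: {host}" for host, reqs in rows]
    return "\n".join(out)
```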

--
Stephen Turner
Re: Need help to retrieve (and correct) reports (need help on LOGFORMAT)
On 12/1/2008 8:42 PM, Edward Spodick wrote:

> Perform DNS processing on your log files - either letting Analog do it with its slower code, or using a different tool to pre-process the logfile before triggering Analog (see http://www.analog.cx/helpers/#dns )

With 806393 requests in the logfile, using Analog's built-in lookups
wouldn't be the best idea. For learning about DNS lookups, a sample of a
few hundred lines from the logfile would be a better idea. For DNS
lookups on anything larger than that, you really need to use one of the
DNS helper apps.

Aengus
Re: Need help to retrieve (and correct) reports (need help on LOGFORMAT)
On 12/1/2008 6:55 PM, Leung, Michael wrote:
>
> Based on my LOGFORMAT, my Referrer field shouldn't be empty. Right?

The sample logfile entry that you provided had a referrer field
(https://www.networksolutions.com/manage-it/private-registration-splash.jsp)

Aengus
RE: Need help to retrieve (and correct) reports (need help on LOGFORMAT)
-----Original Message-----
From: analog-help-bounces@lists.meer.net
[mailto:analog-help-bounces@lists.meer.net] On Behalf Of Edward Spodick
Sent: Monday, December 01, 2008 8:43 PM
To: Support for analog web log analyzer
Subject: RE: [analog-help] Need help to retrieve (and correct) reports (need help on LOGFORMAT)

At 6:55 PM -0500 12/1/08, Leung, Michael wrote:
>From what I can tell, neither the Domain Report nor the Organization
>Report gives meaningful results yet.
>
>
>Domain Report
>-------------
>This report lists the countries of the computers which requested files.
>
>Listing domains, sorted by the amount of traffic.
>
> reqs: %bytes: domain
>------: ------: ------
>806393: 100%: [unresolved numerical addresses]


>Looks exactly correct to me. If you want the domain names, then you need
>to process your logfile to convert the IP addresses to Domain Names. You
>can do this with an external program before running Analog on the files
>(I use a script called "dnstrans"). Or you can tell Analog to do this
>using the DNS option in the Analog config files (see
>http://www.analog.cx/docs/dns.html ).



Well, I was hoping that the report would list the IP addresses under the
domain column.

For example:

 reqs: %bytes: domain
------: ------: ------
806393:    40%: 146.115.44.11


But I guess it doesn't work that way!



>Organization Report
>-------------------
>This report lists the organizations of the computers which requested
>files.
>
>Listing the top 20 organizations by the number of requests, sorted by
>the number of requests.
>
> reqs: %bytes: organization
>------: ------: ------------
> 57717: 6.97%: 71
> 46858: 6.41%: 76
> 46018: 4.81%: 205.178

>Again - this looks quite normal for a default configuration analyzing a
>log file with no domain names, just IP addresses. Activate DNS
>processing and this will change (although DNS processing slows things
>down quite a bit).



So, is the Organization Report really breaking down an IP address like
71.76.205.178 into its parts?



