Mailing List Archive

complete list of accessed files?
Our marketing people are interested in how often certain .PDF documents
are accessed (downloaded), but depending on the hit statistics these do
not appear in the listed files (only under "others").

I already tried putting index.htm (the most accessed file in folders and
subfolders) on the FILEEXCLUDE list. This doesn't seem to work.

FILEEXCLUDE index.htm

doesn't make it disappear in the statistics of accessed files.

--
Christoph

+------------------------------------------------------------------------
| TO UNSUBSCRIBE from this list:
| http://lists.meer.net/mailman/listinfo/analog-help
|
| Analog Documentation: http://analog.cx/docs/Readme.html
| List archives: http://www.analog.cx/docs/mailing.html#listarchives
| Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
+------------------------------------------------------------------------
Re: complete list of accessed files?
Your exclude statements need to match the request. If index.htm is in your
root you'll want at least this:

FILEEXCLUDE /index.htm

However, if there are any query parameters you'll need to match those, so
probably more likely this:

FILEEXCLUDE /index.htm*

See http://analog.cx/docs/include.html for details.
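
To illustrate why the leading slash and the trailing wildcard matter, here is a minimal Python sketch. It uses Python's fnmatch module as a stand-in for Analog's wildcard matching, which is an assumption: Analog's own matcher may differ in details. The request paths are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical request paths as they might appear in a log file.
requests = ["/index.htm", "/index.htm?lang=de", "/docs/index.htm"]

def matches(pattern, path):
    # fnmatchcase only approximates Analog's matching: the whole path
    # must match, and "*" stands for any run of characters.
    return fnmatchcase(path, pattern)

for pattern in ["index.htm", "/index.htm", "/index.htm*", "*/index.htm*"]:
    hits = [p for p in requests if matches(pattern, p)]
    print(f"{pattern!r:16} -> {hits}")
```

Under this approximation, the bare pattern index.htm matches none of the paths, because every request starts with a slash.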

However, I'm not sure that answers your original question. Analog's reports
will show the top number of results based on FLOOR settings. You can change
the floor for a given report to have it show more data. For example, this
will show all requests (because everything has at least one request):

REQFLOOR 1r

See http://analog.cx/docs/othreps.html#FLOOR for details.

Also, if you just want a report that shows PDF, you can use the FILEINCLUDE
to get that (assuming no previous FILEEXCLUDEs exist in your config):

FILEINCLUDE *.pdf

As you'll find if you peruse the archives of this list, PDF requests often
show multiple times for a given file because some versions of Adobe Acrobat
Reader will load a page at a time from the server. So the number of requests
to the PDF may be higher than the actual number of reads.

As HTTP is stateless and cached, there's no accurate way to ensure that you
know how many actual reads you had.
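
As a rough way to see how much page-at-a-time loading inflates the numbers, the sketch below counts PDF requests per path and splits them by HTTP status, so full responses (200) can be compared with partial ones (206, Partial Content). It runs outside Analog, the sample log lines are made up, and the split is only a heuristic: it does not yield a true count of reads.

```python
import re
from collections import Counter

# Matches the request path and status fields of a common/combined log line.
LINE_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*" (\d{3}) ')

def count_pdf_requests(lines):
    """Count PDF requests per (path, status) pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        path, status = m.group(1), m.group(2)
        # Ignore any query string when checking the extension.
        if path.lower().split("?", 1)[0].endswith(".pdf"):
            counts[(path, status)] += 1
    return counts

# Made-up sample: one full download and two partial (206) fetches.
sample = [
    '10.0.0.1 - - [12/Aug/2009:10:00:00 +0200] "GET /docs/a.pdf HTTP/1.1" 200 1000',
    '10.0.0.2 - - [12/Aug/2009:10:01:00 +0200] "GET /docs/a.pdf HTTP/1.1" 206 300',
    '10.0.0.2 - - [12/Aug/2009:10:01:01 +0200] "GET /docs/a.pdf HTTP/1.1" 206 300',
]

for (path, status), n in sorted(count_pdf_requests(sample).items()):
    print(path, status, n)
```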

--
Jeremy Wadsack

On Fri, Jul 24, 2009 at 9:52 AM, Christoph Kukulies <kuku@kukulies.org> wrote:

> Our marketing people are interested in how often certain .PDF documents
> accessed (downloaded)
> but depending on the hit statistics these do not appear in the listed files
> (only under others).
>
> I already thought of putting index.htm (the most accessed files in folders
> and subfolders)
> on the FILEEXCLUDE list. This doesn't seem to work.
>
> FILEEXCLUDE index.htm
>
> doesn't make it disappear in the statistics of accessed files.
>
> --
> Christoph
Re: complete list of accessed files?
Jeremy Wadsack wrote:
> ...

> However, I'm not sure that answers your original question. Analog's
> reports will show the top number of results based on FLOOR settings.
> You can change the floor for a given report to have it show more data.
> For example, this will show all requests (because everything has at
> least one request):
>
> REQFLOOR 1r

I have now added this setting (REQFLOOR 1r) to my config file. I also
included

FILEINCLUDE *.pdf

Still, I don't see a single .pdf in the list of requested files (the last
listing in the report), although there are about 3000 .pdf requests in
the original Apache access_log file.

I assume the file extension (FILEINCLUDE syntax) isn't case sensitive.
>
> See http://analog.cx/docs/othreps.html#FLOOR for details.
>
> Also, if you just want a report that shows PDF, you can use the
> FILEINCLUDE to get that (assuming no previous FILEEXCLUDEs exist in
> your config):
>
> FILEINCLUDE *.pdf
>
> As you'll find if you peruse the archives of this list, PDF requests
> often show multiple times for a given file because some versions of
> Adobe Acrobat Reader will load a page at a time from the server. So
> the number of requests to the PDF may be higher than the actual number
> of reads.
>
> As HTTP is stateless and cached, there's no accurate way to ensure
> that you know how many actual read you had.
OK. Good to know. I would just like to see whether there is an overall
increase, assuming the behaviour of the clients remains the same.

--
Christoph Kukulies


Re: complete list of accessed files?
On 8/12/2009 10:47 AM, Christoph Kukulies wrote:

> I introduced this setting (REQFLOOR 1r) now in my config file, I also
> included
> FILEINCLUDE *.pdf
>
> Still I don't see a single .pdf in the list of requested files (last
> listing in the report).
> I have about 3000 .pdf file requests in the original apache access_log
> file.
>
> I assume the file extension (FILEINCLUDE syntax) isn't case sensitive.

Actually, if Analog is running on a case sensitive OS, then the
FILEINCLUDE probably is case sensitive.

http://analog.cx/docs/alias.html#CASE
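
A small Python sketch of what case-sensitive matching would mean for a pattern like *.pdf. Again, fnmatch is only an approximation of Analog's matching, and the file name is hypothetical.

```python
from fnmatch import fnmatchcase

def included(path, pattern, case_sensitive=True):
    # With case-insensitive matching, compare lowercased copies.
    if not case_sensitive:
        path, pattern = path.lower(), pattern.lower()
    return fnmatchcase(path, pattern)

path = "/files/Brochure.PDF"  # hypothetical request path
print(included(path, "*.pdf"))                        # case sensitive
print(included(path, "*.pdf", case_sensitive=False))  # case insensitive
```

If the log contains upper-case extensions, a case-sensitive *.pdf pattern would miss them in this model.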

Can you post 3 or 4 lines from your log file, including some PDF requests?

Aengus
Re: complete list of accessed files?
Christoph Kukulies <kuku@...> writes:

> I introduced this setting (REQFLOOR 1r) now in my config file, I also
> included
> FILEINCLUDE *.pdf
>
> Still I don't see a single .pdf in the list of requested files (last
> listing in the report).
> I have about 3000 .pdf file requests in the original apache access_log file.

@Christoph - try

PAGEINCLUDE *.pdf

Paul

Re: complete list of accessed files?
Aengus wrote:
> On 8/12/2009 10:47 AM, Christoph Kukulies wrote:
>
>> I introduced this setting (REQFLOOR 1r) now in my config file, I also
>> included
>> FILEINCLUDE *.pdf
>>
>> Still I don't see a single .pdf in the list of requested files (last
>> listing in the report).
>> I have about 3000 .pdf file requests in the original apache
>> access_log file.
>>
>> I assume the file extension (FILEINCLUDE syntax) isn't case sensitive.
>
> Actually, if Analog is running on a case sensitive OS, then the
> FILEINCLUDE probably is case sensitive.
>
> http://analog.cx/docs/alias.html#CASE
>
> Can you post 3 or 4 lines from your log file, including some PDF requests?
87.79.34.253 - - [11/Aug/2009:17:58:38 +0200] "GET /export/download/de/AB-lang/AB-3-5-7.pdf HTTP/1.1" 200 158955 "http://www.mysite.de/de/produkte/AB-lang//index.htm" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)"

217.91.80.223 - - [12/Aug/2009:09:31:33 +0200] "GET /export/download/de/AB-lang/AB-Plan.pdf HTTP/1.1" 200 212734 "http://www.mysite.de/de/produkte/AB-lang/AB-Plan.htm" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13"

159.51.236.51 - - [12/Aug/2009:14:21:36 +0200] "GET /export/download/de/AB-lang/XYZPP.pdf HTTP/1.1" 200 108759 "http://www.mysite.de/de/produkte/AB-lang/ABACC-XYZYPP.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"

85.3.36.7 - - [12/Aug/2009:15:36:27 +0200] "GET /export/download/de/AB-lang/AB-Plan.pdf HTTP/1.1" 206 208857 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; de; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13 (.NET CLR 3.5.30729)"

The first three are referred from our own pages (mysite.de).
I found that most of the requests with return code 200 come from bots and
crawlers (Yandex, msnbot) or from Google search results.
I will try the suggested PAGEINCLUDE and see what happens when I
include SEARCHENGINES again.

I hope the syntax is correct. I must admit that I had to reconstruct
(paste together) the main log file from different files (referrer.log,
agents.log) because the log format changed at some point.

--
Christoph Kukulies

Re: complete list of accessed files?
Christoph Kukulies <kuku@kukulies.org> wrote:
> Aengus wrote:
>> On 8/12/2009 10:47 AM, Christoph Kukulies wrote:
>>
>> Can you post 3 or 4 lines from your log file, including some PDF
>> requests?
> 87.79.34.253 - - [11/Aug/2009:17:58:38 +0200] "GET
> /export/download/de/AB-lang/AB-3-5-7.pdf HTTP/1.1" 200 158955
> "http://www.mysite.de/de/produkte/AB-lang//index.htm" "Mozilla/5.0
> (Windows; U; Windows NT 5.1; de; rv:1.9.1.2) Gecko/20090729
> Firefox/3.5.2 (.NET CLR 3.5.30729)"

Just as a test, I ran Analog against these 4 lines with no analog.cfg (just the hardcoded defaults in Analog) plus REQFLOOR 1r, and it displayed these entries in the Request Report:

Listing files, sorted by the number of requests.

reqs  %bytes  last time        file
   2  61.16%  12/Aug/09 15:36  /export/download/de/ab-lang/ab-plan.pdf
   1  15.78%  12/Aug/09 14:21  /export/download/de/ab-lang/xyzpp.pdf
   1  23.06%  11/Aug/09 17:58  /export/download/de/ab-lang/ab-3-5-7.pdf


analog pdf.log -G +O"report.html" +C"reqfloor 1r"

If these .pdf files aren't showing up in your reports, then you have something in your analog.cfg that is excluding them.
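
The figures in that report can be cross-checked outside Analog with a short Python sketch. It uses the four log entries posted earlier in the thread, trimmed to the fields needed here, and lowercases the paths because the report shows them in lower case. This is an independent recalculation, not a description of how Analog computes its report.

```python
import re
from collections import defaultdict

# The four posted log entries, without referrer and user-agent fields.
LOG = '''\
87.79.34.253 - - [11/Aug/2009:17:58:38 +0200] "GET /export/download/de/AB-lang/AB-3-5-7.pdf HTTP/1.1" 200 158955
217.91.80.223 - - [12/Aug/2009:09:31:33 +0200] "GET /export/download/de/AB-lang/AB-Plan.pdf HTTP/1.1" 200 212734
159.51.236.51 - - [12/Aug/2009:14:21:36 +0200] "GET /export/download/de/AB-lang/XYZPP.pdf HTTP/1.1" 200 108759
85.3.36.7 - - [12/Aug/2009:15:36:27 +0200] "GET /export/download/de/AB-lang/AB-Plan.pdf HTTP/1.1" 206 208857
'''

# Captures the request path and the byte count of each entry.
LINE_RE = re.compile(r'"GET (\S+) [^"]*" \d{3} (\d+)')

def summarize(log_text):
    """Return {lowercased path: [requests, bytes]}."""
    stats = defaultdict(lambda: [0, 0])
    for m in LINE_RE.finditer(log_text):
        entry = stats[m.group(1).lower()]
        entry[0] += 1
        entry[1] += int(m.group(2))
    return dict(stats)

stats = summarize(LOG)
total = sum(nbytes for _, nbytes in stats.values())
for path, (reqs, nbytes) in sorted(stats.items(), key=lambda kv: -kv[1][0]):
    print(f"{reqs:4d} {100 * nbytes / total:6.2f}%  {path}")
```

The request counts and byte percentages agree with the report (2 requests and 61.16% for ab-plan.pdf), which also shows that the 206 response is counted as a request there.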

Aengus
