I'm trying to get Analog 6.0 to generate Search Query and Search Word
reports, but I'm running into trouble that I believe is memory-related.
I'm running Analog on a Dell PowerEdge 2450/600 with 512MB RAM. When I
analyze my monthly logs, with about 2.1 million log lines, I'm able to
use the extensive lists of SearchEngines.txt or SearchQuery.txt without
problems; the report is generated just as I expect. It takes about 16.5
minutes to generate.
However, when I try to do the same with a whole year's volume of log
entries (about 25 million lines), I get a segmentation fault. Setting
"REFLOWMEM 3" prevents the seg fault ("REFLOWMEM 2" still crashes), but
with REFLOWMEM 3 no Search Query or Search Word report is produced at
all.
My first question is: Am I completely screwed at this point, and nothing
short of adding more memory to the server will help?
I thought that maybe I could generate a Search report by not using the
entire lists available in SearchEngines.txt and SearchQuery.txt.
Instead, I'm trying to just look at the top ten search engines that
refer to my site. I started with Google. I entered this in my Analog
config file:
# Creating Search Query and Word reports here
REFARGSEXCLUDE * #Reject all ref arguments, to prevent seg fault with 12 months of data, then
REFARGSINCLUDE /search* #accept only the one for Google.
SEARCHENGINE http://*.google.com/* q,as_q,as_oq,as_epq,query
SEARCHENGINE http://*.google.co.*/* q,as_q,as_oq,as_epq,query
SEARCHENGINE http://*.google.com.*/* q,as_q,as_oq,as_epq,query
This didn't produce a seg fault (with the default REFLOWMEM 0 setting),
but it also didn't produce a Search Query or Search Word report. Some
examples of log entries that I think should have made it into my reports:
kevinz@cn2:/opt/analog/conf.d$ fgrep www.google.com/search /opt/analog/logdata/web1/access_log.20071231 |head -5
ABTS-NCR-Dynamic-013.35.163.122.airtelbroadband.in - - [31/Dec/2007:00:55:13 -0500] "GET /igwg/presentations/Monday/SubplenB/PromotionMale.pdf HTTP/1.1" 200 44424 "http://www.google.com/search?q=graduate+housewives+in+india&hl=en&rlz=1T4GGLJ_en-GBIN214IN214&start=20&sa=N" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
85.185.229.106 - - [31/Dec/2007:00:55:43 -0500] "GET /pubs/sp/20/20.pdf HTTP/1.0" 200 466095 "http://www.google.com/search?hl=fa&q=AIDS%2BPDF&btnG=%D8%AC%D8%B3%D8%AA%D8%AC%D9%88%D9%8A+Google&lr=" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
66.249.85.131 - - [31/Dec/2007:00:57:48 -0500] "GET /asia/bangladesh/nsdp.shtml HTTP/1.1" 200 20061 "http://www.google.com/search?q=child+delivery+video&hl=en&start=70&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
c-98-204-115-120.hsd1.dc.comcast.net - - [31/Dec/2007:00:58:02 -0500] "GET /pubs/ HTTP/1.1" 200 30575 "http://www.google.com/search?q=jhccp&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11"
pool-71-182-79-153.ptldor.fios.verizon.net - - [31/Dec/2007:01:23:25 -0500] "GET /quality/expo.shtml HTTP/1.1" 200 10440 "http://www.google.com/search?hl=en&q=putting+quality+first&btnG=Search" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11"
kevinz@cn2:/opt/analog/conf.d$
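As a cross-check outside Analog, the search phrases in referrers like
these can be pulled out with a short script. The following is only a
minimal Python sketch, not anything Analog provides: it assumes the logs
are in the combined format shown above with one record per line, and the
names google_query and count_queries are hypothetical helpers made up for
illustration. The parameter list is the same one used in the SEARCHENGINE
lines.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Query-string parameters treated as search terms
# (the same list as in the SEARCHENGINE lines of the config).
QUERY_PARAMS = ("q", "as_q", "as_oq", "as_epq", "query")

# Tail of a combined-format record:
#   "request" status bytes "referrer" "user-agent"
LOG_RE = re.compile(
    r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"\s*$'
)


def google_query(line):
    """Return the decoded search phrase from a Google referrer, or None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    url = urlsplit(match.group("referrer"))
    host = url.hostname or ""
    # Assumption: any host containing "google." with a /search path counts.
    if "google." not in host or not url.path.startswith("/search"):
        return None
    args = parse_qs(url.query)  # decodes %XX escapes and '+' as space
    for name in QUERY_PARAMS:
        if name in args:
            return args[name][0]
    return None


def count_queries(lines):
    """Count search phrases (case-insensitively) over an iterable of records."""
    counts = Counter()
    for line in lines:
        phrase = google_query(line)
        if phrase:
            counts[phrase.lower()] += 1
    return counts
```

Passing an open log file to count_queries reads it one record at a time,
so memory use should depend on the number of distinct phrases rather than
on the number of log lines.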
Can anyone offer me any advice on what I can try to generate these
reports?
Thank you in advance for your help and suggestions.
-Kevin
Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland 21202
410-659-6139
+------------------------------------------------------------------------
| TO UNSUBSCRIBE from this list:
| http://lists.meer.net/mailman/listinfo/analog-help
|
| Analog Documentation: http://analog.cx/docs/Readme.html
| List archives: http://www.analog.cx/docs/mailing.html#listarchives
| Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
+------------------------------------------------------------------------