Hi,
after three days of crawling the intranet, the Nutch crawler threw
an exception :-(
It seems that the crawler tries to process the .DS_Store file that
Mac OS X creates, and it does not know how to handle it.
Can I re-run the clean-up step without crawling the intranet again?
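
To illustrate what I suspect is happening: judging only from the stack trace below, the update step seems to treat every entry in the segments directory as a segment, including the .DS_Store file the Finder left there. I have not read the Nutch source, so the following is just a sketch of the kind of filtering I would have expected, not Nutch code. The class and method names (SegmentDirFilter, listSegmentDirs) are made up for this example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Nutch: lists only real segment
// directories and skips plain files and hidden entries such as .DS_Store.
public class SegmentDirFilter {

    static List<File> listSegmentDirs(File segmentsDir) {
        List<File> result = new ArrayList<>();
        File[] entries = segmentsDir.listFiles();
        if (entries == null) {
            // Not a directory, or it could not be read.
            return result;
        }
        Arrays.sort(entries); // stable, name-based order
        for (File entry : entries) {
            // A segment is assumed to be a non-hidden subdirectory.
            if (entry.isDirectory() && !entry.getName().startsWith(".")) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File segmentsDir = new File(args.length > 0 ? args[0] : "crawl.uni.test/segments");
        for (File segment : listSegmentDirs(segmentsDir)) {
            System.out.println(segment.getPath());
        }
    }
}
```

Run against my segments directory, this would print the segment directories and leave out .DS_Store, which is the entry the exception complains about.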
Regards,
Christian
050906 184831 Processing document 29000
050906 184833 Processing document 30000
050906 184835 Finishing update
050906 184845 Processing pagesByURL: Sorted 288601 instructions in 9.954 seconds.
050906 184845 Processing pagesByURL: Sorted 28993.46996182439 instructions/second
050906 184856 Processing pagesByURL: Merged to new DB containing 181590 records in 9.095 seconds
050906 184856 Processing pagesByURL: Merged 19965.915338097853 records/second
050906 184857 Processing pagesByMD5: Sorted 76461 instructions in 1.199 seconds.
050906 184857 Processing pagesByMD5: Sorted 63770.64220183486 instructions/second
050906 184904 Processing pagesByMD5: Merged to new DB containing 181590 records in 5.738 seconds
050906 184904 Processing pagesByMD5: Merged 31646.91530149878 records/second
050906 184911 Processing linksByMD5: Sorted 286132 instructions in 7.354 seconds.
050906 184911 Processing linksByMD5: Sorted 38908.34919771553 instructions/second
050906 184940 Processing linksByMD5: Merged to new DB containing 1060091 records in 27.791 seconds
050906 184940 Processing linksByMD5: Merged 38145.11892339247 records/second
050906 184943 Processing linksByURL: Sorted 145747 instructions in 3.082 seconds.
050906 184943 Processing linksByURL: Sorted 47289.74691758599 instructions/second
050906 185014 Processing linksByURL: Merged to new DB containing 1060091 records in 29.113 seconds
050906 185014 Processing linksByURL: Merged 36412.977020575 records/second
050906 185017 Processing linksByMD5: Sorted 181123 instructions in 2.968 seconds.
050906 185017 Processing linksByMD5: Sorted 61025.26954177897 instructions/second
050906 185045 Processing linksByMD5: Merged to new DB containing 1060091 records in 26.092 seconds
050906 185045 Processing linksByMD5: Merged 40628.96673309827 records/second
050906 185234 Update finished
050906 185235 Updating /Users/caschoff/Desktop/nutch-0.7/crawl.uni.test/segments from /Users/caschoff/Desktop/nutch-0.7/crawl.uni.test/db
050906 185235 reading /Users/caschoff/Desktop/nutch-0.7/crawl.uni.test/segments/.DS_Store
Exception in thread "main" java.io.FileNotFoundException: /Users/caschoff/Desktop/nutch-0.7/crawl.uni.test/segments/.DS_Store/fetcher/data
    at org.apache.nutch.fs.LocalFileSystem.open(LocalFileSystem.java:93)
    at org.apache.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:194)
    at org.apache.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:187)
    at org.apache.nutch.io.MapFile$Reader.<init>(MapFile.java:190)
    at org.apache.nutch.io.MapFile$Reader.<init>(MapFile.java:179)
    at org.apache.nutch.io.ArrayFile$Reader.<init>(ArrayFile.java:50)
    at org.apache.nutch.tools.UpdateSegmentsFromDb.addSegment(UpdateSegmentsFromDb.java:197)
    at org.apache.nutch.tools.UpdateSegmentsFromDb.run(UpdateSegmentsFromDb.java:182)
    at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:147)
[2]- Exit 1 bin/nutch crawl urls -dir crawl.uni.test -depth 10 1>&crawl.log
---
Dipl. Ing. (FH) Christian Aschoff
Office:
Universität Ulm/KIZ
Room O26/5403
Tel. 0731 50-22432
christian.aschoff@uni-ulm.de
Private:
Fabristr. 13
89075 Ulm
Germany/Old Europe
Tel. 0731 60280360
Fax. 0731 60280361
caschoff@mac.com
Help out: www.meyers-konversationslexikon.de