Mailing List Archive

Fetcher Web Crawler: Technical Overview
Hi,

First, I want to thank you for putting the web crawler into the Lucene
sandbox. I'm looking forward to the future developments.

I have put together a technical overview of the Fetcher web crawler.
For those of you interested in it, this is probably a good starting point.

I hope you understand my English. I'm open to any comments concerning style
and grammar ;-)
I could imagine that some of Andrew's ideas could very well be included in a
future version as well. (http://www.trilug.org/~acoliver/luceneplan.html)

I will put this document in CVS as soon as possible.

Clemens


PS By the way, it's a Word document. Any preferences within the Jakarta
project...?
Re: Fetcher Web Crawler: Technical Overview
That's cool. What I'd like to do is come up with an overall
architecture for crawlers, content handlers, etc., and make them
relatively pluggable.

-Andy

Clemens Marschner wrote:

>Hi,
>
>First, I want to thank you for putting the web crawler into the Lucene
>sandbox. I'm looking forward to the future developments.
>
>I have put together a technical overview of the Fetcher web crawler.
>For those of you interested in it, this is probably a good starting point.
>
>I hope you understand my English. I'm open to any comments concerning style
>and grammar ;-)
>I could imagine that some of Andrew's ideas could very well be included in a
>future version as well. (http://www.trilug.org/~acoliver/luceneplan.html)
>
>I will put this document in CVS as soon as possible.
>
>Clemens
>
>
>PS By the way, it's a Word document. Any preferences within the Jakarta
>project...?
>
>
>------------------------------------------------------------------------
>
>--
>To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
>For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
>



