cvs commit: jakarta-lucene-sandbox/contributions/webcrawler-LARM/doc webcrawler_tech_overview.doc webcrawler_tech_overview.pdf
cmarschner 02/05/13 14:26:09

Modified: contributions/webcrawler-LARM README.txt
Added: contributions/webcrawler-LARM/doc
webcrawler_tech_overview.doc
webcrawler_tech_overview.pdf
Log:
added documentation

Revision  Changes    Path
1.2       +21 -12    jakarta-lucene-sandbox/contributions/webcrawler-LARM/README.txt

Index: README.txt
===================================================================
RCS file: /home/cvs/jakarta-lucene-sandbox/contributions/webcrawler-LARM/README.txt,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- README.txt 4 May 2002 14:32:24 -0000 1.1
+++ README.txt 13 May 2002 21:26:09 -0000 1.2
@@ -1,24 +1,33 @@
-$Id: README.txt,v 1.1 2002/05/04 14:32:24 otis Exp $
+$Id: README.txt,v 1.2 2002/05/13 21:26:09 cmarschner Exp $

This is the README file for webcrawler-LARM contribution to Lucene Sandbox.

+This contribution requires:

-- This contribution requires:
- a) HTTPClient (not Jakarta's, but this one:
+a) HTTPClient.jar (not Jakarta's, but this one:
http://www.innovation.ch/java/HTTPClient/
b) Jakarta ORO package for regular expressions

-- The original archive file that I got from Clemens had ORO and
-HTTPClient in lib directory. I don't think we should include those
-there, so I took them out.
+Put the .jars into the lib directory.

-- This contribution also uses 3rd party (X?)HTML parser, which is
+Some of the HTTPClient source files will be replaced during the build, so they
+will be needed during the build. Sorry, I remember I couldn't do that with
+inheritance.
+
+- This contribution also uses portions of the HeX HTML parser, which is
included.
- I am not sure if Clemens' modified this parser in any way. If not,
-maybe we don't have to include it and can instead just add it to the
-list of required packages.

-- This code requires(?) JDK 1.4, as it uses assert keyword.
+OG> I am not sure if Clemens' modified this parser in any way. If not,
+OG> maybe we don't have to include it and can instead just add it to the
+OG> list of required packages.
+
+The parser was put upside down. Although it apparently still needs some
+of the original interfaces, most of them can probably be removed. I will check
+that out.
+
+OG> This code requires(?) JDK 1.4, as it uses assert keyword.

+No. It still contains a method called assert() for testing. I will probably
+rename this sometime (e.g. when changing the tests to JUnit).

-$Id: README.txt,v 1.1 2002/05/04 14:32:24 otis Exp $
\ No newline at end of file
+$Id: README.txt,v 1.2 2002/05/13 21:26:09 cmarschner Exp $
\ No newline at end of file



1.1 jakarta-lucene-sandbox/contributions/webcrawler-LARM/doc/webcrawler_tech_overview.doc

<<Binary file>>


1.1 jakarta-lucene-sandbox/contributions/webcrawler-LARM/doc/webcrawler_tech_overview.pdf

<<Binary file>>


