cvs commit: jakarta-lucene-sandbox/contributions/webcrawler-LARM CHANGES.txt
cmarschner 2002/06/17 17:49:57

Modified: contributions/webcrawler-LARM CHANGES.txt
Log:
added LuceneStorage

  Revision  Changes    Path
  1.3       +13 -1     jakarta-lucene-sandbox/contributions/webcrawler-LARM/CHANGES.txt

Index: CHANGES.txt
===================================================================
RCS file: /home/cvs/jakarta-lucene-sandbox/contributions/webcrawler-LARM/CHANGES.txt,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- CHANGES.txt 1 Jun 2002 18:55:15 -0000 1.2
+++ CHANGES.txt 18 Jun 2002 00:49:57 -0000 1.3
@@ -1,4 +1,16 @@
-$id: $
+$Id$
+
+2002-06-18 (cmarschner)
+ * added an experimental version of Lucene storage. see FetcherMain.java for details how to use it
+ LuceneStorage simply saves all fields as specified in WebDocument. add a converter to the
+ storage pipeline before LuceneStorage to do preprocessing
+
+2002-06-17 (cmarschner)
+ * moved HostInfo and HostManager to larm.net package
+ * included URLNormalizer (todo: source code Docs)
+ * changed filters to use normalized URLs when appropriate;
+ logs contain normalized version of referer and URL now
+ (todo: change description of log format in technical_overview.rtf)

2002-06-01 (cmarschner)
* divided Storage into LinkStorage and DocumentStorage
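
The 2002-06-18 entry says that LuceneStorage saves all document fields as they are, and that any preprocessing belongs in a converter placed before it in the storage pipeline. The following is only a minimal sketch of that ordering idea, not LARM's actual code: the names Doc, Stage, Pipeline, TitleTrimmer and RecordingStorage are hypothetical stand-ins, and RecordingStorage merely records fields in memory where the real storage would write to a Lucene index. See FetcherMain.java in the sandbox for the real setup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StoragePipelineSketch {
    /** Hypothetical stand-in for a crawled document: a bag of named fields. */
    static class Doc {
        final Map<String, String> fields = new LinkedHashMap<>();
    }

    /** Hypothetical pipeline stage; returning null drops the document. */
    interface Stage {
        Doc store(Doc doc);
    }

    /** Runs the stages in the order they were added, handing each result to the next. */
    static class Pipeline implements Stage {
        private final List<Stage> stages = new ArrayList<>();

        void add(Stage stage) {
            stages.add(stage);
        }

        public Doc store(Doc doc) {
            for (Stage stage : stages) {
                if (doc == null) {
                    return null; // an earlier stage dropped the document
                }
                doc = stage.store(doc);
            }
            return doc;
        }
    }

    /** Example converter (assumption): trims whitespace around the "title" field. */
    static class TitleTrimmer implements Stage {
        public Doc store(Doc doc) {
            String title = doc.fields.get("title");
            if (title != null) {
                doc.fields.put("title", title.trim());
            }
            return doc;
        }
    }

    /** Stand-in for the final storage: records every field exactly as received. */
    static class RecordingStorage implements Stage {
        final List<Map<String, String>> saved = new ArrayList<>();

        public Doc store(Doc doc) {
            saved.add(new LinkedHashMap<>(doc.fields));
            return doc;
        }
    }

    /** Builds a two-stage pipeline and returns what the final stage recorded. */
    static Map<String, String> demo() {
        Pipeline pipeline = new Pipeline();
        RecordingStorage storage = new RecordingStorage();
        pipeline.add(new TitleTrimmer()); // converter comes first
        pipeline.add(storage);            // storage sees the converted fields

        Doc doc = new Doc();
        doc.fields.put("url", "http://example.org/");
        doc.fields.put("title", "  Hello  ");
        pipeline.store(doc);
        return storage.saved.get(0);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the final stage stores whatever it receives, the order in which stages are added decides what ends up stored; swapping the two add calls in demo() would record the untrimmed title.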