Mailing List Archive

Indexing HTML with Lucene
Hi,

Is it necessary to strip the HTML tags from HTML documents BEFORE telling Lucene to index them? Does Lucene do this or will it index the tags too?!

Melissa
Re: Indexing HTML with Lucene
You have to do it yourself, or at least find code that does it. Lucene
indexes whatever text you hand it, so the tags would be indexed along with
the content. The Lucene sample code has an HTML parser, and I've posted (to
lucene-dev) an alternative way of doing this with JTidy.
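
For illustration only, here is a minimal sketch of the "strip the tags
first" step. It uses neither the parser from the Lucene sample code nor
JTidy. It assumes the HTML parser that ships with the JDK
(javax.swing.text.html) is good enough for your documents. The class name
HtmlTextExtractor and the method extractText are hypothetical names chosen
for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: not part of Lucene, its sample code, or JTidy.
public class HtmlTextExtractor {

    /** Returns the text content of an HTML document with the tags removed. */
    public static String extractText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();

        // The parser calls handleText once for each run of character data
        // it finds between tags; the tags themselves are never appended.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (text.length() > 0) {
                    text.append(' '); // keep adjacent text runs from fusing
                }
                text.append(data);
            }
        };

        try (Reader reader = new StringReader(html)) {
            // Third argument true: ignore any charset declared in the page,
            // since the input has already been decoded into a String.
            new ParserDelegator().parse(reader, callback, true);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>Hello</h1><p>Lucene <b>index</b></p></body></html>";
        System.out.println(extractText(html));
    }
}
```

The returned string is what you would then hand to Lucene as the field
contents, in place of the raw HTML.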

Erik

----- Original Message -----
From: "Melissa Mifsud" <melissamifsud@yahoo.com>
To: "Lucene User" <lucene-user@jakarta.apache.org>
Sent: Tuesday, March 05, 2002 9:14 AM
Subject: Indexing HTML with Lucene




