Mailing List Archive

HTML Analyzer & filter
Not to seem too lazy, but I was just beginning to write an HTML filter
and analyzer and thought... "gee, I bet someone has done this already".
Are there any Apache- or GPL-licensed HTML filters out there, either as
part of another project or that anyone on this list would be willing to
contribute?

Thanks


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: HTML Analyzer & filter
> -----Original Message-----
> From: David Black [mailto:black@apple.com]
> Sent: Tuesday, April 16, 2002 5:07 PM
> To: lucene-user@jakarta.apache.org
> Subject: HTML Analyzer & filter


I'm afraid I don't understand what you really want. If you want to parse HTML files, I suggest you look at the javax.swing.text.html package. I used it to extract text and some metadata from HTML files.
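
For illustration, here is a minimal sketch of how the javax.swing.text.html classes might be used to pull the title, meta tags, and body text out of a page. ParserDelegator and HTMLEditorKit.ParserCallback are the JDK classes doing the parsing; the class name HtmlTextExtractor and its methods are made up for this example and are not part of Lucene or the JDK. The parser is lenient and based on an old HTML DTD, so treat this as a starting point rather than a complete filter.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects title, <meta name=... content=...> pairs,
// and remaining text while the Swing HTML parser walks the document.
class HtmlTextExtractor extends HTMLEditorKit.ParserCallback {
    private final StringBuilder title = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    private final Map<String, String> meta = new LinkedHashMap<>();
    private boolean inTitle = false;

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = true;
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = false;
        }
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        // <meta> has no end tag, so it arrives as a "simple" tag.
        if (tag == HTML.Tag.META) {
            Object name = attrs.getAttribute(HTML.Attribute.NAME);
            Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
            if (name != null && content != null) {
                meta.put(name.toString().toLowerCase(), content.toString());
            }
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Title text goes to the title buffer; everything else is body text.
        StringBuilder target = inTitle ? title : text;
        target.append(data).append(' ');
    }

    public String getTitle() {
        return title.toString().trim();
    }

    public String getText() {
        // Collapse the whitespace introduced when joining text chunks.
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    public Map<String, String> getMeta() {
        return meta;
    }

    // Parse from any Reader (a FileReader, for example).
    public static HtmlTextExtractor parse(Reader in) throws IOException {
        HtmlTextExtractor callback = new HtmlTextExtractor();
        // true = ignore charset declarations found inside the document
        new ParserDelegator().parse(in, callback, true);
        return callback;
    }

    // Convenience wrapper for HTML already held in a String.
    public static HtmlTextExtractor parseString(String html) {
        try {
            return parse(new StringReader(html));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling HtmlTextExtractor.parse(new FileReader(file)) and then getTitle(), getMeta(), and getText() yields plain strings that could be stored as fields of an indexed document.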

peter

Re: HTML Analyzer & filter
Perfect... thank you. I had my blinders on... I should read the entire
API.


On Tuesday, April 16, 2002, at 11:29 AM, Halácsy Péter wrote:

>
> I'm afraid I don't understand what you really want. If you want to
> parse HTML files, I suggest you look at the javax.swing.text.html
> package. I used it to extract text and some metadata from HTML files.
>
> peter
>

