Mailing List Archive

HTMLDocument with META tags?
Has anybody written a replacement for HTMLDocument that parses META tags
and adds them as document fields?

Don't know enough about JavaCC to do it myself... :(

Thanks!

Matt


--
|
Matt Chaput | A l i a s | W a v e f r o n t
Information Designer | 210 King St. E. Toronto, ON, Canada M5A 1J7
mchaput@aw.sgi.com | (416) 874-8268
|
"A goddamned ray of sunshine all the goddamned time" --Sparkle Hayter


--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
Re: HTMLDocument with META tags?
The HTMLParser.jj should already do that.

Otis



Re: HTMLDocument with META tags?
Otis Gospodnetic wrote:
> The HTMLParser.jj should already do that.
>
> Otis

The version I have doesn't seem to (no "meta" in the source code at
all), or is there a trick to getting them out? Or is it in a newer
version of Lucene than I have?

Sorry to bother, but it would solve a lot of problems for me if it
really is in there.

Cheers,

Matt



RE: HTMLDocument with META tags?
Have a look at the source for org.apache.lucene.demo.html.HTMLParser.jj

It stores the META tags in a Properties object that you can access via the getMetaTags() method.

The Document(File f) method of org.apache.lucene.demo.HTMLDocument is the one that builds the Document objects stored in the index. It does not add the meta tags to the index. You will need to either modify that method or build your own Document objects, using the HTMLParser class or some other tool to parse your HTML files.
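
As a rough illustration of the "build your own Document objects" route, the sketch below shows one way the META name/content pairs could be turned into field names and values. It deliberately uses only java.util so it runs without Lucene on the classpath. The class name MetaTagFields, the "meta_" prefix, and the lower-casing of names are assumptions made for this example, not part of Lucene or the demo code.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/**
 * Illustrative helper (not part of Lucene): turns META name/content pairs
 * into index field names and values.
 */
class MetaTagFields {

    /** Hypothetical prefix that keeps META-derived fields apart from other fields. */
    static final String PREFIX = "meta_";

    /**
     * @param metaTags META name/content pairs, for example the Properties
     *                 object that getMetaTags() is described as returning
     * @return field name mapped to field value, sorted by field name
     */
    static Map<String, String> toFields(Properties metaTags) {
        Map<String, String> fields = new TreeMap<String, String>();
        for (String name : metaTags.stringPropertyNames()) {
            String value = metaTags.getProperty(name);
            if (value == null || value.trim().isEmpty()) {
                continue; // skip META tags with no usable content
            }
            // Lower-case the name so "Keywords" and "keywords" share one
            // field name (if both occur, whichever is seen last wins).
            String fieldName = PREFIX + name.trim().toLowerCase(Locale.ROOT);
            fields.put(fieldName, value.trim());
        }
        return fields;
    }
}
```

In a modified copy of HTMLDocument, each entry of the returned map would then be added to the Document as a field. Which Field factory method to use (tokenized, stored, keyword) depends on your Lucene version and on how you want to search the values, so check the Field class in the version you have.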

Eric

-----Original Message-----
From: mchaput [mailto:mchaput@aw.sgi.com]
Sent: Monday, December 09, 2002 11:40 AM
To: Lucene Developers List
Subject: Re: HTMLDocument with META tags?


Otis Gospodnetic wrote:
> The HTMLParser.jj should already do that.
>
> Otis

The version I have doesn't seem to (no "meta" in the source code at
all), or is there a trick to getting them out? Or is it in a newer
version of Lucene than I have?

Sorry to bother, but it would solve a lot of problems for me if it
really is in there.

Cheers,

Matt



Re: HTMLDocument with META tags?
This really belongs on lucene-user, not -dev, but...
Yes, you probably have an older version. You didn't say which one you have.
Anyhow, you can get the current source out of CVS; search the .jj file for
"meta" and you should find it.

Otis


