Mailing List Archive

Indexing Dynamically Generated Web Pages
Hello Lucene Users,

I am a novice developer researching Lucene for use on a web site that
primarily uses JSPs.

How do you index dynamically generated web pages with Lucene?
Or is it even possible?

The JSPs themselves don't have searchable data, only methods to get that
data.
When parsing these files for indexing, the necessary data for the search
would not yet be in the page.
The data upon which to search is generated only after certain parameters are
passed to the JSP ( resulting in the HTML output ).

Thanks in advance.

pax et bonum. p.


_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Indexing Dynamically Generated Web Pages [ In reply to ]
You'd have to write a crawler that 'browses' your site just like a
person behind a web browser does.

Otis


--- Paul Friedman <paulsanfordfriedman@yahoo.com> wrote:
> Hello Lucene Users,
>
> I am a novice developer researching Lucene for use on a web site that
> primarily uses JSPs.
>
> How do you index dynamically generated web pages with Lucene?
> Or is it even possible?
>
> The JSPs themselves don't have searchable data, only methods to get
> that
> data.
> When parsing these files for indexing, the necessary data for the
> search
> would not yet be in the page.
> The data upon which to search is generated only after certain
> parameters are
> passed to the JSP ( resulting in the HTML output ).
>
> Thanks in advance.
>
> pax et bonum. p.
>
>
> _________________________________________________________
> Do You Yahoo!?
> Get your free @yahoo.com address at http://mail.yahoo.com
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>


__________________________________________________
Do You Yahoo!?
Yahoo! Sports - Coverage of the 2002 Olympic Games
http://sports.yahoo.com

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Indexing Dynamically Generated Web Pages [ In reply to ]
Paul Friedman <paulsanfordfriedman@yahoo.com> asks:
> I am a novice developer researching Lucene for use on a web site
> that primarily uses JSPs. How do you index dynamically generated
> web pages with Lucene? Or is it even possible?
> The JSPs themselves don't have searchable data, only methods to get
> that data. When parsing these files for indexing, the necessary
> data for the search would not yet be in the page.

Lucene doesn't do any crawling for you - it's your responsibility
to write the code that "crawls" your website, i.e. selects the data
sources to be indexed, chooses how to convert them to Lucene
Documents, and adds them to the indexes. I suspect you're assuming
you would use the demo application. That would not be appropriate for
your project.

Instead, what you should do is:

1) write some java code that walks through your data source - i.e. if
your data source is a database, it would select each row in the
appropriate table - and composes a Lucene Document, and adds it to
your index.

2) write some servlets/jsps that do the search, and when it gets to
the point where you would redirect the user's browser off to the
appropriate HTML page, you either:

a) invoke the appropriate JSP in your site with appropriate
arguments

or

b) compose a page dynamically, roughly the same way the JSP page
would compose it, and return it to the user.

or

c) redirect the user to a new JSP page, which you will have to
write, which looks much like your old JSP page, but is
designed to be invoked with an argument specifying what
data to load.

You may find http://darksleep.com/puff/lucene/lucene.html useful
in further clarifying this. There is also a "getting started" article
at http://jakarta.apache.org/lucene/docs/index.html that may be of
some use.

Steven J. Owens
puff@darksleep.com

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>