Mailing List Archive

RE: Indexes in WAR files
Hi Erik,

Reading the servlet spec again, it says that calls such as
ServletContext.getRealPath() will *possibly* return null if the content is
being served from a WAR as opposed to the physical path on disk - I'm informed
that WebLogic actually returns the name of the WAR file and not the exploded
location. But you're right, Tomcat works differently.

So in order to insulate myself from different interpretations of the spec, I'm
going to knock up a WARDirectory that will probably wrap a RAMDirectory (going
back to the servlet container to call getResourceAsStream seems awfully
expensive to me) as a first go.
I'll post my efforts in a couple of days.


Any comments on my thoughts for using the decorator pattern with
Directory(s) (below)?
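
A minimal sketch of the decorator idea, using only JDK classes. SimpleDirectory,
MapDirectory and CachingDirectory are hypothetical names invented for
illustration; they are not Lucene's Directory API, and a real version would
have to implement whatever the Directory class actually requires.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical, stripped-down stand-in for a read-only index directory. */
interface SimpleDirectory {
    String[] list();
    byte[] read(String name);
}

/** Trivial backing store, playing the role of a filesystem or WAR-backed source. */
class MapDirectory implements SimpleDirectory {
    private final Map<String, byte[]> files;

    MapDirectory(Map<String, byte[]> files) {
        this.files = new HashMap<>(files);
    }

    public String[] list() {
        return files.keySet().toArray(new String[0]);
    }

    public byte[] read(String name) {
        byte[] data = files.get(name);
        if (data == null) {
            throw new NoSuchElementException(name);
        }
        return data.clone();
    }
}

/** Decorator: wraps any SimpleDirectory and serves repeat reads from memory. */
class CachingDirectory implements SimpleDirectory {
    private final SimpleDirectory delegate;
    private final Map<String, byte[]> cache = new HashMap<>();
    int delegateReads = 0; // exposed only so the caching effect is observable

    CachingDirectory(SimpleDirectory delegate) {
        this.delegate = delegate;
    }

    public String[] list() {
        return delegate.list();
    }

    public byte[] read(String name) {
        byte[] data = cache.get(name);
        if (data == null) {
            data = delegate.read(name); // only the first read reaches the wrapped directory
            delegateReads++;
            cache.put(name, data);
        }
        return data.clone();
    }
}
```

With this shape, new CachingDirectory(new MapDirectory(files)) composes the two
behaviours without either class knowing about the other, which is the point of
the nested-constructor style.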


Bye,

Les




> -----Original Message-----
> From: Erik Hatcher [mailto:lists@ehatchersolutions.com]
> Sent: 12 February 2002 14:50
> To: Lucene Developers List
> Subject: Re: Indexes in WAR files (was RE: [SUBMIT] docweb demo app)
>
>
> So you're talking about a servlet container that does not expand the WAR
> into filesystem files? Hmmm, I haven't encountered that scenario, but
> surely sounds like you're right in that it won't work without modifications
> to Lucene to pull an index from a "resource".
>
> I've been embedding indexes into WAR files for quite a while and it works
> fine with both Resin and Tomcat 4 - but both of those expand the WAR file
> into a filesystem tree.
>
> Erik
>
>
> ----- Original Message -----
> From: "Les Hughes" <leslie.hughes@rubus.com>
> To: "''Lucene Developers List' '" <lucene-dev@jakarta.apache.org>
> Sent: Tuesday, February 12, 2002 5:30 AM
> Subject: Indexes in WAR files (was RE: [SUBMIT] docweb demo app)
>
>
> >
> >
> >
> > Thanks Erik.
> >
> > Not too sure my question was clear. What I'm trying to do is to create a
> > searchable reference info system (i.e. a bunch of documentation...:-) that
> > is packaged into a WAR file, dropped into a servlet container and run 'in
> > place'. In other words, I can't have the index on the filesystem; instead,
> > it's prebuilt and packaged into the WAR file.
> >
> > So at the moment, Lucene only seems to be able to access filesystem indexes
> > (via the FSDirectory) or RAM-based indexes via the RAMDirectory (which I
> > understand has a few problems with ArrayIndexOutOfBounds exceptions...)
> >
> > What I was thinking about doing was twofold.
> >
> > 1) Modify the Directory classes so that they follow more of a decorator
> > pattern (as in Directory dir = new RAMDirectory(new XXXDirectory(new
> > FSDirectory()))) instead of the manual initialisation code that I've found
> > on the list
> > (http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00196.html)
> > and
> >
> > 2) Create a WARDirectory for *readonly* access to a pre-built index stored
> > in a WAR.
> >
> >
> > So, am I barking up the wrong tree?
> >
> >
> > Bye,
> >
> > Les
> >
> >
> >
> > -----Original Message-----
> > From: Erik Hatcher
> > To: 'Lucene Developers List'
> > Sent: 2/11/02 10:36 PM
> > Subject: Re: [SUBMIT] docweb demo app
> >
> > ----- Original Message -----
> > From: "Les Hughes" <leslie.hughes@rubus.com>
> >
> > > 1) Thanks for the Ant task - that'll save me some time on the train
> > > tomorrow ;-)
> >
> > You're welcome!
> >
> > > 2) Maybe I need to RTFM but does the index run straight out of the WAR
> > > file? I was thinking about creating a WARDirectory to do this but if it's
> > > possible some other way I'd be interested to hear.
> >
> > The index is currently built when someone runs it manually from Lucene's
> > CVS directory locally. The idea is that it will be incorporated into
> > distribution builds (perhaps as a separate download since it's about a 2MB
> > WAR) in the future. What I patched doesn't make it automatically happen...
> > that comes after there is lucene-dev signoff and it gets rolled into the
> > main distribution build dependency graph.
> >
> > Erik
> >
> >
> >
> > --
> > To unsubscribe, e-mail:
> > <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:lucene-dev-help@jakarta.apache.org>

Re: Indexes in WAR files [ In reply to ]
Les,

I look forward to the results of your work on this!

Erik


----- Original Message -----
From: "Les Hughes" <leslie.hughes@rubus.com>
To: "'Lucene Developers List'" <lucene-dev@jakarta.apache.org>
Sent: Tuesday, February 12, 2002 2:21 PM
Subject: RE: Indexes in WAR files


>
> Hi Erik,
>
> Reading the servlet spec again, it says that calls such as
> ServletContext.getRealPath() will *possibly* return null if the content is
> being served from a WAR as opposed to the physical path on disk - I'm
> informed that WebLogic actually returns the name of the WAR file and not the
> exploded location. But you're right, Tomcat works differently.
>
> So in order to insulate myself from different interpretations of the spec,
> I'm going to knock up a WARDirectory that will probably wrap a RAMDirectory
> (going back to the servlet container to call getResourceAsStream seems
> awfully expensive to me) as a first go.
> I'll post my efforts in a couple of days.
>
>
> Any comments on my thoughts for using the decorator pattern with
> Directory(s) (below)?
>
>
> Bye,
>
> Les
>


RE: Indexes in WAR files [ In reply to ]
> From: Les Hughes [mailto:leslie.hughes@rubus.com]
>
> Reading the servlet spec again it says that calls such as
> servletcontext.getRealPath() will *possibly* return null if
> the content is
> being served from a war as opposed the physical path on disk
> - I'm informed
> that weblogic actually returns the name of the warfile and
> not the exploded
> location. But you're right, Tomcat works differently.

What kind of URL does weblogic return for
servletContext.getResource("//index/segments")?
Is it a file: URL?

Keeping the index in files and using FSDirectory will be much more
efficient. If all the major servlet containers support this it would be a
shame not to take advantage of it. You might look at the result of
getResource and use an FSDirectory if a file: url is returned, and do
something else when it's not.
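
A rough sketch of that check, assuming the URL returned by
ServletContext.getResource has already been converted with toURI().
IndexLocator and localIndexDir are hypothetical names used only for
illustration, and the sketch stops at deciding whether a plain directory on
disk is available; opening it with FSDirectory is left out.

```java
import java.io.File;
import java.net.URI;
import java.util.Optional;

class IndexLocator {
    /**
     * Returns the directory that contains the given resource when the URI
     * points at a plain file on disk, or empty for anything else (for
     * example a jar: URI handed out by a container that serves the WAR
     * without expanding it).
     */
    static Optional<File> localIndexDir(URI segmentsLocation) {
        if (segmentsLocation == null || !"file".equals(segmentsLocation.getScheme())) {
            return Optional.empty();
        }
        try {
            // new File(URI) rejects file: URIs that carry an authority, query or fragment
            return Optional.ofNullable(new File(segmentsLocation).getParentFile());
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }
}
```

When the result is empty, the caller would fall back to one of the other
strategies discussed in this thread, such as copying the index out of the WAR
first.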

> So in order to isolate from different interpretations of the
> spec, I'm going
> to knock up a WARDirectory that probably will wrap a
> RAMDirectory (going
> back to the servlet container to getResourceAsStream seems
> awfully expensive
> to me) as a first go.
> I'll post my efforts in a couple of days.

One technique you might consider is, when the index is not available as a
file, use getResourceAsStream to copy it to a temporary directory in
System.getProperty("java.io.tmpdir"), then use FSDirectory. Storing the
whole index in a RAMDirectory will make searches really fast, but could also
chew up a lot of memory. If the index isn't that big anyway, maybe this
isn't an issue.
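
The copy-to-temp step might look roughly like this. It is only a sketch under
assumptions: IndexUnpacker and unpackToTemp are hypothetical names, the
suppliers stand in for calls to getResourceAsStream, and file names are
assumed to be simple names without path separators. Files.createTempDirectory
with a prefix alone creates the directory under the default temporary
location, which is normally the java.io.tmpdir property.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.function.Supplier;

class IndexUnpacker {
    /**
     * Copies each named stream into a fresh temporary directory and returns
     * that directory, so it can then be opened like any on-disk index.
     */
    static Path unpackToTemp(Map<String, Supplier<InputStream>> indexFiles) {
        try {
            Path dir = Files.createTempDirectory("lucene-index");
            for (Map.Entry<String, Supplier<InputStream>> entry : indexFiles.entrySet()) {
                // try-with-resources closes each stream once its copy completes
                try (InputStream in = entry.getValue().get()) {
                    Files.copy(in, dir.resolve(entry.getKey()));
                }
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real deployment would also want to clean the directory up on shutdown, which
this sketch does not attempt.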

Doug

RE: RE: Indexes in WAR files [ In reply to ]
I don't think you should do all that in the configuration JSP page... Somewhere, fine. I'm pretty opposed to using getRealPath() on the whole. It's evil. It would seem that there should rather be an index reader that can take a WAR file directory as input for exactly this case... It can do the streaming and all under the covers...

-Andy

Original Message:
-----------------
From: Doug Cutting DCutting@grandcentral.com
Date: Thu, 14 Feb 2002 10:00:24 -0800
To: lucene-dev@jakarta.apache.org
Subject: RE: Indexes in WAR files


> From: Les Hughes [mailto:leslie.hughes@rubus.com]
>
> Reading the servlet spec again it says that calls such as
> servletcontext.getRealPath() will *possibly* return null if
> the content is
> being served from a war as opposed the physical path on disk
> - I'm informed
> that weblogic actually returns the name of the warfile and
> not the exploded
> location. But you're right, Tomcat works differently.

What kind of URL does weblogic return for
servletContext.getResource("//index/segments")?
Is it a file: URL?

Keeping the index in files and using FSDirectory will be much more
efficient. If all the major servlet containers support this it would be a
shame not to take advantage of it. You might look at the result of
getResource and use an FSDirectory if a file: url is returned, and do
something else when it's not.

> So in order to isolate from different interpretations of the
> spec, I'm going
> to knock up a WARDirectory that probably will wrap a
> RAMDirectory (going
> back to the servlet container to getResourceAsStream seems
> awfully expensive
> to me) as a first go.
> I'll post my efforts in a couple of days.

One technique you might consider is, when the index is not available as a
file, use getResourceAsStream to copy it to a temporary directory in
System.getProperty("java.io.tmpdir"), then use FSDirectory. Storing the
whole index in a RAMDirectory will make searches really fast, but could also
chew up a lot of memory. If the index isn't that big anyway, maybe this
isn't an issue.

Doug

RE: Indexes in WAR files [ In reply to ]
Hi Doug,

I haven't verified it, but another one of our techies tells me that WLS
returns the name of the WAR file that the webapp is sourced from, without any
path info. I do need to check this for WLS 6.1 though ;-)

For my app, the index is only a few tens of megs so RAM isn't a problem.
Once I've sorted that version of WARDirectory out I'll probably add
something that is either configured from the deployment descriptor
(a use-ramdirectory = true kind of thing) or just unwars the index into a temp
dir as you say. I'm guessing, but since most Unix boxes use swap for /tmp,
this'll be nearly as fast as a RAMDirectory?

Either way, at the moment I'm grabbing a list of resources stored under (for
example) /WEB-INF/index, doing a getResourceAsStream on each, and then creating
each "file" in a RAMDirectory - in a similar way to your suggestion.

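The loading loop described above could be sketched along these lines. This is
an illustration, not the WARDirectory code itself: InMemoryIndexLoader and load
are hypothetical names, a plain map of byte arrays stands in for the
RAMDirectory, and the opener function stands in for
ServletContext.getResourceAsStream (with the path list coming from something
like ServletContext.getResourcePaths).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class InMemoryIndexLoader {
    /** Reads every listed resource fully into memory, keyed by its file name. */
    static Map<String, byte[]> load(List<String> resourcePaths,
                                    Function<String, InputStream> opener) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        for (String path : resourcePaths) {
            // "/WEB-INF/index/segments" becomes "segments"
            String name = path.substring(path.lastIndexOf('/') + 1);
            try (InputStream in = opener.apply(path)) {
                if (in == null) {
                    throw new IllegalStateException("missing resource: " + path);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                files.put(name, out.toByteArray());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return files;
    }
}
```

The whole index ends up on the heap, so this only makes sense for indexes of
the modest size mentioned above.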


Hope to release something early next week.

Bye,

Les


> -----Original Message-----
> From: Doug Cutting [mailto:DCutting@grandcentral.com]
> Sent: 14 February 2002 18:00
> To: 'Lucene Developers List'
> Subject: RE: Indexes in WAR files
>
>
> > From: Les Hughes [mailto:leslie.hughes@rubus.com]
> >
> > Reading the servlet spec again it says that calls such as
> > servletcontext.getRealPath() will *possibly* return null if
> > the content is
> > being served from a war as opposed the physical path on disk
> > - I'm informed
> > that weblogic actually returns the name of the warfile and
> > not the exploded
> > location. But you're right, Tomcat works differently.
>
> What kind of URL does weblogic return for
> servletContext.getResource("//index/segments")?
> Is it a file: URL?
>
> Keeping the index in files and using FSDirectory will be much more
> efficient. If all the major servlet containers support this
> it would be a
> shame not to take advantage of it. You might look at the result of
> getResource and use an FSDirectory if a file: url is returned, and do
> something else when it's not.
>
> > So in order to isolate from different interpretations of the
> > spec, I'm going
> > to knock up a WARDirectory that probably will wrap a
> > RAMDirectory (going
> > back to the servlet container to getResourceAsStream seems
> > awfully expensive
> > to me) as a first go.
> > I'll post my efforts in a couple of days.
>
> One technique you might consider is, when the index is not
> available as a
> file, use getResourceAsStream to copy it to a temporary directory in
> System.getProperty("java.io.tmpdir"), then use FSDirectory.
> Storing the
> whole index in a RAMDirectory will make searches really fast,
> but could also
> chew up a lot of memory. If the index isn't that big anyway,
> maybe this
> isn't an issue.
>
> Doug
>
