Mailing List Archive: Proposal for Lucene

Re: Proposal for Lucene

Feb 9, 2002, 5:57 AM

Post #26 of 49 (1941 views)

On Sat, 2002-02-09 at 07:58, Kelvin Tan wrote:
> Here it is. Released under APL (I kinda copied and pasted the license from
> some Fulcrum code). Some (current) limitations:
>
> 1. Only a single datasource is supported at this point in time (support for
> multiple datasources can be easily added through the configuration file and
> improving SearchConfiguration)
> 2. Documentation isn't really complete. (Is it ever?)
> 3. It's a filesystem-based indexer. It's not too difficult to decouple the
> filesystem bit and make it more generic, but I don't have a need for it
> presently.
> 4. A temp folder is needed for extracting Zip, GZip and Tar files. I tried
> using outputstreams but they turned out to be quite a nightmare...

Great, I'll take a look at all of this when I get back next week (I'm going
to Boston for a week and will be out of touch).

> 5. There's a JDBCDatasource for indexing a table from databases (the table
> stores metadata of the file to index. There should still be some way to
> obtain the file to index. This ties back to 3.). I really ought to provide
> an example on how to use it...
>

What's that good for...? Wouldn't one just create an index on the
database?

> Questions and feedback are really welcome.
>
> I've attached the source-only version, but there's a full version (with
> libs) at http://www.relevanz.com/search_full.zip.
>
> ----- Original Message -----
> From: Andrew C. Oliver <acoliver@apache.org>
> To: Lucene Developers List <lucene-dev@jakarta.apache.org>
> Sent: Friday, February 08, 2002 9:18 PM
> Subject: Re: Proposal for Lucene
>
>
> > Is this open source? APL'd? Where can I look at it?
> >
> > -Andy
> >
> > On Thu, 2002-02-07 at 20:27, Kelvin Tan wrote:
> > > Great suggestions all around, and I'm pretty much in agreement with
> > > what's been said.
> > >
> > > In my app, I've built a mini-framework around the searching such that
> > > I'm able to map ContentHandlers (which index file contents) to file
> > > extensions. I've been wanting to clean it up and contribute it for a while,
> > > but haven't overcome the inertia to do so. I've also introduced a DataSource
> > > (which can pretty much be anything, like a filesystem, a database, a URL,
> > > etc.) from which to obtain the data to index, so I think it _could_ be in line
> > > with what some of you have in mind.
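The extension-to-handler mapping described above might look roughly like the following. This is only a sketch under stated assumptions: the ContentHandler name comes from the message, but its method signature, the ContentHandlerRegistry class, and the case-insensitive lookup are all hypothetical and are not taken from Kelvin's actual code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: turns raw file contents into text to be indexed.
interface ContentHandler {
    String extractText(String rawContents);
}

// Hypothetical registry that maps file extensions to ContentHandlers.
final class ContentHandlerRegistry {
    private final Map<String, ContentHandler> handlers = new HashMap<>();

    // Extensions are stored lower-cased so that lookups ignore case (an assumption).
    void register(String extension, ContentHandler handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    // Returns the handler for the file's extension, or null if there is
    // no extension or no handler registered for it.
    ContentHandler lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return handlers.get(extension);
    }
}
```

A caller would register one handler per extension (for example "txt" or "html") and skip any file for which lookup returns null.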
> > >
> > > I could also use a lot of feedback on what's been done too...
> > >
> > > So what's the plan to move forward?
> > >
> > > K
> > > ----- Original Message -----
> > > From: Mark Tucker
> > > To: Lucene Developers List
> > > Sent: Friday, February 08, 2002 4:03 AM
> > > Subject: RE: Proposal for Lucene
> > >
> > >
> > > I like what you included in your proposal and suggest doing all that
> > > (over time) and taking the following into consideration:
> > >
> > > Indexers/Crawlers
> > >
> > > General Settings
> > > SleeptimeBetweenCalls - can be used to avoid flooding a machine with too many requests
> > > IndexerTimeout - kill this crawler thread after long period of inactivity
> > > IncludeFilter - include only items matching filter
> > > ExcludeFilter - exclude items matching filter (can be used with IncludeFilter)
> > > MaxItems - stops indexing after x items
> > > MaxMegs - stops indexing after x MB of data
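To make the interaction of IncludeFilter, ExcludeFilter, and MaxItems concrete, here is a minimal sketch. The CrawlLimits class and its methods are hypothetical, and treating the filters as regular expressions is an assumption; the proposal does not say what filter syntax is intended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical holder for three of the general settings listed above.
final class CrawlLimits {
    private final Pattern includeFilter; // null means "include everything"
    private final Pattern excludeFilter; // null means "exclude nothing"
    private final int maxItems;

    // Filters are assumed to be regular expressions; either may be null.
    CrawlLimits(String includeRegex, String excludeRegex, int maxItems) {
        this.includeFilter = includeRegex == null ? null : Pattern.compile(includeRegex);
        this.excludeFilter = excludeRegex == null ? null : Pattern.compile(excludeRegex);
        this.maxItems = maxItems;
    }

    // An item is accepted if it matches the include filter (when set)
    // and does not match the exclude filter (when set).
    boolean accepts(String item) {
        if (includeFilter != null && !includeFilter.matcher(item).find()) {
            return false;
        }
        return excludeFilter == null || !excludeFilter.matcher(item).find();
    }

    // Walks the candidates in order and stops once maxItems have been accepted.
    List<String> select(List<String> candidates) {
        List<String> selected = new ArrayList<>();
        for (String candidate : candidates) {
            if (selected.size() >= maxItems) {
                break;
            }
            if (accepts(candidate)) {
                selected.add(candidate);
            }
        }
        return selected;
    }
}
```

SleeptimeBetweenCalls, IndexerTimeout, and MaxMegs are left out here; they would need a real fetch loop and byte counts to demonstrate.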
> > >
> > > File System Indexer
> > > URLReplacePrefix - can crawl c:\ but expose URL as http://mysever/docs/
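A URLReplacePrefix setting could be applied along these lines. The UrlPrefixMapper class is hypothetical, and the sketch assumes a simple case-sensitive prefix match with backslashes turned into forward slashes; it does no URL encoding of special characters.

```java
// Hypothetical helper that rewrites a crawled local path into the URL to expose.
final class UrlPrefixMapper {
    private final String localPrefix;
    private final String urlPrefix;

    UrlPrefixMapper(String localPrefix, String urlPrefix) {
        this.localPrefix = localPrefix;
        this.urlPrefix = urlPrefix;
    }

    // Replaces the local prefix with the URL prefix and normalizes separators.
    String toUrl(String localPath) {
        if (!localPath.startsWith(localPrefix)) {
            throw new IllegalArgumentException("Path is outside the crawled root: " + localPath);
        }
        String relative = localPath.substring(localPrefix.length()).replace('\\', '/');
        return urlPrefix + relative;
    }
}
```

With the prefixes from the example above, a file at c:\guides\intro.html would be exposed as http://mysever/docs/guides/intro.html.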
> > >
> > > Web Indexer
> > > HTTPUser
> > > HTTPPassword
> > > HTTPUserAgent
> > > ProxyServer
> > > ProxyUser
> > > ProxyPassword
> > > HTTPSCertificate
> > > HTTPSPrivateKey
> > >
> > > Other Possible Indexers
> > > Microsoft Exchange 5.5/2000
> > > Lotus Notes
> > > Newsgroup (NNTP)
> > > Documentum
> > > ODBC/OLEDB
> > > XML - index single XML that represents multiple documents
> > >
> > >
> > > Document Factory
> > > General
> > > The minimum properties for each document should be:
> > > URL
> > > Title
> > > Abstract
> > > Full Text
> > > Score
> > >
> > > HTML
> > > Support for META tags including Dublin Core syntax
> > >
> > > Other Possible Document Factories
> > > Office Docs - DOC, XLS, PPT
> > > PDF
> > >
> > >
> > > Thanks for the great proposal.
> > >
> > > Mark Tucker
> > >
> > >
> > > -----Original Message-----
> > > From: Andrew C. Oliver [mailto:acoliver@apache.org]
> > > Sent: Thursday, February 07, 2002 5:35 AM
> > > To: Lucene Developers List
> > > Subject: Proposal for Lucene
> > >
> > >
> > > Hi All,
> > >
> > > These are just a few thoughts about Lucene. Please send me your feedback,
> > > critiques, and thoughts.
> > >
> > > If you folks would take a look:
> > >
> > > http://www.trilug.org/~acoliver/luceneplan.html
> > >
> > > if you'd like to submit patches:
> > >
> > > http://www.trilug.org/~acoliver/luceneplan.xml
> > >
> > > Once I've gotten feedback from the developer community I'll send this to
> > > the user community as well.
> > >
> > > Thanks,
> > >
> > > Andy
> > > --
> > > www.superlinksoftware.com
> > > www.sourceforge.net/projects/poi - port of Excel format to java
> > > http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
> > > - fix java generics!
> > >
> > >
> > > The avalanche has already started. It is too late for the pebbles to
> > > vote.
> > > -Ambassador Kosh
> > >
> > >
> > > --
> > > To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> > > For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
> > >
> > >

Re: Proposal for Lucene

kelvin at relevanz

Feb 9, 2002, 5:58 AM

Post #27 of 49 (1943 views)
