Mailing List Archive: Re: Lucene has moved to Jakarta

Re: Lucene has moved to Jakarta

nelson at monkey

Oct 2, 2001, 5:36 PM

Post #1 of 6 (2788 views)

Congratulations on the move! The new site is up and looks smooth. I know
it's a lot of work to do this stuff; thanks to all the Apache folks
behind the scenes who made it happen. The software collection at
Jakarta just gets better and better.

As near as I can see, the two major changes for 1.2-rc1 are:
- switch to org.apache.lucene package names.
- Apache license instead of LGPL.

Sometime when someone has a chance, I'd love to hear a bit about what
plans there are for Lucene development.

nelson@monkey.org
. . . . . . . . http://www.media.mit.edu/~nelson/

RE: Lucene has moved to Jakarta

DCutting at grandcentral

Oct 5, 2001, 11:42 AM

Post #2 of 6 (2720 views)

> From: nelson@monkey.org [mailto:nelson@monkey.org]
>
> Congratulations on the move!

Thanks!

> As near as I can see, the two major changes for 1.2-rc1 are:
> switch to org.apache.lucene package names.
> Apache license instead of LGPL.

Yes. Thanks for pointing these out. These are big incompatible changes
that I forgot to mention.

Other changes since 1.01b include:
- ant-only build -- no more makefiles
- addition of lock files--now fully thread & process safe
- addition of German stemmer
- MultiSearcher now supports low-level search API
- added RangeQuery, for term-range searching
- Analyzers can choose tokenizer based on field name
- misc bug fixes.

I need to work up detailed release notes for the final 1.2 release.

> Sometime when someone has a chance, I'd love to hear a bit about what
> plans there are for Lucene development.

Let's see, some short term tasks for the 1.2 release:
- get source code back into releases
- clean up example code
- write release notes

Some mid-term tasks:
- add contributed Chinese analyzers
- add Hits.SetOrdering() support
- add some term highlighting support

Longer term tasks:
- add JDBC-based Directory
- optimize simple conjunctive queries
- optionally store document vectors in index

Have I missed your favorite?

Doug

RE: Lucene has moved to Jakarta

keng.wong at verizon

Oct 5, 2001, 2:12 PM

Post #3 of 6 (2716 views)

How about adding filters for different file types such as
-HTML (there is one in the demo already)
-XML
-PDF
-MsWord/RTF
-other common file formats
Thanks.

-william

RE: Lucene has moved to Jakarta

DCutting at grandcentral

Oct 5, 2001, 2:18 PM

Post #4 of 6 (2723 views)

> From: William Wong [mailto:keng.wong@verizon.net]
>
> How about adding filters for different file types such as
> -HTML (there is one in the demo already)
> -XML
> -PDF
> -MsWord/RTF
> -other common file formats

These would be great. Who will implement them?
I was only listing tasks that I plan to do.

I think the best API for such converters is a method that takes a
java.io.InputStream and returns a java.io.Reader containing plain text,
e.g.:
public static java.io.Reader getText(java.io.InputStream);
That way they can easily be used by Lucene analyzers.
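
A minimal sketch of a converter with that shape (InputStream in, Reader of plain text out). The class name TextConverter, the drain helper, the UTF-8 assumption, and the naive tag-stripping logic are all hypothetical illustrations, not part of Lucene or of any contributed converter:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical converter, not part of Lucene.
final class TextConverter {

    // Takes raw bytes and returns a Reader over plain text.
    // Assumption: the input is UTF-8 encoded markup.
    // Naive rule: everything between '<' and '>' is dropped,
    // so a stray '>' in ordinary text is dropped as well.
    static Reader getText(InputStream in) {
        StringBuilder text = new StringBuilder();
        boolean inTag = false;
        try (Reader raw = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int c;
            while ((c = raw.read()) != -1) {
                if (c == '<') {
                    inTag = true;
                } else if (c == '>') {
                    inTag = false;
                } else if (!inTag) {
                    text.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new StringReader(text.toString());
    }

    // Helper for the example only: reads a Reader to the end.
    static String drain(Reader reader) {
        StringBuilder out = new StringBuilder();
        try (Reader r = reader) {
            int c;
            while ((c = r.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

This version reads the whole input eagerly to keep the example short; a real converter for large PDF or Word files would more likely return a Reader that converts lazily as it is read.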

Should we put converters in org.apache.lucene.document?

Contributions anyone?

Doug

RE: Lucene has moved to Jakarta

cyberjay10 at yahoo

Oct 6, 2001, 4:15 AM

Post #5 of 6 (2723 views)

How does i2 do it? http://www.i2a.com/websearch/ -
they list both an HTML parser and a PDF parser as part
of their solution.

J

RE: Lucene has moved to Jakarta

Stephan.Strittmatter.ext at kst

Oct 10, 2001, 4:13 AM

Post #6 of 6 (2717 views)

Hello,

I think PDF indexing would be the most valuable addition after HTML
for any search engine used on the web!

I am also very interested in indexing PDFs! If anybody has
ideas or supporting libraries, I would be glad to help implement it!

Greetings, Stephan