Mailing List Archive: Call for features in next release

Call for features in next release

May 20, 2002, 11:19 AM

Post #1 of 10 (1224 views)

Hi,

I would like to put together a list of the features that we are planning on
having for the next release.

So far this seems to be:

Vector Term Support
Support for Search Term Highlighting

Do people have any other functionality they are thinking of adding?

Thanks

--Peter

--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>

RE: Call for features in next release [ In reply to ]

cutting at lucene

May 20, 2002, 11:49 AM

Post #2 of 10 (1194 views)

Here are a few others:

- Better support for hits sorted by things other than score. An easy,
efficient case is to support results sorted by the order documents were
added to the index. A little harder and less efficient is support for
results sorted by an arbitrary field.

- Add ability to "boost" individual documents/fields. When a document is
indexed, a numeric "boost" value could be specified for the whole document,
and/or for individual fields. This value would be multiplied into scores for
hits on this document. This would facilitate the implementation of things
like Google's PageRank.

- Add to FSDirectory the ability to specify where lock files live and to
disable the use of lock files altogether (for read-only media).

- Add some requested methods:
String[] Document.getValues(String fieldName);
String[] IndexReader.getIndexedFields();
void Token.setPositionIncrement(int);

Doug
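
A rough sketch of the boost and sort-order ideas above, in plain Java. Nothing here is Lucene API: BoostSketch, Hit, rank and byInsertionOrder are hypothetical names, and the sketch assumes that a document id reflects the order in which documents were added. It only illustrates how index-time boosts would be multiplied into a raw score, and how the same hits could instead be ordered by insertion.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names are Lucene classes.
class BoostSketch {

    /** A search hit: document id, raw query score, and index-time boosts. */
    static final class Hit {
        final int docId;        // assumed to grow in the order documents were added
        final float rawScore;   // score before any boost is applied
        final float docBoost;   // boost specified for the whole document
        final float fieldBoost; // boost specified for the matched field

        Hit(int docId, float rawScore, float docBoost, float fieldBoost) {
            this.docId = docId;
            this.rawScore = rawScore;
            this.docBoost = docBoost;
            this.fieldBoost = fieldBoost;
        }

        /** Both boosts are multiplied into the raw score. */
        float score() {
            return rawScore * docBoost * fieldBoost;
        }
    }

    /** Returns a copy of the hits ordered by boosted score, highest first. */
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score()).reversed());
        return sorted;
    }

    /** Returns a copy of the hits ordered by document id (insertion order). */
    static List<Hit> byInsertionOrder(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.docId));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(0, 0.8f, 1.0f, 1.0f)); // unboosted document
        hits.add(new Hit(1, 0.5f, 2.0f, 1.0f)); // whole document boosted x2
        for (Hit h : rank(hits)) {
            System.out.println("doc " + h.docId + " score " + h.score());
        }
    }
}
```

With these sample values the boosted document outranks the unboosted one even though its raw score is lower, which is the effect a PageRank-style boost would be meant to have.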

Re: Call for features in next release [ In reply to ]

otis_gospodnetic at yahoo

May 21, 2002, 6:15 AM

Post #3 of 10 (1193 views)

I would also add the following to the list:
- Peter Halacsy's changes to the QueryParser that, I believe, make it
possible to programmatically specify a default operator (OR or AND).

- The recently submitted code that allows for queries such as "Microsoft
suc*" to match "Microsoft success" and "Microsoft sucks".

- Alex Murzaku contributed some code for dealing with Russian.

- A lady from Finland submitted code for handling Finnish.

I think these could/should be added to Lucene if they pass the test.

Otis
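
To make the "Microsoft suc*" case concrete, here is a small self-contained sketch of phrase matching in which a term ending in '*' is treated as a prefix. It is not the submitted code and does not use Lucene; PrefixPhraseSketch and its methods are hypothetical names, and the whitespace tokenizer is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; this is not the contributed Lucene code.
class PrefixPhraseSketch {

    /** Lower-cases the text and splits it on whitespace (a deliberate simplification). */
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    /** A term ending in '*' matches any token that starts with the prefix. */
    static boolean termMatches(String token, String term) {
        if (term.endsWith("*")) {
            return token.startsWith(term.substring(0, term.length() - 1));
        }
        return token.equals(term);
    }

    /** True if the phrase terms match consecutive tokens somewhere in the text. */
    static boolean matches(List<String> tokens, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < phrase.size() && ok; i++) {
                ok = termMatches(tokens.get(start + i), phrase.get(i));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> query = tokenize("Microsoft suc*");
        System.out.println(matches(tokenize("Microsoft success"), query));
        System.out.println(matches(tokenize("Microsoft sucks"), query));
        System.out.println(matches(tokenize("Microsoft support"), query));
    }
}
```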

Re: Call for features in next release [ In reply to ]

lee at grantadesign

May 21, 2002, 6:33 AM

Post #4 of 10 (1202 views)

On Mon, 2002-05-20 at 19:19, Peter Carlson wrote:
> I would like be able to have the features that we are planning on having for
> the next release.
..snip..
> Do people have any other functionality they are thinking of adding?

It could be that this is impossible due to Lucene internals, but:

The ability to create IndexReader and IndexSearcher objects from an
FSDirectory whose files on the real filesystem are read-only (this
currently fails with "Permission denied" errors).

In fact, that's more a feature request than something I plan to work on,
but given the right hints from other developers, I may be able to work
on it.

Lee.

Re: Call for features in next release [ In reply to ]

otis_gospodnetic at yahoo

May 21, 2002, 7:18 AM

Post #5 of 10 (1208 views)

I believe 1 or 2 people have already contributed code for this, which
they had to write in order to use Lucene on a CD-ROM.
It's just a matter of digging through my Lucene folder and finding it.

Otis

Re: Call for features in next release [ In reply to ]

kazama at ingrid

May 21, 2002, 11:31 PM

Post #6 of 10 (1197 views)

Hi all,

I am working on a Japanese analyzer and patches for the Lucene tools. I
have already written a tokenizer that uses a Japanese morphological
analyzer internally, because Japanese text is not separated by
whitespace. I plan to put them in the Lucene sandbox, but the patches
for the tools will take more time.

From: Peter Carlson <carlson@bookandhammer.com>
Subject: Call for features in next release
Date: Mon, 20 May 2002 11:19:15 -0700
Message-ID: <B90E8C33.7903%carlson@bookandhammer.com>
> I would like be able to have the features that we are planning on having for
> the next release.

I would like to request the following:

1. Selecting a language-specific analyzer according to a locale.

At present we have to rewrite parts of the Lucene code in order to use
another analyzer. It would be useful to be able to select an analyzer
without touching the code.

2. Adding an "-encoding" option and encoding-sensitive methods to the tools.

The current tools need minor changes to work in Japanese (and other-language)
environments: adding an "-encoding" option and argument, using
Reader/Writer classes instead of InputStream/OutputStream classes,
etc.

Thanks,

Kazuhiro Kazama (kazama@ingrid.org) NTT Network Innovation Laboratories
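
As an illustration of the second request, the following sketch shows one way a command-line tool could accept an "-encoding" argument and decode its input through a Reader rather than reading raw bytes. It uses only standard JDK classes; EncodingOptionSketch, parseEncoding and readAll are hypothetical names, not part of the Lucene tools, and falling back to the platform default charset is an assumption.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical illustration only; not part of the Lucene tools.
class EncodingOptionSketch {

    /** Looks for "-encoding <name>" in args; falls back to the platform default. */
    static Charset parseEncoding(String[] args) {
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-encoding".equals(args[i])) {
                return Charset.forName(args[i + 1]);
            }
        }
        return Charset.defaultCharset();
    }

    /** Decodes the whole stream with the given charset via a Reader. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Charset charset = parseEncoding(args);
        // "\u65e5\u672c\u8a9e" is the word "Japanese" written in Japanese.
        byte[] sample = "\u65e5\u672c\u8a9e".getBytes(charset);
        System.out.println(readAll(new ByteArrayInputStream(sample), charset));
    }
}
```

Decoding the same bytes with the wrong charset yields different characters, which is why the encoding has to be passed in explicitly rather than assumed.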

RE: Call for features in next release [ In reply to ]

halacsy.peter at axelero

May 22, 2002, 6:01 AM

Post #7 of 10 (1201 views)

Some others (if you don't mind):
1. Make the package-protected abstract methods of org.apache.lucene.search.Searcher public (I'd like to be able to write subclasses of Searcher, IndexWriter and IndexReader).
2. Add a lastModified() method to Directory, FSDirectory and RAMDirectory (so it could be cached in an IndexWriter/Searcher manager).
3. Support for adding more than one term at the same position (I'm sorry, I couldn't find Doug's email about this).

peter
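
For the third item, a minimal sketch of how a per-token position increment could place several terms at one position. The Token class here is a hypothetical stand-in, not Lucene's Token, and the convention that an increment of 0 means "same position as the previous token" is an assumption about how such a feature might work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; Token here is not Lucene's Token class.
class PositionIncrementSketch {

    /** A token and its distance from the previous token's position. */
    static final class Token {
        final String text;
        final int positionIncrement; // 1 = next position, 0 = same position

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Absolute positions are the running sum of increments, starting at 0. */
    static int[] positions(List<Token> tokens) {
        int[] result = new int[tokens.size()];
        int position = -1; // so that a first increment of 1 lands on position 0
        for (int i = 0; i < tokens.size(); i++) {
            position += tokens.get(i).positionIncrement;
            result[i] = position;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("fast", 1));
        tokens.add(new Token("quick", 0)); // synonym stacked on "fast"
        tokens.add(new Token("car", 1));
        int[] pos = positions(tokens);
        for (int i = 0; i < pos.length; i++) {
            System.out.println(tokens.get(i).text + " @ " + pos[i]);
        }
        // prints: fast @ 0, quick @ 0, car @ 1 (one per line)
    }
}
```

Under this scheme a phrase query for "quick car" would see the two terms at adjacent positions, just as it would for "fast car".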

RE: Call for features in next release [ In reply to ]

eric at ConveySoftware

May 22, 2002, 6:21 AM

Post #8 of 10 (1196 views)

Does anyone see a problem with adding support for storing unindexed,
untokenized *binary* data as document fields? At the moment, the closest
thing we have is unindexed, untokenized *character* data. Looking at the
source, this will be a trivial change, but I'm curious to learn if there are
specific reasons (other than inclination and opportunity) that this has been
left out.

Eric

Re: Call for features in next release [ In reply to ]

Julien.Nioche at lingway

May 22, 2002, 10:13 AM

Post #9 of 10 (1200 views)

Another feature could be the ability to retrieve the number of occurrences
not only for a term but also for a phrase (see
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00101.html).
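
The difference between the two counts can be shown with a short sketch over a plain token list. This says nothing about how Lucene would compute it from the index; PhraseFrequencySketch and its methods are hypothetical names, and counting overlapping phrase occurrences separately is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; it does not read a Lucene index.
class PhraseFrequencySketch {

    /** Number of tokens equal to the term. */
    static int termFrequency(List<String> tokens, String term) {
        int count = 0;
        for (String token : tokens) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    /** Number of start positions at which the whole phrase occurs consecutively. */
    static int phraseFrequency(List<String> tokens, List<String> phrase) {
        if (phrase.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "new", "york", "is", "not", "new", "jersey", "but", "new", "york", "city");
        System.out.println(termFrequency(tokens, "new"));                       // 3
        System.out.println(phraseFrequency(tokens, Arrays.asList("new", "york"))); // 2
    }
}
```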

RE: Call for features in next release [ In reply to ]

eytan at exch

May 23, 2002, 1:43 PM

Post #10 of 10 (1197 views)

I know voting has started, but I just thought of this...

How about a compressed index option? Something simple like delta or Golomb coding would be fine. There's a Java implementation of a variant at:
http://www.usenix.org/publications/login/2000-4/features/java.html

It'll be a performance hit, but I'm dealing with enormous indexes and saving space would help. A compressed index would be especially helpful during the initial (pre-optimization) index building, since that takes up a lot of disk space. A user could then unroll the optimized/compressed index into a standard uncompressed one.

-Eytan
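
A small sketch of the kind of saving this could give, under two stated assumptions: "delta" is read here as storing the gaps between sorted document ids, and the gaps are written with a simple variable-byte code rather than Golomb coding, purely because it is shorter to show. GapEncodingSketch is a hypothetical name, not the linked implementation and not Lucene code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only: gap encoding plus variable-byte coding,
// used here in place of Golomb coding.
class GapEncodingSketch {

    /** Encodes ascending, non-negative doc ids as variable-byte gaps. */
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int id : sortedDocIds) {
            int gap = id - previous; // small when ids are close together
            previous = id;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80); // low 7 bits, continuation bit set
                gap >>>= 7;
            }
            out.write(gap); // final byte, continuation bit clear
        }
        return out.toByteArray();
    }

    /** Reverses encode(); count is the number of ids that were encoded. */
    static int[] decode(byte[] bytes, int count) {
        int[] ids = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap; // add the gap back to recover the absolute id
            ids[i] = previous;
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = {1000, 1003, 1010, 5000};
        byte[] packed = encode(ids);
        System.out.println(packed.length + " bytes instead of " + (ids.length * 4));
        System.out.println(Arrays.toString(decode(packed, ids.length)));
    }
}
```

The extra decode loop on every read is the performance hit mentioned above; the space saving depends on how close together the ids in each list are.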
