Mailing List Archive: [ANNOUNCE] Apache Lucene 8.0.0 released

14 March 2019, Apache Lucene™ 8.0.0 available

The Lucene PMC is pleased to announce the release of Apache Lucene 8.0.0.

Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and improvements,
some of which are highlighted below. The release is available for immediate
download at:

http://lucene.apache.org/core/mirrors-core-latest-redir.html

Lucene 8.0.0 Release Highlights:

Query execution
Term queries, phrase queries and boolean queries gained a new
optimization that enables efficient skipping over non-competitive documents
when the total hit count is not needed. Depending on the exact query and
data distribution, queries might run anywhere from a few percent slower to
many times faster, especially term queries and pure disjunctions.
In order to support this enhancement, some API changes have been made:
* TopDocs.totalHits is no longer a long but an object that gives a lower
bound of the actual hit count.
* IndexSearcher's search and searchAfter methods now only compute total
hit counts accurately up to 1,000 in order to enable this optimization by
default.
* Queries are now required to produce non-negative scores.
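
To make the interplay of these changes concrete, here is a minimal Java
sketch of top-k collection that skips blocks of hits that cannot compete.
It is a toy model, not Lucene's implementation: the class
BlockSkippingSketch, its Block and Result types, and the countThreshold
parameter are hypothetical names chosen for illustration. It shows why the
reported hit count turns into a lower bound once skipping starts.

```java
import java.util.PriorityQueue;

/** Toy model (not Lucene code): top-k collection that skips non-competitive blocks. */
final class BlockSkippingSketch {

    /** A run of hits plus the best score among them (hypothetical type). */
    static final class Block {
        final float[] scores;
        final float maxScore;

        Block(float... scores) {
            this.scores = scores;
            // Starting at 0 is only safe because scores are assumed non-negative.
            float max = 0f;
            for (float s : scores) {
                max = Math.max(max, s);
            }
            this.maxScore = max;
        }
    }

    /** Top scores in descending order, plus a hit count that may be a lower bound. */
    static final class Result {
        final float[] topScores;
        final long countedHits;
        final boolean exact;

        Result(float[] topScores, long countedHits, boolean exact) {
            this.topScores = topScores;
            this.countedHits = countedHits;
            this.exact = exact;
        }
    }

    static Result search(Block[] blocks, int k, long countThreshold) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap of the k best scores
        long counted = 0;
        boolean exact = true;
        for (Block block : blocks) {
            // Skip only when the heap is full, enough hits were counted, and
            // no hit in this block can beat the current k-th best score.
            if (heap.size() == k && counted >= countThreshold
                    && block.maxScore <= heap.peek()) {
                exact = false; // skipped hits are never counted
                continue;
            }
            for (float score : block.scores) {
                counted++;
                if (heap.size() < k) {
                    heap.add(score);
                } else if (score > heap.peek()) {
                    heap.poll();
                    heap.add(score);
                }
            }
        }
        float[] top = new float[heap.size()];
        for (int i = top.length - 1; i >= 0; i--) {
            top[i] = heap.poll(); // smallest first, so fill from the back
        }
        return new Result(top, counted, exact);
    }
}
```

Passing a very large countThreshold disables skipping in this sketch and
yields an exact count, which mirrors the trade-off described above.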

Codecs
* Postings now index score impacts alongside skip data. This is how term
queries optimize collection of top hits when hit counts are not needed.
* Doc values now use jump tables, so that advancing runs in constant
time. This is especially helpful on sparse fields.
* The terms index FST is now loaded off-heap for non-primary-key fields
using MMapDirectory, reducing heap usage for such fields.
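
The jump table idea can be illustrated with a small Java sketch. This is
an assumption-laden toy, not Lucene's on-disk format: the class
JumpTableSketch, the block size of 16 doc IDs and the in-memory arrays are
all hypothetical. The point is that the table maps a block of the doc ID
space straight to a position in the sparse list, so advancing does not
have to walk over everything before the target.

```java
/** Toy model (not Lucene's format): a jump table over a sparse, sorted doc ID list. */
final class JumpTableSketch {

    private static final int BLOCK_SHIFT = 4; // 16 doc IDs per block, an arbitrary toy value

    private final int[] docs; // sorted doc IDs that have a value
    private final int[] jump; // jump[b] = index in docs of the first doc ID >= b << BLOCK_SHIFT

    JumpTableSketch(int[] sortedDocs, int maxDoc) {
        this.docs = sortedDocs.clone();
        this.jump = new int[(maxDoc >>> BLOCK_SHIFT) + 2];
        int i = 0;
        for (int b = 0; b < jump.length; b++) {
            int blockStart = b << BLOCK_SHIFT;
            while (i < docs.length && docs[i] < blockStart) {
                i++;
            }
            jump[b] = i;
        }
    }

    /** Returns the first doc ID >= target that has a value, or -1 if there is none. */
    int advance(int target) {
        int block = target >>> BLOCK_SHIFT;
        if (block >= jump.length) {
            return -1; // target lies beyond the last block
        }
        // One table lookup finds the block; the scan below passes over at most
        // the entries of that single block before reaching the answer.
        for (int i = jump[block]; i < docs.length; i++) {
            if (docs[i] >= target) {
                return docs[i];
            }
        }
        return -1;
    }
}
```

For a sparse field with doc IDs {3, 40, 41, 1000}, advance(42) looks up
the block that contains 42, passes over 40 and 41, and lands on 1000
without visiting anything outside that block.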

Custom scoring
The new FeatureField allows efficient integration of static features such
as a pagerank into the score. Furthermore, the new
LongPoint#newDistanceFeatureQuery and LatLonPoint#newDistanceFeatureQuery
methods allow boosting by recency and geo-distance respectively. These new
helpers are optimized for the case when total hit counts are not needed.
For instance, if the pagerank has a significant weight in your scores, then
Lucene might be able to skip over documents that have a low pagerank value.
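
As a rough illustration of why a static feature allows skipping, the Java
sketch below combines a text score with a saturation function of the form
weight * value / (value + pivot). The class FeatureScoreSketch and its
method names are hypothetical, and the choice of a saturation function is
an assumption made for the example; consult the FeatureField javadoc for
the functions Lucene actually offers. Because the function only grows with
the feature value, the largest feature value in a block bounds the score
of every document in that block.

```java
/** Toy model (not Lucene code): bounding a score that includes a static feature. */
final class FeatureScoreSketch {

    /** Grows with value but never reaches weight; assumes value >= 0 and pivot > 0. */
    static float saturation(float weight, float pivot, float value) {
        return weight * value / (value + pivot);
    }

    /** Upper bound on text score plus feature score for any document in a block. */
    static float maxPossibleScore(float maxTextScore, float weight, float pivot,
                                  float maxFeatureInBlock) {
        return maxTextScore + saturation(weight, pivot, maxFeatureInBlock);
    }

    /** A block can be skipped when even its best case cannot beat the current k-th best hit. */
    static boolean canSkip(float maxTextScore, float weight, float pivot,
                           float maxFeatureInBlock, float minCompetitiveScore) {
        return maxPossibleScore(maxTextScore, weight, pivot, maxFeatureInBlock)
                <= minCompetitiveScore;
    }
}
```

With a weight of 3 and a pivot of 10, a block whose best pagerank is 10
contributes at most 1.5 from the feature, so it can be skipped as soon as
the k-th best hit already scores higher than 1.5 plus the block's best
text score.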

Further details of the changes are available in the change log at:
http://lucene.apache.org/core/8_0_0/changes/Changes.html

Please report any feedback to the mailing lists
(http://lucene.apache.org/core/discussion.html).

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also applies to Maven access.

Yeaaaah! It's finally done. I am a bit sad that the new query short-circuiting is not usable from Solr at the moment.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: jim ferenczi <jim.ferenczi@gmail.com>
> Sent: Thursday, March 14, 2019 1:16 PM
> To: general@lucene.apache.org; dev@lucene.apache.org; java-user@lucene.apache.org
> Subject: [ANNOUNCE] Apache Lucene 8.0.0 released