14 March 2019, Apache Lucene™ 8.0.0 available
The Lucene PMC is pleased to announce the release of Apache Lucene 8.0.0.
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is suitable for nearly any
application that requires full-text search, especially cross-platform
applications.
This release contains numerous bug fixes, optimizations, and improvements,
some of which are highlighted below. The release is available for immediate
download at:
http://lucene.apache.org/core/mirrors-core-latest-redir.html
Lucene 8.0.0 Release Highlights:
Query execution
Term queries, phrase queries and boolean queries gained a new
optimization that enables efficient skipping over non-competitive documents
when the total hit count is not needed. Depending on the exact query and
data distribution, queries might run anywhere from a few percent slower to
many times faster; term queries and pure disjunctions benefit the most.
In order to support this enhancement, some API changes have been made:
* TopDocs.totalHits is no longer a long but an object that gives a lower
bound of the actual hit count.
* IndexSearcher's search and searchAfter methods now only compute total
hit counts accurately up to 1,000 in order to enable this optimization by
default.
* Queries are now required to produce non-negative scores.
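The interaction between these changes can be illustrated with a small, self-contained sketch. It is not Lucene code: TopKSketch, Result and collect are hypothetical names, and the logic only models the idea that hits are counted exactly until a threshold is reached, after which non-competitive documents may be skipped and the count becomes a lower bound. In Lucene 8 the corresponding object is TotalHits, which carries a value and a relation.

```java
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names are Lucene APIs.
class TopKSketch {
    /** Hit count plus a flag saying whether the count is exact or a lower bound. */
    static final class Result {
        final long hitCount;
        final boolean exact;      // false: hitCount is only a lower bound
        final float[] topScores;  // best scores, highest first

        Result(long hitCount, boolean exact, float[] topScores) {
            this.hitCount = hitCount;
            this.exact = exact;
            this.topScores = topScores;
        }
    }

    /** Keeps the k best scores; counts hits exactly only up to the threshold. */
    static Result collect(float[] scores, int k, int threshold) {
        PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap of the current top k
        long counted = 0;
        boolean exact = true;
        for (float s : scores) {
            if (s < 0f) {
                // Mirrors the requirement that scores are non-negative.
                throw new IllegalArgumentException("scores must be non-negative");
            }
            boolean competitive = heap.size() < k || s > heap.peek();
            if (counted >= threshold && !competitive) {
                // Past the threshold, a non-competitive hit is skipped without
                // being counted, so the count is no longer exact.
                exact = false;
                continue;
            }
            counted++;
            if (competitive) {
                heap.add(s);
                if (heap.size() > k) {
                    heap.poll(); // evict the weakest of the k + 1 candidates
                }
            }
        }
        float[] top = new float[heap.size()];
        for (int i = top.length - 1; i >= 0; i--) {
            top[i] = heap.poll(); // smallest first, so fill from the end
        }
        return new Result(counted, exact, top);
    }

    public static void main(String[] args) {
        float[] scores = {1f, 5f, 2f, 4f, 3f, 0.5f, 0.2f, 6f};
        Result r = collect(scores, 2, 3);
        System.out.println((r.exact ? "exactly " : "at least ") + r.hitCount + " hits");
    }
}
```

With a large threshold the sketch visits every hit and reports an exact count; with a small one it reports fewer hits than actually matched and flags the count as a lower bound, while the top scores stay the same.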
Codecs
* Postings now index score impacts alongside skip data. This is how term
queries optimize collection of top hits when hit counts are not needed.
* Doc values now use jump tables, so that advancing runs in constant
time. This is especially helpful on sparse fields.
* The terms index FST is now loaded off-heap for non-primary-key fields
using MMapDirectory, reducing heap usage for such fields.
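As a rough picture of why storing score impacts next to skip data helps, the sketch below splits a postings list into blocks and records the best possible score of each block. This is a simplification under stated assumptions: Lucene stores term frequency and length normalization information rather than final scores, and ImpactSkipSketch, Block and competitiveDocs are hypothetical names rather than Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names are Lucene APIs.
class ImpactSkipSketch {
    /** A block of postings with a precomputed upper bound on its scores. */
    static final class Block {
        final int[] docIds;
        final float[] scores;
        final float maxScore; // stand-in for the impacts stored with skip data

        Block(int[] docIds, float[] scores) {
            this.docIds = docIds;
            this.scores = scores;
            float max = 0f;
            for (float s : scores) {
                max = Math.max(max, s);
            }
            this.maxScore = max;
        }
    }

    /**
     * Returns the doc ids whose score reaches minCompetitiveScore.
     * scoredDocs[0] is incremented once per document that had to be scored.
     */
    static List<Integer> competitiveDocs(List<Block> blocks, float minCompetitiveScore, int[] scoredDocs) {
        List<Integer> hits = new ArrayList<>();
        for (Block block : blocks) {
            if (block.maxScore < minCompetitiveScore) {
                continue; // no document in this block can compete: skip it whole
            }
            for (int i = 0; i < block.docIds.length; i++) {
                scoredDocs[0]++;
                if (block.scores[i] >= minCompetitiveScore) {
                    hits.add(block.docIds[i]);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block(new int[] {1, 2, 3}, new float[] {0.1f, 0.2f, 0.3f}),
            new Block(new int[] {10, 11}, new float[] {0.9f, 0.4f}),
            new Block(new int[] {20, 21, 22}, new float[] {0.2f, 0.1f, 0.05f}));
        int[] scored = new int[1];
        List<Integer> hits = competitiveDocs(blocks, 0.5f, scored);
        System.out.println("hits=" + hits + ", docs scored=" + scored[0]);
    }
}
```

The higher the minimum competitive score climbs during collection, the more blocks can be skipped without looking at their documents, which is only safe when an exact hit count is not required.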
Custom scoring
The new FeatureField allows efficient integration of static features such
as a pagerank into the score. Furthermore, the new
LongPoint#newDistanceFeatureQuery and LatLonPoint#newDistanceFeatureQuery
methods allow boosting by recency and geo-distance respectively. These new
helpers are optimized for the case when total hit counts are not needed.
For instance, if the pagerank has a significant weight in your scores, then
Lucene might be able to skip over documents that have a low pagerank value.
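The scoring shapes behind these helpers can be written down in a few lines. The sketch below assumes the saturation form weight * S / (S + pivot) for a static feature and weight * pivotDistance / (pivotDistance + distance) for a distance feature; FeatureScoreSketch and its method names are hypothetical and do not call into Lucene.

```java
// Hypothetical illustration only; none of these names are Lucene APIs.
class FeatureScoreSketch {
    /** Saturation: grows with the feature value but never exceeds weight. */
    static double saturation(double weight, double featureValue, double pivot) {
        return weight * featureValue / (featureValue + pivot);
    }

    /** Distance feature: equals weight at distance 0 and weight / 2 at the pivot distance. */
    static double distanceFeature(double weight, double distance, double pivotDistance) {
        return weight * pivotDistance / (pivotDistance + distance);
    }

    public static void main(String[] args) {
        // A pagerank equal to the pivot contributes half of the weight.
        System.out.println(saturation(2.0, 8.0, 8.0));
        // A document exactly one pivot distance away gets half of the boost.
        System.out.println(distanceFeature(1.0, 10.0, 10.0));
    }
}
```

Both functions are monotonic and bounded by the weight, so the largest feature value (or smallest distance) in a range of documents gives an upper bound on their score contribution. That bound is what makes it possible to skip documents that cannot be competitive.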
Further details of changes are available in the change log at:
http://lucene.apache.org/core/8_0_0/changes/Changes.html
Please report any feedback to the mailing lists (
http://lucene.apache.org/core/discussion.html)
Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also applies to Maven access.