
LongField when searched using classic QueryParser does not yield results
Hi,

I am using Lucene v5.5.0

We are indexing a document into Lucene v5.5 using the following code and trying to search for the document by a given long value, but the search does not yield any results.



1. This is how we are creating a document using Lucene v5.5


import java.io.Reader;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.TextField;

/**
 * Makes a Lucene document for an Asset.
 * <p>
 * The document has two fields:
 * <ul>
 * <li><code>AssetId</code>--Id of asset, as a stored, untokenized
 * field; and
 * <li><code>TextContent</code>--asset's text content, read from the given
 * Reader, tokenized and indexed but not stored.
 * </ul>
 */
public static Document createDocumentFromFile(long assetId, Reader reader)
        throws java.io.FileNotFoundException {

    // make a new, empty document
    Document doc = new Document();

    // Add assetId as a field named "AssetId". AssetId will be indexed and stored
    // but does not need to be tokenized, as it will be the string representation
    // of a number and hence contains no whitespace.
    doc.add(new LongField("AssetId", assetId, Field.Store.YES));

    // Add the contents to a field named "TextContent". Specify a
    // Reader, so that the text of the file is tokenized and indexed, but not
    // stored.
    doc.add(new TextField("TextContent", reader));

    // return the document
    return doc;
}


2. Adding a document:

/*
 * Adds a Lucene Document to the index
 */
protected void addDocument(Document doc) throws IOException {
    indexWriter.addDocument(doc);
    commit();
}

private void commit() throws IOException {
    indexWriter.commit();
    long start = System.nanoTime();
    //searcherManager.maybeRefresh();
    searcherManager.maybeRefreshBlocking();
    long end = System.nanoTime();
    logger.trace("IndexReader reopen in (us): " + (end - start) / 1000);
}



3. Searching for a document, for instance the document with AssetId = 12:


/*
 * Returns Lucene document corresponding to the given AssetId
 */
private Document getAssetDocument(long assetId) {
    final Document[] assetDoc = new Document[1];
    try {
        /*
         * Query to find a lucene document based on AssetId field
         */
        QueryParser assetIdQueryparser = new QueryParser("AssetId", standardAnalyzer);
        Query query = assetIdQueryparser.parse("" + assetId);
        query(query, Integer.MAX_VALUE, new TextQueryResultCollector() {
            @Override
            public void collect(TopDocs searchResult, IndexSearcher searcher)
                    throws IOException {
                ScoreDoc[] s = searchResult.scoreDocs;
                if (s != null && s.length > 0) {
                    assetDoc[0] = searcher.doc(searchResult.scoreDocs[0].doc);
                }
            }
        });
    } catch (Exception e) {
        logger.error("Error Searching text index on field AssetId", e);
    }
    return assetDoc[0];
}




protected void query(Query query, int maxResults, TextQueryResultCollector collector) throws IOException {
    // Acquire outside the try block: if acquire() fails there is no searcher
    // to release, and release(null) would throw and hide the original error.
    IndexSearcher searcher = searcherManager.acquire();
    try {
        // Represents hits returned by indexSearcher.search(Query, int)
        TopDocs topDocs = searcher.search(query, maxResults);
        collector.collect(topDocs, searcher);
    } finally {
        try {
            searcherManager.release(searcher);
        } catch (IOException e) {
            logger.error("Error querying Text index.", e);
        }
    }
}

Can you please help me figure out why searching an org.apache.lucene.document.LongField (with Lucene v5.5) using org.apache.lucene.queryparser.classic.QueryParser does not yield results, whereas the same query parser yields results when used to search a TextField?
Are LongFields indexed in a different manner?

Can you please help me understand why the search does not yield results in Lucene v5.5?

Thanks,
Jaspreet Kaur
Re: LongField when searched using classic QueryParser does not yield results
Hi Jaspreet,

I'm not sure whether this helps answer your question, as I didn't try to run
the code.

From the official guide:

> Within Lucene, each numeric value is indexed as a *trie* structure, where
> each term is logically assigned to larger and larger pre-defined brackets
> (which are simply lower-precision representations of the value). The step
> size between each successive bracket is called the precisionStep,
> measured in bits. Smaller precisionStep values result in larger number of
> brackets, which consumes more disk space in the index but may result in
> faster range search performance. The default value, 4, was selected for a
> reasonable tradeoff of disk space consumption versus performance


> If you only need to sort by numeric value, and never run range
> querying/filtering, you can index using a precisionStep of
> Integer.MAX_VALUE
> <http://download.oracle.com/javase/6/docs/api/java/lang/Integer.html?is-external=true#MAX_VALUE>.
> This will minimize disk space consumed.
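A minimal sketch of what the quoted advice could look like in Lucene 5.5 code, assuming the AssetId field from the original post. `SingleTermLongField`, `SINGLE_TERM_LONG`, `makeDoc` and `exactMatch` are hypothetical names. The NumericRangeQuery javadoc asks for the same precisionStep at query time as at index time, so the query below passes the same value explicitly.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.LongField;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

public class SingleTermLongField {

    // Hypothetical field type: a stored LongField that indexes only the
    // full-precision term (no lower-precision brackets), as the quote suggests.
    static final FieldType SINGLE_TERM_LONG = new FieldType(LongField.TYPE_STORED);
    static {
        SINGLE_TERM_LONG.setNumericPrecisionStep(Integer.MAX_VALUE);
        SINGLE_TERM_LONG.freeze();
    }

    static Document makeDoc(long assetId) {
        Document doc = new Document();
        doc.add(new LongField("AssetId", assetId, SINGLE_TERM_LONG));
        return doc;
    }

    // Exact match as an inclusive range [assetId, assetId], built with the
    // same precisionStep that was used at index time.
    static Query exactMatch(long assetId) {
        return NumericRangeQuery.newLongRange("AssetId", Integer.MAX_VALUE,
                assetId, assetId, true, true);
    }

    public static void main(String[] args) {
        System.out.println(makeDoc(12L));
        System.out.println(exactMatch(12L));
    }
}
```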
RE: LongField when searched using classic QueryParser does not yield results
Hi,

yes, the problem is indeed related to this.

The problem is a missing "schema" in Lucene. If you index values using several different field types (like TextField vs. IntField/LongField/FloatField/DoubleField...), the information about how they were indexed is completely unknown to the query parser. The default query parser uses legacy code to create numeric ranges or numeric terms: it just treats them as text! If it searches a numeric field using text terms, it won't find anything.
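To see why nothing is found, it can help to compare the term the classic parser searches for with the terms a LongField actually writes. A small sketch, assuming Lucene 5.5 on the classpath; `WhyNoHits`, `parseAssetId` and `fullPrecisionTerm` are hypothetical names. `NumericUtils.longToPrefixCoded` with shift 0 produces the full-precision term; LongField also writes lower-precision terms at larger shifts, and none of them is the plain text term "12".

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.BytesRefBuilder;
import org.apache.lucene.util.NumericUtils;

public class WhyNoHits {

    // What the classic QueryParser builds for "12" on AssetId:
    // a TermQuery on the analyzed text term "12".
    static Query parseAssetId(String text) throws ParseException {
        try (StandardAnalyzer analyzer = new StandardAnalyzer()) {
            return new QueryParser("AssetId", analyzer).parse(text);
        }
    }

    // The full-precision (shift 0) prefix-coded term a LongField indexes for a value.
    static BytesRef fullPrecisionTerm(long value) {
        BytesRefBuilder bytes = new BytesRefBuilder();
        NumericUtils.longToPrefixCoded(value, 0, bytes);
        return bytes.get();
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parseAssetId("12"));     // AssetId:12
        System.out.println(new BytesRef("12"));     // bytes of the text term
        System.out.println(fullPrecisionTerm(12L)); // bytes of the numeric term
    }
}
```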

Solr and Elasticsearch maintain a schema of the index. They subclass the query parser, override the protected getRangeQuery and getFieldQuery methods, and use their schema to create the correct query type for each field. The default is to create TermQuery and TermRangeQuery, which won't work on numeric fields.
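A minimal sketch of that approach for the classic QueryParser, assuming the application keeps its own set of field names that were indexed as LongField with the default precisionStep. The `SchemaAwareQueryParser` class and its `longFields` set are hypothetical, not a Lucene API; only the overridden `getFieldQuery(String, String, boolean)` and `getRangeQuery(...)` methods come from QueryParser. All other fields fall through to the default text handling.

```java
import java.util.Collections;
import java.util.Set;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

public class SchemaAwareQueryParser extends QueryParser {

    // Hypothetical "schema": names of fields indexed as LongField
    private final Set<String> longFields;

    public SchemaAwareQueryParser(String defaultField, Analyzer analyzer, Set<String> longFields) {
        super(defaultField, analyzer);
        this.longFields = longFields;
    }

    @Override
    protected Query getFieldQuery(String field, String queryText, boolean quoted)
            throws ParseException {
        if (longFields.contains(field)) {
            long value = parseLong(queryText);
            // Exact match on a numeric field = inclusive range [value, value]
            return NumericRangeQuery.newLongRange(field, value, value, true, true);
        }
        return super.getFieldQuery(field, queryText, quoted);
    }

    @Override
    protected Query getRangeQuery(String field, String part1, String part2,
            boolean startInclusive, boolean endInclusive) throws ParseException {
        if (longFields.contains(field)) {
            // A null bound means open-ended
            Long min = isOpen(part1) ? null : parseLong(part1);
            Long max = isOpen(part2) ? null : parseLong(part2);
            return NumericRangeQuery.newLongRange(field, min, max, startInclusive, endInclusive);
        }
        return super.getRangeQuery(field, part1, part2, startInclusive, endInclusive);
    }

    private static boolean isOpen(String bound) {
        return bound == null || "*".equals(bound);
    }

    private static long parseLong(String text) throws ParseException {
        try {
            return Long.parseLong(text.trim());
        } catch (NumberFormatException e) {
            throw new ParseException("Not a long value: " + text);
        }
    }

    public static void main(String[] args) throws ParseException {
        try (StandardAnalyzer analyzer = new StandardAnalyzer()) {
            QueryParser parser = new SchemaAwareQueryParser("TextContent", analyzer,
                    Collections.singleton("AssetId"));
            System.out.println(parser.parse("AssetId:12").getClass().getSimpleName()); // NumericRangeQuery
            System.out.println(parser.parse("AssetId:[10 TO 20] AND lucene"));
        }
    }
}
```

Because newLongRange is called without a precisionStep here, the sketch only matches fields indexed with LongField's default precisionStep; a field indexed with a custom FieldType would need the same value passed explicitly.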

To fix this in your code you have to do something similar. YOU are the person who knows the type of field XY. If XY is a numeric field, the query parser must check the field name and then build the correct query (NumericRangeQuery).
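In the original getAssetDocument the caller already knows that AssetId is a long, so one option is to skip the query parser and build the NumericRangeQuery directly, as an inclusive range from assetId to assetId. The sketch below, assuming Lucene 5.5 and an in-memory RAMDirectory, indexes one document the way createDocumentFromFile does and compares the hit counts of both queries; `AssetIdLookup`, `assetIdQuery` and `compareHits` are hypothetical names.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class AssetIdLookup {

    // Exact match on a LongField: inclusive range [assetId, assetId].
    // No precisionStep argument, so it uses the same default as LongField.
    static Query assetIdQuery(long assetId) {
        return NumericRangeQuery.newLongRange("AssetId", assetId, assetId, true, true);
    }

    // Returns {hits via classic QueryParser, hits via NumericRangeQuery}
    static int[] compareHits(long assetId) throws Exception {
        try (Directory dir = new RAMDirectory();
             StandardAnalyzer analyzer = new StandardAnalyzer()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new LongField("AssetId", assetId, Field.Store.YES));
                writer.addDocument(doc);
                writer.commit();
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query parsed = new QueryParser("AssetId", analyzer).parse(Long.toString(assetId));
                return new int[] {
                    searcher.search(parsed, 10).totalHits,
                    searcher.search(assetIdQuery(assetId), 10).totalHits
                };
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] hits = compareHits(12L);
        System.out.println("QueryParser hits: " + hits[0] + ", NumericRangeQuery hits: " + hits[1]);
    }
}
```

Running main should print `QueryParser hits: 0, NumericRangeQuery hits: 1`. In getAssetDocument, the two parser lines could then be replaced by a call such as `assetIdQuery(assetId)`.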

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Amrit Sarkar [mailto:sarkaramrit2@gmail.com]
> Sent: Wednesday, January 11, 2017 9:52 AM
> To: general@lucene.apache.org
> Cc: java-user@lucene.apache.org
> Subject: Re: LongField when searched using classic QueryParser does not yield results