Mailing List Archive

DO NOT REPLY [Bug 7912] - Field.isIndexed() returns false for UnStored fields
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=7912>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=7912

Field.isIndexed() returns false for UnStored fields





------- Additional Comments From fix@idiom.com 2002-04-26 18:27 -------
Hi. I wrote a class that'll check a lucene index and tell you all the
searchable fields in it. The code is pasted below. Save it as
'CheckIsIndexed.java'

Run the class using java from the command line and pass it one argument, the
name of an index to check. If the index does not exist, a one-document index
will be created in the current directory with 4 fields, and the analysis will
be based on that.

In my experience, UnStored fields are not returned. It may be that I'm doing
something wrong, in which case I hope to learn something.

The getSearchableFields(Directory directory) method may also be of interest to
other lucene users.

best, eric

-------------------- CODE BEGINS AFTER THIS LINE ----------------
/** CheckIsIndexed.java
Test to see if isIndexed() works
April 2002
Eric Fixler <fix@idiom.com>

*/

import java.util.*;
import java.io.*;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.*;
import org.apache.lucene.document.*;
import org.apache.lucene.store.*;

public class CheckIsIndexed {
private Directory directory = null;
private int documentCount = 0;
public static final boolean DEBUG = true;

public boolean printAllDocuments = false;


public CheckIsIndexed (String indexName) throws IOException {
boolean makeIt = this.initializeIndex(indexName);
if (makeIt) this.makeIndex();
}

private boolean initializeIndex(String indexName) throws IOException {
File idir = new File(indexName);
boolean isNew = ! idir.exists();
if (isNew) idir.mkdirs();
if (! idir.isDirectory()) throw new IOException("Directory " + indexName + " does not exist (may be a file?)");
System.out.println("Getting an index at " + idir.getAbsolutePath());
if (isNew) System.out.println("(created it)");
this.directory = FSDirectory.getDirectory(idir, false);
return isNew;
}

public void makeIndex() throws IOException {
System.out.println("Opening index...");
IndexWriter writer = new IndexWriter(directory,new StandardAnalyzer(), true);
this.addDocument(writer);
writer.optimize();
writer.close();
directory.close();
System.out.println("Done, index closed.\n");
}

private void addDocument(IndexWriter writer) throws IOException {
System.out.println("Adding document " + this.documentCount + "...");
Document doc = new Document();
doc.add(Field.UnIndexed("unindexed", "UnIndexed Field"));
doc.add(Field.UnStored("unstored", "Unstored Field: should, however, be indexed, no?"));
doc.add(Field.Text("text", "Text field: should return true for isIndexed()"));
doc.add(Field.Keyword("keyword", "Keyword field: should return true for isIndexed()"));
writer.addDocument(doc);
this.documentCount++;
}

public void checkIndex() throws IOException {
String[] fields = this.getSearchableFields(this.directory);
System.out.println("Searchable field names in directory: ");
for (int i = 0; i < fields.length; i++) System.out.println("\t" + fields[i]);
System.out.print("\n");
}

// I suspect there's probably a better way to iterate over the index, but I'm not
// sure how...
/** This method looks at every document in the index and compiles all the searchable
fields it finds. */
public String[] getSearchableFields(Directory dir) throws IOException {
IndexReader reader = IndexReader.open(dir);
int count = 0;
Set fieldNames = new HashSet();
for (int i = 0; i < reader.maxDoc(); i++) {
try {
// Skip deleted or missing documents regardless of DEBUG; DEBUG only controls the log line.
if (reader.isDeleted(i)) { if (DEBUG) System.out.println("deleted doc " + i); continue; }
Document doc = reader.document(i);
if (doc == null) { if (DEBUG) System.out.println("null doc " + i); continue; }
if (this.printAllDocuments) System.out.println("Analyzing document " + i + "...");
Enumeration en = doc.fields();
while (en.hasMoreElements()) {
Field field = (Field) en.nextElement();
boolean indexed = field.isIndexed();
if (this.printAllDocuments) System.out.println("\t" + field.name() + ", isIndexed? : " + indexed);
if (indexed) fieldNames.add(field.name());
}
if (this.printAllDocuments) System.out.println("----------------------------------------");
} catch (Exception e) {
System.err.println("Error reading documents for field info: " + e + "\n\t" +
e.getMessage());
e.printStackTrace(System.err);
reader.close();
if (e instanceof IOException) throw (IOException) e;
if (e instanceof RuntimeException) throw (RuntimeException) e;
return new String[0];
}

}
reader.close();
if (DEBUG) System.out.println("Done checking index");
String[] rval = (String[]) fieldNames.toArray(new String[fieldNames.size()]);
Arrays.sort(rval);
return rval;
}

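public static void main (String[] args) throws Exception {
if (args.length != 1) {
System.err.println("Usage: java CheckIsIndexed <index-directory>");
System.exit(1);
}
CheckIsIndexed ces = new CheckIsIndexed(args[0]);
ces.checkIndex();

}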

}

// _END_

--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>

------- Additional Comments From otis@apache.org 2002-05-07 02:46 -------
I think you are right, this is a bug.
If a field is unstored and not indexed, what is it?
It should be either stored, or indexed, or both.

However, I just tried tracking the code from Field -> Document -> IndexWriter ->
DocumentWriter -> FieldInfos and I couldn't see a bug there.

In your code (by the way, it's a little better to create an attachment than to
paste the code inline), check this part:

Enumeration en = doc.fields();
while (en.hasMoreElements())
{
Field field = (Field) en.nextElement();
boolean indexed = field.isIndexed();
if (this.printAllDocuments)
System.out.println("\t" + field.name() + ", isIndexed? :" + indexed);

The UnStored field is never retrieved by doc.fields().
However, if you copy that same snippet into your code right before the call to
writer.addDocument(doc), you will see that different output is printed.

So this just confirms that there is something fishy there, but I can't find the
source of the bug right now.
If you find it, please send a context diff.

------- Additional Comments From fix@idiom.com 2002-05-07 05:00 -------
Thanks for looking at it. Next time I'll attach code (sorry, I didn't even
realize you could do that with bugzilla).

IMO, an UnStored Field should return a Field with an indexed value of true,
although one would expect the content to be null. Either way, I'd like to find
a way to get a list of all the searchable fields at run time, so that a servlet
can auto-configure itself at startup based on the index it's pointing to.

I'll try to look at the lucene code in the next week or so.

------- Additional Comments From cutting@apache.org 2002-05-07 20:50 -------
An un-stored field does not exist in the Document object returned by a search.
I don't agree that it should.
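
The distinction between stored and indexed can be sketched without Lucene at
all. The class below is only a model: SketchField and its helper methods are
hypothetical stand-ins (not Lucene classes), written in current Java rather
than JDK 1.1 style. It assumes the usual flag combinations for the four factory
methods used in CheckIsIndexed (Text and Keyword are stored and indexed,
UnIndexed is stored only, UnStored is indexed only) and shows why walking the
fields of a retrieved document cannot find an UnStored field even though that
field is searchable.

```java
import java.util.ArrayList;
import java.util.List;

public class StoredVersusIndexedSketch {

    /** Hypothetical stand-in for a field: a name plus the two flags that matter here. */
    static final class SketchField {
        final String name;
        final boolean stored;
        final boolean indexed;

        SketchField(String name, boolean stored, boolean indexed) {
            this.name = name;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    // Assumed flag combinations for the four factory methods used in CheckIsIndexed.
    static SketchField text(String name)      { return new SketchField(name, true, true); }
    static SketchField keyword(String name)   { return new SketchField(name, true, true); }
    static SketchField unIndexed(String name) { return new SketchField(name, true, false); }
    static SketchField unStored(String name)  { return new SketchField(name, false, true); }

    /** Names visible on a document read back from the index: stored fields only. */
    static List<String> retrievedFieldNames(List<SketchField> written) {
        List<String> names = new ArrayList<>();
        for (SketchField f : written) {
            if (f.stored) names.add(f.name);
        }
        return names;
    }

    /** Names that can be searched: indexed fields, whether stored or not. */
    static List<String> searchableFieldNames(List<SketchField> written) {
        List<String> names = new ArrayList<>();
        for (SketchField f : written) {
            if (f.indexed) names.add(f.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<SketchField> doc = new ArrayList<>();
        doc.add(unIndexed("unindexed"));
        doc.add(unStored("unstored"));
        doc.add(text("text"));
        doc.add(keyword("keyword"));
        System.out.println("retrieved:  " + retrievedFieldNames(doc));
        System.out.println("searchable: " + searchableFieldNames(doc));
    }
}
```

In this model "unstored" appears in the searchable list but not in the
retrieved list, which matches what CheckIsIndexed reports.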

What would be nice is to be able to enumerate all of the fields indexed.
Perhaps the following methods should be added to IndexReader to support this:
String[] getFieldNames();
String[] getIndexedFieldNames();

Would that meet your needs?

If so, it would be fairly simple to implement. Abstract methods would be added
to IndexReader, with implementations in SegmentReader and SegmentsReader.

The SegmentReader implementation could just do something like:
for (int i = 0; i < fieldInfos.size(); i++) {
FieldInfo fi = fieldInfos.fieldInfo(i);
...
}

The SegmentsReader implementation would need to add a FieldInfos field,
construct it by using FieldInfos.add(FieldInfos), then use an implementation
like SegmentReader.

Doug
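
A rough, Lucene-free sketch of the approach described above: each segment has a
table of field infos, the single-segment case filters that table for indexed
fields, and the multi-segment case takes the union across segments. Everything
here is an assumption made for illustration. SketchFieldInfo is a hypothetical
stand-in for Lucene's internal FieldInfo, the method names are invented, and the
code uses current Java collections rather than JDK 1.1 ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class IndexedFieldNamesSketch {

    /** Hypothetical stand-in for one entry in a segment's field table. */
    static final class SketchFieldInfo {
        final String name;
        final boolean isIndexed;

        SketchFieldInfo(String name, boolean isIndexed) {
            this.name = name;
            this.isIndexed = isIndexed;
        }
    }

    /** Single-segment case: walk the field table and keep the indexed names. */
    static List<String> indexedNames(List<SketchFieldInfo> fieldInfos) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < fieldInfos.size(); i++) {
            SketchFieldInfo fi = fieldInfos.get(i);
            if (fi.isIndexed) names.add(fi.name);
        }
        return names;
    }

    /** Multi-segment case: union of the per-segment results, sorted, no duplicates. */
    static String[] indexedNamesAcrossSegments(List<List<SketchFieldInfo>> segments) {
        TreeSet<String> merged = new TreeSet<>();
        for (List<SketchFieldInfo> segment : segments) {
            merged.addAll(indexedNames(segment));
        }
        return merged.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<SketchFieldInfo> first = new ArrayList<>();
        first.add(new SketchFieldInfo("unindexed", false));
        first.add(new SketchFieldInfo("unstored", true));

        List<SketchFieldInfo> second = new ArrayList<>();
        second.add(new SketchFieldInfo("text", true));
        second.add(new SketchFieldInfo("unstored", true));

        List<List<SketchFieldInfo>> segments = new ArrayList<>();
        segments.add(first);
        segments.add(second);

        for (String name : indexedNamesAcrossSegments(segments)) {
            System.out.println(name);
        }
    }
}
```

Because the names come from the field tables rather than from retrieved
documents, an UnStored field is reported like any other indexed field.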

------- Additional Comments From fix@idiom.com 2002-05-07 22:27 -------
Thanks.

String[] getIndexedFieldNames() would work perfectly for what I'm trying to do.

I'll look at the Lucene classes and try to implement it.

------- Additional Comments From fix@idiom.com 2002-05-13 03:39 -------
Created an attachment (id=1843)
JDK 1.1 compliant (I think) patch adding getIndexedFields() methods to IndexReaders

fix@idiom.com changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED



------- Additional Comments From fix@idiom.com 2002-05-13 03:43 -------
I'm submitting a patch to add the getIndexedFields() method as DC suggested.

I tried to do this in JDK 1.1 fashion because I think Lucene tries to be JDK 1.1
compliant. I also decided to start by not adding any fields to any classes.
It's a bit clumsy -- let me know if that's not a requirement, and I can
streamline it.

Anyway, it seems to work for me -- look at it and let me know what you think,
and I'll do another rev with documentation.

The changed files are IndexReader.java SegmentReader.java and SegmentsReader.java

------- Additional Comments From cutting@apache.org 2002-05-20 17:28 -------
Looks good to me.

A couple of minor improvements:

The line:
return (String[]) v.toArray(new String[v.size()]);

Would be more efficient if you use a static for the prototype, e.g.:
private static final String[] STRING_ARRAY_PROTO = new String[0];
...
return (String[]) v.toArray(STRING_ARRAY_PROTO);
This saves the allocation of an extra array.

Note: toArray is not in Java 1.1, so if we decide to keep Lucene compatible
with Java 1.1, then this will have to be re-written as a 'for' loop anyway...

And in SegmentsReader.getIndexedFieldNames, you don't need the intermediate
Vector: you can create the array and fill it directly from the Hashtable:

Enumeration it = h.keys();
String[] result = new String[h.size()];
for (int i = 0; i < h.size(); i++) {
result[i] = (String) it.nextElement();
}
return result;

This saves the allocation of the Vector, and is less code too.
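
For reference, the two idioms above can be tried in isolation. This is a small
stand-alone sketch, not the patch itself: the class and method names are made
up for illustration, and it uses generics (so no casts are needed), which
postdate the JDK 1.1 constraint discussed in this thread. Note that when the
prototype array is too small, toArray allocates a new array of the right size
and type and leaves the shared prototype untouched.

```java
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Vector;

public class ArrayCopySketch {

    // Shared zero-length prototype; toArray only uses it to learn the element type.
    private static final String[] STRING_ARRAY_PROTO = new String[0];

    /** Copies a Vector into a String[] using the shared prototype. */
    static String[] fromVector(Vector<String> v) {
        return v.toArray(STRING_ARRAY_PROTO);
    }

    /** Copies Hashtable keys straight into an array, with no intermediate Vector. */
    static String[] fromHashtableKeys(Hashtable<String, ?> h) {
        String[] result = new String[h.size()];
        Enumeration<String> keys = h.keys();
        // Assumes no other thread modifies the table while it is being copied.
        for (int i = 0; i < result.length && keys.hasMoreElements(); i++) {
            result[i] = keys.nextElement();
        }
        return result;
    }

    public static void main(String[] args) {
        Vector<String> v = new Vector<>();
        v.add("text");
        v.add("unstored");
        System.out.println(fromVector(v).length);

        Hashtable<String, Boolean> h = new Hashtable<>();
        h.put("text", Boolean.TRUE);
        h.put("unstored", Boolean.TRUE);
        // Hashtable key order is unspecified, so sort if a stable order is needed.
        System.out.println(fromHashtableKeys(h).length);
    }
}
```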
