
DO NOT REPLY [Bug 9906] - Removing a file from index does not remove all references to file.
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9906>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9906

Removing a file from index does not remove all references to file.

otis@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED



------- Additional Comments From otis@apache.org 2002-06-17 03:15 -------
I think this bug report may be bogus.
This would be a very fundamental flaw in Lucene, and we would have heard about it long ago.
Is it possible that your application is adding files/documents to the Lucene
index multiple times?
Please provide a simple, standalone Java class that demonstrates this problem.
The class included in the report is in a named package, and it references
other classes, such as IndexerTab, that were not included.
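One quick way to test the multiple-additions hypothesis, before involving Lucene at all, is to check the list of paths that will be indexed for repeats. The sketch below is a hypothetical helper (DuplicateCheck is not part of Lucene or of the reporter's code) in plain Java 7+, assuming the paths are gathered into a list before indexing, as the reporter's test class does with collectFiles().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {

    // Returns each path that occurs more than once, listed once, in the
    // order its first repeat is encountered.
    static List<String> duplicates(List<String> paths) {
        Set<String> seen = new HashSet<>();
        Set<String> reported = new HashSet<>();
        List<String> dups = new ArrayList<>();
        for (String p : paths) {
            // Set.add returns false when p was already present
            if (!seen.add(p) && reported.add(p)) {
                dups.add(p);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "api/a.html", "api/b.html", "api/a.html", "api/c.html", "api/a.html");
        System.out.println(duplicates(paths)); // prints [api/a.html]
    }
}
```

An empty result means every file is added exactly once, which rules out duplicate additions as the cause.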

I will leave this bug open for now, but will close it in a few days unless you
can replicate this problem with a simple demo class.

Thanks!

--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
DO NOT REPLY [Bug 9906] - Removing a file from index does not remove all references to file.

------- Additional Comments From rvestal@austin.rr.com 2002-06-17 04:02 -------
OK... somewhat bogus it is. I hadn't done enough analysis (I should know
better). While I was writing the sample class for you, I identified the real
issue; sorry for the misdirection. It is either:

1. I'm using the API incorrectly, or
2. there is an IndexReader issue with deleting documents.

I've included code here that replicates my issue. Not all of the
documents actually get deleted, because the IndexReader reports that a
document has already been deleted, so delete() is never called for some
documents.

Thanks!

/*
* Created by IntelliJ IDEA.
* User: rvestal
* Date: Jun 16, 2002
* Time: 10:23:51 PM
*/
package org.intellij.plugins.docPlugin;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.store.*;

import java.io.*;
import java.util.Vector;

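public class IndexTest {

    // path to Ant 1.4.1 docs
    private static String mDirToIndex = "c:/utils/ant/docs/manual/api/";

    private static String INDEX_DIR = "indexTest";


    static private void collectFiles( File dir, Vector files ) {
        File[] children = dir.listFiles();
        // listFiles() returns null if dir does not exist or cannot be
        // read; without this check the loop below throws a
        // NullPointerException
        if ( children == null ) {
            System.err.println( "Cannot list directory: " + dir );
            return;
        }
        for ( int ix = 0; ix < children.length; ix++ ) {
            File child = children[ix];
            if ( child.isDirectory() ) {
                collectFiles( child, files );
            } else {
                files.add( child );
            }
        }
    }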


    public static void main( String[] args ) {
        File indexDir = new File( INDEX_DIR );
        if ( !indexDir.exists() ) {
            indexDir.mkdirs();
        }

        Vector files = new Vector();
        collectFiles( new File( mDirToIndex ), files );

        try {
            IndexWriter writer = new IndexWriter( INDEX_DIR,
                    new StandardAnalyzer(), true );

            for ( int ix = 0; ix < files.size(); ix++ ) {
                File file = ( File ) files.get( ix );
                writer.addDocument( IndexTestDocument.createDocument( file ) );
            }
            System.out.println( "Added: " + files.size() + " files." );

            writer.optimize();
            writer.close();
            writer = null;

            Searcher searcher = new IndexSearcher( INDEX_DIR );
            Analyzer analyzer = new StandardAnalyzer();
            Query query = QueryParser.parse( "Ant", "contents", analyzer );

            Hits hits = searcher.search( query );
            System.out.println( "Hits after add: " + hits.length() );
            searcher.close();

            Directory directory = FSDirectory.getDirectory( INDEX_DIR, false );
            IndexReader reader = IndexReader.open( directory );

            int count = 0;
            for ( int ix = 0; ix < files.size(); ix++ ) {
                // normalizePath() also converts '\' to '/'
                String path = IndexTestDocument.normalizePath(
                        ( ( File ) files.get( ix ) ).getAbsolutePath() );

                // Scan every document number. Numbers run from 0 to
                // maxDoc() - 1 and are not reassigned by delete(), while
                // numDocs() counts only undeleted documents and shrinks
                // after each delete(); bounding the loop by numDocs()
                // skips the highest-numbered documents.
                int maxDoc = reader.maxDoc();
                boolean bDeleted = false;
                for ( int ndx = 0; ndx < maxDoc; ndx++ ) {
                    if ( !reader.isDeleted( ndx ) ) {
                        String docPath = IndexTestDocument.getPath(
                                reader.document( ndx ) );
                        if ( docPath.equals( path ) ) {
                            count++;
                            reader.delete( ndx );
                            bDeleted = true;
                            break;
                        }
                    }
                }
                if ( !bDeleted ) {
                    System.out.println( " Not Deleted: " + path );
                    for ( int ndx = 0; ndx < maxDoc; ndx++ ) {
                        if ( !reader.isDeleted( ndx ) ) {
                            String docPath = IndexTestDocument.getPath(
                                    reader.document( ndx ) );
                            System.out.println( " path " + ndx + ": " + docPath );
                        }
                    }
                }
            }
            System.out.println( "Removed " + count + " documents of (" +
                    files.size() + ")" );
            reader.close();

            searcher = new IndexSearcher( INDEX_DIR );
            analyzer = new StandardAnalyzer();
            query = QueryParser.parse( "Ant", "contents", analyzer );

            hits = searcher.search( query );
            System.out.println( "Hits after remove: " + hits.length() );
            searcher.close();

        } catch ( Exception ex ) {
            ex.printStackTrace();
        }
    }


    static class IndexTestDocument {

        static public Document createDocument( File f )
                throws FileNotFoundException {
            Document doc = new Document();
            // Field.Text( String, String ) is stored as well as indexed,
            // so getPath() can read the value back with doc.get()
            doc.add( Field.Text( "path", normalizePath( f.getPath() ) ) );
            Reader reader = new BufferedReader( new InputStreamReader(
                    new FileInputStream( f ) ) );
            // Field.Text( String, Reader ) is indexed but not stored
            doc.add( Field.Text( "contents", reader ) );
            return doc;
        }


        static public String getPath( Document doc ) {
            return doc.get( "path" );
        }

        static public String normalizePath( String path ) {
            if ( path == null || path.length() == 0 ) {
                return "";
            }
            path = path.replace( '\\', '/' );
            File f = new File( path );
            if ( f.isDirectory() ) {
                if ( path.charAt( path.length() - 1 ) != '/' ) {
                    path = path + "/";
                }
            }
            return path;
        }
    }
}
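A likely cause of the missing deletions, consistent with the first explanation above (API misuse), is the loop bound in the deletion code: IndexReader document numbers run from 0 to maxDoc() - 1 and are not reassigned when a document is deleted, whereas numDocs() counts only undeleted documents and shrinks after every delete(). The sketch below is a self-contained toy model, not Lucene: ToyReader is a hypothetical class built on java.util.BitSet, and it assumes document i corresponds to file i, since the test adds and deletes files in the same order.

```java
import java.util.BitSet;

public class DeletionScanDemo {

    // Hypothetical stand-in for an index reader: documents are numbered
    // 0..maxDoc()-1, and delete() only marks a bit; it never renumbers.
    static class ToyReader {
        private final int maxDoc;
        private final BitSet deleted = new BitSet();

        ToyReader(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        int maxDoc() { return maxDoc; }

        // Counts only documents that have not been deleted.
        int numDocs() { return maxDoc - deleted.cardinality(); }

        boolean isDeleted(int n) { return deleted.get(n); }

        void delete(int n) { deleted.set(n); }
    }

    // Mirrors the deletion loop in IndexTest: for each file, scan for the
    // matching document (file i is document i) and delete it.
    static int deleteAll(int docs, boolean boundByMaxDoc) {
        ToyReader reader = new ToyReader(docs);
        int removed = 0;
        for (int file = 0; file < docs; file++) {
            int bound = boundByMaxDoc ? reader.maxDoc() : reader.numDocs();
            for (int n = 0; n < bound; n++) {
                if (!reader.isDeleted(n) && n == file) {
                    reader.delete(n);
                    removed++;
                    break;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        System.out.println("bound by numDocs(): removed " + deleteAll(10, false) + " of 10");
        System.out.println("bound by maxDoc():  removed " + deleteAll(10, true) + " of 10");
    }
}
```

With 10 documents, the numDocs() bound removes only 5: once five documents are gone, numDocs() is 5, so the scan stops at document 4 and never reaches documents 5 through 9. The maxDoc() bound removes all 10.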

DO NOT REPLY [Bug 9906] - Removing a file from index does not remove all references to file.

otis@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |INVALID



------- Additional Comments From otis@apache.org 2002-06-17 13:12 -------
Your sample code throws a NullPointerException:
java IndexTest
Exception in thread "main" java.lang.NullPointerException
at IndexTest.collectFiles(IndexTest.java:23)
at IndexTest.main(IndexTest.java:41)
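The trace points into collectFiles(). File.listFiles() returns null, not an empty array, when the directory does not exist or cannot be read, which is the most likely situation when c:/utils/ant/docs/manual/api/ is absent on the machine running the test. A minimal null-safe version is sketched below; the CollectFilesDemo class name is hypothetical, and it uses Java 7+ generics and the diamond operator rather than the raw Vector of the original.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class CollectFilesDemo {

    // Recursively gathers regular files under dir. listFiles() returns
    // null when dir is missing or unreadable, so guard before iterating.
    static void collectFiles(File dir, List<File> files) {
        File[] children = dir.listFiles();
        if (children == null) {
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collectFiles(child, files);
            } else {
                files.add(child);
            }
        }
    }

    public static void main(String[] args) {
        List<File> files = new ArrayList<>();
        collectFiles(new File("no/such/directory"), files);
        // prints "collected 0 files" when no/such/directory does not exist
        System.out.println("collected " + files.size() + " files");
    }
}
```

Silently returning keeps the demo running; printing a warning at that point, as a real indexer probably should, makes a mistyped directory easier to spot.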

Since we have established that this is not a bug, I will close it here.
If you need more help, please use the lucene-user list.
