Mailing List Archive

Question Deleting/Reindexing Files
Hi,

I am using Lucene to index a relatively large article-based system in which articles change from time to time, so I have to reindex them. Reindexing had the effect that a query would return the hit for a file multiple times (once for each update).

The only solution to that problem I found was to delete the file to be updated before indexing it again. Is there another possibility?

As the system is large, I collect the articles that have to be updated, open a writer, and add the documents to the index. This solution worked fine for me with rc1; in rc4 it seems that it is no longer possible to delete a file from an index while the index is open for writing.

Do you know of any solution to that problem?

Thanks a lot in advance.

Regards, Joe

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Question Deleting/Reindexing Files [ In reply to ]
[1] There's no update operation, so delete and then add is what you want.
[2] I have had the same problem using an IndexWriter and an IndexReader
at the same time: I get a locking problem when deleting. I think I sent
mail to the list with a test case a week ago [disclaimer: this is not
a complaint!] and I think the issue is still open. Maybe I should turn
this into a bug report? I know fixing bugs is encouraged, but I don't
have enough context about the right solution, or about how the locking
apparently changed to foul this up, though I did look through things.
My workaround was to write new entries to a new index and then run
a separate merge utility that first does a delete pass, and then reopens
and does the adds, based on a "primary key" (the URL of each doc in my
case).
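
A minimal sketch of the delete-pass-then-add-pass idea described above, keyed by URL. It deliberately uses a toy in-memory list instead of Lucene, so ToyIndex, BatchUpdater and their methods are hypothetical names invented for illustration, not Lucene API. In a real index the delete pass would go through the reader side and the add pass through the writer side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy stand-in for an index: a list of {url, text} entries. Not Lucene API. */
final class ToyIndex {
    // Duplicate keys are allowed, just as re-adding a document duplicates it in a real index.
    private final List<String[]> docs = new ArrayList<>();

    void add(String url, String text) {
        docs.add(new String[] {url, text});
    }

    /** Removes every entry stored under the key and returns how many were removed. */
    int deleteByKey(String url) {
        int before = docs.size();
        docs.removeIf(d -> d[0].equals(url));
        return before - docs.size();
    }

    /** Counts the entries stored under a key. */
    int count(String url) {
        int n = 0;
        for (String[] d : docs) {
            if (d[0].equals(url)) n++;
        }
        return n;
    }

    int size() {
        return docs.size();
    }
}

final class BatchUpdater {
    /** Applies a batch of changed articles: one delete pass, then one add pass. */
    static void apply(ToyIndex index, Map<String, String> changed) {
        // Pass 1: delete all stale entries by primary key.
        for (String url : changed.keySet()) {
            index.deleteByKey(url);
        }
        // Pass 2: add the new versions.
        for (Map.Entry<String, String> e : changed.entrySet()) {
            index.add(e.getKey(), e.getValue());
        }
    }
}
```

Because all deletes finish before any add starts, the two passes never need the index open for deleting and writing at the same time.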


-----Original Message-----
From: Joe Hajek [mailto:Joe.Hajek@blackbox.net]
Sent: Wednesday, March 20, 2002 12:28 AM
To: lucene-user@jakarta.apache.org
Subject: Question Deleting/Reindexing Files



RE: Question Deleting/Reindexing Files [ In reply to ]
The standard answer is "try deleting/adding in batches instead of
individually". Seems more efficient, too, if you can write your
application that way.
That is what you are essentially doing by writing to a separate index
and then doing a bunch of deletions, followed by re-additions.
I know I'm stating the obvious, but I wanted to get this out of the way
:)

Otis

--- "Spencer, Dave" <dave@lumos.com> wrote:
> [1] There's no update so delete and then add is what you want.
> [2] I have had the same problems w/ using an IndexWriter and
> IndexReader at the same time and getting a locking problem when
> deleting. I think I sent mail to the list w/ a test case a week ago
> [disclaimer: this is not a complaint!] and I think the issue is still
> open. Maybe I should turn this into a bug report? I know fixing bugs
> is encourage but I don't have enough context about the right solution,
> or how the locking apparently changed to foul this up, though I did
> look thru things.
> My workaround was to write new entries to a new index and then run
> a separate merge utility that 1st does a delete pass, and then
> reopens and does adds, based on a "primary key" (the URL of each doc
> in my case).



Re: Question Deleting/Reindexing Files [ In reply to ]
Joe,

>Hi,
>
>I am using Lucene to index a relatively large article-based system in which articles change from time to time, so I have to reindex them. Reindexing had the effect that a query would return the hit for a file multiple times (once for each update).
>
>The only solution to that problem I found was to delete the file to be updated before indexing it again. Is there another possibility?

You can add a counter or a date to your documents, in the same field
as your document identifier or in another field. This allows more flexibility
in deleting old documents, but you'll have to design a condition
to remove old docs yourself.
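
One possible removal condition, sketched under the assumption that each document carries an id and a monotonically increasing counter: keep only the highest counter per id. VersionedDoc and VersionPurge are hypothetical names for illustration and the code runs on a plain list, not on a Lucene index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical record: a document id plus a version counter (a date would work the same way). */
final class VersionedDoc {
    final String id;
    final long version;
    final String text;

    VersionedDoc(String id, long version, String text) {
        this.id = id;
        this.version = version;
        this.text = text;
    }
}

final class VersionPurge {
    /** Returns the docs that survive the rule "keep only the highest version per id". */
    static List<VersionedDoc> keepNewest(List<VersionedDoc> docs) {
        // First pass: find the highest version seen for each id.
        Map<String, Long> newest = new HashMap<>();
        for (VersionedDoc d : docs) {
            newest.merge(d.id, d.version, Long::max);
        }
        // Second pass: keep a doc only if it carries the newest version for its id.
        List<VersionedDoc> kept = new ArrayList<>();
        for (VersionedDoc d : docs) {
            if (d.version == newest.get(d.id)) {
                kept.add(d);
            }
        }
        return kept;
    }
}
```

Other conditions are equally possible, for example keeping the last two versions or purging by age; the counter only makes the old copies distinguishable.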

Regards,
Ype

--

RE: Question Deleting/Reindexing Files [ In reply to ]
>
>
>>[1] There's no update so delete and then add is what you want.
>>[2] I have had the same problems w/ using an IndexWriter and IndexReader
>>at the same time and getting a locking problem when deleting. I think I
>>sent mail to the list w/ a test case a week ago [disclaimer: this is not
>>a complaint!] and I think the issue is still open. Maybe I should turn
>>this into a bug report? I know fixing bugs is encourage but I don't have
>>enough context about the right solution, or how the locking apparently
>>changed to foul this up, though I did look thru things.
>>My workaround was to write new entries to a new index and then run
>>a separate merge utility that 1st does a delete pass, and then reopens
>>and does adds, based on a "primary key" (the URL of each doc in my
>>case).
>>
>I think the locking issue is that the index directory is locked for the
>lifetime of the IndexWriter, so that a new IndexReader cannot be created.
>However, pre-existing IndexReaders should continue to work. Can you try
>to open the IndexReader before starting the adds and see if that allows
>you to do the deletes?
>
>I like your workaround with a separate index. I think you might be able
>to do one better by using the IndexWriter.addIndexes(Directory[] dirs)
>method to merge the additions from the new index directory into the old
>one after the old docs are deleted from it. This should be faster, and it
>won't require that all of the data be in stored fields (I'm assuming
>that in your case you restore the documents from the new index and
>re-add them to the old one, and for this to work all of your fields must
>be "stored", not simply "indexed").
>
>
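
To illustrate why merging at the index level does not need stored fields, here is a toy inverted index whose addIndex method copies postings and shifts document numbers. It is only a conceptual sketch: ToyPostings and its methods are hypothetical, and this is not how Lucene implements addIndexes internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index: term -> list of document numbers. Not Lucene's data structure. */
final class ToyPostings {
    final Map<String, List<Integer>> postings = new TreeMap<>();
    int docCount = 0;

    /** Indexes the terms of one document; the original text is not kept anywhere. */
    int addDocument(String text) {
        int docId = docCount++;
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) continue;
            List<Integer> list = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Record each document at most once per term.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
        return docId;
    }

    /** Appends another index by copying its postings and shifting its document numbers. */
    void addIndex(ToyPostings other) {
        int offset = docCount;
        for (Map.Entry<String, List<Integer>> e : other.postings.entrySet()) {
            List<Integer> target = postings.computeIfAbsent(e.getKey(), t -> new ArrayList<>());
            for (int docId : e.getValue()) {
                target.add(docId + offset);
            }
        }
        docCount += other.docCount;
    }

    /** Returns the document numbers that contain the term. */
    List<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new ArrayList<>());
    }
}
```

The merge never looks at document text, only at the term lists, which is the point being made about "indexed" versus "stored" fields.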




Re: RE: Question Deleting/Reindexing Files [ In reply to ]
Hi,

> > I think the locking issue is that the index directory is locked
> > during IndexWriter existence so that IndexReader cannot be created.
> > However, pre-existing IndexReaders should continue to work. Can you
> > try to open IndexReader before starting the adds and see if that
> > allows you to do the deletes?


I haven't tried this with rc4, but I will. As I said, in rc1 it worked and I didn't encounter any problems, so I haven't upgraded until now.

Regards, Joe

