Mailing List Archive

Modifying document with unstored fields
Hi,

I asked this question yesterday on the user-list and so far there has been no
reply. I am posting it again on the dev-list, hoping that someone can
answer it here.

We have a situation where we have a large collection of documents, which
consist of both stored and unstored fields, and we'd like to add/modify a
stored field on an existing document.

It seems the only way this can be achieved is to delete the document and
then re-create it. However, this will only preserve the stored fields; the
unstored field information will be lost.

In our application, the unstored fields consist of very large data, and it
would not be desirable to store them.

Is there any way to get around this problem?  Thanks.

--
Victor Hadianto

NUIX Pty Ltd
Level 8, 143 York Street, Sydney 2000
Phone: (02) 9283 9010
Fax:   (02) 9283 9020

This message is intended only for the named recipient. If you are not the
intended recipient you are notified that disclosing, copying, distributing
or taking any action in reliance on the contents of this message or
attachment is strictly prohibited.

--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
Re: Modifying document with unstored fields [ In reply to ]
Hello,

That's the very top question/answer in the Lucene FAQ at jGuru:
http://www.jguru.com/faq/Lucene

Otis

Re: Modifying document with unstored fields [ In reply to ]
On Thu, 22 Aug 2002 23:14, Otis Gospodnetic wrote:
> That's the very top question/answer in Lucene FAQ at jGuru:
> http://www.jguru.com/faq/Lucene

Hi Otis,

Yep, I realise that, but I think you haven't read my question closely. My
problem is not simply deleting and re-adding the document, but what happens to
the fields that are unstored. If all the fields in my documents were stored then
it would be fine, but unfortunately that is not our current situation.

Has anyone else ever come across this problem?

Victor


Re: Modifying document with unstored fields [ In reply to ]
hi,
I am not really an expert with Lucene, but here is what I think:

1. It is not possible to modify (or add) just a single field of a document.
   (I actually have the same need; it would be good to have this feature.)
2. Why would you lose your data? If you delete a document and then re-add it,
   Lucene will update the index for both stored and unstored fields (data +
   term info).

Maybe you ran a test without closing the reader and/or writer: to delete you
use the IndexReader, and to write you use the IndexWriter. You should delete
first, close the reader, open the writer, add the document, optimize, close,
then reopen the IndexSearcher and see what's there.

Hope this helps.
Bye.

Re: Modifying document with unstored fields [ In reply to ]
Victor Hadianto wrote:

>Yep, I realise that, but I think you haven't read my question closely. My
>problem is not simply deleting and re-adding the document, but what happens to
>the fields that are unstored. If all the fields in my documents were stored then
>it would be fine, but unfortunately that is not our current situation.
>
>Has anyone else ever come across this problem?
>
Yes, generally, there are two answers -- either make all fields stored
or use some other database for the storage of the "master" documents.
The first approach expands the use of Lucene to the point where it
becomes a database rather than an indexing engine. This makes sense in
some applications, but it generally degrades performance. The second
approach is preferred, but it increases the complexity of the overall solution.
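
A minimal sketch of the second approach, written in plain Java rather than
against the Lucene API. It assumes the full field values live in an external
"master" store (here a hypothetical in-memory map) and that an update rebuilds
the index entry from that store, so the large text never has to be stored in
the index itself. The class and method names are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model: an external master store plus an index that keeps terms only. */
final class MasterStoreSketch {
    // Hypothetical external "master" store: document id -> field name -> full value.
    private final Map<String, Map<String, String>> master = new HashMap<>();
    // Toy inverted index: "field:term" -> ids of the documents containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Saves the full document in the master store and indexes it. */
    void put(String id, Map<String, String> fields) {
        master.put(id, new HashMap<>(fields));
        reindex(id);
    }

    /** Changes one field: update the master copy, then delete and re-add the index entry. */
    void updateField(String id, String field, String value) {
        Map<String, String> doc = master.get(id);
        if (doc == null) {
            throw new IllegalArgumentException("unknown document: " + id);
        }
        doc.put(field, value);
        reindex(id);
    }

    /** Returns the ids of the documents whose field contains the term. */
    Set<String> search(String field, String term) {
        String key = field + ":" + term.toLowerCase();
        return new HashSet<>(postings.getOrDefault(key, Collections.<String>emptySet()));
    }

    private void reindex(String id) {
        // Delete: drop the id from every posting list.
        for (Set<String> ids : postings.values()) {
            ids.remove(id);
        }
        // Re-add: tokenize every field again, reading the values from the master copy.
        for (Map.Entry<String, String> f : master.get(id).entrySet()) {
            for (String term : f.getValue().toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(f.getKey() + ":" + term, k -> new HashSet<>()).add(id);
                }
            }
        }
    }
}
```

The cost mentioned above is visible in reindex(): every field is tokenized
again on each update, even the ones that did not change.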

I suppose it might be possible to add lower-level support for
extracting the terms from the unstored fields of the initial document
and re-indexing them for the new document...

Dmitry.

Re: Modifying document with unstored fields [ In reply to ]
Victor's problem is that some fields in the index are not stored.
That is, he can not retrieve a document from the index and get all the
fields' values back, so that he can safely remove the document, and add
a new one to replace it, using some of its old fields' values for new
fields' values.
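
To make the problem concrete, here is a toy model in plain Java (not the Lucene
API; every name in it is hypothetical). It indexes the terms of every field but
keeps the original value only for fields marked as stored, so a document that
is retrieved, deleted and re-added comes back without its unstored fields.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy index that distinguishes stored values from searchable terms. */
final class UnstoredLossSketch {
    // Stored values, as a retrieved document would expose them: doc id -> field -> value.
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();
    // Searchable terms of every indexed field: "field:term" -> doc ids.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    /** Indexes all fields, but keeps the original value only for names in storedNames. */
    int add(Map<String, String> fields, Set<String> storedNames) {
        int id = nextId++;
        Map<String, String> kept = new HashMap<>();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            for (String term : f.getValue().toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(f.getKey() + ":" + term, k -> new HashSet<>()).add(id);
                }
            }
            if (storedNames.contains(f.getKey())) {
                kept.put(f.getKey(), f.getValue());
            }
        }
        stored.put(id, kept);
        return id;
    }

    /** Returns only what was stored; unstored field values cannot be read back. */
    Map<String, String> retrieve(int id) {
        return new HashMap<>(stored.get(id));
    }

    /** Removes the document's stored values and all of its postings. */
    void delete(int id) {
        stored.remove(id);
        for (Set<Integer> ids : postings.values()) {
            ids.remove(id);
        }
    }

    /** True if the given document is found by the term in the given field. */
    boolean matches(int id, String field, String term) {
        String key = field + ":" + term.toLowerCase();
        return postings.getOrDefault(key, Collections.<Integer>emptySet()).contains(id);
    }
}
```

Calling retrieve(), then delete(), then add() with the retrieved map leaves the
stored fields searchable, while the terms of the unstored field are gone.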

He could probably get all values from the original sources, but that is
either too expensive (e.g. slow) or the original sources are
unavailable.

I think it would be a useful feature to have, yes. However, unless I'm
missing something, document updating without deleting and re-adding is
simply not possible currently.

That is why that FAQ entry does actually answer Victor's original
question. A number of people have inquired about this in the past,
which is why that FAQ entry has been entered. The mailing list
archives may reveal more information than there is in the FAQ entry.

Otis



Re: Modifying document with unstored fields [ In reply to ]
We had the same problem: we wanted to add new tokens
to a field (regardless of whether it is stored or
unstored). We also wanted to explore the possibility of
adding new fields to a document.

Deleting and re-adding documents didn't seem to scale
because it involves re-tokenizing them, and big
documents use a lot of CPU to tokenize.

One solution, which I have been trying to work on,
involves reading ".tis" and ".tii" files and
resurrecting the unstored tokens.
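
As a rough illustration of the idea only, the toy below rebuilds a document's
token sequence from an in-memory term dictionary with positions. It does not
parse Lucene's files: in a real index the ".tis"/".tii" files hold the term
dictionary, while the document lists and positions live in the ".frq" and
".prx" files, so those would have to be read as well. All names here are
hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy positional index used to put an unstored field's tokens back in order. */
final class TokenRecoverySketch {
    // term -> (doc id -> positions of the term within that document's field).
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    /** Indexes an unstored field: only terms and positions are kept, not the text. */
    void index(int docId, String text) {
        int position = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, k -> new HashMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(position++);
        }
    }

    /** Walks the whole term dictionary and returns the tokens of docId in position order. */
    List<String> recoverTokens(int docId) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, List<Integer>>> e : postings.entrySet()) {
            List<Integer> positions = e.getValue().get(docId);
            if (positions == null) {
                continue; // this term does not occur in the document
            }
            for (int p : positions) {
                byPosition.put(p, e.getKey());
            }
        }
        return new ArrayList<>(byPosition.values());
    }
}
```

Two caveats apply even to the toy: what comes back are the analyzed tokens
(lower-cased here), not the original text, and recovering one document means
scanning every term in the dictionary.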

Do any of you have any insights into how to go
about it, or any other pointers?

-Manish


Re: Modifying document with unstored fields [ In reply to ]
On Mon, 26 Aug 2002 10:47, Dmitry Serebrennikov wrote:
> Yes, generally, there are two answers -- either make all fields stored
> or use some other database for the storage of the "master" documents.

Thanks, Dmitry, for pointing this out. Well, now I am sure that we can't do it.
I think the second approach might be more acceptable in our situation. It
does have a bit of a performance impact when re-adding the document to
the index, but it should work.

> I suppose it might be possible to add a lower-level support for
> extracting the terms from the unstored fields of the initial document
> and re-indexing them for the new document...

Well, now this is interesting. Dare I hope that some Lucene guru is interested
in doing this? :D

Regards,

Victor Hadianto

Re: Modifying document with unstored fields [ In reply to ]
Victor Hadianto wrote:

>On Mon, 26 Aug 2002 10:47, Dmitry Serebrennikov wrote:
>>Yes, generally, there are two answers -- either make all fields stored
>>or use some other database for the storage of the "master" documents.
>
>Thanks, Dmitry, for pointing this out. Well, now I am sure that we can't do it.
>I think the second approach might be more acceptable in our situation. It
>does have a bit of a performance impact when re-adding the document to
>the index, but it should work.
>
Perhaps you can do something clever with multiple indexes. For example,
you might have large indexed fields in each document that do not change,
and then some other fields that are also indexed but do change. You
might then maintain two kinds of index entries that point to the same
underlying "document id" in some external storage - one for the fields
that change and another for those that do not. This way you will need to
reindex less text.

Alternatively, you might be able to have an "overlay index" where you
have only the updated information. The search is done against the
updated info first and then against the main index. You would then
consolidate the updates into the main index at night, or something.
Also, if you haven't already, take a look at MultiSearcher if any of
these scenarios seem interesting.
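
The overlay idea could look roughly like the plain-Java toy below. It is one
possible reading, not Lucene code: it assumes each document is a single text
field, that an updated document in the overlay hides its old version in the
main index, and that consolidation simply folds the overlay into the main
index. The names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy main index plus a small overlay holding only the updated documents. */
final class OverlaySearchSketch {
    private final Map<String, Set<String>> main = new HashMap<>();    // term -> doc ids
    private final Map<String, Set<String>> overlay = new HashMap<>(); // term -> doc ids
    // Ids whose current version lives in the overlay (assumption: overlay wins).
    private final Set<String> overridden = new HashSet<>();

    void addToMain(String id, String text) {
        index(main, id, text);
    }

    /** Records an updated version of a document without touching the main index. */
    void update(String id, String text) {
        for (Set<String> ids : overlay.values()) {
            ids.remove(id); // replace any earlier update of the same document
        }
        index(overlay, id, text);
        overridden.add(id);
    }

    /** Overlay hits first, then main-index hits for documents with no newer version. */
    Set<String> search(String term) {
        String key = term.toLowerCase();
        Set<String> hits = new LinkedHashSet<>(overlay.getOrDefault(key, Collections.<String>emptySet()));
        for (String id : main.getOrDefault(key, Collections.<String>emptySet())) {
            if (!overridden.contains(id)) {
                hits.add(id);
            }
        }
        return hits;
    }

    /** Periodic consolidation: fold the overlay into the main index and clear it. */
    void consolidate() {
        for (Set<String> ids : main.values()) {
            ids.removeAll(overridden);
        }
        for (Map.Entry<String, Set<String>> e : overlay.entrySet()) {
            main.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
        }
        overlay.clear();
        overridden.clear();
    }

    private static void index(Map<String, Set<String>> target, String id, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                target.computeIfAbsent(term, k -> new HashSet<>()).add(id);
            }
        }
    }
}
```

Only the updated documents are tokenized during the day; the main index is
rewritten once, when consolidate() runs.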

Regards.
Dmitry.



