Mailing List Archive

Re: [External] Re: How to highlight fields that are not stored?
Hi Michael.

Thanks for the reply.

As I said in the opening statement,
I need to move away from reading a file into memory before indexing it.
The use case here is files 2+ GB in size.

I thought streaming the file to be indexed was the only alternative
to reading the full file into RAM and then indexing it.

I would be happy to be directed to another way to get 2+ GB files indexed.


> highlighting requires
> the document in its uninverted form. Otherwise what text would you
> highlight?

My goal is to highlight the (possibly changed) terms from the index
if I can't store the entire document due to RAM size constraints.

Not having the original file text in the highlight isn't ideal,
but it is better than not being able to highlight text in large documents.
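
A rough sketch of that idea, in plain Java with no Lucene calls. It assumes the (term, position) pairs have already been read back from the term vector; the TermHit class, the snippets method, and the <b> markers are hypothetical names chosen for illustration, not part of any Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class IndexedTermSnippets {

    /** Hypothetical stand-in for one (term, position) pair read from a term vector. */
    static final class TermHit {
        final String term;
        final int position;

        TermHit(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /**
     * Builds one snippet per query-term match, using only indexed terms.
     * Matching terms are wrapped in <b>..</b>; 'context' is how many
     * positions to keep on each side of the match.
     */
    static List<String> snippets(List<TermHit> hits, Set<String> queryTerms, int context) {
        // Term vectors come back ordered by term, so sort by position
        // to recover the order the terms had in the document.
        List<TermHit> byPosition = new ArrayList<>(hits);
        byPosition.sort(Comparator.comparingInt((TermHit h) -> h.position));

        List<String> result = new ArrayList<>();
        for (TermHit match : byPosition) {
            if (!queryTerms.contains(match.term)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            for (TermHit h : byPosition) {
                if (Math.abs(h.position - match.position) > context) {
                    continue; // outside the window around this match
                }
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(queryTerms.contains(h.term) ? "<b>" + h.term + "</b>" : h.term);
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

For example, with the indexed terms quick@1, brown@2, fox@3, jumps@4, the query term "fox", and a context of 1, this yields the single snippet "brown <b>fox</b> jumps". The nested loop is quadratic in the number of terms, so it only illustrates the approach; anything dropped or altered by the analyzer (stop words, case, stemming) will be missing or changed in the snippet.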

David Shifflett


On 2/16/23, 4:01 PM, "Michael Sokolov" <msokolov@gmail.com> wrote:


Sorry your problem statement makes no sense: you should be able to
store field data in the index without loading all your documents into
RAM while indexing. Maybe there is some constraint you are not telling
us about? Or you may be confused. In any case highlighting requires
the document in its uninverted form. Otherwise what text would you
highlight?
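
One possible reading of this advice, sketched under assumptions not stated in the thread: split the large file into bounded pieces and index each piece as its own document with a stored field, so that only one piece is held in memory at a time. The helper below is plain Java with no Lucene calls; ChunkedReader, forEachChunk, and the sink callback (where the per-piece indexing would go) are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

final class ChunkedReader {

    /**
     * Reads 'in' in pieces of at most maxChars characters and hands each
     * piece to 'sink'. A piece ends just after the last whitespace in the
     * buffer when there is one, so words are not split across pieces.
     */
    static void forEachChunk(Reader in, int maxChars, Consumer<String> sink) {
        if (maxChars < 1) {
            throw new IllegalArgumentException("maxChars must be at least 1");
        }
        char[] buf = new char[maxChars];
        int filled = 0;
        try {
            while (true) {
                int n = in.read(buf, filled, maxChars - filled);
                boolean eof = n < 0;
                if (n > 0) {
                    filled += n;
                }
                if (!eof && filled < maxChars) {
                    continue; // buffer not full yet, keep reading
                }
                if (filled == 0) {
                    break; // end of input and nothing left to emit
                }
                int cut = filled;
                if (!eof) {
                    // Back up to the last whitespace so a word is not cut in half.
                    int ws = filled - 1;
                    while (ws > 0 && !Character.isWhitespace(buf[ws])) {
                        ws--;
                    }
                    if (ws > 0) {
                        cut = ws + 1;
                    }
                }
                sink.accept(new String(buf, 0, cut));
                // Carry the unfinished tail over to the front of the buffer.
                System.arraycopy(buf, cut, buf, 0, filled - cut);
                filled -= cut;
                if (eof) {
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each piece passed to sink could then be added as a separate document with a stored text field, at the cost of matches and highlights being reported per piece rather than per file. A run of non-whitespace longer than maxChars is still split at the buffer boundary.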


On Mon, Feb 13, 2023 at 3:46 PM Shifflett, David [USA]
<Shifflett_David@bah.com.invalid> wrote:
>
> Hi,
> I am converting my application from
> reading documents into memory, then indexing the documents
> to streaming the documents to be indexed.
>
> I quickly found out this required that the field NOT be stored.
> I then quickly found out that my highlighting code requires the field to be stored.
>
> I’ve been searching for an existing highlighter that doesn’t require the field to be stored,
> and thought I’d found one in the FastVectorHighlighter,
> but tests revealed this highlighter also requires the field to be stored,
> though this requirement isn’t documented, or reflected in any returned exception.
>
> I have been investigating using code like
> Terms terms = reader.getTermVector(docID, fieldName);
> TermsEnum termsEnum = terms.iterator();
> BytesRef bytesRef = termsEnum.next();
> PostingsEnum pe = termsEnum.postings(null, PostingsEnum.OFFSETS);
>
> While this gives me the terms from the document, and the positions,
> iterating over this, and matching to the queries I’m running,
> seems cumbersome, and inefficient.
>
> Any suggestions for highlighting query matches without the searched field being stored?
>
> Thanks,
> David Shifflett
> Senior Lead Technologist
> Enterprise Cross Domain Solutions (ECDS)
> Booz Allen Hamilton
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org