Mailing List Archive: solr 4.7 MultiFields and MultiDocValues slow

Mar 7, 2016, 1:49 AM

Post #1 of 3 (860 views)

Hello everyone,
I am using Solr 4.7.2. I am fairly new to Solr and want to dump a Solr
index to JSON. To check for deleted docs I use
Bits liveDocs = MultiFields.getLiveDocs(reader);
I also want the field boosts for all documents, and for that I use
NumericDocValues ndv = MultiDocValues.getNormValues(reader, field.name());

The documentation for both methods says they are expensive and slow
because they merge the individual sub-segment readers, and it recommends
writing these implementations yourself. Can someone please explain why my
own implementation would be faster, given that I would also have to go
through every segment reader to get the info for all documents? Or can
anyone suggest an efficient way to implement these methods? Any help is
highly appreciated.
--
Thanks and Regards
Rahul Jha

Re: solr 4.7 MultiFields and MultiDocValues slow [ In reply to ]

david.w.smiley at gmail

Mar 15, 2016, 6:56 AM

Post #2 of 3 (829 views)

Ideally, you can do what you need by iterating over the index's
LeafReaders (named AtomicReader in Lucene/Solr 4.x) and working with each
one directly. If you can do that, you don't need
SlowCompositeReaderWrapper or the overhead it introduces via its Multi*
classes. Very few tasks require SCRW, and dumping the index to JSON
shouldn't be one of them.
~ David
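
The per-segment loop described above can be sketched in plain Java. This is
a minimal model, not Lucene code: Segment, SegmentDump, and the sample norm
values are hypothetical stand-ins so the example runs without a Lucene
dependency. In Lucene 4.7 the corresponding calls would be reader.leaves(),
AtomicReaderContext.docBase, AtomicReader.getLiveDocs(), and
AtomicReader.getNormValues(field), as noted in the comments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one segment (an AtomicReader in Lucene 4.x). */
final class Segment {
    final int docBase;     // global id of the segment's first doc (cf. AtomicReaderContext.docBase)
    final boolean[] live;  // null means "no deletions" (cf. AtomicReader.getLiveDocs())
    final long[] norms;    // one value per segment-local doc (cf. AtomicReader.getNormValues(field))

    Segment(int docBase, boolean[] live, long[] norms) {
        this.docBase = docBase;
        this.live = live;
        this.norms = norms;
    }
}

final class SegmentDump {
    /** Visits each segment once and emits one JSON line per live document. */
    static List<String> dump(List<Segment> segments) {
        List<String> out = new ArrayList<>();
        for (Segment seg : segments) {               // cf. for (AtomicReaderContext ctx : reader.leaves())
            for (int local = 0; local < seg.norms.length; local++) {
                if (seg.live != null && !seg.live[local]) {
                    continue;                        // skip deleted documents
                }
                int global = seg.docBase + local;    // segment-local id -> index-wide id
                out.add("{\"doc\":" + global + ",\"norm\":" + seg.norms[local] + "}");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        // Made-up data: segment 0 has no deletions, segment 1 has its second doc deleted.
        segments.add(new Segment(0, null, new long[] {124, 120, 118}));
        segments.add(new Segment(3, new boolean[] {true, false}, new long[] {122, 116}));
        for (String line : dump(segments)) {
            System.out.println(line);
        }
    }
}
```

Each segment's live-docs bitset and norm values are fetched once and then
read with segment-local ids, so no merged top-level view is needed.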

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: solr 4.7 MultiFields and MultiDocValues slow [ In reply to ]

rahul.kumar08 at snapdeal

Mar 15, 2016, 10:52 AM

Post #3 of 3 (830 views)

Yes, I have already followed this approach. Can you tell me why it is
faster?
As I understand it, when iterating through the top-level reader, each
lookup first does a binary search to find the right segment reader, which
is then called to fetch the norm data. By iterating through the segment
readers directly, we save that binary search. Also, is there any other way
to access the norm values?
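
The difference described above can be modelled in plain Java. This is a
sketch under stated assumptions, not Lucene's implementation: DocLookup and
its methods are hypothetical names, the norm values are made up, and the
segment start offsets are assumed to be strictly increasing (no empty
segments). It contrasts a top-level lookup, which binary-searches for the
owning segment on every access, with a direct walk over each segment.

```java
import java.util.Arrays;

final class DocLookup {
    /**
     * Returns the index of the segment that contains globalDoc.
     * starts[i] is the global id of segment i's first document.
     */
    static int subIndex(int globalDoc, int[] starts) {
        int pos = Arrays.binarySearch(starts, globalDoc);
        // Exact hit: globalDoc is the first doc of segment pos.
        // Miss: binarySearch returns -(insertionPoint) - 1; the owner is the segment before that point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Top-level style: every access pays for a binary search before reading the value. */
    static long sumViaTopLevel(long[][] normsPerSegment, int[] starts, int maxDoc) {
        long sum = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            int seg = subIndex(doc, starts);
            sum += normsPerSegment[seg][doc - starts[seg]];
        }
        return sum;
    }

    /** Per-segment style: each segment is visited once and read with local ids. */
    static long sumPerSegment(long[][] normsPerSegment) {
        long sum = 0;
        for (long[] segment : normsPerSegment) {
            for (long norm : segment) {
                sum += norm;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[][] norms = { {124, 120, 118}, {122, 116}, {110} }; // made-up values
        int[] starts = {0, 3, 5};
        System.out.println(sumViaTopLevel(norms, starts, 6));
        System.out.println(sumPerSegment(norms));
    }
}
```

Both methods read the same values. The per-segment version simply avoids
the O(log S) segment lookup on each of the N documents, where S is the
number of segments.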

Rahul Kumar
Software Engineer I (Search)

