Mailing List Archive

Index Optimization
Hello folks,

I have some Lucene indexes in my project. Most of them are created once and updated infrequently, about once a week or once a month. The index sizes are about 20GB, and the indexes grow as more inserts are done, so I'd like to know what the best index optimization strategy is, or even whether optimization is really necessary, since 99% of the time we do read operations. The documentation is not clear on some aspects of this subject. If someone could give some tips, I'd be very grateful.

Best regards,
Eduardo Lopes.






"This message from SERVIÇO FEDERAL DE PROCESSAMENTO DE DADOS (SERPRO) -- a government company established under Brazilian law (5.615/70) -- is directed exclusively to its addressee and may contain confidential data, protected under professional secrecy rules. Its unauthorized use is illegal and may subject the transgressor to the law's penalties. If you're not the addressee, please send it back, elucidating the failure."
Re: Index Optimization
Optimizing is rarely useful. It can give some performance gains, but it is quite an expensive operation. Before Solr 7.5, optimizing had some behaviors that weren’t obvious; see: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

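From 7.5 on, the behavior has changed: by default, optimize respects the maximum segment size (5GB) unless you explicitly request a specific number of segments.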

If I were going to offer advice, you have two paths:
1> Don’t optimize at all. This is my preference.
2> Optimize after every update, _assuming_ that means you update at most once a day.

Best,
Erick
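
For a plain Lucene index (no Solr), path 2 above might look roughly like the sketch below. It assumes Lucene 7.x or 8.x on the classpath; the class name, the index-path argument, and the choice of StandardAnalyzer are hypothetical, and the batch update itself is elided. IndexWriter.forceMerge is the Lucene call that Solr's optimize maps to.

```java
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical helper: run once after each weekly or monthly batch update.
public class OptimizeAfterUpdate {
    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of an existing index directory.
        try (Directory dir = FSDirectory.open(Paths.get(args[0]));
             IndexWriter writer = new IndexWriter(
                     dir, new IndexWriterConfig(new StandardAnalyzer()))) {

            // ... apply the batch of adds/updates/deletes here (elided) ...
            writer.commit();

            // Path 2: ask Lucene to merge the index down to a single segment.
            // This rewrites the index, so it is expensive and needs
            // significant temporary free disk space while it runs.
            writer.forceMerge(1);
            writer.commit();
        }
    }
}
```

Path 1 is the same program with the forceMerge call removed; the merge policy then merges segments in the background as usual.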

> On Jun 25, 2019, at 5:34 AM, Eduardo Costa Lopes <eduardo-costa.lopes@serpro.gov.br> wrote:
>
> Hello folks,
>
> I have some Lucene indexes in my project. Most of them are created once and updated infrequently, about once a week or once a month. The index sizes are about 20GB, and the indexes grow as more inserts are done, so I'd like to know what the best index optimization strategy is, or even whether optimization is really necessary, since 99% of the time we do read operations. The documentation is not clear on some aspects of this subject. If someone could give some tips, I'd be very grateful.
>
> Best regards,
> Eduardo Lopes.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org