Hello,
I use Lucene through PyLucene in a public-facing web application. We have a moderately large index (~24M documents, ~11 GB of index data), with a constant stream of new documents.
I recently upgraded to PyLucene 7.
When testing the new PyLucene 8 release, I hit an IndexFormatTooOldException because my index conversion from Lucene 6 to Lucene 7 was not complete.
I found IndexUpgrader and had a look at its implementation. I would very much like to avoid taking the service down during the index upgrade, so I believe I cannot use IndexUpgrader: the web application must hold the write lock so that it can keep indexing new documents.
So I figured I could get the desired result by calling IndexWriter.forceMerge(1). But the documentation says "This is a horribly costly operation, especially when you pass a small maxNumSegments; usually you should only call this if the index is static (will no longer be changed)." https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/index/IndexWriter.html#forceMerge-int-
And indeed, on my development VM the forceMerge process tends to be killed by the kernel OOM killer. I want to avoid this failure mode in production. I could keep enlarging the VM until it works, but I would rather have a less brutal approach to upgrading a live index: ideally something that runs in the background with a reasonable amount of anonymous memory.
What is the recommended approach to upgrading a live index?
How can I tell from code whether the index needs upgrading at all? I could add a manual knob to trigger an upgrade, but it would be better if the upgrade happened transparently when I upgrade PyLucene.
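A minimal sketch of the check itself, assuming the Lucene version that wrote each segment has already been read from the index as a dotted string such as "6.6.0" (Lucene records this per segment in the commit's SegmentInfos; reading it is not shown here). The helper names are hypothetical. The rule encoded is Lucene's general backward-compatibility policy that major release N reads segments written by N or N-1; the sketch looks at segment versions only and ignores any other compatibility checks a given release may apply.

```python
from typing import Iterable, List

# Hypothetical helpers: segment versions are assumed to be dotted strings such
# as "6.6.0" or "7.7.2", already read from the index (not shown here).


def parse_major(version: str) -> int:
    """Return the major component of a dotted version string, e.g. 7 for "7.7.2"."""
    return int(version.split(".", 1)[0])


def stale_segments(segment_versions: Iterable[str], target_major: int) -> List[str]:
    """Return the segment versions a Lucene `target_major` release could not read.

    Assumes the rule that major release N reads segments written by N or N - 1.
    """
    oldest_readable = target_major - 1
    return [v for v in segment_versions if parse_major(v) < oldest_readable]


def needs_upgrade(segment_versions: Iterable[str], target_major: int) -> bool:
    """True if at least one segment must be rewritten before moving to target_major."""
    return bool(stale_segments(segment_versions, target_major))


if __name__ == "__main__":
    # A partially converted index: one segment still written by Lucene 6.
    versions = ["6.6.0", "7.7.2", "7.7.2"]
    print(needs_upgrade(versions, 8))   # True
    print(stale_segments(versions, 8))  # ['6.6.0']
```

Run at application start-up, a check along these lines could decide whether to start a background upgrade automatically instead of relying on a manual knob.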
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org