Hello,
I recently updated the Lucene version in one of our products from 8.3 to 8.8.x (8.8.2 as of now).
The update itself went smoothly (the code compiled without changes), but I noticed that our test suites now take a lot longer to finish.
So I took a closer look at one test case that showed a severe slowdown
(it runs small update, flush, search cycles in order to stress NRT; its purpose is to catch performance changes at an early stage):
Lucene 8.3: ~2.3 s
Lucene 8.8.x: 25 s
This is a huge difference, so I profiled both 8.3 and 8.8 with YourKit and compared the results.
The gap is caused by a very different number of calls to sun.nio.fs.WindowsNativeDispatcher.CreateFile0(long, int, int, long, int, int) (WindowsNativeDispatcher.java, native):
8.3: about 150 calls
8.8: about 12500 calls
To track down the cause, I took a look at the open() in NRTDirectory.
There I could see that the number of calls to that open() is in the same ballpark for 8.3 and 8.8.
The difference is that in 8.3 nearly all files are available in the underlying RAMDirectory, while in 8.8 files are opened for reading that do not (yet) exist.
Each such open falls through to the file system and leads to a call to WindowsNativeDispatcher.CreateFile0.
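To make the suspected mechanism concrete, here is a minimal, stdlib-only sketch of a RAM cache in front of a disk directory. It is not Lucene's code; the class and method names (CacheFallThroughSketch, CountingCache, writeToRam, open) are hypothetical and only model the behavior described above: an open that misses the RAM cache turns into a file-system open attempt, even when the file does not exist.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class CacheFallThroughSketch {

    /** Hypothetical stand-in for a RAM cache in front of an on-disk directory. */
    static final class CountingCache {
        private final Map<String, byte[]> ram = new HashMap<>();
        private final Path diskDir;
        private int ramHits;
        private int diskOpens;

        CountingCache(Path diskDir) {
            this.diskDir = diskDir;
        }

        /** Simulates a small flushed file that is kept in RAM only. */
        void writeToRam(String name, byte[] data) {
            ram.put(name, data);
        }

        /** Serves from RAM if possible; otherwise asks the file system. */
        Optional<byte[]> open(String name) {
            byte[] cached = ram.get(name);
            if (cached != null) {
                ramHits++;
                return Optional.of(cached);
            }
            // Cache miss: this is the expensive path, one OS-level open attempt
            // per call, whether or not the file exists on disk.
            diskOpens++;
            try {
                return Optional.of(Files.readAllBytes(diskDir.resolve(name)));
            } catch (NoSuchFileException e) {
                return Optional.empty();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        int ramHits() {
            return ramHits;
        }

        int diskOpens() {
            return diskOpens;
        }
    }

    public static void main(String[] args) {
        // The directory is never created, so every disk lookup is a miss.
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"),
                "nrt-sketch-" + System.nanoTime());
        CountingCache cache = new CountingCache(dir);
        cache.writeToRam("_0.fdx", new byte[] {1, 2, 3});

        cache.open("_0.fdx"); // served from RAM
        cache.open("_0.fdm"); // not in RAM, falls through to the file system

        // prints "ramHits=1 diskOpens=1"
        System.out.println("ramHits=" + cache.ramHits()
                + " diskOpens=" + cache.diskOpens());
    }
}
```

In this model the diskOpens counter plays the role of the CreateFile0 call count from the profiler: the number of open() calls stays the same, but the share of them that miss the RAM cache decides how often the operating system is involved.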
At the end of this mail I have added two example stack traces that show this behavior.
Does anyone have an idea which change might cause this, or whether I need to do something differently in 8.8 compared to 8.3?
Thanks for any help,
Markus
Here is an example stack trace of such an attempt to read a file that does not exist:
Filename = _0.fdm (IOContext is READ). I checked: the file was not yet in the directory on disk, nor in the RAM directory of the NRTCacheDir.
openInput:100, FilterDirectory (org.apache.lucene.store)
openInput:100, FilterDirectory (org.apache.lucene.store)
openChecksumInput:157, Directory (org.apache.lucene.store)
finish:140, FieldsIndexWriter (org.apache.lucene.codecs.compressing)
finish:480, CompressingStoredFieldsWriter (org.apache.lucene.codecs.compressing)
flush:81, StoredFieldsConsumer (org.apache.lucene.index)
flush:239, DefaultIndexingChain (org.apache.lucene.index)
flush:350, DocumentsWriterPerThread (org.apache.lucene.index)
doFlush:476, DocumentsWriter (org.apache.lucene.index)
flushAllThreads:656, DocumentsWriter (org.apache.lucene.index)
getReader:605, IndexWriter (org.apache.lucene.index)
doOpenIfChanged:277, StandardDirectoryReader (org.apache.lucene.index)
openIfChanged:235, DirectoryReader (org.apache.lucene.index)
As a consequence, later accesses to such files also find that the file is not in the RAMDirectory but only on disk.
Example:
Filename = _1.fdx, IOContext = READ (the file is on disk but not in the RAMDirectory)
openInput:100, FilterDirectory (org.apache.lucene.store)
openInput:100, FilterDirectory (org.apache.lucene.store)
openInput:100, FilterDirectory (org.apache.lucene.store)
openChecksumInput:157, Directory (org.apache.lucene.store)
write:90, Lucene50CompoundFormat (org.apache.lucene.codecs.lucene50)
createCompoundFile:5316, IndexWriter (org.apache.lucene.index)
sealFlushedSegment:457, DocumentsWriterPerThread (org.apache.lucene.index)
flush:395, DocumentsWriterPerThread (org.apache.lucene.index)
doFlush:476, DocumentsWriter (org.apache.lucene.index)
flushAllThreads:656, DocumentsWriter (org.apache.lucene.index)
getReader:605, IndexWriter (org.apache.lucene.index)
doOpenFromWriter:290, StandardDirectoryReader (org.apache.lucene.index)
doOpenIfChanged:275, StandardDirectoryReader (org.apache.lucene.index)
openIfChanged:235, DirectoryReader (org.apache.lucene.index)
Software AG – Sitz/Registered office: Uhlandstraße 12, 64297 Darmstadt, Germany – Registergericht/Commercial register: Darmstadt HRB 1562 - Vorstand/Management Board: Sanjay Brahmawar (Vorsitzender/Chairman), Dr. Elke Frank, Dr. Matthias Heiden, Dr. Stefan Sigg - Aufsichtsratsvorsitzender/Chairman of the Supervisory Board: Karl-Heinz Streibich - http://www.softwareag.com