Hello,
I've been looking a bit more carefully at nightly benchmarks recently and
I'm puzzled by the fact that indexing spends almost 5% of the time on
AttributeSource#addAttribute. Here is the link:
<http://people.apache.org/~mikemccand/lucenebench/2021.10.20.08.24.09.html#profiler_4kb_indexing_1_cpu>

4.37%  14731  org.apache.lucene.util.AttributeSource#addAttribute()
    at org.apache.lucene.document.Field$StringTokenStream#()
    at org.apache.lucene.document.Field#tokenStream()
    at org.apache.lucene.index.IndexingChain$PerField#invert()
    at org.apache.lucene.index.IndexingChain#processField()
    at org.apache.lucene.index.IndexingChain#processDocument()
    at org.apache.lucene.index.DocumentsWriterPerThread#updateDocuments()
    at org.apache.lucene.index.DocumentsWriter#updateDocuments()
    at org.apache.lucene.index.IndexWriter#updateDocuments()
    at org.apache.lucene.index.IndexWriter#updateDocument()
    at org.apache.lucene.index.IndexWriter#addDocument()
    at perf.IndexThreads$IndexThread#run()

Given that nightly benchmarks reuse Field instances across documents, this
should only happen once per thread, so why does it show up as a bottleneck
in our nightly benchmarks? I tried to reproduce locally, but I'm not seeing
AttributeSource among top CPU consumers.
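
To make the expectation concrete, here is a minimal, self-contained sketch of the reuse pattern I have in mind. It does not use Lucene at all: ReuseSketch, Attributes, StringStream and indexDocuments are hypothetical stand-ins, and the two attributes registered in the constructor are an assumption made for illustration. It only shows that constructor-time registration costs a constant number of calls when the stream object is kept, and grows with the number of documents when it is not.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ReuseSketch {
    // Hypothetical stand-in for an attribute registry; NOT Lucene's AttributeSource.
    static class Attributes {
        static int addAttributeCalls = 0;
        private final Map<String, Object> attributes = new LinkedHashMap<>();

        Object addAttribute(String name) {
            addAttributeCalls++; // count every registration attempt
            return attributes.computeIfAbsent(name, k -> new Object());
        }
    }

    // Hypothetical stream that registers its attributes at construction time.
    static class StringStream extends Attributes {
        final Object term = addAttribute("term");
        final Object offset = addAttribute("offset");
        String value;

        void setValue(String value) {
            this.value = value;
        }
    }

    // Builds a new stream only when the caller has no compatible one to hand back.
    static StringStream tokenStream(String value, Attributes reuse) {
        StringStream stream =
            (reuse instanceof StringStream) ? (StringStream) reuse : new StringStream();
        stream.setValue(value);
        return stream;
    }

    // Returns how many addAttribute calls were made while processing `docs` values.
    static int indexDocuments(int docs, boolean keepStream) {
        Attributes.addAttributeCalls = 0;
        Attributes reuse = null;
        for (int i = 0; i < docs; i++) {
            StringStream stream = tokenStream("doc" + i, reuse);
            reuse = keepStream ? stream : null; // dropping the stream forces a new one
        }
        return Attributes.addAttributeCalls;
    }

    public static void main(String[] args) {
        System.out.println("stream kept:    " + indexDocuments(1000, true));
        System.out.println("stream dropped: " + indexDocuments(1000, false));
    }
}
```

In this toy model the profile above would look like the "stream dropped" case, which is why I suspect that something in the nightly setup prevents the stream from being reused, though I have not confirmed that.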
--
Adrien