Mailing List Archive

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.
Hi,

Also have a look here:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-9378

Seems it might be related.
- Alex

On Sun, Jul 26, 2020, 23:31 Trejkaz <trejkaz@trypticon.org> wrote:

> Hi all.
>
> I've been tracking down slow seeking performance in TermsEnum after
> updating to Lucene 8.5.1.
>
> On 8.5.1:
>
> SegmentTermsEnum.seekExact: 33,829 ms (70.2%) (remaining time in our
> code)
> SegmentTermsEnumFrame.loadBlock: 29,104 ms (60.4%)
> CompressionAlgorithm$2.read: 25,789 ms (53.5%)
> LowercaseAsciiCompression.decompress: 25,789 ms (53.5%)
> DataInput.readVInt: 24,690 ms (51.2%)
> SegmentTermsEnumFrame.scanToTerm: 2,921 ms (6.1%)
>
> On 7.7.0 (previous version we were using):
>
> SegmentTermsEnum.seekExact: 5,897 ms (43.7%) (remaining time in our
> code)
> SegmentTermsEnumFrame.loadBlock: 3,499 ms (25.9%)
> BufferedIndexInput.readBytes: 1,500 ms (11.1%)
> DataInput.readVInt: 1,108 ms (8.2%)
> SegmentTermsEnumFrame.scanToTerm: 1,501 ms (11.1%)
>
> So on the surface it sort of looks like the new version spends less
> time scanning and much more time loading blocks to decompress?
>
> Looking for some clues as to what might have changed here, and whether
> it's something we can avoid. Currently LUCENE-4702 looks like it may be
> related.
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.
Alex, the issue you linked is about the terms dictionary of doc values.
Trejkaz linked the correct issue, which is about the terms dictionary of the
inverted index.

It's interesting that you're seeing so much time spent in readVInt on 8.5,
since only a single vint is read for each block in
"LowercaseAsciiCompression.decompress". Are these relative timings
consistent over multiple runs?
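
For context, a vint (what DataInput.readVInt decodes) is a variable-length
integer: seven payload bits per byte, low-order bytes first, with the high
bit flagging continuation. A standalone sketch of the format (plain JDK,
not Lucene's actual DataInput code):

```java
import java.io.ByteArrayOutputStream;

public class VIntDemo {
    // Encode an int as a vint: 7 payload bits per byte, low-order bytes
    // first, with the high bit set on every byte except the last.
    static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decode a vint from the start of the given byte array.
    static int readVInt(byte[] bytes) {
        int value = 0;
        for (int i = 0, shift = 0; i < bytes.length; i++, shift += 7) {
            byte b = bytes[i];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return value;
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 127, 128, 16_384, Integer.MAX_VALUE}) {
            byte[] encoded = writeVInt(v);
            System.out.println(v + " -> " + encoded.length
                + " byte(s), round-trips to " + readVInt(encoded));
        }
    }
}
```

Small values fit in one byte, which is why a single vint per block should
be cheap and the profiler attribution above is surprising.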


--
Adrien
Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.
Yep, the timings posted were the best speed out of 10 runs in a row.
The profiling was done in the middle of 1000 iterations in a row just
to knock off any warm-up time.
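
That best-of-N-after-warm-up pattern can be sketched as follows (plain
Java with hypothetical names; not the actual test code):

```java
public class BestOf {
    // Warm up first, then time `runs` executions and keep the best,
    // mirroring the "best of 10 runs after many warm-up iterations"
    // methodology described above.
    static long bestOfMillis(Runnable task, int warmupIterations, int runs) {
        for (int i = 0; i < warmupIterations; i++) {
            task.run();
        }
        long bestNanos = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            task.run();
            bestNanos = Math.min(bestNanos, System.nanoTime() - t0);
        }
        return bestNanos / 1_000_000;
    }

    public static void main(String[] args) {
        long best = bestOfMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
        }, 100, 10);
        System.out.println("best = " + best + " ms");
    }
}
```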

The sort of data we're storing in the field is quite possibly a
worst-case scenario for the compression. The data is mixed digest info
like

"MD5;[md5 value in hex]"
"SHA-1;[sha1 value in hex]"
"SHA-256;[sha256 value in hex]"

In fact, there's another field in the index which contains the same
MD5s without the common prefix - the same sort of operation on that
field doesn't get the same slowdown. (It's a bit slower. Like 5% or
so? Certainly nothing like 100%.) So at least for looking up MD5s we
have the luxury of an alternative option for the lookups. For other
digests I'm afraid we're stuck for now until we change how we index
those.

What's ironic is that we originally put the prefix on to make seeking
to the values faster. ^^;;

TX



Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.
It's interesting that you're not seeing the same slowdown on the other field.
How hard would it be for you to test the performance if you lowercase the
names of the digest algorithms, i.e. "md5;[md5 value in hex]", etc.? I'm
asking because the compression logic is optimized for lowercase ASCII, so
removing the uppercase letters would remove the need to encode exceptions,
which may be one reason why the slowdown is less on your other field.
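
To make the "exceptions" idea concrete, here is a toy count of the
characters falling outside 'a'-'z' for the two prefix variants. This is
only a model, not Lucene's actual codec; note that the hex digits in the
payload dominate either way:

```java
public class LowercaseExceptions {
    // Count the characters outside 'a'..'z'. In a scheme that packs
    // lowercase ASCII tightly, each such character would have to be
    // stored separately as an "exception".
    static long countExceptions(String term) {
        return term.chars().filter(c -> c < 'a' || c > 'z').count();
    }

    public static void main(String[] args) {
        String hex = "9e107d9d372bb6826bd81d3542a419d6"; // a sample MD5 in hex
        System.out.println("MD5; prefix: " + countExceptions("MD5;" + hex)); // 26
        System.out.println("md5; prefix: " + countExceptions("md5;" + hex)); // 24
    }
}
```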

In case you're using an old JRE, you might want to try JRE 13 or more
recent. Some of the logic in this lowercase ASCII compression only gets
vectorized on JDK 13+.


--
Adrien
Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.
On Mon, 27 Jul 2020 at 19:24, Adrien Grand <jpountz@gmail.com> wrote:
>
> It's interesting you're not seeing the same slowdown on the other field.
> How hard would it be for you to test what the performance is if you
> lowercase the name of the digest algorithms, ie. "md5;[md5 value in hex]",
> etc. The reason I'm asking is because the compression logic is optimized
> for lowercase ASCII so removing uppercase letters would help remove the
> need to encode exceptions, which is one reason I'm thinking why the
> slowdown might be less on your other field.

It took me a while to find the free time to make a new version of the
test without our own code in it, so that I could add the new field
without rewriting a large chunk of our system... but it looks like the
timing for lowercase prefixes is around the same as for uppercase.

The particular test I ended up doing is a pathological case, though: it
turned out to have 0 hits in the index despite searching for 29 million
digests.

-------------------------
Times for just reading the digest list
Count = 29459432, time = 1946 ms
Count = 29459432, time = 1752 ms
Count = 29459432, time = 1752 ms
-------------------------
Times for digest-upper
Count = 0, time = 40570 ms
Count = 0, time = 42574 ms
Count = 0, time = 40121 ms
-------------------------
Times for digest-lower
Count = 0, time = 40462 ms
Count = 0, time = 40319 ms
Count = 0, time = 39938 ms
-------------------------
Times for digest-no-prefix
Count = 0, time = 10936 ms
Count = 0, time = 10857 ms
Count = 0, time = 10628 ms
-------------------------

So about 4 times faster on the field with no term prefixes.
The code for all 3 tests is shared:

private static void timeDigest(Path md5sFile, IndexReader reader,
                               String field, String termPrefix) throws IOException {
    try (BufferedReader md5sReader = Files.newBufferedReader(md5sFile)) {
        TermsEnum termsEnum = MultiTerms.getTerms(reader, field).iterator();
        PostingsEnum postingsEnum = null;

        long t0 = System.currentTimeMillis();
        int hitCount = 0;

        while (true) {
            String md5 = md5sReader.readLine();
            if (md5 == null) {
                break;
            }

            if (termsEnum.seekExact(new BytesRef(termPrefix + md5))) {
                postingsEnum = termsEnum.postings(postingsEnum, PostingsEnum.NONE);
                while (postingsEnum.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                    hitCount++;
                }
            }
        }

        long t1 = System.currentTimeMillis();
        System.out.println("Count = " + hitCount + ", time = " + (t1 - t0) + " ms");
    }
}

> In case you're using an old JRE, you might want to try out with a JRE 13 or
> more recent. Some of the logic in this lowercase ASCII compression only
> gets vectorized on JDK13+.

Times for JDK 14.0.2:

-------------------------
Times for just reading the digest list
Count = 29459432, time = 2050 ms
Count = 29459432, time = 2156 ms
Count = 29459432, time = 1905 ms
-------------------------
Times for digest-upper
Count = 0, time = 24336 ms
Count = 0, time = 24236 ms
Count = 0, time = 23986 ms
-------------------------
Times for digest-lower
Count = 0, time = 24440 ms
Count = 0, time = 23960 ms
Count = 0, time = 23956 ms
-------------------------
Times for digest-no-prefix
Count = 0, time = 13177 ms
Count = 0, time = 13095 ms
Count = 0, time = 13081 ms
-------------------------


Almost a 2:1 speed boost for prefixed timings just by updating the JDK...

The non-prefixed timings seem to be about 20% slower than on JDK 8 (WTF?)
but still win when compared to the prefixed timings.

TX

Re: TermsEnum.seekExact degraded performance somewhere between Lucene 7.7.0 and 8.5.1.
I'm puzzled. I would have expected the digest-no-prefix times to be faster
on JDK14 than on your older JDK for the same reason that digest-lower and
digest-upper got faster.

I wonder if part of the reason the no-prefix variant is faster is that it
is better at identifying that a digest doesn't exist in the terms
dictionary thanks to the terms index, while the prefix variants might need
to go to the terms dict and decompress data more frequently. But if that
were the case, you would already be seeing faster lookups on the no-prefix
variant on older versions of Lucene.

If this is an important use-case for you, one way to improve these times
would be to index these digests in their binary form.
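
For reference, the hex-to-binary conversion such an index would need is
straightforward. This is a plain-JDK sketch and the helper name is mine;
the resulting bytes could be wrapped in a BytesRef both at index time and
before each seekExact:

```java
public class HexDigests {
    // Convert a hex-encoded digest (e.g. 32 chars for MD5) to its raw
    // byte form, halving the length of the indexed term.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not hex: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] md5 = hexToBytes("9e107d9d372bb6826bd81d3542a419d6");
        System.out.println(md5.length + " bytes instead of 32 hex chars"); // 16 bytes
    }
}
```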


--
Adrien