Mailing List Archive

Weird HNSW merge performance result
Hi folks,
I was running the HNSW benchmark today and found some weird results. I want
to share them here and see whether people have any ideas.

The setup is:
the 384-dimension vectors available in luceneutil, 100k documents, and the
Lucene main branch.
max_conn=64, fanout=0, beam_width=250

I first tried the default setting, where we use a 1994MB writer buffer, so
with 100k documents no merge happens and I end up with 1 segment at the end.
This gives me 0.755 recall and 101113ms index building time.
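
(For reference, here is roughly what this run corresponds to in IndexWriter
terms. This is a minimal sketch assuming the Lucene 9.x API around the time
of this thread, not the actual luceneutil/KnnGraphTester code; the class,
field, and directory names are made up.)

    import java.nio.file.Paths;
    import org.apache.lucene.codecs.KnnVectorsFormat;
    import org.apache.lucene.codecs.lucene95.Lucene95Codec;
    import org.apache.lucene.codecs.lucene95.Lucene95HnswVectorsFormat;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.KnnFloatVectorField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.VectorSimilarityFunction;
    import org.apache.lucene.store.FSDirectory;

    public class SingleSegmentHnswSketch {
      public static void main(String[] args) throws Exception {
        IndexWriterConfig iwc = new IndexWriterConfig();
        // 1994MB RAM buffer: all 100k vectors fit in one flushed segment,
        // so no merging ever happens.
        iwc.setRAMBufferSizeMB(1994);
        // max_conn=64 and beam_width=250 go to the HNSW vectors format.
        iwc.setCodec(new Lucene95Codec() {
          @Override
          public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
            return new Lucene95HnswVectorsFormat(64, 250);
          }
        });
        try (IndexWriter writer =
            new IndexWriter(FSDirectory.open(Paths.get("index")), iwc)) {
          for (int i = 0; i < 100_000; i++) {
            float[] vector = new float[384];
            vector[0] = 1f; // placeholder; real runs use dataset vectors
            Document doc = new Document();
            doc.add(new KnnFloatVectorField(
                "vector", vector, VectorSimilarityFunction.DOT_PRODUCT));
            writer.addDocument(doc);
          }
        }
      }
    }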

Then I tried a 50MB writer buffer plus a forceMerge at the end. With 100k
documents I get several segments before the merge (the final index is around
300MB, so I guess 5 or 6), which are then merged into 1 at the end.
This gives me 0.692 recall, but it took only 81562ms (including 34394ms
spent merging) to index.
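
(Relative to the sketch above, only two lines would change; again a sketch,
same made-up names.)

    // 50MB RAM buffer: segments get flushed roughly every 50MB of indexed
    // vectors, leaving several segments on disk during indexing.
    iwc.setRAMBufferSizeMB(50);
    // ... same addDocument loop as above ...
    // After adding all 100k documents, collapse everything into 1 segment.
    writer.forceMerge(1);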
I have also tried disabling the initialize-from-graph feature (so that on
merge we always rebuild the whole graph) and changing the random seed, but I
still get similar results.

I'm wondering:
1. Why does recall drop that much in the latter setup?
2. Why is the indexing time so much better? I think we still need to rebuild
the whole graph, or maybe it's just because we use more off-heap memory (and
less heap) when merging (do we?).

Best
Patrick
Re: Weird HNSW merge performance result
Regarding building time, did you configure a SerialMergeScheduler?
Otherwise merges run in separate threads, which would explain the speedup,
since adding vectors to the graph gets more and more expensive as the graph
grows.
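
(Ruling out merge concurrency would be a one-line change to the earlier
sketch; this is hypothetical, not necessarily what the benchmark does.)

    // SerialMergeScheduler runs merges sequentially on the indexing thread,
    // so merge time shows up fully in wall-clock indexing time. The default
    // ConcurrentMergeScheduler runs merges in background threads instead.
    iwc.setMergeScheduler(new SerialMergeScheduler());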

On Wed, Oct 11, 2023, 05:07 Patrick Zhai <zhai7631@gmail.com> wrote:

Re: Weird HNSW merge performance result
Hi Adrien,
I'm using the default CMS (ConcurrentMergeScheduler), but I doubt the merge
is triggered at all in the background. Since no merge policy was changed,
the default TMP (TieredMergePolicy) will, I believe, only merge segments
once there are 10 of them. But the index is about 300MB and the buffer size
is around 50MB, so I don't think we get enough segments to trigger a merge
while I'm building the index.
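
(A quick way to check that arithmetic; the flush sizes are rough
assumptions, and the default comes from TieredMergePolicy itself.)

    import org.apache.lucene.index.TieredMergePolicy;

    public class TmpDefaultCheck {
      public static void main(String[] args) {
        // TieredMergePolicy picks a background merge only once a tier has
        // ~segmentsPerTier similarly sized segments; the default is 10.
        System.out.println(new TieredMergePolicy().getSegmentsPerTier());
        // ~300MB of index flushed in ~50MB chunks is only about 6 segments,
        // which stays below that trigger, so CMS never kicks in here.
      }
    }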

On Wed, Oct 11, 2023, 02:47 Adrien Grand <jpountz@gmail.com> wrote:

Re: Weird HNSW merge performance result
Heya Patrick,

What version of luceneutil are you using? There was a bug where
`forceMerge` was not actually using your configured maxConn & beamWidth.
See: https://github.com/mikemccand/luceneutil/pull/232

Do you have that commit, and have you rebuilt KnnGraphTester?
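
(That would explain the recall drop: if the writer performing the forceMerge
is opened without the custom codec, the merged segment's graph gets rebuilt
with the format's defaults, maxConn=16 and beamWidth=100 in recent 9.x,
instead of 64/250. A sketch of the failure mode relative to the earlier
indexing sketch, not the actual luceneutil code:)

    // Buggy pattern: this config never gets the custom HNSW codec, so the
    // forced merge rebuilds the graph with default parameters.
    IndexWriterConfig mergeConfig = new IndexWriterConfig();
    // missing: mergeConfig.setCodec(...) carrying
    // new Lucene95HnswVectorsFormat(64, 250)
    try (IndexWriter writer =
        new IndexWriter(FSDirectory.open(Paths.get("index")), mergeConfig)) {
      writer.forceMerge(1);
    }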

On Wed, Oct 11, 2023 at 10:10 AM Patrick Zhai <zhai7631@gmail.com> wrote:

Re: Weird HNSW merge performance result
Hi Ben,
Thanks! I think that's the issue! I was using an old local checkout. I will
try again with the latest commit and report back if the results still look
weird.


On Wed, Oct 11, 2023, 12:26 Benjamin Trent <ben.w.trent@gmail.com> wrote:
