Mailing List Archive: Slow HNSW creation times.

Apr 19, 2024, 7:16 AM

Post #1 of 4

Greetings,

We are experiencing slow HNSW creation times during index merge. Specifically, we have noticed that HNSW graph creation becomes progressively slower after the graph reaches a certain size.

Our indexing workflow creates around 60 indices, each containing approximately 500k vectors. Each vector has 768 float dimensions. We then merge all these small indices into a single large index, force-merging down to 1 segment. During the merge step, HNSW graph creation starts off with good performance, taking about 15 seconds to process 10k documents. However, once the graph reaches around 7.5M documents, performance degrades significantly: 10k documents now take about 30 minutes to process, and the processing time continues to increase as the graph grows. We have observed similar performance issues with different settings: M=16 with a beam width of 100, and M=32 with a beam width of 50.
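For scale, the figures above can be turned into a rough estimate. The following is only a back-of-the-envelope sketch: it assumes each dimension is stored as a 4-byte float32, takes the approximate counts (60 indices, 500k vectors each) at face value, and ignores the HNSW graph links and all other index data. The function names are illustrative, not part of Lucene.

```python
# Assumption: one dimension = one 4-byte float32; graph links and other
# index structures are not counted.
BYTES_PER_FLOAT = 4


def raw_vector_bytes(num_vectors: int, dims: int) -> int:
    """Bytes needed to hold the raw vectors alone."""
    return num_vectors * dims * BYTES_PER_FLOAT


def slowdown_factor(fast_seconds: float, slow_seconds: float) -> float:
    """How many times slower the slow case is than the fast case."""
    return slow_seconds / fast_seconds


total_vectors = 60 * 500_000                  # ~60 indices of ~500k vectors
total_bytes = raw_vector_bytes(total_vectors, 768)
factor = slowdown_factor(15, 30 * 60)         # 15 s vs 30 min per 10k docs

print(f"{total_vectors:,} vectors, about {total_bytes / 1e9:.1f} GB of raw floats")
print(f"slowdown per 10k documents: about {factor:.0f}x")
```

Under these assumptions the raw vectors alone come to roughly 92 GB, and the reported timings correspond to a slowdown of about 120x per 10k documents.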

We are using Lucene version 9.8.0 and Java version `openjdk 17.0.3`. Our Java heap is set to 30GB, and we do not use any data compression for the vectors. Additionally, we have not observed any long or continuous garbage collection pauses.

Greatly appreciate any pointers or thoughts on how to further debug this issue or improve the performance.

Thanks
Kannan Krishnamurthy.

Re: Slow HNSW creation times.

jpountz at gmail

Apr 28, 2024, 2:20 PM

Post #2 of 4

Hello Kannan,

The fact that adding 10k docs to an empty HNSW graph is faster than adding
10k docs to a large HNSW graph sounds expected to me, but the 120x factor
that you are reporting sounds high. Maybe your dataset is larger than the
size of your page cache, forcing your OS to read vectors from disk directly?

If this doesn't sound right, running your application with a profiler would
help identify your merging bottleneck.

--
Adrien

Re: Slow HNSW creation times.

uwe at thetaphi

Apr 29, 2024, 5:08 AM

Post #3 of 4

Hi,

How much physical RAM does the machine have? A 30 GiB heap sounds like a
lot to me. If you use that much heap and the physical RAM left over after
the heap allocation cannot fit the rest of the index into the page cache,
then it will start to read from disk. This is a common problem I have seen
with my customers, even without vectors. They have a 32 GiB machine,
allocate 28 GiB of heap, and then wonder why the I/O system goes crazy and
they see 120x slowdowns while indexing and searching (in their case, with
no vectors, switching to a 2 GiB heap also made things about 120 times
faster).

Please run "iotop" alongside the merge to see what I/O the Java process
triggers.
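
The heap versus page cache trade-off in the customer example can be sketched as simple subtraction. This is a rough upper bound only: it assumes the whole configured heap ends up resident, and it ignores memory used by the OS itself and by off-heap JVM structures. The helper name is made up for illustration.

```python
GIB = 1024 ** 3


def page_cache_headroom(physical_ram_bytes: int, heap_bytes: int) -> int:
    """Upper bound on RAM left for the OS page cache.

    Assumes the full heap is resident and ignores OS and off-heap usage.
    """
    return max(physical_ram_bytes - heap_bytes, 0)


# The 32 GiB machine from the example, before and after shrinking the heap.
for heap_gib in (28, 2):
    headroom = page_cache_headroom(32 * GIB, heap_gib * GIB)
    print(f"32 GiB machine, {heap_gib} GiB heap -> "
          f"at most {headroom // GIB} GiB for page cache")
```

With a 28 GiB heap at most 4 GiB is left for caching index files, while a 2 GiB heap leaves up to 30 GiB.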

Uwe

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: [EXTERNAL] Re: Slow HNSW creation times.

Kannan.Krishnamurthy at cengage

May 1, 2024, 9:59 AM

Post #4 of 4

Thank you, Adrien and Uwe. The box has 32GB of physical memory with 10GB of swap space. The index size without vectors is approximately 100GB, and merging the 60 indexes without vectors takes about 2 hours.
The large heap size is due to other non-Lucene operations that occur in the workflow.

When running the merge on a box with 64GB of physical memory and 8GB allocated to the Java heap, the slowness starts around 14M documents (instead of the previous ~7.5M), and inserting 10K vectors takes about 5 minutes, compared with 30 minutes in the previous runs on 32GB of memory. There is a correlation between the total free physical memory and HNSW insertion times.
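
One way to look at that correlation is to compare the raw vector bytes in the graph at the point where the slowdown began with the physical RAM of each box. This is a rough sketch, not a diagnosis: it assumes 768 uncompressed float32 dimensions per vector, treats the quoted GB figures as decimal gigabytes, and ignores graph links and every other consumer of memory.

```python
BYTES_PER_VECTOR = 768 * 4  # assumption: 768 float32 dimensions, no compression


def vectors_gb(num_vectors: int) -> float:
    """Raw vector data in decimal GB for the given number of vectors."""
    return num_vectors * BYTES_PER_VECTOR / 1e9


runs = [
    # (physical RAM in GB, Java heap in GB, vectors in graph when slowdown began)
    (32, 30, 7_500_000),
    (64, 8, 14_000_000),
]

for ram_gb, heap_gb, n in runs:
    size = vectors_gb(n)
    print(f"{ram_gb} GB RAM, {heap_gb} GB heap: slowdown near {size:.1f} GB "
          f"of vectors ({size / ram_gb:.0%} of physical RAM)")
```

In both runs the slowdown starts when the raw vectors reach very roughly two thirds to three quarters of physical RAM (about 23 GB and 43 GB respectively). That fits the observed correlation, but on its own it does not establish the cause.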

The average nfsiostat I/O read stats during the index creation are as follows:

ops/s: 1100
kB/s: 4800

We read about 1500MB per 10k-vector insertion over the 5 minutes, but we are unsure whether this is considered bad.
We will share insights from the profiler soon.
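
As a quick sanity check of the nfsiostat numbers, the arithmetic can be sketched as follows. It assumes the averages hold for the full 5 minutes and uses decimal units (1 MB = 1000 kB); the function names are illustrative only.

```python
def total_read_mb(kb_per_second: float, seconds: float) -> float:
    """Total data read, in decimal MB, at a steady throughput."""
    return kb_per_second * seconds / 1000


def avg_read_kb(kb_per_second: float, ops_per_second: float) -> float:
    """Average size of one read operation in kB."""
    return kb_per_second / ops_per_second


read_mb = total_read_mb(4800, 5 * 60)   # 4800 kB/s over 5 minutes
per_op_kb = avg_read_kb(4800, 1100)     # 4800 kB/s spread over 1100 ops/s

print(f"about {read_mb:.0f} MB read per 10k-vector batch")
print(f"about {per_op_kb:.1f} kB per read operation")
```

This gives about 1440 MB per batch, in line with the roughly 1500MB quoted, and an average of about 4.4 kB per read. That is the same order of magnitude as a single 768-float vector (3072 bytes, assuming float32), although the numbers alone do not show whether the reads are in fact per-vector random reads.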

Kannan
