Hi all,
I've created an HNSW index implementation that allows for concurrent build
and querying. On my i9-12900 (8 performance cores and 8 efficiency) I get
a bit less than 10x speedup of wall clock time for building and querying
the "siftsmall" and "sift" datasets from http://corpus-texmex.irisa.fr/.
The small dataset is 10k vectors while the large is 1M. This speedup feels
pretty good for a data structure that isn't completely parallelizable, and
it's good to see that it's consistent as the dataset gets larger.
The concurrent classes achieve identical recall compared to the
non-concurrent versions within my ability to test it, and are drop-in
replacements for OnHeapHnswGraph and HnswGraphBuilder; I use threadlocals
to work around the places where the existing API assumes no concurrency.
The concurrent classes also pass the existing test suite with the exception
of the ram usage ones; the estimator doesn't know about AtomicReference
etc. (Big thanks to Michael Sokolov for testAknnDiverse which made it much
easier to track down subtle problems!)
My motivation is
1. It is faster to query a single on-heap hnsw index, than to query
multiple such indexes and combine the result.
2. Even with some contention necessarily occurring during building of the
index, we still come out way ahead in terms of total efficiency vs creating
per-thread indexes and combining them, since combining such indexes boils
down to "pick the largest and then add all the other nodes normally," you
don't really benefit from having computed the others previously.
I am currently adding this to Cassandra as code in our repo, but my
preference would be to upstream it. Is Lucene open to a pull request?
--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
I've created an HNSW index implementation that allows for concurrent build
and querying. On my i9-12900 (8 performance cores and 8 efficiency) I get
a bit less than 10x speedup of wall clock time for building and querying
the "siftsmall" and "sift" datasets from http://corpus-texmex.irisa.fr/.
The small dataset is 10k vectors while the large is 1M. This speedup feels
pretty good for a data structure that isn't completely parallelizable, and
it's good to see that it's consistent as the dataset gets larger.
The concurrent classes achieve identical recall compared to the
non-concurrent versions within my ability to test it, and are drop-in
replacements for OnHeapHnswGraph and HnswGraphBuilder; I use threadlocals
to work around the places where the existing API assumes no concurrency.
The concurrent classes also pass the existing test suite with the exception
of the ram usage ones; the estimator doesn't know about AtomicReference
etc. (Big thanks to Michael Sokolov for testAknnDiverse which made it much
easier to track down subtle problems!)
My motivation is
1. It is faster to query a single on-heap hnsw index, than to query
multiple such indexes and combine the result.
2. Even with some contention necessarily occurring during building of the
index, we still come out way ahead in terms of total efficiency vs creating
per-thread indexes and combining them, since combining such indexes boils
down to "pick the largest and then add all the other nodes normally," you
don't really benefit from having computed the others previously.
I am currently adding this to Cassandra as code in our repo, but my
preference would be to upstream it. Is Lucene open to a pull request?
--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced