Mailing List Archive

Quantization for vector search
Hi

If I understand correctly some devs are working on introducing
quantization for vector search or at least considering it

https://github.com/apache/lucene/issues/12497

Just curious: what is the status of this, and is somebody actively working
on it?


It came to mind because Cohere recently made their new embedding
model "Embed v3" available

https://txt.cohere.com/introducing-embed-v3/

and, IIUC, Cohere intends to also provide embeddings optimized for
compression soon.

Nils Reimers recently wrote on LinkedIn:

----
"... what we see on the BioASQ dataset:
4x - 99.99% search quality
16x - 99.9% search quality
32x - 95% search quality
64x - 85% search quality
But it requires that the respective vector DB supports these modes, what
we currently work on with partners."
----
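As a rough illustration of what compression factors like those mean in storage terms, here is my own back-of-the-envelope sketch; the corpus size and embedding dimension below are made-up numbers, not from the post:

```java
// Illustrative storage math only; vector count and dimension are hypothetical.
public class CompressionMath {
    // Bytes needed for `vectors` float32 embeddings of `dims` dimensions,
    // divided by a compression factor such as the 4x/16x/32x quoted above.
    static long compressedBytes(long vectors, int dims, int factor) {
        return vectors * dims * 4L / factor;
    }

    public static void main(String[] args) {
        long vectors = 10_000_000L; // hypothetical corpus size
        int dims = 1024;            // hypothetical embedding dimension
        long full = compressedBytes(vectors, dims, 1);
        long x4 = compressedBytes(vectors, dims, 4); // roughly one byte per dim
        System.out.println(full / (1 << 20) + " MiB -> " + x4 / (1 << 20) + " MiB");
    }
}
```

At 4x (one byte per dimension) the index shrinks by a factor of four while, per the quoted numbers, retaining almost all search quality.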

This might be interesting for Lucene as well, but I am not sure
whether somebody in the Lucene community is already working on something like this.

Thanks

Michael
Re: Quantization for vector search
Hey Michael,

In short, it's being worked on :).

Could you point me to the LinkedIn post? Is Nils talking about the model
emitting quantized output directly, or about their default output being easily
compressible because of how the embeddings are built?

I have done a bad job of linking the work that is being done back to that
original issue:

The initial implementation adding int8 quantization (really, it's int7, because
of signed bytes...): https://github.com/apache/lucene/pull/12582

A significant refactor to make adding new quantized storage easier:
https://github.com/apache/lucene/pull/12729

Lucene already supports folks just giving it signed `byte[]` values, but
that only gets us so far. The additional work should get Lucene further down
the road towards better lossy compression for vectors.
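For intuition, here is a minimal sketch of my own (not Lucene's actual code) of why signed bytes effectively give int7: a linear scalar quantizer that maps floats onto 0..127 so every value fits in a signed `byte[]` without going negative. The method name and scaling scheme are illustrative assumptions:

```java
// Hypothetical sketch, not Lucene's implementation: scalar-quantize a float
// vector into the non-negative half of the signed-byte range (0..127),
// which is why int8 quantization with signed bytes is really int7.
public class ScalarQuantizeSketch {
    // Map each float linearly from [min, max] onto 0..127, clamping outliers.
    static byte[] quantize(float[] vec, float min, float max) {
        byte[] out = new byte[vec.length];
        float scale = 127f / (max - min);
        for (int i = 0; i < vec.length; i++) {
            int q = Math.round((vec[i] - min) * scale);
            out[i] = (byte) Math.max(0, Math.min(127, q));
        }
        return out;
    }

    public static void main(String[] args) {
        // A 3-dim toy vector with values spanning [-1, 1].
        byte[] q = quantize(new float[] {-1f, 0f, 1f}, -1f, 1f);
        System.out.println(q[0] + " " + q[1] + " " + q[2]);
    }
}
```

Staying in 0..127 means dot products of quantized vectors never overflow the sign in surprising ways, at the cost of one bit of precision per dimension.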

Thanks!

Ben

On Sat, Nov 4, 2023 at 4:07 AM Michael Wechner <michael.wechner@wyona.com>
wrote:

> Hi
>
> If I understand correctly some devs are working on introducing
> quantization for vector search or at least considering it
>
> https://github.com/apache/lucene/issues/12497
>
> Just being curious what is the status on this resp. is somebody working on
> this actively?
>
>
> It came to my mind, because Cohere recently made their new embedding model
> "Embed v3" available
>
> https://txt.cohere.com/introducing-embed-v3/
>
> whereas IIUC, Cohere intends to also provide embeddings optimized for
> compression soon.
>
> Nils Reimers recently wrote on LinkedIn:
>
> ----
> "... what we see on the BioASQ dataset:
> 4x - 99.99% search quality
> 16x - 99.9% search quality
> 32x - 95% search quality
> 64x - 85% search quality
> But it requires that the respective vector DB supports these modes, what
> we currently work on with partners."
> ----
>
> This might be interesting for Lucene as well, resp. I am not sure whether
> somebody at Lucene is already working on something like this.
>
> Thanks
>
> Michael
>
Re: Quantization for vector search
Hi Ben

On 04.11.23 at 14:41, Benjamin Trent wrote:
> Hey Michael,
>
> In short, it's being worked on :).

cool, thanks!

>
> Could you point to the LinkedIN post?

https://www.linkedin.com/posts/reimersnils_%3F%3F%3F%3F%3F%3F-%3F%3F%3F%3F%3F-%3F%3F-%3F%3F%3F-%3F%3F%3F-activity-7125863813064581120-bO6N/?utm_source=share&utm_medium=member_desktop


> Is Nils talking about the model output quantized output or that their
> default output is easily compressible because of how the embeddings
> are built?

It is not clear to me from the post; maybe you can tell from the link
above.

>
> I have done a bad job of linking back against that original issue the
> work that is being done:
>
> The initial implementation of adding int8 (really, its int7 because of
> signed bytes...): https://github.com/apache/lucene/pull/12582
>
> A significant refactor to make adding new quantized storage easier:
> https://github.com/apache/lucene/pull/12729
>
> Lucene already supports folks just giving it signed `byte[]` values.
> But this only gets so far. The additional work should get Lucene
> further down the road towards better lossy-compression for vectors.

very cool, thank you!

All the best

Michael



>
> Thanks!
>
> Ben
>
> On Sat, Nov 4, 2023 at 4:07 AM Michael Wechner
> <michael.wechner@wyona.com> wrote:
>
> Hi
>
> If I understand correctly some devs are working on introducing
> quantization for vector search or at least considering it
>
> https://github.com/apache/lucene/issues/12497
>
> Just being curious what is the status on this resp. is somebody
> working on this actively?
>
>
> It came to my mind, because Cohere recently made their new
> embedding model "Embed v3" available
>
> https://txt.cohere.com/introducing-embed-v3/
>
> whereas IIUC, Cohere intends to also provide embeddings optimized
> for compression soon.
>
> Nils Reimers recently wrote on LinkedIn:
>
> ----
> "... what we see on the BioASQ dataset:
> 4x - 99.99% search quality
> 16x - 99.9% search quality
> 32x - 95% search quality
> 64x - 85% search quality
> But it requires that the respective vector DB supports these
> modes, what we currently work on with partners."
> ----
>
> This might be interesting for Lucene as well, resp. I am not sure
> whether somebody at Lucene is already working on something like this.
>
> Thanks
>
> Michael
>