Hi devs,
While working on https://github.com/apache/lucene/issues/12513, I needed to compress positive integers that are used to locate postings, among other things.
To put it concretely, I need to pack a few values per term contiguously, and those values can have different bit-widths. For example, suppose we need to encode docFreq and postingsStartOffset per term, where docFreq takes 4 bits and postingsStartOffset takes 6 bits. We expect to write the following for two terms.
```
                  Term1                   |                   Term2
docFreq(4bit) | postingsStartOffset(6bit) | docFreq(4bit) | postingsStartOffset(6bit)
```
On the read path, I expect to first locate the offset for a term and then read two values that have different bit-widths.
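To make the layout concrete, here is a minimal sketch of the kind of writer/reader I have in mind. `VarBitPacker` and its methods are hypothetical names for illustration, not existing Lucene classes. The sketch also assumes that each field keeps the same bit-width across all terms, so a term's record starts at a bit offset that is a multiple of 10 (4 + 6) bits. It works bit by bit for clarity, not speed.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; this is not an existing Lucene class.
final class VarBitPacker {
  private byte[] buf = new byte[8];
  private long bitPos = 0; // position of the next bit to write

  // Appends the low `bitWidth` bits of `value`, least significant bit first.
  void write(long value, int bitWidth) {
    if (bitWidth < 1 || bitWidth > 64) {
      throw new IllegalArgumentException("bitWidth must be in [1, 64]");
    }
    if (bitWidth < 64 && (value >>> bitWidth) != 0) {
      throw new IllegalArgumentException("value does not fit in " + bitWidth + " bits");
    }
    // Grow the buffer if the new bits would not fit.
    int needed = (int) ((bitPos + bitWidth + 7) >>> 3);
    if (needed > buf.length) {
      buf = Arrays.copyOf(buf, Math.max(needed, buf.length * 2));
    }
    for (int i = 0; i < bitWidth; i++, bitPos++) {
      if (((value >>> i) & 1L) != 0) {
        buf[(int) (bitPos >>> 3)] |= (byte) (1 << (int) (bitPos & 7));
      }
    }
  }

  // Returns the packed bytes, trimmed to the number of bytes actually used.
  byte[] toBytes() {
    return Arrays.copyOf(buf, (int) ((bitPos + 7) >>> 3));
  }

  // Reads `bitWidth` bits starting at absolute bit position `bitOffset`.
  static long read(byte[] data, long bitOffset, int bitWidth) {
    long value = 0;
    for (int i = 0; i < bitWidth; i++) {
      long pos = bitOffset + i;
      long bit = (data[(int) (pos >>> 3)] >>> (int) (pos & 7)) & 1L;
      value |= bit << i;
    }
    return value;
  }

  public static void main(String[] args) {
    final int docFreqBits = 4;
    final int offsetBits = 6;

    VarBitPacker packer = new VarBitPacker();
    packer.write(9, docFreqBits);  // Term1 docFreq
    packer.write(37, offsetBits);  // Term1 postingsStartOffset
    packer.write(3, docFreqBits);  // Term2 docFreq
    packer.write(50, offsetBits);  // Term2 postingsStartOffset
    byte[] data = packer.toBytes();

    // Locate Term2 (ordinal 1), then read its two fields with their own widths.
    long term2Start = 1L * (docFreqBits + offsetBits);
    long docFreq = read(data, term2Start, docFreqBits);
    long postingsStartOffset = read(data, term2Start + docFreqBits, offsetBits);
    System.out.println(docFreq + " " + postingsStartOffset); // prints "3 50"
  }
}
```

The two terms above take 20 bits, so they fit in 3 bytes instead of the 4 bytes needed if every value were padded to 8 bits.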
In the spirit of not reinventing the wheel unnecessarily, I explored the existing PackedInts utility classes, and I believe there is no support for this at the moment. The biggest gap I found is that the existing classes expect all values being written or read to have the same bit-width.
I'm writing to get feedback from y'all and to check whether I missed anything.
Cheers,
Tony X