Mailing List Archive

PayloadDecoder.FLOAT_DECODER
In reviewing SOLR-14787 <https://issues.apache.org/jira/browse/SOLR-14787> ,
with the author verbally, he pointed out there's some duplicated code where
he needed to decode a payload into a float. We
have org.apache.lucene.analysis.payloads.PayloadHelper#decodeFloat(byte[],
int) but lucene/queries doesn't have lucene/analysis-common on the
classpath and he found no better option than to essentially cut and paste
those methods into his code. Neither of us likes that solution very much
because code duplication is not cool, so I went fishing for other options
or examples in the code base (on 8.x) and ran into
PayloadDecoder.FLOAT_DECODER... which looks quite broken, or at least
wildly different:

PayloadDecoder FLOAT_DECODER = bytes -> bytes == null ? 1 :
bytes.bytes[bytes.offset];

this coerces a single byte to a float, but analysis common is encoding via
PayloadHelper like this:

public static byte[] encodeFloat(float payload, byte[] data, int offset){
return encodeInt(Float.floatToIntBits(payload), data, offset);
}

Which can clearly write more than one byte... and decoding via

public static final float decodeFloat(byte [] bytes, int offset){
return Float.intBitsToFloat(decodeInt(bytes, offset));
}

public static final int decodeInt(byte [] bytes, int offset){
return ((bytes[offset] & 0xFF) << 24) | ((bytes[offset + 1] & 0xFF) <<
16)
| ((bytes[offset + 2] & 0xFF) << 8) | (bytes[offset + 3] & 0xFF);
}

Which is clearly expecting 4 bytes... and nothing like FLOAT_DECODER

The scary thing is PayloadDecoder.FLOAT_DECODER seems to be used in
BoostingTermBuilder... and a whole bunch of unit tests, which I presume are
passing on coincidence of handling floats that happen to qualify as small
value integers but I haven't tested that theory yet.

PayloadDecoder.FLOAT_DECODER has been around for a long time, so I thought
I'd solicit opinions before attempting to clean it up. My concept of a
cleanup here would include trying to find a home for the float decoding
logic (from PayloadHelper) that can be seen by both areas and unifying
encode/decode for float payloads to a single location, and also add a unit
test that round trips the encode/decode cycle (which I'm not seeing).

-Gus


http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
Re: PayloadDecoder.FLOAT_DECODER [ In reply to ]
Unifying encode/decode to a single class makes sense to me.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Sep 4, 2020 at 11:12 AM Gus Heck <gus.heck@gmail.com> wrote:

> In reviewing SOLR-14787 <https://issues.apache.org/jira/browse/SOLR-14787> ,
> with the author verbally, he pointed out there's some duplicated code where
> he needed to decode a payload into a float. We
> have org.apache.lucene.analysis.payloads.PayloadHelper#decodeFloat(byte[],
> int) but lucene/queries doesn't have lucene/analysis-common on the
> classpath and he found no better option than to essentially cut and paste
> those methods into his code. Neither of us likes that solution very much
> because code duplication is not cool, so I went fishing for other options
> or examples in the code base (on 8.x) and ran into
> PayloadDecoder.FLOAT_DECODER... which looks quite broken, or at least
> wildly different:
>
> PayloadDecoder FLOAT_DECODER = bytes -> bytes == null ? 1 :
> bytes.bytes[bytes.offset];
>
> this coerces a single byte to a float, but analysis common is encoding via
> PayloadHelper like this:
>
> public static byte[] encodeFloat(float payload, byte[] data, int offset){
> return encodeInt(Float.floatToIntBits(payload), data, offset);
> }
>
> Which can clearly write more than one byte... and decoding via
>
> public static final float decodeFloat(byte [] bytes, int offset){
> return Float.intBitsToFloat(decodeInt(bytes, offset));
> }
>
> public static final int decodeInt(byte [] bytes, int offset){
> return ((bytes[offset] & 0xFF) << 24) | ((bytes[offset + 1] & 0xFF) <<
> 16)
> | ((bytes[offset + 2] & 0xFF) << 8) | (bytes[offset + 3] &
> 0xFF);
> }
>
> Which is clearly expecting 4 bytes... and nothing like FLOAT_DECODER
>
> The scary thing is PayloadDecoder.FLOAT_DECODER seems to be used in
> BoostingTermBuilder... and a whole bunch of unit tests, which I presume are
> passing on coincidence of handling floats that happen to qualify as small
> value integers but I haven't tested that theory yet.
>
> PayloadDecoder.FLOAT_DECODER has been around for a long time, so I thought
> I'd solicit opinions before attempting to clean it up. My concept of a
> cleanup here would include trying to find a home for the float decoding
> logic (from PayloadHelper) that can be seen by both areas and unifying
> encode/decode for float payloads to a single location, and also add a unit
> test that round trips the encode/decode cycle (which I'm not seeing).
>
> -Gus
>
>
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>