Mailing List Archive

[PATCH] cipher: Fix SM3 avx/bmi2 compilation error
* cipher/sm3-avx-bmi2-amd64.S: Fix assembler errors.

--

Compiling with GNU assembler version 2.27-41 produces many errors like
the following:

sm3-avx-bmi2-amd64.S: Assembler messages:
sm3-avx-bmi2-amd64.S:402: Error: 0xf3988a32 out range of signed
32bit displacement

Newer versions of the GNU assembler do not have this issue, so the old
version most likely just mishandles this case. Still, to allow libgcrypt
to be compiled on more systems, I worked around it: the leal instruction
that sums three values (t0, e and the round constant) is replaced by an
addl3 followed by an additional addl. I benchmarked this on an Intel
i5-6200U 2.30GHz CPU and found no significant performance difference.
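To make the range problem concrete, here is a minimal C sketch. It is not part of the patch, and the helper names are hypothetical. It assumes, as the error message suggests, that gas 2.27 evaluates the hex literal 0xf3988a32 as the positive value 4086860338 and checks it against the signed 32-bit range of an x86-64 displacement. The same bit pattern written as the signed value -208106958 does fit. Because leal keeps only the low 32 bits of the sum, both spellings give the same result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not libgcrypt code. */

/* An x86-64 memory operand holds at most a 32-bit displacement that the CPU
 * sign-extends, so a strict range check only accepts [INT32_MIN, INT32_MAX]. */
bool fits_signed_disp32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}

/* Two displacements give the same 32-bit leal result when their low
 * 32 bits agree (i.e. they are equal modulo 2^32). */
bool same_low32(int64_t a, int64_t b)
{
    return (uint32_t)a == (uint32_t)b;
}
```

For example, fits_signed_disp32(0xf3988a32) is false while fits_signed_disp32(-208106958) is true, and same_low32() confirms the two encode the same constant.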

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
---
cipher/sm3-avx-bmi2-amd64.S | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cipher/sm3-avx-bmi2-amd64.S b/cipher/sm3-avx-bmi2-amd64.S
index 93aecacb..4a075d76 100644
--- a/cipher/sm3-avx-bmi2-amd64.S
+++ b/cipher/sm3-avx-bmi2-amd64.S
@@ -206,7 +206,8 @@ ELF(.size _gcry_sm3_avx2_consts,.-_gcry_sm3_avx2_consts)
/* rol(a, 12) => t0 */ \
roll3mov(12, a, t0); /* rorxl here would reduce perf by 6% on zen3 */ \
/* rol (t0 + e + t), 7) => t1 */ \
- leal K##round(t0, e, 1), t1; \
+ addl3(t0, e, t1); \
+ addl $K##round, t1; \
roll2(7, t1); \
/* h + w1 => h */ \
addl wtype##_W1_ADDR(round, widx), h; \
--
2.19.1.3.ge56e4f7
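A rough C model of the changed macro lines follows, for illustration only. It assumes addl3(a, b, c) computes c = a + b and roll2(7, t1) rotates t1 left by 7 bits, as the surrounding comments indicate. rol32() and the other function names are hypothetical. Both forms compute rol(t0 + e + K, 7) modulo 2^32, so the patch changes only the instruction sequence, not the computed value.

```c
#include <stdint.h>

/* Hypothetical model of the macro lines; not libgcrypt code. */

/* Rotate a 32-bit word left by n bits (valid for 0 < n < 32). */
uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Old form: "leal K(t0, e, 1), t1" then rotate by 7.
 * The round constant enters as a signed 32-bit displacement. */
uint32_t t1_via_lea(uint32_t t0, uint32_t e, int32_t k_disp)
{
    return rol32(t0 + e + (uint32_t)k_disp, 7);
}

/* New form from the patch: "addl3(t0, e, t1); addl $K, t1" then rotate
 * by 7. Assumes addl3(a, b, c) computes c = a + b. */
uint32_t t1_via_two_adds(uint32_t t0, uint32_t e, uint32_t k)
{
    uint32_t t1 = t0 + e; /* addl3(t0, e, t1) */
    t1 += k;              /* addl $K, t1 */
    return rol32(t1, 7);
}
```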


_______________________________________________
Gcrypt-devel mailing list
Gcrypt-devel@gnupg.org
http://lists.gnupg.org/mailman/listinfo/gcrypt-devel
Re: [PATCH] cipher: Fix SM3 avx/bmi2 compilation error
Hello,

On 20.12.2021 5.23, Tianjia Zhang via Gcrypt-devel wrote:
> * cipher/sm3-avx-bmi2-amd64.S: Fix assembler errors.
>
> --
>
> Compiling with GNU assembler version 2.27-41 produces many errors like
> the following:
>
> sm3-avx-bmi2-amd64.S: Assembler messages:
> sm3-avx-bmi2-amd64.S:402: Error: 0xf3988a32 out range of signed
> 32bit displacement
>
> Newer versions of the GNU assembler do not have this issue, so the old
> version most likely just mishandles this case. Still, to allow libgcrypt
> to be compiled on more systems, I worked around it: the leal instruction
> that sums three values (t0, e and the round constant) is replaced by an
> addl3 followed by an additional addl. I benchmarked this on an Intel
> i5-6200U 2.30GHz CPU and found no significant performance difference.


Thanks for reporting. However, I think this can be fixed by changing the
K0-K63 macros from hex format to signed decimal values. Patch attached.
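As a sketch of this suggestion, the signed decimal values could be generated with a small C program. This is not the attached patch, whose contents are not shown here. It assumes K0-K63 are the standard SM3 round constants Tj <<< (j mod 32), with Tj = 0x79cc4519 for j < 16 and 0x7a879d8a otherwise. Under that assumption K1 = 0x79cc4519 <<< 1 = 0xf3988a32, which is the value in the error message. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical generator for illustration; not the attached patch. */

/* SM3 round constant T_j rotated left by (j mod 32). */
uint32_t sm3_k(unsigned j)
{
    uint32_t t = (j < 16) ? 0x79cc4519u : 0x7a879d8au;
    unsigned n = j % 32;
    return n ? (t << n) | (t >> (32 - n)) : t;
}

/* Read the same bit pattern as a two's-complement signed value without
 * relying on implementation-defined unsigned-to-signed conversion. */
int32_t to_signed32(uint32_t bits)
{
    return bits <= INT32_MAX ? (int32_t)bits
                             : -(int32_t)(UINT32_MAX - bits) - 1;
}

/* Print "#define K0 ..." through "#define K63 ..." in signed decimal. */
void print_signed_k_defines(FILE *out)
{
    for (unsigned j = 0; j < 64; j++)
        fprintf(out, "#define K%u %ld\n", j, (long)to_signed32(sm3_k(j)));
}
```

Calling print_signed_k_defines(stdout) from main() prints lines such as "#define K1 -208106958". Written this way, every constant stays inside the signed 32-bit displacement range, so the single leal can be kept.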

>
> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
> ---
> cipher/sm3-avx-bmi2-amd64.S | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/cipher/sm3-avx-bmi2-amd64.S b/cipher/sm3-avx-bmi2-amd64.S
> index 93aecacb..4a075d76 100644
> --- a/cipher/sm3-avx-bmi2-amd64.S
> +++ b/cipher/sm3-avx-bmi2-amd64.S
> @@ -206,7 +206,8 @@ ELF(.size _gcry_sm3_avx2_consts,.-_gcry_sm3_avx2_consts)
> /* rol(a, 12) => t0 */ \
> roll3mov(12, a, t0); /* rorxl here would reduce perf by 6% on zen3 */ \
> /* rol (t0 + e + t), 7) => t1 */ \
> - leal K##round(t0, e, 1), t1; \
> + addl3(t0, e, t1); \
> + addl $K##round, t1; \

This is 12% slower on AMD Zen3 (from 7.37 cycles/byte to 8.30 cpb).

-Jussi

> roll2(7, t1); \
> /* h + w1 => h */ \
> addl wtype##_W1_ADDR(round, widx), h; \
>
Re: [PATCH] cipher: Fix SM3 avx/bmi2 compilation error
Hi Jussi,

On 12/21/21 12:49 AM, Jussi Kivilinna wrote:
> Hello,
>
> On 20.12.2021 5.23, Tianjia Zhang via Gcrypt-devel wrote:
>> * cipher/sm3-avx-bmi2-amd64.S: Fix assembler errors.
>>
>> --
>>
>> Compiling with GNU assembler version 2.27-41 produces many errors like
>> the following:
>>
>>    sm3-avx-bmi2-amd64.S: Assembler messages:
>>    sm3-avx-bmi2-amd64.S:402: Error: 0xf3988a32 out range of signed
>>      32bit displacement
>>
>> Newer versions of the GNU assembler do not have this issue, so the old
>> version most likely just mishandles this case. Still, to allow libgcrypt
>> to be compiled on more systems, I worked around it: the leal instruction
>> that sums three values (t0, e and the round constant) is replaced by an
>> addl3 followed by an additional addl. I benchmarked this on an Intel
>> i5-6200U 2.30GHz CPU and found no significant performance difference.
>
>
> Thanks for reporting. However, I think this can be fixed by changing the
> K0-K63 macros from hex format to signed decimal values. Patch attached.
>

Thanks for your suggestion. This approach looks feasible; I will try to
fix the issue that way.

Best regards,
Tianjia

>>
>> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
>> ---
>>   cipher/sm3-avx-bmi2-amd64.S | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/cipher/sm3-avx-bmi2-amd64.S b/cipher/sm3-avx-bmi2-amd64.S
>> index 93aecacb..4a075d76 100644
>> --- a/cipher/sm3-avx-bmi2-amd64.S
>> +++ b/cipher/sm3-avx-bmi2-amd64.S
>> @@ -206,7 +206,8 @@ ELF(.size _gcry_sm3_avx2_consts,.-_gcry_sm3_avx2_consts)
>>           /* rol(a, 12) => t0 */ \
>>             roll3mov(12, a, t0); /* rorxl here would reduce perf by 6% on zen3 */ \
>>           /* rol (t0 + e + t), 7) => t1 */ \
>> -          leal K##round(t0, e, 1), t1; \
>> +          addl3(t0, e, t1); \
>> +          addl $K##round, t1; \
>
> This is 12% slower on AMD Zen3 (from 7.37 cycles/byte to 8.30 cpb).
>
> -Jussi
>
>>             roll2(7, t1); \
>>           /* h + w1 => h */ \
>>             addl wtype##_W1_ADDR(round, widx), h; \
>>

Re: [PATCH] cipher: Fix SM3 avx/bmi2 compilation error
Hi Jussi,

On 12/21/21 12:49 AM, Jussi Kivilinna wrote:
> Hello,
>
> On 20.12.2021 5.23, Tianjia Zhang via Gcrypt-devel wrote:
>> * cipher/sm3-avx-bmi2-amd64.S: Fix assembler errors.
>>
>> --
>>
>> Compiling with GNU assembler version 2.27-41 produces many errors like
>> the following:
>>
>>    sm3-avx-bmi2-amd64.S: Assembler messages:
>>    sm3-avx-bmi2-amd64.S:402: Error: 0xf3988a32 out range of signed
>>      32bit displacement
>>
>> Newer versions of the GNU assembler do not have this issue, so the old
>> version most likely just mishandles this case. Still, to allow libgcrypt
>> to be compiled on more systems, I worked around it: the leal instruction
>> that sums three values (t0, e and the round constant) is replaced by an
>> addl3 followed by an additional addl. I benchmarked this on an Intel
>> i5-6200U 2.30GHz CPU and found no significant performance difference.
>
>
> Thanks for reporting. However, I think this can be fixed by changing the
> K0-K63 macros from hex format to signed decimal values. Patch attached.
>

I'm very sorry, I was careless and didn't notice that you had attached
the patch.

Best regards,
Tianjia
