Mailing List Archive

strcasecmp raises its...
2022 and we discuss strcasecmp() again?

Background: OpenSSL 3.0.3 added OPENSSL_strcasecmp() and friends, and there are several issues around their implementation. Up to this version, OpenSSL relied on the POSIX strcasecmp(). Whatever their reasons for the change...

Checking our sources, we have ap_cstr_casecmp() that does the right thing. But
- we do not use it everywhere
- it is not part of APR, which relies on the POSIX strcasecmp(), especially in apr_table.

I would like to coordinate with you on this:
1. should we scan our sources for strcasecmp and replace it with ap_cstr_casecmp()?
2. should we talk to the APR people about their use of it? I heard that some of them are here as well.

Kind Regards,
Stefan
Re: strcasecmp raises its...
On 5/18/22 12:19 PM, Stefan Eissing wrote:
> 2022 and we discuss strcasecmp() again?
>
> Background: OpenSSL 3.0.3 added OPENSSL_strcasecmp() and friends, and there are several issues around their implementation. Up to this version, OpenSSL relied on the POSIX strcasecmp(). Whatever their reasons for the change...
>
> Checking our sources, we have ap_cstr_casecmp() that does the right thing. But
> - we do not use it everywhere
> - it is not part of APR, which relies on the POSIX strcasecmp(), especially in apr_table.

It is, but it may not be used where it possibly should:

https://apr.apache.org/docs/apr/1.7/group__apr__cstr.html

>
> I want to handshake with you regarding this:
> 1. should we scan our sources for strcasecmp and replace it with ap_cstr_casecmp()?

If I remember correctly ap_cstr_casecmp was only designed to be used for comparisons of HTTP protocol strings as it is locale
agnostic. Hence I am not sure if it is correct to use it everywhere. From the documentation:

/**
* Perform a case-insensitive comparison of two strings @a str1 and @a str2,
* treating upper and lower case values of the 26 standard C/POSIX alphabetic
* characters as equivalent. Extended latin characters outside of this set
* are treated as unique octets, irrespective of the current locale.

Hence it might be wrong to use it in cases where you need to respect the locale.


Regards

Rüdiger
Re: strcasecmp raises its...
On Wed, May 18, 2022 at 12:53:57PM +0200, Ruediger Pluem wrote:
>
>
> On 5/18/22 12:19 PM, Stefan Eissing wrote:
> > 2022 and we discuss strcasecmp() again?
> >
> > Background: OpenSSL 3.0.3 added OPENSSL_strcasecmp() and friends, and there are several issues around their implementation. Up to this version, OpenSSL relied on the POSIX strcasecmp(). Whatever their reasons for the change...
> >
> > Checking our sources, we have ap_cstr_casecmp() that does the right thing. But
> > - we do not use it everywhere
> > - it is not part of APR, which relies on the POSIX strcasecmp(), especially in apr_table.
>
> It is, but it may not be used where it possibly should:
>
> https://apr.apache.org/docs/apr/1.7/group__apr__cstr.html
>
> >
> > I want to handshake with you regarding this:
> > 1. should we scan our sources for strcasecmp and replace it with ap_cstr_casecmp()?
>
> If I remember correctly ap_cstr_casecmp was only designed to be used for comparisons of HTTP protocol strings as it is locale
> agnostic. Hence I am not sure if it is correct to use it everywhere. From the documentation:
>
> /**
> * Perform a case-insensitive comparison of two strings @a str1 and @a str2,
> * treating upper and lower case values of the 26 standard C/POSIX alphabetic
> * characters as equivalent. Extended latin characters outside of this set
> * are treated as unique octets, irrespective of the current locale.
>
> Hence it might be wrong to use it in cases where you need to respect the locale.

Are there really any cases like that in httpd?

I think for httpd it is only safe and sane to run httpd with LANG=C; we
have done this in the default service scripts in Fedora/RHEL for a very long
time. Other than the protocol parsing issues you can get in non-C
locales, you can also get "surprises" when sort order can change with
the system locale, impacting e.g. config file load ordering and more.

So IMHO it is probably sufficient & simpler to adjust apachectl to set
LANG=C rather than trying to eliminate strcasecmp, and add another
strcasecmp() reimplementation in APR, in this case.

Regards, Joe
Re: strcasecmp raises its...
On 5/18/22 4:55 PM, Joe Orton wrote:
> On Wed, May 18, 2022 at 12:53:57PM +0200, Ruediger Pluem wrote:
>>
>>
>> On 5/18/22 12:19 PM, Stefan Eissing wrote:
>>> 2022 and we discuss strcasecmp() again?
>>>
>>> Background: OpenSSL 3.0.3 added OPENSSL_strcasecmp() and friends, and there are several issues around their implementation. Up to this version, OpenSSL relied on the POSIX strcasecmp(). Whatever their reasons for the change...
>>>
>>> Checking our sources, we have ap_cstr_casecmp() that does the right thing. But
>>> - we do not use it everywhere
>>> - it is not part of APR, which relies on the POSIX strcasecmp(), especially in apr_table.
>>
>> It is, but it may not be used where it possibly should:
>>
>> https://apr.apache.org/docs/apr/1.7/group__apr__cstr.html
>>
>>>
>>> I want to handshake with you regarding this:
>>> 1. should we scan our sources for strcasecmp and replace it with ap_cstr_casecmp()?
>>
>> If I remember correctly ap_cstr_casecmp was only designed to be used for comparisons of HTTP protocol strings as it is locale
>> agnostic. Hence I am not sure if it is correct to use it everywhere. From the documentation:
>>
>> /**
>> * Perform a case-insensitive comparison of two strings @a str1 and @a str2,
>> * treating upper and lower case values of the 26 standard C/POSIX alphabetic
>> * characters as equivalent. Extended latin characters outside of this set
>> * are treated as unique octets, irrespective of the current locale.
>>
>> Hence it might be wrong to use it in cases where you need to respect the locale.
>
> Are there really any cases like that in httpd?
>
> I think for httpd it is only safe and sane to run httpd with LANG=C; we
> have done this in the default service scripts in Fedora/RHEL for a very long
> time. Other than the protocol parsing issues you can get in non-C
> locales, you can also get "surprises" when sort order can change with
> the system locale, impacting e.g. config file load ordering and more.

Don't you need a locale-sensitive, case-insensitive string comparison for case-blind file systems that support extended
Latin characters? I know these Germans with their umlauts :-).

>
> So IMHO it is probably sufficient & simpler to adjust apachectl to set
> LANG=C rather than trying to eliminate strcasecmp, and add another
> strcasecmp() reimplementation in APR, in this case.

We already have this implementation in APR, and in several places we use the
httpd one, which is just a forward port from APR to httpd until we require a sufficiently recent APR version.
The question is just whether we should use them everywhere and thus do the correct thing no matter what locale is set.

Regards

Rüdiger
Re: strcasecmp raises its...
> On 18 May 2022, at 16:34, Ruediger Pluem <rpluem@apache.org> wrote:
>
> Rüdiger

What locale are YOU in there? Any attempt at locale is going to have to draw lines:
what are the rules for when Ruediger == Rüdiger?

In a WWW (and hence httpd) context, internationalised domain names raise all kinds
of issues, including for us potentially breaking case-insensitivity rules in matching
hostnames, and perhaps other configuration matters. What happens if we make
locale a configurable parameter for hostnames and use strcasecmp_l?

--
Nick Kew
Re: strcasecmp raises its...
> On 18 May 2022, at 19:17, Nick Kew <niq@apache.org> wrote:
>
>
>> On 18 May 2022, at 16:34, Ruediger Pluem <rpluem@apache.org> wrote:
>>
>> Rüdiger
>
> What locale are YOU in there? Any attempt at locale is going to have to draw lines:
> what are the rules for when Ruediger == Rüdiger?
>
> In a WWW (and hence httpd) context, internationalised domain names raise all kinds
> of issues, including for us potentially breaking case-insensitivity rules in matching
> hostnames, and perhaps other configuration matters. What happens if we make
> locale a configurable parameter for hostnames and use strcasecmp_l?

It is not restricted to that. In the Turkish locale, strcasecmp("file", "FILE") can be != 0 ("can be" because POSIX leaves the result unspecified in locales other than the POSIX locale, but it has been a real-world issue in curl in the past).

If we force the locale to "C" in apachectl, that seems to solve the issue. However, using our own ap_cstr_casecmp in protocol functions seems like a good idea.

Kind Regards,
Stefan

>
> --
> Nick Kew
Re: strcasecmp raises its...
On 5/18/22 7:17 PM, Nick Kew wrote:
>
>> On 18 May 2022, at 16:34, Ruediger Pluem <rpluem@apache.org> wrote:
>>
>> Rüdiger
>
> What locale are YOU in there? Any attempt at locale is going to have to draw lines:

de_DE.UTF-8

> what are the rules for when Ruediger == Rüdiger?

There can be none: I can transcribe ü as ue, but going the other way, not every ue is an ü.

Regards

Rüdiger
Re: strcasecmp raises its...
On Wed, May 18, 2022 at 05:34:22PM +0200, Ruediger Pluem wrote:
> On 5/18/22 4:55 PM, Joe Orton wrote:
> > I think for httpd it is only safe and sane to run httpd with LANG=C; we
> > have done this in the default service scripts in Fedora/RHEL for a very long
> > time. Other than the protocol parsing issues you can get in non-C
> > locales, you can also get "surprises" when sort order can change with
> > the system locale, impacting e.g. config file load ordering and more.
>
> Don't you need a locale-sensitive, case-insensitive string comparison for case-blind file systems that support extended
> Latin characters? I know these Germans with their umlauts :-).

Heh. Well, I got away with it so far :)

> > So IMHO it is probably sufficient & simpler to adjust apachectl to set
> > LANG=C rather than trying to eliminate strcasecmp, and add another
> > strcasecmp() reimplementation in APR, in this case.
>
> We already have this implementation in APR, and in several places we use the
> httpd one, which is just a forward port from APR to httpd until we require a sufficiently recent APR version.
> The question is just whether we should use them everywhere and thus do the correct thing no matter what locale is set.

Ah, I missed that, thanks.

+1 from me on doing replacement of strcasecmp() with the
locale-insensitive versions then. At least with config options, protocol
data, it is definitely right.

Regards, Joe
Re: strcasecmp raises its...
> On 19 May 2022, at 16:44, Joe Orton <jorton@redhat.com> wrote:
>
> On Wed, May 18, 2022 at 05:34:22PM +0200, Ruediger Pluem wrote:
>> On 5/18/22 4:55 PM, Joe Orton wrote:
>>> I think for httpd it is only safe and sane to run httpd with LANG=C; we
>>> have done this in the default service scripts in Fedora/RHEL for a very long
>>> time. Other than the protocol parsing issues you can get in non-C
>>> locales, you can also get "surprises" when sort order can change with
>>> the system locale, impacting e.g. config file load ordering and more.
>>
>> Don't you need a locale-sensitive, case-insensitive string comparison for case-blind file systems that support extended
>> Latin characters? I know these Germans with their umlauts :-).
>
> Heh. Well, I got away with it so far :)
>
>>> So IMHO it is probably sufficient & simpler to adjust apachectl to set
>>> LANG=C rather than trying to eliminate strcasecmp, and add another
>>> strcasecmp() reimplementation in APR, in this case.
>>
>> We already have this implementation in APR, and in several places we use the
>> httpd one, which is just a forward port from APR to httpd until we require a sufficiently recent APR version.
>> The question is just whether we should use them everywhere and thus do the correct thing no matter what locale is set.
>
> Ah, I missed that, thanks.
>
> +1 from me on doing replacement of strcasecmp() with the
> locale-insensitive versions then. At least with config options, protocol
> data, it is definitely right.
>

+1 from me for replacing our protocol+config handling code with the ap_cstr_casecmp().

Cheers,
Stefan
Re: strcasecmp raises its...
On 5/19/22 5:15 PM, Stefan Eissing wrote:
>
>
>> On 19 May 2022, at 16:44, Joe Orton <jorton@redhat.com> wrote:
>>
>> On Wed, May 18, 2022 at 05:34:22PM +0200, Ruediger Pluem wrote:
>>> On 5/18/22 4:55 PM, Joe Orton wrote:
>>>> I think for httpd it is only safe and sane to run httpd with LANG=C; we
>>>> have done this in the default service scripts in Fedora/RHEL for a very long
>>>> time. Other than the protocol parsing issues you can get in non-C
>>>> locales, you can also get "surprises" when sort order can change with
>>>> the system locale, impacting e.g. config file load ordering and more.
>>>
>>> Don't you need a locale-sensitive, case-insensitive string comparison for case-blind file systems that support extended
>>> Latin characters? I know these Germans with their umlauts :-).
>>
>> Heh. Well, I got away with it so far :)
>>
>>>> So IMHO it is probably sufficient & simpler to adjust apachectl to set
>>>> LANG=C rather than trying to eliminate strcasecmp, and add another
>>>> strcasecmp() reimplementation in APR, in this case.
>>>
>>> We already have this implementation in APR, and in several places we use the
>>> httpd one, which is just a forward port from APR to httpd until we require a sufficiently recent APR version.
>>> The question is just whether we should use them everywhere and thus do the correct thing no matter what locale is set.
>>
>> Ah, I missed that, thanks.
>>
>> +1 from me on doing replacement of strcasecmp() with the
>> locale-insensitive versions then. At least with config options, protocol
>> data, it is definitely right.
>>
>
> +1 from me for replacing our protocol+config handling code with the ap_cstr_casecmp().

+1. Just to mention: Christophe already did quite some work in this area.

Regards

Rüdiger
Re: strcasecmp raises its...
> On 19 May 2022, at 17:20, Ruediger Pluem <rpluem@apache.org> wrote:
>
>
>
> On 5/19/22 5:15 PM, Stefan Eissing wrote:
>>
>>
>>> On 19 May 2022, at 16:44, Joe Orton <jorton@redhat.com> wrote:
>>>
>>> On Wed, May 18, 2022 at 05:34:22PM +0200, Ruediger Pluem wrote:
>>>> On 5/18/22 4:55 PM, Joe Orton wrote:
>>>>> I think for httpd it is only safe and sane to run httpd with LANG=C; we
>>>>> have done this in the default service scripts in Fedora/RHEL for a very long
>>>>> time. Other than the protocol parsing issues you can get in non-C
>>>>> locales, you can also get "surprises" when sort order can change with
>>>>> the system locale, impacting e.g. config file load ordering and more.
>>>>
>>>> Don't you need a locale-sensitive, case-insensitive string comparison for case-blind file systems that support extended
>>>> Latin characters? I know these Germans with their umlauts :-).
>>>
>>> Heh. Well, I got away with it so far :)
>>>
>>>>> So IMHO it is probably sufficient & simpler to adjust apachectl to set
>>>>> LANG=C rather than trying to eliminate strcasecmp, and add another
>>>>> strcasecmp() reimplementation in APR, in this case.
>>>>
>>>> We already have this implementation in APR, and in several places we use the
>>>> httpd one, which is just a forward port from APR to httpd until we require a sufficiently recent APR version.
>>>> The question is just whether we should use them everywhere and thus do the correct thing no matter what locale is set.
>>>
>>> Ah, I missed that, thanks.
>>>
>>> +1 from me on doing replacement of strcasecmp() with the
>>> locale-insensitive versions then. At least with config options, protocol
>>> data, it is definitely right.
>>>
>>
>> +1 from me for replacing our protocol+config handling code with the ap_cstr_casecmp().
>
> +1. Just to mention: Christophe already did quite some work in this area.

Thanks, Christophe.

For my understanding: the code in APR for tables uses strcasecmp() and I am probably just too stupid to see where this is redefined?

Kind Regards,
Stefan

>
> Regards
>
> Rüdiger
Re: strcasecmp raises its...
On 5/19/22 7:54 PM, Stefan Eissing wrote:
>
>
>> On 19 May 2022, at 17:20, Ruediger Pluem <rpluem@apache.org> wrote:
>>
>>

>>>
>>> +1 from me for replacing our protocol+config handling code with the ap_cstr_casecmp().
>>
>> +1. Just to mention: Christophe already did quite some work in this area.
>
> Thanks, Christophe.
>
> For my understanding: the code in APR for tables uses strcasecmp() and I am probably just too stupid to see where this is redefined?
>

From my point of view, apr_tables uses strcasecmp and it is not redefined. I guess one reason is that the apr_tables code
has been in APR forever, while the apr_cstr functions only came into APR in 2016, and the apr_tables code was not adjusted after
that. Another reason, at least for 1.x, might be concern that switching to apr_cstr could change the behavior of apr_tables in a
way that is incompatible with the versioning rules of APR.
But this is a discussion for dev@apr.

Regards

Rüdiger