On Fri, Jul 8, 2022 at 5:07 PM <ivan@apache.org> wrote:
>
> Author: ivan
> Date: Fri Jul 8 15:07:00 2022
> New Revision: 1902572
>
> URL: http://svn.apache.org/viewvc?rev=1902572&view=rev
> Log:
> Rewrite ap_regexec() without a thread-local storage context for allocations.
>
> Provide custom malloc() and free() implementations that use a stack buffer
> for first N bytes and then fall back to an ordinary malloc/free().
>
> The key properties of this approach are:
>
> 1) Allocations with PCRE2 happen the same way as they were happening
> with PCRE1 in httpd 2.4.52 and earlier.
>
> 2) There are no malloc()/free() calls for typical cases where the
> match data can be kept on stack.
>
> 3) The patch avoids a malloc() for the match_data structure itself,
> because the match data is allocated with the provided custom malloc()
> function.
>
> 4) Using custom allocation functions should ensure that PCRE is not
> going to use malloc() for any auxiliary allocations, if they are
> necessary.
>
> 5) There is no per-thread state.
>
> References:
> 1) https://lists.apache.org/thread/l6m7dqjkk0yy3tooyd2so0rb20jmtpwd
> 2) https://lists.apache.org/thread/5k9y264whn4f1ll35tvl2164dz0wphvy
The main downside is:
> +#ifndef AP_PCRE_STACKBUF_SIZE
> +#define AP_PCRE_STACKBUF_SIZE (256)
> #endif
[]
> AP_DECLARE(int) ap_regexec(const ap_regex_t *preg, const char *string,
> apr_size_t nmatch, ap_regmatch_t *pmatch,
> int eflags)
> - int small_vector[POSIX_MALLOC_THRESHOLD * 3];
> + /* Use apr_uint64_t to get proper alignment. */
> + apr_uint64_t stack_buf[(AP_PCRE_STACKBUF_SIZE + sizeof(apr_uint64_t) - 1) / sizeof(apr_uint64_t)];
So, if I count correctly, that's >2KB stack usage (regardless of PCRE1
or PCRE2) compared to the 120B we had previously (for PCRE1 only)?
This stack size allows for more than 200 captures for PCR1 (way too
much), so how many captures/complexity/margin do these 2KB represent
for PCRE2 and how much can it be reduced to have PCRE2 not malloc()ate
for the usual (<10 captures) regexes?
(Remember that ap_regexec() can be called anywhere/deep in the call
stack, e.g. mod_rewrite, so 2KB stack is not reasonable IMHO).
Finally, I don't see how it's better than one allocation (or two) for
the whole lifetime of a thread, given that thread_local is available
on all the platforms (where performances matter), neither do I see
what are your grievances against thread_local..
Regards;
Yann.
>
> Author: ivan
> Date: Fri Jul 8 15:07:00 2022
> New Revision: 1902572
>
> URL: http://svn.apache.org/viewvc?rev=1902572&view=rev
> Log:
> Rewrite ap_regexec() without a thread-local storage context for allocations.
>
> Provide custom malloc() and free() implementations that use a stack buffer
> for first N bytes and then fall back to an ordinary malloc/free().
>
> The key properties of this approach are:
>
> 1) Allocations with PCRE2 happen the same way as they were happening
> with PCRE1 in httpd 2.4.52 and earlier.
>
> 2) There are no malloc()/free() calls for typical cases where the
> match data can be kept on stack.
>
> 3) The patch avoids a malloc() for the match_data structure itself,
> because the match data is allocated with the provided custom malloc()
> function.
>
> 4) Using custom allocation functions should ensure that PCRE is not
> going to use malloc() for any auxiliary allocations, if they are
> necessary.
>
> 5) There is no per-thread state.
>
> References:
> 1) https://lists.apache.org/thread/l6m7dqjkk0yy3tooyd2so0rb20jmtpwd
> 2) https://lists.apache.org/thread/5k9y264whn4f1ll35tvl2164dz0wphvy
The main downside is:
> +#ifndef AP_PCRE_STACKBUF_SIZE
> +#define AP_PCRE_STACKBUF_SIZE (256)
> #endif
[]
> AP_DECLARE(int) ap_regexec(const ap_regex_t *preg, const char *string,
> apr_size_t nmatch, ap_regmatch_t *pmatch,
> int eflags)
> - int small_vector[POSIX_MALLOC_THRESHOLD * 3];
> + /* Use apr_uint64_t to get proper alignment. */
> + apr_uint64_t stack_buf[(AP_PCRE_STACKBUF_SIZE + sizeof(apr_uint64_t) - 1) / sizeof(apr_uint64_t)];
So, if I count correctly, that's >2KB stack usage (regardless of PCRE1
or PCRE2) compared to the 120B we had previously (for PCRE1 only)?
This stack size allows for more than 200 captures for PCR1 (way too
much), so how many captures/complexity/margin do these 2KB represent
for PCRE2 and how much can it be reduced to have PCRE2 not malloc()ate
for the usual (<10 captures) regexes?
(Remember that ap_regexec() can be called anywhere/deep in the call
stack, e.g. mod_rewrite, so 2KB stack is not reasonable IMHO).
Finally, I don't see how it's better than one allocation (or two) for
the whole lifetime of a thread, given that thread_local is available
on all the platforms (where performances matter), neither do I see
what are your grievances against thread_local..
Regards;
Yann.