Mailing List Archive

Solving mutex concerns with OCSP stapling
Your thoughts on the following?

Current OCSP behavior that I think needs to be fixed:

mod_ssl holds the single stapling global mutex while looking up a cached
entry, deserializing it, checking its validity, and (when it is missing
or expired) communicating with the OCSP responder to get a new response.

1. mod_ssl shouldn't hold the single stapling global mutex while talking to
the OCSP responder. Doing so stalls ALL initial handshakes in all
stapling-enabled vhosts, regardless of which certificate they use.
2. For the cache itself, mod_ssl shouldn't hold the single stapling global
mutex when looking up a cached entry unless the socache type requires it
for its own purposes. (memcached and distcache do not require it.)
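To make the problem concrete, here is a minimal sketch of the current shape
(Python threading purely for illustration; `fetch_from_responder`, the dict
cache, and the timings are hypothetical stand-ins, not mod_ssl code). The
whole sequence, including the network fetch, runs under one global lock, so a
slow responder for one certificate stalls stapling lookups for every
certificate:

```python
import threading
import time

stapling_mutex = threading.Lock()   # the single global stapling mutex
cache = {}                          # stands in for the socache provider

def fetch_from_responder(cert_id):
    time.sleep(0.2)                 # simulate a slow OCSP responder
    return "response-for-%s" % cert_id

def get_stapled_response_current(cert_id):
    # Current shape: the cache lookup, validity check, AND the network
    # fetch all happen while holding the one global mutex.
    with stapling_mutex:
        resp = cache.get(cert_id)
        if resp is None:            # missing/expired: talk to the responder
            resp = fetch_from_responder(cert_id)
            cache[cert_id] = resp
        return resp
```

With an empty cache, two threads handling handshakes for two different
certificates serialize behind the mutex, so the total time is the sum of the
two responder round trips rather than the maximum.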

Assumption: The cache can be shared among different httpd instances (e.g.,
via memcached), but getting different instances to agree on which instance
refreshes the cache is not worth handling for now. (Let multiple instances
refresh if the timing is unlucky.)

What must be serialized globally within an httpd instance?

1. If the socache provider requires it: Any access to the stapling cache.
2. A thread claiming responsibility for refreshing the cached entry.

Why no global mutex per certificate?

1. There could be a large number of certificates, and lots of global mutexes
could be very surprising or even require OS tuning with some mutex types.
2. A single mutex is required to interact with the cache anyway (when the
cache requires a mutex).
3. That doesn't resolve the decision of which thread fetches a new response
anyway.

Solution A: Prefetching in a daemon process/thread per httpd instance

The request-processing flow would be least likely to block for stapling
if a daemon is responsible for maintaining the cache and the request thread
never has to look anything up. That leaves a race between the first
prefetch and requests hitting the server right after startup.
(Browsers may report an error to the user when tryLater is returned.)

The daemon would try to renew stapling responses ahead of the time that
the existing response could no longer be used. If it can't, the error
path on the request thread would be the same as the current handling of
an inability to fetch a new response.
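The daemon loop in Solution A might look something like this sketch (again
Python threading for illustration only; `fetch`, the dict cache, and the
margin/interval values are hypothetical). The daemon renews any entry whose
expiry is within a refresh margin, so request threads normally find a fresh
response already cached:

```python
import threading
import time

REFRESH_MARGIN = 1.0   # renew this long before an entry becomes unusable

def prefetch_daemon(certs, cache, fetch, stop, interval=0.05):
    """Solution A sketch: a per-instance daemon keeps the stapling cache
    warm so request threads never need to fetch.  `fetch(cert)` returns
    (response, expiry_time); both are hypothetical stand-ins."""
    while not stop.is_set():
        now = time.time()
        for cert in certs:
            entry = cache.get(cert)
            if entry is None or entry[1] - now < REFRESH_MARGIN:
                cache[cert] = fetch(cert)   # renew ahead of expiry
        stop.wait(interval)                 # sleep, but wake promptly on stop
```

The startup race described above shows up here too: until the first pass of
the loop completes, a request thread can still find the cache empty.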

Solution B: Fetch on demand largely like the current code, but use a
separate Fetch mutex

Hold the stapling cache mutex only while reading from or writing to the
cache; grab the Fetch mutex when a responder lookup is needed.
(Once the Fetch mutex is obtained, the cache must be checked again
to see if another request thread did the lookup/store while this one
was waiting for the Fetch mutex.)
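The protocol above is essentially double-checked locking with two locks. A
minimal sketch (Python for illustration; `fetch` is a hypothetical responder
call, and the dict stands in for the socache):

```python
import threading

cache_mutex = threading.Lock()   # held only around cache reads/writes
fetch_mutex = threading.Lock()   # held only around responder fetches
cache = {}

def get_stapled_response(cert_id, fetch):
    # Short hold: just the cache read.
    with cache_mutex:
        resp = cache.get(cert_id)
    if resp is not None:
        return resp
    # Miss: serialize the responder fetch under the separate Fetch mutex.
    with fetch_mutex:
        # Re-check: another thread may have fetched and stored the
        # response while this one was waiting for the Fetch mutex.
        with cache_mutex:
            resp = cache.get(cert_id)
        if resp is not None:
            return resp
        resp = fetch(cert_id)        # network I/O, no cache mutex held
        with cache_mutex:
            cache[cert_id] = resp
    return resp
```

Note the cache mutex is never held while acquiring the Fetch mutex, so
threads with a cached response for some other certificate never wait behind
a responder fetch.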

By itself this doesn't solve potentially blocking a bunch of initial
handshakes when performing a lookup, but at least it solves blocking
requests that already have a cached response (different certificate)
when performing a lookup.

A fairly simple improvement to this would be to have a small number
of Fetch mutexes, where each certificate maps to a specific Fetch
mutex (but not vice versa), so that lookups for multiple certificates
could be done at once. This doesn't solve blocking all initial
handshakes for a certificate that needs a fresh response, or completely
solve blocking those for other certificates that need a fresh response
(since multiple certificates could map to the same Fetch mutex).
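The certificate-to-mutex mapping could be as simple as hashing some stable
certificate identifier into a small fixed pool (sketch; the pool size and
the use of CRC32 over a fingerprint are illustrative assumptions):

```python
import threading
import zlib

NUM_FETCH_MUTEXES = 4
fetch_mutexes = [threading.Lock() for _ in range(NUM_FETCH_MUTEXES)]

def fetch_mutex_for(cert_fingerprint):
    """Map each certificate to one of a small, fixed pool of Fetch
    mutexes.  Distinct certificates can then fetch concurrently, unless
    they happen to hash to the same slot (the collision case noted
    above)."""
    slot = zlib.crc32(cert_fingerprint) % NUM_FETCH_MUTEXES
    return fetch_mutexes[slot]
```

A fixed small pool keeps the mutex count predictable regardless of how many
certificates are configured, which is the point of rejecting a per-certificate
mutex earlier in this note.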

Solution C: Hybrid of A and B

The request thread implements Solution B, but generally a lookup on
the request thread won't be needed since the daemon has already done
the work. At server startup, though, the daemon and the request threads
might fight over the Fetch mutex until responses for commonly-used
certificates have been obtained and cached. This solves the potential lack
of responses at server startup.

Since the request thread is able to do the work in a pinch, this
lends itself to a "SSLStaplingPrefetch On|Off" directive that could
be used to disable the prefetch daemon.

--
Born in Roswell... married an alien...
http://emptyhammock.com/
Re: Solving mutex concerns with OCSP stapling
On 05/03/2015 09:58 PM, Jeff Trawick wrote:
> Your thoughts on the following?
>
> [...]

FWIW I'm just testing solution B for the moment. I think that the
ability to prefetch is needed for the busiest sites to avoid weird
pileups, but B seems necessary anyway.

Re: Solving mutex concerns with OCSP stapling
On 05/06/2015 08:19 PM, Jeff Trawick wrote:
> On 05/03/2015 09:58 PM, Jeff Trawick wrote:
>> Your thoughts on the following?
>>
>> [...]
>
> FWIW I'm just testing solution B for the moment. I think that the
> ability to prefetch is needed for the busiest sites to avoid weird
> pileups, but B seems necessary anyway.

r1679032 implements Plan B, without using multiple Fetch mutexes.

Some further thoughts:

Alternative to Plan A for prefetching: a request thread notices that a
stapling response will expire "soon", claims responsibility for
refreshing it so that other request threads don't, and does the
work itself; this avoids needing a separate execution thread to perform
the prefetch. But claiming responsibility seems likely to add its own
complication (it would need a stapling cache entry type at the front of the
cert-based cache key, with one type for the response and another for refresh
responsibility???). r1679032 would still be used when this isn't done,
such as when there isn't demand in time (e.g., mass vhosting, for some
value of "mass").

Plan A could presumably use some "mod_daemon" or similar that lets
modules off-load non-request-related work to a separate child process
(or thread on Windows). mod_ssl_ct is an existing module with its own
service-work daemon. It doesn't seem so useful to keep adding more and
more ;)
