Hi,
We're currently in the process of upgrading the MediaWiki servers to
Debian Buster and expect a performance regression to come with it.
The cause appears to be better Spectre[1] mitigations in the Buster 4.19
kernel, which we can't disable. Most of the effect is seen in code that
ends up making syscalls, e.g. through PHP functions like filemtime and
file_get_contents.
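If you want a rough feel for per-syscall overhead on a given host, a tiny
micro-benchmark like the one below can help. This is only a sketch: it is
Python rather than PHP, the function name time_stat_calls is made up for
illustration, and it assumes that timing os.stat() is a reasonable stand-in
for what filemtime typically ends up doing. It is not how the numbers on the
ticket were measured.

```python
import os
import time


def time_stat_calls(path: str, iterations: int = 100_000) -> float:
    """Return the mean wall-clock cost of one os.stat() call, in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        os.stat(path)  # one stat-family syscall per iteration
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


if __name__ == "__main__":
    # Hypothetical usage: run the same script on a Stretch host and a
    # Buster host and compare the two figures.
    print(f"{time_stat_calls(__file__):.0f} ns per stat() call")
```

Results from something like this only mean much when the two hosts have
comparable hardware and load.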
I posted some numbers and charts on the Phabricator investigation
ticket[2]. For normal requests it looks like ~5% worse for p50/p75 and
~13% worse for p95/p99. API requests look much worse, at 10% for p50 and
22% for p75.
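For anyone who wants to compute the same kind of figure from their own
latency samples, here is a minimal sketch of the arithmetic. The helper names
percentile and regression_pct are hypothetical, and the nearest-rank
percentile method is an assumption on my part; the charts on the ticket may
have been produced differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty sequence of numbers."""
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples <= it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def regression_pct(before, after):
    """Percentage by which `after` is slower than `before`."""
    return (after - before) / before * 100


def compare(stretch_ms, buster_ms, pcts=(50, 75, 95, 99)):
    """Map each percentile to the Stretch -> Buster slowdown in percent."""
    return {
        p: regression_pct(percentile(stretch_ms, p), percentile(buster_ms, p))
        for p in pcts
    }
```

compare() takes two lists of request latencies (one per OS) and returns the
per-percentile slowdown; a positive value means Buster was slower.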
What now? We're going to continue with the upgrade as planned, but we
also need help to try and make some performance improvements to reduce
the impact of the regression.
The PHP profiling flamegraphs[3] are a great tool to use to identify
potentially slow spots. We now also have flamegraphs that only contain
Buster requests. I created a set of differential flamegraphs[4] that
compare Stretch vs Buster so you can see what specific areas slowed down.
You can also use WikimediaDebug/XHGui[5] to profile a specific request.
mwdebug1001/mwdebug1002 are Stretch and mwdebug1003 is Buster.
If you have questions or suggestions, please let us know. Thanks
to everyone who helped with the investigation and those who've started
working on improvements already.
[1] https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)
[2] https://phabricator.wikimedia.org/T273312#6802330
[3] https://performance.wikimedia.org/php-profiling/
[4] https://people.wikimedia.org/~legoktm/T273312/data/clean/images/flamegraphs/
[5] https://wikitech.wikimedia.org/wiki/WikimediaDebug#Request_profiling
-- Kunal
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l