Production Excellence #32: May 2021
How’d we do in our pursuit of operational excellence last month? Read on to
find out!

Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/236/

Incidents

Zero incidents recorded in the past month. Yay! That's only five months
after November 2020, the last month without documented incidents (Incident
stats <https://codepen.io/Krinkle/full/wbYMZK>).

Remember to review Preventive measures
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator,
which are action items filed after an incident.

-------
Trends

In May, we unfortunately saw a repeat of the worrying pattern we saw in
April
<https://phabricator.wikimedia.org/phame/post/view/235/production_excellence_31_april_2021/#trends>,
but with higher numbers. We found 54 new errors. This is the highest number
of new errors in a single month since the Excellence monthly began three
years ago, in 2018. About half of these (29 of 54) remain unresolved as of
writing, two weeks into the following month.

Figure 1, Figure 2: Unresolved error reports stacked by month.
<https://phabricator.wikimedia.org/phame/post/view/236/production_excellence_32_may_2021/#trends>

Month-over-month plots based on spreadsheet data
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>
.

-------
New errors in May

Below is a snapshot of just the 54 new issues
<https://phabricator.wikimedia.org/maniphest/query/tmkGqt0C93YG/#R> found
last month, listed by their code steward
<https://www.mediawiki.org/wiki/Developers/Maintainers>.

Keep in mind that reporting errors is not, in itself, a negative. I think
it should be celebrated when teams have good telemetry, detect their issues
early, and address them within their development cycle. It is more
worrisome when teams lack the telemetry or the time to find such issues, or
can't keep up with the pace at which issues are found.

Anti Harassment Tools None.
Community Tech None.
Editing Team +2, -1 Cite (T283755
<https://phabricator.wikimedia.org/T283755>); OOUI (T282176
<https://phabricator.wikimedia.org/T282176>).
Growth Team +17, -4 Add-Link (T281960
<https://phabricator.wikimedia.org/T281960>); GrowthExperiments (T281525
<https://phabricator.wikimedia.org/T281525> T281703
<https://phabricator.wikimedia.org/T281703> T283546
<https://phabricator.wikimedia.org/T283546> T283638
<https://phabricator.wikimedia.org/T283638> T283924
<https://phabricator.wikimedia.org/T283924>); Echo (T282446
<https://phabricator.wikimedia.org/T282446>); Recent-changes (T282047
<https://phabricator.wikimedia.org/T282047> T282726
<https://phabricator.wikimedia.org/T282726>); StructuredDiscussions (T281521
<https://phabricator.wikimedia.org/T281521> T281523
<https://phabricator.wikimedia.org/T281523> T281782
<https://phabricator.wikimedia.org/T281782> T281784
<https://phabricator.wikimedia.org/T281784> T282069
<https://phabricator.wikimedia.org/T282069> T282146
<https://phabricator.wikimedia.org/T282146> T282599
<https://phabricator.wikimedia.org/T282599> T282605
<https://phabricator.wikimedia.org/T282605>).
Language Team +1 Translate extension (T283828
<https://phabricator.wikimedia.org/T283828>).
Parsing Team +1 Parsoid (T281932
<https://phabricator.wikimedia.org/T281932>).
Reading Web None.
Structured Data None.
Product Infra Team +1 WikimediaEvents (T282580
<https://phabricator.wikimedia.org/T282580>).
Analytics None.
Performance Team None.
Platform Engineering +16, -11 MediaWiki-API (T282122
<https://phabricator.wikimedia.org/T282122>); MediaWiki-General (T282173
<https://phabricator.wikimedia.org/T282173>); MediaWiki-Page-derived-data (
T281714 <https://phabricator.wikimedia.org/T281714> T281802
<https://phabricator.wikimedia.org/T281802> T282180
<https://phabricator.wikimedia.org/T282180> T283282
<https://phabricator.wikimedia.org/T283282>); MediaWiki-Revision-backend (
T282145 <https://phabricator.wikimedia.org/T282145> T282723
<https://phabricator.wikimedia.org/T282723> T282825
<https://phabricator.wikimedia.org/T282825> T283170
<https://phabricator.wikimedia.org/T283170>); MediaWiki-User-management (
T283167 <https://phabricator.wikimedia.org/T283167>); MW Expedition (T281526
<https://phabricator.wikimedia.org/T281526> T281981
<https://phabricator.wikimedia.org/T281981> T282038
<https://phabricator.wikimedia.org/T282038> T282181
<https://phabricator.wikimedia.org/T282181> T283196
<https://phabricator.wikimedia.org/T283196>).
Search Platform +3, -2 CirrusSearch (T282036
<https://phabricator.wikimedia.org/T282036> T282207
<https://phabricator.wikimedia.org/T282207>); GeoData (T282735
<https://phabricator.wikimedia.org/T282735>).
WMDE TechWish +2, -1 Revision-Slider (T282067
<https://phabricator.wikimedia.org/T282067>); VisualEditor Template dialog (
T283511 <https://phabricator.wikimedia.org/T283511>).
WMDE Wikidata +3, -1 Wikibase (T282534
<https://phabricator.wikimedia.org/T282534> T283198
<https://phabricator.wikimedia.org/T283198> T283862
<https://phabricator.wikimedia.org/T283862>).
No owner +7, -6 CentralAuth (T282834
<https://phabricator.wikimedia.org/T282834> T283635
<https://phabricator.wikimedia.org/T283635>); Change-tagging (T283098
<https://phabricator.wikimedia.org/T283098> T283099
<https://phabricator.wikimedia.org/T283099>); MapSources (T282833
<https://phabricator.wikimedia.org/T282833>); MediaWiki-Page-information (
T283751 <https://phabricator.wikimedia.org/T283751>); Other (T283252
<https://phabricator.wikimedia.org/T283252>).

-------
Outstanding errors

Take a look at the workboard and look for tasks that could use your help.
https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Summary over recent months:
Aug 2019 (0 of 14 left) Last task resolved! -1
Jan 2020 (1 of 7 left) Unchanged (over one year old).
Mar 2020 (2 of 2 left) Unchanged (over one year old).
Apr 2020 (4 of 14 left) One task resolved. -1
May 2020 (5 of 14 left) Unchanged (over one year old).
Jun 2020 (5 of 14 left) Unchanged (over one year old).
Jul 2020 (4 of 24 issues) No change.
Aug 2020 (12 of 53 issues) One task resolved. -1
Sep 2020 (7 of 33 issues) No change.
Oct 2020 (19 of 69 issues) One task resolved. -1
Nov 2020 (8 of 38 issues) One task resolved. -1
Dec 2020 (7 of 33 issues) No change.
Jan 2021 (3 of 50 issues
<https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R>) No
change.
Feb 2021 (7 of 20 issues
<https://phabricator.wikimedia.org/maniphest/query/5MzPJAb5oJgv/#R>) One
task resolved. -1
Mar 2021 (14 of 48 issues
<https://phabricator.wikimedia.org/maniphest/query/RsVPep46KRY4/#R>) Four
tasks resolved. -4
Apr 2021 (23 of 42 issues
<https://phabricator.wikimedia.org/maniphest/query/rYyMt_gYYymb/#R>) Two
tasks resolved. -2
*May 2021* (29 of 54 issues
<https://phabricator.wikimedia.org/maniphest/query/tmkGqt0C93YG/#R>) 54 new
issues found, of which 29 remain open. +54; -25

-------
Tally
133 issues open, as of Excellence #31
<https://phabricator.wikimedia.org/phame/post/view/235/production_excellence_31_april_2021/>
(12 May 2021).
-12 issues closed, of the previous 133 open issues.
+29 new issues that survived May 2021.
150 issues open, as of today (12 June 2021).
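
The tally above can be cross-checked against the per-month summary with a
few lines of Python. This is only a sketch for readers who want to verify
the arithmetic; the figures are copied from this post, and the names
(open_by_month, closed_older, tally) are made up for illustration and are
not taken from the spreadsheet or any Phabricator tooling.

```python
# Open issues per month, copied from "Summary over recent months" above.
open_by_month = {
    "Aug 2019": 0, "Jan 2020": 1, "Mar 2020": 2, "Apr 2020": 4,
    "May 2020": 5, "Jun 2020": 5, "Jul 2020": 4, "Aug 2020": 12,
    "Sep 2020": 7, "Oct 2020": 19, "Nov 2020": 8, "Dec 2020": 7,
    "Jan 2021": 3, "Feb 2021": 7, "Mar 2021": 14, "Apr 2021": 23,
    "May 2021": 29,
}

# Tasks from earlier months that were resolved since the previous edition
# (the "-1", "-4", "-2" annotations in the summary).
closed_older = {
    "Aug 2019": 1, "Apr 2020": 1, "Aug 2020": 1, "Oct 2020": 1,
    "Nov 2020": 1, "Feb 2021": 1, "Mar 2021": 4, "Apr 2021": 2,
}


def tally(previous_open, closed, survived):
    """Return the open count after closing old issues and adding new ones."""
    return previous_open - closed + survived


previous_open = 133                   # as of Excellence #31
closed = sum(closed_older.values())   # closures among older issues
survived = open_by_month["May 2021"]  # new May issues that are still open
current_open = tally(previous_open, closed, survived)

print(f"closed={closed} survived={survived} open={current_open}")
print(f"sum of per-month open counts: {sum(open_by_month.values())}")
```

Both routes, the running tally and the sum of the per-month open counts,
arrive at the same total.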

-------
Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving
problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

Re: Production Excellence #32: May 2021
Thanks as always for this report, Timo.

One reason the count is higher in May is that this is when the Growth
team began implementing a chores process
<https://www.mediawiki.org/wiki/Growth/Team/Chores> (credit to Readers Web
for the inspiration <https://www.mediawiki.org/wiki/Readers/Web/Chores>) to
systematically review and log production errors that appear on our team
dashboard
<https://logstash.wikimedia.org/app/kibana#/dashboard/AWl4jlZ78aQffZ3Ho7BV>
in Logstash. (We've also implemented a triage process for our inbox
<https://phabricator.wikimedia.org/project/board/1114/>, which used to have
~2000 tasks and is now at 10.) Some of the tasks we've filed from Logstash
are probably duplicates or close relatives of existing production error
tasks, but because we are trying to timebox our triage process, we don't
always succeed in ensuring that we identify existing tasks before filing
new ones.

A bigger problem is how to handle our growing pile of tasks that need some
attention; as a team that's tasked with feature development, making time to
work on maintenance tasks unrelated to the code we touch day-to-day is a
challenge. So, while we are going to be more diligent about filing tasks
when we see issues in Logstash, unless something appears to be badly
broken, it is probably going to stay as an open task.

Kosta

On Mon, Jun 21, 2021 at 4:55 AM Krinkle <krinklemail@gmail.com> wrote:

> How’d we do in our strive for operational excellence last month? Read on
> to find out!
> _______________________________________________
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/