Mailing List Archive

Abstract Schema: An Update
Hello,
If you missed the previous updates, there's the first
<https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/message/HSVI63YWZP7OOXB46WP2NRH2RWNPYA2I/>and
the second
<https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/message/TMFVXO46J57BWQHQGZXXICXNY35EVOMI/>one.
This is the last one I'm sending but in a good way.

We now finished migrating all 57 core tables to abstract schema. Now tables.sql
is empty for MySQL
<https://gerrit.wikimedia.org/g/mediawiki/core/+/c60ccf4e6d4932dddc2efd72a8abf6e56243f086/maintenance/tables.sql>.
We will soon remove the tables.sql files and links to them.

We have also cleaned more than hundreds of old schema change files.
<https://phabricator.wikimedia.org/T272199> That enabled us to actually
look for unused sql files and drop tens of unused ones that have not been
used since 2002 <https://gerrit.wikimedia.org/r/c/mediawiki/core/+/668552>,
2004 <https://gerrit.wikimedia.org/r/c/mediawiki/core/+/668761>, or 2005
<https://gerrit.wikimedia.org/r/c/mediawiki/core/+/670176> (and much more)
and they got lost in the sheer number of our sql patch files.

The next update is exciting for me. With abstraction in place, we can now
have a proper tracking of drifts between schema in paper and our production
(This is a follow up from a major incident in 2018). Now we have
https://drift-tracker.toolforge.org/ that keeps track of these drifts. Our
schema has been around for more than twenty years and we have hundreds of
database hosts, making sure everything is using the right database schema
(and stays correct) is impossible manually and we have been finding and
fixing these drifts since 2018, see this comment onwards
<https://phabricator.wikimedia.org/T104459#4314828>.). Of course more work
in improving the tracker is welcome (here's the list
<https://phabricator.wikimedia.org/project/board/5350/>)

If you want to enjoy the benefits of abstract schema [1] in extension(s)
you or your team maintains, Please abstract the schema of your extension.
There's a long list of WMF deployed extensions that are not using abstract
schema <https://phabricator.wikimedia.org/T261912> and some even already
have a patch that only needs reviewing. Once that's done, we can add that
to drift tracking and have a more comprehensive list of potential issues.
If you need help with the abstraction work, just ping me.

This also helped us resolve several long-standing tickets like T104459
<https://phabricator.wikimedia.org/T104459> (5.5 years old), T62962
<https://phabricator.wikimedia.org/T62962> (7 years old), and T42626
<https://phabricator.wikimedia.org/T42626> (soon reaching its ninth
birthday) and will help us to address even more tech debt in future.

There is more to be done, improving the abstract schema *change* system,
finding a home of schema documentation, improving the drift tracker and
making it more automated, so much more. But the biggest chunk of work is
now finally done.

I really would like to thank Ammarpad for great work on abstracting the
tables and handling all sorts of edge cases, James Forrester and Tgr for
their reviews which without them this wouldn't be possible and Sam Reed who
wrote a script to speed up migration
<https://github.com/Ladsgroup/db-analyzor-tools/blob/master/db_abstractor.py>.
This was a team work to its core.

[1] In more details, by abstracting you will have automated checks for
dirfits of the schema of extension(s) you maintain and production. You will
have Postgres support for free. Also, you can have automated documentation
generation, ability to test the schema itself, and have better consistency
of your data types (like one datatype for timestamps).

Until the next adventure.
--
Amir (he/him)