Mailing List Archive

Update on abstract schema and schema changes
Hello,
There has been a lot of progress in abstract schema and abstract schema
changes initiative since last time
<https://lists.wikimedia.org/pipermail/wikitech-l/2020-October/093954.html>
I gave an update on it. So here's another one.

*Abstract Schema*
So far, more than 90% (51 out 56) of tables of mediawiki core are now
migrated to abstract schema.

This means much smaller schema drifts between MySQL and Postgres. We have
done more than 250 schema changes in Postgres to fix these drifts.
Including 56 index rename, 66 data type change, setting default to 43
fields and changing nullability of 29 fields. To compare, that's more
schema changes done on Postgres from 2014 until 2020. Once we have migrated
all tables, we can close this four-year old ticket
<https://phabricator.wikimedia.org/T164898>.

Similar improvement has happened on standardizing timestamp fields in MySQL
<https://phabricator.wikimedia.org/T42626>, once all tables are migrated,
we can call this eight-year old ticket done too.

One nice thing about having an abstract schema is that you can generate
documentation automatically, This page is completely made
<https://www.mediawiki.org/w/index.php?title=User:Ladsgroup/Test&oldid=4379555>
automatically from tables.json. We can make it generated in
doc.wikimedia.org on every merge. And also we can make the database layout
diagram
<https://www.mediawiki.org/w/index.php?title=Manual:Database_layout/diagram&action=render>
created automatically.

Another nice thing. When you have an abstract schema, you can easily write
tests and enforce database conventions. For example, you can write a test
to make sure all tables have exactly five columns (because five is your
lucky number). We haven't written such a test but now there's a test that
enforces a uniform prefix for columns and indexes of tables in core
<https://phabricator.wikimedia.org/T270033>. We are currently fixing its
violations to standardize our schema even more.

I'm planning to make reporting on drifts between the abstract schema and
our production completely automated and make it accessible to DBAs for
further investigations which is now much easier thanks to abstract schema.
You can follow the progress of that work in this ticket.
<https://phabricator.wikimedia.org/T104459>

*Abstract Schema Changes*
Now we have a new maintenance script, it produces schema change sql files
(aka ALTER TABLE files) based on snapshot of abstract schema of before and
after of a table. Here's an example of an index rename.
<https://gerrit.wikimedia.org/r/c/mediawiki/core/+/651176> It would make
creating schema change patches much easier (a little bit of work but you
don't need to know internals of Postgres anymore, it's also less prone to
mistakes)

With approval of RFC to drop support of upgrading from versions older than
two LTS releases, we can now drop hundreds and hundreds of sql files. It
would give us room to breath and audit our sql files to find orphan ones
and improve abstract schema change work. That is currently blocked on this
patch landing. <https://gerrit.wikimedia.org/r/c/mediawiki/core/+/648576>

We will work on reshaping the schema changes in general since its current
checks system is less than optimal, its tests are not very updated and so
much more to do.

*What can we do?*
Glad you asked :D The biggest request I have from people is to migrate
their extensions to abstract schema. There's a list of WMF-deployed
extensions that their schema has not migrated yet
<https://phabricator.wikimedia.org/T261912>. This is doubly important as we
want to build a reporting system for drifts in production and it's not
possible to report these drifts for extensions that their schema has not
migrated yet. So if you or your team maintain an extension from that list,
prioritize migrating that please. Reedy wrote a great script
<https://github.com/Ladsgroup/db-analyzor-tools/blob/master/db_abstractor.py>
that takes a sql file and produces its equivalent abstract schema and it
gives you a good starting point (PR is welcome!). Feel free to add me as a
reviewer to patches of migrating extensions to abstract schema.

Another thing is that if you use postgres for mediawiki, you help testing
our postgres schema by trying master (make sure to take a backup first) and
see if everything is alright.

*Thank you!*
I would really like to thank Ammarpad for migrating lots of tables of core
to abstract schema and handling all sorts of edge cases and doing most of
the work of using uniform prefix tests and fixes. Thanks to James Forrester
for reviewing lots of patches. Thanks Reedy for the script and also
abstracting lots of tables in extensions and also Tgr for helping in
reviews and getting the project going. Also a big thank you to DBAs for doing
a lot more schema changes in production
<https://phabricator.wikimedia.org/tag/blocked-on-schema-change/>. You rock!
An apology is also warranted for breaking update.php on master twice
(caused by yours truly).

Until next update!
--
Amir (he/him)