Mailing List Archive

New DB connect/query methods for MediaWiki
If your code doesn't read or write to databases in MediaWiki, feel free to
ignore this email.

Hello,

We have made changes to MediaWiki's Rdbms library that simplify and improve
how to get database connections and how to perform UPDATE queries. We
encourage you to adopt these and provide feedback here or via Phabricator.

== 1. Get a database connection ==

What changed:

To get a connection to the primary database, you previously needed the
following:

$lbFactory = MediaWikiServices::getInstance()->getDBLoadBalancerFactory();

$lbFactory->getMainLB()->getConnection( DB_PRIMARY );

And for a replica connection:

$lbFactory->getMainLB()->getConnection( DB_REPLICA );

You may now adopt the simpler $lbFactory->getPrimaryDatabase() and
$lbFactory->getReplicaDatabase().

In other words, instead of needing knowledge of the internal LoadBalancer,
its getConnection methods, and the DB_ constants, you only call
getPrimaryDatabase() or getReplicaDatabase().

We won’t deprecate LoadBalancer::getConnection() any time soon. If your
code depends on other LoadBalancer methods, feel free to keep your code
as-is. This change is aimed at the most common use case where you only need
a database connection.

The optional parameters have gotten more usable as well.
getPrimaryDatabase() only takes $domain, which means you no longer need to
skip over the non-applicable $groups parameter. getReplicaDatabase() takes
a string for $group (instead of an array in LB::getConnection).

If you maintain service classes that inject a LoadBalancer object, we
recommend transitioning to inject and type against LBFactory instead. This
removes exposure to LB altogether in most cases.

Currently, getPrimaryDatabase/getReplicaDatabase do not yet support
connection to external clusters (LB::getExternalLB). This is coming soon
(see below).

Reasoning:

We have been reducing complexity around LB/LBF for a while (T326274
<https://phabricator.wikimedia.org/T326274>). The distinction between these
is arbitrary in practice, each wears too many hats, and are not well
encapsulated internally.

Most importantly, LB is mostly internal to the Rdbms library. Most
developers only need a connection to perform read or write queries. The
added coupling can lead to mistakes, misunderstandings, and can feel
overwhelming. The LB documentation includes jargon aimed at Rdbms
maintainer, such as references to internal methods like
isReadyForRoundOperations, setTempTablesOnlyMode, getAnyOpenConnection,
appendShutdownCPIndexAsQuery, etc.

The getConnection() signature is fragile and has led to major outages due
to $domain being set incorrectly. E.g. if you need to connect to another
wiki, you have to set $domain in *both* getMainLB() and getConnection(),
otherwise *fireworks in production*. We keep this internally now to prevent
future outages.

Lastly, this change makes it possible to add a narrow typehint dedicated to
read-only connections on replica databases, separate from primary database
connections (more below).

Example adoption:


-

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/891357
-


https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/890469

Adaption is being tracked in T330641
<https://phabricator.wikimedia.org/T330641>

What is next:

getPrimaryDatabase() and getReplicaDatabase() don't support "external"
databases yet (e.g. x1 at WMF
<https://wikitech.wikimedia.org/wiki/MariaDB#Extension_storage>). This is
coming soon. Details and discussion at T330590
<https://phabricator.wikimedia.org/T330590>.

In addition to reducing service dependencies to the ILBFactory typehint, we
are working on an even narrower interface (that ILBFactory extends) called
IConnectionProvider
<https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/refs/heads/master/includes/libs/rdbms/lbfactory/IConnectionProvider.php>
which only has four methods. It’s marked @internal for now, but that will
change after we hammer out some details.

== 2. IReadableDatabase typehint for replica connections ==

What changed:

When you typehint a parameter or class member to IDatabase where a replica
database is expected (DB_REPLICA), you may change it to the narrower
IReadableDatabas interface.

LBFactory::getReplicaDatabase returns IReadableDatabase.

Reasoning:

MediaWiki developers have at times performed write queries on a replica
connection by mistake, which reached production. While we have effective
protections against data corruption, our only option in production is to
throw a fatal error, which leads to an outage. With this change we move
detection to the very start of the development cycle, verified fully by
static analysis in CI, regardless of test coverage!

The new interface also helps you to self-document in a standard way whether
your service class only reads or also writes to the database. And in IDEs
we will no longer autocomplete write methods on what are known to be
replica connection objects. It’ll also make the coupling between the rest
of MediaWiki and extension with the rdbms library looser
<https://en.wikipedia.org/wiki/Loose_coupling#In_programming>.

Note that the IDatabase::query() is not part of IReadableDatabase. This
method is discouraged as it executes raw SQL that may include write
queries. If you call this method, continue to typehint IDatabase until you
transition to safe methods like select() or SelectQueryBuilder.



Example adoption:


-


https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/892031


== 3. Introducing UpdateQueryBuilder ==

What changed:

In 2020, we introduced a chainable SelectQueryBuilder
<https://www.mediawiki.org/wiki/Manual:Database_access#SelectQueryBuilder>
for SELECT queries. To complement this, we now added a class called
UpdateQueryBuilder, for UPDATE queries. The following existing code uses
update() with unnamed positional parameters:

$db->update(
'image',
[ 'img_name' => 'foo' ],
[ 'img_name' => 'bar' ],
__METHOD__
);


You may now use the fluent style like so:

$db->newUpdateQueryBuilder()

->update( 'image' )

->set( [ 'img_name' => 'foo' ] )

->where( [ 'img_name' => 'bar' ])

->caller( __METHOD__ )
->execute();

Reasoning:

Similar to the SelectQueryBuilder, the fluent style is more readable. It
simplifies building of conditions, is less prone to mistakes, and is more
aligned with widely-used frameworks such as Doctrine (PHP) and Jooq (Java).

We have had at least two data corruption incidents in production due to
passing the wrong order of $set and $where condition (swapping them by
mistake). This reduces the chance of such data corruptions happening, which
alone was worth developing the builder for, before we get to the improved
ergonomics.

Example adoption:


-


https://gerrit.wikimedia.org/r/c/mediawiki/core/+/889627/5/includes/Category/Category.php

Adaption is being tracked in T330640
<https://phabricator.wikimedia.org/T330640>.

What is next:

We will continue to develop query builders for DELETE and INSERT queries.

The work on migrating calls from ::select() to SelectQueryBuilder is
on-going (T311866 <https://phabricator.wikimedia.org/T311866>).

Thank you
--
*Amir Sarabadani (he/him)*
Staff Database Architect
Wikimedia Foundation <https://wikimediafoundation.org/>