
March 2024 Switchover completed successfully
Hello everyone,

Please join us in celebrating a very successful Datacenter Switchover. This
switch to our data center in Virginia was run by Effie Mouzeli. Despite
a minor hiccup on Effie's network connection (a similar thing happened
to Clément a year ago; this is starting to become a pattern), it was
completed without a hitch.

For context, the Site Reliability Engineering (SRE) team runs a planned data
center switchover periodically, moving all wikis from our primary data center
(for this instance, Texas) to the secondary data center (for this instance,
Virginia). This is an important periodic test of our tools and procedures,
to ensure the wikis will continue to be available even in the event of
major technical issues. It also gives all our SRE and ops teams a chance to
do maintenance and upgrades on systems that normally run 24 hours a day.

The switchover process requires a brief read-only period for all
Foundation-hosted wikis, which started at 14:00 UTC on Wednesday, March
20th, and lasted 3 minutes and 8 seconds. All our public and private wikis
continued to be available for reading as usual. Users saw a notification of
the upcoming maintenance, and anyone still editing was asked to try again
in a few minutes.

As with the previous Switchover, I've been trying to discern the effect of
the Switchover in the many graphs we use to monitor the infrastructure
at https://grafana.wikimedia.org. In many of them, it's impossible to spot
the event. We consider this very nice and attribute it to various
improvements made over the years by many teams, inside and outside SRE. The
most discernible graph we have is of the edit rate.

This switchover is our first where we are predominantly on MediaWiki on
Kubernetes, setting a very nice milestone for the project.

As per our newer process, we no longer have a Switchback. We will be
staying in Virginia as our primary data center for the next 6 months,
switching back to Virginia on Wednesday, September 25th.

As always, my deepest thanks to everyone who has helped with this, in
one way or another, ranging from the person running point, to all SREs and
developers/deployers participating or having contributed, to people in
Movement Communications for helping with the messaging.

To report any issues, you can reach us in #wikimedia-sre on IRC, or file a
Phabricator ticket with the datacenter-switchover tag (pre-filled form
here); we'll be monitoring closely for reports of trouble during and after
the switchover. (If you're new to Phab, there's more information at
Phabricator/Help.) The switchover, its preparation, and follow-up actions
are tracked in Phabricator task T357547.

--
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation
Re: March 2024 Switchover completed successfully
A minor correction, we will be switching to Dallas on Wednesday, September
25th.

On Wed, Mar 20, 2024 at 5:23 PM Alexandros Kosiaris <akosiaris@wikimedia.org>
wrote:

> As per our newer process, we no longer have a Switchback. We will be
> staying in Virginia as our primary data center for the next 6 months,
> switching back to Virginia on Wednesday, September 25th.


--
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation