Mailing List Archive

Gerrit replica downtime (30 minutes) tomorrow Tue, 16 May 13:00-15:00 UTC
Hello

The read-only Gerrit replica[0] will be down for 30 minutes tomorrow (Tue,
16 May 2023) between 13:00–15:00 UTC[1] due to network switch upgrades in
codfw row D[2].

During this window, git reads from the replica will not work.

To my knowledge, this affects bots which rely on the replica for git read
operations.

Apologies for any inconvenience.

Tyler Cipriani (he/him)
Engineering Manager, Release Engineering
Wikimedia Foundation

[0]: <https://wikitech.wikimedia.org/wiki/Gerrit/Replica>
[1]: <https://zonestamp.toolforge.org/1684242019>
[2]: <https://phabricator.wikimedia.org/T335042>
Re: Gerrit replica downtime (30 minutes) tomorrow Tue, 16 May 13:00-15:00 UTC
This means codesearch will be affected (and won't get updated) and possibly
even will be down during that time.

Best

On Mon, 15 May 2023 at 22:03, Tyler Cipriani <tcipriani@wikimedia.org> wrote:

> The read-only Gerrit replica[0] will be down for 30 minutes tomorrow (Tue,
> 16 May 2023) between 13:00–15:00 UTC[1] due to network switch upgrades in
> codfw row D[2].
>
> During this window, git reads from the replica will not work.



--
Amir (he/him)
Re: Gerrit replica downtime (30 minutes) tomorrow Tue, 16 May 13:00-15:00 UTC
> This means codesearch will be affected (and won't get updated) and possibly even will be down during that time.

We, at least in my team, would like to switch codesearch (and other
clients) back to just use gerrit.wikimedia.org and not the replica
directly.

Just today we agreed to make a new ticket for specifically this,
because soon we have to reimage the replica to bullseye and add more
downtime.

The reason we did the split in the past was to reduce load on the main
gerrit server, but meanwhile the issue has been fixed in newer Gerrit
versions, and just a few days ago we also switched to brand new hardware.

So now, if anything, it should be beefier than before, and even without
that the load issue already seemed a thing of the past.

And we pay for this with the issue that the replica becomes a second
production system, with the need for downtimes. It also complicates
fail-over scenarios and, in a way, means there is never a passive host
when we do a DC switch-over.

So yea, I suggest we change the config of codesearch now to use the
main gerrit unless you have concerns about that.
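
As a rough illustration of the config change suggested above, here is a
minimal Python sketch that rewrites a repository URL pointing at the replica
so that it points at the main Gerrit host instead. The replica hostname, the
example repository path, and the helper name use_main_gerrit are assumptions
made for illustration; they are not taken from the actual codesearch config.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only gerrit.wikimedia.org is named in this thread; the replica
# hostname below is a guess used for illustration.
REPLICA_HOST = "gerrit-replica.wikimedia.org"
MAIN_HOST = "gerrit.wikimedia.org"


def use_main_gerrit(url: str) -> str:
    """Return url with the replica host swapped for the main Gerrit host.

    URLs that do not point at the replica are returned unchanged.
    Any user:password part of a replica URL is dropped in this sketch.
    """
    parts = urlsplit(url)
    if parts.hostname != REPLICA_HOST:
        return url
    # Keep an explicit port if the original URL had one.
    netloc = MAIN_HOST if parts.port is None else f"{MAIN_HOST}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    # Hypothetical repository path, for demonstration only.
    print(use_main_gerrit("https://gerrit-replica.wikimedia.org/r/mediawiki/core"))
```

Keeping the hostnames in two constants means a later switch back to the
replica, if it were ever wanted, is a one-line change.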

On Mon, May 15, 2023 at 1:18 PM Amir Sarabadani <ladsgroup@gmail.com> wrote:
>
> This means codesearch will be affected (and won't get updated) and possibly even will be down during that time.



--
Daniel Zahn <dzahn@wikimedia.org>
Site Reliability Engineer
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Re: Gerrit replica downtime (30 minutes) tomorrow Tue, 16 May 13:00-15:00 UTC
I have used codesearch to search for the config of codesearch with things like

https://codesearch.wmcloud.org/search/?q=codesearch&files=&excludeFiles=&repos=

I did find the puppet module codesearch and a hound config file in there.

But somehow I have not yet found where the "gerrit-replica" URL is configured.

Do you see it? Could that be in Horizon Hiera instead of the repos maybe?
Re: Gerrit replica downtime (30 minutes) tomorrow Tue, 16 May 13:00-15:00 UTC
I made the patch: https://gerrit.wikimedia.org/r/c/labs/codesearch/+/919924
I can merge and deploy it soon.

On Tue, 16 May 2023 at 00:23, Daniel Zahn <dzahn@wikimedia.org> wrote:

> But somehow I have not yet found where the "gerrit-replica" URL is
> configured.
>
> Do you see it? Could that be in Horizon Hiera instead of the repos maybe?


--
Amir (he/him)
Re: Gerrit replica downtime (30 minutes) tomorrow Tue, 16 May 13:00-15:00 UTC
Thank you Amir! Also here:

https://gerrit.wikimedia.org/r/c/labs/codesearch/+/919925

Sorry for the duplicate work; feel free to merge or abandon either one.

But I used the commit message for the reasoning and needed a ticket, so I
linked to a new one:

https://phabricator.wikimedia.org/T336710

Just today it came up in our team meeting that we should create a
dedicated ticket for it.

So the patch would either be temporary or simply never be reverted,
even after the maintenance.

On Mon, May 15, 2023 at 3:27 PM Amir Sarabadani <ladsgroup@gmail.com> wrote:
>
> I made the patch: https://gerrit.wikimedia.org/r/c/labs/codesearch/+/919924 I can merge and deploy it soon.



--
Daniel Zahn <dzahn@wikimedia.org>
Site Reliability Engineer
Re: Gerrit replica downtime (30 minutes) tomorrow Tue, 16 May 13:00-15:00 UTC
Does this include other uses of gerrit replica? Should extension
distributor be switched to main gerrit?

--
Brian

On Tuesday, May 16, 2023, Daniel Zahn <dzahn@wikimedia.org> wrote:

> We, at least in my team, would like to switch codesearch (and other
> clients) back to just use gerrit.wikimedia.org and not the replica
> directly.
Re: Gerrit replica downtime (30 minutes) tomorrow Tue, 16 May 13:00-15:00 UTC
> Does this include other uses of gerrit replica? Should extension distributor be switched to main gerrit?

Personally I think so, yes. We can make a list of users and discuss
on https://phabricator.wikimedia.org/T336710
I also don't know yet which of the users is causing the highest load
compared to others.

On Mon, May 15, 2023 at 9:12 PM Brian Wolff <bawolff@gmail.com> wrote:
>
> Does this include other uses of gerrit replica? Should extension distributor be switched to main gerrit?



--
Daniel Zahn <dzahn@wikimedia.org>
Site Reliability Engineer