Mailing List Archive

Massive DNS Trouble today 2021-04-01 17:25 EDT
Details scant so far, but there's something awful going on with DNS for a
large portion of the Internet, especially Azure.

Microsoft just posted this to Twitter via @MSFT365Status:
https://twitter.com/MSFT365Status/status/1377738432265396225

But I have vendors using UltraDNS (for example) that are also failing to
resolve, and those servers aren't hosted in Azure.

Investigating now. Early warning. Further reports would be appreciated.

Thanks all!

- Cary
Re: Massive DNS Trouble today 2021-04-01 17:25 EDT
Can confirm that all of our Azure DNS was offline for a few minutes. Seems
to be coming back now.

Jeff


On Thu, Apr 1, 2021 at 4:59 PM Cary Wiedemann via Outages <
outages@outages.org> wrote:

> Details scant so far, but there's something awful going on with DNS for a
> large portion of the Internet, especially Azure.
>
> Microsoft just posted this to Twitter via @MSFT365Status:
> https://twitter.com/MSFT365Status/status/1377738432265396225
>
> But I have vendors using UltraDNS (for example) that are also failing to
> resolve, and those servers aren't hosted in Azure.
>
> Investigating now. Early warning. Further reports would be appreciated.
>
> Thanks all!
>
> - Cary
> _______________________________________________
> Outages mailing list
> Outages@outages.org
> https://puck.nether.net/mailman/listinfo/outages
>


--
Jeff Ollie
The majestik møøse is one of the mäni interesting furry animals in Sweden.
Re: Massive DNS Trouble today 2021-04-01 17:25 EDT
Something is wonky for sure.

Was on company VPN, couldn't access Teams at all.

Got off the VPN and now Teams is somewhat back.

--John

On 4/1/21 3:52 PM, Cary Wiedemann via Outages wrote:
> Details scant so far, but there's something awful going on with DNS
> for a large portion of the Internet, especially Azure.
>
> Microsoft just posted this to Twitter via @MSFT365Status:
> https://twitter.com/MSFT365Status/status/1377738432265396225
>
> But I have vendors using UltraDNS (for example) that are also failing
> to resolve, and those servers aren't hosted in Azure.
>
> Investigating now.  Early warning.  Further reports would be appreciated.
>
> Thanks all!
>
> - Cary
Re: Massive DNS Trouble today 2021-04-01 17:25 EDT
Kind of an extreme April Fools' joke.

On Thu, Apr 1, 2021 at 6:24 PM Jeffrey Ollie via Outages <
outages@outages.org> wrote:

> Can confirm that all of our Azure DNS was offline for a few minutes. Seems
> to be coming back now.
>
> Jeff


--
Sincerely,

Jason W Kuehl
Cell 920-419-8983
jason.w.kuehl@gmail.com
Re: Massive DNS Trouble today 2021-04-01 17:25 EDT
Here is the latest RCA from Microsoft on the DNS issues:
STATUS: RCA
COMMUNICATION:

*Summary of Impact:* Between 21:21 UTC and 22:00 UTC on 1 Apr 2021, Azure
DNS experienced a service availability issue. This resulted in customers
being unable to resolve domain names for services they use, which resulted
in intermittent failures accessing or managing Azure and Microsoft
services. Due to the nature of DNS, the impact of the issue was observed
across multiple regions. Recovery time varied by service, but the majority
of services recovered by 22:30 UTC.

*Root Cause:* Azure DNS servers experienced an anomalous surge in DNS
queries from across the globe targeting a set of domains hosted on Azure.
Normally, Azure’s layers of caches and traffic shaping would mitigate this
surge. In this incident, one specific sequence of events exposed a code
defect in our DNS service that reduced the efficiency of our DNS Edge
caches. As our DNS service became overloaded, DNS clients began frequent
retries of their requests which added workload to the DNS service. Since
client retries are considered legitimate DNS traffic, this traffic was not
dropped by our volumetric spike mitigation systems. This increase in
traffic led to decreased availability of our DNS service.
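The feedback loop described in the Root Cause (overload causes failures, failures cause retries, retries add load) can be illustrated with a toy model. This is a minimal sketch, not Microsoft's code or data: it assumes a fixed serving capacity per tick, a failure rate equal to the fraction of offered queries above that capacity, and clients that retry a failed query on the next tick until a retry budget runs out. The function name `simulate_retry_load` and all of its parameters are hypothetical.

```python
def simulate_retry_load(new_qps: float, capacity: float, max_retries: int, ticks: int) -> list[float]:
    """Toy model (hypothetical): offered DNS load per tick when clients retry failures.

    Assumes every query offered in a tick fails with the same probability,
    1 - capacity / offered, once the server is past capacity, and that each
    failed query is retried on the next tick until it has used max_retries retries.
    """
    retries = [0.0] * max_retries  # retries[k]: queries about to make retry number k + 1
    history = []
    for _ in range(ticks):
        waves = [new_qps] + retries  # waves[k]: queries on attempt k (0 = first try)
        offered = sum(waves)
        history.append(offered)
        # Fraction of this tick's queries the server could not answer.
        fail = max(0.0, 1.0 - capacity / offered) if offered > 0 else 0.0
        # Failures on attempts 0..max_retries-1 come back next tick as retries;
        # failures on the final attempt are abandoned.
        retries = [w * fail for w in waves[:max_retries]]
    return history
```

For example, `simulate_retry_load(new_qps=150, capacity=100, max_retries=3, ticks=5)` gives offered loads of about 150, 200, 250, 300 and 340 queries per tick: fresh demand never changes, but retries push the load far past it. With `new_qps=90` the load stays flat at 90, because nothing fails and nothing is retried.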

*Mitigation:* The decrease in service availability triggered our monitoring
systems and engaged our engineers. Our DNS services automatically recovered
themselves by 22:00 UTC. This recovery time exceeded our design goal, and
our engineers prepared additional serving capacity and the ability to
answer DNS queries from the volumetric spike mitigation system in case
further mitigation steps were needed. The majority of services were fully
recovered by 22:30 UTC. Immediately after the incident, we updated the
logic on the volumetric spike mitigation system to protect the DNS service
from excessive retries.
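The RCA does not say how the volumetric spike mitigation logic was changed. As one hypothetical illustration of limiting excessive retries, the sketch below admits the first query for a given client, name and type plus a small number of repeats inside a sliding time window, and drops the rest. `RetryLimiter`, `max_repeats` and `window_s` are invented for this example. The trade-off is the one the RCA points out: repeats are usually legitimate retries, so the limits have to be loose enough not to starve clients that are retrying normally.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class RetryLimiter:
    """Hypothetical per-(client, qname, qtype) repeat limiter over a sliding window."""

    def __init__(self, max_repeats: int, window_s: float) -> None:
        self.max_repeats = max_repeats
        self.window_s = window_s
        # Timestamps of admitted queries, keyed by (client, normalized name, type).
        # A real system would also evict idle keys to bound memory.
        self._seen: Dict[Tuple[str, str, str], Deque[float]] = defaultdict(deque)

    def admit(self, client: str, qname: str, qtype: str, now: Optional[float] = None) -> bool:
        """Return True to answer the query, False to drop it as an excess repeat."""
        now = time.monotonic() if now is None else now
        # DNS names compare case-insensitively; ignore a trailing root dot.
        key = (client, qname.lower().rstrip("."), qtype.upper())
        window = self._seen[key]
        # Forget admissions that have aged out of the window.
        while window and now - window[0] >= self.window_s:
            window.popleft()
        # Allow the first query plus max_repeats repeats per window.
        if len(window) > self.max_repeats:
            return False
        window.append(now)
        return True
```

With `max_repeats=2` and `window_s=1.0`, four identical queries from one client at t = 0.0, 0.1, 0.2 and 0.3 s are admitted, admitted, admitted and dropped. The same query from a different client is unaffected, and the first client is admitted again once its oldest admission falls out of the one-second window.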

*Next Steps:* We apologize for the impact to affected customers. We are
continuously taking steps to improve the Microsoft Azure Platform and our
processes to help ensure such incidents do not occur in the future. In this
case, this includes (but is not limited to):

- Repair the code defect so that all requests can be efficiently handled
  in cache.
- Improve the automatic detection and mitigation of anomalous traffic
  patterns.
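Microsoft did not describe how its anomaly detection works. A common, simple approach (an assumption here, not Microsoft's method) is to compare each query-rate sample with an exponentially weighted moving average (EWMA) baseline and flag samples far above it. `SpikeDetector`, `alpha`, `factor` and `warmup` are hypothetical names and defaults.

```python
from typing import Optional


class SpikeDetector:
    """Hypothetical EWMA-based detector for sudden surges in query rate."""

    def __init__(self, alpha: float = 0.1, factor: float = 3.0, warmup: int = 5) -> None:
        self.alpha = alpha      # weight given to the newest sample in the baseline
        self.factor = factor    # flag samples above factor * baseline
        self.warmup = warmup    # samples to observe before flagging anything
        self.baseline: Optional[float] = None
        self.samples = 0

    def observe(self, qps: float) -> bool:
        """Feed one queries-per-second sample; return True if it looks anomalous."""
        self.samples += 1
        if self.baseline is None:
            self.baseline = qps
            return False
        anomalous = self.samples > self.warmup and qps > self.factor * self.baseline
        # Leave the baseline untouched during a spike so the surge (and any
        # retries it provokes) cannot raise the bar it is measured against.
        if not anomalous:
            self.baseline = self.alpha * qps + (1 - self.alpha) * self.baseline
        return anomalous
```

After ten samples at 100 queries/s, a sample of 400 is flagged and a following sample of 110 is not. One consequence of freezing the baseline during a spike: if traffic legitimately settles at a much higher level, this detector keeps flagging it, so a real system would need some decay or manual reset.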




Catherine Durig


On Thu, Apr 1, 2021, 6:16 PM Jason Kuehl via Outages <outages@outages.org>
wrote:

> Kind of an extreme April Fools' joke.