Mailing List Archive

Facebook Engineering on today's outage
http://www.facebook.com/notes/facebook-engineering/more-details-on-todays-outage/431441338919

Apparently, our surmise about Akamai notwithstanding, the problem was actually
internal to their app-specific caching facilities, which went into Sorcerer's
Apprentice mode, and they had to kill them all and let ghod sort them out.

More if I get it; hope that posting's public.

Cheers,
-- jra

--
Jay R. Ashworth Baylink jra at baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://baylink.pitas.com '87 e24
St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274

Start a man a fire, and he'll be warm all night.
Set a man on fire, and he'll be warm for the rest of his life.
Facebook Engineering on today's outage
Agreed; my reading of this suggests database caching issues (i.e., all the frontend/middleware clients hitting the main SQL cluster at once instead of the memcached farm they normally use), not HTTP/CDN caching issues.
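
To make that reading concrete, here is a rough toy model of the failure shape, not Facebook's actual code or topology. It assumes plain cache-aside reads and a burst of requests that all check the cache before any refill lands; CountingDatabase, serve_burst, simulate, and the "hot-key" name are all hypothetical.

```python
# Toy model only: hypothetical names, not Facebook's actual code or topology.

class CountingDatabase:
    """Stand-in for the main SQL cluster; it just counts the queries it receives."""

    def __init__(self):
        self.queries = 0

    def query(self, key):
        self.queries += 1
        return f"row-for-{key}"


def serve_burst(cache, db, keys):
    """Serve one burst of simultaneous requests using cache-aside reads.

    Every request checks the cache before any refill lands, so each miss
    in the burst turns into its own database query.
    """
    misses = [key for key in keys if key not in cache]  # phase 1: cache lookups
    for key in misses:                                   # phase 2: misses fall through
        cache[key] = db.query(key)
    return len(misses)


def simulate(cache_is_warm, clients=1000):
    """Return how many queries reach the database for one burst on a single hot key."""
    db = CountingDatabase()
    cache = {"hot-key": "row-for-hot-key"} if cache_is_warm else {}
    serve_burst(cache, db, ["hot-key"] * clients)
    return db.queries


if __name__ == "__main__":
    print("warm cache:", simulate(True))
    print("cold cache:", simulate(False))
```

With the cache warm, none of the requests reach the database; with it empty, every client in the burst does, which is the "everyone hits the SQL cluster at once" pattern described above.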

-C

On Sep 23, 2010, at 7:17:12 PM, Jay R. Ashworth wrote:

> http://www.facebook.com/notes/facebook-engineering/more-details-on-todays-outage/431441338919
>
> Apparently, our surmise about Akamai notwithstanding, the problem was actually
> internal to their app-specific caching facilities, which went into Sorcerer's
> Apprentice mode, and they had to kill them all and let ghod sort them out.
>
> More if I get it; hope that posting's public.
>
> Cheers,
> -- jra
>
> --
> Jay R. Ashworth Baylink jra at baylink.com
> Designer The Things I Think RFC 2100
> Ashworth & Associates http://baylink.pitas.com '87 e24
> St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274
>
> Start a man a fire, and he'll be warm all night.
> Set a man on fire, and he'll be warm for the rest of his life.
>
Facebook Engineering on today's outage
On Thu, Sep 23, 2010 at 7:17 PM, Jay R. Ashworth <jra at baylink.com> wrote:
> http://www.facebook.com/notes/facebook-engineering/more-details-on-todays-outage/431441338919
>
> Apparently, our surmise about Akamai notwithstanding, the problem was actually
> internal to their app-specific caching facilities, which went into Sorcerer's
> Apprentice mode, and they had to kill them all and let ghod sort them out.
>
> More if I get it; hope that posting's public.

That was a model postmortem. Wish more companies had that sort of
detail and clarity around what went wrong and what was being done to
fix it.

/vijay

>
> Cheers,
> -- jra
>
> --
> Jay R. Ashworth Baylink jra at baylink.com
> Designer The Things I Think RFC 2100
> Ashworth & Associates http://baylink.pitas.com '87 e24
> St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274
>
> Start a man a fire, and he'll be warm all night.
> Set a man on fire, and he'll be warm for the rest of his life.
>
>