Mailing List Archive

165 Halsey recurring power issues
Hello,

I wanted to get some feedback as to what is considered a standard A/B
power setup when data centers sell redundant power.  It has always been
my understanding that A/B power means individually unique and preferably
alternate path connections to disparate UPS units.

A few months ago, 165 Halsey took us down for several hours. They
claimed that a UPS failed, causing this issue.  Our natural reaction was
that we have A/B redundant power, so a failed UPS on the A circuit should
not take down the cabinet. Joe, the facility manager, claimed that
industry standard A/B power means two circuits to the same UPS, which
makes no sense to me.

They committed to move us to A/B power with redundant circuits to
disparate UPS units.  However, we had a multi-hour outage again in that
site this weekend. At first glance it seems to be the same problem.

We have checked with all of our other data center providers who have
confirmed A/B power is in fact individually unique connections to
disparate UPS units. 165 Halsey's definition of what constitutes
redundant power seems unique. Why would anyone pay extra for a second
connection to the same UPS?  However, I wanted to get feedback to see if
I am taking crazy pills here.

Nonetheless, we have lost all confidence in this facility.

Best Regards,

Babak
Re: 165 Halsey recurring power issues
165 Halsey (like most of its tenant data centers) is an older facility.
Data center practices have changed over the decades, and terminology wasn't
standardized until recently.


The biggest FUBAR in telco and data centers is the difference between
"redundancy" and "diversity."

Redundant A/B power feeds are often multiple cables from the same power
source.

Diverse A/B power feeds are cables from different backup power sources
(within limits): one utility feed, two battery strings or backup generators.
They are often routed through the same conduits/cable trays, and both may
be out of service for scheduled maintenance and some kinds of faults.

Add a spare A/B power feed: generally an N+1 backup power source and some
additional power switching capability.

Fault-tolerant A/B power: everything from utility to rack is diverse and
redundant (cables, conduit/cable trays, switching equipment and backup
sources). Maintenance can be performed on one of the power feeds without
affecting the other feeds. This does not include a redundant utility feed
(no redundant substation or utility).

Cost increase: 2x, 5x, 10x.

I haven't toured 165 Halsey for 10+ years, so I don't know its current
state. It has multiple tenant data centers, so some may be better than
others.


Re: 165 Halsey recurring power issues
On Mon, Oct 23, 2023 at 10:38:09AM -0400, Babak Pasdar wrote:
> I wanted to get some feedback as to what is considered standard A/B
> power setup when data centers sell redundant power. It has always been
> my understanding that A/B power means individually unique and preferably
> alternate path connections to disparate UPS units.

Generally speaking, the definition of A/B has become muddied in recent decades. It has almost become an inaccurate marketing term.

Most sane people have the opinion (myself included) that when "A/B" power is offered, it is at minimum offered as 2N UPS (different building entrances and MSBs and even physically separate UPS rooms are also desired on a true 2N A/B, but may not always be available). Some data center operators go even further and architect load switching within their distribution, thereby preventing single-side/one-leg power outages for customers during most of their power maintenance activities.

Some data center operators treat "A/B" as a convenience for them to undertake maintenance and offload uptime responsibilities to their own customers, and require them to either undertake their own transfer switching and/or dual-cord every piece of equipment, so that they can keep taking one side of the power system down for repeated maintenance. This does not scale well for retail colo, as not every customer is going to be good at maintaining two PSUs for every single piece of equipment.

Some data centers also view "N+1" system deployment at the UPS as an acceptable form of A/B protection, as long as customer circuits are on different PDUs.

Long story short, whether you're receiving N+1 or 2N or 1N, it's important to inquire about how your power circuits will be architected and delivered by the data center, and either have that codified in the contract or reflected appropriately in the SLA offering. There is nothing wrong with the data center providing N+1 or 1N power, as long as they're transparent about it and it is what you're willing to accept for the right terms. However, simply accepting "we are providing you A/B power" or "we've never had primary power failure" is not sufficient to meet proper due diligence during a site selection process, unless you can accept a site outage occurring from time to time, or you're deploying your own power plant (i.e. DC power and batteries) to supplant the data center's own power protection scheme.

James
Re: 165 Halsey recurring power issues
Thanks James,

At signup we asked for N+1 power, two circuits to different UPS units. I
think they sliced it thin by connecting us to two battery packs on the
same UPS. When the UPS controller crashed both battery packs went down. 
Which now raises the question -- is it reasonable to have to specify and
expect that two UPS units means that they do not share any common points
of failure?

Is the UPS the battery or the battery and controller combined?

Babak



Re: 165 Halsey recurring power issues
On Mon, Oct 23, 2023 at 03:31:21PM -0400, Babak Pasdar wrote:
>
> Is the UPS the battery or the battery and controller combined?

"N+1" nominally means you're connected to the same UPS system/complex, but each of your feeds is on a different module. Your other leg will be diverse from a failure in that module, or downstream PDU/panel work fed by that module. A common failure mode in the UPS system itself hosting the different modules can knock out both of your circuits. It sounds like this is how you are configured presently.

"2N" generally means you're connected to a completely different UPS system/complex and corresponding distribution system for each of your circuits. This is the ideal configuration for most critical loads.
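
As a rough illustration of the difference (a sketch only; the component names such as "UPS-1" and "PDU-3" are hypothetical and not taken from any real facility), you can write down every component each feed depends on and look for overlap. Anything common to both feeds is a shared point of failure:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feed:
    """One power feed to a cabinet, described by the components it passes through."""
    name: str
    components: frozenset  # hypothetical labels, e.g. {"UPS-1", "PDU-3"}


def shared_components(a: Feed, b: Feed) -> set:
    """Return every component that both feeds depend on."""
    return set(a.components & b.components)


# Hypothetical N+1 layout: separate modules and PDUs, but the same UPS system.
n_plus_1_a = Feed("A", frozenset({"UPS-1", "UPS-1/module-A", "PDU-3"}))
n_plus_1_b = Feed("B", frozenset({"UPS-1", "UPS-1/module-B", "PDU-4"}))

# Hypothetical 2N layout: the two sides have nothing in common.
two_n_a = Feed("A", frozenset({"UPS-1", "UPS-1/module-A", "PDU-3"}))
two_n_b = Feed("B", frozenset({"UPS-2", "UPS-2/module-A", "PDU-7"}))

print(shared_components(n_plus_1_a, n_plus_1_b))  # {'UPS-1'}
print(shared_components(two_n_a, two_n_b))        # set()
```

A non-empty result is the thing to ask the provider about; the sketch is only as good as the component list you are given.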

James
Re: 165 Halsey recurring power issues
On Mon, 23 Oct 2023, James Jun wrote:
> "2N" generally means you're connected to completely different UPS system/complex and corresponding distribution systems for each of your circuit. This is ideal configuration for most critical loads.

If you are in a single facility, even one with 2N+2 backups, redundancy,
diversity, etc., it still has shared fate. Clouds with regions and zones
on campuses in Eastern Virginia seem to come up with new and exciting ways
to fail :-)

https://en.wikipedia.org/wiki/Chaos_engineering
Re: 165 Halsey recurring power issues
On Mon, Oct 23, 2023 at 7:38 AM Babak Pasdar <babak@pasdar.com> wrote:
> A few months ago, 165 Halsey took us down for several hours. They
> claimed that a UPS failed causing this issue. Our natural reaction was
> that we have A/B redundant power so a failed UPS on the A circuit should
> not take down the cabinet. Joe the facility manager claimed that
> industry standard A/B power means two circuits to the same UPS, which
> makes no sense to me.

If they're being truthful (and many folks are not) then A/B power
means that your power is redundant back to at least two different
UPSes. The UPSes are maintained at under 40% capacity so that a
failure of one doesn't cascade to the other. Ideally these UPSes back
to two different generators, also maintained at 40% of capacity. In
large, fancy data centers they even get power company feeds from two
different substations.

Don't just ask the sales droid. When they deliver the rack or the
cage, ask the data center manager to show you where your power
connections run. If they can't or won't... don't believe them.

"Industry standard" A/B power does NOT mean two circuits to the same
UPS. That's just extra power, not A/B. Joe lied to you.

Incidentally, if you're worried about N+1 redundancy, I assume you're
hosted at more than one data center from more than one vendor?
Buildings and vendors are single points of failure too. Even when
built right, stuff happens.

Regards,
Bill Herrin


--
William Herrin
bill@herrin.us
https://bill.herrin.us/
RE: 165 Halsey recurring power issues
If you have been sold "redundant" power and the DC provider has connected both sides to one UPS in any form, they are seriously amiss. You should not be expected to know the internal workings of the DC UPS systems, and any talk of battery packs (unless you are getting 48 V DC) is utterly irrelevant. This DC provider is, in my opinion, very much out of step with reality if they think this is some sort of normal practice.



Re: 165 Halsey recurring power issues
I toured The Planet years ago in Dallas and was told by the sales rep
that A+B power was two circuits from the same PDU. :)

I consider A+B power to be two distinct feeds, separate utility
entrances, separate generators, separate UPSes, PDUs, etc.  Past that I
consider things like firewall separation, rated chases and such to be
customer-specific requirements.

Aaron

Re: 165 Halsey recurring power issues
Bulk/high-volume hosting companies, dedicated server companies, and
colocation providers selling small rack unit counts operate on very thin
margins. Unless a customer is paying a LOT more per month, they're not
economically going to be connected to true diverse A/B power.

In this case their use of the incorrectly-described A/B was probably
exclusively to handle the (not extremely rare) instances of rackmount
server power supply failures, to give each 1U or 2U size machine, or rack
of blades, two live power supplies with live power feeds. Nothing more
complicated than that.

Re: 165 Halsey recurring power issues
On Mon, Oct 23, 2023 at 3:56 PM Eric Kuhnke <eric.kuhnke@gmail.com> wrote:
> Bulk/high-volume hosting companies, dedicated server companies/small
> rack unit count colocation operate on very thin margins. Unless a
> customer is paying a LOT more per month they're not economically
> going to be connected to true diverse A/B power.

Zero sympathy for anyone who advertises A/B power and doesn't at least
have them connected to different UPSs. Don't care how big you are;
don't advertise fake reliability. I don't need "six nines" to make
effective use of your service but if you lie to me, we're done.

Regards,
Bill Herrin



--
William Herrin
bill@herrin.us
https://bill.herrin.us/
Re: 165 Halsey recurring power issues
On 10/23/23 15:56, Eric Kuhnke wrote:

> In this case their use of the incorrectly-described A/B was probably
> exclusively to handle the (not extremely rare) instances of rackmount
> server power supply failures, to give each 1U or 2U size machine, or
> rack of blades, two live power supplies with live power feeds. Nothing
> more complicated than that.

And then inevitably the customer will load the rack with dual supply
gear to the point that each feed is pulling over 50% of the breaker rating.

When one of the feeds eventually does have an issue, they'll immediately
pop the breaker on the other one.
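
A quick way to check for this (a minimal sketch; the function name, the 30 A circuits and the 0.8 derating figure are assumptions for illustration, so use whatever your provider and local electrical code actually specify) is to add both feeds together and compare against a single breaker:

```python
def survives_failover(load_a_amps: float, load_b_amps: float,
                      breaker_rating_amps: float,
                      max_fraction: float = 1.0) -> bool:
    """True if one feed can carry the whole cabinet when the other feed drops.

    Assumes dual-corded gear shifts its entire draw to the surviving feed.
    max_fraction is the share of the breaker rating you allow yourself to use.
    """
    combined = load_a_amps + load_b_amps
    return combined <= breaker_rating_amps * max_fraction


# Hypothetical 30 A circuits.
print(survives_failover(12.0, 11.0, 30.0))  # True: 23 A fits on one 30 A breaker
print(survives_failover(16.0, 17.0, 30.0))  # False: 33 A exceeds the surviving breaker
print(survives_failover(12.0, 13.0, 30.0, max_fraction=0.8))  # False: 25 A is over 24 A
```

Each feed in the second case looks comfortable on its own (a bit over half the rating), which is exactly why the problem goes unnoticed until a feed drops.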

--
Jay Hennigan - jay@west.net
Network Engineering - CCIE #7880
503 897-8550 - WB6RDV
Re: 165 Halsey recurring power issues
I didn't say that I have sympathy for it but that unfortunately this is
considered acceptable practice within many low-budget "hosting" companies
and probably has been for 15 years. It's a known risk when you're buying a
$50/month "server". Same general category of problem as the OVH datacenter
that caught on fire in France a while back. Anything like that which
becomes a race to the bottom in pricing for product MRC will have
unacceptable corners cut.

I would highly encourage anyone who takes seriously hosting their own stuff
to really know/understand the full infrastructure "underneath" your server
in terms of power and cooling redundancy.

Re: 165 Halsey recurring power issues
At which point one starts looking at the risk factors. If your whole
facility is "redundant", is the power feed coming in from two
geographically diverse substations, via diverse duct banks, into diverse
entry vaults, and diverse risers?

That doesn't eliminate the possibility of the entire building having some
catastrophic emergency, but if you really need to use a single specific
geographic facility, it can reduce the risk.

The giant new electrical vault built under 6th Ave in Seattle in front of
the Westin Building back in 2016/2017 is an example of such diversity.



On Mon, Oct 23, 2023 at 7:08 PM Sean Donelan <sean@donelan.com> wrote:

> On Mon, 23 Oct 2023, James Jun wrote:
> > "2N" generally means you're connected to completely different UPS
> system/complex and corresponding distribution systems for each of your
> circuit. This is ideal configuration for most critical loads.
>
> If you are in a single facility, even one with 2N+2 backups, redundancy,
> diversity, etc., it still has shared fate. Clouds with regions and zones
> on campuses in Eastern Virginia seem to come up with new and exciting ways
> to fail :-)
>
> https://en.wikipedia.org/wiki/Chaos_engineering
>
Re: 165 Halsey recurring power issues
The building itself got into the action and their goal was to make a top
notch facility focusing on central patch panel fiber cross connects.
They started with half of the 9th floor, originally called MMR-2, and
continued with multiple spaces, each bigger, as it was quite successful.
No raised floors, properly positioned chillers, ample power, basic but
standard and roomy cabinets, one-time fee per cross connect (plus
initial cabling and panel setup OTC), and they have been very successful
by all appearances.

Staff reflected their initial goals and I have always interacted well
with them.

The original MMR, where each xcon was actually pulled space to space, was
quite a sight, with multiple cable conduits and trays running from the tops
of the cabs to the ceiling, all full. The new space adopted modern
approaches and looked it.

Joe

Re: 165 Halsey recurring power issues
Willing to bet that there was slicing on both sides of that conversation
and this is what I will now refer to as the expected and resulting razor
burn.

Babak Pasdar wrote:
> Thanks James,
>
> At signup we asked for N+1 power, two circuits to different UPS units.
> I think they sliced it thin by connecting us to two battery packs on
> the same UPS. When the UPS controller crashed both battery packs went
> down. Which now raises the question -- is it reasonable to have to
> specify and expect that two UPS units means that they do not share any
> common points of failure.
>
> Is the UPS the battery or the battery and controller combined?
>
> Babak