Mailing List Archive

PoE complex usage of regulator API
Hello Mark, Oleksij,

Now that PoE support has been merged, I am digging further into the PoE features.
We decided to use the regulator API inside the PSE (Power Sourcing Equipment)
API because PSE and regulators are quite similar, as Oleksij explained on the ML
before: https://lore.kernel.org/netdev/20231221174246.GI1697233@pengutronix.de/
We designed it to have one regulator provider registered for each PSE port,
as described by Oleksij.

I am not really familiar with the regulator API and regulator controllers, so I
have a few questions and one issue with the API.

Let's begin with something simple: in the PSE world we mostly talk about power.
Would it be OK to add regulator_get/set_power_limit() and
regulator_get_power() callbacks to the regulator API? Would the regulator API
have any interest in such callbacks?

Port priority, a more complex subject:
A PSE controller managing several ports may be able to turn off ports
with low priority if the total power consumption exceeds a certain level.
- There are controllers like the PD692x0 that can manage this on the hardware side.
In that case we would have regulator_get/set_power_limit() callbacks for
the regulator parent (the PSE controller) and regulator_get/set_priority()
callbacks for the regulator children (PSE ports).
- There are controllers like the TPS23881 or LTC4266 that can set two priority
levels on their ports, and a level change on one of their input pins can
shut down all the low priority ports. In that case the same callbacks could be
used. regulator_get/set_power_limit() on the parent would exist only at the
software level. regulator_get/set_priority() would set the priorities of the
ports at the hardware level. A polling function would have to read the total
power used frequently and compare it to the power budget, then call something
like regulator_shutdown_consumer() in case of power overflow.
- We could also want to manage the regulator priorities fully at the software
level. In that case it would be like above, but with all the information saved
in the driver or handled by generic regulator functions.
This priority support could bring lots of issues and complexity, like unbinding
regulator children drivers at runtime if the regulator parent overflows its power
budget. On the other hand, it could be interesting for the global management of
power if the power supply can vary, like a battery or a hot-pluggable power supply.
What do you think? Do you think it is worth adding it to the regulator API?
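
To make the software-managed case more concrete, here is a minimal userspace
sketch (not kernel code) of the shedding step: when the summed draw exceeds the
budget, the lowest-priority enabled port is turned off until the total fits.
The struct and function names are hypothetical, and the convention that a
larger prio value means a lower priority is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one PSE port; not a kernel structure. */
struct pse_port {
	uint32_t power_mw;	/* measured draw in milliwatts */
	unsigned int prio;	/* assumption: larger value = lower priority */
	bool enabled;
};

/* Sum the draw of all enabled ports. */
static uint32_t pse_total_power_mw(const struct pse_port *ports, size_t n)
{
	uint32_t total = 0;

	for (size_t i = 0; i < n; i++)
		if (ports[i].enabled)
			total += ports[i].power_mw;
	return total;
}

/*
 * Disable the lowest-priority enabled port, one at a time, until the
 * total draw fits within budget_mw. Returns the number of ports shed.
 */
static size_t pse_shed_load(struct pse_port *ports, size_t n,
			    uint32_t budget_mw)
{
	size_t shed = 0;

	while (pse_total_power_mw(ports, n) > budget_mw) {
		struct pse_port *victim = NULL;

		for (size_t i = 0; i < n; i++) {
			if (!ports[i].enabled)
				continue;
			if (!victim || ports[i].prio > victim->prio)
				victim = &ports[i];
		}
		if (!victim)
			break;	/* defensive: nothing left to turn off */
		victim->enabled = false;
		shed++;
	}
	return shed;
}
```

A polling loop or an event handler could call pse_shed_load() whenever the
budget or the measured draw changes; how the budget itself is obtained is left
open here.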

Last point, the PSE issue with regulator counters:
In the regulator world we use counters so that a regulator is not disabled while
children are still using it. In the PSE world the regulator providers describing
the PSE ports do not want such a counter to exist. We do want to run enable/disable
commands several times without incrementing/decrementing the counter. So I added an
intermediate admin_state_enabled variable in PSE to fix that.
https://lore.kernel.org/netdev/20240417-feature_poe-v9-10-242293fd1900@bootlin.com/
But if the port is enabled from Linux and then shut down by the PSE controller
for any reason, I have to run the disable and enable commands to enable it again.
Not really efficient :/
I am thinking of disabling the usage of counters in case of a
regulator_get_exclusive(). What do you think? Could it break other usage?
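
For readers unfamiliar with the counter problem, the following is a rough
userspace model (not the actual PSE or regulator code) of how an intermediate
admin state can absorb repeated enable/disable requests so the underlying use
count only moves on real state changes. Every name except admin_state_enabled
is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted supply. */
struct counted_supply {
	int use_count;
};

static void supply_enable(struct counted_supply *s)
{
	s->use_count++;
}

static void supply_disable(struct counted_supply *s)
{
	assert(s->use_count > 0);	/* an unbalanced disable is a bug */
	s->use_count--;
}

/* Hypothetical per-port control state. */
struct pse_port_ctl {
	struct counted_supply supply;
	bool admin_state_enabled;
};

/* Only touch the counter when the requested state really changes. */
static void pse_port_set_admin_state(struct pse_port_ctl *p, bool enable)
{
	if (p->admin_state_enabled == enable)
		return;	/* repeated request: leave the counter alone */

	if (enable)
		supply_enable(&p->supply);
	else
		supply_disable(&p->supply);
	p->admin_state_enabled = enable;
}
```

The limitation mentioned above is visible in this model too: if the hardware
turns the port off on its own, admin_state_enabled still reads true, so a
disable followed by an enable is needed to bring the port back.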

Regards,
--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com
Re: PoE complex usage of regulator API [ In reply to ]
> Let's begin simple, in PSE world we are more talking about power.
> Would it be ok to add a regulator_get/set_power_limit() and
> regulator_get_power() callback to regulator API. Would regulator API have
> interest to such callbacks?

Could you define this API in more detail?

I'm assuming this is mostly about book keeping? When a regulator is
created, we want to say it can deliver up to X kilowatts. We then want
to allocate power to ports. So there needs to be a call asking it to
allocate part of X to a consumer, which could fail if there is not
sufficient power budget left. And there needs to be a call to release
such an allocation.
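
A minimal userspace sketch of that kind of book keeping, assuming no
over-provisioning, could look like this. The names are hypothetical and do not
correspond to any existing regulator API; the errno values assume a POSIX
environment.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical budget tracked by the supplier, in milliwatts. */
struct power_budget {
	uint32_t total_mw;	/* what the supply can deliver */
	uint32_t allocated_mw;	/* what has been handed out so far */
};

/* Reserve mw for a consumer; fails if the remaining budget is too small. */
static int power_budget_alloc(struct power_budget *b, uint32_t mw)
{
	if (mw > b->total_mw - b->allocated_mw)
		return -ENOSPC;
	b->allocated_mw += mw;
	return 0;
}

/* Give back a previous allocation of mw. */
static int power_budget_release(struct power_budget *b, uint32_t mw)
{
	if (mw > b->allocated_mw)
		return -EINVAL;	/* releasing more than was allocated */
	b->allocated_mw -= mw;
	return 0;
}
```

Because allocated_mw never exceeds total_mw, the subtraction in
power_budget_alloc() cannot wrap around.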

We are probably not so much interested in what the actual current
power draw is, assuming there is no wish to over provision?

There is in theory a potential second user of this. Intel have been
looking at power control for SFPs. Typically they are guaranteed a
minimum of 1.5W. However, they can operate at higher power
classes. You can have boards with multiple SFPs, with a theoretical
maximum power draw greater than what the supply can deliver. So you need a
similar sort of power budget book keeping to allocate power to an SFP
cage before telling the SFP module it can swap to a higher power
class. I say this is theoretical, because the device Intel is working
on has this hidden away in firmware. But maybe sometime in the future
somebody will want Linux doing this.

Andrew
Re: PoE complex usage of regulator API [ In reply to ]
On Sat, 27 Apr 2024 00:41:19 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> > Let's begin simple, in PSE world we are more talking about power.
> > Would it be ok to add a regulator_get/set_power_limit() and
> > regulator_get_power() callback to regulator API. Would regulator API have
> > interest to such callbacks?
>
> Could you define this API in more details.

The first new PoE features targeted by this API were reading the consumed power
and getting/setting the power limit for each port. Yes, mainly book keeping.
A few driver callbacks would be called by ethtool, and maybe reading the power
limit and consumed power could be added as read-only regulator sysfs attributes.

> I'm assuming this is mostly about book keeping? When a regulator is
> created, we want to say is can deliver up to X Kilowatts. We then want
> to allocate power to ports. So there needs to be a call asking it to
> allocate part of X to a consumer, which could fail if there is not
> sufficient power budget left. And there needs to be a call to release
> such an allocation.

This is more the aim of the second point I raised: power priority and the
parent power budget, and how the core can manage them.

> We are probably not so much interested in what the actual current
> power draw is, assuming there is no wish to over provision?
>
> There is in theory a potential second user of this. Intel have been
> looking at power control for SFPs. Typically they are guaranteed a
> minimum of 1.5W. However, they can operate at higher power
> classes. You can have boards with multiple SFPs, with a theoretical
> maximum power draw more than what the supply can supply. So you need
> similar sort of power budget book keeping to allocate power to an SFP
> cage before telling the SFP module it can swap to a higher power
> class. I say this is theoretical, because the device Intel is working
> on has this hidden away in firmware. But maybe sometime in the future
> somebody will want Linux doing this.

So there is a potential second user, that's great to hear! Could the
priority stuff also be interesting? For example, to allow only high priority
SFPs to use a higher power class when the power budget is limited.

Regards,
--
Köry Maincent, Bootlin
Re: PoE complex usage of regulator API [ In reply to ]
Hi Kory,

On Mon, Apr 29, 2024 at 02:52:03PM +0200, Kory Maincent wrote:
> On Sat, 27 Apr 2024 00:41:19 +0200
> Andrew Lunn <andrew@lunn.ch> wrote:
>
> > > Let's begin simple, in PSE world we are more talking about power.
> > > Would it be ok to add a regulator_get/set_power_limit() and
> > > regulator_get_power() callback to regulator API. Would regulator API have
> > > interest to such callbacks?
> >
> > Could you define this API in more details.
>
> The first new PoE features targeted by this API was to read the consumed power
> and get set the power limit for each ports. Yes mainly book keeping.
> Few drivers callbacks that will be called by ethtool and maybe the read of power
> limit and consumed power could be add to read-only sysfs regulator.

The regulator framework already supports operations on current (I):
regulator_set_current_limit()
regulator_get_current_limit()

The power is P = I * V. On one side, you can calculate the needed current value:
I = P/V. On the other side, maybe the regulator framework can be extended to do
it too. In the case of PoE/PoDL we have adjustable voltage, depending on the
Class of the device, so we will probably interact with the PSE controller by
using Power instead of Current.
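
As an illustration of the I = P/V conversion, here is a small userspace sketch
using micro-units (microwatts, microvolts, microamps). The helper names are
hypothetical, and the sketch assumes the voltage is known and fixed, which may
not hold on real hardware.

```c
#include <stdint.h>

/*
 * I = P / V. With P in microwatts and V in microvolts, the current in
 * microamps is P * 1e6 / V. The division rounds down, so the resulting
 * current limit never exceeds the requested power limit.
 */
static uint64_t power_to_current_ua(uint64_t power_uw, uint64_t voltage_uv)
{
	if (voltage_uv == 0)
		return 0;	/* no meaningful limit without a known voltage */
	return power_uw * 1000000ULL / voltage_uv;
}

/* P = I * V, with the same units as above. */
static uint64_t current_to_power_uw(uint64_t current_ua, uint64_t voltage_uv)
{
	return current_ua * voltage_uv / 1000000ULL;
}
```

For example, a 15.4 W limit at 44 V works out to 350 mA, which could then be
passed on as a current limit.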

> > I'm assuming this is mostly about book keeping? When a regulator is
> > created, we want to say is can deliver up to X Kilowatts. We then want
> > to allocate power to ports. So there needs to be a call asking it to
> > allocate part of X to a consumer, which could fail if there is not
> > sufficient power budget left. And there needs to be a call to release
> > such an allocation.
>
> This is more the aim of the second point I have raised, power priority and
> parent power budget. And how the core can manage it.

Since there is already support for working with current (I) values, there
is also overcurrent protection. If a device goes beyond the power
budget limit, it is practically an overcurrent event. The regulator
framework is already capable of handling some of these events; what we need
for PoE is prioritization. If we detect overcurrent on the supply root/node
we need to shut down enough low prio consumers to provide enough power
for the high prio consumers.

In reality, this will be done by the PoE controller in HW. Usually we
will get

> > We are probably not so much interested in what the actual current
> > power draw is, assuming there is no wish to over provision?
> >
> > There is in theory a potential second user of this. Intel have been
> > looking at power control for SFPs. Typically they are guaranteed a
> > minimum of 1.5W. However, they can operate at higher power
> > classes. You can have boards with multiple SFPs, with a theoretical
> > maximum power draw more than what the supply can supply. So you need
> > similar sort of power budget book keeping to allocate power to an SFP
> > cage before telling the SFP module it can swap to a higher power
> > class. I say this is theoretical, because the device Intel is working
> > on has this hidden away in firmware. But maybe sometime in the future
> > somebody will want Linux doing this.
>
> So there is a potential second user, that's great to hear it! Could the
> priority stuff be also interesting? Like to allow only high priority SFP to use
> higher power class in case of a limiting power budget.

There are even more use cases. For example, on power loss with some
limited backup power source, you want to shut down all low prio consumers
and provide the needed power and time to devices which may otherwise fail,
for example storage devices.

Regards,
Oleksij
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
Re: PoE complex usage of regulator API [ In reply to ]
> Since there is already support to work with current (I) values, there
> are is also overcurrent protection. If a device is beyond the power
> budget limit, it is practically an over current event. Regulator
> framework already capable on handling some of this events, what we need
> for PoE is prioritization. If we detect overcurrent on supply root/node
> we need to shutdown enough low prio consumers to provide enough power
> for the high prio consumers.

So the assumption is we allow over provisioning?

> > So there is a potential second user, that's great to hear it! Could the
> > priority stuff be also interesting? Like to allow only high priority SFP to use
> > higher power class in case of a limiting power budget.

I was not expecting over-provisioning to happen. So prioritisation
does not make much sense. You either have the power budget, or you
don't. The SFP gets to use a higher power class if there is budget, or
it is kept at a lower power class if there is no budget. I _guess_ you
could give it a high power class, let it establish link, monitor its
actual power consumption, and then decide to drop it to a lower class
if the actual consumption indicates it could work at a lower
class. But the danger is, you are going to lose link.

I've no real experience with this, and all systems today hide this
away in firmware, rather than have Linux control it.

Andrew
Re: PoE complex usage of regulator API [ In reply to ]
On Fri, Apr 26, 2024 at 12:42:53PM +0200, Kory Maincent wrote:

> Let's begin simple, in PSE world we are more talking about power.
> Would it be ok to add a regulator_get/set_power_limit() and
> regulator_get_power() callback to regulator API. Would regulator API have
> interest to such callbacks?

Why would these be different to the existing support for doing current
limiting? If the voltage for the supply is known then the power is a
simple function of the current and the voltage. I suppose you could try
to do convenience functions for a fixed voltage, but there'd be issues
there if the voltage isn't configured to an exact voltage and might
vary.

> Port priority, more complex subject:
> Indeed a PSE controller managing several ports may be able to turn off ports
> with low priority if the total power consumption exceed a certain level.
> - There are controller like PD692x0 that can managed this on the hardware side.
> In that case we would have a regulator_get/set_power_limit() callbacks from
> the regulator parent (the PSE contoller) and a regulator_get/set_priory()
> callbacks for the regulator children (PSE ports).

All this priority stuff feels very PSE specific but possibly doable.
You'd have to define the domains in which priorities apply as well as
the priorities themselves.

> - There are controller like TPS23881 or LTC4266 that can set two priorities
> levels on their ports and a level change in one of their input pin can
> shutdown all the low priority ports. In that case the same callbacks could be
> used. regulator_get/set_power_limit() from the parent will be only at software
> level. regulator_get/set_priority() will set the priorities of the ports on
> hardware level. A polling function have to read frequently the total power
> used and compare it to the power budget, then it has to call something like
> regulator_shutdown_consumer() in case of power overflow.

I would expect the regulators can generate notifications when they go
out of regulation? Having to poll feels very crude for something with
configurable power limits.

> https://lore.kernel.org/netdev/20240417-feature_poe-v9-10-242293fd1900@bootlin.com/
> But in case the port is enabled from Linux then shutdown from the PSE controller
> for any reason, I have to run disable and enable command to enable it again. Not
> really efficient :/

If that is a hot path something has gone very wrong with the system,
especially if it's such a hot path that the cost of a disable is making
a difference. Note that hardware may have multiple error handling
strategies, some hardware will turn off outputs when there's a problem
while other implementations will try to provide as good an output as
they can. Sometimes the strategy will depend on the specific error
condition, and there may be timeouts involved. This all makes it very
difficult to assume any particular state after an error has occurred, or
that the state configured in the control registers reflects the physical
state of the hardware so you probably want some explicit handling for
any new state you're looking for.

> I am thinking of disabling the usage of counters in case of a
> regulator_get_exclusive(). What do you think? Could it break other usage?

Yes, that seems likely to break other users and in general be a sharp edge
for people working with the API.
Re: PoE complex usage of regulator API [ In reply to ]
On Sat, Apr 27, 2024 at 12:41:19AM +0200, Andrew Lunn wrote:

> I'm assuming this is mostly about book keeping? When a regulator is
> created, we want to say is can deliver up to X Kilowatts. We then want
> to allocate power to ports. So there needs to be a call asking it to
> allocate part of X to a consumer, which could fail if there is not
> sufficient power budget left. And there needs to be a call to release
> such an allocation.

The current limits for regulators are generally imposed in hardware as a
safety measure, this also happens for example with USB where there's
regulators in the PHYs. Whatever is providing the power is very likely
to have reasonable headroom for robustness.

> We are probably not so much interested in what the actual current
> power draw is, assuming there is no wish to over provision?

One of the goals is to protect the system in the case that something
malfunctions and tries to draw more current than can be sustained. A
system that is overprovisioned might choose to allow excessive draw,
especially transiently to cover bootstrapping issues, though there are
tradeoffs with system protection vs interoperability with poor quality
implementations there.
Re: PoE complex usage of regulator API [ In reply to ]
On Mon, Apr 29, 2024 at 04:57:35PM +0200, Andrew Lunn wrote:
> > Since there is already support to work with current (I) values, there
> > are is also overcurrent protection. If a device is beyond the power
> > budget limit, it is practically an over current event. Regulator
> > framework already capable on handling some of this events, what we need
> > for PoE is prioritization. If we detect overcurrent on supply root/node
> > we need to shutdown enough low prio consumers to provide enough power
> > for the high prio consumers.
>
> So the assumption is we allow over provisioning?

I assume yes. But I didn't spend enough time to understand and analyze
this part. Maybe I just misunderstand over-provisioning.

> > > So there is a potential second user, that's great to hear it! Could the
> > > priority stuff be also interesting? Like to allow only high priority SFP to use
> > > higher power class in case of a limiting power budget.
>
> I was not expecting over-provisioning to happen. So prioritisation
> does not make much sense. You either have the power budget, or you
> don't.
> The SFP gets to use a higher power class if there is budget, or
> it is kept at a lower power class if there is no budget. I _guess_ you
> could give it a high power class, let it establish link, monitor its
> actual power consumption, and then decide to drop it to a lower class
> if the actual consumption indicates it could work at a lower
> class. But the danger is, you are going to loose link.
>
> I've no real experience with this, and all systems today hide this
> away in firmware, rather than have Linux control it.
>
> Andrew

It may not be over-provisioning by design. I can imagine some scenarios where
the available power budget may change dynamically:

- Changes in Available Power Budget: If a PoE switch is modular or supports
hot-swappable power supplies, inserting a power supply with a lower power
budget while the system is under load can lead to insufficient power
availability. This might cause the system to redistribute power, potentially
leading to instability or overcurrent situations if the power management isn't
handled smoothly.

- Power Loss and Switching to Backup Sources: In cases where a switch relies on
a backup power source (like a UPS or a secondary power supply), the transition
from the primary power source to the backup can create fluctuations. These
fluctuations may temporarily affect how power is supplied to the PoE ports,
potentially causing overcurrent if the backup power does not match the original
specifications.

- System Internal Consumers: Components within the switch itself, such as
processing units or internal lighting/cooling systems, might draw power
differently under various operating conditions. Changes in internal consumption
due to increased processing needs or thermal dynamics could affect the overall
power budget.

- Environmental Conditions: High ambient temperatures can reduce the efficiency
of power delivery and increase the electrical resistance in circuits,
potentially leading to higher current draws. Additionally, cooling failures
within the switch can exacerbate this issue.

- Faulty Power Management Logic: Firmware bugs or errors in the power
management algorithm might incorrectly allocate power or fail to properly
respond to changes in power demands, leading to potential overcurrent
scenarios.

Regards,
Oleksij
Re: PoE complex usage of regulator API [ In reply to ]
On Mon, Apr 29, 2024 at 04:57:35PM +0200, Andrew Lunn wrote:

> I was not expecting over-provisioning to happen. So prioritisation
> does not make much sense. You either have the power budget, or you
> don't. The SFP gets to use a higher power class if there is budget, or
> it is kept at a lower power class if there is no budget. I _guess_ you
> could give it a high power class, let it establish link, monitor its
> actual power consumption, and then decide to drop it to a lower class
> if the actual consumption indicates it could work at a lower
> class. But the danger is, you are going to loose link.

I suspect these devices will be like most other modern systems and
typically not consume anything like their peak current most of the time,
for networking hardware I'd imagine this will only be when the link is
saturated and could depend on factors like how long the physical links
are. If it's anything like other similar hardware you may also be
making power requests with a very low resolution specification of the
consumption, so conservative allocation could end up rejecting systems
that should work.
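
A tiny userspace sketch of that effect, with a hypothetical admit() helper:
when admission is based on a coarse nominal request rather than on typical
draw, the same consumer can be rejected even though it would fit. The numbers
in the note below are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Admit a consumer only if its request fits into what is left of the
 * budget. Written to avoid unsigned wrap-around.
 */
static bool admit(uint32_t budget_mw, uint32_t in_use_mw, uint32_t request_mw)
{
	return request_mw <= budget_mw && in_use_mw <= budget_mw - request_mw;
}
```

With a 5000 mW budget and two consumers already accounted at a nominal 2000 mW
each, a third 2000 mW request is refused; if the three actually draw around
1200 mW each, accounting by typical draw would have admitted it.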
Re: PoE complex usage of regulator API [ In reply to ]
On Tue, 30 Apr 2024 00:38:10 +0900
Mark Brown <broonie@kernel.org> wrote:

Hello all, thanks for your replies!
That gives me more hints for the development.

> On Fri, Apr 26, 2024 at 12:42:53PM +0200, Kory Maincent wrote:
>
> > Let's begin simple, in PSE world we are more talking about power.
> > Would it be ok to add a regulator_get/set_power_limit() and
> > regulator_get_power() callback to regulator API. Would regulator API have
> > interest to such callbacks?
>
> Why would these be different to the existing support for doing current
> limiting? If the voltage for the supply is known then the power is a
> simple function of the current and the voltage. I suppose you could try
> to do a convenience functions for a fixed voltage, but there'd be issues
> there if the voltage isn't configured to an exact voltage and might
> vary.

That's right, I was focusing on power when I could use the already implemented
voltage and current callbacks. Would you be interested in a new get_current()
callback that reports the current and allows the regulator to deduce the consumed
power, or should it be specific to the PSE subsystem?

> > Port priority, more complex subject:
> > Indeed a PSE controller managing several ports may be able to turn off ports
> > with low priority if the total power consumption exceed a certain level.
> > - There are controller like PD692x0 that can managed this on the hardware
> > side. In that case we would have a regulator_get/set_power_limit()
> > callbacks from the regulator parent (the PSE contoller) and a
> > regulator_get/set_priory() callbacks for the regulator children (PSE
> > ports).
>
> All this priority stuff feels very PSE specific but possibly doable.
> You'd have to define the domains in which priorities apply as well as
> the priorities themselves.

If you think that it is really specific to PSE, there is no need to add it to the
regulator API. It will also save me some brain knots.

> > - There are controller like TPS23881 or LTC4266 that can set two priorities
> > levels on their ports and a level change in one of their input pin can
> > shutdown all the low priority ports. In that case the same callbacks
> > could be used. regulator_get/set_power_limit() from the parent will be only
> > at software level. regulator_get/set_priority() will set the priorities of
> > the ports on hardware level. A polling function have to read frequently the
> > total power used and compare it to the power budget, then it has to call
> > something like regulator_shutdown_consumer() in case of power overflow.
>
> I would expect the regulators can generate notifications when they go
> out of regulation? Having to poll feels very crude for something with
> configurable power limits.

Yep that's true. Indeed using notification would be way better!

> > https://lore.kernel.org/netdev/20240417-feature_poe-v9-10-242293fd1900@bootlin.com/
> > But in case the port is enabled from Linux then shutdown from the PSE
> > controller for any reason, I have to run disable and enable command to
> > enable it again. Not really efficient :/
>
> If that is a hot path something has gone very wrong with the system,
> especially if it's such a hot path that the cost of a disable is making
> a difference.

That's not in the hotpath.

> Note that hardware may have multiple error handling
> strategies, some hardware will turn off outputs when there's a problem
> while other implementations will try to provide as good an output as
> they can. Sometimes the strategy will depend on the specific error
> condition, and there may be timeouts involved. This all makes it very
> difficult to assume any particular state after an error has occurred, or
> that the state configured in the control registers reflects the physical
> state of the hardware so you probably want some explicit handling for
> any new state you're looking for.

Alright, I hadn't thought of these different ways of handling an error condition.
We might also see similar things in PSE, so I will keep it like that.

> > I am thinking of disabling the usage of counters in case of a
> > regulator_get_exclusive(). What do you think? Could it break other usage?
>
> Yes, that seems likely to break other users and in general a sharp edge
> for people working with the API.

Okay,

Regards,
--
Köry Maincent, Bootlin
Re: PoE complex usage of regulator API [ In reply to ]
On Mon, Apr 29, 2024 at 07:28:48PM +0200, Kory Maincent wrote:
> Mark Brown <broonie@kernel.org> wrote:
> > On Fri, Apr 26, 2024 at 12:42:53PM +0200, Kory Maincent wrote:

> That's right I was focusing on power where I could use already implemented
> voltage and current callbacks. Would you be interested to a new get_current()
> callback to know the current and allows regulator to deduce the consumed power
> or should it be specific to PSE subsystem.

That feels like it belongs in hwmon or possibly power rather than in the
regulator API but it does feel like it's generally useful rather than
PSE specific.
Re: PoE complex usage of regulator API [ In reply to ]
On Tue, Apr 30, 2024 at 11:23:15AM +0900, Mark Brown wrote:
> On Mon, Apr 29, 2024 at 07:28:48PM +0200, Kory Maincent wrote:
> > Mark Brown <broonie@kernel.org> wrote:
> > > On Fri, Apr 26, 2024 at 12:42:53PM +0200, Kory Maincent wrote:
>
> > That's right I was focusing on power where I could use already implemented
> > voltage and current callbacks. Would you be interested to a new get_current()
> > callback to know the current and allows regulator to deduce the consumed power
> > or should it be specific to PSE subsystem.
>
> That feels like it belongs in hwmon or possibly power rather than in the
> regulator API but it does feel like it's generally useful rather than
> PSE specific.

I would say it depends on the use case and the abilities of the HW. Power
consumption may change rapidly, so it is all about the sampling rate. For
real time current measurement you want to use the iio framework. For most
cases and simple diagnostics, the max and probably min values, which are
cleared on read, are more interesting.
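
The clear-on-read max/min idea can be sketched in userspace like this. It is
only a software model with hypothetical names; on real hardware the peak
registers would typically live in the controller itself.

```c
#include <stdint.h>

/* Hypothetical tracker for peak values between two reads, in milliamps. */
struct peak_tracker {
	uint32_t max_ma;
	uint32_t min_ma;
};

/* Start with "no sample seen yet" markers. */
static void peak_tracker_reset(struct peak_tracker *t)
{
	t->max_ma = 0;
	t->min_ma = UINT32_MAX;
}

/* Feed one measurement into the tracker. */
static void peak_tracker_sample(struct peak_tracker *t, uint32_t ma)
{
	if (ma > t->max_ma)
		t->max_ma = ma;
	if (ma < t->min_ma)
		t->min_ma = ma;
}

/* Return the maximum seen since the last read, then clear it. */
static uint32_t peak_tracker_read_max(struct peak_tracker *t)
{
	uint32_t v = t->max_ma;

	t->max_ma = 0;
	return v;
}

/* Return the minimum seen since the last read, then clear it. */
static uint32_t peak_tracker_read_min(struct peak_tracker *t)
{
	uint32_t v = t->min_ma;

	t->min_ma = UINT32_MAX;
	return v;
}
```

A reader that polls slowly still sees the worst case between two reads, which
is the property that makes such values useful for simple diagnostics.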

If HW provides only real time measurement, then the question is, how
many samples are needed to provide some usable result.

Regards,
Oleksij