Mailing List Archive

s2idle breaks on machines without cpuidle support
Hi everyone,

s2idle is blocked on machines without proper cpuidle support here
in kernel/sched/idle.c:

> if (cpuidle_not_available(drv, dev)) {
> tick_nohz_idle_stop_tick();

> default_idle_call();
> goto exit_idle;
> }

> /*
> * Suspend-to-idle ("s2idle") is a system state in which all user space
> * has been frozen, all I/O devices have been suspended and the only

However, there are 2 problems with this approach:

1. The suspend framework does not expect this, and continues to suspend the
machine, which causes machines without proper cpuidle support to break when
suspending
2. Suspend actually works on ARM64 machines even without proper
cpuidle (PSCI cpuidle) since they support wfi, so the assumption here is wrong
on such machines

I'm not exactly sure how to figure this out, and my attempts have all led to an
unbootable kernel, so I've cc'ed the relevant people and hopefully we can find a
solution to this problem.

Thanks,
Kazuki
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
Hi Kazaki,

On Sun, Feb 05, 2023 at 12:27:47AM +0900, Kazuki wrote:
>
> Hi everyone,
>
> s2idle is blocked on machines without proper cpuidle support here
> in kernel/sched/idle.c:
>
> > if (cpuidle_not_available(drv, dev)) {
> > tick_nohz_idle_stop_tick();
>
> > default_idle_call();
> > goto exit_idle;
> > }
>
> > /*
> > * Suspend-to-idle ("s2idle") is a system state in which all user space
> > * has been frozen, all I/O devices have been suspended and the only
>
> However, there are 2 problems with this approach:
>
> 1. The suspend framework does not expect this, and continues to suspend the
> machine, which causes machines without proper cpuidle support to break when
> suspending

What do you mean by break ? More details on the observation would be helpful.

> 2. Suspend actually works on ARM64 machines even without proper
> cpuidle (PSCI cpuidle) since they support wfi, so the assumption here is wrong
> on such machines
>

Sorry I am bit confused here. Your point (2) contradicts the $subject.

> I'm not exactly sure how to figure this out, and my attempts have all led to an
> unbootable kernel, so I've cc'ed the relevant people and hopefully we can find a
> solution to this problem.
>

Again, since s2idle is userspace driven, I don't understand what do you
mean by unbootable kernel in the context of s2idle.

--
Regards,
Sudeep
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
On Mon, Feb 06, 2023 at 10:12:39AM +0000, Sudeep Holla wrote:
> Hi Kazaki,
>
> On Sun, Feb 05, 2023 at 12:27:47AM +0900, Kazuki wrote:
> >
> > Hi everyone,
> >
> > s2idle is blocked on machines without proper cpuidle support here
> > in kernel/sched/idle.c:
> >
> > > if (cpuidle_not_available(drv, dev)) {
> > > tick_nohz_idle_stop_tick();
> >
> > > default_idle_call();
> > > goto exit_idle;
> > > }
> >
> > > /*
> > > * Suspend-to-idle ("s2idle") is a system state in which all user space
> > > * has been frozen, all I/O devices have been suspended and the only
> >
> > However, there are 2 problems with this approach:
> >
> > 1. The suspend framework does not expect this, and continues to suspend the
> > machine, which causes machines without proper cpuidle support to break when
> > suspending
>
> What do you mean by break ? More details on the observation would be helpful.
For example, CLOCK_MONOTONIC doesn't stop even after suspend since
these chain of commands don't get called.

call_cpuidle_s2idle->cpuidle_enter_s2idle->enter_s2idle_proper->tick_freeze->sched_clock_suspend (Function that pauses CLOCK_MONOTONIC)

Which in turn causes programs like systemd to crash since it doesn't
expect this.
>
> > 2. Suspend actually works on ARM64 machines even without proper
> > cpuidle (PSCI cpuidle) since they support wfi, so the assumption here is wrong
> > on such machines
> >
>
> Sorry I am bit confused here. Your point (2) contradicts the $subject.
drivers/cpuidle/cpuidle.c:

bool cpuidle_not_available(struct cpuidle_driver *drv,
struct cpuidle_device *dev)
{
return off || !initialized || !drv || !dev || !dev->enabled;
}

The cpuidle framework reports ARM64 devices without PSCI cpuidle as
"cpuidle not available" even when they support wfi, which causes suspend
to fail, which shouldn't be happening since they do support idling.
>
> > I'm not exactly sure how to figure this out, and my attempts have all led to an
> > unbootable kernel, so I've cc'ed the relevant people and hopefully we can find a
> > solution to this problem.
> >
>
> Again, since s2idle is userspace driven, I don't understand what do you
> mean by unbootable kernel in the context of s2idle.
Sorry, I meant "attempts to fix this bug have all led to an unbootable
kernel."
>
> --
> Regards,
> Sudeep
Thanks,
Kazuki
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
On Wed, Feb 08, 2023 at 04:48:18AM +0900, Kazuki wrote:
> On Mon, Feb 06, 2023 at 10:12:39AM +0000, Sudeep Holla wrote:
> >
> > What do you mean by break ? More details on the observation would be helpful.
> For example, CLOCK_MONOTONIC doesn't stop even after suspend since
> these chain of commands don't get called.
>
> call_cpuidle_s2idle->cpuidle_enter_s2idle->enter_s2idle_proper->tick_freeze->sched_clock_suspend (Function that pauses CLOCK_MONOTONIC)
>
> Which in turn causes programs like systemd to crash since it doesn't
> expect this.

Yes expected IIUC. The per-cpu timers and counters continue to tick in
WFI and hence CLOCK_MONOTONIC can't stop.

> >
> > > 2. Suspend actually works on ARM64 machines even without proper
> > > cpuidle (PSCI cpuidle) since they support wfi, so the assumption here is wrong
> > > on such machines
> > >
> >
> > Sorry I am bit confused here. Your point (2) contradicts the $subject.
> drivers/cpuidle/cpuidle.c:
>
> bool cpuidle_not_available(struct cpuidle_driver *drv,
> struct cpuidle_device *dev)
> {
> return off || !initialized || !drv || !dev || !dev->enabled;
> }
>
> The cpuidle framework reports ARM64 devices without PSCI cpuidle as
> "cpuidle not available" even when they support wfi, which causes suspend
> to fail, which shouldn't be happening since they do support idling.

Yes with just WFI, there will be no active cpuidle driver.

[...]

> > Again, since s2idle is userspace driven, I don't understand what do you
> > mean by unbootable kernel in the context of s2idle.
>
> Sorry, I meant "attempts to fix this bug have all led to an unbootable
> kernel."

Again I assume you mean kernel hang or crash and nothing to do with boot.
Once you enter s2i state with your changes/fix, it hangs or is unresponsive
as it might have either failed to enter or resume from the state.

--
Regards,
Sudeep
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
On Wed, Feb 08, 2023 at 10:35:11AM +0000, Sudeep Holla wrote:
> On Wed, Feb 08, 2023 at 04:48:18AM +0900, Kazuki wrote:
> > On Mon, Feb 06, 2023 at 10:12:39AM +0000, Sudeep Holla wrote:
> > >
> > > What do you mean by break ? More details on the observation would be helpful.
> > For example, CLOCK_MONOTONIC doesn't stop even after suspend since
> > these chain of commands don't get called.
> >
> > call_cpuidle_s2idle->cpuidle_enter_s2idle->enter_s2idle_proper->tick_freeze->sched_clock_suspend (Function that pauses CLOCK_MONOTONIC)
> >
> > Which in turn causes programs like systemd to crash since it doesn't
> > expect this.
>
> Yes expected IIUC. The per-cpu timers and counters continue to tick in
> WFI and hence CLOCK_MONOTONIC can't stop.
Yes, but it shouldn't be the case when suspending[1]. Currently that's what
happens when we enter s2idle without a cpuidle driver. This doesn't seem
to happen with S3 sleep [2].

[1]
Documentation/core-api/timekeeping.rst:

.. c:function:: ktime_t ktime_get( void )

CLOCK_MONOTONIC

Useful for reliable timestamps and measuring short time intervals
accurately. Starts at system boot time but stops during suspend.

[2]
kernel/time/sched_clock.c:

int sched_clock_suspend(void)
{
struct clock_read_data *rd = &cd.read_data[0];

update_sched_clock();
hrtimer_cancel(&sched_clock_timer);
rd->read_sched_clock = suspended_sched_clock_read;

return 0;
}

void sched_clock_resume(void)
{
struct clock_read_data *rd = &cd.read_data[0];

rd->epoch_cyc = cd.actual_read_sched_clock();
hrtimer_start(&sched_clock_timer, cd.wrap_kt, HRTIMER_MODE_REL_HARD);
rd->read_sched_clock = cd.actual_read_sched_clock;
}

static struct syscore_ops sched_clock_ops = {
.suspend = sched_clock_suspend,
.resume = sched_clock_resume,
};

static int __init sched_clock_syscore_init(void)
{
register_syscore_ops(&sched_clock_ops);

return 0;
}
device_initcall(sched_clock_syscore_init);
>
> > >
> > > > 2. Suspend actually works on ARM64 machines even without proper
> > > > cpuidle (PSCI cpuidle) since they support wfi, so the assumption here is wrong
> > > > on such machines
> > > >
> > >
> > > Sorry I am bit confused here. Your point (2) contradicts the $subject.
> > drivers/cpuidle/cpuidle.c:
> >
> > bool cpuidle_not_available(struct cpuidle_driver *drv,
> > struct cpuidle_device *dev)
> > {
> > return off || !initialized || !drv || !dev || !dev->enabled;
> > }
> >
> > The cpuidle framework reports ARM64 devices without PSCI cpuidle as
> > "cpuidle not available" even when they support wfi, which causes suspend
> > to fail, which shouldn't be happening since they do support idling.
>
> Yes with just WFI, there will be no active cpuidle driver.
>
> [...]
>
> > > Again, since s2idle is userspace driven, I don't understand what do you
> > > mean by unbootable kernel in the context of s2idle.
> >
> > Sorry, I meant "attempts to fix this bug have all led to an unbootable
> > kernel."
>
> Again I assume you mean kernel hang or crash and nothing to do with boot.
> Once you enter s2i state with your changes/fix, it hangs or is unresponsive
> as it might have either failed to enter or resume from the state.
>
> --
> Regards,
> Sudeep
Thanks,
Kazuki
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
On Wed, Feb 08, 2023 at 08:20:31PM +0900, Kazuki wrote:
> On Wed, Feb 08, 2023 at 10:35:11AM +0000, Sudeep Holla wrote:
> > On Wed, Feb 08, 2023 at 04:48:18AM +0900, Kazuki wrote:
> > > On Mon, Feb 06, 2023 at 10:12:39AM +0000, Sudeep Holla wrote:
> > > >
> > > > What do you mean by break ? More details on the observation would be helpful.
> > > For example, CLOCK_MONOTONIC doesn't stop even after suspend since
> > > these chain of commands don't get called.
> > >
> > > call_cpuidle_s2idle->cpuidle_enter_s2idle->enter_s2idle_proper->tick_freeze->sched_clock_suspend (Function that pauses CLOCK_MONOTONIC)
> > >
> > > Which in turn causes programs like systemd to crash since it doesn't
> > > expect this.
> >
> > Yes expected IIUC. The per-cpu timers and counters continue to tick in
> > WFI and hence CLOCK_MONOTONIC can't stop.
> Yes, but it shouldn't be the case when suspending[1]. Currently that's what
> happens when we enter s2idle without a cpuidle driver. This doesn't seem
> to happen with S3 sleep [2].
>

Correct, but check the requirements to use syscore operations(mainly
syscore_suspend/resume where only one CPU is online with interrupts
disabled. In case of s2idle, all CPUs are idling and not offlined as
required by the syscore operations and hence it can't be used.

I was about ask you earlier as why can't you implement just system
suspend in PSCI where the last cpu just calls WFI if you are interested
in system sleep state. Or you can implement CPU_SUSPEND with an additional
retention state which enters PSCI implementation just to make sure there is
an active cpuidle driver and the s2idle state machinery works as expected.
It is built with those requirements and trying to work it out for WFI without
any idle driver or firmware implementation to back it up is just not going
to work.

--
Regards,
Sudeep
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
On Wed, Feb 08, 2023 at 02:16:58PM +0000, Sudeep Holla wrote:
> On Wed, Feb 08, 2023 at 08:20:31PM +0900, Kazuki wrote:
> > On Wed, Feb 08, 2023 at 10:35:11AM +0000, Sudeep Holla wrote:
> > > On Wed, Feb 08, 2023 at 04:48:18AM +0900, Kazuki wrote:
> > > > On Mon, Feb 06, 2023 at 10:12:39AM +0000, Sudeep Holla wrote:
> > > > >
> > > > > What do you mean by break ? More details on the observation would be helpful.
> > > > For example, CLOCK_MONOTONIC doesn't stop even after suspend since
> > > > these chain of commands don't get called.
> > > >
> > > > call_cpuidle_s2idle->cpuidle_enter_s2idle->enter_s2idle_proper->tick_freeze->sched_clock_suspend (Function that pauses CLOCK_MONOTONIC)
> > > >
> > > > Which in turn causes programs like systemd to crash since it doesn't
> > > > expect this.
> > >
> > > Yes expected IIUC. The per-cpu timers and counters continue to tick in
> > > WFI and hence CLOCK_MONOTONIC can't stop.
> > Yes, but it shouldn't be the case when suspending[1]. Currently that's what
> > happens when we enter s2idle without a cpuidle driver. This doesn't seem
> > to happen with S3 sleep [2].
> >
>
> Correct, but check the requirements to use syscore operations(mainly
> syscore_suspend/resume where only one CPU is online with interrupts
> disabled. In case of s2idle, all CPUs are idling and not offlined as
> required by the syscore operations and hence it can't be used.
>
> I was about ask you earlier as why can't you implement just system
> suspend in PSCI where the last cpu just calls WFI if you are interested
> in system sleep state. Or you can implement CPU_SUSPEND with an additional
> retention state which enters PSCI implementation just to make sure there is
> an active cpuidle driver and the s2idle state machinery works as expected.
The machine I have (Macbook with Apple M1) doesn't have PSCI.
> It is built with those requirements and trying to work it out for WFI without
> any idle driver or firmware implementation to back it up is just not going
> to work.
Any reason why that's the case?

I guess we should ensure that systems without a cpuidle driver
will not suspend maybe around here then.

kernel/power/main.c:

static ssize_t mem_sleep_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t n)
{
suspend_state_t state;
int error;

error = pm_autosleep_lock();
if (error)
return error;

if (pm_autosleep_state() > PM_SUSPEND_ON) {
error = -EBUSY;
goto out;
}

state = decode_suspend_state(buf, n);
if (state < PM_SUSPEND_MAX && state > PM_SUSPEND_ON)
mem_sleep_current = state;
else
error = -EINVAL;

out:
pm_autosleep_unlock();
return error ? error : n;
}
>
> --
> Regards,
> Sudeep
Thanks,
Kazuki
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
Sent the wrong bit of code, sorry. Correct code is:

static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t n)
{
suspend_state_t state;
int error;

error = pm_autosleep_lock();
if (error)
return error;

if (pm_autosleep_state() > PM_SUSPEND_ON) {
error = -EBUSY;
goto out;
}

state = decode_state(buf, n);
if (state < PM_SUSPEND_MAX) {
if (state == PM_SUSPEND_MEM)
state = mem_sleep_current;

error = pm_suspend(state);
} else if (state == PM_SUSPEND_MAX) {
error = hibernate();
} else {
error = -EINVAL;
}

out:
pm_autosleep_unlock();
return error ? error : n;
}

power_attr(state);
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
On Wed, Feb 08, 2023 at 11:43:27PM +0900, Kazuki wrote:
> On Wed, Feb 08, 2023 at 02:16:58PM +0000, Sudeep Holla wrote:

[...]

> > I was about ask you earlier as why can't you implement just system
> > suspend in PSCI where the last cpu just calls WFI if you are interested
> > in system sleep state. Or you can implement CPU_SUSPEND with an additional
> > retention state which enters PSCI implementation just to make sure there is
> > an active cpuidle driver and the s2idle state machinery works as expected.
>
> The machine I have (Macbook with Apple M1) doesn't have PSCI.

Well, if we are allowing to boot on such a system, then we must allow
adding a platform specific idle driver. It may be useful once we info
to add deeper than WFI states.

> I guess we should ensure that systems without a cpuidle driver
> will not suspend maybe around here then.
>

Are we ? I thought were making changes to enable it. Or are you saying
we allow to enter into such a state and render the system unusable, if
so we need to fix it.

--
Regards,
Sudeep
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
On Thu, Feb 09, 2023 at 12:19:39AM +0900, Kazuki wrote:
> On Wed, Feb 08, 2023 at 03:03:06PM +0000, Sudeep Holla wrote:

[...]

> > Well, if we are allowing to boot on such a system, then we must allow
> > adding a platform specific idle driver. It may be useful once we info
> > to add deeper than WFI states.
>
> Hmmm, I thought for arm64, non-PSCI idle drivers were prohibited? Or am
> I mistaken here?

I don't know. I am not against it especially now that we have allowed a
non-PSCI based production system to boot the kernel.

> > Are we ? I thought were making changes to enable it. Or are you saying
> > we allow to enter into such a state and render the system unusable, if
> > so we need to fix it.
>
> Both as I mentioned in my first email. Apologies if it turned out to
> be confusing.

Sorry still confusing. Are you saying you can enter s2idle and crash or
hang the system without changes(especially around this s2idle code) ?

If yes, then it is a bug. If the hang/crash is only after your changes,
we need to check.

--
Regards,
Sudeep
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
On Wed, Feb 08, 2023 at 03:03:06PM +0000, Sudeep Holla wrote:
> On Wed, Feb 08, 2023 at 11:43:27PM +0900, Kazuki wrote:
> > On Wed, Feb 08, 2023 at 02:16:58PM +0000, Sudeep Holla wrote:
>
> [...]
>
> > > I was about ask you earlier as why can't you implement just system
> > > suspend in PSCI where the last cpu just calls WFI if you are interested
> > > in system sleep state. Or you can implement CPU_SUSPEND with an additional
> > > retention state which enters PSCI implementation just to make sure there is
> > > an active cpuidle driver and the s2idle state machinery works as expected.
> >
> > The machine I have (Macbook with Apple M1) doesn't have PSCI.
>
> Well, if we are allowing to boot on such a system, then we must allow
> adding a platform specific idle driver. It may be useful once we info
> to add deeper than WFI states.
Hmmm, I thought for arm64, non-PSCI idle drivers were prohibited? Or am
I mistaken here?
>
> > I guess we should ensure that systems without a cpuidle driver
> > will not suspend maybe around here then.
> >
>
> Are we ? I thought were making changes to enable it. Or are you saying
> we allow to enter into such a state and render the system unusable, if
> so we need to fix it.
Both as I mentioned in my first email. Apologies if it turned out to
be confusing.

Thanks,
Kazuki
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
On 08/02/2023 19.35, Sudeep Holla wrote:
> On Wed, Feb 08, 2023 at 04:48:18AM +0900, Kazuki wrote:
>> On Mon, Feb 06, 2023 at 10:12:39AM +0000, Sudeep Holla wrote:
>>>
>>> What do you mean by break ? More details on the observation would be helpful.
>> For example, CLOCK_MONOTONIC doesn't stop even after suspend since
>> these chain of commands don't get called.
>>
>> call_cpuidle_s2idle->cpuidle_enter_s2idle->enter_s2idle_proper->tick_freeze->sched_clock_suspend (Function that pauses CLOCK_MONOTONIC)
>>
>> Which in turn causes programs like systemd to crash since it doesn't
>> expect this.
>
> Yes expected IIUC. The per-cpu timers and counters continue to tick in
> WFI and hence CLOCK_MONOTONIC can't stop.

The hardware counters would also keep ticking in "real" s2idle (with
hypothetical PSCI idle support) and often in full suspend. There is a
flag for this (CLOCK_SOURCE_SUSPEND_NONSTOP) that is set by default for
ARM. So this isn't why CLOCK_MONOTONIC isn't stopping; there is
machinery to make the kernel's view of time stop anyway, it's just not
being invoked here.

This is somewhat orthogonal to the issue of PSCI. These machines can't
physically support PSCI and KVM at the same time (they do not have EL3),
so PSCI is not an option. We will be starting a conversation about how
to provide something "like" PSCI over some other sort of transport to
solve this soon. So that will "fix" this issue once it's all implemented.

However, these machines aren't the only ones without PSCI (search for
"spin-table" in arch/arm64/boot/dts, this isn't new and these aren't the
first). It seems broken that Linux currently implements s2idle in such a
way that it violates the userspace clock behavior contract on systems
without a cpuidle driver (and does so silently, to make it worse). So
that should be fixed regardless of whether we end up coming up with a
PSCI alternative or not for these platforms. There's no fundamental
reason why s2idle can't work properly with plain WFI.

- Hector
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
On Wed, Feb 08, 2023 at 03:34:07PM +0000, Sudeep Holla wrote:
> On Thu, Feb 09, 2023 at 12:19:39AM +0900, Kazuki wrote:
> > On Wed, Feb 08, 2023 at 03:03:06PM +0000, Sudeep Holla wrote:
>
> [...]
>
> > > Well, if we are allowing to boot on such a system, then we must allow
> > > adding a platform specific idle driver. It may be useful once we info
> > > to add deeper than WFI states.
> >
> > Hmmm, I thought for arm64, non-PSCI idle drivers were prohibited? Or am
> > I mistaken here?
>
> I don't know. I am not against it especially now that we have allowed a
> non-PSCI based production system to boot the kernel.
>
> > > Are we ? I thought were making changes to enable it. Or are you saying
> > > we allow to enter into such a state and render the system unusable, if
> > > so we need to fix it.
> >
> > Both as I mentioned in my first email. Apologies if it turned out to
> > be confusing.
>
> Sorry still confusing. Are you saying you can enter s2idle and crash or
> hang the system without changes(especially around this s2idle code) ?
Yes.

Thanks,
Kazuki
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
On Thu, Feb 09, 2023 at 12:42:17AM +0900, Hector Martin wrote:
> On 08/02/2023 19.35, Sudeep Holla wrote:
> > On Wed, Feb 08, 2023 at 04:48:18AM +0900, Kazuki wrote:
> >> On Mon, Feb 06, 2023 at 10:12:39AM +0000, Sudeep Holla wrote:
> >>>
> >>> What do you mean by break ? More details on the observation would be helpful.
> >> For example, CLOCK_MONOTONIC doesn't stop even after suspend since
> >> these chain of commands don't get called.
> >>
> >> call_cpuidle_s2idle->cpuidle_enter_s2idle->enter_s2idle_proper->tick_freeze->sched_clock_suspend (Function that pauses CLOCK_MONOTONIC)
> >>
> >> Which in turn causes programs like systemd to crash since it doesn't
> >> expect this.
> >
> > Yes expected IIUC. The per-cpu timers and counters continue to tick in
> > WFI and hence CLOCK_MONOTONIC can't stop.
>
> The hardware counters would also keep ticking in "real" s2idle (with
> hypothetical PSCI idle support) and often in full suspend. There is a
> flag for this (CLOCK_SOURCE_SUSPEND_NONSTOP) that is set by default for
> ARM. So this isn't why CLOCK_MONOTONIC isn't stopping; there is
> machinery to make the kernel's view of time stop anyway, it's just not
> being invoked here.
>

Indeed, and I assumed s2idle was designed with those requirements but I
think I may be wrong especially looking at few points you have raised
provide my understanding is aligned with yours.

> This is somewhat orthogonal to the issue of PSCI. These machines can't
> physically support PSCI and KVM at the same time (they do not have EL3),
> so PSCI is not an option. We will be starting a conversation about how
> to provide something "like" PSCI over some other sort of transport to
> solve this soon. So that will "fix" this issue once it's all implemented.
>

All the best for the efforts.

> However, these machines aren't the only ones without PSCI (search for
> "spin-table" in arch/arm64/boot/dts, this isn't new and these aren't the
> first).

Yes I am aware of it and if you see arch/arm64/kernel/smp_spin_table.c
we don't support CPU hotplug or suspend for such a system.

> It seems broken that Linux currently implements s2idle in such a
> way that it violates the userspace clock behavior contract on systems
> without a cpuidle driver (and does so silently, to make it worse).

Just to check if I understand this correctly, are you referring to:
cpuidle_idle_call()->default_idle_call() if cpuidle_not_available()
And the problem is it idles there in wfi but CLOCK_MONOTONIC isn't
stopping as expected by the userspace ? If so, fair enough. If not, I
may be missing to understand something.

> So that should be fixed regardless of whether we end up coming up with a
> PSCI alternative or not for these platforms.

If above understanding is correct, I agree. But not sure what was the
motivation behind the current behaviour.

> There's no fundamental reason why s2idle can't work properly with plain WFI.
>

Fair enough. I hadn't thought much of it before as most of the platforms
I have seen or used have at-least one deeper than WFI state these days.
On arm32, this was common but each platform managed suspend_set_ops
on its own and probably can do the same s2idle_set_ops.

--
Regards,
Sudeep
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
On 09/02/2023 01.18, Sudeep Holla wrote:
> On Thu, Feb 09, 2023 at 12:42:17AM +0900, Hector Martin wrote:
>> On 08/02/2023 19.35, Sudeep Holla wrote:
>>> On Wed, Feb 08, 2023 at 04:48:18AM +0900, Kazuki wrote:
>>>> On Mon, Feb 06, 2023 at 10:12:39AM +0000, Sudeep Holla wrote:
>>>>>
>>>>> What do you mean by break ? More details on the observation would be helpful.
>>>> For example, CLOCK_MONOTONIC doesn't stop even after suspend since
>>>> these chain of commands don't get called.
>>>>
>>>> call_cpuidle_s2idle->cpuidle_enter_s2idle->enter_s2idle_proper->tick_freeze->sched_clock_suspend (Function that pauses CLOCK_MONOTONIC)
>>>>
>>>> Which in turn causes programs like systemd to crash since it doesn't
>>>> expect this.
>>>
>>> Yes expected IIUC. The per-cpu timers and counters continue to tick in
>>> WFI and hence CLOCK_MONOTONIC can't stop.
>>
>> The hardware counters would also keep ticking in "real" s2idle (with
>> hypothetical PSCI idle support) and often in full suspend. There is a
>> flag for this (CLOCK_SOURCE_SUSPEND_NONSTOP) that is set by default for
>> ARM. So this isn't why CLOCK_MONOTONIC isn't stopping; there is
>> machinery to make the kernel's view of time stop anyway, it's just not
>> being invoked here.
>>
>
> Indeed, and I assumed s2idle was designed with those requirements but I
> think I may be wrong especially looking at few points you have raised
> provide my understanding is aligned with yours.
>
>> This is somewhat orthogonal to the issue of PSCI. These machines can't
>> physically support PSCI and KVM at the same time (they do not have EL3),
>> so PSCI is not an option. We will be starting a conversation about how
>> to provide something "like" PSCI over some other sort of transport to
>> solve this soon. So that will "fix" this issue once it's all implemented.
>>
>
> All the best for the efforts.
>
>> However, these machines aren't the only ones without PSCI (search for
>> "spin-table" in arch/arm64/boot/dts, this isn't new and these aren't the
>> first).
>
> Yes I am aware of it and if you see arch/arm64/kernel/smp_spin_table.c
> we don't support CPU hotplug or suspend for such a system.

We certainly support s2idle, except it's kind of broken as stated. Try
it, it works :-)

I didn't do anything special to enable s2idle on Apple platforms other
than make sure random drivers weren't broken and there was at least one
driver capable of triggering a wakeup. I just compile with
CONFIG_SUSPEND and s2idle works. Except for the part where
CLOCK_MONOTONIC keeps running. So generic kernels on spin_table
platforms ought to expose (broken) s2idle by default already.

>> It seems broken that Linux currently implements s2idle in such a
>> way that it violates the userspace clock behavior contract on systems
>> without a cpuidle driver (and does so silently, to make it worse).
>
> Just to check if I understand this correctly, are you referring to:
> cpuidle_idle_call()->default_idle_call() if cpuidle_not_available()
> And the problem is it idles there in wfi but CLOCK_MONOTONIC isn't
> stopping as expected by the userspace ? If so, fair enough. If not, I
> may be missing to understand something.

Right. I'm not too certain on the details of exactly what suspend
machinery is running/supposed to, because this CLOCK_MONOTONIC issue was
a surprise to me when it came up. From my point of view s2idle "just
worked", it's only now that this has come up that we're realizing it's
winding up in a very different codepath to what would happen were
cpuidle/PSCI available. This was all silent from the user POV (it all
looks like it suspends/resumes normally as far as I can tell).

>> So that should be fixed regardless of whether we end up coming up with a
>> PSCI alternative or not for these platforms.
>
> If above understanding is correct, I agree. But not sure what was the
> motivation behind the current behaviour.
>
>> There's no fundamental reason why s2idle can't work properly with plain WFI.
>>
>
> Fair enough. I hadn't thought much of it before as most of the platforms
> I have seen or used have at-least one deeper than WFI state these days.
> On arm32, this was common but each platform managed suspend_set_ops
> on its own and probably can do the same s2idle_set_ops.

Yeah, we do have one deeper idle state (and we should figure out how to
implement a PSCI alternative to enable it soon, since in particular for
certain SoCs plain WFI is quite a power hog since it keeps all the core
clusters powered up and at least partially clocked). But since we don't
have that yet, we've been using WFI-only s2idle so users have *some*
suspend ability.

- Hector
Re: s2idle breaks on machines without cpuidle support [ In reply to ]
On 2/8/23 08:45, Hector Martin wrote:
> On 09/02/2023 01.18, Sudeep Holla wrote:
>> On Thu, Feb 09, 2023 at 12:42:17AM +0900, Hector Martin wrote:
>>> On 08/02/2023 19.35, Sudeep Holla wrote:
>>>> On Wed, Feb 08, 2023 at 04:48:18AM +0900, Kazuki wrote:
>>>>> On Mon, Feb 06, 2023 at 10:12:39AM +0000, Sudeep Holla wrote:
>>>>>>
>>>>>> What do you mean by break ? More details on the observation would be helpful.
>>>>> For example, CLOCK_MONOTONIC doesn't stop even after suspend since
>>>>> these chain of commands don't get called.
>>>>>
>>>>> call_cpuidle_s2idle->cpuidle_enter_s2idle->enter_s2idle_proper->tick_freeze->sched_clock_suspend (Function that pauses CLOCK_MONOTONIC)
>>>>>
>>>>> Which in turn causes programs like systemd to crash since it doesn't
>>>>> expect this.
>>>>
>>>> Yes expected IIUC. The per-cpu timers and counters continue to tick in
>>>> WFI and hence CLOCK_MONOTONIC can't stop.
>>>
>>> The hardware counters would also keep ticking in "real" s2idle (with
>>> hypothetical PSCI idle support) and often in full suspend. There is a
>>> flag for this (CLOCK_SOURCE_SUSPEND_NONSTOP) that is set by default for
>>> ARM. So this isn't why CLOCK_MONOTONIC isn't stopping; there is
>>> machinery to make the kernel's view of time stop anyway, it's just not
>>> being invoked here.
>>>
>>
>> Indeed, and I assumed s2idle was designed with those requirements but I
>> think I may be wrong especially looking at few points you have raised
>> provide my understanding is aligned with yours.
>>
>>> This is somewhat orthogonal to the issue of PSCI. These machines can't
>>> physically support PSCI and KVM at the same time (they do not have EL3),
>>> so PSCI is not an option. We will be starting a conversation about how
>>> to provide something "like" PSCI over some other sort of transport to
>>> solve this soon. So that will "fix" this issue once it's all implemented.
>>>
>>
>> All the best for the efforts.
>>
>>> However, these machines aren't the only ones without PSCI (search for
>>> "spin-table" in arch/arm64/boot/dts, this isn't new and these aren't the
>>> first).
>>
>> Yes I am aware of it and if you see arch/arm64/kernel/smp_spin_table.c
>> we don't support CPU hotplug or suspend for such a system.
>
> We certainly support s2idle, except it's kind of broken as stated. Try
> it, it works :-)
>
> I didn't do anything special to enable s2idle on Apple platforms other
> than make sure random drivers weren't broken and there was at least one
> driver capable of triggering a wakeup. I just compile with
> CONFIG_SUSPEND and s2idle works. Except for the part where
> CLOCK_MONOTONIC keeps running. So generic kernels on spin_table
> platforms ought to expose (broken) s2idle by default already.
>
>>> It seems broken that Linux currently implements s2idle in such a
>>> way that it violates the userspace clock behavior contract on systems
>>> without a cpuidle driver (and does so silently, to make it worse).
>>
>> Just to check if I understand this correctly, are you referring to:
>> cpuidle_idle_call()->default_idle_call() if cpuidle_not_available()
>> And the problem is it idles there in wfi but CLOCK_MONOTONIC isn't
>> stopping as expected by the userspace ? If so, fair enough. If not, I
>> may be missing to understand something.
>
> Right. I'm not too certain on the details of exactly what suspend
> machinery is running/supposed to, because this CLOCK_MONOTONIC issue was
> a surprise to me when it came up. From my point of view s2idle "just
> worked", it's only now that this has come up that we're realizing it's
> winding up in a very different codepath to what would happen were
> cpuidle/PSCI available. This was all silent from the user POV (it all
> looks like it suspends/resumes normally as far as I can tell).
>
>>> So that should be fixed regardless of whether we end up coming up with a
>>> PSCI alternative or not for these platforms.
>>
>> If above understanding is correct, I agree. But not sure what was the
>> motivation behind the current behaviour.
>>
>>> There's no fundamental reason why s2idle can't work properly with plain WFI.
>>>
>>
>> Fair enough. I hadn't thought much of it before as most of the platforms
>> I have seen or used have at-least one deeper than WFI state these days.
>> On arm32, this was common but each platform managed suspend_set_ops
>> on its own and probably can do the same s2idle_set_ops.
>
> Yeah, we do have one deeper idle state (and we should figure out how to
> implement a PSCI alternative to enable it soon, since in particular for
> certain SoCs plain WFI is quite a power hog since it keeps all the core
> clusters powered up and at least partially clocked). But since we don't
> have that yet, we've been using WFI-only s2idle so users have *some*
> suspend ability.

Same here. I would caution against assuming that a cpuidle driver is
always available. There are tons of systems out there that only support
WFI as their idle state and in that case there is no backing cpuidle
driver either, that is cpuidle_not_available() returns false. There were
also report on x86 where PSCI is not available:

https://github.com/systemd/systemd/issues/9538
https://bugzilla.kernel.org/show_bug.cgi?id=200595

Having recently been affected in the same way by this bug, it would be
nice if we could do something about it. I tried to do the following:

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 3f8c7867c14c..d021f6ceb352 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -90,9 +90,11 @@ void __cpuidle default_idle_call(void)
if (current_clr_polling_and_test()) {
local_irq_enable();
} else {
+ RCU_NONIDLE(tick_freeze());
stop_critical_timings();
arch_cpu_idle();
start_critical_timings();
+ RCU_NONIDLE(tick_unfreeze());
}
}

but this is not quite working and I am getting tons of warnings from
ktime_get_seconds() early on boot.

Is there a point at which it is safe to call tick_freeze() and
tick_unfreeze()? It is not obvious to me how cpuidle goes about not
hitting those same warnings, short of being initialized later.

Thanks!
--
Florian