Mailing List Archive

Regression in 3.1 causes Xen to use wrong idle routine
The following commit changes calls to pm_idle into first trying
cpuidle_call_idle() and if that returns non-zero to fall back to
call pm_idle().

commit a0bfa1373859e9d11dc92561a8667588803e42d8
Author: Len Brown <len.brown@intel.com>
Date: Fri Apr 1 19:34:59 2011 -0400

cpuidle: stop depending on pm_idle

However cpuidle_call_idle() will return -ENODEV if it is supposed to be disabled
by cpuidle.off. Which then causes pm_idle() to be called.

This has some bad interaction with the following change that tries to
make use of disabling cpuidle in Xen to fall back to hlt.

commit d91ee5863b71e8c90eaf6035bff3078a85e2e7b5
Author: Len Brown <len.brown@intel.com>
Date: Fri Apr 1 18:28:35 2011 -0400

cpuidle: replace xen access to x86 pm_idle and default_idle

The problem I see is that select_idle_routine() is called from
arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle
anymore, it can cause mwait_idle or amd_e400_idle functions to be selected.
In testing it seem amd_e400_idle in PVM domU at least does not immediately cause
problems, but mwait_idle just causes crashes. From the reports I have
this may be related to older hypervisors (3.1 and older) not clearing the mwait
capability. But overall there seems something wrong in the interaction.

I am not really sure whether the logic of calling pm_idle() on all errors from
cpuidle_call_idle() is already flawed or the assumption in the Xen patch about
being able to prevent the wrong idle function by turning cpuidle off is incorrect.
One quick fix could be to add some Xen case into select_idle_routine() which
picks default_idle...

-Stefan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
On Wed, Oct 26, 2011 at 12:24:17PM +0200, Stefan Bader wrote:
> The following commit changes calls to pm_idle into first trying
> cpuidle_call_idle() and if that returns non-zero to fall back to
> call pm_idle().
>
> commit a0bfa1373859e9d11dc92561a8667588803e42d8
> Author: Len Brown <len.brown@intel.com>
> Date: Fri Apr 1 19:34:59 2011 -0400
>
> cpuidle: stop depending on pm_idle
>
> However cpuidle_call_idle() will return -ENODEV if it is supposed to be disabled
> by cpuidle.off. Which then causes pm_idle() to be called.
>
> This has some bad interaction with the following change that tries to
> make use of disabling cpuidle in Xen to fall back to hlt.
>
> commit d91ee5863b71e8c90eaf6035bff3078a85e2e7b5
> Author: Len Brown <len.brown@intel.com>
> Date: Fri Apr 1 18:28:35 2011 -0400
>
> cpuidle: replace xen access to x86 pm_idle and default_idle
>
> The problem I see is that select_idle_routine() is called from
> arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle
> anymore, it can cause mwait_idle or amd_e400_idle functions to be selected.

Right, b/c that is what d91ee5863b71e8c90eaf6035bff3078a85e2e7b5 was suppose
to do - " xen scribble on pm_idle and access default_idle,
have it simply disable_cpuidle() so acpi_idle will not load and
architecture default HLT will be used."

But it seems that select_idle_routine() was not thought off.

> In testing it seem amd_e400_idle in PVM domU at least does not immediately cause
> problems, but mwait_idle just causes crashes. From the reports I have
> this may be related to older hypervisors (3.1 and older) not clearing the mwait
> capability. But overall there seems something wrong in the interaction.
>
> I am not really sure whether the logic of calling pm_idle() on all errors from
> cpuidle_call_idle() is already flawed or the assumption in the Xen patch about
> being able to prevent the wrong idle function by turning cpuidle off is incorrect.
> One quick fix could be to add some Xen case into select_idle_routine() which
> picks default_idle...

What about using the cpuidle_disabled() functionality and adhere to that?
As so:

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index e7e3b01..1f7f8c8 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -14,6 +14,7 @@
#include <linux/utsname.h>
#include <trace/events/power.h>
#include <linux/hw_breakpoint.h>
+#include <linux/cpuidle.h>
#include <asm/cpu.h>
#include <asm/system.h>
#include <asm/apic.h>
@@ -587,6 +588,10 @@ void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
if (pm_idle)
return;

+ if (cpuidle_disabled()) {
+ pm_idle = default_idle;
+ return;
+ }
if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
/*
* One CPU supports mwait => All CPUs supports mwait
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index b51629e..123fe9e 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -122,6 +122,7 @@ struct cpuidle_driver {
};

#ifdef CONFIG_CPU_IDLE
+extern int cpuidle_disabled(void);
extern void disable_cpuidle(void);
extern int cpuidle_idle_call(void);

@@ -137,6 +138,7 @@ extern int cpuidle_enable_device(struct cpuidle_device *dev);
extern void cpuidle_disable_device(struct cpuidle_device *dev);

#else
+static inline int cpuidle_disabled(void) { return 1; }
static inline void disable_cpuidle(void) { }
static inline int cpuidle_idle_call(void) { return -ENODEV; }


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
On 26.10.2011 15:30, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 26, 2011 at 12:24:17PM +0200, Stefan Bader wrote:
>> The following commit changes calls to pm_idle into first trying
>> cpuidle_call_idle() and if that returns non-zero to fall back to
>> call pm_idle().
>>
>> commit a0bfa1373859e9d11dc92561a8667588803e42d8
>> Author: Len Brown <len.brown@intel.com>
>> Date: Fri Apr 1 19:34:59 2011 -0400
>>
>> cpuidle: stop depending on pm_idle
>>
>> However cpuidle_call_idle() will return -ENODEV if it is supposed to be disabled
>> by cpuidle.off. Which then causes pm_idle() to be called.
>>
>> This has some bad interaction with the following change that tries to
>> make use of disabling cpuidle in Xen to fall back to hlt.
>>
>> commit d91ee5863b71e8c90eaf6035bff3078a85e2e7b5
>> Author: Len Brown <len.brown@intel.com>
>> Date: Fri Apr 1 18:28:35 2011 -0400
>>
>> cpuidle: replace xen access to x86 pm_idle and default_idle
>>
>> The problem I see is that select_idle_routine() is called from
>> arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle
>> anymore, it can cause mwait_idle or amd_e400_idle functions to be selected.
>
> Right, b/c that is what d91ee5863b71e8c90eaf6035bff3078a85e2e7b5 was suppose
> to do - " xen scribble on pm_idle and access default_idle,
> have it simply disable_cpuidle() so acpi_idle will not load and
> architecture default HLT will be used."
>
> But it seems that select_idle_routine() was not thought off.
>
>> In testing it seem amd_e400_idle in PVM domU at least does not immediately cause
>> problems, but mwait_idle just causes crashes. From the reports I have
>> this may be related to older hypervisors (3.1 and older) not clearing the mwait
>> capability. But overall there seems something wrong in the interaction.
>>
>> I am not really sure whether the logic of calling pm_idle() on all errors from
>> cpuidle_call_idle() is already flawed or the assumption in the Xen patch about
>> being able to prevent the wrong idle function by turning cpuidle off is incorrect.
>> One quick fix could be to add some Xen case into select_idle_routine() which
>> picks default_idle...
>
> What about using the cpuidle_disabled() functionality and adhere to that?
> As so:
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index e7e3b01..1f7f8c8 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -14,6 +14,7 @@
> #include <linux/utsname.h>
> #include <trace/events/power.h>
> #include <linux/hw_breakpoint.h>
> +#include <linux/cpuidle.h>
> #include <asm/cpu.h>
> #include <asm/system.h>
> #include <asm/apic.h>
> @@ -587,6 +588,10 @@ void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
> if (pm_idle)
> return;
>
> + if (cpuidle_disabled()) {
> + pm_idle = default_idle;
> + return;
> + }
> if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
> /*
> * One CPU supports mwait => All CPUs supports mwait
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index b51629e..123fe9e 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -122,6 +122,7 @@ struct cpuidle_driver {
> };
>
> #ifdef CONFIG_CPU_IDLE
> +extern int cpuidle_disabled(void);
> extern void disable_cpuidle(void);
> extern int cpuidle_idle_call(void);
>
> @@ -137,6 +138,7 @@ extern int cpuidle_enable_device(struct cpuidle_device *dev);
> extern void cpuidle_disable_device(struct cpuidle_device *dev);
>
> #else
> +static inline int cpuidle_disabled(void) { return 1; }
> static inline void disable_cpuidle(void) { }
> static inline int cpuidle_idle_call(void) { return -ENODEV; }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

>From reading over it, this should work. Though I would be interested to hear
from the linux-acpi folks. Also to double check that calling pm_idle when
cpuidle.off was specified really is what is intended.

-Stefan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
> > What about using the cpuidle_disabled() functionality and adhere to that?
> > As so:
> >
.. snip..
>
> >From reading over it, this should work. Though I would be interested to hear
> from the linux-acpi folks. Also to double check that calling pm_idle when
> cpuidle.off was specified really is what is intended.

Oh yeah, definitly need the input from linux-acpi folks. And also to be actually
tested :-)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
On 26.10.2011 15:48, Konrad Rzeszutek Wilk wrote:
>>> What about using the cpuidle_disabled() functionality and adhere to that?
>>> As so:
>>>
> .. snip..
>>
>> >From reading over it, this should work. Though I would be interested to hear
>> from the linux-acpi folks. Also to double check that calling pm_idle when
>> cpuidle.off was specified really is what is intended.
>
> Oh yeah, definitly need the input from linux-acpi folks. And also to be actually
> tested :-)

I can volunteer to do the testing. But I am lazy enough to hold back a bit as
someone may tell us this is completely the wrong way to fix it. :)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
On Wed, Oct 26, 2011 at 03:57:15PM +0200, Stefan Bader wrote:
> On 26.10.2011 15:48, Konrad Rzeszutek Wilk wrote:
> >>> What about using the cpuidle_disabled() functionality and adhere to that?
> >>> As so:
> >>>
> > .. snip..
> >>
> >> >From reading over it, this should work. Though I would be interested to hear
> >> from the linux-acpi folks. Also to double check that calling pm_idle when
> >> cpuidle.off was specified really is what is intended.
> >
> > Oh yeah, definitly need the input from linux-acpi folks. And also to be actually
> > tested :-)
>
> I can volunteer to do the testing. But I am lazy enough to hold back a bit as
> someone may tell us this is completely the wrong way to fix it. :)

So the other option is to use 'idle=halt' on the Linux command line. That should
provide the workaround for the folks reporting this (is there a BZ for it?).

At least until a good solution is hammered out.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
On 09.11.2011 18:58, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 26, 2011 at 03:57:15PM +0200, Stefan Bader wrote:
>> On 26.10.2011 15:48, Konrad Rzeszutek Wilk wrote:
>>>>> What about using the cpuidle_disabled() functionality and adhere to that?
>>>>> As so:
>>>>>
>>> .. snip..
>>>>
>>>> >From reading over it, this should work. Though I would be interested to hear
>>>> from the linux-acpi folks. Also to double check that calling pm_idle when
>>>> cpuidle.off was specified really is what is intended.
>>>
>>> Oh yeah, definitly need the input from linux-acpi folks. And also to be actually
>>> tested :-)
>>
>> I can volunteer to do the testing. But I am lazy enough to hold back a bit as
>> someone may tell us this is completely the wrong way to fix it. :)
>
> So the other option is to use 'idle=halt' on the Linux command line. That should
> provide the workaround for the folks reporting this (is there a BZ for it?).
>
Believe upstream bz is still down. Got the issue reported here, though:

http://bugs.launchpad.net/bugs/881076

Hm, as a workaround probably. I guess it could be added when the test images are
done. Unfortunately, when running your image in the cloud it is kind of hard to
recover. Even more as you could have tested on an AMD based host.

Wonder whether it would be an option to automatically mask the mwait capability
off when running in paravirt mode...

-Stefan

> At least until a good solution is hammered out.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
On Thu, Nov 10, 2011 at 09:33:06AM +0100, Stefan Bader wrote:
> On 09.11.2011 18:58, Konrad Rzeszutek Wilk wrote:
> > On Wed, Oct 26, 2011 at 03:57:15PM +0200, Stefan Bader wrote:
> >> On 26.10.2011 15:48, Konrad Rzeszutek Wilk wrote:
> >>>>> What about using the cpuidle_disabled() functionality and adhere to that?
> >>>>> As so:
> >>>>>
> >>> .. snip..
> >>>>
> >>>> >From reading over it, this should work. Though I would be interested to hear
> >>>> from the linux-acpi folks. Also to double check that calling pm_idle when
> >>>> cpuidle.off was specified really is what is intended.
> >>>
> >>> Oh yeah, definitly need the input from linux-acpi folks. And also to be actually
> >>> tested :-)
> >>
> >> I can volunteer to do the testing. But I am lazy enough to hold back a bit as
> >> someone may tell us this is completely the wrong way to fix it. :)
> >
> > So the other option is to use 'idle=halt' on the Linux command line. That should
> > provide the workaround for the folks reporting this (is there a BZ for it?).
> >
> Believe upstream bz is still down. Got the issue reported here, though:
>
> http://bugs.launchpad.net/bugs/881076

Ok, added that to the patch description.
>
> Hm, as a workaround probably. I guess it could be added when the test images are
> done. Unfortunately, when running your image in the cloud it is kind of hard to
> recover. Even more as you could have tested on an AMD based host.

Sadly my AMD boxes did not explode when I tested Len's patches :-(.
>
> Wonder whether it would be an option to automatically mask the mwait capability
> off when running in paravirt mode...

We could, but why not strive to do it correctly first.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
On 10.11.2011 15:55, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 10, 2011 at 09:33:06AM +0100, Stefan Bader wrote:
>> On 09.11.2011 18:58, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Oct 26, 2011 at 03:57:15PM +0200, Stefan Bader wrote:
>>>> On 26.10.2011 15:48, Konrad Rzeszutek Wilk wrote:
>>>>>>> What about using the cpuidle_disabled() functionality and adhere to that?
>>>>>>> As so:
>>>>>>>
>>>>> .. snip..
>>>>>>
>>>>>> >From reading over it, this should work. Though I would be interested to hear
>>>>>> from the linux-acpi folks. Also to double check that calling pm_idle when
>>>>>> cpuidle.off was specified really is what is intended.
>>>>>
>>>>> Oh yeah, definitly need the input from linux-acpi folks. And also to be actually
>>>>> tested :-)
>>>>
>>>> I can volunteer to do the testing. But I am lazy enough to hold back a bit as
>>>> someone may tell us this is completely the wrong way to fix it. :)
>>>
>>> So the other option is to use 'idle=halt' on the Linux command line. That should
>>> provide the workaround for the folks reporting this (is there a BZ for it?).
>>>
>> Believe upstream bz is still down. Got the issue reported here, though:
>>
>> http://bugs.launchpad.net/bugs/881076
>
> Ok, added that to the patch description.
>>
>> Hm, as a workaround probably. I guess it could be added when the test images are
>> done. Unfortunately, when running your image in the cloud it is kind of hard to
>> recover. Even more as you could have tested on an AMD based host.
>

> Sadly my AMD boxes did not explode when I tested Len's patches :-(.

No, to my knowledge AMD CPUs do not have mwait. So the only alternate idle to
happen would be the e400 aware one (which is basically default_idle with
conditionally programming a broadcast timer interrupt to get out of c1e). I saw
this happen on my box, too. And there was no visible badness caused by it.

>>
>> Wonder whether it would be an option to automatically mask the mwait capability
>> off when running in paravirt mode...
>

> We could, but why not strive to do it correctly first.

Just seemed (from looking at those replies about wanting mwait idle even when
cpuidle is disabled) that there is a slight controversy about what correctly is.
I personally would also think that a state of cpuidle disabled means no special
idle function. But that does not have to be the correct interpretation.

So I was reasoning that if the mwait function gets selected because the caps say
its available, but it does not work in paravirt, then maybe claiming its
availability is wrong as well.

Of course there is also the initial question about calling a generic function
and if that fails call the idle function hook directly. Which, because having
cpuidle disabled is returned as an error, ended up calling the hook directly
again. I really have trouble understanding the reasoning there...

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
On Wed, Oct 26, 2011 at 6:24 AM, Stefan Bader
<stefan.bader@canonical.com> wrote:
> The following commit changes calls to pm_idle into first trying
> cpuidle_call_idle() and if that returns non-zero to fall back to
> call pm_idle().
>
> commit a0bfa1373859e9d11dc92561a8667588803e42d8
> Author: Len Brown <len.brown@intel.com>
> Date:   Fri Apr 1 19:34:59 2011 -0400
>
>    cpuidle: stop depending on pm_idle
>
> However cpuidle_call_idle() will return -ENODEV if it is supposed to be disabled
> by cpuidle.off. Which then causes pm_idle() to be called.
>
> This has some bad interaction with the following change that tries to
> make use of disabling cpuidle in Xen to fall back to hlt.
>
> commit d91ee5863b71e8c90eaf6035bff3078a85e2e7b5
> Author: Len Brown <len.brown@intel.com>
> Date:   Fri Apr 1 18:28:35 2011 -0400
>
>    cpuidle: replace xen access to x86 pm_idle and default_idle
>
> The problem I see is that select_idle_routine() is called from
> arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle
> anymore, it can cause mwait_idle or amd_e400_idle functions to be selected.
> In testing it seem amd_e400_idle in PVM domU at least does not immediately cause
> problems, but mwait_idle just causes crashes. From the reports I have
> this may be related to older hypervisors (3.1 and older) not clearing the mwait
> capability. But overall there seems something wrong in the interaction.

Why is Xen advertising X86_FEATURE_MWAIT and then crashing
when the dom0 (or other guests) use what it advertises?

What versions of Xen have this bug?

> I am not really sure whether the logic of calling pm_idle() on all errors from
> cpuidle_call_idle() is already flawed or the assumption in the Xen patch about
> being able to prevent the wrong idle function by turning cpuidle off is incorrect.

The patches above appear to be operating as intended.
What wasn't expected, was that some version of Xen is deployed that
advertises the MWAIT feature, but crashes when it is used.

> One quick fix could be to add some Xen case into select_idle_routine() which
> picks default_idle...

No.

Working around this Xen bug for a newly compiled Dom0 is insufficient.

All guests that also look for MWAIT support w/o asking ACPI
(ie. all versions of Linux that use intel_idle, such as the last few
Fedora's, RHEL, SLES etc.)
will trip over the same Xen bug, even if Dom0 doesn't.

Xen must not advertises MWAIT support if it doesn't have MWAIT support.

thanks,
Len Brown, Intel Open Source Technology Center

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
On 13/11/2011 03:46, "Len Brown" <lenb@kernel.org> wrote:

>> The problem I see is that select_idle_routine() is called from
>> arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle
>> anymore, it can cause mwait_idle or amd_e400_idle functions to be selected.
>> In testing it seem amd_e400_idle in PVM domU at least does not immediately
>> cause
>> problems, but mwait_idle just causes crashes. From the reports I have
>> this may be related to older hypervisors (3.1 and older) not clearing the
>> mwait
>> capability. But overall there seems something wrong in the interaction.
>
> Why is Xen advertising X86_FEATURE_MWAIT and then crashing
> when the dom0 (or other guests) use what it advertises?
>
> What versions of Xen have this bug?

Xen doesn't advertise MWAIT. Possibly Xen-pv_ops is lying to the rest of the
kernel via the cpuid pv_ops hook. This would probably be because Xen is
relying on the OSPM in dom0 kernel to parse out Cx/Px info which Xen itself
*can* use.

-- Keir



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
Hey Len,

> > The problem I see is that select_idle_routine() is called from
> > arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle
> > anymore, it can cause mwait_idle or amd_e400_idle functions to be selected.
> > In testing it seem amd_e400_idle in PVM domU at least does not immediately cause
> > problems, but mwait_idle just causes crashes. From the reports I have
> > this may be related to older hypervisors (3.1 and older) not clearing the mwait
> > capability. But overall there seems something wrong in the interaction.
>
> Why is Xen advertising X86_FEATURE_MWAIT and then crashing
> when the dom0 (or other guests) use what it advertises?

The only case where I've seen this is with Amazon EC2. The other
newer hypervisors (4.1.1 and such) do not trigger this.

>
> What versions of Xen have this bug?

Whatever Amazon is using. I think they are RHEL5 based hypervisor.

>
> > I am not really sure whether the logic of calling pm_idle() on all errors from
> > cpuidle_call_idle() is already flawed or the assumption in the Xen patch about
> > being able to prevent the wrong idle function by turning cpuidle off is incorrect.
>
> The patches above appear to be operating as intended.
> What wasn't expected, was that some version of Xen is deployed that
> advertises the MWAIT feature, but crashes when it is used.

How does that work with AMD? On those machines it ends up calling
amd_e400_idle instead of the default_idle. Granted it does not "BUG" out
but it does lead to extra trap-n-emulate (the MSR operation) in to the hypervisor
which is not good.

>
> > One quick fix could be to add some Xen case into select_idle_routine() which
> > picks default_idle...
>
> No.
>
> Working around this Xen bug for a newly compiled Dom0 is insufficient.
>
> All guests that also look for MWAIT support w/o asking ACPI
> (ie. all versions of Linux that use intel_idle, such as the last few
> Fedora's, RHEL, SLES etc.)
> will trip over the same Xen bug, even if Dom0 doesn't.
>
> Xen must not advertises MWAIT support if it doesn't have MWAIT support.

How does work out when we figure MWAIT support from the CPUID?
Or are you saying that it is correct - if the CPU advertises it, then yes
advertise it to the Linux kernel?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
On Sun, Nov 13, 2011 at 04:59:10PM +0000, Keir Fraser wrote:
> On 13/11/2011 03:46, "Len Brown" <lenb@kernel.org> wrote:
>
> >> The problem I see is that select_idle_routine() is called from
> >> arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle
> >> anymore, it can cause mwait_idle or amd_e400_idle functions to be selected.
> >> In testing it seem amd_e400_idle in PVM domU at least does not immediately
> >> cause
> >> problems, but mwait_idle just causes crashes. From the reports I have
> >> this may be related to older hypervisors (3.1 and older) not clearing the
> >> mwait
> >> capability. But overall there seems something wrong in the interaction.
> >
> > Why is Xen advertising X86_FEATURE_MWAIT and then crashing
> > when the dom0 (or other guests) use what it advertises?
> >
> > What versions of Xen have this bug?
>
> Xen doesn't advertise MWAIT. Possibly Xen-pv_ops is lying to the rest of the
> kernel via the cpuid pv_ops hook. This would probably be because Xen is

I can't seem to find anything in there advertising the MWAIT feature.

> relying on the OSPM in dom0 kernel to parse out Cx/Px info which Xen itself
> *can* use.

The Cx/Px patches that would parse the Cx/Px and then percolate those up
to the hypervisor are in mainline. So that is not it.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
On Mon, Nov 14, 2011 at 09:31:24AM -0500, Konrad Rzeszutek Wilk wrote:
> Hey Len,
>
> > > The problem I see is that select_idle_routine() is called from
> > > arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle
> > > anymore, it can cause mwait_idle or amd_e400_idle functions to be selected.
> > > In testing it seem amd_e400_idle in PVM domU at least does not immediately cause
> > > problems, but mwait_idle just causes crashes. From the reports I have
> > > this may be related to older hypervisors (3.1 and older) not clearing the mwait
> > > capability. But overall there seems something wrong in the interaction.
> >
> > Why is Xen advertising X86_FEATURE_MWAIT and then crashing
> > when the dom0 (or other guests) use what it advertises?
>
> The only case where I've seen this is with Amazon EC2. The other
> newer hypervisors (4.1.1 and such) do not trigger this.
>
> >
> > What versions of Xen have this bug?
>
> Whatever Amazon is using. I think they are RHEL5 based hypervisor.

Vincent,

Would you have ideas of what might be happening? It looks as if some Amazon
instances are advertisting the MWAIT flag but not really supporting it. Any ideas
of what might be going on?

>
> >
> > > I am not really sure whether the logic of calling pm_idle() on all errors from
> > > cpuidle_call_idle() is already flawed or the assumption in the Xen patch about
> > > being able to prevent the wrong idle function by turning cpuidle off is incorrect.
> >
> > The patches above appear to be operating as intended.
> > What wasn't expected, was that some version of Xen is deployed that
> > advertises the MWAIT feature, but crashes when it is used.
>
> How does that work with AMD? On those machines it ends up calling
> amd_e400_idle instead of the default_idle. Granted it does not "BUG" out
> but it does lead to extra trap-n-emulate (the MSR operation) in to the hypervisor
> which is not good.

.. snip..
> > Working around this Xen bug for a newly compiled Dom0 is insufficient.

It looks to affect PV guests as well.

> >
> > All guests that also look for MWAIT support w/o asking ACPI
> > (ie. all versions of Linux that use intel_idle, such as the last few
> > Fedora's, RHEL, SLES etc.)
> > will trip over the same Xen bug, even if Dom0 doesn't.

Don't think I answered this one:

The intel_idle has not been used in those cases b/c the default_idle
had been used instead.

> >
> > Xen must not advertises MWAIT support if it doesn't have MWAIT support.
>
> How does work out when we figure MWAIT support from the CPUID?
> Or are you saying that it is correct - if the CPU advertises it, then yes
> advertise it to the Linux kernel?
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: Re: Regression in 3.1 causes Xen to use wrong idle routine [ In reply to ]
On Mon, Nov 14, 2011 at 01:19:01PM -0500, Konrad Rzeszutek Wilk wrote:
> On Sun, Nov 13, 2011 at 04:59:10PM +0000, Keir Fraser wrote:
> > On 13/11/2011 03:46, "Len Brown" <lenb@kernel.org> wrote:
> >
> > >> The problem I see is that select_idle_routine() is called from
> > >> arch/x86/kernel/cpu/common.c and since Xen setup does not set pm_idle
> > >> anymore, it can cause mwait_idle or amd_e400_idle functions to be selected.
> > >> In testing it seem amd_e400_idle in PVM domU at least does not immediately
> > >> cause
> > >> problems, but mwait_idle just causes crashes. From the reports I have
> > >> this may be related to older hypervisors (3.1 and older) not clearing the
> > >> mwait
> > >> capability. But overall there seems something wrong in the interaction.
> > >
> > > Why is Xen advertising X86_FEATURE_MWAIT and then crashing
> > > when the dom0 (or other guests) use what it advertises?
> > >
> > > What versions of Xen have this bug?
> >
> > Xen doesn't advertise MWAIT. Possibly Xen-pv_ops is lying to the rest of the
> > kernel via the cpuid pv_ops hook. This would probably be because Xen is
>
> I can't seem to find anything in there advertising the MWAIT feature.
>
> > relying on the OSPM in dom0 kernel to parse out Cx/Px info which Xen itself
> > *can* use.
>
> The Cx/Px patches that would parse the Cx/Px and then percolate those up
> to the hypervisor are in mainline. So that is not it.

<sigh> I meant to say "are _not_ in mainline". Sorry for the confusion.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel