Mailing List Archive

[patch] Initialize xen_vcpu0 before initialize irq_ops
Hello,

I find a strange behavior. When a machine is slow (or with many debug
traces or a qemu vm),
a interrupt can occur between the pv_irq_ops initialization and the xen_vcpu[0]
initialization. This lead to a problem because some operations in
xen_irq_ops use
xen_vcpu.
I send you a patch to fix that but I'm not quite sure that is the
right solution.

Regards,
Anthoine

From ac683ad8264f83fa0a5d743e18c0422e43e871d2 Mon Sep 17 00:00:00 2001
From: Anthoine Bourgeois <anthoine.bourgeois@gmail.com>
Date: Wed, 23 Nov 2011 19:23:42 +0100
Subject: [PATCH] Initialize xen_vcpu0 before initialize irq_ops.

xen_vcpu is used by
xen_irq_ops.xen_{save_fl,restore_fl,irq_disable,irq_enable} and should
be initialized before xen_init_irq_ops that initialize the pv_irq_ops
with it. If not, a call to those functions could lead to a deference of
NULL pointer. This behaviour was find with a slow machine or qemu.
---
arch/x86/xen/enlighten.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 2d69617..a8111a0 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1166,6 +1166,10 @@ asmlinkage void __init xen_start_kernel(void)
*/
xen_setup_stackprotector();

+ /* Don't do the full vcpu_info placement stuff until we have a
+ possible map and a non-dummy shared_info. */
+ per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
+
xen_init_irq_ops();
xen_init_cpuid_mask();

@@ -1207,9 +1211,6 @@ asmlinkage void __init xen_start_kernel(void)
__supported_pte_mask &= ~(_PAGE_PWT | _PAGE_PCD);

__supported_pte_mask |= _PAGE_IOMAP;
- /* Don't do the full vcpu_info placement stuff until we have a
- possible map and a non-dummy shared_info. */
- per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];

local_irq_disable();
early_boot_irqs_disabled = true;
--
1.7.3.4
Re: [patch] Initialize xen_vcpu0 before initialize irq_ops [ In reply to ]
CC'ing Xen Liunx maintainers, please consult MAINTAINERS or
use ./scripts/get_maintainer.pl.

On Wed, 2011-11-23 at 18:56 +0000, Anthoine Bourgeois wrote:
> Hello,
>
> I find a strange behavior. When a machine is slow (or with many debug
> traces or a qemu vm),
> a interrupt can occur between the pv_irq_ops initialization and the xen_vcpu[0]
> initialization. This lead to a problem because some operations in
> xen_irq_ops use
> xen_vcpu.
> I send you a patch to fix that but I'm not quite sure that is the
> right solution.
>
> Regards,
> Anthoine
>
> From ac683ad8264f83fa0a5d743e18c0422e43e871d2 Mon Sep 17 00:00:00 2001
> From: Anthoine Bourgeois <anthoine.bourgeois@gmail.com>
> Date: Wed, 23 Nov 2011 19:23:42 +0100
> Subject: [PATCH] Initialize xen_vcpu0 before initialize irq_ops.
>
> xen_vcpu is used by
> xen_irq_ops.xen_{save_fl,restore_fl,irq_disable,irq_enable} and should
> be initialized before xen_init_irq_ops that initialize the pv_irq_ops
> with it. If not, a call to those functions could lead to a deference of
> NULL pointer. This behaviour was find with a slow machine or qemu.

Can you provide an example of the stack trace which leads to this
please.

> ---
> arch/x86/xen/enlighten.c | 7 ++++---
> 1 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 2d69617..a8111a0 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1166,6 +1166,10 @@ asmlinkage void __init xen_start_kernel(void)
> */
> xen_setup_stackprotector();
>
> + /* Don't do the full vcpu_info placement stuff until we have a
> + possible map and a non-dummy shared_info. */
> + per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
> +
> xen_init_irq_ops();
> xen_init_cpuid_mask();
>
> @@ -1207,9 +1211,6 @@ asmlinkage void __init xen_start_kernel(void)
> __supported_pte_mask &= ~(_PAGE_PWT | _PAGE_PCD);
>
> __supported_pte_mask |= _PAGE_IOMAP;
> - /* Don't do the full vcpu_info placement stuff until we have a
> - possible map and a non-dummy shared_info. */
> - per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
>
> local_irq_disable();
> early_boot_irqs_disabled = true;

This local_irq_disable is interesting. Aren't IRQs supposed to already
be disabled from entry to xen_start_kernel (really, since start of time)
until at least this point?

Enabling (or disabling) interrupts would require both xen_init_irq_ops()
and xen_vcpu[0] to be setup, so it seems that either interrupts are not
disabled at start of day (I'm fairly sure they are) or there is some
magic code somewhere which does it directly without using the generic
infrastructure (I can't find anything like that).

So where do interrupts get enabled? Is before xen_init_irq_ops really
early enough? I can't find anything between the old and new locations of
this setup code which looks like it would enable them. It is possible
that you just win the race on your slow systems now but that the window
is still there.

Wherever this init goes I expect we would want to pull the
local_irq_disable and early_boot_irqs_disabled stuff along with it.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: [patch] Initialize xen_vcpu0 before initialize irq_ops [ In reply to ]
2011/11/24 Ian Campbell <Ian.Campbell@citrix.com>:
> CC'ing Xen Liunx maintainers, please consult MAINTAINERS or
> use ./scripts/get_maintainer.pl.
>
> This local_irq_disable is interesting. Aren't IRQs supposed to already
> be disabled from entry to xen_start_kernel (really, since start of time)
> until at least this point?
>
> Enabling (or disabling) interrupts would require both xen_init_irq_ops()
> and xen_vcpu[0] to be setup, so it seems that either interrupts are not
> disabled at start of day (I'm fairly sure they are) or there is some
> magic code somewhere which does it directly without using the generic
> infrastructure (I can't find anything like that).
>
> So where do interrupts get enabled? Is before xen_init_irq_ops really
> early enough? I can't find anything between the old and new locations of
> this setup code which looks like it would enable them. It is possible
> that you just win the race on your slow systems now but that the window
> is still there.

Hum, you're right, there is something strange here.
I don't know why interrupts are enabled. I investigate and come back to
you later. Looks like my bug is elsewhere.
I'll CC Xen Liunx maintainers as you have advised me next time.
Thanks.

--
Anthoine

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel