Mailing List Archive

Re: [Xen-devel] [RFC v4 2/2] x86/xen: allow privcmd hypercalls to be preempted
On 23/01/15 00:29, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> Xen has support for splitting heavy work into a series
> of hypercalls, called multicalls, and preempting them through
> what Xen calls continuation [0]. Despite this, without
> CONFIG_PREEMPT preemption won't happen, and without preemption
> a system can become pretty useless under heavy-handed hypercalls.
> Such is the case, for example, when creating a > 50 GiB HVM guest;
> we can get softlockups [1] with:
>
> kernel: [ 802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351]
>
> The soft lockup triggers on the TASK_UNINTERRUPTIBLE hanger check
> (default 120 seconds); on the Xen side, in this particular case,
> it happens when the following Xen hypervisor code path is hit:
>
> xc_domain_set_pod_target() -->
> do_memory_op() -->
> arch_memory_op() -->
> p2m_pod_set_mem_target()
> -- long delay (real or emulated) --
>
> This happens on arch_memory_op() on the XENMEM_set_pod_target memory
> op even though arch_memory_op() can handle continuation via
> hypercall_create_continuation() for example.
>
> Machines with over 50 GiB of memory are in high demand and hard to
> come by, so to help replicate this sort of issue, long delays on
> select hypercalls have been emulated in order to be able to test
> this on smaller machines [2].
>
> On one hand this issue can be considered expected given that
> CONFIG_PREEMPT=n is used; however, we have precedent for forcing
> voluntary preemption in the kernel even for CONFIG_PREEMPT=n, through
> the use of cond_resched() sprinkled in many places. To address
> this issue with Xen hypercalls, though, we need to find a way to aid
> the scheduler in the middle of hypercalls. We are motivated to
> address this issue on CONFIG_PREEMPT=n as otherwise the system becomes
> rather unresponsive for long periods of time; in the worst case (at
> present only reproduced by emulating long delays on select disk-I/O-bound
> hypercalls) this can lead to filesystem corruption if the delay happens,
> for example, on SCHEDOP_remote_shutdown (when we call 'xl <domain> shutdown').
>
> We can address this problem by checking whether we should schedule
> on the Xen timer in the middle of a hypercall, on the return from the
> timer interrupt. We want to be careful not to always force voluntary
> preemption though, so we only selectively enable preemption
> on very specific Xen hypercalls.
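
A minimal sketch of how such a selective check could work, assuming the
preemptible hypercall stubs are grouped in a dedicated linker section;
the section symbols and function body below are illustrative, not
necessarily the patch's exact implementation:

#include <linux/types.h>	/* bool */
#include <asm/ptrace.h>		/* struct pt_regs, user_mode() */

/*
 * Illustrative section bounds, assumed to be emitted by the linker
 * script around the hypercall stubs marked as preemptible.
 */
extern char __preemptible_hypercall_start[], __preemptible_hypercall_end[];

/*
 * Sketch: report preemptible only if the interrupt hit kernel code
 * inside one of the explicitly marked hypercall stubs.
 */
static bool xen_is_preemptible_hypercall(struct pt_regs *regs)
{
	return !user_mode(regs) &&
	       regs->ip >= (unsigned long)__preemptible_hypercall_start &&
	       regs->ip < (unsigned long)__preemptible_hypercall_end;
}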
[...]
> @@ -1243,6 +1247,25 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
> 	set_irq_regs(old_regs);
> }
>
> +/*
> + * CONFIG_PREEMPT=n kernels can end up triggering the softlock
> + * TASK_UNINTERRUPTIBLE hanger check (default 120 seconds)
> + * when certain multicalls are used [0] on large systems, in
> + * that case we need a way to voluntarily preempt. This is
> + * only an issue on CONFIG_PREEMPT=n kernels.

Rewrite this comment as:

* Some hypercalls issued by the toolstack can take many 10s of
* seconds. Allow tasks running hypercalls via the privcmd driver to be
* voluntarily preempted even if full kernel preemption is disabled.

> + * [0] https://bugzilla.novell.com/show_bug.cgi?id=861093

This link isn't accessible so I don't think it should be included here.

> + */
> +void xen_end_upcall(struct pt_regs *regs)
> +{
> +	if (xen_is_preemptible_hypercall(regs)) {
> +		int cpuid = smp_processor_id();
> +		if (_cond_resched())
> +			trace_xen_hypercall_preemption(cpuid);

I don't think a tracepoint here is useful.

> +	}
> +}
> +NOKPROBE_SYMBOL(xen_end_upcall);

Do we need this if this function is no longer notrace?

David
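
For reference, a sketch of what the hook looks like with the tracepoint
dropped as suggested in the review above (assuming
xen_is_preemptible_hypercall() and the NOKPROBE_SYMBOL annotation are
kept):

#include <linux/kprobes.h>	/* NOKPROBE_SYMBOL() */
#include <linux/sched.h>	/* _cond_resched() */
#include <asm/ptrace.h>		/* struct pt_regs */

/* Sketch: the upcall-return hook with the tracepoint removed. */
void xen_end_upcall(struct pt_regs *regs)
{
	if (xen_is_preemptible_hypercall(regs))
		_cond_resched();
}
NOKPROBE_SYMBOL(xen_end_upcall);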
Re: [Xen-devel] [RFC v4 2/2] x86/xen: allow privcmd hypercalls to be preempted
On Fri, Jan 23, 2015 at 11:45:06AM +0000, David Vrabel wrote:
> On 23/01/15 00:29, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > Xen has support for splitting heavy work work into a series
> > of hypercalls, called multicalls, and preempting them through
> > what Xen calls continuation [0]. Despite this though without
> > CONFIG_PREEMPT preemption won't happen, without preemption
> > a system can become pretty useless on heavy handed hypercalls.
> > Such is the case for example when creating a > 50 GiB HVM guest,
> > we can get softlockups [1] with:.
> >
> > kernel: [ 802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351]
> >
> > The softlock up triggers on the TASK_UNINTERRUPTIBLE hanger check
> > (default 120 seconds), on the Xen side in this particular case
> > this happens when the following Xen hypervisor code is used:
> >
> > xc_domain_set_pod_target() -->
> > do_memory_op() -->
> > arch_memory_op() -->
> > p2m_pod_set_mem_target()
> > -- long delay (real or emulated) --
> >
> > This happens on arch_memory_op() on the XENMEM_set_pod_target memory
> > op even though arch_memory_op() can handle continuation via
> > hypercall_create_continuation() for example.
> >
> > Machines over 50 GiB of memory are on high demand and hard to come
> > by so to help replicate this sort of issue long delays on select
> > hypercalls have been emulated in order to be able to test this on
> > smaller machines [2].
> >
> > On one hand this issue can be considered as expected given that
> > CONFIG_PREEMPT=n is used however we have forced voluntary preemption
> > precedent practices in the kernel even for CONFIG_PREEMPT=n through
> > the usage of cond_resched() sprinkled in many places. To address
> > this issue with Xen hypercalls though we need to find a way to aid
> > to the schedular in the middle of hypercalls. We are motivated to
> > address this issue on CONFIG_PREEMPT=n as otherwise the system becomes
> > rather unresponsive for long periods of time; in the worst case, at least
> > only currently by emulating long delays on select io disk bound
> > hypercalls, this can lead to filesystem corruption if the delay happens
> > for example on SCHEDOP_remote_shutdown (when we call 'xl <domain> shutdown').
> >
> > We can address this problem by trying to check if we should schedule
> > on the xen timer in the middle of a hypercall on the return from the
> > timer interrupt. We want to be careful to not always force voluntary
> > preemption though so to do this we only selectively enable preemption
> > on very specific xen hypercalls.
> [...]
> > @@ -1243,6 +1247,25 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
> > 	set_irq_regs(old_regs);
> > }
> >
> > +/*
> > + * CONFIG_PREEMPT=n kernels can end up triggering the softlock
> > + * TASK_UNINTERRUPTIBLE hanger check (default 120 seconds)
> > + * when certain multicalls are used [0] on large systems, in
> > + * that case we need a way to voluntarily preempt. This is
> > + * only an issue on CONFIG_PREEMPT=n kernels.
>
> Rewrite this comment as:
>
> * Some hypercalls issued by the toolstack can take many 10s of

It's not just hypercalls though; this is all about the interactions
with multicalls, no?

> * seconds. Allow tasks running hypercalls via the privcmd driver to be
> * voluntarily preempted even if full kernel preemption is disabled.
>
> > + * [0] https://bugzilla.novell.com/show_bug.cgi?id=861093
>
> This link isn't accessible so I don't think it should be included here.

OK.

Luis
Re: [Xen-devel] [RFC v4 2/2] x86/xen: allow privcmd hypercalls to be preempted
On Fri, Jan 23, 2015 at 11:45:06AM +0000, David Vrabel wrote:
> > + */
> > +void xen_end_upcall(struct pt_regs *regs)
> > +{
> > +	if (xen_is_preemptible_hypercall(regs)) {
> > +		int cpuid = smp_processor_id();
> > +		if (_cond_resched())
> > +			trace_xen_hypercall_preemption(cpuid);
>
> I don't think a tracepoint here is useful.

OK, I'll remove it.

> > +	}
> > +}
> > +NOKPROBE_SYMBOL(xen_end_upcall);
>
> Do we need this if this function is no longer notrace?

Stephen and Andy were going down some corner-case rabbit hole,
and it seemed to me that the conclusion was not settled, so
to be safe I kept it. I'll let them decide. I did remove
the notrace junk.

Luis
Re: [Xen-devel] [RFC v4 2/2] x86/xen: allow privcmd hypercalls to be preempted
>>> On 23.01.15 at 19:58, <mcgrof@suse.com> wrote:
> On Fri, Jan 23, 2015 at 11:45:06AM +0000, David Vrabel wrote:
>> On 23/01/15 00:29, Luis R. Rodriguez wrote:
>> > @@ -1243,6 +1247,25 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
>> > 	set_irq_regs(old_regs);
>> > }
>> >
>> > +/*
>> > + * CONFIG_PREEMPT=n kernels can end up triggering the softlock
>> > + * TASK_UNINTERRUPTIBLE hanger check (default 120 seconds)
>> > + * when certain multicalls are used [0] on large systems, in
>> > + * that case we need a way to voluntarily preempt. This is
>> > + * only an issue on CONFIG_PREEMPT=n kernels.
>>
>> Rewrite this comment as:
>>
>> * Some hypercalls issued by the toolstack can take many 10s of
>
> It's not just hypercalls though; this is all about the interactions
> with multicalls, no?

Multicalls are just a special case of hypercalls.

Jan
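
To illustrate Jan's point: a multicall is itself issued as a single
hypercall, HYPERVISOR_multicall, whose payload is a batch of individual
operations that Xen iterates (and may preempt between entries via
continuations). A minimal sketch, with illustrative ops:

#include <xen/interface/xen.h>	/* struct multicall_entry, __HYPERVISOR_* */
#include <asm/xen/hypercall.h>	/* HYPERVISOR_multicall() */

/*
 * Sketch: batch two (illustrative) operations into one multicall.
 * Only a single hypercall enters the hypervisor; Xen walks the
 * batch and can create a continuation between entries.
 */
static int example_multicall_batch(void)
{
	struct multicall_entry batch[2] = {
		{ .op = __HYPERVISOR_xen_version, .args = { 0 /* XENVER_version */ } },
		{ .op = __HYPERVISOR_xen_version, .args = { 0 } },
	};

	return HYPERVISOR_multicall(batch, 2);
}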

Re: [Xen-devel] [RFC v4 2/2] x86/xen: allow privcmd hypercalls to be preempted
On 23/01/15 18:58, Luis R. Rodriguez wrote:
>
> It's not just hypercalls though; this is all about the interactions
> with multicalls, no?

No. This applies to any preemptible hypercall and the toolstack doesn't
use multicalls for most of its work.

David