Mailing List Archive

Need help with fixing the Xen waitqueue feature
The patch 'mem_event: use wait queue when ring is full' I just sent out
makes use of the waitqueue feature. There are two issues I get with the
change applied:

I think I got the logic right, and in my testing vcpu->pause_count drops
to zero in p2m_mem_paging_resume(). But for some reason the vcpu does
not make progress after the first wakeup. In my debugging there is one
wakeup, the ring is still full, but further wakeups don't happen.
The fully decoded xentrace output may provide some hints about the
underlying issue, but it's hard to get due to the second issue.

Another thing is that sometimes the host suddenly reboots without any
message. I think the reason for this is that a vcpu whose stack was put
aside and that was later resumed may find itself on another physical
cpu. And if that happens, wouldn't that invalidate some of the local
variables back in the callchain? If some of them point to the old
physical cpu, how could this be fixed? Perhaps a few "volatiles" are
needed in some places.

I will check whether pinning the guest's vcpus to physical cpus actually
avoids the sudden reboots.

Olaf

Re: Need help with fixing the Xen waitqueue feature
On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:

> Another thing is that sometimes the host suddenly reboots without any
> message. I think the reason for this is that a vcpu whose stack was put
> aside and that was later resumed may find itself on another physical
> cpu. And if that happens, wouldn't that invalidate some of the local
> variables back in the callchain? If some of them point to the old
> physical cpu, how could this be fixed? Perhaps a few "volatiles" are
> needed in some places.

From how many call sites can we end up on a wait queue? I know we were going
to end up with a small and explicit number (e.g., in __hvm_copy()) but does
this patch make it a more generally-used mechanism? There will unavoidably
be many constraints on callers who want to be able to yield the cpu. We can
add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
I don't think it's *that* common that hypercall contexts cache things like
per-cpu pointers. But every caller will need auditing, I expect.
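
Something along these lines could catch offenders (purely illustrative -- the
cpu_pin_count field and the check do not exist anywhere; only
smp_processor_id(), ASSERT(), gdprintk() and domain_crash() are real):

/* Hypothetical sketch, not existing Xen code: mark regions that cache
 * per-cpu state and refuse to sleep while inside one. */
#define get_cpu() ({                      \
    current->cpu_pin_count++;             \
    smp_processor_id();                   \
})

#define put_cpu() do {                    \
    ASSERT(current->cpu_pin_count > 0);   \
    current->cpu_pin_count--;             \
} while ( 0 )

/* and in __prepare_to_wait(), before the stack is saved: */
if ( unlikely(current->cpu_pin_count) )
{
    gdprintk(XENLOG_ERR, "wait() while caching per-cpu state\n");
    domain_crash(current->domain);
    return;
}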

A sudden reboot is very extreme. No message even on a serial line? That most
commonly indicates bad page tables. Most other bugs you'd at least get a
double fault message.

-- Keir



Re: Need help with fixing the Xen waitqueue feature
On Tue, Nov 08, Keir Fraser wrote:

> On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:
>
> > Another thing is that sometimes the host suddenly reboots without any
> > message. I think the reason for this is that a vcpu whose stack was put
> > aside and that was later resumed may find itself on another physical
> > cpu. And if that happens, wouldn't that invalidate some of the local
> > variables back in the callchain? If some of them point to the old
> > physical cpu, how could this be fixed? Perhaps a few "volatiles" are
> > needed in some places.
>
> From how many call sites can we end up on a wait queue? I know we were going
> to end up with a small and explicit number (e.g., in __hvm_copy()) but does
> this patch make it a more generally-used mechanism? There will unavoidably
> be many constraints on callers who want to be able to yield the cpu. We can
> add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
> I don't think it's *that* common that hypercall contexts cache things like
> per-cpu pointers. But every caller will need auditing, I expect.

I haven't started to audit the callers. In my testing
mem_event_put_request() is called from p2m_mem_paging_drop_page() and
p2m_mem_paging_populate(). The latter is called from more places.

My plan is to put the sleep into ept_get_entry(), but I'm not there yet.
First I want to test waitqueues in a rather simple code path like
mem_event_put_request().
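
Roughly, the idea looks like this (illustrative only, not the actual patch;
med->wq and mem_event_ring_full() are made-up names here):

void mem_event_put_request(struct domain *d, struct mem_event_domain *med,
                           mem_event_request_t *req)
{
    /* Guest vcpus may sleep until space is available; dom0/pager vcpus
     * must not, or the host deadlocks, so they are handled separately. */
    if ( current->domain == d )
        wait_event(med->wq, !mem_event_ring_full(med));

    /* ... take the ring lock and copy the request onto the ring ... */
}

/* and in p2m_mem_paging_resume(), once responses have been consumed: */
wake_up(&med->wq);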

> A sudden reboot is very extreme. No message even on a serial line? That most
> commonly indicates bad page tables. Most other bugs you'd at least get a
> double fault message.

There is no output on serial. I boot with this cmdline:
vga=mode-normal console=com1 com1=57600 loglvl=all guest_loglvl=all
sync_console conring_size=123456 maxcpus=8 dom0_vcpus_pin
dom0_max_vcpus=2
My base changeset is 24003; the test host is a Xeon X5670 @ 2.93GHz.

Olaf

Re: Need help with fixing the Xen waitqueue feature
On 08/11/2011 22:20, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 08, Keir Fraser wrote:
>
>> On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>>> Another thing is that sometimes the host suddenly reboots without any
>>> message. I think the reason for this is that a vcpu whose stack was put
>>> aside and that was later resumed may find itself on another physical
>>> cpu. And if that happens, wouldn't that invalidate some of the local
>>> variables back in the callchain? If some of them point to the old
>>> physical cpu, how could this be fixed? Perhaps a few "volatiles" are
>>> needed in some places.
>>
>> From how many call sites can we end up on a wait queue? I know we were going
>> to end up with a small and explicit number (e.g., in __hvm_copy()) but does
>> this patch make it a more generally-used mechanism? There will unavoidably
>> be many constraints on callers who want to be able to yield the cpu. We can
>> add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
>> I don't think it's *that* common that hypercall contexts cache things like
>> per-cpu pointers. But every caller will need auditing, I expect.
>
> I havent started to audit the callers. In my testing
> mem_event_put_request() is called from p2m_mem_paging_drop_page() and
> p2m_mem_paging_populate(). The latter is called from more places.

Tbh I wonder anyway whether stale hypercall context would be likely to cause
a silent machine reboot. Booting with max_cpus=1 would eliminate moving
between CPUs as a cause of inconsistencies, or pin the guest under test.
Another problem could be sleeping with locks held, but we do test for that
(in debug builds at least) and I'd expect crash/hang rather than silent
reboot. Another problem could be if the vcpu has its own state in an
inconsistent/invalid state temporarily (e.g., its pagetable base pointers)
which then is attempted to be restored during a waitqueue wakeup. That could
certainly cause a reboot, but I don't know of an example where this might
happen.

-- Keir

> My plan is to put the sleep into ept_get_entry(), but I'm not there yet.
> First I want to test waitqueues in a rather simple code path like
> mem_event_put_request().
>
>> A sudden reboot is very extreme. No message even on a serial line? That most
>> commonly indicates bad page tables. Most other bugs you'd at least get a
>> double fault message.
>
> There is no output on serial, I boot with this cmdline:
> vga=mode-normal console=com1 com1=57600 loglvl=all guest_loglvl=all
> sync_console conring_size=123456 maxcpus=8 dom0_vcpus_pin
> dom0_max_vcpus=2
> My base changeset is 24003, the testhost is a Xeon X5670 @ 2.93GHz.
>
> Olaf



Re: Need help with fixing the Xen waitqueue feature
Olaf,
are waitqueues on the mem-event ring meant to be the way to deal with
ring exhaustion? I.e., is this meant to go beyond a testing vehicle for
waitqueues?

With the pager itself generating events, and foreign mappings generating
events, you'll end up putting dom0 vcpus in a waitqueue. This will
basically deadlock the host.

Am I missing something here?
Andres

> Date: Tue, 8 Nov 2011 22:20:24 +0100
> From: Olaf Hering <olaf@aepfle.de>
> Subject: [Xen-devel] Need help with fixing the Xen waitqueue feature
> To: xen-devel@lists.xensource.com
> Message-ID: <20111108212024.GA5276@aepfle.de>
> Content-Type: text/plain; charset=utf-8
>
>
> The patch 'mem_event: use wait queue when ring is full' I just sent out
> makes use of the waitqueue feature. There are two issues I get with the
> change applied:
>
> I think I got the logic right, and in my testing vcpu->pause_count drops
> to zero in p2m_mem_paging_resume(). But for some reason the vcpu does
> not make progress after the first wakeup. In my debugging there is one
> wakeup, the ring is still full, but further wakeups don't happen.
> The fully decoded xentrace output may provide some hints about the
> underlying issue, but it's hard to get due to the second issue.
>
> Another thing is that sometimes the host suddenly reboots without any
> message. I think the reason for this is that a vcpu whose stack was put
> aside and that was later resumed may find itself on another physical
> cpu. And if that happens, wouldn't that invalidate some of the local
> variables back in the callchain? If some of them point to the old
> physical cpu, how could this be fixed? Perhaps a few "volatiles" are
> needed in some places.
>
> I will check whether pinning the guest's vcpus to physical cpus actually
> avoids the sudden reboots.
>
> Olaf
>


Re: Need help with fixing the Xen waitqueue feature
> Date: Tue, 08 Nov 2011 22:05:41 +0000
> From: Keir Fraser <keir.xen@gmail.com>
> Subject: Re: [Xen-devel] Need help with fixing the Xen waitqueue
> feature
> To: Olaf Hering <olaf@aepfle.de>, <xen-devel@lists.xensource.com>
> Message-ID: <CADF5835.245E1%keir.xen@gmail.com>
> Content-Type: text/plain; charset="US-ASCII"
>
> On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> Another thing is that sometimes the host suddenly reboots without any
>> message. I think the reason for this is that a vcpu whose stack was put
>> aside and that was later resumed may find itself on another physical
>> cpu. And if that happens, wouldn't that invalidate some of the local
>> variables back in the callchain? If some of them point to the old
>> physical cpu, how could this be fixed? Perhaps a few "volatiles" are
>> needed in some places.
>
> From how many call sites can we end up on a wait queue? I know we were going
> to end up with a small and explicit number (e.g., in __hvm_copy()) but does
> this patch make it a more generally-used mechanism? There will unavoidably
> be many constraints on callers who want to be able to yield the cpu. We can
> add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
> I don't think it's *that* common that hypercall contexts cache things like
> per-cpu pointers. But every caller will need auditing, I expect.

Tbh, for paging to be effective, we need to be prepared to yield on every
p2m lookup.

Let's compare paging to PoD. They're essentially the same thing: pages
disappear, and get allocated on the fly when you need them. PoD is a
highly optimized in-hypervisor optimization that does not need a
user-space helper -- but the pager could do PoD easily and remove all that
p2m-pod.c code from the hypervisor.

PoD only introduces extraneous side-effects when there is a complete
absence of memory to allocate pages. The same cannot be said of paging, to
put it mildly. It returns EINVAL all over the place. Right now, qemu can
be crashed in a blink by paging out the right gfn.

To get paging to where PoD is, all these situations need to be handled in
a manner other than returning EINVAL. That means putting the vcpu on a
waitqueue at every location where p2m_pod_demand_populate is called, not just
__hvm_copy.

I don't know that that's gonna be altogether doable. Many of these gfn
lookups happen in atomic contexts, not to mention cpu-specific pointers.
But at least we should aim for that.

Andres
>
> A sudden reboot is very extreme. No message even on a serial line? That most
> commonly indicates bad page tables. Most other bugs you'd at least get a
> double fault message.
>
> -- Keir
>


Re: Need help with fixing the Xen waitqueue feature
On Tue, Nov 08, Andres Lagar-Cavilla wrote:

> Olaf,
> are waitqueues on the mem-event ring meant to be the way to deal with
> ring exhaustion? I.e., is this meant to go beyond a testing vehicle for
> waitqueues?

Putting the guest to sleep when the ring is full is at least required
for p2m_mem_paging_drop_page(), so that the pager gets informed about all
gfns from decrease_reservation.

> With the pager itself generating events, and foreign mappings generating
> events, you'll end up putting dom0 vcpus in a waitqueue. This will
> basically deadlock the host.

Those vcpus cannot go to sleep, and my change handles that case.

Olaf

Re: Need help with fixing the Xen waitqueue feature
On Tue, Nov 08, Andres Lagar-Cavilla wrote:

> Tbh, for paging to be effective, we need to be prepared to yield on every
> p2m lookup.

Yes, if a gfn is missing the vcpu should go to sleep rather than
returning -ENOENT to the caller. Only the query part of gfn_to_mfn
should return the p2m paging types.
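
In other words, something of this shape (just a sketch; gfn_to_mfn_query()
and p2m_is_paging() exist, while the paging_wq waitqueue and the
gfn_paged_in() helper are invented for the example, and the
p2m_mem_paging_populate() arguments are only approximate):

mfn_t gfn_to_mfn_wait(struct domain *d, unsigned long gfn, p2m_type_t *t)
{
    mfn_t mfn;

    for ( ; ; )
    {
        mfn = gfn_to_mfn_query(d, gfn, t);
        if ( !p2m_is_paging(*t) )
            return mfn;
        /* kick the pager and sleep until the gfn has been paged back in */
        p2m_mem_paging_populate(d, gfn);
        wait_event(d->paging_wq, gfn_paged_in(d, gfn));
    }
}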

> Let's compare paging to PoD. They're essentially the same thing: pages
> disappear, and get allocated on the fly when you need them. PoD is a
> highly optimized in-hypervisor optimization that does not need a
> user-space helper -- but the pager could do PoD easily and remove all that
> p2m-pod.c code from the hypervisor.

Perhaps PoD and paging could be merged; I haven't had time to study the
PoD code.

> PoD only introduces extraneous side-effects when there is a complete
> absence of memory to allocate pages. The same cannot be said of paging, to
> put it mildly. It returns EINVAL all over the place. Right now, qemu can
> be crashed in a blink by paging out the right gfn.

I have seen qemu crashes when using emulated storage, but haven't
debugged them yet. I suspect they were caused by a race between nominate
and evict.

Olaf

Re: Need help with fixing the Xen waitqueue feature
Hi there,
> On Tue, Nov 08, Andres Lagar-Cavilla wrote:
>
>> Tbh, for paging to be effective, we need to be prepared to yield on
>> every
>> p2m lookup.
>
> Yes, if a gfn is missing the vcpu should go to sleep rather than
> returning -ENOENT to the caller. Only the query part of gfn_to_mfn
> should return the p2m paging types.
>
>> Let's compare paging to PoD. They're essentially the same thing: pages
>> disappear, and get allocated on the fly when you need them. PoD is a
>> highly optimized in-hypervisor optimization that does not need a
>> user-space helper -- but the pager could do PoD easily and remove all
>> that
>> p2m-pod.c code from the hypervisor.
>
> Perhaps PoD and paging could be merged; I haven't had time to study the
> PoD code.

Well, PoD can be implemented with a pager that simply shortcuts the step
that actually populates the page with contents. A zeroed heap page is good
enough. It's fairly simple for a pager to know for which pages it should
return zero.
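
I.e. the pager's page-in path would shortcut roughly like this (pseudo-code,
none of these names are from the real xenpaging source):

static int pager_populate(struct pager *p, unsigned long gfn, void *page)
{
    /* gfn was dropped without content: PoD-style zero fill is enough */
    if ( gfn_is_unbacked(p, gfn) )
    {
        memset(page, 0, PAGE_SIZE);
        return 0;
    }
    /* otherwise restore the saved contents from the paging file */
    return read_page_from_pagefile(p, gfn, page);
}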

PoD also does emergency sweeps under memory pressure to identify zeroes;
that can easily be implemented by a user-space utility.

The hypervisor code keeps a list of 2M superpages -- that feature would be
lost.

But I doubt this would fly anyway: PoD works for non-EPT modes, and I
guess people don't want to lose that functionality.

>
>> PoD only introduces extraneous side-effects when there is a complete
>> absence of memory to allocate pages. The same cannot be said of paging,
>> to
>> put it mildly. It returns EINVAL all over the place. Right now, qemu can
>> be crashed in a blink by paging out the right gfn.
>
> I have seen qemu crashes when using emulated storage, but haven't
> debugged them yet. I suspect they were caused by a race between nominate
> and evict.
>
> Olaf
>



Re: Need help with fixing the Xen waitqueue feature
Also,
> On Tue, Nov 08, Andres Lagar-Cavilla wrote:
>
>> Tbh, for paging to be effective, we need to be prepared to yield on
>> every
>> p2m lookup.
>
> Yes, if a gfn is missing the vcpu should go to sleep rather than
> returning -ENOENT to the caller. Only the query part of gfn_to_mfn
> should return the p2m paging types.
>
>> Let's compare paging to PoD. They're essentially the same thing: pages
>> disappear, and get allocated on the fly when you need them. PoD is a
>> highly optimized in-hypervisor optimization that does not need a
>> user-space helper -- but the pager could do PoD easily and remove all
>> that
>> p2m-pod.c code from the hypervisor.
>
> Perhaps PoD and paging could be merged; I haven't had time to study the
> PoD code.
>
>> PoD only introduces extraneous side-effects when there is a complete
>> absence of memory to allocate pages. The same cannot be said of paging,
>> to
>> put it mildly. It returns EINVAL all over the place. Right now, qemu can
>> be crashed in a blink by paging out the right gfn.
>
> I have seen qemu crashes when using emulated storage, but haven't
> debugged them yet. I suspect they were caused by a race between nominate
> and evict.

After a bit of thinking, things are far more complicated. I don't think
this is a "race." If the pager removed a page that later gets scheduled by
the guest OS for IO, qemu will want to foreign-map that. With the
hypervisor returning ENOENT, the foreign map will fail, and there goes
qemu.

The same will happen for pv backends mapping grants, or for the
checkpoint/migrate code.

I guess qemu/migrate/libxc could retry until the pager is done and the
mapping succeeds. It will be delicate. It won't work for pv backends. It
will flood the mem_event ring.

Wait-queueing the dom0 vcpu is a no-go -- the machine will deadlock quickly.

My thinking is that the best bet is to wait-queue the dom0 process. The
dom0 kernel code handling the foreign map will need to put the mapping
thread in a wait-queue. It can establish a ring-based notification
mechanism with Xen. When Xen completes the paging in, it can add a
notification to the ring. dom0 can then awake the mapping thread and
retry.

Not simple at all. Ideas out there?

Andres

>
> Olaf
>



Re: Need help with fixing the Xen waitqueue feature
On Wed, Nov 09, Andres Lagar-Cavilla wrote:

> After a bit of thinking, things are far more complicated. I don't think
> this is a "race." If the pager removed a page that later gets scheduled by
> the guest OS for IO, qemu will want to foreign-map that. With the
> hypervisor returning ENOENT, the foreign map will fail, and there goes
> qemu.

The tools are supposed to catch ENOENT and try again.
linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
appears to do that as well. What code path uses qemu that leads to a
crash?
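
The pattern is essentially this (simplified, not the verbatim libxc code):

/* retry frames that are currently paged out until the pager brings
 * them back; the hypervisor reports them via ENOENT */
do {
    rc = ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx);
} while ( rc < 0 && errno == ENOENT );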

> I guess qemu/migrate/libxc could retry until the pager is done and the
> mapping succeeds. It will be delicate. It won't work for pv backends. It
> will flood the mem_event ring.

There will be no flood; only one request is sent per gfn in
p2m_mem_paging_populate().

Olaf

Re: Need help with fixing the Xen waitqueue feature
Olaf,
> On Wed, Nov 09, Andres Lagar-Cavilla wrote:
>
>> After a bit of thinking, things are far more complicated. I don't think
>> this is a "race." If the pager removed a page that later gets scheduled by
>> the guest OS for IO, qemu will want to foreign-map that. With the
>> hypervisor returning ENOENT, the foreign map will fail, and there goes
>> qemu.
>
> The tools are supposed to catch ENOENT and try again.
> linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
> appears to do that as well. What code path uses qemu that leads to a
> crash?

The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
it isn't on mainline linux 3.0, 3.1, etc. Which dom0 kernel are you using?

And for backend drivers implemented in the kernel (netback, etc), there is
no retrying.

All those ram_paging types and their interactions give me a headache, but
I'll trust you that only one event is put in the ring.

I'm using 24066:54a5e994a241. I start windows 7, make xenpaging try to
evict 90% of the RAM, qemu lasts for about two seconds. Linux fights
harder, but qemu also dies. No pv drivers. I haven't been able to trace
back the qemu crash (segfault on a NULL ide_if field for a dma callback)
to the exact paging action yet, but no crashes without paging.

Andres

>
>> I guess qemu/migrate/libxc could retry until the pager is done and the
>> mapping succeeds. It will be delicate. It won't work for pv backends. It
>> will flood the mem_event ring.
>
> There will be no flood; only one request is sent per gfn in
> p2m_mem_paging_populate().
>
> Olaf
>



Re: Need help with fixing the Xen waitqueue feature
>>> On 10.11.11 at 05:29, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline linux 3.0, 3.1, etc.

Seems like nobody cared to port over the code from the old 2.6.18 tree
(or the forward ports thereof).

> Which dom0 kernel are you using?

Certainly one of our forward port kernels.

> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

As above, seems like nobody cared to forward port those bits either.

Jan


Re: Need help with fixing the Xen waitqueue feature
On 10/11/2011 04:29, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:

>> The tools are supposed to catch ENOENT and try again.
>> linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
>> appears to do that as well. What code path uses qemu that leads to a
>> crash?
>
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline linux 3.0, 3.1, etc. Which dom0 kernel are you using?
>
> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

Getting this working without a new Linux kernel -- and with
as-yet-to-be-written new stuff in it -- is unlikely to be on the cards, is
it?

I think you suggested an in-kernel mechanism to wait for page-in and then
retry mapping. If that could be used by the in-kernel drivers and exposed
via our privcmd interface for qemu and rest of userspace too, that may be
the best single solution. Perhaps it could be largely hidden behind the
existing privcmd-mmap ioctls.
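
Roughly this kind of shape, say (entirely hypothetical -- paging_waitq,
paged_back_in() and the notification mechanism behind it do not exist, and
the xen_remap_domain_mfn_range() call is only approximate):

static DECLARE_WAIT_QUEUE_HEAD(paging_waitq);

/* called from the privcmd mmap ioctl for each frame to be mapped */
static int privcmd_map_gfn(struct vm_area_struct *vma, unsigned long addr,
                           xen_pfn_t gfn, domid_t dom)
{
    int rc;

    for (;;) {
        rc = xen_remap_domain_mfn_range(vma, addr, gfn, 1,
                                        vma->vm_page_prot, dom);
        if (rc != -ENOENT)
            return rc;
        /* gfn is paged out: sleep until a notification from Xen says
         * the pager has brought it back, then retry the mapping */
        rc = wait_event_interruptible(paging_waitq, paged_back_in(dom, gfn));
        if (rc)
            return rc;
    }
}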

-- Keir



Re: Need help with fixing the Xen waitqueue feature
On Wed, Nov 09, Andres Lagar-Cavilla wrote:

> Olaf,
> > On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> >
> >> After a bit of thinking, things are far more complicated. I don't think
> >> this is a "race." If the pager removed a page that later gets scheduled
> >> by
> >> the guest OS for IO, qemu will want to foreign-map that. With the
> >> hypervisor returning ENOENT, the foreign map will fail, and there goes
> >> qemu.
> >
> > The tools are supposed to catch ENOENT and try again.
> > linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
> > appears to do that as well. What code path uses qemu that leads to a
> > crash?
>
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline linux 3.0, 3.1, etc. Which dom0 kernel are you using?

I'm running SLES11 as dom0. Now that's really odd that there is no ENOENT
handling in mainline; I will go and check the code.

> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

A while ago I fixed the grant status handling; perhaps that change was
never forwarded to pvops -- at least I didn't do it at that time.

> I'm using 24066:54a5e994a241. I start windows 7, make xenpaging try to
> evict 90% of the RAM, qemu lasts for about two seconds. Linux fights
> harder, but qemu also dies. No pv drivers. I haven't been able to trace
> back the qemu crash (segfault on a NULL ide_if field for a dma callback)
> to the exact paging action yet, but no crashes without paging.

If the kernel is pvops, it may need some auditing to check the ENOENT
handling.

Olaf

Re: Need help with fixing the Xen waitqueue feature
On Thu, Nov 10, Olaf Hering wrote:

> On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> > The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> > it isn't on mainline linux 3.0, 3.1, etc. Which dom0 kernel are you using?
> I'm running SLES11 as dom0. Now that's really odd that there is no ENOENT
> handling in mainline; I will go and check the code.

xen_remap_domain_mfn_range() has to catch -ENOENT returned from
HYPERVISOR_mmu_update() and return it to its callers. Then
drivers/xen/xenfs/privcmd.c:traverse_pages() will do the right thing.
See
http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/0051d294bb60

The granttable part needs more changes, see
http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/7c7efaea8b54
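
The pvops side would need to do essentially the same thing, along these
lines (a sketch modelled on the changesets above, not actual mainline code):

/* flush a batch of foreign-mapping MMU updates for 'domid' and keep
 * "frame is currently paged out" distinguishable from other errors */
static int remap_flush_batch(struct mmu_update *u, unsigned int nr,
                             unsigned int domid)
{
    int rc = HYPERVISOR_mmu_update(u, nr, NULL, domid);

    if (rc == -ENOENT)
        return -ENOENT;   /* privcmd's traverse_pages() passes this on */

    return rc < 0 ? -EFAULT : 0;
}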


Olaf

Re: Re: Need help with fixing the Xen waitqueue feature
Thanks Jan, and thanks Olaf for the pointers to specific patches. I'll try
to cherry-pick those into my dom0 (Debian mainline 3.0). Somebody else
should get those into mainline though. Soonish :)

Andres
>>>> On 10.11.11 at 05:29, "Andres Lagar-Cavilla" <andres@lagarcavilla.org>
>>>> wrote:
>> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported.
>> Which
>> it isn't on mainline linux 3.0, 3.1, etc.
>
> Seems like nobody cared to port over the code from the old 2.6.18 tree
> (or the forward ports thereof).
>
>> Which dom0 kernel are you using?
>
> Certainly one of our forward port kernels.
>
>> And for backend drivers implemented in the kernel (netback, etc), there
>> is
>> no retrying.
>
> As above, seems like nobody cared to forward port those bits either.
>
> Jan
>
>



Re: Re: Need help with fixing the Xen waitqueue feature
On Thu, Nov 10, 2011 at 05:57:18AM -0800, Andres Lagar-Cavilla wrote:
> Thanks Jan, and thanks Olaf for the pointers to specific patches. I'll try
> to cherry pick those into my dom0 (debian mainline 3.0). Somebody else
> should get those in mainline though. Soonish :)

Well, could you post them once you have cherry-picked them?

Thanks.
>
> Andres
> >>>> On 10.11.11 at 05:29, "Andres Lagar-Cavilla" <andres@lagarcavilla.org>
> >>>> wrote:
> >> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported.
> >> Which
> >> it isn't on mainline linux 3.0, 3.1, etc.
> >
> > Seems like nobody cared to port over the code from the old 2.6.18 tree
> > (or the forward ports thereof).
> >
> >> Which dom0 kernel are you using?
> >
> > Certainly one of our forward port kernels.
> >
> >> And for backend drivers implemented in the kernel (netback, etc), there
> >> is
> >> no retrying.
> >
> > As above, seems like nobody cared to forward port those bits either.
> >
> > Jan

Re: Need help with fixing the Xen waitqueue feature
Keir,

just to dump my findings to the list:

On Tue, Nov 08, Keir Fraser wrote:

> Tbh I wonder anyway whether stale hypercall context would be likely to cause
> a silent machine reboot. Booting with max_cpus=1 would eliminate moving
> between CPUs as a cause of inconsistencies, or pin the guest under test.
> Another problem could be sleeping with locks held, but we do test for that
> (in debug builds at least) and I'd expect crash/hang rather than silent
> reboot. Another problem could be if the vcpu has its own state in an
> inconsistent/invalid state temporarily (e.g., its pagetable base pointers)
> which then is attempted to be restored during a waitqueue wakeup. That could
> certainly cause a reboot, but I don't know of an example where this might
> happen.

The crashes also happen with maxcpus=1 and a single guest cpu.
Today I added wait_event to ept_get_entry and this works.

But at some point the codepath below is executed; after that wake_up the
host hangs hard. I will trace it further next week; maybe the backtrace
gives a clue what the cause could be.

Also, the 3K stacksize is still too small; this path uses 3096 bytes.

(XEN) prep 127a 30 0
(XEN) wake 127a 30
(XEN) prep 1cf71 30 0
(XEN) wake 1cf71 30
(XEN) prep 1cf72 30 0
(XEN) wake 1cf72 30
(XEN) prep 1cee9 30 0
(XEN) wake 1cee9 30
(XEN) prep 121a 30 0
(XEN) wake 121a 30

(This means 'gfn (p2m_unshare << 4) in_atomic'.)

(XEN) prep 1ee61 20 0
(XEN) max stacksize c18
(XEN) Xen WARN at wait.c:126
(XEN) ----[ Xen-4.2.24114-20111111.221356 x86_64 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
(XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff830201f76000 rcx: 0000000000000000
(XEN) rdx: ffff82c4802b7f18 rsi: 000000000000000a rdi: ffff82c4802673f0
(XEN) rbp: ffff82c4802b73a8 rsp: ffff82c4802b7378 r8: 0000000000000000
(XEN) r9: ffff82c480221da0 r10: 00000000fffffffa r11: 0000000000000003
(XEN) r12: ffff82c4802b7f18 r13: ffff830201f76000 r14: ffff83003ea5c000
(XEN) r15: 000000000001ee61 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 000000020336d000 cr2: 00007fa88ac42000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802b7378:
(XEN) 0000000000000020 000000000001ee61 0000000000000002 ffff830201aa9e90
(XEN) ffff830201aa9f60 0000000000000020 ffff82c4802b7428 ffff82c4801e02f9
(XEN) ffff830000000002 0000000000000000 ffff82c4802b73f8 ffff82c4802b73f4
(XEN) 0000000000000000 ffff82c4802b74e0 ffff82c4802b74e4 0000000101aa9e90
(XEN) 000000ffffffffff ffff830201aa9e90 000000000001ee61 ffff82c4802b74e4
(XEN) 0000000000000002 0000000000000000 ffff82c4802b7468 ffff82c4801d810f
(XEN) ffff82c4802b74e0 000000000001ee61 ffff830201aa9e90 ffff82c4802b75bc
(XEN) 00000000002167f5 ffff88001ee61900 ffff82c4802b7518 ffff82c480211b80
(XEN) ffff8302167f5000 ffff82c4801c168c 0000000000000000 ffff83003ea5c000
(XEN) ffff88001ee61900 0000000001805063 0000000001809063 000000001ee001e3
(XEN) 000000001ee61067 00000000002167f5 000000000022ee70 000000000022ed10
(XEN) ffffffffffffffff 0000000a00000007 0000000000000004 ffff82c48025db80
(XEN) ffff83003ea5c000 ffff82c4802b75bc ffff88001ee61900 ffff830201aa9e90
(XEN) ffff82c4802b7528 ffff82c480211cb1 ffff82c4802b7568 ffff82c4801da97f
(XEN) ffff82c4801be053 0000000000000008 ffff82c4802b7b58 ffff88001ee61900
(XEN) 0000000000000000 ffff82c4802b78b0 ffff82c4802b75f8 ffff82c4801aaec8
(XEN) 0000000000000003 ffff88001ee61900 ffff82c4802b78b0 ffff82c4802b7640
(XEN) ffff83003ea5c000 00000000000000a0 0000000000000900 0000000000000008
(XEN) 00000003802b7650 0000000000000004 00000003802b7668 0000000000000000
(XEN) ffff82c4802b7b58 0000000000000001 0000000000000003 ffff82c4802b78b0
(XEN) Xen call trace:
(XEN) [<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
(XEN) [<ffff82c4801e02f9>] ept_get_entry+0x81/0xd8
(XEN) [<ffff82c4801d810f>] gfn_to_mfn_type_p2m+0x55/0x114
(XEN) [<ffff82c480211b80>] hap_p2m_ga_to_gfn_4_levels+0x1c4/0x2d6
(XEN) [<ffff82c480211cb1>] hap_gva_to_gfn_4_levels+0x1f/0x2e
(XEN) [<ffff82c4801da97f>] paging_gva_to_gfn+0xae/0xc4
(XEN) [<ffff82c4801aaec8>] hvmemul_linear_to_phys+0xf1/0x25c
(XEN) [<ffff82c4801ab762>] hvmemul_rep_movs+0xe8/0x31a
(XEN) [<ffff82c48018de07>] x86_emulate+0x4e01/0x10fde
(XEN) [<ffff82c4801aab3c>] hvm_emulate_one+0x12d/0x1c5
(XEN) [<ffff82c4801b68a9>] handle_mmio+0x4e/0x1d8
(XEN) [<ffff82c4801b3a1e>] hvm_hap_nested_page_fault+0x1e7/0x302
(XEN) [<ffff82c4801d1ff6>] vmx_vmexit_handler+0x12cf/0x1594
(XEN)
(XEN) wake 1ee61 20




Re: Need help with fixing the Xen waitqueue feature
On 11/11/2011 22:56, "Olaf Hering" <olaf@aepfle.de> wrote:

> Keir,
>
> just to dump my findings to the list:
>
> On Tue, Nov 08, Keir Fraser wrote:
>
>> Tbh I wonder anyway whether stale hypercall context would be likely to cause
>> a silent machine reboot. Booting with max_cpus=1 would eliminate moving
>> between CPUs as a cause of inconsistencies, or pin the guest under test.
>> Another problem could be sleeping with locks held, but we do test for that
>> (in debug builds at least) and I'd expect crash/hang rather than silent
>> reboot. Another problem could be if the vcpu has its own state in an
>> inconsistent/invalid state temporarily (e.g., its pagetable base pointers)
>> which then is attempted to be restored during a waitqueue wakeup. That could
>> certainly cause a reboot, but I don't know of an example where this might
>> happen.
>
> The crashes also happen with maxcpus=1 and a single guest cpu.
> Today I added wait_event to ept_get_entry and this works.
>
> But at some point the codepath below is executed; after that wake_up the
> host hangs hard. I will trace it further next week; maybe the backtrace
> gives a clue what the cause could be.

So you run with a single CPU, and with wait_event() in one location, and
that works for a while (actually doing full waitqueue work: executing wait()
and wake_up()), but then hangs? That's weird, but pretty interesting if I've
understood correctly.

> Also, the 3K stacksize is still too small; this path uses 3096 bytes.

I'll allocate a whole page for the stack then.

-- Keir

> (XEN) prep 127a 30 0
> (XEN) wake 127a 30
> (XEN) prep 1cf71 30 0
> (XEN) wake 1cf71 30
> (XEN) prep 1cf72 30 0
> (XEN) wake 1cf72 30
> (XEN) prep 1cee9 30 0
> (XEN) wake 1cee9 30
> (XEN) prep 121a 30 0
> (XEN) wake 121a 30
>
> (This means 'gfn (p2m_unshare << 4) in_atomic)'
>
> (XEN) prep 1ee61 20 0
> (XEN) max stacksize c18
> (XEN) Xen WARN at wait.c:126
> (XEN) ----[ Xen-4.2.24114-20111111.221356 x86_64 debug=y Tainted: C
> ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
> (XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor
> (XEN) rax: 0000000000000000 rbx: ffff830201f76000 rcx: 0000000000000000
> (XEN) rdx: ffff82c4802b7f18 rsi: 000000000000000a rdi: ffff82c4802673f0
> (XEN) rbp: ffff82c4802b73a8 rsp: ffff82c4802b7378 r8: 0000000000000000
> (XEN) r9: ffff82c480221da0 r10: 00000000fffffffa r11: 0000000000000003
> (XEN) r12: ffff82c4802b7f18 r13: ffff830201f76000 r14: ffff83003ea5c000
> (XEN) r15: 000000000001ee61 cr0: 000000008005003b cr4: 00000000000026f0
> (XEN) cr3: 000000020336d000 cr2: 00007fa88ac42000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802b7378:
> (XEN) 0000000000000020 000000000001ee61 0000000000000002 ffff830201aa9e90
> (XEN) ffff830201aa9f60 0000000000000020 ffff82c4802b7428 ffff82c4801e02f9
> (XEN) ffff830000000002 0000000000000000 ffff82c4802b73f8 ffff82c4802b73f4
> (XEN) 0000000000000000 ffff82c4802b74e0 ffff82c4802b74e4 0000000101aa9e90
> (XEN) 000000ffffffffff ffff830201aa9e90 000000000001ee61 ffff82c4802b74e4
> (XEN) 0000000000000002 0000000000000000 ffff82c4802b7468 ffff82c4801d810f
> (XEN) ffff82c4802b74e0 000000000001ee61 ffff830201aa9e90 ffff82c4802b75bc
> (XEN) 00000000002167f5 ffff88001ee61900 ffff82c4802b7518 ffff82c480211b80
> (XEN) ffff8302167f5000 ffff82c4801c168c 0000000000000000 ffff83003ea5c000
> (XEN) ffff88001ee61900 0000000001805063 0000000001809063 000000001ee001e3
> (XEN) 000000001ee61067 00000000002167f5 000000000022ee70 000000000022ed10
> (XEN) ffffffffffffffff 0000000a00000007 0000000000000004 ffff82c48025db80
> (XEN) ffff83003ea5c000 ffff82c4802b75bc ffff88001ee61900 ffff830201aa9e90
> (XEN) ffff82c4802b7528 ffff82c480211cb1 ffff82c4802b7568 ffff82c4801da97f
> (XEN) ffff82c4801be053 0000000000000008 ffff82c4802b7b58 ffff88001ee61900
> (XEN) 0000000000000000 ffff82c4802b78b0 ffff82c4802b75f8 ffff82c4801aaec8
> (XEN) 0000000000000003 ffff88001ee61900 ffff82c4802b78b0 ffff82c4802b7640
> (XEN) ffff83003ea5c000 00000000000000a0 0000000000000900 0000000000000008
> (XEN) 00000003802b7650 0000000000000004 00000003802b7668 0000000000000000
> (XEN) ffff82c4802b7b58 0000000000000001 0000000000000003 ffff82c4802b78b0
> (XEN) Xen call trace:
> (XEN) [<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
> (XEN) [<ffff82c4801e02f9>] ept_get_entry+0x81/0xd8
> (XEN) [<ffff82c4801d810f>] gfn_to_mfn_type_p2m+0x55/0x114
> (XEN) [<ffff82c480211b80>] hap_p2m_ga_to_gfn_4_levels+0x1c4/0x2d6
> (XEN) [<ffff82c480211cb1>] hap_gva_to_gfn_4_levels+0x1f/0x2e
> (XEN) [<ffff82c4801da97f>] paging_gva_to_gfn+0xae/0xc4
> (XEN) [<ffff82c4801aaec8>] hvmemul_linear_to_phys+0xf1/0x25c
> (XEN) [<ffff82c4801ab762>] hvmemul_rep_movs+0xe8/0x31a
> (XEN) [<ffff82c48018de07>] x86_emulate+0x4e01/0x10fde
> (XEN) [<ffff82c4801aab3c>] hvm_emulate_one+0x12d/0x1c5
> (XEN) [<ffff82c4801b68a9>] handle_mmio+0x4e/0x1d8
> (XEN) [<ffff82c4801b3a1e>] hvm_hap_nested_page_fault+0x1e7/0x302
> (XEN) [<ffff82c4801d1ff6>] vmx_vmexit_handler+0x12cf/0x1594
> (XEN)
> (XEN) wake 1ee61 20
>
>
>



Re: Need help with fixing the Xen waitqueue feature
On Sat, Nov 12, Keir Fraser wrote:

> On 11/11/2011 22:56, "Olaf Hering" <olaf@aepfle.de> wrote:
>
> > Keir,
> >
> > just to dump my findings to the list:
> >
> > On Tue, Nov 08, Keir Fraser wrote:
> >
> >> Tbh I wonder anyway whether stale hypercall context would be likely to cause
> >> a silent machine reboot. Booting with max_cpus=1 would eliminate moving
> >> between CPUs as a cause of inconsistencies, or pin the guest under test.
> >> Another problem could be sleeping with locks held, but we do test for that
> >> (in debug builds at least) and I'd expect crash/hang rather than silent
> >> reboot. Another problem could be if the vcpu has its own state in an
> >> inconsistent/invalid state temporarily (e.g., its pagetable base pointers)
> >> which then is attempted to be restored during a waitqueue wakeup. That could
> >> certainly cause a reboot, but I don't know of an example where this might
> >> happen.
> >
> > The crashes also happen with maxcpus=1 and a single guest cpu.
> > Today I added wait_event to ept_get_entry and this works.
> >
> > But at some point the codepath below is executed; after that wake_up the
> > host hangs hard. I will trace it further next week; maybe the backtrace
> > gives a clue what the cause could be.
>
> So you run with a single CPU, and with wait_event() in one location, and
> that works for a while (actually doing full waitqueue work: executing wait()
> and wake_up()), but then hangs? That's weird, but pretty interesting if I've
> understood correctly.

Yes, that's what happens with a single cpu in dom0 and domU.
I have added some more debugging. After the backtrace below I see one more
call to check_wakeup_from_wait() for dom0; then the host hangs hard.

> > Also, the 3K stacksize is still too small; this path uses 3096 bytes.
>
> I'll allocate a whole page for the stack then.

Thanks.


Olaf

> > (XEN) prep 127a 30 0
> > (XEN) wake 127a 30
> > (XEN) prep 1cf71 30 0
> > (XEN) wake 1cf71 30
> > (XEN) prep 1cf72 30 0
> > (XEN) wake 1cf72 30
> > (XEN) prep 1cee9 30 0
> > (XEN) wake 1cee9 30
> > (XEN) prep 121a 30 0
> > (XEN) wake 121a 30
> >
> > (This means 'gfn (p2m_unshare << 4) in_atomic)'
> >
> > (XEN) prep 1ee61 20 0
> > (XEN) max stacksize c18
> > (XEN) Xen WARN at wait.c:126
> > (XEN) ----[ Xen-4.2.24114-20111111.221356 x86_64 debug=y Tainted: C
> > ]----
> > (XEN) CPU: 0
> > (XEN) RIP: e008:[<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
> > (XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000 rbx: ffff830201f76000 rcx: 0000000000000000
> > (XEN) rdx: ffff82c4802b7f18 rsi: 000000000000000a rdi: ffff82c4802673f0
> > (XEN) rbp: ffff82c4802b73a8 rsp: ffff82c4802b7378 r8: 0000000000000000
> > (XEN) r9: ffff82c480221da0 r10: 00000000fffffffa r11: 0000000000000003
> > (XEN) r12: ffff82c4802b7f18 r13: ffff830201f76000 r14: ffff83003ea5c000
> > (XEN) r15: 000000000001ee61 cr0: 000000008005003b cr4: 00000000000026f0
> > (XEN) cr3: 000000020336d000 cr2: 00007fa88ac42000
> > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> > (XEN) Xen stack trace from rsp=ffff82c4802b7378:
> > (XEN) 0000000000000020 000000000001ee61 0000000000000002 ffff830201aa9e90
> > (XEN) ffff830201aa9f60 0000000000000020 ffff82c4802b7428 ffff82c4801e02f9
> > (XEN) ffff830000000002 0000000000000000 ffff82c4802b73f8 ffff82c4802b73f4
> > (XEN) 0000000000000000 ffff82c4802b74e0 ffff82c4802b74e4 0000000101aa9e90
> > (XEN) 000000ffffffffff ffff830201aa9e90 000000000001ee61 ffff82c4802b74e4
> > (XEN) 0000000000000002 0000000000000000 ffff82c4802b7468 ffff82c4801d810f
> > (XEN) ffff82c4802b74e0 000000000001ee61 ffff830201aa9e90 ffff82c4802b75bc
> > (XEN) 00000000002167f5 ffff88001ee61900 ffff82c4802b7518 ffff82c480211b80
> > (XEN) ffff8302167f5000 ffff82c4801c168c 0000000000000000 ffff83003ea5c000
> > (XEN) ffff88001ee61900 0000000001805063 0000000001809063 000000001ee001e3
> > (XEN) 000000001ee61067 00000000002167f5 000000000022ee70 000000000022ed10
> > (XEN) ffffffffffffffff 0000000a00000007 0000000000000004 ffff82c48025db80
> > (XEN) ffff83003ea5c000 ffff82c4802b75bc ffff88001ee61900 ffff830201aa9e90
> > (XEN) ffff82c4802b7528 ffff82c480211cb1 ffff82c4802b7568 ffff82c4801da97f
> > (XEN) ffff82c4801be053 0000000000000008 ffff82c4802b7b58 ffff88001ee61900
> > (XEN) 0000000000000000 ffff82c4802b78b0 ffff82c4802b75f8 ffff82c4801aaec8
> > (XEN) 0000000000000003 ffff88001ee61900 ffff82c4802b78b0 ffff82c4802b7640
> > (XEN) ffff83003ea5c000 00000000000000a0 0000000000000900 0000000000000008
> > (XEN) 00000003802b7650 0000000000000004 00000003802b7668 0000000000000000
> > (XEN) ffff82c4802b7b58 0000000000000001 0000000000000003 ffff82c4802b78b0
> > (XEN) Xen call trace:
> > (XEN) [<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
> > (XEN) [<ffff82c4801e02f9>] ept_get_entry+0x81/0xd8
> > (XEN) [<ffff82c4801d810f>] gfn_to_mfn_type_p2m+0x55/0x114
> > (XEN) [<ffff82c480211b80>] hap_p2m_ga_to_gfn_4_levels+0x1c4/0x2d6
> > (XEN) [<ffff82c480211cb1>] hap_gva_to_gfn_4_levels+0x1f/0x2e
> > (XEN) [<ffff82c4801da97f>] paging_gva_to_gfn+0xae/0xc4
> > (XEN) [<ffff82c4801aaec8>] hvmemul_linear_to_phys+0xf1/0x25c
> > (XEN) [<ffff82c4801ab762>] hvmemul_rep_movs+0xe8/0x31a
> > (XEN) [<ffff82c48018de07>] x86_emulate+0x4e01/0x10fde
> > (XEN) [<ffff82c4801aab3c>] hvm_emulate_one+0x12d/0x1c5
> > (XEN) [<ffff82c4801b68a9>] handle_mmio+0x4e/0x1d8
> > (XEN) [<ffff82c4801b3a1e>] hvm_hap_nested_page_fault+0x1e7/0x302
> > (XEN) [<ffff82c4801d1ff6>] vmx_vmexit_handler+0x12cf/0x1594
> > (XEN)
> > (XEN) wake 1ee61 20
> >
> >
> >
>
>

Re: Need help with fixing the Xen waitqueue feature
On 22/11/2011 11:40, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Sat, Nov 12, Keir Fraser wrote:
>
>> On 11/11/2011 22:56, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>> So you run with a single CPU, and with wait_event() in one location, and
>> that works for a while (actually doing full waitqueue work: executing wait()
>> and wake_up()), but then hangs? That's weird, but pretty interesting if I've
>> understood correctly.
>
> Yes, that's what happens with a single cpu in dom0 and domU.
> I have added some more debugging. After the backtrace below I see one more
> call to check_wakeup_from_wait() for dom0; then the host hangs hard.

I think I checked before, but: also unresponsive to serial debug keys?

And dom0 isn't getting put on a waitqueue, I assume? Since I guess dom0 is
doing the work to wake things from the waitqueue, that couldn't go well. :-)

>>> Also, the 3K stacksize is still too small; this path uses 3096 bytes.
>>
>> I'll allocate a whole page for the stack then.
>
> Thanks.

Forgot about it. Done now!

-- Keir

>
> Olaf



Re: Need help with fixing the Xen waitqueue feature
On Tue, Nov 22, Keir Fraser wrote:

> I think I checked before, but: also unresponsive to serial debug keys?

Good point, I will check that. So far I haven't used these keys.

> Forgot about it. Done now!

What about domain_crash() instead of BUG_ON() in __prepare_to_wait()?
If the stack size were checked before it is copied, the hypervisor could
survive.
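
I.e. roughly (only a fragment; the names follow the style of
xen/common/wait.c but are not copied from it):

size_t used = (char *)get_cpu_info() - esp;   /* stack in use above esp */

if ( used > sizeof(wqv->stack) )
{
    gdprintk(XENLOG_ERR, "stack too large to save (%lu bytes)\n",
             (unsigned long)used);
    domain_crash(current->domain);
    return;
}
memcpy(wqv->stack, esp, used);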

Olaf

Re: Need help with fixing the Xen waitqueue feature
On 22/11/2011 13:54, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>
>> I think I checked before, but: also unresponsive to serial debug keys?
>
> Good point, I will check that. So far I haven't used these keys.

If they work then 'd' will give you a backtrace on every CPU, and 'q' will
dump domain and vcpu states. That should make things easier!

>> Forgot about it. Done now!
>
> What about domain_crash() instead of BUG_ON() in __prepare_to_wait()?
> If the stack size were checked before it is copied, the hypervisor could
> survive.

Try the attached patch (please also try reducing the size of the new
parameter to the inline asm from PAGE_SIZE down to e.g. 2000 to force the
domain-crashing path).

-- Keir

> Olaf
Re: Need help with fixing the Xen waitqueue feature
On Wed, Nov 9, 2011 at 9:21 PM, Andres Lagar-Cavilla
<andres@lagarcavilla.org> wrote:
> PoD also does emergency sweeps under memory pressure to identify zeroes,
> that can be easily implemented by a user-space utility.

PoD is certainly a special-case, hypervisor-handled version of paging.
The main question is whether a user-space version can be made to
perform well enough. My guess is that it can, but it's far from
certain. If it can, I'm all in favor of making the pager handle PoD.

> The hypervisor code keeps a list of 2M superpages -- that feature would be
> lost.

This is actually pretty important; Windows scrubs memory on boot, so
it's guaranteed that the majority of the memory will be touched and
re-populated.

> But I doubt this would fly anyway: PoD works for non-EPT modes, and I
> guess people don't want to lose that functionality.

Is there a particular reason we can't do paging on shadow code? I
thought it was just that doing HAP was simpler to get started with.
That would be another blocker to getting rid of PoD, really.

-George
