Mailing List Archive

Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On 22/11/2011 15:07, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>
>> On 22/11/2011 13:54, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>>> On Tue, Nov 22, Keir Fraser wrote:
>>>
>>>> I think I checked before, but: also unresponsive to serial debug keys?
>>>
>>> Good point, I will check that. So far I havent used these keys.
>>
>> If they work then 'd' will give you a backtrace on every CPU, and 'q' will
>> dump domain and vcpu states. That should make things easier!
>
> They do indeed work. The backtrace below is from another system.
> Looks like hpet_broadcast_exit() is involved.
>
> Does that output below give any good hints?

It tells us that the hypervisor itself is in good shape. The deterministic
RIP in hpet_broadcast_exit() is simply because the serial rx interrupt is
always waking us from the idle loop. That RIP value is just the first
possible interruption point after the HLT instruction.

I have a new theory, which is that if we go round the for-loop in
wait_event() more than once, the vcpu's pause counter gets messed up and
goes negative, condemning it to sleep forever.
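
As a purely illustrative, standalone sketch of the bookkeeping this theory is
about (not Xen's actual wait.c; the helpers are stand-ins): vcpu_pause()
increments a counter, vcpu_unpause() decrements it, and the vcpu may only run
while the count is zero, so any imbalance across wait_event() loop iterations
wedges the vcpu.

/* Illustrative only -- models v->pause_count bookkeeping, not Xen code. */
#include <stdio.h>

static int pause_count;                 /* stand-in for v->pause_count */

static void vcpu_pause(void)   { ++pause_count; }
static void vcpu_unpause(void) { --pause_count; }

int main(void)
{
    int wakeups = 2;                    /* loop taken more than once */

    vcpu_pause();                       /* paused once on first entry */
    for ( int i = 0; i < wakeups; i++ )
        vcpu_unpause();                 /* but unpaused on every wake */

    /* A nonzero (here negative) count means pause/unpause got out of
     * step and the vcpu's scheduling state is wedged. */
    printf("pause_count = %d\n", pause_count);
    return 0;
}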

I have *just* pushed a change to the debug 'q' key (ignore the changeset
comment referring to the 'd' key, I got that wrong!) which will print per-vcpu
and per-domain pause_count values. Please get the system stuck again, and
send the output from the 'q' key with that new changeset (c/s 24178).

Finally, I don't really know what the prep/wake/done messages from your logs
mean, as you didn't send the patch that prints them.

-- Keir

>> Try the attached patch (please also try reducing the size of the new
>> parameter to the inline asm from PAGE_SIZE down to e.g. 2000 to force the
>> domain-crashing path).
>
> Thanks, I will try it.
>
>
> Olaf
>
>
> ..........
>
> (XEN) 'q' pressed -> dumping domain info (now=0x5E:F50D77F8)
> (XEN) General information for domain 0:
> (XEN) refcnt=3 dying=0 nr_pages=1852873 xenheap_pages=5 dirty_cpus={}
> max_pages=4294967295
> (XEN) handle=00000000-0000-0000-0000-000000000000 vm_assist=00000004
> (XEN) Rangesets belonging to domain 0:
> (XEN) I/O Ports { 0-1f, 22-3f, 44-60, 62-9f, a2-3f7, 400-807, 80c-cfb,
> d00-ffff }
> (XEN) Interrupts { 0-207 }
> (XEN) I/O Memory { 0-febff, fec01-fedff, fee01-ffffffffffffffff }
> (XEN) Memory pages belonging to domain 0:
> (XEN) DomPage list too long to display
> (XEN) XenPage 000000000021e6d9: caf=c000000000000002, taf=7400000000000002
> (XEN) XenPage 000000000021e6d8: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000021e6d7: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000021e6d6: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 00000000000db2fe: caf=c000000000000002, taf=7400000000000002
> (XEN) VCPU information and callbacks for domain 0:
> (XEN) VCPU0: CPU0 [has=F] flags=0 poll=0 upcall_pend = 01, upcall_mask =
> 00 dirty_cpus={} cpu_affinity={0}
> (XEN) 250 Hz periodic timer (period 4 ms)
> (XEN) General information for domain 1:
> (XEN) refcnt=3 dying=0 nr_pages=3645 xenheap_pages=6 dirty_cpus={}
> max_pages=131328
> (XEN) handle=d80155e4-8f8b-94e1-8382-94084b7f1e51 vm_assist=00000000
> (XEN) paging assistance: hap refcounts log_dirty translate external
> (XEN) Rangesets belonging to domain 1:
> (XEN) I/O Ports { }
> (XEN) Interrupts { }
> (XEN) I/O Memory { }
> (XEN) Memory pages belonging to domain 1:
> (XEN) DomPage list too long to display
> (XEN) PoD entries=0 cachesize=0
> (XEN) XenPage 000000000020df70: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020e045: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020c58c: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020c5a4: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 0000000000019f1e: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020eb23: caf=c000000000000001, taf=7400000000000001
> (XEN) VCPU information and callbacks for domain 1:
> (XEN) VCPU0: CPU0 [has=F] flags=4 poll=0 upcall_pend = 00, upcall_mask =
> 00 dirty_cpus={} cpu_affinity={0}
> (XEN) paging assistance: hap, 4 levels
> (XEN) No periodic timer
> (XEN) Notifying guest 0:0 (virq 1, port 0, stat 0/-1/-1)
> (XEN) Notifying guest 1:0 (virq 1, port 0, stat 0/0/0)
> (XEN) 'q' pressed -> dumping domain info (now=0x60:A7DD8B08)
> (XEN) General information for domain 0:
> (XEN) refcnt=3 dying=0 nr_pages=1852873 xenheap_pages=5 dirty_cpus={}
> max_pages=4294967295
> (XEN) handle=00000000-0000-0000-0000-000000000000 vm_assist=00000004
> (XEN) Rangesets belonging to domain 0:
> (XEN) I/O Ports { 0-1f, 22-3f, 44-60, 62-9f, a2-3f7, 400-807, 80c-cfb,
> d00-ffff }
> (XEN) Interrupts { 0-207 }
> (XEN) I/O Memory { 0-febff, fec01-fedff, fee01-ffffffffffffffff }
> (XEN) Memory pages belonging to domain 0:
> (XEN) DomPage list too long to display
> (XEN) XenPage 000000000021e6d9: caf=c000000000000002, taf=7400000000000002
> (XEN) XenPage 000000000021e6d8: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000021e6d7: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000021e6d6: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 00000000000db2fe: caf=c000000000000002, taf=7400000000000002
> (XEN) VCPU information and callbacks for domain 0:
> (XEN) VCPU0: CPU0 [has=F] flags=0 poll=0 upcall_pend = 01, upcall_mask =
> 00 dirty_cpus={} cpu_affinity={0}
> (XEN) 250 Hz periodic timer (period 4 ms)
> (XEN) General information for domain 1:
> (XEN) refcnt=3 dying=0 nr_pages=3645 xenheap_pages=6 dirty_cpus={}
> max_pages=131328
> (XEN) handle=d80155e4-8f8b-94e1-8382-94084b7f1e51 vm_assist=00000000
> (XEN) paging assistance: hap refcounts log_dirty translate external
> (XEN) Rangesets belonging to domain 1:
> (XEN) I/O Ports { }
> (XEN) Interrupts { }
> (XEN) I/O Memory { }
> (XEN) Memory pages belonging to domain 1:
> (XEN) DomPage list too long to display
> (XEN) PoD entries=0 cachesize=0
> (XEN) XenPage 000000000020df70: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020e045: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020c58c: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020c5a4: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 0000000000019f1e: caf=c000000000000001, taf=7400000000000001
> (XEN) XenPage 000000000020eb23: caf=c000000000000001, taf=7400000000000001
> (XEN) VCPU information and callbacks for domain 1:
> (XEN) VCPU0: CPU0 [has=F] flags=4 poll=0 upcall_pend = 00, upcall_mask =
> 00 dirty_cpus={} cpu_affinity={0}
> (XEN) paging assistance: hap, 4 levels
> (XEN) No periodic timer
> (XEN) Notifying guest 0:0 (virq 1, port 0, stat 0/-1/-1)
> (XEN) Notifying guest 1:0 (virq 1, port 0, stat 0/0/0)
> (XEN) 'd' pressed -> dumping registers
> (XEN)
> (XEN) *** Dumping CPU0 host state: ***
> (XEN) ----[ Xen-4.2.24169-20111122.144218 x86_64 debug=y Tainted: C
> ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82c48019bfe6>] hpet_broadcast_exit+0x0/0x1f9
> (XEN) RFLAGS: 0000000000000246 CONTEXT: hypervisor
> (XEN) rax: 0000000000003b40 rbx: 000000674742e72d rcx: 0000000000000001
> (XEN) rdx: 0000000000000000 rsi: ffff82c48030f000 rdi: ffff82c4802bfea0
> (XEN) rbp: ffff82c4802bfee0 rsp: ffff82c4802bfe78 r8: 000000008c858211
> (XEN) r9: 0000000000000003 r10: ffff82c4803064e0 r11: 000000676bf885a3
> (XEN) r12: ffff83021e70e840 r13: ffff83021e70e8d0 r14: 00000067471bdb62
> (XEN) r15: ffff82c48030e440 cr0: 000000008005003b cr4: 00000000000026f0
> (XEN) cr3: 00000000db4c4000 cr2: 0000000000beb000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802bfe78:
> (XEN) ffff82c48019f0ca ffff82c4802bff18 ffffffffffffffff ffff82c4802bfed0
> (XEN) 0000000180124b57 0000000000000000 0000000000000000 ffff82c48025b200
> (XEN) 0000152900006fe3 ffff82c4802bff18 ffff82c48025b200 ffff82c4802bff18
> (XEN) ffff82c48030e468 ffff82c4802bff10 ffff82c48015a88d 0000000000000000
> (XEN) ffff8300db6c6000 ffff8300db6c6000 ffffffffffffffff ffff82c4802bfe00
> (XEN) 0000000000000000 0000000000001000 0000000000001000 0000000000000000
> (XEN) 8000000000000427 ffff8801d8579010 0000000000000246 00000000deadbeef
> (XEN) ffff8801d8579000 ffff8801d8579000 00000000fffffffe ffffffff8000302a
> (XEN) 00000000deadbeef 00000000deadbeef 00000000deadbeef 0000010000000000
> (XEN) ffffffff8000302a 000000000000e033 0000000000000246 ffff8801a515bd10
> (XEN) 000000000000e02b 000000000000beef 000000000000beef 000000000000beef
> (XEN) 000000000000beef 0000000000000000 ffff8300db6c6000 0000000000000000
> (XEN) 0000000000000000
> (XEN) Xen call trace:
> (XEN) [<ffff82c48019bfe6>] hpet_broadcast_exit+0x0/0x1f9
> (XEN) [<ffff82c48015a88d>] idle_loop+0x6c/0x7b
> (XEN)
> (XEN) 'd' pressed -> dumping registers
> (XEN)
> (XEN) *** Dumping CPU0 host state: ***
> (XEN) ----[ Xen-4.2.24169-20111122.144218 x86_64 debug=y Tainted: C
> ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82c48019bfe6>] hpet_broadcast_exit+0x0/0x1f9
> (XEN) RFLAGS: 0000000000000246 CONTEXT: hypervisor
> (XEN) rax: 0000000000003b40 rbx: 00000078f4fbe7ed rcx: 0000000000000001
> (XEN) rdx: 0000000000000000 rsi: ffff82c48030f000 rdi: ffff82c4802bfea0
> (XEN) rbp: ffff82c4802bfee0 rsp: ffff82c4802bfe78 r8: 00000000cd4f8db6
> (XEN) r9: 0000000000000002 r10: ffff82c480308780 r11: 000000790438291d
> (XEN) r12: ffff83021e70e840 r13: ffff83021e70e8d0 r14: 00000078f412a61c
> (XEN) r15: ffff82c48030e440 cr0: 000000008005003b cr4: 00000000000026f0
> (XEN) cr3: 00000000db4c4000 cr2: 0000000000beb000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802bfe78:
> (XEN) ffff82c48019f0ca ffff82c4802bff18 ffffffffffffffff ffff82c4802bfed0
> (XEN) 0000000180124b57 0000000000000000 0000000000000000 ffff82c48025b200
> (XEN) 0000239e00007657 ffff82c4802bff18 ffff82c48025b200 ffff82c4802bff18
> (XEN) ffff82c48030e468 ffff82c4802bff10 ffff82c48015a88d 0000000000000000
> (XEN) ffff8300db6c6000 ffff8300db6c6000 ffffffffffffffff ffff82c4802bfe00
> (XEN) 0000000000000000 0000000000001000 0000000000001000 0000000000000000
> (XEN) 8000000000000427 ffff8801d8579010 0000000000000246 00000000deadbeef
> (XEN) ffff8801d8579000 ffff8801d8579000 00000000fffffffe ffffffff8000302a
> (XEN) 00000000deadbeef 00000000deadbeef 00000000deadbeef 0000010000000000
> (XEN) ffffffff8000302a 000000000000e033 0000000000000246 ffff8801a515bd10
> (XEN) 000000000000e02b 000000000000beef 000000000000beef 000000000000beef
> (XEN) 000000000000beef 0000000000000000 ffff8300db6c6000 0000000000000000
> (XEN) 0000000000000000
> (XEN) Xen call trace:
> (XEN) [<ffff82c48019bfe6>] hpet_broadcast_exit+0x0/0x1f9
> (XEN) [<ffff82c48015a88d>] idle_loop+0x6c/0x7b
> (XEN)
>



Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On 22/11/2011 15:40, "Keir Fraser" <keir@xen.org> wrote:

> I have a new theory, which is that if we go round the for-loop in
> wait_event() more than once, the vcpu's pause counter gets messed up and
> goes negative, condemning it to sleep forever.

Further to this, can you please try moving the call to __prepare_to_wait()
from just after the spinlock region to just before it (i.e., immediately after
the ASSERT), in prepare_to_wait(). That could well make things work better.
On UP at least -- for SMP systems I will also need to fix the broken usage
of wqv->esp...
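
A minimal standalone model of the reordering being asked for (illustrative
names only; the real code lives in xen/common/wait.c): today the context save
happens after the locked enqueue, and the suggestion is to do it immediately
after the ASSERT, before taking the lock.

/* Sketch only: models the ordering, not the real implementation. */
#include <assert.h>
#include <stdio.h>

static void save_vcpu_context(void)  { puts("__prepare_to_wait()"); }
static void lock_waitqueue(void)     { puts("spin_lock(&wq->lock)"); }
static void enqueue_vcpu(void)       { puts("list_add_tail(...)"); }
static void unlock_waitqueue(void)   { puts("spin_unlock(&wq->lock)"); }

static void prepare_to_wait_current(void)   /* context saved after unlock */
{
    assert(1 /* !in_atomic() */);
    lock_waitqueue();
    enqueue_vcpu();
    unlock_waitqueue();
    save_vcpu_context();
}

static void prepare_to_wait_proposed(void)  /* context saved first */
{
    assert(1 /* !in_atomic() */);
    save_vcpu_context();
    lock_waitqueue();
    enqueue_vcpu();
    unlock_waitqueue();
}

int main(void)
{
    puts("-- current --");  prepare_to_wait_current();
    puts("-- proposed --"); prepare_to_wait_proposed();
    return 0;
}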

-- Keir




Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On Tue, Nov 22, Keir Fraser wrote:

> I have a new theory, which is that if we go round the for-loop in
> wait_event() more than once, the vcpu's pause counter gets messed up and
> goes negative, condemning it to sleep forever.

I have added a check for that; it's not negative.

> I have *just* pushed a change to the debug 'q' key (ignore the changeset
> comment referring to 'd' key, I got that wrong!) which will print per-vcpu
> and per-domain pause_count values. Please get the system stuck again, and
> send the output from 'q' key with that new changeset (c/s 24178).

To me it looks like dom0 gets paused, perhaps due to some uneven pause/unpause calls.
I will see if I can figure it out.

Olaf

(XEN) 'q' pressed -> dumping domain info (now=0xA1:4BC733CC)
(XEN) General information for domain 0:
(XEN) refcnt=3 dying=0 pause_count=0
(XEN) nr_pages=5991502 xenheap_pages=5 dirty_cpus={} max_pages=4294967295
(XEN) handle=00000000-0000-0000-0000-000000000000 vm_assist=00000004
(XEN) Rangesets belonging to domain 0:
(XEN) I/O Ports { 0-1f, 22-3f, 44-60, 62-9f, a2-3f7, 400-407, 40c-cfb, d00-ffff }
(XEN) Interrupts { 0-303 }
(XEN) I/O Memory { 0-febff, fec01-fec8f, fec91-fedff, fee01-ffffffffffffffff }
(XEN) Memory pages belonging to domain 0:
(XEN) DomPage list too long to display
(XEN) XenPage 000000000036ff8d: caf=c000000000000002, taf=7400000000000002
(XEN) XenPage 000000000036ff8c: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000036ff8b: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000036ff8a: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000008befd: caf=c000000000000002, taf=7400000000000002
(XEN) VCPU information and callbacks for domain 0:
(XEN) VCPU0: CPU0 [has=F] poll=0 upcall_pend = 01, upcall_mask = 00 dirty_cpus={} cpu_affinity={0}
(XEN) pause_count=1 pause_flags=0
(XEN) 250 Hz periodic timer (period 4 ms)
(XEN) General information for domain 1:
(XEN) refcnt=3 dying=0 pause_count=0
(XEN) nr_pages=17549 xenheap_pages=6 dirty_cpus={} max_pages=131328
(XEN) handle=5499728e-7f38-dbb0-b6cc-22866a6864f3 vm_assist=00000000
(XEN) paging assistance: hap refcounts translate external
(XEN) Rangesets belonging to domain 1:
(XEN) I/O Ports { }
(XEN) Interrupts { }
(XEN) I/O Memory { }
(XEN) Memory pages belonging to domain 1:
(XEN) DomPage list too long to display
(XEN) PoD entries=0 cachesize=0
(XEN) XenPage 0000000000200b7c: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 0000000000203bfe: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 0000000000200b48: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000021291d: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 000000000003ebfc: caf=c000000000000001, taf=7400000000000001
(XEN) XenPage 0000000000202ef4: caf=c000000000000001, taf=7400000000000001
(XEN) VCPU information and callbacks for domain 1:
(XEN) VCPU0: CPU0 [has=F] poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0}
(XEN) pause_count=0 pause_flags=4
(XEN) paging assistance: hap, 4 levels
(XEN) No periodic timer
(XEN) Notifying guest 0:0 (virq 1, port 0, stat 0/-1/-1)
(XEN) Notifying guest 1:0 (virq 1, port 0, stat 0/0/0)


Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On 22/11/2011 17:36, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>
>> I have a new theory, which is that if we go round the for-loop in
>> wait_event() more than once, the vcpu's pause counter gets messed up and
>> goes negative, condemning it to sleep forever.
>
> I have added a check for that, its not negative.
>
>> I have *just* pushed a change to the debug 'q' key (ignore the changeset
>> comment referring to 'd' key, I got that wrong!) which will print per-vcpu
>> and per-domain pause_count values. Please get the system stuck again, and
>> send the output from 'q' key with that new changeset (c/s 24178).
>
> To me it looks like dom0 gets paused, perhaps due to some uneven pause/unpause
> calls.
> I will see if I can figure it out.

Could it have ended up on the waitqueue?

-- Keir



Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On Tue, Nov 22, Keir Fraser wrote:

> Could it have ended up on the waitqueue?

Unlikely, but I will add checks for that as well.

Olaf

Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On Tue, Nov 22, Olaf Hering wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>
> > Could it have ended up on the waitqueue?
>
> Unlikely, but I will add checks for that as well.

I posted three changes which make use of the wait queues.
For some reason the code at the very end of p2m_mem_paging_populate()
triggers when d is dom0, so its vcpu is put to sleep.


Olaf

Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On 22/11/2011 21:15, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Olaf Hering wrote:
>
>> On Tue, Nov 22, Keir Fraser wrote:
>>
>>> Could it have ended up on the waitqueue?
>>
>> Unlikely, but I will add checks for that as well.
>
> I posted three changes which make use of the wait queues.
> For some reason the code at the very end of p2m_mem_paging_populate()
> triggers when d is dom0, so its vcpu is put to sleep.

We obviously can't have dom0 going to sleep on paging work. This, at least,
isn't a wait-queue bug.

> Olaf



Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On Tue, Nov 22, Keir Fraser wrote:

> We obviously can't have dom0 going to sleep on paging work. This, at least,
> isn't a wait-queue bug.

I had to rearrange some code in p2m_mem_paging_populate for my debug
stuff. This led to an uninitialized req, and as a result req.flags
sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
not catch that.
Now waitqueues appear to work ok for me. Thanks!


What do you think about C99 initializers in p2m_mem_paging_populate,
just to avoid such mistakes?

mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING };
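
A standalone illustration of why the designated initializer helps (the struct
here is a hypothetical stand-in, not the real mem_event_request_t layout):
members not named in the initializer are zero-initialized, so a stale flags
value cannot leak through the way it did with the uninitialized req.

/* Hypothetical stand-in types, for illustration only. */
#include <stdio.h>

#define MEM_EVENT_TYPE_PAGING 1

typedef struct {
    unsigned int  type;
    unsigned int  flags;   /* where MEM_EVENT_FLAG_VCPU_PAUSED would live */
    unsigned long gfn;
} mem_event_request_t;

int main(void)
{
    /* C99 designated initializer: unnamed members are guaranteed zero. */
    mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING };

    printf("type=%u flags=%u gfn=%lu\n", req.type, req.flags, req.gfn);
    return 0;
}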

Olaf

Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
>
>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>> isn't a wait-queue bug.
>
> I had to rearrange some code in p2m_mem_paging_populate for my debug
> stuff. This led to an uninitialized req, and as a result req.flags
> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
> not catch that..
> Now waitqueues appear to work ok for me. Thanks!

Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
pretty sure that the hypervisor will blow up pretty quickly when you resume
testing with multiple physical CPUs, for example. I need to create a couple
of fixup patches which I will then send to you for test.

By the way, did you test my patch to domain_crash when the stack-save area
isn't large enough?

> What do you think about C99 initializers in p2m_mem_paging_populate,
> just to avoid such mistakes?
>
> mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING };

We like them.

-- Keir



Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On Wed, Nov 23, Keir Fraser wrote:

> On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:
>
> > On Tue, Nov 22, Keir Fraser wrote:
> >
> >> We obviously can't have dom0 going to sleep on paging work. This, at least,
> >> isn't a wait-queue bug.
> >
> > I had to rearrange some code in p2m_mem_paging_populate for my debug
> > stuff. This led to an uninitialized req, and as a result req.flags
> > sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
> > not catch that..
> > Now waitqueues appear to work ok for me. Thanks!
>
> Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
> pretty sure that the hypervisor will blow up pretty quickly when you resume
> testing with multiple physical CPUs, for example. I need to create a couple
> of fixup patches which I will then send to you for test.

Good, I look forward to these fixes.

> By the way, did you test my patch to domain_crash when the stack-save area
> isn't large enough?

I ran into the ->esp == 0 case right away, but I need to retest with a
clean tree.

Olaf

Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On 23/11/2011 17:16, "Keir Fraser" <keir.xen@gmail.com> wrote:

> On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> On Tue, Nov 22, Keir Fraser wrote:
>>
>>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>>> isn't a wait-queue bug.
>>
>> I had to rearrange some code in p2m_mem_paging_populate for my debug
>> stuff. This led to an uninitialized req, and as a result req.flags
>> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
>> not catch that..
>> Now waitqueues appear to work ok for me. Thanks!
>
> Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
> pretty sure that the hypervisor will blow up pretty quickly when you resume
> testing with multiple physical CPUs, for example. I need to create a couple
> of fixup patches which I will then send to you for test.

We have quite a big waitqueue problem actually. The current scheme of
per-cpu stacks doesn't work nicely, as the stack pointer will change if a
vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
work nicely with preempted C code, which may implement frame pointers and/or
arbitrarily take the address of on-stack variables. The result will be
hideous cross-stack corruptions, as these frame pointers and cached
addresses of automatic variables will reference the wrong cpu's stack!
Fixing or detecting this in general is not possible afaics.
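
A standalone illustration of the corruption described above (plain C, no Xen
code): if a stack image is saved on one cpu's stack and resumed at a different
address, any cached pointer to an on-stack variable (or saved frame pointer)
still refers to the old stack.

#include <stdio.h>
#include <string.h>

struct frame {
    int  value;
    int *value_ptr;          /* like a cached &local or a frame pointer */
};

union stack { unsigned char bytes[64]; struct frame f; };

int main(void)
{
    union stack cpu0_stack, cpu1_stack;

    /* Preempted C code on cpu0's stack takes the address of a local. */
    cpu0_stack.f.value = 42;
    cpu0_stack.f.value_ptr = &cpu0_stack.f.value;

    /* The waitqueue machinery copies the stack image and resumes it on cpu1. */
    memcpy(&cpu1_stack, &cpu0_stack, sizeof(cpu0_stack));

    /* The cached pointer still points into cpu0's stack; dereferencing it
     * would read or corrupt memory the resumed code no longer owns. */
    printf("resumed local at %p, cached pointer still %p\n",
           (void *)&cpu1_stack.f.value, (void *)cpu1_stack.f.value_ptr);
    return 0;
}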

So, we'll have to switch to per-vcpu stacks, probably with separate per-cpu
irq stacks (as a later followup). That's quite a nuisance!

-- Keir

> By the way, did you test my patch to domain_crash when the stack-save area
> isn't large enough?
>
>> What do you think about C99 initializers in p2m_mem_paging_populate,
>> just to avoid such mistakes?
>>
>> mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING };
>
> We like them.
>
> -- Keir
>
>



Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On 23/11/2011 18:06, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Wed, Nov 23, Keir Fraser wrote:
>
>> On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>>> On Tue, Nov 22, Keir Fraser wrote:
>>>
>>>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>>>> isn't a wait-queue bug.
>>>
>>> I had to rearrange some code in p2m_mem_paging_populate for my debug
>>> stuff. This led to an uninitialized req, and as a result req.flags
>>> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
>>> not catch that..
>>> Now waitqueues appear to work ok for me. Thanks!
>>
>> Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
>> pretty sure that the hypervisor will blow up pretty quickly when you resume
>> testing with multiple physical CPUs, for example. I need to create a couple
>> of fixup patches which I will then send to you for test.
>
> Good, I will look forward for these fixes.
>
>> By the way, did you test my patch to domain_crash when the stack-save area
>> isn't large enough?
>
> I ran into the ->esp == 0 case right away, but I need to retest with a
> clean tree.

I think I have a test the wrong way round. This doesn't really matter now
anyway. As I say in my previous email, stack management will have to be
redone for waitqueues.

-- Keir

> Olaf



Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On Wed, Nov 23, Keir Fraser wrote:

> We have quite a big waitqueue problem actually. The current scheme of
> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
> work nicely with preempted C code, which may implement frame pointers and/or
> arbitrarily take the address of on-stack variables. The result will be
> hideous cross-stack corruptions, as these frame pointers and cached
> addresses of automatic variables will reference the wrong cpu's stack!
> Fixing or detecting this in general is not possible afaics.

Yes, I was thinking about that wakeup on different cpu as well.
As a quick fix/hack, perhaps the scheduler could make sure the vcpu
wakes up on the same cpu?

Olaf

Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On 23/11/2011 18:31, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Wed, Nov 23, Keir Fraser wrote:
>
>> We have quite a big waitqueue problem actually. The current scheme of
>> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
>> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
>> work nicely with preempted C code, which may implement frame pointers and/or
>> arbitrarily take the address of on-stack variables. The result will be
>> hideous cross-stack corruptions, as these frame pointers and cached
>> addresses of automatic variables will reference the wrong cpu's stack!
>> Fixing or detecting this in general is not possible afaics.
>
> Yes, I was thinking about that wakeup on different cpu as well.
> As a quick fix/hack, perhaps the scheduler could make sure the vcpu
> wakes up on the same cpu?

Could save old affinity and then vcpu_set_affinity. That will have to do for
now. Actually it should work okay as long as toolstack doesn't mess with
affinity meanwhile. I'll sort out a patch for this.
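
A minimal sketch of that workaround (illustrative struct and helper names, not
Xen's scheduler interfaces): stash the affinity mask before sleeping, pin the
vcpu to the cpu it slept on, and restore the saved mask on wake-up.

#include <stdio.h>

typedef unsigned long cpumask_t;          /* one bit per physical cpu */

struct vcpu {
    cpumask_t affinity;                   /* where the scheduler may run it */
    cpumask_t saved_affinity;             /* stashed across the sleep */
    int       processor;                  /* cpu it is currently on */
};

static void vcpu_set_affinity(struct vcpu *v, cpumask_t mask)
{
    v->affinity = mask;
}

static void waitqueue_sleep(struct vcpu *v)
{
    v->saved_affinity = v->affinity;
    vcpu_set_affinity(v, 1UL << v->processor);   /* pin to current cpu */
    /* ...stack saved on this cpu, vcpu descheduled... */
}

static void waitqueue_wake(struct vcpu *v)
{
    /* Woken on the pinned cpu, so the saved stack image is still valid. */
    vcpu_set_affinity(v, v->saved_affinity);     /* undo the pinning */
}

int main(void)
{
    struct vcpu v = { .affinity = 0xf, .processor = 2 };

    waitqueue_sleep(&v);
    printf("while asleep: affinity=%#lx\n", v.affinity);   /* 0x4 */
    waitqueue_wake(&v);
    printf("after wake:   affinity=%#lx\n", v.affinity);   /* 0xf */
    return 0;
}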

-- Keir

> Olaf



Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On 23/11/2011 19:21, "Keir Fraser" <keir.xen@gmail.com> wrote:

> On 23/11/2011 18:31, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> On Wed, Nov 23, Keir Fraser wrote:
>>
>>> We have quite a big waitqueue problem actually. The current scheme of
>>> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
>>> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
>>> work nicely with preempted C code, which may implement frame pointers and/or
>>> arbitrarily take the address of on-stack variables. The result will be
>>> hideous cross-stack corruptions, as these frame pointers and cached
>>> addresses of automatic variables will reference the wrong cpu's stack!
>>> Fixing or detecting this in general is not possible afaics.
>>
>> Yes, I was thinking about that wakeup on different cpu as well.
>> As a quick fix/hack, perhaps the scheduler could make sure the vcpu
>> wakes up on the same cpu?
>
> Could save old affinity and then vcpu_set_affinity. That will have to do for
> now. Actually it should work okay as long as toolstack doesn't mess with
> affinity meanwhile. I'll sort out a patch for this.

Attached three patches for you to try. They apply in sequence.
00: A fixed version of "domain_crash on stack overflow"
01: Reorders prepare_to_wait so that the vcpu will always be on the
waitqueue on exit (even if it has just been woken).
02: Ensures the vcpu wakes up on the same cpu that it slept on.

We need all of these. Just need testing to make sure they aren't horribly
broken. You should be able to test multi-processor host again with these.

-- Keir

> -- Keir
>
>> Olaf
>
>
Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On Wed, Nov 23, Keir Fraser wrote:

> Attached three patches for you to try. They apply in sequence.
> 00: A fixed version of "domain_crash on stack overflow"
> 01: Reorders prepare_to_wait so that the vcpu will always be on the
> waitqueue on exit (even if it has just been woken).
> 02: Ensures the vcpu wakes up on the same cpu that it slept on.
>
> We need all of these. Just need testing to make sure they aren't horribly
> broken. You should be able to test multi-processor host again with these.

Thanks Keir.

In a first test they work ok on a multi-processor host.
I get vcpu hangs when I balloon up and down with mem-set. That's most
likely caused by unbalanced vcpu_pause/unpause calls in my changes, which
use wait queues in mem_event handling and ept_get_entry. I will debug that
further.

After the vcpu hung I killed the guest and tried to start a new one.
Oddly enough I wasn't able to fully kill the guest; it remained in --p--d
state. Most vcpus were in the paused state before that.

In another attempt I was able to run firefox in a guest. But after
trying to open all "latest headlines" in tabs, the guest crashed. The qemu-dm
log had a lot of this (but nothing in xen dmesg):
track_dirty_vram(f0000000, 12c) failed (-1, 3)

xl vcpu-list shows
(null) 1 0 - --p 47.3 any cpu
(null) 1 1 12 --- 13.4 any cpu
(null) 1 2 - --p 4.3 any cpu
(null) 1 3 - --p 7.8 any cpu
(null) 1 4 - --p 3.5 any cpu
(null) 1 5 - --p 1.9 any cpu
(null) 1 6 - --p 1.6 any cpu
(null) 1 7 - --p 1.4 any cpu

Hmm, qemu-dm doesn't get killed in all cases; killing it destroys the guest.
I have seen that before already.


I will provide more test results tomorrow.


Olaf

Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On 23/11/2011 22:30, "Olaf Hering" <olaf@aepfle.de> wrote:

> After the vcpu hung I killed the guest and tried to start a new one.
> Oddly enough I wasnt able to fully kill the guest, it remained in --p--d
> state. Most vcpus were in paused state before that.

Dying but kept as a zombie by memory references...

> In another attempt I was able to run firefox in a guest. But after
> trying to open all "latest headlines" in tabs the guest crashed. qemu-dm
> log had alot of this (but nothing in xen dmesg):
> track_dirty_vram(f0000000, 12c) failed (-1, 3)
>
> xl vcpu-list shows
> (null) 1 0 - --p 47.3 any cpu
> (null) 1 1 12 --- 13.4 any cpu
> (null) 1 2 - --p 4.3 any cpu
> (null) 1 3 - --p 7.8 any cpu
> (null) 1 4 - --p 3.5 any cpu
> (null) 1 5 - --p 1.9 any cpu
> (null) 1 6 - --p 1.6 any cpu
> (null) 1 7 - --p 1.4 any cpu
>
> Hmm, qemu-dm doesnt get killed in all cases, killing it destroys the guest..
> I have seen that before already.

...from qemu-dm. Problem is that toolstack is not killing qemu-dm, or
qemu-dm is not responding to some shutdown signal.

> I will provide more test results tomorrow.

Thanks.

-- Keir



Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
>>> On 23.11.11 at 22:03, Keir Fraser <keir.xen@gmail.com> wrote:
> On 23/11/2011 19:21, "Keir Fraser" <keir.xen@gmail.com> wrote:
>
>> On 23/11/2011 18:31, "Olaf Hering" <olaf@aepfle.de> wrote:
>>
>>> On Wed, Nov 23, Keir Fraser wrote:
>>>
>>>> We have quite a big waitqueue problem actually. The current scheme of
>>>> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
>>>> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
>>>> work nicely with preempted C code, which may implement frame pointers and/or
>>>> arbitrarily take the address of on-stack variables. The result will be
>>>> hideous cross-stack corruptions, as these frame pointers and cached
>>>> addresses of automatic variables will reference the wrong cpu's stack!
>>>> Fixing or detecting this in general is not possible afaics.
>>>
>>> Yes, I was thinking about that wakeup on different cpu as well.
>>> As a quick fix/hack, perhaps the scheduler could make sure the vcpu
>>> wakes up on the same cpu?
>>
>> Could save old affinity and then vcpu_set_affinity. That will have to do for
>> now. Actually it should work okay as long as toolstack doesn't mess with
>> affinity meanwhile. I'll sort out a patch for this.
>
> Attached three patches for you to try. They apply in sequence.
> 00: A fixed version of "domain_crash on stack overflow"
> 01: Reorders prepare_to_wait so that the vcpu will always be on the
> waitqueue on exit (even if it has just been woken).
> 02: Ensures the vcpu wakes up on the same cpu that it slept on.

Didn't we (long ago) settle on not permitting new calls to
domain_crash_synchronous()? Is it really impossible to just
domain_crash() in any of the instances these add?

Jan

> We need all of these. Just need testing to make sure they aren't horribly
> broken. You should be able to test multi-processor host again with these.
>
> -- Keir
>
>> -- Keir
>>
>>> Olaf
>>
>>




Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On 24/11/2011 09:15, "Jan Beulich" <JBeulich@suse.com> wrote:

>>
>> Attached three patches for you to try. They apply in sequence.
>> 00: A fixed version of "domain_crash on stack overflow"
>> 01: Reorders prepare_to_wait so that the vcpu will always be on the
>> waitqueue on exit (even if it has just been woken).
>> 02: Ensures the vcpu wakes up on the same cpu that it slept on.
>
> Didn't we (long ago) settle on not permitting new calls to
> domain_crash_synchronous()? Is it really impossible to just
> domain_crash() in any of the instances these add?

It's safe because you must be in a context that is safe to preempt. That's a
pre-condition for using a waitqueue. It's not safe to use domain_crash()
because the caller of wait_event() may not handle the exceptional return.
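
A standalone illustration of that point (hypothetical macro, not Xen's
wait_event()): the macro is used as a statement with no error return, so if it
merely called domain_crash() and fell through, the caller would continue
running as though the condition it waited for had been satisfied.

#include <stdbool.h>
#include <stdio.h>

static bool condition_ready;              /* never becomes true here */
static bool domain_crashed;

static void domain_crash(void) { domain_crashed = true; }   /* returns! */

/* Hypothetical "crash and return" variant of wait_event(). */
#define wait_event(cond)                                    \
    do {                                                    \
        if ( !(cond) )                                      \
            domain_crash();   /* caller can't see this */   \
    } while ( 0 )

int main(void)
{
    wait_event(condition_ready);

    /* Execution resumes here unconditionally, touching state that was
     * never set up -- hence the synchronous variant, which does not
     * return to this point, is the safe one in this context. */
    printf("condition_ready=%d domain_crashed=%d\n",
           (int)condition_ready, (int)domain_crashed);
    return 0;
}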

-- Keir



Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On 24/11/2011 09:15, "Jan Beulich" <JBeulich@suse.com> wrote:

>> Attached three patches for you to try. They apply in sequence.
>> 00: A fixed version of "domain_crash on stack overflow"
>> 01: Reorders prepare_to_wait so that the vcpu will always be on the
>> waitqueue on exit (even if it has just been woken).
>> 02: Ensures the vcpu wakes up on the same cpu that it slept on.
>
> Didn't we (long ago) settle on not permitting new calls to
> domain_crash_synchronous()?

This was a reaction to lazy patches which sprinkled d_c_s calls around
liberally, and in unsafe locations, as a dodge around proper error handling.



Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On Wed, Nov 23, Keir Fraser wrote:

> ...from qemu-dm. Problem is that toolstack is not killing qemu-dm, or
> qemu-dm is not responding to some shutdown signal.

In the first crash there was no qemu-dm process left from what I
remember. I will see if it happens again.

Olaf

Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On Thu, Nov 24, Olaf Hering wrote:

> On Wed, Nov 23, Keir Fraser wrote:
>
> > ...from qemu-dm. Problem is that toolstack is not killing qemu-dm, or
> > qemu-dm is not responding to some shutdown signal.
>
> In the first crash there was no qemu-dm process left from what I
> remember. I will see if it happens again.

I see the patches were already committed. Thanks.

After more investigation, my config file has on_crash="preserve".
To me it looks like the guest kills itself, since nothing is in the
logs. So after all that's not a waitqueue issue, and most likely also not
a paging bug.

Olaf

Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
One more thing:

Is the BUG_ON in destroy_waitqueue_vcpu really required? If the vcpu
happens to be in a queue by the time xl destroy is called, the hypervisor
will crash.

Perhaps there should be some sort of domain destructor for each
waitqueue?

Olaf

Re: Need help with fixing the Xen waitqueue feature [ In reply to ]
On 25/11/2011 18:26, "Olaf Hering" <olaf@aepfle.de> wrote:

>
> One more thing:
>
> Is the BUG_ON in destroy_waitqueue_vcpu really required? If the vcpu
> happens to be in a queue by the time xl destroy is called the hypervisor
> will crash.

We could fix this by having waitqueues that contain a vcpu hold a reference
to that vcpu's domain.
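
A minimal sketch of that idea (illustrative refcounting, not the real
get_domain()/put_domain() implementation): the waitqueue takes a domain
reference when a vcpu sleeps on it and drops it on wake, so domain destruction
cannot complete while a sleeper exists.

#include <assert.h>
#include <stdio.h>

struct domain { int refcnt; };

static void get_domain(struct domain *d) { ++d->refcnt; }
static void put_domain(struct domain *d) { --d->refcnt; }

static void waitqueue_sleep(struct domain *d) { get_domain(d); }
static void waitqueue_wake(struct domain *d)  { put_domain(d); }

static void domain_destroy(struct domain *d)
{
    /* Teardown only proceeds once every reference is gone. */
    assert(d->refcnt == 0);
    puts("domain torn down");
}

int main(void)
{
    struct domain d = { .refcnt = 0 };

    waitqueue_sleep(&d);      /* vcpu parked on a waitqueue */
    /* domain_destroy(&d) here would be blocked by the outstanding ref */
    waitqueue_wake(&d);       /* vcpu leaves the queue */
    domain_destroy(&d);       /* now safe */
    return 0;
}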

> Perhaps there should be some sort of domain destructor for each
> waitqueue?

Not sure what you mean.

-- Keir

> Olaf



