Mailing List Archive

2.6.38 x86_64 domU - BUG: Bad page state
We've been experiencing this behavior since switching to 2.6.38 64-bit.
Lots of reports across our fleet, so it's not an isolated problem...

DomU: 2.6.38 x86_64
Xen: 3.4.1

BUG: Bad page state in process swapper pfn:1a399
page:ffffea00005bc978 count:-1 mapcount:0 mapping: (null)
index:0xffff88001a399700
page flags: 0x100000000000000()
Pid: 0, comm: swapper Not tainted 2.6.38-x86_64-linode17 #1
Call Trace:
<IRQ> [<ffffffff810aa910>] ? dump_page+0xb1/0xb6
[<ffffffff810ab86a>] ? bad_page+0xd8/0xf0
[<ffffffff810ad0ca>] ? get_page_from_freelist+0x487/0x715
[<ffffffff8100699f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff810dc49a>] ? kmem_cache_free+0x71/0xad
[<ffffffff810ad55c>] ? __alloc_pages_nodemask+0x14d/0x6ab
[<ffffffff81403a73>] ? __netdev_alloc_skb+0x1d/0x3a
[<ffffffff8144b392>] ? ip_rcv_finish+0x319/0x343
[<ffffffff81403a73>] ? __netdev_alloc_skb+0x1d/0x3a
[<ffffffff810d6f35>] ? alloc_pages_current+0xaa/0xcd
[<ffffffff81372fd0>] ? xennet_alloc_rx_buffers+0x7a/0x2d9
[<ffffffff81374d32>] ? xennet_poll+0xbef/0xc85
[<ffffffff8100699f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff8140d709>] ? net_rx_action+0xb6/0x1dc
[<ffffffff812f1bf7>] ? unmask_evtchn+0x1f/0xa3
[<ffffffff810431a4>] ? __do_softirq+0xc7/0x1a3
[<ffffffff81085ca9>] ? handle_fasteoi_irq+0xd2/0xe1
[<ffffffff810069b2>] ? check_events+0x12/0x20
[<ffffffff8100a85c>] ? call_softirq+0x1c/0x30
[<ffffffff8100bebd>] ? do_softirq+0x41/0x7e
[<ffffffff8104303b>] ? irq_exit+0x36/0x78
[<ffffffff812f273c>] ? xen_evtchn_do_upcall+0x2f/0x3c
[<ffffffff8100a8ae>] ? xen_do_hypervisor_callback+0x1e/0x30
<EOI> [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1006
[<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1006
[<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1006
[<ffffffff810063a3>] ? xen_safe_halt+0x10/0x1a
[<ffffffff81010998>] ? default_idle+0x4b/0x85
[<ffffffff81008d53>] ? cpu_idle+0x60/0x97
[<ffffffff81533d09>] ? rest_init+0x6d/0x6f
[<ffffffff81b2bd34>] ? start_kernel+0x37f/0x38a
[<ffffffff81b2b2cd>] ? x86_64_start_reservations+0xb8/0xbc
[<ffffffff81b2ee71>] ? xen_start_kernel+0x528/0x52f

... it continues with more BUGs. Full log here:

http://www.theshore.net/~caker/xen/BUGS/2.6.38/

Thanks,
-Chris


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: 2.6.38 x86_64 domU - BUG: Bad page state
On Tue, Apr 26, 2011 at 11:32:29AM -0400, Christopher S. Aker wrote:
> We've been experiencing this behavior since switching to 2.6.38
> 64-bit. Lots of reports across our fleet, so it's not an isolated
> problem...
>
> DomU: 2.6.38 x86_64
> Xen: 3.4.1
>
> BUG: Bad page state in process swapper pfn:1a399
> page:ffffea00005bc978 count:-1 mapcount:0 mapping: (null)

And it's the same issue somebody else reported: the page count is negative.

Ian, any thoughts on this? Could the grant freeing have a race, or a
double free?
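One way to catch a double free closer to the culprit would be to rebuild
the domU kernel with allocator debugging switched on. A minimal sketch,
assuming a mainline tree (scripts/config ships in the kernel source):

    # Poison freed pages so a late write to a freed page faults immediately.
    ./scripts/config --enable DEBUG_PAGEALLOC
    # Turn on full SLUB debugging (sanity checks, redzones, poisoning).
    ./scripts/config --enable SLUB_DEBUG_ON
    make oldconfig && make

Alternatively, if CONFIG_SLUB_DEBUG is already built in, the same SLUB
checks can be enabled by booting the domU with slub_debug=FZPU.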


> [...]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: 2.6.38 x86_64 domU - BUG: Bad page state
On Tue, 2011-04-26 at 18:43 +0100, Konrad Rzeszutek Wilk wrote:
> On Tue, Apr 26, 2011 at 11:32:29AM -0400, Christopher S. Aker wrote:
> > We've been experiencing this behavior since switching to 2.6.38
> > 64-bit. Lots of reports across our fleet, so it's not an isolated
> > problem...
> >
> > DomU: 2.6.38 x86_64
> > Xen: 3.4.1
> >
> > BUG: Bad page state in process swapper pfn:1a399
> > page:ffffea00005bc978 count:-1 mapcount:0 mapping: (null)
>
> And the same issue as somebody else reported. Where the page
> count is negative.
>
> Ian, any thoughts on this?

Nothing in particular. Is it reproducible enough to be bisectable?

You mention switching to 64-bit 2.6.38; what were you running before? Do
you have any feeling for (or data suggesting) whether it is related to
the switch to 64-bit or the switch to 2.6.38?
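
If it does reproduce on demand, a plain bisect between the two releases
would narrow it down. A rough sketch, assuming a mainline checkout and
some way to boot each build as a domU:

    git bisect start
    git bisect bad v2.6.38    # first kernel known to show the bad page state
    git bisect good v2.6.37   # assumed good; adjust if 2.6.37 was never run
    # At each step: build, boot the domU, push receive traffic,
    # then mark the result and repeat:
    git bisect good           # or: git bisect bad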

> Could the grant freeing have a race, or a double free?

netfront is relatively unchanged in 2.6.38, but the m2p override stuff
went in during the 2.6.38 merge window; perhaps this relates to that?
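
The relevant diffs are easy to pull up; a quick sketch (paths as in the
2.6.38 tree):

    # netfront changes vs. the m2p/p2m changes between the two releases
    git log --oneline v2.6.37..v2.6.38 -- \
        drivers/net/xen-netfront.c arch/x86/xen/p2m.c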

The full log shows:
Disabling lock debugging due to kernel taint
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81373037>] xennet_alloc_rx_buffers+0xe1/0x2d9
PGD 1d990067 PUD 1d9df067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 0, comm: swapper Tainted: G B 2.6.38-x86_64-linode17 #1
RIP: e030:[<ffffffff81373037>] [<ffffffff81373037>] xennet_alloc_rx_buffers+0xe1/0x2d9

It'd be useful to know what ffffffff81373037 and/or
xennet_alloc_rx_buffers+0xe1 correspond to in this particular kernel
image.
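
Assuming the kernel was built with CONFIG_DEBUG_INFO, that's one of:

    addr2line -f -i -e vmlinux ffffffff81373037

or, in gdb against the same vmlinux:

    gdb vmlinux
    (gdb) list *(xennet_alloc_rx_buffers+0xe1)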

> > [...]



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel