
Kernel oops in radeon on evergreen_bandwidth_update
I've been seeing radeon-related crashes on boot on my laptop for a while
now, but only recently managed to capture this Oops, which may be related.
If the machine hard-locks on boot, I cannot capture any messages. It looks
like most of the time it somehow recovers from that state and continues
without any problems. A few months ago there was a kernel version that
hard-locked on every boot. Now it happens only once every couple of boots.

It's a null pointer dereference. Where should I report that?

BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
IP: [<ffffffffc01968d3>] evergreen_bandwidth_update+0x63/0x120 [radeon]
PGD 0
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: x86_pkg_temp_thermal coretemp microcode xhci_pci
xhci_hcd radeon(+) cfbfillrect cfbimgblt cfbcopyarea drm_kms_helper ttm
ehci_pci ehci_
CPU: 1 PID: 1730 Comm: laptop_mode Not tainted 4.1.7-hardened-r1 #1
Hardware name: Hewlett-Packard HP EliteBook 8560w/1631, BIOS 68SVD Ver. F.50 08/04/2014
task: ffff88023247e150 ti: ffff88023247ed78 task.ti: ffff88023247ed78
RIP: 0010:[<ffffffffc01968d3>] [<ffffffffc01968d3>] evergreen_bandwidth_update+0x63/0x120 [radeon]
RSP: 0018:ffffc900062e3cc8 EFLAGS: 00010287
RAX: ffff8800017e04c8 RBX: ffff8800017e0000 RCX: ffff8800017e04e0
RDX: 0000000000000000 RSI: ffffffffc0196870 RDI: ffff8800017e0000
RBP: ffff8800017e1b78 R08: ffff880230c6be92 R09: 000000000000044c
R10: ffff8800017e1e90 R11: 0000000000006cf0 R12: 0000000000000000
R13: 8000000000000000 R14: ffff8800017e1ba0 R15: ffff8800017e14b0
FS: 000003693c4f3700(0000) GS:ffff88023cc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000090 CR3: 0000000012e52000 CR4: 00000000000606f0
Stack:
ffff8800017e1b78 ffff8800017e0000 ffff8800017e1b78 0000000000000001
8000000000000000 ffff8800017e1ba0 ffff8800017e14b0 ffffffffc018d9f9
ffff8800017e1b78 ffff8800017e0000 ffff8800017e1b78 000000000000000c
Call Trace:
[<ffffffffc018d9f9>] ? radeon_pm_compute_clocks+0x6e9/0x950 [radeon]
[<ffffffffc018e28b>] ? radeon_set_dpm_state+0x7b/0x140 [radeon]
[<ffffffff9227f442>] ? kernfs_fop_write+0x142/0x490
[<ffffffff921f3bf3>] ? __vfs_write+0x53/0x150
[<ffffffff921f792d>] ? __sb_start_write+0x4d/0x120
[<ffffffff921f4644>] ? vfs_write+0xf4/0x2a0
[<ffffffff921f5936>] ? SyS_write+0x56/0xd0
[<ffffffff92e3c959>] ? system_call_fastpath+0x12/0x83
Code: 8b 93 f0 24 00 00 85 d2 7e d8 48 8d 83 b0 04 00 00 83 ea 01 45 31 e4 48 8d 8c d3 b8 04 00 00 66 0f 1f 84 00 00 00 00 00 48 8b 10 <80> ba 90 00 00 00 01 41 83 dc ff 48 83 c0 08 48 39 c8 75 e9 48
RIP [<ffffffffc01968d3>] evergreen_bandwidth_update+0x63/0x120 [radeon]
RSP <ffffc900062e3cc8>
CR2: 0000000000000090
---[ end trace 672b1aaacf0d608d ]---
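The Code: line backs up the NULL-dereference reading. The byte in angle brackets is the one at RIP, and the bytes from there decode to cmpb $0x1,0x90(%rdx); the bytes just before it (48 8b 10, i.e. mov (%rax),%rdx) had loaded RDX from memory. With RDX = 0 in the register dump, the access lands at 0x90, which is exactly CR2. Below is a minimal Python sketch of that arithmetic. It only decodes this one instruction form (opcode 0x80 with a ModRM byte and a 32-bit displacement), not x86 in general, and faulting_bytes and modrm_fields are illustrative helpers rather than part of any kernel tool.

```python
# Code: bytes from the oops above, joined onto one line; <80> marks the byte at RIP.
CODE_LINE = (
    "8b 93 f0 24 00 00 85 d2 7e d8 48 8d 83 b0 04 00 00 83 ea 01 45 31 e4 "
    "48 8d 8c d3 b8 04 00 00 66 0f 1f 84 00 00 00 00 00 48 8b 10 <80> ba 90 00 "
    "00 00 01 41 83 dc ff 48 83 c0 08 48 39 c8 75 e9 48"
)


def faulting_bytes(code_line: str) -> list[int]:
    """Return the instruction bytes starting at the one wrapped in <..>."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return [int(t.strip("<>"), 16) for t in tokens[i:]]
    raise ValueError("no <..> marker in Code: line")


def modrm_fields(modrm: int) -> tuple[int, int, int]:
    """Split an x86 ModRM byte into (mod, reg, rm)."""
    return modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111


insn = faulting_bytes(CODE_LINE)
mod, reg, rm = modrm_fields(insn[1])
# Opcode 0x80 with reg=7 is CMP r/m8, imm8; mod=2 means a 32-bit displacement
# follows; rm=2 selects RDX (the byte before the opcode is not a REX prefix).
assert insn[0] == 0x80 and (mod, reg, rm) == (2, 7, 2)
disp32 = int.from_bytes(bytes(insn[2:6]), "little", signed=True)
rdx = 0x0  # RDX value from the register dump
effective_address = rdx + disp32
print(f"cmpb ${insn[6]:#x}, {disp32:#x}(%rdx) -> reads {effective_address:#x}")
```

Running it prints cmpb $0x1, 0x90(%rdx) -> reads 0x90, which matches the CR2 line of the oops.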


I'm also still getting early-boot intel-iommu traces on my other machine,
though without any symptoms. They might be related to a 3ware card, which
is fully functional and houses the system, so I cannot verify that.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at drivers/iommu/intel-iommu.c:3214 intel_unmap+0x146/0x200()
Driver unmaps unmatched page at PFN 0
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.7-hardened-r1 #1
Hardware name: System manufacturer System Product Name/Z8P(N)E-D12(X), BIOS 1302 06/25/2012
0000000000000000 0bf0f1830deb9167 ffffffffa61a1f87 0000000000000000
ffffffffa61a1f87 ffffffff9fe5d48d ffff880237c03d80 ffffffff9f0a96e7
ffffffffa61a1f87 0000000000000c8e ffffffffa61e1500 ffff880433ceb368
Call Trace:
<IRQ> [<ffffffff9fe5d48d>] ? dump_stack+0x40/0x56
[<ffffffff9f0a96e7>] ? warn_slowpath_common+0x77/0xb0
[<ffffffff9f0a978c>] ? warn_slowpath_fmt+0x6c/0x90
[<ffffffff9f6aa8e6>] ? intel_unmap+0x146/0x200
[<ffffffff9f77d78e>] ? twa_interrupt+0x48e/0x780
[<ffffffff9f0f9de3>] ? handle_irq_event_percpu+0x73/0x120
[<ffffffff9f0f9ec0>] ? handle_irq_event+0x30/0x50
[<ffffffff9f0fcfd8>] ? handle_fasteoi_irq+0x88/0x180
[<ffffffff9f005385>] ? handle_irq+0x85/0x160
[<ffffffff9f0ce264>] ? atomic_notifier_call_chain+0x24/0x30
[<ffffffff9f004c01>] ? do_IRQ+0x41/0xf0
[<ffffffff9fe69397>] ? common_interrupt+0x97/0x97
<EOI> [<ffffffff9f9db147>] ? cpuidle_enter_state+0xb7/0x160
[<ffffffff9f0eed5b>] ? cpu_startup_entry+0x27b/0x300
[<ffffffffaac1507a>] ? start_kernel+0x4a9/0x4ca
[<ffffffffaac14120>] ? early_idt_handler_array+0x120/0x120
[<ffffffffaac145f7>] ? x86_64_start_kernel+0x10b/0x12f
---[ end trace 366301a14398ecf4 ]---

Thanks,
Dw.
--
dr Tóth Attila, Radiológus, 06-20-825-8057
Attila Toth MD, Radiologist, +36-20-825-8057
Re: Kernel oops in radeon on evergreen_bandwidth_update
On 27 Sep 2015 at 10:44, "Tóth Attila" wrote:

> I've been seeing radeon-related crashes on boot on my laptop for a while
> now, but only recently managed to capture this Oops, which may be related.
> If the machine hard-locks on boot, I cannot capture any messages. It looks
> like most of the time it somehow recovers from that state and continues
> without any problems. A few months ago there was a kernel version that
> hard-locked on every boot. Now it happens only once every couple of boots.
>
> It's a null pointer dereference. Where should I report that?

To the upstream kernel maintainers ;). But before you do that, enable
frame pointers (CONFIG_FRAME_POINTER) to get a more reliable backtrace,
and also CONFIG_DEBUG_INFO so that addr2line can pinpoint the exact source
line of the NULL dereference.
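Every Call Trace entry in the traces above carries a "?" prefix, which the x86 unwinder prints when it cannot confirm an entry through the frame-pointer chain; with frame pointers enabled, most of those markers should go away. Below is a minimal Python sketch, assuming the "[<addr>] ? symbol+0xoff/0xsize [module]" layout seen in these traces, that splits each entry into the pieces addr2line needs and flags the unreliable ones. The names Frame and parse_frame are illustrative, not part of any kernel tool.

```python
import re
from typing import NamedTuple, Optional

# "[<address>] ? symbol+0xoffset/0xsize [module]"; "? " and "[module]" are optional.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<sym>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


class Frame(NamedTuple):
    addr: int                # runtime address as printed
    symbol: str              # function name
    offset: int              # byte offset into the function
    size: int                # function length
    module: Optional[str]    # None for code built into vmlinux
    reliable: bool           # False when the unwinder printed "?"


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one backtrace entry, or return None if the line has none."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        addr=int(m["addr"], 16),
        symbol=m["sym"],
        offset=int(m["off"], 16),
        size=int(m["size"], 16),
        module=m["module"],
        reliable=m["unreliable"] is None,
    )


TRACE = """\
[<ffffffffc018d9f9>] ? radeon_pm_compute_clocks+0x6e9/0x950 [radeon]
[<ffffffffc018e28b>] ? radeon_set_dpm_state+0x7b/0x140 [radeon]
[<ffffffff9227f442>] ? kernfs_fop_write+0x142/0x490
"""

for frame in filter(None, map(parse_frame, TRACE.splitlines())):
    print(f"{frame.symbol}+{frame.offset:#x} module={frame.module} "
          f"reliable={frame.reliable}")
```

The module field indicates which object to feed to addr2line: radeon.ko for the radeon frames and vmlinux for frames that have no suffix.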

> I'm also still getting early-boot intel-iommu traces on my other machine,
> though without any symptoms. They might be related to a 3ware card, which
> is fully functional and houses the system, so I cannot verify that.

twa_interrupt is from the 3ware 9xxx driver, and it seems to be trying to
unmap a page it doesn't own. DEBUG_INFO and addr2line would help identify
the bad call in twa_interrupt (ffffffff9f77d78e in the trace below); then
you can send it upstream ;).
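When you look up the twa_interrupt frame, the symbol+offset form (twa_interrupt+0x48e) is likely to be more useful than the raw runtime address. The kernel text addresses in these traces (ffffffff9f..., ffffffff92...) do not sit near the usual ffffffff81000000 base, so they appear to be relocated at boot and may not match a freshly built vmlinux. That frame has no [module] suffix and "Modules linked in:" is empty, so the 3ware driver appears to be built into vmlinux. Below is a minimal Python sketch, assuming `nm vmlinux`-style text output ("ADDR TYPE NAME"), that turns symbol+offset into an address for `addr2line -e vmlinux`. The symbol address in NM_SAMPLE is hypothetical. Because a backtrace entry is a return address, the sketch steps back one byte so that addr2line reports the call itself rather than the line after it.

```python
def parse_nm(nm_output: str) -> dict[str, int]:
    """Map symbol name -> start address from nm text lines "ADDR TYPE NAME".

    Lines without an address (e.g. undefined "U name" entries) are skipped;
    if a static symbol name appears twice, the last one wins.
    """
    table: dict[str, int] = {}
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3:
            addr, _sym_type, name = parts
            table[name] = int(addr, 16)
    return table


def addr2line_target(symbol: str, offset: int, table: dict[str, int],
                     is_return_address: bool = True) -> int:
    """Link-time address to pass to addr2line for symbol+offset."""
    addr = table[symbol] + offset
    # Return addresses point just past the call; step back into it.
    return addr - 1 if is_return_address else addr


# Hypothetical nm line; the real address comes from your own DEBUG_INFO build.
NM_SAMPLE = "ffffffff81500000 t twa_interrupt\n"

target = addr2line_target("twa_interrupt", 0x48e, parse_nm(NM_SAMPLE))
print(f"addr2line -f -e vmlinux {target:#x}")
```

With a real build, replace NM_SAMPLE with the output of `nm vmlinux` and run the printed command; `-f` also prints the function name, which is a quick check that the lookup landed in twa_interrupt.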

> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at drivers/iommu/intel-iommu.c:3214
> intel_unmap+0x146/0x200()
> Driver unmaps unmatched page at PFN 0
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.7-hardened-r1 #1
> Hardware name: System manufacturer System Product Name/Z8P(N)E-D12(X),
> BIOS 1302 06/25/2012
> 0000000000000000 0bf0f1830deb9167 ffffffffa61a1f87 0000000000000000
> ffffffffa61a1f87 ffffffff9fe5d48d ffff880237c03d80 ffffffff9f0a96e7
> ffffffffa61a1f87 0000000000000c8e ffffffffa61e1500 ffff880433ceb368
> Call Trace:
> <IRQ> [<ffffffff9fe5d48d>] ? dump_stack+0x40/0x56
> [<ffffffff9f0a96e7>] ? warn_slowpath_common+0x77/0xb0
> [<ffffffff9f0a978c>] ? warn_slowpath_fmt+0x6c/0x90
> [<ffffffff9f6aa8e6>] ? intel_unmap+0x146/0x200
> [<ffffffff9f77d78e>] ? twa_interrupt+0x48e/0x780
> [<ffffffff9f0f9de3>] ? handle_irq_event_percpu+0x73/0x120
> [<ffffffff9f0f9ec0>] ? handle_irq_event+0x30/0x50
> [<ffffffff9f0fcfd8>] ? handle_fasteoi_irq+0x88/0x180
> [<ffffffff9f005385>] ? handle_irq+0x85/0x160
> [<ffffffff9f0ce264>] ? atomic_notifier_call_chain+0x24/0x30
> [<ffffffff9f004c01>] ? do_IRQ+0x41/0xf0
> [<ffffffff9fe69397>] ? common_interrupt+0x97/0x97