Hi All,
I have had several kernel oopses in the past week and am trying to find out what the likely
cause is and, more importantly, how to resolve it.
I have searched Google, but have not found anything useful yet. Most results describe issues
during boot (not the case here, as the server runs successfully for nearly a week) or relate to
2.6 kernel versions.
Below this email I have included the dmesg output from the first time I noticed the oops
(actually, several in a row).
I was unable to capture last night's output because the server was frozen by the time I got to
it, so I cannot fully confirm that the pattern was the same. The last message I could still read
was nearly identical to the ones I saw the first time.
I noticed the following section which looks interesting:
===
[317321.524229] BUG: unable to handle page fault for address: ffff888510ebd0e0
[317321.524307] #PF: supervisor write access in kernel mode
[317321.524368] #PF: error_code(0x0003) - permissions violation
===
But I have no idea if this is a cause or a result of the earlier trace messages in the output.
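For reference, the error code on the last "#PF:" line is a bitmask. The sketch below is my own reading of the x86 page-fault error code bits, not something taken from the kernel source; the function name decode_pf_error_code is made up for illustration.

```python
# Minimal sketch: decode an x86 page-fault error code into readable text.
# Each entry is (bit mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation (page was present)", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor (kernel) mode"),
    (0x08, "reserved bit set in page table entry", None),
    (0x10, "instruction fetch", None),
    (0x20, "protection-key violation", None),
]


def decode_pf_error_code(code):
    """Return a list of human-readable descriptions for the given error code."""
    descriptions = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            descriptions.append(if_set)
        elif if_clear is not None:
            descriptions.append(if_clear)
    return descriptions


if __name__ == "__main__":
    # 0x0003 is the value reported in the excerpt above.
    for text in decode_pf_error_code(0x0003):
        print(text)
```

For 0x0003 this yields a protection violation on a present page, a write access, and supervisor mode, which matches the two "#PF:" lines: the kernel tried to write to a page that is mapped but not writable.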
I found that a new BIOS and firmware version is available for the mainboard, which I plan to
apply this week.
The kernel is "tainted" because of the use of ZFS. No other out-of-tree modules are installed.
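As far as I understand it, the letters after "Tainted:" in the dmesg output are individual flags. The sketch below decodes only the handful of flags I am sure about; the table is deliberately incomplete and the function name decode_taint is made up for illustration.

```python
# Minimal sketch: explain the taint letters printed after "Tainted:" in an oops.
# Only a subset of the kernel's taint flags is listed here (an assumption of
# this example, not a complete table).
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "F": "module was force-loaded",
    "D": "kernel died recently (oops or BUG)",
    "W": "kernel issued a warning earlier",
    "O": "externally built (out-of-tree) module was loaded",
    "E": "unsigned module was loaded",
}


def decode_taint(flags):
    """Return one description per taint letter, skipping whitespace padding."""
    return [
        TAINT_FLAGS.get(letter, "unknown flag " + letter)
        for letter in flags
        if not letter.isspace()
    ]


if __name__ == "__main__":
    # "P O" is the taint string shown in the dmesg output below.
    for text in decode_taint("P O"):
        print(text)
```

For "P O" this reports a proprietary module and an out-of-tree module, which is consistent with the ZFS modules being marked (PO) and spl being marked (O) in the module list.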
My distro: Gentoo
Kernel version: 5.4.38
ZFS version: 0.8.3
Xen version: 4.12.2
If more info is needed to analyse this, please let me know.
Additionally, if anyone knows of good resources (online preferred, but hardcopy is fine as
well) that I can use to analyse and understand these kernel messages, I would definitely
appreciate it.
Many thanks in advance,
Joost Roeleveld
DMESG:
[317321.523586] ------------[ cut here ]------------
[317321.523600] WARNING: CPU: 1 PID: 25465 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x194/0x1c0
[317321.523601] ------------[ cut here ]------------
[317321.523603] Modules linked in:
[317321.523614] WARNING: CPU: 3 PID: 2162 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x194/0x1c0
[317321.523615] iscsi_tcp libiscsi_tcp libiscsi
[317321.523618] Modules linked in:
[317321.523619] scsi_transport_iscsi nfsd
[317321.523622] iscsi_tcp
[317321.523623] auth_rpcgss nfs_acl lockd
[317321.523626] libiscsi_tcp
[317321.523627] grace sunrpc br_netfilter xt_physdev xen_acpi_processor
[317321.523631] libiscsi
[317321.523632] xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev
[317321.523636] scsi_transport_iscsi
[317321.523637] xen_evtchn xenfs xen_privcmd bridge 8021q
[317321.523640] nfsd
[317321.523642] garp mrp stp llc
[317321.523644] auth_rpcgss
[317321.523646] bonding intel_rapl_msr iTCO_wdt
[317321.523648] nfs_acl
[317321.523650] iTCO_vendor_support intel_rapl_common sb_edac intel_powerclamp
[317321.523653] lockd
[317321.523654] crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel intel_rapl_perf
[317321.523658] grace
[317321.523659] pcspkr i2c_i801 ast
[317321.523662] sunrpc
[317321.523663] drm_vram_helper ttm i2c_algo_bit drm_kms_helper
[317321.523666] br_netfilter
[317321.523667] drm lpc_ich mei_me
[317321.523670] xt_physdev
[317321.523671] mei ipmi_ssif ixgbe mdio
[317321.523674] xen_acpi_processor
[317321.523675] ptp ioatdma pps_core dca ipmi_si
[317321.523679] xen_pciback
[317321.523680] ipmi_devintf ipmi_msghandler acpi_power_meter binfmt_misc
[317321.523683] xen_netback
[317321.523684] tun
[317321.523686] xen_blkback
[317321.523687] zfs(PO) zunicode(PO)
[317321.523689] xen_gntalloc
[317321.523690] zavl(PO) icp(PO)
[317321.523692] xen_gntdev
[317321.523694] zcommon(PO) znvpair(PO) spl(O)
[317321.523696] xen_evtchn
[317321.523698] zlua(PO)
[317321.523700] xenfs xen_privcmd bridge 8021q
[317321.523707] CPU: 1 PID: 25465 Comm: zfs_stats_cache Tainted: P O 5.4.38-gentoo-host #1
[317321.523708] garp
[317321.523710] Hardware name: Supermicro Super Server/X10DRi-T4+, BIOS 3.1 06/08/2018
[317321.523711] mrp
[317321.523715] RIP: e030:xen_mc_flush+0x194/0x1c0
[317321.523716] stp llc bonding
[317321.523721] Code: 05 00 10 00 81 e8 ec 13 be 00 48 89 c1 48 89 45 18 48 c1 e9 3f 48 89 ce e9 03 ff ff ff 48 c7 45 18 ea ff ff ff be 01 00 00 00 <0f> 0b 8b 55 00 48 c7 c7 a0 8a fb 81 31 db 65 8b 0d e7 0a ff 7e e8
[317321.523722] intel_rapl_msr iTCO_wdt iTCO_vendor_support
[317321.523726] RSP: e02b:ffffc9000a00bbe8 EFLAGS: 00010002
[317321.523727] intel_rapl_common sb_edac
[317321.523730] intel_powerclamp crct10dif_pclmul crc32_pclmul
[317321.523736] RAX: ffff888686655858 RBX: 0000777f80000000 RCX: ffff888686655858
[317321.523737] crc32c_intel ghash_clmulni_intel
[317321.523741] RDX: 0000000000000001 RSI: 000000000000000d RDI: ffff888686655310
[317321.523742] intel_rapl_perf pcspkr i2c_i801 ast