XenServer 7.5 crash: kernel panic / dom0 / solarflare / sfc
Hi.

After traffic on the VMs increased, I get the errors below and the server
reboots. (Attached: xen-crashdump-analyser.log)
Version: XenServer release 7.5.0 (xenenterprise)

Kernel: Linux br-pr-cwb1-xs1 4.4.0+10 #1 SMP Thu Aug 9 14:42:20 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Ethernet controller: Solarflare Communications SFC9220 10/40G Ethernet Controller (rev 02)

driver: sfc
version: 4.10.1.1000-xen
firmware-version: 6.4.2.1020 rx1 tx1
bus-info: 0000:21:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: yes

*dmesg:*

[ 3615.442305] EMERG: NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [swapper/2:0]
[ 3615.442319] WARN: Modules linked in: tun nfsv3 nfs fscache 8021q garp mrp stp llc openvswitch nf_defrag_ipv6 libcrc32c ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_multiport xt_conntrack nf_conntrack iptable_filter dm_multipath nls_iso8859_1 nls_cp437 vfat fat ipmi_devintf dm_mod sg crc32_pclmul aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper shpchp i2c_piix4 ipmi_si ipmi_msghandler tpm_tis tpm nls_utf8 isofs nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables sd_mod hid_generic usbhid hid mpt3sas(O) raid_class scsi_transport_sas sfc(O) mdio ahci libahci libata igb(O) xhci_pci ptp pps_core xhci_hcd scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod xen_wdt efivarfs ipv6
[ 3615.442366] WARN: CPU: 2 PID: 0 Comm: swapper/2 Tainted: G O 4.4.0+10 #1
[ 3615.442368] WARN: Hardware name: Supermicro Super Server/H11SSL-C, BIOS 1.0b 04/27/2018
[ 3615.442371] WARN: task: ffff88022be25400 ti: ffff88022be50000 task.ti: ffff88022be50000
[ 3615.442372] WARN: RIP: e030:[<ffffffffa02703c0>] [<ffffffffa02703c0>] efx_ef10_tx_limit_len+0x0/0x30 [sfc]
[ 3615.442384] WARN: RSP: e02b:ffff880234e43688 EFLAGS: 00000206
[ 3615.442386] WARN: RAX: ffff880002638000 RBX: ffff88022ac63300 RCX: 00000000b25cda62
[ 3615.442387] WARN: RDX: 0000000000000000 RSI: 000000110f255680 RDI: ffff88022ac63300
[ 3615.442388] WARN: RBP: ffff880234e436b8 R08: 000000100f25563c R09: 0000000000000002
[ 3615.442389] WARN: R10: 0000000000000000 R11: ffffffff81a179a0 R12: ffff88000263b948
[ 3615.442390] WARN: R13: ffffffff00000000 R14: 000000110f255680 R15: ffffffffa0293e40
[ 3615.442398] WARN: FS: 00007f54d2e7e700(0000) GS:ffff880234e40000(0000) knlGS:0000000000000000
[ 3615.442399] WARN: CS: e033 DS: 002b ES: 002b CR0: 0000000080050033
[ 3615.442400] WARN: CR2: 00007fda26b3a000 CR3: 00000002040aa000 CR4: 0000000000040660
[ 3615.442403] WARN: Stack:
[ 3615.442404] WARN: ffffffffa027b897 0000000000000044 0000000000000001 fffffffffffffffe
[ 3615.442406] WARN: ffff8800b0c60600 ffff88022b24a000 ffff880234e43798 ffffffffa027c497
[ 3615.442408] WARN: ffff88022ac63000 000d505400000002 0000000000000000 0000880200000b96
[ 3615.442410] WARN: Call Trace:
[ 3615.442411] WARN: <IRQ>
[ 3615.442419] WARN: [<ffffffffa027b897>] ? efx_tx_map_chunk+0x47/0x90 [sfc]
[ 3615.442427] WARN: [<ffffffffa027c497>] efx_enqueue_skb+0x7c7/0xcc0 [sfc]
[ 3615.442434] WARN: [<ffffffff81097f32>] ? default_wake_function+0x12/0x20
[ 3615.442438] WARN: [<ffffffff810aafb2>] ? autoremove_wake_function+0x12/0x40
[ 3615.442440] WARN: [<ffffffff810b1c81>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[ 3615.442447] WARN: [<ffffffffa027ca29>] efx_hard_start_xmit+0x99/0xb0 [sfc]
[ 3615.442452] WARN: [<ffffffff814e8ee0>] dev_hard_start_xmit+0x2b0/0x3f0
[ 3615.442455] WARN: [<ffffffff8150ac97>] sch_direct_xmit+0x97/0x1e0
[ 3615.442457] WARN: [<ffffffff814e937e>] __dev_queue_xmit+0x26e/0x4c0
[ 3615.442459] WARN: [<ffffffff814e95e0>] dev_queue_xmit+0x10/0x20
[ 3615.442464] WARN: [<ffffffffa04b7a35>] ovs_vport_send+0xb5/0xc0 [openvswitch]
[ 3615.442467] WARN: [<ffffffffa04ab257>] do_output.isra.28+0x57/0x170 [openvswitch]
[ 3615.442470] WARN: [<ffffffffa04ac582>] do_execute_actions+0x10a2/0x1110 [openvswitch]
[ 3615.442473] WARN: [<ffffffffa04ac622>] ovs_execute_actions+0x32/0xc0 [openvswitch]
[ 3615.442476] WARN: [<ffffffffa04afa43>] ovs_dp_process_packet+0xd3/0xf0 [openvswitch]
[ 3615.442480] WARN: [<ffffffffa04b7340>] ovs_vport_receive+0x90/0xa0 [openvswitch]
[ 3615.442483] WARN: [<ffffffffa04b7340>] ? ovs_vport_receive+0x90/0xa0 [openvswitch]
[ 3615.442487] WARN: [<ffffffff81076aa5>] ? irq_exit+0x85/0x90
[ 3615.442490] WARN: [<ffffffff814d4282>] ? __alloc_skb+0x72/0x230
[ 3615.442494] WARN: [<ffffffff811b2ceb>] ? __slab_alloc.constprop.60+0x44/0x52
[ 3615.442497] WARN: [<ffffffff811a6bbd>] ? __kmalloc_track_caller+0x4d/0x170
[ 3615.442499] WARN: [<ffffffff814d4282>] ? __alloc_skb+0x72/0x230
[ 3615.442501] WARN: [<ffffffff814d351d>] ? __kmalloc_reserve.isra.30+0x2d/0x70
[ 3615.442505] WARN: [<ffffffffa04b8560>] netdev_frame_hook+0x140/0x180 [openvswitch]
[ 3615.442507] WARN: [<ffffffff814e6ca7>] __netif_receive_skb_core+0x577/0x8d0
[ 3615.442511] WARN: [<ffffffff8100e357>] ? set_phys_to_machine+0x17/0x50
[ 3615.442513] WARN: [<ffffffff8100e694>] ? set_foreign_p2m_mapping+0x304/0x330
[ 3615.442517] WARN: [<ffffffff815a521a>] ? _raw_spin_unlock_irqrestore+0x1a/0x20
[ 3615.442519] WARN: [<ffffffff814e704e>] __netif_receive_skb+0x4e/0x60
[ 3615.442521] WARN: [<ffffffff814e70ad>] netif_receive_skb_internal+0x4d/0x90
[ 3615.442523] WARN: [<ffffffff814d63df>] ? skb_checksum_setup+0x2bf/0x2f0
[ 3615.442525] WARN: [<ffffffff814e7150>] netif_receive_skb+0x60/0x70
[ 3615.442530] WARN: [<ffffffff8146aa1c>] xenvif_tx_action+0x86c/0x950
[ 3615.442533] WARN: [<ffffffff810df152>] ? tick_program_event+0x62/0x70
[ 3615.442535] WARN: [<ffffffff8146d2e9>] xenvif_poll+0x39/0x70
[ 3615.442537] WARN: [<ffffffff814e745f>] net_rx_action+0x12f/0x320
[ 3615.442540] WARN: [<ffffffff81076729>] __do_softirq+0x129/0x290
[ 3615.442542] WARN: [<ffffffff81076a62>] irq_exit+0x42/0x90
[ 3615.442546] WARN: [<ffffffff813c8bb5>] xen_evtchn_do_upcall+0x35/0x50
[ 3615.442548] WARN: [<ffffffff815a74ee>] xen_do_hypervisor_callback+0x1e/0x40
[ 3615.442549] WARN: <EOI>
[ 3615.442552] WARN: [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[ 3615.442554] WARN: [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[ 3615.442556] WARN: [<ffffffff8100c570>] ? xen_safe_halt+0x10/0x20
[ 3615.442559] WARN: [<ffffffff81020d67>] ? default_idle+0x57/0xf0
[ 3615.442561] WARN: [<ffffffff8102149f>] ? arch_cpu_idle+0xf/0x20
[ 3615.442563] WARN: [<ffffffff810ab322>] ? default_idle_call+0x32/0x40
[ 3615.442565] WARN: [<ffffffff810ab57c>] ? cpu_startup_entry+0x1ec/0x330
[ 3615.442568] WARN: [<ffffffff81013dd8>] ? cpu_bringup_and_idle+0x18/0x20
[ 3615.442569] WARN: Code: 48 89 e5 c1 e0 0d 05 18 0a 00 00 21 d1 48 03 86 80 00 00 00 89 08 89 97 30 01 00 00 5d c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 <0f> 1f 44 00 00 55 81 fa ff 3f 00 00 89 d0 48 89 e5 76 0e 48 8d
[ 3615.442589] EMERG: Kernel panic - not syncing: softlockup: hung tasks
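
A note for anyone reading the trace above: frames marked with "?" are
addresses the kernel found on the stack but could not confirm as part of the
call chain, so the unmarked frames are the ones to follow first. Below is a
minimal Python sketch for filtering them out of a pasted dmesg excerpt. The
names FRAME_RE and reliable_frames are made up for this illustration and are
not part of any Xen or kernel tooling; frames without a module tag are
labelled "kernel" by assumption.

```python
import re

# Matches a frame such as
#   "[<ffffffffa027c497>] efx_enqueue_skb+0x7c7/0xcc0 [sfc]"
# The optional "? " marks a frame the kernel could not verify.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)

def reliable_frames(text):
    """Return (symbol, module) for every frame not marked with '?'."""
    # Mail clients may wrap one frame over two lines, so collapse all
    # whitespace into single spaces before matching.
    flat = " ".join(text.split())
    frames = []
    for m in FRAME_RE.finditer(flat):
        if m.group("guess"):
            continue  # skip unverified frames
        # Assumption: no module tag means the symbol is in the core kernel.
        frames.append((m.group("sym"), m.group("mod") or "kernel"))
    return frames
```

Run against the trace above, this kind of filter leaves the path from
xenvif_poll through openvswitch into efx_enqueue_skb in the sfc driver. The
RIP line is also matched, because it uses the same "[<addr>] symbol" layout.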


<https://discussions.citrix.com/topic/401553-xenserver-75-crash-kernel-panic/?do=findComment&comment=2035186>

*dom0.log*

Call Trace:
[ffffffff810014aa] xen_hypercall_kexec_op+0xa/0x20
ffffffff81156fd4 panic+0xfa/0x241
ffffffff8110c5b4 watchdog_timer_fn+0x1a4/0x1d0
ffffffff8110c410 watchdog_timer_fn+0/0x1d0
ffffffff810d1f14 __hrtimer_run_queues+0x134/0x250
ffffffff810d2346 hrtimer_interrupt+0xa6/0x180
ffffffff8100c82e xen_timer_interrupt+0x2e/0x130
ffffffff8140c00d add_interrupt_randomness+0x18d/0x1a0
ffffffff810c053f handle_irq_event_percpu+0x7f/0x1e0
ffffffff810c3a8a handle_percpu_irq+0x3a/0x50
ffffffff810bfd42 generic_handle_irq+0x22/0x30
ffffffff813c9d8b __evtchn_fifo_handle_events+0x14b/0x170
ffffffff813c9dc0 evtchn_fifo_handle_events+0x10/0x20
ffffffff813c6dda __xen_evtchn_do_upcall+0x4a/0x80
ffffffff813c8bb0 xen_evtchn_do_upcall+0x30/0x50
ffffffff815a74ee xen_do_hypervisor_callback+0x1e/0x40
ffffffff81097f32 default_wake_function+0x12/0x20
ffffffff810aafb2 autoremove_wake_function+0x12/0x40
ffffffff810b1c81 __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
ffffffff814e8ee0 dev_hard_start_xmit+0x2b0/0x3f0
ffffffff8150ac97 sch_direct_xmit+0x97/0x1e0
ffffffff814e937e __dev_queue_xmit+0x26e/0x4c0
ffffffff814e95e0 dev_queue_xmit+0x10/0x20
ffffffff81076aa5 irq_exit+0x85/0x90
ffffffff814d4282 __alloc_skb+0x72/0x230
ffffffff811b2ceb __slab_alloc.constprop.60+0x44/0x52
ffffffff811a6bbd __kmalloc_track_caller+0x4d/0x170
ffffffff814d4282 __alloc_skb+0x72/0x230
ffffffff814d351d __kmalloc_reserve.isra.30+0x2d/0x70
ffffffff814e6ca7 __netif_receive_skb_core+0x577/0x8d0
ffffffff8100e357 set_phys_to_machine+0x17/0x50
ffffffff8100e694 set_foreign_p2m_mapping+0x304/0x330
ffffffff815a521a _raw_spin_unlock_irqrestore+0x1a/0x20
ffffffff814e704e __netif_receive_skb+0x4e/0x60
ffffffff814e70ad netif_receive_skb_internal+0x4d/0x90
ffffffff814d63df skb_checksum_setup+0x2bf/0x2f0
ffffffff814e7150 netif_receive_skb+0x60/0x70
ffffffff8146aa1c xenvif_tx_action+0x86c/0x950
ffffffff810df152 tick_program_event+0x62/0x70
ffffffff8146d2e9 xenvif_poll+0x39/0x70
ffffffff814e745f net_rx_action+0x12f/0x320
ffffffff81076729 __do_softirq+0x129/0x290
ffffffff81076a62 irq_exit+0x42/0x90
ffffffff813c8bb5 xen_evtchn_do_upcall+0x35/0x50
ffffffff815a74ee xen_do_hypervisor_callback+0x1e/0x40
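
Each entry in the dom0.log trace has the form "address symbol+offset/size",
where offset is the distance from the start of the function and size is the
function's length. Subtracting the offset from the address therefore gives
the function's start address, which helps when looking a symbol up in a
symbol table. A minimal Python sketch of that arithmetic follows; ENTRY_RE
and function_start are hypothetical names used only for this example.

```python
import re

# Matches entries such as "ffffffff8110c5b4 watchdog_timer_fn+0x1a4/0x1d0".
# The first entry of the trace is wrapped in square brackets, and one
# entry writes its offset as a plain "0", so both forms are accepted.
ENTRY_RE = re.compile(
    r"\[?(?P<addr>[0-9a-f]{16})\]?\s+"
    r"(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-f]+|\d+)/(?P<size>0x[0-9a-f]+)"
)

def function_start(entry):
    """Return (symbol, start address) for one 'addr symbol+offset/size' entry."""
    m = ENTRY_RE.search(entry)
    if m is None:
        raise ValueError(f"not a call trace entry: {entry!r}")
    addr = int(m.group("addr"), 16)
    offset = int(m.group("off"), 0)  # base 0 accepts both "0x1a4" and "0"
    return m.group("sym"), addr - offset
```

For example, the two watchdog_timer_fn entries in the trace resolve to the
same start address, ffffffff8110c410, which is consistent with both pointing
into one function.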

--
Derick Fontes
dbfontes@gmail.com