general protection fault on vif50.1-q1-guest
Hello,

I recently had a general protection fault on a Debian 8 server with
Xen (Debian package: 4.4.4lts4-0+deb8u1) and it looks like it was Xen
related because it showed the vif50.1-q1-guest kernel process in the
kernel log. I have copied the kernel log below in this mail for
reference. After this GPF the system was still responding, but one domU
lost network connectivity while all the others were still working
properly. I decided to power the system off and on again, as a soft GPF
leaves the system in an unstable state.

Now I am trying to find out the most likely cause of this
general protection fault so that I can avoid it in the future,
and I would like your opinion on that:

- is this maybe a bug in Xen?
- a bug in the kernel used by Debian?
- a hardware issue?
- if it is a hardware issue, what is most likely? RAM? CPU?
- anything else I am missing?

Note that the hardware is enterprise-grade and that the BIOS
has been updated to the latest available version. The CPUs (dual CPU)
are Intel Xeon E5-2640 v3 @ 2.60GHz.

Thank you for your input.

Best regards,
John
Re: general protection fault on vif50.1-q1-guest
Here is the kernel log:

May 6 15:22:04 server kernel: [20553234.022182] general protection
fault: 0000 [#1] SMP
May 6 15:22:04 server kernel: [20553234.126246] CPU: 0 PID: 8305
Comm: vif50.1-q1-gues Tainted: P O 3.16.0-10-amd64 #1
Debian 3.16.72-1
May 6 15:22:04 server kernel: [20553234.137328] Hardware name: Quanta
Computer Inc QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B12 05/30/2019
May 6 15:22:04 server kernel: [20553234.147371] task:
ffff88003c9f95d0 ti: ffff88004a3ac000 task.ti: ffff88004a3ac000
May 6 15:22:04 server kernel: [20553234.155951] RIP:
e030:[<ffffffffa08fcaa2>] [<ffffffffa08fcaa2>]
xenvif_gop_frag_copy+0x22/0x3b0 [xen_netback]
May 6 15:22:04 server kernel: [20553234.167089] RSP:
e02b:ffff88004a3afd98 EFLAGS: 00010282
May 6 15:22:04 server kernel: [20553234.173501] RAX: 0000000000001000
RBX: ffff8802e0841800 RCX: 7aec7d18f3f45689
May 6 15:22:04 server kernel: [20553234.181771] RDX: ffff88004a3afe80
RSI: ffff8802e0841800 RDI: 0000000111f703b7
May 6 15:22:04 server kernel: [20553234.189991] RBP: ffffc9002332c258
R08: 000000005ff8d9a9 R09: 00000000b1fe2a0e
May 6 15:22:04 server kernel: [20553234.198209] R10: ffff880000000000
R11: 0000000000000002 R12: 7aec7d18f3f45689
May 6 15:22:04 server kernel: [20553234.206413] R13: ffffc9002332c258
R14: ffff88004a3afe54 R15: 0000000000000001
May 6 15:22:04 server kernel: [20553234.214628] FS:
0000000000000000(0000) GS:ffff880484000000(0000)
knlGS:ffff880484000000
May 6 15:22:04 server kernel: [20553234.223800] CS: e033 DS: 0000
ES: 0000 CR0: 0000000080050033
May 6 15:22:04 server kernel: [20553234.230619] CR2: 00007f49c8679000
CR3: 0000000074855000 CR4: 0000000000042660
May 6 15:22:04 server kernel: [20553234.238829] Stack:
May 6 15:22:04 server kernel: [20553234.241914] 0000000058f6d400
ffffc90023336c08 00000000000002c0 ffff8802e0841800
May 6 15:22:04 server kernel: [20553234.250476] ffff88004a3afe80
0000000000000080 ffff8802e0841800 ffffc9002332c258
May 6 15:22:04 server kernel: [20553234.259046] 79eb3472cad61644
0000000000000028 ffff88004a3afe54 0000000000000001
May 6 15:22:04 server kernel: [20553234.267615] Call Trace:
May 6 15:22:04 server kernel: [20553234.271146] [<ffffffffa08ff2c9>]
? xenvif_kthread_guest_rx+0x549/0xce0 [xen_netback]
May 6 15:22:04 server kernel: [20553234.280122] [<ffffffffa08fed80>]
? xenvif_map_frontend_rings+0xd0/0xd0 [xen_netback]
May 6 15:22:04 server kernel: [20553234.289052] [<ffffffff810905d1>]
? kthread+0xd1/0xf0
May 6 15:22:04 server kernel: [20553234.295279] [<ffffffff8153be8f>]
? __schedule+0x22f/0x750
May 6 15:22:04 server kernel: [20553234.301822] [<ffffffff81090500>]
? kthread_create_on_node+0x1b0/0x1b0
May 6 15:22:04 server kernel: [20553234.309413] [<ffffffff8154030e>]
? ret_from_fork+0x6e/0xa0
May 6 15:22:04 server kernel: [20553234.316051] [<ffffffff81090500>]
? kthread_create_on_node+0x1b0/0x1b0
May 6 15:22:04 server kernel: [20553234.323643] Code: 2e 0f 1f 84 00
00 00 00 00 0f 1f 44 00 00 41 57 41 56 b8 00 10 00 00 41 55 41 54 49
89 cc 55 53 49 89 fd 4b 8d 3c 08 48 83 ec 30 <48> 8b 09 4c 8b 74 24 68
4c 8b 7c 24 70 80 e5 40 74 08 49 8b 4c
May 6 15:22:04 server kernel: [20553234.345693] RIP
[<ffffffffa08fcaa2>] xenvif_gop_frag_copy+0x22/0x3b0 [xen_netback]
May 6 15:22:04 server kernel: [20553234.354480] RSP <ffff88004a3afd98>
May 6 15:22:35 server kernel: [20553265.003074] ---[ end trace
4fb039a0de2de66f ]---
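
One way to read the register dump above: on x86-64 with 48-bit virtual addresses (an assumption here, though it matches this CPU generation), dereferencing a non-canonical address raises a general protection fault rather than a page fault. The marked bytes in the Code line, `<48> 8b 09`, decode to a 64-bit load through RCX, so it seems worth checking whether RCX holds a canonical address. The Python sketch below does that check for the registers copied from the log; the helper names (`is_canonical`, `parse_registers`, `non_canonical`) are hypothetical and only for illustration.

```python
import re

# Register values copied from the oops above (RSP and segment registers omitted).
OOPS_REGISTERS = """
RAX: 0000000000001000 RBX: ffff8802e0841800 RCX: 7aec7d18f3f45689
RDX: ffff88004a3afe80 RSI: ffff8802e0841800 RDI: 0000000111f703b7
RBP: ffffc9002332c258 R08: 000000005ff8d9a9 R09: 00000000b1fe2a0e
R10: ffff880000000000 R11: 0000000000000002 R12: 7aec7d18f3f45689
R13: ffffc9002332c258 R14: ffff88004a3afe54 R15: 0000000000000001
"""


def is_canonical(addr):
    """Return True if bits 63..47 are all equal.

    Assumption: 48-bit virtual addressing (4-level paging).
    """
    top = addr >> 47  # the upper 17 bits of the 64-bit value
    return top == 0 or top == 0x1FFFF


def parse_registers(text):
    """Map register names (RAX, R08, ...) to their integer values."""
    pairs = re.findall(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def non_canonical(text):
    """Return the sorted names of registers holding non-canonical values."""
    regs = parse_registers(text)
    return sorted(name for name, value in regs.items() if not is_canonical(value))


if __name__ == "__main__":
    print(non_canonical(OOPS_REGISTERS))
```

For this log the script reports RCX and R12, which hold the same value (7aec7d18f3f45689). That would be consistent with xenvif_gop_frag_copy being handed a corrupted pointer, but it does not by itself tell a software bug apart from a hardware fault such as bad RAM.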
Re: general protection fault on vif50.1-q1-guest
On 5/8/20 12:15 AM, John Naggets wrote:
> Hello,
>
> I recently had a general protection fault on a Debian 8 server with
> Xen (Debian package: 4.4.4lts4-0+deb8u1) and it looks like it was Xen
> related because it showed the vif50.1-q1-guest kernel process in the
> kernel log. I have copied the kernel log below in this mail for
> reference. After this GPF the system was still responding, but one domU
> lost network connectivity while all the others were still working
> properly. I decided to power the system off and on again, as a soft GPF
> leaves the system in an unstable state.
>
> Now I am trying to find out what is most likely the cause of this
> general protection fault in order to avoid that again in the future
> and would like your opinion on that:
>
> - is this maybe a bug in Xen?
> - a bug in the kernel used by Debian?
> - a hardware issue?
> - if it is a hardware issue, what is most likely? RAM? CPU?
> - anything else I am missing?

Are you able to try a newer version? You're unlikely to get interest in debugging a version that has been out of support for 3 years:
https://wiki.xenproject.org/wiki/Xen_Project_Release_Features