Mailing List Archive

[Bug 1757] Dom0 hangs after few hours when using pci passthrough
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1757





------- Comment #1 from darkbasic4@gmail.com 2011-04-05 07:39 -------
Unfortunately I can't easily reproduce it, so please let me know how to enable
all the debugging information you need and I will leave a pc attached to the
serial console until it crashes.


--
Configure bugmail: http://bugzilla.xensource.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

_______________________________________________
Xen-bugs mailing list
Xen-bugs@lists.xensource.com
http://lists.xensource.com/xen-bugs

------- Comment #2 from konrad.wilk@oracle.com 2011-04-05 08:27 -------
Look in the troubleshooting section of
http://wiki.xensource.com/xenwiki/XenParavirtOps and enable those options. Also
look in the serial Wiki.



------- Comment #3 from darkbasic4@gmail.com 2011-04-08 18:30 -------
Created an attachment (id=992)
--> (http://bugzilla.xensource.com/bugzilla/attachment.cgi?id=992&action=view)
Oops when booting

I had no problems for two days, then I rebooted and got an Oops. I rebooted
again: same problem. Once more: another Oops. I booted without pci passthrough:
no problem. I booted again with pci passthrough: the Oops disappeared.

I attached the log. I'm using the stock Debian kernel. If it doesn't have enough
debugging options, please let me know and I will compile a new one.



------- Comment #4 from darkbasic4@gmail.com 2011-04-13 03:13 -------
Bump. It's a production server; we cannot use Xen if it isn't reliable.



------- Comment #5 from konrad.wilk@oracle.com 2011-04-14 06:13 -------
(In reply to comment #3)
> Created an attachment (id=992)
> --> (http://bugzilla.xensource.com/bugzilla/attachment.cgi?id=992&action=view)
> Oops when booting
>
> I had no problems for two days, then I rebooted and got an Oops. I rebooted
> again: same problem. Once more: another Oops. I booted without pci passthrough:
> no problem. I booted again with pci passthrough: the Oops disappeared.

I am really confused. It sounds like you have a bunch of Oopses and that some
of them show up depending on how you use the machine.

But the original bug was that you launched a guest, and two days in, the
dom0 (or the Xen hypervisor) crashed - but I am not seeing any serial logs from
that. Does it still happen?



------- Comment #6 from darkbasic4@gmail.com 2011-04-14 06:41 -------
The original bug happened two more times when I wasn't logging, but since I
attached the serial console 24/7 it doesn't want to crash anymore :/
I'm pretty sure it will crash again as soon as I detach the serial
console because of Murphy's law -.-

About the new one (the Oops when booting), did you find the source of the
problem? It is quite important to solve it too, because in case of a power
failure the machine will shut down automatically and it will have to power on
without any user intervention.



------- Comment #7 from konrad.wilk@oracle.com 2011-04-14 07:03 -------
(In reply to comment #6)
> The original bug happened two more times when I wasn't logging, but since I
> attached the serial console 24/7 it doesn't want to crash anymore :/
> I'm pretty sure it will crash again as soon as I detach the serial
> console because of Murphy's law -.-

Oooh, that actually isn't that strange. If you have a serial console, the
interrupt delivery system gets updated on a regular "heartbeat" - even if you
are not typing anything. My Xyplex keeps on sending empty characters even
if I am not typing anything.

>
> About the new one (the Oops when booting), did you find the source of the
> problem? It is quite important to solve it too, because in case of a power
> failure the machine will shut down automatically and it will have to power on
> without any user intervention.
>

Does it happen if you do 'poweroff -h -f'? (That is a heavy-handed way of
shutting down the machine where it does not even sync, so make sure you mount
your drives RO before you do this.)

Does reboot work reliably?



------- Comment #8 from darkbasic4@gmail.com 2011-04-14 07:24 -------
> Oooh, that actually isn't that strange. If you have a serial console, the
> interrupt delivery system gets updated on a regular "heartbeat" - even if you
> are not typing anything. My Xyplex keeps on sending empty characters even
> if I am not typing anything.

So is there no way to let it crash while still logging, so we can debug it?


> Does it happen if you do 'poweroff -h -f'? (That is a heavy-handed way of
> shutting down the machine where it does not even sync, so make sure you mount
> your drives RO before you do this.)

I will check.


> Does reboot work reliably?

Usually both reboot and shutdown work flawlessly. I don't know why pci
passthrough sometimes triggers such strange problems; if they were easily
reproducible I would be happy :(



------- Comment #9 from darkbasic4@gmail.com 2011-04-14 07:27 -------
> So is there no way to let it crash while still logging, so we can debug it?

I will start by disabling the serial console in /etc/inittab so I will no
longer receive tons of "INIT: Id "T0" respawning too fast: disabled for 5
minutes".



------- Comment #10 from konrad.wilk@oracle.com 2011-04-14 07:39 -------
(In reply to comment #9)
> > So is there no way to let it crash while still logging, so we can debug it?
>
> I will start by disabling the serial console in /etc/inittab so I will no
> longer receive tons of "INIT: Id "T0" respawning too fast: disabled for 5
> minutes".
>
Yes, there is. It just looks as if the kernel is stuck somewhere. You can use
Alt-SysRq to figure out where in the Linux kernel it is stuck, or whether it is
stuck in the hypervisor.



------- Comment #11 from darkbasic4@gmail.com 2011-04-14 08:21 -------
> You can use Alt-SysRq to figure out where in the Linux kernel it is stuck,
> or whether it is stuck in the hypervisor.

Magic SysRq keys always worked (I saw the output on the console), although
'Alt'+'PrintScrn/SysRq'+'q' didn't actually reboot the system; I just saw the
message on the console.



------- Comment #12 from konrad.wilk@oracle.com 2011-04-14 08:39 -------
(In reply to comment #11)
> > You can use Alt-SysRq to figure out where in the Linux kernel it is stuck,
> > or whether it is stuck in the hypervisor.
>
> Magic SysRq keys always worked (I saw the output on the console), although
> 'Alt'+'PrintScrn/SysRq'+'q' didn't actually reboot the system; I just saw the
> message on the console.
>

OK, then can you use that when the dom0 is hung during shutdown to see why it
can't turn itself off? One of the options prints the stack trace (and
there is also a corresponding option for the Xen hypervisor).



------- Comment #13 from darkbasic4@gmail.com 2011-04-14 08:55 -------
> One of the options prints the stack trace (and there is also a
> corresponding option for the Xen hypervisor).

Can you please tell me which ones I should use? Oopses aren't easily
reproducible, so I don't want to waste one because I did something wrong on my
side.

Thank you



------- Comment #14 from konrad.wilk@oracle.com 2011-04-14 09:17 -------
(In reply to comment #13)
> > One of the options prints the stack trace (and there is also a
> > corresponding option for the Xen hypervisor).
>
> Can you please tell me which ones I should use? Oopses aren't easily
> reproducible, so I don't want to waste one because I did something wrong on
> my side.

Just go through all of them.
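
A minimal sketch of what "going through all of them" could look like on the
Linux side, assuming dom0 userspace still responds and SysRq is enabled. The
helper name dump_diagnostics and the selection of keys are illustrative
assumptions, not something posted in this thread; /proc/sysrq-trigger is the
standard kernel interface that accepts the same letters as the Alt-SysRq key
combination. On a fully hung dom0 the keys would instead have to be sent from
the keyboard or over the serial console.

```python
# Illustrative sketch only: the helper name and the key selection are
# assumptions, not taken from the bug report. Requires root on a real system.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"

# Non-destructive SysRq commands that only print diagnostics to the console.
DIAGNOSTIC_KEYS = {
    "l": "stack backtrace for all active CPUs",
    "t": "list of current tasks and their information",
    "w": "tasks in uninterruptible (blocked) state",
    "m": "current memory info",
    "p": "current registers and flags",
    "q": "per-CPU lists of armed hrtimers",
}


def dump_diagnostics(trigger_path=SYSRQ_TRIGGER, keys=DIAGNOSTIC_KEYS):
    """Write each diagnostic SysRq key to the trigger file, one at a time.

    Returns the list of keys that were written, in order.
    """
    sent = []
    for key in keys:
        # The kernel acts on the first character of each write, so the file
        # is reopened for every key.
        with open(trigger_path, "w") as trigger:
            trigger.write(key)
        sent.append(key)
    return sent


if __name__ == "__main__":
    for key in dump_diagnostics():
        print(f"sent SysRq '{key}': {DIAGNOSTIC_KEYS[key]}")
```

The output of each command goes to the kernel log and therefore to the serial
console if one is configured, so the capture should be running before the keys
are sent.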

