Re: Xen 4.12 DomU hang / freeze / stall under high network/disk load
On Mon, Feb 17, 2020 at 10:49 AM Sarah Newman <srn@prgmr.com> wrote:
> On 2/17/20 10:33 AM, Tomas Mozes wrote:
> > Just a quick note - no stall after switching to credit scheduler on xen 4.12 after 3 days.
> That's great news. By 4.12 do you mean release 4.12.1, 4.12.2, or something else?
> I'm assuming when "PGNet Dev" reported 4.12 being bad and 4.13 being good, they were using the default scheduler of credit 2.

I hope they respond with details! :-)

> It's worth asking on xen-devel if there's a known bug in the credit 2 scheduler that's been fixed. It looks like there were some significant changes
> to the scheduling code in between Xen 4.12 and Xen 4.13, and if one was a fix I'm not sure it would have been recognized as being so.

Sarah, Tomas -

Is that something one of you wants to do? If not, I'm happy to take
that task, but don't want to step on toes.

In light of this report, I've added sched=credit to my bootloader
config, to take effect on the *next* reboot of my 4.12 production
host. The guest on that host - which is my production machine and
which I am not stress testing - has now been up for 8 days (typical:
when not stress testing, it lasts 3-14 days before stalling). Rather
than rebooting into sched=credit now, I'm still hoping it will stall
again so I can run the commands Sarah asked for... although I wonder
if it's still worth it given what we're finding?

Sarah -

If you do feel it's worth it, I'm happy to wait. Here are the
commands I have lined up to run on the physical host (current guest
id=10) when the guest stalls next:

xl sysrq 10 l
xl sysrq 10 x
xl debug-keys q
xl dmesg
xl info

Is this right? Are there any other debugging commands I can/should
run on the host or guest when it stalls next? Anything that might be
useful I'm happy to grab, but since it might be 2AM I want to line
them all up in a file (as I have above) so I don't have to hunt while
trying to stay awake. :-)
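
A minimal sketch of how the commands above could be lined up in one script that saves everything to a timestamped file. The script layout, the XL and OUT variables, the run helper, and the log path are illustrative assumptions, not something from this thread. The domain id defaults to 10 as above and should be confirmed with "xl list" before running.

```shell
#!/bin/sh
# Sketch: collect stall diagnostics on the physical host (dom0).
# Assumptions: run as root in dom0; the guest's domain id is passed as
# the first argument (defaults to 10, the id mentioned above).
DOMID="${1:-10}"
XL="${XL:-xl}"
OUT="${OUT:-/tmp/xen-stall-$(date +%Y%m%d-%H%M%S).log}"

# Run one command, writing a header plus its stdout and stderr to the log.
# A failing command is noted in the log and does not abort the script.
run() {
    printf '===== %s =====\n' "$*" >> "$OUT"
    "$@" >> "$OUT" 2>&1 || printf '(command failed: exit %s)\n' "$?" >> "$OUT"
}

# The sysrq requests are handled inside the guest, so their output is
# expected in the guest's console / kernel log rather than in this file.
run "$XL" sysrq "$DOMID" l
run "$XL" sysrq "$DOMID" x

# The debug-key output goes to the hypervisor console ring, which the
# following "xl dmesg" call captures.
run "$XL" debug-keys q
run "$XL" dmesg
run "$XL" info

printf 'Diagnostics written to %s\n' "$OUT"
```

Saved under a hypothetical name such as stall-debug.sh, it could be run as "sh stall-debug.sh 10" when the guest stalls, and the resulting log attached to a follow-up message.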

After it stalls next and I grab the debugging output suggested, I'll
reboot the physical host into sched=credit for the production guest.

My test host/guest I'm going to leave on 15.0/4.10 for now - since
it's my future production host - until I do more testing on that
configuration and/or until we get this nailed down.

Glen

_______________________________________________
Xen-users mailing list
Xen-users@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-users
Re: Xen 4.12 DomU hang / freeze / stall under high network/disk load
On Monday, February 17, 2020, Sarah Newman <srn@prgmr.com> wrote:
> On 2/17/20 10:33 AM, Tomas Mozes wrote:
>
>> Just a quick note - no stall after switching to credit scheduler on xen
>> 4.12 after 3 days.
>
> That's great news. By 4.12 do you mean release 4.12.1, 4.12.2, or
> something else?

The latest 4.12.2 release.

>
> I'm assuming when "PGNet Dev" reported 4.12 being bad and 4.13 being
> good, they were using the default scheduler of credit 2.
>
> It's worth asking on xen-devel if there's a known bug in the credit 2
> scheduler that's been fixed. It looks like there were some significant
> changes to the scheduling code in between Xen 4.12 and Xen 4.13, and if one
> was a fix I'm not sure it would have been recognized as being so.

Maybe it's a different issue, because the domU also hung on 4.13.

>
> --Sarah
>
Re: Xen 4.12 DomU hang / freeze / stall under high network/disk load
> I'm assuming when "PGNet Dev" reported 4.12 being bad and 4.13 being good, they were using the default scheduler of credit 2.

yep, with Xen 4.13.x here,

grep credit /boot/grub2/xen-4.13.0_04-lp151.688.cfg
options=... sched=credit2 ...

explicitly set since ... not exactly sure, unfortunately.
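
The grep above shows what the boot config asks for. A small sketch for confirming which scheduler the running hypervisor actually uses, by reading the xen_scheduler field that "xl info" prints. The active_scheduler helper name is an illustrative assumption, not from this thread.

```shell
#!/bin/sh
# Sketch: print the scheduler the running hypervisor reports.
# active_scheduler reads "xl info" style output on stdin and prints the
# value of the xen_scheduler field (for example credit or credit2).
active_scheduler() {
    awk -F': *' '$1 ~ /^xen_scheduler/ { print $2 }'
}

# Only query the hypervisor when the xl tool is available (dom0).
if command -v xl >/dev/null 2>&1; then
    xl info | active_scheduler
fi
```

Checking this after a reboot helps rule out the case where sched= was added to a config file that the bootloader does not read.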



Re: Xen 4.12 DomU hang / freeze / stall under high network/disk load
On Mon, Feb 17, 2020 at 11:35 AM Tomas Mozes <hydrapolic@gmail.com> wrote:
> On Monday, February 17, 2020, Sarah Newman <srn@prgmr.com> wrote:
> > On 2/17/20 10:33 AM, Tomas Mozes wrote:
> >> Just a quick note - no stall after switching to credit scheduler on xen
> >> 4.12 after 3 days.
> > That's great news. By 4.12 do you mean release 4.12.1, 4.12.2, or something else?
> The latest 4.12.2 release.
> Maybe it's a different issue, because the domU also hung on 4.13.

So given the success I had with Xen 4.10, and feeling confident that
that is *a* solution.... and given the success Tomas is having with
the original credit scheduler, and wanting to know if that is a better
solution.... and since I have until Friday to play a bit with this
setup...

I took my test host forward again to openSUSE 15.1 (Xen 4.12),
activated the original credit scheduler (sched=credit), and am
re-running my stress test on my guest. Both host and guest are
now on openSUSE's "latest" primary release versions:

openSUSE 15.1
Linux 4.12.14-lp151.28.36-default x86_64
Xen version 4.12.1_06-lp151.2.9

So far I'm noticing lower load averages, similar to what I saw on the
guest when it was running under 4.10, which I know is anecdotal, but
I'm still hopeful.

I'm also going to go ahead and apply sched=credit to my production
host on a reboot tonight.
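
For anyone following along, a sketch of where sched=credit typically goes on a GRUB2-booted host. This assumes the hypervisor options are taken from GRUB_CMDLINE_XEN_DEFAULT in /etc/default/grub and that the menu is regenerated with grub2-mkconfig. Hosts that boot xen.efi from its own .cfg file (like the options= line quoted earlier in this thread) would set it there instead. Any hypervisor options already on the line need to be kept; they are left out here.

```shell
# /etc/default/grub (fragment; assumed location, check your distribution)
# Append sched=credit to whatever hypervisor options are already set.
GRUB_CMDLINE_XEN_DEFAULT="sched=credit"

# Then regenerate the boot menu and reboot the host, for example:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

The scheduler is chosen at hypervisor boot, so the change only takes effect after the physical host is rebooted.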

I will report when anything stalls (or completes!)

Glen

Re: Xen 4.12 DomU hang / freeze / stall under high network/disk load
On 17.02.20 19:49, Sarah Newman wrote:
> On 2/17/20 10:33 AM, Tomas Mozes wrote:
>
>> Just a quick note - no stall after switching to credit scheduler on xen
>> 4.12 after 3 days.
>
> That's great news. By 4.12 do you mean release 4.12.1, 4.12.2, or
> something else?
>
> I'm assuming when "PGNet Dev" reported 4.12 being bad and 4.13 being
> good, they were using the default scheduler of credit 2.
>
> It's worth asking on xen-devel if there's a known bug in the credit 2
> scheduler that's been fixed. It looks like there were some significant
> changes to the scheduling code in between Xen 4.12 and Xen 4.13, and if
> one was a fix I'm not sure it would have been recognized as being so.

It was me doing the massive scheduling changes in 4.13, and I'm pretty
sure I marked the fixes of existing bugs as such.

All fixes have been backported to 4.12 and are included in 4.12.2.


Juergen

Re: Xen 4.12 DomU hang / freeze / stall under high network/disk load
On Mon, Feb 17, 2020 at 10:20 PM Jürgen Groß <jgross@suse.com> wrote:
> It was me doing the massive scheduling changes in 4.13, and I'm pretty
> sure I marked the fixes of existing bugs as such.
> All fixes have been backported to 4.12 and are included in 4.12.2.
> Juergen

Juergen -

Thank you for this!

Noting the @suse.com in your email address, I'm wondering if you know
whether 4.12.2 will be pushed out as an update to openSUSE 15.1?

Right now it includes 4.12.1...

Glen

Re: Xen 4.12 DomU hang / freeze / stall under high network/disk load
On 18.02.20 21:55, Glen wrote:
> On Mon, Feb 17, 2020 at 10:20 PM Jürgen Groß <jgross@suse.com> wrote:
>> It was me doing the massive scheduling changes in 4.13, and I'm pretty
>> sure I marked the fixes of existing bugs as such.
>> All fixes have been backported to 4.12 and are included in 4.12.2.
>> Juergen
>
> Juergen -
>
> Thank you for this!
>
> Noting the @suse.com in your email address, I'm wondering if you know
> whether 4.12.2 will be pushed out as an update to openSUSE 15.1?

Should happen.


Juergen
