
Performance degradation in 4.15 and above
Hi,

I have an old ProLiant DL380 server running Gentoo Linux as Dom0 on Xen,
with several DomUs also running Gentoo Linux. After upgrading to Xen 4.15
I noticed that in some of the DomUs (the ones used as Kubernetes nodes)
the load keeps slowly climbing until the DomU becomes unresponsive and
has to be restarted. The issue does not occur on Xen 4.14, and it went
away once I downgraded back to 4.14. It came back after upgrading to 4.16.

According to Munin graphs, the load increases by 2-4 per day, but as far
as I can tell nothing else changes (CPU usage, number of processes,
etc.), so I have no real idea what is causing it.
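A general Linux pointer for this symptom (not something confirmed later in the thread): the Linux load average counts threads in uninterruptible sleep (state D) as well as runnable ones, so the load can keep rising while CPU usage stays flat if threads pile up blocked in the kernel, for example on I/O. A minimal Python sketch, assuming a standard Linux /proc filesystem, that tallies thread states so the D count can be compared with /proc/loadavg; the function names are illustrative only:

```python
import os


def parse_state(stat_line: str) -> str:
    """Return the one-letter state field of a /proc/.../stat line.

    Field 2 (the command name) is wrapped in parentheses and may itself
    contain spaces or ')', so split after the *last* ')'.
    """
    after_comm = stat_line[stat_line.rfind(")") + 2:]
    return after_comm.split(" ", 1)[0]


def count_states(stat_lines):
    """Tally thread states, e.g. {'S': 120, 'R': 2, 'D': 5}."""
    counts = {}
    for line in stat_lines:
        state = parse_state(line)
        counts[state] = counts.get(state, 0) + 1
    return counts


def read_thread_stats(proc="/proc"):
    """Yield the stat line of every thread (the load average counts threads)."""
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:  # the process exited while we were scanning
            continue
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    line = f.read()
            except OSError:  # the thread exited or is not readable
                continue
            yield line


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    counts = count_states(read_thread_stats())
    with open("/proc/loadavg") as f:
        print("load average (1/5/15 min):", " ".join(f.read().split()[:3]))
    print("runnable (R):", counts.get("R", 0))
    print("uninterruptible (D):", counts.get("D", 0))
```

If the D count climbs in step with the load average, the blocked threads are the next thing to examine; as root, "echo w > /proc/sysrq-trigger" dumps the tasks in uninterruptible state to the kernel log.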

Both the Dom0 and the DomUs are running a hardened-gentoo kernel,
version 5.10.156 (see the attached .config).

If anyone has any pointers regarding where to look or what can be
tweaked, I would be grateful for the information.

Regards,
Gabor
Re: Performance degradation in 4.15 and above
On Fri, May 19, 2023 at 11:19 AM Gabor Hudiczius <ghudiczius@gmail.com>
wrote:

Hello Gabor,
I remember having these problems:
- with the credit2 scheduler
- with kernel 5.15 at some point (around 5.15.32), but it is OK with
current versions.

Tomas
Re: Performance degradation in 4.15 and above
On 2023-05-19 11:48, Tomas Mozes wrote:
> Hello Gabor,
> I remember having these problems:
> - with credit2 scheduler
I have been using the credit scheduler since upgrading to 4.12: my box
stalled several times, so I followed the recommendation in the Gentoo
wiki
(https://wiki.gentoo.org/wiki/Xen#Xen_domU_hanging_with_Xen_4.12.2B),
which seemed to solve the issue.
> - kernel 5.15 in some point of time (around kernel 5.15.32), but is ok
> with current versions.
>
> Tomas
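To double-check which scheduler Xen is actually running after a change like this, xl info reports it in its xen_scheduler field (and the boot parameters in its xen_commandline field). A minimal Python sketch, assuming it runs as root in Dom0 with xl on PATH, that parses the "key : value" lines of that output; the helper names are hypothetical:

```python
import shutil
import subprocess


def parse_xl_info(text: str) -> dict:
    """Parse the 'key   : value' lines printed by `xl info`."""
    info = {}
    for line in text.splitlines():
        # Split on the first ':' only; values such as dates may contain more.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def active_scheduler() -> str:
    """Run `xl info` (needs root in Dom0) and return the xen_scheduler field."""
    out = subprocess.run(["xl", "info"], capture_output=True, text=True,
                         check=True).stdout
    return parse_xl_info(out).get("xen_scheduler", "unknown")


if __name__ == "__main__":
    if shutil.which("xl"):
        print("xen_scheduler:", active_scheduler())
    else:
        print("xl not found; run this in Dom0")
```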
Re: Performance degradation in 4.15 and above
On Fri, May 19, 2023 at 1:04 PM Gabor Hudiczius <ghudiczius@gmail.com>
wrote:

Maybe try a newer kernel? I'm using 5.15 LTS and it worked fine with Xen
4.15; I am just upgrading to Xen 4.16 on Gentoo.

Best regards,
Tomas
Re: Performance degradation in 4.15 and above
Another thing that comes to mind: the lockups occurred when the grant
table was full.

domU config:
max_grant_frames = 256

grub config:
GRUB_CMDLINE_XEN="gnttab_max_frames=256 sched=credit ..."

You can check it with:
xen-diag gnttab_query_size [domid]
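To keep an eye on grant-table usage over time (for example from cron, or from a Munin plugin since Munin is already in use here), the xen-diag output can be parsed and compared with the limit. A Python sketch, assuming xen-diag prints lines of the form "domid=N: nr_frames=X, max_nr_frames=Y" (verify this against your Xen version) and that domain IDs are passed on the command line; the 80% warning threshold is an arbitrary, hypothetical choice:

```python
import re
import subprocess
import sys

# Assumed output format of `xen-diag gnttab_query_size <domid>`:
#   domid=3: nr_frames=15, max_nr_frames=64
PATTERN = re.compile(r"domid=(\d+):\s*nr_frames=(\d+),\s*max_nr_frames=(\d+)")

WARN_RATIO = 0.8  # hypothetical threshold: warn at 80% of the limit


def parse_gnttab(line: str):
    """Return (domid, nr_frames, max_nr_frames), or None if nothing matches."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


def usage_ratio(nr_frames: int, max_nr_frames: int) -> float:
    """Fraction of the grant-table frame limit currently in use."""
    return nr_frames / max_nr_frames


def query(domid: int):
    """Run xen-diag for one domain (root in Dom0) and parse its output."""
    out = subprocess.run(["xen-diag", "gnttab_query_size", str(domid)],
                         capture_output=True, text=True, check=True).stdout
    return parse_gnttab(out)


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        result = query(int(arg))
        if result is None:
            print(f"domid {arg}: unrecognised xen-diag output")
            continue
        domid, nr, mx = result
        ratio = usage_ratio(nr, mx)
        flag = "  <-- close to limit" if ratio >= WARN_RATIO else ""
        print(f"domid {domid}: {nr}/{mx} frames ({ratio:.0%}){flag}")
```

For example, the figures reported in the next message (15 to 30 of 64 frames) correspond to roughly 23% to 47% of the limit.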

On Fri, May 19, 2023 at 1:04 PM Gabor Hudiczius <ghudiczius@gmail.com>
wrote:

Re: Performance degradation in 4.15 and above
On 2023-05-23 10:16, Tomas Mozes wrote:
> Another thing that came to my mind, the lockups occurred when the
> grant table was full.
>
> domU config:
> max_grant_frames = 256
>
> grub config:
> GRUB_CMDLINE_XEN="gnttab_max_frames=256 sched=credit ..."
>
> You can check it with:
> xen-diag gnttab_query_size [domid]
For Dom0, nr_frames is 1; for the DomUs it is between 15 and 30, while
max_nr_frames is 64 for all of them.
>> Both the Dom0 and DomUs are running on a hardened-gentoo kernel
>> version 5.10.156 (see the attached .config).
>>
I tried kernel version 5.15.110, but that did not help; I will give
6.1.28 a try as well.
I also noticed that restarting the DomUs has little to no effect on the
load; only restarting the Dom0 brings the load back to normal levels.