
Null scheduler and vwfi native problem
Hi,

I see a problem with destroy and restart of a domain. Interrupts are not
available when trying to restart a domain.

The situation seems very similar to the thread "null scheduler bug"
https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html.

The target system is an iMX8-based ARM board and Xen is a 4.13.0 version
built from https://source.codeaurora.org/external/imx/imx-xen.git.

Xen is booted with sched=null vwfi=native.
One physical CPU core is pinned to the domu.
Some interrupts are passed through to the domu.
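
The relevant parts of the domU cfg look roughly like this (the values
below are illustrative, not copied verbatim from my config):

  vcpus = 1
  cpus = "5"
  irqs = [ 210 ]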

When destroying the domain with xl destroy etc it does not complain but
then when trying to restart the domain
again with a "xl create <domain cfg>" I get:
(XEN) IRQ 210 is already used by domain 1

"xl list" does not contain the domain.

Repeating the "xl create" command 5-10 times eventually starts the
domain without complaining about the IRQ.

Inspired from the discussion in the thread above I have put printks in
the xen/common/domain.c file.
In the function domain_destroy I have a printk("End of domain_destroy
function\n") at the end.
In the function complete_domain_destroy I have a printk("Begin of
complete_domain_destroy function\n") at the beginning.
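
Concretely, the added lines are just these (placed as described above):

  /* last statement of domain_destroy(), xen/common/domain.c */
  printk("End of domain_destroy function\n");

  /* first statement of complete_domain_destroy(), same file */
  printk("Begin of complete_domain_destroy function\n");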

With these printouts I get at "xl destroy":
(XEN) End of domain_destroy function

So it seems like the function complete_domain_destroy is not called.

"xl create" results in:
(XEN) IRQ 210 is already used by domain 1
(XEN) End of domain_destroy function

Then repeated "xl create" looks the same until after a few tries I also get:
(XEN) Begin of complete_domain_destroy function

After that the next "xl create" creates the domain.


I have also applied the patch from
https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html.
This does not seem to change the results.

Starting the system without "sched=null vwfi=native" does not result in
the problem.

BR
Anders
Re: Null scheduler and vwfi native problem
On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
> Hi,
>
Hello,

> I see a problem with destroy and restart of a domain. Interrupts are
> not
> available when trying to restart a domain.
>
> The situation seems very similar to the thread "null scheduler bug"
>
> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
> .
>
Right. Back then, PCI passthrough was involved, if I remember
correctly. Is it the case for you as well?

> The target system is a iMX8-based ARM board and Xen is a 4.13.0
> version
> built from https://source.codeaurora.org/external/imx/imx-xen.git.
>
Mmm, perhaps it's me, but neither going to that URL with a browser nor
trying to clone it shows me anything. What am I doing wrong?

> Xen is booted with sched=null vwfi=native.
> One physical CPU core is pinned to the domu.
> Some interrupts are passed through to the domu.
>
Ok, I guess it is involved, since you say "some interrupts are passed
through..."

> When destroying the domain with xl destroy etc it does not complain
> but
> then when trying to restart the domain
> again with a "xl create <domain cfg>" I get:
> (XEN) IRQ 210 is already used by domain 1
>
> "xl list" does not contain the domain.
>
> Repeating the "xl create" command 5-10 times eventually starts the
> domain without complaining about the IRQ.
>
> Inspired from the discussion in the thread above I have put printks
> in
> the xen/common/domain.c file.
> In the function domain_destroy I have a printk("End of domain_destroy
> function\n") in the end.
> In the function complete_domain_destroy have a printk("Begin of
> complete_domain_destroy function\n") in the beginning.
>
> With these printouts I get at "xl destroy":
> (XEN) End of domain_destroy function
>
> So it seems like the function complete_domain_destroy is not called.
>
Ok, thanks for making these tests. It's helpful to have this
information right away.

> "xl create" results in:
> (XEN) IRQ 210 is already used by domain 1
> (XEN) End of domain_destroy function
>
> Then repeated "xl create" looks the same until after a few tries I
> also get:
> (XEN) Begin of complete_domain_destroy function
>
> After that the next "xl create" creates the domain.
>
>
> I have also applied the patch from
>
> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html
> .
> This does not seem to change the results.
>
Ah... Really? That's a bit unexpected, TBH.

Well, I'll think about it.

> Starting the system without "sched=null vwfi=native" does not result
> in
> the problem.
>
Ok, how about, if you're up for some more testing:

- booting with "sched=null" but not with "vwfi=native"
- booting with "sched=null vwfi=native" but not doing the IRQ 
passthrough that you mentioned above

?

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: Null scheduler and vwfi native problem
On 21/01/2021 10:54, Anders Törnqvist wrote:
> Hi,

Hi Anders,

Thank you for reporting the bug. I am adding Stefano and Dario as IIRC
they were going to work on a solution.

Cheers,

> I see a problem with destroy and restart of a domain. Interrupts are not
> available when trying to restart a domain.
>
> The situation seems very similar to the thread "null scheduler bug"
> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html.
>
> The target system is a iMX8-based ARM board and Xen is a 4.13.0 version
> built from https://source.codeaurora.org/external/imx/imx-xen.git.
>
> Xen is booted with sched=null vwfi=native.
> One physical CPU core is pinned to the domu.
> Some interrupts are passed through to the domu.
>
> When destroying the domain with xl destroy etc it does not complain but
> then when trying to restart the domain
> again with a "xl create <domain cfg>" I get:
> (XEN) IRQ 210 is already used by domain 1
>
> "xl list" does not contain the domain.
>
> Repeating the "xl create" command 5-10 times eventually starts the
> domain without complaining about the IRQ.
>
> Inspired from the discussion in the thread above I have put printks in
> the xen/common/domain.c file.
> In the function domain_destroy I have a printk("End of domain_destroy
> function\n") in the end.
> In the function complete_domain_destroy have a printk("Begin of
> complete_domain_destroy function\n") in the beginning.
>
> With these printouts I get at "xl destroy":
> (XEN) End of domain_destroy function
>
> So it seems like the function complete_domain_destroy is not called.
>
> "xl create" results in:
> (XEN) IRQ 210 is already used by domain 1
> (XEN) End of domain_destroy function
>
> Then repeated "xl create" looks the same until after a few tries I also
> get:
> (XEN) Begin of complete_domain_destroy function
>
> After that the next "xl create" creates the domain.
>
>
> I have also applied the patch from
> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html.
> This does not seem to change the results.
>
> Starting the system without "sched=null vwfi=native" does not result in
> the problem.
>
> BR
> Anders
>
>
>

--
Julien Grall
Re: Null scheduler and vwfi native problem
Hi Dario,

On 21/01/2021 18:32, Dario Faggioli wrote:
> On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
>> Hi,
>> I see a problem with destroy and restart of a domain. Interrupts are
>> not
>> available when trying to restart a domain.
>>
>> The situation seems very similar to the thread "null scheduler bug"
>>
>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
>> .
>>
> Right. Back then, PCI passthrough was involved, if I remember
> correctly. Is it the case for you as well?

PCI passthrough is not yet supported on Arm :). However, the bug was
reported with platform device passthrough.

[...]

>> "xl create" results in:
>> (XEN) IRQ 210 is already used by domain 1
>> (XEN) End of domain_destroy function
>>
>> Then repeated "xl create" looks the same until after a few tries I
>> also get:
>> (XEN) Begin of complete_domain_destroy function
>>
>> After that the next "xl create" creates the domain.
>>
>>
>> I have also applied the patch from
>>
>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html
>> .
>> This does not seem to change the results.
>>
> Ah... Really? That's a bit unexpected, TBH.
>
> Well, I'll think about it. >
>> Starting the system without "sched=null vwfi=native" does not result
>> in
>> the problem.
>>
> Ok, how about, if you're up for some more testing:
>
> - booting with "sched=null" but not with "vwfi=native"
> - booting with "sched=null vwfi=native" but not doing the IRQ
> passthrough that you mentioned above
>
> ?

I think we can skip the testing as the bug was fully diagnosed back
then. Unfortunately, I don't think a patch was ever posted. The
interesting bits start at [1]. Let me try to summarize here.

This has nothing to do with device passthrough, but the bug is easier to
spot as interrupts are only going to be released when the domain is
fully destroyed (we should really release them during the relinquish
period...).

The last step of the domain destruction (complete_domain_destroy()) will
*only* happen when all the CPUs are considered quiescent from the RCU PoV.

As you pointed out on that thread, the RCU implementation in Xen
requires each pCPU to enter the hypervisor (via hypercalls,
interrupts...) from time to time.

This assumption doesn't hold anymore when using "sched=null vwfi=native"
because a vCPU will not exit when it is idling (vwfi=native) and there
may not be any other source of interrupt on that vCPU.

Therefore the quiescent state will never be reached on the pCPU running
that vCPU.

From Xen PoV, any pCPU executing guest context can be considered
quiescent. So one way to solve the problem would be to mark the pCPU
as quiescent when entering the guest.
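
To make the idea a bit more concrete, here is a rough, untested sketch.
The hook points and the names below are purely illustrative, they are
not existing Xen code:

#include <xen/cpumask.h>

/* Illustrative mask of pCPUs currently running guest context. */
static cpumask_t rcu_in_guest_mask;

/* Called just before returning to the guest (e.g. from the Arm exit
 * path). While a pCPU runs guest code it cannot hold references to
 * RCU-protected data, so RCU can treat it as quiescent, similarly to
 * what is already done for idle pCPUs. */
void rcu_guest_enter(unsigned int cpu)
{
    cpumask_set_cpu(cpu, &rcu_in_guest_mask);
}

/* Called when re-entering the hypervisor from the guest: from here on
 * the pCPU must be accounted for again by the grace period machinery. */
void rcu_guest_exit(unsigned int cpu)
{
    cpumask_clear_cpu(cpu, &rcu_in_guest_mask);
}

The grace-period code would then simply not wait for the pCPUs set in
that mask.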

Cheers,

[1]
https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528ecfc@arm.com/

--
Julien Grall
Re: Null scheduler and vwfi native problem
On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
> Hi Dario,
>
Hi!

> On 21/01/2021 18:32, Dario Faggioli wrote:
> > On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
> > >  
> > > https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
> > > .
> > >
> > Right. Back then, PCI passthrough was involved, if I remember
> > correctly. Is it the case for you as well?
>
> PCI passthrough is not yet supported on Arm :). However, the bug was
> reported with platform device passthrough.
>
Yeah, well... That! Which indeed is not PCI. Sorry for the terminology
mismatch. :-)

> > Well, I'll think about it. >
> > > Starting the system without "sched=null vwfi=native" does not
> > > result
> > > in
> > > the problem.
> > >
> > Ok, how about, if you're up for some more testing:
> >
> >   - booting with "sched=null" but not with "vwfi=native"
> >   - booting with "sched=null vwfi=native" but not doing the IRQ
> >     passthrough that you mentioned above
> >
> > ?
>
>> I think we can skip the testing as the bug was fully diagnosed back
> then. Unfortunately, I don't think a patch was ever posted.
>
True. But a hackish debug patch was provided and, back then, it
worked.

OTOH, Anders seems to be reporting that such a patch did not work here.
I also continue to think that we're facing the same or a very similar
problem... But I'm curious why applying the patch did not help this
time. And that's why I asked for more testing.

Anyway, it's true that we left the issue pending, so something like
this:

>  From Xen PoV, any pCPU executing guest context can be considered
> quiescent. So one way to solve the problem would be to mark the pCPU
> when entering to the guest.
>
Should be done anyway.

We'll then see if it actually solves this problem too, or if this is
really something else.

Thanks for the summary, BTW. :-)

I'll try to work on a patch.

Regards

> [1]
>
> https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528ecfc@arm.com/
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: Null scheduler and vwfi native problem
Thanks for the responses.

On 1/22/21 12:35 AM, Dario Faggioli wrote:
> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>> Hi Dario,
>>
> Hi!
>
>> On 21/01/2021 18:32, Dario Faggioli wrote:
>>> On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
>>>>
>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
>>>> .
>>>>
>>> Right. Back then, PCI passthrough was involved, if I remember
>>> correctly. Is it the case for you as well?
>> PCI passthrough is not yet supported on Arm :). However, the bug was
>> reported with platform device passthrough.
>>
> Yeah, well... That! Which indeed is not PCI. Sorry for the terminology
> mismatch. :-)
>
>>> Well, I'll think about it. >
>>>> Starting the system without "sched=null vwfi=native" does not
>>>> result
>>>> in
>>>> the problem.
>>>>
>>> Ok, how about, if you're up for some more testing:
>>>
>>>   - booting with "sched=null" but not with "vwfi=native"
>>>   - booting with "sched=null vwfi=native" but not doing the IRQ
>>>     passthrough that you mentioned above
>>>
>>> ?
>> I think we can skip the testing as the bug was fully diagnosed back
>> then. Unfortunately, I don't think a patch was ever posted.
>>
> True. But an hackish debug patch was provided and, back then, it
> worked.
>
> OTOH, Anders seems to be reporting that such a patch did not work here.
> I also continue to think that we're facing the same or a very similar
> problem... But I'm curious why applying the patch did not help this
> time. And that's why I asked for more testing.
I made the tests as suggested to shed some more light if needed.

- booting with "sched=null" but not with "vwfi=native"
Without "vwfi=native" it works fine to destroy and to re-create the domain.
Both printouts come after a destroy:
(XEN) End of domain_destroy function
(XEN) End of complete_domain_destroy function


- booting with "sched=null vwfi=native" but not doing the IRQ
passthrough that you mentioned above
"xl destroy" gives
(XEN) End of domain_destroy function

Then a "xl create" says nothing but the domain has not started correct.
"xl list" look like this for the domain:
mydomu                                   2   512     1 ------       0.0

>
> Anyway, it's true that we left the issue pending, so something like
> this:
>
>>  From Xen PoV, any pCPU executing guest context can be considered
>> quiescent. So one way to solve the problem would be to mark the pCPU
>> when entering to the guest.
>>
> Should be done anyway.
>
> We'll then see if it actually solves this problem too, or if this is
> really something else.
>
> Thanks for the summary, BTW. :-)
>
> I'll try to work on a patch.
Thanks, just let me know if I can do some testing to assist.
>
> Regards
>
>> [1]
>>
>> https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528ecfc@arm.com/
Re: Null scheduler and vwfi native problem
On 1/21/21 7:32 PM, Dario Faggioli wrote:
> On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
>> Hi,
>>
> Hello,
>
>> I see a problem with destroy and restart of a domain. Interrupts are
>> not
>> available when trying to restart a domain.
>>
>> The situation seems very similar to the thread "null scheduler bug"
>>
>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
>> .
>>
> Right. Back then, PCI passthrough was involved, if I remember
> correctly. Is it the case for you as well?
>
>> The target system is a iMX8-based ARM board and Xen is a 4.13.0
>> version
>> built from https://source.codeaurora.org/external/imx/imx-xen.git.
>>
> Mmm, perhaps it's me, but neither going to that URL with a browser nor
> trying to clone it shows me anything. What am I doing wrong?
Sorry. The link is https://source.codeaurora.org/external/imx/imx-xen.
>
>> Xen is booted with sched=null vwfi=native.
>> One physical CPU core is pinned to the domu.
>> Some interrupts are passed through to the domu.
>>
> Ok, I guess it is involved, since you say "some interrupts are passed
> through..."
>
>> When destroying the domain with xl destroy etc it does not complain
>> but
>> then when trying to restart the domain
>> again with a "xl create <domain cfg>" I get:
>> (XEN) IRQ 210 is already used by domain 1
>>
>> "xl list" does not contain the domain.
>>
>> Repeating the "xl create" command 5-10 times eventually starts the
>> domain without complaining about the IRQ.
>>
>> Inspired from the discussion in the thread above I have put printks
>> in
>> the xen/common/domain.c file.
>> In the function domain_destroy I have a printk("End of domain_destroy
>> function\n") in the end.
>> In the function complete_domain_destroy have a printk("Begin of
>> complete_domain_destroy function\n") in the beginning.
>>
>> With these printouts I get at "xl destroy":
>> (XEN) End of domain_destroy function
>>
>> So it seems like the function complete_domain_destroy is not called.
>>
> Ok, thanks for making these tests. It's helpful to have this
> information right away.
>
>> "xl create" results in:
>> (XEN) IRQ 210 is already used by domain 1
>> (XEN) End of domain_destroy function
>>
>> Then repeated "xl create" looks the same until after a few tries I
>> also get:
>> (XEN) Begin of complete_domain_destroy function
>>
>> After that the next "xl create" creates the domain.
>>
>>
>> I have also applied the patch from
>>
>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html
>> .
>> This does not seem to change the results.
>>
> Ah... Really? That's a bit unexpected, TBH.
>
> Well, I'll think about it.
>
>> Starting the system without "sched=null vwfi=native" does not result
>> in
>> the problem.
>>
> Ok, how about, if you're up for some more testing:
>
> - booting with "sched=null" but not with "vwfi=native"
> - booting with "sched=null vwfi=native" but not doing the IRQ
> passthrough that you mentioned above
>
> ?
>
> Regards
Re: Null scheduler and vwfi native problem
On Fri, 2021-01-22 at 09:06 +0100, Anders Törnqvist wrote:
> On 1/22/21 12:35 AM, Dario Faggioli wrote:
>
>
> - booting with "sched=null" but not with "vwfi=native"
> Without "vwfi=native" it works fine to destroy and to re-create the
> domain.
> Both printouts comes after a destroy:
> (XEN) End of domain_destroy function
> (XEN) End of complete_domain_destroy function
>
Ok, thanks for doing these tests.

The fact that not using "vwfi=native" makes things work seems to point
in the direction that Julien and I (and you as well!) were suspecting,
i.e., it is the same issue as the one in the old xen-devel thread.

I'm still a bit puzzled why the debug patch posted back then does not
work for you... but that's not really super important. Let's try to
come up with a new debug patch and, this time, a proper fix. :-)

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: Null scheduler and vwfi native problem
Hi Dario,

On 21/01/2021 23:35, Dario Faggioli wrote:
> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>> Hi Dario,
>>
> Hi!
>
>> On 21/01/2021 18:32, Dario Faggioli wrote:
>>> On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
>>>>
>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
>>>> .
>>>>
>>> Right. Back then, PCI passthrough was involved, if I remember
>>> correctly. Is it the case for you as well?
>>
>> PCI passthrough is not yet supported on Arm :). However, the bug was
>> reported with platform device passthrough.
>>
> Yeah, well... That! Which indeed is not PCI. Sorry for the terminology
> mismatch. :-)
>
>>> Well, I'll think about it. >
>>>> Starting the system without "sched=null vwfi=native" does not
>>>> result
>>>> in
>>>> the problem.
>>>>
>>> Ok, how about, if you're up for some more testing:
>>>
>>>   - booting with "sched=null" but not with "vwfi=native"
>>>   - booting with "sched=null vwfi=native" but not doing the IRQ
>>>     passthrough that you mentioned above
>>>
>>> ?
>>
>> I think we can skip the testing as the bug was fully diagnosed back
>> then. Unfortunately, I don't think a patch was ever posted.
>>
> True. But an hackish debug patch was provided and, back then, it
> worked.
>
> OTOH, Anders seems to be reporting that such a patch did not work here.
> I also continue to think that we're facing the same or a very similar
> problem... But I'm curious why applying the patch did not help this
> time. And that's why I asked for more testing.

I wonder if this is because your patch doesn't modify rsinterval. So
even if we call force_quiescent_state(), the softirq would only be
raised for the current CPU.

I guess the following HACK could confirm the theory:

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index a5a27af3def0..50020bc34ddf 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -250,7 +250,7 @@ static void force_quiescent_state(struct rcu_data *rdp,
 {
     cpumask_t cpumask;
     raise_softirq(RCU_SOFTIRQ);
-    if (unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
+    if (1 || unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
         rdp->last_rs_qlen = rdp->qlen;
         /*
          * Don't send IPI to itself. With irqs disabled,

Cheers,

--
Julien Grall
Re: Null scheduler and vwfi native problem
Hi Anders,

On 22/01/2021 08:06, Anders Törnqvist wrote:
> On 1/22/21 12:35 AM, Dario Faggioli wrote:
>> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
> - booting with "sched=null vwfi=native" but not doing the IRQ
> passthrough that you mentioned above
> "xl destroy" gives
> (XEN) End of domain_destroy function
>
> Then a "xl create" says nothing but the domain has not started correct.
> "xl list" look like this for the domain:
> mydomu                                   2   512     1 ------       0.0

This is odd. I would have expected ``xl create`` to fail if something
went wrong with the domain creation.

The list of dashes suggests that the domain is:
- Not running
- Not blocked (i.e cannot run)
- Not paused
- Not shutdown

So this suggests the NULL scheduler didn't schedule the vCPU. Would it be
possible to describe your setup:
- How many pCPUs?
- How many vCPUs did you give to dom0?
- What was the number of the vCPUs given to the previous guest?

One possibility is the NULL scheduler doesn't release the pCPUs until
the domain is fully destroyed. So if there is no pCPU free, it wouldn't
be able to schedule the new domain.

However, I would have expected the NULL scheduler to refuse to create
the domain if there is no pCPU available.

@Dario, @Stefano, do you know when the NULL scheduler decides to
allocate the pCPU?

Cheers,

--
Julien Grall
Re: Null scheduler and vwfi native problem
On 1/22/21 3:02 PM, Julien Grall wrote:
> Hi Dario,
>
> On 21/01/2021 23:35, Dario Faggioli wrote:
>> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>>> Hi Dario,
>>>
>> Hi!
>>
>>> On 21/01/2021 18:32, Dario Faggioli wrote:
>>>> On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
>>>>> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
>>>>> .
>>>>>
>>>> Right. Back then, PCI passthrough was involved, if I remember
>>>> correctly. Is it the case for you as well?
>>>
>>> PCI passthrough is not yet supported on Arm :). However, the bug was
>>> reported with platform device passthrough.
>>>
>> Yeah, well... That! Which indeed is not PCI. Sorry for the terminology
>> mismatch. :-)
>>
>>>> Well, I'll think about it. >
>>>>> Starting the system without "sched=null vwfi=native" does not
>>>>> result
>>>>> in
>>>>> the problem.
>>>>>
>>>> Ok, how about, if you're up for some more testing:
>>>>
>>>>    - booting with "sched=null" but not with "vwfi=native"
>>>>    - booting with "sched=null vwfi=native" but not doing the IRQ
>>>>      passthrough that you mentioned above
>>>>
>>>> ?
>>>
>>> I think we can skip the testing as the bug was fully diagnosed back
>>> then. Unfortunately, I don't think a patch was ever posted.
>>>
>> True. But an hackish debug patch was provided and, back then, it
>> worked.
>>
>> OTOH, Anders seems to be reporting that such a patch did not work here.
>> I also continue to think that we're facing the same or a very similar
>> problem... But I'm curious why applying the patch did not help this
>> time. And that's why I asked for more testing.
>
> I wonder if this is because your patch doesn't modify rsinterval. So
> even if we call force_quiescent_state(), the softirq would only be
> raised for the current CPU.
>
> I guess the following HACK could confirm the theory:
>
> diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
> index a5a27af3def0..50020bc34ddf 100644
> --- a/xen/common/rcupdate.c
> +++ b/xen/common/rcupdate.c
> @@ -250,7 +250,7 @@ static void force_quiescent_state(struct rcu_data
> *rdp,
>  {
>      cpumask_t cpumask;
>      raise_softirq(RCU_SOFTIRQ);
> -    if (unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
> +    if (1 || unlikely(rdp->qlen - rdp->last_rs_qlen > rsinterval)) {
>          rdp->last_rs_qlen = rdp->qlen;
>          /*
>           * Don't send IPI to itself. With irqs disabled,
>
> Cheers,
>
I applied the patch above. No change. The complete_domain_destroy
function is not called when I destroy the domain.

/Anders
Re: Null scheduler and vwfi native problem
On 1/22/21 3:26 PM, Julien Grall wrote:
> Hi Anders,
>
> On 22/01/2021 08:06, Anders Törnqvist wrote:
>> On 1/22/21 12:35 AM, Dario Faggioli wrote:
>>> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>> - booting with "sched=null vwfi=native" but not doing the IRQ
>> passthrough that you mentioned above
>> "xl destroy" gives
>> (XEN) End of domain_destroy function
>>
>> Then a "xl create" says nothing but the domain has not started
>> correct. "xl list" look like this for the domain:
>> mydomu                                   2   512     1 ------       0.0
>
> This is odd. I would have expected ``xl create`` to fail if something
> went wrong with the domain creation.
>
> The list of dash, suggests that the domain is:
>    - Not running
>    - Not blocked (i.e cannot run)
>    - Not paused
>    - Not shutdown
>
> So this suggest the NULL scheduler didn't schedule the vCPU. Would it
> be possible to describe your setup:
>   - How many pCPUs?
There are 6 pCPUs
>   - How many vCPUs did you give to dom0?
I gave it 5
>   - What was the number of the vCPUs given to the previous guest?

Nr 0.

Listing vcpus looks like this when the domain is running:

xl vcpu-list
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   r--     101.7 0 / all
Domain-0                             0     1    1   r--     101.0 1 / all
Domain-0                             0     2    2   r--     101.0 2 / all
Domain-0                             0     3    3   r--     100.9 3 / all
Domain-0                             0     4    4   r--     100.9 4 / all
mydomu                              1     0    5   r--      89.5 5 / all

vCPU nr 0 is also for dom0. Is that normal?

>
> One possibility is the NULL scheduler doesn't release the pCPUs until
> the domain is fully destroyed. So if there is no pCPU free, it
> wouldn't be able to schedule the new domain.
>
> However, I would have expected the NULL scheduler to refuse the domain
> to create if there is no pCPU available.
>
> @Dario, @Stefano, do you know when the NULL scheduler decides to
> allocate the pCPU?
>
> Cheers,
>
Re: Null scheduler and vwfi native problem
On Fri, 2021-01-22 at 18:44 +0100, Anders Törnqvist wrote:
> Listing vcpus looks like this when the domain is running:
>
> xl vcpu-list
> Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
> Domain-0                             0     0    0   r--     101.7 0 / all
> Domain-0                             0     1    1   r--     101.0 1 / all
> Domain-0                             0     2    2   r--     101.0 2 / all
> Domain-0                             0     3    3   r--     100.9 3 / all
> Domain-0                             0     4    4   r--     100.9 4 / all
> mydomu                              1     0    5   r--      89.5 5 / all
>
> vCPU nr 0 is also for dom0. Is that normal?
>
Yeah, that's the vCPU ID numbering. Each VM/guest (including dom0) has
its own vCPUs, and their IDs start from 0.

What counts here, to make sure that the NULL scheduler "configuration"
is correct, is that each vCPU is associated with one and only one pCPU.

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: Null scheduler and vwfi native problem
On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
> Hi Anders,
>
> On 22/01/2021 08:06, Anders Törnqvist wrote:
> > On 1/22/21 12:35 AM, Dario Faggioli wrote:
> > > On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
> > - booting with "sched=null vwfi=native" but not doing the IRQ
> > passthrough that you mentioned above
> > "xl destroy" gives
> > (XEN) End of domain_destroy function
> >
> > Then a "xl create" says nothing but the domain has not started
> > correct.
> > "xl list" look like this for the domain:
> > mydomu                                   2   512     1 ------      
> > 0.0
>
> This is odd. I would have expected ``xl create`` to fail if something
> went wrong with the domain creation.
>
So, Anders, would it be possible to issue a:

# xl debug-keys r
# xl dmesg

And send it to us ?

Ideally, you'd do it:
- with Julien's patch (the one he sent the other day, and that you 
have already given a try to) applied
- while you are in the state above, i.e., after having tried to 
destroy a domain and failing
- and maybe again after having tried to start a new domain

> One possibility is the NULL scheduler doesn't release the pCPUs until
> the domain is fully destroyed. So if there is no pCPU free, it
> wouldn't
> be able to schedule the new domain.
>
> However, I would have expected the NULL scheduler to refuse the
> domain
> to create if there is no pCPU available.
>
Yeah but, unfortunately, the scheduler cannot easily fail domain
creation at this stage (i.e., when we realize there are no available
pCPUs). That's the reason why the NULL scheduler has a waitqueue, where
vCPUs that cannot be put on any pCPU are parked.

Of course, this is a configuration error (or a bug, like maybe in this
case :-/), and we print warnings when it happens.

> @Dario, @Stefano, do you know when the NULL scheduler decides to
> allocate the pCPU?
>
On which pCPU to allocate a vCPU is decided in null_unit_insert(),
called from sched_alloc_unit() and sched_init_vcpu().

On the other hand, a vCPU is properly removed from its pCPU, hence
making the pCPU free for being assigned to some other vCPU, in
unit_deassign(), called from null_unit_remove(), which in turn is
called from sched_destroy_vcpu(), which is indeed called from
complete_domain_destroy().
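
Or, spelled out as call chains (from the description above):

  sched_init_vcpu() / sched_alloc_unit()
      -> null_unit_insert()                      [vCPU gets a pCPU]

  complete_domain_destroy()
      -> sched_destroy_vcpu()
          -> null_unit_remove()
              -> unit_deassign()                 [pCPU is freed again]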

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: Null scheduler and vwfi native problem
On 1/25/21 5:11 PM, Dario Faggioli wrote:
> On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
>> Hi Anders,
>>
>> On 22/01/2021 08:06, Anders Törnqvist wrote:
>>> On 1/22/21 12:35 AM, Dario Faggioli wrote:
>>>> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>>> - booting with "sched=null vwfi=native" but not doing the IRQ
>>> passthrough that you mentioned above
>>> "xl destroy" gives
>>> (XEN) End of domain_destroy function
>>>
>>> Then a "xl create" says nothing but the domain has not started
>>> correct.
>>> "xl list" look like this for the domain:
>>> mydomu                                   2   512     1 ------
>>> 0.0
>> This is odd. I would have expected ``xl create`` to fail if something
>> went wrong with the domain creation.
>>
> So, Anders, would it be possible to issue a:
>
> # xl debug-keys r
> # xl dmesg
>
> And send it to us ?
>
> Ideally, you'd do it:
> - with Julien's patch (the one he sent the other day, and that you
> have already given a try to) applied
> - while you are in the state above, i.e., after having tried to
> destroy a domain and failing
> - and maybe again after having tried to start a new domain
Here are some logs.

The system is booted as before with the patch and the domu config does
not have the IRQs.


# xl list
Name                                        ID   Mem VCPUs State    Time(s)
Domain-0                                     0  3000     5 r-----     820.1
mydomu                                       1   511     1 r-----     157.0

# xl debug-keys r
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=191793008000
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN) Waitqueue:
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0
(XEN)     run: [1.0] pcpu=5

# xl dmesg
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000080200000 - 00000000ffffffff
(XEN) RAM: 0000000880000000 - 00000008ffffffff
(XEN)
(XEN) MODULE[0]: 0000000080400000 - 000000008054d848 Xen
(XEN) MODULE[1]: 0000000083000000 - 0000000083018000 Device Tree
(XEN) MODULE[2]: 0000000088000000 - 0000000089701200 Kernel
(XEN)  RESVD[0]: 0000000088000000 - 0000000090000000
(XEN)  RESVD[1]: 0000000083000000 - 0000000083018000
(XEN)  RESVD[2]: 0000000084000000 - 0000000085ffffff
(XEN)  RESVD[3]: 0000000086000000 - 00000000863fffff
(XEN)  RESVD[4]: 0000000090000000 - 00000000903fffff
(XEN)  RESVD[5]: 0000000090400000 - 0000000091ffffff
(XEN)  RESVD[6]: 0000000092000000 - 00000000921fffff
(XEN)  RESVD[7]: 0000000092200000 - 00000000923fffff
(XEN)  RESVD[8]: 0000000092400000 - 00000000943fffff
(XEN)  RESVD[9]: 0000000094400000 - 0000000094bfffff
(XEN)
(XEN) CMDLINE[0000000088000000]:chosen console=hvc0 earlycon=xen
root=/dev/mmcblk0p3 mem=3000M hostname=myhost
video=HDMI-A-1:1920x1080@60 imxdrm.legacyfb_depth=32   quiet loglevel=3
logo.nologo vt.global_cursor_default=0
(XEN)
(XEN) Command line: console=dtuart dtuart=/serial@5a060000
dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin
sched=null vwfi=native
(XEN) Domain heap initialised
(XEN) Booting using Device Tree
(XEN) partition id 4
(XEN) Domain name mydomu
(XEN) *****Initialized MU
(XEN) Looking for dtuart at "/serial@5a060000", options ""
 Xen 4.13.1-pre
(XEN) Xen version 4.13.1-pre (anders@builder.local)
(aarch64-poky-linux-gcc (GCC) 8.3.0) debug=n  Fri Jan 22 17:32:33 UTC 2021
(XEN) Latest ChangeSet: Wed Feb 27 17:56:28 2019 +0800 git:b64b8df-dirty
(XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4
(XEN) 64-bit Execution:
(XEN)   Processor Features: 0000000001002222 0000000000000000
(XEN)     Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32
(XEN)     Extensions: FloatingPoint AdvancedSIMD GICv3-SysReg
(XEN)   Debug Features: 0000000010305106 0000000000000000
(XEN)   Auxiliary Features: 0000000000000000 0000000000000000
(XEN)   Memory Model Features: 0000000000001122 0000000000000000
(XEN)   ISA Features:  0000000000011120 0000000000000000
(XEN) 32-bit Execution:
(XEN)   Processor Features: 00000131:10011011
(XEN)     Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle
(XEN)     Extensions: GenericTimer Security
(XEN)   Debug Features: 03010066
(XEN)   Auxiliary Features: 00000000
(XEN)   Memory Model Features: 10201105 40000000 01260000 02102211
(XEN)  ISA Features: 02101110 13112111 21232042 01112131 00011142 00011121
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 8000 KHz
(XEN) GICv3 initialization:
(XEN)       gic_dist_addr=0x00000051a00000
(XEN)       gic_maintenance_irq=25
(XEN)       gic_rdist_stride=0
(XEN)       gic_rdist_regions=1
(XEN)       redistributor regions:
(XEN)         - region 0: 0x00000051b00000 - 0x00000051bc0000
(XEN) GICv3 compatible with GICv2 cbase 0x00000052000000 vbase
0x00000052020000
(XEN) GICv3: 544 lines, (IID 0001143b).
(XEN) GICv3: CPU0: Found redistributor in region 0 @000000004002d000
(XEN) XSM Framework v1.0.0 initialized
(XEN) Initialising XSM SILO mode
(XEN) Using scheduler: null Scheduler (null)
(XEN) Initializing null scheduler
(XEN) WARNING: This is experimental software in development.
(XEN) Use at your own risk.
(XEN) Allocated console ring of 16 KiB.
(XEN) Bringing up CPU1
(XEN) GICv3: CPU1: Found redistributor in region 0 @00000000400ad000
(XEN) Bringing up CPU2
(XEN) GICv3: CPU2: Found redistributor in region 0 @00000000400cd000
(XEN) Bringing up CPU3
(XEN) GICv3: CPU3: Found redistributor in region 0 @000000004004d000
(XEN) Bringing up CPU4
(XEN) GICv3: CPU4: Found redistributor in region 0 @000000004006d000
(XEN) Bringing up CPU5
(XEN) GICv3: CPU5: Found redistributor in region 0 @000000004008d000
(XEN) Brought up 6 CPUs
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) P2M: 40-bit IPA with 40-bit PA and 8-bit VMID
(XEN) P2M: 3 levels with order-1 root, VTCR 0x80023558
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading d0 kernel from boot module @ 0000000088000000
(XEN) Allocating 1:1 mappings totalling 3000MB for dom0:
(XEN) BANK[0] 0x00000098000000-0x00000100000000 (1664MB)
(XEN) BANK[1] 0x00000880000000-0x000008c0000000 (1024MB)
(XEN) BANK[2] 0x000008d0000000-0x000008e0000000 (256MB)
(XEN) BANK[3] 0x000008ec800000-0x000008f0000000 (56MB)
(XEN) Grant table range: 0x00000080400000-0x00000080440000
(XEN) HACK: skip /imx8_gpu_ss setup!
(XEN) Allocating PPI 16 for event channel interrupt
(XEN) Loading zImage from 0000000088000000 to
0000000098080000-0000000099781200
(XEN) Loading d0 DTB to 0x00000000a0000000-0x00000000a001772e
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Scrubbing Free RAM in background
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) ***************************************************
(XEN) WARNING: HMP COMPUTING HAS BEEN ENABLED.
(XEN) It has implications on the security and stability of the system,
(XEN) unless the cpu affinity of all domains is specified.
(XEN) ***************************************************
(XEN) 3... 2... 1...
(XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
(XEN) Freed 336kB init memory.
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER4
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER8
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER12
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER16
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER20
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER24
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER28
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER32
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER36
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER40
(XEN) Power on resource 215
(XEN) printk: 11 messages suppressed.
(XEN) d1v0: vGICR: SGI: unhandled word write 0x000000ffffffff to ICACTIVER0
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=191793008000
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN) Waitqueue:
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0
(XEN)     run: [1.0] pcpu=5


# xl destroy mydomu
(XEN) End of domain_destroy function

# xl list
Name                                        ID   Mem VCPUs State    Time(s)
Domain-0                                     0  3000     5 r-----    1057.9

# xl debug-keys r
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=223871439875
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN) Waitqueue:
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0


# xl create mydomu.cfg
Parsing config from mydomu.cfg
(XEN) Power on resource 215

# xl list
Name                                        ID   Mem VCPUs State    Time(s)
Domain-0                                     0  3000     5 r-----    1152.1
mydomu                                       2   512     1 ------       0.0

# xl debug-keys r
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=241210530250
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN)     Domain: 2
(XEN)       7: [2.0] pcpu=-1
(XEN) Waitqueue: d2v0
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0

# xl dmesg
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000080200000 - 00000000ffffffff
(XEN) RAM: 0000000880000000 - 00000008ffffffff
(XEN)
(XEN) MODULE[0]: 0000000080400000 - 000000008054d848 Xen
(XEN) MODULE[1]: 0000000083000000 - 0000000083018000 Device Tree
(XEN) MODULE[2]: 0000000088000000 - 0000000089701200 Kernel
(XEN)  RESVD[0]: 0000000088000000 - 0000000090000000
(XEN)  RESVD[1]: 0000000083000000 - 0000000083018000
(XEN)  RESVD[2]: 0000000084000000 - 0000000085ffffff
(XEN)  RESVD[3]: 0000000086000000 - 00000000863fffff
(XEN)  RESVD[4]: 0000000090000000 - 00000000903fffff
(XEN)  RESVD[5]: 0000000090400000 - 0000000091ffffff
(XEN)  RESVD[6]: 0000000092000000 - 00000000921fffff
(XEN)  RESVD[7]: 0000000092200000 - 00000000923fffff
(XEN)  RESVD[8]: 0000000092400000 - 00000000943fffff
(XEN)  RESVD[9]: 0000000094400000 - 0000000094bfffff
(XEN)
(XEN) CMDLINE[0000000088000000]:chosen console=hvc0 earlycon=xen
root=/dev/mmcblk0p3 mem=3000M hostname=myhost
video=HDMI-A-1:1920x1080@60 imxdrm.legacyfb_depth=32   quiet loglevel=3
logo.nologo vt.global_cursor_default=0
(XEN)
(XEN) Command line: console=dtuart dtuart=/serial@5a060000
dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin
sched=null vwfi=native
(XEN) Domain heap initialised
(XEN) Booting using Device Tree
(XEN) partition id 4
(XEN) Domain name mydomu
(XEN) *****Initialized MU
(XEN) Looking for dtuart at "/serial@5a060000", options ""
 Xen 4.13.1-pre
(XEN) Xen version 4.13.1-pre (anders@builder.local)
(aarch64-poky-linux-gcc (GCC) 8.3.0) debug=n  Fri Jan 22 17:32:33 UTC 2021
(XEN) Latest ChangeSet: Wed Feb 27 17:56:28 2019 +0800 git:b64b8df-dirty
(XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4
(XEN) 64-bit Execution:
(XEN)   Processor Features: 0000000001002222 0000000000000000
(XEN)     Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32
(XEN)     Extensions: FloatingPoint AdvancedSIMD GICv3-SysReg
(XEN)   Debug Features: 0000000010305106 0000000000000000
(XEN)   Auxiliary Features: 0000000000000000 0000000000000000
(XEN)   Memory Model Features: 0000000000001122 0000000000000000
(XEN)   ISA Features:  0000000000011120 0000000000000000
(XEN) 32-bit Execution:
(XEN)   Processor Features: 00000131:10011011
(XEN)     Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle
(XEN)     Extensions: GenericTimer Security
(XEN)   Debug Features: 03010066
(XEN)   Auxiliary Features: 00000000
(XEN)   Memory Model Features: 10201105 40000000 01260000 02102211
(XEN)  ISA Features: 02101110 13112111 21232042 01112131 00011142 00011121
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 8000 KHz
(XEN) GICv3 initialization:
(XEN)       gic_dist_addr=0x00000051a00000
(XEN)       gic_maintenance_irq=25
(XEN)       gic_rdist_stride=0
(XEN)       gic_rdist_regions=1
(XEN)       redistributor regions:
(XEN)         - region 0: 0x00000051b00000 - 0x00000051bc0000
(XEN) GICv3 compatible with GICv2 cbase 0x00000052000000 vbase
0x00000052020000
(XEN) GICv3: 544 lines, (IID 0001143b).
(XEN) GICv3: CPU0: Found redistributor in region 0 @000000004002d000
(XEN) XSM Framework v1.0.0 initialized
(XEN) Initialising XSM SILO mode
(XEN) Using scheduler: null Scheduler (null)
(XEN) Initializing null scheduler
(XEN) WARNING: This is experimental software in development.
(XEN) Use at your own risk.
(XEN) Allocated console ring of 16 KiB.
(XEN) Bringing up CPU1
(XEN) GICv3: CPU1: Found redistributor in region 0 @00000000400ad000
(XEN) Bringing up CPU2
(XEN) GICv3: CPU2: Found redistributor in region 0 @00000000400cd000
(XEN) Bringing up CPU3
(XEN) GICv3: CPU3: Found redistributor in region 0 @000000004004d000
(XEN) Bringing up CPU4
(XEN) GICv3: CPU4: Found redistributor in region 0 @000000004006d000
(XEN) Bringing up CPU5
(XEN) GICv3: CPU5: Found redistributor in region 0 @000000004008d000
(XEN) Brought up 6 CPUs
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) P2M: 40-bit IPA with 40-bit PA and 8-bit VMID
(XEN) P2M: 3 levels with order-1 root, VTCR 0x80023558
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading d0 kernel from boot module @ 0000000088000000
(XEN) Allocating 1:1 mappings totalling 3000MB for dom0:
(XEN) BANK[0] 0x00000098000000-0x00000100000000 (1664MB)
(XEN) BANK[1] 0x00000880000000-0x000008c0000000 (1024MB)
(XEN) BANK[2] 0x000008d0000000-0x000008e0000000 (256MB)
(XEN) BANK[3] 0x000008ec800000-0x000008f0000000 (56MB)
(XEN) Grant table range: 0x00000080400000-0x00000080440000
(XEN) HACK: skip /imx8_gpu_ss setup!
(XEN) Allocating PPI 16 for event channel interrupt
(XEN) Loading zImage from 0000000088000000 to
0000000098080000-0000000099781200
(XEN) Loading d0 DTB to 0x00000000a0000000-0x00000000a001772e
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Scrubbing Free RAM in background
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) ***************************************************
(XEN) WARNING: HMP COMPUTING HAS BEEN ENABLED.
(XEN) It has implications on the security and stability of the system,
(XEN) unless the cpu affinity of all domains is specified.
(XEN) ***************************************************
(XEN) 3... 2... 1...
(XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
(XEN) Freed 336kB init memory.
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER4
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER8
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER12
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER16
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER20
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER24
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER28
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER32
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER36
(XEN) d0v0: vGICD: unhandled word write 0x000000ffffffff to ICACTIVER40
(XEN) Power on resource 215
(XEN) printk: 11 messages suppressed.
(XEN) d1v0: vGICR: SGI: unhandled word write 0x000000ffffffff to ICACTIVER0
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=191793008000
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN) Waitqueue:
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0
(XEN)     run: [1.0] pcpu=5
(XEN) End of domain_destroy function
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=223871439875
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN) Waitqueue:
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0
(XEN) Power on resource 215
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=241210530250
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 1
(XEN)       6: [1.0] pcpu=5
(XEN)     Domain: 2
(XEN)       7: [2.0] pcpu=-1
(XEN) Waitqueue: d2v0
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d1v0


I then repeated the "xl create" a number of times until it caused the
complete_domain_destroy function to be called.
Then the information looked like this:

# xl debug-keys r
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=850134473000
(XEN) Online Cpus: 0-5
(XEN) Cpupool 0:
(XEN) Cpus: 0-5
(XEN) Scheduler: null Scheduler (null)
(XEN)     cpus_free =
(XEN) Domain info:
(XEN)     Domain: 0
(XEN)       1: [0.0] pcpu=0
(XEN)       2: [0.1] pcpu=1
(XEN)       3: [0.2] pcpu=2
(XEN)       4: [0.3] pcpu=3
(XEN)       5: [0.4] pcpu=4
(XEN)     Domain: 2
(XEN)       6: [2.0] pcpu=5
(XEN)     Domain: 3
(XEN)     Domain: 4
(XEN) Waitqueue:
(XEN) CPUs info:
(XEN) CPU[00] sibling={0}, core={0}, unit=d0v0
(XEN)     run: [0.0] pcpu=0
(XEN) CPU[01] sibling={1}, core={1}, unit=d0v1
(XEN)     run: [0.1] pcpu=1
(XEN) CPU[02] sibling={2}, core={2}, unit=d0v2
(XEN)     run: [0.2] pcpu=2
(XEN) CPU[03] sibling={3}, core={3}, unit=d0v3
(XEN)     run: [0.3] pcpu=3
(XEN) CPU[04] sibling={4}, core={4}, unit=d0v4
(XEN)     run: [0.4] pcpu=4
(XEN) CPU[05] sibling={5}, core={5}, unit=d2v0
(XEN)     run: [2.0] pcpu=5

# xl list
Name                                        ID   Mem VCPUs State    Time(s)
Domain-0                                     0  3000     5 r-----    4277.7
mydomu                                       2   511     1 r-----      15.6


>
>> One possibility is the NULL scheduler doesn't release the pCPUs until
>> the domain is fully destroyed. So if there is no pCPU free, it
>> wouldn't
>> be able to schedule the new domain.
>>
>> However, I would have expected the NULL scheduler to refuse the
>> domain
>> to create if there is no pCPU available.
>>
> Yeah but, unfortunately, the scheduler does not have it easy to fail
> domain creation at this stage (i.e., when we realize there are no
> available pCPUs). That's the reason why the NULL scheduler has a
> waitqueue, where vCPUs that cannot be put on any pCPU are put.
>
> Of course, this is a configuration error (or a bug, like maybe in this
> case :-/), and we print warnings when it happens.
>
>> @Dario, @Stefano, do you know when the NULL scheduler decides to
>> allocate the pCPU?
>>
> On which pCPU to allocate a vCPU is decided in null_unit_insert(),
> called from sched_alloc_unit() and sched_init_vcpu().
>
> On the other hand, a vCPU is properly removed from its pCPU, hence
> making the pCPU free for being assigned to some other vCPU, in
> unit_deassign(), called from null_unit_remove(), which in turn is
> called from sched_destroy_vcpu() Which is indeed called from
> complete_domain_destroy().
>
> Regards
Re: Null scheduler and vwfi native problem
On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote:
> On 1/25/21 5:11 PM, Dario Faggioli wrote:
> > On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
> > > Hi Anders,
> > >
> > > On 22/01/2021 08:06, Anders Törnqvist wrote:
> > > > On 1/22/21 12:35 AM, Dario Faggioli wrote:
> > > > > On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
> > > > - booting with "sched=null vwfi=native" but not doing the IRQ
> > > > passthrough that you mentioned above
> > > > "xl destroy" gives
> > > > (XEN) End of domain_destroy function
> > > >
> > > > Then a "xl create" says nothing but the domain has not started
> > > > correct.
> > > > "xl list" look like this for the domain:
> > > > mydomu                                   2   512     1 ------
> > > > 0.0
> > > This is odd. I would have expected ``xl create`` to fail if
> > > something
> > > went wrong with the domain creation.
> > >
> > So, Anders, would it be possible to issue a:
> >
> > # xl debug-keys r
> > # xl dmesg
> >
> > And send it to us ?
> >
> > Ideally, you'd do it:
> >   - with Julien's patch (the one he sent the other day, and that
> > you
> >     have already given a try to) applied
> >   - while you are in the state above, i.e., after having tried to
> >     destroy a domain and failing
> >   - and maybe again after having tried to start a new domain
> Here are some logs.
>
Great, thanks a lot!

> The system is booted as before with the patch and the domu config
> does
> not have the IRQs.
>
Ok.

> # xl list
> Name                                        ID   Mem VCPUs State   Time(s)
> Domain-0                                     0  3000     5 r-----     820.1
> mydomu                                       1   511     1 r-----     157.0
>
> # xl debug-keys r
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=191793008000
> (XEN) Online Cpus: 0-5
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-5
> (XEN) Scheduler: null Scheduler (null)
> (XEN)     cpus_free =
> (XEN) Domain info:
> (XEN)     Domain: 0
> (XEN)       1: [0.0] pcpu=0
> (XEN)       2: [0.1] pcpu=1
> (XEN)       3: [0.2] pcpu=2
> (XEN)       4: [0.3] pcpu=3
> (XEN)       5: [0.4] pcpu=4
> (XEN)     Domain: 1
> (XEN)       6: [1.0] pcpu=5
> (XEN) Waitqueue:
>
So far, so good. All vCPUs are running on their assigned pCPU, and
there is no vCPU that wants to run but has no pCPU on which to do so.

> (XEN) Command line: console=dtuart dtuart=/serial@5a060000
> dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin
> sched=null vwfi=native
>
Oh, just as a side note (and most likely unrelated to the problem we're
discussing), you should be able to get rid of dom0_vcpus_pin.

The NULL scheduler will do something similar to what that option itself
does anyway. And with the benefit that, if you want, you can actually
change the pCPUs to which dom0's vCPUs are pinned. While, if you use
dom0_vcpus_pin, you can't.

So using it has only downsides (and that's true in general, if you
ask me, but particularly so when using NULL).

> # xl destroy mydomu
> (XEN) End of domain_destroy function
>
> # xl list
> Name                                        ID   Mem VCPUs State   Time(s)
> Domain-0                                     0  3000     5 r-----    1057.9
>
> # xl debug-keys r
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=223871439875
> (XEN) Online Cpus: 0-5
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-5
> (XEN) Scheduler: null Scheduler (null)
> (XEN)     cpus_free =
> (XEN) Domain info:
> (XEN)     Domain: 0
> (XEN)       1: [0.0] pcpu=0
> (XEN)       2: [0.1] pcpu=1
> (XEN)       3: [0.2] pcpu=2
> (XEN)       4: [0.3] pcpu=3
> (XEN)       5: [0.4] pcpu=4
> (XEN)     Domain: 1
> (XEN)       6: [1.0] pcpu=5
>
Right. And from the fact that: 1) we only see the "End of
domain_destroy function" line in the logs, and 2) we see that the vCPU
is still listed here, we have our confirmation (as if there was any
need for it :-/) that domain destruction is done only partially.

> # xl create mydomu.cfg
> Parsing config from mydomu.cfg
> (XEN) Power on resource 215
>
> # xl list
> Name                                        ID   Mem VCPUs State   Time(s)
> Domain-0                                     0  3000     5 r-----    1152.1
> mydomu                                       2   512     1 ------        0.0
>
> # xl debug-keys r
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=241210530250
> (XEN) Online Cpus: 0-5
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-5
> (XEN) Scheduler: null Scheduler (null)
> (XEN)     cpus_free =
> (XEN) Domain info:
> (XEN)     Domain: 0
> (XEN)       1: [0.0] pcpu=0
> (XEN)       2: [0.1] pcpu=1
> (XEN)       3: [0.2] pcpu=2
> (XEN)       4: [0.3] pcpu=3
> (XEN)       5: [0.4] pcpu=4
> (XEN)     Domain: 1
> (XEN)       6: [1.0] pcpu=5
> (XEN)     Domain: 2
> (XEN)       7: [2.0] pcpu=-1
> (XEN) Waitqueue: d2v0
>
Yep, so, as we were suspecting, domain 1 was not destroyed properly.
Specifically, we did not get to the point where the vCPU is deallocated
and the pCPU to which that vCPU had been assigned by the NULL
scheduler is released.

This means that the new vCPU (i.e., d2v0) has, from the point of view
of the NULL scheduler, no pCPU to run on. And it's therefore parked
in the waitqueue.

There should be a warning about that, which I don't see... but perhaps
I'm just misremembering.

Anyway, cool, this makes things even more clear.

Thanks again for letting us see these logs.
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: Null scheduler and vwfi native problem [ In reply to ]
On 1/26/21 11:31 PM, Dario Faggioli wrote:
> On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote:
>> On 1/25/21 5:11 PM, Dario Faggioli wrote:
>>> On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
>>>> Hi Anders,
>>>>
>>>> On 22/01/2021 08:06, Anders Törnqvist wrote:
>>>>> On 1/22/21 12:35 AM, Dario Faggioli wrote:
>>>>>> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>>>>> - booting with "sched=null vwfi=native" but not doing the IRQ
>>>>> passthrough that you mentioned above
>>>>> "xl destroy" gives
>>>>> (XEN) End of domain_destroy function
>>>>>
>>>>> Then a "xl create" says nothing but the domain has not started
>>>>> correct.
>>>>> "xl list" look like this for the domain:
>>>>> mydomu                                   2   512     1 ------
>>>>> 0.0
>>>> This is odd. I would have expected ``xl create`` to fail if
>>>> something
>>>> went wrong with the domain creation.
>>>>
>>> So, Anders, would it be possible to issue a:
>>>
>>> # xl debug-keys r
>>> # xl dmesg
>>>
>>> And send it to us ?
>>>
>>> Ideally, you'd do it:
>>>   - with Julien's patch (the one he sent the other day, and that
>>> you
>>>     have already given a try to) applied
>>>   - while you are in the state above, i.e., after having tried to
>>>     destroy a domain and failing
>>>   - and maybe again after having tried to start a new domain
>> Here are some logs.
>>
> Great, thanks a lot!
>
>> The system is booted as before with the patch and the domu config
>> does
>> not have the IRQs.
>>
> Ok.
>
>> # xl list
>> Name                                        ID   Mem VCPUs State
>> Time(s)
>> Domain-0                                     0  3000     5 r-----
>> 820.1
>> mydomu                                       1   511     1 r-----
>> 157.0
>>
>> # xl debug-keys r
>> (XEN) sched_smt_power_savings: disabled
>> (XEN) NOW=191793008000
>> (XEN) Online Cpus: 0-5
>> (XEN) Cpupool 0:
>> (XEN) Cpus: 0-5
>> (XEN) Scheduler: null Scheduler (null)
>> (XEN)     cpus_free =
>> (XEN) Domain info:
>> (XEN)     Domain: 0
>> (XEN)       1: [0.0] pcpu=0
>> (XEN)       2: [0.1] pcpu=1
>> (XEN)       3: [0.2] pcpu=2
>> (XEN)       4: [0.3] pcpu=3
>> (XEN)       5: [0.4] pcpu=4
>> (XEN)     Domain: 1
>> (XEN)       6: [1.0] pcpu=5
>> (XEN) Waitqueue:
>>
> So far, so good. All vCPUs are running on their assigned pCPU, and
> there is no vCPU wanting to run but not having a vCPU where to do so.
>
>> (XEN) Command line: console=dtuart dtuart=/serial@5a060000
>> dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin
>> sched=null vwfi=native
>>
> Oh, just as a side note (and most likely unrelated to the problem we're
> discussing), you should be able to get rid of dom0_vcpus_pin.
>
> The NULL scheduler will do something similar to what that option itself
> does anyway. And with the benefit that, if you want, you can actually
> change to what pCPUs the dom0's vCPU are pinned. While, if you use
> dom0_vcpus_pin, you can't.
>
> So it using it has only downsides (and that's true in general, if you
> ask me, but particularly so if using NULL).
Thanks for the feedback.
I removed dom0_vcpus_pin. And, as you said, it seems to be unrelated to
the problem we're discussing. The system still behaves the same.

When dom0_vcpus_pin is removed, xl vcpu-list looks like this:

Name                                ID  VCPU   CPU State Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   r--      29.4 all / all
Domain-0                             0     1    1   r--      28.7 all / all
Domain-0                             0     2    2   r--      28.7 all / all
Domain-0                             0     3    3   r--      28.6 all / all
Domain-0                             0     4    4   r--      28.6 all / all
mydomu                              1     0    5   r--      21.6 5 / all

From this listing (with "all" as hard affinity for dom0) one might read
it as if dom0 is not pinned with hard affinity to any specific pCPUs at
all, while mydomu is pinned to pCPU 5.
Will dom0_max_vcpus=5 in this case guarantee that dom0 will only run
on pCPUs 0-4, so that mydomu always has pCPU 5 to itself?

What if I would like mydomu to be the only domain that uses pCPU 2?

>
>> # xl destroy mydomu
>> (XEN) End of domain_destroy function
>>
>> # xl list
>> Name                                        ID   Mem VCPUs State
>> Time(s)
>> Domain-0                                     0  3000     5 r-----
>> 1057.9
>>
>> # xl debug-keys r
>> (XEN) sched_smt_power_savings: disabled
>> (XEN) NOW=223871439875
>> (XEN) Online Cpus: 0-5
>> (XEN) Cpupool 0:
>> (XEN) Cpus: 0-5
>> (XEN) Scheduler: null Scheduler (null)
>> (XEN)     cpus_free =
>> (XEN) Domain info:
>> (XEN)     Domain: 0
>> (XEN)       1: [0.0] pcpu=0
>> (XEN)       2: [0.1] pcpu=1
>> (XEN)       3: [0.2] pcpu=2
>> (XEN)       4: [0.3] pcpu=3
>> (XEN)       5: [0.4] pcpu=4
>> (XEN)     Domain: 1
>> (XEN)       6: [1.0] pcpu=5
>>
> Right. And from the fact that: 1) we only see the "End of
> domain_destroy function" line in the logs, and 2) we see that the vCPU
> is still listed here, we have our confirmation (like there wase the
> need for it :-/) that domain destruction is done only partially.
Yes it looks like that.
>
>> # xl create mydomu.cfg
>> Parsing config from mydomu.cfg
>> (XEN) Power on resource 215
>>
>> # xl list
>> Name                                        ID   Mem VCPUs State
>> Time(s)
>> Domain-0                                     0  3000     5 r-----
>> 1152.1
>> mydomu                                       2   512     1 ------
>>        0.0
>>
>> # xl debug-keys r
>> (XEN) sched_smt_power_savings: disabled
>> (XEN) NOW=241210530250
>> (XEN) Online Cpus: 0-5
>> (XEN) Cpupool 0:
>> (XEN) Cpus: 0-5
>> (XEN) Scheduler: null Scheduler (null)
>> (XEN)     cpus_free =
>> (XEN) Domain info:
>> (XEN)     Domain: 0
>> (XEN)       1: [0.0] pcpu=0
>> (XEN)       2: [0.1] pcpu=1
>> (XEN)       3: [0.2] pcpu=2
>> (XEN)       4: [0.3] pcpu=3
>> (XEN)       5: [0.4] pcpu=4
>> (XEN)     Domain: 1
>> (XEN)       6: [1.0] pcpu=5
>> (XEN)     Domain: 2
>> (XEN)       7: [2.0] pcpu=-1
>> (XEN) Waitqueue: d2v0
>>
> Yep, so, as we were suspecting, domain 1 was not destroyed properly.
> Specifically, we did not get to the point where the vCPU is deallocated
> and the pCPU to which such vCPU has been assigned to by the NULL
> scheduler is released.
>
> This means that the new vCPU (i.e., d2v0) has, from the point of view
> of the NULL scheduler, no pCPU where to run. And it's therefore parked
> in the waitqueue.
>
> There should be a warning about that, which I don't see... but perhaps
> I'm just misremembering.
>
> Anyway, cool, this makes things even more clear.
>
> Thanks again for letting us see these logs.

Thanks for the attention to this :-)

Any ideas for how to solve it?
Re: Null scheduler and vwfi native problem [ In reply to ]
On 29.01.21 09:08, Anders Törnqvist wrote:
> On 1/26/21 11:31 PM, Dario Faggioli wrote:
>> On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote:
>>> On 1/25/21 5:11 PM, Dario Faggioli wrote:
>>>> On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
>>>>> Hi Anders,
>>>>>
>>>>> On 22/01/2021 08:06, Anders Törnqvist wrote:
>>>>>> On 1/22/21 12:35 AM, Dario Faggioli wrote:
>>>>>>> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>>>>>> - booting with "sched=null vwfi=native" but not doing the IRQ
>>>>>> passthrough that you mentioned above
>>>>>> "xl destroy" gives
>>>>>> (XEN) End of domain_destroy function
>>>>>>
>>>>>> Then a "xl create" says nothing but the domain has not started
>>>>>> correct.
>>>>>> "xl list" look like this for the domain:
>>>>>> mydomu                                   2   512     1 ------
>>>>>> 0.0
>>>>> This is odd. I would have expected ``xl create`` to fail if
>>>>> something
>>>>> went wrong with the domain creation.
>>>>>
>>>> So, Anders, would it be possible to issue a:
>>>>
>>>> # xl debug-keys r
>>>> # xl dmesg
>>>>
>>>> And send it to us ?
>>>>
>>>> Ideally, you'd do it:
>>>>    - with Julien's patch (the one he sent the other day, and that
>>>> you
>>>>      have already given a try to) applied
>>>>    - while you are in the state above, i.e., after having tried to
>>>>      destroy a domain and failing
>>>>    - and maybe again after having tried to start a new domain
>>> Here are some logs.
>>>
>> Great, thanks a lot!
>>
>>> The system is booted as before with the patch and the domu config
>>> does
>>> not have the IRQs.
>>>
>> Ok.
>>
>>> # xl list
>>> Name                                        ID   Mem VCPUs State
>>> Time(s)
>>> Domain-0                                     0  3000     5 r-----
>>> 820.1
>>> mydomu                                       1   511     1 r-----
>>> 157.0
>>>
>>> # xl debug-keys r
>>> (XEN) sched_smt_power_savings: disabled
>>> (XEN) NOW=191793008000
>>> (XEN) Online Cpus: 0-5
>>> (XEN) Cpupool 0:
>>> (XEN) Cpus: 0-5
>>> (XEN) Scheduler: null Scheduler (null)
>>> (XEN)     cpus_free =
>>> (XEN) Domain info:
>>> (XEN)     Domain: 0
>>> (XEN)       1: [0.0] pcpu=0
>>> (XEN)       2: [0.1] pcpu=1
>>> (XEN)       3: [0.2] pcpu=2
>>> (XEN)       4: [0.3] pcpu=3
>>> (XEN)       5: [0.4] pcpu=4
>>> (XEN)     Domain: 1
>>> (XEN)       6: [1.0] pcpu=5
>>> (XEN) Waitqueue:
>>>
>> So far, so good. All vCPUs are running on their assigned pCPU, and
>> there is no vCPU wanting to run but not having a vCPU where to do so.
>>
>>> (XEN) Command line: console=dtuart dtuart=/serial@5a060000
>>> dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin
>>> sched=null vwfi=native
>>>
>> Oh, just as a side note (and most likely unrelated to the problem we're
>> discussing), you should be able to get rid of dom0_vcpus_pin.
>>
>> The NULL scheduler will do something similar to what that option itself
>> does anyway. And with the benefit that, if you want, you can actually
>> change to what pCPUs the dom0's vCPU are pinned. While, if you use
>> dom0_vcpus_pin, you can't.
>>
>> So it using it has only downsides (and that's true in general, if you
>> ask me, but particularly so if using NULL).
> Thanks for the feedback.
> I removed dom0_vcpus_pin. And, as you said, it seems to be unrelated to
> the problem we're discussing. The system still behaves the same.
>
> When the dom0_vcpus_pin is removed. xl vcpu-list looks like this:
>
> Name                                ID  VCPU   CPU State Time(s)
> Affinity (Hard / Soft)
> Domain-0                             0     0    0   r--      29.4 all / all
> Domain-0                             0     1    1   r--      28.7 all / all
> Domain-0                             0     2    2   r--      28.7 all / all
> Domain-0                             0     3    3   r--      28.6 all / all
> Domain-0                             0     4    4   r--      28.6 all / all
> mydomu                              1     0    5   r--      21.6 5 / all
>
> From this listing (with "all" as hard affinity for dom0) one might read
> it like dom0 is not pinned with hard affinity to any specific pCPUs at
> all but mudomu is pinned to pCPU 5.
> Will the dom0_max_vcpus=5 in this case guarantee that dom0 only will run
> on pCPU 0-4 so that mydomu always will have pCPU 5 for itself only?

No.

>
> What if I would like mydomu to be th only domain that uses pCPU 2?

Set up a cpupool with that pCPU assigned to it and put your domain into
that cpupool.
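
A minimal sketch of how that could look (the pool name, the scheduler
and the file name are only examples; see xl(1) and xlcpupool.cfg(5) for
the exact syntax). With a config file mypool.cfg containing:

name  = "mypool"
sched = "null"
cpus  = ["2"]

the pool can be set up and the domain moved into it with:

# xl cpupool-cpu-remove Pool-0 2
# xl cpupool-create mypool.cfg
# xl cpupool-migrate mydomu mypool

Putting pool="mypool" in the domain's config file would instead create
the domain directly in that cpupool.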


Juergen
Re: Null scheduler and vwfi native problem [ In reply to ]
On Fri, 2021-01-29 at 09:18 +0100, Jürgen Groß wrote:
> On 29.01.21 09:08, Anders Törnqvist wrote:
> > >
> > > So it using it has only downsides (and that's true in general, if
> > > you
> > > ask me, but particularly so if using NULL).
> > Thanks for the feedback.
> > I removed dom0_vcpus_pin. And, as you said, it seems to be
> > unrelated to
> > the problem we're discussing. 
>
Right. Don't put it back, and stay away from it, if you'll accept some
advice. :-)

> > The system still behaves the same.
> >
Yeah, that was expected.

> > When the dom0_vcpus_pin is removed. xl vcpu-list looks like this:
> >
> > Name                                ID  VCPU   CPU State Time(s)
> > Affinity (Hard / Soft)
> > Domain-0                             0     0    0   r--      29.4
> > all / all
> > Domain-0                             0     1    1   r--      28.7
> > all / all
> > Domain-0                             0     2    2   r--      28.7
> > all / all
> > Domain-0                             0     3    3   r--      28.6
> > all / all
> > Domain-0                             0     4    4   r--      28.6
> > all / all
> > mydomu                              1     0    5   r--      21.6 5
> > / all
> >
Right, and it makes sense for it to look like this.

> >  From this listing (with "all" as hard affinity for dom0) one might
> > read
> > it like dom0 is not pinned with hard affinity to any specific pCPUs
> > at
> > all but mudomu is pinned to pCPU 5.
> > Will the dom0_max_vcpus=5 in this case guarantee that dom0 only
> > will run
> > on pCPU 0-4 so that mydomu always will have pCPU 5 for itself only?
>
> No.
>
Well, yes... if you use the NULL scheduler. Which is in use here. :-)

Basically, the NULL scheduler _always_ assigns one and only one vCPU to
each pCPU. This happens at domain (well, at the vCPU) creation time.
And it _never_ moves a vCPU away from the pCPU to which it has assigned
it.

And it also _never_ changes this vCPU-->pCPU assignment/relationship,
unless some special event happens (such as: the vCPU and/or the pCPU
goes offline, is removed from the cpupool, you change the affinity
[as I'll explain below], etc.).

This is the NULL scheduler's mission and only job, so it does that by
default, _without_ any need for an affinity to be specified.

So, how can affinity be useful in the NULL scheduler? Well, it's useful
if you want to control and decide which pCPU a certain vCPU should go
to.

So, let's look at an example. Let's say you are in this situation:

Name                                ID  VCPU   CPU State Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   r--      29.4 all / all
Domain-0                             0     1    1   r--      28.7 all / all
Domain-0                             0     2    2   r--      28.7 all / all
Domain-0                             0     3    3   r--      28.6 all / all
Domain-0                             0     4    4   r--      28.6 all / all

I.e., you have 6 CPUs, you have only dom0, dom0 has 5 vCPUs and you are
not using dom0_vcpus_pin.

The NULL scheduler has put d0v0 on pCPU 0. And d0v0 is the only vCPU
that can run on pCPU 0, despite its affinities being "all"... because
that's what the NULL scheduler does for you, and it's the reason why one
uses it! :-)

Similarly, it has put d0v1 on pCPU 1, d0v2 on pCPU 2, d0v3 on pCPU 3
and d0v4 on pCPU 4. And the "exclusivity guarantee" explained above
for d0v0 and pCPU 0 applies to all these other vCPUs and pCPUs as
well.

With no affinity being specified, which vCPU is assigned to which pCPU
is entirely under the NULL scheduler's control. It has heuristics
inside to try to do that in a smart way, but that's an
internal/implementation detail and is not relevant here.

If you now create a domU with 1 vCPU, that vCPU will be assigned to
pCPU 5.

Now, let's say that, for whatever reason, you absolutely want d0v2
to run on pCPU 5, instead of being assigned to and run on pCPU 2 (which
is what the NULL scheduler decided to pick for it). Well, what you do is
use xl to set the affinity of d0v2 to pCPU 5, and you will get something
like this as a result:

Name                                ID  VCPU   CPU State Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   r--      29.4 all / all
Domain-0                             0     1    1   r--      28.7 all / all
Domain-0                             0     2    5   r--      28.7 5 / all
Domain-0                             0     3    3   r--      28.6 all / all
Domain-0                             0     4    4   r--      28.6 all / all
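
(For reference, the pinning above can be done at runtime with something
along the lines of:

# xl vcpu-pin Domain-0 2 5

i.e., domain, vCPU number and hard affinity; an optional fourth argument
sets the soft affinity. See xl(1) for the exact syntax.)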

So, affinity is indeed useful, even when using NULL, if you want to
diverge from the default behavior and enact a certain policy, maybe due
to the nature of your workload, the characteristics of your hardware,
or whatever.

It is not, however, necessary to set the affinity in order to:
- have a vCPU always stay on one --and always the same one too-- pCPU;
- prevent any other vCPU from ever running on that pCPU.

That is guaranteed by the NULL scheduler itself. It just can't happen
that it behaves otherwise, because the whole point of writing it was to
make it simple (and fast :-)) *exactly* by not teaching it how to do
such things. It can't do it, because the code for doing it is not
there... by design! :-D

And, BTW, if you now create a domU with 1 vCPU, that vCPU will be
assigned to pCPU 2.

> >
> > What if I would like mydomu to be the only domain that uses pCPU 2?
>
> Set up a cpupool with that pCPU assigned to it and put your domain into
> that cpupool.

Yes, with any other scheduler that is not NULL, that's the proper way
of doing it.

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: Null scheduler and vwfi native problem [ In reply to ]
On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
> On 1/26/21 11:31 PM, Dario Faggioli wrote:
> > Thanks again for letting us see these logs.
>
> Thanks for the attention to this :-)
>
> Any ideas for how to solve it?
>
So, you're up for testing patches, right?

How about applying these two, and letting me know what happens? :-D

They are on top of current staging. I can try to rebase on something
else, if it's easier for you to test.

Besides being attached, they're also available here:

https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix

I could not test them properly on ARM, as I don't have an ARM system
handy, so everything is possible really... just let me know.

It should at least build fine, AFAICT from here:

https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213

Julien, back in:

https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36445@arm.com/


you said I should hook in enter_hypervisor_head(),
leave_hypervisor_tail(). Those functions are gone now and looking at
how the code changed, this is where I figured I should put the calls
(see the second patch). But feel free to educate me otherwise.

For x86 people that are listening... Do we have, in our beloved arch,
equally handy places (i.e., right before leaving Xen for a guest and
right after entering Xen from one), preferably in a C file, and for
all guests... like it seems to be the case on ARM?

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: Null scheduler and vwfi native problem [ In reply to ]
On 1/29/21 11:16 AM, Dario Faggioli wrote:
> On Fri, 2021-01-29 at 09:18 +0100, Jürgen Groß wrote:
>> On 29.01.21 09:08, Anders Törnqvist wrote:
>>>> So it using it has only downsides (and that's true in general, if
>>>> you
>>>> ask me, but particularly so if using NULL).
>>> Thanks for the feedback.
>>> I removed dom0_vcpus_pin. And, as you said, it seems to be
>>> unrelated to
>>> the problem we're discussing.
> Right. Don't put it back, and stay away from it, if you accept an
> advice. :-)
>
>>> The system still behaves the same.
>>>
> Yeah, that was expected.
>
>>> When the dom0_vcpus_pin is removed. xl vcpu-list looks like this:
>>>
>>> Name                                ID  VCPU   CPU State Time(s)
>>> Affinity (Hard / Soft)
>>> Domain-0                             0     0    0   r--      29.4
>>> all / all
>>> Domain-0                             0     1    1   r--      28.7
>>> all / all
>>> Domain-0                             0     2    2   r--      28.7
>>> all / all
>>> Domain-0                             0     3    3   r--      28.6
>>> all / all
>>> Domain-0                             0     4    4   r--      28.6
>>> all / all
>>> mydomu                              1     0    5   r--      21.6 5
>>> / all
>>>
> Right, and it makes sense for it to look like this.
>
>>>  From this listing (with "all" as hard affinity for dom0) one might
>>> read
>>> it like dom0 is not pinned with hard affinity to any specific pCPUs
>>> at
>>> all but mudomu is pinned to pCPU 5.
>>> Will the dom0_max_vcpus=5 in this case guarantee that dom0 only
>>> will run
>>> on pCPU 0-4 so that mydomu always will have pCPU 5 for itself only?
>> No.
>>
> Well, yes... if you use the NULL scheduler. Which is in use here. :-)
>
> Basically, the NULL scheduler _always_ assign one and only one vCPU to
> each pCPU. This happens at domain (well, at the vCPU) creation time.
> And it _never_ move a vCPU away from the pCPU to which it has assigned
> it.
>
> And it also _never_ change this vCPU-->pCPU assignment/relationship,
> unless some special event happens (such as, either the vCPU and/or the
> pCPU goes offline, is removed from the cpupool, you change the affinity
> [as I'll explain below], etc).
>
> This is the NULL scheduler's mission and only job, so it does that by
> default, _without_ any need for an affinity to be specified.
>
> So, how can affinity be useful in the NULL scheduler? Well, it's useful
> if you want to control and decide to what pCPU a certain vCPU should
> go.
>
> So, let's make an example. Let's say you are in this situation:
>
> Name ID VCPU CPU State Time(s) Affinity (Hard / Soft)
> Domain-0 0 0 0 r-- 29.4 all / all
> Domain-0 0 1 1 r-- 28.7 all / all
> Domain-0 0 2 2 r-- 28.7 all / all
> Domain-0 0 3 3 r-- 28.6 all / all
> Domain-0 0 4 4 r-- 28.6 all / all
>
> I.e., you have 6 CPUs, you have only dom0, dom0 has 5 vCPUs and you are
> not using dom0_vcpus_pin.
>
> The NULL scheduler has put d0v0 on pCPU 0. And d0v0 is the only vCPU
> that can run on pCPU 0, despite its affinities being "all"... because
> it's what the NULL scheduler does for you and it's the reason why one
> uses it! :-)
>
> Similarly, it has put d0v1 on pCPU 1, d0v2 on pCPU 2, d0v3 on pCPU 3
> and d0v4 on pCPU 4. And the "exclusivity guarantee" exaplained above
> for d0v0 and pCPU 0, applies to all these other vCPUs and pCPUs as
> well.
>
> With no affinity being specified, which vCPU is assigned to which pCPU
> is entirely under the NULL scheduler control. It has its heuristics
> inside, to try to do that in a smart way, but that's an
> internal/implementation detail and is not relevant here.
>
> If you now create a domU with 1 vCPU, that vCPU will be assigned to
> pCPU 5.
>
> Now, let's say that, for whatever reason, you absolutely want that d0v2
> to run on pCPU 5, instead of being assigned and run on pCPU 2 (which is
> what the NULL scheduler decided to pick for it). Well, what you do is
> use xl, set the affinity of d0v2 to pCPU 5, and you will get something
> like this as a result:
>
> Name ID VCPU CPU State Time(s) Affinity (Hard / Soft)
> Domain-0 0 0 0 r-- 29.4 all / all
> Domain-0 0 1 1 r-- 28.7 all / all
> Domain-0 0 2 5 r-- 28.7 5 / all
> Domain-0 0 3 3 r-- 28.6 all / all
> Domain-0 0 4 4 r-- 28.6 all / all
>
> So, affinity is indeed useful, even when using NULL, if you want to
> diverge from the default behavior and enact a certain policy, maybe due
> to the nature of your workload, the characteristics of your hardware,
> or whatever.
>
> It is not, however, necessary to set the affinity to:
> - have a vCPU to always stay on one --and always the same one too--
> pCPU;
> - avoid that any other vCPU would ever run on that pCPU.
>
> That is guaranteed by the NULL scheduler itself. It just can't happen
> that it behaves otherwise, because the whole point of doing it was to
> make it simple (and fast :-)) *exactly* by avoiding to teach it how to
> do such things. It can't do it, because the code for doing it is not
> there... by design! :-D
>
> And, BTW, if you now create a domU with 1 vCPU, that vCPU will be
> assigned to pCPU 2.
Wow, what a great explanation. Thank you very much!
>> What if I would like mydomu to be th only domain that uses pCPU 2?
> Setup a cpupool with that pcpu assigned to it and put your domain into
> that cpupool.
>
> Yes, with any other scheduler that is not NULL, that's the proper way
> of doing it.
>
> Regards
Re: Null scheduler and vwfi native problem [ In reply to ]
On 1/30/21 6:59 PM, Dario Faggioli wrote:
> On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
>> On 1/26/21 11:31 PM, Dario Faggioli wrote:
>>> Thanks again for letting us see these logs.
>> Thanks for the attention to this :-)
>>
>> Any ideas for how to solve it?
>>
> So, you're up for testing patches, right?
Absolutely. I will apply them and be back with the results. :-)

>
> How about applying these two, and letting me know what happens? :-D
>
> They are on top of current staging. I can try to rebase on something
> else, if it's easier for you to test.
>
> Besides being attached, they're also available here:
>
> https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix
>
> I could not test them properly on ARM, as I don't have an ARM system
> handy, so everything is possible really... just let me know.
>
> It should at least build fine, AFAICT from here:
>
> https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213
>
> Julien, back in:
>
> https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36445@arm.com/
>
>
> you said I should hook in enter_hypervisor_head(),
> leave_hypervisor_tail(). Those functions are gone now and looking at
> how the code changed, this is where I figured I should put the calls
> (see the second patch). But feel free to educate me otherwise.
>
> For x86 people that are listening... Do we have, in our beloved arch,
> equally handy places (i.e., right before leaving Xen for a guest and
> right after entering Xen from one), preferrably in a C file, and for
> all guests... like it seems to be the case on ARM?
>
> Regards
Re: Null scheduler and vwfi native problem [ In reply to ]
Hi Dario,

On 30/01/2021 17:59, Dario Faggioli wrote:
> On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
>> On 1/26/21 11:31 PM, Dario Faggioli wrote:
>>> Thanks again for letting us see these logs.
>>
>> Thanks for the attention to this :-)
>>
>> Any ideas for how to solve it?
>>
> So, you're up for testing patches, right?
>
> How about applying these two, and letting me know what happens? :-D
>
> They are on top of current staging. I can try to rebase on something
> else, if it's easier for you to test.
>
> Besides being attached, they're also available here:
>
> https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix
>
> I could not test them properly on ARM, as I don't have an ARM system
> handy, so everything is possible really... just let me know.
>
> It should at least build fine, AFAICT from here:
>
> https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213
>
> Julien, back in:
>
> https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36445@arm.com/
>
>
> you said I should hook in enter_hypervisor_head(),
> leave_hypervisor_tail(). Those functions are gone now and looking at
> how the code changed, this is where I figured I should put the calls
> (see the second patch). But feel free to educate me otherwise.

enter_hypervisor_from_guest() and leave_hypervisor_to_guest() are the
new functions.

I have had a quick look at your placement. The RCU call in
leave_hypervisor_to_guest() needs to be placed just after the last call
to check_for_pcpu_work().

Otherwise, you may be preempted there and keep the RCU quiet while the
pCPU is actually still doing work in the hypervisor.

The placement in enter_hypervisor_from_guest() doesn't matter too much,
although I would consider calling it as late as possible.
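
Schematically, the placement being discussed would look something like
this (a sketch only, not the actual code: the function bodies are
abbreviated, and rcu_quiet_enter()/rcu_quiet_exit() are placeholder
names for whatever helpers the patch actually introduces):

/* xen/arch/arm/traps.c -- abbreviated sketch */
void enter_hypervisor_from_guest(void)
{
    /* ... existing entry work ... */

    rcu_quiet_exit();        /* placeholder: this pCPU is busy again;
                                called as late as possible */
}

void leave_hypervisor_to_guest(void)
{
    /* ... interrupts off, existing vCPU/pCPU work ... */
    check_for_pcpu_work();   /* the last call that may preempt us */

    rcu_quiet_enter();       /* placeholder: quiesce only after the
                                last preemption point */

    /* ... sync state and return to the guest ... */
}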

Cheers,

--
Julien Grall
Re: Null scheduler and vwfi native problem [ In reply to ]
On Tue, 2021-02-02 at 07:59 +0000, Julien Grall wrote:
> Hi Dario,
>
Hi!

> I have had a quick look at your place. The RCU call in
> leave_hypervisor_to_guest() needs to be placed just after the last
> call
> to check_for_pcpu_work().
>
> Otherwise, you may be preempted and keep the RCU quiet.
>
Ok, makes sense. I'll move it.

> The placement in enter_hypervisor_from_guest() doesn't matter too
> much,
> although I would consider to call it as a late as possible.
>
Mmmm... Can I ask why? In fact, I would have said "as soon as
possible".

Thanks and Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: Null scheduler and vwfi native problem [ In reply to ]
(Adding Andrew, Jan, Juergen for visibility)

Hi Dario,

On 02/02/2021 15:03, Dario Faggioli wrote:
> On Tue, 2021-02-02 at 07:59 +0000, Julien Grall wrote:
>> Hi Dario,
>>
>> I have had a quick look at your place. The RCU call in
>> leave_hypervisor_to_guest() needs to be placed just after the last
>> call
>> to check_for_pcpu_work().
>>
>> Otherwise, you may be preempted and keep the RCU quiet.
>>
> Ok, makes sense. I'll move it.
>
>> The placement in enter_hypervisor_from_guest() doesn't matter too
>> much,
>> although I would consider to call it as a late as possible.
>>
> Mmmm... Can I ask why? In fact, I would have said "as soon as
> possible".

Because those functions only access data for the current vCPU/domain.
This is already protected by the fact that the domain is running.

By leaving the "quiesce" mode later, you give the RCU an opportunity
to release memory earlier.

In reality, it is probably still too early, as a pCPU can be considered
quiesced until a call to rcu_lock*() (such as rcu_lock_domain()).

But this would require some investigation to check whether we effectively
protect all the relevant regions with the RCU helpers. This is likely too
complicated for 4.15.
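
For reference, the rcu_lock*() pattern referred to above typically looks
like this in hypervisor code (an illustrative fragment only; "domid" and
the surrounding error handling are made up for the example):

    struct domain *d = rcu_lock_domain_by_id(domid);

    if ( d == NULL )
        return -ESRCH;

    /* ... while the lock is held, RCU will not free the domain ... */

    rcu_unlock_domain(d);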

Cheers,

--
Julien Grall
