Mailing List Archive

1 2  View All
Re: Null scheduler and vwfi native problem [ In reply to ]
Hi again,

On Tue, 2021-02-02 at 15:23 +0000, Julien Grall wrote:
> (Adding Andrew, Jan, Juergen for visibility)
>
Thanks! :-)

> On 02/02/2021 15:03, Dario Faggioli wrote:
> > On Tue, 2021-02-02 at 07:59 +0000, Julien Grall wrote:
> > > The placement in enter_hypervisor_from_guest() doesn't matter too
> > > much,
> > > although I would consider to call it as a late as possible.
> > >
> > Mmmm... Can I ask why? In fact, I would have said "as soon as
> > possible".
>
> Because those functions only access data for the current vCPU/domain.
> This is already protected by the fact that the domain is running.
>
Mmm.. ok, yes, I think it makes sense.

> By leaving the "quiesce" mode later, you give an opportunity to the
> RCU
> to release memory earlier.
>
Yeah. What I wanted to be sure is that we put the CPU "back in the
race" :-) before any current or future use of RCUs.

> In reality, it is probably still too early as a pCPU can be
> considered
> quiesced until a call to rcu_lock*() (such rcu_lock_domain()).
>
Well, yes, in theory, we could track down which is the first RCU read
side crit. section on this path, and put the call right before that (if
I understood what you mean).

To me, however, this looks indeed too complex and difficult to
maintain, not only for 4.15 but in general. E.g., suppose we find such
a use of RCUs in function foo() called by bar() called by
hypervisor_enter_from_guest().

If someone at some points wants to use RCUs in bar(), how does she know
that she should also move the call to rcu_quiet_enter() from foo() to
there?

So, yes, I'll move it a little down, but still within
hypervisor_enter_from_guest().

In the meanwhile, I had a quick chat with Jourgen about x86. In fact, I
had a look and was not finding a place where to put the
rcu_quiet_{exit,enter}() calls as convenient as you have here on ARM.
I.e., two nice C functions that we traverse for all kind of guests, for
HVM and SVM, etc.

Actually, I was quite skeptical about it but, you know, one can hope!
Juergen confirmed that there's no such things, so I'll look at the
various entry.S files for the proper spots.

Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
Re: Null scheduler and vwfi native problem [ In reply to ]
Hi,

On 03/02/2021 07:31, Dario Faggioli wrote:
> On Tue, 2021-02-02 at 15:23 +0000, Julien Grall wrote:
>> In reality, it is probably still too early as a pCPU can be
>> considered
>> quiesced until a call to rcu_lock*() (such rcu_lock_domain()).
>>
> Well, yes, in theory, we could track down which is the first RCU read
> side crit. section on this path, and put the call right before that (if
> I understood what you mean).

Oh, that's not what I meant. This will indeed be far more complex than I
originally had in mind.

AFAIU, the RCU uses critical section to protect data. So the "entering"
could be used as "the pCPU is not quiesced" and "exiting" could be used
as "the pCPU is quiesced".

The concern with my approach is we would need to make sure that Xen
correctly uses the rcu helpers. I know Juergen worked on that recently,
but I don't know whether this is fully complete.

Cheers,

--
Julien Grall
Re: Null scheduler and vwfi native problem [ In reply to ]
On 03.02.21 10:19, Julien Grall wrote:
> Hi,
>
> On 03/02/2021 07:31, Dario Faggioli wrote:
>> On Tue, 2021-02-02 at 15:23 +0000, Julien Grall wrote:
>>> In reality, it is probably still too early as a pCPU can be
>>> considered
>>> quiesced until a call to rcu_lock*() (such rcu_lock_domain()).
>>>
>> Well, yes, in theory, we could track down which is the first RCU read
>> side crit. section on this path, and put the call right before that (if
>> I understood what you mean).
>
> Oh, that's not what I meant. This will indeed be far more complex than I
> originally had in mind.
>
> AFAIU, the RCU uses critical section to protect data. So the "entering"
> could be used as "the pCPU is not quiesced" and "exiting" could be used
> as "the pCPU is quiesced".
>
> The concern with my approach is we would need to make sure that Xen
> correctly uses the rcu helpers. I know Juergen worked on that recently,
> but I don't know whether this is fully complete.

I think it is complete, but I can't be sure, of course.

One bit missing (for catching some wrong uses of the helpers) is this
patch:

https://lists.xen.org/archives/html/xen-devel/2020-03/msg01759.html

I don't remember why it hasn't been taken, but I think there was a
specific reason for that.


Juergen
Re: Null scheduler and vwfi native problem [ In reply to ]
Hi Juergen,

On 03/02/2021 11:00, Jürgen Groß wrote:
> On 03.02.21 10:19, Julien Grall wrote:
>> Hi,
>>
>> On 03/02/2021 07:31, Dario Faggioli wrote:
>>> On Tue, 2021-02-02 at 15:23 +0000, Julien Grall wrote:
>>>> In reality, it is probably still too early as a pCPU can be
>>>> considered
>>>> quiesced until a call to rcu_lock*() (such rcu_lock_domain()).
>>>>
>>> Well, yes, in theory, we could track down which is the first RCU read
>>> side crit. section on this path, and put the call right before that (if
>>> I understood what you mean).
>>
>> Oh, that's not what I meant. This will indeed be far more complex than
>> I originally had in mind.
>>
>> AFAIU, the RCU uses critical section to protect data. So the
>> "entering" could be used as "the pCPU is not quiesced" and "exiting"
>> could be used as "the pCPU is quiesced".
>>
>> The concern with my approach is we would need to make sure that Xen
>> correctly uses the rcu helpers. I know Juergen worked on that
>> recently, but I don't know whether this is fully complete.
>
> I think it is complete, but I can't be sure, of course.
>
> One bit missing (for catching some wrong uses of the helpers) is this
> patch:
>
> https://lists.xen.org/archives/html/xen-devel/2020-03/msg01759.html
>
> I don't remember why it hasn't been taken, but I think there was a
> specific reason for that.

Looking at v8 and the patch is suitably reviewed by Jan. So I am a bit
puzzled to why this wasn't committed... I had to go to v6 and notice the
following message:

"albeit to be honest I'm not fully convinced we need to go this far."

Was the implication that his reviewed-by was conditional to someone else
answering to the e-mail?

Cheers,

--
Julien Grall
Re: Null scheduler and vwfi native problem [ In reply to ]
On 03.02.21 12:20, Julien Grall wrote:
> Hi Juergen,
>
> On 03/02/2021 11:00, Jürgen Groß wrote:
>> On 03.02.21 10:19, Julien Grall wrote:
>>> Hi,
>>>
>>> On 03/02/2021 07:31, Dario Faggioli wrote:
>>>> On Tue, 2021-02-02 at 15:23 +0000, Julien Grall wrote:
>>>>> In reality, it is probably still too early as a pCPU can be
>>>>> considered
>>>>> quiesced until a call to rcu_lock*() (such rcu_lock_domain()).
>>>>>
>>>> Well, yes, in theory, we could track down which is the first RCU read
>>>> side crit. section on this path, and put the call right before that (if
>>>> I understood what you mean).
>>>
>>> Oh, that's not what I meant. This will indeed be far more complex
>>> than I originally had in mind.
>>>
>>> AFAIU, the RCU uses critical section to protect data. So the
>>> "entering" could be used as "the pCPU is not quiesced" and "exiting"
>>> could be used as "the pCPU is quiesced".
>>>
>>> The concern with my approach is we would need to make sure that Xen
>>> correctly uses the rcu helpers. I know Juergen worked on that
>>> recently, but I don't know whether this is fully complete.
>>
>> I think it is complete, but I can't be sure, of course.
>>
>> One bit missing (for catching some wrong uses of the helpers) is this
>> patch:
>>
>> https://lists.xen.org/archives/html/xen-devel/2020-03/msg01759.html
>>
>> I don't remember why it hasn't been taken, but I think there was a
>> specific reason for that.
>
> Looking at v8 and the patch is suitably reviewed by Jan. So I am a bit
> puzzled to why this wasn't committed... I had to go to v6 and notice the
> following message:
>
> "albeit to be honest I'm not fully convinced we need to go this far."
>
> Was the implication that his reviewed-by was conditional to someone else
> answering to the e-mail?

I have no record for that being the case.

Patches 1-3 of that series were needed for getting rid of
stop_machine_run() in rcu handling and to fix other problems. Patch 4
was adding some additional ASSERT()s for making sure no potential
deadlocks due to wrong rcu usage could creep in again.

Patch 5 was more a "nice to have" addition in order to avoid any
wrong usage of rcu which should have no real negative impact on the
system stability.

So I believe Jan as the committer didn't want to commit it himself, but
was fine with the overall idea and implementation.

I still think for code sanity it would be nice, but I was rather busy
with Xenstore and event channel security work at that time, so I didn't
urge anyone to take this patch.


Juergen
Re: Null scheduler and vwfi native problem [ In reply to ]
On 1/30/21 6:59 PM, Dario Faggioli wrote:
> On Fri, 2021-01-29 at 09:08 +0100, Anders Törnqvist wrote:
>> On 1/26/21 11:31 PM, Dario Faggioli wrote:
>>> Thanks again for letting us see these logs.
>> Thanks for the attention to this :-)
>>
>> Any ideas for how to solve it?
>>
> So, you're up for testing patches, right?
>
> How about applying these two, and letting me know what happens? :-D

Great work guys!

Hi. Now I got the time to test the patches.

They was not possible to apply without fail on the code version I am
using which is commit b64b8df622963accf85b227e468fe12b2d56c128 from
https://source.codeaurora.org/external/imx/imx-xen.

I did some editing to get them into my code. I think I should have
removed some sched_tick_suspend/sched_tick_resume calls also.
See the attached patches for what I have applied on the code.

Anyway, after applying the patches including the original
rcu-quiesc-patch.patch the destroy of the domu seems to work.
I have rebooted, only destroyed-created and used Xen watchdog to reboot
the domu in total about 20 times and so far it has nicely destroyed and
the been able to start a new instance of the domu.

So it looks promising although my edited patches probably need some fixing.


>
> They are on top of current staging. I can try to rebase on something
> else, if it's easier for you to test.
>
> Besides being attached, they're also available here:
>
> https://gitlab.com/xen-project/people/dfaggioli/xen/-/tree/rcu-quiet-fix
>
> I could not test them properly on ARM, as I don't have an ARM system
> handy, so everything is possible really... just let me know.
>
> It should at least build fine, AFAICT from here:
>
> https://gitlab.com/xen-project/people/dfaggioli/xen/-/pipelines/249101213
>
> Julien, back in:
>
> https://lore.kernel.org/xen-devel/315740e1-3591-0e11-923a-718e06c36445@arm.com/
>
>
> you said I should hook in enter_hypervisor_head(),
> leave_hypervisor_tail(). Those functions are gone now and looking at
> how the code changed, this is where I figured I should put the calls
> (see the second patch). But feel free to educate me otherwise.
>
> For x86 people that are listening... Do we have, in our beloved arch,
> equally handy places (i.e., right before leaving Xen for a guest and
> right after entering Xen from one), preferrably in a C file, and for
> all guests... like it seems to be the case on ARM?
>
> Regards

1 2  View All