Mailing List Archive

1 2  View All
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
On Mon, 10 Aug 2020 at 21:29, Oleksandr <olekstysh@gmail.com> wrote:
>
>
> On 10.08.20 22:00, Julien Grall wrote:
>
> Hi Julien
>
> >
> >>>
> >>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
> >>>>> */
> >>>>> void leave_hypervisor_to_guest(void)
> >>>>> {
> >>>>> +#ifdef CONFIG_IOREQ_SERVER
> >>>>> + /*
> >>>>> + * XXX: Check the return. Shall we call that in
> >>>>> + * continue_running and context_switch instead?
> >>>>> + * The benefits would be to avoid calling
> >>>>> + * handle_hvm_io_completion on every return.
> >>>>> + */
> >>>>
> >>>> Yeah, that could be a simple and good optimization
> >>>
> >>> Well, it is not simple as it is sounds :).
> >>> handle_hvm_io_completion() is the function in charge to mark the
> >>> vCPU as waiting for I/O. So we would at least need to split the
> >>> function.
> >>>
> >>> I wrote this TODO because I wasn't sure about the complexity of
> >>> handle_hvm_io_completion(current). Looking at it again, the main
> >>> complexity is the looping over the IOREQ servers.
> >>>
> >>> I think it would be better to optimize handle_hvm_io_completion()
> >>> rather than trying to hack the context_switch() or continue_running().
> >> Well, is the idea in proposed dirty test patch below close to what
> >> you expect? Patch optimizes handle_hvm_io_completion() to avoid extra
> >> actions if vcpu's domain doesn't have ioreq_server, alternatively
> >> the check could be moved out of handle_hvm_io_completion() to avoid
> >> calling that function at all.
> >
> > This looks ok to me.
> >
> >> BTW, TODO also suggests checking the return value of
> >> handle_hvm_io_completion(), but I am completely sure we can simply
> >> just return from leave_hypervisor_to_guest() at this point. Could you
> >> please share your opinion?
> >
> > From my understanding, handle_hvm_io_completion() may return false if
> > there is pending I/O or a failure.
>
> It seems, yes
>
>
> >
> > In the former case, I think we want to call handle_hvm_io_completion()
> > later on. Possibly after we call do_softirq().
> >
> > I am wondering whether check_for_vcpu_work() could return whether
> > there are more work todo on the behalf of the vCPU.
> >
> > So we could have:
> >
> > do
> > {
> > check_for_pcpu_work();
> > } while (check_for_vcpu_work())
> >
> > The implementation of check_for_vcpu_work() would be:
> >
> > if ( !handle_hvm_io_completion() )
> > return true;
> >
> > /* Rest of the existing code */
> >
> > return false;
>
> Thank you, will give it a try.
>
> Can we behave the same way for both "pending I/O" and "failure" or we
> need to distinguish them?

We don't need to distinguish them. In both cases, we will want to
process softirqs. In all the failure cases, the domain will have
crashed. Therefore the vCPU will be unscheduled.

>
> Probably we need some sort of safe timeout/number attempts in order to
> not spin forever?

Well, anything based on timeout/number of attempts is flaky. How do
you know whether the I/O is just taking a "long time" to complete?

But a vCPU shouldn't continue until an I/O has completed. This is
nothing very different than what a processor would do.

In Xen case, if an I/O never completes then it most likely means that
something went horribly wrong with the Device Emulator. So it is most
likely not safe to continue. In HW, when there is a device failure,
the OS may receive an SError (this is implementation defined) and
could act accordingly if it is able to recognize the issue.

It *might* be possible to send a virtual SError but there are a couple
of issues with it:
* How do you detect a failure?
* SErrors are implementations defined. You would need to teach
your OS (or the firmware) how to deal with them.

I would expect quite a bit of effort in order to design and implement
it. For now, it is probably best to just let the vCPU spin forever.

This wouldn't be an issue for Xen as do_softirq() would be called at
every loop.

Cheers,
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
On 11.08.20 01:37, Julien Grall wrote:

Hi Julien

> On Mon, 10 Aug 2020 at 21:29, Oleksandr <olekstysh@gmail.com> wrote:
>>
>> On 10.08.20 22:00, Julien Grall wrote:
>>
>> Hi Julien
>>
>>>>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
>>>>>>> */
>>>>>>> void leave_hypervisor_to_guest(void)
>>>>>>> {
>>>>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>>>>> + /*
>>>>>>> + * XXX: Check the return. Shall we call that in
>>>>>>> + * continue_running and context_switch instead?
>>>>>>> + * The benefits would be to avoid calling
>>>>>>> + * handle_hvm_io_completion on every return.
>>>>>>> + */
>>>>>> Yeah, that could be a simple and good optimization
>>>>> Well, it is not simple as it is sounds :).
>>>>> handle_hvm_io_completion() is the function in charge to mark the
>>>>> vCPU as waiting for I/O. So we would at least need to split the
>>>>> function.
>>>>>
>>>>> I wrote this TODO because I wasn't sure about the complexity of
>>>>> handle_hvm_io_completion(current). Looking at it again, the main
>>>>> complexity is the looping over the IOREQ servers.
>>>>>
>>>>> I think it would be better to optimize handle_hvm_io_completion()
>>>>> rather than trying to hack the context_switch() or continue_running().
>>>> Well, is the idea in proposed dirty test patch below close to what
>>>> you expect? Patch optimizes handle_hvm_io_completion() to avoid extra
>>>> actions if vcpu's domain doesn't have ioreq_server, alternatively
>>>> the check could be moved out of handle_hvm_io_completion() to avoid
>>>> calling that function at all.
>>> This looks ok to me.
>>>
>>>> BTW, TODO also suggests checking the return value of
>>>> handle_hvm_io_completion(), but I am completely sure we can simply
>>>> just return from leave_hypervisor_to_guest() at this point. Could you
>>>> please share your opinion?
>>> From my understanding, handle_hvm_io_completion() may return false if
>>> there is pending I/O or a failure.
>> It seems, yes
>>
>>
>>> In the former case, I think we want to call handle_hvm_io_completion()
>>> later on. Possibly after we call do_softirq().
>>>
>>> I am wondering whether check_for_vcpu_work() could return whether
>>> there are more work todo on the behalf of the vCPU.
>>>
>>> So we could have:
>>>
>>> do
>>> {
>>> check_for_pcpu_work();
>>> } while (check_for_vcpu_work())
>>>
>>> The implementation of check_for_vcpu_work() would be:
>>>
>>> if ( !handle_hvm_io_completion() )
>>> return true;
>>>
>>> /* Rest of the existing code */
>>>
>>> return false;
>> Thank you, will give it a try.
>>
>> Can we behave the same way for both "pending I/O" and "failure" or we
>> need to distinguish them?
> We don't need to distinguish them. In both cases, we will want to
> process softirqs. In all the failure cases, the domain will have
> crashed. Therefore the vCPU will be unscheduled.

Got it.


>> Probably we need some sort of safe timeout/number attempts in order to
>> not spin forever?
> Well, anything based on timeout/number of attempts is flaky. How do
> you know whether the I/O is just taking a "long time" to complete?
>
> But a vCPU shouldn't continue until an I/O has completed. This is
> nothing very different than what a processor would do.
>
> In Xen case, if an I/O never completes then it most likely means that
> something went horribly wrong with the Device Emulator. So it is most
> likely not safe to continue. In HW, when there is a device failure,
> the OS may receive an SError (this is implementation defined) and
> could act accordingly if it is able to recognize the issue.
>
> It *might* be possible to send a virtual SError but there are a couple
> of issues with it:
> * How do you detect a failure?
> * SErrors are implementations defined. You would need to teach
> your OS (or the firmware) how to deal with them.
>
> I would expect quite a bit of effort in order to design and implement
> it. For now, it is probably best to just let the vCPU spin forever.
>
> This wouldn't be an issue for Xen as do_softirq() would be called at
> every loop.

 Thank you for clarification. Fair enough and sounds reasonable.


--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
On 05.08.20 12:32, Julien Grall wrote:

Hi Julien, Stefano

>
>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>>> index 5fdb6e8..5823f11 100644
>>> --- a/xen/include/asm-arm/p2m.h
>>> +++ b/xen/include/asm-arm/p2m.h
>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct
>>> domain *d, unsigned long gfn,
>>>                                           mfn_t mfn)
>>>   {
>>>       /*
>>> -     * NOTE: If this is implemented then proper reference counting of
>>> -     *       foreign entries will need to be implemented.
>>> +     * XXX: handle properly reference. It looks like the page may
>>> not always
>>> +     * belong to d.
>>
>> Just as a reference, and without taking away anything from the comment,
>> I think that QEMU is doing its own internal reference counting for these
>> mappings.
>
> I am not sure how this matters here? We can't really trust the DM to
> do the right thing if it is not running in dom0.
>
> But, IIRC, the problem is some of the pages doesn't belong to do a
> domain, so it is not possible to treat them as foreign mapping (e.g.
> you wouldn't be able to grab a reference). This investigation was done
> a couple of years ago, so this may have changed in recent Xen.

Well, emulator is going to be used in driver domain, so this TODO must
be resolved. I suspect that the check for a hardware domain in
acquire_resource() which I skipped in a hackish way [1] could be simply
removed once proper reference counting is implemented in Xen, correct?

Could you please provide some pointers on that problem? Maybe some
questions need to be investigated again? Unfortunately, it is not
completely clear to me the direction to follow...

***
I am wondering whether the similar problem exists on x86 as well? The
FIXME tag (before checking for a hardware domain in acquire_resource())
in the common code makes me think it is a common issue. From other side
x86's
implementation of set_foreign_p2m_entry() is exists unlike Arm's one
(which returned -EOPNOTSUPP so far). Or these are unrelated?
***

[1] https://lists.xen.org/archives/html/xen-devel/2020-08/msg00075.html



--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
On 11/08/2020 18:09, Oleksandr wrote:
>
> On 05.08.20 12:32, Julien Grall wrote:
>
> Hi Julien, Stefano
>
>>
>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>>>> index 5fdb6e8..5823f11 100644
>>>> --- a/xen/include/asm-arm/p2m.h
>>>> +++ b/xen/include/asm-arm/p2m.h
>>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct
>>>> domain *d, unsigned long gfn,
>>>>                                           mfn_t mfn)
>>>>   {
>>>>       /*
>>>> -     * NOTE: If this is implemented then proper reference counting of
>>>> -     *       foreign entries will need to be implemented.
>>>> +     * XXX: handle properly reference. It looks like the page may
>>>> not always
>>>> +     * belong to d.
>>>
>>> Just as a reference, and without taking away anything from the comment,
>>> I think that QEMU is doing its own internal reference counting for these
>>> mappings.
>>
>> I am not sure how this matters here? We can't really trust the DM to
>> do the right thing if it is not running in dom0.
>>
>> But, IIRC, the problem is some of the pages doesn't belong to do a
>> domain, so it is not possible to treat them as foreign mapping (e.g.
>> you wouldn't be able to grab a reference). This investigation was done
>> a couple of years ago, so this may have changed in recent Xen.
>
> Well, emulator is going to be used in driver domain, so this TODO must
> be resolved. I suspect that the check for a hardware domain in
> acquire_resource() which I skipped in a hackish way [1] could be simply
> removed once proper reference counting is implemented in Xen, correct?

It depends how you are going to solve it. If you manage to solve it in a
generic way, then yes you could resolve. If not (i.e. it is solved in an
arch-specific way), we would need to keep the check on arch that are not
able to deal with it. See more below.

>
> Could you please provide some pointers on that problem? Maybe some
> questions need to be investigated again? Unfortunately, it is not
> completely clear to me the direction to follow...
>
> ***
> I am wondering whether the similar problem exists on x86 as well?

It is somewhat different. On Arm, we are able to handle properly foreign
mapping (i.e. mapping page from a another domain) as we would grab a
reference on the page (see XENMAPSPACE_gmfn_foreign handling in
xenmem_add_to_physmap()). The reference will then be released when the
entry is removed from the P2M (see p2m_free_entry()).

If all the pages given to set_foreign_p2m_entry() belong to a domain,
then you could use the same approach.

However, I remember to run into some issues in some of the cases. I had
a quick looked at the caller and I wasn't able to find any use cases
that may be an issue.

The refcounting in the IOREQ code has changed after XSA-276 (this was
found while working on the Arm port). Probably the best way to figure
out if it works would be to try it and see if it fails.

Note that set_foreign_p2m_entry() doesn't have a parameter for the
foreign domain. You would need to add a extra parameter for this.

> The
> FIXME tag (before checking for a hardware domain in acquire_resource())
> in the common code makes me think it is a common issue. From other side
> x86's
> implementation of set_foreign_p2m_entry() is exists unlike Arm's one
> (which returned -EOPNOTSUPP so far). Or these are unrelated?

At the moment, x86 doesn't support refcounting for foreign mapping.
Hence the reason to restrict them to the hardware domain.

> ***
>
> [1] https://lists.xen.org/archives/html/xen-devel/2020-08/msg00075.html
Cheers,

--
Julien Grall
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
Hi Julien


>>>>>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
>>>>>>>>     */
>>>>>>>>    void leave_hypervisor_to_guest(void)
>>>>>>>>    {
>>>>>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>>>>>> +    /*
>>>>>>>> +     * XXX: Check the return. Shall we call that in
>>>>>>>> +     * continue_running and context_switch instead?
>>>>>>>> +     * The benefits would be to avoid calling
>>>>>>>> +     * handle_hvm_io_completion on every return.
>>>>>>>> +     */
>>>>>>> Yeah, that could be a simple and good optimization
>>>>>> Well, it is not simple as it is sounds :).
>>>>>> handle_hvm_io_completion() is the function in charge to mark the
>>>>>> vCPU as waiting for I/O. So we would at least need to split the
>>>>>> function.
>>>>>>
>>>>>> I wrote this TODO because I wasn't sure about the complexity of
>>>>>> handle_hvm_io_completion(current). Looking at it again, the main
>>>>>> complexity is the looping over the IOREQ servers.
>>>>>>
>>>>>> I think it would be better to optimize handle_hvm_io_completion()
>>>>>> rather than trying to hack the context_switch() or
>>>>>> continue_running().
>>>>> Well, is the idea in proposed dirty test patch below close to what
>>>>> you expect? Patch optimizes handle_hvm_io_completion() to avoid extra
>>>>> actions if vcpu's domain doesn't have ioreq_server, alternatively
>>>>> the check could be moved out of handle_hvm_io_completion() to avoid
>>>>> calling that function at all.
>>>> This looks ok to me.
>>>>
>>>>> BTW, TODO also suggests checking the return value of
>>>>> handle_hvm_io_completion(), but I am completely sure we can simply
>>>>> just return from leave_hypervisor_to_guest() at this point. Could you
>>>>> please share your opinion?
>>>>  From my understanding, handle_hvm_io_completion() may return false if
>>>> there is pending I/O or a failure.
>>> It seems, yes
>>>
>>>
>>>> In the former case, I think we want to call handle_hvm_io_completion()
>>>> later on. Possibly after we call do_softirq().
>>>>
>>>> I am wondering whether check_for_vcpu_work() could return whether
>>>> there are more work todo on the behalf of the vCPU.
>>>>
>>>> So we could have:
>>>>
>>>> do
>>>> {
>>>>    check_for_pcpu_work();
>>>> } while (check_for_vcpu_work())
>>>>
>>>> The implementation of check_for_vcpu_work() would be:
>>>>
>>>> if ( !handle_hvm_io_completion() )
>>>>    return true;
>>>>
>>>> /* Rest of the existing code */
>>>>
>>>> return false;
>>> Thank you, will give it a try.
>>>
>>> Can we behave the same way for both "pending I/O" and "failure" or we
>>> need to distinguish them?
>> We don't need to distinguish them. In both cases, we will want to
>> process softirqs. In all the failure cases, the domain will have
>> crashed. Therefore the vCPU will be unscheduled.
>
> Got it.
>
>
>>> Probably we need some sort of safe timeout/number attempts in order to
>>> not spin forever?
>> Well, anything based on timeout/number of attempts is flaky. How do
>> you know whether the I/O is just taking a "long time" to complete?
>>
>> But a vCPU shouldn't continue until an I/O has completed. This is
>> nothing very different than what a processor would do.
>>
>> In Xen case, if an I/O never completes then it most likely means that
>> something went horribly wrong with the Device Emulator. So it is most
>> likely not safe to continue. In HW, when there is a device failure,
>> the OS may receive an SError (this is implementation defined) and
>> could act accordingly if it is able to recognize the issue.
>>
>> It *might* be possible to send a virtual SError but there are a couple
>> of issues with it:
>>       * How do you detect a failure?
>>       * SErrors are implementations defined. You would need to teach
>> your OS (or the firmware) how to deal with them.
>>
>> I would expect quite a bit of effort in order to design and implement
>> it. For now, it is probably best to just let the vCPU spin forever.
>>
>> This wouldn't be an issue for Xen as do_softirq() would be called at
>> every loop.
>
>  Thank you for clarification. Fair enough and sounds reasonable.
I added logic to properly handle the return value of
handle_hvm_io_completion() as you had suggested. For test purpose I
simulated handle_hvm_io_completion() to return false sometimes
(I couldn't detect real "pending I/O" failure during testing) to see how
new logic behaved. I assume I can take this solution for non-RFC series (?)


---
 xen/arch/arm/traps.c         | 36 ++++++++++++++++++++++--------------
 xen/common/hvm/ioreq.c       |  9 ++++++++-
 xen/include/asm-arm/domain.h |  1 +
 xen/include/xen/hvm/ioreq.h  |  5 +++++
 4 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 974c744..f74b514 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2264,12 +2264,26 @@ static void check_for_pcpu_work(void)
  * Process pending work for the vCPU. Any call should be fast or
  * implement preemption.
  */
-static void check_for_vcpu_work(void)
+static bool check_for_vcpu_work(void)
 {
     struct vcpu *v = current;

+#ifdef CONFIG_IOREQ_SERVER
+    if ( hvm_domain_has_ioreq_server(v->domain) )
+    {
+        bool handled;
+
+        local_irq_enable();
+        handled = handle_hvm_io_completion(v);
+        local_irq_disable();
+
+        if ( !handled )
+            return true;
+    }
+#endif
+
     if ( likely(!v->arch.need_flush_to_ram) )
-        return;
+        return false;

     /*
      * Give a chance for the pCPU to process work before handling the vCPU
@@ -2280,6 +2294,8 @@ static void check_for_vcpu_work(void)
     local_irq_enable();
     p2m_flush_vm(v);
     local_irq_disable();
+
+    return false;
 }

 /*
@@ -2290,20 +2306,12 @@ static void check_for_vcpu_work(void)
  */
 void leave_hypervisor_to_guest(void)
 {
-#ifdef CONFIG_IOREQ_SERVER
-    /*
-     * XXX: Check the return. Shall we call that in
-     * continue_running and context_switch instead?
-     * The benefits would be to avoid calling
-     * handle_hvm_io_completion on every return.
-     */
-    local_irq_enable();
-    handle_hvm_io_completion(current);
-#endif
     local_irq_disable();

-    check_for_vcpu_work();
-    check_for_pcpu_work();
+    do
+    {
+        check_for_pcpu_work();
+    } while ( check_for_vcpu_work() );

     vgic_sync_to_lrs();

diff --git a/xen/common/hvm/ioreq.c b/xen/common/hvm/ioreq.c
index 7e1fa23..81b41ab 100644
--- a/xen/common/hvm/ioreq.c
+++ b/xen/common/hvm/ioreq.c
@@ -38,9 +38,15 @@ static void set_ioreq_server(struct domain *d,
unsigned int id,
                              struct hvm_ioreq_server *s)
 {
     ASSERT(id < MAX_NR_IOREQ_SERVERS);
-    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
+    ASSERT((!s && d->arch.hvm.ioreq_server.server[id]) ||
+           (s && !d->arch.hvm.ioreq_server.server[id]));

     d->arch.hvm.ioreq_server.server[id] = s;
+
+    if ( s )
+        d->arch.hvm.ioreq_server.nr_servers ++;
+    else
+        d->arch.hvm.ioreq_server.nr_servers --;
 }

 /*
@@ -1415,6 +1421,7 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool
buffered)
 void hvm_ioreq_init(struct domain *d)
 {
     spin_lock_init(&d->arch.hvm.ioreq_server.lock);
+    d->arch.hvm.ioreq_server.nr_servers = 0;

     arch_hvm_ioreq_init(d);
 }
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 6a01d69..484bd1a 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -68,6 +68,7 @@ struct hvm_domain
     struct {
         spinlock_t              lock;
         struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
+        unsigned int            nr_servers;
     } ioreq_server;

     bool_t qemu_mapcache_invalidate;
diff --git a/xen/include/xen/hvm/ioreq.h b/xen/include/xen/hvm/ioreq.h
index 40b7b5e..8f78852 100644
--- a/xen/include/xen/hvm/ioreq.h
+++ b/xen/include/xen/hvm/ioreq.h
@@ -23,6 +23,11 @@

 #include <asm/hvm/ioreq.h>

+static inline bool hvm_domain_has_ioreq_server(const struct domain *d)
+{
+    return (d->arch.hvm.ioreq_server.nr_servers > 0);
+}
+
 #define GET_IOREQ_SERVER(d, id) \
     (d)->arch.hvm.ioreq_server.server[id]

--
2.7.4


--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
On 11.08.20 20:50, Julien Grall wrote:

Hi Julien

>
>
> On 11/08/2020 18:09, Oleksandr wrote:
>>
>> On 05.08.20 12:32, Julien Grall wrote:
>>
>> Hi Julien, Stefano
>>
>>>
>>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>>>>> index 5fdb6e8..5823f11 100644
>>>>> --- a/xen/include/asm-arm/p2m.h
>>>>> +++ b/xen/include/asm-arm/p2m.h
>>>>> @@ -385,10 +385,11 @@ static inline int
>>>>> set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>>>>>                                           mfn_t mfn)
>>>>>   {
>>>>>       /*
>>>>> -     * NOTE: If this is implemented then proper reference
>>>>> counting of
>>>>> -     *       foreign entries will need to be implemented.
>>>>> +     * XXX: handle properly reference. It looks like the page may
>>>>> not always
>>>>> +     * belong to d.
>>>>
>>>> Just as a reference, and without taking away anything from the
>>>> comment,
>>>> I think that QEMU is doing its own internal reference counting for
>>>> these
>>>> mappings.
>>>
>>> I am not sure how this matters here? We can't really trust the DM to
>>> do the right thing if it is not running in dom0.
>>>
>>> But, IIRC, the problem is some of the pages doesn't belong to do a
>>> domain, so it is not possible to treat them as foreign mapping (e.g.
>>> you wouldn't be able to grab a reference). This investigation was
>>> done a couple of years ago, so this may have changed in recent Xen.
>>
>> Well, emulator is going to be used in driver domain, so this TODO
>> must be resolved. I suspect that the check for a hardware domain in
>> acquire_resource() which I skipped in a hackish way [1] could be
>> simply removed once proper reference counting is implemented in Xen,
>> correct?
>
> It depends how you are going to solve it. If you manage to solve it in
> a generic way, then yes you could resolve. If not (i.e. it is solved
> in an arch-specific way), we would need to keep the check on arch that
> are not able to deal with it. See more below.
>
>>
>> Could you please provide some pointers on that problem? Maybe some
>> questions need to be investigated again? Unfortunately, it is not
>> completely clear to me the direction to follow...
>>
>> ***
>> I am wondering whether the similar problem exists on x86 as well?
>
> It is somewhat different. On Arm, we are able to handle properly
> foreign mapping (i.e. mapping page from a another domain) as we would
> grab a reference on the page (see XENMAPSPACE_gmfn_foreign handling in
> xenmem_add_to_physmap()). The reference will then be released when the
> entry is removed from the P2M (see p2m_free_entry()).
>
> If all the pages given to set_foreign_p2m_entry() belong to a domain,
> then you could use the same approach.
>
> However, I remember to run into some issues in some of the cases. I
> had a quick looked at the caller and I wasn't able to find any use
> cases that may be an issue.
>
> The refcounting in the IOREQ code has changed after XSA-276 (this was
> found while working on the Arm port). Probably the best way to figure
> out if it works would be to try it and see if it fails.
>
> Note that set_foreign_p2m_entry() doesn't have a parameter for the
> foreign domain. You would need to add a extra parameter for this.
>
>> The FIXME tag (before checking for a hardware domain in
>> acquire_resource()) in the common code makes me think it is a common
>> issue. From other side x86's
>> implementation of set_foreign_p2m_entry() is exists unlike Arm's one
>> (which returned -EOPNOTSUPP so far). Or these are unrelated?
>
> At the moment, x86 doesn't support refcounting for foreign mapping.
> Hence the reason to restrict them to the hardware domain.


Thank you for the pointers!


I checked that all pages given to set_foreign_p2m_entry() belonged to a
domain (at least in my use-case). I noticed two calls for acquiring
resource at the DomU creation time, the first call was for grant table
(single gfn)
and the second for ioreq server which carried 2 gfns (for shared and
buffered rings I assume). For the test purpose, I passed these gfns to
get_page_from_gfn() in order to grab references on the pages, after that
I tried to destroy DomU without calling put_page() for these pages. The
fact that I couldn't destroy DomU completely (a zombie domain was
observed) made me think that references were still taken, so worked as
expected.


I implemented a test patch (which uses approach from
xenmem_add_to_physmap_one() for XENMAPSPACE_gmfn_foreign case) to check
whether it would work.


---
 xen/arch/arm/p2m.c        | 30 ++++++++++++++++++++++++++++++
 xen/common/memory.c       |  2 +-
 xen/include/asm-arm/p2m.h | 12 ++----------
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index e9ccba8..7359715 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1385,6 +1385,36 @@ int guest_physmap_remove_page(struct domain *d,
gfn_t gfn, mfn_t mfn,
     return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
 }

+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
+{
+    struct page_info *page;
+    p2m_type_t p2mt;
+    int rc;
+
+    /*
+     * Take reference to the foreign domain page. Reference will be
released
+     * in p2m_put_l3_page().
+     */
+    page = get_page_from_gfn(fd, gfn, &p2mt, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    if ( p2m_is_ram(p2mt) )
+        p2mt = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw :
p2m_map_foreign_ro;
+    else
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2mt);
+    if ( rc )
+        put_page(page);
+
+    return 0;
+}
+
 static struct page_info *p2m_allocate_root(void)
 {
     struct page_info *page;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 8d9f0a8..1de1d4f 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1171,7 +1171,7 @@ static int acquire_resource(

         for ( i = 0; !rc && i < xmar.nr_frames; i++ )
         {
-            rc = set_foreign_p2m_entry(currd, gfn_list[i],
+            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
                                        _mfn(mfn_list[i]));
             /* rc should be -EIO for any iteration other than the first */
             if ( rc && i )
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 5823f11..53ce373 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn,
unsigned int order)
     return gfn_add(gfn, 1UL << order);
 }

-static inline int set_foreign_p2m_entry(struct domain *d, unsigned long
gfn,
-                                        mfn_t mfn)
-{
-    /*
-     * XXX: handle properly reference. It looks like the page may not
always
-     * belong to d.
-     */
-
-    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
-}
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn,  mfn_t mfn);

 /*
  * A vCPU has cache enabled only when the MMU is enabled and data cache
--
2.7.4


And with that patch applied I was facing a BUG when destroying/rebooting
DomU. The call of put_page_alloc_ref() in hvm_free_ioreq_mfn() triggered
that BUG:


Rebooting domain 2
root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at
...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
(XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
(XEN) CPU:    3
(XEN) PC:     0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c
(XEN) LR:     0000000000246ef0
(XEN) SP:     0000800725eafd80
(XEN) CPSR:   60000249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000000001  X1: 403fffffffffffff  X2: 000000000000001f
(XEN)      X3: 0000000080000000  X4: 0000000000000000  X5: 0000000000400000
(XEN)      X6: 0000800725eafe24  X7: 0000ffffd1ef3e08  X8: 0000000000000020
(XEN)      X9: 0000000000000000 X10: 00e800008ecebf53 X11: 0400000000000000
(XEN)     X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 0000000000000001
(XEN)     X15: 0000000000000001 X16: 0000000000000029 X17: 0000ffff9badb3d0
(XEN)     X18: 000000000000010f X19: 0000000810e60e38 X20: 0000800725e68ec0
(XEN)     X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1
(XEN)     X24: ffff8000460ec280 X25: 0000000000000124 X26: 000000000000001d
(XEN)     X27: ffff000008ad1000 X28: ffff800052e65100  FP: ffff0000223dbd20
(XEN)
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 0002000765f04000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000008078663f
(XEN)  TTBR0_EL2: 00000000781c5000
(XEN)
(XEN)    ESR_EL2: f2000001
(XEN)  HPFAR_EL2: 0000000000030010
(XEN)    FAR_EL2: ffff000008005f00
(XEN)
(XEN) Xen stack trace from sp=0000800725eafd80:
(XEN)    0000800725e68ec0 0000000000247078 00008004dc040000 00000000002477c8
(XEN)    ffffffffffffffea 0000000000000001 ffff8000460ec500 0000000000000002
(XEN)    000000000024645c 00000000002462dc 0000800725eafeb0 0000800725eafeb0
(XEN)    0000800725eaff30 0000000060000145 000000000027882c 0000800725eafeb0
(XEN)    0000800725eafeb0 01ff00000935de80 00008004dc040000 0000000000000006
(XEN)    ffff800000000000 0000000000000002 000000005a000ea1 000000019bc60002
(XEN)    0000ffffd1ef3e08 0000000000000020 0000000000000004 000000000027c7d8
(XEN)    000000005a000ea1 0000800725eafeb0 000000005a000ea1 0000000000279f98
(XEN)    0000000000000000 ffff8000460ec200 0000800725eaffb8 0000000000262c58
(XEN)    0000000000262c4c 07e0000160000249 0000000000000002 0000000000000001
(XEN)    ffff8000460ec500 ffff8000460ec508 ffff8000460ec208 ffff800052e65100
(XEN)    000000005060b478 0000ffffd20f3000 ffff7e00013c77e0 0000000000000000
(XEN)    00e800008ecebf53 0400000000000000 ffff7e00013b3ac0 0000000000000002
(XEN)    0000000000000001 0000000000000001 0000000000000029 0000ffff9badb3d0
(XEN)    000000000000010f ffff8000460ec210 ffff8000460ec200 ffff8000460ec210
(XEN)    0000000000000001 ffff8000460ec500 ffff8000460ec280 0000000000000124
(XEN)    000000000000001d ffff000008ad1000 ffff800052e65100 ffff0000223dbd20
(XEN)    ffff000008537004 ffffffffffffffff ffff0000080c17e4 5a000ea160000145
(XEN)    0000000060000000 0000000000000000 0000000000000000 ffff800052e65100
(XEN)    ffff0000223dbd20 0000ffff9badb3dc 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC)
(XEN)    [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR)
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) PSCI cpu off failed for CPU0 err=-3
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...



Either I did something wrong (most likely) or there is an issue with
page ref-counting in the IOREQ code. I am still trying to understand
what is going on.
Some notes on that:
1. I checked that put_page() was called for these pages in
p2m_put_l3_page() when destroying domain. This happened before
hvm_free_ioreq_mfn() execution.
2. There was no BUG detected if I passed "p2m_ram_rw" instead of
"p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU couldn't
be fully destroyed because of the reference taken.

--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
On 13/08/2020 19:41, Oleksandr wrote:
> Rebooting domain 2
> root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at
> ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
> (XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
> (XEN) CPU:    3
> (XEN) PC:     0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c
> (XEN) LR:     0000000000246ef0
> (XEN) SP:     0000800725eafd80
> (XEN) CPSR:   60000249 MODE:64-bit EL2h (Hypervisor, handler)
> (XEN)      X0: 0000000000000001  X1: 403fffffffffffff  X2: 000000000000001f
> (XEN)      X3: 0000000080000000  X4: 0000000000000000  X5: 0000000000400000
> (XEN)      X6: 0000800725eafe24  X7: 0000ffffd1ef3e08  X8: 0000000000000020
> (XEN)      X9: 0000000000000000 X10: 00e800008ecebf53 X11: 0400000000000000
> (XEN)     X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 0000000000000001
> (XEN)     X15: 0000000000000001 X16: 0000000000000029 X17: 0000ffff9badb3d0
> (XEN)     X18: 000000000000010f X19: 0000000810e60e38 X20: 0000800725e68ec0
> (XEN)     X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1
> (XEN)     X24: ffff8000460ec280 X25: 0000000000000124 X26: 000000000000001d
> (XEN)     X27: ffff000008ad1000 X28: ffff800052e65100  FP: ffff0000223dbd20
> (XEN)
> (XEN)   VTCR_EL2: 80023558
> (XEN)  VTTBR_EL2: 0002000765f04000
> (XEN)
> (XEN)  SCTLR_EL2: 30cd183d
> (XEN)    HCR_EL2: 000000008078663f
> (XEN)  TTBR0_EL2: 00000000781c5000
> (XEN)
> (XEN)    ESR_EL2: f2000001
> (XEN)  HPFAR_EL2: 0000000000030010
> (XEN)    FAR_EL2: ffff000008005f00
> (XEN)
> (XEN) Xen stack trace from sp=0000800725eafd80:
> (XEN)    0000800725e68ec0 0000000000247078 00008004dc040000
> 00000000002477c8
> (XEN)    ffffffffffffffea 0000000000000001 ffff8000460ec500
> 0000000000000002
> (XEN)    000000000024645c 00000000002462dc 0000800725eafeb0
> 0000800725eafeb0
> (XEN)    0000800725eaff30 0000000060000145 000000000027882c
> 0000800725eafeb0
> (XEN)    0000800725eafeb0 01ff00000935de80 00008004dc040000
> 0000000000000006
> (XEN)    ffff800000000000 0000000000000002 000000005a000ea1
> 000000019bc60002
> (XEN)    0000ffffd1ef3e08 0000000000000020 0000000000000004
> 000000000027c7d8
> (XEN)    000000005a000ea1 0000800725eafeb0 000000005a000ea1
> 0000000000279f98
> (XEN)    0000000000000000 ffff8000460ec200 0000800725eaffb8
> 0000000000262c58
> (XEN)    0000000000262c4c 07e0000160000249 0000000000000002
> 0000000000000001
> (XEN)    ffff8000460ec500 ffff8000460ec508 ffff8000460ec208
> ffff800052e65100
> (XEN)    000000005060b478 0000ffffd20f3000 ffff7e00013c77e0
> 0000000000000000
> (XEN)    00e800008ecebf53 0400000000000000 ffff7e00013b3ac0
> 0000000000000002
> (XEN)    0000000000000001 0000000000000001 0000000000000029
> 0000ffff9badb3d0
> (XEN)    000000000000010f ffff8000460ec210 ffff8000460ec200
> ffff8000460ec210
> (XEN)    0000000000000001 ffff8000460ec500 ffff8000460ec280
> 0000000000000124
> (XEN)    000000000000001d ffff000008ad1000 ffff800052e65100
> ffff0000223dbd20
> (XEN)    ffff000008537004 ffffffffffffffff ffff0000080c17e4
> 5a000ea160000145
> (XEN)    0000000060000000 0000000000000000 0000000000000000
> ffff800052e65100
> (XEN)    ffff0000223dbd20 0000ffff9badb3dc 0000000000000000
> 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC)
> (XEN)    [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR)
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) PSCI cpu off failed for CPU0 err=-3
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
>
>
> Either I did something wrong (most likely) or there is an issue with
> page ref-counting in the IOREQ code. I am still trying to understand
> what is going on.

At a first glance, the implement of set_foreign_p2m_entry() looks fine
to me.

> Some notes on that:
> 1. I checked that put_page() was called for these pages in
> p2m_put_l3_page() when destroying domain. This happened before
> hvm_free_ioreq_mfn() execution.
> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of
> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU couldn't
> be fully destroyed because of the reference taken.

This definitely looks like a page reference issue. Would it be possible
to print where the page reference are dropped? A WARN() in put_page()
would help.

To avoid a lot of message, I tend to use a global variable that store
the page I want to watch.

Cheers,

--
Julien Grall
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
Hi

Sorry for the possible format issue.

On Thu, Aug 13, 2020 at 9:42 PM Oleksandr <olekstysh@gmail.com> wrote:

>
> On 11.08.20 20:50, Julien Grall wrote:
>
> Hi Julien
>
> >
> >
> > On 11/08/2020 18:09, Oleksandr wrote:
> >>
> >> On 05.08.20 12:32, Julien Grall wrote:
> >>
> >> Hi Julien, Stefano
> >>
> >>>
> >>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> >>>>> index 5fdb6e8..5823f11 100644
> >>>>> --- a/xen/include/asm-arm/p2m.h
> >>>>> +++ b/xen/include/asm-arm/p2m.h
> >>>>> @@ -385,10 +385,11 @@ static inline int
> >>>>> set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
> >>>>> mfn_t mfn)
> >>>>> {
> >>>>> /*
> >>>>> - * NOTE: If this is implemented then proper reference
> >>>>> counting of
> >>>>> - * foreign entries will need to be implemented.
> >>>>> + * XXX: handle properly reference. It looks like the page may
> >>>>> not always
> >>>>> + * belong to d.
> >>>>
> >>>> Just as a reference, and without taking away anything from the
> >>>> comment,
> >>>> I think that QEMU is doing its own internal reference counting for
> >>>> these
> >>>> mappings.
> >>>
> >>> I am not sure how this matters here? We can't really trust the DM to
> >>> do the right thing if it is not running in dom0.
> >>>
> >>> But, IIRC, the problem is some of the pages doesn't belong to do a
> >>> domain, so it is not possible to treat them as foreign mapping (e.g.
> >>> you wouldn't be able to grab a reference). This investigation was
> >>> done a couple of years ago, so this may have changed in recent Xen.
> >>
> >> Well, emulator is going to be used in driver domain, so this TODO
> >> must be resolved. I suspect that the check for a hardware domain in
> >> acquire_resource() which I skipped in a hackish way [1] could be
> >> simply removed once proper reference counting is implemented in Xen,
> >> correct?
> >
> > It depends how you are going to solve it. If you manage to solve it in
> > a generic way, then yes you could resolve. If not (i.e. it is solved
> > in an arch-specific way), we would need to keep the check on arch that
> > are not able to deal with it. See more below.
> >
> >>
> >> Could you please provide some pointers on that problem? Maybe some
> >> questions need to be investigated again? Unfortunately, it is not
> >> completely clear to me the direction to follow...
> >>
> >> ***
> >> I am wondering whether the similar problem exists on x86 as well?
> >
> > It is somewhat different. On Arm, we are able to handle properly
> > foreign mapping (i.e. mapping page from a another domain) as we would
> > grab a reference on the page (see XENMAPSPACE_gmfn_foreign handling in
> > xenmem_add_to_physmap()). The reference will then be released when the
> > entry is removed from the P2M (see p2m_free_entry()).
> >
> > If all the pages given to set_foreign_p2m_entry() belong to a domain,
> > then you could use the same approach.
> >
> > However, I remember to run into some issues in some of the cases. I
> > had a quick looked at the caller and I wasn't able to find any use
> > cases that may be an issue.
> >
> > The refcounting in the IOREQ code has changed after XSA-276 (this was
> > found while working on the Arm port). Probably the best way to figure
> > out if it works would be to try it and see if it fails.
> >
> > Note that set_foreign_p2m_entry() doesn't have a parameter for the
> > foreign domain. You would need to add a extra parameter for this.
> >
> >> The FIXME tag (before checking for a hardware domain in
> >> acquire_resource()) in the common code makes me think it is a common
> >> issue. From other side x86's
> >> implementation of set_foreign_p2m_entry() is exists unlike Arm's one
> >> (which returned -EOPNOTSUPP so far). Or these are unrelated?
> >
> > At the moment, x86 doesn't support refcounting for foreign mapping.
> > Hence the reason to restrict them to the hardware domain.
>
>
> Thank you for the pointers!
>
>
> I checked that all pages given to set_foreign_p2m_entry() belonged to a
> domain (at least in my use-case). I noticed two calls for acquiring
> resource at the DomU creation time, the first call was for grant table
> (single gfn)
> and the second for ioreq server which carried 2 gfns (for shared and
> buffered rings I assume). For the test purpose, I passed these gfns to
> get_page_from_gfn() in order to grab references on the pages, after that
> I tried to destroy DomU without calling put_page() for these pages. The
> fact that I couldn't destroy DomU completely (a zombie domain was
> observed) made me think that references were still taken, so worked as
> expected.
>
>
> I implemented a test patch (which uses approach from
> xenmem_add_to_physmap_one() for XENMAPSPACE_gmfn_foreign case) to check
> whether it would work.
>
>
> ---
> xen/arch/arm/p2m.c | 30 ++++++++++++++++++++++++++++++
> xen/common/memory.c | 2 +-
> xen/include/asm-arm/p2m.h | 12 ++----------
> 3 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index e9ccba8..7359715 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1385,6 +1385,36 @@ int guest_physmap_remove_page(struct domain *d,
> gfn_t gfn, mfn_t mfn,
> return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
> }
>
> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
> + unsigned long gfn, mfn_t mfn)
> +{
> + struct page_info *page;
> + p2m_type_t p2mt;
> + int rc;
> +
> + /*
> + * Take reference to the foreign domain page. Reference will be
> released
> + * in p2m_put_l3_page().
> + */
> + page = get_page_from_gfn(fd, gfn, &p2mt, P2M_ALLOC);
> + if ( !page )
> + return -EINVAL;
> +
> + if ( p2m_is_ram(p2mt) )
> + p2mt = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw :
> p2m_map_foreign_ro;
> + else
> + {
> + put_page(page);
> + return -EINVAL;
> + }
> +
> + rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2mt);
> + if ( rc )
> + put_page(page);
> +
> + return 0;
> +}
> +
> static struct page_info *p2m_allocate_root(void)
> {
> struct page_info *page;
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 8d9f0a8..1de1d4f 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1171,7 +1171,7 @@ static int acquire_resource(
>
> for ( i = 0; !rc && i < xmar.nr_frames; i++ )
> {
> - rc = set_foreign_p2m_entry(currd, gfn_list[i],
> + rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
> _mfn(mfn_list[i]));
> /* rc should be -EIO for any iteration other than the first
> */
> if ( rc && i )
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index 5823f11..53ce373 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn,
> unsigned int order)
> return gfn_add(gfn, 1UL << order);
> }
>
> -static inline int set_foreign_p2m_entry(struct domain *d, unsigned long
> gfn,
> - mfn_t mfn)
> -{
> - /*
> - * XXX: handle properly reference. It looks like the page may not
> always
> - * belong to d.
> - */
> -
> - return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
> -}
> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
> + unsigned long gfn, mfn_t mfn);
>
> /*
> * A vCPU has cache enabled only when the MMU is enabled and data cache
> --
> 2.7.4
>
>
> And with that patch applied I was facing a BUG when destroying/rebooting
> DomU. The call of put_page_alloc_ref() in hvm_free_ioreq_mfn() triggered
> that BUG:
>
>
> Rebooting domain 2
> root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at
> ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
> (XEN) ----[ Xen-4.14.0 arm64 debug=y Not tainted ]----
> (XEN) CPU: 3
> (XEN) PC: 0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c
> (XEN) LR: 0000000000246ef0
> (XEN) SP: 0000800725eafd80
> (XEN) CPSR: 60000249 MODE:64-bit EL2h (Hypervisor, handler)
> (XEN) X0: 0000000000000001 X1: 403fffffffffffff X2: 000000000000001f
> (XEN) X3: 0000000080000000 X4: 0000000000000000 X5: 0000000000400000
> (XEN) X6: 0000800725eafe24 X7: 0000ffffd1ef3e08 X8: 0000000000000020
> (XEN) X9: 0000000000000000 X10: 00e800008ecebf53 X11: 0400000000000000
> (XEN) X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 0000000000000001
> (XEN) X15: 0000000000000001 X16: 0000000000000029 X17: 0000ffff9badb3d0
> (XEN) X18: 000000000000010f X19: 0000000810e60e38 X20: 0000800725e68ec0
> (XEN) X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1
> (XEN) X24: ffff8000460ec280 X25: 0000000000000124 X26: 000000000000001d
> (XEN) X27: ffff000008ad1000 X28: ffff800052e65100 FP: ffff0000223dbd20
> (XEN)
> (XEN) VTCR_EL2: 80023558
> (XEN) VTTBR_EL2: 0002000765f04000
> (XEN)
> (XEN) SCTLR_EL2: 30cd183d
> (XEN) HCR_EL2: 000000008078663f
> (XEN) TTBR0_EL2: 00000000781c5000
> (XEN)
> (XEN) ESR_EL2: f2000001
> (XEN) HPFAR_EL2: 0000000000030010
> (XEN) FAR_EL2: ffff000008005f00
> (XEN)
> (XEN) Xen stack trace from sp=0000800725eafd80:
> (XEN) 0000800725e68ec0 0000000000247078 00008004dc040000
> 00000000002477c8
> (XEN) ffffffffffffffea 0000000000000001 ffff8000460ec500
> 0000000000000002
> (XEN) 000000000024645c 00000000002462dc 0000800725eafeb0
> 0000800725eafeb0
> (XEN) 0000800725eaff30 0000000060000145 000000000027882c
> 0000800725eafeb0
> (XEN) 0000800725eafeb0 01ff00000935de80 00008004dc040000
> 0000000000000006
> (XEN) ffff800000000000 0000000000000002 000000005a000ea1
> 000000019bc60002
> (XEN) 0000ffffd1ef3e08 0000000000000020 0000000000000004
> 000000000027c7d8
> (XEN) 000000005a000ea1 0000800725eafeb0 000000005a000ea1
> 0000000000279f98
> (XEN) 0000000000000000 ffff8000460ec200 0000800725eaffb8
> 0000000000262c58
> (XEN) 0000000000262c4c 07e0000160000249 0000000000000002
> 0000000000000001
> (XEN) ffff8000460ec500 ffff8000460ec508 ffff8000460ec208
> ffff800052e65100
> (XEN) 000000005060b478 0000ffffd20f3000 ffff7e00013c77e0
> 0000000000000000
> (XEN) 00e800008ecebf53 0400000000000000 ffff7e00013b3ac0
> 0000000000000002
> (XEN) 0000000000000001 0000000000000001 0000000000000029
> 0000ffff9badb3d0
> (XEN) 000000000000010f ffff8000460ec210 ffff8000460ec200
> ffff8000460ec210
> (XEN) 0000000000000001 ffff8000460ec500 ffff8000460ec280
> 0000000000000124
> (XEN) 000000000000001d ffff000008ad1000 ffff800052e65100
> ffff0000223dbd20
> (XEN) ffff000008537004 ffffffffffffffff ffff0000080c17e4
> 5a000ea160000145
> (XEN) 0000000060000000 0000000000000000 0000000000000000
> ffff800052e65100
> (XEN) ffff0000223dbd20 0000ffff9badb3dc 0000000000000000
> 0000000000000000
> (XEN) Xen call trace:
> (XEN) [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC)
> (XEN) [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR)
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) PSCI cpu off failed for CPU0 err=-3
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
>
>
> Either I did something wrong (most likely) or there is an issue with
> page ref-counting in the IOREQ code. I am still trying to understand
> what is going on.
> Some notes on that:
> 1. I checked that put_page() was called for these pages in
> p2m_put_l3_page() when destroying domain. This happened before
> hvm_free_ioreq_mfn() execution.
> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of
> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU couldn't
> be fully destroyed because of the reference taken.
>

I think I understand why BUG is triggered.

I checked "page->count_info & PGC_count_mask" and noticed
that get_page_from_gfn() doesn't seem to increase ref counter (but it
should?)

1. hvm_alloc_ioreq_mfn() -> ref 2
2. set_foreign_p2m_entry() -> ref still 2
3. p2m_put_l3_page() -> ref 1
4. hvm_free_ioreq_mfn() calls put_page_alloc_ref() with ref 1 which
triggers BUG


--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
On 13.08.20 23:36, Julien Grall wrote:

Hi Julien

>
>
> On 13/08/2020 19:41, Oleksandr wrote:
>> Rebooting domain 2
>> root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at
>> ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
>> (XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
>> (XEN) CPU:    3
>> (XEN) PC:     0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c
>> (XEN) LR:     0000000000246ef0
>> (XEN) SP:     0000800725eafd80
>> (XEN) CPSR:   60000249 MODE:64-bit EL2h (Hypervisor, handler)
>> (XEN)      X0: 0000000000000001  X1: 403fffffffffffff  X2:
>> 000000000000001f
>> (XEN)      X3: 0000000080000000  X4: 0000000000000000  X5:
>> 0000000000400000
>> (XEN)      X6: 0000800725eafe24  X7: 0000ffffd1ef3e08  X8:
>> 0000000000000020
>> (XEN)      X9: 0000000000000000 X10: 00e800008ecebf53 X11:
>> 0400000000000000
>> (XEN)     X12: ffff7e00013b3ac0 X13: 0000000000000002 X14:
>> 0000000000000001
>> (XEN)     X15: 0000000000000001 X16: 0000000000000029 X17:
>> 0000ffff9badb3d0
>> (XEN)     X18: 000000000000010f X19: 0000000810e60e38 X20:
>> 0000800725e68ec0
>> (XEN)     X21: 0000000000000000 X22: 00008004dc0404a0 X23:
>> 000000005a000ea1
>> (XEN)     X24: ffff8000460ec280 X25: 0000000000000124 X26:
>> 000000000000001d
>> (XEN)     X27: ffff000008ad1000 X28: ffff800052e65100  FP:
>> ffff0000223dbd20
>> (XEN)
>> (XEN)   VTCR_EL2: 80023558
>> (XEN)  VTTBR_EL2: 0002000765f04000
>> (XEN)
>> (XEN)  SCTLR_EL2: 30cd183d
>> (XEN)    HCR_EL2: 000000008078663f
>> (XEN)  TTBR0_EL2: 00000000781c5000
>> (XEN)
>> (XEN)    ESR_EL2: f2000001
>> (XEN)  HPFAR_EL2: 0000000000030010
>> (XEN)    FAR_EL2: ffff000008005f00
>> (XEN)
>> (XEN) Xen stack trace from sp=0000800725eafd80:
>> (XEN)    0000800725e68ec0 0000000000247078 00008004dc040000
>> 00000000002477c8
>> (XEN)    ffffffffffffffea 0000000000000001 ffff8000460ec500
>> 0000000000000002
>> (XEN)    000000000024645c 00000000002462dc 0000800725eafeb0
>> 0000800725eafeb0
>> (XEN)    0000800725eaff30 0000000060000145 000000000027882c
>> 0000800725eafeb0
>> (XEN)    0000800725eafeb0 01ff00000935de80 00008004dc040000
>> 0000000000000006
>> (XEN)    ffff800000000000 0000000000000002 000000005a000ea1
>> 000000019bc60002
>> (XEN)    0000ffffd1ef3e08 0000000000000020 0000000000000004
>> 000000000027c7d8
>> (XEN)    000000005a000ea1 0000800725eafeb0 000000005a000ea1
>> 0000000000279f98
>> (XEN)    0000000000000000 ffff8000460ec200 0000800725eaffb8
>> 0000000000262c58
>> (XEN)    0000000000262c4c 07e0000160000249 0000000000000002
>> 0000000000000001
>> (XEN)    ffff8000460ec500 ffff8000460ec508 ffff8000460ec208
>> ffff800052e65100
>> (XEN)    000000005060b478 0000ffffd20f3000 ffff7e00013c77e0
>> 0000000000000000
>> (XEN)    00e800008ecebf53 0400000000000000 ffff7e00013b3ac0
>> 0000000000000002
>> (XEN)    0000000000000001 0000000000000001 0000000000000029
>> 0000ffff9badb3d0
>> (XEN)    000000000000010f ffff8000460ec210 ffff8000460ec200
>> ffff8000460ec210
>> (XEN)    0000000000000001 ffff8000460ec500 ffff8000460ec280
>> 0000000000000124
>> (XEN)    000000000000001d ffff000008ad1000 ffff800052e65100
>> ffff0000223dbd20
>> (XEN)    ffff000008537004 ffffffffffffffff ffff0000080c17e4
>> 5a000ea160000145
>> (XEN)    0000000060000000 0000000000000000 0000000000000000
>> ffff800052e65100
>> (XEN)    ffff0000223dbd20 0000ffff9badb3dc 0000000000000000
>> 0000000000000000
>> (XEN) Xen call trace:
>> (XEN)    [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC)
>> (XEN)    [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR)
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 3:
>> (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) PSCI cpu off failed for CPU0 err=-3
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>>
>>
>>
>> Either I did something wrong (most likely) or there is an issue with
>> page ref-counting in the IOREQ code. I am still trying to understand
>> what is going on.
>
> At a first glance, the implement of set_foreign_p2m_entry() looks fine
> to me.
>
>> Some notes on that:
>> 1. I checked that put_page() was called for these pages in
>> p2m_put_l3_page() when destroying domain. This happened before
>> hvm_free_ioreq_mfn() execution.
>> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of
>> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU
>> couldn't be fully destroyed because of the reference taken.
>
> This definitely looks like a page reference issue. Would it be
> possible to print where the page reference are dropped? A WARN() in
> put_page() would help.
>
> To avoid a lot of message, I tend to use a global variable that store
> the page I want to watch.


Unfortunately it is unclear from the log who calls put_page() two times.
One of the call is made by p2m_put_l3_page() I assume, but who makes a
second call? Needs debugging.


Rebooting domain 2

root@generic-armv8-xt-dom0:~#

(XEN) put_page[1553] 0000000810e60e38 ---> ref = 3

(XEN) Xen WARN at mm.c:1554
(XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
(XEN) CPU:    2
(XEN) PC:     0000000000272a48 put_page+0xa0/0xc4
(XEN) LR:     0000000000272a48
(XEN) SP:     0000800725eaf990
(XEN) CPSR:   80000249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000310028  X1: 0000000000000001  X2: 0000800725ca4000
(XEN)      X3: 0000000000000020  X4: 0000000000000000  X5: 0000000000000020
(XEN)      X6: 0080808080808080  X7: fefefefefefeff09  X8: 7f7f7f7f7f7f7f7f
(XEN)      X9: 756e6c64513d313b X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101
(XEN)     X12: 0000000000000008 X13: 000000000028b7d0 X14: 0000800725eaf6e8
(XEN)     X15: 0000000000000020 X16: 0000000000000000 X17: 0000ffffb5eaf070
(XEN)     X18: 000000000000010f X19: 8040000000000003 X20: 0000000810e60e38
(XEN)     X21: 0000800725f07208 X22: 0000800725f07208 X23: 000000000051c041
(XEN)     X24: 0000000000000001 X25: 0000800725eafa78 X26: 000000000009c041
(XEN)     X27: 0000000000000000 X28: 0000000000000007  FP: ffff00002212bd50
(XEN)
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 0002000765f04000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000008078663f
(XEN)  TTBR0_EL2: 00000000781c5000
(XEN)
(XEN)    ESR_EL2: f2000001
(XEN)  HPFAR_EL2: 0000000000030010
(XEN)    FAR_EL2: ffff000008005f00
(XEN)
(XEN) Xen stack trace from sp=0000800725eaf990:
(XEN)    034000051c0417ff 0000000000000000 00000000002743e8 0000800725f07208
(XEN)    034000051c0417ff 0000000000000000 0000800725e6d208 0000800725f07208
(XEN)    0000000000000003 0000000000274910 0000000000000000 ffffffffffffffff
(XEN)    0000000000000001 000000000009c041 0000800725f07208 0000000000000000
(XEN)    0000000000000012 0000000000000007 0000000000000001 0000000000000009
(XEN)    0000000000274e00 0000800725f07208 ffffffffffffffff 0000000025eafb0c
(XEN)    0000800725f07000 ffff000008f86528 0000000000000000 0000000200000000
(XEN)    00000041000000e0 0000800725e6d000 0000000000000001 0000800725f07208
(XEN)    000000000009c041 0000000000000000 0000800725f07000 ffff000008f86528
(XEN)    ffff8000501b6540 ffff8000501b6550 ffff8000517bdb40 ffff800052784380
(XEN)    0000000000275770 0000800725eafb08 0000000000000000 0000000025f07000
(XEN)    fffffffffffffffe 0000800725f07000 0000000810e60e38 000000000021a930
(XEN)    0000000000236d18 0000800725f1b990 0000800725eafeb0 0000800725eafeb0
(XEN)    0000800725eaff30 00000000a0000145 000000005a000ea1 ffff000008f86528
(XEN)    0000800725f566a0 00000000002a9088 ffff00000814acc0 0000000a55bb542d
(XEN)    00000000002789d8 0000800725f1b990 00000000002b6cb8 0000800725f03920
(XEN)    00000000002b6cb8 0000000c9084be29 0000000000310228 0000800725eafbf0
(XEN)    ffff00000814f160 0000800725eafbe0 0000000000310228 0000800725eafbf0
(XEN)    0000000000310228 0000000100000002 00000000002b6cb8 0000000000237aa8
(XEN)    0000000000000002 0000000000000000 0000000000000000 0000800725eafc70
(XEN)    0000800725ecd6c8 0000800725ecddf0 00000000000000a0 0000000000000240
(XEN)    000000000026c920 0000800725ecd128 0000000000000002 0000000000000000
(XEN)    0000800725ecd000 0000000000000001 000000000026bfa8 0000000c90bf0b49
(XEN)    0000000000000002 0000000000000000 0000000000276fc4 0000000000000001
(XEN)    0000800725e64000 0000800725e68a60 0000800725e64000 0000800725f566a0
(XEN)    000000000023f53c 000000000027dc14 0000000000311390 0000000000346008
(XEN)    000000000027651c 0000800725eafdd8 0000000000240e44 0000000000000004
(XEN)    0000000000311390 0000000000240e80 0000800725e64000 0000000000240f20
(XEN)    ffff000022127ff0 000000000009c041 000000005a000ea1 ffff800011ae9800
(XEN)    0000000000000124 0000000000240f2c fffffffffffffff2 0000000000000001
(XEN)    0000800725e68ec0 0000800725e68ed0 000000000024694c 00008004dc040000
(XEN)    0000800725e68ec0 00008004dc040000 0000000000247c88 0000000000247c7c
(XEN)    ffffffffffffffea 0000000000000001 ffff800011ae9e80 0000000000000002
(XEN)    000000005a000ea1 0000ffffc52989c0 00000000002463b8 000000000024613c
(XEN)    0000800725eafeb0 0000800725eafeb0 0000800725eaff30 0000000060000145
(XEN)    00000000002789d8 0000800725eafeb0 0000800725eafeb0 01ff00000935de80
(XEN)    0000800725eca000 0000000000310280 0000000000000000 0000000000000000
(XEN)    000000005a000ea1 ffff800011ae9800 0000000000000124 000000000000001d
(XEN)    0000000000000004 000000000027c984 000000005a000ea1 0000800725eafeb0
(XEN)    000000005a000ea1 000000000027a144 0000000000000000 ffff80004e381f80
(XEN) Xen call trace:
(XEN)    [<0000000000272a48>] put_page+0xa0/0xc4 (PC)
(XEN)    [<0000000000272a48>] put_page+0xa0/0xc4 (LR)
(XEN)

(XEN) put_page[1553] 0000000810e60e38 ---> ref = 2

(XEN) Xen WARN at mm.c:1554
(XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
(XEN) CPU:    2
(XEN) PC:     0000000000272a48 put_page+0xa0/0xc4
(XEN) LR:     0000000000272a48
(XEN) SP:     0000800725eafaf0
(XEN) CPSR:   80000249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000310028  X1: 0000000000000000  X2: 0000800725ca4000
(XEN)      X3: 0000000000000020  X4: 0000000000000000  X5: 0000000000000021
(XEN)      X6: 0080808080808080  X7: fefefefefefeff09  X8: 7f7f7f7f7f7f7f7f
(XEN)      X9: 756e6c64513d313b X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101
(XEN)     X12: 0000000000000008 X13: 000000000028b7d0 X14: 0000800725eaf848
(XEN)     X15: 0000000000000020 X16: 0000000000000000 X17: 0000ffffb5eaf070
(XEN)     X18: 000000000000010f X19: 8040000000000002 X20: 0000000810e60e38
(XEN)     X21: 0000000810e60e38 X22: 0000000000000000 X23: 0000800725f07000
(XEN)     X24: ffff000008f86528 X25: ffff8000501b6540 X26: ffff8000501b6550
(XEN)     X27: ffff8000517bdb40 X28: ffff800052784380  FP: ffff00002212bd50
(XEN)
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 0002000765f04000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000008078663f
(XEN)  TTBR0_EL2: 00000000781c5000
(XEN)
(XEN)    ESR_EL2: f2000001
(XEN)  HPFAR_EL2: 0000000000030010
(XEN)    FAR_EL2: ffff000008005f00
(XEN)
(XEN) Xen stack trace from sp=0000800725eafaf0:
(XEN)    0000000000000000 0000800725f07000 000000000021a93c 000000000021a930
(XEN)    0000000000236d18 0000800725f1b990 0000800725eafeb0 0000800725eafeb0
(XEN)    0000800725eaff30 00000000a0000145 000000005a000ea1 ffff000008f86528
(XEN)    0000800725f566a0 00000000002a9088 ffff00000814acc0 0000000a55bb542d
(XEN)    00000000002789d8 0000800725f1b990 00000000002b6cb8 0000800725f03920
(XEN)    00000000002b6cb8 0000000c9084be29 0000000000310228 0000800725eafbf0
(XEN)    ffff00000814f160 0000800725eafbe0 0000000000310228 0000800725eafbf0
(XEN)    0000000000310228 0000000100000002 00000000002b6cb8 0000000000237aa8
(XEN)    0000000000000002 0000000000000000 0000000000000000 0000800725eafc70
(XEN)    0000800725ecd6c8 0000800725ecddf0 00000000000000a0 0000000000000240
(XEN)    000000000026c920 0000800725ecd128 0000000000000002 0000000000000000
(XEN)    0000800725ecd000 0000000000000001 000000000026bfa8 0000000c90bf0b49
(XEN)    0000000000000002 0000000000000000 0000000000276fc4 0000000000000001
(XEN)    0000800725e64000 0000800725e68a60 0000800725e64000 0000800725f566a0
(XEN)    000000000023f53c 000000000027dc14 0000000000311390 0000000000346008
(XEN)    000000000027651c 0000800725eafdd8 0000000000240e44 0000000000000004
(XEN)    0000000000311390 0000000000240e80 0000800725e64000 0000000000240f20
(XEN)    ffff000022127ff0 000000000009c041 000000005a000ea1 ffff800011ae9800
(XEN)    0000000000000124 0000000000240f2c fffffffffffffff2 0000000000000001
(XEN)    0000800725e68ec0 0000800725e68ed0 000000000024694c 00008004dc040000
(XEN)    0000800725e68ec0 00008004dc040000 0000000000247c88 0000000000247c7c
(XEN)    ffffffffffffffea 0000000000000001 ffff800011ae9e80 0000000000000002
(XEN)    000000005a000ea1 0000ffffc52989c0 00000000002463b8 000000000024613c
(XEN)    0000800725eafeb0 0000800725eafeb0 0000800725eaff30 0000000060000145
(XEN)    00000000002789d8 0000800725eafeb0 0000800725eafeb0 01ff00000935de80
(XEN)    0000800725eca000 0000000000310280 0000000000000000 0000000000000000
(XEN)    000000005a000ea1 ffff800011ae9800 0000000000000124 000000000000001d
(XEN)    0000000000000004 000000000027c984 000000005a000ea1 0000800725eafeb0
(XEN)    000000005a000ea1 000000000027a144 0000000000000000 ffff80004e381f80
(XEN)    0000800725eaffb8 0000000000262c58 0000000000262c4c 07e0000160000249
(XEN)    000000000000000f ffff00002212bd90 ffff80004e381f80 000000000009c041
(XEN)    ffff7dffff000000 0000ffffb603d000 0000000000000000 0000ffffb603c000
(XEN)    ffff8000521fdc08 0000000000000200 ffff8000517bdb60 0000000000000000
(XEN)    0000000000000000 0000ffffc529614f 000000000000001b 0000000000000001
(XEN)    000000000000000c 0000ffffb5eaf070 000000000000010f 0000000000000002
(XEN)    ffff80004e381f80 0000000000007ff0 ffff7e0000000000 0000000000000001
(XEN)    ffff000008f86528 ffff8000501b6540 ffff8000501b6550 ffff8000517bdb40
(XEN)    ffff800052784380 ffff00002212bd50 ffff000008537ab8 ffffffffffffffff
(XEN)    ffff0000080c1790 5a000ea1a0000145 0000000060000000 0000000000000000
(XEN)    0000000000000000 ffff800052784380 ffff00002212bd50 0000ffffb5eaf078
(XEN) Xen call trace:
(XEN)    [<0000000000272a48>] put_page+0xa0/0xc4 (PC)
(XEN)    [<0000000000272a48>] put_page+0xa0/0xc4 (LR)
(XEN)


(XEN) hvm_free_ioreq_mfn[417] ---> ref = 1

(XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
(XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
(XEN) CPU:    2
(XEN) PC:     0000000000246e2c ioreq.c#hvm_free_ioreq_mfn+0xbc/0xc0
(XEN) LR:     0000000000246dcc
(XEN) SP:     0000800725eafd70
(XEN) CPSR:   60000249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000000001  X1: 403fffffffffffff  X2: 000000000000001f
(XEN)      X3: 0000000080000000  X4: 0000000000000000  X5: 0000000000400000
(XEN)      X6: 0080808080808080  X7: fefefefefefeff09  X8: 7f7f7f7f7f7f7f7f
(XEN)      X9: 756e6c64513d313b X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101
(XEN)     X12: 0000000000000008 X13: 000000000028b7d0 X14: 0000800725eafac8
(XEN)     X15: 0000000000000020 X16: 0000000000000000 X17: 0000ffffb5eab3d0
(XEN)     X18: 000000000000010f X19: 0000000810e60e38 X20: 0000000810e60e48
(XEN)     X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1
(XEN)     X24: ffff800011ae9800 X25: 0000000000000124 X26: 000000000000001d
(XEN)     X27: ffff000008ad1000 X28: ffff800052784380  FP: ffff00002212bd20
(XEN)
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 0002000765f04000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000008078663f
(XEN)  TTBR0_EL2: 00000000781c5000
(XEN)
(XEN)    ESR_EL2: f2000001
(XEN)  HPFAR_EL2: 0000000000030010
(XEN)    FAR_EL2: ffff000008005f00
(XEN)
(XEN) Xen stack trace from sp=0000800725eafd70:
(XEN)    0000800725e68ec0 0000800725e68ec0 0000000000246f7c 0000800725e68ec0
(XEN)    00008004dc040000 00000000002476cc ffffffffffffffea 0000000000000001
(XEN)    ffff800011ae9e80 0000000000000002 00000000002462bc 000000000024613c
(XEN)    0000800725eafeb0 0000800725eafeb0 0000800725eaff30 0000000060000145
(XEN)    00000000002789d8 0000800725fb50a4 00000000ffffffff 0100000000277704
(XEN)    00008004dc040000 0000000000000006 0000000000000000 0000000000310288
(XEN)    0000000000000004 0000000125ec0002 0000ffffc52989a8 0000000000000020
(XEN)    0000000000000004 000000000027c984 000000005a000ea1 0000800725eafeb0
(XEN)    000000005a000ea1 000000000027a144 0000000000000000 ffff800011ae9100
(XEN)    0000800725eaffb8 0000000000262c58 0000000000262c4c 07e0000160000249
(XEN)    0000000000000002 0000000000000001 ffff800011ae9e80 ffff800011ae9e88
(XEN)    ffff800011ae9108 ffff800052784380 000000005068e148 0000ffffc5498000
(XEN)    ffff80005156ac10 0000000000000000 00e800008f88bf53 0400000000000000
(XEN)    ffff7e00013e22c0 0000000000000002 0000000000000001 0000000000000001
(XEN)    0000000000000029 0000ffffb5eab3d0 000000000000010f ffff800011ae9110
(XEN)    ffff800011ae9100 ffff800011ae9110 0000000000000001 ffff800011ae9e80
(XEN)    ffff800011ae9800 0000000000000124 000000000000001d ffff000008ad1000
(XEN)    ffff800052784380 ffff00002212bd20 ffff000008537004 ffffffffffffffff
(XEN)    ffff0000080c17e4 5a000ea160000145 0000000060000000 0000000000000000
(XEN)    0000000000000000 ffff800052784380 ffff00002212bd20 0000ffffb5eab3dc
(XEN)    0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<0000000000246e2c>] ioreq.c#hvm_free_ioreq_mfn+0xbc/0xc0 (PC)
(XEN)    [<0000000000246dcc>] ioreq.c#hvm_free_ioreq_mfn+0x5c/0xc0 (LR)
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) PSCI cpu off failed for CPU0 err=-3
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...




--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
On Thu, 13 Aug 2020 at 21:40, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
> Hi
>
> Sorry for the possible format issue.
>
> On Thu, Aug 13, 2020 at 9:42 PM Oleksandr <olekstysh@gmail.com> wrote:
>>
>>
>> On 11.08.20 20:50, Julien Grall wrote:
>>
>> Hi Julien
>>
>> >
>> >
>> > On 11/08/2020 18:09, Oleksandr wrote:
>> >>
>> >> On 05.08.20 12:32, Julien Grall wrote:
>> >>
>> >> Hi Julien, Stefano
>> >>
>> >>>
>> >>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>> >>>>> index 5fdb6e8..5823f11 100644
>> >>>>> --- a/xen/include/asm-arm/p2m.h
>> >>>>> +++ b/xen/include/asm-arm/p2m.h
>> >>>>> @@ -385,10 +385,11 @@ static inline int
>> >>>>> set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>> >>>>> mfn_t mfn)
>> >>>>> {
>> >>>>> /*
>> >>>>> - * NOTE: If this is implemented then proper reference
>> >>>>> counting of
>> >>>>> - * foreign entries will need to be implemented.
>> >>>>> + * XXX: handle properly reference. It looks like the page may
>> >>>>> not always
>> >>>>> + * belong to d.
>> >>>>
>> >>>> Just as a reference, and without taking away anything from the
>> >>>> comment,
>> >>>> I think that QEMU is doing its own internal reference counting for
>> >>>> these
>> >>>> mappings.
>> >>>
>> >>> I am not sure how this matters here? We can't really trust the DM to
>> >>> do the right thing if it is not running in dom0.
>> >>>
>> >>> But, IIRC, the problem is some of the pages doesn't belong to do a
>> >>> domain, so it is not possible to treat them as foreign mapping (e.g.
>> >>> you wouldn't be able to grab a reference). This investigation was
>> >>> done a couple of years ago, so this may have changed in recent Xen.
>> >>
>> >> Well, emulator is going to be used in driver domain, so this TODO
>> >> must be resolved. I suspect that the check for a hardware domain in
>> >> acquire_resource() which I skipped in a hackish way [1] could be
>> >> simply removed once proper reference counting is implemented in Xen,
>> >> correct?
>> >
>> > It depends how you are going to solve it. If you manage to solve it in
>> > a generic way, then yes you could resolve. If not (i.e. it is solved
>> > in an arch-specific way), we would need to keep the check on arch that
>> > are not able to deal with it. See more below.
>> >
>> >>
>> >> Could you please provide some pointers on that problem? Maybe some
>> >> questions need to be investigated again? Unfortunately, it is not
>> >> completely clear to me the direction to follow...
>> >>
>> >> ***
>> >> I am wondering whether the similar problem exists on x86 as well?
>> >
>> > It is somewhat different. On Arm, we are able to handle properly
>> > foreign mapping (i.e. mapping page from a another domain) as we would
>> > grab a reference on the page (see XENMAPSPACE_gmfn_foreign handling in
>> > xenmem_add_to_physmap()). The reference will then be released when the
>> > entry is removed from the P2M (see p2m_free_entry()).
>> >
>> > If all the pages given to set_foreign_p2m_entry() belong to a domain,
>> > then you could use the same approach.
>> >
>> > However, I remember to run into some issues in some of the cases. I
>> > had a quick looked at the caller and I wasn't able to find any use
>> > cases that may be an issue.
>> >
>> > The refcounting in the IOREQ code has changed after XSA-276 (this was
>> > found while working on the Arm port). Probably the best way to figure
>> > out if it works would be to try it and see if it fails.
>> >
>> > Note that set_foreign_p2m_entry() doesn't have a parameter for the
>> > foreign domain. You would need to add a extra parameter for this.
>> >
>> >> The FIXME tag (before checking for a hardware domain in
>> >> acquire_resource()) in the common code makes me think it is a common
>> >> issue. From other side x86's
>> >> implementation of set_foreign_p2m_entry() is exists unlike Arm's one
>> >> (which returned -EOPNOTSUPP so far). Or these are unrelated?
>> >
>> > At the moment, x86 doesn't support refcounting for foreign mapping.
>> > Hence the reason to restrict them to the hardware domain.
>>
>>
>> Thank you for the pointers!
>>
>>
>> I checked that all pages given to set_foreign_p2m_entry() belonged to a
>> domain (at least in my use-case). I noticed two calls for acquiring
>> resource at the DomU creation time, the first call was for grant table
>> (single gfn)
>> and the second for ioreq server which carried 2 gfns (for shared and
>> buffered rings I assume). For the test purpose, I passed these gfns to
>> get_page_from_gfn() in order to grab references on the pages, after that
>> I tried to destroy DomU without calling put_page() for these pages. The
>> fact that I couldn't destroy DomU completely (a zombie domain was
>> observed) made me think that references were still taken, so worked as
>> expected.
>>
>>
>> I implemented a test patch (which uses approach from
>> xenmem_add_to_physmap_one() for XENMAPSPACE_gmfn_foreign case) to check
>> whether it would work.
>>
>>
>> ---
>> xen/arch/arm/p2m.c | 30 ++++++++++++++++++++++++++++++
>> xen/common/memory.c | 2 +-
>> xen/include/asm-arm/p2m.h | 12 ++----------
>> 3 files changed, 33 insertions(+), 11 deletions(-)
>>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index e9ccba8..7359715 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1385,6 +1385,36 @@ int guest_physmap_remove_page(struct domain *d,
>> gfn_t gfn, mfn_t mfn,
>> return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
>> }
>>
>> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
>> + unsigned long gfn, mfn_t mfn)
>> +{
>> + struct page_info *page;
>> + p2m_type_t p2mt;
>> + int rc;
>> +
>> + /*
>> + * Take reference to the foreign domain page. Reference will be
>> released
>> + * in p2m_put_l3_page().
>> + */
>> + page = get_page_from_gfn(fd, gfn, &p2mt, P2M_ALLOC);
>> + if ( !page )
>> + return -EINVAL;
>> +
>> + if ( p2m_is_ram(p2mt) )
>> + p2mt = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw :
>> p2m_map_foreign_ro;
>> + else
>> + {
>> + put_page(page);
>> + return -EINVAL;
>> + }
>> +
>> + rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2mt);
>> + if ( rc )
>> + put_page(page);
>> +
>> + return 0;
>> +}
>> +
>> static struct page_info *p2m_allocate_root(void)
>> {
>> struct page_info *page;
>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>> index 8d9f0a8..1de1d4f 100644
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -1171,7 +1171,7 @@ static int acquire_resource(
>>
>> for ( i = 0; !rc && i < xmar.nr_frames; i++ )
>> {
>> - rc = set_foreign_p2m_entry(currd, gfn_list[i],
>> + rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
>> _mfn(mfn_list[i]));
>> /* rc should be -EIO for any iteration other than the first */
>> if ( rc && i )
>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>> index 5823f11..53ce373 100644
>> --- a/xen/include/asm-arm/p2m.h
>> +++ b/xen/include/asm-arm/p2m.h
>> @@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn,
>> unsigned int order)
>> return gfn_add(gfn, 1UL << order);
>> }
>>
>> -static inline int set_foreign_p2m_entry(struct domain *d, unsigned long
>> gfn,
>> - mfn_t mfn)
>> -{
>> - /*
>> - * XXX: handle properly reference. It looks like the page may not
>> always
>> - * belong to d.
>> - */
>> -
>> - return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
>> -}
>> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
>> + unsigned long gfn, mfn_t mfn);
>>
>> /*
>> * A vCPU has cache enabled only when the MMU is enabled and data cache
>> --
>> 2.7.4
>>
>>
>> And with that patch applied I was facing a BUG when destroying/rebooting
>> DomU. The call of put_page_alloc_ref() in hvm_free_ioreq_mfn() triggered
>> that BUG:
>>
>>
>> Rebooting domain 2
>> root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at
>> ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
>> (XEN) ----[ Xen-4.14.0 arm64 debug=y Not tainted ]----
>> (XEN) CPU: 3
>> (XEN) PC: 0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c
>> (XEN) LR: 0000000000246ef0
>> (XEN) SP: 0000800725eafd80
>> (XEN) CPSR: 60000249 MODE:64-bit EL2h (Hypervisor, handler)
>> (XEN) X0: 0000000000000001 X1: 403fffffffffffff X2: 000000000000001f
>> (XEN) X3: 0000000080000000 X4: 0000000000000000 X5: 0000000000400000
>> (XEN) X6: 0000800725eafe24 X7: 0000ffffd1ef3e08 X8: 0000000000000020
>> (XEN) X9: 0000000000000000 X10: 00e800008ecebf53 X11: 0400000000000000
>> (XEN) X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 0000000000000001
>> (XEN) X15: 0000000000000001 X16: 0000000000000029 X17: 0000ffff9badb3d0
>> (XEN) X18: 000000000000010f X19: 0000000810e60e38 X20: 0000800725e68ec0
>> (XEN) X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1
>> (XEN) X24: ffff8000460ec280 X25: 0000000000000124 X26: 000000000000001d
>> (XEN) X27: ffff000008ad1000 X28: ffff800052e65100 FP: ffff0000223dbd20
>> (XEN)
>> (XEN) VTCR_EL2: 80023558
>> (XEN) VTTBR_EL2: 0002000765f04000
>> (XEN)
>> (XEN) SCTLR_EL2: 30cd183d
>> (XEN) HCR_EL2: 000000008078663f
>> (XEN) TTBR0_EL2: 00000000781c5000
>> (XEN)
>> (XEN) ESR_EL2: f2000001
>> (XEN) HPFAR_EL2: 0000000000030010
>> (XEN) FAR_EL2: ffff000008005f00
>> (XEN)
>> (XEN) Xen stack trace from sp=0000800725eafd80:
>> (XEN) 0000800725e68ec0 0000000000247078 00008004dc040000 00000000002477c8
>> (XEN) ffffffffffffffea 0000000000000001 ffff8000460ec500 0000000000000002
>> (XEN) 000000000024645c 00000000002462dc 0000800725eafeb0 0000800725eafeb0
>> (XEN) 0000800725eaff30 0000000060000145 000000000027882c 0000800725eafeb0
>> (XEN) 0000800725eafeb0 01ff00000935de80 00008004dc040000 0000000000000006
>> (XEN) ffff800000000000 0000000000000002 000000005a000ea1 000000019bc60002
>> (XEN) 0000ffffd1ef3e08 0000000000000020 0000000000000004 000000000027c7d8
>> (XEN) 000000005a000ea1 0000800725eafeb0 000000005a000ea1 0000000000279f98
>> (XEN) 0000000000000000 ffff8000460ec200 0000800725eaffb8 0000000000262c58
>> (XEN) 0000000000262c4c 07e0000160000249 0000000000000002 0000000000000001
>> (XEN) ffff8000460ec500 ffff8000460ec508 ffff8000460ec208 ffff800052e65100
>> (XEN) 000000005060b478 0000ffffd20f3000 ffff7e00013c77e0 0000000000000000
>> (XEN) 00e800008ecebf53 0400000000000000 ffff7e00013b3ac0 0000000000000002
>> (XEN) 0000000000000001 0000000000000001 0000000000000029 0000ffff9badb3d0
>> (XEN) 000000000000010f ffff8000460ec210 ffff8000460ec200 ffff8000460ec210
>> (XEN) 0000000000000001 ffff8000460ec500 ffff8000460ec280 0000000000000124
>> (XEN) 000000000000001d ffff000008ad1000 ffff800052e65100 ffff0000223dbd20
>> (XEN) ffff000008537004 ffffffffffffffff ffff0000080c17e4 5a000ea160000145
>> (XEN) 0000000060000000 0000000000000000 0000000000000000 ffff800052e65100
>> (XEN) ffff0000223dbd20 0000ffff9badb3dc 0000000000000000 0000000000000000
>> (XEN) Xen call trace:
>> (XEN) [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC)
>> (XEN) [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR)
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 3:
>> (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) PSCI cpu off failed for CPU0 err=-3
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>>
>>
>>
>> Either I did something wrong (most likely) or there is an issue with
>> page ref-counting in the IOREQ code. I am still trying to understand
>> what is going on.
>> Some notes on that:
>> 1. I checked that put_page() was called for these pages in
>> p2m_put_l3_page() when destroying domain. This happened before
>> hvm_free_ioreq_mfn() execution.
>> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of
>> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU couldn't
>> be fully destroyed because of the reference taken.
>
>
> I think I understand why BUG is triggered.
>
> I checked "page->count_info & PGC_count_mask" and noticed that get_page_from_gfn() doesn't seem to increase ref counter (but it should?)
>
> 1. hvm_alloc_ioreq_mfn() -> ref 2
> 2. set_foreign_p2m_entry() -> ref still 2
> 3. p2m_put_l3_page() -> ref 1
> 4. hvm_free_ioreq_mfn() calls put_page_alloc_ref() with ref 1 which triggers BUG

I looked again at your diff. It is actually not doing the right thing.
The parameter 'gfn' is a physical frame from 'd' (your current domain)
not 'fd'.
So you will end up grabbing a reference count on the wrong page. You
are quite lucky the 'gfn' is also valid for your foreign domain.

But in this case, you already have the MFN in hand. So what you want
to do is something like:

if (!get_page(mfn_to_page(mfn), fd))
return -EINVAL;

/* Map page */
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
On 14.08.20 01:14, Julien Grall wrote:

Hi Julien

> On Thu, 13 Aug 2020 at 21:40, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>>
>> Hi
>>
>> Sorry for the possible format issue.
>>
>> On Thu, Aug 13, 2020 at 9:42 PM Oleksandr <olekstysh@gmail.com> wrote:
>>>
>>> On 11.08.20 20:50, Julien Grall wrote:
>>>
>>> Hi Julien
>>>
>>>>
>>>> On 11/08/2020 18:09, Oleksandr wrote:
>>>>> On 05.08.20 12:32, Julien Grall wrote:
>>>>>
>>>>> Hi Julien, Stefano
>>>>>
>>>>>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>>>>>>>> index 5fdb6e8..5823f11 100644
>>>>>>>> --- a/xen/include/asm-arm/p2m.h
>>>>>>>> +++ b/xen/include/asm-arm/p2m.h
>>>>>>>> @@ -385,10 +385,11 @@ static inline int
>>>>>>>> set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>>>>>>>> mfn_t mfn)
>>>>>>>> {
>>>>>>>> /*
>>>>>>>> - * NOTE: If this is implemented then proper reference
>>>>>>>> counting of
>>>>>>>> - * foreign entries will need to be implemented.
>>>>>>>> + * XXX: handle properly reference. It looks like the page may
>>>>>>>> not always
>>>>>>>> + * belong to d.
>>>>>>> Just as a reference, and without taking away anything from the
>>>>>>> comment,
>>>>>>> I think that QEMU is doing its own internal reference counting for
>>>>>>> these
>>>>>>> mappings.
>>>>>> I am not sure how this matters here? We can't really trust the DM to
>>>>>> do the right thing if it is not running in dom0.
>>>>>>
>>>>>> But, IIRC, the problem is some of the pages doesn't belong to do a
>>>>>> domain, so it is not possible to treat them as foreign mapping (e.g.
>>>>>> you wouldn't be able to grab a reference). This investigation was
>>>>>> done a couple of years ago, so this may have changed in recent Xen.
>>>>> Well, emulator is going to be used in driver domain, so this TODO
>>>>> must be resolved. I suspect that the check for a hardware domain in
>>>>> acquire_resource() which I skipped in a hackish way [1] could be
>>>>> simply removed once proper reference counting is implemented in Xen,
>>>>> correct?
>>>> It depends how you are going to solve it. If you manage to solve it in
>>>> a generic way, then yes you could resolve. If not (i.e. it is solved
>>>> in an arch-specific way), we would need to keep the check on arch that
>>>> are not able to deal with it. See more below.
>>>>
>>>>> Could you please provide some pointers on that problem? Maybe some
>>>>> questions need to be investigated again? Unfortunately, it is not
>>>>> completely clear to me the direction to follow...
>>>>>
>>>>> ***
>>>>> I am wondering whether the similar problem exists on x86 as well?
>>>> It is somewhat different. On Arm, we are able to handle properly
>>>> foreign mapping (i.e. mapping page from a another domain) as we would
>>>> grab a reference on the page (see XENMAPSPACE_gmfn_foreign handling in
>>>> xenmem_add_to_physmap()). The reference will then be released when the
>>>> entry is removed from the P2M (see p2m_free_entry()).
>>>>
>>>> If all the pages given to set_foreign_p2m_entry() belong to a domain,
>>>> then you could use the same approach.
>>>>
>>>> However, I remember to run into some issues in some of the cases. I
>>>> had a quick looked at the caller and I wasn't able to find any use
>>>> cases that may be an issue.
>>>>
>>>> The refcounting in the IOREQ code has changed after XSA-276 (this was
>>>> found while working on the Arm port). Probably the best way to figure
>>>> out if it works would be to try it and see if it fails.
>>>>
>>>> Note that set_foreign_p2m_entry() doesn't have a parameter for the
>>>> foreign domain. You would need to add a extra parameter for this.
>>>>
>>>>> The FIXME tag (before checking for a hardware domain in
>>>>> acquire_resource()) in the common code makes me think it is a common
>>>>> issue. From other side x86's
>>>>> implementation of set_foreign_p2m_entry() is exists unlike Arm's one
>>>>> (which returned -EOPNOTSUPP so far). Or these are unrelated?
>>>> At the moment, x86 doesn't support refcounting for foreign mapping.
>>>> Hence the reason to restrict them to the hardware domain.
>>>
>>> Thank you for the pointers!
>>>
>>>
>>> I checked that all pages given to set_foreign_p2m_entry() belonged to a
>>> domain (at least in my use-case). I noticed two calls for acquiring
>>> resource at the DomU creation time, the first call was for grant table
>>> (single gfn)
>>> and the second for ioreq server which carried 2 gfns (for shared and
>>> buffered rings I assume). For the test purpose, I passed these gfns to
>>> get_page_from_gfn() in order to grab references on the pages, after that
>>> I tried to destroy DomU without calling put_page() for these pages. The
>>> fact that I couldn't destroy DomU completely (a zombie domain was
>>> observed) made me think that references were still taken, so worked as
>>> expected.
>>>
>>>
>>> I implemented a test patch (which uses approach from
>>> xenmem_add_to_physmap_one() for XENMAPSPACE_gmfn_foreign case) to check
>>> whether it would work.
>>>
>>>
>>> ---
>>> xen/arch/arm/p2m.c | 30 ++++++++++++++++++++++++++++++
>>> xen/common/memory.c | 2 +-
>>> xen/include/asm-arm/p2m.h | 12 ++----------
>>> 3 files changed, 33 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>>> index e9ccba8..7359715 100644
>>> --- a/xen/arch/arm/p2m.c
>>> +++ b/xen/arch/arm/p2m.c
>>> @@ -1385,6 +1385,36 @@ int guest_physmap_remove_page(struct domain *d,
>>> gfn_t gfn, mfn_t mfn,
>>> return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
>>> }
>>>
>>> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
>>> + unsigned long gfn, mfn_t mfn)
>>> +{
>>> + struct page_info *page;
>>> + p2m_type_t p2mt;
>>> + int rc;
>>> +
>>> + /*
>>> + * Take reference to the foreign domain page. Reference will be
>>> released
>>> + * in p2m_put_l3_page().
>>> + */
>>> + page = get_page_from_gfn(fd, gfn, &p2mt, P2M_ALLOC);
>>> + if ( !page )
>>> + return -EINVAL;
>>> +
>>> + if ( p2m_is_ram(p2mt) )
>>> + p2mt = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw :
>>> p2m_map_foreign_ro;
>>> + else
>>> + {
>>> + put_page(page);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2mt);
>>> + if ( rc )
>>> + put_page(page);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> static struct page_info *p2m_allocate_root(void)
>>> {
>>> struct page_info *page;
>>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>>> index 8d9f0a8..1de1d4f 100644
>>> --- a/xen/common/memory.c
>>> +++ b/xen/common/memory.c
>>> @@ -1171,7 +1171,7 @@ static int acquire_resource(
>>>
>>> for ( i = 0; !rc && i < xmar.nr_frames; i++ )
>>> {
>>> - rc = set_foreign_p2m_entry(currd, gfn_list[i],
>>> + rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
>>> _mfn(mfn_list[i]));
>>> /* rc should be -EIO for any iteration other than the first */
>>> if ( rc && i )
>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>>> index 5823f11..53ce373 100644
>>> --- a/xen/include/asm-arm/p2m.h
>>> +++ b/xen/include/asm-arm/p2m.h
>>> @@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn,
>>> unsigned int order)
>>> return gfn_add(gfn, 1UL << order);
>>> }
>>>
>>> -static inline int set_foreign_p2m_entry(struct domain *d, unsigned long
>>> gfn,
>>> - mfn_t mfn)
>>> -{
>>> - /*
>>> - * XXX: handle properly reference. It looks like the page may not
>>> always
>>> - * belong to d.
>>> - */
>>> -
>>> - return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
>>> -}
>>> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
>>> + unsigned long gfn, mfn_t mfn);
>>>
>>> /*
>>> * A vCPU has cache enabled only when the MMU is enabled and data cache
>>> --
>>> 2.7.4
>>>
>>>
>>> And with that patch applied I was facing a BUG when destroying/rebooting
>>> DomU. The call of put_page_alloc_ref() in hvm_free_ioreq_mfn() triggered
>>> that BUG:
>>>
>>>
>>> Rebooting domain 2
>>> root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at
>>> ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
>>> (XEN) ----[ Xen-4.14.0 arm64 debug=y Not tainted ]----
>>> (XEN) CPU: 3
>>> (XEN) PC: 0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c
>>> (XEN) LR: 0000000000246ef0
>>> (XEN) SP: 0000800725eafd80
>>> (XEN) CPSR: 60000249 MODE:64-bit EL2h (Hypervisor, handler)
>>> (XEN) X0: 0000000000000001 X1: 403fffffffffffff X2: 000000000000001f
>>> (XEN) X3: 0000000080000000 X4: 0000000000000000 X5: 0000000000400000
>>> (XEN) X6: 0000800725eafe24 X7: 0000ffffd1ef3e08 X8: 0000000000000020
>>> (XEN) X9: 0000000000000000 X10: 00e800008ecebf53 X11: 0400000000000000
>>> (XEN) X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 0000000000000001
>>> (XEN) X15: 0000000000000001 X16: 0000000000000029 X17: 0000ffff9badb3d0
>>> (XEN) X18: 000000000000010f X19: 0000000810e60e38 X20: 0000800725e68ec0
>>> (XEN) X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1
>>> (XEN) X24: ffff8000460ec280 X25: 0000000000000124 X26: 000000000000001d
>>> (XEN) X27: ffff000008ad1000 X28: ffff800052e65100 FP: ffff0000223dbd20
>>> (XEN)
>>> (XEN) VTCR_EL2: 80023558
>>> (XEN) VTTBR_EL2: 0002000765f04000
>>> (XEN)
>>> (XEN) SCTLR_EL2: 30cd183d
>>> (XEN) HCR_EL2: 000000008078663f
>>> (XEN) TTBR0_EL2: 00000000781c5000
>>> (XEN)
>>> (XEN) ESR_EL2: f2000001
>>> (XEN) HPFAR_EL2: 0000000000030010
>>> (XEN) FAR_EL2: ffff000008005f00
>>> (XEN)
>>> (XEN) Xen stack trace from sp=0000800725eafd80:
>>> (XEN) 0000800725e68ec0 0000000000247078 00008004dc040000 00000000002477c8
>>> (XEN) ffffffffffffffea 0000000000000001 ffff8000460ec500 0000000000000002
>>> (XEN) 000000000024645c 00000000002462dc 0000800725eafeb0 0000800725eafeb0
>>> (XEN) 0000800725eaff30 0000000060000145 000000000027882c 0000800725eafeb0
>>> (XEN) 0000800725eafeb0 01ff00000935de80 00008004dc040000 0000000000000006
>>> (XEN) ffff800000000000 0000000000000002 000000005a000ea1 000000019bc60002
>>> (XEN) 0000ffffd1ef3e08 0000000000000020 0000000000000004 000000000027c7d8
>>> (XEN) 000000005a000ea1 0000800725eafeb0 000000005a000ea1 0000000000279f98
>>> (XEN) 0000000000000000 ffff8000460ec200 0000800725eaffb8 0000000000262c58
>>> (XEN) 0000000000262c4c 07e0000160000249 0000000000000002 0000000000000001
>>> (XEN) ffff8000460ec500 ffff8000460ec508 ffff8000460ec208 ffff800052e65100
>>> (XEN) 000000005060b478 0000ffffd20f3000 ffff7e00013c77e0 0000000000000000
>>> (XEN) 00e800008ecebf53 0400000000000000 ffff7e00013b3ac0 0000000000000002
>>> (XEN) 0000000000000001 0000000000000001 0000000000000029 0000ffff9badb3d0
>>> (XEN) 000000000000010f ffff8000460ec210 ffff8000460ec200 ffff8000460ec210
>>> (XEN) 0000000000000001 ffff8000460ec500 ffff8000460ec280 0000000000000124
>>> (XEN) 000000000000001d ffff000008ad1000 ffff800052e65100 ffff0000223dbd20
>>> (XEN) ffff000008537004 ffffffffffffffff ffff0000080c17e4 5a000ea160000145
>>> (XEN) 0000000060000000 0000000000000000 0000000000000000 ffff800052e65100
>>> (XEN) ffff0000223dbd20 0000ffff9badb3dc 0000000000000000 0000000000000000
>>> (XEN) Xen call trace:
>>> (XEN) [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC)
>>> (XEN) [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR)
>>> (XEN)
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 3:
>>> (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
>>> (XEN) ****************************************
>>> (XEN)
>>> (XEN) Reboot in five seconds...
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) PSCI cpu off failed for CPU0 err=-3
>>> (XEN) ****************************************
>>> (XEN)
>>> (XEN) Reboot in five seconds...
>>>
>>>
>>>
>>> Either I did something wrong (most likely) or there is an issue with
>>> page ref-counting in the IOREQ code. I am still trying to understand
>>> what is going on.
>>> Some notes on that:
>>> 1. I checked that put_page() was called for these pages in
>>> p2m_put_l3_page() when destroying domain. This happened before
>>> hvm_free_ioreq_mfn() execution.
>>> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of
>>> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU couldn't
>>> be fully destroyed because of the reference taken.
>>
>> I think I understand why BUG is triggered.
>>
>> I checked "page->count_info & PGC_count_mask" and noticed that get_page_from_gfn() doesn't seem to increase ref counter (but it should?)
>>
>> 1. hvm_alloc_ioreq_mfn() -> ref 2
>> 2. set_foreign_p2m_entry() -> ref still 2
>> 3. p2m_put_l3_page() -> ref 1
>> 4. hvm_free_ioreq_mfn() calls put_page_alloc_ref() with ref 1 which triggers BUG
> I looked again at your diff. It is actually not doing the right thing.
> The parameter 'gfn' is a physical frame from 'd' (your current domain)
> not 'fd'.
> So you will end up grabbing a reference count on the wrong page. You
> are quite lucky the 'gfn' is also valid for your foreign domain.
>
> But in this case, you already have the MFN in hand. So what you want
> to do is something like:
>
> if (!get_page(mfn_to_page(mfn), fd))
> return -EINVAL;
>
> /* Map page */


Indeed, thank you for the pointer. Now it is clear for me what went
wrong. With the proposed change I didn't face any issues in my setup!


BTW, below the IOREQ server page life-cycle which looks correct.

create domain:

(XEN) 0000000810e60e38(0->1): hvm_alloc_ioreq_mfn() -> alloc_domheap_page()
(XEN) 0000000810e60e38(1->2): hvm_alloc_ioreq_mfn() -> get_page_and_type()
(XEN) 0000000810e60e38(2->3): acquire_resource() ->
set_foreign_p2m_entry() -> get_page()

reboot domain:

(XEN) 0000000810e60e38(3->4): do_memory_op() -> get_page_from_gfn()
(XEN) 0000000810e60e38(4->3): do_memory_op() ->
guest_physmap_remove_page() -> p2m_put_l3_page() -> put_page()
(XEN) 0000000810e60e38(3->2): do_memory_op() -> put_page()
(XEN) 0000000810e60e38(2->1): hvm_free_ioreq_mfn() -> put_page_alloc_ref()
(XEN) 0000000810e60e38(1->0): hvm_free_ioreq_mfn() -> put_page_and_type()


---
 xen/arch/arm/p2m.c        | 16 ++++++++++++++++
 xen/common/memory.c       |  2 +-
 xen/include/asm-arm/p2m.h | 12 ++----------
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index e9ccba8..4c99dd6 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1385,6 +1385,22 @@ int guest_physmap_remove_page(struct domain *d,
gfn_t gfn, mfn_t mfn,
     return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
 }

+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
+{
+    struct page_info *page = mfn_to_page(mfn);
+    int rc;
+
+    if ( !get_page(page, fd) )
+        return -EINVAL;
+
+    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_map_foreign_rw);
+    if ( rc )
+        put_page(page);
+
+    return 0;
+}
+
 static struct page_info *p2m_allocate_root(void)
 {
     struct page_info *page;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 8d9f0a8..1de1d4f 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1171,7 +1171,7 @@ static int acquire_resource(

         for ( i = 0; !rc && i < xmar.nr_frames; i++ )
         {
-            rc = set_foreign_p2m_entry(currd, gfn_list[i],
+            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
                                        _mfn(mfn_list[i]));
             /* rc should be -EIO for any iteration other than the first */
             if ( rc && i )
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 5823f11..53ce373 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn,
unsigned int order)
     return gfn_add(gfn, 1UL << order);
 }

-static inline int set_foreign_p2m_entry(struct domain *d, unsigned long
gfn,
-                                        mfn_t mfn)
-{
-    /*
-     * XXX: handle properly reference. It looks like the page may not
always
-     * belong to d.
-     */
-
-    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
-}
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn,  mfn_t mfn);

 /*
  * A vCPU has cache enabled only when the MMU is enabled and data cache
--
2.7.4



--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
Hi,

On 04/08/2020 15:01, Julien Grall wrote:
> On 04/08/2020 08:49, Paul Durrant wrote:
>>> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
>>> index 931404c..b5fc066 100644
>>> --- a/tools/libxc/xc_dom_arm.c
>>> +++ b/tools/libxc/xc_dom_arm.c
>>> @@ -26,11 +26,19 @@
>>>   #include "xg_private.h"
>>>   #include "xc_dom.h"
>>>
>>> -#define NR_MAGIC_PAGES 4
>>> +
>>>   #define CONSOLE_PFN_OFFSET 0
>>>   #define XENSTORE_PFN_OFFSET 1
>>>   #define MEMACCESS_PFN_OFFSET 2
>>>   #define VUART_PFN_OFFSET 3
>>> +#define IOREQ_SERVER_PFN_OFFSET 4
>>> +
>>> +#define NR_IOREQ_SERVER_PAGES 8
>>> +#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES)
>>> +
>>> +#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT)
>>> +
>>> +#define special_pfn(x)  (GUEST_MAGIC_BASE_PFN + (x))
>>
>> Why introduce 'magic pages' for Arm? It's quite a horrible hack that
>> we have begun to do away with by adding resource mapping.
>
> This would require us to mandate at least Linux 4.17 in a domain that
> will run an IOREQ server. If we don't mandate this, the minimum version
> would be 4.10 where DM OP was introduced.
>
> Because of XSA-300, we could technically not safely run an IOREQ server
> with existing Linux. So it is probably OK to enforce the use of the
> acquire interface.
One more thing. We are using atomic operations on the IOREQ pages. As
our implementation is based on LL/SC instructions so far, we have
mitigation in place to prevent a domain DoS Xen. However, this relies on
the page to be mapped in a single domain at the time.

AFAICT, with the legacy interface, the pages will be mapped in both the
target and the emulator. So this would defeat the mitigation we have in
place.

Because the legacy interface is relying on foreign mapping, the page has
to be mapped in the target P2M. It might be possible to restrict the
access for the target by setting the p2m bits r, w to 0. This would
still allow the foreign mapping to work as we only check the p2m type
during mapping.

Anyway, I think we agreed that we want to avoid to introduce the legacy
interface. But I wanted to answer just for completeness and keep a
record of potential pitfalls with the legacy interface on Arm.

>
> Note that I haven't yet looked at the rest of the series. So I am not
> sure if there is more work necessary to enable it.
>
> Cheers,
>

--
Julien Grall
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
On 15.08.20 20:56, Julien Grall wrote:

Hi Julien.

> Hi,
>
> On 04/08/2020 15:01, Julien Grall wrote:
>> On 04/08/2020 08:49, Paul Durrant wrote:
>>>> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
>>>> index 931404c..b5fc066 100644
>>>> --- a/tools/libxc/xc_dom_arm.c
>>>> +++ b/tools/libxc/xc_dom_arm.c
>>>> @@ -26,11 +26,19 @@
>>>>   #include "xg_private.h"
>>>>   #include "xc_dom.h"
>>>>
>>>> -#define NR_MAGIC_PAGES 4
>>>> +
>>>>   #define CONSOLE_PFN_OFFSET 0
>>>>   #define XENSTORE_PFN_OFFSET 1
>>>>   #define MEMACCESS_PFN_OFFSET 2
>>>>   #define VUART_PFN_OFFSET 3
>>>> +#define IOREQ_SERVER_PFN_OFFSET 4
>>>> +
>>>> +#define NR_IOREQ_SERVER_PAGES 8
>>>> +#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES)
>>>> +
>>>> +#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT)
>>>> +
>>>> +#define special_pfn(x)  (GUEST_MAGIC_BASE_PFN + (x))
>>>
>>> Why introduce 'magic pages' for Arm? It's quite a horrible hack that
>>> we have begun to do away with by adding resource mapping.
>>
>> This would require us to mandate at least Linux 4.17 in a domain that
>> will run an IOREQ server. If we don't mandate this, the minimum
>> version would be 4.10 where DM OP was introduced.
>>
>> Because of XSA-300, we could technically not safely run an IOREQ
>> server with existing Linux. So it is probably OK to enforce the use
>> of the acquire interface.
> One more thing. We are using atomic operations on the IOREQ pages. As
> our implementation is based on LL/SC instructions so far, we have
> mitigation in place to prevent a domain DoS Xen. However, this relies
> on the page to be mapped in a single domain at the time.
>
> AFAICT, with the legacy interface, the pages will be mapped in both
> the target and the emulator. So this would defeat the mitigation we
> have in place.
>
> Because the legacy interface is relying on foreign mapping, the page
> has to be mapped in the target P2M. It might be possible to restrict
> the access for the target by setting the p2m bits r, w to 0. This
> would still allow the foreign mapping to work as we only check the p2m
> type during mapping.
>
> Anyway, I think we agreed that we want to avoid to introduce the
> legacy interface. But I wanted to answer just for completeness and
> keep a record of potential pitfalls with the legacy interface on Arm.
ok, the HVMOP plumbing on Arm will be dropped for non-RFC series. It
seems that xenforeignmemory_map_resource() does needed things. Of
course, the corresponding Linux patch to support
IOCTL_PRIVCMD_MMAP_RESOURCE was cherry-picked for that purpose (I am
currently using v4.14).

Thank you.


--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
Hello all.


I would like to clarify some questions based on the comments for the
patch series. I put them together (please see below).


On 06.08.20 14:29, Jan Beulich wrote:
> On 06.08.2020 13:08, Julien Grall wrote:
>> On 05/08/2020 20:30, Oleksandr wrote:
>>> I was thinking how to split handle_hvm_io_completion()
>>> gracefully but I failed find a good solution for that, so decided to add
>>> two stubs (msix_write_completion and handle_realmode_completion) on Arm.
>>> I could add a comment describing why they are here if appropriate. But
>>> if you think they shouldn't be called from the common code in any way, I
>>> will try to split it.
>> I am not entirely sure what msix_write_completion is meant to do on x86.
>> Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help?
> Due to the split brain model of handling PCI pass-through (between
> Xen and qemu), a guest writing to an MSI-X entry needs this write
> handed to qemu, and upon completion of the write there Xen also
> needs to take some extra action.


1. Regarding common handle_hvm_io_completion() implementation:

Could msix_write_completion() be called later on so we would be able to
split handle_hvm_io_completion() gracefully or could we call it from
handle_mmio()?
The reason I am asking is to avoid calling it from the common code in
order to avoid introducing stub on Arm which is not going to be ever
implemented
(if msix_write_completion() is purely x86 material).

For the non-RFC patch series I moved handle_realmode_completion to the
x86 code and now my local implementation looks like:

bool handle_hvm_io_completion(struct vcpu *v)
{
    struct domain *d = v->domain;
    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
    struct hvm_ioreq_server *s;
    struct hvm_ioreq_vcpu *sv;
    enum hvm_io_completion io_completion;

    if ( has_vpci(d) && vpci_process_pending(v) )
    {
        raise_softirq(SCHEDULE_SOFTIRQ);
        return false;
    }

    sv = get_pending_vcpu(v, &s);
    if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
        return false;

    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
        STATE_IORESP_READY : STATE_IOREQ_NONE;

    msix_write_completion(v);
    vcpu_end_shutdown_deferral(v);

    io_completion = vio->io_completion;
    vio->io_completion = HVMIO_no_completion;

    switch ( io_completion )
    {
    case HVMIO_no_completion:
        break;

    case HVMIO_mmio_completion:
        return handle_mmio();

    case HVMIO_pio_completion:
        return handle_pio(vio->io_req.addr, vio->io_req.size,
                          vio->io_req.dir);

    default:
        return arch_handle_hvm_io_completion(io_completion);
    }

    return true;
}

2. Regarding renaming common handle_mmio() to ioreq_handle_complete_mmio():

There was a request to consider renaming that function which is called
from the common code in the context of IOREQ series.
The point is, that the name of the function is pretty generic and can be
confusing on Arm (we already have a try_handle_mmio()).
I noticed that except common code that function is called from a few
places on x86 (I am not even sure whether all of them are IOREQ related).
The question is would x86 folks be happy with such renaming?

Alternatively I could provide the following in
include/asm-arm/hvm/ioreq.h without renaming it in the common code and
still using non-confusing variant on Arm (however I am not sure whether
this is a good idea):

#define handle_mmio ioreq_handle_complete_mmio


3. Regarding common IOREQ/DM stuff location:

Currently it is located at:
common/hvm/...
include/xen/hvm/...

For the non-RFC patch series I am going to avoid using "hvm" name (which
is internal detail of arch specific code and shouldn't be exposed to the
common code).
The question is whether I should use another directory name (probably
ioreq?) or just place them in common root directory?


Could you please share your opinion?

--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
On 20.08.2020 20:30, Oleksandr wrote:
> On 06.08.20 14:29, Jan Beulich wrote:
>> On 06.08.2020 13:08, Julien Grall wrote:
>>> On 05/08/2020 20:30, Oleksandr wrote:
>>>> I was thinking how to split handle_hvm_io_completion()
>>>> gracefully but I failed find a good solution for that, so decided to add
>>>> two stubs (msix_write_completion and handle_realmode_completion) on Arm.
>>>> I could add a comment describing why they are here if appropriate. But
>>>> if you think they shouldn't be called from the common code in any way, I
>>>> will try to split it.
>>> I am not entirely sure what msix_write_completion is meant to do on x86.
>>> Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help?
>> Due to the split brain model of handling PCI pass-through (between
>> Xen and qemu), a guest writing to an MSI-X entry needs this write
>> handed to qemu, and upon completion of the write there Xen also
>> needs to take some extra action.
>
>
> 1. Regarding common handle_hvm_io_completion() implementation:
>
> Could msix_write_completion() be called later on so we would be able to
> split handle_hvm_io_completion() gracefully or could we call it from
> handle_mmio()?
> The reason I am asking is to avoid calling it from the common code in
> order to avoid introducing stub on Arm which is not going to be ever
> implemented
> (if msix_write_completion() is purely x86 material).

I'm unconvinced of this last fact, but as with about everything it is
quite certainly possible to call the function later. The question is
how ugly this would become, as this may involve redundant conditionals
(i.e. ones which need to remain in sync) and/or extra propagation of
state.

> For the non-RFC patch series I moved handle_realmode_completion to the
> x86 code and now my local implementation looks like:
>
> bool handle_hvm_io_completion(struct vcpu *v)
> {
>     struct domain *d = v->domain;
>     struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
>     struct hvm_ioreq_server *s;
>     struct hvm_ioreq_vcpu *sv;
>     enum hvm_io_completion io_completion;
>
>     if ( has_vpci(d) && vpci_process_pending(v) )
>     {
>         raise_softirq(SCHEDULE_SOFTIRQ);
>         return false;
>     }
>
>     sv = get_pending_vcpu(v, &s);
>     if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
>         return false;
>
>     vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
>         STATE_IORESP_READY : STATE_IOREQ_NONE;
>
>     msix_write_completion(v);
>     vcpu_end_shutdown_deferral(v);
>
>     io_completion = vio->io_completion;
>     vio->io_completion = HVMIO_no_completion;
>
>     switch ( io_completion )
>     {
>     case HVMIO_no_completion:
>         break;
>
>     case HVMIO_mmio_completion:
>         return handle_mmio();
>
>     case HVMIO_pio_completion:
>         return handle_pio(vio->io_req.addr, vio->io_req.size,
>                           vio->io_req.dir);
>
>     default:
>         return arch_handle_hvm_io_completion(io_completion);
>     }
>
>     return true;
> }
>
> 2. Regarding renaming common handle_mmio() to ioreq_handle_complete_mmio():
>
> There was a request to consider renaming that function which is called
> from the common code in the context of IOREQ series.
> The point is, that the name of the function is pretty generic and can be
> confusing on Arm (we already have a try_handle_mmio()).
> I noticed that except common code that function is called from a few
> places on x86 (I am not even sure whether all of them are IOREQ related).
> The question is would x86 folks be happy with such renaming?

handle_mmio() without any parameters and used for a varying set
of purposes was imo never a good choice of name. The situation
has improved, but can do with further improvement. The new name,
if to be used for truly renaming the function need to fit all
uses though. As such, I don't think ioreq_handle_complete_mmio()
is an appropriate name.

> Alternatively I could provide the following in
> include/asm-arm/hvm/ioreq.h without renaming it in the common code and
> still using non-confusing variant on Arm (however I am not sure whether
> this is a good idea):
>
> #define handle_mmio ioreq_handle_complete_mmio

If anything, for x86 it ought to be the other way around, at
which point you wouldn't need any alias #define on Arm.

> 3. Regarding common IOREQ/DM stuff location:
>
> Currently it is located at:
> common/hvm/...
> include/xen/hvm/...
>
> For the non-RFC patch series I am going to avoid using "hvm" name (which
> is internal detail of arch specific code and shouldn't be exposed to the
> common code).
> The question is whether I should use another directory name (probably
> ioreq?) or just place them in common root directory?

I think there are arguments for and against hvm/. I'm not of
the opinion that ioreq/ would be a good name, so if hvm/ was to
be ruled out, I think the file(s) shouldn't go into separate
subdirs at all.

Jan
Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features [ In reply to ]
On 21.08.20 09:16, Jan Beulich wrote:

Hi Jan.

Thank you for your answer.

> On 20.08.2020 20:30, Oleksandr wrote:
>> On 06.08.20 14:29, Jan Beulich wrote:
>>> On 06.08.2020 13:08, Julien Grall wrote:
>>>> On 05/08/2020 20:30, Oleksandr wrote:
>>>>> I was thinking how to split handle_hvm_io_completion()
>>>>> gracefully but I failed find a good solution for that, so decided to add
>>>>> two stubs (msix_write_completion and handle_realmode_completion) on Arm.
>>>>> I could add a comment describing why they are here if appropriate. But
>>>>> if you think they shouldn't be called from the common code in any way, I
>>>>> will try to split it.
>>>> I am not entirely sure what msix_write_completion is meant to do on x86.
>>>> Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help?
>>> Due to the split brain model of handling PCI pass-through (between
>>> Xen and qemu), a guest writing to an MSI-X entry needs this write
>>> handed to qemu, and upon completion of the write there Xen also
>>> needs to take some extra action.
>>
>> 1. Regarding common handle_hvm_io_completion() implementation:
>>
>> Could msix_write_completion() be called later on so we would be able to
>> split handle_hvm_io_completion() gracefully or could we call it from
>> handle_mmio()?
>> The reason I am asking is to avoid calling it from the common code in
>> order to avoid introducing stub on Arm which is not going to be ever
>> implemented
>> (if msix_write_completion() is purely x86 material).
> I'm unconvinced of this last fact, but as with about everything it is
> quite certainly possible to call the function later. The question is
> how ugly this would become, as this may involve redundant conditionals
> (i.e. ones which need to remain in sync) and/or extra propagation of
> state.


I understand. Would it be better to make handle_hvm_io_completion() per
arch then?
This would avoid using various stubs on Arm (we could get rid of
has_vpci, msix_write_completion, handle_pio,
arch_handle_hvm_io_completion, etc)
and avoid renaming of handle_mmio().

Julien what is your opinion on that?


For example the Arm implementation would look like:

bool handle_hvm_io_completion(struct vcpu *v)
{
    struct domain *d = v->domain;
    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
    struct hvm_ioreq_server *s;
    struct hvm_ioreq_vcpu *sv;
    enum hvm_io_completion io_completion;

    sv = get_pending_vcpu(v, &s);
    if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
        return false;

    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
        STATE_IORESP_READY : STATE_IOREQ_NONE;

    vcpu_end_shutdown_deferral(v);

    io_completion = vio->io_completion;
    vio->io_completion = HVMIO_no_completion;

    switch ( io_completion )
    {
    case HVMIO_no_completion:
        break;

    case HVMIO_mmio_completion:
        return ioreq_handle_complete_mmio();

    default:
        ASSERT_UNREACHABLE();
        break;
    }

    return true;
}


>>
>> 2. Regarding renaming common handle_mmio() to ioreq_handle_complete_mmio():
>>
>> There was a request to consider renaming that function which is called
>> from the common code in the context of IOREQ series.
>> The point is, that the name of the function is pretty generic and can be
>> confusing on Arm (we already have a try_handle_mmio()).
>> I noticed that except common code that function is called from a few
>> places on x86 (I am not even sure whether all of them are IOREQ related).
>> The question is would x86 folks be happy with such renaming?
> handle_mmio() without any parameters and used for a varying set
> of purposes was imo never a good choice of name. The situation
> has improved, but can do with further improvement. The new name,
> if to be used for truly renaming the function need to fit all
> uses though. As such, I don't think ioreq_handle_complete_mmio()
> is an appropriate name.
>
>> Alternatively I could provide the following in
>> include/asm-arm/hvm/ioreq.h without renaming it in the common code and
>> still using non-confusing variant on Arm (however I am not sure whether
>> this is a good idea):
>>
>> #define handle_mmio ioreq_handle_complete_mmio
> If anything, for x86 it ought to be the other way around, at
> which point you wouldn't need any alias #define on Arm.
But could this approach be accepted? I think it would be the easiest way
to avoid confusing on Arm and avoid renaming that function in the whole
x86 code.


>
>> 3. Regarding common IOREQ/DM stuff location:
>>
>> Currently it is located at:
>> common/hvm/...
>> include/xen/hvm/...
>>
>> For the non-RFC patch series I am going to avoid using "hvm" name (which
>> is internal detail of arch specific code and shouldn't be exposed to the
>> common code).
>> The question is whether I should use another directory name (probably
>> ioreq?) or just place them in common root directory?
> I think there are arguments for and against hvm/. I'm not of
> the opinion that ioreq/ would be a good name, so if hvm/ was to
> be ruled out, I think the file(s) shouldn't go into separate
> subdirs at all.

Got it.


--
Regards,

Oleksandr Tyshchenko

1 2  View All