Mailing List Archive

Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
On 08.08.20 00:50, Stefano Stabellini wrote:

Hi

> On Fri, 7 Aug 2020, Oleksandr wrote:
>> On 06.08.20 03:37, Stefano Stabellini wrote:
>>
>> Hi Stefano
>>
>> Trying to simulate the IO_RETRY handling mechanism (according to the model below) I
>> continuously get IO_RETRY from try_fwd_ioserv() ...
>>
>>> OK, thanks for the details. My interpretation seems to be correct.
>>>
>>> In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
>>> return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
>>> also needs to handle try_handle_mmio returning IO_RETRY the first
>>> time around, and IO_HANDLED after QEMU does its job.
>>>
>>> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
>>> early and let the scheduler do its job? Something like:
>>>
>>> enum io_state state = try_handle_mmio(regs, hsr, gpa);
>>>
>>> switch ( state )
>>> {
>>> case IO_ABORT:
>>> goto inject_abt;
>>> case IO_HANDLED:
>>> advance_pc(regs, hsr);
>>> return;
>>> case IO_RETRY:
>>> /* finish later */
>>> return;
>>> case IO_UNHANDLED:
>>> /* IO unhandled, try another way to handle it. */
>>> break;
>>> default:
>>> ASSERT_UNREACHABLE();
>>> }
>>>
>>> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
>>> handle_hvm_io_completion() after QEMU completes the emulation. Today,
>>> handle_mmio just sets the user register with the read value.
>>>
>>> But it would be better if it called again the original function
>>> do_trap_stage2_abort_guest to actually retry the original operation.
>>> This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
>>> IO_HANDLED instead of IO_RETRY,
>> I may be missing some important point, but I fail to see why try_handle_mmio
>> (try_fwd_ioserv) would return IO_HANDLED instead of IO_RETRY at this stage.
>> Or does the current try_fwd_ioserv() logic need rework?
> I think you should check the ioreq->state in try_fwd_ioserv(), if the
> result is ready, then ioreq->state should be STATE_IORESP_READY, and you
> can return IO_HANDLED.
>
> That is assuming that you are looking at the live version of the ioreq
> shared with QEMU instead of a private copy of it, which I am not sure.
> Looking at try_fwd_ioserv() it would seem that vio->io_req is just a
> copy? The live version is returned by get_ioreq() ?

If I understand the code correctly, get_ioreq() does indeed return the live
version shared with the emulator.
The desired state change (to STATE_IORESP_READY), which hvm_wait_for_io()
is actually waiting for, is set here (in my case):
https://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git;a=blob;f=demu.c;h=f785b394d0cf141dffa05bdddecf338214358aea;hb=refs/heads/master#l698


> Even in handle_hvm_io_completion, instead of setting vio->io_req.state
> to STATE_IORESP_READY by hand, it would be better to look at the live
> version of the ioreq because QEMU will have already set ioreq->state
> to STATE_IORESP_READY (hw/i386/xen/xen-hvm.c:cpu_handle_ioreq).
It seems that, after STATE_IORESP_READY is detected in hvm_wait_for_io(),
the state of the live version is immediately set to STATE_IOREQ_NONE, so
looking at the live version further down in handle_hvm_io_completion()
or in try_fwd_ioserv() tells us nothing, I am afraid.


--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
Hi


On 08.08.20 01:19, Oleksandr wrote:
>
> On 08.08.20 00:50, Stefano Stabellini wrote:
>
> Hi Stefano
>
>> On Fri, 7 Aug 2020, Oleksandr wrote:
>>> On 06.08.20 03:37, Stefano Stabellini wrote:
>>>
>>> Hi Stefano
>>>
>>> Trying to simulate IO_RETRY handling mechanism (according to model
>>> below) I
>>> continuously get IO_RETRY from try_fwd_ioserv() ...
>>>
>>>> OK, thanks for the details. My interpretation seems to be correct.
>>>>
>>>> In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
>>>> return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
>>>> also needs to handle try_handle_mmio returning IO_RETRY the first
>>>> time around, and IO_HANDLED after QEMU does its job.
>>>>
>>>> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
>>>> early and let the scheduler do its job? Something like:
>>>>
>>>>               enum io_state state = try_handle_mmio(regs, hsr, gpa);
>>>>
>>>>               switch ( state )
>>>>               {
>>>>               case IO_ABORT:
>>>>                   goto inject_abt;
>>>>               case IO_HANDLED:
>>>>                   advance_pc(regs, hsr);
>>>>                   return;
>>>>               case IO_RETRY:
>>>>                   /* finish later */
>>>>                   return;
>>>>               case IO_UNHANDLED:
>>>>                   /* IO unhandled, try another way to handle it. */
>>>>                   break;
>>>>               default:
>>>>                   ASSERT_UNREACHABLE();
>>>>               }
>>>>
>>>> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
>>>> handle_hvm_io_completion() after QEMU completes the emulation. Today,
>>>> handle_mmio just sets the user register with the read value.
>>>>
>>>> But it would be better if it called again the original function
>>>> do_trap_stage2_abort_guest to actually retry the original operation.
>>>> This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
>>>> IO_HANDLED instead of IO_RETRY,
>>> I may miss some important point, but I failed to see why
>>> try_handle_mmio
>>> (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at this
>>> stage.
>>> Or current try_fwd_ioserv() logic needs rework?
>> I think you should check the ioreq->state in try_fwd_ioserv(), if the
>> result is ready, then ioreq->state should be STATE_IORESP_READY, and you
>> can return IO_HANDLED.
>

I optimized the test patch a bit (now it looks much simpler). I didn't face
any issues during a quick test.
Julien, Stefano, do you think it does the proper things for addressing the
TODO? Or did I miss something?


---
 xen/arch/arm/io.c    | 4 ----
 xen/arch/arm/ioreq.c | 7 ++++++-
 xen/arch/arm/traps.c | 4 +++-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index 436f669..3063577 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -156,10 +156,6 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
     else
         vio->io_completion = HVMIO_mmio_completion;

-    /* XXX: Decide what to do */
-    if ( rc == IO_RETRY )
-        rc = IO_HANDLED;
-
     return rc;
 }
 #endif
diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index 8f60c41..e5235c6 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -33,6 +33,8 @@
 #include <public/hvm/dm_op.h>
 #include <public/hvm/ioreq.h>

+#include <asm/traps.h>
+
 bool handle_mmio(void)
 {
     struct vcpu *v = current;
@@ -52,7 +54,7 @@ bool handle_mmio(void)

     /* XXX: Do we need to take care of write here ? */
     if ( dabt.write )
-        return true;
+        goto done;

     /*
      * Sign extend if required.
@@ -72,6 +74,9 @@ bool handle_mmio(void)

     set_user_reg(regs, dabt.reg, r);

+done:
+    advance_pc(regs, hsr);
+
     return true;
 }

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index ea472d1..974c744 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1965,11 +1965,13 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
             case IO_HANDLED:
                 advance_pc(regs, hsr);
                 return;
+            case IO_RETRY:
+                /* finish later */
+                return;
             case IO_UNHANDLED:
                 /* IO unhandled, try another way to handle it. */
                 break;
             default:
-                /* XXX: Handle IO_RETRY */
                 ASSERT_UNREACHABLE();
             }
         }
--
2.7.4


--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
Hi,

On 06/08/2020 12:37, Oleksandr wrote:
>
> On 05.08.20 16:30, Julien Grall wrote:
>> Hi,
>
> Hi Julien
>
>
>>
>> On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> As a lot of x86 code can be re-used on Arm later on, this patch
>>> splits IOREQ support into common and arch specific parts.
>>>
>>> This support is going to be used on Arm to be able to run a device
>>> emulator outside of the Xen hypervisor.
>>>
>>> Please note, this is a split/cleanup of Julien's PoC:
>>> "Add support for Guest IO forwarding to a device emulator"
>>>
>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> ---
>>>   xen/arch/x86/Kconfig            |    1 +
>>>   xen/arch/x86/hvm/dm.c           |    2 +-
>>>   xen/arch/x86/hvm/emulate.c      |    2 +-
>>>   xen/arch/x86/hvm/hvm.c          |    2 +-
>>>   xen/arch/x86/hvm/io.c           |    2 +-
>>>   xen/arch/x86/hvm/ioreq.c        | 1431
>>> +--------------------------------------
>>>   xen/arch/x86/hvm/stdvga.c       |    2 +-
>>>   xen/arch/x86/hvm/vmx/realmode.c |    1 +
>>>   xen/arch/x86/hvm/vmx/vvmx.c     |    2 +-
>>>   xen/arch/x86/mm.c               |    2 +-
>>>   xen/arch/x86/mm/shadow/common.c |    2 +-
>>>   xen/common/Kconfig              |    3 +
>>>   xen/common/Makefile             |    1 +
>>>   xen/common/hvm/Makefile         |    1 +
>>>   xen/common/hvm/ioreq.c          | 1430
>>> ++++++++++++++++++++++++++++++++++++++
>>>   xen/include/asm-x86/hvm/ioreq.h |   45 +-
>>>   xen/include/asm-x86/hvm/vcpu.h  |    7 -
>>>   xen/include/xen/hvm/ioreq.h     |   89 +++
>>>   18 files changed, 1575 insertions(+), 1450 deletions(-)
>>
>> That's quite a lot of code moved in a single patch. How can we check
>> the code moved is still correct? Is it a verbatim copy?
> In this patch I mostly tried to separate out a common IOREQ part, which also
> resulted in updating the x86 sources to include the new header.  Also I moved
> hvm_ioreq_needs_completion() to the common header (which probably wants to
> be in a separate patch). It was a verbatim copy initially (w/o
> hvm_map_mem_type_to_ioreq_server) and was then updated to deal with
> the arch-specific parts.

I would prefer if the two parts are done separately. IOW, the code
movement should be nearly a verbatim copy.

> In which way do you want me to split this patch?
>
> I could think of the following:
>
> 1. Copy of x86's ioreq.c/ioreq.h to common code
> 2. Update common ioreq.c/ioreq.h
> 3. Update x86's parts to be able to deal with common code
> 4. Move hvm_ioreq_needs_completion() to common code
>

Ideally the code movement should be done in the same patch. This helps
to check the patch is only moving code and also avoids mistakes on rebase.

So my preference would be to first modify the x86 code (e.g. renaming)
to make it common and then have one patch that will move the code.

Cheers,

--
Julien Grall
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
On 10.08.20 19:29, Julien Grall wrote:
> Hi,

Hi Julien


>
> On 06/08/2020 12:37, Oleksandr wrote:
>>
>> On 05.08.20 16:30, Julien Grall wrote:
>>> Hi,
>>
>> Hi Julien
>>
>>
>>>
>>> On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> As a lot of x86 code can be re-used on Arm later on, this patch
>>>> splits IOREQ support into common and arch specific parts.
>>>>
>>>> This support is going to be used on Arm to be able run device
>>>> emulator outside of Xen hypervisor.
>>>>
>>>> Please note, this is a split/cleanup of Julien's PoC:
>>>> "Add support for Guest IO forwarding to a device emulator"
>>>>
>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> ---
>>>>   xen/arch/x86/Kconfig            |    1 +
>>>>   xen/arch/x86/hvm/dm.c           |    2 +-
>>>>   xen/arch/x86/hvm/emulate.c      |    2 +-
>>>>   xen/arch/x86/hvm/hvm.c          |    2 +-
>>>>   xen/arch/x86/hvm/io.c           |    2 +-
>>>>   xen/arch/x86/hvm/ioreq.c        | 1431
>>>> +--------------------------------------
>>>>   xen/arch/x86/hvm/stdvga.c       |    2 +-
>>>>   xen/arch/x86/hvm/vmx/realmode.c |    1 +
>>>>   xen/arch/x86/hvm/vmx/vvmx.c     |    2 +-
>>>>   xen/arch/x86/mm.c               |    2 +-
>>>>   xen/arch/x86/mm/shadow/common.c |    2 +-
>>>>   xen/common/Kconfig              |    3 +
>>>>   xen/common/Makefile             |    1 +
>>>>   xen/common/hvm/Makefile         |    1 +
>>>>   xen/common/hvm/ioreq.c          | 1430
>>>> ++++++++++++++++++++++++++++++++++++++
>>>>   xen/include/asm-x86/hvm/ioreq.h |   45 +-
>>>>   xen/include/asm-x86/hvm/vcpu.h  |    7 -
>>>>   xen/include/xen/hvm/ioreq.h     |   89 +++
>>>>   18 files changed, 1575 insertions(+), 1450 deletions(-)
>>>
>>> That's quite a lot of code moved in a single patch. How can we check
>>> the code moved is still correct? Is it a verbatim copy?
>> In this patch I mostly tried to separate a common IOREQ part which
>> also resulted in updating x86 sources to include new header.  Also I
>> moved hvm_ioreq_needs_completion() to common header (which probably
>> wanted to be in a separate patch). It was a verbatim copy initially
>> (w/o hvm_map_mem_type_to_ioreq_server) and then updated to deal with
>> arch specific parts.
>
> I would prefer if the two parts are done separately. IOW, the code
> movement be nearly a verbatim copy.
>
>> In which way do you want me to split this patch?
>>
>> I could think of the following:
>>
>> 1. Copy of x86's ioreq.c/ioreq.h to common code
>> 2. Update common ioreq.c/ioreq.h
>> 3. Update x86's parts to be able to deal with common code
>> 4. Move hvm_ioreq_needs_completion() to common code
>>
>
> Ideally the code movement should be done in the same patch. This helps
> to check the patch is only moving code and also avoids mistakes on
> rebase.
>
> So my preference would be to first modify the x86 code (e.g. renaming)
> to make it common and then have one patch that will move the code.

ok, will try to split accordingly. Thank you


--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
On 07/08/2020 00:48, Stefano Stabellini wrote:
> On Thu, 6 Aug 2020, Julien Grall wrote:
>> On 06/08/2020 01:37, Stefano Stabellini wrote:
>>> On Wed, 5 Aug 2020, Julien Grall wrote:
>>>> On 04/08/2020 20:11, Stefano Stabellini wrote:
>>>>> On Tue, 4 Aug 2020, Julien Grall wrote:
>>>>>> On 04/08/2020 12:10, Oleksandr wrote:
>>>>>>> On 04.08.20 10:45, Paul Durrant wrote:
>>>>>>>>> +static inline bool hvm_ioreq_needs_completion(const ioreq_t
>>>>>>>>> *ioreq)
>>>>>>>>> +{
>>>>>>>>> +    return ioreq->state == STATE_IOREQ_READY &&
>>>>>>>>> +           !ioreq->data_is_ptr &&
>>>>>>>>> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir !=
>>>>>>>>> IOREQ_WRITE);
>>>>>>>>> +}
>>>>>>>> I don't think having this in common code is correct. The short-cut
>>>>>>>> of
>>>>>>>> not
>>>>>>>> completing PIO reads seems somewhat x86 specific.
>>>>>>
>>>>>> Hmmm, looking at the code, I think it doesn't wait for PIO writes to
>>>>>> complete
>>>>>> (not read). Did I miss anything?
>>>>>>
>>>>>>> Does ARM even
>>>>>>>> have the concept of PIO?
>>>>>>>
>>>>>>> I am not 100% sure here, but it seems that it doesn't.
>>>>>>
>>>>>> Technically, the PIOs exist on Arm, however they are accessed the same
>>>>>> way
>>>>>> as
>>>>>> MMIO and will have a dedicated area defined by the HW.
>>>>>>
>>>>>> AFAICT, on Arm64, they are only used for PCI IO Bar.
>>>>>>
>>>>>> Now the question is whether we want to expose them to the Device
>>>>>> Emulator
>>>>>> as
>>>>>> PIO or MMIO access. From a generic PoV, a DM shouldn't have to care
>>>>>> about
>>>>>> the
>>>>>> architecture used. It should just be able to request a given IOport
>>>>>> region.
>>>>>>
>>>>>> So it may make sense to differentiate them in the common ioreq code as
>>>>>> well.
>>>>>>
>>>>>> I had a quick look at QEMU and wasn't able to tell if PIOs and MMIOs
>>>>>> address
>>>>>> space are different on Arm as well. Paul, Stefano, do you know what
>>>>>> they
>>>>>> are
>>>>>> doing?
>>>>>
>>>>> On the QEMU side, it looks like PIO (address_space_io) is used in
>>>>> connection with the emulation of the "in" or "out" instructions, see
>>>>> ioport.c:cpu_inb for instance. Some parts of PCI on QEMU emulate PIO
>>>>> space regardless of the architecture, such as
>>>>> hw/pci/pci_bridge.c:pci_bridge_initfn.
>>>>>
>>>>> However, because there is no "in" and "out" on ARM, I don't think
>>>>> address_space_io can be accessed. Specifically, there is no equivalent
>>>>> for target/i386/misc_helper.c:helper_inb on ARM.
>>>>
>>>> So how are PCI I/O BARs accessed? Surely, they could be used on Arm, right?
>>>
>>> PIO is also memory mapped on ARM and it seems to have its own MMIO
>>> address window.
>> This part is already well-understood :). However, this only tells us how an OS
>> is accessing a PIO.
>>
>> What I am trying to figure out is how the hardware (or QEMU) is meant to work.
>>
>> From my understanding, the MMIO access will be received by the hostbridge and
>> then forwarded to the appropriate PCI device. The two questions I am trying to
>> answer are: how are the I/O BARs configured? Will they contain an MMIO address or
>> an offset?
>>
>> If the answer is the latter, then we will need PIO because a DM will never see
>> the MMIO address (the hostbridge will be emulated in Xen).
>
> Now I understand the question :-)
>
> This is the way I understand it works. Let's say that the PIO aperture
> is 0x1000-0x2000 which is aliased to 0x3eff0000-0x3eff1000.
> 0x1000-0x2000 are addresses that cannot be accessed directly.
> 0x3eff0000-0x3eff1000 is the range that works.
>
> A PCI device PIO BAR will have an address in the 0x1000-0x2000 range,
> for instance 0x1100.
>
> However, when the operating system accesses 0x1100, it will issue a read
> to 0x3eff0100.
>
> Xen will trap the read to 0x3eff0100 and send it to QEMU.
>
> QEMU has to know that 0x3eff0000-0x3eff1000 is the alias to the PIO
> aperture and that 0x3eff0100 corresponds to PCI device foobar. Similarly,
> QEMU has also to know the address range of the MMIO aperture and its
> remappings, if any (it is possible to have address remapping for MMIO
> addresses too.)
>
> I think today this information is built into QEMU, not configurable. It
> works fine because *I think* the PCI aperture is pretty much the same on
> x86 boards, at least on the ones supported by QEMU for Xen.

Well, on x86 the OS will access PIO using inb/outb. So the address
received by Xen is in the 0x1000-0x2000 range and is then forwarded to the
DM using the PIO type.

>
> On ARM, I think we should explicitly declare the PCI MMIO aperture and
> its alias/address-remapping. When we do that, we can also declare the
> PIO aperture and its alias/address-remapping.

Well yes, we need to define the PCI MMIO and PCI I/O regions because the
guest OS needs to know them.

However, I am unsure how this would help us answer the question of
whether accesses to the PCI I/O aperture should be sent as PIO or MMIO.

Per what you wrote, the PCI I/O BAR would be configured with the range
0x1000-0x2000. So a device emulator (which may not be QEMU and may only
emulate one PCI device!) will only see that range.

How does the device-emulator then know that it needs to watch the region
0x3eff0000-0x3eff1000?

It feels to me that it would be easier/make more sense if the DM only
said "I want to watch the PIO range 0x1000-0x2000". Xen would then be in
charge of doing the translation between the OS view and the DM view.

This also means a DM would be completely arch-agnostic. This would
mirror the HW, where you can plug your PCI card into any machine.

Cheers,

--
Julien Grall
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
On Sat, 8 Aug 2020, Oleksandr wrote:
> On 08.08.20 00:50, Stefano Stabellini wrote:
> > On Fri, 7 Aug 2020, Oleksandr wrote:
> > > On 06.08.20 03:37, Stefano Stabellini wrote:
> > >
> > > Hi Stefano
> > >
> > > Trying to simulate IO_RETRY handling mechanism (according to model below)
> > > I
> > > continuously get IO_RETRY from try_fwd_ioserv() ...
> > >
> > > > OK, thanks for the details. My interpretation seems to be correct.
> > > >
> > > > In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
> > > > return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
> > > > also needs to handle try_handle_mmio returning IO_RETRY the first
> > > > time around, and IO_HANDLED after QEMU does its job.
> > > >
> > > > What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
> > > > early and let the scheduler do its job? Something like:
> > > >
> > > > enum io_state state = try_handle_mmio(regs, hsr, gpa);
> > > >
> > > > switch ( state )
> > > > {
> > > > case IO_ABORT:
> > > > goto inject_abt;
> > > > case IO_HANDLED:
> > > > advance_pc(regs, hsr);
> > > > return;
> > > > case IO_RETRY:
> > > > /* finish later */
> > > > return;
> > > > case IO_UNHANDLED:
> > > > /* IO unhandled, try another way to handle it. */
> > > > break;
> > > > default:
> > > > ASSERT_UNREACHABLE();
> > > > }
> > > >
> > > > Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
> > > > handle_hvm_io_completion() after QEMU completes the emulation. Today,
> > > > handle_mmio just sets the user register with the read value.
> > > >
> > > > But it would be better if it called again the original function
> > > > do_trap_stage2_abort_guest to actually retry the original operation.
> > > > This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
> > > > IO_HANDLED instead of IO_RETRY,
> > > I may miss some important point, but I failed to see why try_handle_mmio
> > > (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at this stage.
> > > Or current try_fwd_ioserv() logic needs rework?
> > I think you should check the ioreq->state in try_fwd_ioserv(), if the
> > result is ready, then ioreq->state should be STATE_IORESP_READY, and you
> > can return IO_HANDLED.
> >
> > That is assuming that you are looking at the live version of the ioreq
> > shared with QEMU instead of a private copy of it, which I am not sure.
> > Looking at try_fwd_ioserv() it would seem that vio->io_req is just a
> > copy? The live version is returned by get_ioreq() ?
>
> If I understand the code correctly, indeed, get_ioreq() returns live version
> shared with emulator.
> Desired state change (STATE_IORESP_READY) what actually the hvm_wait_for_io()
> is waiting for is set here (in my case):
> https://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git;a=blob;f=demu.c;h=f785b394d0cf141dffa05bdddecf338214358aea;hb=refs/heads/master#l698
>
> > Even in handle_hvm_io_completion, instead of setting vio->io_req.state
> > to STATE_IORESP_READY by hand, it would be better to look at the live
> > version of the ioreq because QEMU will have already set ioreq->state
> > to STATE_IORESP_READY (hw/i386/xen/xen-hvm.c:cpu_handle_ioreq).
> It seems that after detecting STATE_IORESP_READY in hvm_wait_for_io() the
> state of live version is set to STATE_IOREQ_NONE immediately, so looking at
> the live version down the handle_hvm_io_completion()
> or in try_fwd_ioserv() shows us nothing I am afraid.

I see. That is because we want to "free" the ioreq as soon as possible,
which is good. handle_hvm_io_completion also sets vio->io_req.state to
STATE_IORESP_READY, so our private copy is still set to
STATE_IORESP_READY. Thus, try_fwd_ioserv should do the right thing
simply by reading vio->io_req.state: try_fwd_ioserv should be able to
return IO_HANDLED when the "state" is STATE_IORESP_READY, right?
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
On Mon, 10 Aug 2020, Oleksandr wrote:
> On 08.08.20 01:19, Oleksandr wrote:
> > On 08.08.20 00:50, Stefano Stabellini wrote:
> > > On Fri, 7 Aug 2020, Oleksandr wrote:
> > > > On 06.08.20 03:37, Stefano Stabellini wrote:
> > > >
> > > > Hi Stefano
> > > >
> > > > Trying to simulate IO_RETRY handling mechanism (according to model
> > > > below) I
> > > > continuously get IO_RETRY from try_fwd_ioserv() ...
> > > >
> > > > > OK, thanks for the details. My interpretation seems to be correct.
> > > > >
> > > > > In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
> > > > > return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
> > > > > also needs to handle try_handle_mmio returning IO_RETRY the first
> > > > > time around, and IO_HANDLED after QEMU does its job.
> > > > >
> > > > > What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
> > > > > early and let the scheduler do its job? Something like:
> > > > >
> > > > >               enum io_state state = try_handle_mmio(regs, hsr, gpa);
> > > > >
> > > > >               switch ( state )
> > > > >               {
> > > > >               case IO_ABORT:
> > > > >                   goto inject_abt;
> > > > >               case IO_HANDLED:
> > > > >                   advance_pc(regs, hsr);
> > > > >                   return;
> > > > >               case IO_RETRY:
> > > > >                   /* finish later */
> > > > >                   return;
> > > > >               case IO_UNHANDLED:
> > > > >                   /* IO unhandled, try another way to handle it. */
> > > > >                   break;
> > > > >               default:
> > > > >                   ASSERT_UNREACHABLE();
> > > > >               }
> > > > >
> > > > > Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
> > > > > handle_hvm_io_completion() after QEMU completes the emulation. Today,
> > > > > handle_mmio just sets the user register with the read value.
> > > > >
> > > > > But it would be better if it called again the original function
> > > > > do_trap_stage2_abort_guest to actually retry the original operation.
> > > > > This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
> > > > > IO_HANDLED instead of IO_RETRY,
> > > > I may miss some important point, but I failed to see why try_handle_mmio
> > > > (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at this
> > > > stage.
> > > > Or current try_fwd_ioserv() logic needs rework?
> > > I think you should check the ioreq->state in try_fwd_ioserv(), if the
> > > result is ready, then ioreq->state should be STATE_IORESP_READY, and you
> > > can return IO_HANDLED.
> >
>
> I optimized test patch a bit (now it looks much simpler). I didn't face any
> issues during a quick test.

Both patches get much closer to following the proper state machine,
great! I think this patch is certainly a good improvement. I think the
other patch you sent earlier, slightly larger, is even better. It makes
the following additional changes that would be good to have:

- try_fwd_ioserv returns IO_HANDLED on state == STATE_IORESP_READY
- handle_mmio simply calls do_trap_stage2_abort_guest

I would also remove "handle_mmio_finish" and do the guest register
setting as well as setting vio->io_req.state to STATE_IOREQ_NONE
directly in try_fwd_ioserv.


> ---
>  xen/arch/arm/io.c    | 4 ----
>  xen/arch/arm/ioreq.c | 7 ++++++-
>  xen/arch/arm/traps.c | 4 +++-
>  3 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index 436f669..3063577 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -156,10 +156,6 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>      else
>          vio->io_completion = HVMIO_mmio_completion;
>
> -    /* XXX: Decide what to do */
> -    if ( rc == IO_RETRY )
> -        rc = IO_HANDLED;
> -
>      return rc;
>  }
>  #endif
> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
> index 8f60c41..e5235c6 100644
> --- a/xen/arch/arm/ioreq.c
> +++ b/xen/arch/arm/ioreq.c
> @@ -33,6 +33,8 @@
>  #include <public/hvm/dm_op.h>
>  #include <public/hvm/ioreq.h>
>
> +#include <asm/traps.h>
> +
>  bool handle_mmio(void)
>  {
>      struct vcpu *v = current;
> @@ -52,7 +54,7 @@ bool handle_mmio(void)
>
>      /* XXX: Do we need to take care of write here ? */
>      if ( dabt.write )
> -        return true;
> +        goto done;
>
>      /*
>       * Sign extend if required.
> @@ -72,6 +74,9 @@ bool handle_mmio(void)
>
>      set_user_reg(regs, dabt.reg, r);
>
> +done:
> +    advance_pc(regs, hsr);
> +
>      return true;
>  }
>
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index ea472d1..974c744 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1965,11 +1965,13 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
>              case IO_HANDLED:
>                  advance_pc(regs, hsr);
>                  return;
> +            case IO_RETRY:
> +                /* finish later */
> +                return;
>              case IO_UNHANDLED:
>                  /* IO unhandled, try another way to handle it. */
>                  break;
>              default:
> -                /* XXX: Handle IO_RETRY */
>                  ASSERT_UNREACHABLE();
>              }
>          }
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
On Mon, 10 Aug 2020, Julien Grall wrote:
> On 07/08/2020 00:48, Stefano Stabellini wrote:
> > On Thu, 6 Aug 2020, Julien Grall wrote:
> > > On 06/08/2020 01:37, Stefano Stabellini wrote:
> > > > On Wed, 5 Aug 2020, Julien Grall wrote:
> > > > > On 04/08/2020 20:11, Stefano Stabellini wrote:
> > > > > > On Tue, 4 Aug 2020, Julien Grall wrote:
> > > > > > > On 04/08/2020 12:10, Oleksandr wrote:
> > > > > > > > On 04.08.20 10:45, Paul Durrant wrote:
> > > > > > > > > > +static inline bool hvm_ioreq_needs_completion(const ioreq_t
> > > > > > > > > > *ioreq)
> > > > > > > > > > +{
> > > > > > > > > > +    return ioreq->state == STATE_IOREQ_READY &&
> > > > > > > > > > +           !ioreq->data_is_ptr &&
> > > > > > > > > > +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir !=
> > > > > > > > > > IOREQ_WRITE);
> > > > > > > > > > +}
> > > > > > > > > I don't think having this in common code is correct. The
> > > > > > > > > short-cut
> > > > > > > > > of
> > > > > > > > > not
> > > > > > > > > completing PIO reads seems somewhat x86 specific.
> > > > > > >
> > > > > > > Hmmm, looking at the code, I think it doesn't wait for PIO writes
> > > > > > > to
> > > > > > > complete
> > > > > > > (not read). Did I miss anything?
> > > > > > >
> > > > > > > > Does ARM even
> > > > > > > > > have the concept of PIO?
> > > > > > > >
> > > > > > > > I am not 100% sure here, but it seems that doesn't have.
> > > > > > >
> > > > > > > Technically, the PIOs exist on Arm, however they are accessed the
> > > > > > > same
> > > > > > > way
> > > > > > > as
> > > > > > > MMIO and will have a dedicated area defined by the HW.
> > > > > > >
> > > > > > > AFAICT, on Arm64, they are only used for PCI IO Bar.
> > > > > > >
> > > > > > > Now the question is whether we want to expose them to the Device
> > > > > > > Emulator
> > > > > > > as
> > > > > > > PIO or MMIO access. From a generic PoV, a DM shouldn't have to
> > > > > > > care
> > > > > > > about
> > > > > > > the
> > > > > > > architecture used. It should just be able to request a given
> > > > > > > IOport
> > > > > > > region.
> > > > > > >
> > > > > > > So it may make sense to differentiate them in the common ioreq
> > > > > > > code as
> > > > > > > well.
> > > > > > >
> > > > > > > I had a quick look at QEMU and wasn't able to tell if PIOs and
> > > > > > > MMIOs
> > > > > > > address
> > > > > > > space are different on Arm as well. Paul, Stefano, do you know
> > > > > > > what
> > > > > > > they
> > > > > > > are
> > > > > > > doing?
> > > > > >
> > > > > > On the QEMU side, it looks like PIO (address_space_io) is used in
> > > > > > connection with the emulation of the "in" or "out" instructions, see
> > > > > > ioport.c:cpu_inb for instance. Some parts of PCI on QEMU emulate PIO
> > > > > > space regardless of the architecture, such as
> > > > > > hw/pci/pci_bridge.c:pci_bridge_initfn.
> > > > > >
> > > > > > However, because there is no "in" and "out" on ARM, I don't think
> > > > > > address_space_io can be accessed. Specifically, there is no
> > > > > > equivalent
> > > > > > for target/i386/misc_helper.c:helper_inb on ARM.
> > > > >
> > > > > So how PCI I/O BAR are accessed? Surely, they could be used on Arm,
> > > > > right?
> > > >
> > > > PIO is also memory mapped on ARM and it seems to have its own MMIO
> > > > address window.
> > > This part is already well-understood :). However, this only tell us how an
> > > OS
> > > is accessing a PIO.
> > >
> > > What I am trying to figure out is how the hardware (or QEMU) is meant to
> > > work.
> > >
> > > From my understanding, the MMIO access will be received by the hostbridge
> > > and
> > > then forwarded to the appropriate PCI device. The two questions I am
> > > trying to
> > > answer is: How the I/O BARs are configured? Will it contain an MMIO
> > > address or
> > > an offset?
> > >
> > > If the answer is the latter, then we will need PIO because a DM will never
> > > see
> > > the MMIO address (the hostbridge will be emulated in Xen).
> >
> > Now I understand the question :-)
> >
> > This is the way I understand it works. Let's say that the PIO aperture
> > is 0x1000-0x2000 which is aliased to 0x3eff0000-0x3eff1000.
> > 0x1000-0x2000 are addresses that cannot be accessed directly.
> > 0x3eff0000-0x3eff1000 is the range that works.
> >
> > A PCI device PIO BAR will have an address in the 0x1000-0x2000 range,
> > for instance 0x1100.
> >
> > However, when the operating system access 0x1100, it will issue a read
> > to 0x3eff0100.
> >
> > Xen will trap the read to 0x3eff0100 and send it to QEMU.
> >
> > QEMU has to know that 0x3eff0000-0x3eff1000 is the alias to the PIO
> > aperture and that 0x3eff0100 correspond to PCI device foobar. Similarly,
> > QEMU has also to know the address range of the MMIO aperture and its
> > remappings, if any (it is possible to have address remapping for MMIO
> > addresses too.)
> >
> > I think today this information is "built-in" QEMU, not configurable. It
> > works fine because *I think* the PCI aperture is pretty much the same on
> > x86 boards, at least the one supported by QEMU for Xen.
>
> Well on x86, the OS will access PIO using inb/outb. So the address received by
> Xen is 0x1000-0x2000 and then forwarded to the DM using the PIO type.
>
> > On ARM, I think we should explicitly declare the PCI MMIO aperture and
> > its alias/address-remapping. When we do that, we can also declare the
> > PIO aperture and its alias/address-remapping.
>
> Well yes, we need to define PCI MMIO and PCI I/O region because the guest OS
> needs to know them.

[1]
(see below)


> However, I am unsure how this would help us to solve the question whether
> access to the PCI I/O aperture should be sent as a PIO or MMIO.
>
> Per what you wrote, the PCI I/O Bar would be configured with the range
> 0x1000-0x2000. So a device emulator (this may not be QEMU and only emulate one
> PCI device!!) will only see that range.
>
> How does the device-emulator then know that it needs to watch the region
> 0x3eff0000-0x3eff1000?

It would know because the PCI PIO aperture, together with the alias, are
specified [1].


> It feels to me that it would be easier/make more sense if the DM only say "I
> want to watch the PIO range 0x1000-0x2000". So Xen would be in charge to do
> the translation between the OS view and the DM view.
>
> This also means a DM would be completely arch-agnostic. This would follow the
> HW where you can plug your PCI card on any HW.

As you know, PIO access is actually not modelled by QEMU for ARM
targets. I worry about its long-term stability, given that it is
untested; i.e. qemu-system-aarch64 could have broken PIO emulation
and nobody would find out except us, when we send ioreqs to it.

Thinking from a Xen/Emulator interface on ARM, is it wise to rely on an
access-type that doesn't exist on the architecture?
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
Hi Stefano,

On 11/08/2020 00:34, Stefano Stabellini wrote:
> On Mon, 10 Aug 2020, Oleksandr wrote:
>> On 08.08.20 01:19, Oleksandr wrote:
>>> On 08.08.20 00:50, Stefano Stabellini wrote:
>>>> On Fri, 7 Aug 2020, Oleksandr wrote:
>>>>> On 06.08.20 03:37, Stefano Stabellini wrote:
>>>>>
>>>>> Hi Stefano
>>>>>
>>>>> Trying to simulate IO_RETRY handling mechanism (according to model
>>>>> below) I
>>>>> continuously get IO_RETRY from try_fwd_ioserv() ...
>>>>>
>>>>>> OK, thanks for the details. My interpretation seems to be correct.
>>>>>>
>>>>>> In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
>>>>>> return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
>>>>>> also needs to handle try_handle_mmio returning IO_RETRY the first
>>>>>> around, and IO_HANDLED when after QEMU does its job.
>>>>>>
>>>>>> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
>>>>>> early and let the scheduler do its job? Something like:
>>>>>>
>>>>>>               enum io_state state = try_handle_mmio(regs, hsr, gpa);
>>>>>>
>>>>>>               switch ( state )
>>>>>>               {
>>>>>>               case IO_ABORT:
>>>>>>                   goto inject_abt;
>>>>>>               case IO_HANDLED:
>>>>>>                   advance_pc(regs, hsr);
>>>>>>                   return;
>>>>>>               case IO_RETRY:
>>>>>>                   /* finish later */
>>>>>>                   return;
>>>>>>               case IO_UNHANDLED:
>>>>>>                   /* IO unhandled, try another way to handle it. */
>>>>>>                   break;
>>>>>>               default:
>>>>>>                   ASSERT_UNREACHABLE();
>>>>>>               }
>>>>>>
>>>>>> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
>>>>>> handle_hvm_io_completion() after QEMU completes the emulation. Today,
>>>>>> handle_mmio just sets the user register with the read value.
>>>>>>
>>>>>> But it would be better if it called again the original function
>>>>>> do_trap_stage2_abort_guest to actually retry the original operation.
>>>>>> This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
>>>>>> IO_HANDLED instead of IO_RETRY,
>>>>> I may miss some important point, but I failed to see why try_handle_mmio
>>>>> (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at this
>>>>> stage.
>>>>> Or current try_fwd_ioserv() logic needs rework?
>>>> I think you should check the ioreq->state in try_fwd_ioserv(), if the
>>>> result is ready, then ioreq->state should be STATE_IORESP_READY, and you
>>>> can return IO_HANDLED.
>>>
>>
>> I optimized test patch a bit (now it looks much simpler). I didn't face any
>> issues during a quick test.
>
> Both patches get much closer to following the proper state machine,
> great! I think this patch is certainly a good improvement. I think the
> other patch you sent earlier, slightly larger, is even better. It makes
> the following additional changes that would be good to have:
>
> - try_fwd_ioserv returns IO_HANDLED on state == STATE_IORESP_READY
> - handle_mmio simply calls do_trap_stage2_abort_guest

I don't think we should call do_trap_stage2_abort_guest() as part of the
completion because:
  * do_trap_stage2_abort_guest() uses registers that are not context
switched (such as FAR_EL2). I/O handling is split in two, with likely a
context switch in the middle. The second part is the completion (i.e. the
call to handle_mmio()), so the system registers will be incorrect by then.
  * A big chunk of do_trap_stage2_abort_guest() is not necessary for
the completion. For instance, there is no need to translate the guest
virtual address to a guest physical address again.

Therefore the version below is probably the best approach.
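To make the constraint concrete: a completion path that avoids the
non-context-switched registers only needs to write back the read value and
advance the PC. The following is a hypothetical, simplified sketch with
stand-in types (not the actual patch being discussed):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for Xen's types (illustration only). */
enum { STATE_IOREQ_NONE, STATE_IOREQ_READY, STATE_IORESP_READY };
enum { IOREQ_WRITE, IOREQ_READ };

typedef struct {
    int state;
    int dir;          /* IOREQ_READ or IOREQ_WRITE */
    uint64_t data;    /* value returned by the device model on a read */
} ioreq_t;

typedef struct {
    uint64_t x[31];   /* general-purpose registers */
    uint64_t pc;
} cpu_regs_t;

/*
 * Completion-only handler: no second stage-2 walk, no FAR_EL2 access.
 * It just latches the device model's answer and advances the PC.
 */
static void complete_mmio(cpu_regs_t *regs, const ioreq_t *ioreq,
                          int dest_reg, unsigned int insn_len)
{
    if ( ioreq->dir == IOREQ_READ )   /* write back the read value */
        regs->x[dest_reg] = ioreq->data;
    regs->pc += insn_len;             /* advance_pc() equivalent */
}
```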

Cheers,

--
Julien Grall
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
On 11.08.20 12:19, Julien Grall wrote:

Hi Julien, Stefano

> Hi Stefano,
>
> On 11/08/2020 00:34, Stefano Stabellini wrote:
>> On Mon, 10 Aug 2020, Oleksandr wrote:
>>> On 08.08.20 01:19, Oleksandr wrote:
>>>> On 08.08.20 00:50, Stefano Stabellini wrote:
>>>>> On Fri, 7 Aug 2020, Oleksandr wrote:
>>>>>> On 06.08.20 03:37, Stefano Stabellini wrote:
>>>>>>
>>>>>> Hi Stefano
>>>>>>
>>>>>> Trying to simulate IO_RETRY handling mechanism (according to model
>>>>>> below) I
>>>>>> continuously get IO_RETRY from try_fwd_ioserv() ...
>>>>>>
>>>>>>> OK, thanks for the details. My interpretation seems to be correct.
>>>>>>>
>>>>>>> In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv
>>>>>>> should
>>>>>>> return IO_RETRY. Then,
>>>>>>> xen/arch/arm/traps.c:do_trap_stage2_abort_guest
>>>>>>> also needs to handle try_handle_mmio returning IO_RETRY the first
>>>>>>> around, and IO_HANDLED when after QEMU does its job.
>>>>>>>
>>>>>>> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply
>>>>>>> return
>>>>>>> early and let the scheduler do its job? Something like:
>>>>>>>
>>>>>>>                enum io_state state = try_handle_mmio(regs, hsr,
>>>>>>> gpa);
>>>>>>>
>>>>>>>                switch ( state )
>>>>>>>                {
>>>>>>>                case IO_ABORT:
>>>>>>>                    goto inject_abt;
>>>>>>>                case IO_HANDLED:
>>>>>>>                    advance_pc(regs, hsr);
>>>>>>>                    return;
>>>>>>>                case IO_RETRY:
>>>>>>>                    /* finish later */
>>>>>>>                    return;
>>>>>>>                case IO_UNHANDLED:
>>>>>>>                    /* IO unhandled, try another way to handle
>>>>>>> it. */
>>>>>>>                    break;
>>>>>>>                default:
>>>>>>>                    ASSERT_UNREACHABLE();
>>>>>>>                }
>>>>>>>
>>>>>>> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
>>>>>>> handle_hvm_io_completion() after QEMU completes the emulation.
>>>>>>> Today,
>>>>>>> handle_mmio just sets the user register with the read value.
>>>>>>>
>>>>>>> But it would be better if it called again the original function
>>>>>>> do_trap_stage2_abort_guest to actually retry the original
>>>>>>> operation.
>>>>>>> This time do_trap_stage2_abort_guest calls try_handle_mmio() and
>>>>>>> gets
>>>>>>> IO_HANDLED instead of IO_RETRY,
>>>>>> I may miss some important point, but I failed to see why
>>>>>> try_handle_mmio
>>>>>> (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at this
>>>>>> stage.
>>>>>> Or current try_fwd_ioserv() logic needs rework?
>>>>> I think you should check the ioreq->state in try_fwd_ioserv(), if the
>>>>> result is ready, then ioreq->state should be STATE_IORESP_READY,
>>>>> and you
>>>>> can return IO_HANDLED.
>>>>
>>>
>>> I optimized test patch a bit (now it looks much simpler). I didn't
>>> face any
>>> issues during a quick test.
>>
>> Both patches get much closer to following the proper state machine,
>> great! I think this patch is certainly a good improvement. I think the
>> other patch you sent earlier, slightly larger, is even better. It makes
>> the following additional changes that would be good to have:
>>
>> - try_fwd_ioserv returns IO_HANDLED on state == STATE_IORESP_READY
>> - handle_mmio simply calls do_trap_stage2_abort_guest
>
> I don't think we should call do_trap_stage2_abort_guest() as part of
> the completion because:
>     * The function do_trap_stage2_abort_guest() is using registers
> that are not context switched (such as FAR_EL2). I/O handling is split
> in two with likely a context switch in the middle. The second part is
> the completion (i.e call to handle_mmio()). So the system registers
> will be incorrect.
>     * A big chunk of do_trap_stage2_abort_guest() is not necessary for
> the completion. For instance, there is no need to try to translate the
> guest virtual address to a guest physical address.
>
> Therefore the version below is probably the best approach.


Indeed, the first version (calling do_trap_stage2_abort_guest() for
completion) is racy. When testing it more heavily I sometimes hit an
issue which resulted in the DomU getting stuck completely.

(XEN) d2v1: vGICD: bad read width 0 r11 offset 0x000f00

I didn't investigate the issue in detail, but I assumed that code in
do_trap_stage2_abort_guest() caused it. This was the main reason why I
decided to optimize the initial patch (keeping only advance_pc).
Reading Julien's answer, I now understand what could happen.
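For reference, the ioreq->state check suggested earlier in the thread boils
down to the following. This is a simplified, hypothetical sketch with
stand-in types; the real try_fwd_ioserv() also has to build the ioreq and
actually send it to the ioreq server:

```c
#include <assert.h>

enum io_state { IO_ABORT, IO_HANDLED, IO_RETRY, IO_UNHANDLED };
enum { STATE_IOREQ_NONE, STATE_IOREQ_READY, STATE_IORESP_READY };

typedef struct { int state; } ioreq_t;

static enum io_state try_fwd_ioserv_sketch(ioreq_t *ioreq)
{
    /* The DM already produced a response: the access is handled. */
    if ( ioreq->state == STATE_IORESP_READY )
        return IO_HANDLED;

    /* Otherwise, forward to the ioreq server and finish later. */
    ioreq->state = STATE_IOREQ_READY;
    return IO_RETRY;
}
```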


--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
On 11/08/2020 00:34, Stefano Stabellini wrote:
> On Mon, 10 Aug 2020, Julien Grall wrote:
>> On 07/08/2020 00:48, Stefano Stabellini wrote:
>>> On Thu, 6 Aug 2020, Julien Grall wrote:
>>>> On 06/08/2020 01:37, Stefano Stabellini wrote:
>>>>> On Wed, 5 Aug 2020, Julien Grall wrote:
>>>>>> On 04/08/2020 20:11, Stefano Stabellini wrote:
>>>>>>> On Tue, 4 Aug 2020, Julien Grall wrote:
>>>>>>>> On 04/08/2020 12:10, Oleksandr wrote:
>>>>>>>>> On 04.08.20 10:45, Paul Durrant wrote:
>>>>>>>>>>> +static inline bool hvm_ioreq_needs_completion(const ioreq_t
>>>>>>>>>>> *ioreq)
>>>>>>>>>>> +{
>>>>>>>>>>> +    return ioreq->state == STATE_IOREQ_READY &&
>>>>>>>>>>> +           !ioreq->data_is_ptr &&
>>>>>>>>>>> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir !=
>>>>>>>>>>> IOREQ_WRITE);
>>>>>>>>>>> +}
>>>>>>>>>> I don't think having this in common code is correct. The
>>>>>>>>>> short-cut
>>>>>>>>>> of
>>>>>>>>>> not
>>>>>>>>>> completing PIO reads seems somewhat x86 specific.
>>>>>>>>
>>>>>>>> Hmmm, looking at the code, I think it doesn't wait for PIO writes
>>>>>>>> to
>>>>>>>> complete
>>>>>>>> (not read). Did I miss anything?
>>>>>>>>
>>>>>>>>> Does ARM even
>>>>>>>>>> have the concept of PIO?
>>>>>>>>>
>>>>>>>>> I am not 100% sure here, but it seems that doesn't have.
>>>>>>>>
>>>>>>>> Technically, the PIOs exist on Arm, however they are accessed the
>>>>>>>> same
>>>>>>>> way
>>>>>>>> as
>>>>>>>> MMIO and will have a dedicated area defined by the HW.
>>>>>>>>
>>>>>>>> AFAICT, on Arm64, they are only used for PCI IO Bar.
>>>>>>>>
>>>>>>>> Now the question is whether we want to expose them to the Device
>>>>>>>> Emulator
>>>>>>>> as
>>>>>>>> PIO or MMIO access. From a generic PoV, a DM shouldn't have to
>>>>>>>> care
>>>>>>>> about
>>>>>>>> the
>>>>>>>> architecture used. It should just be able to request a given
>>>>>>>> IOport
>>>>>>>> region.
>>>>>>>>
>>>>>>>> So it may make sense to differentiate them in the common ioreq
>>>>>>>> code as
>>>>>>>> well.
>>>>>>>>
>>>>>>>> I had a quick look at QEMU and wasn't able to tell if PIOs and
>>>>>>>> MMIOs
>>>>>>>> address
>>>>>>>> space are different on Arm as well. Paul, Stefano, do you know
>>>>>>>> what
>>>>>>>> they
>>>>>>>> are
>>>>>>>> doing?
>>>>>>>
>>>>>>> On the QEMU side, it looks like PIO (address_space_io) is used in
>>>>>>> connection with the emulation of the "in" or "out" instructions, see
>>>>>>> ioport.c:cpu_inb for instance. Some parts of PCI on QEMU emulate PIO
>>>>>>> space regardless of the architecture, such as
>>>>>>> hw/pci/pci_bridge.c:pci_bridge_initfn.
>>>>>>>
>>>>>>> However, because there is no "in" and "out" on ARM, I don't think
>>>>>>> address_space_io can be accessed. Specifically, there is no
>>>>>>> equivalent
>>>>>>> for target/i386/misc_helper.c:helper_inb on ARM.
>>>>>>
>>>>>> So how PCI I/O BAR are accessed? Surely, they could be used on Arm,
>>>>>> right?
>>>>>
>>>>> PIO is also memory mapped on ARM and it seems to have its own MMIO
>>>>> address window.
>>>> This part is already well-understood :). However, this only tell us how an
>>>> OS
>>>> is accessing a PIO.
>>>>
>>>> What I am trying to figure out is how the hardware (or QEMU) is meant to
>>>> work.
>>>>
>>>> From my understanding, the MMIO access will be received by the hostbridge
>>>> and
>>>> then forwarded to the appropriate PCI device. The two questions I am
>>>> trying to
>>>> answer is: How the I/O BARs are configured? Will it contain an MMIO
>>>> address or
>>>> an offset?
>>>>
>>>> If the answer is the latter, then we will need PIO because a DM will never
>>>> see
>>>> the MMIO address (the hostbridge will be emulated in Xen).
>>>
>>> Now I understand the question :-)
>>>
>>> This is the way I understand it works. Let's say that the PIO aperture
>>> is 0x1000-0x2000 which is aliased to 0x3eff0000-0x3eff1000.
>>> 0x1000-0x2000 are addresses that cannot be accessed directly.
>>> 0x3eff0000-0x3eff1000 is the range that works.
>>>
>>> A PCI device PIO BAR will have an address in the 0x1000-0x2000 range,
>>> for instance 0x1100.

Are you sure about this?

>>>
>>> However, when the operating system access 0x1100, it will issue a read
>>> to 0x3eff0100.
>>>
>>> Xen will trap the read to 0x3eff0100 and send it to QEMU.
>>>
>>> QEMU has to know that 0x3eff0000-0x3eff1000 is the alias to the PIO
>>> aperture and that 0x3eff0100 correspond to PCI device foobar. Similarly,
>>> QEMU has also to know the address range of the MMIO aperture and its
>>> remappings, if any (it is possible to have address remapping for MMIO
>>> addresses too.)
>>>
>>> I think today this information is "built-in" QEMU, not configurable. It
>>> works fine because *I think* the PCI aperture is pretty much the same on
>>> x86 boards, at least the one supported by QEMU for Xen.
>>
>> Well on x86, the OS will access PIO using inb/outb. So the address received by
>> Xen is 0x1000-0x2000 and then forwarded to the DM using the PIO type.
>>
>>> On ARM, I think we should explicitly declare the PCI MMIO aperture and
>>> its alias/address-remapping. When we do that, we can also declare the
>>> PIO aperture and its alias/address-remapping.
>>
>> Well yes, we need to define PCI MMIO and PCI I/O region because the guest OS
>> needs to know them.
>
> [1]
> (see below)
>
>
>> However, I am unsure how this would help us to solve the question whether
>> access to the PCI I/O aperture should be sent as a PIO or MMIO.
>>
>> Per what you wrote, the PCI I/O Bar would be configured with the range
>> 0x1000-0x2000. So a device emulator (this may not be QEMU and only emulate one
>> PCI device!!) will only see that range.
>>
>> How does the device-emulator then know that it needs to watch the region
>> 0x3eff0000-0x3eff1000?
>
> It would know because the PCI PIO aperture, together with the alias, are
> specified [1].

Are you suggesting to fix it in the ABI, or to pass it as runtime
information to the Device Emulator?

>
>
>> It feels to me that it would be easier/make more sense if the DM only say "I
>> want to watch the PIO range 0x1000-0x2000". So Xen would be in charge to do
>> the translation between the OS view and the DM view.
>>
>> This also means a DM would be completely arch-agnostic. This would follow the
>> HW where you can plug your PCI card on any HW.
>
> As you know, PIO access is actually not modelled by QEMU for ARM
> targets. I worry about the long term stability of it, given that it is
> untested. I.e. qemu-system-aarch64 could have a broken PIO emulation
> and nobody would find out except for us when we send ioreqs to it.

There are multiple references to PIO in QEMU for Arm (see
hw/arm/virt.c). So what do you mean by "not modelled"?

> Thinking from a Xen/Emulator interface on ARM, is it wise to rely on an
> access-type that doesn't exist on the architecture?

The architecture doesn't define an instruction to access PIO; however,
this doesn't mean such accesses don't exist on the platform.

For instance, a PCI device may have an I/O BAR. On Arm64, the hostbridge
is responsible for translating the MMIO access into a PIO access for the
PCI device.
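As a hypothetical sketch (addresses borrowed from Stefano's example earlier
in the thread, so the constants are assumptions, not real platform values),
the hostbridge-style translation is just base arithmetic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical aperture from the example in this thread. */
#define PIO_ALIAS_BASE 0x3eff0000UL  /* MMIO window the OS actually touches */
#define PIO_BASE       0x1000UL      /* I/O port range seen in PCI I/O BARs */
#define PIO_SIZE       0x1000UL

/*
 * What a hostbridge (or Xen, if it does the translation for the DM)
 * would do: turn a trapped MMIO address into a port number.
 * Returns true and fills *port when the address falls in the window.
 */
static bool mmio_to_pio(uint64_t mmio_addr, uint64_t *port)
{
    if ( mmio_addr < PIO_ALIAS_BASE ||
         mmio_addr >= PIO_ALIAS_BASE + PIO_SIZE )
        return false;

    *port = PIO_BASE + (mmio_addr - PIO_ALIAS_BASE);
    return true;
}
```

With this in place, a DM could watch the PIO range 0x1000-0x2000 without ever
seeing the MMIO alias, which is the arch-agnostic split being argued for.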

I have the impression that we disagree on what the Device Emulator is
meant to do. IMHO, the goal of the device emulator is to emulate a
device in an arch-agnostic way.

Cheers,

--
Julien Grall
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
On Tue, 11 Aug 2020, Oleksandr wrote:
> On 11.08.20 12:19, Julien Grall wrote:
> > On 11/08/2020 00:34, Stefano Stabellini wrote:
> > > On Mon, 10 Aug 2020, Oleksandr wrote:
> > > > On 08.08.20 01:19, Oleksandr wrote:
> > > > > On 08.08.20 00:50, Stefano Stabellini wrote:
> > > > > > On Fri, 7 Aug 2020, Oleksandr wrote:
> > > > > > > On 06.08.20 03:37, Stefano Stabellini wrote:
> > > > > > >
> > > > > > > Hi Stefano
> > > > > > >
> > > > > > > Trying to simulate IO_RETRY handling mechanism (according to model
> > > > > > > below) I
> > > > > > > continuously get IO_RETRY from try_fwd_ioserv() ...
> > > > > > >
> > > > > > > > OK, thanks for the details. My interpretation seems to be
> > > > > > > > correct.
> > > > > > > >
> > > > > > > > In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv
> > > > > > > > should
> > > > > > > > return IO_RETRY. Then,
> > > > > > > > xen/arch/arm/traps.c:do_trap_stage2_abort_guest
> > > > > > > > also needs to handle try_handle_mmio returning IO_RETRY the
> > > > > > > > first
> > > > > > > > around, and IO_HANDLED when after QEMU does its job.
> > > > > > > >
> > > > > > > > What should do_trap_stage2_abort_guest do on IO_RETRY? Simply
> > > > > > > > return
> > > > > > > > early and let the scheduler do its job? Something like:
> > > > > > > >
> > > > > > > >                enum io_state state = try_handle_mmio(regs, hsr,
> > > > > > > > gpa);
> > > > > > > >
> > > > > > > >                switch ( state )
> > > > > > > >                {
> > > > > > > >                case IO_ABORT:
> > > > > > > >                    goto inject_abt;
> > > > > > > >                case IO_HANDLED:
> > > > > > > >                    advance_pc(regs, hsr);
> > > > > > > >                    return;
> > > > > > > >                case IO_RETRY:
> > > > > > > >                    /* finish later */
> > > > > > > >                    return;
> > > > > > > >                case IO_UNHANDLED:
> > > > > > > >                    /* IO unhandled, try another way to handle
> > > > > > > > it. */
> > > > > > > >                    break;
> > > > > > > >                default:
> > > > > > > >                    ASSERT_UNREACHABLE();
> > > > > > > >                }
> > > > > > > >
> > > > > > > > Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
> > > > > > > > handle_hvm_io_completion() after QEMU completes the emulation.
> > > > > > > > Today,
> > > > > > > > handle_mmio just sets the user register with the read value.
> > > > > > > >
> > > > > > > > But it would be better if it called again the original function
> > > > > > > > do_trap_stage2_abort_guest to actually retry the original
> > > > > > > > operation.
> > > > > > > > This time do_trap_stage2_abort_guest calls try_handle_mmio() and
> > > > > > > > gets
> > > > > > > > IO_HANDLED instead of IO_RETRY,
> > > > > > > I may miss some important point, but I failed to see why
> > > > > > > try_handle_mmio
> > > > > > > (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at
> > > > > > > this
> > > > > > > stage.
> > > > > > > Or current try_fwd_ioserv() logic needs rework?
> > > > > > I think you should check the ioreq->state in try_fwd_ioserv(), if
> > > > > > the
> > > > > > result is ready, then ioreq->state should be STATE_IORESP_READY, and
> > > > > > you
> > > > > > can return IO_HANDLED.
> > > > >
> > > >
> > > > I optimized test patch a bit (now it looks much simpler). I didn't face
> > > > any
> > > > issues during a quick test.
> > >
> > > Both patches get much closer to following the proper state machine,
> > > great! I think this patch is certainly a good improvement. I think the
> > > other patch you sent earlier, slightly larger, is even better. It makes
> > > the following additional changes that would be good to have:
> > >
> > > - try_fwd_ioserv returns IO_HANDLED on state == STATE_IORESP_READY
> > > - handle_mmio simply calls do_trap_stage2_abort_guest
> >
> > I don't think we should call do_trap_stage2_abort_guest() as part of the
> > completion because:
> >     * The function do_trap_stage2_abort_guest() is using registers that are
> > not context switched (such as FAR_EL2). I/O handling is split in two with
> > likely a context switch in the middle. The second part is the completion
> > (i.e call to handle_mmio()). So the system registers will be incorrect.
> >     * A big chunk of do_trap_stage2_abort_guest() is not necessary for the
> > completion. For instance, there is no need to try to translate the guest
> > virtual address to a guest physical address.
> >
> > Therefore the version below is probably the best approach.
>
>
> Indeed, the first version (calling do_trap_stage2_abort_guest() for
> completion) is racy. When testing it more heavily I sometimes hit an
> issue which resulted in the DomU getting stuck completely.
>
> (XEN) d2v1: vGICD: bad read width 0 r11 offset 0x000f00
>
> I didn't investigate the issue in detail, but I assumed that code in
> do_trap_stage2_abort_guest() caused it. This was the main reason why I
> decided to optimize the initial patch (keeping only advance_pc).
> Reading Julien's answer, I now understand what could happen.

From your and Julien's feedback it is clear that calling
do_trap_stage2_abort_guest() is not possible and not a good idea.


The reason for my suggestion was to complete the implementation of the
state machine so that "RETRY" actually means "let's try the emulation
again", and the second time around it will return "HANDLED".

Looking at this again, we could achieve the same goal in a better way by
moving the register setting from "handle_mmio" to "try_handle_mmio" and
also calling "try_handle_mmio" from "handle_mmio". Note that handle_mmio
would become almost empty like on x86.

1) do_trap_stage2_abort_guest ->
     try_handle_mmio ->
       try_fwd_ioserv ->
         IO_RETRY

2) handle_hvm_io_completion ->
     handle_mmio ->
       try_handle_mmio ->
         try_fwd_ioserv ->
           IO_HANDLED


It is very similar to your second patch, with one small change: calling
try_handle_mmio from handle_mmio and setting the register there. Do you
think that would work?
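A minimal, hypothetical sketch of that restructuring, using stand-in types
rather than the actual Xen code (so all names and fields here are
assumptions for illustration):

```c
#include <assert.h>
#include <stdint.h>

enum io_state { IO_ABORT, IO_HANDLED, IO_RETRY, IO_UNHANDLED };
enum { STATE_IOREQ_NONE, STATE_IORESP_READY };
enum { IOREQ_WRITE, IOREQ_READ };

typedef struct { int state; int dir; uint64_t data; } ioreq_t;
typedef struct { uint64_t x[31]; } cpu_regs_t;

/* Register write-back lives here now, so both call sites share it. */
static enum io_state try_handle_mmio_sketch(cpu_regs_t *regs, ioreq_t *ioreq,
                                            int dest_reg)
{
    if ( ioreq->state != STATE_IORESP_READY )
        return IO_RETRY;               /* DM not done yet: flow 1) */

    if ( ioreq->dir == IOREQ_READ )    /* set the user register here */
        regs->x[dest_reg] = ioreq->data;

    return IO_HANDLED;                 /* flow 2) */
}

/* handle_mmio becomes almost empty, as on x86. */
static enum io_state handle_mmio_sketch(cpu_regs_t *regs, ioreq_t *ioreq,
                                        int dest_reg)
{
    return try_handle_mmio_sketch(regs, ioreq, dest_reg);
}
```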
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
On Tue, 11 Aug 2020, Julien Grall wrote:
> On 11/08/2020 00:34, Stefano Stabellini wrote:
> > On Mon, 10 Aug 2020, Julien Grall wrote:
> > > On 07/08/2020 00:48, Stefano Stabellini wrote:
> > > > On Thu, 6 Aug 2020, Julien Grall wrote:
> > > > > On 06/08/2020 01:37, Stefano Stabellini wrote:
> > > > > > On Wed, 5 Aug 2020, Julien Grall wrote:
> > > > > > > On 04/08/2020 20:11, Stefano Stabellini wrote:
> > > > > > > > On Tue, 4 Aug 2020, Julien Grall wrote:
> > > > > > > > > On 04/08/2020 12:10, Oleksandr wrote:
> > > > > > > > > > On 04.08.20 10:45, Paul Durrant wrote:
> > > > > > > > > > > > +static inline bool hvm_ioreq_needs_completion(const
> > > > > > > > > > > > ioreq_t
> > > > > > > > > > > > *ioreq)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +    return ioreq->state == STATE_IOREQ_READY &&
> > > > > > > > > > > > +           !ioreq->data_is_ptr &&
> > > > > > > > > > > > +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir
> > > > > > > > > > > > !=
> > > > > > > > > > > > IOREQ_WRITE);
> > > > > > > > > > > > +}
> > > > > > > > > > > I don't think having this in common code is correct. The
> > > > > > > > > > > short-cut
> > > > > > > > > > > of
> > > > > > > > > > > not
> > > > > > > > > > > completing PIO reads seems somewhat x86 specific.
> > > > > > > > >
> > > > > > > > > Hmmm, looking at the code, I think it doesn't wait for PIO
> > > > > > > > > writes
> > > > > > > > > to
> > > > > > > > > complete
> > > > > > > > > (not read). Did I miss anything?
> > > > > > > > >
> > > > > > > > > > Does ARM even
> > > > > > > > > > > have the concept of PIO?
> > > > > > > > > >
> > > > > > > > > > I am not 100% sure here, but it seems that doesn't have.
> > > > > > > > >
> > > > > > > > > Technically, the PIOs exist on Arm, however they are accessed
> > > > > > > > > the
> > > > > > > > > same
> > > > > > > > > way
> > > > > > > > > as
> > > > > > > > > MMIO and will have a dedicated area defined by the HW.
> > > > > > > > >
> > > > > > > > > AFAICT, on Arm64, they are only used for PCI IO Bar.
> > > > > > > > >
> > > > > > > > > Now the question is whether we want to expose them to the
> > > > > > > > > Device
> > > > > > > > > Emulator
> > > > > > > > > as
> > > > > > > > > PIO or MMIO access. From a generic PoV, a DM shouldn't have to
> > > > > > > > > care
> > > > > > > > > about
> > > > > > > > > the
> > > > > > > > > architecture used. It should just be able to request a given
> > > > > > > > > IOport
> > > > > > > > > region.
> > > > > > > > >
> > > > > > > > > So it may make sense to differentiate them in the common ioreq
> > > > > > > > > code as
> > > > > > > > > well.
> > > > > > > > >
> > > > > > > > > I had a quick look at QEMU and wasn't able to tell if PIOs and
> > > > > > > > > MMIOs
> > > > > > > > > address
> > > > > > > > > space are different on Arm as well. Paul, Stefano, do you know
> > > > > > > > > what
> > > > > > > > > they
> > > > > > > > > are
> > > > > > > > > doing?
> > > > > > > >
> > > > > > > > On the QEMU side, it looks like PIO (address_space_io) is used
> > > > > > > > in
> > > > > > > > connection with the emulation of the "in" or "out" instructions,
> > > > > > > > see
> > > > > > > > ioport.c:cpu_inb for instance. Some parts of PCI on QEMU emulate
> > > > > > > > PIO
> > > > > > > > space regardless of the architecture, such as
> > > > > > > > hw/pci/pci_bridge.c:pci_bridge_initfn.
> > > > > > > >
> > > > > > > > However, because there is no "in" and "out" on ARM, I don't
> > > > > > > > think
> > > > > > > > address_space_io can be accessed. Specifically, there is no
> > > > > > > > equivalent
> > > > > > > > for target/i386/misc_helper.c:helper_inb on ARM.
> > > > > > >
> > > > > > > So how PCI I/O BAR are accessed? Surely, they could be used on
> > > > > > > Arm,
> > > > > > > right?
> > > > > >
> > > > > > PIO is also memory mapped on ARM and it seems to have its own MMIO
> > > > > > address window.
> > > > > This part is already well-understood :). However, this only tell us
> > > > > how an
> > > > > OS
> > > > > is accessing a PIO.
> > > > >
> > > > > What I am trying to figure out is how the hardware (or QEMU) is meant
> > > > > to
> > > > > work.
> > > > >
> > > > > From my understanding, the MMIO access will be received by the
> > > > > hostbridge
> > > > > and
> > > > > then forwarded to the appropriate PCI device. The two questions I am
> > > > > trying to
> > > > > answer is: How the I/O BARs are configured? Will it contain an MMIO
> > > > > address or
> > > > > an offset?
> > > > >
> > > > > If the answer is the latter, then we will need PIO because a DM will
> > > > > never
> > > > > see
> > > > > the MMIO address (the hostbridge will be emulated in Xen).
> > > >
> > > > Now I understand the question :-)
> > > >
> > > > This is the way I understand it works. Let's say that the PIO aperture
> > > > is 0x1000-0x2000 which is aliased to 0x3eff0000-0x3eff1000.
> > > > 0x1000-0x2000 are addresses that cannot be accessed directly.
> > > > 0x3eff0000-0x3eff1000 is the range that works.
> > > >
> > > > A PCI device PIO BAR will have an address in the 0x1000-0x2000 range,
> > > > for instance 0x1100.
>
> Are you sure about this?

I am pretty sure, but only from reading the code. It would be great if
somebody ran QEMU and actually tested it. This is important because it
could make the whole discussion moot :-)
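For what it is worth, the OS-side view of the aliasing reduces to the same
base arithmetic, in the opposite direction. The constants are taken from the
earlier example in this thread, and whether the I/O BAR really contains a
value like 0x1100 is exactly the open question here:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values from the example under discussion. */
#define PIO_ALIAS_BASE 0x3eff0000UL  /* MMIO window the OS actually accesses */
#define PIO_BASE       0x1000UL      /* range the I/O BAR values live in */

/* OS-side view: turn a PIO BAR value into the MMIO address to access. */
static uint64_t pio_bar_to_mmio(uint64_t bar_addr)
{
    return PIO_ALIAS_BASE + (bar_addr - PIO_BASE);
}
```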


> > > > However, when the operating system accesses 0x1100, it will issue a
> > > > read to 0x3eff0100.
> > > >
> > > > Xen will trap the read to 0x3eff0100 and send it to QEMU.
> > > >
> > > > QEMU has to know that 0x3eff0000-0x3eff1000 is the alias of the PIO
> > > > aperture and that 0x3eff0100 corresponds to PCI device foobar.
> > > > Similarly, QEMU also has to know the address range of the MMIO aperture
> > > > and its remappings, if any (it is possible to have address remapping
> > > > for MMIO addresses too.)
> > > >
> > > > I think today this information is built into QEMU, not configurable. It
> > > > works fine because *I think* the PCI aperture is pretty much the same on
> > > > x86 boards, at least the ones supported by QEMU for Xen.
> > >
> > > Well, on x86 the OS will access PIO using inb/outb. So the address
> > > received by Xen is in 0x1000-0x2000 and is then forwarded to the DM
> > > using the PIO type.
> > >
> > > > On ARM, I think we should explicitly declare the PCI MMIO aperture and
> > > > its alias/address-remapping. When we do that, we can also declare the
> > > > PIO aperture and its alias/address-remapping.
> > >
> > > Well yes, we need to define the PCI MMIO and PCI I/O regions because
> > > the guest OS needs to know them.
> >
> > [1]
> > (see below)
> >
> >
> > > However, I am unsure how this would help us to solve the question whether
> > > access to the PCI I/O aperture should be sent as a PIO or MMIO.
> > >
> > > Per what you wrote, the PCI I/O BAR would be configured with the range
> > > 0x1000-0x2000. So a device emulator (this may not be QEMU and may only
> > > emulate one PCI device!!) will only see that range.
> > >
> > > How does the device emulator then know that it needs to watch the region
> > > 0x3eff0000-0x3eff1000?
> >
> > It would know because the PCI PIO aperture, together with the alias, are
> > specified [1].
>
> Are you suggesting fixing it in the ABI or passing it as runtime information
> to the Device Emulator?

I am suggesting "fixing" it in the ABI. Whether we pass it at
runtime or not is less important, I think.


> > > It feels to me that it would be easier/make more sense if the DM only
> > > said "I want to watch the PIO range 0x1000-0x2000". So Xen would be in
> > > charge of doing the translation between the OS view and the DM view.
> > >
> > > This also means a DM would be completely arch-agnostic. This would
> > > follow the HW, where you can plug your PCI card into any HW.
> >
> > As you know, PIO access is actually not modelled by QEMU for ARM
> > targets. I worry about the long term stability of it, given that it is
> > untested. I.e. qemu-system-aarch64 could have a broken PIO emulation
> > and nobody would find out except for us when we send ioreqs to it.
>
> There are multiple references to PIO in QEMU for Arm (see hw/arm/virt.c).
> So what do you mean by not modelled?

I mean that PIO is only emulated as an MMIO region, not as port-mapped I/O.


> > Thinking from a Xen/Emulator interface on ARM, is it wise to rely on an
> > access-type that doesn't exist on the architecture?
>
> The architecture doesn't define an instruction to access PIO; however, this
> doesn't mean such access doesn't exist on the platform.
>
> For instance, a PCI device may have an I/O BAR. On Arm64, the hostbridge
> will be responsible for doing the translation from an MMIO access to a PIO
> access for the PCI device.

As far as I understand the host bridge is responsible for any
translations between the host address space and the address space of
devices. Even for MMIO addresses there are translations. The hostbridge
doesn't do any port-mapped I/O to communicate with the device. There is
a different protocol for those communications. Port-mapped I/O is how it
is exposed to the host by the hostbridge.

If we wanted to emulate the hostbridge/device interface properly we
would end up with something pretty different.


> I have the impression that we disagree in what the Device Emulator is meant to
> do. IMHO, the goal of the device emulator is to emulate a device in an
> arch-agnostic way.

That would be great in theory, but I am not sure it is achievable: if we
use an existing emulator like QEMU, even a single device has to fit
into QEMU's view of the world, which makes assumptions about host
bridges and apertures. It is impossible today to build QEMU in an
arch-agnostic way; it has to be tied to an architecture.

I realize we are not building this interface for QEMU specifically, but
even if we try to make the interface arch-agnostic, in reality the
emulators won't be arch-agnostic. If we send a port-mapped I/O request
to qemu-system-aarch64, who knows what is going to happen: it is a code
path that is not explicitly tested.
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
Hi,

On 11/08/2020 23:48, Stefano Stabellini wrote:
>> I have the impression that we disagree in what the Device Emulator is meant to
>> do. IMHO, the goal of the device emulator is to emulate a device in an
>> arch-agnostic way.
>
> That would be great in theory, but I am not sure it is achievable: if we
> use an existing emulator like QEMU, even a single device has to fit
> into QEMU's view of the world, which makes assumptions about host
> bridges and apertures. It is impossible today to build QEMU in an
> arch-agnostic way; it has to be tied to an architecture.

AFAICT, the only reason QEMU cannot be built in an arch-agnostic way is
TCG. If TCG wasn't built, then you could easily write a machine that
doesn't depend on the instruction set.

The proof is that, today, we are using QEMU x86 to serve Arm64 guests,
although this is only for PV drivers.

>
> I realize we are not building this interface for QEMU specifically, but
> even if we try to make the interface arch-agnostic, in reality the
> emulators won't be arch-agnostic.

This depends on your goal. If your goal is to write a standalone
emulator for a single device, then it is entirely possible to make it
arch-agnostic.

Per above, this would even be possible if you were emulating a set of
devices.

What I want to avoid is requiring all the emulators to contain
arch-specific code just because it is easier to get QEMU working on Xen
on Arm.

> If we send a port-mapped I/O request
> to qemu-system-aarch64, who knows what is going to happen: it is a code
> path that is not explicitly tested.

Maybe, maybe not. To me these are mostly software issues that can easily
be mitigated if we do proper testing...

Cheers,

--
Julien Grall
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
On 12.08.20 01:47, Stefano Stabellini wrote:

Hi Stefano

> On Tue, 11 Aug 2020, Oleksandr wrote:
>> On 11.08.20 12:19, Julien Grall wrote:
>>> On 11/08/2020 00:34, Stefano Stabellini wrote:
>>>> On Mon, 10 Aug 2020, Oleksandr wrote:
>>>>> On 08.08.20 01:19, Oleksandr wrote:
>>>>>> On 08.08.20 00:50, Stefano Stabellini wrote:
>>>>>>> On Fri, 7 Aug 2020, Oleksandr wrote:
>>>>>>>> On 06.08.20 03:37, Stefano Stabellini wrote:
>>>>>>>>
>>>>>>>> Hi Stefano
>>>>>>>>
>>>>>>>> Trying to simulate IO_RETRY handling mechanism (according to model
>>>>>>>> below) I
>>>>>>>> continuously get IO_RETRY from try_fwd_ioserv() ...
>>>>>>>>
>>>>>>>>> OK, thanks for the details. My interpretation seems to be
>>>>>>>>> correct.
>>>>>>>>>
>>>>>>>>> In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv
>>>>>>>>> should
>>>>>>>>> return IO_RETRY. Then,
>>>>>>>>> xen/arch/arm/traps.c:do_trap_stage2_abort_guest
>>>>>>>>> also needs to handle try_handle_mmio returning IO_RETRY the
>>>>>>>>> first
>>>>>>>>> around, and IO_HANDLED when after QEMU does its job.
>>>>>>>>>
>>>>>>>>> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply
>>>>>>>>> return
>>>>>>>>> early and let the scheduler do its job? Something like:
>>>>>>>>>
>>>>>>>>>                enum io_state state = try_handle_mmio(regs, hsr,
>>>>>>>>> gpa);
>>>>>>>>>
>>>>>>>>>                switch ( state )
>>>>>>>>>                {
>>>>>>>>>                case IO_ABORT:
>>>>>>>>>                    goto inject_abt;
>>>>>>>>>                case IO_HANDLED:
>>>>>>>>>                    advance_pc(regs, hsr);
>>>>>>>>>                    return;
>>>>>>>>>                case IO_RETRY:
>>>>>>>>>                    /* finish later */
>>>>>>>>>                    return;
>>>>>>>>>                case IO_UNHANDLED:
>>>>>>>>>                    /* IO unhandled, try another way to handle
>>>>>>>>> it. */
>>>>>>>>>                    break;
>>>>>>>>>                default:
>>>>>>>>>                    ASSERT_UNREACHABLE();
>>>>>>>>>                }
>>>>>>>>>
>>>>>>>>> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
>>>>>>>>> handle_hvm_io_completion() after QEMU completes the emulation.
>>>>>>>>> Today,
>>>>>>>>> handle_mmio just sets the user register with the read value.
>>>>>>>>>
>>>>>>>>> But it would be better if it called again the original function
>>>>>>>>> do_trap_stage2_abort_guest to actually retry the original
>>>>>>>>> operation.
>>>>>>>>> This time do_trap_stage2_abort_guest calls try_handle_mmio() and
>>>>>>>>> gets
>>>>>>>>> IO_HANDLED instead of IO_RETRY,
>>>>>>>> I may miss some important point, but I failed to see why
>>>>>>>> try_handle_mmio
>>>>>>>> (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at
>>>>>>>> this
>>>>>>>> stage.
>>>>>>>> Or current try_fwd_ioserv() logic needs rework?
>>>>>>> I think you should check the ioreq->state in try_fwd_ioserv(), if
>>>>>>> the
>>>>>>> result is ready, then ioreq->state should be STATE_IORESP_READY, and
>>>>>>> you
>>>>>>> can return IO_HANDLED.
>>>>> I optimized test patch a bit (now it looks much simpler). I didn't face
>>>>> any
>>>>> issues during a quick test.
>>>> Both patches get much closer to following the proper state machine,
>>>> great! I think this patch is certainly a good improvement. I think the
>>>> other patch you sent earlier, slightly larger, is even better. It makes
>>>> the following additional changes that would be good to have:
>>>>
>>>> - try_fwd_ioserv returns IO_HANDLED on state == STATE_IORESP_READY
>>>> - handle_mmio simply calls do_trap_stage2_abort_guest
>>> I don't think we should call do_trap_stage2_abort_guest() as part of the
>>> completion because:
>>>     * The function do_trap_stage2_abort_guest() is using registers that are
>>> not context switched (such as FAR_EL2). I/O handling is split in two with
>>> likely a context switch in the middle. The second part is the completion
>>> (i.e call to handle_mmio()). So the system registers will be incorrect.
>>>     * A big chunk of do_trap_stage2_abort_guest() is not necessary for the
>>> completion. For instance, there is no need to try to translate the guest
>>> virtual address to a guest physical address.
>>>
>>> Therefore the version below is probably the best approach.
>>
>> Indeed, the first version (with calling do_trap_stage2_abort_guest() for a
>> completion) is racy. When testing it more heavily I faced an issue
>> (sometimes) which resulted in DomU getting stuck completely.
>>
>> (XEN) d2v1: vGICD: bad read width 0 r11 offset 0x000f00
>>
>> I didn't investigate an issue in detail, but I assumed that code in
>> do_trap_stage2_abort_guest() caused that. This was the main reason why I
>> decided to optimize an initial patch (and took only advance_pc).
>> Reading Julien's answer I understand now what could happen.
> From your and Julien's feedback it is clear that calling
> do_trap_stage2_abort_guest() is not possible and not a good idea.
>
>
> The reason for my suggestion was to complete the implementation of the
> state machine so that "RETRY" actually means "let's try again the
> emulation" but the second time it will return "HANDLED".
>
> Looking at this again, we could achieve the same goal in a better way by
> moving the register setting from "handle_mmio" to "try_handle_mmio" and
> also calling "try_handle_mmio" from "handle_mmio". Note that handle_mmio
> would become almost empty like on x86.
>
> 1) do_trap_stage2_abort_guest ->
> try_handle_mmio ->
> try_fwd_ioserv ->
> IO_RETRY
>
> 2) handle_hvm_io_completion ->
> handle_mmio ->
> try_handle_mmio ->
> try_fwd_ioserv ->
> IO_HANDLED
>
>
> It is very similar to your second patch with a small change on calling
> try_handle_mmio from handle_mmio and setting the register there. Do you
> think that would work?
If I understood correctly what you had suggested and implemented it
properly, then it works; at least I didn't face any issues during testing.
I think this variant adds some extra operations compared to the
previous one (for example, an attempt to find an MMIO handler in
try_handle_mmio). But, if you think the new variant is cleaner and better
represents how the state machine should look, I would be absolutely
OK to take this variant for the non-RFC series. Please note, there was a
request to move try_fwd_ioserv() to arm/ioreq.c (I am going to move the
new handle_ioserv() as well).


---
 xen/arch/arm/io.c    | 47 +++++++++++++++++++++++++++++++++++++++++++----
 xen/arch/arm/ioreq.c | 38 +++++++-------------------------------
 xen/arch/arm/traps.c |  4 +++-
 3 files changed, 53 insertions(+), 36 deletions(-)

diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index 436f669..4db7c55 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -109,6 +109,43 @@ static const struct mmio_handler *find_mmio_handler(struct domain *d,
 }

 #ifdef CONFIG_IOREQ_SERVER
+static enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
+{
+    const union hsr hsr = { .bits = regs->hsr };
+    const struct hsr_dabt dabt = hsr.dabt;
+    /* Code is similar to handle_read */
+    uint8_t size = (1 << dabt.size) * 8;
+    register_t r = v->arch.hvm.hvm_io.io_req.data;
+
+    /* We are done with the IO */
+    /* XXX: Is it the right place? */
+    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
+
+    /* XXX: Do we need to take care of write here ? */
+    if ( dabt.write )
+        return IO_HANDLED;
+
+    /*
+     * Sign extend if required.
+     * Note that we expect the read handler to have zeroed the bits
+     * outside the requested access size.
+     */
+    if ( dabt.sign && (r & (1UL << (size - 1))) )
+    {
+        /*
+         * We are relying on register_t using the same as
+         * an unsigned long in order to keep the 32-bit assembly
+         * code smaller.
+         */
+        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
+        r |= (~0UL) << size;
+    }
+
+    set_user_reg(regs, dabt.reg, r);
+
+    return IO_HANDLED;
+}
+
 static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
                                     struct vcpu *v, mmio_info_t *info)
 {
@@ -130,6 +167,10 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
     {
     case STATE_IOREQ_NONE:
         break;
+
+    case STATE_IORESP_READY:
+        return IO_HANDLED;
+
     default:
         printk("d%u wrong state %u\n", v->domain->domain_id,
                vio->io_req.state);
@@ -156,10 +197,6 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
     else
         vio->io_completion = HVMIO_mmio_completion;

-    /* XXX: Decide what to do */
-    if ( rc == IO_RETRY )
-        rc = IO_HANDLED;
-
     return rc;
 }
 #endif
@@ -185,6 +222,8 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,

 #ifdef CONFIG_IOREQ_SERVER
         rc = try_fwd_ioserv(regs, v, &info);
+        if ( rc == IO_HANDLED )
+            return handle_ioserv(regs, v);
 #endif

         return rc;
diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index 8f60c41..9068b8d 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -33,46 +33,22 @@
 #include <public/hvm/dm_op.h>
 #include <public/hvm/ioreq.h>

+#include <asm/traps.h>
+
 bool handle_mmio(void)
 {
     struct vcpu *v = current;
     struct cpu_user_regs *regs = guest_cpu_user_regs();
     const union hsr hsr = { .bits = regs->hsr };
-    const struct hsr_dabt dabt = hsr.dabt;
-    /* Code is similar to handle_read */
-    uint8_t size = (1 << dabt.size) * 8;
-    register_t r = v->arch.hvm.hvm_io.io_req.data;
-
-    /* We should only be here on Guest Data Abort */
-    ASSERT(dabt.ec == HSR_EC_DATA_ABORT_LOWER_EL);
+    paddr_t addr = v->arch.hvm.hvm_io.io_req.addr;

-    /* We are done with the IO */
-    /* XXX: Is it the right place? */
-    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
-
-    /* XXX: Do we need to take care of write here ? */
-    if ( dabt.write )
-        return true;
-
-    /*
-     * Sign extend if required.
-     * Note that we expect the read handler to have zeroed the bits
-     * outside the requested access size.
-     */
-    if ( dabt.sign && (r & (1UL << (size - 1))) )
+    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
     {
-        /*
-         * We are relying on register_t using the same as
-         * an unsigned long in order to keep the 32-bit assembly
-         * code smaller.
-         */
-        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
-        r |= (~0UL) << size;
+        advance_pc(regs, hsr);
+        return true;
     }

-    set_user_reg(regs, dabt.reg, r);
-
-    return true;
+    return false;
 }

 /* Ask ioemu mapcache to invalidate mappings. */
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index ea472d1..974c744 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1965,11 +1965,13 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
             case IO_HANDLED:
                 advance_pc(regs, hsr);
                 return;
+            case IO_RETRY:
+                /* finish later */
+                return;
             case IO_UNHANDLED:
                 /* IO unhandled, try another way to handle it. */
                 break;
             default:
-                /* XXX: Handle IO_RETRY */
                 ASSERT_UNREACHABLE();
             }
         }
--
2.7.4



--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
On Wed, 12 Aug 2020, Oleksandr wrote:
> On 12.08.20 01:47, Stefano Stabellini wrote:
> > On Tue, 11 Aug 2020, Oleksandr wrote:
> > > On 11.08.20 12:19, Julien Grall wrote:
> > > > On 11/08/2020 00:34, Stefano Stabellini wrote:
> > > > > On Mon, 10 Aug 2020, Oleksandr wrote:
> > > > > > On 08.08.20 01:19, Oleksandr wrote:
> > > > > > > On 08.08.20 00:50, Stefano Stabellini wrote:
> > > > > > > > On Fri, 7 Aug 2020, Oleksandr wrote:
> > > > > > > > > On 06.08.20 03:37, Stefano Stabellini wrote:
> > > > > > > > >
> > > > > > > > > Hi Stefano
> > > > > > > > >
> > > > > > > > > Trying to simulate IO_RETRY handling mechanism (according to
> > > > > > > > > model
> > > > > > > > > below) I
> > > > > > > > > continuously get IO_RETRY from try_fwd_ioserv() ...
> > > > > > > > >
> > > > > > > > > > OK, thanks for the details. My interpretation seems to be
> > > > > > > > > > correct.
> > > > > > > > > >
> > > > > > > > > > In which case, it looks like
> > > > > > > > > > xen/arch/arm/io.c:try_fwd_ioserv
> > > > > > > > > > should
> > > > > > > > > > return IO_RETRY. Then,
> > > > > > > > > > xen/arch/arm/traps.c:do_trap_stage2_abort_guest
> > > > > > > > > > also needs to handle try_handle_mmio returning IO_RETRY the
> > > > > > > > > > first
> > > > > > > > > > around, and IO_HANDLED when after QEMU does its job.
> > > > > > > > > >
> > > > > > > > > > What should do_trap_stage2_abort_guest do on IO_RETRY?
> > > > > > > > > > Simply
> > > > > > > > > > return
> > > > > > > > > > early and let the scheduler do its job? Something like:
> > > > > > > > > >
> > > > > > > > > >                enum io_state state = try_handle_mmio(regs,
> > > > > > > > > > hsr,
> > > > > > > > > > gpa);
> > > > > > > > > >
> > > > > > > > > >                switch ( state )
> > > > > > > > > >                {
> > > > > > > > > >                case IO_ABORT:
> > > > > > > > > >                    goto inject_abt;
> > > > > > > > > >                case IO_HANDLED:
> > > > > > > > > >                    advance_pc(regs, hsr);
> > > > > > > > > >                    return;
> > > > > > > > > >                case IO_RETRY:
> > > > > > > > > >                    /* finish later */
> > > > > > > > > >                    return;
> > > > > > > > > >                case IO_UNHANDLED:
> > > > > > > > > >                    /* IO unhandled, try another way to
> > > > > > > > > > handle
> > > > > > > > > > it. */
> > > > > > > > > >                    break;
> > > > > > > > > >                default:
> > > > > > > > > >                    ASSERT_UNREACHABLE();
> > > > > > > > > >                }
> > > > > > > > > >
> > > > > > > > > > Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
> > > > > > > > > > handle_hvm_io_completion() after QEMU completes the
> > > > > > > > > > emulation.
> > > > > > > > > > Today,
> > > > > > > > > > handle_mmio just sets the user register with the read value.
> > > > > > > > > >
> > > > > > > > > > But it would be better if it called again the original
> > > > > > > > > > function
> > > > > > > > > > do_trap_stage2_abort_guest to actually retry the original
> > > > > > > > > > operation.
> > > > > > > > > > This time do_trap_stage2_abort_guest calls try_handle_mmio()
> > > > > > > > > > and
> > > > > > > > > > gets
> > > > > > > > > > IO_HANDLED instead of IO_RETRY,
> > > > > > > > > I may miss some important point, but I failed to see why
> > > > > > > > > try_handle_mmio
> > > > > > > > > (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at
> > > > > > > > > this
> > > > > > > > > stage.
> > > > > > > > > Or current try_fwd_ioserv() logic needs rework?
> > > > > > > > I think you should check the ioreq->state in try_fwd_ioserv(),
> > > > > > > > if
> > > > > > > > the
> > > > > > > > result is ready, then ioreq->state should be STATE_IORESP_READY,
> > > > > > > > and
> > > > > > > > you
> > > > > > > > can return IO_HANDLED.
> > > > > > I optimized test patch a bit (now it looks much simpler). I didn't
> > > > > > face
> > > > > > any
> > > > > > issues during a quick test.
> > > > > Both patches get much closer to following the proper state machine,
> > > > > great! I think this patch is certainly a good improvement. I think the
> > > > > other patch you sent earlier, slightly larger, is even better. It
> > > > > makes
> > > > > the following additional changes that would be good to have:
> > > > >
> > > > > - try_fwd_ioserv returns IO_HANDLED on state == STATE_IORESP_READY
> > > > > - handle_mmio simply calls do_trap_stage2_abort_guest
> > > > I don't think we should call do_trap_stage2_abort_guest() as part of the
> > > > completion because:
> > > >     * The function do_trap_stage2_abort_guest() is using registers that
> > > > are
> > > > not context switched (such as FAR_EL2). I/O handling is split in two
> > > > with
> > > > likely a context switch in the middle. The second part is the completion
> > > > (i.e call to handle_mmio()). So the system registers will be incorrect.
> > > >     * A big chunk of do_trap_stage2_abort_guest() is not necessary for
> > > > the
> > > > completion. For instance, there is no need to try to translate the guest
> > > > virtual address to a guest physical address.
> > > >
> > > > Therefore the version below is probably the best approach.
> > >
> > > Indeed, the first version (with calling do_trap_stage2_abort_guest() for a
> > > completion) is racy. When testing it more heavily I faced an issue
> > > (sometimes) which resulted in DomU getting stuck completely.
> > >
> > > (XEN) d2v1: vGICD: bad read width 0 r11 offset 0x000f00
> > >
> > > I didn't investigate an issue in detail, but I assumed that code in
> > > do_trap_stage2_abort_guest() caused that. This was the main reason why I
> > > decided to optimize an initial patch (and took only advance_pc).
> > > Reading Julien's answer I understand now what could happen.
> > From your and Julien's feedback it is clear that calling
> > do_trap_stage2_abort_guest() is not possible and not a good idea.
> >
> >
> > The reason for my suggestion was to complete the implementation of the
> > state machine so that "RETRY" actually means "let's try again the
> > emulation" but the second time it will return "HANDLED".
> >
> > Looking at this again, we could achieve the same goal in a better way by
> > moving the register setting from "handle_mmio" to "try_handle_mmio" and
> > also calling "try_handle_mmio" from "handle_mmio". Note that handle_mmio
> > would become almost empty like on x86.
> >
> > 1) do_trap_stage2_abort_guest ->
> > try_handle_mmio ->
> > try_fwd_ioserv ->
> > IO_RETRY
> >
> > 2) handle_hvm_io_completion ->
> > handle_mmio ->
> > try_handle_mmio ->
> > try_fwd_ioserv ->
> > IO_HANDLED
> >
> >
> > It is very similar to your second patch with a small change on calling
> > try_handle_mmio from handle_mmio and setting the register there. Do you
> > think that would work?
> If I understood correctly what you had suggested and implemented it properly,
> then it works; at least I didn't face any issues during testing.
> I think this variant adds some extra operations compared to the previous one
> (for example, an attempt to find an MMIO handler in try_handle_mmio). But, if
> you think the new variant is cleaner and better represents how the state
> machine should look, I would be absolutely OK to take this variant for the
> non-RFC series. Please note, there was a request to move try_fwd_ioserv() to
> arm/ioreq.c (I am going to move the new handle_ioserv() as well).

Yes, I think this version better represents the state machine, thanks
for looking into it. I think it is good.

In terms of number of operations, it doesn't look very concerning (in
the sense that it doesn't seem to add that many ops.) However we could
maybe improve it by passing a reference to the right mmio handler from
handle_mmio to try_handle_mmio if we have it. Or maybe we could save a
reference to the mmio handler as part of v->arch.hvm.hvm_io.io_req.

In any case, I think it is fine.


> ---
>  xen/arch/arm/io.c    | 47 +++++++++++++++++++++++++++++++++++++++++++----
>  xen/arch/arm/ioreq.c | 38 +++++++-------------------------------
>  xen/arch/arm/traps.c |  4 +++-
>  3 files changed, 53 insertions(+), 36 deletions(-)
>
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index 436f669..4db7c55 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -109,6 +109,43 @@ static const struct mmio_handler *find_mmio_handler(struct domain *d,
>  }
>
>  #ifdef CONFIG_IOREQ_SERVER
> +static enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
> +{
> +    const union hsr hsr = { .bits = regs->hsr };
> +    const struct hsr_dabt dabt = hsr.dabt;
> +    /* Code is similar to handle_read */
> +    uint8_t size = (1 << dabt.size) * 8;
> +    register_t r = v->arch.hvm.hvm_io.io_req.data;
> +
> +    /* We are done with the IO */
> +    /* XXX: Is it the right place? */
> +    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
> +
> +    /* XXX: Do we need to take care of write here ? */
> +    if ( dabt.write )
> +        return IO_HANDLED;
> +
> +    /*
> +     * Sign extend if required.
> +     * Note that we expect the read handler to have zeroed the bits
> +     * outside the requested access size.
> +     */
> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
> +    {
> +        /*
> +         * We are relying on register_t using the same as
> +         * an unsigned long in order to keep the 32-bit assembly
> +         * code smaller.
> +         */
> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
> +        r |= (~0UL) << size;
> +    }
> +
> +    set_user_reg(regs, dabt.reg, r);
> +
> +    return IO_HANDLED;
> +}
> +
>  static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>                                      struct vcpu *v, mmio_info_t *info)
>  {
> @@ -130,6 +167,10 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>      {
>      case STATE_IOREQ_NONE:
>          break;
> +
> +    case STATE_IORESP_READY:
> +        return IO_HANDLED;
> +
>      default:
>          printk("d%u wrong state %u\n", v->domain->domain_id,
>                 vio->io_req.state);
> @@ -156,10 +197,6 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>      else
>          vio->io_completion = HVMIO_mmio_completion;
>
> -    /* XXX: Decide what to do */
> -    if ( rc == IO_RETRY )
> -        rc = IO_HANDLED;
> -
>      return rc;
>  }
>  #endif
> @@ -185,6 +222,8 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>
>  #ifdef CONFIG_IOREQ_SERVER
>          rc = try_fwd_ioserv(regs, v, &info);
> +        if ( rc == IO_HANDLED )
> +            return handle_ioserv(regs, v);
>  #endif
>
>          return rc;
> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
> index 8f60c41..9068b8d 100644
> --- a/xen/arch/arm/ioreq.c
> +++ b/xen/arch/arm/ioreq.c
> @@ -33,46 +33,22 @@
>  #include <public/hvm/dm_op.h>
>  #include <public/hvm/ioreq.h>
>
> +#include <asm/traps.h>
> +
>  bool handle_mmio(void)
>  {
>      struct vcpu *v = current;
>      struct cpu_user_regs *regs = guest_cpu_user_regs();
>      const union hsr hsr = { .bits = regs->hsr };
> -    const struct hsr_dabt dabt = hsr.dabt;
> -    /* Code is similar to handle_read */
> -    uint8_t size = (1 << dabt.size) * 8;
> -    register_t r = v->arch.hvm.hvm_io.io_req.data;
> -
> -    /* We should only be here on Guest Data Abort */
> -    ASSERT(dabt.ec == HSR_EC_DATA_ABORT_LOWER_EL);
> +    paddr_t addr = v->arch.hvm.hvm_io.io_req.addr;
>
> -    /* We are done with the IO */
> -    /* XXX: Is it the right place? */
> -    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
> -
> -    /* XXX: Do we need to take care of write here ? */
> -    if ( dabt.write )
> -        return true;
> -
> -    /*
> -     * Sign extend if required.
> -     * Note that we expect the read handler to have zeroed the bits
> -     * outside the requested access size.
> -     */
> -    if ( dabt.sign && (r & (1UL << (size - 1))) )
> +    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
>      {
> -        /*
> -         * We are relying on register_t using the same as
> -         * an unsigned long in order to keep the 32-bit assembly
> -         * code smaller.
> -         */
> -        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
> -        r |= (~0UL) << size;
> +        advance_pc(regs, hsr);
> +        return true;
>      }
>
> -    set_user_reg(regs, dabt.reg, r);
> -
> -    return true;
> +    return false;
>  }
>
>  /* Ask ioemu mapcache to invalidate mappings. */
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index ea472d1..974c744 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1965,11 +1965,13 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
>              case IO_HANDLED:
>                  advance_pc(regs, hsr);
>                  return;
> +            case IO_RETRY:
> +                /* finish later */
> +                return;
>              case IO_UNHANDLED:
>                  /* IO unhandled, try another way to handle it. */
>                  break;
>              default:
> -                /* XXX: Handle IO_RETRY */
>                  ASSERT_UNREACHABLE();
>              }
>          }
> --
> 2.7.4
>
>
>
> --
> Regards,
>
> Oleksandr Tyshchenko
>
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
On 13/08/2020 00:08, Stefano Stabellini wrote:
>>> It is very similar to your second patch with a small change on calling
>>> try_handle_mmio from handle_mmio and setting the register there. Do you
>>> think that would work?
>> If I understood correctly what you had suggested and implemented it properly,
>> then it works; at least I didn't face any issues during testing.
>> I think this variant adds some extra operations compared to the previous one
>> (for example, an attempt to find an MMIO handler in try_handle_mmio). But, if
>> you think the new variant is cleaner and better represents how the state
>> machine should look, I would be absolutely OK to take this variant for the
>> non-RFC series. Please note, there was a request to move try_fwd_ioserv() to
>> arm/ioreq.c (I am going to move the new handle_ioserv() as well).
>
> Yes, I think this version better represents the state machine, thanks
> for looking into it. I think it is good.
>
> In terms of number of operations, it doesn't look very concerning (in
> the sense that it doesn't seem to add that many ops). However, we could
> maybe improve it by passing a reference to the right mmio handler from
> handle_mmio to try_handle_mmio if we have it. Or maybe we could save a
> reference to the mmio handler as part of v->arch.hvm.hvm_io.io_req.

There is no MMIO handler for the IOREQ handling. So I am not entirely
sure what you are suggesting.

Cheers,

--
Julien Grall
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
Hi Oleksandr,

On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
> +static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
> +{

[...]

> +    /* Canonicalize read/write pointers to prevent their overflow. */
> +    while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) &&
> +            qw++ < IOREQ_BUFFER_SLOT_NUM &&
> +            pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM )
> +    {
> +        union bufioreq_pointers old = pg->ptrs, new;
> +        unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM;
> +
> +        new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
> +        new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
> +        cmpxchg(&pg->ptrs.full, old.full, new.full);

While working on the implementation of cmpxchg(), I realized the
operation will happen on memory shared with the emulator.

This will need to be switched to guest_cmpxchg64() to prevent a domain
from DoSing Xen on Arm.

I looked at the rest of the IOREQ code and didn't notice any other example.

Cheers,

--
Julien Grall
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
On 15.08.20 20:30, Julien Grall wrote:
> Hi Oleksandr,

Hi Julien.


>
> On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
>> +static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
>> +{
>
> [...]
>
>> +    /* Canonicalize read/write pointers to prevent their overflow. */
>> +    while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) &&
>> +            qw++ < IOREQ_BUFFER_SLOT_NUM &&
>> +            pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM )
>> +    {
>> +        union bufioreq_pointers old = pg->ptrs, new;
>> +        unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM;
>> +
>> +        new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
>> +        new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
>> +        cmpxchg(&pg->ptrs.full, old.full, new.full);
>
> While working on the implementation of cmpxchg(), I realized the
> operation will happen on memory shared with the emulator.
>
> This will need to be switched to guest_cmpxchg64() to prevent a domain
> from DoSing Xen on Arm.

Got it. I will create a separate patch for that purpose.

--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
On 12.08.20 11:19, Julien Grall wrote:
> Hi,

Hi Julien, Stefano


>
> On 11/08/2020 23:48, Stefano Stabellini wrote:
>>> I have the impression that we disagree in what the Device Emulator
>>> is meant to
>>> do. IHMO, the goal of the device emulator is to emulate a device in an
>>> arch-agnostic way.
>>
>> That would be great in theory but I am not sure it is achievable: if we
>> use an existing emulator like QEMU, even a single device has to fit
>> into QEMU's view of the world, which makes assumptions about host
>> bridges and apertures. It is impossible today to build QEMU in an
>> arch-agnostic way, it has to be tied to an architecture.
>
> AFAICT, the only reason QEMU cannot be built in an arch-agnostic way
> is because of TCG. If TCG wasn't built in, then you could easily write a
> machine that doesn't depend on the instruction set.
>
> The proof is, today, we are using QEMU x86 to serve Arm64 guest.
> Although this is only for PV drivers.
>
>>
>> I realize we are not building this interface for QEMU specifically, but
>> even if we try to make the interface arch-agnostic, in reality the
>> emulators won't be arch-agnostic.
>
> This depends on your goal. If your goal is to write a standalone
> emulator for a single device, then it is entirely possible to make it
> arch-agnostic.
>
> Per above, this would even be possible if you were emulating a set of
> devices.
>
> What I want to avoid is requiring all the emulators to contain
> arch-specific code just because it is easier to get QEMU working on
> Xen on Arm.
>
>> If we send a port-mapped I/O request
>> to qemu-system-aarch64 who knows what is going to happen: it is a code
>> path that it is not explicitly tested.
>
> Maybe, maybe not. To me these are mostly software issues that can easily
> be mitigated if we do proper testing...

Could we please find common ground on whether PIO handling needs
to be implemented on Arm or not? At least for the current patch series.


Below my thoughts:
On one side, I agree that the emulator shouldn't contain any arch-specific
code; it is hypervisor specific, but it should be arch-agnostic if
possible. So the PIO case should be handled.
On the other side, I tend to think that it might be possible to skip PIO
handling for the current patch series (leave it x86-specific for now, as
we do with handle_realmode_completion()).
I think nothing will prevent us from adding PIO handling later on if
there is a real need (use case) for that. Please correct me if I am wrong.

I would be absolutely OK with any options.

What do you think?


--
Regards,

Oleksandr Tyshchenko
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
On Thu, 20 Aug 2020, Oleksandr wrote:
> > On 11/08/2020 23:48, Stefano Stabellini wrote:
> > > > I have the impression that we disagree in what the Device Emulator is
> > > > meant to
> > > > do. IHMO, the goal of the device emulator is to emulate a device in an
> > > > arch-agnostic way.
> > >
> > > That would be great in theory but I am not sure it is achievable: if we
> > > use an existing emulator like QEMU, even a single device has to fit
> > > into QEMU's view of the world, which makes assumptions about host
> > > bridges and apertures. It is impossible today to build QEMU in an
> > > arch-agnostic way, it has to be tied to an architecture.
> >
> > AFAICT, the only reason QEMU cannot be built in an arch-agnostic way is
> > because of TCG. If TCG wasn't built in then you could easily write a machine
> > that doesn't depend on the instruction set.
> >
> > The proof is, today, we are using QEMU x86 to serve Arm64 guest. Although
> > this is only for PV drivers.
> >
> > >
> > > I realize we are not building this interface for QEMU specifically, but
> > > even if we try to make the interface arch-agnostic, in reality the
> > > emulators won't be arch-agnostic.
> >
> > This depends on your goal. If your goal is to write a standalone emulator
> > for a single device, then it is entirely possible to make it arch-agnostic.
> >
> > Per above, this would even be possible if you were emulating a set of
> > devices.
> >
> > What I want to avoid is requiring all the emulators to contain arch-specific
> > code just because it is easier to get QEMU working on Xen on Arm.
> >
> > > If we send a port-mapped I/O request
> > > to qemu-system-aarch64 who knows what is going to happen: it is a code
> > > path that it is not explicitly tested.
> >
> > Maybe, maybe not. To me these are mostly software issues that can easily be
> > mitigated if we do proper testing...
>
> Could we please find common ground on whether PIO handling needs to be
> implemented on Arm or not? At least for the current patch series.

Can you do a test on QEMU to verify which address space the PIO BARs are
using on ARM? I don't know if there is an easy way to test it but it
would be very useful for this conversation.


> Below my thoughts:
> On one side, I agree that the emulator shouldn't contain any arch-specific
> code; it is hypervisor specific, but it should be arch-agnostic if possible.
> So the PIO case should be handled.
> On the other side, I tend to think that it might be possible to skip PIO
> handling for the current patch series (leave it x86-specific for now, as we
> do with handle_realmode_completion()).
> I think nothing will prevent us from adding PIO handling later on if there is
> a real need (use case) for that. Please correct me if I am wrong.
>
> I would be absolutely OK with any options.
>
> What do you think?

I agree that PIO handling is not the most critical thing right now given
that we have quite a few other important TODOs in the series. I'd be
fine reviewing another version of the series with this issue still
pending.


Of course, PIO needs to be handled. The key to me is that QEMU (or other
emulator) should *not* emulate in/out instructions on ARM. PIO ioreq
requests should not be satisfied by using address_space_io directly (the
PIO address space that requires special instructions to access it). In
QEMU the PIO reads/writes should be done via address_space_memory (the
normal memory mapped address space).

So either way of the following approaches should be OK:

1) Xen sends out PIO addresses as memory mapped addresses, QEMU simply
reads/writes on them
2) Xen sends out PIO addresses as address_space_io, QEMU finds the
mapping to address_space_memory, then reads/writes on
address_space_memory

From an interface and implementation perspective, 1) means that
IOREQ_TYPE_PIO is unused on ARM, while 2) means that IOREQ_TYPE_PIO is
still used as part of the ioreq interface, even if QEMU doesn't directly
operate on those addresses.

My preference is 1) because it leads to a simpler solution.
Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common [ In reply to ]
Hi Stefano,

On 21/08/2020 01:53, Stefano Stabellini wrote:
> On Thu, 20 Aug 2020, Oleksandr wrote:
>>> On 11/08/2020 23:48, Stefano Stabellini wrote:
>>>>> I have the impression that we disagree in what the Device Emulator is
>>>>> meant to
>>>>> do. IHMO, the goal of the device emulator is to emulate a device in an
>>>>> arch-agnostic way.
>>>>
>>>> That would be great in theory but I am not sure it is achievable: if we
>>>> use an existing emulator like QEMU, even a single device has to fit
>>>> into QEMU's view of the world, which makes assumptions about host
>>>> bridges and apertures. It is impossible today to build QEMU in an
>>>> arch-agnostic way, it has to be tied to an architecture.
>>>
>>> AFAICT, the only reason QEMU cannot be built in an arch-agnostic way is
>>> because of TCG. If TCG wasn't built in then you could easily write a machine
>>> that doesn't depend on the instruction set.
>>>
>>> The proof is, today, we are using QEMU x86 to serve Arm64 guest. Although
>>> this is only for PV drivers.
>>>
>>>>
>>>> I realize we are not building this interface for QEMU specifically, but
>>>> even if we try to make the interface arch-agnostic, in reality the
>>>> emulators won't be arch-agnostic.
>>>
>>> This depends on your goal. If your goal is to write a standalone emulator
>>> for a single device, then it is entirely possible to make it arch-agnostic.
>>>
>>> Per above, this would even be possible if you were emulating a set of
>>> devices.
>>>
>>> What I want to avoid is requiring all the emulators to contain arch-specific
>>> code just because it is easier to get QEMU working on Xen on Arm.
>>>
>>>> If we send a port-mapped I/O request
>>>> to qemu-system-aarch64 who knows what is going to happen: it is a code
>>>> path that it is not explicitly tested.
>>>
>>> Maybe, maybe not. To me these are mostly software issues that can easily be
>>> mitigated if we do proper testing...
>>
>> Could we please find common ground on whether PIO handling needs to be
>> implemented on Arm or not? At least for the current patch series.
>
> Can you do a test on QEMU to verify which address space the PIO BARs are
> using on ARM? I don't know if there is an easy way to test it but it
> would be very useful for this conversation.

This is basically configured by the machine itself. See create_pcie() in
hw/arm/virt.c.

So the host controller is basically unaware that an MMIO region will be
used instead of a PIO region.

>
>
>> Below my thoughts:
>> On one side, I agree that the emulator shouldn't contain any arch-specific
>> code; it is hypervisor specific, but it should be arch-agnostic if possible.
>> So the PIO case should be handled.
>> On the other side, I tend to think that it might be possible to skip PIO
>> handling for the current patch series (leave it x86-specific for now, as we
>> do with handle_realmode_completion()).
>> I think nothing will prevent us from adding PIO handling later on if there is
>> a real need (use case) for that. Please correct me if I am wrong.
>>
>> I would be absolutely OK with any options.
>>
>> What do you think?
>
> I agree that PIO handling is not the most critical thing right now given
> that we have quite a few other important TODOs in the series. I'd be
> fine reviewing another version of the series with this issue still
> pending.

For Arm64, the main user will be PCI. So this can be delayed until we
add support for vPCI.

>
>
> Of course, PIO needs to be handled. The key to me is that QEMU (or other
> emulator) should *not* emulate in/out instructions on ARM.

I don't think anyone here suggested that we would emulate in/out
instructions on Arm. In fact, there are no such instructions on Arm.

> PIO ioreq
> requests should not be satisfied by using address_space_io directly (the
> PIO address space that requires special instructions to access it). In
> QEMU the PIO reads/writes should be done via address_space_memory (the
> normal memory mapped address space).
>
> So either way of the following approaches should be OK:
>
> 1) Xen sends out PIO addresses as memory mapped addresses, QEMU simply
> reads/writes on them
> 2) Xen sends out PIO addresses as address_space_io, QEMU finds the
> mapping to address_space_memory, then reads/writes on
> address_space_memory
>
> From an interface and implementation perspective, 1) means that
> IOREQ_TYPE_PIO is unused on ARM, while 2) means that IOREQ_TYPE_PIO is
> still used as part of the ioreq interface, even if QEMU doesn't directly
> operate on those addresses.
>
> My preference is 1) because it leads to a simpler solution.

"simpler" is actually very subjective :). So maybe you can clarify some
of my concerns with this approach.

One part that has barely been discussed is configuration.

The discussion below is based on having the virtual PCI hostbridges
implemented in Xen.

In the case of an MMIO BAR, the emulator doesn't need to know where the
aperture is in advance. This is because the BAR will contain an absolute
MMIO address, so it can configure the trap correctly.

In the case of a PIO BAR, from my understanding, the BAR will contain a
relative offset from the base of the PIO aperture. So the emulator needs to
know the base address of the PIO aperture. How do you plan to pass that
information to the emulator? What about the case where there are multiple
hostbridges?

Furthermore, most of the discussion has been focused on a device model
that provides emulation for all your devices (e.g. QEMU). However, I
think this will be less common than a device model that emulates a
single device (e.g. DEMU). The latter fits better in the disaggregation
model.

An emulator for a single PCI device is basically the same as a real PCI
device. Do you agree with that?

The HW engineer designing the PCI device doesn't need to know about the
architecture. They just need to understand the interface with the
hostbridge. The hostbridge will then take care of the differences
between architectures.

A developer should really be able to do the same with the emulator,
i.e. write it for x86 and then just recompile it for Arm. With your
approach, he/she would have to understand how the architecture works.

I still don't quite understand why we are trying to differ here. Why
would our hostbridge implementation not abstract the same way a real one
does? Can you clarify?

Maybe the problem is just the naming issue?

Cheers,

--
Julien Grall