Mailing List Archive

[RFC] design: design doc for 1:1 direct-map
This is a draft design of the infrastructure. It is not ready for
upstream yet (hence the RFC tag), but I thought it would be useful to
start a discussion with the community first.

Create a design doc for 1:1 direct-map.
It aims to describe why and how we allocate 1:1 direct-map (guest physical
== physical) domains.

This document is partly based on Stefano Stabellini's patch series v1:
[direct-map DomUs](
https://lists.xenproject.org/archives/html/xen-devel/2020-04/msg00707.html).

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
---
The part about allocating 1:1 direct-map domains with user-defined
memory regions will be covered in the next design, on static memory
allocation.
---
docs/designs/1_1_direct-map.md | 87 ++++++++++++++++++++++++++++++++++
1 file changed, 87 insertions(+)
create mode 100644 docs/designs/1_1_direct-map.md

diff --git a/docs/designs/1_1_direct-map.md b/docs/designs/1_1_direct-map.md
new file mode 100644
index 0000000000..ce3e2c77fd
--- /dev/null
+++ b/docs/designs/1_1_direct-map.md
@@ -0,0 +1,87 @@
+# Preface
+
+The document is an early draft for direct-map memory map
+(`guest physical == physical`) of domUs. And right now, it constrains to ARM
+architecture.
+
+It aims to describe why and how the guest would be created as direct-map domain.
+
+This document is partly based on Stefano Stabellini's patch series v1:
+[direct-map DomUs](
+https://lists.xenproject.org/archives/html/xen-devel/2020-04/msg00707.html).
+
+This is a first draft and some questions are still unanswered. When this is the
+case, the text shall contain XXX.
+
+# Introduction
+
+## Background
+
+Cases where domU needs direct-map memory map:
+
+ * IOMMU not present in the system.
+ * IOMMU disabled, since it doesn't cover a specific device.
+ * IOMMU disabled, since it doesn't have enough bandwidth.
+ * IOMMU disabled, since it adds too much latency.
+
+*WARNING:
+Users should be careful that it is not always secure to assign a device without
+IOMMU/SMMU protection.
+Users must be aware of this risk, that guests having access to hardware with
+DMA capacity must be trusted, or it could use the DMA engine to access any
+other memory area.
+Guests could use additional security hardware component like NOC, System MPU
+to protect the memory.
+
+## Design
+
+The implementation may cover following aspects:
+
+### Native Address and IRQ numbers for GIC and UART(vPL011)
+
+Today, fixed addresses and IRQ numbers are used to map GIC and UART(vPL011)
+in DomUs. And it may cause potential clash on direct-map domains.
+So, Using native addresses and irq numbers for GIC, UART(vPL011), in
+direct-map domains is necessary.
+e.g.
+For the virtual interrupt of vPL011: instead of always using `GUEST_VPL011_SPI`,
+try to reuse the physical SPI number if possible.
+
+### Device tree option: `direct-map`
+
+Introduce a new device tree option `direct-map` for direct-map domains.
+Then, when users try to allocate one direct-map domain(except DOM0),
+`direct-map` property needs to be added under the appropriate `/chosen/domUx`.
+
+
+    chosen {
+        ...
+        domU1 {
+            compatible = "xen,domain";
+            #address-cells = <0x2>;
+            #size-cells = <0x1>;
+            direct-map;
+            ...
+        };
+        ...
+    };
+
+If users are using imagebuilder, they can add to boot.source something like the
+following:
+
+ fdt set /chosen/domU1 direct-map
+
+Users could also use `xl` to create direct-map domains, just use the following
+config option: `direct-map=true`
+
+### direct-map guest memory allocation
+
+Func `allocate_memory_direct_map` is based on `allocate_memory_11`, and shall
+be refined to allocate memory for all direct-map domains, including DOM0.
+Roughly speaking, firstly, it tries to allocate arbitrary memory chunk of
+requested size from domain sub-allocator(`alloc_domheap_pages`). If fail,
+split the chunk into halves, and re-try, until it succeed or bail out with the
+smallest chunk size.
+Then, `insert_11_bank` shall insert above allocated pages into a memory bank,
+which are ordered by address, and also set up guest P2M mapping(
+`guest_physmap_add_page`) to ensure `gfn == mfn`.
--
2.25.1
Re: [RFC] design: design doc for 1:1 direct-map [ In reply to ]
Hi Penny,

I am adding Paul and Zheng to the thread as there is similar interest
on the x86 side.

On 08/12/2020 05:21, Penny Zheng wrote:
> This is one draft design about the infrastructure for now, not ready
> for upstream yet (hence the RFC tag), thought it'd be useful to firstly
> start a discussion with the community.
>
> Create one design doc for 1:1 direct-map.
> It aims to describe why and how we allocate 1:1 direct-map(guest physical
> == physical) domains.
>
> This document is partly based on Stefano Stabellini's patch serie v1:
> [direct-map DomUs](
> https://lists.xenproject.org/archives/html/xen-devel/2020-04/msg00707.html).

May I ask why a different approach?

>
> Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> ---
> For the part regarding allocating 1:1 direct-map domains with user-defined
> memory regions, it will be included in next design of static memory
> allocation.

I don't think you can do without user-defined memory regions (see more
below).

> ---
> docs/designs/1_1_direct-map.md | 87 ++++++++++++++++++++++++++++++++++
> 1 file changed, 87 insertions(+)
> create mode 100644 docs/designs/1_1_direct-map.md
>
> diff --git a/docs/designs/1_1_direct-map.md b/docs/designs/1_1_direct-map.md
> new file mode 100644
> index 0000000000..ce3e2c77fd
> --- /dev/null
> +++ b/docs/designs/1_1_direct-map.md
> @@ -0,0 +1,87 @@
> +# Preface
> +
> +The document is an early draft for direct-map memory map
> +(`guest physical == physical`) of domUs. And right now, it constrains to ARM

s/constrains/limited/

Aside from the interface to the user, you should be able to re-use the
same code on x86. Note that because the memory layout on x86 is fixed
(always starting at 0), you would be able to have only one direct-mapped
domain.

> +architecture.
> +
> +It aims to describe why and how the guest would be created as direct-map domain.
> +
> +This document is partly based on Stefano Stabellini's patch serie v1:
> +[direct-map DomUs](
> +https://lists.xenproject.org/archives/html/xen-devel/2020-04/msg00707.html).
> +
> +This is a first draft and some questions are still unanswered. When this is the
> +case, the text shall contain XXX.
> +
> +# Introduction
> +
> +## Background
> +
> +Cases where domU needs direct-map memory map:
> +
> + * IOMMU not present in the system.
> + * IOMMU disabled, since it doesn't cover a specific device.

If the device is not covered by the IOMMU, then why would you want to
disable the IOMMUs for the rest of the system?

> + * IOMMU disabled, since it doesn't have enough bandwidth.

I am not sure I understand this one.

> + * IOMMU disabled, since it adds too much latency.

The list above sounds like direct-map memory would be necessary even
without device-passthrough. Can you clarify it?

> +
> +*WARNING:
> +Users should be careful that it is not always secure to assign a device without

s/careful/aware/ I think. Also, it is never secure to assign a device
without IOMMU/SMMU unless you have a replacement.

I would suggest rewording it to something like:

"When the device is not protected by the IOMMU, the administrator should
make sure that:
- The device is assigned to a trusted guest
- You have an additional security mechanism on the platform (e.g
MPU) to protect the memory."

> +IOMMU/SMMU protection.
> +Users must be aware of this risk, that guests having access to hardware with
> +DMA capacity must be trusted, or it could use the DMA engine to access any
> +other memory area.
> +Guests could use additional security hardware component like NOC, System MPU
> +to protect the memory.

What's the NOC?

> +
> +## Design
> +
> +The implementation may cover following aspects:
> +
> +### Native Address and IRQ numbers for GIC and UART(vPL011)
> +
> +Today, fixed addresses and IRQ numbers are used to map GIC and UART(vPL011)
> +in DomUs. And it may cause potential clash on direct-map domains.
> +So, Using native addresses and irq numbers for GIC, UART(vPL011), in
> +direct-map domains is necessary.
> +e.g.

To me e.g. means example. But what follows is not an example, it is a
requirement in order to use the vpl011 on a system without a pl011 UART.

> +For the virtual interrupt of vPL011: instead of always using `GUEST_VPL011_SPI`,
> +try to reuse the physical SPI number if possible.

How would you find the following regions for guests using PV drivers:
- Event channel interrupt
- Grant table area

> +
> +### Device tree option: `direct_map`
> +
> +Introduce a new device tree option `direct_map` for direct-map domains.
> +Then, when users try to allocate one direct-map domain(except DOM0),
> +`direct-map` property needs to be added under the appropriate `/chosen/domUx`.
> +
> +
> + chosen {
> + ...
> + domU1 {
> + compatible = "xen, domain";
> + #address-cells = <0x2>;
> + #size-cells = <0x1>;
> + direct-map;
> + ...
> + };
> + ...
> + };
> +
> +If users are using imagebuilder, they can add to boot.source something like the

This documentation sounds more like something for imagebuilder than
for Xen itself.

> +following:
> +
> + fdt set /chosen/domU1 direct-map
> +
> +Users could also use `xl` to create direct-map domains, just use the following
> +config option: `direct-map=true`
> +
> +### direct-map guest memory allocation
> +
> +Func `allocate_memory_direct_map` is based on `allocate_memory_11`, and shall
> +be refined to allocate memory for all direct-map domains, including DOM0.
> +Roughly speaking, firstly, it tries to allocate arbitrary memory chunk of
> +requested size from domain sub-allocator(`alloc_domheap_pages`). If fail,
> +split the chunk into halves, and re-try, until it succeed or bail out with the
> +smallest chunk size.

If you have a mix of direct-mapped and normal domains, the free memory
may end up so fragmented that your direct-mapped domain will have many
small banks. This is going to be a major problem if you are creating the
domain at runtime (you suggest xl can be used).

In addition, some users may want to be able to control the location of
the memory as this reduces the amount of work in the guest (e.g. you
don't have to dynamically discover the memory).

I think it would be best to always require the admin to select the RAM
bank used by a direct mapped domain. Alternatively, we could have a pool
of memory that can only be used for direct mapped domain. This should
limit the fragmentation of the memory.

> +Then, `insert_11_bank` shall insert above allocated pages into a memory bank,
> +which are ordered by address, and also set up guest P2M mapping(
> +`guest_physmap_add_page`) to ensure `gfn == mfn`.
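
A minimal sketch of the halving strategy quoted above, to make the
fragmentation concern concrete. This is not Xen code: `toy_alloc` only
stands in for `alloc_domheap_pages`, `insert_bank` only mimics what
`insert_11_bank` is described as doing, and the 64-page memory, the bank
limit and all struct names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 64   /* assumption: size of the toy "physical memory", in pages */
#define MAX_BANKS 8    /* assumption: upper bound on banks per domain */

struct toy_mem { bool used[TOY_PAGES]; };
struct bank { unsigned long start, size; };   /* in pages; gfn == mfn */

/* Stand-in for alloc_domheap_pages(): claim the first free, naturally
 * aligned run of 2^order pages and return its first page, or -1. */
static long toy_alloc(struct toy_mem *m, unsigned int order)
{
    unsigned long n = 1UL << order;

    for (unsigned long s = 0; s + n <= TOY_PAGES; s += n) {
        unsigned long i = 0;

        while (i < n && !m->used[s + i])
            i++;
        if (i < n)
            continue;                     /* run is partly in use */
        for (i = 0; i < n; i++)
            m->used[s + i] = true;
        return (long)s;
    }
    return -1;
}

/* Keep banks sorted by address and merge a new run with its neighbours. */
static bool insert_bank(struct bank *b, size_t *nr,
                        unsigned long start, unsigned long size)
{
    size_t pos = 0;

    while (pos < *nr && b[pos].start < start)
        pos++;
    if (pos > 0 && b[pos - 1].start + b[pos - 1].size == start) {
        b[pos - 1].size += size;          /* grow the previous bank */
        if (pos < *nr && b[pos - 1].start + b[pos - 1].size == b[pos].start) {
            b[pos - 1].size += b[pos].size;   /* and swallow the next one */
            for (size_t i = pos; i + 1 < *nr; i++)
                b[i] = b[i + 1];
            (*nr)--;
        }
        return true;
    }
    if (pos < *nr && start + size == b[pos].start) {
        b[pos].start = start;             /* grow the next bank downwards */
        b[pos].size += size;
        return true;
    }
    if (*nr == MAX_BANKS)
        return false;
    for (size_t i = *nr; i > pos; i--)
        b[i] = b[i - 1];
    b[pos].start = start;
    b[pos].size = size;
    (*nr)++;
    return true;
}

/* Try the largest chunk that fits the remaining request and halve it on
 * failure. Returns the number of banks, or -1 if the request cannot be
 * met (a real implementation would also free what it had allocated). */
static int alloc_direct_map(struct toy_mem *m, unsigned long pages,
                            struct bank *banks)
{
    size_t nr = 0;
    unsigned int order = 0;

    assert(pages <= TOY_PAGES);
    while ((2UL << order) <= pages)
        order++;
    while (pages > 0) {
        while ((1UL << order) > pages)
            order--;                      /* never over-allocate */
        long s = toy_alloc(m, order);

        if (s < 0) {
            if (order == 0)
                return -1;                /* smallest chunk failed: bail out */
            order--;                      /* halve the chunk and retry */
            continue;
        }
        if (!insert_bank(banks, &nr, (unsigned long)s, 1UL << order))
            return -1;
        pages -= 1UL << order;
    }
    return (int)nr;
}
```

With only pages 8 and 40 of the toy memory in use, a 32-page request
already ends up as two 16-page banks (at pages 16 and 48) rather than
one, which is the effect described above.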

Cheers,

--
Julien Grall
Re: [RFC] design: design doc for 1:1 direct-map [ In reply to ]
On 08.12.2020 10:07, Julien Grall wrote:
> On 08/12/2020 05:21, Penny Zheng wrote:
>> --- /dev/null
>> +++ b/docs/designs/1_1_direct-map.md
>> @@ -0,0 +1,87 @@
>> +# Preface
>> +
>> +The document is an early draft for direct-map memory map
>> +(`guest physical == physical`) of domUs. And right now, it constrains to ARM
>
> s/constrains/limited/
>
> Aside the interface to the user, you should be able to re-use the same
> code on x86. Note that because the memory layout on x86 is fixed (always
> starting at 0), you would only be able to have only one direct-mapped
> domain.

Even one seems challenging, if it's truly meant to have all of the
domain's memory direct-mapped: The use of space in the first Mb is
different between host and guest.

Jan
Re: [RFC] design: design doc for 1:1 direct-map [ In reply to ]
On 2020-12-08 10:12, Jan Beulich wrote:
> On 08.12.2020 10:07, Julien Grall wrote:
> > On 08/12/2020 05:21, Penny Zheng wrote:
> >> --- /dev/null
> >> +++ b/docs/designs/1_1_direct-map.md
> >> @@ -0,0 +1,87 @@
> >> +# Preface
> >> +
> >> +The document is an early draft for direct-map memory map
> >> +(`guest physical == physical`) of domUs. And right now, it constrains to ARM
> >
> > s/constrains/limited/
> >
> > Aside the interface to the user, you should be able to re-use the same
> > code on x86. Note that because the memory layout on x86 is fixed (always
> > starting at 0), you would only be able to have only one direct-mapped
> > domain.
>
> Even one seems challenging, if it's truly meant to have all of the
> domain's memory direct-mapped: The use of space in the first Mb is
> different between host and guest.

Speaking about the case of x86, we can still direct-map the RAM regions
to the single direct-mapped DomU because neither Xen nor dom0 requires
that low memory.

We don't worry about (i.e. don't direct-map) non-RAM regions, or any
range that is not reported as usable RAM from the DomU's PoV (as dictated
by the e820 table), so those can be MMIO or arbitrary mappings with EPT.

Fam
Re: [RFC] design: design doc for 1:1 direct-map [ In reply to ]
On 2020-12-08 13:21, Penny Zheng wrote:
> +The document is an early draft for direct-map memory map
> +(`guest physical == physical`) of domUs. And right now, it constrains to ARM
> +architecture.

I'm also working on direct-map DomU on x86, so let's coordinate and
cover both arches.

> +
> +It aims to describe why and how the guest would be created as direct-map domain.
> +
> +This document is partly based on Stefano Stabellini's patch serie v1:
> +[direct-map DomUs](
> +https://lists.xenproject.org/archives/html/xen-devel/2020-04/msg00707.html).
> +
> +This is a first draft and some questions are still unanswered. When this is the
> +case, the text shall contain XXX.
> +
> +# Introduction
> +
> +## Background
> +
> +Cases where domU needs direct-map memory map:
> +
> + * IOMMU not present in the system.
> + * IOMMU disabled, since it doesn't cover a specific device.
> + * IOMMU disabled, since it doesn't have enough bandwidth.
> + * IOMMU disabled, since it adds too much latency.
> +
> +*WARNING:
> +Users should be careful that it is not always secure to assign a device without
> +IOMMU/SMMU protection.
> +Users must be aware of this risk, that guests having access to hardware with
> +DMA capacity must be trusted, or it could use the DMA engine to access any
> +other memory area.
> +Guests could use additional security hardware component like NOC, System MPU
> +to protect the memory.
> +
> +## Design
> +
> +The implementation may cover following aspects:
> +
> +### Native Address and IRQ numbers for GIC and UART(vPL011)
> +
> +Today, fixed addresses and IRQ numbers are used to map GIC and UART(vPL011)
> +in DomUs. And it may cause potential clash on direct-map domains.
> +So, Using native addresses and irq numbers for GIC, UART(vPL011), in
> +direct-map domains is necessary.
> +e.g.
> +For the virtual interrupt of vPL011: instead of always using `GUEST_VPL011_SPI`,
> +try to reuse the physical SPI number if possible.
> +
> +### Device tree option: `direct_map`
> +
> +Introduce a new device tree option `direct_map` for direct-map domains.
> +Then, when users try to allocate one direct-map domain(except DOM0),
> +`direct-map` property needs to be added under the appropriate `/chosen/domUx`.
> +
> +
> + chosen {
> + ...
> + domU1 {
> + compatible = "xen, domain";
> + #address-cells = <0x2>;
> + #size-cells = <0x1>;
> + direct-map;
> + ...
> + };
> + ...
> + };
> +
> +If users are using imagebuilder, they can add to boot.source something like the
> +following:
> +
> + fdt set /chosen/domU1 direct-map
> +
> +Users could also use `xl` to create direct-map domains, just use the following
> +config option: `direct-map=true`
> +
> +### direct-map guest memory allocation
> +
> +Func `allocate_memory_direct_map` is based on `allocate_memory_11`, and shall
> +be refined to allocate memory for all direct-map domains, including DOM0.
> +Roughly speaking, firstly, it tries to allocate arbitrary memory chunk of
> +requested size from domain sub-allocator(`alloc_domheap_pages`). If fail,
> +split the chunk into halves, and re-try, until it succeed or bail out with the
> +smallest chunk size.
> +Then, `insert_11_bank` shall insert above allocated pages into a memory bank,
> +which are ordered by address, and also set up guest P2M mapping(
> +`guest_physmap_add_page`) to ensure `gfn == mfn`.

A high-level comment from the x86 PoV: in the mfn address space, we want
to explicitly reserve a range for direct-map. This ensures Xen and Dom0
leave those pages for the DomU at boot time. As Julien mentioned, x86
machines have a fixed memory layout starting from 0, so the corresponding
pages mustn't go into xenheap/domheap in the first place.

IOW x86 depends on some mechanism very similar to what badpage= does.
But I wouldn't overload/abuse the parameter for direct-map. Maybe
introduce a new option, like "identpage=".
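
A rough sketch of what such a reservation would mean when the heap is
seeded at boot. Everything here is hypothetical: `identpage=` is only a
proposal in this mail, and `struct mfn_range`, `mfn_is_reserved` and
`seed_heap` are names invented for illustration, not Xen functions. The
sketch does not parse a command line; it only shows reserved frames being
kept out of the heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: one reserved range of machine frames, both ends inclusive. */
struct mfn_range { unsigned long first, last; };

/* True if the frame lies in any range reserved for direct-map use. */
static bool mfn_is_reserved(unsigned long mfn,
                            const struct mfn_range *r, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        if (mfn >= r[i].first && mfn <= r[i].last)
            return true;
    return false;
}

/* Walk [start, end) the way boot code would when handing frames to the
 * heap. in_heap[] (one slot per frame) records the decision; the return
 * value is the number of frames the heap actually received. */
static unsigned long seed_heap(unsigned long start, unsigned long end,
                               const struct mfn_range *r, size_t nr,
                               bool *in_heap)
{
    unsigned long given = 0;

    assert(start <= end);
    for (unsigned long mfn = start; mfn < end; mfn++) {
        in_heap[mfn - start] = !mfn_is_reserved(mfn, r, nr);
        if (in_heap[mfn - start])
            given++;
    }
    return given;
}
```

Frames skipped this way never enter xenheap/domheap, so they are still
free when the direct-mapped DomU is built.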

Fam
Re: [RFC] design: design doc for 1:1 direct-map [ In reply to ]
On 08.12.2020 11:22, Fam Zheng wrote:
> On 2020-12-08 10:12, Jan Beulich wrote:
>> On 08.12.2020 10:07, Julien Grall wrote:
>>> On 08/12/2020 05:21, Penny Zheng wrote:
>>>> --- /dev/null
>>>> +++ b/docs/designs/1_1_direct-map.md
>>>> @@ -0,0 +1,87 @@
>>>> +# Preface
>>>> +
>>>> +The document is an early draft for direct-map memory map
>>>> +(`guest physical == physical`) of domUs. And right now, it constrains to ARM
>>>
>>> s/constrains/limited/
>>>
>>> Aside the interface to the user, you should be able to re-use the same
>>> code on x86. Note that because the memory layout on x86 is fixed (always
>>> starting at 0), you would only be able to have only one direct-mapped
>>> domain.
>>
>> Even one seems challenging, if it's truly meant to have all of the
>> domain's memory direct-mapped: The use of space in the first Mb is
>> different between host and guest.
>
> Speaking about the case of x86, we can still direct-map the ram regions
> to the single direct-mapped DomU because neither Xen nor dom0 require
> those low mem.
>
> We don't worry about (i.e. don't direct-map) non-ram regions (or any
> range that is not reported as usable ram from DomU's PoV (dictated by
> e820 table), so those can be MMIO or arbitrary mapping with EPT.

For one, the very first page is considered special in x86 Xen. No
guest should gain access to MFN 0, unless you first audit all
code and address all the issues you find. And then there's also
Xen's low-memory trampoline living there. Plus besides the BDA
(at real-mode address 0040:0000) I suppose the EBDA also shouldn't
be exposed to a guest, nor anything else that the host finds
reserved in E820. IOW it would be the host E820 to dictate some
of the guest E820 in such a case.
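
One way to read "the host E820 dictates the guest E820" is sketched
below, under loudly stated assumptions: the struct is a simplified
stand-in for a real E820 entry, and excluding the whole first MiB
(`LOW_MEM_END`) is a blunt simplification that covers MFN 0, the
trampoline, the BDA and the EBDA at once. It is an illustration, not a
proposal for the actual policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_TYPE_RAM 1u            /* "usable RAM" type in an E820 map */
#define LOW_MEM_END   0x100000ULL   /* assumption: keep the first MiB out */

/* Simplified stand-in for an E820 entry. */
struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Copy into out[] the parts of host RAM entries that lie at or above
 * LOW_MEM_END. Entries of any other type are never candidates for
 * direct mapping. Returns the number of entries written (<= max_out). */
static size_t direct_map_candidates(const struct e820_entry *host, size_t nr,
                                    struct e820_entry *out, size_t max_out)
{
    size_t n = 0;

    assert(host != NULL && out != NULL);
    for (size_t i = 0; i < nr && n < max_out; i++) {
        uint64_t start = host[i].addr;
        uint64_t end = start + host[i].size;

        if (host[i].type != E820_TYPE_RAM)
            continue;               /* reserved, ACPI, MMIO holes, ... */
        if (end <= LOW_MEM_END)
            continue;               /* entirely inside the first MiB */
        if (start < LOW_MEM_END)
            start = LOW_MEM_END;    /* clip an entry that straddles 1 MiB */
        out[n].addr = start;
        out[n].size = end - start;
        out[n].type = E820_TYPE_RAM;
        n++;
    }
    return n;
}
```

The guest E820 would then report only (a subset of) these ranges as
usable RAM; everything else stays free for MMIO or arbitrary mappings.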

Jan
Re: [RFC] design: design doc for 1:1 direct-map [ In reply to ]
On Tue, 2020-12-08 at 11:53 +0100, Jan Beulich wrote:
> On 08.12.2020 11:22, Fam Zheng wrote:
> > On 2020-12-08 10:12, Jan Beulich wrote:
> > > On 08.12.2020 10:07, Julien Grall wrote:
> > > > On 08/12/2020 05:21, Penny Zheng wrote:
> > > > > --- /dev/null
> > > > > +++ b/docs/designs/1_1_direct-map.md
> > > > > @@ -0,0 +1,87 @@
> > > > > +# Preface
> > > > > +
> > > > > +The document is an early draft for direct-map memory map
> > > > > +(`guest physical == physical`) of domUs. And right now, it
> > > > > constrains to ARM
> > > >
> > > > s/constrains/limited/
> > > >
> > > > Aside the interface to the user, you should be able to re-use
> > > > the same
> > > > code on x86. Note that because the memory layout on x86 is
> > > > fixed (always
> > > > starting at 0), you would only be able to have only one direct-
> > > > mapped
> > > > domain.
> > >
> > > Even one seems challenging, if it's truly meant to have all of
> > > the
> > > domain's memory direct-mapped: The use of space in the first Mb
> > > is
> > > different between host and guest.
> >
> > Speaking about the case of x86, we can still direct-map the ram
> > regions
> > to the single direct-mapped DomU because neither Xen nor dom0
> > require
> > those low mem.
> >
> > We don't worry about (i.e. don't direct-map) non-ram regions (or
> > any
> > range that is not reported as usable ram from DomU's PoV (dictated
> > by
> > e820 table), so those can be MMIO or arbitrary mapping with EPT.
>
> For one, the very first page is considered special in x86 Xen. No
> guest should gain access to MFN 0, unless you first audit all
> code and address all the issues you find. And then there's also
> Xen's low-memory trampoline living there. Plus besides the BDA
> (at real-mode address 0040:0000) I suppose the EBDA also shouldn't
> be exposed to a guest, nor anything else that the host finds
> reserved in E820. IOW it would be the host E820 to dictate some
> of the guest E820 in such a case.
>

You're right about the trampoline area, it has to be specially taken
care of. Not a problem if we could disable cpu hotplug. I don't think
the guest will ever try to DMA from/to MFN 0, the BDA or the EBDA, so not
direct-mapping those should make no functional difference.

In general, I agree that the guest E820, as well as all direct-mapped
areas, must stay within the limits of the host E820, otherwise it will
not work.

Fam
RE: [RFC] design: design doc for 1:1 direct-map [ In reply to ]
Hi Julien

Thanks for the nice and detailed comments.
Here are the replies:

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Tuesday, December 8, 2020 5:07 PM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Kaly Xin
> <Kaly.Xin@arm.com>; Wei Chen <Wei.Chen@arm.com>; nd <nd@arm.com>;
> Paul Durrant <paul@xen.org>; famzheng@amazon.com
> Subject: Re: [RFC] design: design doc for 1:1 direct-map
>
> Hi Penny,
>
> I am adding Paul and Zheng in the thread as there are similar interest for the
> x86 side.
>
> On 08/12/2020 05:21, Penny Zheng wrote:
> > This is one draft design about the infrastructure for now, not ready
> > for upstream yet (hence the RFC tag), thought it'd be useful to
> > firstly start a discussion with the community.
> >
> > Create one design doc for 1:1 direct-map.
> > It aims to describe why and how we allocate 1:1 direct-map(guest
> > physical == physical) domains.
> >
> > This document is partly based on Stefano Stabellini's patch serie v1:
> > [direct-map DomUs](
> > https://lists.xenproject.org/archives/html/xen-devel/2020-04/msg00707.html).
>
> May I ask why a different approach?

In Stefano's original design, 1:1 direct-map domains are allocated from
user-defined memory regions, and he preferred allocating them from the
sub-domain allocator.

That led to quite a discussion, and in the end more or less everyone agreed
that it is not workable: once the requested memory has gone into any
allocator, whether the boot allocator or the sub-domain allocator, we cannot
guarantee that it will not be put to some other use before it is actually
allocated to the 1:1 direct-map domain.

So I'd prefer to split the original design into two parts. The first part is
here: the user only wants a 1:1 direct-map domain and does not care where its
RAM is located (think of dom0). In that case we can keep allocating memory
from the sub-domain allocator.

The second part is what I mentioned in the commit notes: "For the part
regarding allocating 1:1 direct-map domains with user-defined memory regions,
it will be included in next design of static memory allocation".

But of course, if combining them helps the community understand our ideas
better, we are willing to combine them in the next version.

Briefly, if we allocate 1:1 direct-map domains from user-defined memory
regions, we need to reserve those memory regions at the very beginning.

> >
> > Signed-off-by: Penny Zheng <penny.zheng@arm.com>
> > ---
> > For the part regarding allocating 1:1 direct-map domains with
> > user-defined memory regions, it will be included in next design of
> > static memory allocation.
>
> I don't think you can do without user-defined memory regions (see more
> below).
>
> > ---
> > docs/designs/1_1_direct-map.md | 87 ++++++++++++++++++++++++++++++++++
> > 1 file changed, 87 insertions(+)
> > create mode 100644 docs/designs/1_1_direct-map.md
> >
> > diff --git a/docs/designs/1_1_direct-map.md b/docs/designs/1_1_direct-map.md
> > new file mode 100644
> > index 0000000000..ce3e2c77fd
> > --- /dev/null
> > +++ b/docs/designs/1_1_direct-map.md
> > @@ -0,0 +1,87 @@
> > +# Preface
> > +
> > +The document is an early draft for direct-map memory map (`guest
> > +physical == physical`) of domUs. And right now, it constrains to ARM
>
> s/constrains/limited/
>
> Aside the interface to the user, you should be able to re-use the same code
> on x86. Note that because the memory layout on x86 is fixed (always starting
> at 0), you would only be able to have only one direct-mapped domain.
>

Sorry, I have little knowledge of x86, so this may need more investigation.

> > +architecture.
> > +
> > +It aims to describe why and how the guest would be created as direct-map domain.
> > +
> > +This document is partly based on Stefano Stabellini's patch serie v1:
> > +[direct-map DomUs](
> > +https://lists.xenproject.org/archives/html/xen-devel/2020-04/msg00707.html).
> > +
> > +This is a first draft and some questions are still unanswered. When
> > +this is the case, the text shall contain XXX.
> > +
> > +# Introduction
> > +
> > +## Background
> > +
> > +Cases where domU needs direct-map memory map:
> > +
> > + * IOMMU not present in the system.
> > + * IOMMU disabled, since it doesn't cover a specific device.
>
> If the device is not covered by the IOMMU, then why would you want to
> disable the IOMMUs for the rest of the system?
>

This is a mixed scenario: some devices are passed to the VM behind an SMMU
and others are passed to it without one, so we cannot guarantee DMA security
for the guest anyway.

Users may therefore want to disable the SMMU altogether, which at least
gains them some performance.

> > + * IOMMU disabled, since it doesn't have enough bandwidth.
>
> I am not sure to understand this one.
>

On some SoCs, multiple devices are connected to one SMMU.

In extreme situations, when multiple devices do DMA concurrently, the
translation requests can exceed the SMMU's translation capacity, which
adds DMA latency.

> > + * IOMMU disabled, since it adds too much latency.
>
> The list above sounds like direct-map memory would be necessary even
> without device-passthrough. Can you clarify it?
>

Okay.

The SMMU can be implemented differently on different SoCs. For example, some
SoC vendors may remove the TLB inside the SMMU.

In that case, the SMMU adds latency to every DMA transaction. Users may want
to disable the SMMU for some real-time scenarios.

> > +
> > +*WARNING:
> > +Users should be careful that it is not always secure to assign a
> > +device without
>
> s/careful/aware/ I think. Also, it is never secure to assign a device without
> IOMMU/SMMU unless you have a replacement.
>
> I would suggest to reword it something like:
>
> "When the device is not protected by the IOMMU, the administrator should
> make sure that:
> - The device is assigned to a trusted guest
> - You have an additional security mechanism on the platform (e.g
> MPU) to protect the memory."
>

Thanks for the rephrasing.

> > +IOMMU/SMMU protection.
> > +Users must be aware of this risk, that guests having access to
> > +hardware with DMA capacity must be trusted, or it could use the DMA
> > +engine to access any other memory area.
> > +Guests could use additional security hardware component like NOC,
> > +System MPU to protect the memory.
>
> What's the NOC?
>

Network on Chip.

Here it means a kind of SoC-level firewall that limits the memory ranges
that devices can access through DMA, or that CPUs can access.

> > +
> > +## Design
> > +
> > +The implementation may cover following aspects:
> > +
> > +### Native Address and IRQ numbers for GIC and UART(vPL011)
> > +
> > +Today, fixed addresses and IRQ numbers are used to map GIC and
> > +UART(vPL011) in DomUs. And it may cause potential clash on direct-map domains.
> > +So, Using native addresses and irq numbers for GIC, UART(vPL011), in
> > +direct-map domains is necessary.
> > +e.g.
>
> To me e.g. means example. But below this is not an example, this is a
> requirement in order to use the vpl011 on system without pl011 UART.
>

Yes, right.
I'll delete e.g. here

> > +For the virtual interrupt of vPL011: instead of always using
> > +`GUEST_VPL011_SPI`, try to reuse the physical SPI number if possible.
>
> How would you find the following region for guest using PV drivers;
> - Event channel interrupt
> - Grant table area
>
Good catch, many thanks!

We've done some investigation on this part. Correct me if I am wrong.

Pages shared between guests and Xen, such as shared_info and the grant
table, are mapped by Arm guests using the HYPERVISOR_memory_op hypercall
and are never direct-mapped, even in dom0.

So we suggest modifying the hypercall so that, in the direct-map case, it
not only passes a gfn to Xen but can also receive the already allocated mfns
(e.g. for grant tables) from Xen.
That would, however, require changes in Linux.

We are also inclined to keep all guest-related pages (RAM, grant tables,
etc.) in one contiguous piece.

Right now, pages such as grant tables are allocated separately from the Xen
heap, so they are unlikely to sit next to the guest RAM.

So what if we allocate more RAM up front? For example, if the guest needs
256MB, give it 257MB and use the extra 1MB for those pages. That way we could
keep everything in one piece.

This is a very rough brainstorm, so please bear with me and share your
thoughts on it.

> > +
> > +### Device tree option: `direct_map`
> > +
> > +Introduce a new device tree option `direct_map` for direct-map domains.
> > +Then, when users try to allocate one direct-map domain(except DOM0),
> > +`direct-map` property needs to be added under the appropriate `/chosen/domUx`.
> > +
> > +
> > + chosen {
> > + ...
> > + domU1 {
> > + compatible = "xen, domain";
> > + #address-cells = <0x2>;
> > + #size-cells = <0x1>;
> > + direct-map;
> > + ...
> > + };
> > + ...
> > + };
> > +
> > +If users are using imagebuilder, they can add to boot.source
> > +something like the
>
> This documentation sounds more like something for imagebuilder than
> for Xen itself.
>

Yes, right. I'll delete this part.

> > +following:
> > +
> > + fdt set /chosen/domU1 direct-map
> > +
> > +Users could also use `xl` to create direct-map domains, just use the
> > +following config option: `direct-map=true`
> > +
> > +### direct-map guest memory allocation
> > +
> > +Func `allocate_memory_direct_map` is based on `allocate_memory_11`,
> > +and shall be refined to allocate memory for all direct-map domains, including DOM0.
> > +Roughly speaking, firstly, it tries to allocate arbitrary memory
> > +chunk of requested size from domain
> > +sub-allocator(`alloc_domheap_pages`). If fail, split the chunk into
> > +halves, and re-try, until it succeed or bail out with the smallest chunk size.
>
> If you have a mix of domain with direct-mapped and normal domain, you
> may end up to have the memory so small that your direct-mapped domain
> will have many small banks. This is going to be a major problem if you are
> creating the domain at runtime (you suggest xl can be used).
>
> In addition, some users may want to be able to control the location of the
> memory as this reduced the amount of work in the guest (e.g you don't have
> to dynamically discover the memory).
>
> I think it would be best to always require the admin to select the RAM bank
> used by a direct mapped domain. Alternatively, we could have a pool of
> memory that can only be used for direct mapped domain. This should limit
> the fragmentation of the memory.
>

Yes. If we start with a mix of normal domains and direct-mapped domains
whose user-defined memory regions are scattered around, creating a domain
later at runtime (with xl) may fail, whether it is a direct-map domain or
not.

But users should be free to allocate wherever they want, so we may not want
to restrict them to a pool of memory.

Of course, we could add a warning to make them aware of this.

But I'm with you that it would be best to always require the admin to select
the RAM bank used by a direct mapped domain.

I will add the design for this part in my next series.

Here I am just adding 1:1 direct-map without user-defined regions as an extra option.

> > +Then, `insert_11_bank` shall insert above allocated pages into a
> > +memory bank, which are ordered by address, and also set up guest P2M
> > +mapping(
> > +`guest_physmap_add_page`) to ensure `gfn == mfn`.
>
> Cheers,
>
> --
> Julien Grall

Cheers,

--
Penny
Re: [RFC] design: design doc for 1:1 direct-map
On 10/12/2020 07:02, Penny Zheng wrote:
> Hi Julien

Hi Penny,

Apologies for the late answer.

>
> Thanks for the nice and detailed comments.
> Here are the replies:
>
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: Tuesday, December 8, 2020 5:07 PM
>> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Kaly Xin
>> <Kaly.Xin@arm.com>; Wei Chen <Wei.Chen@arm.com>; nd <nd@arm.com>;
>> Paul Durrant <paul@xen.org>; famzheng@amazon.com
>> Subject: Re: [RFC] design: design doc for 1:1 direct-map
>>
>> Hi Penny,
>>
>> I am adding Paul and Zheng in the thread as there are similar interest for the
>> x86 side.
>>
>> On 08/12/2020 05:21, Penny Zheng wrote:
>>> This is one draft design about the infrastructure for now, not ready
>>> for upstream yet (hence the RFC tag), thought it'd be useful to
>>> firstly start a discussion with the community.
>>>
>>> Create one design doc for 1:1 direct-map.
>>> It aims to describe why and how we allocate 1:1 direct-map(guest
>>> physical == physical) domains.
>>>
>>> This document is partly based on Stefano Stabellini's patch serie v1:
>>> [direct-map DomUs](
>>> https://lists.xenproject.org/archives/html/xen-devel/2020-
>> 04/msg00707.html).
>>
>> May I ask why a different approach?
>
> In Stefano original design, he'd like to allocate 1:1 direct-map with user-defined
> memory regions and he prefers allocating it from sub-domain allocator.

I am not entirely sure what you are referring to with "sub-domain
allocator".

>
> And it brings quite a discussion there and in the last, everyone kinds of all
> agrees that it is not workable. Since if requested memory ever goes into any
> allocators, no matter boot, or sub-domain allocator, we could not ensure that
> before actually allocating it for one 1:1 direct-map domain, it will not be into
> any other use.

Yes, you cannot give the memory to the heap allocator and expect the
region to always be free. However, you can mark the pages as reserved so
the allocator doesn't touch them.

We (AWS) also need to reserve memory for later use in the case of
LiveUpdate. In our case, the memory already contains guest data, so it is
not possible to give it to any allocator.

We solved it by excluding the pages from any allocator and then marking
the pages as allocated/used when giving them to the domain.

There are some unsolved corner cases when using NUMA. Aside from that, this
works because the heap allocator doesn't keep a list of in-use pages.

>
> So I'd prefer to split original design into two parts: one is here, that user only
> wants to allocate one 1:1 direct-map domain, not caring about where the ram
> will be located into.

I understand that a user may not care where the direct-map memory
is allocated. However, I question the usefulness because:

1) This doesn't work with an MPU
2) You may end up providing the guest with many small regions if the
guest is not created right after boot, or when it is rebooted.

Can you outline what would be your use case here?


>
>>> +architecture.
>>> +
>>> +It aims to describe why and how the guest would be created as direct-map
>> domain.
>>> +
>>> +This document is partly based on Stefano Stabellini's patch serie v1:
>>> +[direct-map DomUs](
>>> +https://lists.xenproject.org/archives/html/xen-devel/2020-
>> 04/msg00707.html).
>>> +
>>> +This is a first draft and some questions are still unanswered. When
>>> +this is the case, the text shall contain XXX.
>>> +
>>> +# Introduction
>>> +
>>> +## Background
>>> +
>>> +Cases where domU needs direct-map memory map:
>>> +
>>> + * IOMMU not present in the system.
>>> + * IOMMU disabled, since it doesn't cover a specific device.
>>
>> If the device is not covered by the IOMMU, then why would you want to
>> disable the IOMMUs for the rest of the system?
>>
>
> This is a mixed scenario. We pass some devices to VM with SMMU, and we
> pass other devices to VM without SMMU. We could not guarantee guest
> DMA security.

Not really, you can guarantee DMA security if devices not protected by
an IOMMU are assigned to *trusted* domains.

>
> So users may want to disable the SMMU, at least, they can gain some
> performance improvement from SMMU disabled.

That's an understandable argument. Yet, I think this only works if you
trust *all* your domains. So a user may still want to keep IOMMU on when
assigning devices (as long as they are protected by an IOMMU) to a
non-trusted domain.

So I would suggest rephrasing your second bullet point as:

"IOMMU disabled if all the guests are trusted"

>>> + * IOMMU disabled, since it doesn't have enough bandwidth.
>>
>> I am not sure to understand this one.
>>
>
> In some SoC, there would be multiple devices connected to one SMMU.
>
> In some extreme situation, multiple devices do DMA concurrency, The
> translation requests can exceed SMMU's translation capacity. This will
> cause DMA latency.

Ok. So either the SoC doesn't fit your use-case or the SoC was not
correctly designed. Therefore, I would call that a workaround :). I
would suggest to update the design doc with more information.

OOI, is it really necessary to turn off the IOMMU? Would it be possible
to instead have a few devices by-passing the IOMMU when they are
assigned to a trusted domain?

>
>>> + * IOMMU disabled, since it adds too much latency.
>>
>> The list above sounds like direct-map memory would be necessary even
>> without device-passthrough. Can you clarify it?
>>
>
> Okay.
>
> SMMU on different SoCs can be implemented differently. For example, some
> SoC vendor may remove the TLB inside SMMU.
>
> In this case, the SMMU will add latency in DMA progress. Users may want to
> disable the SMMU for some Realtime scenarios.

Thanks for the explanation, however this wasn't my question. I was
pointing out that your examples gave the impression that a domain with no
devices assigned would also need to be direct-mapped.

Could you confirm whether this is the intended purpose?

>
>>> +
>>> +*WARNING:
>>> +Users should be careful that it is not always secure to assign a
>>> +device without
>>
>> s/careful/aware/ I think. Also, it is never secure to assign a device without
>> IOMMU/SMMU unless you have a replacement.
>>
>> I would suggest to reword it something like:
>>
>> "When the device is not protected by the IOMMU, the administrator should
>> make sure that:
>> - The device is assigned to a trusted guest
>> - You have an additional security mechanism on the platform (e.g
>> MPU) to protect the memory."
>>
>
> Thanks for the rephrase.
>
>>> +IOMMU/SMMU protection.
>>> +Users must be aware of this risk, that guests having access to
>>> +hardware with DMA capacity must be trusted, or it could use the DMA
>>> +engine to access any other memory area.
>>> +Guests could use additional security hardware component like NOC,
>>> +System MPU to protect the memory.
>>
>> What's the NOC?
>>
>
> Network on Chip.
>
> Some kind of SoC level firewall that limits the devices' DMA access range
> or CPU memory access range.

I would suggest using the longer term or introducing an acronym section.

>
>>> +
>>> +## Design
>>> +
>>> +The implementation may cover following aspects:
>>> +
>>> +### Native Address and IRQ numbers for GIC and UART(vPL011)
>>> +
>>> +Today, fixed addresses and IRQ numbers are used to map GIC and
>>> +UART(vPL011) in DomUs. And it may cause potential clash on direct-map
>> domains.
>>> +So, Using native addresses and irq numbers for GIC, UART(vPL011), in
>>> +direct-map domains is necessary.
>>> +e.g.
>>
>> To me e.g. means example. But below this is not an example, this is a
>> requirement in order to use the vpl011 on system without pl011 UART.
>>
>
> Yes, right.
> I'll delete e.g. here
>
>>> +For the virtual interrupt of vPL011: instead of always using
>>> +`GUEST_VPL011_SPI`, try to reuse the physical SPI number if possible.
>>
>> How would you find the following region for guest using PV drivers;
>> - Event channel interrupt
>> - Grant table area
>>
> Good catch! Thanks a lot.
>
> We've done some investigation on this part. Correct me if I am wrong.
>
> Pages like shared_info, grant table, etc, shared between guests and
> xen, are mapped by ARM guests using the hypercall HYPERVISOR_memory_op
> and always would not be directly mapped, even in dom0.

Any memory shared with Xen (e.g. grant table, shared info) should never
be used for DMA. So I don't think you need to directly map it.

In the case of memory shared between guests, I would suggest looking at
what we do in dom0 for dealing with DMA on "foreign" pages.

>
> So, here, we suggest that maybe we could do some modification in the hypercall
> to let it not only pass gfn to xen, but also receive already allocated mfns(e.g. grant
> tables) from xen in direct map situation.

Regardless of the modification required in Linux, all the memory hypercalls
are part of the stable ABI. So any change should be carefully thought
through to avoid breaking backward compatibility.

However, I don't think you need to modify any of the hypercalls today
(see above).

> But if so, it involves modification in Linux.
>
> And also we incline to keep all guest related pages(including ram, grant tables,
> etc) in one whole piece.

Do you mean physically contiguous in the host memory? If so, I am not
sure this can be achieved with a good success rate when letting Xen
choose the placement.

>
> Right now, pages like grant tables are allocated separately in Xen heap, so don't
> stand much chance to be consistent with the guest ram.

I don't quite understand why you need that consistency. In fact, Dom0 is
direct mapped, and we are able to have multiple memory ranges with all the
shared memory not direct mapped.

[...]

>
>>> +following:
>>> +
>>> + fdt set /chosen/domU1 direct-map
>>> +
>>> +Users could also use `xl` to create direct-map domains, just use the
>>> +following config option: `direct-map=true`
>>> +
>>> +### direct-map guest memory allocation
>>> +
>>> +Func `allocate_memory_direct_map` is based on `allocate_memory_11`,
>>> +and shall be refined to allocate memory for all direct-map domains,
>> including DOM0.
>>> +Roughly speaking, firstly, it tries to allocate arbitrary memory
>>> +chunk of requested size from domain
>>> +sub-allocator(`alloc_domheap_pages`). If fail, split the chunk into
>>> +halves, and re-try, until it succeed or bail out with the smallest chunk size.
>>
>> If you have a mix of domain with direct-mapped and normal domain, you
>> may end up to have the memory so small that your direct-mapped domain
>> will have many small banks. This is going to be a major problem if you are
>> creating the domain at runtime (you suggest xl can be used).
>>
>> In addition, some users may want to be able to control the location of the
>> memory as this reduced the amount of work in the guest (e.g you don't have
>> to dynamically discover the memory).
>>
>> I think it would be best to always require the admin to select the RAM bank
>> used by a direct mapped domain. Alternatively, we could have a pool of
>> memory that can only be used for direct mapped domain. This should limit
>> the fragmentation of the memory.
>>
>
> Yep, in some cases, if we have mix of domains with direct-mapped with
> user- defined memory regions (scattering loosely)and normal domains at
> the beginning, it may fail when we later creating the domain at runtime (use
> xl), no matter direct-map domain or not.

It is not only about creating a new domain. It is also about rebooting a
running domain.

In the reboot case, you may be able to re-allocate the memory, but this
will be more by luck than anything else.

>
> But, users should be free to allocate where they want, we may not limit a
> pool of memory to use.

Right, the memory pool is to try to limit the risk when the user decides
to let Xen choose where the memory is allocated.

Cheers,

--
Julien Grall
RE: [RFC] design: design doc for 1:1 direct-map
> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Tuesday, January 5, 2021 8:41 PM
> To: Penny Zheng <Penny.Zheng@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Kaly Xin
> <Kaly.Xin@arm.com>; Wei Chen <Wei.Chen@arm.com>; nd <nd@arm.com>;
> Paul Durrant <paul@xen.org>; famzheng@amazon.com
> Subject: Re: [RFC] design: design doc for 1:1 direct-map
>
>
>
> On 10/12/2020 07:02, Penny Zheng wrote:
> > Hi Julien
>
> Hi Penny,
>
> Apologies for the late answer.
>

Hi Julien

NP. Thanks for the detailed comments again ;).

> >
> > Thanks for the nice and detailed comments. Here are the
> > replies:
> >
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: Tuesday, December 8, 2020 5:07 PM
> >> To: Penny Zheng <Penny.Zheng@arm.com>;
> >> xen-devel@lists.xenproject.org; sstabellini@kernel.org
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Kaly Xin
> >> <Kaly.Xin@arm.com>; Wei Chen <Wei.Chen@arm.com>; nd
> <nd@arm.com>;
> >> Paul Durrant <paul@xen.org>; famzheng@amazon.com
> >> Subject: Re: [RFC] design: design doc for 1:1 direct-map
> >>
> >> Hi Penny,
> >>
> >> I am adding Paul and Zheng in the thread as there are similar
> >> interest for the
> >> x86 side.
> >>
> >> On 08/12/2020 05:21, Penny Zheng wrote:
> >>> This is one draft design about the infrastructure for now, not ready
> >>> for upstream yet (hence the RFC tag), thought it'd be useful to
> >>> firstly start a discussion with the community.
> >>>
> >>> Create one design doc for 1:1 direct-map.
> >>> It aims to describe why and how we allocate 1:1 direct-map(guest
> >>> physical == physical) domains.
> >>>
> >>> This document is partly based on Stefano Stabellini's patch serie v1:
> >>> [direct-map DomUs](
> >>> https://lists.xenproject.org/archives/html/xen-devel/2020-
> >> 04/msg00707.html).
> >>
> >> May I ask why a different approach?
> >
> > In Stefano original design, he'd like to allocate 1:1 direct-map with
> > user-defined memory regions and he prefers allocating it from sub-domain
> allocator.
>
> I am not entirely sure what you are referring to with "sub-domain allocator".
>

Sorry. I mean domain-heap sub-allocator. ;)
I found this reference here
( https://github.com/xen-project/xen/blob/master/xen/common/page_alloc.c#L2233 ).

> >
> > And it brings quite a discussion there and in the last, everyone kinds
> > of all agrees that it is not workable. Since if requested memory ever
> > goes into any allocators, no matter boot, or sub-domain allocator, we
> > could not ensure that before actually allocating it for one 1:1
> > direct-map domain, it will not be into any other use.
>
> Yes, you cannot give the memory to the heap allocator and expect the region
> to always be free. However, you can mark the pages as reserved so the
> allocator doesn't touch them.
>
> We (AWS) also need to reserve memory for later use in the case of
> LiveUpdate. In our case, the memory already contains guest data, so it is not
> possible to give it to any allocator.
>
> We solved it by excluding the pages from any allocator and then marking the
> pages as allocated/used when giving them to the domain.
>
> There are some unsolved corner cases when using NUMA. Aside from that, this
> works because the heap allocator doesn't keep a list of in-use pages.
>

Yes. I agree.

Here is my new rough idea on how to allocate memory for direct-map domains:

# New Boot Module: `BOOTMOD_STATIC_MEM`

In order to limit memory fragmentation, users must select one appropriate
chunk of memory as the static memory allocation pool at compile time.

Later, the memory of all direct-map guests must be allocated from this static
memory allocation pool.

We also call this kind of domain, whose RAM is allocated from the static
memory allocation pool, a `static domain`.

Static domains are not limited to direct-map domains, but that part is
irrelevant to this design.

Memory in the static memory allocation pool shall be reserved from the
beginning, to ensure that it never goes to any memory allocator for other
use, whether the boot allocator or the domain-heap allocator.

A new `BOOTMOD_STATIC_MEM` node is introduced under the `/chosen` node in the
device tree to define the static memory allocation pool.

It contains the following properties:

- compatible

Must always include the following compatibility string:

`"multiboot,static-mem"`

- reg

reg specifies the physical address of the module in RAM and the length
of the module.

Here is one example:

    module@0xc0000000 {
        compatible = "multiboot,static-mem";
        reg = <0xc0000000 0x40000000>;
    };

1 GiB of RAM starting at 0xc0000000 will be reserved as the static memory
allocation pool.
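
As a quick sanity check of the numbers in this example, here is a minimal C
sketch (not Xen code). It assumes one address cell and one size cell for
`reg`, a 4 KiB page size, and the hypothetical names `struct mem_region`,
`region_end()`, `region_nr_pages()` and `region_contains()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical description of the pool: the two cells of the "reg" property. */
struct mem_region {
    uint64_t base; /* physical start address */
    uint64_t size; /* length in bytes */
};

/* First physical address after the region. */
static inline uint64_t region_end(const struct mem_region *r)
{
    assert(r->size <= UINT64_MAX - r->base); /* no wrap-around */
    return r->base + r->size;
}

/* Number of whole pages covered by the region. */
static inline uint64_t region_nr_pages(const struct mem_region *r)
{
    return r->size >> PAGE_SHIFT;
}

/* True if paddr lies inside [base, base + size). */
static inline bool region_contains(const struct mem_region *r, uint64_t paddr)
{
    return paddr >= r->base && paddr - r->base < r->size;
}
```

With `base = 0xc0000000` and `size = 0x40000000`, the pool ends at
0x100000000 and covers 262144 pages of 4 KiB.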

# New Page Flag: `PGC_state_reserved`

In order to differentiate pages allocated from the static memory allocation
pool from those allocated from the heap allocator for normal domains, we
shall introduce a new page flag, `PGC_state_reserved`.

At boot time, while preparing memory for Xen (`setup_mm`) and after setting up
the frame tables for all RAM (`setup_frametable_mappings`), we shall do extra
initialization on the part of the frame table that covers the static memory
allocation pool, that is, set the `PGC_state_reserved` flag on those
`struct page_info`s.

# `allocate_static_mem()`

Usually, when allocating memory for normal domains, we try to allocate the
requested guest memory from the heap allocator. But for direct-map domains
(which are also static domains), it shall come from the static memory
allocation pool.

A new function, `allocate_static_mem()`, is introduced to implement static
memory allocation.

For each page, it includes the following steps:
1. Check that it is a valid page in the static memory allocation pool.
2. Check that the page is reserved (`PGC_state_reserved`).
3. Do the necessary preparation on `struct page_info`, such as following the
same cache-coherency policy as `alloc_heap_pages` and turning the page state
from `PGC_state_reserved` into `PGC_state_used`.
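
The three steps above can be sketched roughly as follows. This is only a
standalone illustration under simplifying assumptions, not Xen code: the
types `enum page_state`, `struct page` and `struct static_pool`, and the
function `allocate_static_mem_sketch()`, are hypothetical stand-ins for
`struct page_info` and the real page flags, and the cache-coherency handling
is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page states standing in for the PGC_state_* flags. */
enum page_state { PAGE_STATE_FREE, PAGE_STATE_RESERVED, PAGE_STATE_USED };

/* Hypothetical, much reduced stand-in for struct page_info. */
struct page {
    enum page_state state;
};

struct static_pool {
    uint64_t start_pfn;  /* first frame number of the pool */
    uint64_t nr_pages;   /* number of frames in the pool */
    struct page *frames; /* frames[i] describes frame start_pfn + i */
};

static bool pool_contains(const struct static_pool *pool, uint64_t pfn)
{
    return pfn >= pool->start_pfn && pfn - pool->start_pfn < pool->nr_pages;
}

/*
 * Hand out nr frames starting at pfn. Returns 0 on success, or -1 if any
 * frame is outside the pool or not reserved. Nothing is modified on failure.
 */
int allocate_static_mem_sketch(struct static_pool *pool, uint64_t pfn,
                               uint64_t nr)
{
    uint64_t i;

    assert(pool != NULL && pool->frames != NULL);

    for (i = 0; i < nr; i++) {
        /* Step 1: the page must belong to the static pool. */
        if (!pool_contains(pool, pfn + i))
            return -1;
        /* Step 2: the page must still be reserved. */
        if (pool->frames[pfn + i - pool->start_pfn].state != PAGE_STATE_RESERVED)
            return -1;
    }

    /* Step 3: prepare the pages; here this is only the state change. */
    for (i = 0; i < nr; i++)
        pool->frames[pfn + i - pool->start_pfn].state = PAGE_STATE_USED;

    return 0;
}
```

Checking the whole range before changing any state means a failed request
leaves the pool untouched, so no rollback is needed in this sketch.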

> >
> > So I'd prefer to split original design into two parts: one is here,
> > that user only wants to allocate one 1:1 direct-map domain, not caring
> > about where the ram will be located into.
>
> I understand that a user may not care where the direct-map memory
> is allocated. However, I question the usefulness because:
>
> 1) This doesn't work with an MPU
> 2) You may end up providing the guest with many small regions if the
> guest is not created right after boot, or when it is rebooted.
>
> Can you outline what would be your use case here?
>

Yes, it does not work with an MPU.
But it is workable when the IOMMU is disabled or missing and the MMU is on.
In some cases, when users do DMA from trusted domains, GPA == PA may be just
enough.

As for point 2, I share the same concern.
So how about allocating all direct-map domains from the static memory
allocation pool, whether they use user-defined memory regions or do not care
where their memory is located? You can find more details above.

But this may leave DOM0 as a very `special` direct-map domain, the only one
whose memory is allocated from the heap allocator.

And of course we could limit the number of memory slots and their size:
we fail when exceeding the maximum number of memory slots, or when an
allocated memory slot is too small.

Which one do you prefer? ;)
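
To make the slot-limit idea more concrete, here is a rough standalone sketch
of the "halve and retry" allocation from the design doc with both limits
added. It is not Xen code: `MAX_BANKS`, `MIN_ORDER`, `struct free_extent`,
`try_alloc()` and `allocate_banks()` are hypothetical names, the values of
the limits are arbitrary, and `try_alloc()` is a toy model used in place of
`alloc_domheap_pages`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BANKS 4 /* assumption: maximum number of memory slots */
#define MIN_ORDER 2 /* assumption: smallest slot is 2^2 pages */

/* Toy stand-in for the allocator: each entry is one free contiguous run. */
struct free_extent {
    unsigned long nr_pages;
};

/* Take 2^order contiguous pages from the first run that is large enough. */
static bool try_alloc(struct free_extent *ext, size_t n, unsigned int order)
{
    unsigned long want = 1UL << order;
    size_t i;

    for (i = 0; i < n; i++) {
        if (ext[i].nr_pages >= want) {
            ext[i].nr_pages -= want;
            return true;
        }
    }
    return false;
}

/*
 * Allocate nr_pages as at most MAX_BANKS slots of at least 2^MIN_ORDER pages.
 * Returns the number of slots used, or -1 on failure. For brevity, partial
 * allocations are not undone on failure; real code would have to free them.
 */
int allocate_banks(struct free_extent *ext, size_t n, unsigned long nr_pages,
                   unsigned int max_order, unsigned long bank_pages[MAX_BANKS])
{
    unsigned int order = max_order;
    int banks = 0;

    assert(ext != NULL && bank_pages != NULL);
    assert(max_order >= MIN_ORDER && max_order < 8 * sizeof(unsigned long));

    while (nr_pages > 0) {
        /* Never ask for more than what is still missing. */
        while (order > MIN_ORDER && (1UL << order) > nr_pages)
            order--;
        if ((1UL << order) > nr_pages)
            return -1; /* remainder is smaller than the smallest slot */
        if (banks == MAX_BANKS)
            return -1; /* would exceed the maximum number of slots */

        if (try_alloc(ext, n, order)) {
            bank_pages[banks++] = 1UL << order;
            nr_pages -= 1UL << order;
        } else if (order > MIN_ORDER) {
            order--; /* halve the chunk and retry */
        } else {
            return -1; /* even the smallest slot cannot be allocated */
        }
    }
    return banks;
}
```

For example, with free runs of 8, 4 and 4 pages, a 16-page request ends up
with three slots (8 + 4 + 4), while five free runs of 4 pages cannot satisfy
a 20-page request because that would need a fifth slot.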

>
> >
> >>> +architecture.
> >>> +
> >>> +It aims to describe why and how the guest would be created as
> >>> +direct-map
> >> domain.
> >>> +
> >>> +This document is partly based on Stefano Stabellini's patch serie v1:
> >>> +[direct-map DomUs](
> >>> +https://lists.xenproject.org/archives/html/xen-devel/2020-
> >> 04/msg00707.html).
> >>> +
> >>> +This is a first draft and some questions are still unanswered. When
> >>> +this is the case, the text shall contain XXX.
> >>> +
> >>> +# Introduction
> >>> +
> >>> +## Background
> >>> +
> >>> +Cases where domU needs direct-map memory map:
> >>> +
> >>> + * IOMMU not present in the system.
> >>> + * IOMMU disabled, since it doesn't cover a specific device.
> >>
> >> If the device is not covered by the IOMMU, then why would you want to
> >> disable the IOMMUs for the rest of the system?
> >>
> >
> > This is a mixed scenario. We pass some devices to VM with SMMU, and we
> > pass other devices to VM without SMMU. We could not guarantee guest
> > DMA security.
>
> Not really, you can guarantee DMA security if devices not protected by an
> IOMMU are assigned to *trusted* domains.
>
> >
> > So users may want to disable the SMMU, at least, they can gain some
> > performance improvement from SMMU disabled.
>
> That's an understandable argument. Yet, I think this only works if you trust
> *all* your domains. So a user may still want to keep IOMMU on when
> assigning devices (as long as they are protected by an IOMMU) to a non-
> trusted domain.
>
> So I would suggest to rephrase your second bullet point with:
>
> "IOMMU disabled if all the guests are trusted"
>

Thanks
I'll do that. ;)

> >>> + * IOMMU disabled, since it doesn't have enough bandwidth.
> >>
> >> I am not sure to understand this one.
> >>
> >
> > In some SoC, there would be multiple devices connected to one SMMU.
> >
> > In some extreme situation, multiple devices do DMA concurrency, The
> > translation requests can exceed SMMU's translation capacity. This will
> > cause DMA latency.
>
> Ok. So either the SoC doesn't fit your use-case or the SoC was not correctly
> designed. Therefore, I would call that a workaround :). I would suggest to
> update the design doc with more information.
>
> OOI, is it really necessary to turn off the IOMMU? Would it be possible to
> instead have a few devices by-passing the IOMMU when they are assigned to
> a trusted domain?
>

Yes, of course. It's totally up to users to decide whether to turn it off or
bypass it. Sometimes, bypassing is more desirable. ;)

And if users want to turn it off, direct-map is their choice for assigning
devices to a trusted domain.

I should elaborate more on this in my design.
> >
> >>> + * IOMMU disabled, since it adds too much latency.
> >>
> >> The list above sounds like direct-map memory would be necessary even
> >> without device-passthrough. Can you clarify it?
> >>
> >
> > Okay.
> >
> > SMMU on different SoCs can be implemented differently. For example,
> > some SoC vendor may remove the TLB inside SMMU.
> >
> > In this case, the SMMU will add latency in DMA progress. Users may
> > want to disable the SMMU for some Realtime scenarios.
>
> Thanks for the explanation, however this wasn't my question. I was pointing
> out that your examples gave the impression that a domain with no devices
> assigned would also need to be direct-mapped.
>
> Could you confirm whether this is the intended purpose?
>

Sorry, I don't know which part above gave you that impression; maybe I
could refine it to eliminate the ambiguity. ;)

My intended purpose here is still to cover all user scenarios where users may
want to disable the IOMMU but still want to do DMA.

> >
> >>> +
> >>> +*WARNING:
> >>> +Users should be careful that it is not always secure to assign a
> >>> +device without
> >>
> >> s/careful/aware/ I think. Also, it is never secure to assign a device
> >> without IOMMU/SMMU unless you have a replacement.
> >>
> >> I would suggest to reword it something like:
> >>
> >> "When the device is not protected by the IOMMU, the administrator
> >> should make sure that:
> >> - The device is assigned to a trusted guest
> >> - You have an additional security mechanism on the platform (e.g
> >> MPU) to protect the memory."
> >>
> >
> > Thanks for the rephrase.
> >
> >>> +IOMMU/SMMU protection.
> >>> +Users must be aware of this risk, that guests having access to
> >>> +hardware with DMA capacity must be trusted, or it could use the DMA
> >>> +engine to access any other memory area.
> >>> +Guests could use additional security hardware component like NOC,
> >>> +System MPU to protect the memory.
> >>
> >> What's the NOC?
> >>
> >
> > Network on Chip.
> >
> > Some kind of SoC level firewall that limits the devices' DMA access
> > range or CPU memory access range.
>
> I would suggest using the longer term or introducing an acronym section.
>

Thx. I will rephrase using the longer term. ;)
> >
> >>> +
> >>> +## Design
> >>> +
> >>> +The implementation may cover following aspects:
> >>> +
> >>> +### Native Address and IRQ numbers for GIC and UART(vPL011)
> >>> +
> >>> +Today, fixed addresses and IRQ numbers are used to map GIC and
> >>> +UART(vPL011) in DomUs. And it may cause potential clash on
> >>> +direct-map
> >> domains.
> >>> +So, Using native addresses and irq numbers for GIC, UART(vPL011),
> >>> +in direct-map domains is necessary.
> >>> +e.g.
> >>
> >> To me e.g. means example. But below this is not an example, this is a
> >> requirement in order to use the vpl011 on system without pl011 UART.
> >>
> >
> > Yes, right.
> > I'll delete e.g. here
> >
> >>> +For the virtual interrupt of vPL011: instead of always using
> >>> +`GUEST_VPL011_SPI`, try to reuse the physical SPI number if possible.
> >>
> >> How would you find the following region for guest using PV drivers;
> >> - Event channel interrupt
> >> - Grant table area
> >>
> > Good catch! Thanks a lot.
> >
> > We've done some investigation on this part. Correct me if I am wrong.
> >
> > Pages like shared_info, grant table, etc, shared between guests and
> > xen, are mapped by ARM guests using the hypercall
> HYPERVISOR_memory_op
> > and always would not be directly mapped, even in dom0.
>
> Any memory shared with Xen (e.g. grant table, shared info) should never be
> used for DMA. So I don't think you need to directly map it.
>
> In the case of memory shared between guests, I would suggest looking at what
> we do in dom0 for dealing with DMA on "foreign" pages.
>

Thx for pointing that out. I will dig into it later. ;)

> >
> > So, here, we suggest that maybe we could do some modification in the
> > hypercall to let it not only pass gfn to xen, but also receive already
> > allocated mfns(e.g. grant
> > tables) from xen in direct map situation.
>
> Regardless of the modification required in Linux, all the memory hypercalls
> are part of the stable ABI. So any change should be carefully thought through
> to avoid breaking backward compatibility.
>
> However, I don't think you need to modify any of the hypercalls today (see
> above).
>
> > But if so, it involves modification in Linux.
> >
> > And also we incline to keep all guest related pages(including ram,
> > grant tables,
> > etc) in one whole piece.
>
> Do you mean physically contiguous in the host memory? If so, I am not sure
> this can be achieved with a good success rate when letting Xen choose the
> placement.
>
> >
> > Right now, pages like grant tables are allocated separately in Xen
> > heap, so don't stand much chance to be consistent with the guest ram.
>
> I don't quite understand why you need that consistency. In fact, Dom0 is
> direct mapped and we are able to have multiple memory ranges and all the
> shared memory not direct mapped.
>

Yes, right.

For now, Dom0 is direct mapped and all its memory shared with Xen is not
direct mapped, and it still works well.
Also, DMA shall never use this shared memory, so it is better for me not to
consider it here.

As for why I brought up this physical consistency here, it derives from some
MPU cases.

When trying to set up direct-map domains on an MPU, users may be restricted
to a very limited number of memory slots for all accessible memory (RAM,
memory shared with Xen, etc.).

Following the current mechanism, it may end up with one slot for the grant
table, one slot for shared_info, one slot for ioreq, etc. And some slots may
hold very little RAM; for example, the ioreq shared pages only take two pages,
which is quite a waste.

But like I said before, it should not be considered here. ;)

> [...]
>
> >
> >>> +following:
> >>> +
> >>> + fdt set /chosen/domU1 direct-map
> >>> +
> >>> +Users could also use `xl` to create direct-map domains, just use
> >>> +the following config option: `direct-map=true`
> >>> +
> >>> +### direct-map guest memory allocation
> >>> +
> >>> +Func `allocate_memory_direct_map` is based on `allocate_memory_11`,
> >>> +and shall be refined to allocate memory for all direct-map domains,
> >> including DOM0.
> >>> +Roughly speaking, firstly, it tries to allocate arbitrary memory
> >>> +chunk of requested size from domain
> >>> +sub-allocator(`alloc_domheap_pages`). If fail, split the chunk into
> >>> +halves, and re-try, until it succeed or bail out with the smallest chunk
> size.
> >>
> >> If you have a mix of domain with direct-mapped and normal domain, you
> >> may end up to have the memory so small that your direct-mapped
> domain
> >> will have many small banks. This is going to be a major problem if
> >> you are creating the domain at runtime (you suggest xl can be used).
> >>
> >> In addition, some users may want to be able to control the location
> >> of the memory as this reduced the amount of work in the guest (e.g
> >> you don't have to dynamically discover the memory).
> >>
> >> I think it would be best to always require the admin to select the
> >> RAM bank used by a direct mapped domain. Alternatively, we could have
> >> a pool of memory that can only be used for direct mapped domain. This
> >> should limit the fragmentation of the memory.
> >>
> >
> > Yep, in some cases, if we have mix of domains with direct-mapped with
> > user- defined memory regions (scattering loosely)and normal domains at
> > the beginning, it may fail when we later creating the domain at
> > runtime (use xl), no matter direct-map domain or not.
>
> It is not only about creating a new domain. It is also about rebooting a
> running domain.
>
> In the reboot case, you may be able to re-allocate the memory, but this will
> be more by luck than anything else.
>

Yes, this confirms even more that using a static memory allocation pool
is better, since it could also eliminate the above risk.
> >
> > But, users should be free to allocate where they want, we may not
> > limit a pool of memory to use.
>
> Right, the memory pool is to try to limit the risk when the user decides to let
> Xen choose where the memory is allocated.
>

I agree.

> Cheers,
>
> --
> Julien Grall

Cheers

--
Penny Zheng