[PATCH 0 of 4] Patches for PCI passthrough with modified E820 (v3).
Hello,

This v3 patch set lets a PV domain see the machine's E820, so the guest can
work out where the "PCI I/O" gap actually is on the host.

Changelog since v2 posting:
- Moved the call to 'libxl__e820_alloc' into do_domain_create; it is made only
if machine_e820 == true.
- Set no_machine_e820 to true if the guest has no PCI devices (and is PV).
- Used Keir's re-worked code for E820 creation.
Changelog since v1 posting:
- Squashed the "x86: make the pv-only e820 array be dynamic" and
"x86: adjust the size of the e820 for pv guest to be dynamic" together.
- Made xc_domain_set_memmap_limit use the 'xc_domain_set_memory_map'
- Moved 'libxl_e820_alloc' and 'libxl_e820_sanitize' to be an internal
operation and called from 'libxl_device_pci_parse_bdf'.
- Expanded 'libxl_device_pci_parse_bdf' API call to have an extra argument
(optional).

In short, these patches change the following for a PV domain:

- It uses the correct PCI I/O gap. Before these patches, a Linux guest would
report at boot:
[ 0.000000] Allocating PCI resources starting at 40000000 (gap: 40000000:c0000000)
whereas the PCI I/O gap should actually have been:
[ 0.000000] Allocating PCI resources starting at b0000000 (gap: b0000000:4c000000)

- A PV domain with PCI devices used to be limited to 3GB. It can now be booted
with 4GB, 8GB, or any other amount. The PCI devices will now _not_ conflict
with System RAM, which means the drivers can load.

- With 2.6.39 kernels (which have the 1-1 mapping code), the VM_IO flag is now
applied automatically to regions that are considered PCI I/O regions. You can
find out which ones those are by looking for '1-1' in the kernel boot log.
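
As a rough illustration of how a gap like the ones quoted above can be derived
from an E820 map, here is a minimal Python sketch. It is not code from these
patches or from the kernel: the find_pci_gap name, the (addr, size, type) tuple
layout, and both sample maps are assumptions, chosen only so that the results
line up with the two "gap:" values in the log lines above.

```python
# Sketch only: find the largest hole below 4GB in an E820-style map.
# Entry layout and sample maps are illustrative assumptions.
FOUR_GB = 0x1_0000_0000


def find_pci_gap(e820, limit=FOUR_GB):
    """Return (start, size) of the largest unclaimed hole below `limit`.

    `e820` is a list of (addr, size, type) tuples. Every entry, whatever
    its type, is treated as occupied address space.
    """
    best_start, best_size = 0, 0
    last = limit  # lowest occupied address seen so far, walking downward
    for addr, size, _type in sorted(e820, reverse=True):
        end = addr + size
        if end < last and last - end > best_size:
            best_start, best_size = end, last - end
        if addr < last:
            last = addr
    return best_start, best_size


# Made-up map for a 1GB guest that only knows about its own RAM.
flat_guest = [(0x0, 0x40000000, "usable")]

# Made-up host-like map with a reserved block just below 0xb0000000
# and another one from 0xfc000000 up to 4GB.
host_like = [
    (0x00000000, 0x000A0000, "usable"),
    (0x00100000, 0xAFE00000, "usable"),
    (0xAFF00000, 0x00100000, "reserved"),
    (0xFC000000, 0x04000000, "reserved"),
    (0x100000000, 0x50000000, "usable"),
]

if __name__ == "__main__":
    for name, table in (("flat_guest", flat_guest), ("host_like", host_like)):
        start, size = find_pci_gap(table)
        print(f"{name}: gap {start:08x}:{size:08x}")
```

With the flat map the hole runs from the end of RAM to 4GB (40000000:c0000000).
With the host-like map it sits between the two reserved blocks
(b0000000:4c000000).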

To use this patchset, the guest config file must set the 'pci=['<BDF>',...]'
parameter.
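
For readers unfamiliar with the <BDF> notation, the sketch below shows one way
to split a plain "[DDDD:]BB:DD.F" string into its domain, bus, device, and
function numbers. The parse_bdf helper is hypothetical and is not libxl's
parser; any extra suffixes or options the toolstack may accept after the BDF
are deliberately out of scope here.

```python
import re

# Hypothetical helper, not libxl's parser: accepts "[DDDD:]BB:DD.F" in hex.
_BDF_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{1,4}):)?"
    r"(?P<bus>[0-9a-fA-F]{1,2}):"
    r"(?P<dev>[0-9a-fA-F]{1,2})\."
    r"(?P<func>[0-7])$"
)


def parse_bdf(text):
    """Return (domain, bus, device, function) as integers."""
    match = _BDF_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a BDF string: {text!r}")
    domain = int(match.group("domain") or "0", 16)  # domain defaults to 0
    bus = int(match.group("bus"), 16)
    dev = int(match.group("dev"), 16)
    func = int(match.group("func"), 16)
    if dev > 0x1F:  # a PCI bus has at most 32 device slots
        raise ValueError(f"device number out of range: {text!r}")
    return domain, bus, dev, func


if __name__ == "__main__":
    print(parse_bdf("0000:01:00.0"))
    print(parse_bdf("0a:1f.7"))
```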

This has been tested with 2.6.18 (RHEL5), 2.6.27 (SLES11), 2.6.36, 2.6.37, 2.6.38,
and 2.6.39 kernels. Also tested with PV NetBSD 5.1.

Tested successfully with PCI devices (NIC, MSI) and with 2GB, 4GB, and 6GB
guests.

P.S.
There is a bug in the Linux kernel: if you save/restore a PV guest that has
the 1-1 mapping, it won't restore. I will post patches for that shortly.

tools/libxc/xc_domain.c | 77 +++++++++-----
tools/libxc/xc_e820.h | 3
tools/libxc/xenctrl.h | 11 ++
tools/libxl/libxl.idl | 1
tools/libxl/libxl_create.c | 8 +
tools/libxl/libxl_internal.h | 1
tools/libxl/libxl_pci.c | 230 +++++++++++++++++++++++++++++++++++++++++++
tools/libxl/xl_cmdimpl.c | 3
xen/arch/x86/domain.c | 4
xen/arch/x86/mm.c | 44 ++++++--
xen/include/asm-x86/domain.h | 3
11 files changed, 351 insertions(+), 34 deletions(-)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: [PATCH 0 of 4] Patches for PCI passthrough with modified E820 (v3).
On Tue, Apr 12, 2011 at 04:31:35PM -0400, Konrad Rzeszutek Wilk wrote:
> Hello,
>
> This set of v3 patches allows a PV domain to see the machine's
> E820 and figure out where the "PCI I/O" gap is and match it with the reality.
>
> Changelog since v2 posting:
> - Moved 'libxl__e820_alloc' to be called from do_domain_create and if
> machine_e820 == true.
> - Made no_machine_e820 be set to true, if the guest has no PCI devices (and is PV)
> - Used Keir's re-worked code for E820 creation.
> Changelog since v1 posting:
> - Squashed the "x86: make the pv-only e820 array be dynamic" and
> "x86: adjust the size of the e820 for pv guest to be dynamic" together.
> - Made xc_domain_set_memmap_limit use the 'xc_domain_set_memory_map'
> - Moved 'libxl_e820_alloc' and 'libxl_e820_sanitize' to be an internal
> operation and called from 'libxl_device_pci_parse_bdf'.
> - Expanded 'libxl_device_pci_parse_bdf' API call to have an extra argument
> (optional).
>
> The short end is that with these patches a PV domain can:
>
> - Use the correct PCI I/O gap. Before these patches, Linux guest would
> boot up and would tell:
> [ 0.000000] Allocating PCI resources starting at 40000000 (gap: 40000000:c0000000)
> while in actuality the PCI I/O gap should have been:
> [ 0.000000] Allocating PCI resources starting at b0000000 (gap: b0000000:4c000000)
>
> - The PV domain with PCI devices was limited to 3GB. It now can be booted
> with 4GB, 8GB, or whatever number you want. The PCI devices will now _not_ conflict
> with System RAM. Meaning the drivers can load.
>
> - With 2.6.39 kernels (which has the 1-1 mapping code), the VM_IO flag will be
> now automatically applied to regions that are considered PCI I/O regions. You can
> find out which those are by looking for '1-1' in the kernel bootup.
>
> To use this patchset, the guest config file has to have the parameter 'pci=['<BDF>',...]'
> enabled.
>
> This has been tested with 2.6.18 (RHEL5), 2.6.27 (SLES11), 2.6.36, 2.6.37, 2.6.38,
> and 2.6.39 kernels. Also tested with PV NetBSD 5.1.
>
> Tested this with the PCI devices (NIC, MSI), and with 2GB, 4GB, and 6GB guests
> with success.

Robert, Mikel, Jean,

If you apply these patches and use a fairly modern DomU kernel (
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git #master), you should
now be able to pass any PCI device in to a guest without any VM_IO-type
workaround, and with more than 3GB.

It should be pretty obvious when your guest is using this, as the E820 in the
dmesg will mirror closely what Dom0 has.
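
One way to check this by script rather than by eye is sketched below in Python.
It assumes the boot log prints the memory map as lines of the form
"Xen: <start> - <end> (<type>)" or "BIOS-e820: <start> - <end> (<type>)", as
kernels of this era do; the helper names and any sample input are illustrative
and not part of the patches.

```python
import re

# Assumed line shape: " Xen: 0000000000000000 - 00000000000a0000 (usable)"
_E820_RE = re.compile(
    r"(?:Xen|BIOS-e820):\s+([0-9a-fA-F]+)\s+-\s+([0-9a-fA-F]+)\s+\((.+?)\)"
)


def parse_e820(dmesg_text):
    """Extract (start, end, type) tuples from dmesg output."""
    entries = []
    for line in dmesg_text.splitlines():
        match = _E820_RE.search(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            entries.append((start, end, match.group(3)))
    return entries


def non_ram(entries):
    """Return the set of (start, end) ranges that are not usable RAM."""
    return {(start, end) for start, end, kind in entries if kind != "usable"}


def shared_non_ram(dom0_dmesg, guest_dmesg):
    """Non-RAM ranges that appear identically in both boot logs, sorted."""
    dom0 = non_ram(parse_e820(dom0_dmesg))
    guest = non_ram(parse_e820(guest_dmesg))
    return sorted(dom0 & guest)
```

Feeding it the saved dmesg text from Dom0 and from the guest gives the non-RAM
ranges both agree on. An empty result would suggest the guest is still using a
flat memory map.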
