Hello all,
I've been working on enabling passthrough for newer Nvidia cards and
drivers (GTX 980 specifically) on Xen and I'd like to document my
findings up to now and ask for assistance. I apologize if this is not
the correct mailing list, but I thought xen-devel is more suitable since
we are talking about code changes in Xen anyway.
The problem with Nvidia GPUs has been (for two years now) that the drivers
detect that they are running inside a VM and refuse to work (Code 43 error)
if the card is not a Quadro or other high-end non-consumer grade GPU
(though a few other things can also cause Code 43 or a BSOD). Now, since
KVM has supported Nvidia GPU passthrough for quite a while (and I've
personally succeeded in passing through a GTX 980 using KVM on both Win 7
and Win 8.1 VMs), I decided to port those few patches from KVM to Xen.
------------------
#### Patch #1: Spoof Xen and Hypervisor signatures:
KVM has for a while supported hiding both the "KVMKVMKVM" signature
(with the "-cpu kvm=off" flag) as well as the Viridian hypervisor signature
(the "-cpu hv_vendor_id=..." flag). Currently there's no such functionality
in Xen, so I patched it in, in much the same way as Alex Williamson did
for KVM.
Attached is a patch for Xen 4.6.1 that spoofs the Xen signature
("XenVMMXenVMM" to "ZenZenZenZen") and the Viridian signature ("Microsoft
Hv" to "Wetware Labs") when "spoof_xen=1" and "spoof_viridian=1" are
added to the VM configuration file.
The signatures are hard-coded, and there's currently no way to
modify them (beyond recompiling Xen), since HVMLoader also uses a
hard-coded string to detect Xen and there's no API (understandably) to
change that signature at run time.
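For readers unfamiliar with how these signatures are exposed: a hypervisor
signature is a 12-byte ASCII string returned in EBX, ECX and EDX (four
bytes each, little-endian) by the hypervisor CPUID leaf. The following is
only a minimal Python sketch of that packing, not code from the attached
patch; the helper names signature_to_regs and regs_to_signature are made
up for illustration.

```python
import struct

def signature_to_regs(sig):
    """Pack a 12-character ASCII signature into (ebx, ecx, edx)."""
    raw = sig.encode("ascii")
    if len(raw) != 12:
        raise ValueError("hypervisor signature must be exactly 12 bytes")
    # Three little-endian 32-bit words, in EBX, ECX, EDX order.
    return struct.unpack("<3I", raw)

def regs_to_signature(ebx, ecx, edx):
    """Inverse of signature_to_regs: rebuild the string a guest would see."""
    return struct.pack("<3I", ebx, ecx, edx).decode("ascii")

if __name__ == "__main__":
    # Original and spoofed signatures mentioned in this mail.
    for sig in ("XenVMMXenVMM", "ZenZenZenZen", "Microsoft Hv", "Wetware Labs"):
        ebx, ecx, edx = signature_to_regs(sig)
        print("%-14r ebx=%08x ecx=%08x edx=%08x" % (sig, ebx, ecx, edx))
```

This also shows why any replacement string has to be exactly 12 bytes:
"ZenZenZenZen" and "Wetware Labs" both fit the three registers with no
padding.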
This works with qemu-xen-traditional without any additional changes, but
qemu-xen requires that SeaBIOS is patched as well:
https://github.com/WetwareLabs/seabios/commit/ec102d72fc1d7b2e6c8e9607266dc9bd4a42bce0
With spoofing on, it was possible to use the official binary drivers from
Nvidia (tested version 367.27) on an Arch Linux VM (without spoofing, the
driver would fail with an error message such as "The NVIDIA GPU at
PCI:0:5:0 is not supported by the 367.27 NVIDIA driver"). However, this
was not enough on Windows VMs, as Code 43 would occur regardless of
spoofing.
#### Patch #2: Disable NoSnoop.
Background information and the related patch for KVM are here:
https://patchwork.kernel.org/patch/3019371/
The fix was quite simple for Xen: just modify the initial PCIe DEVCTL
capabilities to disable NoSnoop, and make the capability read-only.
Double-checking with a Linux VM, I can see that NoSnoop is disabled for
all devices (with lspci -vvv), but this did not prevent Code 43 on the
Windows VM.
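To make the idea concrete, here is a small Python model of an emulated
PCIe Device Control register whose "Enable No Snoop" bit (bit 11, 0x0800)
is cleared at initialisation and then treated as read-only on guest
writes. This is only a sketch of the masking logic under that assumption;
the real change lives in the C passthrough code, and the class name
EmulatedDevctl is invented for this example.

```python
# Bit 11 of the PCIe Device Control register: "Enable No Snoop".
PCI_EXP_DEVCTL_NOSNOOP_EN = 0x0800

class EmulatedDevctl:
    """Toy model of a 16-bit DEVCTL register with NoSnoop forced off."""

    # Bits the guest is not allowed to change.
    RO_MASK = PCI_EXP_DEVCTL_NOSNOOP_EN

    def __init__(self, hw_value):
        # Start from the hardware value, but with NoSnoop disabled.
        self.value = hw_value & ~PCI_EXP_DEVCTL_NOSNOOP_EN & 0xFFFF

    def read(self):
        return self.value

    def write(self, new_value):
        # Keep read-only bits as they are, take everything else from the guest.
        kept = self.value & self.RO_MASK
        changed = new_value & ~self.RO_MASK
        self.value = (kept | changed) & 0xFFFF

if __name__ == "__main__":
    reg = EmulatedDevctl(0x2810)       # hardware value with NoSnoop set
    print("initial: 0x%04x" % reg.read())
    reg.write(0xFFFF)                  # guest tries to set every bit
    print("after write: 0x%04x" % reg.read())
```

A guest that tries to re-enable NoSnoop sees the write silently dropped
for that one bit, which matches what lspci -vvv reports inside the Linux
VM.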
#### Patch #3: Set CPUID to Core2duo
There have been a few reports that forcing CPUID to Core2duo on KVM (-cpu
core2duo) helps alleviate Code 43 problems (and also increases
compatibility with Windows 10 VMs), so I copied all CPUID registers from a
proven-to-be-working KVM configuration using libcpuid
(https://github.com/anrieff/libcpuid) and applied them to the Xen VM. LibXL
is also patched (attached file) to allow hexadecimal input of CPUID (to
make it easier to convert the CPUID output from libcpuid).
cpuid = [
'0:eax=0000000a,ebx=756e6547,ecx=6c65746e,edx=49656e69',
'1:eax=000006fb,ebx=00000800,ecx=80202201,edx=0f8bfbff',
'2:eax=00000001,ebx=00000000,ecx=00000000,edx=002c307d',
'3:eax=00000000,ebx=00000000,ecx=00000000,edx=00000000',
'4,0:eax=00000121,ebx=01c0003f,ecx=0000003f,edx=00000001',
'4,1:eax=00000122,ebx=01c0003f,ecx=0000003f,edx=00000001',
'4,2:eax=00000000,ebx=00000000,ecx=00000000,edx=00000000',
'4,3:eax=00000000,ebx=00000000,ecx=00000000,edx=00000000',
'5:eax=00000000,ebx=00000000,ecx=00000000,edx=00000000',
'6:eax=00000000,ebx=00000000,ecx=00000000,edx=00000000',
'7,0:eax=00000000,ebx=00000000,ecx=00000000,edx=00000000',
'0x80000000:eax=80000008,ebx=756e6547,ecx=6c65746e,edx=49656e69',
'0x80000001:eax=000006fb,ebx=00000000,ecx=00000001,edx=20100800',
'0x80000002:eax=65746e49,ebx=2952286c,ecx=726f4320,edx=4d542865',
'0x80000003:eax=44203229,ebx=43206f75,ecx=20205550,edx=54202020',
'0x80000004:eax=30303737,ebx=20402020,ecx=30342e32,edx=007a4847',
'0x80000005:eax=01ff01ff,ebx=01ff01ff,ecx=40020140,edx=40020140',
'0x80000006:eax=00000000,ebx=42004200,ecx=02008140,edx=00000000',
'0x80000007:eax=00000000,ebx=00000000,ecx=00000000,edx=00000000',
'0x80000008:eax=00003028,ebx=00000000,ecx=00000000,edx=00000000'
]
This makes /proc/cpuinfo almost identical between KVM and Xen VMs
running Linux. The only exceptions are the flags "rep_good" (which is
missing under Xen) and "eager_fpu" and "xsaveopt" (not seen under KVM), but
as these are not explicitly set by CPUID but are Linux-specific flags, they
shouldn't (?) matter on Windows VMs.
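As a sanity check on the register values in the cpuid list above, the
vendor string (leaf 0), the brand string (leaves 0x80000002 to
0x80000004) and the family/model/stepping (leaf 1 EAX) can be decoded
with a few lines of Python. This is just an illustrative sketch using the
values from the list; the helper names are made up and it does not touch
libcpuid or libxl.

```python
import struct

def regs_to_text(*regs):
    """Concatenate 32-bit little-endian registers into an ASCII string."""
    raw = struct.pack("<%dI" % len(regs), *regs)
    return raw.split(b"\0")[0].decode("ascii")

def decode_signature(eax):
    """Return (family, model, stepping) from CPUID leaf 1 EAX."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    if family in (0x6, 0xF):
        model |= ext_model << 4       # extended model only applies here
    if family == 0xF:
        family += ext_family
    return family, model, stepping

# Leaf 0: the vendor string is stored in EBX, EDX, ECX order.
vendor = regs_to_text(0x756e6547, 0x49656e69, 0x6c65746e)

# Leaves 0x80000002..0x80000004: EAX, EBX, ECX, EDX of each, in order.
brand = regs_to_text(
    0x65746e49, 0x2952286c, 0x726f4320, 0x4d542865,
    0x44203229, 0x43206f75, 0x20205550, 0x54202020,
    0x30303737, 0x20402020, 0x30342e32, 0x007a4847,
)

if __name__ == "__main__":
    print("vendor:", vendor)
    print("brand: ", brand)
    print("family/model/stepping:", decode_signature(0x000006fb))
```

If the list was transcribed correctly, the vendor decodes to
"GenuineIntel" and the brand string identifies a Core 2 Duo, which is what
the guest should report.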
------------------
Anyway, even applying all of these patches did not alleviate Code 43.
To be more specific, all Nvidia drivers up to 364.72 would BSOD on boot
(SYSTEM_SERVICE_EXCEPTION), and newer drivers (368.22+) would cause Code
43. This happens on both Windows 7 Pro and 8.1 VMs. The result on qemu-xen
and qemu-xen-traditional is identical. Dom0 is Qubes 3.1 (Linux 4.1.24),
Xen 4.6.1. Hardware: Intel i7-5820K, Asrock X99 WS motherboard, 32GB
Corsair memory, EVGA GTX 980.
I would love it if some of you could try these patches with both newer and
older Nvidia cards. Any suggestions, ideas and further patches
would also be greatly appreciated! :)
Thanks!
Best regards,
Marcus