Mailing List Archive

UEFI support on Dell boxes (was: Re: Status of 4.13)
On Fri, Nov 22, 2019 at 10:00:13PM -0800, Roman Shaposhnik wrote:
> 3. Bad news: Marek's suggestion didn't work on Dell product line (and yes
> I double checked that I built it correctly).
>
> So... when it comes to RC2 regression -- we're all good.
>
> But since we're here anyway -- I'm wondering if anyone would be
> interested in helping me figure out why Xen on those Dell boxes coredumps
> without efi=no-rs ?
>
> Marek, any chance I can interest you in helping me a bit here? ;-)

Yes, I am interested in helping with the UEFI state there. Do you by any
chance have the messages from that crash (without efi=no-rs, but with
EFI_SET_VIRTUAL_ADDRESS_MAP enabled)? Or even a photo, if no serial output is
available?

PS: I trimmed the CC list, as this isn't really 'Status of 4.13' anymore.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: UEFI support on Dell boxes (was: Re: Status of 4.13)
On Sun, Nov 24, 2019 at 4:48 PM Marek Marczykowski-Górecki
<marmarek@invisiblethingslab.com> wrote:
>
> On Fri, Nov 22, 2019 at 10:00:13PM -0800, Roman Shaposhnik wrote:
> > 3. Bad news: Marek's suggestion didn't work on Dell product line (and yes
> > I double checked that I built it correctly).
> >
> > So... when it comes to RC2 regression -- we're all good.
> >
> > But since we're here anyway -- I'm wondering if anyone would be
> > interested in helping me figure out why Xen on those Dell boxes coredumps
> > without efi=no-rs ?
> >
> > Marek, any chance I can interest you in helping me a bit here? ;-)
>
> Yes, I am interested in helping with UEFI state there.

Thanks! That's very much appreciated!

Btw, I'll keep CCing xen-devel in case anyone else is interested in
this conversation.

> Do you have by
> a chance messages of that crash (without efi=no-rs, but with
> EFI_SET_VIRTUAL_ADDRESS_MAP enabled)? Or even a photo if no serial output is
> available?

With my awesome soldering skills ;-) I managed to rig a serial console.

Output is attached. Please let me know if you'd like me to run any
other experiments.

Thanks,
Roman.
Re: UEFI support on Dell boxes (was: Re: Status of 4.13)
On Mon, Nov 25, 2019 at 07:44:03PM -0800, Roman Shaposhnik wrote:
> On Sun, Nov 24, 2019 at 4:48 PM Marek Marczykowski-Górecki
> <marmarek@invisiblethingslab.com> wrote:
> > Do you have by
> > a chance messages of that crash (without efi=no-rs, but with
> > EFI_SET_VIRTUAL_ADDRESS_MAP enabled)? Or even a photo if no serial output is
> > available?
>
> With my awesome soldering skills ;-) I managed to rig a serial console.
>
> Output is attached. Please let me know if you'd like me to run any
> other experiments.

Looks helpful, let's try to do something:

> Xen 4.13.0-rc
> (XEN) Xen version 4.13.0-rc (@) (gcc (Alpine 6.4.0) 6.4.0) debug=y Tue Nov 26 03:19:38 UTC 2019
> (XEN) Latest ChangeSet:
> (XEN) build-id: 07aa9f711fe09a91be2588ee7df10d93ebe34c80
> (XEN) Bootloader: GRUB 2.03
> (XEN) Command line: com1=115200,8n1 console=com1 loglvl=all noreboot dom0_mem=640M,max:640M dom0_max_vcpus=1 dom0_vcpus_pin smt=false
(...)
> (XEN) EFI memory map:
(...)
> (XEN) 0000077587000-00000775f4fff type=5 attr=800000000000000f

The crashing code is in this range (see the call trace below): it is
runtime services code, so somewhere within the actual UEFI firmware.

(...)
> (XEN) 00000ff900000-00000ffffffff type=11 attr=8000000000000000
> (XEN) Unknown cachability for MFNs 0xff900-0xfffff

The faulting address is in this range. And because of unknown
cachability, it isn't mapped. Try adding 'efi=attr=uc' to the Xen
cmdline.

(...)

> (XEN) Xen call trace:
> (XEN) [<00000000775e0d21>] R 00000000775e0d21
> (XEN) [<00000000775ddb8e>] S 00000000775ddb8e
> (XEN) [<0000000000000000>] F 0000000000000000
> (XEN) [<7fffffff00000000>] F 7fffffff00000000
> (XEN)
> (XEN) Pagetable walk from 00000000ff920020:
> (XEN) L4[0x000] = 00000000787c0063 ffffffffffffffff
> (XEN) L3[0x003] = 0000000071298063 ffffffffffffffff
> (XEN) L2[0x1fc] = 0000000000000000 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 00000000ff920020
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
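
As a cross-check of the page-table walk above, here is a small Python sketch
(not part of the original log; the helper name pt_indices is made up for
illustration). It splits the faulting linear address into x86-64 4-level
paging indices and checks whether its frame lies in the type=11 range that
Xen reported as having unknown cachability. It assumes 4 KiB pages and
9 index bits per level.

```python
# Illustrative only: decode the faulting address from the log above.
PAGE_SHIFT = 12   # 4 KiB pages
INDEX_BITS = 9    # 512 entries per page-table level


def pt_indices(addr):
    """Return (L4, L3, L2, L1) table indices for an x86-64 linear address."""
    mask = (1 << INDEX_BITS) - 1
    return tuple((addr >> (PAGE_SHIFT + INDEX_BITS * level)) & mask
                 for level in (3, 2, 1, 0))


fault = 0x00000000ff920020
l4, l3, l2, l1 = pt_indices(fault)
mfn = fault >> PAGE_SHIFT

# Range reported as "Unknown cachability for MFNs 0xff900-0xfffff".
in_mmio_range = 0xff900 <= mfn <= 0xfffff

print(f"L4[{l4:#05x}] L3[{l3:#05x}] L2[{l2:#05x}] L1[{l1:#05x}]")
print(f"MFN {mfn:#x} in unmapped type=11 range: {in_mmio_range}")
```

The L4/L3/L2 indices come out as 0x000, 0x003 and 0x1fc, matching the walk in
the log, where the L2 entry is zero (nothing mapped), and the frame does fall
inside the type=11 range.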


--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: UEFI support on Dell boxes (was: Re: Status of 4.13)
On Mon, Nov 25, 2019 at 7:55 PM Marek Marczykowski-Górecki
<marmarek@invisiblethingslab.com> wrote:
>
> On Mon, Nov 25, 2019 at 07:44:03PM -0800, Roman Shaposhnik wrote:
> > On Sun, Nov 24, 2019 at 4:48 PM Marek Marczykowski-Górecki
> > <marmarek@invisiblethingslab.com> wrote:
> > > Do you have by
> > > a chance messages of that crash (without efi=no-rs, but with
> > > EFI_SET_VIRTUAL_ADDRESS_MAP enabled)? Or even a photo if no serial output is
> > > available?
> >
> > With my awesome soldering skills ;-) I managed to rig a serial console.
> >
> > Output is attached. Please let me know if you'd like me to run any
> > other experiments.
>
> Looks helpful, lets try to do something:
>
> > Xen 4.13.0-rc
> > (XEN) Xen version 4.13.0-rc (@) (gcc (Alpine 6.4.0) 6.4.0) debug=y Tue Nov 26 03:19:38 UTC 2019
> > (XEN) Latest ChangeSet:
> > (XEN) build-id: 07aa9f711fe09a91be2588ee7df10d93ebe34c80
> > (XEN) Bootloader: GRUB 2.03
> > (XEN) Command line: com1=115200,8n1 console=com1 loglvl=all noreboot dom0_mem=640M,max:640M dom0_max_vcpus=1 dom0_vcpus_pin smt=false
> (...)
> > (XEN) EFI memory map:
> (...)
> > (XEN) 0000077587000-00000775f4fff type=5 attr=800000000000000f
>
> This is code that crashes - runtime services code, so somewhere with
> actual UEFI code.

Yup -- that was my hunch with adding efi=no-rs option.

> (...)
> > (XEN) 00000ff900000-00000ffffffff type=11 attr=8000000000000000
> > (XEN) Unknown cachability for MFNs 0xff900-0xfffff
>
> The faulting address is in this range. And because of unknown
> cachability, it isn't mapped. Try adding 'efi=attr=uc' to the Xen
> cmdline.

Feels like we're getting exactly the same failure. Log attached.

Thanks,
Roman.
Re: UEFI support on Dell boxes (was: Re: Status of 4.13)
Hi Marek, after applying Jan's patch I'm making much further progress.
Xen boots fine and Dom0 seems to be OK (though more tests are needed on
my end).

I'm attaching the logs from Xen and Dom0.

At this point it seems that adding efi=attr=uc is a better option for
these boxes than a wholesale efi=no-rs.

Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was
supposed to cover by default (so I don't have to add efi=attr=uc)?

Question #2: is there any downside to *always* specifying efi=attr=uc?
Even for servers that, strictly speaking, don't need it?

Thanks,
Roman.

On Mon, Nov 25, 2019 at 11:02 PM Roman Shaposhnik <roman@zededa.com> wrote:
>
> On Mon, Nov 25, 2019 at 7:55 PM Marek Marczykowski-Górecki
> <marmarek@invisiblethingslab.com> wrote:
> >
> > On Mon, Nov 25, 2019 at 07:44:03PM -0800, Roman Shaposhnik wrote:
> > > On Sun, Nov 24, 2019 at 4:48 PM Marek Marczykowski-Górecki
> > > <marmarek@invisiblethingslab.com> wrote:
> > > > Do you have by
> > > > a chance messages of that crash (without efi=no-rs, but with
> > > > EFI_SET_VIRTUAL_ADDRESS_MAP enabled)? Or even a photo if no serial output is
> > > > available?
> > >
> > > With my awesome soldering skills ;-) I managed to rig a serial console.
> > >
> > > Output is attached. Please let me know if you'd like me to run any
> > > other experiments.
> >
> > Looks helpful, lets try to do something:
> >
> > > Xen 4.13.0-rc
> > > (XEN) Xen version 4.13.0-rc (@) (gcc (Alpine 6.4.0) 6.4.0) debug=y Tue Nov 26 03:19:38 UTC 2019
> > > (XEN) Latest ChangeSet:
> > > (XEN) build-id: 07aa9f711fe09a91be2588ee7df10d93ebe34c80
> > > (XEN) Bootloader: GRUB 2.03
> > > (XEN) Command line: com1=115200,8n1 console=com1 loglvl=all noreboot dom0_mem=640M,max:640M dom0_max_vcpus=1 dom0_vcpus_pin smt=false
> > (...)
> > > (XEN) EFI memory map:
> > (...)
> > > (XEN) 0000077587000-00000775f4fff type=5 attr=800000000000000f
> >
> > This is code that crashes - runtime services code, so somewhere with
> > actual UEFI code.
>
> Yup -- that was my hunch with adding efi=no-rs option.
>
> > (...)
> > > (XEN) 00000ff900000-00000ffffffff type=11 attr=8000000000000000
> > > (XEN) Unknown cachability for MFNs 0xff900-0xfffff
> >
> > The faulting address is in this range. And because of unknown
> > cachability, it isn't mapped. Try adding 'efi=attr=uc' to the Xen
> > cmdline.
>
> Feels like we're getting exactly the same failure. Log attached.
>
> Thanks,
> Roman.
Re: UEFI support on Dell boxes (was: Re: Status of 4.13)
On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote:
> Hi Marek, after applying Jan's patch I'm making much further progress.
> Xen boots fine and Dom0 seems to be OK (more tests are needed tho on
> my end).
>
> I'm attaching the logs from Xen and Dom0.
>
> At this point it seems that adding efi=attr=uc is a better option for
> these boxes than a wholesale efi=no-rs
>
> Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was
> supposed to cover by default (so I don't have to add efi=attr=uc)?

No, this looks like some different firmware (?) issue.

> Question #2: is there any downside to *always* specifying efi=attr=uc?
> Even for servers that, strictly speaking, don't need it?

TL;DR: It should be fine. It is what Linux does too.

Details:

Let's take a look at why 'efi=attr=uc' helps, and how we can make it work
out of the box:

The issue is about memory marked as type=11 (EfiMemoryMappedIO) with
attr=8000000000000000 (EFI_MEMORY_RUNTIME). Indeed, none of the cacheability
attributes is set. For the record, the defined attributes are (UEFI spec
.6):

EFI_MEMORY_UC Memory cacheability attribute: The memory region supports
being configured as not cacheable.

EFI_MEMORY_WC Memory cacheability attribute: The memory region supports
being configured as write combining.

EFI_MEMORY_WT Memory cacheability attribute: The memory region supports
being configured as cacheable with a “write through” policy.
Writes that hit in the cache will also be written to main memory.

EFI_MEMORY_WB Memory cacheability attribute: The memory region supports
being configured as cacheable with a “write back” policy. Reads
and writes that hit in the cache do not propagate to main memory.
Dirty data is written back to main memory when a new cache line
is allocated.

EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports
being configured as not cacheable, exported, and supports the
“fetch and add” semaphore mechanism.
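
To make the two attr= values from the log concrete, here is a small Python
sketch that decodes them into the bit names listed above. The bit values are
the ones defined for the UEFI memory descriptor attribute field; the helper
names decode_attr and has_cacheability are made up for illustration, and
other attribute bits (WP, RP, XP, ...) are left out for brevity.

```python
# Illustrative only: decode EFI memory descriptor attributes from the log.
EFI_MEMORY_BITS = {
    "EFI_MEMORY_UC": 0x1,
    "EFI_MEMORY_WC": 0x2,
    "EFI_MEMORY_WT": 0x4,
    "EFI_MEMORY_WB": 0x8,
    "EFI_MEMORY_UCE": 0x10,
    "EFI_MEMORY_RUNTIME": 0x8000000000000000,
}

# UC | WC | WT | WB | UCE: the five cacheability attributes listed above.
CACHEABILITY_MASK = 0x1f


def decode_attr(attr):
    """Return the names of the known bits set in an attribute value."""
    return [name for name, bit in EFI_MEMORY_BITS.items() if attr & bit]


def has_cacheability(attr):
    """True if at least one cacheability attribute is present."""
    return bool(attr & CACHEABILITY_MASK)


code_attr = 0x800000000000000f  # the type=5 runtime services code region
mmio_attr = 0x8000000000000000  # the type=11 region that faults

print(decode_attr(code_attr), has_cacheability(code_attr))
print(decode_attr(mmio_attr), has_cacheability(mmio_attr))
```

The type=5 region advertises UC, WC, WT and WB plus RUNTIME, while the
type=11 region carries only RUNTIME, which is why no cache mode can be
derived for it.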

My reading of the UEFI spec doesn't give many hints about what to do with
memory mappings that lack any cacheability attribute. The only related info
I've found is about EfiMemoryMappedIO:

This memory is not used by the OS. All system memory-mapped IO
information should come from ACPI tables.

So, maybe there is some more info?

Anyway, if I understand correctly, an MMIO region should be mapped as UC,
right?

I've also taken a look at what Linux does. Basically, the only bit
Linux cares about is EFI_MEMORY_WB: if it's absent, the region is set
as uncacheable (page cache disable bit in the page table entry). So
Linux by default does what Xen's efi=attr=uc does.

So, to improve Xen's hardware/firmware compatibility, I have two ideas:

1. Make efi=attr=uc the default (it's still possible to disable it with
efi=attr=no).

2. Map type=11 (MMIO) as UC, unless attributes specify otherwise.
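
The current behaviour and the two ideas above can be modelled roughly as
follows. This is a simplified Python sketch of the policy as described in
this thread, not Xen's actual code: the function choose_mapping, its
parameters, and the order in which the attributes are preferred are all
assumptions made for illustration.

```python
# Illustrative model only; names and preference order are assumptions.
EFI_MEMORY_UC = 0x1
EFI_MEMORY_WC = 0x2
EFI_MEMORY_WT = 0x4
EFI_MEMORY_WB = 0x8
EFI_MEMORY_UCE = 0x10
MEM_TYPE_MMIO = 11  # "type=11" (EfiMemoryMappedIO) in the memory map


def choose_mapping(mem_type, attr, attr_uc=False, mmio_defaults_to_uc=False):
    """Return a cache mode string, or None if the region stays unmapped.

    attr_uc              models idea 1 (efi=attr=uc in effect by default).
    mmio_defaults_to_uc  models idea 2 (only type=11 falls back to UC).
    """
    if attr & EFI_MEMORY_WB:
        return "WB"
    if attr & EFI_MEMORY_WT:
        return "WT"
    if attr & EFI_MEMORY_WC:
        return "WC"
    if attr & (EFI_MEMORY_UC | EFI_MEMORY_UCE):
        return "UC"
    # No cacheability attribute at all: the "Unknown cachability" case.
    if attr_uc:
        return "UC"
    if mmio_defaults_to_uc and mem_type == MEM_TYPE_MMIO:
        return "UC"
    return None


mmio_attr = 0x8000000000000000
print(choose_mapping(MEM_TYPE_MMIO, mmio_attr))                            # current
print(choose_mapping(MEM_TYPE_MMIO, mmio_attr, attr_uc=True))              # idea 1
print(choose_mapping(MEM_TYPE_MMIO, mmio_attr, mmio_defaults_to_uc=True))  # idea 2
```

In this model the two ideas differ only for regions that have no
cacheability attribute and are not type=11: idea 1 maps them as UC, idea 2
still leaves them unmapped.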

Any preference? I can prepare a patch for either version. But I guess
it's too late for getting it into 4.13.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: UEFI support on Dell boxes (was: Re: Status of 4.13)
On Tue, Nov 26, 2019 at 10:32 AM Marek Marczykowski-Górecki
<marmarek@invisiblethingslab.com> wrote:
>
> On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote:
> > Hi Marek, after applying Jan's patch I'm making much further progress.
> > Xen boots fine and Dom0 seems to be OK (more tests are needed tho on
> > my end).
> >
> > I'm attaching the logs from Xen and Dom0.
> >
> > At this point it seems that adding efi=attr=uc is a better option for
> > these boxes than a wholesale efi=no-rs
> >
> > Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was
> > supposed to cover by default (so I don't have to add efi=attr=uc)?
>
> No, this looks like some different firmware (?) issue.
>
> > Question #2: is there any downside to *always* specifying efi=attr=uc?
> > Even for servers that, strictly speaking, don't need it?
>
> TL;DR: It should be fine. It is what Linux does too.
>
> Details:
>
> Lets take a look why 'efi=attr=uc' helps, and how can we make it work
> out of the box:
>
> The issue is about memory marked as type=11 (EfiMemoryMappedIO) with
> attr=8000000000000000 (EFI_MEMORY_RUNTIME). Indeed none of cachability
> attribute is defined. For the record, defined attributes are (UEFI spec
> .6):
>
> EFI_MEMORY_UC Memory cacheability attribute: The memory region supports
> being configured as not cacheable.
>
> EFI_MEMORY_WC Memory cacheability attribute: The memory region supports
> being configured as write combining.
>
> EFI_MEMORY_WT Memory cacheability attribute: The memory region supports
> being configured as cacheable with a “write through” policy.
> Writes that hit in the cache will also be written to main memory.
>
> EFI_MEMORY_WB Memory cacheability attribute: The memory region supports
> being configured as cacheable with a “write back” policy. Reads
> and writes that hit in the cache do not propagate to main memory.
> Dirty data is written back to main memory when a new cache line
> is allocated.
>
> EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports
> being configured as not cacheable, exported, and supports the
> “fetch and add” semaphore mechanism.
>
> My reading of UEFI spec doesn't give much hints what to do with memory
> mappings without any cachability attribute. The only related info I've
> found is about EfiMemoryMappedIO:
>
> This memory is not used by the OS. All system memory-mapped IO
> information should come from ACPI tables.
>
> So, maybe there is some more info?
>
> Anyway, if I understand correctly, MMIO region should be mapped as UC,
> right?
>
> I've also taken look at what Linux does. And basically, the only bit
> Linux care about is EFI_MEMORY_WB - if it's absent, then set the region
> as uncachable (page cache disabled bit in page table entry). So,
> basically Linux by default does what Xen's efi=attr=uc does.

Very interesting! Thanks for doing the research.

> So, to improve Xen's hardware/firmware compatibility, I have two ideas:
>
> 1. Make efi=attr=uc the default (it's still possible to disable it with
> efi=attr=no).

I'd be very much in favor of that too (especially since it seems to match
Linux behaviour). What do others think?

> 2. Map type=11 (MMIO) as UC, unless attributes specify otherwise.

This seems to be a subset of option #1. As such -- perhaps it
is "safer" than a wholesale efi=attr=uc, but at the same time Linux
behaviour gives me pretty good confidence that we should probably
be safe, no?

> Any preference? I can prepare a patch for either version. But I guess
> it's too late for getting it into 4.13.

Good question as well.

Thanks,
Roman.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: UEFI support on Dell boxes (was: Re: Status of 4.13)
On 26/11/2019 20:12, Roman Shaposhnik wrote:
> On Tue, Nov 26, 2019 at 10:32 AM Marek Marczykowski-Górecki
> <marmarek@invisiblethingslab.com> wrote:
>> On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote:
>>> Hi Marek, after applying Jan's patch I'm making much further progress.
>>> Xen boots fine and Dom0 seems to be OK (more tests are needed tho on
>>> my end).
>>>
>>> I'm attaching the logs from Xen and Dom0.
>>>
>>> At this point it seems that adding efi=attr=uc is a better option for
>>> these boxes than a wholesale efi=no-rs
>>>
>>> Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was
>>> supposed to cover by default (so I don't have to add efi=attr=uc)?
>> No, this looks like some different firmware (?) issue.
>>
>>> Question #2: is there any downside to *always* specifying efi=attr=uc?
>>> Even for servers that, strictly speaking, don't need it?
>> TL;DR: It should be fine. It is what Linux does too.
>>
>> Details:
>>
>> Lets take a look why 'efi=attr=uc' helps, and how can we make it work
>> out of the box:
>>
>> The issue is about memory marked as type=11 (EfiMemoryMappedIO) with
>> attr=8000000000000000 (EFI_MEMORY_RUNTIME). Indeed none of cachability
>> attribute is defined. For the record, defined attributes are (UEFI spec
>> .6):
>>
>> EFI_MEMORY_UC Memory cacheability attribute: The memory region supports
>> being configured as not cacheable.
>>
>> EFI_MEMORY_WC Memory cacheability attribute: The memory region supports
>> being configured as write combining.
>>
>> EFI_MEMORY_WT Memory cacheability attribute: The memory region supports
>> being configured as cacheable with a “write through” policy.
>> Writes that hit in the cache will also be written to main memory.
>>
>> EFI_MEMORY_WB Memory cacheability attribute: The memory region supports
>> being configured as cacheable with a “write back” policy. Reads
>> and writes that hit in the cache do not propagate to main memory.
>> Dirty data is written back to main memory when a new cache line
>> is allocated.
>>
>> EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports
>> being configured as not cacheable, exported, and supports the
>> “fetch and add” semaphore mechanism.
>>
>> My reading of UEFI spec doesn't give much hints what to do with memory
>> mappings without any cachability attribute. The only related info I've
>> found is about EfiMemoryMappedIO:
>>
>> This memory is not used by the OS. All system memory-mapped IO
>> information should come from ACPI tables.
>>
>> So, maybe there is some more info?
>>
>> Anyway, if I understand correctly, MMIO region should be mapped as UC,
>> right?
>>
>> I've also taken look at what Linux does. And basically, the only bit
>> Linux care about is EFI_MEMORY_WB - if it's absent, then set the region
>> as uncachable (page cache disabled bit in page table entry). So,
>> basically Linux by default does what Xen's efi=attr=uc does.
> Very interesting! Thanks for doing the research.
>
>> So, to improve Xen's hardware/firmware compatibility, I have two ideas:
>>
>> 1. Make efi=attr=uc the default (it's still possible to disable it with
>> efi=attr=no).
> I'd be very much in favor of that too (especially since it seems to match
> Linux behaviour) What do others think?

It's more than just this. Linux also doesn't use EFI reboot because it
is broken almost everywhere (because Windows doesn't use it because it's
broken almost everywhere, so it never gets fixed).

Xen should be following Linux, but I'm exhausted arguing this point.

A consequence is that downstreams tend to share a pile of "unbreak Xen on
UEFI" patches which have been rejected upstream on philosophical rather
than technical grounds, despite this being a toxic environment to work in.

~Andrew

Re: UEFI support on Dell boxes (was: Re: Status of 4.13)
On Nov 26, 2019, at 15:23, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
> On 26/11/2019 20:12, Roman Shaposhnik wrote:
>>> On Tue, Nov 26, 2019 at 10:32 AM Marek Marczykowski-Górecki
>>> <marmarek@invisiblethingslab.com> wrote:
>>> On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote:
>>>> Hi Marek, after applying Jan's patch I'm making much further progress.
>>>> Xen boots fine and Dom0 seems to be OK (more tests are needed tho on
>>>> my end).
>>>> I'm attaching the logs from Xen and Dom0.
>>>> At this point it seems that adding efi=attr=uc is a better option for
>>>> these boxes than a wholesale efi=no-rs
>>>> Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was
>>>> supposed to cover by default (so I don't have to add efi=attr=uc)?
>>> No, this looks like some different firmware (?) issue.
>>>> Question #2: is there any downside to *always* specifying efi=attr=uc?
>>>> Even for servers that, strictly speaking, don't need it?
>>> TL;DR: It should be fine. It is what Linux does too.
>>> Details:
>>> Lets take a look why 'efi=attr=uc' helps, and how can we make it work
>>> out of the box:
>>> The issue is about memory marked as type=11 (EfiMemoryMappedIO) with
>>> attr=8000000000000000 (EFI_MEMORY_RUNTIME). Indeed none of cachability
>>> attribute is defined. For the record, defined attributes are (UEFI spec
>>> .6):
>>> EFI_MEMORY_UC Memory cacheability attribute: The memory region supports
>>> being configured as not cacheable.
>>> EFI_MEMORY_WC Memory cacheability attribute: The memory region supports
>>> being configured as write combining.
>>> EFI_MEMORY_WT Memory cacheability attribute: The memory region supports
>>> being configured as cacheable with a “write through” policy.
>>> Writes that hit in the cache will also be written to main memory.
>>> EFI_MEMORY_WB Memory cacheability attribute: The memory region supports
>>> being configured as cacheable with a “write back” policy. Reads
>>> and writes that hit in the cache do not propagate to main memory.
>>> Dirty data is written back to main memory when a new cache line
>>> is allocated.
>>> EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports
>>> being configured as not cacheable, exported, and supports the
>>> “fetch and add” semaphore mechanism.
>>> My reading of UEFI spec doesn't give much hints what to do with memory
>>> mappings without any cachability attribute. The only related info I've
>>> found is about EfiMemoryMappedIO:
>>> This memory is not used by the OS. All system memory-mapped IO
>>> information should come from ACPI tables.
>>> So, maybe there is some more info?
>>> Anyway, if I understand correctly, MMIO region should be mapped as UC,
>>> right?
>>> I've also taken look at what Linux does. And basically, the only bit
>>> Linux care about is EFI_MEMORY_WB - if it's absent, then set the region
>>> as uncachable (page cache disabled bit in page table entry). So,
>>> basically Linux by default does what Xen's efi=attr=uc does.
>> Very interesting! Thanks for doing the research.
>>
>>> So, to improve Xen's hardware/firmware compatibility, I have two ideas:
>>> 1. Make efi=attr=uc the default (it's still possible to disable it with
>>> efi=attr=no).
>> I'd be very much in favor of that too (especially since it seems to match
>> Linux behaviour) What do others think?
>
> Its more than just this. Linux also doesn't use EFI reboot because it
> is broken almost everywhere (because Windows doesn't use it because its
> broken almost everywhere, so it never gets fixed).
>
> Xen should be following Linux, but I'm exhausted arguing this point.
>
> A consequence is that downstream tend to share a pile of "unbreak Xen on
> UEFI" patches which have been rejected upstream on philosophical rather
> than technical grounds, despite this being a toxic environment to work in.

As an intermediate step, could we have an umbrella opt-in Kconfig option (CONFIG_EFI_NONSPEC_COMPATIBILITY?) that enables multiple EFI options for maximum hardware compatibility? For this thread and Xen 4.13, that would be EFI_SET_VIRTUAL_ADDRESS_MAP and efi=attr=uc. If more options/quirks are added in the future, downstreams using EFI_NONSPEC_COMPATIBILITY would get them by default.

The long-term solution is an OSS virtualization-security test tool (e.g. with Xen and QEMU KVM) that can be run by OEM/ODM QA factory teams on pre-production firmware and hardware. That is the most OEM-actionable development window where firmware quality issues can be detected and fixed. Microsoft's hardware logo/certification work with Windows 10 OEMs on "secured core" features is also tackling firmware improvements for virtualization-based security.

From the business side, Dell/HP/Lenovo + other OEMs and ODMs could add premium "FirmCare" SKUs into their custom build ordering systems, where customers could pay a small fee for additional firmware support, custom root-of-trust (e.g. BootGuard) key management, or even coreboot. This could move from cost-center incentives [1] to high-margin incentives [2] for firmware and platform health, safety & security. Another step would be including firmware requirements in supply chain contracts [3] for large customer orders.

While we wait on these ecosystem improvements, CONFIG_EFI_NONSPEC_COMPATIBILITY or a similar option for Xen 4.13 would help users of existing platforms.

Rich


[1] Firmware is the new Software, https://www.platformsecuritysummit.com/2018/speaker/hudson/

[2] https://i.blackhat.com/USA-19/Thursday/us-19-Krstic-Behind-The-Scenes-Of-IOS-And-Mas-Security.pdf

[3] "Humans" videos and Q&A, https://www.platformsecuritysummit.com/2019/videos/
Re: UEFI support on Dell boxes (was: Re: Status of 4.13)
On Tue, Nov 26, 2019 at 1:20 PM Rich Persaud <persaur@gmail.com> wrote:
>
> As an intermediate step, could we have an umbrella opt-in Kconfig option (CONFIG_EFI_NONSPEC_COMPATIBILITY?) that enables multiple EFI options for maximum hardware compatibility? For this thread and Xen 4.13, that would be EFI_SET_VIRTUAL_ADDRESS_MAP and efi=attr=uc. If more options/quirks are added in the future, downstreams using EFI_NONSPEC_COMPATIBILITY would get them by default.

As one of those downstream users I have to say I like this A LOT!

> The long-term solution is an OSS virtualization-security test tool (e.g. with Xen and QEMU KVM) that can be run by OEM/ODM QA factory teams on pre-production firmware and hardware. That is the most OEM-actionable development window where firmware quality issues can be detected and fixed. Microsoft's hardware logo/certification work with Windows 10 OEMs on "secured core" features is also tackling firmware improvements for virtualization-based security.

That's a good proposal, but the question, as always, becomes who moves
the needle on this one, so that we avoid a sort of "tragedy of the
commons" situation.

Now, I'm not even talking about writing (and maintaining!) the actual
code -- but rather all the BD activities that would have to take place
to make it a reality. This actually may be a good question to ask
Linux Foundation since I've seen them be helpful in situations like
this.

> From the business side, Dell/HP/Lenovo + other OEMs and ODMs could add premium "FirmCare" SKUs into their custom build ordering systems, where customers could pay a small fee for additional firmware support, custom root-of-trust (e.g. BootGuard) key management, or even coreboot. This could move from cost-center incentives [1] to high-margin incentives [2] for firmware and platform health, safety & security. Another step would be including firmware requirements in supply chain contracts [3] for large customer orders.

Yup! I could see this as well!

> While we wait on these ecosystem improvements, CONFIG_EFI_NONSPEC_COMPATIBILITY or a similar option for Xen 4.13 would help users of existing platforms.

Right -- because at the end of the day, as I am discovering now,
there seems to be a non-trivial downstream constituency "curating"
those types of patches in separate silos (Project EVE included). It
would be great to at least have one central bucket (even if
non-default and protected by XXX_OPTION) for these patches to be curated
-- and that's upstream Xen.

Thanks,
Roman.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: UEFI support on Dell boxes [ In reply to ]
On 26.11.2019 19:32, Marek Marczykowski-Górecki wrote:
> Anyway, if I understand correctly, MMIO region should be mapped as UC,
> right?

While MMIO typically would want to be UC, there are clearly cases
where they'd better be WC, and there may even be cases where they
want to be WT, WP, or WB. Hence the lack of firmware indication is
a problem even for this specific memory type.
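
To illustrate the point, here is a minimal sketch of choosing a cache
type from a memory descriptor's attribute bits. The EFI_MEMORY_* values
are the ones defined by the UEFI specification; the enum, the function
names and the fallback_uc parameter are hypothetical and are not taken
from Xen's code. When the firmware sets no cacheability bit at all, any
choice (UC included) is a guess, which is why the fallback is opt-in
here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_UC  0x0000000000000001ULL
#define EFI_MEMORY_WC  0x0000000000000002ULL
#define EFI_MEMORY_WT  0x0000000000000004ULL
#define EFI_MEMORY_WB  0x0000000000000008ULL
#define EFI_MEMORY_UCE 0x0000000000000010ULL
#define EFI_MEMORY_WP  0x0000000000001000ULL

/* Hypothetical result type for this sketch; not a Xen type. */
enum cache_type {
    CACHE_NONE, CACHE_UC, CACHE_WC, CACHE_WT, CACHE_WP, CACHE_WB
};

/*
 * Pick a cache type from the attributes the firmware advertises,
 * preferring the most cacheable one. If no cacheability bit is set,
 * return CACHE_UC only when the caller opted in (the efi=attr=uc
 * case); otherwise return CACHE_NONE so the caller can skip the
 * region.
 */
enum cache_type pick_cache_type(uint64_t attr, bool fallback_uc)
{
    if (attr & EFI_MEMORY_WB)
        return CACHE_WB;
    if (attr & EFI_MEMORY_WT)
        return CACHE_WT;
    if (attr & EFI_MEMORY_WC)
        return CACHE_WC;
    if (attr & (EFI_MEMORY_UC | EFI_MEMORY_UCE))
        return CACHE_UC;
    if (attr & EFI_MEMORY_WP)
        return CACHE_WP;
    return fallback_uc ? CACHE_UC : CACHE_NONE;
}

/* Usage example: a region with no attribute bits set. */
void pick_cache_type_selfcheck(void)
{
    assert(pick_cache_type(0, false) == CACHE_NONE);
    assert(pick_cache_type(0, true) == CACHE_UC);
}
```

A region that would really want WC but is described with no attribute
bits ends up UC under the fallback: functional, but not what the device
would ideally get.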

Jan

Re: UEFI support on Dell boxes [ In reply to ]
On 26.11.2019 21:18, Andrew Cooper wrote:
> On 26/11/2019 20:12, Roman Shaposhnik wrote:
>> On Tue, Nov 26, 2019 at 10:32 AM Marek Marczykowski-Górecki
>>> So, to improve Xen's hardware/firmware compatibility, I have two ideas:
>>>
>>> 1. Make efi=attr=uc the default (it's still possible to disable it with
>>> efi=attr=no).
>> I'd be very much in favor of that too (especially since it seems to match
>> Linux behaviour) What do others think?
>
> It's more than just this.  Linux also doesn't use EFI reboot because it
> is broken almost everywhere (because Windows doesn't use it because it's
> broken almost everywhere, so it never gets fixed).
>
> Xen should be following Linux, but I'm exhausted arguing this point.

Where it makes sense, yes. But there are cases where it doesn't (we
don't, for example, want to blindly inherit bugs). Nor do I see why
Linux should be the only possible reference. If other OSes work
around issues in a better way than Linux does, why should we follow
Linux rather than such alternative implementation?

> A consequence is that downstream tend to share a pile of "unbreak Xen on
> UEFI" patches which have been rejected upstream on philosophical rather
> than technical grounds, despite this being a toxic environment to work in.

We'll get out of this recurring debate only if you or anyone else
propose to have someone other than me be the UEFI code maintainer.
No matter that you call them philosophical rather than technical
arguments, I continue to be of the firm opinion that workarounds
for all sorts of things are acceptable, but shouldn't impact in
any way systems adhering to standards. (It is probably [bad] luck
that I've not myself been severely impacted by UEFI implementation
issues with any of the boxes I routinely test on.)

Jan

Re: UEFI support on Dell boxes (was: Re: Status of 4.13) [ In reply to ]
On 26.11.2019 22:20, Rich Persaud wrote:
> As an intermediate step, could we have an umbrella opt-in
> Kconfig option (CONFIG_EFI_NONSPEC_COMPATIBILITY?) that
> enables multiple EFI options for maximum hardware compatibility?
> For this thread and Xen 4.13, that would be
> EFI_SET_VIRTUAL_ADDRESS_MAP and efi=attr=uc. If more
> options/quirks are added in the future, downstreams using
> EFI_NONSPEC_COMPATIBILITY would get them by default.

While I don't particularly like it, I'd be okay with having such
an option, provided it doesn't hamper code readability too much.
However - why would you stop at those two things? Why not also
exclude reboot through UEFI (as indicated by Andrew), or use of
runtime services as a whole? What about /mapbs? The fundamental
problem I see here really is - where would we draw the line?
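
To make the question concrete, here is a rough sketch of how one
umbrella option could gate a set of defaults. The name
CONFIG_EFI_NONSPEC_COMPATIBILITY is the one proposed in this thread and
is not an existing option; the struct, its fields and the function
names are hypothetical. The line is drawn at the two settings named in
the proposal, and the other candidates are left off.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of EFI workaround defaults (illustrative names). */
struct efi_quirk_defaults {
    bool set_virtual_address_map; /* EFI_SET_VIRTUAL_ADDRESS_MAP */
    bool attr_uc;                 /* efi=attr=uc */
    bool avoid_efi_reboot;        /* do not reboot through UEFI */
    bool mapbs;                   /* /mapbs */
};

/*
 * Defaults for a given umbrella setting. Only the two workarounds from
 * the proposal follow the umbrella; moving "the line" means changing
 * which fields are tied to nonspec_compat.
 */
struct efi_quirk_defaults efi_quirk_defaults_for(bool nonspec_compat)
{
    struct efi_quirk_defaults d = {
        .set_virtual_address_map = nonspec_compat,
        .attr_uc = nonspec_compat,
        .avoid_efi_reboot = false,
        .mapbs = false,
    };
    return d;
}

/* Build-time selection; compile with -DCONFIG_EFI_NONSPEC_COMPATIBILITY. */
struct efi_quirk_defaults efi_build_defaults(void)
{
#ifdef CONFIG_EFI_NONSPEC_COMPATIBILITY
    return efi_quirk_defaults_for(true);
#else
    return efi_quirk_defaults_for(false);
#endif
}

/* Usage example: what the umbrella does and does not switch on. */
void efi_quirk_defaults_selfcheck(void)
{
    struct efi_quirk_defaults d = efi_quirk_defaults_for(true);

    assert(d.set_virtual_address_map && d.attr_uc);
    assert(!d.avoid_efi_reboot && !d.mapbs);
}
```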

Jan

Re: UEFI support on Dell boxes (was: Re: Status of 4.13) [ In reply to ]
On Wed, Nov 27, 2019 at 10:14:56AM +0100, Jan Beulich wrote:
> On 26.11.2019 22:20, Rich Persaud wrote:
> > As an intermediate step, could we have an umbrella opt-in
> > Kconfig option (CONFIG_EFI_NONSPEC_COMPATIBILITY?) that
> > enables multiple EFI options for maximum hardware compatibility?
> > For this thread and Xen 4.13, that would be
> > EFI_SET_VIRTUAL_ADDRESS_MAP and efi=attr=uc. If more
> > options/quirks are added in the future, downstreams using
> > EFI_NONSPEC_COMPATIBILITY would get them by default.
>
> While I don't particularly like it, I'd be okay with having such
> an option, provided it doesn't hamper code readability too much.
> However - why would you stop at those two things? Why not also
> exclude reboot through UEFI (as indicated by Andrew), or use of
> runtime services as a whole? What about /mapbs? The fundamental
> problem I see here really is - where would we draw the line?

Yes, it isn't easy to draw that line for all the downstream projects at
once. For example, it looks like efi=no-rs is an acceptable compromise
for Project EVE, while it isn't for Qubes or OpenXT. But moving from
"apply this set of patches" to "enable those options" would be an
improvement.

Ideally Xen should work out of the box on as many boxes as possible. If
that means enabling some workarounds by default, I'm fine with it
(unless it _severely_ impacts other configurations). In Qubes we struggle
with hardware compatibility because of the large variety of client hardware,
firmware and configuration. Whatever we say here, in the end it boils
down to "does project X work on my hardware?". Not sure about other Xen
use cases, but we prefer to have the answer "yes", whenever it's
reasonably possible. I think enabling efi=attr=uc and
EFI_SET_VIRTUAL_ADDRESS_MAP by default is a reasonable approach.
Defaulting to a different reboot method may be too, but I haven't seen
too many machines impacted by this particular issue. Maybe because
Xen+UEFI breaks much earlier there.

FWIW we do enable efi=attr=uc, /mapbs and /noexitboot by default (until
EFI_SET_VIRTUAL_ADDRESS_MAP was added).
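
For readers less familiar with the efi= sub-options named in this
thread (no-rs, attr=uc, attr=no), here is a small sketch of parsing
them from a comma-separated string. It is a simplified illustration,
not Xen's parser: the struct and function names are made up, and only
the three tokens mentioned in the thread are recognised.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical settings controlled by the efi= option. */
struct efi_opts {
    bool runtime_services; /* cleared by "no-rs" */
    bool map_uc;           /* set by "attr=uc", cleared by "attr=no" */
};

/* Returns true if the token s[0..len) equals the string tok. */
static bool token_is(const char *s, size_t len, const char *tok)
{
    return len == strlen(tok) && strncmp(s, tok, len) == 0;
}

/*
 * Parse a comma-separated efi= value such as "no-rs,attr=uc".
 * Returns 0 on success, -1 on an unrecognised or empty token.
 * Tokens seen before an error have already been applied.
 */
int parse_efi_option(const char *s, struct efi_opts *o)
{
    assert(s != NULL && o != NULL);

    while (*s != '\0') {
        size_t len = strcspn(s, ",");

        if (token_is(s, len, "no-rs"))
            o->runtime_services = false;
        else if (token_is(s, len, "attr=uc"))
            o->map_uc = true;
        else if (token_is(s, len, "attr=no"))
            o->map_uc = false;
        else
            return -1;

        s += len;
        if (*s == ',')
            s++;
    }
    return 0;
}
```

Starting from { .runtime_services = true, .map_uc = false }, parsing
"no-rs,attr=uc" disables runtime services and turns on the UC fallback;
a later "attr=no" turns the fallback off again.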

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: UEFI support on Dell boxes (was: Re: Status of 4.13) [ In reply to ]
On Nov 27, 2019, at 04:16, Jan Beulich <JBeulich@suse.com> wrote:
>
> On 26.11.2019 22:20, Rich Persaud wrote:
>> As an intermediate step, could we have an umbrella opt-in
>> Kconfig option (CONFIG_EFI_NONSPEC_COMPATIBILITY?) that
>> enables multiple EFI options for maximum hardware compatibility?
>> For this thread and Xen 4.13, that would be
>> EFI_SET_VIRTUAL_ADDRESS_MAP and efi=attr=uc. If more
>> options/quirks are added in the future, downstreams using
>> EFI_NONSPEC_COMPATIBILITY would get them by default.
>
> While I don't particularly like it, I'd be okay with having such
> an option, provided it doesn't hamper code readability too much.
> However - why would you stop at those two things? Why not also
> exclude reboot through UEFI (as indicated by Andrew), or use of
> runtime services as a whole? What about /mapbs? The fundamental
> problem I see here really is - where would we draw the line?

If we take this thread as an example, a middle ground was found among developers motivated to maintain the workarounds for downstream projects with affected hardware. Qubes, EVE & OpenXT are used on edge/client devices that often have (relative to servers) a shorter lifetime, with more device churn and support costs.

These two initial options would address current pain points and enable the use of upstream Xen + EFI RS on more devices, e.g. for OTA updates with forward-sealed integrity measurements. The line could change if more downstreams adopt the option and/or new devices appear that have both customer adoption and problematic firmware behavior.

Rich