Mailing List Archive

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>
>
> > On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
> >> Finally, security considerations that apply irrespective of whether the
> >> platform is confidential or not are also outside of the scope of this
> >> document. This includes topics ranging from timing attacks to social
> >> engineering.
> >
> > Why are timing attacks by hypervisor on the guest out of scope?
>
> Good point.
>
> I was thinking that mitigation against timing attacks is the same
> irrespective of the source of the attack. However, because the HV
> controls CPU time allocation, there are presumably attacks that
> are made much easier through the HV. Those should be listed.

Not just that, also because it can and does emulate some devices.
For example, are disk encryption systems protected against timing of
disk accesses?
This is why some people keep saying "forget about emulated devices, require
passthrough, include devices in the trust zone".

> >
> >> </doc>
> >>
> >> Feel free to comment and reword at will ;-)
> >>
> >>
> >> 3/ PCI-as-a-threat: where does that come from
> >>
> >> Isn't there a fundamental difference, from a threat model perspective,
> >> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
> >> should defeat) and compromised software feeding us bad data? I think there
> >> is: at least inside the TCB, we can detect bad software using measurements,
> >> and prevent it from running using attestation. In other words, we first
> >> check what we will run, then we run it. The security there is that we know
> >> what we are running. The trust we have in the software is from testing,
> >> reviewing or using it.
> >>
> >> This relies on a key aspect provided by TDX and SEV, which is that the
> >> software being measured is largely tamper-resistant thanks to memory
> >> encryption. In other words, after you have measured your guest software
> >> stack, the host or hypervisor cannot willy-nilly change it.
> >>
> >> So this brings me to the next question: is there any way we could offer the
> >> same kind of service for KVM and qemu? The measurement part seems relatively
> >> easy. The tamper-resistant part, on the other hand, seems quite difficult to
> >> me. But maybe someone else will have a brilliant idea?
> >>
> >> So I'm asking the question, because if you could somehow prove to the guest
> >> not only that it's running the right guest stack (as we can do today) but
> >> also a known host/KVM/hypervisor stack, we would also switch the potential
> >> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
> >> this is something which is evidently easier to deal with.
> >
> > Agree absolutely that's much easier.
> >
> >> I briefly discussed this with James, and he pointed out two interesting
> >> aspects of that question:
> >>
> >> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
> >> care about either virtio devices, or physical ones being passed through
> >> to the guest. Let's assume physical ones can be trusted, see above.
> >> That leaves virtio devices. How much damage can a malicious virtio device
> >> do to the guest kernel, and can this lead to secrets being leaked?
> >>
> >> 2/ He was not as negative as I anticipated on the possibility of somehow
> >> being able to prevent tampering of the guest. One example he mentioned is
> >> a research paper [1] about running the hypervisor itself inside an
> >> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
> >> with TDX using secure enclaves or some other mechanism?
> >
> > Or even just secureboot based root of trust?
>
> You mean host secureboot? Or guest?
>
> If it’s host, then the problem is detecting malicious tampering with
> host code (whether it’s kernel or hypervisor).

Host. Lots of existing systems do this. As an extreme boot a RO disk,
limit which packages are allowed.

> If it’s guest, at the moment at least, the measurements do not extend
> beyond the TCB.
>
> >
> > --
> > MST
> >
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>>
>>
>>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>
>>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
>>>> Finally, security considerations that apply irrespective of whether the
>>>> platform is confidential or not are also outside of the scope of this
>>>> document. This includes topics ranging from timing attacks to social
>>>> engineering.
>>>
>>> Why are timing attacks by hypervisor on the guest out of scope?
>>
>> Good point.
>>
>> I was thinking that mitigation against timing attacks is the same
>> irrespective of the source of the attack. However, because the HV
>> controls CPU time allocation, there are presumably attacks that
>> are made much easier through the HV. Those should be listed.
>
> Not just that, also because it can and does emulate some devices.
> For example, are disk encryption systems protected against timing of
> disk accesses?
> This is why some people keep saying "forget about emulated devices, require
> passthrough, include devices in the trust zone".
>
>>>
>>>> </doc>
>>>>
>>>> Feel free to comment and reword at will ;-)
>>>>
>>>>
>>>> 3/ PCI-as-a-threat: where does that come from
>>>>
>>>> Isn't there a fundamental difference, from a threat model perspective,
>>>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
>>>> should defeat) and compromised software feeding us bad data? I think there
>>>> is: at least inside the TCB, we can detect bad software using measurements,
>>>> and prevent it from running using attestation. In other words, we first
>>>> check what we will run, then we run it. The security there is that we know
>>>> what we are running. The trust we have in the software is from testing,
>>>> reviewing or using it.
>>>>
>>>> This relies on a key aspect provided by TDX and SEV, which is that the
>>>> software being measured is largely tamper-resistant thanks to memory
>>>> encryption. In other words, after you have measured your guest software
>>>> stack, the host or hypervisor cannot willy-nilly change it.
>>>>
>>>> So this brings me to the next question: is there any way we could offer the
>>>> same kind of service for KVM and qemu? The measurement part seems relatively
>>>> easy. The tamper-resistant part, on the other hand, seems quite difficult to
>>>> me. But maybe someone else will have a brilliant idea?
>>>>
>>>> So I'm asking the question, because if you could somehow prove to the guest
>>>> not only that it's running the right guest stack (as we can do today) but
>>>> also a known host/KVM/hypervisor stack, we would also switch the potential
>>>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
>>>> this is something which is evidently easier to deal with.
>>>
>>> Agree absolutely that's much easier.
>>>
>>>> I briefly discussed this with James, and he pointed out two interesting
>>>> aspects of that question:
>>>>
>>>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
>>>> care about either virtio devices, or physical ones being passed through
>>>> to the guest. Let's assume physical ones can be trusted, see above.
>>>> That leaves virtio devices. How much damage can a malicious virtio device
>>>> do to the guest kernel, and can this lead to secrets being leaked?
>>>>
>>>> 2/ He was not as negative as I anticipated on the possibility of somehow
>>>> being able to prevent tampering of the guest. One example he mentioned is
>>>> a research paper [1] about running the hypervisor itself inside an
>>>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
>>>> with TDX using secure enclaves or some other mechanism?
>>>
>>> Or even just secureboot based root of trust?
>>
>> You mean host secureboot? Or guest?
>>
>> If it’s host, then the problem is detecting malicious tampering with
>> host code (whether it’s kernel or hypervisor).
>
> Host. Lots of existing systems do this. As an extreme boot a RO disk,
> limit which packages are allowed.

Is that provable to the guest?

Consider a cloud provider doing that: how do they prove to their guest:

a) What firmware, kernel and kvm they run

b) That what they booted cannot be maliciously modified, e.g. by a rogue
device driver installed by a rogue sysadmin

My understanding is that SecureBoot is only intended to prevent non-verified
operating systems from booting. So the proof is given to the cloud provider,
and the proof is that the system boots successfully.

After that, I think all bets are off. SecureBoot does little AFAICT
to prevent malicious modifications of the running system by someone with
root access, including deliberately loading a malicious kvm-zilog.ko

It does not mean it cannot be done, just that I don’t think we
have the tools at the moment.

>
>> If it’s guest, at the moment at least, the measurements do not extend
>> beyond the TCB.
>>
>>>
>>> --
>>> MST
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 01, 2023 at 02:15:10PM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>
>
> > On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
> >>
> >>
> >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>
> >>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
> >>>> Finally, security considerations that apply irrespective of whether the
> >>>> platform is confidential or not are also outside of the scope of this
> >>>> document. This includes topics ranging from timing attacks to social
> >>>> engineering.
> >>>
> >>> Why are timing attacks by hypervisor on the guest out of scope?
> >>
> >> Good point.
> >>
> >> I was thinking that mitigation against timing attacks is the same
> >> irrespective of the source of the attack. However, because the HV
> >> controls CPU time allocation, there are presumably attacks that
> >> are made much easier through the HV. Those should be listed.
> >
> > Not just that, also because it can and does emulate some devices.
> > For example, are disk encryption systems protected against timing of
> > disk accesses?
> > This is why some people keep saying "forget about emulated devices, require
> > passthrough, include devices in the trust zone".
> >
> >>>
> >>>> </doc>
> >>>>
> >>>> Feel free to comment and reword at will ;-)
> >>>>
> >>>>
> >>>> 3/ PCI-as-a-threat: where does that come from
> >>>>
> >>>> Isn't there a fundamental difference, from a threat model perspective,
> >>>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
> >>>> should defeat) and compromised software feeding us bad data? I think there
> >>>> is: at least inside the TCB, we can detect bad software using measurements,
> >>>> and prevent it from running using attestation. In other words, we first
> >>>> check what we will run, then we run it. The security there is that we know
> >>>> what we are running. The trust we have in the software is from testing,
> >>>> reviewing or using it.
> >>>>
> >>>> This relies on a key aspect provided by TDX and SEV, which is that the
> >>>> software being measured is largely tamper-resistant thanks to memory
> >>>> encryption. In other words, after you have measured your guest software
> >>>> stack, the host or hypervisor cannot willy-nilly change it.
> >>>>
> >>>> So this brings me to the next question: is there any way we could offer the
> >>>> same kind of service for KVM and qemu? The measurement part seems relatively
> >>>> easy. The tamper-resistant part, on the other hand, seems quite difficult to
> >>>> me. But maybe someone else will have a brilliant idea?
> >>>>
> >>>> So I'm asking the question, because if you could somehow prove to the guest
> >>>> not only that it's running the right guest stack (as we can do today) but
> >>>> also a known host/KVM/hypervisor stack, we would also switch the potential
> >>>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
> >>>> this is something which is evidently easier to deal with.
> >>>
> >>> Agree absolutely that's much easier.
> >>>
> >>>> I briefly discussed this with James, and he pointed out two interesting
> >>>> aspects of that question:
> >>>>
> >>>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
> >>>> care about either virtio devices, or physical ones being passed through
> >>>> to the guest. Let's assume physical ones can be trusted, see above.
> >>>> That leaves virtio devices. How much damage can a malicious virtio device
> >>>> do to the guest kernel, and can this lead to secrets being leaked?
> >>>>
> >>>> 2/ He was not as negative as I anticipated on the possibility of somehow
> >>>> being able to prevent tampering of the guest. One example he mentioned is
> >>>> a research paper [1] about running the hypervisor itself inside an
> >>>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
> >>>> with TDX using secure enclaves or some other mechanism?
> >>>
> >>> Or even just secureboot based root of trust?
> >>
> >> You mean host secureboot? Or guest?
> >>
> >> If it’s host, then the problem is detecting malicious tampering with
> >> host code (whether it’s kernel or hypervisor).
> >
> > Host. Lots of existing systems do this. As an extreme boot a RO disk,
> > limit which packages are allowed.
>
> Is that provable to the guest?
>
> Consider a cloud provider doing that: how do they prove to their guest:
>
> a) What firmware, kernel and kvm they run
>
> b) That what they booted cannot be maliciously modified, e.g. by a rogue
> device driver installed by a rogue sysadmin
>
> My understanding is that SecureBoot is only intended to prevent non-verified
> operating systems from booting. So the proof is given to the cloud provider,
> and the proof is that the system boots successfully.

I think I should have said measured boot not secure boot.

>
> After that, I think all bets are off. SecureBoot does little AFAICT
> to prevent malicious modifications of the running system by someone with
> root access, including deliberately loading a malicious kvm-zilog.ko

So disable module loading then or don't allow root access?

>
> It does not mean it cannot be done, just that I don’t think we
> have the tools at the moment.

Phones, chromebooks do this all the time ...

> >
> >> If it’s guest, at the moment at least, the measurements do not extend
> >> beyond the TCB.
> >>
> >>>
> >>> --
> >>> MST
>
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On 2023-02-01 at 11:02 -05, "Michael S. Tsirkin" <mst@redhat.com> wrote...
> On Wed, Feb 01, 2023 at 02:15:10PM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>>
>>
>> > On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >
>> > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>> >>
>> >>
>> >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >>>
>> >>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
>> >>>> Finally, security considerations that apply irrespective of whether the
>> >>>> platform is confidential or not are also outside of the scope of this
>> >>>> document. This includes topics ranging from timing attacks to social
>> >>>> engineering.
>> >>>
>> >>> Why are timing attacks by hypervisor on the guest out of scope?
>> >>
>> >> Good point.
>> >>
>> >> I was thinking that mitigation against timing attacks is the same
>> >> irrespective of the source of the attack. However, because the HV
>> >> controls CPU time allocation, there are presumably attacks that
>> >> are made much easier through the HV. Those should be listed.
>> >
>> > Not just that, also because it can and does emulate some devices.
>> > For example, are disk encryption systems protected against timing of
>> > disk accesses?
>> > This is why some people keep saying "forget about emulated devices, require
>> > passthrough, include devices in the trust zone".
>> >
>> >>>
>> >>>> </doc>
>> >>>>
>> >>>> Feel free to comment and reword at will ;-)
>> >>>>
>> >>>>
>> >>>> 3/ PCI-as-a-threat: where does that come from
>> >>>>
>> >>>> Isn't there a fundamental difference, from a threat model perspective,
>> >>>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
>> >>>> should defeat) and compromised software feeding us bad data? I think there
>> >>>> is: at least inside the TCB, we can detect bad software using measurements,
>> >>>> and prevent it from running using attestation. In other words, we first
>> >>>> check what we will run, then we run it. The security there is that we know
>> >>>> what we are running. The trust we have in the software is from testing,
>> >>>> reviewing or using it.
>> >>>>
>> >>>> This relies on a key aspect provided by TDX and SEV, which is that the
>> >>>> software being measured is largely tamper-resistant thanks to memory
>> >>>> encryption. In other words, after you have measured your guest software
>> >>>> stack, the host or hypervisor cannot willy-nilly change it.
>> >>>>
>> >>>> So this brings me to the next question: is there any way we could offer the
>> >>>> same kind of service for KVM and qemu? The measurement part seems relatively
>> >>>> easy. The tamper-resistant part, on the other hand, seems quite difficult to
>> >>>> me. But maybe someone else will have a brilliant idea?
>> >>>>
>> >>>> So I'm asking the question, because if you could somehow prove to the guest
>> >>>> not only that it's running the right guest stack (as we can do today) but
>> >>>> also a known host/KVM/hypervisor stack, we would also switch the potential
>> >>>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
>> >>>> this is something which is evidently easier to deal with.
>> >>>
>> >>> Agree absolutely that's much easier.
>> >>>
>> >>>> I briefly discussed this with James, and he pointed out two interesting
>> >>>> aspects of that question:
>> >>>>
>> >>>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
>> >>>> care about either virtio devices, or physical ones being passed through
>> >>>> to the guest. Let's assume physical ones can be trusted, see above.
>> >>>> That leaves virtio devices. How much damage can a malicious virtio device
>> >>>> do to the guest kernel, and can this lead to secrets being leaked?
>> >>>>
>> >>>> 2/ He was not as negative as I anticipated on the possibility of somehow
>> >>>> being able to prevent tampering of the guest. One example he mentioned is
>> >>>> a research paper [1] about running the hypervisor itself inside an
>> >>>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
>> >>>> with TDX using secure enclaves or some other mechanism?
>> >>>
>> >>> Or even just secureboot based root of trust?
>> >>
>> >> You mean host secureboot? Or guest?
>> >>
>> >> If it’s host, then the problem is detecting malicious tampering with
>> >> host code (whether it’s kernel or hypervisor).
>> >
>> > Host. Lots of existing systems do this. As an extreme boot a RO disk,
>> > limit which packages are allowed.
>>
>> Is that provable to the guest?
>>
>> Consider a cloud provider doing that: how do they prove to their guest:
>>
>> a) What firmware, kernel and kvm they run
>>
>> b) That what they booted cannot be maliciously modified, e.g. by a rogue
>> device driver installed by a rogue sysadmin
>>
>> My understanding is that SecureBoot is only intended to prevent non-verified
>> operating systems from booting. So the proof is given to the cloud provider,
>> and the proof is that the system boots successfully.
>
> I think I should have said measured boot not secure boot.

The problem again is how you prove to the guest that you are not lying?

We know how to do that from a guest [1], but you will note that in the
normal process, a trusted hardware component (e.g. the PSP for AMD SEV)
proves the validity of the measurements of the TCB by encrypting it with an
attestation signing key derived from some chip-unique secret. For AMD, this
is called the VCEK, and TDX has something similar. In the case of SEV, this
goes through firmware, and you have to tell the firmware each time you
insert data in the original TCB (using SNP_LAUNCH_UPDATE). This is all tied
to a VM execution context. I do not believe there is any provision to do the
same thing to measure host data. And again, it would be somewhat pointless
if there isn't also a mechanism to ensure the host data is not changed after
the measurement.

Now, I don't think it would be super-difficult to add a firmware service
that would let the host do some kind of equivalent to PVALIDATE, setting
some physical pages aside that then get measured and become inaccessible to
the host. The PSP or similar could then integrate these measurements as part
of the TCB, and the fact that the pages were "transferred" to this special
invariant block would ensure the guests that the code will not change after
being measured.

I am not aware that such a mechanism exists on any of the existing CC
platforms. Please feel free to enlighten me if I'm wrong.

[1] https://www.redhat.com/en/blog/understanding-confidential-containers-attestation-flow
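
For reference, the guest-side request that underpins flows like [1] is small. Below is a minimal user-space sketch for the SEV-SNP case, asking the PSP for a VCEK-signed report through the sev-guest driver; the struct and ioctl names follow my reading of include/uapi/linux/sev-guest.h and should be treated as assumptions to check against your kernel headers. There is no analogous interface that covers host-side code, which is the gap discussed here.

/* Sketch only: request an SNP attestation report from inside the guest.
 * Identifiers follow include/uapi/linux/sev-guest.h as of ~5.19/6.x kernels
 * (assumption); there is no equivalent call that measures host code.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/sev-guest.h>

int main(void)
{
	struct snp_report_req req = { 0 };
	struct snp_report_resp resp = { 0 };
	struct snp_guest_request_ioctl guest_req = {
		.msg_version = 1,
		.req_data = (__u64)(uintptr_t)&req,
		.resp_data = (__u64)(uintptr_t)&resp,
	};
	int fd;

	/* Nonce from the relying party, echoed back inside the signed report. */
	memset(req.user_data, 0xAB, sizeof(req.user_data));

	fd = open("/dev/sev-guest", O_RDWR);
	if (fd < 0 || ioctl(fd, SNP_GET_REPORT, &guest_req) < 0) {
		perror("snp report");
		return 1;
	}

	/* resp.data now holds the VCEK-signed report (launch measurement,
	 * policy, report_data); a verifier checks it against AMD's cert chain. */
	printf("got report buffer of %zu bytes\n", sizeof(resp.data));
	close(fd);
	return 0;
}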
>
>>
>> After that, I think all bets are off. SecureBoot does little AFAICT
>> to prevent malicious modifications of the running system by someone with
>> root access, including deliberately loading a malicious kvm-zilog.ko
>
> So disable module loading then or don't allow root access?

Who would do that?

The problem is that we have a host and a tenant, and the tenant does not
trust the host in principle. So it is not sufficient for the host to disable
module loading or carefully control root access. It is also necessary to
prove to the tenant(s) that this was done.

>
>>
>> It does not mean it cannot be done, just that I don’t think we
>> have the tools at the moment.
>
> Phones, chromebooks do this all the time ...

Indeed, but there, this is to prove to the phone's real owner (which,
surprise, is not the naive person who thought they'd get some kind of
ownership by buying the phone) that the software running on the phone has
not been replaced by some horribly jailbroken goo.

In other words, the user of the phone gets no proof whatsoever of anything,
except that the phone appears to work. This is somewhat the situation in the
cloud today: the owners of the hardware get all sorts of useful checks, from
SecureBoot to error-correction for memory or I/O devices. However, someone
running in a VM on the cloud gets none of that, just like the user of your
phone.

--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On 2023/2/1 19:01, Michael S. Tsirkin wrote:
> On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>>
>>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>
>>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
>>>> Finally, security considerations that apply irrespective of whether the
>>>> platform is confidential or not are also outside of the scope of this
>>>> document. This includes topics ranging from timing attacks to social
>>>> engineering.
>>> Why are timing attacks by hypervisor on the guest out of scope?
>> Good point.
>>
>> I was thinking that mitigation against timing attacks is the same
>> irrespective of the source of the attack. However, because the HV
>> controls CPU time allocation, there are presumably attacks that
>> are made much easier through the HV. Those should be listed.
> Not just that, also because it can and does emulate some devices.
> For example, are disk encryption systems protected against timing of
> disk accesses?
> This is why some people keep saying "forget about emulated devices, require
> passthrough, include devices in the trust zone".


One problem is that the device could be yet another emulated one that is
running in the SmartNIC/DPU itself.

Thanks
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On 2023-01-31 at 10:06 UTC, "Reshetova, Elena" <elena.reshetova@intel.com>
> wrote...
> > Hi Dinechin,
>
> Nit: My first name is actually Christophe ;-)

I am sorry, my automation of extracting names from emails failed here ((

>
> [snip]
>
> >> "The implementation of the #VE handler is simple and does not require an
> >> in-depth security audit or fuzzing since it is not the actual consumer of
> >> the host/VMM supplied untrusted data": The assumption there seems to be
> that
> >> the host will never be able to supply data (e.g. through a bounce buffer)
> >> that it can trick the guest into executing. If that is indeed the
> >> assumption, it is worth mentioning explicitly. I suspect it is a bit weak,
> >> since many earlier attacks were based on executing the wrong code. Notably,
> >> it is worth pointing out that I/O buffers are _not_ encrypted with the CPU
> >> key (as opposed to any device key e.g. for PCI encryption) in either
> >> TDX or SEV. Is there for example anything that precludes TDX or SEV from
> >> executing code in the bounce buffers?
> >
> > This was already replied by Kirill, any code execution out of shared memory
> generates
> > a #GP.
>
> Apologies for my wording. Everyone interpreted "executing" as "executing
> directly on the bounce buffer page", when what I meant is "consuming data
> fetched from the bounce buffers as code" (not necessarily directly).

I guess in theory it is possible, but we have not seen such usage in guest kernel code
in practice during our audit. This would be a pretty ugly thing to do imo even if
you forget about confidential computing.


>
> For example, in the diagram in your document, the guest kernel is a
> monolithic piece. In reality, there are dynamically loaded components. In
> the original SEV implementation, with pre-attestation, the measurement could
> only apply before loading any DLKM (I believe, not really sure). As another
> example, SEVerity (CVE-2020-12967 [1]) worked by injecting a payload
> directly into the guest kernel using virtio-based network I/O. That is what
> I referred to when I wrote "many earlier attacks were based on executing the
> wrong code".

The above attack was only possible because an attacker was able to directly
modify the code execution pointer to an arbitrary guest memory address
(in that case the guest NMI handler was substituted to point to the attacker payload).
This is an obvious hole in the integrity protection of the guest private memory
and its page table mappings. This is not possible with TDX, and I believe not with
newer versions of AMD SEV either.

>
> The fact that I/O buffers are not encrypted matters here, because it gives
> the host ample latitude to observe or even corrupt all I/Os, as many others
> have pointed out. Notably, disk crypto may not be designed to resist to a
> host that can see and possibly change the I/Os.
>
> So let me rephrase my vague question as a few more precise ones:
>
> 1) What are the effects of semi-random kernel code injection?
>
> If the host knows that a given bounce buffer happens to be used later to
> execute some kernel code, it can start flipping bits in it to try and
> trigger arbitrary code paths in the guest. My understanding is that
> crypto alone (i.e. without additional layers like dm-integrity) will
> happily decrypt that into a code stream with pseudo-random instructions
> in it, not vehemently error out.
>
> So, while TDX precludes the host from writing into guest memory directly,
> since the bounce buffers are shared, TDX will not prevent the host from
> flipping bits there. It's then just a matter of guessing where the bits
> will go, and hoping that some bits execute at guest PL0. Of course, this
> can be mitigated by either only using static configs, or using
> dm-verity/dm-integrity, or maybe some other mechanisms.
>
> Shouldn't that be part of your document? To be clear: you mention under
> "Storage protection" that you use dm-crypt and dm-integrity, so I believe
> *you* know, but your readers may not figure out why dm-integrity is
> integral to the process, notably after you write "Users could use other
> encryption schemes".

Sure, I can elaborate in the storage protection section about the importance
of disk integrity protection.

>
> 2) What are the effects of random user code injection?
>
> It's the same as above, except that now you can target a much wider range
> of input data, including shell scripts, etc. So the attack surface is
> much larger.
>
> 3) What is the effect of data poisoning?
>
> You don't necessarily need to corrupt code. Being able to corrupt a
> system configuration file for example can be largely sufficient.
>
> 4) Are there I/O-based replay attacks that would work pre-attestation?
>
> My current mental model is that you load a "base" software stack into the
> TCB and then measure a relevant part of it. What you measure is somewhat
> implementation-dependent, but in the end, if the system is attested, you
> respond to a cryptographic challenge based on what was measured, and you
> then get relevant secrets, e.g. a disk decryption key, that let you make
> forward progress. However, what happens if every time you boot, the host
> feeds you bogus disk data just to try to steer the boot sequence along
> some specific path?

What you ideally want is full disk encryption with additional integrity protection,
such as an AES-GCM authenticated encryption mode. Then there are no questions about
disk integrity and many attacks are mitigated.
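
To make the difference with an unauthenticated mode concrete, here is a small user-space sketch (OpenSSL EVP, not taken from this thread or from dm-crypt itself): with AES-256-GCM, flipping a single ciphertext bit makes the final tag check fail, whereas a plain encryption mode would happily decrypt the flipped block into pseudo-random plaintext.

/* Sketch: AES-256-GCM detects ciphertext tampering via its authentication tag.
 * Toy key/IV and no error handling; illustrative only, not dm-crypt code.
 */
#include <openssl/evp.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	unsigned char key[32] = { 0 }, iv[12] = { 0 };
	unsigned char pt[16] = "sector contents";
	unsigned char ct[16], tag[16], out[16];
	int len, ok;
	EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

	/* Encrypt one "sector" and keep the tag, as an authenticated disk
	 * encryption scheme would (conceptually). */
	EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv);
	EVP_EncryptUpdate(ctx, ct, &len, pt, sizeof(pt));
	EVP_EncryptFinal_ex(ctx, ct + len, &len);
	EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, sizeof(tag), tag);

	ct[3] ^= 0x01;	/* the host flips one bit of the stored ciphertext */

	/* Decrypt: the tag verification in EVP_DecryptFinal_ex fails, so the
	 * tampering is detected instead of returning pseudo-random bytes. */
	EVP_DecryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv);
	EVP_DecryptUpdate(ctx, out, &len, ct, sizeof(ct));
	EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_SET_TAG, sizeof(tag), tag);
	ok = EVP_DecryptFinal_ex(ctx, out + len, &len);

	printf(ok > 0 ? "tag OK\n" : "tag mismatch: tampering detected\n");
	EVP_CIPHER_CTX_free(ctx);
	return 0;
}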

>
> I believe that the short answer is: the guest either:
>
> a) reaches attestation, but with bad in-memory data, so it fails the
> crypto exchange, and secrets are not leaked.
>
> b) does not reach attestation, so never gets the secrets, and therefore
> still fulfils the CC promise of not leaking secrets.
>
> So I personally feel this is OK, but it's worth writing up in your doc.
>

Yes, I will expand the storage section more on this.

>
> Back to the #VE handler, if I can find a way to inject malicious code into
> my guest, what you wrote in that paragraph as a justification for no
> in-depth security still seems like "not exactly defense in depth". I would
> just remove the sentence, audit and fuzz that code with the same energy as
> for anything else that could face bad input.

In fact most of our fuzzing hooks are inside #VE itself if you take a look at the
implementation. They just don’t cover things like the #VE info decoding (that information
is provided by a trusted party, the TDX module).

Best Regards,
Elena.
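
As a rough illustration of the kind of injection point described in this thread (host-controlled reads reaching the guest via the #VE handler or the virtio config/byte-swap helpers), a hypothetical hook could look like the sketch below. None of these identifiers are the real kAFL/kF/x hooks from the ccc-linux-guest-hardening tree; they only show the idea of swapping a host-supplied value for fuzzer input.

/* Hypothetical sketch of a fuzz-input injection hook on a host-controlled
 * read path; fuzz_active and fuzz_fetch() are made-up stand-ins for the
 * real harness entry points.
 */
#include <linux/types.h>

extern bool fuzz_active;				/* assumed: a harness is attached */
extern bool fuzz_fetch(void *buf, size_t len);		/* assumed: pulls fuzzer bytes */

/* Wrap a 16-bit value the guest just read from a virtio device (the value a
 * driver would normally pass through virtio16_to_cpu()). */
static inline u16 fuzzable_virtio16(u16 host_val)
{
	u16 injected;

	if (fuzz_active && fuzz_fetch(&injected, sizeof(injected)))
		return injected;	/* fuzzer-chosen "device" value */

	return host_val;		/* normal path: host-supplied value */
}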
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Tue, Jan 31, 2023 at 11:31:28AM +0000, Reshetova, Elena wrote:
> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> > [...]
> > > > The big threat from most devices (including the thunderbolt
> > > > classes) is that they can DMA all over memory.  However, this isn't
> > > > really a threat in CC (well until PCI becomes able to do encrypted
> > > > DMA) because the device has specific unencrypted buffers set aside
> > > > for the expected DMA. If it writes outside that CC integrity will
> > > > detect it and if it reads outside that it gets unintelligible
> > > > ciphertext.  So we're left with the device trying to trick secrets
> > > > out of us by returning unexpected data.
> > >
> > > Yes, by supplying the input that hasn’t been expected. This is
> > > exactly the case we were trying to fix here for example:
> > > https://lore.kernel.org/all/20230119170633.40944-2-
> > alexander.shishkin@linux.intel.com/
> > > I do agree that this case is less severe when others where memory
> > > corruption/buffer overrun can happen, like here:
> > > https://lore.kernel.org/all/20230119135721.83345-6-
> > alexander.shishkin@linux.intel.com/
> > > But we are trying to fix all issues we see now (prioritizing the
> > > second ones though).
> >
> > I don't see how MSI table sizing is a bug in the category we've
> > defined. The very text of the changelog says "resulting in a kernel
> > page fault in pci_write_msg_msix()." which is a crash, which I thought
> > we were agreeing was out of scope for CC attacks?
>
> As I said, this is an example of a crash and at first look
> might not lead to an exploitable condition (albeit attackers are creative).
> But we noticed this one while fuzzing and it was common enough
> that it prevented the fuzzer from going deeper into the virtio device driver fuzzing.
> The core PCI/MSI code doesn’t seem to have that many easily triggerable issues.
> Other examples in the virtio patchset are more severe.
>
> >
> > > >
> > > > If I set this as the problem, verifying device correct operation is
> > > > a possible solution (albeit hugely expensive) but there are likely
> > > > many other cheaper ways to defeat or detect a device trying to
> > > > trick us into revealing something.
> > >
> > > What do you have in mind here for the actual devices we need to
> > > enable for CC cases?
> >
> > Well, the most dangerous devices seem to be the virtio set a CC system
> > will rely on to boot up. After that, there are other ways (like SPDM)
> > to verify a real PCI device is on the other end of the transaction.
>
> Yes, in the future, but not yet. Other vendors will not necessarily be
> using virtio devices at this point, so we will have non-virtio and non-CC-enabled
> devices that we want to securely add to the guest.
>
> >
> > > We have been using here a combination of extensive fuzzing and static
> > > code analysis.
> >
> > by fuzzing, I assume you mean fuzzing from the PCI configuration space?
> > Firstly I'm not so sure how useful a tool fuzzing is if we take Oopses
> > off the table because fuzzing primarily triggers those
>
> If you enable memory sanitizers you can detect more severe conditions like
> out-of-bounds accesses and such. I think given that we have a way to
> verify that fuzzing is reaching the code locations we want it to reach, it
> can be a pretty effective method to find at least the low-hanging bugs. And these
> will be the bugs that most attackers will go after in the first place.
> But of course it is not a formal verification of any kind.
>
> so its hard to
> > see what else it could detect given the signal will be smothered by
> > oopses and secondly I think the PCI interface is likely the wrong place
> > to begin and you should probably begin on the virtio bus and the
> > hypervisor generated configuration space.
>
> This is exactly what we do. We don’t fuzz from the PCI config space,
> we supply inputs from the host/vmm via the legitimate interfaces that it can
> inject them to the guest: whenever guest requests a pci config space
> (which is controlled by host/hypervisor as you said) read operation,
> it gets input injected by the kafl fuzzer. Same for other interfaces that
> are under control of host/VMM (MSRs, port IO, MMIO, anything that goes
> via #VE handler in our case). When it comes to virtio, we employ
> two different fuzzing techniques: directly injecting kafl fuzz input when
> virtio core or virtio drivers gets the data received from the host
> (via injecting input in functions virtio16/32/64_to_cpu and others) and
> directly fuzzing DMA memory pages using kfx fuzzer.
> More information can be found in https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing
>
> Best Regards,
> Elena.

Hi Elena,

I think it might be a good idea to narrow down a configuration that *can*
reasonably be hardened to be suitable for confidential computing, before
proceeding with fuzzing. Eg. a lot of time was spent discussing PCI devices
in the context of virtualization, but what about taking PCI out of scope
completely by switching to virtio-mmio devices?

Jeremi
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On Tue, Jan 31, 2023 at 11:31:28AM +0000, Reshetova, Elena wrote:
> > > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> > > [...]
> > > > > The big threat from most devices (including the thunderbolt
> > > > > classes) is that they can DMA all over memory.  However, this isn't
> > > > > really a threat in CC (well until PCI becomes able to do encrypted
> > > > > DMA) because the device has specific unencrypted buffers set aside
> > > > > for the expected DMA. If it writes outside that CC integrity will
> > > > > detect it and if it reads outside that it gets unintelligible
> > > > > ciphertext.  So we're left with the device trying to trick secrets
> > > > > out of us by returning unexpected data.
> > > >
> > > > Yes, by supplying the input that hasn’t been expected. This is
> > > > exactly the case we were trying to fix here for example:
> > > > https://lore.kernel.org/all/20230119170633.40944-2-
> > > alexander.shishkin@linux.intel.com/
> > > > I do agree that this case is less severe when others where memory
> > > > corruption/buffer overrun can happen, like here:
> > > > https://lore.kernel.org/all/20230119135721.83345-6-
> > > alexander.shishkin@linux.intel.com/
> > > > But we are trying to fix all issues we see now (prioritizing the
> > > > second ones though).
> > >
> > > I don't see how MSI table sizing is a bug in the category we've
> > > defined. The very text of the changelog says "resulting in a kernel
> > > page fault in pci_write_msg_msix()." which is a crash, which I thought
> > > we were agreeing was out of scope for CC attacks?
> >
> > As I said, this is an example of a crash and at first look
> > might not lead to an exploitable condition (albeit attackers are creative).
> > But we noticed this one while fuzzing and it was common enough
> > that it prevented the fuzzer from going deeper into the virtio device driver fuzzing.
> > The core PCI/MSI code doesn’t seem to have that many easily triggerable issues.
> > Other examples in the virtio patchset are more severe.
> >
> > >
> > > > >
> > > > > If I set this as the problem, verifying device correct operation is
> > > > > a possible solution (albeit hugely expensive) but there are likely
> > > > > many other cheaper ways to defeat or detect a device trying to
> > > > > trick us into revealing something.
> > > >
> > > > What do you have in mind here for the actual devices we need to
> > > > enable for CC cases?
> > >
> > > Well, the most dangerous devices seem to be the virtio set a CC system
> > > will rely on to boot up. After that, there are other ways (like SPDM)
> > > to verify a real PCI device is on the other end of the transaction.
> >
> > Yes, in the future, but not yet. Other vendors will not necessarily be
> > using virtio devices at this point, so we will have non-virtio and non-CC-enabled
> > devices that we want to securely add to the guest.
> >
> > >
> > > > We have been using here a combination of extensive fuzzing and static
> > > > code analysis.
> > >
> > > by fuzzing, I assume you mean fuzzing from the PCI configuration space?
> > > Firstly I'm not so sure how useful a tool fuzzing is if we take Oopses
> > > off the table because fuzzing primarily triggers those
> >
> > If you enable memory sanitizers you can detect more severe conditions like
> > out-of-bounds accesses and such. I think given that we have a way to
> > verify that fuzzing is reaching the code locations we want it to reach, it
> > can be a pretty effective method to find at least the low-hanging bugs. And these
> > will be the bugs that most attackers will go after in the first place.
> > But of course it is not a formal verification of any kind.
> >
> > so its hard to
> > > see what else it could detect given the signal will be smothered by
> > > oopses and secondly I think the PCI interface is likely the wrong place
> > > to begin and you should probably begin on the virtio bus and the
> > > hypervisor generated configuration space.
> >
> > This is exactly what we do. We don’t fuzz from the PCI config space,
> > we supply inputs from the host/vmm via the legitimate interfaces that it can
> > inject them to the guest: whenever guest requests a pci config space
> > (which is controlled by host/hypervisor as you said) read operation,
> > it gets input injected by the kafl fuzzer. Same for other interfaces that
> > are under control of host/VMM (MSRs, port IO, MMIO, anything that goes
> > via #VE handler in our case). When it comes to virtio, we employ
> > two different fuzzing techniques: directly injecting kafl fuzz input when
> > virtio core or virtio drivers gets the data received from the host
> > (via injecting input in functions virtio16/32/64_to_cpu and others) and
> > directly fuzzing DMA memory pages using kfx fuzzer.
> > More information can be found in https://intel.github.io/ccc-linux-guest-
> hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing
> >
> > Best Regards,
> > Elena.
>
> Hi Elena,

Hi Jeremi,

>
> I think it might be a good idea to narrow down a configuration that *can*
> reasonably be hardened to be suitable for confidential computing, before
> proceeding with fuzzing. Eg. a lot of time was spent discussing PCI devices
> in the context of virtualization, but what about taking PCI out of scope
> completely by switching to virtio-mmio devices?

I agree that narrowing down is important, and we have spent significant effort
on disabling various code we don’t need (including PCI code, like quirks,
early PCI, etc.). The decision to use virtio over PCI vs. MMIO, I believe, comes
from performance and usage scenarios, and we have to do the best we can within these
limitations.

Moreover, even if we could remove PCI for the virtio devices by
removing the transport dependency, this isn’t possible for other devices that we
know are used in some CC setups: not all CSPs are using virtio-based drivers,
so pretty quickly PCI comes back into hardening scope and we cannot just remove
it unfortunately.

Best Regards,
Elena.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Christophe de Dinechin (dinechin@redhat.com) wrote:
>
> On 2023-02-01 at 11:02 -05, "Michael S. Tsirkin" <mst@redhat.com> wrote...
> > On Wed, Feb 01, 2023 at 02:15:10PM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
> >>
> >>
> >> > On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >
> >> > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
> >> >>
> >> >>
> >> >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >>>
> >> >>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
> >> >>>> Finally, security considerations that apply irrespective of whether the
> >> >>>> platform is confidential or not are also outside of the scope of this
> >> >>>> document. This includes topics ranging from timing attacks to social
> >> >>>> engineering.
> >> >>>
> >> >>> Why are timing attacks by hypervisor on the guest out of scope?
> >> >>
> >> >> Good point.
> >> >>
> >> >> I was thinking that mitigation against timing attacks is the same
> >> >> irrespective of the source of the attack. However, because the HV
> >> >> controls CPU time allocation, there are presumably attacks that
> >> >> are made much easier through the HV. Those should be listed.
> >> >
> >> > Not just that, also because it can and does emulate some devices.
> >> > For example, are disk encryption systems protected against timing of
> >> > disk accesses?
> >> > This is why some people keep saying "forget about emulated devices, require
> >> > passthrough, include devices in the trust zone".
> >> >
> >> >>>
> >> >>>> </doc>
> >> >>>>
> >> >>>> Feel free to comment and reword at will ;-)
> >> >>>>
> >> >>>>
> >> >>>> 3/ PCI-as-a-threat: where does that come from
> >> >>>>
> >> >>>> Isn't there a fundamental difference, from a threat model perspective,
> >> >>>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
> >> >>>> should defeat) and compromised software feeding us bad data? I think there
> >> >>>> is: at least inside the TCB, we can detect bad software using measurements,
> >> >>>> and prevent it from running using attestation. In other words, we first
> >> >>>> check what we will run, then we run it. The security there is that we know
> >> >>>> what we are running. The trust we have in the software is from testing,
> >> >>>> reviewing or using it.
> >> >>>>
> >> >>>> This relies on a key aspect provided by TDX and SEV, which is that the
> >> >>>> software being measured is largely tamper-resistant thanks to memory
> >> >>>> encryption. In other words, after you have measured your guest software
> >> >>>> stack, the host or hypervisor cannot willy-nilly change it.
> >> >>>>
> >> >>>> So this brings me to the next question: is there any way we could offer the
> >> >>>> same kind of service for KVM and qemu? The measurement part seems relatively
> >> >>>> easy. The tamper-resistant part, on the other hand, seems quite difficult to
> >> >>>> me. But maybe someone else will have a brilliant idea?
> >> >>>>
> >> >>>> So I'm asking the question, because if you could somehow prove to the guest
> >> >>>> not only that it's running the right guest stack (as we can do today) but
> >> >>>> also a known host/KVM/hypervisor stack, we would also switch the potential
> >> >>>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
> >> >>>> this is something which is evidently easier to deal with.
> >> >>>
> >> >>> Agree absolutely that's much easier.
> >> >>>
> >> >>>> I briefly discussed this with James, and he pointed out two interesting
> >> >>>> aspects of that question:
> >> >>>>
> >> >>>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
> >> >>>> care about either virtio devices, or physical ones being passed through
> >> >>>> to the guest. Let's assume physical ones can be trusted, see above.
> >> >>>> That leaves virtio devices. How much damage can a malicious virtio device
> >> >>>> do to the guest kernel, and can this lead to secrets being leaked?
> >> >>>>
> >> >>>> 2/ He was not as negative as I anticipated on the possibility of somehow
> >> >>>> being able to prevent tampering of the guest. One example he mentioned is
> >> >>>> a research paper [1] about running the hypervisor itself inside an
> >> >>>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
> >> >>>> with TDX using secure enclaves or some other mechanism?
> >> >>>
> >> >>> Or even just secureboot based root of trust?
> >> >>
> >> >> You mean host secureboot? Or guest?
> >> >>
> >> >> If it’s host, then the problem is detecting malicious tampering with
> >> >> host code (whether it’s kernel or hypervisor).
> >> >
> >> > Host. Lots of existing systems do this. As an extreme boot a RO disk,
> >> > limit which packages are allowed.
> >>
> >> Is that provable to the guest?
> >>
> >> Consider a cloud provider doing that: how do they prove to their guest:
> >>
> >> a) What firmware, kernel and kvm they run
> >>
> >> b) That what they booted cannot be maliciously modified, e.g. by a rogue
> >> device driver installed by a rogue sysadmin
> >>
> >> My understanding is that SecureBoot is only intended to prevent non-verified
> >> operating systems from booting. So the proof is given to the cloud provider,
> >> and the proof is that the system boots successfully.
> >
> > I think I should have said measured boot not secure boot.
>
> The problem again is how you prove to the guest that you are not lying?
>
> We know how to do that from a guest [1], but you will note that in the
> normal process, a trusted hardware component (e.g. the PSP for AMD SEV)
> proves the validity of the measurements of the TCB by encrypting it with an
> attestation signing key derived from some chip-unique secret. For AMD, this
> is called the VCEK, and TDX has something similar. In the case of SEV, this
> goes through firmware, and you have to tell the firmware each time you
> insert data in the original TCB (using SNP_LAUNCH_UPDATE). This is all tied
> to a VM execution context. I do not believe there is any provision to do the
> same thing to measure host data. And again, it would be somewhat pointless
> if there isn't also a mechanism to ensure the host data is not changed after
> the measurement.
>
> Now, I don't think it would be super-difficult to add a firmware service
> that would let the host do some kind of equivalent to PVALIDATE, setting
> some physical pages aside that then get measured and become inaccessible to
> the host. The PSP or similar could then integrate these measurements as part
> of the TCB, and the fact that the pages were "transferred" to this special
> invariant block would ensure the guests that the code will not change after
> being measured.
>
> I am not aware that such a mechanism exists on any of the existing CC
> platforms. Please feel free to enlighten me if I'm wrong.
>
> [1] https://www.redhat.com/en/blog/understanding-confidential-containers-attestation-flow
> >
> >>
> >> After that, I think all bets are off. SecureBoot does little AFAICT
> >> to prevent malicious modifications of the running system by someone with
> >> root access, including deliberately loading a malicious kvm-zilog.ko
> >
> > So disable module loading then or don't allow root access?
>
> Who would do that?
>
> The problem is that we have a host and a tenant, and the tenant does not
> trust the host in principle. So it is not sufficient for the host to disable
> module loading or carefully control root access. It is also necessary to
> prove to the tenant(s) that this was done.
>
> >
> >>
> >> It does not mean it cannot be done, just that I don’t think we
> >> have the tools at the moment.
> >
> > Phones, chromebooks do this all the time ...
>
> Indeed, but there, this is to prove to the phone's real owner (which,
> surprise, is not the naive person who thought they'd get some kind of
> ownership by buying the phone) that the software running on the phone has
> not been replaced by some horribly jailbroken goo.
>
> In other words, the user of the phone gets no proof whatsoever of anything,
> except that the phone appears to work. This is somewhat the situation in the
> cloud today: the owners of the hardware get all sorts of useful checks, from
> SecureBoot to error-correction for memory or I/O devices. However, someone
> running in a VM on the cloud gets none of that, just like the user of your
> phone.

Assuming you do a measured boot, the host OS and firmware are measured into the host TPM;
people have thought in the past about triggering attestations of the
host from the guest; then you could have something external attest the
host and only release keys to the guest's disks if the attestation is
correct, or have a key for the guest's disks held in the host's TPM.

Dave

> --
> Cheers,
> Christophe de Dinechin (https://c3d.github.io)
> Theory of Incomplete Measurements (https://c3d.github.io/TIM)
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On 1/25/23 6:28 AM, Reshetova, Elena wrote:
> Hi Greg,
>
> You mentioned couple of times (last time in this recent thread:
> https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> discussing the updated threat model for kernel, so this email is a start in this direction.
>
> (Note: I tried to include relevant people from different companies, as well as linux-coco
> mailing list, but I hope everyone can help by including additional people as needed).
>
> As we have shared before in various lkml threads/conference presentations
> ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> change in the threat model where the guest kernel no longer trusts the hypervisor.
> This is a big change in the threat model and requires both a careful assessment of the
> new (hypervisor <-> guest kernel) attack surface and a careful design of mitigations
> and security validation techniques. This is the activity that we started back at Intel
> and the current status can be found in
>
> 1) Threat model and potential mitigations:
> https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
> 2) One of the mitigations described in the above doc is "hardening of the enabled
> code". What we mean by this, as well as the techniques that are being used, is
> described in this document: https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html

Regarding driver hardening, does anyone have a better filtering idea?

The current solution assumes the kernel command line is trusted and cannot
avoid the __init() functions that waste memory. I don't know if the
__exit() routines of the filtered devices are called, but it doesn't sound
much better to allocate memory and free it right after.

>
> 3) All the tools are open-source and everyone can start using them right away even
> without any special HW (readme has description of what is needed).
> Tools and documentation is here:
> https://github.com/intel/ccc-linux-guest-hardening
>
> 4) all not yet upstreamed linux patches (that we are slowly submitting) can be found
> here: https://github.com/intel/tdx/commits/guest-next
>
> So, my main question before we start to argue about the threat model, mitigations, etc,
> is what is the good way to get this reviewed to make sure everyone is aligned?
> There are a lot of angles and details, so what is the most efficient method?
> Should I split the threat model from https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
> into logical pieces and start submitting it to mailing list for discussion one by one?
> Any other methods?
>
> The original plan we had in mind is to start discussing the relevant pieces when submitting the code,
> i.e. when submitting the device filter patches, we will include problem statement, threat model link,
> data, alternatives considered, etc.
>
> Best Regards,
> Elena.
>
> [1] https://lore.kernel.org/all/20210804174322.2898409-1-sathyanarayanan.kuppuswamy@linux.intel.com/
> [2] https://lpc.events/event/16/contributions/1328/
> [3] https://events.linuxfoundation.org/archive/2022/linux-security-summit-north-america/program/schedule/

Thanks,
Carlos
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Mon, Feb 06, 2023 at 06:27:48PM -0600, Carlos Bilbao wrote:
> On 1/25/23 6:28 AM, Reshetova, Elena wrote:
> > 2) One of the mitigations described in the above doc is "hardening of the enabled
> > code". What we mean by this, as well as the techniques that are being used, is
> > described in this document: https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html
>
> Regarding driver hardening, does anyone have a better filtering idea?
>
> The current solution assumes the kernel command line is trusted and cannot
> avoid the __init() functions that waste memory.

That is two different things (command line trust and __init()
functions), so I do not understand the relationship at all here. Please
explain it better.

Also, why would an __init() function waste memory? Memory usage isn't
an issue here, right?

> I don't know if the
> __exit() routines of the filtered devices are called, but it doesn't sound
> much better to allocate memory and free it right after.

What device has a __exit() function? Drivers have module init/exit
functions but they should do nothing but register themselves with the
relevant busses and they are only loaded if the device is found in the
system.

And what exactly is incorrect about allocating memory and then freeing
it when not needed?

So again, I don't understand the question, sorry.

thanks,

greg k-h
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On 2/7/23 00:03, Greg Kroah-Hartman wrote:

> On Mon, Feb 06, 2023 at 06:27:48PM -0600, Carlos Bilbao wrote:
>> On 1/25/23 6:28 AM, Reshetova, Elena wrote:
>>> 2) One of the mitigations described in the above doc is "hardening of the enabled
>>> code". What we mean by this, as well as the techniques that are being used, is
>>> described in this document: https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html
>> Regarding driver hardening, does anyone have a better filtering idea?
>>
>> The current solution assumes the kernel command line is trusted and cannot
>> avoid the __init() functions that waste memory.
> That is two different things (command line trust and __init()
> functions), so I do not understand the relationship at all here. Please
> explain it better.


No relation, other than it would be nice to have a solution that does not
require the kernel command line and that prevents the __init()s from running.


>
> Also, why would an __init() function waste memory? Memory usage isn't
> an issue here, right?
>
>> I don't know if the
>> __exit() routines of the filtered devices are called, but it doesn't sound
>> much better to allocate memory and free it right after.
> What device has a __exit() function? Drivers have module init/exit
> functions but they should do nothing but register themselves with the
> relevant busses and they are only loaded if the device is found in the
> system.
>
> And what exactly is incorrect about allocating memory and then freeing
> it when not needed?


The currently proposed device filtering does not stop the __init() functions
of these drivers from being called. Whatever memory is allocated by
blacklisted drivers is wasted because those drivers can never be used.
Sure, memory can be allocated and freed as soon as it is no longer needed,
but this memory was never needed in the first place.


A more pressing concern than wasted memory, which may be unimportant, is
what those driver init functions are doing. For example, device setup may
involve MMIO registers, which we cannot trust. It's a lot more code to worry
about from a CoCo perspective.


>
> So again, I don't understand the question, sorry.


Given the limitations of the current approach, does anyone have any other
ideas for filtering devices prior to their initialization?


>
> thanks,
>
> greg k-h


Thanks,
Carlos
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Tue, Feb 07, 2023 at 01:53:34PM -0600, Carlos Bilbao wrote:
> Given the limitations of current approach, does anyone have any other ideas
> for filtering devices prior to their initialization?

/me mumbles ... something something ... bpf ...

--
MST
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Tue, Feb 07, 2023 at 01:53:34PM -0600, Carlos Bilbao wrote:
> Currently proposed device filtering does not stop the __init() functions
> from these drivers to be called. Whatever memory is allocated by
> blacklisted drivers is wasted because those drivers cannot ever be used.
> Sure, memory can be allocated and freed as soon as it is no longer needed,
> but these memory would never be needed.
>
>
> More pressing concern than wasted memory, which may be unimportant, there's
> the issue of what are those driver init functions doing. For example, as
> part of device setup, MMIO regs may be involved, which we cannot trust. It's
> a lot more code to worry about from a CoCo perspective.

Why not just simply compile a special CoCo kernel that doesn't have
any drivers that you don't trust? Now, the distros may push back in
that they don't want to support a separate kernel image. But this is
really a pain-allocation negotiation, isn't it? Intel and other
companies want to make $$$$$ with CoCo.

In order to make $$$$$, you need to push the costs onto various
different players in the ecosystem. This is cleverly disguised as
taking the current, perfectly acceptable design paradigm, where the
trust boundary is in the traditional location, and recasting all of
the assumptions which you have broken as "bugs" that must be fixed by
upstream developers.

But another place to push the costs is to the distro vendors, who
might need to maintain a separate CoCo kernel that is differently
configured. Now, Red Hat and company will no doubt push back. But
the upstream development community will also push back if you try to
dump too much work on *us*.

- Ted
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Tue, Feb 07, 2023 at 01:53:34PM -0600, Carlos Bilbao wrote:
> On 2/7/23 00:03, Greg Kroah-Hartman wrote:
>
> > On Mon, Feb 06, 2023 at 06:27:48PM -0600, Carlos Bilbao wrote:
> > > On 1/25/23 6:28 AM, Reshetova, Elena wrote:
> > > > 2) One of the described in the above doc mitigations is "hardening of the enabled
> > > > code". What we mean by this, as well as techniques that are being used are
> > > > described in this document: https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html
> > > Regarding driver hardening, does anyone have a better filtering idea?
> > >
> > > The current solution assumes the kernel command line is trusted and cannot
> > > avoid the __init() functions that waste memory.
> > That is two different things (command line trust and __init()
> > functions), so I do not understand the relationship at all here. Please
> > explain it better.
>
>
> No relation other than it would be nice to have a solution that does not
> require kernel command line and that prevents __init()s.

Again, __init() has nothing to do with the kernel command line so I do
not understand the relationship here. Have a specific example?

> > Also, why would an __init() function waste memory? Memory usage isn't
> > an issue here, right?
> >
> > > I don't know if the
> > > __exit() routines of the filtered devices are called, but it doesn't sound
> > > much better to allocate memory and free it right after.
> > What device has a __exit() function? Drivers have module init/exit
> > functions but they should do nothing but register themselves with the
> > relevant busses and they are only loaded if the device is found in the
> > system.
> >
> > And what exactly is incorrect about allocating memory and then freeing
> > it when not needed?
>
>
> Currently proposed device filtering does not stop the __init() functions
> from these drivers to be called. Whatever memory is allocated by
> blacklisted drivers is wasted because those drivers cannot ever be used.
> Sure, memory can be allocated and freed as soon as it is no longer needed,
> but these memory would never be needed.

Drivers are never even loaded if the hardware is not present, and a
driver init function should do nothing anyway if it is written properly,
so again, I do not understand what you are referring to here.
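
For reference, a properly written PCI driver boils down to roughly the
sketch below (the driver name, IDs and do-nothing bodies are made up for
illustration). The module init generated by module_pci_driver() does
nothing but register with the bus, and probe() only runs if a matching
device is actually present:

#include <linux/module.h>
#include <linux/pci.h>

static const struct pci_device_id foo_ids[] = {
        { PCI_DEVICE(0x1234, 0x5678) },         /* hypothetical IDs */
        { }
};
MODULE_DEVICE_TABLE(pci, foo_ids);

static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        /* Real hardware setup happens here, only once a device exists. */
        return 0;
}

static void foo_remove(struct pci_dev *pdev)
{
}

static struct pci_driver foo_driver = {
        .name           = "foo",
        .id_table       = foo_ids,
        .probe          = foo_probe,
        .remove         = foo_remove,
};

/* Expands to an __init/__exit pair that only (un)registers the driver. */
module_pci_driver(foo_driver);
MODULE_LICENSE("GPL");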

Again, a real example might help explain your concerns, pointers to the
code?

> More pressing concern than wasted memory, which may be unimportant, there's
> the issue of what are those driver init functions doing. For example, as
> part of device setup, MMIO regs may be involved, which we cannot trust. It's
> a lot more code to worry about from a CoCo perspective.

Again, specific example?

And if you don't want a driver to be loaded, don't build it into your
kernel as Ted said. Or better yet, use the in-kernel functionality to
prevent drivers from ever loading or binding to a device until you tell
it from userspace that it is safe to do so.

So I don't think this is a real issue unless you have pointers to code
you are concerned about.

> > So again, I don't understand the question, sorry.
>
> Given the limitations of current approach, does anyone have any other ideas
> for filtering devices prior to their initialization?

What is wrong with the functionality we have today for this very thing?
Does it not work properly for you? If not, why not, and with what devices,
drivers, and busses do you still have problems?

thanks,

greg k-h
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Tue, Feb 07, 2023 at 08:51:56PM -0500, Theodore Ts'o wrote:
> Why not just simply compile a special CoCo kernel that doesn't have
> any drivers that you don't trust.

Or at least, start with that? You can then gradually expand that until
some config is both acceptable to distros and seems sufficiently trusty
to the CoCo project. Lots of kernel features got upstreamed this way.
A requirement that an arbitrary config satisfy CoCo seems like a very
high bar to clear.

--
MST
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> No relation other than it would be nice to have a solution that does not
>require kernel command line and that prevents __init()s.

For __init()s, see below. For the command line, it is pretty straightforward
to measure it and attest its integrity later: we need to do that for other
parts anyway, such as acpi tables. So I don't see why we need to do anything
special about it. In any case it is indeed very different from the driver
discussion and goes into the "what should be covered by attestation for a CC
guest" topic.
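
As a rough sketch of what measuring the command line could mean (the extend
helper below is hypothetical, standing in for whatever RTMR/vTPM extend
interface the CC guest actually exposes, and SHA-256 is just an example
digest): hash the command line and extend the digest into a runtime
measurement register, which then shows up in the attestation evidence for
the verifier to check:

#include <crypto/sha2.h>
#include <linux/string.h>
#include <linux/types.h>

/* Hypothetical helper, not an existing kernel API. */
int cc_rtmr_extend(unsigned int index, const u8 *digest, size_t len);

static int measure_cmdline(const char *cmdline)
{
        u8 digest[SHA256_DIGEST_SIZE];

        /* Hash the command line exactly as the guest will consume it... */
        sha256((const u8 *)cmdline, strlen(cmdline), digest);

        /* ...and extend it into a runtime measurement register so a remote
         * verifier can compare it against a reference value at attestation
         * time. */
        return cc_rtmr_extend(2, digest, sizeof(digest));
}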

> More pressing concern than wasted memory, which may be unimportant, there's
> the issue of what are those driver init functions doing. For example, as
> part of device setup, MMIO regs may be involved, which we cannot trust. It's
> a lot more code to worry about from a CoCo perspective.

Yes, we have seen such cases in the kernel where drivers or modules access
MMIO or pci config space already in their __init() functions.
Some concrete examples from modules and drivers (there are more):

intel_iommu_init() -> init_dmars() -> check_tylersburg_isoch()
skx_init() -> get_all_munits()
skx_init() -> skx_register_mci() -> skx_get_dimm_config()
intel_rng_mod_init() -> intel_init_hw_struct()
i10nm_exit() -> enable_retry_rd_err_log -> __enable_retry_rd_err_log()

However, this is how we address this from a security point of view:

1. In order for an MMIO read to obtain data from an untrusted host, the memory
range must be shared with the host to begin with. We enforce that
all MMIO mappings are private to the CC guest by default unless they are
explicitly shared (and we automatically share them for the authorized devices
and their drivers from the allow list). This removes the problem of an
"unexpected MMIO region interaction"
(modulo acpi AML operation regions, which we unfortunately also have to share,
but acpi is a whole different difficult case on its own).

2. For pci config space, we limit any interaction with it to authorized
devices and their drivers (those on the allow list). As a result, device
drivers outside of the allow list are not able to access pci config space
even in their __init routines. This is done by setting
to_pci_dev(dev)->error_state = pci_channel_io_perm_failure for non-authorized
devices (see the sketch below).
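
A rough sketch of that filtering idea (pci_dev_is_allowed() below is a
hypothetical stand-in for the allow-list check in the not-yet-posted filter
patches):

#include <linux/pci.h>

bool pci_dev_is_allowed(struct pci_dev *pdev);  /* hypothetical allow-list check */

static void cc_filter_pci_device(struct pci_dev *pdev)
{
        if (pci_dev_is_allowed(pdev))
                return;

        /*
         * Mark the device as permanently failed before any driver can touch
         * it.  The pci_read_config_*() helpers treat such a device as
         * disconnected: the access fails and the caller's buffer is filled
         * with all-ones, so a non-authorized driver's __init cannot pull
         * host-controlled data out of config space.
         */
        pdev->error_state = pci_channel_io_perm_failure;
}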

So, even if the host causes a driver's __init function to run
(by faking the device on the host side), it should not be able to supply any
malicious data to it via MMIO or pci config space, so running these __init
routines should be ok from a security point of view. Or does anyone see any
holes here?

Best Regards,
Elena.
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On Tue, Feb 07, 2023 at 08:51:56PM -0500, Theodore Ts'o wrote:
> > Why not just simply compile a special CoCo kernel that doesn't have
> > any drivers that you don't trust.

Aside from the complexity and scalability of managing such a config, which has
to change with every kernel release, what about the built-in platform drivers?
I am not a driver expert here, but as far as I understand they cannot be
disabled via config. Please correct me if this statement is wrong.

> In order to make $$$$$, you need to push the costs onto various
> different players in the ecosystem. This is cleverly disguised as
> taking current perfectly acceptable design paradigm when the trust
> boundary is in the traditional location, and causing all of the
> assumptions which you have broken as "bugs" that must be fixed by
> upstream developers.

The CC threat model does change the traditional Linux trust boundary regardless
of which mitigations are used (kernel config vs. runtime filtering), because for
the drivers that a CoCo guest happens to need, neither of these mechanisms can
fix the problem (we cannot disable the code that we need), unless somebody
writes a totally new set of CoCo-specific drivers (and who needs another set of
CoCo-specific virtio drivers in the kernel?).

So, if the path is to be able to use existing driver kernel code, then we need:

1. the select set of drivers that a CoCo guest requires (a small set) needs to be
hardened (or whatever word people prefer to use here), which only means that in
the presence of a malicious host/hypervisor that can manipulate pci config space,
port IO and MMIO, these drivers should not expose CC guest memory
confidentiality or integrity (including via privilege escalation into the CC guest).
Please note that this only applies to a small set of drivers (in the tdx virtio
setup we have fewer than 10 of them) and does not require invasive changes to the
kernel code. There is also additional core pci/msi code that is involved with the
discovery and configuration of these drivers; this code also falls into the
category we need to make robust.

2. the rest of the non-needed drivers must be disabled. Here we can argue about
what the correct method of doing this is and who should bear the costs of
enforcing it. But from a pure security point of view: the method that is simple
and clear, and that requires as little maintenance as possible, usually has the
biggest chance of enforcing security.
And given that we already have the concept of authorized devices in Linux,
does this method really bring so much additional complexity to the kernel?
But it is hard to argue here without the code: we need to submit the filter
proposal first (it is still under internal review).

Best Regards,
Elena.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
>
>
> > On Tue, Feb 07, 2023 at 08:51:56PM -0500, Theodore Ts'o wrote:
> > > Why not just simply compile a special CoCo kernel that doesn't have
> > > any drivers that you don't trust.
>
> Aside from complexity and scalability management of such a config that has
> to change with every kernel release, what about the build-in platform drivers?

What do you mean by "built-in platform drivers"? You are creating a
.config for a specific cloud platform; just select only the drivers for
that exact configuration and you should be fine.

And as for the management of such a config, distros do this just fine,
why can't you? It's not that hard to manage properly.

> I am not a driver expert here but as far as I understand they cannot be disabled
> via config. Please correct if this statement is wrong.

Again, which specific drivers are you referring to? And why are they a
problem?

> > In order to make $$$$$, you need to push the costs onto various
> > different players in the ecosystem. This is cleverly disguised as
> > taking current perfectly acceptable design paradigm when the trust
> > boundary is in the traditional location, and causing all of the
> > assumptions which you have broken as "bugs" that must be fixed by
> > upstream developers.
>
> The CC threat model does change the traditional linux trust boundary regardless of
> what mitigations are used (kernel config vs. runtime filtering). Because for the
> drivers that CoCo guest happens to need, there is no way to fix this problem by
> either of these mechanisms (we cannot disable the code that we need), unless somebody
> writes a totally new set of coco specific drivers (who needs another set of
> CoCo specific virtio drivers in the kernel?).

It sounds like you want such a set of drivers, why not just write them?
We have zillions of drivers already, it's not hard to write new ones, as
it really sounds like that's exactly what you want to have happen here
in the end as you don't trust the existing set of drivers you are using
for some reason.

> So, if the path is to be able to use existing driver kernel code, then we need:

Wait, again, why? Why not just have your own? That should be the
simplest thing overall. What's wrong with that?

> 1. these selective CoCo guest required drivers (small set) needs to be hardened
> (or whatever word people prefer to use here), which only means that in
> the presence of malicious host/hypervisor that can manipulate pci config space,
> port IO and MMIO, these drivers should not expose CC guest memory
> confidentiality or integrity (including via privilege escalation into CC guest).

Again, stop it please with the "hardened" nonsense, that means nothing.
Either the driver has bugs, or it doesn't. I welcome you to prove it
doesn't :)

> Please note that this only applies to a small set (in tdx virtio setup we have less
> than 10 of them) of drivers and does not present invasive changes to the kernel
> code. There is also an additional core pci/msi code that is involved with discovery
> and configuration of these drivers, this code also falls into the category we need to
> make robust.

Again, why wouldn't we all want "robust" drivers? This is not anything
new here; all you are somehow saying is that you are changing the threat
model that the kernel "must" support. And for that, you then need to
change the driver code to support it.

So again, why not just have your own drivers and driver subsystem that
meets your new requirements? Let's see what that looks like and if
there even is any overlap between that and the existing kernel driver
subsystems.

> 2. rest of non-needed drivers must be disabled. Here we can argue about what
> is the correct method of doing this and who should bare the costs of enforcing it.

You bear that cost. Or you get a distro to do that. That's not up to
us in the kernel community; sorry, we give you the option to do that if
you want to, and that's all that we can do.

> But from pure security point of view: the method that is simple and clear, that
> requires as little maintenance as possible usually has the biggest chance of
> enforcing security.

Again, that's up to your configuration management. Please do it, tell
us what doesn't work and send changes if you find better ways to do it.
Again, this is all there for you to do today, nothing for us to have to
do for you.

> And given that we already have the concept of authorized devices in Linux,
> does this method really brings so much additional complexity to the kernel?

No idea, you tell us! :)

Again, I recommend just having your own drivers; that will allow you
to show us all exactly what you mean by the terms you keep using. Why
not just submit that for review instead?

good luck!

greg k-h
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
> Because for the
> drivers that CoCo guest happens to need, there is no way to fix this problem by
> either of these mechanisms (we cannot disable the code that we need), unless somebody
> writes a totally new set of coco specific drivers (who needs another set of
> CoCo specific virtio drivers in the kernel?).

I think it's more about pci and all that jazz, no?
As a virtio maintainer I applied patches adding validation, and intend to
do so in the future, simply because for virtio specifically people
build all kinds of weird setups out of software, so validating
everything is a good idea.
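
As a contrived illustration of the kind of validation meant here (all names
below are made up): any length reported by the device side is
host-controlled in such setups and has to be clamped before use:

#include <linux/types.h>

void foo_consume(void *buf, size_t len);        /* hypothetical consumer */

static void foo_handle_used_buffer(void *buf, size_t buf_len,
                                   unsigned int dev_reported_len)
{
        size_t len = dev_reported_len;

        /* The "device" is host-controlled; never copy past our own buffer. */
        if (len > buf_len)
                len = buf_len;

        foo_consume(buf, len);
}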

--
MST
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 08, 2023 at 10:16:14AM +0000, Reshetova, Elena wrote:
> > No relation other than it would be nice to have a solution that does not
> >require kernel command line and that prevents __init()s.
>
> For __inits see below. For the command line, it is pretty straightforward to
> measure it and attest its integrity later: we need to do it for other parts
> anyhow as acpi tables, etc. So I don’t see why we need to do smth special
> about it? In any case it is indeed very different from driver discussion and
> goes into "what should be covered by attestation for CC guest" topic.
>
> > More pressing concern than wasted memory, which may be unimportant, there's
> > the issue of what are those driver init functions doing. For example, as
> > part of device setup, MMIO regs may be involved, which we cannot trust. It's
> > a lot more code to worry about from a CoCo perspective.
>
> Yes, we have seen such cases in kernel where drivers or modules would access
> MMIO or pci config space already in their __init() functions.
> Some concrete examples from modules and drivers (there are more):
>
> intel_iommu_init() -> init_dmars() -> check_tylersburg_isoch()

An iommu driver. So maybe you want to use virtio iommu then?

> skx_init() -> get_all_munits()
> skx_init() -> skx_register_mci() -> skx_get_dimm_config()

A memory controller driver, right? And you need it in a VM? why?

> intel_rng_mod_init() -> intel_init_hw_struct()

And virtio iommu?

> i10nm_exit()->enable_retry_rd_err_log ->__enable_retry_rd_err_log()

Another memory controller driver? Can we decide on a single one?

> However, this is how we address this from security point of view:
>
> 1. In order for a MMIO read to obtain data from a untrusted host, the memory
> range must be shared with the host to begin with. We enforce that
> all MMIO mappings are private by default to the CC guest unless it is
> explicitly shared (and we do automatically share for the authorized devices
> and their drivers from the allow list). This removes a problem of an
> "unexpected MMIO region interaction"
> (modulo acpi AML operation regions that we do have to share also unfortunately,
> but acpi is a whole different difficult case on its own).

How does it remove the problem? You basically get trash from the host, no?
But it seems that whether said trash is exploitable will really depend
on how it's used; e.g. if it's an 8-bit value, the host can just scan all
options in a couple of hundred attempts. What did I miss?


> 2. For pci config space, we limit any interaction with pci config
> space only to authorized devices and their drivers (that are in the allow list).
> As a result device drivers outside of the allow list are not able to access pci
> config space even in their __init routines. It is done by setting the
> to_pci_dev(dev)->error_state = pci_channel_io_perm_failure for non-authorized
> devices.

This seems to be assuming drivers check the return code from pci config
space accesses, right? I doubt all drivers do, though. Even if they do,
that's unlikely to be a well-tested path, right?
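
As a contrived illustration of that unchecked path (the driver, device IDs
and setup helper are all made up):

#include <linux/init.h>
#include <linux/pci.h>

void foo_setup_from_config(u32 val);    /* hypothetical */

static int __init foo_init(void)
{
        struct pci_dev *pdev = pci_get_device(0x1234, 0x5678, NULL);
        u32 val = 0;

        if (!pdev)
                return -ENODEV;

        /*
         * If the device was filtered (error_state set to
         * pci_channel_io_perm_failure), this read fails and val is set to
         * all-ones.  Nothing here checks the return code, so the driver
         * happily keeps going with ~0 on a path that is almost never
         * exercised in testing.
         */
        pci_read_config_dword(pdev, 0x40, &val);

        foo_setup_from_config(val);

        pci_dev_put(pdev);
        return 0;
}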

> So, even if host made the driver __init function to run
> (by faking the device on the host side), it should not be able to supply any
> malicious data to it via MMIO or pci config space, so running their __init
> routines should be ok from security point of view or does anyone see any
> holes here?
>
> Best Regards,
> Elena.

See above. I am not sure the argument that the bugs are unexploitable
sits well with the idea that all this effort is improving code quality.

--
MST
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
> 2. rest of non-needed drivers must be disabled. Here we can argue about what
> is the correct method of doing this and who should bare the costs of enforcing it.
> But from pure security point of view: the method that is simple and clear, that
> requires as little maintenance as possible usually has the biggest chance of
> enforcing security.
> And given that we already have the concept of authorized devices in Linux,
> does this method really brings so much additional complexity to the kernel?
> But hard to argue here without the code: we need to submit the filter proposal first
> (under internal review still).

I think the problem here is that we've had a lot of painful experience
where fuzzing produces a lot of false positives, which security types
then insist that all kernel developers must fix so that the "important"
security issues can be seen among the false positives.

So "as little maintenance as possible" and fuzzing have not
necessarily gone together. It might be less maintenance costs for
*you*, but it's not necessarily less maintenance work for *us*. I've
seen Red Hat principal engineers take completely bogus issues and
raise them to CVE "high" priority levels, when it was nothing like
that, thus forcing distro and data center people to be forced to do
global pushes to production because it's easier than trying to explain
to FEDramp auditors why the CVE SS is bogus --- and every single
unnecessary push to production has its own costs and risks.

I've seen the constant load of syzbot false positives that generate
noise in my inbox and in bug tracking issues assigned to me at $WORK.
I've seen the false positives generated by DEPT, which is why I've
pushed back on it. So if you are going to insist on fuzzing all of
the PCI config space and treating every finding as a "bug", there is
going to be huge pushback.

Even if the "fixes" are minor, and don't have any massive impact on
memory used or cache line misses or code/maintainability bloat, the
fact that we treat them as P3 quality of implementation issues, and
*you* treat them as P1 security bugs that must be fixed Now! Now!
Now! is going to cause friction. (This is especially true since CVE
SS scores are unidimentional, and what might be high security --- or
at least embarassing --- for CoCo, might be completely innocuous QOI
bugs for the rest of the world.)

So it might be that a simple, separate kernel config is going to be
the massively simpler way to go, instead of insisting that all PCI
device drivers must be fuzzed and made CoCo-safe even if they will
never be used in a CoCo context. Again, please be cognizant of the
costs that CoCo may be imposing and pushing onto the rest of the
ecosystem.

Cheers,

- Ted
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On 2023-02-08 at 11:58 +01, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote...
> On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
>>
>> The CC threat model does change the traditional linux trust boundary regardless of
>> what mitigations are used (kernel config vs. runtime filtering). Because for the
>> drivers that CoCo guest happens to need, there is no way to fix this problem by
>> either of these mechanisms (we cannot disable the code that we need), unless somebody
>> writes a totally new set of coco specific drivers (who needs another set of
>> CoCo specific virtio drivers in the kernel?).
>
> It sounds like you want such a set of drivers, why not just write them?
> We have zillions of drivers already, it's not hard to write new ones, as
> it really sounds like that's exactly what you want to have happen here
> in the end as you don't trust the existing set of drivers you are using
> for some reason.

In the CC approach, the hypervisor is considered hostile. The rest of the
system is not changed much. If we pass through some existing NIC, we'd
rather use the existing driver for that NIC than reinvent
it. However, we also need to consider the possibility that someone has
maliciously replaced the actual NIC with a cleverly crafted software
emulator designed to cause the driver to leak confidential data.


>> So, if the path is to be able to use existing driver kernel code, then we need:
>
> Wait, again, why? Why not just have your own? That should be the
> simplest thing overall. What's wrong with that?

That would require duplication for the majority of hardware drivers.



>> 1. these selective CoCo guest required drivers (small set) needs to be hardened
>> (or whatever word people prefer to use here), which only means that in
>> the presence of malicious host/hypervisor that can manipulate pci config space,
>> port IO and MMIO, these drivers should not expose CC guest memory
>> confidentiality or integrity (including via privilege escalation into CC guest).
>
> Again, stop it please with the "hardened" nonsense, that means nothing.
> Either the driver has bugs, or it doesn't. I welcome you to prove it
> doesn't :)

In a non-CC scenario, a driver is correct if, among other things, it does
not leak kernel data to user space. However, it assumes that PCI devices are
working correctly and according to spec.

In a CC scenario, an additional condition for correctness is that the driver
must not leak data from the trusted environment to the host. The threat model
assumes that a _virtual_ PCI device can be implemented on the host side
specifically to cause an existing driver to leak secrets to the host.

It is this additional condition that we are talking about.
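
A contrived example of the difference (all names are made up): the function
below is arguably fine against spec-abiding hardware, but under the CC
threat model the device-reported length is host-controlled:

#include <linux/io.h>
#include <linux/string.h>
#include <linux/types.h>

#define FOO_REG_STATUS_LEN      0x10    /* hypothetical device register */

struct foo_dev {
        void __iomem *mmio;
        void *dma_buf;          /* shared with the "device", i.e. the host */
        u8 status[64];
};

static void foo_send_status(struct foo_dev *fd)
{
        /* The device (which in the CC model means the host) picks this value. */
        u32 len = readl(fd->mmio + FOO_REG_STATUS_LEN);

        /*
         * Without a "len > sizeof(fd->status)" check, a malicious virtual
         * device can pick a large len and receive adjacent guest memory
         * through the shared buffer.
         */
        memcpy(fd->dma_buf, fd->status, len);
}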

Think of this as a bit similar to the introduction of IOMMUs, which meant
there was a new condition impacting _the entire kernel_ that you had to make
sure your DMA operations and IOMMU were in agreement. Here, it is a bit of a
similar situation: CC forbids some specific operations the same way an IOMMU
does, except instead of stray DMAs, it's stray accesses from the host.

Note that, as James Bottomley pointed out, a crash is not seen as a failure
of the CC model, unless it leads to a subsequent leak of confidential data.
Denial of service, through crash or otherwise, is so easy to do from host or
hypervisor side that it is entirely out of scope.


>
>> Please note that this only applies to a small set (in tdx virtio setup we have less
>> than 10 of them) of drivers and does not present invasive changes to the kernel
>> code. There is also an additional core pci/msi code that is involved with discovery
>> and configuration of these drivers, this code also falls into the category we need to
>> make robust.
>
> Again, why wouldn't we all want "robust" drivers? This is not anything
> new here,

What is new is that CC requires drivers to be "robust" against a new kind of
attack "from below" (i.e. from the [virtual] hardware side).

> all you are somehow saying is that you are changing the thread
> model that the kernel "must" support. And for that, you need to then
> change the driver code to support that.

What is being argued is that CC is not robust unless we block host-side
attacks that can cause the guest to leak data to the host.

>
> So again, why not just have your own drivers and driver subsystem that
> meets your new requirements? Let's see what that looks like and if
> there even is any overlap between that and the existing kernel driver
> subsystems.

Would a "CC-aware PCI" subsystem fit your definition?

>
>> 2. rest of non-needed drivers must be disabled. Here we can argue about what
>> is the correct method of doing this and who should bare the costs of enforcing it.
>
> You bare that cost.

I believe the CC community understands that.

The first step before introducing modifications in the drivers is getting an
understanding of why we think that CC introduces a new condition for
robustness.

We will not magically turn all drivers into CC-safe drivers. It will take a
lot of time, and the patches are likely to come from the CC community. At
that stage, though, the question is: "do you understand the problem we are
trying to solve?". I hope that my IOMMU analogy above helps.


> Or you get a distro to do that.

The best a distro can do is have a minified kernel tuned for CC use cases, or
enable a hypothetical CONFIG_COCO_SAFETY configuration.
A distro cannot decide what work goes behind CONFIG_COCO_SAFETY.


> That's not up to us in the kernel community, sorry, we give you the option
> to do that if you want to, that's all that we can do.

I hope that the explanations above will help you change your mind on that
statement. That cannot be a config-only or custom-drivers-only solution.
(or maybe you can convince us it can ;-)

>
>> But from pure security point of view: the method that is simple and clear, that
>> requires as little maintenance as possible usually has the biggest chance of
>> enforcing security.
>
> Again, that's up to your configuration management. Please do it, tell
> us what doesn't work and send changes if you find better ways to do it.
> Again, this is all there for you to do today, nothing for us to have to
> do for you.
>
>> And given that we already have the concept of authorized devices in Linux,
>> does this method really brings so much additional complexity to the kernel?
>
> No idea, you tell us! :)
>
> Again, I recommend you just having your own drivers, that will allow you
> to show us all exactly what you mean by the terms you keep using. Why
> not just submit that for review instead?
>
> good luck!
>
> greg k-h


--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 08, 2023 at 05:19:37PM +0100, Christophe de Dinechin wrote:
>
> On 2023-02-08 at 11:58 +01, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote...
> > On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
> >>
> >> The CC threat model does change the traditional linux trust boundary regardless of
> >> what mitigations are used (kernel config vs. runtime filtering). Because for the
> >> drivers that CoCo guest happens to need, there is no way to fix this problem by
> >> either of these mechanisms (we cannot disable the code that we need), unless somebody
> >> writes a totally new set of coco specific drivers (who needs another set of
> >> CoCo specific virtio drivers in the kernel?).
> >
> > It sounds like you want such a set of drivers, why not just write them?
> > We have zillions of drivers already, it's not hard to write new ones, as
> > it really sounds like that's exactly what you want to have happen here
> > in the end as you don't trust the existing set of drivers you are using
> > for some reason.
>
> In the CC approach, the hypervisor is considered as hostile. The rest of the
> system is not changed much. If we pass-through some existing NIC, we'd
> rather use the existing driver for that NIC rather than reinvent
> it.

But that is not what was proposed. I thought this was all about virtio.
If not, again, someone needs to write a solid definition.

So if you want to use existing drivers, wonderful; please work on making
the needed changes to all of them to meet your goals. I was trying to
give you a simple way out :)

> >> 1. these selective CoCo guest required drivers (small set) needs to be hardened
> >> (or whatever word people prefer to use here), which only means that in
> >> the presence of malicious host/hypervisor that can manipulate pci config space,
> >> port IO and MMIO, these drivers should not expose CC guest memory
> >> confidentiality or integrity (including via privilege escalation into CC guest).
> >
> > Again, stop it please with the "hardened" nonsense, that means nothing.
> > Either the driver has bugs, or it doesn't. I welcome you to prove it
> > doesn't :)
>
> In a non-CC scenario, a driver is correct if, among other things, it does
> not leak kernel data to user space. However, it assumes that PCI devices are
> working correctly and according to spec.

And you also assume that your CPU is working properly. And what spec
exactly are you referring to? How can you validate any of that without
using the PCI authentication protocol already discussed in this thread?

> >> Please note that this only applies to a small set (in tdx virtio setup we have less
> >> than 10 of them) of drivers and does not present invasive changes to the kernel
> >> code. There is also an additional core pci/msi code that is involved with discovery
> >> and configuration of these drivers, this code also falls into the category we need to
> >> make robust.
> >
> > Again, why wouldn't we all want "robust" drivers? This is not anything
> > new here,
>
> What is new is that CC requires driver to be "robust" against a new kind of
> attack "from below" (i.e. from the [virtual] hardware side).

And as I have said multiple times, that is a totally new "requirement"
and one that Linux does not meet in any way at this point in time. If
you somehow feel this is a change that is ok to make for Linux, you will
need to do a lot of work to make this happen.

Anyway, you all are just spinning in circles now. I'll just mute this
thread until I see an actual code change, as it seems to be full of
people not sending anything we can actually do anything with.

greg k-h
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> On Wed, Feb 08, 2023 at 05:19:37PM +0100, Christophe de Dinechin wrote:
> >
> > On 2023-02-08 at 11:58 +01, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote...
> > > On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
> > >>
> > >> The CC threat model does change the traditional linux trust boundary regardless of
> > >> what mitigations are used (kernel config vs. runtime filtering). Because for the
> > >> drivers that CoCo guest happens to need, there is no way to fix this problem by
> > >> either of these mechanisms (we cannot disable the code that we need), unless somebody
> > >> writes a totally new set of coco specific drivers (who needs another set of
> > >> CoCo specific virtio drivers in the kernel?).
> > >
> > > It sounds like you want such a set of drivers, why not just write them?
> > > We have zillions of drivers already, it's not hard to write new ones, as
> > > it really sounds like that's exactly what you want to have happen here
> > > in the end as you don't trust the existing set of drivers you are using
> > > for some reason.
> >
> > In the CC approach, the hypervisor is considered as hostile. The rest of the
> > system is not changed much. If we pass-through some existing NIC, we'd
> > rather use the existing driver for that NIC rather than reinvent
> > it.
>
> But that is not what was proposed. I thought this was all about virtio.
> If not, again, someone needs to write a solid definition.

As I said in my reply to you a couple of weeks ago:

I don't think the request here is really to make sure *all* PCI devices
are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) -
and potentially ones that people will want to pass through (which
generally needs a lot more work to make safe).
(I've not looked at these Intel tools to see what they cover)

so *mostly* virtio, and just a few of the other devices.

> So if you want to use existing drivers, wonderful, please work on making
> the needed changes to meet your goals to all of them. I was trying to
> give you a simple way out :)
>
> > >> 1. these selective CoCo guest required drivers (small set) needs to be hardened
> > >> (or whatever word people prefer to use here), which only means that in
> > >> the presence of malicious host/hypervisor that can manipulate pci config space,
> > >> port IO and MMIO, these drivers should not expose CC guest memory
> > >> confidentiality or integrity (including via privilege escalation into CC guest).
> > >
> > > Again, stop it please with the "hardened" nonsense, that means nothing.
> > > Either the driver has bugs, or it doesn't. I welcome you to prove it
> > > doesn't :)
> >
> > In a non-CC scenario, a driver is correct if, among other things, it does
> > not leak kernel data to user space. However, it assumes that PCI devices are
> > working correctly and according to spec.
>
> And you also assume that your CPU is working properly.

We require the CPU to give us a signed attestation, which someone external
can validate, to prove that it's a trusted CPU. So, not quite
'assume'.

> And what spec
> exactly are you referring to? How can you validate any of that without
> using the PCI authentication protocol already discussed in this thread?

The PCI auth protocol looks promising and is possibly the right long-term
answer. But for a pass-through NIC, for example, all we'd want is
that (with the help of the IOMMU) it can't get or corrupt any data the
guest doesn't give it - and then it's up to the guest to run encryption
over the protocols carried over the NIC.

>
> > >> Please note that this only applies to a small set (in tdx virtio setup we have less
> > >> than 10 of them) of drivers and does not present invasive changes to the kernel
> > >> code. There is also an additional core pci/msi code that is involved with discovery
> > >> and configuration of these drivers, this code also falls into the category we need to
> > >> make robust.
> > >
> > > Again, why wouldn't we all want "robust" drivers? This is not anything
> > > new here,
> >
> > What is new is that CC requires driver to be "robust" against a new kind of
> > attack "from below" (i.e. from the [virtual] hardware side).
>
> And as I have said multiple times, that is a totally new "requirement"
> and one that Linux does not meet in any way at this point in time.

Yes, that's a fair statement.

> If
> you somehow feel this is a change that is ok to make for Linux, you will
> need to do a lot of work to make this happen.
>
> Anyway, you all are just spinning in circles now. I'll just mute this
> thread until I see an actual code change as it seems to be full of
> people not actually sending anything we can actually do anything with.

I think the challenge will be to come up with non-intrusive, minimal
changes; obviously you don't want stuff shotgunned everywhere.

Dave

> greg k-h
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
