Mailing List Archive: Linux guest kernel threat model for Confidential Computing

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

Jan 27, 2023, 2:15 AM

Post #51 of 103 (346 views)

On Thu, Jan 26, 2023 at 02:30:19PM +0200, Leon Romanovsky wrote:
> This is exactly what I said. You presented me the cases which exist in
> your invented world. Mentioned unhandled page fault doesn't exist in real
> world. If PCI device doesn't work, it needs to be replaced/blocked and not
> left to be operable and accessible from the kernel/user.

Believe it or not, this "invented" world is already part of the real
world, and will become even more in the future.

So this has been stated elsewhere in the thread already, but I also like
to stress that hiding misbehavior of devices (real or emulated) is not
the goal of this work.

In fact, the best action for a CoCo guest in case it detects a
(possible) attack is to stop whatever it is doing and crash. And a
misbehaving device in a CoCo guest is a possible attack.

But what needs to be prevented at all costs is undefined behavior in the
CoCo guest that is triggerable by the HV, e.g. by letting an emulated
device misbehave. That undefined behavior can lead to information leak,
which is a way bigger problem for a guest owner than a crashed VM.

Regards,

Joerg

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

mst at redhat

Jan 27, 2023, 2:15 AM

Post #52 of 103 (346 views)

Permalink

On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote:
> > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > And this is a very special aspect of 'hardening' since it is about hardening a
> > kernel
> > > under different threat model/assumptions.
> >
> > I am not sure it's that special in that hardening IMHO is not a specific
> > threat model or a set of assumptions. IIUC it's just something that
> > helps reduce severity of vulnerabilities. Similarly, one can use the CC
> > hardware in a variety of ways I guess. And one way is just that -
> > hardening linux such that ability to corrupt guest memory does not
> > automatically escalate into guest code execution.
>
> I am not sure if I fully follow you on this. I do agree that it is in principle
> the same 'hardening' that we have been doing in Linux for decades just
> applied to a new attack surface, host <-> guest, vs userspace <->kernel.

Sorry about being unclear this is not the type of hardening I meant
really. The "hardening" you meant is preventing kernel vulnerabilities,
right? This is what we've been doing for decades.
But I meant slightly newer things like e.g. KASLR or indeed ASLR generally -
we are trying to reduce a chance a vulnerability causes random
code execution as opposed to a DOS. To think in these terms you do not
need to think about attack surfaces - in the system including
a hypervisor, guest supervisor and guest userspace hiding
one component from others is helpful even if they share
a privelege level.

> Interfaces have changed, but the types of vulnerabilities, etc are the same.
> The attacker model is somewhat different because we have
> different expectations on what host/hypervisor should be able to do
> to the guest (following business reasons and use-cases), versus what we
> expect normal userspace being able to "do" towards kernel. The host and
> hypervisor still has a lot of control over the guest (ability to start/stop it,
> manage its resources, etc). But the reasons behind this doesn’t come
> from the fact that security CoCo HW not being able to support this stricter
> security model (it cannot now indeed, but this is a design decision), but
> from the fact that it is important for Cloud service providers to retain that
> level of control over their infrastructure.

Surely they need ability to control resource usage, not ability to execute DOS
attacks. Current hardware just does not have ability to allow the former
without the later.

> >
> > If you put it this way, you get to participate in a well understood
> > problem space instead of constantly saying "yes but CC is special". And
> > further, you will now talk about features as opposed to fixing bugs.
> > Which will stop annoying people who currently seem annoyed by the
> > implication that their code is buggy simply because it does not cache in
> > memory all data read from hardware. Finally, you then don't really need
> > to explain why e.g. DoS is not a problem but info leak is a problem - when
> > for many users it's actually the reverse - the reason is not that it's
> > not part of a threat model - which then makes you work hard to define
> > the threat model - but simply that CC hardware does not support this
> > kind of hardening.
>
> But this won't be correct statement, because it is not limitation of HW, but the
> threat and business model that Confidential Computing exists in. I am not
> aware of a single cloud provider who would be willing to use the HW that
> takes the full control of their infrastructure and running confidential guests,
> leaving them with no mechanisms to control the load balancing, enforce
> resource usage, etc. So, given that nobody needs/willing to use such HW,
> such HW simply doesn’t exist.
>
> So, I would still say that the model we operate in CoCo usecases is somewhat
> special, but I do agree that given that we list a couple of these special assumptions
> (over which ones we have no control or ability to influence, none of us are business
> people), then the rest becomes just careful enumeration of attack surface interfaces
> and break up of potential mitigations.
>
> Best Regards,
> Elena.
>

I'd say each business has a slightly different business model, no?
Finding common ground is what helps us share code ...

--
MST

RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]

elena.reshetova at intel

Jan 27, 2023, 4:15 AM

Post #53 of 103 (346 views)

Permalink

> On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrangé <berrange@redhat.com>
> wrote:
> > Any virtual device exposed to the guest that can transfer potentially
> > sensitive data needs to have some form of guest controlled encryption
> > applied. For disks this is easy with FDE like LUKS, for NICs this is
> > already best practice for services by using TLS. Other devices may not
> > have good existing options for applying encryption.
>
> I disagree wrt. LUKS. The cryptography behind LUKS protects persistent data
> but not transport. If an attacker can observe all IO you better
> consult a cryptographer.
> LUKS has no concept of session keys or such, so the same disk sector will
> always get encrypted with the very same key/iv.

I guess you are referring to the aes-xts-plain64 mode of LUKS operation or
to LUKS in general? Different modes of operation (including AEAD modes)
can provide different levels of protection, so I would not state it so generally.
But the point you raised is good to discuss through: XTS for example is a confidentiality mode,
based on a concept of tweakable blockcipher, designed as you pointed out
with disk encryption use case in mind. It does have a bunch of limitations/
weaknesses that are known (good classical reference I can suggest on this is [1]),
but as any blockcipher mode its confidentiality guarantees are evaluated in terms
of security against a chosen ciphertext attack (CCA) where an adversary has an access to both
encryption and decryption oracle (he can perform encryptions and decryptions
of plaintexts/cyphertexts of his liking up to the allowed number of queries).
This is a very powerful attack model which to me seems to cover the model
of untrusted host/VMM being able to observe disk reads/writes.

Also, if I remember right, the disk encryption also assumes that the disk operations are fully visible
to the attacker, i.e. he can see all encrypted data on the disk, observe how it changes
when a new block is written, etc. So, where do we have a change in an attacker model here?
What am I missing?

What AES XTS was never designed to do is an integrity protection (only some very limited
malleability): it is not AEAD mode, it doesn’t also provide a replay protection. So, the same
limitations are going to apply in our case also.

Best Regards,
Elena.

[1] Chapter 6. XTS mode, https://web.cs.ucdavis.edu/~rogaway/papers/modes.pdf

RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]

elena.reshetova at intel

Jan 27, 2023, 6:15 AM

Post #54 of 103 (346 views)

Permalink

> On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote:
> > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > And this is a very special aspect of 'hardening' since it is about hardening a
> > > kernel
> > > > under different threat model/assumptions.
> > >
> > > I am not sure it's that special in that hardening IMHO is not a specific
> > > threat model or a set of assumptions. IIUC it's just something that
> > > helps reduce severity of vulnerabilities. Similarly, one can use the CC
> > > hardware in a variety of ways I guess. And one way is just that -
> > > hardening linux such that ability to corrupt guest memory does not
> > > automatically escalate into guest code execution.
> >
> > I am not sure if I fully follow you on this. I do agree that it is in principle
> > the same 'hardening' that we have been doing in Linux for decades just
> > applied to a new attack surface, host <-> guest, vs userspace <->kernel.
>
> Sorry about being unclear this is not the type of hardening I meant
> really. The "hardening" you meant is preventing kernel vulnerabilities,
> right? This is what we've been doing for decades.
> But I meant slightly newer things like e.g. KASLR or indeed ASLR generally -
> we are trying to reduce a chance a vulnerability causes random
> code execution as opposed to a DOS. To think in these terms you do not
> need to think about attack surfaces - in the system including
> a hypervisor, guest supervisor and guest userspace hiding
> one component from others is helpful even if they share
> a privelege level.

Do you mean that the fact that CoCo guest has memory encrypted
can help even in non-CoCo scenarios? I am sorry, I still seem not to be able
to grasp your idea fully. When the privilege level is shared, there is no
incentive to perform privilege escalation attacks across components,
so why hide them from each other? Data protection? But I don’t think you
are talking about this? I do agree that KASLR is stronger when you remove
the possibility to read the memory (make sure kernel code is execute only)
you are trying to attack, but again not sure if you mean this.

>
>
>
> > Interfaces have changed, but the types of vulnerabilities, etc are the same.
> > The attacker model is somewhat different because we have
> > different expectations on what host/hypervisor should be able to do
> > to the guest (following business reasons and use-cases), versus what we
> > expect normal userspace being able to "do" towards kernel. The host and
> > hypervisor still has a lot of control over the guest (ability to start/stop it,
> > manage its resources, etc). But the reasons behind this doesn’t come
> > from the fact that security CoCo HW not being able to support this stricter
> > security model (it cannot now indeed, but this is a design decision), but
> > from the fact that it is important for Cloud service providers to retain that
> > level of control over their infrastructure.
>
> Surely they need ability to control resource usage, not ability to execute DOS
> attacks. Current hardware just does not have ability to allow the former
> without the later.

I don’t see why it cannot be added to HW if requirement comes. However, I think
in cloud provider world being able to control resources equals to being able
to deny these resources when required, so being able to denial of service its clients
is kind of build-in expectation that everyone just agrees on.

>
> > >
> > > If you put it this way, you get to participate in a well understood
> > > problem space instead of constantly saying "yes but CC is special". And
> > > further, you will now talk about features as opposed to fixing bugs.
> > > Which will stop annoying people who currently seem annoyed by the
> > > implication that their code is buggy simply because it does not cache in
> > > memory all data read from hardware. Finally, you then don't really need
> > > to explain why e.g. DoS is not a problem but info leak is a problem - when
> > > for many users it's actually the reverse - the reason is not that it's
> > > not part of a threat model - which then makes you work hard to define
> > > the threat model - but simply that CC hardware does not support this
> > > kind of hardening.
> >
> > But this won't be correct statement, because it is not limitation of HW, but the
> > threat and business model that Confidential Computing exists in. I am not
> > aware of a single cloud provider who would be willing to use the HW that
> > takes the full control of their infrastructure and running confidential guests,
> > leaving them with no mechanisms to control the load balancing, enforce
> > resource usage, etc. So, given that nobody needs/willing to use such HW,
> > such HW simply doesn’t exist.
> >
> > So, I would still say that the model we operate in CoCo usecases is somewhat
> > special, but I do agree that given that we list a couple of these special
> assumptions
> > (over which ones we have no control or ability to influence, none of us are
> business
> > people), then the rest becomes just careful enumeration of attack surface
> interfaces
> > and break up of potential mitigations.
> >
> > Best Regards,
> > Elena.
> >
>
> I'd say each business has a slightly different business model, no?
> Finding common ground is what helps us share code ...

Fully agree, and a good discussion with everyone willing to listen and cooperate
can go a long way into defining the best implementation.

Best Regards,
Elena.

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

mst at redhat

Jan 27, 2023, 8:15 AM

Post #55 of 103 (346 views)

Permalink

On Fri, Jan 27, 2023 at 12:25:09PM +0000, Reshetova, Elena wrote:
>
> > On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote:
> > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > > And this is a very special aspect of 'hardening' since it is about hardening a
> > > > kernel
> > > > > under different threat model/assumptions.
> > > >
> > > > I am not sure it's that special in that hardening IMHO is not a specific
> > > > threat model or a set of assumptions. IIUC it's just something that
> > > > helps reduce severity of vulnerabilities. Similarly, one can use the CC
> > > > hardware in a variety of ways I guess. And one way is just that -
> > > > hardening linux such that ability to corrupt guest memory does not
> > > > automatically escalate into guest code execution.
> > >
> > > I am not sure if I fully follow you on this. I do agree that it is in principle
> > > the same 'hardening' that we have been doing in Linux for decades just
> > > applied to a new attack surface, host <-> guest, vs userspace <->kernel.
> >
> > Sorry about being unclear this is not the type of hardening I meant
> > really. The "hardening" you meant is preventing kernel vulnerabilities,
> > right? This is what we've been doing for decades.
> > But I meant slightly newer things like e.g. KASLR or indeed ASLR generally -
> > we are trying to reduce a chance a vulnerability causes random
> > code execution as opposed to a DOS. To think in these terms you do not
> > need to think about attack surfaces - in the system including
> > a hypervisor, guest supervisor and guest userspace hiding
> > one component from others is helpful even if they share
> > a privelege level.
>
> Do you mean that the fact that CoCo guest has memory encrypted
> can help even in non-CoCo scenarios?

Yes.

> I am sorry, I still seem not to be able
> to grasp your idea fully. When the privilege level is shared, there is no
> incentive to perform privilege escalation attacks across components,
> so why hide them from each other?

Because limiting horisontal movement between components is still valuable.

> Data protection? But I don’t think you
> are talking about this? I do agree that KASLR is stronger when you remove
> the possibility to read the memory (make sure kernel code is execute only)
> you are trying to attack, but again not sure if you mean this.

It's an example. If kernel was 100% secure we won't need KASLR. Nothing
ever is though.

> >
> >
> >
> > > Interfaces have changed, but the types of vulnerabilities, etc are the same.
> > > The attacker model is somewhat different because we have
> > > different expectations on what host/hypervisor should be able to do
> > > to the guest (following business reasons and use-cases), versus what we
> > > expect normal userspace being able to "do" towards kernel. The host and
> > > hypervisor still has a lot of control over the guest (ability to start/stop it,
> > > manage its resources, etc). But the reasons behind this doesn’t come
> > > from the fact that security CoCo HW not being able to support this stricter
> > > security model (it cannot now indeed, but this is a design decision), but
> > > from the fact that it is important for Cloud service providers to retain that
> > > level of control over their infrastructure.
> >
> > Surely they need ability to control resource usage, not ability to execute DOS
> > attacks. Current hardware just does not have ability to allow the former
> > without the later.
>
> I don’t see why it cannot be added to HW if requirement comes. However, I think
> in cloud provider world being able to control resources equals to being able
> to deny these resources when required, so being able to denial of service its clients
> is kind of build-in expectation that everyone just agrees on.
>
> >
> > > >
> > > > If you put it this way, you get to participate in a well understood
> > > > problem space instead of constantly saying "yes but CC is special". And
> > > > further, you will now talk about features as opposed to fixing bugs.
> > > > Which will stop annoying people who currently seem annoyed by the
> > > > implication that their code is buggy simply because it does not cache in
> > > > memory all data read from hardware. Finally, you then don't really need
> > > > to explain why e.g. DoS is not a problem but info leak is a problem - when
> > > > for many users it's actually the reverse - the reason is not that it's
> > > > not part of a threat model - which then makes you work hard to define
> > > > the threat model - but simply that CC hardware does not support this
> > > > kind of hardening.
> > >
> > > But this won't be correct statement, because it is not limitation of HW, but the
> > > threat and business model that Confidential Computing exists in. I am not
> > > aware of a single cloud provider who would be willing to use the HW that
> > > takes the full control of their infrastructure and running confidential guests,
> > > leaving them with no mechanisms to control the load balancing, enforce
> > > resource usage, etc. So, given that nobody needs/willing to use such HW,
> > > such HW simply doesn’t exist.
> > >
> > > So, I would still say that the model we operate in CoCo usecases is somewhat
> > > special, but I do agree that given that we list a couple of these special
> > assumptions
> > > (over which ones we have no control or ability to influence, none of us are
> > business
> > > people), then the rest becomes just careful enumeration of attack surface
> > interfaces
> > > and break up of potential mitigations.
> > >
> > > Best Regards,
> > > Elena.
> > >
> >
> > I'd say each business has a slightly different business model, no?
> > Finding common ground is what helps us share code ...
>
> Fully agree, and a good discussion with everyone willing to listen and cooperate
> can go a long way into defining the best implementation.
>
> Best Regards,
> Elena.

Right. My point was that trying to show how CC usecases are similar to other
existing ones will be more helpful for everyone than just focusing on how they
are different. I hope I was able to show some similarities.

--
MST

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

tytso at mit

Jan 27, 2023, 10:15 AM

Post #56 of 103 (346 views)

Permalink

On Thu, Jan 26, 2023 at 01:28:15PM +0000, Reshetova, Elena wrote:
> > This is exactly what I said. You presented me the cases which exist in
> > your invented world. Mentioned unhandled page fault doesn't exist in real
> > world. If PCI device doesn't work, it needs to be replaced/blocked and not
> > left to be operable and accessible from the kernel/user.
>
> Can we really assure correct operation of *all* pci devices out there?
> How would such an audit be performed given a huge set of them available?
> Isnt it better instead to make a small fix in the kernel behavior that would guard
> us from such potentially not correctly operating devices?

We assume that hardware works according to the spec; that's why we
have a specification. Otherwise, things would be pretty insane, and
would lead to massive bloat *everywhere*. If there are broken PCI
devices out there, then we can blacklist the PCI device. If a
manufacturer is consistently creating devices which don't obey the
spec, we could block all devices from that manufacturer, and have an
explicit white list for those devices from that manufacturer that
actually work.

If we can't count on a floating point instruction to return the right
value, what are we supposed to do? Create a code which double checks
every single floating point instruction just in case 2 + 2 = 3.99999999? :-)

Ultimately, changing the trust boundary what is considered is a
fundamentally hard thing, and to try to claim that code that assumes
that things inside the trust boundary are, well, trusted, is not a
great way to win friends and influence people.

> Let's forget the trust angle here (it only applies to the Confidential Computing
> threat model and you clearly implying the existing threat model instead) and stick just to
> the not-correctly operating device. What you are proposing is to fix *unknown* bugs
> in multitude of pci devices that (in case of this particular MSI bug) can
> lead to two different values being read from the config space and kernel incorrectly
> handing this situation.

I don't think that's what people are saying. If there are buggy PCI
devices, we can put them on block lists. But checking that every
single read from the config space is unchanged is not something we
should do, period.

> Isn't it better to do the clear fix in one place to ensure such
> situation (two subsequent reads with different values) cannot even happen in theory?
> In security we have a saying that fixing a root cause of the problem is the most efficient
> way to mitigate the problem. The root cause here is a double-read with different values,
> so if it can be substituted with an easy and clear patch that probably even improves
> performance as we do one less pci read and use cached value instead, where is the
> problem in this particular case? If there are technical issues with the patch, of course we
> need to discuss it/fix it, but it seems we are arguing here about whenever or not we want
> to be fixing kernel code when we notice such cases...

Well, if there is a performance win to cache a read from config space,
then make the argument from a performance perspective. But caching
values takes memory, and will potentially bloat data structures. It's
not necessarily cost-free to caching every single config space
variable to prevent double-read from either buggy or malicious devices.

So it's one thing if we make each decision from a cost-benefit
perspective. But then it's a *optimization*, not a *bug-fix*, and it
also means that we aren't obligated to cache every single read from
config space, lest someone wag their fingers at us saying, "Buggy!
Your code is Buggy!".

Cheers,

- Ted

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

jejb at linux

Jan 27, 2023, 12:15 PM

Post #57 of 103 (346 views)

Permalink

On Thu, 2023-01-26 at 13:28 +0000, Reshetova, Elena wrote:
> > On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena
> > > > wrote:
> > > > > Replying only to the not-so-far addressed points.
> > > > >
> > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena
> > > > > > wrote:
> > > > > > > Hi Greg,
> > > >
> > > > <...>
> > > >
> > > > > > > 3) All the tools are open-source and everyone can start
> > > > > > > using them right away even without any special HW (readme
> > > > > > > has description of what is needed).
> > > > > > > Tools and documentation is here:
> > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > >
> > > > > > Again, as our documentation states, when you submit patches
> > > > > > based on these tools, you HAVE TO document that. Otherwise
> > > > > > we think you all are crazy and will get your patches
> > > > > > rejected. You all know this, why ignore it?
> > > > >
> > > > > Sorry, I didn’t know that for every bug that is found in
> > > > > linux kernel when we are submitting a fix that we have to
> > > > > list the way how it has been found. We will fix this in the
> > > > > future submissions, but some bugs we have are found by
> > > > > plain code audit, so 'human' is the tool.
> > > > My problem with that statement is that by applying different
> > > > threat model you "invent" bugs which didn't exist in a first
> > > > place.
> > > >
> > > > For example, in this [1] latest submission, authors labeled
> > > > correct behaviour as "bug".
> > > >
> > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > > alexander.shishkin@linux.intel.com/
> > >
> > > Hm.. Does everyone think that when kernel dies with unhandled
> > > page fault (such as in that case) or detection of a KASAN out of
> > > bounds violation (as it is in some other cases we already have
> > > fixes or investigating) it represents a correct behavior even if
> > > you expect that all your pci HW devices are trusted?
> >
> > This is exactly what I said. You presented me the cases which exist
> > in your invented world. Mentioned unhandled page fault doesn't
> > exist in real world. If PCI device doesn't work, it needs to be
> > replaced/blocked and not left to be operable and accessible from
> > the kernel/user.
>
> Can we really assure correct operation of *all* pci devices out
> there? How would such an audit be performed given a huge set of them
> available? Isnt it better instead to make a small fix in the kernel
> behavior that would guard us from such potentially not correctly
> operating devices?

I think this is really the wrong question from the confidential
computing (CC) point of view. The question shouldn't be about assuring
that the PCI device is operating completely correctly all the time (for
some value of correct). It's if it were programmed to be malicious
what could it do to us? If we take all DoS and Crash outcomes off the
table (annoying but harmless if they don't reveal the confidential
contents), we're left with it trying to extract secrets from the
confidential environment.

The big threat from most devices (including the thunderbolt classes) is
that they can DMA all over memory. However, this isn't really a threat
in CC (well until PCI becomes able to do encrypted DMA) because the
device has specific unencrypted buffers set aside for the expected DMA.
If it writes outside that CC integrity will detect it and if it reads
outside that it gets unintelligible ciphertext. So we're left with the
device trying to trick secrets out of us by returning unexpected data.

If I set this as the problem, verifying device correct operation is a
possible solution (albeit hugely expensive), but there are likely many
other cheaper ways to defeat or detect a device trying to trick us into
revealing something.

James

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

carlos.bilbao at amd

Jan 27, 2023, 2:15 PM

Post #58 of 103 (346 views)

Permalink

On 1/27/23 6:25 AM, Reshetova, Elena wrote:
>
>> On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote:
>>>> On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
>>>>> And this is a very special aspect of 'hardening' since it is about hardening a
>>>> kernel
>>>>> under different threat model/assumptions.
>>>>
>>>> I am not sure it's that special in that hardening IMHO is not a specific
>>>> threat model or a set of assumptions. IIUC it's just something that
>>>> helps reduce severity of vulnerabilities. Similarly, one can use the CC
>>>> hardware in a variety of ways I guess. And one way is just that -
>>>> hardening linux such that ability to corrupt guest memory does not
>>>> automatically escalate into guest code execution.
>>>
>>> I am not sure if I fully follow you on this. I do agree that it is in principle
>>> the same 'hardening' that we have been doing in Linux for decades just
>>> applied to a new attack surface, host <-> guest, vs userspace <->kernel.
>>
>> Sorry about being unclear this is not the type of hardening I meant
>> really. The "hardening" you meant is preventing kernel vulnerabilities,
>> right? This is what we've been doing for decades.
>> But I meant slightly newer things like e.g. KASLR or indeed ASLR generally -
>> we are trying to reduce a chance a vulnerability causes random
>> code execution as opposed to a DOS. To think in these terms you do not
>> need to think about attack surfaces - in the system including
>> a hypervisor, guest supervisor and guest userspace hiding
>> one component from others is helpful even if they share
>> a privelege level.
>
> Do you mean that the fact that CoCo guest has memory encrypted
> can help even in non-CoCo scenarios? I am sorry, I still seem not to be able
> to grasp your idea fully. When the privilege level is shared, there is no
> incentive to perform privilege escalation attacks across components,
> so why hide them from each other? Data protection? But I don’t think you
> are talking about this? I do agree that KASLR is stronger when you remove
> the possibility to read the memory (make sure kernel code is execute only)
> you are trying to attack, but again not sure if you mean this.
>
>>
>>
>>
>>> Interfaces have changed, but the types of vulnerabilities, etc are the same.
>>> The attacker model is somewhat different because we have
>>> different expectations on what host/hypervisor should be able to do
>>> to the guest (following business reasons and use-cases), versus what we
>>> expect normal userspace being able to "do" towards kernel. The host and
>>> hypervisor still has a lot of control over the guest (ability to start/stop it,
>>> manage its resources, etc). But the reasons behind this doesn’t come
>>> from the fact that security CoCo HW not being able to support this stricter
>>> security model (it cannot now indeed, but this is a design decision), but
>>> from the fact that it is important for Cloud service providers to retain that
>>> level of control over their infrastructure.
>>
>> Surely they need ability to control resource usage, not ability to execute DOS
>> attacks. Current hardware just does not have ability to allow the former
>> without the later.
>
> I don’t see why it cannot be added to HW if requirement comes. However, I think
> in cloud provider world being able to control resources equals to being able
> to deny these resources when required, so being able to denial of service its clients
> is kind of build-in expectation that everyone just agrees on.
>

Just a thought, but I wouldn't discard availability guarantees like that
at some point. As a client I would certainly like it, and if it's good
for business...

>>
>>>>
>>>> If you put it this way, you get to participate in a well understood
>>>> problem space instead of constantly saying "yes but CC is special". And
>>>> further, you will now talk about features as opposed to fixing bugs.
>>>> Which will stop annoying people who currently seem annoyed by the
>>>> implication that their code is buggy simply because it does not cache in
>>>> memory all data read from hardware. Finally, you then don't really need
>>>> to explain why e.g. DoS is not a problem but info leak is a problem - when
>>>> for many users it's actually the reverse - the reason is not that it's
>>>> not part of a threat model - which then makes you work hard to define
>>>> the threat model - but simply that CC hardware does not support this
>>>> kind of hardening.
>>>
>>> But this won't be correct statement, because it is not limitation of HW, but the
>>> threat and business model that Confidential Computing exists in. I am not
>>> aware of a single cloud provider who would be willing to use the HW that
>>> takes the full control of their infrastructure and running confidential guests,
>>> leaving them with no mechanisms to control the load balancing, enforce
>>> resource usage, etc. So, given that nobody needs/willing to use such HW,
>>> such HW simply doesn’t exist.
>>>
>>> So, I would still say that the model we operate in CoCo usecases is somewhat
>>> special, but I do agree that given that we list a couple of these special
>> assumptions
>>> (over which ones we have no control or ability to influence, none of us are
>> business
>>> people), then the rest becomes just careful enumeration of attack surface
>> interfaces
>>> and break up of potential mitigations.
>>>
>>> Best Regards,
>>> Elena.
>>>
>>
>> I'd say each business has a slightly different business model, no?
>> Finding common ground is what helps us share code ...
>
> Fully agree, and a good discussion with everyone willing to listen and cooperate
> can go a long way into defining the best implementation.
>
> Best Regards,
> Elena.

Thanks for sharing the threat model with the list!

Carlos

RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]

elena.reshetova at intel

Jan 30, 2023, 12:15 AM

Post #59 of 103 (346 views)

Permalink

On Thu, 2023-01-26 at 13:28 +0000, Reshetova, Elena wrote:
> > > On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena
> > > > > wrote:
> > > > > > Replying only to the not-so-far addressed points.
> > > > > >
> > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena
> > > > > > > wrote:
> > > > > > > > Hi Greg,
> > > > >
> > > > > <...>
> > > > >
> > > > > > > > 3) All the tools are open-source and everyone can start
> > > > > > > > using them right away even without any special HW (readme
> > > > > > > > has description of what is needed).
> > > > > > > > Tools and documentation is here:
> > > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > > >
> > > > > > > Again, as our documentation states, when you submit patches
> > > > > > > based on these tools, you HAVE TO document that. Otherwise
> > > > > > > we think you all are crazy and will get your patches
> > > > > > > rejected. You all know this, why ignore it?
> > > > > >
> > > > > > Sorry, I didn’t know that for every bug that is found in
> > > > > > linux kernel when we are submitting a fix that we have to
> > > > > > list the way how it has been found. We will fix this in the
> > > > > > future submissions, but some bugs we have are found by
> > > > > > plain code audit, so 'human' is the tool.
> > > > > My problem with that statement is that by applying different
> > > > > threat model you "invent" bugs which didn't exist in a first
> > > > > place.
> > > > >
> > > > > For example, in this [1] latest submission, authors labeled
> > > > > correct behaviour as "bug".
> > > > >
> > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > > > alexander.shishkin@linux.intel.com/
> > > >
> > > > Hm.. Does everyone think that when kernel dies with unhandled
> > > > page fault (such as in that case) or detection of a KASAN out of
> > > > bounds violation (as it is in some other cases we already have
> > > > fixes or investigating) it represents a correct behavior even if
> > > > you expect that all your pci HW devices are trusted?
> > >
> > > This is exactly what I said. You presented me the cases which exist
> > > in your invented world. Mentioned unhandled page fault doesn't
> > > exist in real world. If PCI device doesn't work, it needs to be
> > > replaced/blocked and not left to be operable and accessible from
> > > the kernel/user.
> >
> > Can we really assure correct operation of *all* pci devices out
> > there? How would such an audit be performed given a huge set of them
> > available? Isnt it better instead to make a small fix in the kernel
> > behavior that would guard us from such potentially not correctly
> > operating devices?
>
> I think this is really the wrong question from the confidential
> computing (CC) point of view. The question shouldn't be about assuring
> that the PCI device is operating completely correctly all the time (for
> some value of correct). It's if it were programmed to be malicious
> what could it do to us?

Sure, but Leon didn’t agree with CC threat model to begin with, so
I was trying to argue here how this fix can be useful for non-CC threat
model case. But obviously my argument for non-CC case wasn't good (
especially reading Ted's reply here
https://lore.kernel.org/all/Y9Lonw9HzlosUPnS@mit.edu/ ), so I better
stick to CC threat model case indeed.

>If we take all DoS and Crash outcomes off the
> table (annoying but harmless if they don't reveal the confidential
> contents), we're left with it trying to extract secrets from the
> confidential environment.

Yes, this is the ultimate end goal.

>
> The big threat from most devices (including the thunderbolt classes) is
> that they can DMA all over memory. However, this isn't really a threat
> in CC (well until PCI becomes able to do encrypted DMA) because the
> device has specific unencrypted buffers set aside for the expected DMA.
> If it writes outside that CC integrity will detect it and if it reads
> outside that it gets unintelligible ciphertext. So we're left with the
> device trying to trick secrets out of us by returning unexpected data.

Yes, by supplying the input that hasn’t been expected. This is exactly
the case we were trying to fix here for example:
https://lore.kernel.org/all/20230119170633.40944-2-alexander.shishkin@linux.intel.com/
I do agree that this case is less severe when others where memory
corruption/buffer overrun can happen, like here:
https://lore.kernel.org/all/20230119135721.83345-6-alexander.shishkin@linux.intel.com/
But we are trying to fix all issues we see now (prioritizing the second ones
though).

>
> If I set this as the problem, verifying device correct operation is a
> possible solution (albeit hugely expensive) but there are likely many
> other cheaper ways to defeat or detect a device trying to trick us into
> revealing something.

What do you have in mind here for the actual devices we need to enable for CC cases?
We have been using here a combination of extensive fuzzing and static code analysis.

Best Regards,
Elena.

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

dinechin at redhat

Jan 30, 2023, 4:15 AM

Post #60 of 103 (346 views)

Permalink

On 2023-01-25 at 14:13 UTC, Daniel P. Berrangé <berrange@redhat.com> wrote...
> On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote:
>> * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
>> > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
>> > > Hi Greg,
>> > >
>> > > You mentioned couple of times (last time in this recent thread:
>> > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
>> > > discussing the updated threat model for kernel, so this email is a start in this direction.
>> >
>> > Any specific reason you didn't cc: the linux-hardening mailing list?
>> > This seems to be in their area as well, right?
>> >
>> > > As we have shared before in various lkml threads/conference presentations
>> > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
>> > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
>> >
>> > That is, frankly, a very funny threat model. How realistic is it really
>> > given all of the other ways that a hypervisor can mess with a guest?
>>
>> It's what a lot of people would like; in the early attempts it was easy
>> to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it
>> can mess with - remember that not just the memory is encrypted, so is
>> the register state, and the guest gets to see changes to mapping and a
>> lot of control over interrupt injection etc.
>>
>> > So what do you actually trust here? The CPU? A device? Nothing?
>>
>> We trust the actual physical CPU, provided that it can prove that it's a
>> real CPU with the CoCo hardware enabled. Both the SNP and TDX hardware
>> can perform an attestation signed by the CPU to prove to someone
>> external that the guest is running on a real trusted CPU.
>>
>> Note that the trust is limited:
>> a) We don't trust that we can make forward progress - if something
>> does something bad it's OK for the guest to stop.
>> b) We don't trust devices, and we don't trust them by having the guest
>> do normal encryption; e.g. just LUKS on the disk and normal encrypted
>> networking. [There's a lot of schemes people are working on about how
>> the guest gets the keys etc for that)
>
> I think we need to more precisely say what we mean by 'trust' as it
> can have quite a broad interpretation.
>
> As a baseline requirement, in the context of confidential computing the
> guest would not trust the hypervisor with data that needs to remain
> confidential, but would generally still expect it to provide a faithful
> implementation of a given device.

... or to have a reliable faulting behaviour (e.g. panic) if the device is
found to be malicious, e.g. attempting to inject bogus data in the driver to
trigger unexpected paths in the guest kernel.

I think that part of the original discussion is really about being able to
do that at least for the small subset of (mostly virtio) devices that would
typically be of use in a CoCo setup.

As was pointed out elsewhere in that thread, doing so for physical devices,
to the point of enabling end-to-end attestation and encryption, is work that
is presently underway, but there is work to do already with the
comparatively small subset of devices we need in the short-term. Also, that
work needs only the Linux kernel community, whereas changes for example at
the PCI level are much broader, and therefore require a lot more time.

--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

dinechin at redhat

Jan 30, 2023, 4:15 AM

Post #61 of 103 (346 views)

Permalink

Hi Elena,

On 2023-01-25 at 12:28 UTC, "Reshetova, Elena" <elena.reshetova@intel.com> wrote...
> Hi Greg,
>
> You mentioned couple of times (last time in this recent thread:
> https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> discussing the updated threat model for kernel, so this email is a start in this direction.
>
> (Note: I tried to include relevant people from different companies, as well as linux-coco
> mailing list, but I hope everyone can help by including additional people as needed).
>
> As we have shared before in various lkml threads/conference presentations
> ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
> This is a big change in the threat model and requires both careful assessment of the
> new (hypervisor <-> guest kernel) attack surface, as well as careful design of mitigations
> and security validation techniques. This is the activity that we have started back at Intel
> and the current status can be found in
>
> 1) Threat model and potential mitigations:
> https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html

I only looked at this one so far. Here are a few quick notes:

DoS attacks are out of scope. What about timing attacks, which were the
basis of some of the most successful attacks in the past years? My
understanding is that TDX relies on existing mitigations, and does not
introduce anythign new in that space. Worth mentioning in that "out of
scope" section IMO.

Why are TDVMCALL hypercalls listed as an "existing" communication interface?
That seems to exclude the TDX module from the TCB. Also, "shared memory for
I/Os" seems unnecessarily restrictive, since it excludes interrupts, timing
attacks, network or storage attacks, or devices passed through to the guest.
The latter category seems important to list, since there are separate
efforts to provide confidential computing capabilities e.g. to PCI devices,
which were discussed elsewhere in this thread.

I suspect that my question above is due to ambiguous wording. What I
initially read as "this is out of scope for TDX" morphs in the next
paragraph into "we are going to explain how to mitigate attacks through
TDVMCALLS and shared memory for I/O". Consider rewording to clarify the
intent of these paragraphs.

Nit: I suggest adding bullets to the items below "between host/VMM and the
guest"

You could count the "unique code locations" that can consume malicious input
in drivers, why not in core kernel? I think you write elsewhere that the
drivers account for the vast majority, so I suspect you have the numbers.

"The implementation of the #VE handler is simple and does not require an
in-depth security audit or fuzzing since it is not the actual consumer of
the host/VMM supplied untrusted data": The assumption there seems to be that
the host will never be able to supply data (e.g. through a bounce buffer)
that it can trick the guest into executing. If that is indeed the
assumption, it is worth mentioning explicitly. I suspect it is a bit weak,
since many earlier attacks were based on executing the wrong code. Notably,
it is worth pointing out that I/O buffers are _not_ encrypted with the CPU
key (as opposed to any device key e.g. for PCI encryption) in either
TDX or SEV. Is there for example anything that precludes TDX or SEV from
executing code in the bounce buffers?

"We only care about users that read from MMIO": Why? My guess is that this
is the only way bad data could be fed to the guest. But what if a bad MMIO
write due to poisoned data injected earlier was a necessary step to open the
door to a successful attack?

>
> 2) One of the described in the above doc mitigations is "hardening of the enabled
> code". What we mean by this, as well as techniques that are being used are
> described in this document:
> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html
>
> 3) All the tools are open-source and everyone can start using them right away even
> without any special HW (readme has description of what is needed).
> Tools and documentation is here:
> https://github.com/intel/ccc-linux-guest-hardening
>
> 4) all not yet upstreamed linux patches (that we are slowly submitting) can be found
> here: https://github.com/intel/tdx/commits/guest-next
>
> So, my main question before we start to argue about the threat model, mitigations, etc,
> is what is the good way to get this reviewed to make sure everyone is aligned?
> There are a lot of angles and details, so what is the most efficient method?
> Should I split the threat model from https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
> into logical pieces and start submitting it to mailing list for discussion one by one?
> Any other methods?
>
> The original plan we had in mind is to start discussing the relevant pieces when submitting the code,
> i.e. when submitting the device filter patches, we will include problem statement, threat model link,
> data, alternatives considered, etc.
>
> Best Regards,
> Elena.
>
> [1] https://lore.kernel.org/all/20210804174322.2898409-1-sathyanarayanan.kuppuswamy@linux.intel.com/
> [2] https://lpc.events/event/16/contributions/1328/
> [3] https://events.linuxfoundation.org/archive/2022/linux-security-summit-north-america/program/schedule/

--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

kirill at shutemov

Jan 30, 2023, 4:15 AM

Post #62 of 103 (346 views)

Permalink

On Mon, Jan 30, 2023 at 12:36:34PM +0100, Christophe de Dinechin wrote:
> Is there for example anything that precludes TDX or SEV from executing
> code in the bounce buffers?

In TDX, attempt to fetch instructions from shared memory (i.e. bounce
buffer) will cause #GP, only data fetch is allowed. Page table also cannot
be placed there and will cause the same #GP.

--
Kiryl Shutsemau / Kirill A. Shutemov

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

jejb at linux

Jan 30, 2023, 6:15 AM

Post #63 of 103 (346 views)

Permalink

On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
[...]
> > The big threat from most devices (including the thunderbolt
> > classes) is that they can DMA all over memory. However, this isn't
> > really a threat in CC (well until PCI becomes able to do encrypted
> > DMA) because the device has specific unencrypted buffers set aside
> > for the expected DMA. If it writes outside that CC integrity will
> > detect it and if it reads outside that it gets unintelligible
> > ciphertext. So we're left with the device trying to trick secrets
> > out of us by returning unexpected data.
>
> Yes, by supplying the input that hasn’t been expected. This is
> exactly the case we were trying to fix here for example:
> https://lore.kernel.org/all/20230119170633.40944-2-alexander.shishkin@linux.intel.com/
> I do agree that this case is less severe when others where memory
> corruption/buffer overrun can happen, like here:
> https://lore.kernel.org/all/20230119135721.83345-6-alexander.shishkin@linux.intel.com/
> But we are trying to fix all issues we see now (prioritizing the
> second ones though).

I don't see how MSI table sizing is a bug in the category we've
defined. The very text of the changelog says "resulting in a kernel
page fault in pci_write_msg_msix()." which is a crash, which I thought
we were agreeing was out of scope for CC attacks?

> >
> > If I set this as the problem, verifying device correct operation is
> > a possible solution (albeit hugely expensive) but there are likely
> > many other cheaper ways to defeat or detect a device trying to
> > trick us into revealing something.
>
> What do you have in mind here for the actual devices we need to
> enable for CC cases?

Well, the most dangerous devices seem to be the virtio set a CC system
will rely on to boot up. After that, there are other ways (like SPDM)
to verify a real PCI device is on the other end of the transaction.

> We have been using here a combination of extensive fuzzing and static
> code analysis.

by fuzzing, I assume you mean fuzzing from the PCI configuration space?
Firstly I'm not so sure how useful a tool fuzzing is if we take Oopses
off the table because fuzzing primarily triggers those so its hard to
see what else it could detect given the signal will be smothered by
oopses and secondly I think the PCI interface is likely the wrong place
to begin and you should probably begin on the virtio bus and the
hypervisor generated configuration space.

James

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

mst at redhat

Jan 30, 2023, 8:15 AM

Post #64 of 103 (346 views)

Permalink

On Mon, Jan 30, 2023 at 03:00:52PM +0300, Kirill A. Shutemov wrote:
> On Mon, Jan 30, 2023 at 12:36:34PM +0100, Christophe de Dinechin wrote:
> > Is there for example anything that precludes TDX or SEV from executing
> > code in the bounce buffers?
>
> In TDX, attempt to fetch instructions from shared memory (i.e. bounce
> buffer) will cause #GP, only data fetch is allowed. Page table also cannot
> be placed there and will cause the same #GP.

Same with SEV IIRC.

--
MST

RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]

elena.reshetova at intel

Jan 31, 2023, 2:15 AM

Post #65 of 103 (346 views)

Permalink

Hi Dinechin,

Thank you very much for your review! Please find the replies inline.

>
> Hi Elena,
>
> On 2023-01-25 at 12:28 UTC, "Reshetova, Elena" <elena.reshetova@intel.com>
> wrote...
> > Hi Greg,
> >
> > You mentioned couple of times (last time in this recent thread:
> > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to
> start
> > discussing the updated threat model for kernel, so this email is a start in this
> direction.
> >
> > (Note: I tried to include relevant people from different companies, as well as
> linux-coco
> > mailing list, but I hope everyone can help by including additional people as
> needed).
> >
> > As we have shared before in various lkml threads/conference presentations
> > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we
> have a
> > change in the threat model where guest kernel doesn’t anymore trust the
> hypervisor.
> > This is a big change in the threat model and requires both careful assessment of
> the
> > new (hypervisor <-> guest kernel) attack surface, as well as careful design of
> mitigations
> > and security validation techniques. This is the activity that we have started back
> at Intel
> > and the current status can be found in
> >
> > 1) Threat model and potential mitigations:
> > https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
>
> I only looked at this one so far. Here are a few quick notes:
>
> DoS attacks are out of scope. What about timing attacks, which were the
> basis of some of the most successful attacks in the past years? My
> understanding is that TDX relies on existing mitigations, and does not
> introduce anythign new in that space. Worth mentioning in that "out of
> scope" section IMO.

It is not out of the scope because TD guest SW has to think about these
matters and protect adequately. We have a section lower on " Transient Execution attacks
mitigation" https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#transient-execution-attacks-and-their-mitigation
but I agree it is worth pointing to this (and generic side-channel attacks) already
in the scoping. I will make an update.

>
> Why are TDVMCALL hypercalls listed as an "existing" communication interface?
> That seems to exclude the TDX module from the TCB.

I believe this is just ambiguous wording, I need to find a better one.
TDVMCALL is indeed a *new* TDX specific communication interface, but it is
only a transport in this case for the actual *existing* legacy communication interfaces
between the VM guest and host/hypervisor (read/write MSRs, pci config space
access, port IO and MMIO, etc).

Also, "shared memory for
> I/Os" seems unnecessarily restrictive, since it excludes interrupts, timing
> attacks, network or storage attacks, or devices passed through to the guest.
> The latter category seems important to list, since there are separate
> efforts to provide confidential computing capabilities e.g. to PCI devices,
> which were discussed elsewhere in this thread.

The second bullet meant to say that we also have another interface how CoCo guest
and host/VMM can communicate and it is done via shared pages (vs private pages that
are only accessible to confidential computing guest). Maybe I should drop the "IO" part of
this and it would avoid confusion. The other means (some are higher-level abstractions
like disk operations that happen over bounce buffer in shared memory), like interrupts, disk, etc,
we do cover below in separate sections of the doc with exception of covering
CoCo-enabled devices. This is smth we can briefly mention as an addition, but since
we don’t have these devices yet, and neither we have linux implementation that
can securely add them to the CoCo guest, I find it preliminary to discuss details at this point.

> I suspect that my question above is due to ambiguous wording. What I
> initially read as "this is out of scope for TDX" morphs in the next
> paragraph into "we are going to explain how to mitigate attacks through
> TDVMCALLS and shared memory for I/O". Consider rewording to clarify the
> intent of these paragraphs.
>

Sure, sorry for ambiguous wording, will try to clarify.

> Nit: I suggest adding bullets to the items below "between host/VMM and the
> guest"

Yes, it used to have it actually, have to see what happened with recent docs update.

>
> You could count the "unique code locations" that can consume malicious input
> in drivers, why not in core kernel? I think you write elsewhere that the
> drivers account for the vast majority, so I suspect you have the numbers.

I don’t have the ready numbers for core kernel, but if really needed, I can calculate them.
Here https://github.com/intel/ccc-linux-guest-hardening/tree/master/bkc/audit/sample_output/6.0-rc2
you can find the public files that would produce this data:

https://github.com/intel/ccc-linux-guest-hardening/blob/master/bkc/audit/sample_output/6.0-rc2/smatch_warns_6.0_tdx_allyesconfig
is all hits (with taint propagation) for the whole allyesconfig (x86 build, CONFIG_COMPILE_TEST is off).
https://github.com/intel/ccc-linux-guest-hardening/blob/master/bkc/audit/sample_output/6.0-rc2/smatch_warns_6.0_tdx_allyesconfig_filtered
is the same but with most of the drivers dropped.

>
> "The implementation of the #VE handler is simple and does not require an
> in-depth security audit or fuzzing since it is not the actual consumer of
> the host/VMM supplied untrusted data": The assumption there seems to be that
> the host will never be able to supply data (e.g. through a bounce buffer)
> that it can trick the guest into executing. If that is indeed the
> assumption, it is worth mentioning explicitly. I suspect it is a bit weak,
> since many earlier attacks were based on executing the wrong code. Notably,
> it is worth pointing out that I/O buffers are _not_ encrypted with the CPU
> key (as opposed to any device key e.g. for PCI encryption) in either
> TDX or SEV. Is there for example anything that precludes TDX or SEV from
> executing code in the bounce buffers?

This was already replied by Kirill, any code execution out of shared memory generates
a #GP.

>
> "We only care about users that read from MMIO": Why? My guess is that this
> is the only way bad data could be fed to the guest. But what if a bad MMIO
> write due to poisoned data injected earlier was a necessary step to open the
> door to a successful attack?

The entry point of the attack is still a "read". The situation you describe can happen,
but the root cause would be still an incorrectly handled MMIO read and this is what
we try to check with both fuzzing and auditing the 'read' entry points.

Thank you again for the review!

Best Regards,
Elena.

RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]

elena.reshetova at intel

Jan 31, 2023, 4:15 AM

Post #66 of 103 (346 views)

Permalink

> On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> [...]
> > > The big threat from most devices (including the thunderbolt
> > > classes) is that they can DMA all over memory. However, this isn't
> > > really a threat in CC (well until PCI becomes able to do encrypted
> > > DMA) because the device has specific unencrypted buffers set aside
> > > for the expected DMA. If it writes outside that CC integrity will
> > > detect it and if it reads outside that it gets unintelligible
> > > ciphertext. So we're left with the device trying to trick secrets
> > > out of us by returning unexpected data.
> >
> > Yes, by supplying the input that hasn’t been expected. This is
> > exactly the case we were trying to fix here for example:
> > https://lore.kernel.org/all/20230119170633.40944-2-
> alexander.shishkin@linux.intel.com/
> > I do agree that this case is less severe when others where memory
> > corruption/buffer overrun can happen, like here:
> > https://lore.kernel.org/all/20230119135721.83345-6-
> alexander.shishkin@linux.intel.com/
> > But we are trying to fix all issues we see now (prioritizing the
> > second ones though).
>
> I don't see how MSI table sizing is a bug in the category we've
> defined. The very text of the changelog says "resulting in a kernel
> page fault in pci_write_msg_msix()." which is a crash, which I thought
> we were agreeing was out of scope for CC attacks?

As I said this is an example of a crash and on the first look
might not lead to the exploitable condition (albeit attackers are creative).
But we noticed this one while fuzzing and it was common enough
that prevented fuzzer going deeper into the virtio devices driver fuzzing.
The core PCI/MSI doesn’t seem to have that many easily triggerable
Other examples in virtio patchset are more severe.

>
> > >
> > > If I set this as the problem, verifying device correct operation is
> > > a possible solution (albeit hugely expensive) but there are likely
> > > many other cheaper ways to defeat or detect a device trying to
> > > trick us into revealing something.
> >
> > What do you have in mind here for the actual devices we need to
> > enable for CC cases?
>
> Well, the most dangerous devices seem to be the virtio set a CC system
> will rely on to boot up. After that, there are other ways (like SPDM)
> to verify a real PCI device is on the other end of the transaction.

Yes, it the future, but not yet. Other vendors will not necessary be
using virtio devices at this point, so we will have non-virtio and not
CC enabled devices that we want to securely add to the guest.

>
> > We have been using here a combination of extensive fuzzing and static
> > code analysis.
>
> by fuzzing, I assume you mean fuzzing from the PCI configuration space?
> Firstly I'm not so sure how useful a tool fuzzing is if we take Oopses
> off the table because fuzzing primarily triggers those

If you enable memory sanitizers you can detect more server conditions like
out of bounds accesses and such. I think given that we have a way to
verify that fuzzing is reaching the code locations we want it to reach, it
can be pretty effective method to find at least low-hanging bugs. And these
will be the bugs that most of the attackers will go after at the first place.
But of course it is not a formal verification of any kind.

so its hard to
> see what else it could detect given the signal will be smothered by
> oopses and secondly I think the PCI interface is likely the wrong place
> to begin and you should probably begin on the virtio bus and the
> hypervisor generated configuration space.

This is exactly what we do. We don’t fuzz from the PCI config space,
we supply inputs from the host/vmm via the legitimate interfaces that it can
inject them to the guest: whenever guest requests a pci config space
(which is controlled by host/hypervisor as you said) read operation,
it gets input injected by the kafl fuzzer. Same for other interfaces that
are under control of host/VMM (MSRs, port IO, MMIO, anything that goes
via #VE handler in our case). When it comes to virtio, we employ
two different fuzzing techniques: directly injecting kafl fuzz input when
virtio core or virtio drivers gets the data received from the host
(via injecting input in functions virtio16/32/64_to_cpu and others) and
directly fuzzing DMA memory pages using kfx fuzzer.
More information can be found in https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing

Best Regards,
Elena.

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

jejb at linux

Jan 31, 2023, 6:15 AM

Post #67 of 103 (346 views)

Permalink

On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote:
> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> > [...]
> > > > The big threat from most devices (including the thunderbolt
> > > > classes) is that they can DMA all over memory. However, this
> > > > isn't really a threat in CC (well until PCI becomes able to do
> > > > encrypted DMA) because the device has specific unencrypted
> > > > buffers set aside for the expected DMA. If it writes outside
> > > > that CC integrity will detect it and if it reads outside that
> > > > it gets unintelligible ciphertext. So we're left with the
> > > > device trying to trick secrets out of us by returning
> > > > unexpected data.
> > >
> > > Yes, by supplying the input that hasn’t been expected. This is
> > > exactly the case we were trying to fix here for example:
> > > https://lore.kernel.org/all/20230119170633.40944-2-
> > alexander.shishkin@linux.intel.com/
> > > I do agree that this case is less severe when others where memory
> > > corruption/buffer overrun can happen, like here:
> > > https://lore.kernel.org/all/20230119135721.83345-6-
> > alexander.shishkin@linux.intel.com/
> > > But we are trying to fix all issues we see now (prioritizing the
> > > second ones though).
> >
> > I don't see how MSI table sizing is a bug in the category we've
> > defined. The very text of the changelog says "resulting in a
> > kernel page fault in pci_write_msg_msix()." which is a crash,
> > which I thought we were agreeing was out of scope for CC attacks?
>
> As I said this is an example of a crash and on the first look
> might not lead to the exploitable condition (albeit attackers are
> creative). But we noticed this one while fuzzing and it was common
> enough that prevented fuzzer going deeper into the virtio devices
> driver fuzzing. The core PCI/MSI doesn’t seem to have that many
> easily triggerable Other examples in virtio patchset are more severe.

You cited this as your example. I'm pointing out it seems to be an
event of the class we've agreed not to consider because it's an oops
not an exploit. If there are examples of fixing actual exploits to CC
VMs, what are they?

This patch is, however, an example of the problem everyone else on the
thread is complaining about: a patch which adds an unnecessary check to
the MSI subsystem; unnecessary because it doesn't fix a CC exploit and
in the real world the tables are correct (or the manufacturer is
quickly chastened), so it adds overhead to no benefit.

[...]
> > see what else it could detect given the signal will be smothered by
> > oopses and secondly I think the PCI interface is likely the wrong
> > place to begin and you should probably begin on the virtio bus and
> > the hypervisor generated configuration space.
>
> This is exactly what we do. We don’t fuzz from the PCI config space,
> we supply inputs from the host/vmm via the legitimate interfaces that
> it can inject them to the guest: whenever guest requests a pci config
> space (which is controlled by host/hypervisor as you said) read
> operation, it gets input injected by the kafl fuzzer. Same for other
> interfaces that are under control of host/VMM (MSRs, port IO, MMIO,
> anything that goes via #VE handler in our case). When it comes to
> virtio, we employ two different fuzzing techniques: directly
> injecting kafl fuzz input when virtio core or virtio drivers gets the
> data received from the host (via injecting input in functions
> virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
> pages using kfx fuzzer. More information can be found in
> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing

Given that we previously agreed that oppses and other DoS attacks are
out of scope for CC, I really don't think fuzzing, which primarily
finds oopses, is at all a useful tool unless you filter the results by
the question "could we exploit this in a CC VM to reveal secrets".
Without applying that filter you're sending a load of patches which
don't really do much to reduce the CC attack surface and which do annoy
non-CC people because they add pointless checks to things they expect
the cards and config tables to get right.

James

RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]

elena.reshetova at intel

Jan 31, 2023, 10:15 AM

Post #68 of 103 (346 views)

Permalink

> On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote:
> > > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> > > [...]
> > > > > The big threat from most devices (including the thunderbolt
> > > > > classes) is that they can DMA all over memory. However, this
> > > > > isn't really a threat in CC (well until PCI becomes able to do
> > > > > encrypted DMA) because the device has specific unencrypted
> > > > > buffers set aside for the expected DMA. If it writes outside
> > > > > that CC integrity will detect it and if it reads outside that
> > > > > it gets unintelligible ciphertext. So we're left with the
> > > > > device trying to trick secrets out of us by returning
> > > > > unexpected data.
> > > >
> > > > Yes, by supplying the input that hasn’t been expected. This is
> > > > exactly the case we were trying to fix here for example:
> > > > https://lore.kernel.org/all/20230119170633.40944-2-
> > > alexander.shishkin@linux.intel.com/
> > > > I do agree that this case is less severe when others where memory
> > > > corruption/buffer overrun can happen, like here:
> > > > https://lore.kernel.org/all/20230119135721.83345-6-
> > > alexander.shishkin@linux.intel.com/
> > > > But we are trying to fix all issues we see now (prioritizing the
> > > > second ones though).
> > >
> > > I don't see how MSI table sizing is a bug in the category we've
> > > defined. The very text of the changelog says "resulting in a
> > > kernel page fault in pci_write_msg_msix()." which is a crash,
> > > which I thought we were agreeing was out of scope for CC attacks?
> >
> > As I said this is an example of a crash and on the first look
> > might not lead to the exploitable condition (albeit attackers are
> > creative). But we noticed this one while fuzzing and it was common
> > enough that prevented fuzzer going deeper into the virtio devices
> > driver fuzzing. The core PCI/MSI doesn’t seem to have that many
> > easily triggerable Other examples in virtio patchset are more severe.
>
> You cited this as your example. I'm pointing out it seems to be an
> event of the class we've agreed not to consider because it's an oops
> not an exploit. If there are examples of fixing actual exploits to CC
> VMs, what are they?
>
> This patch is, however, an example of the problem everyone else on the
> thread is complaining about: a patch which adds an unnecessary check to
> the MSI subsystem; unnecessary because it doesn't fix a CC exploit and
> in the real world the tables are correct (or the manufacturer is
> quickly chastened), so it adds overhead to no benefit.

How can you make sure there is no exploit possible using this crash
as a stepping stone into a CC guest? Or are you saying that we are back
to the times when we can merge the fixes for crashes and out of bound errors in
kernel only given that we submit a proof of concept exploit with the
patch for every issue?

>
>
> [...]
> > > see what else it could detect given the signal will be smothered by
> > > oopses and secondly I think the PCI interface is likely the wrong
> > > place to begin and you should probably begin on the virtio bus and
> > > the hypervisor generated configuration space.
> >
> > This is exactly what we do. We don’t fuzz from the PCI config space,
> > we supply inputs from the host/vmm via the legitimate interfaces that
> > it can inject them to the guest: whenever guest requests a pci config
> > space (which is controlled by host/hypervisor as you said) read
> > operation, it gets input injected by the kafl fuzzer. Same for other
> > interfaces that are under control of host/VMM (MSRs, port IO, MMIO,
> > anything that goes via #VE handler in our case). When it comes to
> > virtio, we employ two different fuzzing techniques: directly
> > injecting kafl fuzz input when virtio core or virtio drivers gets the
> > data received from the host (via injecting input in functions
> > virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
> > pages using kfx fuzzer. More information can be found in
> > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-
> hardening.html#td-guest-fuzzing
>
> Given that we previously agreed that oppses and other DoS attacks are
> out of scope for CC, I really don't think fuzzing, which primarily
> finds oopses, is at all a useful tool unless you filter the results by
> the question "could we exploit this in a CC VM to reveal secrets".
> Without applying that filter you're sending a load of patches which
> don't really do much to reduce the CC attack surface and which do annoy
> non-CC people because they add pointless checks to things they expect
> the cards and config tables to get right.

I don’t think we have agreed that random kernel crashes are out of scope in CC threat model
(controlled safe panic is out of scope, but this is not what we have here).
It all depends if this ops can be used in a successful attack against guest private
memory or not and this is *not* a trivial thing to decide.
That's said, we are mostly focusing on KASAN findings, which
have higher likelihood to be exploitable at least for host -> guest privilege escalation
(which in turn compromised guest private memory confidentiality). Fuzzing has a
long history of find such issues in past (including the ones that have been
exploited after). But even for this ops bug, can anyone guarantee it cannot be chained
with other ones to cause a more complex privilege escalation attack? I wont be making
such a claim, I feel it is safer to fix this vs debating whenever it can be used for an
attack or not.

Best Regards,
Elena.

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

dinechin at redhat

Jan 31, 2023, 10:15 AM

Post #69 of 103 (346 views)

Permalink

On 2023-01-31 at 08:28 -05, James Bottomley <jejb@linux.ibm.com> wrote...
> On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote:
>> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
>> > [...]
>> > > > The big threat from most devices (including the thunderbolt
>> > > > classes) is that they can DMA all over memory. However, this
>> > > > isn't really a threat in CC (well until PCI becomes able to do
>> > > > encrypted DMA) because the device has specific unencrypted
>> > > > buffers set aside for the expected DMA. If it writes outside
>> > > > that CC integrity will detect it and if it reads outside that
>> > > > it gets unintelligible ciphertext. So we're left with the
>> > > > device trying to trick secrets out of us by returning
>> > > > unexpected data.
>> > >
>> > > Yes, by supplying the input that hasn’t been expected. This is
>> > > exactly the case we were trying to fix here for example:
>> > > https://lore.kernel.org/all/20230119170633.40944-2-
>> > alexander.shishkin@linux.intel.com/
>> > > I do agree that this case is less severe when others where memory
>> > > corruption/buffer overrun can happen, like here:
>> > > https://lore.kernel.org/all/20230119135721.83345-6-
>> > alexander.shishkin@linux.intel.com/
>> > > But we are trying to fix all issues we see now (prioritizing the
>> > > second ones though).
>> >
>> > I don't see how MSI table sizing is a bug in the category we've
>> > defined. The very text of the changelog says "resulting in a
>> > kernel page fault in pci_write_msg_msix()." which is a crash,
>> > which I thought we were agreeing was out of scope for CC attacks?
>>
>> As I said this is an example of a crash and on the first look
>> might not lead to the exploitable condition (albeit attackers are
>> creative). But we noticed this one while fuzzing and it was common
>> enough that prevented fuzzer going deeper into the virtio devices
>> driver fuzzing. The core PCI/MSI doesn’t seem to have that many
>> easily triggerable Other examples in virtio patchset are more severe.
>
> You cited this as your example. I'm pointing out it seems to be an
> event of the class we've agreed not to consider because it's an oops
> not an exploit. If there are examples of fixing actual exploits to CC
> VMs, what are they?
>
> This patch is, however, an example of the problem everyone else on the
> thread is complaining about: a patch which adds an unnecessary check to
> the MSI subsystem; unnecessary because it doesn't fix a CC exploit and
> in the real world the tables are correct (or the manufacturer is
> quickly chastened), so it adds overhead to no benefit.

I'd like to backtrack a little here.

1/ PCI-as-a-thread, where does it come from?

On physical devices, we have to assume that the device is working. As other
pointed out, there are things like PCI compliance tests, etc. So Linux has
to trust the device. You could manufacture a broken device intentionally,
but the value you would get from that would be limited.

On a CC system, the "PCI" values are really provided by the hypervisor,
which is not trusted. This leads to this peculiar way of thinking where we
say "what happens if virtual device feeds us a bogus value *intentionally*".
We cannot assume that the *virtual* PCI device ran through the compliance
tests. Instead, we see the PCI interface as hostile, which makes us look
like weirdos to the rest of the community.

Consequently, as James pointed out, we first need to focus on consequences
that would break what I would call the "CC promise", which is essentially
that we'd rather kill the guest than reveal its secrets. Unless you have a
credible path to a secret being revealed, don't bother "fixing" a bug. And
as was pointed out elsewhere in this thread, caching has a cost, so you
can't really use the "optimization" angle either.

2/ Clarification of the "CC promise" and value proposition

Based on the above, the very first thing is to clarify that "CC promise",
because if exchanges on this thread have proved anything, it is that it's
quite unclear to anyone outside the "CoCo world".

The Linux Guest Kernel Security Specification needs to really elaborate on
what the value proposition of CC is, not assume it is a given. "Bug fixes"
before this value proposition has been understood and accepted by the
non-CoCo community are likely to go absolutely nowhere.

Here is a quick proposal for the Purpose and Scope section:

<doc>
Purpose and Scope

Confidential Computing (CC) is a set of technologies that allows a guest to
run without having to trust either the hypervisor or the host. CC offers two
new guarantees to the guest compared to the non-CC case:

a) The guest will be able to measure and attest, by cryptographic means, the
guest software stack that it is running, and be assured that this
software stack cannot be tampered with by the host or the hypervisor
after it was measured. The root of trust for this aspect of CC is
typically the CPU manufacturer (e.g. through a private key that can be
used to respond to cryptographic challenges).

b) Guest state, including memory, become secrets which must remain
inaccessible to the host. In a CC context, it is considered preferable to
stop or kill a guest rather than risk leaking its secrets. This aspect of
CC is typically enforced by means such as memory encryption and new
semantics for memory protection.

CC leads to a different threat model for a Linux kernel running as a guest
inside a confidential virtual machine (CVM). Notably, whereas the machine
(CPU, I/O devices, etc) is usually considered as trustworthy, in the CC
case, the hypervisor emulating some aspects of the virtual machine is now
considered as potentially malicious. Consequently, effects of any data
provided by the guest to the hypervisor, including ACPI configuration
tables, MMIO interfaces or machine specific registers (MSRs) need to be
re-evaluated.

This document describes the security architecture of the Linux guest kernel
running inside a CVM, with a particular focus on the Intel TDX
implementation. Many aspects of this document will be applicable to other
CC implementations such as AMD SEV.

Aspects of the guest-visible state that are under direct control of the
hardware, such as the CPU state or memory protection, will be considered as
being handled by the CC implementations. This document will therefore only
focus on aspects of the virtual machine that are typically managed by the
hypervisor or the host.

Since the host ultimately owns the resources and can allocate them at will,
including denying their use at any point, this document will not address
denial or service or performance degradation. It will however cover random
number generation, which is central for cryptographic security.

Finally, security considerations that apply irrespective of whether the
platform is confidential or not are also outside of the scope of this
document. This includes topics ranging from timing attacks to social
engineering.
</doc>

Feel free to comment and reword at will ;-)

3/ PCI-as-a-threat: where does that come from

Isn't there a fundamental difference, from a threat model perspective,
between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
should defeat) and compromised software feeding us bad data? I think there
is: at leats inside the TCB, we can detect bad software using measurements,
and prevent it from running using attestation. In other words, we first
check what we will run, then we run it. The security there is that we know
what we are running. The trust we have in the software is from testing,
reviewing or using it.

This relies on a key aspect provided by TDX and SEV, which is that the
software being measured is largely tamper-resistant thanks to memory
encryption. In other words, after you have measured your guest software
stack, the host or hypervisor cannot willy-nilly change it.

So this brings me to the next question: is there any way we could offer the
same kind of service for KVM and qemu? The measurement part seems relatively
easy. Thetamper-resistant part, on the other hand, seems quite difficult to
me. But maybe someone else will have a brilliant idea?

So I'm asking the question, because if you could somehow prove to the guest
not only that it's running the right guest stack (as we can do today) but
also a known host/KVM/hypervisor stack, we would also switch the potential
issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
this is something which is evidently easier to deal with.

I briefly discussed this with James, and he pointed out two interesting
aspects of that question:

1/ In the CC world, we don't really care about *virtual* PCI devices. We
care about either virtio devices, or physical ones being passed through
to the guest. Let's assume physical ones can be trusted, see above.
That leaves virtio devices. How much damage can a malicious virtio device
do to the guest kernel, and can this lead to secrets being leaked?

2/ He was not as negative as I anticipated on the possibility of somehow
being able to prevent tampering of the guest. One example he mentioned is
a research paper [1] about running the hypervisor itself inside an
"outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
with TDX using secure enclaves or some other mechanism?

Sorry, this mail is a bit long ;-)

>
>
> [...]
>> > see what else it could detect given the signal will be smothered by
>> > oopses and secondly I think the PCI interface is likely the wrong
>> > place to begin and you should probably begin on the virtio bus and
>> > the hypervisor generated configuration space.
>>
>> This is exactly what we do. We don’t fuzz from the PCI config space,
>> we supply inputs from the host/vmm via the legitimate interfaces that
>> it can inject them to the guest: whenever guest requests a pci config
>> space (which is controlled by host/hypervisor as you said) read
>> operation, it gets input injected by the kafl fuzzer. Same for other
>> interfaces that are under control of host/VMM (MSRs, port IO, MMIO,
>> anything that goes via #VE handler in our case). When it comes to
>> virtio, we employ two different fuzzing techniques: directly
>> injecting kafl fuzz input when virtio core or virtio drivers gets the
>> data received from the host (via injecting input in functions
>> virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
>> pages using kfx fuzzer. More information can be found in
>> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing
>
> Given that we previously agreed that oppses and other DoS attacks are
> out of scope for CC, I really don't think fuzzing, which primarily
> finds oopses, is at all a useful tool unless you filter the results by
> the question "could we exploit this in a CC VM to reveal secrets".
> Without applying that filter you're sending a load of patches which
> don't really do much to reduce the CC attack surface and which do annoy
> non-CC people because they add pointless checks to things they expect
> the cards and config tables to get right.

Indeed.

[1]: https://dl.acm.org/doi/abs/10.1145/3548606.3560592
--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

mst at redhat

Jan 31, 2023, 10:15 AM

Post #70 of 103 (346 views)

Permalink

On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
> Finally, security considerations that apply irrespective of whether the
> platform is confidential or not are also outside of the scope of this
> document. This includes topics ranging from timing attacks to social
> engineering.

Why are timing attacks by hypervisor on the guest out of scope?

> </doc>
>
> Feel free to comment and reword at will ;-)
>
>
> 3/ PCI-as-a-threat: where does that come from
>
> Isn't there a fundamental difference, from a threat model perspective,
> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
> should defeat) and compromised software feeding us bad data? I think there
> is: at leats inside the TCB, we can detect bad software using measurements,
> and prevent it from running using attestation. In other words, we first
> check what we will run, then we run it. The security there is that we know
> what we are running. The trust we have in the software is from testing,
> reviewing or using it.
>
> This relies on a key aspect provided by TDX and SEV, which is that the
> software being measured is largely tamper-resistant thanks to memory
> encryption. In other words, after you have measured your guest software
> stack, the host or hypervisor cannot willy-nilly change it.
>
> So this brings me to the next question: is there any way we could offer the
> same kind of service for KVM and qemu? The measurement part seems relatively
> easy. Thetamper-resistant part, on the other hand, seems quite difficult to
> me. But maybe someone else will have a brilliant idea?
>
> So I'm asking the question, because if you could somehow prove to the guest
> not only that it's running the right guest stack (as we can do today) but
> also a known host/KVM/hypervisor stack, we would also switch the potential
> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
> this is something which is evidently easier to deal with.

Agree absolutely that's much easier.

> I briefly discussed this with James, and he pointed out two interesting
> aspects of that question:
>
> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
> care about either virtio devices, or physical ones being passed through
> to the guest. Let's assume physical ones can be trusted, see above.
> That leaves virtio devices. How much damage can a malicious virtio device
> do to the guest kernel, and can this lead to secrets being leaked?
>
> 2/ He was not as negative as I anticipated on the possibility of somehow
> being able to prevent tampering of the guest. One example he mentioned is
> a research paper [1] about running the hypervisor itself inside an
> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
> with TDX using secure enclaves or some other mechanism?

Or even just secureboot based root of trust?

--
MST

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

dinechin at redhat

Jan 31, 2023, 10:15 AM

Post #71 of 103 (346 views)

Permalink

On 2023-01-31 at 10:06 UTC, "Reshetova, Elena" <elena.reshetova@intel.com> wrote...
> Hi Dinechin,

Nit: My first name is actually Christophe ;-)

[snip]

>> "The implementation of the #VE handler is simple and does not require an
>> in-depth security audit or fuzzing since it is not the actual consumer of
>> the host/VMM supplied untrusted data": The assumption there seems to be that
>> the host will never be able to supply data (e.g. through a bounce buffer)
>> that it can trick the guest into executing. If that is indeed the
>> assumption, it is worth mentioning explicitly. I suspect it is a bit weak,
>> since many earlier attacks were based on executing the wrong code. Notably,
>> it is worth pointing out that I/O buffers are _not_ encrypted with the CPU
>> key (as opposed to any device key e.g. for PCI encryption) in either
>> TDX or SEV. Is there for example anything that precludes TDX or SEV from
>> executing code in the bounce buffers?
>
> This was already replied by Kirill, any code execution out of shared memory generates
> a #GP.

Apologies for my wording. Everyone interpreted "executing" as "executing
directly on the bounce buffer page", when what I meant is "consuming data
fetched from the bounce buffers as code" (not necessarily directly).

For example, in the diagram in your document, the guest kernel is a
monolithic piece. In reality, there are dynamically loaded components. In
the original SEV implementation, with pre-attestation, the measurement could
only apply before loading any DLKM (I believe, not really sure). As another
example, SEVerity (CVE-2020-12967 [1]) worked by injecting a payload
directly into the guest kernel using virtio-based network I/O. That is what
I referred to when I wrote "many earlier attacks were based on executing the
wrong code".

The fact that I/O buffers are not encrypted matters here, because it gives
the host ample latitude to observe or even corrupt all I/Os, as many others
have pointed out. Notably, disk crypto may not be designed to resist to a
host that can see and possibly change the I/Os.

So let me rephrase my vague question as a few more precise ones:

1) What are the effects of semi-random kernel code injection?

If the host knows that a given bounce buffer happens to be used later to
execute some kernel code, it can start flipping bits in it to try and
trigger arbitrary code paths in the guest. My understanding is that
crypto alone (i.e. without additional layers like dm-integrity) will
happily decrypt that into a code stream with pseudo-random instructions
in it, not vehemently error out.

So, while TDX precludes the host from writing into guest memory directly,
since the bounce buffers are shared, TDX will not prevent the host from
flipping bits there. It's then just a matter of guessing where the bits
will go, and hoping that some bits execute at guest PL0. Of course, this
can be mitigated by either only using static configs, or using
dm-verity/dm-integrity, or maybe some other mechanisms.

Shouldn't that be part of your document? To be clear: you mention under
"Storage protection" that you use dm-crypt and dm-integrity, so I believe
*you* know, but your readers may not figure out why dm-integrity is
integral to the process, notably after you write "Users could use other
encryption schemes".

2) What are the effects of random user code injection?

It's the same as above, except that now you can target a much wider range
of input data, including shell scripts, etc. So the attack surface is
much larger.

3) What is the effect of data poisoning?

You don't necessarily need to corrupt code. Being able to corrupt a
system configuration file for example can be largely sufficient.

4) Are there I/O-based replay attacks that would work pre-attestation?

My current mental model is that you load a "base" software stack into the
TCB and then measure a relevant part of it. What you measure is somewhat
implementation-dependent, but in the end, if the system is attested, you
respond to a cryptographic challenge based on what was measured, and you
then get relevant secrets, e.g. a disk decryption key, that let you make
forward progress. However, what happens if every time you boot, the host
feeds you bogus disk data just to try to steer the boot sequence along
some specific path?

I believe that the short answer is: the guest either:

a) reaches attestation, but with bad in-memory data, so it fails the
crypto exchange, and secrets are not leaked.

b) does not reach attestation, so never gets the secrets, and therefore
still fulfils the CC promise of not leaking secrets.

So I personally feel this is OK, but it's worth writing up in your doc.

Back to the #VE handler, if I can find a way to inject malicious code into
my guest, what you wrote in that paragraph as a justification for no
in-depth security still seems like "not exactly defense in depth". I would
just remove the sentence, audit and fuzz that code with the same energy as
for anything else that could face bad input.

[1]: https://www.sec.in.tum.de/i20/student-work/code-execution-attacks-against-encrypted-virtual-machines

--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

jejb at linux

Jan 31, 2023, 10:15 AM

Post #72 of 103 (346 views)

Permalink

On Tue, 2023-01-31 at 16:34 +0000, Reshetova, Elena wrote:
[...]
> > You cited this as your example. I'm pointing out it seems to be an
> > event of the class we've agreed not to consider because it's an
> > oops not an exploit. If there are examples of fixing actual
> > exploits to CC VMs, what are they?
> >
> > This patch is, however, an example of the problem everyone else on
> > the thread is complaining about: a patch which adds an unnecessary
> > check to the MSI subsystem; unnecessary because it doesn't fix a CC
> > exploit and in the real world the tables are correct (or the
> > manufacturer is quickly chastened), so it adds overhead to no
> > benefit.
>
> How can you make sure there is no exploit possible using this crash
> as a stepping stone into a CC guest?

I'm not, what I'm saying is you haven't proved it can be used to
exfiltrate secrets. In a world where the PCI device is expected to be
correct, and the non-CC kernel doesn't want to second guess that, there
are loads of lies you can tell to the PCI subsystem that causes a crash
or a hang. If we fix every one, we end up with a massive patch set and
a huge potential slow down for the non-CC kernel. If there's no way to
tell what lies might leak data, the fuzzing results are a mass of noise
with no real signal and we can't even quantify by how much (or even if)
we've improved the CC VM attack surface even after we merge the huge
patch set it generates.

> Or are you saying that we are back to the times when we can merge
> the fixes for crashes and out of bound errors in kernel only given
> that we submit a proof of concept exploit with the patch for every
> issue?

The PCI people have already said that crashing in the face of bogus
configuration data is expected behaviour, so just generating the crash
doesn't prove there's a problem to be fixed. That means you do have to
go beyond and demonstrate there could be an information leak in a CC VM
on the back of it, yes.

> > [...]
> > > > see what else it could detect given the signal will be
> > > > smothered by oopses and secondly I think the PCI interface is
> > > > likely the wrong place to begin and you should probably begin
> > > > on the virtio bus and the hypervisor generated configuration
> > > > space.
> > >
> > > This is exactly what we do. We don’t fuzz from the PCI config
> > > space, we supply inputs from the host/vmm via the legitimate
> > > interfaces that it can inject them to the guest: whenever guest
> > > requests a pci config space (which is controlled by
> > > host/hypervisor as you said) read operation, it gets input
> > > injected by the kafl fuzzer. Same for other interfaces that are
> > > under control of host/VMM (MSRs, port IO, MMIO, anything that
> > > goes via #VE handler in our case). When it comes to virtio, we
> > > employ two different fuzzing techniques: directly injecting kafl
> > > fuzz input when virtio core or virtio drivers gets the data
> > > received from the host (via injecting input in functions
> > > virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
> > > pages using kfx fuzzer. More information can be found in
> > > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-
> > hardening.html#td-guest-fuzzing
> >
> > Given that we previously agreed that oppses and other DoS attacks
> > are out of scope for CC, I really don't think fuzzing, which
> > primarily finds oopses, is at all a useful tool unless you filter
> > the results by the question "could we exploit this in a CC VM to
> > reveal secrets". Without applying that filter you're sending a load
> > of patches which don't really do much to reduce the CC attack
> > surface and which do annoy non-CC people because they add pointless
> > checks to things they expect the cards and config tables to get
> > right.
>
> I don’t think we have agreed that random kernel crashes are out of
> scope in CC threat model (controlled safe panic is out of scope, but
> this is not what we have here).

So perhaps making it a controlled panic in the CC VM, so we can
guarantee no information leak, would be the first place to start?

> It all depends if this ops can be used in a successful attack against
> guest private memory or not and this is *not* a trivial thing to
> decide.

Right, but if you can't decide that, you can't extract the signal from
your fuzzing tool noise.

> That's said, we are mostly focusing on KASAN findings, which
> have higher likelihood to be exploitable at least for host -> guest
> privilege escalation (which in turn compromised guest private memory
> confidentiality). Fuzzing has a long history of find such issues in
> past (including the ones that have been exploited after). But even
> for this ops bug, can anyone guarantee it cannot be chained with
> other ones to cause a more complex privilege escalation attack?
> I wont be making such a claim, I feel it is safer to fix this vs
> debating whenever it can be used for an attack or not.

The PCI people have already been clear that adding a huge framework of
checks to PCI table parsing simply for the promise it "might possibly"
improve CC VM security is way too much effort for too little result.
If you can hone that down to a few places where you can show it will
prevent a CC information leak, I'm sure they'll be more receptive.
Telling them to disprove your assertion that there might be an exploit
here isn't going to make them change their minds.

James

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

dinechin at redhat

Feb 1, 2023, 4:15 AM

Post #73 of 103 (346 views)

Permalink

I typoed a lot in this email...

On 2023-01-31 at 16:14 +01, Christophe de Dinechin <dinechin@redhat.com> wrote...
> On 2023-01-31 at 08:28 -05, James Bottomley <jejb@linux.ibm.com> wrote...
>> On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote:
>>> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
>>> > [...]
>>> > > > The big threat from most devices (including the thunderbolt
>>> > > > classes) is that they can DMA all over memory. However, this
>>> > > > isn't really a threat in CC (well until PCI becomes able to do
>>> > > > encrypted DMA) because the device has specific unencrypted
>>> > > > buffers set aside for the expected DMA. If it writes outside
>>> > > > that CC integrity will detect it and if it reads outside that
>>> > > > it gets unintelligible ciphertext. So we're left with the
>>> > > > device trying to trick secrets out of us by returning
>>> > > > unexpected data.
>>> > >
>>> > > Yes, by supplying the input that hasn’t been expected. This is
>>> > > exactly the case we were trying to fix here for example:
>>> > > https://lore.kernel.org/all/20230119170633.40944-2-
>>> > alexander.shishkin@linux.intel.com/
>>> > > I do agree that this case is less severe when others where memory
>>> > > corruption/buffer overrun can happen, like here:
>>> > > https://lore.kernel.org/all/20230119135721.83345-6-
>>> > alexander.shishkin@linux.intel.com/
>>> > > But we are trying to fix all issues we see now (prioritizing the
>>> > > second ones though).
>>> >
>>> > I don't see how MSI table sizing is a bug in the category we've
>>> > defined. The very text of the changelog says "resulting in a
>>> > kernel page fault in pci_write_msg_msix()." which is a crash,
>>> > which I thought we were agreeing was out of scope for CC attacks?
>>>
>>> As I said this is an example of a crash and on the first look
>>> might not lead to the exploitable condition (albeit attackers are
>>> creative). But we noticed this one while fuzzing and it was common
>>> enough that prevented fuzzer going deeper into the virtio devices
>>> driver fuzzing. The core PCI/MSI doesn’t seem to have that many
>>> easily triggerable Other examples in virtio patchset are more severe.
>>
>> You cited this as your example. I'm pointing out it seems to be an
>> event of the class we've agreed not to consider because it's an oops
>> not an exploit. If there are examples of fixing actual exploits to CC
>> VMs, what are they?
>>
>> This patch is, however, an example of the problem everyone else on the
>> thread is complaining about: a patch which adds an unnecessary check to
>> the MSI subsystem; unnecessary because it doesn't fix a CC exploit and
>> in the real world the tables are correct (or the manufacturer is
>> quickly chastened), so it adds overhead to no benefit.
>
> I'd like to backtrack a little here.
>
>
> 1/ PCI-as-a-thread, where does it come from?

PCI-as-a-threat

>
> On physical devices, we have to assume that the device is working. As other
> pointed out, there are things like PCI compliance tests, etc. So Linux has
> to trust the device. You could manufacture a broken device intentionally,
> but the value you would get from that would be limited.
>
> On a CC system, the "PCI" values are really provided by the hypervisor,
> which is not trusted. This leads to this peculiar way of thinking where we
> say "what happens if virtual device feeds us a bogus value *intentionally*".
> We cannot assume that the *virtual* PCI device ran through the compliance
> tests. Instead, we see the PCI interface as hostile, which makes us look
> like weirdos to the rest of the community.
>
> Consequently, as James pointed out, we first need to focus on consequences
> that would break what I would call the "CC promise", which is essentially
> that we'd rather kill the guest than reveal its secrets. Unless you have a
> credible path to a secret being revealed, don't bother "fixing" a bug. And
> as was pointed out elsewhere in this thread, caching has a cost, so you
> can't really use the "optimization" angle either.
>
>
> 2/ Clarification of the "CC promise" and value proposition
>
> Based on the above, the very first thing is to clarify that "CC promise",
> because if exchanges on this thread have proved anything, it is that it's
> quite unclear to anyone outside the "CoCo world".
>
> The Linux Guest Kernel Security Specification needs to really elaborate on
> what the value proposition of CC is, not assume it is a given. "Bug fixes"
> before this value proposition has been understood and accepted by the
> non-CoCo community are likely to go absolutely nowhere.
>
> Here is a quick proposal for the Purpose and Scope section:
>
> <doc>
> Purpose and Scope
>
> Confidential Computing (CC) is a set of technologies that allows a guest to
> run without having to trust either the hypervisor or the host. CC offers two
> new guarantees to the guest compared to the non-CC case:
>
> a) The guest will be able to measure and attest, by cryptographic means, the
> guest software stack that it is running, and be assured that this
> software stack cannot be tampered with by the host or the hypervisor
> after it was measured. The root of trust for this aspect of CC is
> typically the CPU manufacturer (e.g. through a private key that can be
> used to respond to cryptographic challenges).
>
> b) Guest state, including memory, become secrets which must remain
> inaccessible to the host. In a CC context, it is considered preferable to
> stop or kill a guest rather than risk leaking its secrets. This aspect of
> CC is typically enforced by means such as memory encryption and new
> semantics for memory protection.
>
> CC leads to a different threat model for a Linux kernel running as a guest
> inside a confidential virtual machine (CVM). Notably, whereas the machine
> (CPU, I/O devices, etc) is usually considered as trustworthy, in the CC
> case, the hypervisor emulating some aspects of the virtual machine is now
> considered as potentially malicious. Consequently, effects of any data
> provided by the guest to the hypervisor, including ACPI configuration

to the guest by the hypervisor

> tables, MMIO interfaces or machine specific registers (MSRs) need to be
> re-evaluated.
>
> This document describes the security architecture of the Linux guest kernel
> running inside a CVM, with a particular focus on the Intel TDX
> implementation. Many aspects of this document will be applicable to other
> CC implementations such as AMD SEV.
>
> Aspects of the guest-visible state that are under direct control of the
> hardware, such as the CPU state or memory protection, will be considered as
> being handled by the CC implementations. This document will therefore only
> focus on aspects of the virtual machine that are typically managed by the
> hypervisor or the host.
>
> Since the host ultimately owns the resources and can allocate them at will,
> including denying their use at any point, this document will not address
> denial or service or performance degradation. It will however cover random
> number generation, which is central for cryptographic security.
>
> Finally, security considerations that apply irrespective of whether the
> platform is confidential or not are also outside of the scope of this
> document. This includes topics ranging from timing attacks to social
> engineering.
> </doc>
>
> Feel free to comment and reword at will ;-)
>
>
> 3/ PCI-as-a-threat: where does that come from

3/ Can we shift from "malicious" hypervisor/host input to "bogus" input?

>
> Isn't there a fundamental difference, from a threat model perspective,
> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
> should defeat) and compromised software feeding us bad data? I think there
> is: at leats inside the TCB, we can detect bad software using measurements,
> and prevent it from running using attestation. In other words, we first
> check what we will run, then we run it. The security there is that we know
> what we are running. The trust we have in the software is from testing,
> reviewing or using it.
>
> This relies on a key aspect provided by TDX and SEV, which is that the
> software being measured is largely tamper-resistant thanks to memory
> encryption. In other words, after you have measured your guest software
> stack, the host or hypervisor cannot willy-nilly change it.
>
> So this brings me to the next question: is there any way we could offer the
> same kind of service for KVM and qemu? The measurement part seems relatively
> easy. Thetamper-resistant part, on the other hand, seems quite difficult to
> me. But maybe someone else will have a brilliant idea?
>
> So I'm asking the question, because if you could somehow prove to the guest
> not only that it's running the right guest stack (as we can do today) but
> also a known host/KVM/hypervisor stack, we would also switch the potential
> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
> this is something which is evidently easier to deal with.
>
> I briefly discussed this with James, and he pointed out two interesting
> aspects of that question:
>
> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
> care about either virtio devices, or physical ones being passed through
> to the guest. Let's assume physical ones can be trusted, see above.
> That leaves virtio devices. How much damage can a malicious virtio device
> do to the guest kernel, and can this lead to secrets being leaked?
>
> 2/ He was not as negative as I anticipated on the possibility of somehow
> being able to prevent tampering of the guest. One example he mentioned is
> a research paper [1] about running the hypervisor itself inside an
> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
> with TDX using secure enclaves or some other mechanism?
>
>
> Sorry, this mail is a bit long ;-)

and was a bit rushed too...

>
>
>>
>>
>> [...]
>>> > see what else it could detect given the signal will be smothered by
>>> > oopses and secondly I think the PCI interface is likely the wrong
>>> > place to begin and you should probably begin on the virtio bus and
>>> > the hypervisor generated configuration space.
>>>
>>> This is exactly what we do. We don’t fuzz from the PCI config space,
>>> we supply inputs from the host/vmm via the legitimate interfaces that
>>> it can inject them to the guest: whenever guest requests a pci config
>>> space (which is controlled by host/hypervisor as you said) read
>>> operation, it gets input injected by the kafl fuzzer. Same for other
>>> interfaces that are under control of host/VMM (MSRs, port IO, MMIO,
>>> anything that goes via #VE handler in our case). When it comes to
>>> virtio, we employ two different fuzzing techniques: directly
>>> injecting kafl fuzz input when virtio core or virtio drivers gets the
>>> data received from the host (via injecting input in functions
>>> virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
>>> pages using kfx fuzzer. More information can be found in
>>> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing
>>
>> Given that we previously agreed that oppses and other DoS attacks are
>> out of scope for CC, I really don't think fuzzing, which primarily
>> finds oopses, is at all a useful tool unless you filter the results by
>> the question "could we exploit this in a CC VM to reveal secrets".
>> Without applying that filter you're sending a load of patches which
>> don't really do much to reduce the CC attack surface and which do annoy
>> non-CC people because they add pointless checks to things they expect
>> the cards and config tables to get right.
>
> Indeed.
>
> [1]: https://dl.acm.org/doi/abs/10.1145/3548606.3560592

--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)

Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]

cdupontd at redhat

Feb 1, 2023, 4:15 AM

Post #74 of 103 (316 views)

Permalink

> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
>> Finally, security considerations that apply irrespective of whether the
>> platform is confidential or not are also outside of the scope of this
>> document. This includes topics ranging from timing attacks to social
>> engineering.
>
> Why are timing attacks by hypervisor on the guest out of scope?

Good point.

I was thinking that mitigation against timing attacks is the same
irrespective of the source of the attack. However, because the HV
controls CPU time allocation, there are presumably attacks that
are made much easier through the HV. Those should be listed.

>
>> </doc>
>>
>> Feel free to comment and reword at will ;-)
>>
>>
>> 3/ PCI-as-a-threat: where does that come from
>>
>> Isn't there a fundamental difference, from a threat model perspective,
>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
>> should defeat) and compromised software feeding us bad data? I think there
>> is: at leats inside the TCB, we can detect bad software using measurements,
>> and prevent it from running using attestation. In other words, we first
>> check what we will run, then we run it. The security there is that we know
>> what we are running. The trust we have in the software is from testing,
>> reviewing or using it.
>>
>> This relies on a key aspect provided by TDX and SEV, which is that the
>> software being measured is largely tamper-resistant thanks to memory
>> encryption. In other words, after you have measured your guest software
>> stack, the host or hypervisor cannot willy-nilly change it.
>>
>> So this brings me to the next question: is there any way we could offer the
>> same kind of service for KVM and qemu? The measurement part seems relatively
>> easy. Thetamper-resistant part, on the other hand, seems quite difficult to
>> me. But maybe someone else will have a brilliant idea?
>>
>> So I'm asking the question, because if you could somehow prove to the guest
>> not only that it's running the right guest stack (as we can do today) but
>> also a known host/KVM/hypervisor stack, we would also switch the potential
>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
>> this is something which is evidently easier to deal with.
>
> Agree absolutely that's much easier.
>
>> I briefly discussed this with James, and he pointed out two interesting
>> aspects of that question:
>>
>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
>> care about either virtio devices, or physical ones being passed through
>> to the guest. Let's assume physical ones can be trusted, see above.
>> That leaves virtio devices. How much damage can a malicious virtio device
>> do to the guest kernel, and can this lead to secrets being leaked?
>>
>> 2/ He was not as negative as I anticipated on the possibility of somehow
>> being able to prevent tampering of the guest. One example he mentioned is
>> a research paper [1] about running the hypervisor itself inside an
>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
>> with TDX using secure enclaves or some other mechanism?
>
> Or even just secureboot based root of trust?

You mean host secureboot? Or guest?

If it’s host, then the problem is detecting malicious tampering with
host code (whether it’s kernel or hypervisor).

If it’s guest, at the moment at least, the measurements do not extend
beyond the TCB.

>
> --
> MST
>