Mailing List Archive

Linux guest kernel threat model for Confidential Computing
Hi Greg,

You mentioned couple of times (last time in this recent thread:
https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
discussing the updated threat model for kernel, so this email is a start in this direction.

(Note: I tried to include relevant people from different companies, as well as linux-coco
mailing list, but I hope everyone can help by including additional people as needed).

As we have shared before in various lkml threads/conference presentations
([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
This is a big change in the threat model and requires both careful assessment of the
new (hypervisor <-> guest kernel) attack surface, as well as careful design of mitigations
and security validation techniques. This is the activity that we have started back at Intel
and the current status can be found in

1) Threat model and potential mitigations:
https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html

2) One of the described in the above doc mitigations is "hardening of the enabled
code". What we mean by this, as well as techniques that are being used are
described in this document:
https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html

3) All the tools are open-source and everyone can start using them right away even
without any special HW (readme has description of what is needed).
Tools and documentation is here:
https://github.com/intel/ccc-linux-guest-hardening

4) all not yet upstreamed linux patches (that we are slowly submitting) can be found
here: https://github.com/intel/tdx/commits/guest-next

So, my main question before we start to argue about the threat model, mitigations, etc,
is what is the good way to get this reviewed to make sure everyone is aligned?
There are a lot of angles and details, so what is the most efficient method?
Should I split the threat model from https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
into logical pieces and start submitting it to mailing list for discussion one by one?
Any other methods?

The original plan we had in mind is to start discussing the relevant pieces when submitting the code,
i.e. when submitting the device filter patches, we will include problem statement, threat model link,
data, alternatives considered, etc.

Best Regards,
Elena.

[1] https://lore.kernel.org/all/20210804174322.2898409-1-sathyanarayanan.kuppuswamy@linux.intel.com/
[2] https://lpc.events/event/16/contributions/1328/
[3] https://events.linuxfoundation.org/archive/2022/linux-security-summit-north-america/program/schedule/
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> Hi Greg,
>
> You mentioned couple of times (last time in this recent thread:
> https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> discussing the updated threat model for kernel, so this email is a start in this direction.

Any specific reason you didn't cc: the linux-hardening mailing list?
This seems to be in their area as well, right?

> As we have shared before in various lkml threads/conference presentations
> ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> change in the threat model where guest kernel doesn’t anymore trust the hypervisor.

That is, frankly, a very funny threat model. How realistic is it really
given all of the other ways that a hypervisor can mess with a guest?

So what do you actually trust here? The CPU? A device? Nothing?

> This is a big change in the threat model and requires both careful assessment of the
> new (hypervisor <-> guest kernel) attack surface, as well as careful design of mitigations
> and security validation techniques. This is the activity that we have started back at Intel
> and the current status can be found in
>
> 1) Threat model and potential mitigations:
> https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html

So you trust all of qemu but not Linux? Or am I misreading that
diagram?

> 2) One of the described in the above doc mitigations is "hardening of the enabled
> code". What we mean by this, as well as techniques that are being used are
> described in this document:
> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html

I hate the term "hardening". Please just say it for what it really is,
"fixing bugs to handle broken hardware". We've done that for years when
dealing with PCI and USB and even CPUs doing things that they shouldn't
be doing. How is this any different in the end?

So what you also are saying here now is "we do not trust any PCI
devices", so please just say that (why do you trust USB devices?) If
that is something that you all think that Linux should support, then
let's go from there.

> 3) All the tools are open-source and everyone can start using them right away even
> without any special HW (readme has description of what is needed).
> Tools and documentation is here:
> https://github.com/intel/ccc-linux-guest-hardening

Again, as our documentation states, when you submit patches based on
these tools, you HAVE TO document that. Otherwise we think you all are
crazy and will get your patches rejected. You all know this, why ignore
it?

> 4) all not yet upstreamed linux patches (that we are slowly submitting) can be found
> here: https://github.com/intel/tdx/commits/guest-next

Random github trees of kernel patches are just that, sorry.

> So, my main question before we start to argue about the threat model, mitigations, etc,
> is what is the good way to get this reviewed to make sure everyone is aligned?
> There are a lot of angles and details, so what is the most efficient method?
> Should I split the threat model from https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
> into logical pieces and start submitting it to mailing list for discussion one by one?

Yes, start out by laying out what you feel the actual problem is, what
you feel should be done for it, and the patches you have proposed to
implement this, for each and every logical piece.

Again, nothing new here, that's how Linux is developed, again, you all
know this, it's not anything I should have to say.

> Any other methods?
>
> The original plan we had in mind is to start discussing the relevant pieces when submitting the code,
> i.e. when submitting the device filter patches, we will include problem statement, threat model link,
> data, alternatives considered, etc.

As always, we can't do anything without actual working changes to the
code, otherwise it's just a pipe dream and we can't waste our time on it
(neither would you want us to).

thanks, and good luck!

greg k-h
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > Hi Greg,
> >
> > You mentioned couple of times (last time in this recent thread:
> > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> > discussing the updated threat model for kernel, so this email is a start in this direction.
>
> Any specific reason you didn't cc: the linux-hardening mailing list?
> This seems to be in their area as well, right?
>
> > As we have shared before in various lkml threads/conference presentations
> > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> > change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
>
> That is, frankly, a very funny threat model. How realistic is it really
> given all of the other ways that a hypervisor can mess with a guest?

It's what a lot of people would like; in the early attempts it was easy
to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it
can mess with - remember that not just the memory is encrypted, so is
the register state, and the guest gets to see changes to mapping and a
lot of control over interrupt injection etc.

> So what do you actually trust here? The CPU? A device? Nothing?

We trust the actual physical CPU, provided that it can prove that it's a
real CPU with the CoCo hardware enabled. Both the SNP and TDX hardware
can perform an attestation signed by the CPU to prove to someone
external that the guest is running on a real trusted CPU.

Note that the trust is limited:
a) We don't trust that we can make forward progress - if something
does something bad it's OK for the guest to stop.
b) We don't trust devices, and we don't trust them by having the guest
do normal encryption; e.g. just LUKS on the disk and normal encrypted
networking. [There's a lot of schemes people are working on about how
the guest gets the keys etc for that)

> > This is a big change in the threat model and requires both careful assessment of the
> > new (hypervisor <-> guest kernel) attack surface, as well as careful design of mitigations
> > and security validation techniques. This is the activity that we have started back at Intel
> > and the current status can be found in
> >
> > 1) Threat model and potential mitigations:
> > https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
>
> So you trust all of qemu but not Linux? Or am I misreading that
> diagram?

You're misreading it; This is about the grey part (i.e. the guest) not
trusting the host (the white part including qemu and the host kernel).

> > 2) One of the described in the above doc mitigations is "hardening of the enabled
> > code". What we mean by this, as well as techniques that are being used are
> > described in this document:
> > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html
>
> I hate the term "hardening". Please just say it for what it really is,
> "fixing bugs to handle broken hardware". We've done that for years when
> dealing with PCI and USB and even CPUs doing things that they shouldn't
> be doing. How is this any different in the end?
>
> So what you also are saying here now is "we do not trust any PCI
> devices", so please just say that (why do you trust USB devices?) If
> that is something that you all think that Linux should support, then
> let's go from there.

I don't think generally all PCI device drivers guard against all the
nasty things that a broken implementation of their hardware can do.
The USB devices are probably a bit better, because they actually worry
about people walking up with a nasty HID device; I'm skeptical that
a kernel would survive a purposely broken USB controller.

I'm not sure the request here isn't really to make sure *all* PCI devices
are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) -
and potentially ones that people will want to pass-through (which
generally needs a lot more work to make safe).
(I've not looked at these Intel tools to see what they cover)

Having said that, how happy are you with Thunderbolt PCI devices being
plugged into your laptop or into the hotplug NVMe slot on a server?
We're now in the position we were with random USB devices years ago.

Also we would want to make sure that any config data that the hypervisor
can pass to the guest is validated.

> > 3) All the tools are open-source and everyone can start using them right away even
> > without any special HW (readme has description of what is needed).
> > Tools and documentation is here:
> > https://github.com/intel/ccc-linux-guest-hardening
>
> Again, as our documentation states, when you submit patches based on
> these tools, you HAVE TO document that. Otherwise we think you all are
> crazy and will get your patches rejected. You all know this, why ignore
> it?
>
> > 4) all not yet upstreamed linux patches (that we are slowly submitting) can be found
> > here: https://github.com/intel/tdx/commits/guest-next
>
> Random github trees of kernel patches are just that, sorry.
>
> > So, my main question before we start to argue about the threat model, mitigations, etc,
> > is what is the good way to get this reviewed to make sure everyone is aligned?
> > There are a lot of angles and details, so what is the most efficient method?
> > Should I split the threat model from https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
> > into logical pieces and start submitting it to mailing list for discussion one by one?
>
> Yes, start out by laying out what you feel the actual problem is, what
> you feel should be done for it, and the patches you have proposed to
> implement this, for each and every logical piece.
>
> Again, nothing new here, that's how Linux is developed, again, you all
> know this, it's not anything I should have to say.

That seems harsh.
The problem seems reasonably well understood within the CoCo world - how
far people want to push it probably varies; but it's good to make the
problem more widely understood.

> > Any other methods?
> >
> > The original plan we had in mind is to start discussing the relevant pieces when submitting the code,
> > i.e. when submitting the device filter patches, we will include problem statement, threat model link,
> > data, alternatives considered, etc.
>
> As always, we can't do anything without actual working changes to the
> code, otherwise it's just a pipe dream and we can't waste our time on it
> (neither would you want us to).
>
> thanks, and good luck!
>
> greg k-h

Dave

>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote:
> * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > Hi Greg,
> > >
> > > You mentioned couple of times (last time in this recent thread:
> > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> > > discussing the updated threat model for kernel, so this email is a start in this direction.
> >
> > Any specific reason you didn't cc: the linux-hardening mailing list?
> > This seems to be in their area as well, right?
> >
> > > As we have shared before in various lkml threads/conference presentations
> > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
> >
> > That is, frankly, a very funny threat model. How realistic is it really
> > given all of the other ways that a hypervisor can mess with a guest?
>
> It's what a lot of people would like; in the early attempts it was easy
> to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it
> can mess with - remember that not just the memory is encrypted, so is
> the register state, and the guest gets to see changes to mapping and a
> lot of control over interrupt injection etc.
>
> > So what do you actually trust here? The CPU? A device? Nothing?
>
> We trust the actual physical CPU, provided that it can prove that it's a
> real CPU with the CoCo hardware enabled. Both the SNP and TDX hardware
> can perform an attestation signed by the CPU to prove to someone
> external that the guest is running on a real trusted CPU.
>
> Note that the trust is limited:
> a) We don't trust that we can make forward progress - if something
> does something bad it's OK for the guest to stop.
> b) We don't trust devices, and we don't trust them by having the guest
> do normal encryption; e.g. just LUKS on the disk and normal encrypted
> networking. [There's a lot of schemes people are working on about how
> the guest gets the keys etc for that)

I think we need to more precisely say what we mean by 'trust' as it
can have quite a broad interpretation.

As a baseline requirement, in the context of confidential computing the
guest would not trust the hypervisor with data that needs to remain
confidential, but would generally still expect it to provide a faithful
implementation of a given device.

IOW, the guest would expect the implementation of virtio-blk devices to
be functionally correct per the virtio-blk specification, but would not
trust the host to protect confidentiality any stored data in the disk.

Any virtual device exposed to the guest that can transfer potentially
sensitive data needs to have some form of guest controlled encryption
applied. For disks this is easy with FDE like LUKS, for NICs this is
already best practice for services by using TLS. Other devices may not
have good existing options for applying encryption.

If the guest has a virtual keyboard, mouse and graphical display, which
is backed by a VNC/RDP server in the host, then all that is visible to the
host. There's no pre-existing solutions I know can could offer easy
confidentiality for basic console I/O from the start of guest firmware
onwards. The best is to spawn a VNC/RDP server in the guest at some
point during boot. Means you can't login to the guest in single user
mode with your root password though, without compromising it.

The problem also applies for common solutions today where the host passes
in config data to the guest, for consumption by tools like cloud-init.
This is used in the past to inject an SSH key for example, or set the
guest root password. Such data received from the host can no longer be
trusted, as the host can see the data, or subsitute its own SSH key(s)
in order to gain access. Cloud-init needs to get its config data from
a trusted source, likely an external attestation server


A further challenge surrounds handling of undesirable devices. A goal
of OS development has been to ensure that both coldplugged and hotplugged
devices "just work" out of the box with zero guest admin config required.
To some extent this is contrary to what a confidential guest will want.
It doesn't want a getty spawned on any console exposed, it doesn't want
to use a virtio-rng exposed by the host which could be feeding non-random.


Protecting against malicious implementations of devices is conceivably
interesting, as a hardening task. A malicious host may try to take
advantage of the guest OS device driver impl to exploit the guest OS
kernel with an end goal of getting into a state where it can be made
to reveal confidential data that was otherwise protected.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote:
> * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > Hi Greg,
> > >
> > > You mentioned couple of times (last time in this recent thread:
> > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> > > discussing the updated threat model for kernel, so this email is a start in this direction.
> >
> > Any specific reason you didn't cc: the linux-hardening mailing list?
> > This seems to be in their area as well, right?
> >
> > > As we have shared before in various lkml threads/conference presentations
> > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
> >
> > That is, frankly, a very funny threat model. How realistic is it really
> > given all of the other ways that a hypervisor can mess with a guest?
>
> It's what a lot of people would like; in the early attempts it was easy
> to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it
> can mess with - remember that not just the memory is encrypted, so is
> the register state, and the guest gets to see changes to mapping and a
> lot of control over interrupt injection etc.

And due to the fact that SEV and TDX really do not work, how is anyone
expecting any of this to work? As one heckler on IRC recently put it,
if you squint hard enough, you can kind of ignore the real-world issues
here, so perhaps this should all be called "squint-puting" in order to
feel like you have a "confidential" system? :)

> > So what do you actually trust here? The CPU? A device? Nothing?
>
> We trust the actual physical CPU, provided that it can prove that it's a
> real CPU with the CoCo hardware enabled.

Great, so why not have hardware attestation also for your devices you
wish to talk to? Why not use that as well? Then you don't have to
worry about anything in the guest.

> Both the SNP and TDX hardware
> can perform an attestation signed by the CPU to prove to someone
> external that the guest is running on a real trusted CPU.

And again, do the same thing for the other hardware devices and all is
good. To not do that is to just guess and wave hands. You know this :)

> Note that the trust is limited:
> a) We don't trust that we can make forward progress - if something
> does something bad it's OK for the guest to stop.

So the guest can stop itself?

> b) We don't trust devices, and we don't trust them by having the guest
> do normal encryption; e.g. just LUKS on the disk and normal encrypted
> networking. [There's a lot of schemes people are working on about how
> the guest gets the keys etc for that)

How do you trust you got real data on the disk? On the network? Those
are coming from the host, how is any of that data to be trusted? Where
does the trust stop and why?

> > I hate the term "hardening". Please just say it for what it really is,
> > "fixing bugs to handle broken hardware". We've done that for years when
> > dealing with PCI and USB and even CPUs doing things that they shouldn't
> > be doing. How is this any different in the end?
> >
> > So what you also are saying here now is "we do not trust any PCI
> > devices", so please just say that (why do you trust USB devices?) If
> > that is something that you all think that Linux should support, then
> > let's go from there.
>
> I don't think generally all PCI device drivers guard against all the
> nasty things that a broken implementation of their hardware can do.

I know that all PCI drivers can NOT do that today as that was never
anything that Linux was designed for.

> The USB devices are probably a bit better, because they actually worry
> about people walking up with a nasty HID device; I'm skeptical that
> a kernel would survive a purposely broken USB controller.

I agree with you there, USB drivers are only starting to be fuzzed at
the descriptor level, that's all. Which is why they too can be put into
the "untrusted" area until you trust them.

> I'm not sure the request here isn't really to make sure *all* PCI devices
> are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) -
> and potentially ones that people will want to pass-through (which
> generally needs a lot more work to make safe).
> (I've not looked at these Intel tools to see what they cover)

Why not just create a whole new bus path for these "trusted" devices to
attach to and do that instead of tyring to emulate a protocol that was
explicitly designed NOT to this model at all? Why are you trying to
shoehorn something here and not just designing it properly from the
beginning?

> Having said that, how happy are you with Thunderbolt PCI devices being
> plugged into your laptop or into the hotplug NVMe slot on a server?

We have protection for that, and have had it for many years. Same for
USB devices. This isn't new, perhaps you all have not noticed those
features be added and taken advantage of already by many Linux distros
and system images (i.e. ChromeOS and embedded systems?)

> We're now in the position we were with random USB devices years ago.

Nope, we are not, again, we already handle random PCI devices being
plugged in. It's up to userspace to make the policy decision if it
should be trusted or not before the kernel has access to it.

So a meta-comment, why not just use that today? If your guest OS can
not authenticate the PCI device passed to it, don't allow the kernel to
bind to it. If it can be authenticated, wonderful, bind away! You can
do this today with no kernel changes needed.

> Also we would want to make sure that any config data that the hypervisor
> can pass to the guest is validated.

Define "validated" please.

> The problem seems reasonably well understood within the CoCo world - how
> far people want to push it probably varies; but it's good to make the
> problem more widely understood.

The "CoCo" world seems distant and separate from the real-world of Linux
kernel development if you all do not even know about the authentication
methods that we have for years for enabling access to PCI and USB
devices as described above. If the impementations that we currently
have are lacking in some way, wonderful, please submit changes for them
and we will be glad to review them as needed.

Remember, it's up to you all to convince us that your changes make
actual sense and are backed up with working implementations. Not us :)

good luck!

greg k-h
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, 2023-01-25 at 15:22 +0100, Greg Kroah-Hartman wrote:
> On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert
> wrote:
> > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > Hi Greg,
> > > >
> > > > You mentioned couple of times (last time in this recent thread:
> > > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that
> > > > we ought to start
> > > > discussing the updated threat model for kernel, so this email
> > > > is a start in this direction.
> > >
> > > Any specific reason you didn't cc: the linux-hardening mailing
> > > list? This seems to be in their area as well, right?
> > >
> > > > As we have shared before in various lkml threads/conference
> > > > presentations ([1], [2], [3] and many others), for the
> > > > Confidential Computing guest kernel, we have a change in the
> > > > threat model where guest kernel doesn’t anymore trust the
> > > > hypervisor.
> > >
> > > That is, frankly, a very funny threat model.  How realistic is it
> > > really given all of the other ways that a hypervisor can mess
> > > with a guest?
> >
> > It's what a lot of people would like; in the early attempts it was
> > easy to defeat, but in TDX and SEV-SNP the hypervisor has a lot
> > less that it can mess with - remember that not just the memory is
> > encrypted, so is the register state, and the guest gets to see
> > changes to mapping and a lot of control over interrupt injection
> > etc.
>
> And due to the fact that SEV and TDX really do not work, how is
> anyone expecting any of this to work?  As one heckler on IRC recently
> put it, if you squint hard enough, you can kind of ignore the real-
> world issues here, so perhaps this should all be called "squint-
> puting" in order to feel like you have a "confidential" system?  :)

There's a difference between no trust, which requires defeating all
attacks as they occur and limited trust, which merely means you want to
detect an attack from the limited trust entity to show that trust was
violated. Trying to achieve the former with CC is a good academic
exercise, but not required for the technology to be useful. Most cloud
providers are working towards the latter ... we know there are holes,
but as long as the guest can always detect interference they can be
confident in their trust in the CSP not to attack them via various
hypervisor mechanisms.

James
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote:
> > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > Hi Greg,
> > > >
> > > > You mentioned couple of times (last time in this recent thread:
> > > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> > > > discussing the updated threat model for kernel, so this email is a start in this direction.
> > >
> > > Any specific reason you didn't cc: the linux-hardening mailing list?
> > > This seems to be in their area as well, right?
> > >
> > > > As we have shared before in various lkml threads/conference presentations
> > > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> > > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
> > >
> > > That is, frankly, a very funny threat model. How realistic is it really
> > > given all of the other ways that a hypervisor can mess with a guest?
> >
> > It's what a lot of people would like; in the early attempts it was easy
> > to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it
> > can mess with - remember that not just the memory is encrypted, so is
> > the register state, and the guest gets to see changes to mapping and a
> > lot of control over interrupt injection etc.
>
> And due to the fact that SEV and TDX really do not work, how is anyone
> expecting any of this to work? As one heckler on IRC recently put it,
> if you squint hard enough, you can kind of ignore the real-world issues
> here, so perhaps this should all be called "squint-puting" in order to
> feel like you have a "confidential" system? :)

I agree the original SEV was that weak; I've not seen anyone give a good
argument against SNP or TDX.

> > > So what do you actually trust here? The CPU? A device? Nothing?
> >
> > We trust the actual physical CPU, provided that it can prove that it's a
> > real CPU with the CoCo hardware enabled.
>
> Great, so why not have hardware attestation also for your devices you
> wish to talk to? Why not use that as well? Then you don't have to
> worry about anything in the guest.

There were some talks at Plumbers where PCIe is working on adding that;
it's not there yet though. I think that's PCIe 'Integrity and Data
Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' -
SPDM. I don't know much of the detail of those, just that they're far
enough off that people aren't depending on them yet.

> > Both the SNP and TDX hardware
> > can perform an attestation signed by the CPU to prove to someone
> > external that the guest is running on a real trusted CPU.
>
> And again, do the same thing for the other hardware devices and all is
> good. To not do that is to just guess and wave hands. You know this :)

That wouldn't help you necessarily for virtual devices - where the
hypervisor implements the device (like a virtual NIC).

> > Note that the trust is limited:
> > a) We don't trust that we can make forward progress - if something
> > does something bad it's OK for the guest to stop.
>
> So the guest can stop itself?

Sure.

> > b) We don't trust devices, and we don't trust them by having the guest
> > do normal encryption; e.g. just LUKS on the disk and normal encrypted
> > networking. [There's a lot of schemes people are working on about how
> > the guest gets the keys etc for that)
>
> How do you trust you got real data on the disk? On the network? Those
> are coming from the host, how is any of that data to be trusted? Where
> does the trust stop and why?

We don't; you use LUKS2 on the disk and/or dm-verity; so there's no
trust in the disk.
You use whatever your favorite network encryption already is that
you're using to send data across the untrusted net.
So no trust in the data from the NIC.

> > > I hate the term "hardening". Please just say it for what it really is,
> > > "fixing bugs to handle broken hardware". We've done that for years when
> > > dealing with PCI and USB and even CPUs doing things that they shouldn't
> > > be doing. How is this any different in the end?
> > >
> > > So what you also are saying here now is "we do not trust any PCI
> > > devices", so please just say that (why do you trust USB devices?) If
> > > that is something that you all think that Linux should support, then
> > > let's go from there.
> >
> > I don't think generally all PCI device drivers guard against all the
> > nasty things that a broken implementation of their hardware can do.
>
> I know that all PCI drivers can NOT do that today as that was never
> anything that Linux was designed for.

Agreed; which again is why I only really worry about the subset of
devices I'd want in a CoCo VM.

> > The USB devices are probably a bit better, because they actually worry
> > about people walking up with a nasty HID device; I'm skeptical that
> > a kernel would survive a purposely broken USB controller.
>
> I agree with you there, USB drivers are only starting to be fuzzed at
> the descriptor level, that's all. Which is why they too can be put into
> the "untrusted" area until you trust them.
>
> > I'm not sure the request here isn't really to make sure *all* PCI devices
> > are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) -
> > and potentially ones that people will want to pass-through (which
> > generally needs a lot more work to make safe).
> > (I've not looked at these Intel tools to see what they cover)
>
> Why not just create a whole new bus path for these "trusted" devices to
> attach to and do that instead of tyring to emulate a protocol that was
> explicitly designed NOT to this model at all? Why are you trying to
> shoehorn something here and not just designing it properly from the
> beginning?

I'd be kind of OK with that for the virtual devices; but:

a) I think you'd start reinventing PCIe with enumeration etc

b) We do want those pass through NICs etc that are PCIe
- as long as you use normal guest crypto stuff then the host
can be just as nasty as it likes with the data they present.

c) The world has enough bus protocols, and people understand the
basics of PCI(e) - we really don't need another one.

> > Having said that, how happy are you with Thunderbolt PCI devices being
> > plugged into your laptop or into the hotplug NVMe slot on a server?
>
> We have protection for that, and have had it for many years. Same for
> USB devices. This isn't new, perhaps you all have not noticed those
> features be added and taken advantage of already by many Linux distros
> and system images (i.e. ChromeOS and embedded systems?)

What protection? I know we have an IOMMU, and that stops the device
stamping all over RAM by itself - but I think Intel's worries are more
subtle, things where the device starts playing with what PCI devices
are expected to do to try and trigger untested kernel paths. I don't
think there's protection against that.
I know we can lock by PCI/USB vendor/device ID - but those can be made
up trivially; protection like that is meaningless.

> > We're now in the position we were with random USB devices years ago.
>
> Nope, we are not, again, we already handle random PCI devices being
> plugged in. It's up to userspace to make the policy decision if it
> should be trusted or not before the kernel has access to it.
>
> So a meta-comment, why not just use that today? If your guest OS can
> not authenticate the PCI device passed to it, don't allow the kernel to
> bind to it. If it can be authenticated, wonderful, bind away! You can
> do this today with no kernel changes needed.

Because:
a) there's no good way to authenticate a PCI device yet
- any nasty device can claim to have a given PCI ID.
b) Even if you could, there's no man-in-the-middle protection yet.

> > Also we would want to make sure that any config data that the hypervisor
> > can pass to the guest is validated.
>
> Define "validated" please.

Lets say you get something like a ACPI table or qemu fw.cfg table
giving details of your devices; if the hypervisor builds those in a
nasty way what happens?

> > The problem seems reasonably well understood within the CoCo world - how
> > far people want to push it probably varies; but it's good to make the
> > problem more widely understood.
>
> The "CoCo" world seems distant and separate from the real-world of Linux
> kernel development if you all do not even know about the authentication
> methods that we have for years for enabling access to PCI and USB
> devices as described above. If the impementations that we currently
> have are lacking in some way, wonderful, please submit changes for them
> and we will be glad to review them as needed.

That's probably fair to some degree - the people looking at this are VM
people, not desktop people; I'm not sure what the overlap is; but as I
say above, I don't think the protection currently available really help
here. Please show us where we're wrong.

> Remember, it's up to you all to convince us that your changes make
> actual sense and are backed up with working implementations. Not us :)

Sure; I'm seeing existing implementations being used in vendors clouds
at the moment, and they're slowly getting the security people want.
I'd like to see that being done with upstream kernels and firmware.

Dave


> good luck!
>
> greg k-h
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote:
> * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote:
> > > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > Hi Greg,
> > > > >
> > > > > You mentioned couple of times (last time in this recent thread:
> > > > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> > > > > discussing the updated threat model for kernel, so this email is a start in this direction.
> > > >
> > > > Any specific reason you didn't cc: the linux-hardening mailing list?
> > > > This seems to be in their area as well, right?
> > > >
> > > > > As we have shared before in various lkml threads/conference presentations
> > > > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> > > > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
> > > >
> > > > That is, frankly, a very funny threat model. How realistic is it really
> > > > given all of the other ways that a hypervisor can mess with a guest?
> > >
> > > It's what a lot of people would like; in the early attempts it was easy
> > > to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it
> > > can mess with - remember that not just the memory is encrypted, so is
> > > the register state, and the guest gets to see changes to mapping and a
> > > lot of control over interrupt injection etc.
> >
> > And due to the fact that SEV and TDX really do not work, how is anyone
> > expecting any of this to work? As one heckler on IRC recently put it,
> > if you squint hard enough, you can kind of ignore the real-world issues
> > here, so perhaps this should all be called "squint-puting" in order to
> > feel like you have a "confidential" system? :)
>
> I agree the original SEV was that weak; I've not seen anyone give a good
> argument against SNP or TDX.

Argument that it doesn't work? I thought that ship sailed a long time
ago but I could be wrong as I don't really pay attention to that stuff
as it's just vaporware :)

> > > > So what do you actually trust here? The CPU? A device? Nothing?
> > >
> > > We trust the actual physical CPU, provided that it can prove that it's a
> > > real CPU with the CoCo hardware enabled.
> >
> > Great, so why not have hardware attestation also for your devices you
> > wish to talk to? Why not use that as well? Then you don't have to
> > worry about anything in the guest.
>
> There were some talks at Plumbers where PCIe is working on adding that;
> it's not there yet though. I think that's PCIe 'Integrity and Data
> Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' -
> SPDM. I don't know much of the detail of those, just that they're far
> enough off that people aren't depending on them yet.

Then work with those groups to implement that in an industry-wide way
and then take advantage of it by adding support for it to Linux! Don't
try to reinvent the same thing in a totally different way please.

> > > Both the SNP and TDX hardware
> > > can perform an attestation signed by the CPU to prove to someone
> > > external that the guest is running on a real trusted CPU.
> >
> > And again, do the same thing for the other hardware devices and all is
> > good. To not do that is to just guess and wave hands. You know this :)
>
> That wouldn't help you necessarily for virtual devices - where the
> hypervisor implements the device (like a virtual NIC).

Then create a new bus for that if you don't trust the virtio bus today.

> > > > I hate the term "hardening". Please just say it for what it really is,
> > > > "fixing bugs to handle broken hardware". We've done that for years when
> > > > dealing with PCI and USB and even CPUs doing things that they shouldn't
> > > > be doing. How is this any different in the end?
> > > >
> > > > So what you also are saying here now is "we do not trust any PCI
> > > > devices", so please just say that (why do you trust USB devices?) If
> > > > that is something that you all think that Linux should support, then
> > > > let's go from there.
> > >
> > > I don't think generally all PCI device drivers guard against all the
> > > nasty things that a broken implementation of their hardware can do.
> >
> > I know that all PCI drivers can NOT do that today as that was never
> > anything that Linux was designed for.
>
> Agreed; which again is why I only really worry about the subset of
> devices I'd want in a CoCo VM.

Everyone wants a subset, different from other's subset, which means you
need them all. Sorry.

> > > The USB devices are probably a bit better, because they actually worry
> > > about people walking up with a nasty HID device; I'm skeptical that
> > > a kernel would survive a purposely broken USB controller.
> >
> > I agree with you there, USB drivers are only starting to be fuzzed at
> > the descriptor level, that's all. Which is why they too can be put into
> > the "untrusted" area until you trust them.
> >
> > > I'm not sure the request here isn't really to make sure *all* PCI devices
> > > are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) -
> > > and potentially ones that people will want to pass-through (which
> > > generally needs a lot more work to make safe).
> > > (I've not looked at these Intel tools to see what they cover)
> >
> > Why not just create a whole new bus path for these "trusted" devices to
> > attach to and do that instead of tyring to emulate a protocol that was
> > explicitly designed NOT to this model at all? Why are you trying to
> > shoehorn something here and not just designing it properly from the
> > beginning?
>
> I'd be kind of OK with that for the virtual devices; but:
>
> a) I think you'd start reinventing PCIe with enumeration etc

Great, then work with the PCI group as talked about above to solve it
properly and not do whack-a-mole like seems to be happening so far.

> b) We do want those pass through NICs etc that are PCIe
> - as long as you use normal guest crypto stuff then the host
> can be just as nasty as it likes with the data they present.

Great, work with the PCI spec for verified devices.

> c) The world has enough bus protocols, and people understand the
> basics of PCI(e) - we really don't need another one.

Great, work with the PCI spec people please.

> > > Having said that, how happy are you with Thunderbolt PCI devices being
> > > plugged into your laptop or into the hotplug NVMe slot on a server?
> >
> > We have protection for that, and have had it for many years. Same for
> > USB devices. This isn't new, perhaps you all have not noticed those
> > features be added and taken advantage of already by many Linux distros
> > and system images (i.e. ChromeOS and embedded systems?)
>
> What protection? I know we have an IOMMU, and that stops the device
> stamping all over RAM by itself - but I think Intel's worries are more
> subtle, things where the device starts playing with what PCI devices
> are expected to do to try and trigger untested kernel paths. I don't
> think there's protection against that.
> I know we can lock by PCI/USB vendor/device ID - but those can be made
> up trivially; protection like that is meaningless.

Then combine it with device attestation and you have a solved solution,
don't ignore others working on this please.

> > > We're now in the position we were with random USB devices years ago.
> >
> > Nope, we are not, again, we already handle random PCI devices being
> > plugged in. It's up to userspace to make the policy decision if it
> > should be trusted or not before the kernel has access to it.
> >
> > So a meta-comment, why not just use that today? If your guest OS can
> > not authenticate the PCI device passed to it, don't allow the kernel to
> > bind to it. If it can be authenticated, wonderful, bind away! You can
> > do this today with no kernel changes needed.
>
> Because:
> a) there's no good way to authenticate a PCI device yet
> - any nasty device can claim to have a given PCI ID.
> b) Even if you could, there's no man-in-the-middle protection yet.

Where is the "man" here in the middle of?

And any PCI attestation should handle that, if not, work with them to
solve that please.

Thunderbolt has authenticated device support today, and so does PCI, and
USB has had it for a decade or so. Use the in-kernel implementation
that we already have or again, show us where it is lacking and we will
be glad to take patches to cover the holes (as we did last year when
ChromeOS implemented support for it in their userspace.)

> > > Also we would want to make sure that any config data that the hypervisor
> > > can pass to the guest is validated.
> >
> > Define "validated" please.
>
> Lets say you get something like a ACPI table or qemu fw.cfg table
> giving details of your devices; if the hypervisor builds those in a
> nasty way what happens?

You tell me, as we trust ACPI tables today, and if we can not, again
then you need to change the model of what Linux does. Why isn't the
BIOS authentication path working properly for ACPI tables already today?
I thought that was a long-solved problem with UEFI (if not, I'm sure the
UEFI people would be interested.)

Anyway, I'll wait until I see real patches as this thread seems to be
totally vague and ignores our current best-practices for pluggable
devices for some odd reason.

thanks,

greg k-h
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
Replying only to the not-so-far addressed points.

> On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > Hi Greg,
> >
> > You mentioned couple of times (last time in this recent thread:
> > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to
> start
> > discussing the updated threat model for kernel, so this email is a start in this
> direction.
>
> Any specific reason you didn't cc: the linux-hardening mailing list?
> This seems to be in their area as well, right?

Added now, I am just not sure how many mailing lists I want to cross spam this.
And this is a very special aspect of 'hardening' since it is about hardening a kernel
under different threat model/assumptions.


> I hate the term "hardening". Please just say it for what it really is,
> "fixing bugs to handle broken hardware". We've done that for years when
> dealing with PCI and USB and even CPUs doing things that they shouldn't
> be doing. How is this any different in the end?

Well, that would not be fully correct in this case. You can really see it from two
angles:

1. fixing bugs to handle broken hardware
2. fixing bugs that are result of correctly operating HW, but incorrectly or maliciously
operating hypervisor (acting as a man in the middle)

We focus on 2 but it happens to address 1 also to some level.

>
> So what you also are saying here now is "we do not trust any PCI
> devices", so please just say that (why do you trust USB devices?) If
> that is something that you all think that Linux should support, then
> let's go from there.
>
> > 3) All the tools are open-source and everyone can start using them right away
> even
> > without any special HW (readme has description of what is needed).
> > Tools and documentation is here:
> > https://github.com/intel/ccc-linux-guest-hardening
>
> Again, as our documentation states, when you submit patches based on
> these tools, you HAVE TO document that. Otherwise we think you all are
> crazy and will get your patches rejected. You all know this, why ignore
> it?

Sorry, I didn’t know that for every bug that is found in linux kernel when
we are submitting a fix that we have to list the way how it has been found.
We will fix this in the future submissions, but some bugs we have are found by
plain code audit, so 'human' is the tool.

>
> > 4) all not yet upstreamed linux patches (that we are slowly submitting) can be
> found
> > here: https://github.com/intel/tdx/commits/guest-next
>
> Random github trees of kernel patches are just that, sorry.

This was just for a completeness or for anyone who is curious to see the actual
code already now. Of course they will be submitted for review
using normal process.

>
> > So, my main question before we start to argue about the threat model,
> mitigations, etc,
> > is what is the good way to get this reviewed to make sure everyone is aligned?
> > There are a lot of angles and details, so what is the most efficient method?
> > Should I split the threat model from https://intel.github.io/ccc-linux-guest-
> hardening-docs/security-spec.html
> > into logical pieces and start submitting it to mailing list for discussion one by
> one?
>
> Yes, start out by laying out what you feel the actual problem is, what
> you feel should be done for it, and the patches you have proposed to
> implement this, for each and every logical piece.

OK, so this thread is about the actual threat model and overall problem.
We can re-write the current bug fixe patches (virtio and MSI) to refer to this threat model
properly and explain that they fix the actual bugs under this threat model.
Rest of pieces will come when other patches will be submitted for the review
in logical groups.

Does this work?

>
> Again, nothing new here, that's how Linux is developed, again, you all
> know this, it's not anything I should have to say.
>
> > Any other methods?
> >
> > The original plan we had in mind is to start discussing the relevant pieces when
> submitting the code,
> > i.e. when submitting the device filter patches, we will include problem
> statement, threat model link,
> > data, alternatives considered, etc.
>
> As always, we can't do anything without actual working changes to the
> code, otherwise it's just a pipe dream and we can't waste our time on it
> (neither would you want us to).

Of course, code exists, we just only starting submitting it. We started from
easy bug fixes because they are small trivial fixes that are easy to review.
Bigger pieces will follow (for example Satya has been addressing your comments about the
device filter in his new implementation).

Best Regards,
Elena.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote:
> > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > Hi Greg,
> > > >
> > > > You mentioned couple of times (last time in this recent thread:
> > > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> > > > discussing the updated threat model for kernel, so this email is a start in this direction.
> > >
> > > Any specific reason you didn't cc: the linux-hardening mailing list?
> > > This seems to be in their area as well, right?
> > >
> > > > As we have shared before in various lkml threads/conference presentations
> > > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> > > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
> > >
> > > That is, frankly, a very funny threat model. How realistic is it really
> > > given all of the other ways that a hypervisor can mess with a guest?
> >
> > It's what a lot of people would like; in the early attempts it was easy
> > to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it
> > can mess with - remember that not just the memory is encrypted, so is
> > the register state, and the guest gets to see changes to mapping and a
> > lot of control over interrupt injection etc.
> >
> > > So what do you actually trust here? The CPU? A device? Nothing?
> >
> > We trust the actual physical CPU, provided that it can prove that it's a
> > real CPU with the CoCo hardware enabled. Both the SNP and TDX hardware
> > can perform an attestation signed by the CPU to prove to someone
> > external that the guest is running on a real trusted CPU.
> >
> > Note that the trust is limited:
> > a) We don't trust that we can make forward progress - if something
> > does something bad it's OK for the guest to stop.
> > b) We don't trust devices, and we don't trust them by having the guest
> > do normal encryption; e.g. just LUKS on the disk and normal encrypted
> > networking. [There's a lot of schemes people are working on about how
> > the guest gets the keys etc for that)
>
> I think we need to more precisely say what we mean by 'trust' as it
> can have quite a broad interpretation.
>
> As a baseline requirement, in the context of confidential computing the
> guest would not trust the hypervisor with data that needs to remain
> confidential, but would generally still expect it to provide a faithful
> implementation of a given device.
>
> IOW, the guest would expect the implementation of virtio-blk devices to
> be functionally correct per the virtio-blk specification, but would not
> trust the host to protect confidentiality any stored data in the disk.
>
> Any virtual device exposed to the guest that can transfer potentially
> sensitive data needs to have some form of guest controlled encryption
> applied. For disks this is easy with FDE like LUKS, for NICs this is
> already best practice for services by using TLS. Other devices may not
> have good existing options for applying encryption.
>
> If the guest has a virtual keyboard, mouse and graphical display, which
> is backed by a VNC/RDP server in the host, then all that is visible to the
> host. There's no pre-existing solutions I know can could offer easy
> confidentiality for basic console I/O from the start of guest firmware
> onwards. The best is to spawn a VNC/RDP server in the guest at some
> point during boot. Means you can't login to the guest in single user
> mode with your root password though, without compromising it.
>
> The problem also applies for common solutions today where the host passes
> in config data to the guest, for consumption by tools like cloud-init.
> This is used in the past to inject an SSH key for example, or set the
> guest root password. Such data received from the host can no longer be
> trusted, as the host can see the data, or subsitute its own SSH key(s)
> in order to gain access. Cloud-init needs to get its config data from
> a trusted source, likely an external attestation server
>
>
> A further challenge surrounds handling of undesirable devices. A goal
> of OS development has been to ensure that both coldplugged and hotplugged
> devices "just work" out of the box with zero guest admin config required.
> To some extent this is contrary to what a confidential guest will want.
> It doesn't want a getty spawned on any console exposed, it doesn't want
> to use a virtio-rng exposed by the host which could be feeding non-random.
>
>
> Protecting against malicious implementations of devices is conceivably
> interesting, as a hardening task. A malicious host may try to take
> advantage of the guest OS device driver impl to exploit the guest OS
> kernel with an end goal of getting into a state where it can be made
> to reveal confidential data that was otherwise protected.

I think this is really what the Intel stuff is trying to protect
against.

Dave

> With regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Jan 25, 2023 at 04:16:02PM +0100, Greg Kroah-Hartman wrote:
> Everyone wants a subset, different from other's subset, which means you
> need them all. Sorry.

Well if there's a very popular system (virtual in this case) that needs
a specific config to work well, then I guess
arch/x86/configs/ccguest.config or whatever might be acceptable, no?
Lots of precedent here.

--
MST
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote:
> > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > Hi Greg,
> > > > > >
> > > > > > You mentioned couple of times (last time in this recent thread:
> > > > > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> > > > > > discussing the updated threat model for kernel, so this email is a start in this direction.
> > > > >
> > > > > Any specific reason you didn't cc: the linux-hardening mailing list?
> > > > > This seems to be in their area as well, right?
> > > > >
> > > > > > As we have shared before in various lkml threads/conference presentations
> > > > > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> > > > > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
> > > > >
> > > > > That is, frankly, a very funny threat model. How realistic is it really
> > > > > given all of the other ways that a hypervisor can mess with a guest?
> > > >
> > > > It's what a lot of people would like; in the early attempts it was easy
> > > > to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it
> > > > can mess with - remember that not just the memory is encrypted, so is
> > > > the register state, and the guest gets to see changes to mapping and a
> > > > lot of control over interrupt injection etc.
> > >
> > > And due to the fact that SEV and TDX really do not work, how is anyone
> > > expecting any of this to work? As one heckler on IRC recently put it,
> > > if you squint hard enough, you can kind of ignore the real-world issues
> > > here, so perhaps this should all be called "squint-puting" in order to
> > > feel like you have a "confidential" system? :)
> >
> > I agree the original SEV was that weak; I've not seen anyone give a good
> > argument against SNP or TDX.
>
> Argument that it doesn't work? I thought that ship sailed a long time
> ago but I could be wrong as I don't really pay attention to that stuff
> as it's just vaporware :)

You're being unfair claiming it's vaporware. You can go out and buy SNP
hardware now (for over a year), the patches are on list and under review
(and have been for quite a while).
If you're claiming it doesn't, please justify it.

> > > > > So what do you actually trust here? The CPU? A device? Nothing?
> > > >
> > > > We trust the actual physical CPU, provided that it can prove that it's a
> > > > real CPU with the CoCo hardware enabled.
> > >
> > > Great, so why not have hardware attestation also for your devices you
> > > wish to talk to? Why not use that as well? Then you don't have to
> > > worry about anything in the guest.
> >
> > There were some talks at Plumbers where PCIe is working on adding that;
> > it's not there yet though. I think that's PCIe 'Integrity and Data
> > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' -
> > SPDM. I don't know much of the detail of those, just that they're far
> > enough off that people aren't depending on them yet.
>
> Then work with those groups to implement that in an industry-wide way
> and then take advantage of it by adding support for it to Linux! Don't
> try to reinvent the same thing in a totally different way please.

Sure, people are working with them; but those are going to take time
and people want to use existing PCIe devices; and given that the hosts
are available that seems reasonable.

> > > > Both the SNP and TDX hardware
> > > > can perform an attestation signed by the CPU to prove to someone
> > > > external that the guest is running on a real trusted CPU.
> > >
> > > And again, do the same thing for the other hardware devices and all is
> > > good. To not do that is to just guess and wave hands. You know this :)
> >
> > That wouldn't help you necessarily for virtual devices - where the
> > hypervisor implements the device (like a virtual NIC).
>
> Then create a new bus for that if you don't trust the virtio bus today.

It's not that I distrust the virtio bus - just that we need to make sure
it's implementation is pessimistic enough for CoCo.

> > > > > I hate the term "hardening". Please just say it for what it really is,
> > > > > "fixing bugs to handle broken hardware". We've done that for years when
> > > > > dealing with PCI and USB and even CPUs doing things that they shouldn't
> > > > > be doing. How is this any different in the end?
> > > > >
> > > > > So what you also are saying here now is "we do not trust any PCI
> > > > > devices", so please just say that (why do you trust USB devices?) If
> > > > > that is something that you all think that Linux should support, then
> > > > > let's go from there.
> > > >
> > > > I don't think generally all PCI device drivers guard against all the
> > > > nasty things that a broken implementation of their hardware can do.
> > >
> > > I know that all PCI drivers can NOT do that today as that was never
> > > anything that Linux was designed for.
> >
> > Agreed; which again is why I only really worry about the subset of
> > devices I'd want in a CoCo VM.
>
> Everyone wants a subset, different from other's subset, which means you
> need them all. Sorry.

I think for CoCo the subset is fairly small, even including all the
people discussing it. It's the virtual devices, and a few of their
favourite physical devices, but a fairly small subset.

> > > > The USB devices are probably a bit better, because they actually worry
> > > > about people walking up with a nasty HID device; I'm skeptical that
> > > > a kernel would survive a purposely broken USB controller.
> > >
> > > I agree with you there, USB drivers are only starting to be fuzzed at
> > > the descriptor level, that's all. Which is why they too can be put into
> > > the "untrusted" area until you trust them.
> > >
> > > > I'm not sure the request here isn't really to make sure *all* PCI devices
> > > > are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) -
> > > > and potentially ones that people will want to pass-through (which
> > > > generally needs a lot more work to make safe).
> > > > (I've not looked at these Intel tools to see what they cover)
> > >
> > > Why not just create a whole new bus path for these "trusted" devices to
> > > attach to and do that instead of tyring to emulate a protocol that was
> > > explicitly designed NOT to this model at all? Why are you trying to
> > > shoehorn something here and not just designing it properly from the
> > > beginning?
> >
> > I'd be kind of OK with that for the virtual devices; but:
> >
> > a) I think you'd start reinventing PCIe with enumeration etc
>
> Great, then work with the PCI group as talked about above to solve it
> properly and not do whack-a-mole like seems to be happening so far.
>
> > b) We do want those pass through NICs etc that are PCIe
> > - as long as you use normal guest crypto stuff then the host
> > can be just as nasty as it likes with the data they present.
>
> Great, work with the PCI spec for verified devices.
>
> > c) The world has enough bus protocols, and people understand the
> > basics of PCI(e) - we really don't need another one.
>
> Great, work with the PCI spec people please.

As I say above; all happening - but it's going to take years.
It's wrong to leave users with less secure solutions if there are simple
fixes available. I agree that if it involves major pain all over then
I can see your dislike - but if it's small fixes then what's the
problem?

> > > > Having said that, how happy are you with Thunderbolt PCI devices being
> > > > plugged into your laptop or into the hotplug NVMe slot on a server?
> > >
> > > We have protection for that, and have had it for many years. Same for
> > > USB devices. This isn't new, perhaps you all have not noticed those
> > > features be added and taken advantage of already by many Linux distros
> > > and system images (i.e. ChromeOS and embedded systems?)
> >
> > What protection? I know we have an IOMMU, and that stops the device
> > stamping all over RAM by itself - but I think Intel's worries are more
> > subtle, things where the device starts playing with what PCI devices
> > are expected to do to try and trigger untested kernel paths. I don't
> > think there's protection against that.
> > I know we can lock by PCI/USB vendor/device ID - but those can be made
> > up trivially; protection like that is meaningless.
>
> Then combine it with device attestation and you have a solved solution,
> don't ignore others working on this please.
>
> > > > We're now in the position we were with random USB devices years ago.
> > >
> > > Nope, we are not, again, we already handle random PCI devices being
> > > plugged in. It's up to userspace to make the policy decision if it
> > > should be trusted or not before the kernel has access to it.
> > >
> > > So a meta-comment, why not just use that today? If your guest OS can
> > > not authenticate the PCI device passed to it, don't allow the kernel to
> > > bind to it. If it can be authenticated, wonderful, bind away! You can
> > > do this today with no kernel changes needed.
> >
> > Because:
> > a) there's no good way to authenticate a PCI device yet
> > - any nasty device can claim to have a given PCI ID.
> > b) Even if you could, there's no man-in-the-middle protection yet.
>
> Where is the "man" here in the middle of?

I'm worried what a malicious hypervisor could do.

> And any PCI attestation should handle that, if not, work with them to
> solve that please.

I believe the two mechanisms I mentioned above would handle that; when
it eventually gets there.

> Thunderbolt has authenticated device support today, and so does PCI, and
> USB has had it for a decade or so. Use the in-kernel implementation
> that we already have or again, show us where it is lacking and we will
> be glad to take patches to cover the holes (as we did last year when
> ChromeOS implemented support for it in their userspace.)

I'd appreciate pointers to the implementations you're referring to.

> > > > Also we would want to make sure that any config data that the hypervisor
> > > > can pass to the guest is validated.
> > >
> > > Define "validated" please.
> >
> > Lets say you get something like a ACPI table or qemu fw.cfg table
> > giving details of your devices; if the hypervisor builds those in a
> > nasty way what happens?
>
> You tell me, as we trust ACPI tables today, and if we can not, again
> then you need to change the model of what Linux does. Why isn't the
> BIOS authentication path working properly for ACPI tables already today?
> I thought that was a long-solved problem with UEFI (if not, I'm sure the
> UEFI people would be interested.)

If it's part of the BIOS image that's measured/loaded during startup
then we're fine; if it's a table dynamically generated by the hypervisor
I'm more worried.

> Anyway, I'll wait until I see real patches as this thread seems to be
> totally vague and ignores our current best-practices for pluggable
> devices for some odd reason.

Please point people at those best practices rather than just ranting
about how pointless you feel all this is!

The patches here from Intel are a TOOL to find problems; I can't see the
objections to having a tool like this.

(I suspect some of these fixes might make the kernel a bit more robust
against unexpected hot-remove of PCIe devices as well; but that's more
of a guess)

Dave

> thanks,
>
> greg k-h
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Jan 25, 2023 at 10:45:48AM -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 25, 2023 at 04:16:02PM +0100, Greg Kroah-Hartman wrote:
> > Everyone wants a subset, different from other's subset, which means you
> > need them all. Sorry.
>
> Well if there's a very popular system (virtual in this case) that needs
> a specific config to work well, then I guess
> arch/x86/configs/ccguest.config or whatever might be acceptable, no?
> Lots of precedent here.

OS vendors want the single kernel that fits all sizes: it should be
possible (and secure) to run a generic disto kernel within TDX/SEV guest.

--
Kiryl Shutsemau / Kirill A. Shutemov
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > Again, as our documentation states, when you submit patches based on
> > these tools, you HAVE TO document that. Otherwise we think you all are
> > crazy and will get your patches rejected. You all know this, why ignore
> > it?
>
> Sorry, I didn’t know that for every bug that is found in linux kernel when
> we are submitting a fix that we have to list the way how it has been found.
> We will fix this in the future submissions, but some bugs we have are found by
> plain code audit, so 'human' is the tool.

So the concern is that *you* may think it is a bug, but other people
may not agree. Perhaps what is needed is a full description of the
goals of Confidential Computing, and what is in scope, and what is
deliberately *not* in scope. I predict that when you do this, that
people will come out of the wood work and say, no wait, "CoCo ala
S/390 means FOO", and "CoCo ala AMD means BAR", and "CoCo ala RISC V
means QUUX".

Others may end up objecting, "no wait, doing this is going to mean
***insane*** changes to the entire kernel, and this will be a
performance / maintenance nightmare and unless you fix your hardware
in future chips, we wlil consider this a hardware bug and reject all
of your patches".

But it's better to figure this out now, then after you get hundreds of
patches into the upstream kernel, we discover that this is only 5% of
the necessary changes, and then the rest of your patches are rejected,
and you have to end up fixing the hardware anyway, with the patches
upstreamed so far being wasted effort. :-)

If we get consensus on that document, then that can get checked into
Documentation, and that can represent general consensus on the problem
early on.

- Ted
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Jan 25, 2023 at 07:02:03PM +0300, Kirill A. Shutemov wrote:
> On Wed, Jan 25, 2023 at 10:45:48AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Jan 25, 2023 at 04:16:02PM +0100, Greg Kroah-Hartman wrote:
> > > Everyone wants a subset, different from other's subset, which means you
> > > need them all. Sorry.
> >
> > Well if there's a very popular system (virtual in this case) that needs
> > a specific config to work well, then I guess
> > arch/x86/configs/ccguest.config or whatever might be acceptable, no?
> > Lots of precedent here.
>
> OS vendors want the single kernel that fits all sizes: it should be
> possible (and secure) to run a generic disto kernel within TDX/SEV guest.

If they want that, sure. But it then becomes this distro's
responsibility to configure things in a sane way. At least if
there's a known good config that's a place to document what
is known to work well. No?

--
MST
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, 25 Jan 2023, Greg Kroah-Hartman wrote:

> Argument that it doesn't work? I thought that ship sailed a long time
> ago but I could be wrong as I don't really pay attention to that stuff
> as it's just vaporware :)

Greg, are you sure you are talking about *SEV-SNP* here? (*)

That ship hasn't sailed as far as I can tell, it's being actively worked
on.

With SEV-SNP launch attestation, FDE, and runtime remote attestation (**)
one thing that you get is a way how to ensure that the guest image that
you have booted in a (public) cloud hasn't been tampered with, even if you
have zero trust in the cloud provider and their hypervisor.

And that without the issues and side-channels previous SEV and SEV-ES had.

Which to me is a rather valid usecase in today's world, rather than
vaporware.

(*) and corresponding Intel-TDX support counterpart, once it exists

(**) which is not necessarily a kernel work of course, but rather
userspace integration work, e.g. based on Keylime

--
Jiri Kosina
SUSE Labs
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, 25 Jan 2023, Greg Kroah-Hartman wrote:

> How do you trust you got real data on the disk? On the network? Those
> are coming from the host, how is any of that data to be trusted? Where
> does the trust stop and why?

This is all well described in AMD SEV-SNP documentation, see page 5 of
[1]. All the external devices are treated as untrusted in that model.

[1] https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf

--
Jiri Kosina
SUSE Labs
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
[cc += Jonathan Cameron, linux-pci]

On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote:
> Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > Great, so why not have hardware attestation also for your devices you
> > wish to talk to? Why not use that as well? Then you don't have to
> > worry about anything in the guest.
>
> There were some talks at Plumbers where PCIe is working on adding that;
> it's not there yet though. I think that's PCIe 'Integrity and Data
> Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' -
> SPDM. I don't know much of the detail of those, just that they're far
> enough off that people aren't depending on them yet.

CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch:

https://github.com/l1k/linux/commits/doe

It will allow for authentication of PCIe devices. Goal is to submit
this quarter (Q1). Afterwards we'll look into retrieving measurements
via CMA/SPDM and bringing up IDE encryption.

It's a kernel-native implementation which uses the existing crypto and
keys infrastructure and is wired into the appropriate places in the
PCI core to authenticate devices on enumeration and reauthenticate
when CMA/SPDM state is lost (after resume from D3cold, after a
Secondary Bus Reset and after a DPC-induced Hot Reset).

The device authentication service afforded here is generic.
It is up to users and vendors to decide how to employ it,
be it for "confidential computing" or something else.

Trusted root certificates to validate device certificates can be
installed into a kernel keyring using the familiar keyctl(1) utility,
but platform-specific roots of trust (such as a HSM) could be
supported as well.

I would like to stress that this particular effort is a collaboration
of multiple vendors. It is decidedly not a single vendor trying to
shoehorn something into upstream, so the criticism that has been
leveled upthread against other things does not apply here.

The Plumbers BoF you're referring to was co-authored by Jonathan Cameron
and me and its purpose was precisely to have an open discussion and
align on an approach that works for everyone:

https://lpc.events/event/16/contributions/1304/


> a) there's no good way to authenticate a PCI device yet
> - any nasty device can claim to have a given PCI ID.

CMA/SPDM prescribes that the Subject Alternative Name of the device
certificate contains the Vendor ID, Device ID, Subsystem Vendor ID,
Subsystem ID, Class Code, Revision and Serial Number (PCIe r6.0
sec 6.31.3).

Thus a forged Device ID in the Configuration Space Header will result
in authentication failure.

Thanks,

Lukas
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > Again, as our documentation states, when you submit patches based on
> > > these tools, you HAVE TO document that. Otherwise we think you all are
> > > crazy and will get your patches rejected. You all know this, why ignore
> > > it?
> >
> > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > we are submitting a fix that we have to list the way how it has been found.
> > We will fix this in the future submissions, but some bugs we have are found by
> > plain code audit, so 'human' is the tool.
>
> So the concern is that *you* may think it is a bug, but other people
> may not agree. Perhaps what is needed is a full description of the
> goals of Confidential Computing, and what is in scope, and what is
> deliberately *not* in scope. I predict that when you do this, that
> people will come out of the wood work and say, no wait, "CoCo ala
> S/390 means FOO", and "CoCo ala AMD means BAR", and "CoCo ala RISC V
> means QUUX".

Agree, and this is the reason behind starting this thread: to make sure people
agree on the threat model. The only reason why we submitted some trivial bugs
fixes separately is the fact that they *also* can be considered bugs under existing
threat model, if one thinks that kernel should be as robust as possible against
potential erroneous devices.

As described right in the beginning of the doc I shared [1] (adjusted now to remove
'TDX' and put generic 'CC guest kernel'), we want to make sure that an untrusted
host (and hypervisor) is not able to

1. archive privileged escalation into a CC guest kernel
2. compromise the confidentiality or integrity of CC guest private memory

The above security objectives give us two primary assets we want to protect:
CC guest execution context and CC guest private memory confidentiality and
integrity.

The DoS from the host towards CC guest is explicitly out of scope and a non-security
objective.

The attack surface in question is any interface exposed from a CC guest kernel
towards untrusted host that is not covered by the CC HW protections. Here the
exact list can differ somewhat depending on what technology is being used, but as
David already pointed out before: both CC guest memory and register state is
protected from host attacks, so we are focusing on other communication channels
and on generic interfaces used by Linux today.

Examples of such interfaces for TDX (and I think SEV shares most of them, but please
correct me if I am wrong here) are access to some MSRs and CPUIDs, port IO, MMIO
and DMA, access to PCI config space, KVM hypercalls (if hypervisor is KVM), TDX specific
hypercalls (this is technology specific), data consumed from untrusted host during the
CC guest initialization (including kernel itself, kernel command line, provided ACPI tables,
etc) and others described in [1].
An important note here is that these interfaces are not limited just to device drivers
(albeit device drivers are the biggest users for some of them), they are present through the whole
kernel in different subsystems and need careful examination and development of
mitigations.

The possible range of mitigations that we can apply is also wide, but you can roughly split it into
two groups:

1. mitigations that use various attestation mechanisms (we can attest the kernel code,
cmline, ACPI tables being provided and other potential configurations, and one day we
will hopefully also be able to attest devices we connect to CC guest and their configuration)

2. other mitigations for threats that attestation cannot cover, i.e. mainly runtime
interactions with the host.

Above sounds conceptually simple but the devil is as usual in details, but it doesn’t look
very impossible or smth that would need the ***insane*** changes to the entire kernel.

>
> Others may end up objecting, "no wait, doing this is going to mean
> ***insane*** changes to the entire kernel, and this will be a
> performance / maintenance nightmare and unless you fix your hardware
> in future chips, we wlil consider this a hardware bug and reject all
> of your patches".
>
> But it's better to figure this out now, then after you get hundreds of
> patches into the upstream kernel, we discover that this is only 5% of
> the necessary changes, and then the rest of your patches are rejected,
> and you have to end up fixing the hardware anyway, with the patches
> upstreamed so far being wasted effort. :-)
>
> If we get consensus on that document, then that can get checked into
> Documentation, and that can represent general consensus on the problem
> early on.

Sure, I am willing to work on this since we already spent quite a lot of effort
looking into this problem. My only question is how to organize a review of such
document in a sane and productive way and to make sure all relevant people
are included into discussion. As I said, this spawns across many areas in kernel,
and ideally you would want different people review their area in detail.
For example, one of many aspects we need to worry is security of CC guest LRNG (
especially in cases when we don’t have a trusted security HW source of entropy)
[2] and here a feedback from LRNG experts would be important.

I guess the first clear step I can do is to re-write the relevant part of [1] into a CC-technology
neutral language and then would need feedback and input from AMD guys to make
sure it correctly reflects their case also. We can probably do this preparation work
on linux-coco mailing list and then post for a wider review?

Best Regards,
Elena.

[1] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#threat-model
[2] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#randomness-inside-tdx-guest

>
> - Ted
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Jan 25, 2023 at 04:16:02PM +0100, Greg Kroah-Hartman wrote:
> Argument that it doesn't work? I thought that ship sailed a long time
> ago but I could be wrong as I don't really pay attention to that stuff
> as it's just vaporware :)

Well, "vaporware" is a bold word, especially given the fact that one can
get a confidential VM using AMD SEV[1] or SEV-SNP[2] the cloud today.
Hardware for SEV-SNP is also widely available since at least October
2021.

But okay, there seems to be some misunderstanding what Confidential
Computing (CoCo) implicates, so let me state my view here.

The vision for CoCo is to remove trust from the hypervisor (HV), so that
a guest owner only needs to trust the hardware and the os vendor for the
VM to be trusted and the data in it to be secure.

The implication is that the guest-HV interface becomes an attack surface
for the guest, and there are two basic strategies to mitigate the risk:

1) Move HV functionality into the guest or the hardware and
reduce the guest-HV interface. This already happened to some
degree with the SEV-ES enablement, where instruction decoding
and handling of most intercepts moved into the guest kernel.

2) Harden the guest-HV interface against malicious input.

Where possible we are going with option 1, up to the point where
scheduling our VCPUs is the only point we need to trust the HV on.

For example, the whole interrupt injection logic will also move either
into guest context or the hardware (depends on the HW vendor). That
covers most of the CPU emulation that the HV was doing, but an equally
important part is device emulation.

For device emulation it is harder to move that into the trusted guest
context, first of all because there is limited hardware support for
that, secondly because it will not perform well.

So device emulation will have to stay in the HV for the forseeable
future (except for devices carrying secrets, like the TPM). What Elena
and others are trying in this thread is to make the wider kernel
community aware that malicious input to a device driver is a real
problem in some environments and driver hardening is actually worthwile.

Regards,

Joerg


[1] https://cloud.google.com/confidential-computing
[2] https://learn.microsoft.com/en-us/azure/confidential-computing/confidential-vm-overview
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Lukas Wunner (lukas@wunner.de) wrote:
> [cc += Jonathan Cameron, linux-pci]
>
> On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote:
> > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > Great, so why not have hardware attestation also for your devices you
> > > wish to talk to? Why not use that as well? Then you don't have to
> > > worry about anything in the guest.
> >
> > There were some talks at Plumbers where PCIe is working on adding that;
> > it's not there yet though. I think that's PCIe 'Integrity and Data
> > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' -
> > SPDM. I don't know much of the detail of those, just that they're far
> > enough off that people aren't depending on them yet.
>
> CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch:
>
> https://github.com/l1k/linux/commits/doe

Thanks for the pointer - I'll go and hunt down that spec.

> It will allow for authentication of PCIe devices. Goal is to submit
> this quarter (Q1). Afterwards we'll look into retrieving measurements
> via CMA/SPDM and bringing up IDE encryption.
>
> It's a kernel-native implementation which uses the existing crypto and
> keys infrastructure and is wired into the appropriate places in the
> PCI core to authenticate devices on enumeration and reauthenticate
> when CMA/SPDM state is lost (after resume from D3cold, after a
> Secondary Bus Reset and after a DPC-induced Hot Reset).
>
> The device authentication service afforded here is generic.
> It is up to users and vendors to decide how to employ it,
> be it for "confidential computing" or something else.

As Samuel asks about who is doing the challenge; but I guess there are
also things like what happens when the host controls intermediate
switches and BAR access and when only VFs are passed to guests.

> Trusted root certificates to validate device certificates can be
> installed into a kernel keyring using the familiar keyctl(1) utility,
> but platform-specific roots of trust (such as a HSM) could be
> supported as well.
>
> I would like to stress that this particular effort is a collaboration
> of multiple vendors. It is decidedly not a single vendor trying to
> shoehorn something into upstream, so the criticism that has been
> leveled upthread against other things does not apply here.
>
> The Plumbers BoF you're referring to was co-authored by Jonathan Cameron
> and me and its purpose was precisely to have an open discussion and
> align on an approach that works for everyone:
>
> https://lpc.events/event/16/contributions/1304/
>
>
> > a) there's no good way to authenticate a PCI device yet
> > - any nasty device can claim to have a given PCI ID.
>
> CMA/SPDM prescribes that the Subject Alternative Name of the device
> certificate contains the Vendor ID, Device ID, Subsystem Vendor ID,
> Subsystem ID, Class Code, Revision and Serial Number (PCIe r6.0
> sec 6.31.3).
>
> Thus a forged Device ID in the Configuration Space Header will result
> in authentication failure.

Good! It'll be nice when people figure out the CoCo integration for
that; I'm still guessing it's a little way off until we get hardware
for that.

Dave

> Thanks,
>
> Lukas
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, 26 Jan 2023 10:24:32 +0100
Samuel Ortiz <sameo@rivosinc.com> wrote:

> Hi Lukas,
>
> On Wed, Jan 25, 2023 at 11:03 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> > [cc += Jonathan Cameron, linux-pci]
> >
> > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote:
> > > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > > Great, so why not have hardware attestation also for your devices you
> > > > wish to talk to? Why not use that as well? Then you don't have to
> > > > worry about anything in the guest.
> > >
> > > There were some talks at Plumbers where PCIe is working on adding that;
> > > it's not there yet though. I think that's PCIe 'Integrity and Data
> > > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' -
> > > SPDM. I don't know much of the detail of those, just that they're far
> > > enough off that people aren't depending on them yet.
> >
> > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch:
> >
> > https://github.com/l1k/linux/commits/doe
>
> Nice, thanks a lot for that.
>
>
>
> > The device authentication service afforded here is generic.
> > It is up to users and vendors to decide how to employ it,
> > be it for "confidential computing" or something else.
> >
> > Trusted root certificates to validate device certificates can be
> > installed into a kernel keyring using the familiar keyctl(1) utility,
> > but platform-specific roots of trust (such as a HSM) could be
> > supported as well.
> >
>
> This may have been discussed at LPC, but are there any plans to also
> support confidential computing flows where the host kernel is not part
> of the TCB and would not be trusted for validating the device cert chain
> nor for running the SPDM challenge?

There are lots of possible models for this. One simple option if the assigned
VF supports it is a CMA instance per VF. That will let the guest
do full attestation including measurement of whether the device is
appropriately locked down so the hypervisor can't mess with
configuration that affects the guest (without a reset anyway and that
is guest visible). Whether anyone builds that option isn't yet clear
though. If they do, Lukas' work should work there as well as for the
host OS. (Note I'm not a security expert so may be missing something!)

For extra fun, why should the device trust the host? Mutual authentication
fun (there are usecases where that matters)

There are way more complex options supported in PCIe TDISP (Tee Device
security interface protocols). Anyone have an visibility of open solutions
that make use of that? May be too new.

Jonathan


>
> Cheers,
> Samuel.
>
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> Replying only to the not-so-far addressed points.
>
> > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > Hi Greg,

<...>

> > > 3) All the tools are open-source and everyone can start using them right away
> > even
> > > without any special HW (readme has description of what is needed).
> > > Tools and documentation is here:
> > > https://github.com/intel/ccc-linux-guest-hardening
> >
> > Again, as our documentation states, when you submit patches based on
> > these tools, you HAVE TO document that. Otherwise we think you all are
> > crazy and will get your patches rejected. You all know this, why ignore
> > it?
>
> Sorry, I didn’t know that for every bug that is found in linux kernel when
> we are submitting a fix that we have to list the way how it has been found.
> We will fix this in the future submissions, but some bugs we have are found by
> plain code audit, so 'human' is the tool.

My problem with that statement is that by applying different threat
model you "invent" bugs which didn't exist in a first place.

For example, in this [1] latest submission, authors labeled correct
behaviour as "bug".

[1] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@linux.intel.com/

Thanks
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, 26 Jan 2023 10:48:50 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Lukas Wunner (lukas@wunner.de) wrote:
> > [cc += Jonathan Cameron, linux-pci]
> >
> > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote:
> > > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > > Great, so why not have hardware attestation also for your devices you
> > > > wish to talk to? Why not use that as well? Then you don't have to
> > > > worry about anything in the guest.
> > >
> > > There were some talks at Plumbers where PCIe is working on adding that;
> > > it's not there yet though. I think that's PCIe 'Integrity and Data
> > > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' -
> > > SPDM. I don't know much of the detail of those, just that they're far
> > > enough off that people aren't depending on them yet.
> >
> > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch:
> >
> > https://github.com/l1k/linux/commits/doe
>
> Thanks for the pointer - I'll go and hunt down that spec.
>
> > It will allow for authentication of PCIe devices. Goal is to submit
> > this quarter (Q1). Afterwards we'll look into retrieving measurements
> > via CMA/SPDM and bringing up IDE encryption.
> >
> > It's a kernel-native implementation which uses the existing crypto and
> > keys infrastructure and is wired into the appropriate places in the
> > PCI core to authenticate devices on enumeration and reauthenticate
> > when CMA/SPDM state is lost (after resume from D3cold, after a
> > Secondary Bus Reset and after a DPC-induced Hot Reset).
> >
> > The device authentication service afforded here is generic.
> > It is up to users and vendors to decide how to employ it,
> > be it for "confidential computing" or something else.
>
> As Samuel asks about who is doing the challenge; but I guess there are
> also things like what happens when the host controls intermediate
> switches and BAR access and when only VFs are passed to guests.

Hmm. Bringing switches into the TCB came up at Plumbers.
You can get partly around that using selective IDE (end to end encryption)
but it has some disadvantages.

You can attest the switches if you don't mind bringing them into TCB
(one particularly cloud vendor person was very strongly against doing so!)
but they don't have nice VF type abstractions so the switch attestation
needs to go through someone who isn't the guest.

>
> > Trusted root certificates to validate device certificates can be
> > installed into a kernel keyring using the familiar keyctl(1) utility,
> > but platform-specific roots of trust (such as a HSM) could be
> > supported as well.
> >
> > I would like to stress that this particular effort is a collaboration
> > of multiple vendors. It is decidedly not a single vendor trying to
> > shoehorn something into upstream, so the criticism that has been
> > leveled upthread against other things does not apply here.
> >
> > The Plumbers BoF you're referring to was co-authored by Jonathan Cameron
> > and me and its purpose was precisely to have an open discussion and
> > align on an approach that works for everyone:
> >
> > https://lpc.events/event/16/contributions/1304/
> >
> >
> > > a) there's no good way to authenticate a PCI device yet
> > > - any nasty device can claim to have a given PCI ID.
> >
> > CMA/SPDM prescribes that the Subject Alternative Name of the device
> > certificate contains the Vendor ID, Device ID, Subsystem Vendor ID,
> > Subsystem ID, Class Code, Revision and Serial Number (PCIe r6.0
> > sec 6.31.3).
> >
> > Thus a forged Device ID in the Configuration Space Header will result
> > in authentication failure.
>
> Good! It'll be nice when people figure out the CoCo integration for
> that; I'm still guessing it's a little way off until we get hardware
> for that.

FYI: We have QEMU using the DMTF reference implementation (libspdm/spdm-emu)
if anyone wants to play with it. Avery Design folk did the qemu bridging to that
a while back. Not upstream yet*, but I'm carrying it on my staging CXL qemu tree.

https://gitlab.com/jic23/qemu/-/commit/8d0ad6bc84a5d96039aaf8f929c60b9f7ba02832

In combination with Lucas' tree mentioned earlier you can get all the handshaking
to happen to attest against certs. Don't think we are yet actually checking the
IDs but trivial to add (mainly a case of generating right certs with the
Subject Alternative Name set).

Jonathan

* It's a hack using the socket interface of spdm-emu tools - at some point I need
to start a discussion on QEMU list / with dmtf tools group on whether to fix
libspdm to actually work as a shared library, or cope with the current approach
(crossing fingers the socket interface remains stable in spdm-emu).

>
> Dave
>
> > Thanks,
> >
> > Lukas
> >
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > Replying only to the not-so-far addressed points.
> >
> > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > Hi Greg,
>
> <...>
>
> > > > 3) All the tools are open-source and everyone can start using them right
> away
> > > even
> > > > without any special HW (readme has description of what is needed).
> > > > Tools and documentation is here:
> > > > https://github.com/intel/ccc-linux-guest-hardening
> > >
> > > Again, as our documentation states, when you submit patches based on
> > > these tools, you HAVE TO document that. Otherwise we think you all are
> > > crazy and will get your patches rejected. You all know this, why ignore
> > > it?
> >
> > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > we are submitting a fix that we have to list the way how it has been found.
> > We will fix this in the future submissions, but some bugs we have are found by
> > plain code audit, so 'human' is the tool.
>
> My problem with that statement is that by applying different threat
> model you "invent" bugs which didn't exist in a first place.
>
> For example, in this [1] latest submission, authors labeled correct
> behaviour as "bug".
>
> [1] https://lore.kernel.org/all/20230119170633.40944-1-
> alexander.shishkin@linux.intel.com/

Hm.. Does everyone think that when kernel dies with unhandled page fault
(such as in that case) or detection of a KASAN out of bounds violation (as it is in some
other cases we already have fixes or investigating) it represents a correct behavior even if
you expect that all your pci HW devices are trusted? What about an error in two
consequent pci reads? What about just some failure that results in erroneous input?

Best Regards,
Elena.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > Replying only to the not-so-far addressed points.
> > >
> > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > Hi Greg,
> >
> > <...>
> >
> > > > > 3) All the tools are open-source and everyone can start using them right
> > away
> > > > even
> > > > > without any special HW (readme has description of what is needed).
> > > > > Tools and documentation is here:
> > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > >
> > > > Again, as our documentation states, when you submit patches based on
> > > > these tools, you HAVE TO document that. Otherwise we think you all are
> > > > crazy and will get your patches rejected. You all know this, why ignore
> > > > it?
> > >
> > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > we are submitting a fix that we have to list the way how it has been found.
> > > We will fix this in the future submissions, but some bugs we have are found by
> > > plain code audit, so 'human' is the tool.
> >
> > My problem with that statement is that by applying different threat
> > model you "invent" bugs which didn't exist in a first place.
> >
> > For example, in this [1] latest submission, authors labeled correct
> > behaviour as "bug".
> >
> > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > alexander.shishkin@linux.intel.com/
>
> Hm.. Does everyone think that when kernel dies with unhandled page fault
> (such as in that case) or detection of a KASAN out of bounds violation (as it is in some
> other cases we already have fixes or investigating) it represents a correct behavior even if
> you expect that all your pci HW devices are trusted?

This is exactly what I said. You presented me the cases which exist in
your invented world. Mentioned unhandled page fault doesn't exist in real
world. If PCI device doesn't work, it needs to be replaced/blocked and not
left to be operable and accessible from the kernel/user.

> What about an error in two consequent pci reads? What about just some
> failure that results in erroneous input?

Yes, some bugs need to be fixed, but they are not related to trust/not-trust
discussion and PCI spec violations.

Thanks

>
> Best Regards,
> Elena.
>
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> > > I hate the term "hardening". Please just say it for what it really is,
> > > "fixing bugs to handle broken hardware". We've done that for years when
> > > dealing with PCI and USB and even CPUs doing things that they shouldn't
> > > be doing. How is this any different in the end?
> > >
> > > So what you also are saying here now is "we do not trust any PCI
> > > devices", so please just say that (why do you trust USB devices?) If
> > > that is something that you all think that Linux should support, then
> > > let's go from there.
> >
> > I don't think generally all PCI device drivers guard against all the
> > nasty things that a broken implementation of their hardware can do.
>
> I know that all PCI drivers can NOT do that today as that was never
> anything that Linux was designed for.
>
> > The USB devices are probably a bit better, because they actually worry
> > about people walking up with a nasty HID device; I'm skeptical that
> > a kernel would survive a purposely broken USB controller.
>
> I agree with you there, USB drivers are only starting to be fuzzed at
> the descriptor level, that's all. Which is why they too can be put into
> the "untrusted" area until you trust them.
>
> > I'm not sure the request here isn't really to make sure *all* PCI devices
> > are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) -
> > and potentially ones that people will want to pass-through (which
> > generally needs a lot more work to make safe).
> > (I've not looked at these Intel tools to see what they cover)
>
> Why not just create a whole new bus path for these "trusted" devices to
> attach to and do that instead of tyring to emulate a protocol that was
> explicitly designed NOT to this model at all? Why are you trying to
> shoehorn something here and not just designing it properly from the
> beginning?
>
> > Having said that, how happy are you with Thunderbolt PCI devices being
> > plugged into your laptop or into the hotplug NVMe slot on a server?
>
> We have protection for that, and have had it for many years. Same for
> USB devices. This isn't new, perhaps you all have not noticed those
> features be added and taken advantage of already by many Linux distros
> and system images (i.e. ChromeOS and embedded systems?)
>
> > We're now in the position we were with random USB devices years ago.
>
> Nope, we are not, again, we already handle random PCI devices being
> plugged in. It's up to userspace to make the policy decision if it
> should be trusted or not before the kernel has access to it.
>
> So a meta-comment, why not just use that today? If your guest OS can
> not authenticate the PCI device passed to it, don't allow the kernel to
> bind to it. If it can be authenticated, wonderful, bind away! You can
> do this today with no kernel changes needed.
>
> > Also we would want to make sure that any config data that the hypervisor
> > can pass to the guest is validated.
>
> Define "validated" please.
>
> > The problem seems reasonably well understood within the CoCo world - how
> > far people want to push it probably varies; but it's good to make the
> > problem more widely understood.
>
> The "CoCo" world seems distant and separate from the real-world of Linux
> kernel development if you all do not even know about the authentication
> methods that we have for years for enabling access to PCI and USB
> devices as described above. If the impementations that we currently
> have are lacking in some way, wonderful, please submit changes for them
> and we will be glad to review them as needed.

We are aware of USB/Thunderbolt authorization framework and this is what we have
been extending now for the our CC usage in order to apply this to all devices.
The patches are currently under testing/polishing, but we will be submitting
them in the near future.

That's said even with the above in place we don’t get a protection from a man-in-
the-middle attacks that are possible by untrusted hypervisor or host. In order
to get a full protection here, we need an attestation and end-to-end secure channel
between devices and CC guest. However, since it is going to take a long time before
we have all the infrastructure in place in Linux, as well as devices that are capable of
supporting all required functionality (and some devices will never have this support such
as virtual devices), we need to have a reasonable security model now, vs waiting
until researchers are starting to post the proof-of-concept privilege escalation exploits
on smth that is even (thanks to the tools we created in in [1]) not so hard to find:
you run our fuzzing tools on the guest kernel tree of your liking and it gives you a nice set
of KASAN issues to play with. What we are trying to do is to address these findings (among
other things) for a more robust guest kernel.

Best Regards,
Elena

[1] https://github.com/intel/ccc-linux-guest-hardening
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 10:58:47AM +0000, Jonathan Cameron wrote:
> On Thu, 26 Jan 2023 10:24:32 +0100
> Samuel Ortiz <sameo@rivosinc.com> wrote:
>
> > Hi Lukas,
> >
> > On Wed, Jan 25, 2023 at 11:03 PM Lukas Wunner <lukas@wunner.de> wrote:
> >
> > > [cc += Jonathan Cameron, linux-pci]
> > >
> > > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote:
> > > > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > > > Great, so why not have hardware attestation also for your devices you
> > > > > wish to talk to? Why not use that as well? Then you don't have to
> > > > > worry about anything in the guest.
> > > >
> > > > There were some talks at Plumbers where PCIe is working on adding that;
> > > > it's not there yet though. I think that's PCIe 'Integrity and Data
> > > > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' -
> > > > SPDM. I don't know much of the detail of those, just that they're far
> > > > enough off that people aren't depending on them yet.
> > >
> > > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch:
> > >
> > > https://github.com/l1k/linux/commits/doe
> >
> > Nice, thanks a lot for that.
> >
> >
> >
> > > The device authentication service afforded here is generic.
> > > It is up to users and vendors to decide how to employ it,
> > > be it for "confidential computing" or something else.
> > >
> > > Trusted root certificates to validate device certificates can be
> > > installed into a kernel keyring using the familiar keyctl(1) utility,
> > > but platform-specific roots of trust (such as a HSM) could be
> > > supported as well.
> > >
> >
> > This may have been discussed at LPC, but are there any plans to also
> > support confidential computing flows where the host kernel is not part
> > of the TCB and would not be trusted for validating the device cert chain
> > nor for running the SPDM challenge?
>
> There are lots of possible models for this. One simple option if the assigned
> VF supports it is a CMA instance per VF. That will let the guest
> do full attestation including measurement of whether the device is
> appropriately locked down so the hypervisor can't mess with
> configuration that affects the guest (without a reset anyway and that
> is guest visible).

So the VF would be directly assigned to the guest, and the guest kernel
would create a CMA instance for the VF, and do the SPDM authentication
(based on a guest provided trusted root certificate). I think one
security concern with that approach is assigning the VF to the
(potentially confidential) guest address space without the guest being
able to attest of the device trustworthiness first. That's what TDISP is
aiming at fixing (establish a secure SPDM between the confidential guest
and the device, lock the device from the guest, attest and then enable
DMA).

> Whether anyone builds that option isn't yet clear
> though. If they do, Lukas' work should work there as well as for the
> host OS. (Note I'm not a security expert so may be missing something!)
>
> For extra fun, why should the device trust the host? Mutual authentication
> fun (there are usecases where that matters)
>
> There are way more complex options supported in PCIe TDISP (Tee Device
> security interface protocols). Anyone have an visibility of open solutions
> that make use of that? May be too new.

It's still a PCI ECN, so quite new indeed.
FWIW the rust spdm crate [1] implements the TDISP state machine.

Cheers,
Samuel.

[1] https://github.com/jyao1/rust-spdm
>
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > Replying only to the not-so-far addressed points.
> > > >
> > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > Hi Greg,
> > >
> > > <...>
> > >
> > > > > > 3) All the tools are open-source and everyone can start using them right
> > > away
> > > > > even
> > > > > > without any special HW (readme has description of what is needed).
> > > > > > Tools and documentation is here:
> > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > >
> > > > > Again, as our documentation states, when you submit patches based on
> > > > > these tools, you HAVE TO document that. Otherwise we think you all are
> > > > > crazy and will get your patches rejected. You all know this, why ignore
> > > > > it?
> > > >
> > > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > > we are submitting a fix that we have to list the way how it has been found.
> > > > We will fix this in the future submissions, but some bugs we have are found
> by
> > > > plain code audit, so 'human' is the tool.
> > >
> > > My problem with that statement is that by applying different threat
> > > model you "invent" bugs which didn't exist in a first place.
> > >
> > > For example, in this [1] latest submission, authors labeled correct
> > > behaviour as "bug".
> > >
> > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > alexander.shishkin@linux.intel.com/
> >
> > Hm.. Does everyone think that when kernel dies with unhandled page fault
> > (such as in that case) or detection of a KASAN out of bounds violation (as it is in
> some
> > other cases we already have fixes or investigating) it represents a correct
> behavior even if
> > you expect that all your pci HW devices are trusted?
>
> This is exactly what I said. You presented me the cases which exist in
> your invented world. Mentioned unhandled page fault doesn't exist in real
> world. If PCI device doesn't work, it needs to be replaced/blocked and not
> left to be operable and accessible from the kernel/user.

Can we really assure correct operation of *all* pci devices out there?
How would such an audit be performed given a huge set of them available?
Isnt it better instead to make a small fix in the kernel behavior that would guard
us from such potentially not correctly operating devices?


>
> > What about an error in two consequent pci reads? What about just some
> > failure that results in erroneous input?
>
> Yes, some bugs need to be fixed, but they are not related to trust/not-trust
> discussion and PCI spec violations.

Let's forget the trust angle here (it only applies to the Confidential Computing
threat model and you clearly implying the existing threat model instead) and stick just to
the not-correctly operating device. What you are proposing is to fix *unknown* bugs
in multitude of pci devices that (in case of this particular MSI bug) can
lead to two different values being read from the config space and kernel incorrectly
handing this situation. Isn't it better to do the clear fix in one place to ensure such
situation (two subsequent reads with different values) cannot even happen in theory?
In security we have a saying that fixing a root cause of the problem is the most efficient
way to mitigate the problem. The root cause here is a double-read with different values,
so if it can be substituted with an easy and clear patch that probably even improves
performance as we do one less pci read and use cached value instead, where is the
problem in this particular case? If there are technical issues with the patch, of course we
need to discuss it/fix it, but it seems we are arguing here about whenever or not we want
to be fixing kernel code when we notice such cases...

Best Regards,
Elena
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 10:48:50AM +0000, Dr. David Alan Gilbert wrote:
> * Lukas Wunner (lukas@wunner.de) wrote:
> > [cc += Jonathan Cameron, linux-pci]
> >
> > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote:
> > > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > > Great, so why not have hardware attestation also for your devices you
> > > > wish to talk to? Why not use that as well? Then you don't have to
> > > > worry about anything in the guest.
> > >
> > > There were some talks at Plumbers where PCIe is working on adding that;
> > > it's not there yet though. I think that's PCIe 'Integrity and Data
> > > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' -
> > > SPDM. I don't know much of the detail of those, just that they're far
> > > enough off that people aren't depending on them yet.
> >
> > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch:
> >
> > https://github.com/l1k/linux/commits/doe
>
> Thanks for the pointer - I'll go and hunt down that spec.
>
> > It will allow for authentication of PCIe devices. Goal is to submit
> > this quarter (Q1). Afterwards we'll look into retrieving measurements
> > via CMA/SPDM and bringing up IDE encryption.
> >
> > It's a kernel-native implementation which uses the existing crypto and
> > keys infrastructure and is wired into the appropriate places in the
> > PCI core to authenticate devices on enumeration and reauthenticate
> > when CMA/SPDM state is lost (after resume from D3cold, after a
> > Secondary Bus Reset and after a DPC-induced Hot Reset).
> >
> > The device authentication service afforded here is generic.
> > It is up to users and vendors to decide how to employ it,
> > be it for "confidential computing" or something else.
>
> As Samuel asks about who is doing the challenge; but I guess there are
> also things like what happens when the host controls intermediate
> switches

You'd want to protect that through IDE selective streams.

> and BAR access and when only VFs are passed to guests.

TDISP aims at addressing that afaiu. Once the VF (aka TDI) is locked,
any changes to its BAR(s) or any PF MMIO that would affect the VF would
get the VF back to unlocked (and let the guest reject it).

Cheers,
Samuel.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 01:28:15PM +0000, Reshetova, Elena wrote:
> > On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > > Replying only to the not-so-far addressed points.
> > > > >
> > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > > Hi Greg,
> > > >
> > > > <...>
> > > >
> > > > > > > 3) All the tools are open-source and everyone can start using them right
> > > > away
> > > > > > even
> > > > > > > without any special HW (readme has description of what is needed).
> > > > > > > Tools and documentation is here:
> > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > >
> > > > > > Again, as our documentation states, when you submit patches based on
> > > > > > these tools, you HAVE TO document that. Otherwise we think you all are
> > > > > > crazy and will get your patches rejected. You all know this, why ignore
> > > > > > it?
> > > > >
> > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > > > we are submitting a fix that we have to list the way how it has been found.
> > > > > We will fix this in the future submissions, but some bugs we have are found
> > by
> > > > > plain code audit, so 'human' is the tool.
> > > >
> > > > My problem with that statement is that by applying different threat
> > > > model you "invent" bugs which didn't exist in a first place.
> > > >
> > > > For example, in this [1] latest submission, authors labeled correct
> > > > behaviour as "bug".
> > > >
> > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > > alexander.shishkin@linux.intel.com/
> > >
> > > Hm.. Does everyone think that when kernel dies with unhandled page fault
> > > (such as in that case) or detection of a KASAN out of bounds violation (as it is in
> > some
> > > other cases we already have fixes or investigating) it represents a correct
> > behavior even if
> > > you expect that all your pci HW devices are trusted?
> >
> > This is exactly what I said. You presented me the cases which exist in
> > your invented world. Mentioned unhandled page fault doesn't exist in real
> > world. If PCI device doesn't work, it needs to be replaced/blocked and not
> > left to be operable and accessible from the kernel/user.
>
> Can we really assure correct operation of *all* pci devices out there?

Why do we need to do it in 2022? These *all* pci devices work.

> How would such an audit be performed given a huge set of them available?

Compliance tests?
https://pcisig.com/developers/compliance-program

> Isnt it better instead to make a small fix in the kernel behavior that would guard
> us from such potentially not correctly operating devices?

Like Greg already said, this is a small drop in a ocean which needs to be changed.

However even in mentioned by me case, you are not fixing but hiding real
problem of having broken device in my machine. It is worst possible solution
for the users.

>
>
> >
> > > What about an error in two consequent pci reads? What about just some
> > > failure that results in erroneous input?
> >
> > Yes, some bugs need to be fixed, but they are not related to trust/not-trust
> > discussion and PCI spec violations.
>
> Let's forget the trust angle here (it only applies to the Confidential Computing
> threat model and you clearly implying the existing threat model instead) and stick just to
> the not-correctly operating device. What you are proposing is to fix *unknown* bugs
> in multitude of pci devices that (in case of this particular MSI bug) can
> lead to two different values being read from the config space and kernel incorrectly
> handing this situation.

Let's don't call bug for something which is not.

Random crashes are much more tolerable then "working" device which sends
random results.

> Isn't it better to do the clear fix in one place to ensure such
> situation (two subsequent reads with different values) cannot even happen in theory?
> In security we have a saying that fixing a root cause of the problem is the most efficient
> way to mitigate the problem. The root cause here is a double-read with different values,
> so if it can be substituted with an easy and clear patch that probably even improves
> performance as we do one less pci read and use cached value instead, where is the
> problem in this particular case? If there are technical issues with the patch, of course we
> need to discuss it/fix it, but it seems we are arguing here about whenever or not we want
> to be fixing kernel code when we notice such cases...

Not really, we are arguing what is the right thing to do:
1. Fix a root cause - device
2. Hide the failure and pretend what everything is perfect despite
having problematic device.

Thanks

>
> Best Regards,
> Elena
>
>
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Reshetova, Elena (elena.reshetova@intel.com) wrote:
> > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > Replying only to the not-so-far addressed points.
> > >
> > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > Hi Greg,
> >
> > <...>
> >
> > > > > 3) All the tools are open-source and everyone can start using them right
> > away
> > > > even
> > > > > without any special HW (readme has description of what is needed).
> > > > > Tools and documentation is here:
> > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > >
> > > > Again, as our documentation states, when you submit patches based on
> > > > these tools, you HAVE TO document that. Otherwise we think you all are
> > > > crazy and will get your patches rejected. You all know this, why ignore
> > > > it?
> > >
> > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > we are submitting a fix that we have to list the way how it has been found.
> > > We will fix this in the future submissions, but some bugs we have are found by
> > > plain code audit, so 'human' is the tool.
> >
> > My problem with that statement is that by applying different threat
> > model you "invent" bugs which didn't exist in a first place.
> >
> > For example, in this [1] latest submission, authors labeled correct
> > behaviour as "bug".
> >
> > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > alexander.shishkin@linux.intel.com/
>
> Hm.. Does everyone think that when kernel dies with unhandled page fault
> (such as in that case) or detection of a KASAN out of bounds violation (as it is in some
> other cases we already have fixes or investigating) it represents a correct behavior even if
> you expect that all your pci HW devices are trusted? What about an error in two
> consequent pci reads? What about just some failure that results in erroneous input?

I'm not sure you'll get general agreement on those answers for all
devices and situations; I think for most devices for non-CoCo
situations, then people are generally OK with a misbehaving PCI device
causing a kernel crash, since most people are running without IOMMU
anyway, a misbehaving device can cause otherwise undetectable chaos.

I'd say:
a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't
guarantee forward progress or stop the hypervisor doing something
truly stupid.

b) For CoCo, information disclosure, or corruption IS a problem

c) For non-CoCo some people might care about robustness of the kernel
against a failing PCI device, but generally I think they worry about
a fairly clean failure, even in the unexpected-hot unplug case.

d) It's not clear to me what 'trust' means in terms of CoCo for a PCIe
device; if it's a device that attests OK and we trust it is the device
it says it is, do we give it freedom or are we still wary?

Dave


> Best Regards,
> Elena.
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> Any virtual device exposed to the guest that can transfer potentially
> sensitive data needs to have some form of guest controlled encryption
> applied. For disks this is easy with FDE like LUKS, for NICs this is
> already best practice for services by using TLS. Other devices may not
> have good existing options for applying encryption.

I disagree wrt. LUKS. The cryptography behind LUKS protects persistent data
but not transport. If an attacker can observe all IO you better
consult a cryptographer.
LUKS has no concept of session keys or such, so the same disk sector will
always get encrypted with the very same key/iv.

--
Thanks,
//richard
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Richard Weinberger (richard.weinberger@gmail.com) wrote:
> On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrang? <berrange@redhat.com> wrote:
> > Any virtual device exposed to the guest that can transfer potentially
> > sensitive data needs to have some form of guest controlled encryption
> > applied. For disks this is easy with FDE like LUKS, for NICs this is
> > already best practice for services by using TLS. Other devices may not
> > have good existing options for applying encryption.
>
> I disagree wrt. LUKS. The cryptography behind LUKS protects persistent data
> but not transport. If an attacker can observe all IO you better
> consult a cryptographer.
> LUKS has no concept of session keys or such, so the same disk sector will
> always get encrypted with the very same key/iv.

Are you aware of anything that you'd use instead?

Are you happy with dm-verity for protection against modification?

Dave

> --
> Thanks,
> //richard
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 3:58 PM Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
>
> * Richard Weinberger (richard.weinberger@gmail.com) wrote:
> > On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > Any virtual device exposed to the guest that can transfer potentially
> > > sensitive data needs to have some form of guest controlled encryption
> > > applied. For disks this is easy with FDE like LUKS, for NICs this is
> > > already best practice for services by using TLS. Other devices may not
> > > have good existing options for applying encryption.
> >
> > I disagree wrt. LUKS. The cryptography behind LUKS protects persistent data
> > but not transport. If an attacker can observe all IO you better
> > consult a cryptographer.
> > LUKS has no concept of session keys or such, so the same disk sector will
> > always get encrypted with the very same key/iv.
>
> Are you aware of anything that you'd use instead?

Well, I'd think towards iSCSI over TLS to protect the IO transport.

> Are you happy with dm-verity for protection against modification?

Like LUKS (actually dm-crypt) the crypto behind is designed to protect
persistent data not transport.
My fear is that an attacker who is able to observe IOs can do bad things.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Richard Weinberger (richard.weinberger@gmail.com) wrote:
> On Thu, Jan 26, 2023 at 3:58 PM Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> >
> > * Richard Weinberger (richard.weinberger@gmail.com) wrote:
> > > On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrang? <berrange@redhat.com> wrote:
> > > > Any virtual device exposed to the guest that can transfer potentially
> > > > sensitive data needs to have some form of guest controlled encryption
> > > > applied. For disks this is easy with FDE like LUKS, for NICs this is
> > > > already best practice for services by using TLS. Other devices may not
> > > > have good existing options for applying encryption.
> > >
> > > I disagree wrt. LUKS. The cryptography behind LUKS protects persistent data
> > > but not transport. If an attacker can observe all IO you better
> > > consult a cryptographer.
> > > LUKS has no concept of session keys or such, so the same disk sector will
> > > always get encrypted with the very same key/iv.
> >
> > Are you aware of anything that you'd use instead?
>
> Well, I'd think towards iSCSI over TLS to protect the IO transport.

Yeh, that's not entirely crazy for VMs which tend to come off some
remote storage system.

> > Are you happy with dm-verity for protection against modification?
>
> Like LUKS (actually dm-crypt) the crypto behind is designed to protect
> persistent data not transport.
> My fear is that an attacker who is able to observe IOs can do bad things.

Hmm, OK, I'd assumed dm-crypt was OK since it's more hashlike and
unchanging.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 10:24:32AM +0100, Samuel Ortiz wrote:
> On Wed, Jan 25, 2023 at 11:03 PM Lukas Wunner <lukas@wunner.de> wrote:
> > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch:
> >
> > https://github.com/l1k/linux/commits/doe
> >
> > The device authentication service afforded here is generic.
> > It is up to users and vendors to decide how to employ it,
> > be it for "confidential computing" or something else.
> >
> > Trusted root certificates to validate device certificates can be
> > installed into a kernel keyring using the familiar keyctl(1) utility,
> > but platform-specific roots of trust (such as a HSM) could be
> > supported as well.
>
> This may have been discussed at LPC, but are there any plans to also
> support confidential computing flows where the host kernel is not part
> of the TCB and would not be trusted for validating the device cert chain
> nor for running the SPDM challenge?

As long as a device is passed through to a guest, the guest owns
that device. It is the guest's prerogative and duty to perform
CMA/SPDM authentication on its own behalf. If the guest uses
memory encryption via TDX or SEV, key material established through
a Diffie-Hellman exchange between guest and device is invisible
to the host. Consequently using that key material for IDE encryption
protects device accesses from the guest against snooping by the host.

SPDM authentication consists of a sequence of exchanges, the first
being GET_VERSION. When a responder (=device) receives a GET_VERSION
request, it resets the connection and all internal state related to
that connection. (SPDM 1.2.1 margin no 185: "a Requester can issue
a GET_VERSION to a Responder to reset a connection at any time";
see also SPDM 1.1.0 margin no 161 for details.)

Thus, even though the host may have authenticated the device,
once it's passed through to a guest and the guest performs
authentication again, SPDM state on the device is reset.

I'll amend the patches so that the host refrains from performing
reauthentication as long as a device is passed through. The host
has no business mutating SPDM state on the device once ownership
has passed to the guest.

The first few SPDM exchanges are transmitted in the clear,
so the host can eavesdrop on the negotiated algorithms,
exchanged certificates and nonces. However the host cannot
successfully modify the exchanged data due to the man in the middle
protection afforded by SPDM: The challenge/response hash is
computed over the concatenation of the exchanged messages,
so modification of the messages by a man in the middle leads
to authentication failure.

Obviously the host can DoS guest access to the device by modifying
exchanged messages, but there are much simpler ways for it to
do that, say, by clearing Bus Master Enable or Memory Space Enable
bits in the Command Register. DoS attacks from the host against
the guest cannot be part of the threat model at this point.

Thanks,

Lukas
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 03:23:34PM +0100, Richard Weinberger wrote:
> On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > Any virtual device exposed to the guest that can transfer potentially
> > sensitive data needs to have some form of guest controlled encryption
> > applied. For disks this is easy with FDE like LUKS, for NICs this is
> > already best practice for services by using TLS. Other devices may not
> > have good existing options for applying encryption.
>
> I disagree wrt. LUKS. The cryptography behind LUKS protects persistent data
> but not transport. If an attacker can observe all IO you better
> consult a cryptographer.
> LUKS has no concept of session keys or such, so the same disk sector will
> always get encrypted with the very same key/iv.

Yes, you're right, all the FDE cipher modes are susceptible to
time based analysis of I/O, so very far from ideal. You'll get
protection for your historically written confidential data at the
time a VM host is first compromised, but if (as) they retain long
term access to the host, confidentiality is increasingly undermined
the longer they can observe the ongoing I/O.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 04:13:11PM +0100, Richard Weinberger wrote:
> On Thu, Jan 26, 2023 at 3:58 PM Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> >
> > * Richard Weinberger (richard.weinberger@gmail.com) wrote:
> > > On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > Any virtual device exposed to the guest that can transfer potentially
> > > > sensitive data needs to have some form of guest controlled encryption
> > > > applied. For disks this is easy with FDE like LUKS, for NICs this is
> > > > already best practice for services by using TLS. Other devices may not
> > > > have good existing options for applying encryption.
> > >
> > > I disagree wrt. LUKS. The cryptography behind LUKS protects persistent data
> > > but not transport. If an attacker can observe all IO you better
> > > consult a cryptographer.
> > > LUKS has no concept of session keys or such, so the same disk sector will
> > > always get encrypted with the very same key/iv.
> >
> > Are you aware of anything that you'd use instead?
>
> Well, I'd think towards iSCSI over TLS to protect the IO transport.

That just moves the problem elsewhere though surely. The remote iSCSI
server still has to persist the VMs' data, and the cloud sevice provider
can observe any I/O before it hits the final hardware storage. So the
remote iSCSI server needs to apply a FDE like encryption scheme for
the exported iSCSI block device, and using a key only accessible to the
tenant that owns the VM. It still needs to solve the same problem of
having some kind of "generation ID" that can tweak the IV for each virtual
disk sector, to protect against time based analysis.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, 26 Jan 2023 14:15:05 +0100
Samuel Ortiz <sameo@rivosinc.com> wrote:

> On Thu, Jan 26, 2023 at 10:58:47AM +0000, Jonathan Cameron wrote:
> > On Thu, 26 Jan 2023 10:24:32 +0100
> > Samuel Ortiz <sameo@rivosinc.com> wrote:
> >
> > > Hi Lukas,
> > >
> > > On Wed, Jan 25, 2023 at 11:03 PM Lukas Wunner <lukas@wunner.de> wrote:
> > >
> > > > [cc += Jonathan Cameron, linux-pci]
> > > >
> > > > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote:
> > > > > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > > > > Great, so why not have hardware attestation also for your devices you
> > > > > > wish to talk to? Why not use that as well? Then you don't have to
> > > > > > worry about anything in the guest.
> > > > >
> > > > > There were some talks at Plumbers where PCIe is working on adding that;
> > > > > it's not there yet though. I think that's PCIe 'Integrity and Data
> > > > > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' -
> > > > > SPDM. I don't know much of the detail of those, just that they're far
> > > > > enough off that people aren't depending on them yet.
> > > >
> > > > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch:
> > > >
> > > > https://github.com/l1k/linux/commits/doe
> > >
> > > Nice, thanks a lot for that.
> > >
> > >
> > >
> > > > The device authentication service afforded here is generic.
> > > > It is up to users and vendors to decide how to employ it,
> > > > be it for "confidential computing" or something else.
> > > >
> > > > Trusted root certificates to validate device certificates can be
> > > > installed into a kernel keyring using the familiar keyctl(1) utility,
> > > > but platform-specific roots of trust (such as a HSM) could be
> > > > supported as well.
> > > >
> > >
> > > This may have been discussed at LPC, but are there any plans to also
> > > support confidential computing flows where the host kernel is not part
> > > of the TCB and would not be trusted for validating the device cert chain
> > > nor for running the SPDM challenge?
> >
> > There are lots of possible models for this. One simple option if the assigned
> > VF supports it is a CMA instance per VF. That will let the guest
> > do full attestation including measurement of whether the device is
> > appropriately locked down so the hypervisor can't mess with
> > configuration that affects the guest (without a reset anyway and that
> > is guest visible).
>
> So the VF would be directly assigned to the guest, and the guest kernel
> would create a CMA instance for the VF, and do the SPDM authentication
> (based on a guest provided trusted root certificate). I think one
> security concern with that approach is assigning the VF to the
> (potentially confidential) guest address space without the guest being
> able to attest of the device trustworthiness first. That's what TDISP is
> aiming at fixing (establish a secure SPDM between the confidential guest
> and the device, lock the device from the guest, attest and then enable
> DMA).

Agreed, TDISP is more comprehensive, but also much more complex with
more moving parts that we don't really have yet.

Depending on your IOMMU design (+ related stuff) and interaction with
the secure guest, you might be able to block any rogue DMA until
after attestation / lock down checks even if the Hypervisor was letting
it through.

>
> > Whether anyone builds that option isn't yet clear
> > though. If they do, Lukas' work should work there as well as for the
> > host OS. (Note I'm not a security expert so may be missing something!)
> >
> > For extra fun, why should the device trust the host? Mutual authentication
> > fun (there are usecases where that matters)
> >
> > There are way more complex options supported in PCIe TDISP (Tee Device
> > security interface protocols). Anyone have an visibility of open solutions
> > that make use of that? May be too new.
>
> It's still a PCI ECN, so quite new indeed.
> FWIW the rust spdm crate [1] implements the TDISP state machine.

Cool. thanks for the reference.
>
> Cheers,
> Samuel.
>
> [1] https://github.com/jyao1/rust-spdm
> >
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> And this is a very special aspect of 'hardening' since it is about hardening a kernel
> under different threat model/assumptions.

I am not sure it's that special in that hardening IMHO is not a specific
threat model or a set of assumptions. IIUC it's just something that
helps reduce severity of vulnerabilities. Similarly, one can use the CC
hardware in a variety of ways I guess. And one way is just that -
hardening linux such that ability to corrupt guest memory does not
automatically escalate into guest code execution.

If you put it this way, you get to participate in a well understood
problem space instead of constantly saying "yes but CC is special". And
further, you will now talk about features as opposed to fixing bugs.
Which will stop annoying people who currently seem annoyed by the
implication that their code is buggy simply because it does not cache in
memory all data read from hardware. Finally, you then don't really need
to explain why e.g. DoS is not a problem but info leak is a problem - when
for many users it's actually the reverse - the reason is not that it's
not part of a threat model - which then makes you work hard to define
the threat model - but simply that CC hardware does not support this
kind of hardening.

--
MST
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 04:44:49PM +0100, Lukas Wunner wrote:
> Obviously the host can DoS guest access to the device by modifying
> exchanged messages, but there are much simpler ways for it to
> do that, say, by clearing Bus Master Enable or Memory Space Enable
> bits in the Command Register.

There's a single key per guest though, isn't it? Also used
for regular memory?


--
MST
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> * Reshetova, Elena (elena.reshetova@intel.com) wrote:
> > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > Replying only to the not-so-far addressed points.
> > > >
> > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > Hi Greg,
> > >
> > > <...>
> > >
> > > > > > 3) All the tools are open-source and everyone can start using them right
> > > away
> > > > > even
> > > > > > without any special HW (readme has description of what is needed).
> > > > > > Tools and documentation is here:
> > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > >
> > > > > Again, as our documentation states, when you submit patches based on
> > > > > these tools, you HAVE TO document that. Otherwise we think you all are
> > > > > crazy and will get your patches rejected. You all know this, why ignore
> > > > > it?
> > > >
> > > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > > we are submitting a fix that we have to list the way how it has been found.
> > > > We will fix this in the future submissions, but some bugs we have are found
> by
> > > > plain code audit, so 'human' is the tool.
> > >
> > > My problem with that statement is that by applying different threat
> > > model you "invent" bugs which didn't exist in a first place.
> > >
> > > For example, in this [1] latest submission, authors labeled correct
> > > behaviour as "bug".
> > >
> > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > alexander.shishkin@linux.intel.com/
> >
> > Hm.. Does everyone think that when kernel dies with unhandled page fault
> > (such as in that case) or detection of a KASAN out of bounds violation (as it is in
> some
> > other cases we already have fixes or investigating) it represents a correct
> behavior even if
> > you expect that all your pci HW devices are trusted? What about an error in
> two
> > consequent pci reads? What about just some failure that results in erroneous
> input?
>
> I'm not sure you'll get general agreement on those answers for all
> devices and situations; I think for most devices for non-CoCo
> situations, then people are generally OK with a misbehaving PCI device
> causing a kernel crash, since most people are running without IOMMU
> anyway, a misbehaving device can cause otherwise undetectable chaos.

Ok, if this is a consensus within the kernel community, then we can consider
the fixes strictly from the CoCo threat model point of view.

>
> I'd say:
> a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't
> guarantee forward progress or stop the hypervisor doing something
> truly stupid.

Yes, denial of service is out of scope but I would not pile all crashes as
'safe' automatically. Depending on the crash, it can be used as a
primitive to launch further attacks: privilege escalation, information
disclosure and corruption. It is especially true for memory corruption
issues.

> b) For CoCo, information disclosure, or corruption IS a problem

Agreed, but the path to this can incorporate a number of attack
primitives, as well as bug chaining. So, if the bug is detected, and
fix is easy, instead of thinking about possible implications and its
potential usage in exploit writing, safer to fix it.

>
> c) For non-CoCo some people might care about robustness of the kernel
> against a failing PCI device, but generally I think they worry about
> a fairly clean failure, even in the unexpected-hot unplug case.

Ok.

>
> d) It's not clear to me what 'trust' means in terms of CoCo for a PCIe
> device; if it's a device that attests OK and we trust it is the device
> it says it is, do we give it freedom or are we still wary?

I would say that attestation and established secure channel to an end
device means that we don’t have to employ additional measures to
secure data transfer, as well as we 'trust' a device at least to some degree to
keep our data protected (both from untrusted host and from other
CC guests). I don’t think there is anything else behind this concept.

Best Regards,
Elena
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 05:48:33PM +0000, Reshetova, Elena wrote:
>
> > * Reshetova, Elena (elena.reshetova@intel.com) wrote:
> > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > > Replying only to the not-so-far addressed points.
> > > > >
> > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > > Hi Greg,
> > > >
> > > > <...>
> > > >
> > > > > > > 3) All the tools are open-source and everyone can start using them right
> > > > away
> > > > > > even
> > > > > > > without any special HW (readme has description of what is needed).
> > > > > > > Tools and documentation is here:
> > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > >
> > > > > > Again, as our documentation states, when you submit patches based on
> > > > > > these tools, you HAVE TO document that. Otherwise we think you all are
> > > > > > crazy and will get your patches rejected. You all know this, why ignore
> > > > > > it?
> > > > >
> > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > > > we are submitting a fix that we have to list the way how it has been found.
> > > > > We will fix this in the future submissions, but some bugs we have are found
> > by
> > > > > plain code audit, so 'human' is the tool.
> > > >
> > > > My problem with that statement is that by applying different threat
> > > > model you "invent" bugs which didn't exist in a first place.
> > > >
> > > > For example, in this [1] latest submission, authors labeled correct
> > > > behaviour as "bug".
> > > >
> > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > > alexander.shishkin@linux.intel.com/
> > >
> > > Hm.. Does everyone think that when kernel dies with unhandled page fault
> > > (such as in that case) or detection of a KASAN out of bounds violation (as it is in
> > some
> > > other cases we already have fixes or investigating) it represents a correct
> > behavior even if
> > > you expect that all your pci HW devices are trusted? What about an error in
> > two
> > > consequent pci reads? What about just some failure that results in erroneous
> > input?
> >
> > I'm not sure you'll get general agreement on those answers for all
> > devices and situations; I think for most devices for non-CoCo
> > situations, then people are generally OK with a misbehaving PCI device
> > causing a kernel crash, since most people are running without IOMMU
> > anyway, a misbehaving device can cause otherwise undetectable chaos.
>
> Ok, if this is a consensus within the kernel community, then we can consider
> the fixes strictly from the CoCo threat model point of view.
>
> >
> > I'd say:
> > a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't
> > guarantee forward progress or stop the hypervisor doing something
> > truly stupid.
>
> Yes, denial of service is out of scope but I would not pile all crashes as
> 'safe' automatically. Depending on the crash, it can be used as a
> primitive to launch further attacks: privilege escalation, information
> disclosure and corruption. It is especially true for memory corruption
> issues.
>
> > b) For CoCo, information disclosure, or corruption IS a problem
>
> Agreed, but the path to this can incorporate a number of attack
> primitives, as well as bug chaining. So, if the bug is detected, and
> fix is easy, instead of thinking about possible implications and its
> potential usage in exploit writing, safer to fix it.
>
> >
> > c) For non-CoCo some people might care about robustness of the kernel
> > against a failing PCI device, but generally I think they worry about
> > a fairly clean failure, even in the unexpected-hot unplug case.
>
> Ok.

With my other hat as a representative of hardware vendor (at least for
NIC part), who cares about quality of our devices, we don't want to hide
ANY crash related to our devices, especially if it is related to misbehaving
PCI HW logic. Any uncontrolled "robustness" hides real issues and makes
QA/customer support much harder.

Thanks
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Leon Romanovsky (leon@kernel.org) wrote:
> On Thu, Jan 26, 2023 at 05:48:33PM +0000, Reshetova, Elena wrote:
> >
> > > * Reshetova, Elena (elena.reshetova@intel.com) wrote:
> > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > > > Replying only to the not-so-far addressed points.
> > > > > >
> > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
> > > > > > > > Hi Greg,
> > > > >
> > > > > <...>
> > > > >
> > > > > > > > 3) All the tools are open-source and everyone can start using them right
> > > > > away
> > > > > > > even
> > > > > > > > without any special HW (readme has description of what is needed).
> > > > > > > > Tools and documentation is here:
> > > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > > >
> > > > > > > Again, as our documentation states, when you submit patches based on
> > > > > > > these tools, you HAVE TO document that. Otherwise we think you all are
> > > > > > > crazy and will get your patches rejected. You all know this, why ignore
> > > > > > > it?
> > > > > >
> > > > > > Sorry, I didn’t know that for every bug that is found in linux kernel when
> > > > > > we are submitting a fix that we have to list the way how it has been found.
> > > > > > We will fix this in the future submissions, but some bugs we have are found
> > > by
> > > > > > plain code audit, so 'human' is the tool.
> > > > >
> > > > > My problem with that statement is that by applying different threat
> > > > > model you "invent" bugs which didn't exist in a first place.
> > > > >
> > > > > For example, in this [1] latest submission, authors labeled correct
> > > > > behaviour as "bug".
> > > > >
> > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > > > alexander.shishkin@linux.intel.com/
> > > >
> > > > Hm.. Does everyone think that when kernel dies with unhandled page fault
> > > > (such as in that case) or detection of a KASAN out of bounds violation (as it is in
> > > some
> > > > other cases we already have fixes or investigating) it represents a correct
> > > behavior even if
> > > > you expect that all your pci HW devices are trusted? What about an error in
> > > two
> > > > consequent pci reads? What about just some failure that results in erroneous
> > > input?
> > >
> > > I'm not sure you'll get general agreement on those answers for all
> > > devices and situations; I think for most devices for non-CoCo
> > > situations, then people are generally OK with a misbehaving PCI device
> > > causing a kernel crash, since most people are running without IOMMU
> > > anyway, a misbehaving device can cause otherwise undetectable chaos.
> >
> > Ok, if this is a consensus within the kernel community, then we can consider
> > the fixes strictly from the CoCo threat model point of view.
> >
> > >
> > > I'd say:
> > > a) For CoCo, a guest (guaranteed) crash isn't a problem - CoCo doesn't
> > > guarantee forward progress or stop the hypervisor doing something
> > > truly stupid.
> >
> > Yes, denial of service is out of scope but I would not pile all crashes as
> > 'safe' automatically. Depending on the crash, it can be used as a
> > primitive to launch further attacks: privilege escalation, information
> > disclosure and corruption. It is especially true for memory corruption
> > issues.
> >
> > > b) For CoCo, information disclosure, or corruption IS a problem
> >
> > Agreed, but the path to this can incorporate a number of attack
> > primitives, as well as bug chaining. So, if the bug is detected, and
> > fix is easy, instead of thinking about possible implications and its
> > potential usage in exploit writing, safer to fix it.
> >
> > >
> > > c) For non-CoCo some people might care about robustness of the kernel
> > > against a failing PCI device, but generally I think they worry about
> > > a fairly clean failure, even in the unexpected-hot unplug case.
> >
> > Ok.
>
> With my other hat as a representative of hardware vendor (at least for
> NIC part), who cares about quality of our devices, we don't want to hide
> ANY crash related to our devices, especially if it is related to misbehaving
> PCI HW logic. Any uncontrolled "robustness" hides real issues and makes
> QA/customer support much harder.

Yeh if you're adding new code to be more careful, you want the code to
fail/log the problem, not hide it.
(Although heck, I suspect there are a million apparently working PCI
cards out there that break some spec somewhere).

Dave

> Thanks
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 11:25:21AM -0500, Michael S. Tsirkin wrote:
> On Thu, Jan 26, 2023 at 04:44:49PM +0100, Lukas Wunner wrote:
> > Obviously the host can DoS guest access to the device by modifying
> > exchanged messages, but there are much simpler ways for it to
> > do that, say, by clearing Bus Master Enable or Memory Space Enable
> > bits in the Command Register.
>
> There's a single key per guest though, isn't it? Also used
> for regular memory?

The current design is to have a global keyring (per kernel, i.e. per
guest). A device presents a certificate chain and the first certificate
in that chain needs to be signed by one of the certificates on the keyring.

This is completely independent from the key used for memory encryption.

A device can have up to 8 certificate chains (called "slots" in the
SPDM spec) and I've implemented it such that all slots are iterated
and validation is considered to be successful as soon as a slot with
a valid signature is found.

We can discuss having a per-device keyring if anyone thinks it makes
sense.

The PCISIG's idea seems to be that each vendor of PCIe cards publishes
a trusted root certificate and users would then have to keep all those
vendor certificates in their global keyring. This follows from the
last paragraph of PCIe r6.0.1 sec 6.31.3, which says "it is strongly
recommended that authentication requesters [i.e. the kernel] confirm
that the information provided in the Subject Alternative Name entry
[of the device's leaf certificate] is signed by the vendor indicated
by the Vendor ID."

The astute reader will notice that for this to work, the Vendor ID
must be included in the trusted root certificate in a machine-readable
way. Unfortunately the PCIe Base Spec fails to specify that.
So I don't know how to associate a trusted root certificate with a
Vendor ID.

I'll report this and several other gaps I've found in the spec to the
editor at the PCISIG so that they can be filled in a future revision.

Thanks,

Lukas
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 04:07:29PM +0000, Jonathan Cameron wrote:
> On Thu, 26 Jan 2023 14:15:05 +0100
> Samuel Ortiz <sameo@rivosinc.com> wrote:
>
> > On Thu, Jan 26, 2023 at 10:58:47AM +0000, Jonathan Cameron wrote:
> > > On Thu, 26 Jan 2023 10:24:32 +0100
> > > Samuel Ortiz <sameo@rivosinc.com> wrote:
> > >
> > > > Hi Lukas,
> > > >
> > > > On Wed, Jan 25, 2023 at 11:03 PM Lukas Wunner <lukas@wunner.de> wrote:
> > > >
> > > > > [cc += Jonathan Cameron, linux-pci]
> > > > >
> > > > > On Wed, Jan 25, 2023 at 02:57:40PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> > > > > > > Great, so why not have hardware attestation also for your devices you
> > > > > > > wish to talk to? Why not use that as well? Then you don't have to
> > > > > > > worry about anything in the guest.
> > > > > >
> > > > > > There were some talks at Plumbers where PCIe is working on adding that;
> > > > > > it's not there yet though. I think that's PCIe 'Integrity and Data
> > > > > > Encryption' (IDE - sigh), and PCIe 'Security Prtocol and Data Model' -
> > > > > > SPDM. I don't know much of the detail of those, just that they're far
> > > > > > enough off that people aren't depending on them yet.
> > > > >
> > > > > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch:
> > > > >
> > > > > https://github.com/l1k/linux/commits/doe
> > > >
> > > > Nice, thanks a lot for that.
> > > >
> > > >
> > > >
> > > > > The device authentication service afforded here is generic.
> > > > > It is up to users and vendors to decide how to employ it,
> > > > > be it for "confidential computing" or something else.
> > > > >
> > > > > Trusted root certificates to validate device certificates can be
> > > > > installed into a kernel keyring using the familiar keyctl(1) utility,
> > > > > but platform-specific roots of trust (such as a HSM) could be
> > > > > supported as well.
> > > > >
> > > >
> > > > This may have been discussed at LPC, but are there any plans to also
> > > > support confidential computing flows where the host kernel is not part
> > > > of the TCB and would not be trusted for validating the device cert chain
> > > > nor for running the SPDM challenge?
> > >
> > > There are lots of possible models for this. One simple option if the assigned
> > > VF supports it is a CMA instance per VF. That will let the guest
> > > do full attestation including measurement of whether the device is
> > > appropriately locked down so the hypervisor can't mess with
> > > configuration that affects the guest (without a reset anyway and that
> > > is guest visible).
> >
> > So the VF would be directly assigned to the guest, and the guest kernel
> > would create a CMA instance for the VF, and do the SPDM authentication
> > (based on a guest provided trusted root certificate). I think one
> > security concern with that approach is assigning the VF to the
> > (potentially confidential) guest address space without the guest being
> > able to attest of the device trustworthiness first. That's what TDISP is
> > aiming at fixing (establish a secure SPDM between the confidential guest
> > and the device, lock the device from the guest, attest and then enable
> > DMA).
>
> Agreed, TDISP is more comprehensive, but also much more complex with
> more moving parts that we don't really have yet.
>
> Depending on your IOMMU design (+ related stuff) and interaction with
> the secure guest, you might be able to block any rogue DMA until
> after attestation / lock down checks even if the Hypervisor was letting
> it through.

Provided that the guest or, in the TDX and AP-TEE cases, the TSM have
protected access to the IOMMU, yes. But then the implementation becomes
platform specific.

Cheers,
Samuel.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 04:44:49PM +0100, Lukas Wunner wrote:
> On Thu, Jan 26, 2023 at 10:24:32AM +0100, Samuel Ortiz wrote:
> > On Wed, Jan 25, 2023 at 11:03 PM Lukas Wunner <lukas@wunner.de> wrote:
> > > CMA/SPDM (PCIe r6.0 sec 6.31) is in active development on this branch:
> > >
> > > https://github.com/l1k/linux/commits/doe
> > >
> > > The device authentication service afforded here is generic.
> > > It is up to users and vendors to decide how to employ it,
> > > be it for "confidential computing" or something else.
> > >
> > > Trusted root certificates to validate device certificates can be
> > > installed into a kernel keyring using the familiar keyctl(1) utility,
> > > but platform-specific roots of trust (such as a HSM) could be
> > > supported as well.
> >
> > This may have been discussed at LPC, but are there any plans to also
> > support confidential computing flows where the host kernel is not part
> > of the TCB and would not be trusted for validating the device cert chain
> > nor for running the SPDM challenge?
>
> As long as a device is passed through to a guest, the guest owns
> that device.

I agree. On a SRIOV setup, the host typically owns the PF and assigns
VFs to the guests. Devices must be enlightened to guarantee that once
one of their VFs/interfaces is passed to a trusted VM, it can no longer
be modified by anything untrusted (e.g. the hypervisor).

> It is the guest's prerogative and duty to perform
> CMA/SPDM authentication on its own behalf. If the guest uses
> memory encryption via TDX or SEV, key material established through
> a Diffie-Hellman exchange between guest and device is invisible
> to the host. Consequently using that key material for IDE encryption
> protects device accesses from the guest against snooping by the host.

On confidential computing platforms where a security manager (e.g.
Intel TDX module) manages the confidential guests, the IDE key
management and stream settings would be handled by this manager. In
other words, the SPDM requester would not be a Linux kernel.
FWIW, Intel recently published an interesting description of TEE-IO
enabling with TDX [1].

> SPDM authentication consists of a sequence of exchanges, the first
> being GET_VERSION. When a responder (=device) receives a GET_VERSION
> request, it resets the connection and all internal state related to
> that connection. (SPDM 1.2.1 margin no 185: "a Requester can issue
> a GET_VERSION to a Responder to reset a connection at any time";
> see also SPDM 1.1.0 margin no 161 for details.)
>
> Thus, even though the host may have authenticated the device,
> once it's passed through to a guest and the guest performs
> authentication again, SPDM state on the device is reset.
>
> I'll amend the patches so that the host refrains from performing
> reauthentication as long as a device is passed through. The host
> has no business mutating SPDM state on the device once ownership
> has passed to the guest.
>
> The first few SPDM exchanges are transmitted in the clear,
> so the host can eavesdrop on the negotiated algorithms,
> exchanged certificates and nonces. However the host cannot
> successfully modify the exchanged data due to the man in the middle
> protection afforded by SPDM: The challenge/response hash is
> computed over the concatenation of the exchanged messages,
> so modification of the messages by a man in the middle leads
> to authentication failure.

Right, I was not concerned by the challenge messages integrity but by
trusting the host with verifying the response and validating the device
cert chains.

> Obviously the host can DoS guest access to the device by modifying
> exchanged messages, but there are much simpler ways for it to
> do that, say, by clearing Bus Master Enable or Memory Space Enable
> bits in the Command Register. DoS attacks from the host against
> the guest cannot be part of the threat model at this point.

Yes, the host can DoS the guest at anytime it wants and in multiple
ways. It's definitely out of the confidential computing thread model at
least.

Cheers,
Samuel.

[1] https://cdrdv2-public.intel.com/742542/software-enabling-for-tdx-tee-io-fixed.pdf
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > And this is a very special aspect of 'hardening' since it is about hardening a
> kernel
> > under different threat model/assumptions.
>
> I am not sure it's that special in that hardening IMHO is not a specific
> threat model or a set of assumptions. IIUC it's just something that
> helps reduce severity of vulnerabilities. Similarly, one can use the CC
> hardware in a variety of ways I guess. And one way is just that -
> hardening linux such that ability to corrupt guest memory does not
> automatically escalate into guest code execution.

I am not sure if I fully follow you on this. I do agree that it is in principle
the same 'hardening' that we have been doing in Linux for decades just
applied to a new attack surface, host <-> guest, vs userspace <->kernel.
Interfaces have changed, but the types of vulnerabilities, etc are the same.
The attacker model is somewhat different because we have
different expectations on what host/hypervisor should be able to do
to the guest (following business reasons and use-cases), versus what we
expect normal userspace being able to "do" towards kernel. The host and
hypervisor still has a lot of control over the guest (ability to start/stop it,
manage its resources, etc). But the reasons behind this doesn’t come
from the fact that security CoCo HW not being able to support this stricter
security model (it cannot now indeed, but this is a design decision), but
from the fact that it is important for Cloud service providers to retain that
level of control over their infrastructure.

>
> If you put it this way, you get to participate in a well understood
> problem space instead of constantly saying "yes but CC is special". And
> further, you will now talk about features as opposed to fixing bugs.
> Which will stop annoying people who currently seem annoyed by the
> implication that their code is buggy simply because it does not cache in
> memory all data read from hardware. Finally, you then don't really need
> to explain why e.g. DoS is not a problem but info leak is a problem - when
> for many users it's actually the reverse - the reason is not that it's
> not part of a threat model - which then makes you work hard to define
> the threat model - but simply that CC hardware does not support this
> kind of hardening.

But this won't be correct statement, because it is not limitation of HW, but the
threat and business model that Confidential Computing exists in. I am not
aware of a single cloud provider who would be willing to use the HW that
takes the full control of their infrastructure and running confidential guests,
leaving them with no mechanisms to control the load balancing, enforce
resource usage, etc. So, given that nobody needs/willing to use such HW,
such HW simply doesn’t exist.

So, I would still say that the model we operate in CoCo usecases is somewhat
special, but I do agree that given that we list a couple of these special assumptions
(over which ones we have no control or ability to influence, none of us are business
people), then the rest becomes just careful enumeration of attack surface interfaces
and break up of potential mitigations.

Best Regards,
Elena.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 04:13:11PM +0100, Richard Weinberger wrote:
> On Thu, Jan 26, 2023 at 3:58 PM Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> >
> > * Richard Weinberger (richard.weinberger@gmail.com) wrote:
> > > On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrang? <berrange@redhat.com> wrote:
> > Are you aware of anything that you'd use instead?
>
> Well, I'd think towards iSCSI over TLS to protect the IO transport.

In the context of confidential computing this makes only sense if the
scsi target is part of the trusted base, which means it needs to be
attested and protected against outside attacks. Currently all CoCo
implementations I know of treat disk storage as untrusted.

Besides that the same problems exist with a VMs encrypted memory. The
hardware does not guarantee that the HV can not fiddle with your private
memory, it only guarantees that you can detect such fiddling and that
the private data is encrypted. The HV can also still trace memory access
patterns of confidential guests by setting the right permissions in the
nested page table.

So storage and memory of a CoCo VM have in common that the transport is
not secure, but there are measures to detect if someone fiddles with
your data on the transport or at rest, for memory implemented in
hardware, and for storage in software by using dm-crypt together with
dm-verity or dm-integrity.

Regards,

Joerg
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 02:30:19PM +0200, Leon Romanovsky wrote:
> This is exactly what I said. You presented me the cases which exist in
> your invented world. Mentioned unhandled page fault doesn't exist in real
> world. If PCI device doesn't work, it needs to be replaced/blocked and not
> left to be operable and accessible from the kernel/user.

Believe it or not, this "invented" world is already part of the real
world, and will become even more in the future.

So this has been stated elsewhere in the thread already, but I also like
to stress that hiding misbehavior of devices (real or emulated) is not
the goal of this work.

In fact, the best action for a CoCo guest in case it detects a
(possible) attack is to stop whatever it is doing and crash. And a
misbehaving device in a CoCo guest is a possible attack.

But what needs to be prevented at all costs is undefined behavior in the
CoCo guest that is triggerable by the HV, e.g. by letting an emulated
device misbehave. That undefined behavior can lead to information leak,
which is a way bigger problem for a guest owner than a crashed VM.

Regards,

Joerg
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote:
> > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > And this is a very special aspect of 'hardening' since it is about hardening a
> > kernel
> > > under different threat model/assumptions.
> >
> > I am not sure it's that special in that hardening IMHO is not a specific
> > threat model or a set of assumptions. IIUC it's just something that
> > helps reduce severity of vulnerabilities. Similarly, one can use the CC
> > hardware in a variety of ways I guess. And one way is just that -
> > hardening linux such that ability to corrupt guest memory does not
> > automatically escalate into guest code execution.
>
> I am not sure if I fully follow you on this. I do agree that it is in principle
> the same 'hardening' that we have been doing in Linux for decades just
> applied to a new attack surface, host <-> guest, vs userspace <->kernel.

Sorry about being unclear this is not the type of hardening I meant
really. The "hardening" you meant is preventing kernel vulnerabilities,
right? This is what we've been doing for decades.
But I meant slightly newer things like e.g. KASLR or indeed ASLR generally -
we are trying to reduce a chance a vulnerability causes random
code execution as opposed to a DOS. To think in these terms you do not
need to think about attack surfaces - in the system including
a hypervisor, guest supervisor and guest userspace hiding
one component from others is helpful even if they share
a privelege level.



> Interfaces have changed, but the types of vulnerabilities, etc are the same.
> The attacker model is somewhat different because we have
> different expectations on what host/hypervisor should be able to do
> to the guest (following business reasons and use-cases), versus what we
> expect normal userspace being able to "do" towards kernel. The host and
> hypervisor still has a lot of control over the guest (ability to start/stop it,
> manage its resources, etc). But the reasons behind this doesn’t come
> from the fact that security CoCo HW not being able to support this stricter
> security model (it cannot now indeed, but this is a design decision), but
> from the fact that it is important for Cloud service providers to retain that
> level of control over their infrastructure.

Surely they need ability to control resource usage, not ability to execute DOS
attacks. Current hardware just does not have ability to allow the former
without the later.

> >
> > If you put it this way, you get to participate in a well understood
> > problem space instead of constantly saying "yes but CC is special". And
> > further, you will now talk about features as opposed to fixing bugs.
> > Which will stop annoying people who currently seem annoyed by the
> > implication that their code is buggy simply because it does not cache in
> > memory all data read from hardware. Finally, you then don't really need
> > to explain why e.g. DoS is not a problem but info leak is a problem - when
> > for many users it's actually the reverse - the reason is not that it's
> > not part of a threat model - which then makes you work hard to define
> > the threat model - but simply that CC hardware does not support this
> > kind of hardening.
>
> But this won't be correct statement, because it is not limitation of HW, but the
> threat and business model that Confidential Computing exists in. I am not
> aware of a single cloud provider who would be willing to use the HW that
> takes the full control of their infrastructure and running confidential guests,
> leaving them with no mechanisms to control the load balancing, enforce
> resource usage, etc. So, given that nobody needs/willing to use such HW,
> such HW simply doesn’t exist.
>
> So, I would still say that the model we operate in CoCo usecases is somewhat
> special, but I do agree that given that we list a couple of these special assumptions
> (over which ones we have no control or ability to influence, none of us are business
> people), then the rest becomes just careful enumeration of attack surface interfaces
> and break up of potential mitigations.
>
> Best Regards,
> Elena.
>

I'd say each business has a slightly different business model, no?
Finding common ground is what helps us share code ...

--
MST
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On Wed, Jan 25, 2023 at 3:22 PM Daniel P. Berrangé <berrange@redhat.com>
> wrote:
> > Any virtual device exposed to the guest that can transfer potentially
> > sensitive data needs to have some form of guest controlled encryption
> > applied. For disks this is easy with FDE like LUKS, for NICs this is
> > already best practice for services by using TLS. Other devices may not
> > have good existing options for applying encryption.
>
> I disagree wrt. LUKS. The cryptography behind LUKS protects persistent data
> but not transport. If an attacker can observe all IO you better
> consult a cryptographer.
> LUKS has no concept of session keys or such, so the same disk sector will
> always get encrypted with the very same key/iv.

I guess you are referring to the aes-xts-plain64 mode of LUKS operation or
to LUKS in general? Different modes of operation (including AEAD modes)
can provide different levels of protection, so I would not state it so generally.
But the point you raised is good to discuss through: XTS for example is a confidentiality mode,
based on a concept of tweakable blockcipher, designed as you pointed out
with disk encryption use case in mind. It does have a bunch of limitations/
weaknesses that are known (good classical reference I can suggest on this is [1]),
but as any blockcipher mode its confidentiality guarantees are evaluated in terms
of security against a chosen ciphertext attack (CCA) where an adversary has an access to both
encryption and decryption oracle (he can perform encryptions and decryptions
of plaintexts/cyphertexts of his liking up to the allowed number of queries).
This is a very powerful attack model which to me seems to cover the model
of untrusted host/VMM being able to observe disk reads/writes.

Also, if I remember right, the disk encryption also assumes that the disk operations are fully visible
to the attacker, i.e. he can see all encrypted data on the disk, observe how it changes
when a new block is written, etc. So, where do we have a change in an attacker model here?
What am I missing?

What AES XTS was never designed to do is an integrity protection (only some very limited
malleability): it is not AEAD mode, it doesn’t also provide a replay protection. So, the same
limitations are going to apply in our case also.

Best Regards,
Elena.

[1] Chapter 6. XTS mode, https://web.cs.ucdavis.edu/~rogaway/papers/modes.pdf
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote:
> > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > And this is a very special aspect of 'hardening' since it is about hardening a
> > > kernel
> > > > under different threat model/assumptions.
> > >
> > > I am not sure it's that special in that hardening IMHO is not a specific
> > > threat model or a set of assumptions. IIUC it's just something that
> > > helps reduce severity of vulnerabilities. Similarly, one can use the CC
> > > hardware in a variety of ways I guess. And one way is just that -
> > > hardening linux such that ability to corrupt guest memory does not
> > > automatically escalate into guest code execution.
> >
> > I am not sure if I fully follow you on this. I do agree that it is in principle
> > the same 'hardening' that we have been doing in Linux for decades just
> > applied to a new attack surface, host <-> guest, vs userspace <->kernel.
>
> Sorry about being unclear this is not the type of hardening I meant
> really. The "hardening" you meant is preventing kernel vulnerabilities,
> right? This is what we've been doing for decades.
> But I meant slightly newer things like e.g. KASLR or indeed ASLR generally -
> we are trying to reduce a chance a vulnerability causes random
> code execution as opposed to a DOS. To think in these terms you do not
> need to think about attack surfaces - in the system including
> a hypervisor, guest supervisor and guest userspace hiding
> one component from others is helpful even if they share
> a privelege level.

Do you mean that the fact that CoCo guest has memory encrypted
can help even in non-CoCo scenarios? I am sorry, I still seem not to be able
to grasp your idea fully. When the privilege level is shared, there is no
incentive to perform privilege escalation attacks across components,
so why hide them from each other? Data protection? But I don’t think you
are talking about this? I do agree that KASLR is stronger when you remove
the possibility to read the memory (make sure kernel code is execute only)
you are trying to attack, but again not sure if you mean this.

>
>
>
> > Interfaces have changed, but the types of vulnerabilities, etc are the same.
> > The attacker model is somewhat different because we have
> > different expectations on what host/hypervisor should be able to do
> > to the guest (following business reasons and use-cases), versus what we
> > expect normal userspace being able to "do" towards kernel. The host and
> > hypervisor still has a lot of control over the guest (ability to start/stop it,
> > manage its resources, etc). But the reasons behind this doesn’t come
> > from the fact that security CoCo HW not being able to support this stricter
> > security model (it cannot now indeed, but this is a design decision), but
> > from the fact that it is important for Cloud service providers to retain that
> > level of control over their infrastructure.
>
> Surely they need ability to control resource usage, not ability to execute DOS
> attacks. Current hardware just does not have ability to allow the former
> without the later.

I don’t see why it cannot be added to HW if requirement comes. However, I think
in cloud provider world being able to control resources equals to being able
to deny these resources when required, so being able to denial of service its clients
is kind of build-in expectation that everyone just agrees on.

>
> > >
> > > If you put it this way, you get to participate in a well understood
> > > problem space instead of constantly saying "yes but CC is special". And
> > > further, you will now talk about features as opposed to fixing bugs.
> > > Which will stop annoying people who currently seem annoyed by the
> > > implication that their code is buggy simply because it does not cache in
> > > memory all data read from hardware. Finally, you then don't really need
> > > to explain why e.g. DoS is not a problem but info leak is a problem - when
> > > for many users it's actually the reverse - the reason is not that it's
> > > not part of a threat model - which then makes you work hard to define
> > > the threat model - but simply that CC hardware does not support this
> > > kind of hardening.
> >
> > But this won't be correct statement, because it is not limitation of HW, but the
> > threat and business model that Confidential Computing exists in. I am not
> > aware of a single cloud provider who would be willing to use the HW that
> > takes the full control of their infrastructure and running confidential guests,
> > leaving them with no mechanisms to control the load balancing, enforce
> > resource usage, etc. So, given that nobody needs/willing to use such HW,
> > such HW simply doesn’t exist.
> >
> > So, I would still say that the model we operate in CoCo usecases is somewhat
> > special, but I do agree that given that we list a couple of these special
> assumptions
> > (over which ones we have no control or ability to influence, none of us are
> business
> > people), then the rest becomes just careful enumeration of attack surface
> interfaces
> > and break up of potential mitigations.
> >
> > Best Regards,
> > Elena.
> >
>
> I'd say each business has a slightly different business model, no?
> Finding common ground is what helps us share code ...

Fully agree, and a good discussion with everyone willing to listen and cooperate
can go a long way into defining the best implementation.

Best Regards,
Elena.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Fri, Jan 27, 2023 at 12:25:09PM +0000, Reshetova, Elena wrote:
>
> > On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote:
> > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
> > > > > And this is a very special aspect of 'hardening' since it is about hardening a
> > > > kernel
> > > > > under different threat model/assumptions.
> > > >
> > > > I am not sure it's that special in that hardening IMHO is not a specific
> > > > threat model or a set of assumptions. IIUC it's just something that
> > > > helps reduce severity of vulnerabilities. Similarly, one can use the CC
> > > > hardware in a variety of ways I guess. And one way is just that -
> > > > hardening linux such that ability to corrupt guest memory does not
> > > > automatically escalate into guest code execution.
> > >
> > > I am not sure if I fully follow you on this. I do agree that it is in principle
> > > the same 'hardening' that we have been doing in Linux for decades just
> > > applied to a new attack surface, host <-> guest, vs userspace <->kernel.
> >
> > Sorry about being unclear this is not the type of hardening I meant
> > really. The "hardening" you meant is preventing kernel vulnerabilities,
> > right? This is what we've been doing for decades.
> > But I meant slightly newer things like e.g. KASLR or indeed ASLR generally -
> > we are trying to reduce a chance a vulnerability causes random
> > code execution as opposed to a DOS. To think in these terms you do not
> > need to think about attack surfaces - in the system including
> > a hypervisor, guest supervisor and guest userspace hiding
> > one component from others is helpful even if they share
> > a privelege level.
>
> Do you mean that the fact that CoCo guest has memory encrypted
> can help even in non-CoCo scenarios?

Yes.

> I am sorry, I still seem not to be able
> to grasp your idea fully. When the privilege level is shared, there is no
> incentive to perform privilege escalation attacks across components,
> so why hide them from each other?

Because limiting horisontal movement between components is still valuable.

> Data protection? But I don’t think you
> are talking about this? I do agree that KASLR is stronger when you remove
> the possibility to read the memory (make sure kernel code is execute only)
> you are trying to attack, but again not sure if you mean this.

It's an example. If kernel was 100% secure we won't need KASLR. Nothing
ever is though.

> >
> >
> >
> > > Interfaces have changed, but the types of vulnerabilities, etc are the same.
> > > The attacker model is somewhat different because we have
> > > different expectations on what host/hypervisor should be able to do
> > > to the guest (following business reasons and use-cases), versus what we
> > > expect normal userspace being able to "do" towards kernel. The host and
> > > hypervisor still has a lot of control over the guest (ability to start/stop it,
> > > manage its resources, etc). But the reasons behind this doesn’t come
> > > from the fact that security CoCo HW not being able to support this stricter
> > > security model (it cannot now indeed, but this is a design decision), but
> > > from the fact that it is important for Cloud service providers to retain that
> > > level of control over their infrastructure.
> >
> > Surely they need ability to control resource usage, not ability to execute DOS
> > attacks. Current hardware just does not have ability to allow the former
> > without the later.
>
> I don’t see why it cannot be added to HW if requirement comes. However, I think
> in cloud provider world being able to control resources equals to being able
> to deny these resources when required, so being able to denial of service its clients
> is kind of build-in expectation that everyone just agrees on.
>
> >
> > > >
> > > > If you put it this way, you get to participate in a well understood
> > > > problem space instead of constantly saying "yes but CC is special". And
> > > > further, you will now talk about features as opposed to fixing bugs.
> > > > Which will stop annoying people who currently seem annoyed by the
> > > > implication that their code is buggy simply because it does not cache in
> > > > memory all data read from hardware. Finally, you then don't really need
> > > > to explain why e.g. DoS is not a problem but info leak is a problem - when
> > > > for many users it's actually the reverse - the reason is not that it's
> > > > not part of a threat model - which then makes you work hard to define
> > > > the threat model - but simply that CC hardware does not support this
> > > > kind of hardening.
> > >
> > > But this won't be correct statement, because it is not limitation of HW, but the
> > > threat and business model that Confidential Computing exists in. I am not
> > > aware of a single cloud provider who would be willing to use the HW that
> > > takes the full control of their infrastructure and running confidential guests,
> > > leaving them with no mechanisms to control the load balancing, enforce
> > > resource usage, etc. So, given that nobody needs/willing to use such HW,
> > > such HW simply doesn’t exist.
> > >
> > > So, I would still say that the model we operate in CoCo usecases is somewhat
> > > special, but I do agree that given that we list a couple of these special
> > assumptions
> > > (over which ones we have no control or ability to influence, none of us are
> > business
> > > people), then the rest becomes just careful enumeration of attack surface
> > interfaces
> > > and break up of potential mitigations.
> > >
> > > Best Regards,
> > > Elena.
> > >
> >
> > I'd say each business has a slightly different business model, no?
> > Finding common ground is what helps us share code ...
>
> Fully agree, and a good discussion with everyone willing to listen and cooperate
> can go a long way into defining the best implementation.
>
> Best Regards,
> Elena.

Right. My point was that trying to show how CC usecases are similar to other
existing ones will be more helpful for everyone than just focusing on how they
are different. I hope I was able to show some similarities.

--
MST
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, Jan 26, 2023 at 01:28:15PM +0000, Reshetova, Elena wrote:
> > This is exactly what I said. You presented me the cases which exist in
> > your invented world. Mentioned unhandled page fault doesn't exist in real
> > world. If PCI device doesn't work, it needs to be replaced/blocked and not
> > left to be operable and accessible from the kernel/user.
>
> Can we really assure correct operation of *all* pci devices out there?
> How would such an audit be performed given a huge set of them available?
> Isnt it better instead to make a small fix in the kernel behavior that would guard
> us from such potentially not correctly operating devices?

We assume that hardware works according to the spec; that's why we
have a specification. Otherwise, things would be pretty insane, and
would lead to massive bloat *everywhere*. If there are broken PCI
devices out there, then we can blacklist the PCI device. If a
manufacturer is consistently creating devices which don't obey the
spec, we could block all devices from that manufacturer, and have an
explicit white list for those devices from that manufacturer that
actually work.

If we can't count on a floating point instruction to return the right
value, what are we supposed to do? Create a code which double checks
every single floating point instruction just in case 2 + 2 = 3.99999999? :-)

Ultimately, changing the trust boundary what is considered is a
fundamentally hard thing, and to try to claim that code that assumes
that things inside the trust boundary are, well, trusted, is not a
great way to win friends and influence people.

> Let's forget the trust angle here (it only applies to the Confidential Computing
> threat model and you clearly implying the existing threat model instead) and stick just to
> the not-correctly operating device. What you are proposing is to fix *unknown* bugs
> in multitude of pci devices that (in case of this particular MSI bug) can
> lead to two different values being read from the config space and kernel incorrectly
> handing this situation.

I don't think that's what people are saying. If there are buggy PCI
devices, we can put them on block lists. But checking that every
single read from the config space is unchanged is not something we
should do, period.

> Isn't it better to do the clear fix in one place to ensure such
> situation (two subsequent reads with different values) cannot even happen in theory?
> In security we have a saying that fixing a root cause of the problem is the most efficient
> way to mitigate the problem. The root cause here is a double-read with different values,
> so if it can be substituted with an easy and clear patch that probably even improves
> performance as we do one less pci read and use cached value instead, where is the
> problem in this particular case? If there are technical issues with the patch, of course we
> need to discuss it/fix it, but it seems we are arguing here about whenever or not we want
> to be fixing kernel code when we notice such cases...

Well, if there is a performance win to cache a read from config space,
then make the argument from a performance perspective. But caching
values takes memory, and will potentially bloat data structures. It's
not necessarily cost-free to caching every single config space
variable to prevent double-read from either buggy or malicious devices.

So it's one thing if we make each decision from a cost-benefit
perspective. But then it's a *optimization*, not a *bug-fix*, and it
also means that we aren't obligated to cache every single read from
config space, lest someone wag their fingers at us saying, "Buggy!
Your code is Buggy!".

Cheers,

- Ted
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, 2023-01-26 at 13:28 +0000, Reshetova, Elena wrote:
> > On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena
> > > > wrote:
> > > > > Replying only to the not-so-far addressed points.
> > > > >
> > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena
> > > > > > wrote:
> > > > > > > Hi Greg,
> > > >
> > > > <...>
> > > >
> > > > > > > 3) All the tools are open-source and everyone can start
> > > > > > > using them right away even without any special HW (readme
> > > > > > > has description of what is needed).
> > > > > > > Tools and documentation is here:
> > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > >
> > > > > > Again, as our documentation states, when you submit patches
> > > > > > based on these tools, you HAVE TO document that.  Otherwise
> > > > > > we think you all are crazy and will get your patches
> > > > > > rejected.  You all know this, why ignore it?
> > > > >
> > > > > Sorry, I didn’t know that for every bug that is found in
> > > > > linux kernel when we are submitting a fix that we have to
> > > > > list the way how it has been found. We will fix this in the
> > > > > future submissions, but some bugs we have are found by
> > > > > plain code audit, so 'human' is the tool.
> > > > My problem with that statement is that by applying different
> > > > threat model you "invent" bugs which didn't exist in a first
> > > > place.
> > > >
> > > > For example, in this [1] latest submission, authors labeled
> > > > correct behaviour as "bug".
> > > >
> > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > > alexander.shishkin@linux.intel.com/
> > >
> > > Hm.. Does everyone think that when kernel dies with unhandled
> > > page fault (such as in that case) or detection of a KASAN out of
> > > bounds violation (as it is in some other cases we already have
> > > fixes or investigating) it represents a correct behavior even if
> > > you expect that all your pci HW devices are trusted?
> >
> > This is exactly what I said. You presented me the cases which exist
> > in your invented world. Mentioned unhandled page fault doesn't
> > exist in real world. If PCI device doesn't work, it needs to be
> > replaced/blocked and not left to be operable and accessible from
> > the kernel/user.
>
> Can we really assure correct operation of *all* pci devices out
> there? How would such an audit be performed given a huge set of them
> available? Isnt it better instead to make a small fix in the kernel
> behavior that would guard us from such potentially not correctly
> operating devices?

I think this is really the wrong question from the confidential
computing (CC) point of view. The question shouldn't be about assuring
that the PCI device is operating completely correctly all the time (for
some value of correct). It's if it were programmed to be malicious
what could it do to us? If we take all DoS and Crash outcomes off the
table (annoying but harmless if they don't reveal the confidential
contents), we're left with it trying to extract secrets from the
confidential environment.

The big threat from most devices (including the thunderbolt classes) is
that they can DMA all over memory. However, this isn't really a threat
in CC (well until PCI becomes able to do encrypted DMA) because the
device has specific unencrypted buffers set aside for the expected DMA.
If it writes outside that CC integrity will detect it and if it reads
outside that it gets unintelligible ciphertext. So we're left with the
device trying to trick secrets out of us by returning unexpected data.

If I set this as the problem, verifying device correct operation is a
possible solution (albeit hugely expensive), but there are likely many
other cheaper ways to defeat or detect a device trying to trick us into
revealing something.

James
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On 1/27/23 6:25 AM, Reshetova, Elena wrote:
>
>> On Fri, Jan 27, 2023 at 08:52:22AM +0000, Reshetova, Elena wrote:
>>>> On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena wrote:
>>>>> And this is a very special aspect of 'hardening' since it is about hardening a
>>>> kernel
>>>>> under different threat model/assumptions.
>>>>
>>>> I am not sure it's that special in that hardening IMHO is not a specific
>>>> threat model or a set of assumptions. IIUC it's just something that
>>>> helps reduce severity of vulnerabilities. Similarly, one can use the CC
>>>> hardware in a variety of ways I guess. And one way is just that -
>>>> hardening linux such that ability to corrupt guest memory does not
>>>> automatically escalate into guest code execution.
>>>
>>> I am not sure if I fully follow you on this. I do agree that it is in principle
>>> the same 'hardening' that we have been doing in Linux for decades just
>>> applied to a new attack surface, host <-> guest, vs userspace <->kernel.
>>
>> Sorry about being unclear this is not the type of hardening I meant
>> really. The "hardening" you meant is preventing kernel vulnerabilities,
>> right? This is what we've been doing for decades.
>> But I meant slightly newer things like e.g. KASLR or indeed ASLR generally -
>> we are trying to reduce a chance a vulnerability causes random
>> code execution as opposed to a DOS. To think in these terms you do not
>> need to think about attack surfaces - in the system including
>> a hypervisor, guest supervisor and guest userspace hiding
>> one component from others is helpful even if they share
>> a privelege level.
>
> Do you mean that the fact that CoCo guest has memory encrypted
> can help even in non-CoCo scenarios? I am sorry, I still seem not to be able
> to grasp your idea fully. When the privilege level is shared, there is no
> incentive to perform privilege escalation attacks across components,
> so why hide them from each other? Data protection? But I don’t think you
> are talking about this? I do agree that KASLR is stronger when you remove
> the possibility to read the memory (make sure kernel code is execute only)
> you are trying to attack, but again not sure if you mean this.
>
>>
>>
>>
>>> Interfaces have changed, but the types of vulnerabilities, etc are the same.
>>> The attacker model is somewhat different because we have
>>> different expectations on what host/hypervisor should be able to do
>>> to the guest (following business reasons and use-cases), versus what we
>>> expect normal userspace being able to "do" towards kernel. The host and
>>> hypervisor still has a lot of control over the guest (ability to start/stop it,
>>> manage its resources, etc). But the reasons behind this doesn’t come
>>> from the fact that security CoCo HW not being able to support this stricter
>>> security model (it cannot now indeed, but this is a design decision), but
>>> from the fact that it is important for Cloud service providers to retain that
>>> level of control over their infrastructure.
>>
>> Surely they need ability to control resource usage, not ability to execute DOS
>> attacks. Current hardware just does not have ability to allow the former
>> without the later.
>
> I don’t see why it cannot be added to HW if requirement comes. However, I think
> in cloud provider world being able to control resources equals to being able
> to deny these resources when required, so being able to denial of service its clients
> is kind of build-in expectation that everyone just agrees on.
>

Just a thought, but I wouldn't discard availability guarantees like that
at some point. As a client I would certainly like it, and if it's good
for business...

>>
>>>>
>>>> If you put it this way, you get to participate in a well understood
>>>> problem space instead of constantly saying "yes but CC is special". And
>>>> further, you will now talk about features as opposed to fixing bugs.
>>>> Which will stop annoying people who currently seem annoyed by the
>>>> implication that their code is buggy simply because it does not cache in
>>>> memory all data read from hardware. Finally, you then don't really need
>>>> to explain why e.g. DoS is not a problem but info leak is a problem - when
>>>> for many users it's actually the reverse - the reason is not that it's
>>>> not part of a threat model - which then makes you work hard to define
>>>> the threat model - but simply that CC hardware does not support this
>>>> kind of hardening.
>>>
>>> But this won't be correct statement, because it is not limitation of HW, but the
>>> threat and business model that Confidential Computing exists in. I am not
>>> aware of a single cloud provider who would be willing to use the HW that
>>> takes the full control of their infrastructure and running confidential guests,
>>> leaving them with no mechanisms to control the load balancing, enforce
>>> resource usage, etc. So, given that nobody needs/willing to use such HW,
>>> such HW simply doesn’t exist.
>>>
>>> So, I would still say that the model we operate in CoCo usecases is somewhat
>>> special, but I do agree that given that we list a couple of these special
>> assumptions
>>> (over which ones we have no control or ability to influence, none of us are
>> business
>>> people), then the rest becomes just careful enumeration of attack surface
>> interfaces
>>> and break up of potential mitigations.
>>>
>>> Best Regards,
>>> Elena.
>>>
>>
>> I'd say each business has a slightly different business model, no?
>> Finding common ground is what helps us share code ...
>
> Fully agree, and a good discussion with everyone willing to listen and cooperate
> can go a long way into defining the best implementation.
>
> Best Regards,
> Elena.

Thanks for sharing the threat model with the list!

Carlos
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Thu, 2023-01-26 at 13:28 +0000, Reshetova, Elena wrote:
> > > On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena
> > > > > wrote:
> > > > > > Replying only to the not-so-far addressed points.
> > > > > >
> > > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena
> > > > > > > wrote:
> > > > > > > > Hi Greg,
> > > > >
> > > > > <...>
> > > > >
> > > > > > > > 3) All the tools are open-source and everyone can start
> > > > > > > > using them right away even without any special HW (readme
> > > > > > > > has description of what is needed).
> > > > > > > > Tools and documentation is here:
> > > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > > >
> > > > > > > Again, as our documentation states, when you submit patches
> > > > > > > based on these tools, you HAVE TO document that.  Otherwise
> > > > > > > we think you all are crazy and will get your patches
> > > > > > > rejected.  You all know this, why ignore it?
> > > > > >
> > > > > > Sorry, I didn’t know that for every bug that is found in
> > > > > > linux kernel when we are submitting a fix that we have to
> > > > > > list the way how it has been found. We will fix this in the
> > > > > > future submissions, but some bugs we have are found by
> > > > > > plain code audit, so 'human' is the tool.
> > > > > My problem with that statement is that by applying different
> > > > > threat model you "invent" bugs which didn't exist in a first
> > > > > place.
> > > > >
> > > > > For example, in this [1] latest submission, authors labeled
> > > > > correct behaviour as "bug".
> > > > >
> > > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > > > alexander.shishkin@linux.intel.com/
> > > >
> > > > Hm.. Does everyone think that when kernel dies with unhandled
> > > > page fault (such as in that case) or detection of a KASAN out of
> > > > bounds violation (as it is in some other cases we already have
> > > > fixes or investigating) it represents a correct behavior even if
> > > > you expect that all your pci HW devices are trusted?
> > >
> > > This is exactly what I said. You presented me the cases which exist
> > > in your invented world. Mentioned unhandled page fault doesn't
> > > exist in real world. If PCI device doesn't work, it needs to be
> > > replaced/blocked and not left to be operable and accessible from
> > > the kernel/user.
> >
> > Can we really assure correct operation of *all* pci devices out
> > there? How would such an audit be performed given a huge set of them
> > available? Isnt it better instead to make a small fix in the kernel
> > behavior that would guard us from such potentially not correctly
> > operating devices?
>
> I think this is really the wrong question from the confidential
> computing (CC) point of view. The question shouldn't be about assuring
> that the PCI device is operating completely correctly all the time (for
> some value of correct). It's if it were programmed to be malicious
> what could it do to us?

Sure, but Leon didn’t agree with CC threat model to begin with, so
I was trying to argue here how this fix can be useful for non-CC threat
model case. But obviously my argument for non-CC case wasn't good (
especially reading Ted's reply here
https://lore.kernel.org/all/Y9Lonw9HzlosUPnS@mit.edu/ ), so I better
stick to CC threat model case indeed.

>If we take all DoS and Crash outcomes off the
> table (annoying but harmless if they don't reveal the confidential
> contents), we're left with it trying to extract secrets from the
> confidential environment.

Yes, this is the ultimate end goal.

>
> The big threat from most devices (including the thunderbolt classes) is
> that they can DMA all over memory. However, this isn't really a threat
> in CC (well until PCI becomes able to do encrypted DMA) because the
> device has specific unencrypted buffers set aside for the expected DMA.
> If it writes outside that CC integrity will detect it and if it reads
> outside that it gets unintelligible ciphertext. So we're left with the
> device trying to trick secrets out of us by returning unexpected data.

Yes, by supplying the input that hasn’t been expected. This is exactly
the case we were trying to fix here for example:
https://lore.kernel.org/all/20230119170633.40944-2-alexander.shishkin@linux.intel.com/
I do agree that this case is less severe when others where memory
corruption/buffer overrun can happen, like here:
https://lore.kernel.org/all/20230119135721.83345-6-alexander.shishkin@linux.intel.com/
But we are trying to fix all issues we see now (prioritizing the second ones
though).

>
> If I set this as the problem, verifying device correct operation is a
> possible solution (albeit hugely expensive) but there are likely many
> other cheaper ways to defeat or detect a device trying to trick us into
> revealing something.

What do you have in mind here for the actual devices we need to enable for CC cases?
We have been using here a combination of extensive fuzzing and static code analysis.

Best Regards,
Elena.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On 2023-01-25 at 14:13 UTC, Daniel P. Berrangé <berrange@redhat.com> wrote...
> On Wed, Jan 25, 2023 at 01:42:53PM +0000, Dr. David Alan Gilbert wrote:
>> * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
>> > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena wrote:
>> > > Hi Greg,
>> > >
>> > > You mentioned couple of times (last time in this recent thread:
>> > > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
>> > > discussing the updated threat model for kernel, so this email is a start in this direction.
>> >
>> > Any specific reason you didn't cc: the linux-hardening mailing list?
>> > This seems to be in their area as well, right?
>> >
>> > > As we have shared before in various lkml threads/conference presentations
>> > > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
>> > > change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
>> >
>> > That is, frankly, a very funny threat model. How realistic is it really
>> > given all of the other ways that a hypervisor can mess with a guest?
>>
>> It's what a lot of people would like; in the early attempts it was easy
>> to defeat, but in TDX and SEV-SNP the hypervisor has a lot less that it
>> can mess with - remember that not just the memory is encrypted, so is
>> the register state, and the guest gets to see changes to mapping and a
>> lot of control over interrupt injection etc.
>>
>> > So what do you actually trust here? The CPU? A device? Nothing?
>>
>> We trust the actual physical CPU, provided that it can prove that it's a
>> real CPU with the CoCo hardware enabled. Both the SNP and TDX hardware
>> can perform an attestation signed by the CPU to prove to someone
>> external that the guest is running on a real trusted CPU.
>>
>> Note that the trust is limited:
>> a) We don't trust that we can make forward progress - if something
>> does something bad it's OK for the guest to stop.
>> b) We don't trust devices, and we don't trust them by having the guest
>> do normal encryption; e.g. just LUKS on the disk and normal encrypted
>> networking. [There's a lot of schemes people are working on about how
>> the guest gets the keys etc for that)
>
> I think we need to more precisely say what we mean by 'trust' as it
> can have quite a broad interpretation.
>
> As a baseline requirement, in the context of confidential computing the
> guest would not trust the hypervisor with data that needs to remain
> confidential, but would generally still expect it to provide a faithful
> implementation of a given device.

... or to have a reliable faulting behaviour (e.g. panic) if the device is
found to be malicious, e.g. attempting to inject bogus data in the driver to
trigger unexpected paths in the guest kernel.

I think that part of the original discussion is really about being able to
do that at least for the small subset of (mostly virtio) devices that would
typically be of use in a CoCo setup.

As was pointed out elsewhere in that thread, doing so for physical devices,
to the point of enabling end-to-end attestation and encryption, is work that
is presently underway, but there is work to do already with the
comparatively small subset of devices we need in the short-term. Also, that
work needs only the Linux kernel community, whereas changes for example at
the PCI level are much broader, and therefore require a lot more time.

--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
Hi Elena,

On 2023-01-25 at 12:28 UTC, "Reshetova, Elena" <elena.reshetova@intel.com> wrote...
> Hi Greg,
>
> You mentioned couple of times (last time in this recent thread:
> https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> discussing the updated threat model for kernel, so this email is a start in this direction.
>
> (Note: I tried to include relevant people from different companies, as well as linux-coco
> mailing list, but I hope everyone can help by including additional people as needed).
>
> As we have shared before in various lkml threads/conference presentations
> ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
> This is a big change in the threat model and requires both careful assessment of the
> new (hypervisor <-> guest kernel) attack surface, as well as careful design of mitigations
> and security validation techniques. This is the activity that we have started back at Intel
> and the current status can be found in
>
> 1) Threat model and potential mitigations:
> https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html

I only looked at this one so far. Here are a few quick notes:

DoS attacks are out of scope. What about timing attacks, which were the
basis of some of the most successful attacks in the past years? My
understanding is that TDX relies on existing mitigations, and does not
introduce anythign new in that space. Worth mentioning in that "out of
scope" section IMO.

Why are TDVMCALL hypercalls listed as an "existing" communication interface?
That seems to exclude the TDX module from the TCB. Also, "shared memory for
I/Os" seems unnecessarily restrictive, since it excludes interrupts, timing
attacks, network or storage attacks, or devices passed through to the guest.
The latter category seems important to list, since there are separate
efforts to provide confidential computing capabilities e.g. to PCI devices,
which were discussed elsewhere in this thread.

I suspect that my question above is due to ambiguous wording. What I
initially read as "this is out of scope for TDX" morphs in the next
paragraph into "we are going to explain how to mitigate attacks through
TDVMCALLS and shared memory for I/O". Consider rewording to clarify the
intent of these paragraphs.

Nit: I suggest adding bullets to the items below "between host/VMM and the
guest"

You could count the "unique code locations" that can consume malicious input
in drivers, why not in core kernel? I think you write elsewhere that the
drivers account for the vast majority, so I suspect you have the numbers.

"The implementation of the #VE handler is simple and does not require an
in-depth security audit or fuzzing since it is not the actual consumer of
the host/VMM supplied untrusted data": The assumption there seems to be that
the host will never be able to supply data (e.g. through a bounce buffer)
that it can trick the guest into executing. If that is indeed the
assumption, it is worth mentioning explicitly. I suspect it is a bit weak,
since many earlier attacks were based on executing the wrong code. Notably,
it is worth pointing out that I/O buffers are _not_ encrypted with the CPU
key (as opposed to any device key e.g. for PCI encryption) in either
TDX or SEV. Is there for example anything that precludes TDX or SEV from
executing code in the bounce buffers?

"We only care about users that read from MMIO": Why? My guess is that this
is the only way bad data could be fed to the guest. But what if a bad MMIO
write due to poisoned data injected earlier was a necessary step to open the
door to a successful attack?


>
> 2) One of the described in the above doc mitigations is "hardening of the enabled
> code". What we mean by this, as well as techniques that are being used are
> described in this document:
> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html
>
> 3) All the tools are open-source and everyone can start using them right away even
> without any special HW (readme has description of what is needed).
> Tools and documentation is here:
> https://github.com/intel/ccc-linux-guest-hardening
>
> 4) all not yet upstreamed linux patches (that we are slowly submitting) can be found
> here: https://github.com/intel/tdx/commits/guest-next
>
> So, my main question before we start to argue about the threat model, mitigations, etc,
> is what is the good way to get this reviewed to make sure everyone is aligned?
> There are a lot of angles and details, so what is the most efficient method?
> Should I split the threat model from https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
> into logical pieces and start submitting it to mailing list for discussion one by one?
> Any other methods?
>
> The original plan we had in mind is to start discussing the relevant pieces when submitting the code,
> i.e. when submitting the device filter patches, we will include problem statement, threat model link,
> data, alternatives considered, etc.
>
> Best Regards,
> Elena.
>
> [1] https://lore.kernel.org/all/20210804174322.2898409-1-sathyanarayanan.kuppuswamy@linux.intel.com/
> [2] https://lpc.events/event/16/contributions/1328/
> [3] https://events.linuxfoundation.org/archive/2022/linux-security-summit-north-america/program/schedule/


--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Mon, Jan 30, 2023 at 12:36:34PM +0100, Christophe de Dinechin wrote:
> Is there for example anything that precludes TDX or SEV from executing
> code in the bounce buffers?

In TDX, attempt to fetch instructions from shared memory (i.e. bounce
buffer) will cause #GP, only data fetch is allowed. Page table also cannot
be placed there and will cause the same #GP.

--
Kiryl Shutsemau / Kirill A. Shutemov
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
[...]
> > The big threat from most devices (including the thunderbolt
> > classes) is that they can DMA all over memory.  However, this isn't
> > really a threat in CC (well until PCI becomes able to do encrypted
> > DMA) because the device has specific unencrypted buffers set aside
> > for the expected DMA. If it writes outside that CC integrity will
> > detect it and if it reads outside that it gets unintelligible
> > ciphertext.  So we're left with the device trying to trick secrets
> > out of us by returning unexpected data.
>
> Yes, by supplying the input that hasn’t been expected. This is
> exactly the case we were trying to fix here for example:
> https://lore.kernel.org/all/20230119170633.40944-2-alexander.shishkin@linux.intel.com/
> I do agree that this case is less severe when others where memory
> corruption/buffer overrun can happen, like here:
> https://lore.kernel.org/all/20230119135721.83345-6-alexander.shishkin@linux.intel.com/
> But we are trying to fix all issues we see now (prioritizing the
> second ones though).

I don't see how MSI table sizing is a bug in the category we've
defined. The very text of the changelog says "resulting in a kernel
page fault in pci_write_msg_msix()." which is a crash, which I thought
we were agreeing was out of scope for CC attacks?

> >
> > If I set this as the problem, verifying device correct operation is
> > a possible solution (albeit hugely expensive) but there are likely
> > many other cheaper ways to defeat or detect a device trying to
> > trick us into revealing something.
>
> What do you have in mind here for the actual devices we need to
> enable for CC cases?

Well, the most dangerous devices seem to be the virtio set a CC system
will rely on to boot up. After that, there are other ways (like SPDM)
to verify a real PCI device is on the other end of the transaction.

> We have been using here a combination of extensive fuzzing and static
> code analysis.

by fuzzing, I assume you mean fuzzing from the PCI configuration space?
Firstly I'm not so sure how useful a tool fuzzing is if we take Oopses
off the table because fuzzing primarily triggers those so its hard to
see what else it could detect given the signal will be smothered by
oopses and secondly I think the PCI interface is likely the wrong place
to begin and you should probably begin on the virtio bus and the
hypervisor generated configuration space.

James
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Mon, Jan 30, 2023 at 03:00:52PM +0300, Kirill A. Shutemov wrote:
> On Mon, Jan 30, 2023 at 12:36:34PM +0100, Christophe de Dinechin wrote:
> > Is there for example anything that precludes TDX or SEV from executing
> > code in the bounce buffers?
>
> In TDX, attempt to fetch instructions from shared memory (i.e. bounce
> buffer) will cause #GP, only data fetch is allowed. Page table also cannot
> be placed there and will cause the same #GP.

Same with SEV IIRC.

--
MST
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
Hi Dinechin,

Thank you very much for your review! Please find the replies inline.

>
> Hi Elena,
>
> On 2023-01-25 at 12:28 UTC, "Reshetova, Elena" <elena.reshetova@intel.com>
> wrote...
> > Hi Greg,
> >
> > You mentioned couple of times (last time in this recent thread:
> > https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to
> start
> > discussing the updated threat model for kernel, so this email is a start in this
> direction.
> >
> > (Note: I tried to include relevant people from different companies, as well as
> linux-coco
> > mailing list, but I hope everyone can help by including additional people as
> needed).
> >
> > As we have shared before in various lkml threads/conference presentations
> > ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we
> have a
> > change in the threat model where guest kernel doesn’t anymore trust the
> hypervisor.
> > This is a big change in the threat model and requires both careful assessment of
> the
> > new (hypervisor <-> guest kernel) attack surface, as well as careful design of
> mitigations
> > and security validation techniques. This is the activity that we have started back
> at Intel
> > and the current status can be found in
> >
> > 1) Threat model and potential mitigations:
> > https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
>
> I only looked at this one so far. Here are a few quick notes:
>
> DoS attacks are out of scope. What about timing attacks, which were the
> basis of some of the most successful attacks in the past years? My
> understanding is that TDX relies on existing mitigations, and does not
> introduce anythign new in that space. Worth mentioning in that "out of
> scope" section IMO.

It is not out of the scope because TD guest SW has to think about these
matters and protect adequately. We have a section lower on " Transient Execution attacks
mitigation" https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#transient-execution-attacks-and-their-mitigation
but I agree it is worth pointing to this (and generic side-channel attacks) already
in the scoping. I will make an update.

>
> Why are TDVMCALL hypercalls listed as an "existing" communication interface?
> That seems to exclude the TDX module from the TCB.

I believe this is just ambiguous wording, I need to find a better one.
TDVMCALL is indeed a *new* TDX specific communication interface, but it is
only a transport in this case for the actual *existing* legacy communication interfaces
between the VM guest and host/hypervisor (read/write MSRs, pci config space
access, port IO and MMIO, etc).

Also, "shared memory for
> I/Os" seems unnecessarily restrictive, since it excludes interrupts, timing
> attacks, network or storage attacks, or devices passed through to the guest.
> The latter category seems important to list, since there are separate
> efforts to provide confidential computing capabilities e.g. to PCI devices,
> which were discussed elsewhere in this thread.

The second bullet meant to say that we also have another interface how CoCo guest
and host/VMM can communicate and it is done via shared pages (vs private pages that
are only accessible to confidential computing guest). Maybe I should drop the "IO" part of
this and it would avoid confusion. The other means (some are higher-level abstractions
like disk operations that happen over bounce buffer in shared memory), like interrupts, disk, etc,
we do cover below in separate sections of the doc with exception of covering
CoCo-enabled devices. This is smth we can briefly mention as an addition, but since
we don’t have these devices yet, and neither we have linux implementation that
can securely add them to the CoCo guest, I find it preliminary to discuss details at this point.


> I suspect that my question above is due to ambiguous wording. What I
> initially read as "this is out of scope for TDX" morphs in the next
> paragraph into "we are going to explain how to mitigate attacks through
> TDVMCALLS and shared memory for I/O". Consider rewording to clarify the
> intent of these paragraphs.
>

Sure, sorry for ambiguous wording, will try to clarify.

> Nit: I suggest adding bullets to the items below "between host/VMM and the
> guest"

Yes, it used to have it actually, have to see what happened with recent docs update.

>
> You could count the "unique code locations" that can consume malicious input
> in drivers, why not in core kernel? I think you write elsewhere that the
> drivers account for the vast majority, so I suspect you have the numbers.

I don’t have the ready numbers for core kernel, but if really needed, I can calculate them.
Here https://github.com/intel/ccc-linux-guest-hardening/tree/master/bkc/audit/sample_output/6.0-rc2
you can find the public files that would produce this data:

https://github.com/intel/ccc-linux-guest-hardening/blob/master/bkc/audit/sample_output/6.0-rc2/smatch_warns_6.0_tdx_allyesconfig
is all hits (with taint propagation) for the whole allyesconfig (x86 build, CONFIG_COMPILE_TEST is off).
https://github.com/intel/ccc-linux-guest-hardening/blob/master/bkc/audit/sample_output/6.0-rc2/smatch_warns_6.0_tdx_allyesconfig_filtered
is the same but with most of the drivers dropped.


>
> "The implementation of the #VE handler is simple and does not require an
> in-depth security audit or fuzzing since it is not the actual consumer of
> the host/VMM supplied untrusted data": The assumption there seems to be that
> the host will never be able to supply data (e.g. through a bounce buffer)
> that it can trick the guest into executing. If that is indeed the
> assumption, it is worth mentioning explicitly. I suspect it is a bit weak,
> since many earlier attacks were based on executing the wrong code. Notably,
> it is worth pointing out that I/O buffers are _not_ encrypted with the CPU
> key (as opposed to any device key e.g. for PCI encryption) in either
> TDX or SEV. Is there for example anything that precludes TDX or SEV from
> executing code in the bounce buffers?

This was already replied by Kirill, any code execution out of shared memory generates
a #GP.

>
> "We only care about users that read from MMIO": Why? My guess is that this
> is the only way bad data could be fed to the guest. But what if a bad MMIO
> write due to poisoned data injected earlier was a necessary step to open the
> door to a successful attack?

The entry point of the attack is still a "read". The situation you describe can happen,
but the root cause would be still an incorrectly handled MMIO read and this is what
we try to check with both fuzzing and auditing the 'read' entry points.

Thank you again for the review!

Best Regards,
Elena.
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> [...]
> > > The big threat from most devices (including the thunderbolt
> > > classes) is that they can DMA all over memory.  However, this isn't
> > > really a threat in CC (well until PCI becomes able to do encrypted
> > > DMA) because the device has specific unencrypted buffers set aside
> > > for the expected DMA. If it writes outside that CC integrity will
> > > detect it and if it reads outside that it gets unintelligible
> > > ciphertext.  So we're left with the device trying to trick secrets
> > > out of us by returning unexpected data.
> >
> > Yes, by supplying the input that hasn’t been expected. This is
> > exactly the case we were trying to fix here for example:
> > https://lore.kernel.org/all/20230119170633.40944-2-
> alexander.shishkin@linux.intel.com/
> > I do agree that this case is less severe when others where memory
> > corruption/buffer overrun can happen, like here:
> > https://lore.kernel.org/all/20230119135721.83345-6-
> alexander.shishkin@linux.intel.com/
> > But we are trying to fix all issues we see now (prioritizing the
> > second ones though).
>
> I don't see how MSI table sizing is a bug in the category we've
> defined. The very text of the changelog says "resulting in a kernel
> page fault in pci_write_msg_msix()." which is a crash, which I thought
> we were agreeing was out of scope for CC attacks?

As I said this is an example of a crash and on the first look
might not lead to the exploitable condition (albeit attackers are creative).
But we noticed this one while fuzzing and it was common enough
that prevented fuzzer going deeper into the virtio devices driver fuzzing.
The core PCI/MSI doesn’t seem to have that many easily triggerable
Other examples in virtio patchset are more severe.

>
> > >
> > > If I set this as the problem, verifying device correct operation is
> > > a possible solution (albeit hugely expensive) but there are likely
> > > many other cheaper ways to defeat or detect a device trying to
> > > trick us into revealing something.
> >
> > What do you have in mind here for the actual devices we need to
> > enable for CC cases?
>
> Well, the most dangerous devices seem to be the virtio set a CC system
> will rely on to boot up. After that, there are other ways (like SPDM)
> to verify a real PCI device is on the other end of the transaction.

Yes, it the future, but not yet. Other vendors will not necessary be
using virtio devices at this point, so we will have non-virtio and not
CC enabled devices that we want to securely add to the guest.

>
> > We have been using here a combination of extensive fuzzing and static
> > code analysis.
>
> by fuzzing, I assume you mean fuzzing from the PCI configuration space?
> Firstly I'm not so sure how useful a tool fuzzing is if we take Oopses
> off the table because fuzzing primarily triggers those

If you enable memory sanitizers you can detect more server conditions like
out of bounds accesses and such. I think given that we have a way to
verify that fuzzing is reaching the code locations we want it to reach, it
can be pretty effective method to find at least low-hanging bugs. And these
will be the bugs that most of the attackers will go after at the first place.
But of course it is not a formal verification of any kind.

so its hard to
> see what else it could detect given the signal will be smothered by
> oopses and secondly I think the PCI interface is likely the wrong place
> to begin and you should probably begin on the virtio bus and the
> hypervisor generated configuration space.

This is exactly what we do. We don’t fuzz from the PCI config space,
we supply inputs from the host/vmm via the legitimate interfaces that it can
inject them to the guest: whenever guest requests a pci config space
(which is controlled by host/hypervisor as you said) read operation,
it gets input injected by the kafl fuzzer. Same for other interfaces that
are under control of host/VMM (MSRs, port IO, MMIO, anything that goes
via #VE handler in our case). When it comes to virtio, we employ
two different fuzzing techniques: directly injecting kafl fuzz input when
virtio core or virtio drivers gets the data received from the host
(via injecting input in functions virtio16/32/64_to_cpu and others) and
directly fuzzing DMA memory pages using kfx fuzzer.
More information can be found in https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing

Best Regards,
Elena.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote:
> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> > [...]
> > > > The big threat from most devices (including the thunderbolt
> > > > classes) is that they can DMA all over memory.  However, this
> > > > isn't really a threat in CC (well until PCI becomes able to do
> > > > encrypted DMA) because the device has specific unencrypted
> > > > buffers set aside for the expected DMA. If it writes outside
> > > > that CC integrity will detect it and if it reads outside that
> > > > it gets unintelligible ciphertext.  So we're left with the
> > > > device trying to trick secrets out of us by returning
> > > > unexpected data.
> > >
> > > Yes, by supplying the input that hasn’t been expected. This is
> > > exactly the case we were trying to fix here for example:
> > > https://lore.kernel.org/all/20230119170633.40944-2-
> > alexander.shishkin@linux.intel.com/
> > > I do agree that this case is less severe when others where memory
> > > corruption/buffer overrun can happen, like here:
> > > https://lore.kernel.org/all/20230119135721.83345-6-
> > alexander.shishkin@linux.intel.com/
> > > But we are trying to fix all issues we see now (prioritizing the
> > > second ones though).
> >
> > I don't see how MSI table sizing is a bug in the category we've
> > defined.  The very text of the changelog says "resulting in a
> > kernel page fault in pci_write_msg_msix()."  which is a crash,
> > which I thought we were agreeing was out of scope for CC attacks?
>
> As I said this is an example of a crash and on the first look
> might not lead to the exploitable condition (albeit attackers are
> creative). But we noticed this one while fuzzing and it was common
> enough that prevented fuzzer going deeper into the virtio devices
> driver fuzzing. The core PCI/MSI doesn’t seem to have that many
> easily triggerable Other examples in virtio patchset are more severe.

You cited this as your example. I'm pointing out it seems to be an
event of the class we've agreed not to consider because it's an oops
not an exploit. If there are examples of fixing actual exploits to CC
VMs, what are they?

This patch is, however, an example of the problem everyone else on the
thread is complaining about: a patch which adds an unnecessary check to
the MSI subsystem; unnecessary because it doesn't fix a CC exploit and
in the real world the tables are correct (or the manufacturer is
quickly chastened), so it adds overhead to no benefit.


[...]
> > see what else it could detect given the signal will be smothered by
> > oopses and secondly I think the PCI interface is likely the wrong
> > place to begin and you should probably begin on the virtio bus and
> > the hypervisor generated configuration space.
>
> This is exactly what we do. We don’t fuzz from the PCI config space,
> we supply inputs from the host/vmm via the legitimate interfaces that
> it can inject them to the guest: whenever guest requests a pci config
> space (which is controlled by host/hypervisor as you said) read
> operation, it gets input injected by the kafl fuzzer.  Same for other
> interfaces that are under control of host/VMM (MSRs, port IO, MMIO,
> anything that goes via #VE handler in our case). When it comes to
> virtio, we employ two different fuzzing techniques: directly
> injecting kafl fuzz input when virtio core or virtio drivers gets the
> data received from the host (via injecting input in functions
> virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
> pages using kfx fuzzer. More information can be found in
> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing

Given that we previously agreed that oppses and other DoS attacks are
out of scope for CC, I really don't think fuzzing, which primarily
finds oopses, is at all a useful tool unless you filter the results by
the question "could we exploit this in a CC VM to reveal secrets".
Without applying that filter you're sending a load of patches which
don't really do much to reduce the CC attack surface and which do annoy
non-CC people because they add pointless checks to things they expect
the cards and config tables to get right.

James
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote:
> > > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> > > [...]
> > > > > The big threat from most devices (including the thunderbolt
> > > > > classes) is that they can DMA all over memory.  However, this
> > > > > isn't really a threat in CC (well until PCI becomes able to do
> > > > > encrypted DMA) because the device has specific unencrypted
> > > > > buffers set aside for the expected DMA. If it writes outside
> > > > > that CC integrity will detect it and if it reads outside that
> > > > > it gets unintelligible ciphertext.  So we're left with the
> > > > > device trying to trick secrets out of us by returning
> > > > > unexpected data.
> > > >
> > > > Yes, by supplying the input that hasn’t been expected. This is
> > > > exactly the case we were trying to fix here for example:
> > > > https://lore.kernel.org/all/20230119170633.40944-2-
> > > alexander.shishkin@linux.intel.com/
> > > > I do agree that this case is less severe when others where memory
> > > > corruption/buffer overrun can happen, like here:
> > > > https://lore.kernel.org/all/20230119135721.83345-6-
> > > alexander.shishkin@linux.intel.com/
> > > > But we are trying to fix all issues we see now (prioritizing the
> > > > second ones though).
> > >
> > > I don't see how MSI table sizing is a bug in the category we've
> > > defined.  The very text of the changelog says "resulting in a
> > > kernel page fault in pci_write_msg_msix()."  which is a crash,
> > > which I thought we were agreeing was out of scope for CC attacks?
> >
> > As I said this is an example of a crash and on the first look
> > might not lead to the exploitable condition (albeit attackers are
> > creative). But we noticed this one while fuzzing and it was common
> > enough that prevented fuzzer going deeper into the virtio devices
> > driver fuzzing. The core PCI/MSI doesn’t seem to have that many
> > easily triggerable Other examples in virtio patchset are more severe.
>
> You cited this as your example. I'm pointing out it seems to be an
> event of the class we've agreed not to consider because it's an oops
> not an exploit. If there are examples of fixing actual exploits to CC
> VMs, what are they?
>
> This patch is, however, an example of the problem everyone else on the
> thread is complaining about: a patch which adds an unnecessary check to
> the MSI subsystem; unnecessary because it doesn't fix a CC exploit and
> in the real world the tables are correct (or the manufacturer is
> quickly chastened), so it adds overhead to no benefit.

How can you make sure there is no exploit possible using this crash
as a stepping stone into a CC guest? Or are you saying that we are back
to the times when we can merge the fixes for crashes and out of bound errors in
kernel only given that we submit a proof of concept exploit with the
patch for every issue?

>
>
> [...]
> > > see what else it could detect given the signal will be smothered by
> > > oopses and secondly I think the PCI interface is likely the wrong
> > > place to begin and you should probably begin on the virtio bus and
> > > the hypervisor generated configuration space.
> >
> > This is exactly what we do. We don’t fuzz from the PCI config space,
> > we supply inputs from the host/vmm via the legitimate interfaces that
> > it can inject them to the guest: whenever guest requests a pci config
> > space (which is controlled by host/hypervisor as you said) read
> > operation, it gets input injected by the kafl fuzzer.  Same for other
> > interfaces that are under control of host/VMM (MSRs, port IO, MMIO,
> > anything that goes via #VE handler in our case). When it comes to
> > virtio, we employ two different fuzzing techniques: directly
> > injecting kafl fuzz input when virtio core or virtio drivers gets the
> > data received from the host (via injecting input in functions
> > virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
> > pages using kfx fuzzer. More information can be found in
> > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-
> hardening.html#td-guest-fuzzing
>
> Given that we previously agreed that oppses and other DoS attacks are
> out of scope for CC, I really don't think fuzzing, which primarily
> finds oopses, is at all a useful tool unless you filter the results by
> the question "could we exploit this in a CC VM to reveal secrets".
> Without applying that filter you're sending a load of patches which
> don't really do much to reduce the CC attack surface and which do annoy
> non-CC people because they add pointless checks to things they expect
> the cards and config tables to get right.

I don’t think we have agreed that random kernel crashes are out of scope in CC threat model
(controlled safe panic is out of scope, but this is not what we have here).
It all depends if this ops can be used in a successful attack against guest private
memory or not and this is *not* a trivial thing to decide.
That's said, we are mostly focusing on KASAN findings, which
have higher likelihood to be exploitable at least for host -> guest privilege escalation
(which in turn compromised guest private memory confidentiality). Fuzzing has a
long history of find such issues in past (including the ones that have been
exploited after). But even for this ops bug, can anyone guarantee it cannot be chained
with other ones to cause a more complex privilege escalation attack? I wont be making
such a claim, I feel it is safer to fix this vs debating whenever it can be used for an
attack or not.

Best Regards,
Elena.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On 2023-01-31 at 08:28 -05, James Bottomley <jejb@linux.ibm.com> wrote...
> On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote:
>> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
>> > [...]
>> > > > The big threat from most devices (including the thunderbolt
>> > > > classes) is that they can DMA all over memory.  However, this
>> > > > isn't really a threat in CC (well until PCI becomes able to do
>> > > > encrypted DMA) because the device has specific unencrypted
>> > > > buffers set aside for the expected DMA. If it writes outside
>> > > > that CC integrity will detect it and if it reads outside that
>> > > > it gets unintelligible ciphertext.  So we're left with the
>> > > > device trying to trick secrets out of us by returning
>> > > > unexpected data.
>> > >
>> > > Yes, by supplying the input that hasn’t been expected. This is
>> > > exactly the case we were trying to fix here for example:
>> > > https://lore.kernel.org/all/20230119170633.40944-2-
>> > alexander.shishkin@linux.intel.com/
>> > > I do agree that this case is less severe when others where memory
>> > > corruption/buffer overrun can happen, like here:
>> > > https://lore.kernel.org/all/20230119135721.83345-6-
>> > alexander.shishkin@linux.intel.com/
>> > > But we are trying to fix all issues we see now (prioritizing the
>> > > second ones though).
>> >
>> > I don't see how MSI table sizing is a bug in the category we've
>> > defined.  The very text of the changelog says "resulting in a
>> > kernel page fault in pci_write_msg_msix()."  which is a crash,
>> > which I thought we were agreeing was out of scope for CC attacks?
>>
>> As I said this is an example of a crash and on the first look
>> might not lead to the exploitable condition (albeit attackers are
>> creative). But we noticed this one while fuzzing and it was common
>> enough that prevented fuzzer going deeper into the virtio devices
>> driver fuzzing. The core PCI/MSI doesn’t seem to have that many
>> easily triggerable Other examples in virtio patchset are more severe.
>
> You cited this as your example. I'm pointing out it seems to be an
> event of the class we've agreed not to consider because it's an oops
> not an exploit. If there are examples of fixing actual exploits to CC
> VMs, what are they?
>
> This patch is, however, an example of the problem everyone else on the
> thread is complaining about: a patch which adds an unnecessary check to
> the MSI subsystem; unnecessary because it doesn't fix a CC exploit and
> in the real world the tables are correct (or the manufacturer is
> quickly chastened), so it adds overhead to no benefit.

I'd like to backtrack a little here.


1/ PCI-as-a-thread, where does it come from?

On physical devices, we have to assume that the device is working. As other
pointed out, there are things like PCI compliance tests, etc. So Linux has
to trust the device. You could manufacture a broken device intentionally,
but the value you would get from that would be limited.

On a CC system, the "PCI" values are really provided by the hypervisor,
which is not trusted. This leads to this peculiar way of thinking where we
say "what happens if virtual device feeds us a bogus value *intentionally*".
We cannot assume that the *virtual* PCI device ran through the compliance
tests. Instead, we see the PCI interface as hostile, which makes us look
like weirdos to the rest of the community.

Consequently, as James pointed out, we first need to focus on consequences
that would break what I would call the "CC promise", which is essentially
that we'd rather kill the guest than reveal its secrets. Unless you have a
credible path to a secret being revealed, don't bother "fixing" a bug. And
as was pointed out elsewhere in this thread, caching has a cost, so you
can't really use the "optimization" angle either.


2/ Clarification of the "CC promise" and value proposition

Based on the above, the very first thing is to clarify that "CC promise",
because if exchanges on this thread have proved anything, it is that it's
quite unclear to anyone outside the "CoCo world".

The Linux Guest Kernel Security Specification needs to really elaborate on
what the value proposition of CC is, not assume it is a given. "Bug fixes"
before this value proposition has been understood and accepted by the
non-CoCo community are likely to go absolutely nowhere.

Here is a quick proposal for the Purpose and Scope section:

<doc>
Purpose and Scope

Confidential Computing (CC) is a set of technologies that allows a guest to
run without having to trust either the hypervisor or the host. CC offers two
new guarantees to the guest compared to the non-CC case:

a) The guest will be able to measure and attest, by cryptographic means, the
guest software stack that it is running, and be assured that this
software stack cannot be tampered with by the host or the hypervisor
after it was measured. The root of trust for this aspect of CC is
typically the CPU manufacturer (e.g. through a private key that can be
used to respond to cryptographic challenges).

b) Guest state, including memory, become secrets which must remain
inaccessible to the host. In a CC context, it is considered preferable to
stop or kill a guest rather than risk leaking its secrets. This aspect of
CC is typically enforced by means such as memory encryption and new
semantics for memory protection.

CC leads to a different threat model for a Linux kernel running as a guest
inside a confidential virtual machine (CVM). Notably, whereas the machine
(CPU, I/O devices, etc) is usually considered as trustworthy, in the CC
case, the hypervisor emulating some aspects of the virtual machine is now
considered as potentially malicious. Consequently, effects of any data
provided by the guest to the hypervisor, including ACPI configuration
tables, MMIO interfaces or machine specific registers (MSRs) need to be
re-evaluated.

This document describes the security architecture of the Linux guest kernel
running inside a CVM, with a particular focus on the Intel TDX
implementation. Many aspects of this document will be applicable to other
CC implementations such as AMD SEV.

Aspects of the guest-visible state that are under direct control of the
hardware, such as the CPU state or memory protection, will be considered as
being handled by the CC implementations. This document will therefore only
focus on aspects of the virtual machine that are typically managed by the
hypervisor or the host.

Since the host ultimately owns the resources and can allocate them at will,
including denying their use at any point, this document will not address
denial or service or performance degradation. It will however cover random
number generation, which is central for cryptographic security.

Finally, security considerations that apply irrespective of whether the
platform is confidential or not are also outside of the scope of this
document. This includes topics ranging from timing attacks to social
engineering.
</doc>

Feel free to comment and reword at will ;-)


3/ PCI-as-a-threat: where does that come from

Isn't there a fundamental difference, from a threat model perspective,
between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
should defeat) and compromised software feeding us bad data? I think there
is: at leats inside the TCB, we can detect bad software using measurements,
and prevent it from running using attestation. In other words, we first
check what we will run, then we run it. The security there is that we know
what we are running. The trust we have in the software is from testing,
reviewing or using it.

This relies on a key aspect provided by TDX and SEV, which is that the
software being measured is largely tamper-resistant thanks to memory
encryption. In other words, after you have measured your guest software
stack, the host or hypervisor cannot willy-nilly change it.

So this brings me to the next question: is there any way we could offer the
same kind of service for KVM and qemu? The measurement part seems relatively
easy. Thetamper-resistant part, on the other hand, seems quite difficult to
me. But maybe someone else will have a brilliant idea?

So I'm asking the question, because if you could somehow prove to the guest
not only that it's running the right guest stack (as we can do today) but
also a known host/KVM/hypervisor stack, we would also switch the potential
issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
this is something which is evidently easier to deal with.

I briefly discussed this with James, and he pointed out two interesting
aspects of that question:

1/ In the CC world, we don't really care about *virtual* PCI devices. We
care about either virtio devices, or physical ones being passed through
to the guest. Let's assume physical ones can be trusted, see above.
That leaves virtio devices. How much damage can a malicious virtio device
do to the guest kernel, and can this lead to secrets being leaked?

2/ He was not as negative as I anticipated on the possibility of somehow
being able to prevent tampering of the guest. One example he mentioned is
a research paper [1] about running the hypervisor itself inside an
"outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
with TDX using secure enclaves or some other mechanism?


Sorry, this mail is a bit long ;-)


>
>
> [...]
>> > see what else it could detect given the signal will be smothered by
>> > oopses and secondly I think the PCI interface is likely the wrong
>> > place to begin and you should probably begin on the virtio bus and
>> > the hypervisor generated configuration space.
>>
>> This is exactly what we do. We don’t fuzz from the PCI config space,
>> we supply inputs from the host/vmm via the legitimate interfaces that
>> it can inject them to the guest: whenever guest requests a pci config
>> space (which is controlled by host/hypervisor as you said) read
>> operation, it gets input injected by the kafl fuzzer.  Same for other
>> interfaces that are under control of host/VMM (MSRs, port IO, MMIO,
>> anything that goes via #VE handler in our case). When it comes to
>> virtio, we employ two different fuzzing techniques: directly
>> injecting kafl fuzz input when virtio core or virtio drivers gets the
>> data received from the host (via injecting input in functions
>> virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
>> pages using kfx fuzzer. More information can be found in
>> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing
>
> Given that we previously agreed that oppses and other DoS attacks are
> out of scope for CC, I really don't think fuzzing, which primarily
> finds oopses, is at all a useful tool unless you filter the results by
> the question "could we exploit this in a CC VM to reveal secrets".
> Without applying that filter you're sending a load of patches which
> don't really do much to reduce the CC attack surface and which do annoy
> non-CC people because they add pointless checks to things they expect
> the cards and config tables to get right.

Indeed.

[1]: https://dl.acm.org/doi/abs/10.1145/3548606.3560592
--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
> Finally, security considerations that apply irrespective of whether the
> platform is confidential or not are also outside of the scope of this
> document. This includes topics ranging from timing attacks to social
> engineering.

Why are timing attacks by hypervisor on the guest out of scope?

> </doc>
>
> Feel free to comment and reword at will ;-)
>
>
> 3/ PCI-as-a-threat: where does that come from
>
> Isn't there a fundamental difference, from a threat model perspective,
> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
> should defeat) and compromised software feeding us bad data? I think there
> is: at leats inside the TCB, we can detect bad software using measurements,
> and prevent it from running using attestation. In other words, we first
> check what we will run, then we run it. The security there is that we know
> what we are running. The trust we have in the software is from testing,
> reviewing or using it.
>
> This relies on a key aspect provided by TDX and SEV, which is that the
> software being measured is largely tamper-resistant thanks to memory
> encryption. In other words, after you have measured your guest software
> stack, the host or hypervisor cannot willy-nilly change it.
>
> So this brings me to the next question: is there any way we could offer the
> same kind of service for KVM and qemu? The measurement part seems relatively
> easy. Thetamper-resistant part, on the other hand, seems quite difficult to
> me. But maybe someone else will have a brilliant idea?
>
> So I'm asking the question, because if you could somehow prove to the guest
> not only that it's running the right guest stack (as we can do today) but
> also a known host/KVM/hypervisor stack, we would also switch the potential
> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
> this is something which is evidently easier to deal with.

Agree absolutely that's much easier.

> I briefly discussed this with James, and he pointed out two interesting
> aspects of that question:
>
> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
> care about either virtio devices, or physical ones being passed through
> to the guest. Let's assume physical ones can be trusted, see above.
> That leaves virtio devices. How much damage can a malicious virtio device
> do to the guest kernel, and can this lead to secrets being leaked?
>
> 2/ He was not as negative as I anticipated on the possibility of somehow
> being able to prevent tampering of the guest. One example he mentioned is
> a research paper [1] about running the hypervisor itself inside an
> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
> with TDX using secure enclaves or some other mechanism?

Or even just secureboot based root of trust?

--
MST
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On 2023-01-31 at 10:06 UTC, "Reshetova, Elena" <elena.reshetova@intel.com> wrote...
> Hi Dinechin,

Nit: My first name is actually Christophe ;-)

[snip]

>> "The implementation of the #VE handler is simple and does not require an
>> in-depth security audit or fuzzing since it is not the actual consumer of
>> the host/VMM supplied untrusted data": The assumption there seems to be that
>> the host will never be able to supply data (e.g. through a bounce buffer)
>> that it can trick the guest into executing. If that is indeed the
>> assumption, it is worth mentioning explicitly. I suspect it is a bit weak,
>> since many earlier attacks were based on executing the wrong code. Notably,
>> it is worth pointing out that I/O buffers are _not_ encrypted with the CPU
>> key (as opposed to any device key e.g. for PCI encryption) in either
>> TDX or SEV. Is there for example anything that precludes TDX or SEV from
>> executing code in the bounce buffers?
>
> This was already replied by Kirill, any code execution out of shared memory generates
> a #GP.

Apologies for my wording. Everyone interpreted "executing" as "executing
directly on the bounce buffer page", when what I meant is "consuming data
fetched from the bounce buffers as code" (not necessarily directly).

For example, in the diagram in your document, the guest kernel is a
monolithic piece. In reality, there are dynamically loaded components. In
the original SEV implementation, with pre-attestation, the measurement could
only apply before loading any DLKM (I believe, not really sure). As another
example, SEVerity (CVE-2020-12967 [1]) worked by injecting a payload
directly into the guest kernel using virtio-based network I/O. That is what
I referred to when I wrote "many earlier attacks were based on executing the
wrong code".

The fact that I/O buffers are not encrypted matters here, because it gives
the host ample latitude to observe or even corrupt all I/Os, as many others
have pointed out. Notably, disk crypto may not be designed to resist to a
host that can see and possibly change the I/Os.

So let me rephrase my vague question as a few more precise ones:

1) What are the effects of semi-random kernel code injection?

If the host knows that a given bounce buffer happens to be used later to
execute some kernel code, it can start flipping bits in it to try and
trigger arbitrary code paths in the guest. My understanding is that
crypto alone (i.e. without additional layers like dm-integrity) will
happily decrypt that into a code stream with pseudo-random instructions
in it, not vehemently error out.

So, while TDX precludes the host from writing into guest memory directly,
since the bounce buffers are shared, TDX will not prevent the host from
flipping bits there. It's then just a matter of guessing where the bits
will go, and hoping that some bits execute at guest PL0. Of course, this
can be mitigated by either only using static configs, or using
dm-verity/dm-integrity, or maybe some other mechanisms.

Shouldn't that be part of your document? To be clear: you mention under
"Storage protection" that you use dm-crypt and dm-integrity, so I believe
*you* know, but your readers may not figure out why dm-integrity is
integral to the process, notably after you write "Users could use other
encryption schemes".

2) What are the effects of random user code injection?

It's the same as above, except that now you can target a much wider range
of input data, including shell scripts, etc. So the attack surface is
much larger.

3) What is the effect of data poisoning?

You don't necessarily need to corrupt code. Being able to corrupt a
system configuration file for example can be largely sufficient.

4) Are there I/O-based replay attacks that would work pre-attestation?

My current mental model is that you load a "base" software stack into the
TCB and then measure a relevant part of it. What you measure is somewhat
implementation-dependent, but in the end, if the system is attested, you
respond to a cryptographic challenge based on what was measured, and you
then get relevant secrets, e.g. a disk decryption key, that let you make
forward progress. However, what happens if every time you boot, the host
feeds you bogus disk data just to try to steer the boot sequence along
some specific path?

I believe that the short answer is: the guest either:

a) reaches attestation, but with bad in-memory data, so it fails the
crypto exchange, and secrets are not leaked.

b) does not reach attestation, so never gets the secrets, and therefore
still fulfils the CC promise of not leaking secrets.

So I personally feel this is OK, but it's worth writing up in your doc.


Back to the #VE handler, if I can find a way to inject malicious code into
my guest, what you wrote in that paragraph as a justification for no
in-depth security still seems like "not exactly defense in depth". I would
just remove the sentence, audit and fuzz that code with the same energy as
for anything else that could face bad input.


[1]: https://www.sec.in.tum.de/i20/student-work/code-execution-attacks-against-encrypted-virtual-machines



--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Tue, 2023-01-31 at 16:34 +0000, Reshetova, Elena wrote:
[...]
> > You cited this as your example.  I'm pointing out it seems to be an
> > event of the class we've agreed not to consider because it's an
> > oops not an exploit.  If there are examples of fixing actual
> > exploits to CC VMs, what are they?
> >
> > This patch is, however, an example of the problem everyone else on
> > the thread is complaining about: a patch which adds an unnecessary
> > check to the MSI subsystem; unnecessary because it doesn't fix a CC
> > exploit and in the real world the tables are correct (or the
> > manufacturer is quickly chastened), so it adds overhead to no
> > benefit.
>
> How can you make sure there is no exploit possible using this crash
> as a stepping stone into a CC guest?

I'm not, what I'm saying is you haven't proved it can be used to
exfiltrate secrets. In a world where the PCI device is expected to be
correct, and the non-CC kernel doesn't want to second guess that, there
are loads of lies you can tell to the PCI subsystem that causes a crash
or a hang. If we fix every one, we end up with a massive patch set and
a huge potential slow down for the non-CC kernel. If there's no way to
tell what lies might leak data, the fuzzing results are a mass of noise
with no real signal and we can't even quantify by how much (or even if)
we've improved the CC VM attack surface even after we merge the huge
patch set it generates.

> Or are you saying that we are back to the times when we can merge
> the fixes for crashes and out of bound errors in kernel only given
> that we submit a proof of concept exploit with the patch for every
> issue?

The PCI people have already said that crashing in the face of bogus
configuration data is expected behaviour, so just generating the crash
doesn't prove there's a problem to be fixed. That means you do have to
go beyond and demonstrate there could be an information leak in a CC VM
on the back of it, yes.

> > [...]
> > > > see what else it could detect given the signal will be
> > > > smothered by oopses and secondly I think the PCI interface is
> > > > likely the wrong place to begin and you should probably begin
> > > > on the virtio bus and the hypervisor generated configuration
> > > > space.
> > >
> > > This is exactly what we do. We don’t fuzz from the PCI config
> > > space, we supply inputs from the host/vmm via the legitimate
> > > interfaces that it can inject them to the guest: whenever guest
> > > requests a pci config space (which is controlled by
> > > host/hypervisor as you said) read operation, it gets input
> > > injected by the kafl fuzzer.  Same for other interfaces that are
> > > under control of host/VMM (MSRs, port IO, MMIO, anything that
> > > goes via #VE handler in our case). When it comes to virtio, we
> > > employ  two different fuzzing techniques: directly injecting kafl
> > > fuzz input when virtio core or virtio drivers gets the data
> > > received from the host (via injecting input in functions
> > > virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
> > > pages using kfx fuzzer. More information can be found in
> > > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-
> > hardening.html#td-guest-fuzzing
> >
> > Given that we previously agreed that oppses and other DoS attacks
> > are out of scope for CC, I really don't think fuzzing, which
> > primarily finds oopses, is at all a useful tool unless you filter
> > the results by the question "could we exploit this in a CC VM to
> > reveal secrets". Without applying that filter you're sending a load
> > of patches which don't really do much to reduce the CC attack
> > surface and which do annoy non-CC people because they add pointless
> > checks to things they expect the cards and config tables to get
> > right.
>
> I don’t think we have agreed that random kernel crashes are out of
> scope in CC threat model (controlled safe panic is out of scope, but
> this is not what we have here).

So perhaps making it a controlled panic in the CC VM, so we can
guarantee no information leak, would be the first place to start?

> It all depends if this ops can be used in a successful attack against
> guest private memory or not and this is *not* a trivial thing to
> decide.

Right, but if you can't decide that, you can't extract the signal from
your fuzzing tool noise.

> That's said, we are mostly focusing on KASAN findings, which
> have higher likelihood to be exploitable at least for host -> guest
> privilege escalation (which in turn compromised guest private memory
> confidentiality). Fuzzing has a long history of find such issues in
> past (including the ones that have been exploited after). But even
> for this ops bug, can anyone guarantee it cannot be chained with
> other ones to cause a more complex privilege escalation attack?
> I wont be making such a claim, I feel it is safer to fix this vs
> debating whenever it can be used for an attack or not.

The PCI people have already been clear that adding a huge framework of
checks to PCI table parsing simply for the promise it "might possibly"
improve CC VM security is way too much effort for too little result.
If you can hone that down to a few places where you can show it will
prevent a CC information leak, I'm sure they'll be more receptive.
Telling them to disprove your assertion that there might be an exploit
here isn't going to make them change their minds.

James
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
I typoed a lot in this email...


On 2023-01-31 at 16:14 +01, Christophe de Dinechin <dinechin@redhat.com> wrote...
> On 2023-01-31 at 08:28 -05, James Bottomley <jejb@linux.ibm.com> wrote...
>> On Tue, 2023-01-31 at 11:31 +0000, Reshetova, Elena wrote:
>>> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
>>> > [...]
>>> > > > The big threat from most devices (including the thunderbolt
>>> > > > classes) is that they can DMA all over memory.  However, this
>>> > > > isn't really a threat in CC (well until PCI becomes able to do
>>> > > > encrypted DMA) because the device has specific unencrypted
>>> > > > buffers set aside for the expected DMA. If it writes outside
>>> > > > that CC integrity will detect it and if it reads outside that
>>> > > > it gets unintelligible ciphertext.  So we're left with the
>>> > > > device trying to trick secrets out of us by returning
>>> > > > unexpected data.
>>> > >
>>> > > Yes, by supplying the input that hasn’t been expected. This is
>>> > > exactly the case we were trying to fix here for example:
>>> > > https://lore.kernel.org/all/20230119170633.40944-2-
>>> > alexander.shishkin@linux.intel.com/
>>> > > I do agree that this case is less severe when others where memory
>>> > > corruption/buffer overrun can happen, like here:
>>> > > https://lore.kernel.org/all/20230119135721.83345-6-
>>> > alexander.shishkin@linux.intel.com/
>>> > > But we are trying to fix all issues we see now (prioritizing the
>>> > > second ones though).
>>> >
>>> > I don't see how MSI table sizing is a bug in the category we've
>>> > defined.  The very text of the changelog says "resulting in a
>>> > kernel page fault in pci_write_msg_msix()."  which is a crash,
>>> > which I thought we were agreeing was out of scope for CC attacks?
>>>
>>> As I said this is an example of a crash and on the first look
>>> might not lead to the exploitable condition (albeit attackers are
>>> creative). But we noticed this one while fuzzing and it was common
>>> enough that prevented fuzzer going deeper into the virtio devices
>>> driver fuzzing. The core PCI/MSI doesn’t seem to have that many
>>> easily triggerable Other examples in virtio patchset are more severe.
>>
>> You cited this as your example. I'm pointing out it seems to be an
>> event of the class we've agreed not to consider because it's an oops
>> not an exploit. If there are examples of fixing actual exploits to CC
>> VMs, what are they?
>>
>> This patch is, however, an example of the problem everyone else on the
>> thread is complaining about: a patch which adds an unnecessary check to
>> the MSI subsystem; unnecessary because it doesn't fix a CC exploit and
>> in the real world the tables are correct (or the manufacturer is
>> quickly chastened), so it adds overhead to no benefit.
>
> I'd like to backtrack a little here.
>
>
> 1/ PCI-as-a-thread, where does it come from?

PCI-as-a-threat

>
> On physical devices, we have to assume that the device is working. As other
> pointed out, there are things like PCI compliance tests, etc. So Linux has
> to trust the device. You could manufacture a broken device intentionally,
> but the value you would get from that would be limited.
>
> On a CC system, the "PCI" values are really provided by the hypervisor,
> which is not trusted. This leads to this peculiar way of thinking where we
> say "what happens if virtual device feeds us a bogus value *intentionally*".
> We cannot assume that the *virtual* PCI device ran through the compliance
> tests. Instead, we see the PCI interface as hostile, which makes us look
> like weirdos to the rest of the community.
>
> Consequently, as James pointed out, we first need to focus on consequences
> that would break what I would call the "CC promise", which is essentially
> that we'd rather kill the guest than reveal its secrets. Unless you have a
> credible path to a secret being revealed, don't bother "fixing" a bug. And
> as was pointed out elsewhere in this thread, caching has a cost, so you
> can't really use the "optimization" angle either.
>
>
> 2/ Clarification of the "CC promise" and value proposition
>
> Based on the above, the very first thing is to clarify that "CC promise",
> because if exchanges on this thread have proved anything, it is that it's
> quite unclear to anyone outside the "CoCo world".
>
> The Linux Guest Kernel Security Specification needs to really elaborate on
> what the value proposition of CC is, not assume it is a given. "Bug fixes"
> before this value proposition has been understood and accepted by the
> non-CoCo community are likely to go absolutely nowhere.
>
> Here is a quick proposal for the Purpose and Scope section:
>
> <doc>
> Purpose and Scope
>
> Confidential Computing (CC) is a set of technologies that allows a guest to
> run without having to trust either the hypervisor or the host. CC offers two
> new guarantees to the guest compared to the non-CC case:
>
> a) The guest will be able to measure and attest, by cryptographic means, the
> guest software stack that it is running, and be assured that this
> software stack cannot be tampered with by the host or the hypervisor
> after it was measured. The root of trust for this aspect of CC is
> typically the CPU manufacturer (e.g. through a private key that can be
> used to respond to cryptographic challenges).
>
> b) Guest state, including memory, become secrets which must remain
> inaccessible to the host. In a CC context, it is considered preferable to
> stop or kill a guest rather than risk leaking its secrets. This aspect of
> CC is typically enforced by means such as memory encryption and new
> semantics for memory protection.
>
> CC leads to a different threat model for a Linux kernel running as a guest
> inside a confidential virtual machine (CVM). Notably, whereas the machine
> (CPU, I/O devices, etc) is usually considered as trustworthy, in the CC
> case, the hypervisor emulating some aspects of the virtual machine is now
> considered as potentially malicious. Consequently, effects of any data
> provided by the guest to the hypervisor, including ACPI configuration

to the guest by the hypervisor

> tables, MMIO interfaces or machine specific registers (MSRs) need to be
> re-evaluated.
>
> This document describes the security architecture of the Linux guest kernel
> running inside a CVM, with a particular focus on the Intel TDX
> implementation. Many aspects of this document will be applicable to other
> CC implementations such as AMD SEV.
>
> Aspects of the guest-visible state that are under direct control of the
> hardware, such as the CPU state or memory protection, will be considered as
> being handled by the CC implementations. This document will therefore only
> focus on aspects of the virtual machine that are typically managed by the
> hypervisor or the host.
>
> Since the host ultimately owns the resources and can allocate them at will,
> including denying their use at any point, this document will not address
> denial or service or performance degradation. It will however cover random
> number generation, which is central for cryptographic security.
>
> Finally, security considerations that apply irrespective of whether the
> platform is confidential or not are also outside of the scope of this
> document. This includes topics ranging from timing attacks to social
> engineering.
> </doc>
>
> Feel free to comment and reword at will ;-)
>
>
> 3/ PCI-as-a-threat: where does that come from

3/ Can we shift from "malicious" hypervisor/host input to "bogus" input?

>
> Isn't there a fundamental difference, from a threat model perspective,
> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
> should defeat) and compromised software feeding us bad data? I think there
> is: at leats inside the TCB, we can detect bad software using measurements,
> and prevent it from running using attestation. In other words, we first
> check what we will run, then we run it. The security there is that we know
> what we are running. The trust we have in the software is from testing,
> reviewing or using it.
>
> This relies on a key aspect provided by TDX and SEV, which is that the
> software being measured is largely tamper-resistant thanks to memory
> encryption. In other words, after you have measured your guest software
> stack, the host or hypervisor cannot willy-nilly change it.
>
> So this brings me to the next question: is there any way we could offer the
> same kind of service for KVM and qemu? The measurement part seems relatively
> easy. Thetamper-resistant part, on the other hand, seems quite difficult to
> me. But maybe someone else will have a brilliant idea?
>
> So I'm asking the question, because if you could somehow prove to the guest
> not only that it's running the right guest stack (as we can do today) but
> also a known host/KVM/hypervisor stack, we would also switch the potential
> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
> this is something which is evidently easier to deal with.
>
> I briefly discussed this with James, and he pointed out two interesting
> aspects of that question:
>
> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
> care about either virtio devices, or physical ones being passed through
> to the guest. Let's assume physical ones can be trusted, see above.
> That leaves virtio devices. How much damage can a malicious virtio device
> do to the guest kernel, and can this lead to secrets being leaked?
>
> 2/ He was not as negative as I anticipated on the possibility of somehow
> being able to prevent tampering of the guest. One example he mentioned is
> a research paper [1] about running the hypervisor itself inside an
> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
> with TDX using secure enclaves or some other mechanism?
>
>
> Sorry, this mail is a bit long ;-)

and was a bit rushed too...

>
>
>>
>>
>> [...]
>>> > see what else it could detect given the signal will be smothered by
>>> > oopses and secondly I think the PCI interface is likely the wrong
>>> > place to begin and you should probably begin on the virtio bus and
>>> > the hypervisor generated configuration space.
>>>
>>> This is exactly what we do. We don’t fuzz from the PCI config space,
>>> we supply inputs from the host/vmm via the legitimate interfaces that
>>> it can inject them to the guest: whenever guest requests a pci config
>>> space (which is controlled by host/hypervisor as you said) read
>>> operation, it gets input injected by the kafl fuzzer.  Same for other
>>> interfaces that are under control of host/VMM (MSRs, port IO, MMIO,
>>> anything that goes via #VE handler in our case). When it comes to
>>> virtio, we employ two different fuzzing techniques: directly
>>> injecting kafl fuzz input when virtio core or virtio drivers gets the
>>> data received from the host (via injecting input in functions
>>> virtio16/32/64_to_cpu and others) and directly fuzzing DMA memory
>>> pages using kfx fuzzer. More information can be found in
>>> https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing
>>
>> Given that we previously agreed that oppses and other DoS attacks are
>> out of scope for CC, I really don't think fuzzing, which primarily
>> finds oopses, is at all a useful tool unless you filter the results by
>> the question "could we exploit this in a CC VM to reveal secrets".
>> Without applying that filter you're sending a load of patches which
>> don't really do much to reduce the CC attack surface and which do annoy
>> non-CC people because they add pointless checks to things they expect
>> the cards and config tables to get right.
>
> Indeed.
>
> [1]: https://dl.acm.org/doi/abs/10.1145/3548606.3560592


--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
>> Finally, security considerations that apply irrespective of whether the
>> platform is confidential or not are also outside of the scope of this
>> document. This includes topics ranging from timing attacks to social
>> engineering.
>
> Why are timing attacks by hypervisor on the guest out of scope?

Good point.

I was thinking that mitigation against timing attacks is the same
irrespective of the source of the attack. However, because the HV
controls CPU time allocation, there are presumably attacks that
are made much easier through the HV. Those should be listed.

>
>> </doc>
>>
>> Feel free to comment and reword at will ;-)
>>
>>
>> 3/ PCI-as-a-threat: where does that come from
>>
>> Isn't there a fundamental difference, from a threat model perspective,
>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
>> should defeat) and compromised software feeding us bad data? I think there
>> is: at leats inside the TCB, we can detect bad software using measurements,
>> and prevent it from running using attestation. In other words, we first
>> check what we will run, then we run it. The security there is that we know
>> what we are running. The trust we have in the software is from testing,
>> reviewing or using it.
>>
>> This relies on a key aspect provided by TDX and SEV, which is that the
>> software being measured is largely tamper-resistant thanks to memory
>> encryption. In other words, after you have measured your guest software
>> stack, the host or hypervisor cannot willy-nilly change it.
>>
>> So this brings me to the next question: is there any way we could offer the
>> same kind of service for KVM and qemu? The measurement part seems relatively
>> easy. Thetamper-resistant part, on the other hand, seems quite difficult to
>> me. But maybe someone else will have a brilliant idea?
>>
>> So I'm asking the question, because if you could somehow prove to the guest
>> not only that it's running the right guest stack (as we can do today) but
>> also a known host/KVM/hypervisor stack, we would also switch the potential
>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
>> this is something which is evidently easier to deal with.
>
> Agree absolutely that's much easier.
>
>> I briefly discussed this with James, and he pointed out two interesting
>> aspects of that question:
>>
>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
>> care about either virtio devices, or physical ones being passed through
>> to the guest. Let's assume physical ones can be trusted, see above.
>> That leaves virtio devices. How much damage can a malicious virtio device
>> do to the guest kernel, and can this lead to secrets being leaked?
>>
>> 2/ He was not as negative as I anticipated on the possibility of somehow
>> being able to prevent tampering of the guest. One example he mentioned is
>> a research paper [1] about running the hypervisor itself inside an
>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
>> with TDX using secure enclaves or some other mechanism?
>
> Or even just secureboot based root of trust?

You mean host secureboot? Or guest?

If it’s host, then the problem is detecting malicious tampering with
host code (whether it’s kernel or hypervisor).

If it’s guest, at the moment at least, the measurements do not extend
beyond the TCB.

>
> --
> MST
>
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
>> Finally, security considerations that apply irrespective of whether the
>> platform is confidential or not are also outside of the scope of this
>> document. This includes topics ranging from timing attacks to social
>> engineering.
>
> Why are timing attacks by hypervisor on the guest out of scope?

Good point.

I was thinking that mitigation against timing attacks is the same
irrespective of the source of the attack. However, because the HV
controls CPU time allocation, there are presumably attacks that
are made much easier through the HV. Those should be listed.

>
>> </doc>
>>
>> Feel free to comment and reword at will ;-)
>>
>>
>> 3/ PCI-as-a-threat: where does that come from
>>
>> Isn't there a fundamental difference, from a threat model perspective,
>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
>> should defeat) and compromised software feeding us bad data? I think there
>> is: at leats inside the TCB, we can detect bad software using measurements,
>> and prevent it from running using attestation. In other words, we first
>> check what we will run, then we run it. The security there is that we know
>> what we are running. The trust we have in the software is from testing,
>> reviewing or using it.
>>
>> This relies on a key aspect provided by TDX and SEV, which is that the
>> software being measured is largely tamper-resistant thanks to memory
>> encryption. In other words, after you have measured your guest software
>> stack, the host or hypervisor cannot willy-nilly change it.
>>
>> So this brings me to the next question: is there any way we could offer the
>> same kind of service for KVM and qemu? The measurement part seems relatively
>> easy. Thetamper-resistant part, on the other hand, seems quite difficult to
>> me. But maybe someone else will have a brilliant idea?
>>
>> So I'm asking the question, because if you could somehow prove to the guest
>> not only that it's running the right guest stack (as we can do today) but
>> also a known host/KVM/hypervisor stack, we would also switch the potential
>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
>> this is something which is evidently easier to deal with.
>
> Agree absolutely that's much easier.
>
>> I briefly discussed this with James, and he pointed out two interesting
>> aspects of that question:
>>
>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
>> care about either virtio devices, or physical ones being passed through
>> to the guest. Let's assume physical ones can be trusted, see above.
>> That leaves virtio devices. How much damage can a malicious virtio device
>> do to the guest kernel, and can this lead to secrets being leaked?
>>
>> 2/ He was not as negative as I anticipated on the possibility of somehow
>> being able to prevent tampering of the guest. One example he mentioned is
>> a research paper [1] about running the hypervisor itself inside an
>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
>> with TDX using secure enclaves or some other mechanism?
>
> Or even just secureboot based root of trust?

You mean host secureboot? Or guest?

If it’s host, then the problem is detecting malicious tampering with
host code (whether it’s kernel or hypervisor).

If it’s guest, at the moment at least, the measurements do not extend
beyond the TCB.

>
> --
> MST
>
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>
>
> > On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
> >> Finally, security considerations that apply irrespective of whether the
> >> platform is confidential or not are also outside of the scope of this
> >> document. This includes topics ranging from timing attacks to social
> >> engineering.
> >
> > Why are timing attacks by hypervisor on the guest out of scope?
>
> Good point.
>
> I was thinking that mitigation against timing attacks is the same
> irrespective of the source of the attack. However, because the HV
> controls CPU time allocation, there are presumably attacks that
> are made much easier through the HV. Those should be listed.

Not just that, also because it can and does emulate some devices.
For example, are disk encryption systems protected against timing of
disk accesses?
This is why some people keep saying "forget about emulated devices, require
passthrough, include devices in the trust zone".

> >
> >> </doc>
> >>
> >> Feel free to comment and reword at will ;-)
> >>
> >>
> >> 3/ PCI-as-a-threat: where does that come from
> >>
> >> Isn't there a fundamental difference, from a threat model perspective,
> >> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
> >> should defeat) and compromised software feeding us bad data? I think there
> >> is: at leats inside the TCB, we can detect bad software using measurements,
> >> and prevent it from running using attestation. In other words, we first
> >> check what we will run, then we run it. The security there is that we know
> >> what we are running. The trust we have in the software is from testing,
> >> reviewing or using it.
> >>
> >> This relies on a key aspect provided by TDX and SEV, which is that the
> >> software being measured is largely tamper-resistant thanks to memory
> >> encryption. In other words, after you have measured your guest software
> >> stack, the host or hypervisor cannot willy-nilly change it.
> >>
> >> So this brings me to the next question: is there any way we could offer the
> >> same kind of service for KVM and qemu? The measurement part seems relatively
> >> easy. Thetamper-resistant part, on the other hand, seems quite difficult to
> >> me. But maybe someone else will have a brilliant idea?
> >>
> >> So I'm asking the question, because if you could somehow prove to the guest
> >> not only that it's running the right guest stack (as we can do today) but
> >> also a known host/KVM/hypervisor stack, we would also switch the potential
> >> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
> >> this is something which is evidently easier to deal with.
> >
> > Agree absolutely that's much easier.
> >
> >> I briefly discussed this with James, and he pointed out two interesting
> >> aspects of that question:
> >>
> >> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
> >> care about either virtio devices, or physical ones being passed through
> >> to the guest. Let's assume physical ones can be trusted, see above.
> >> That leaves virtio devices. How much damage can a malicious virtio device
> >> do to the guest kernel, and can this lead to secrets being leaked?
> >>
> >> 2/ He was not as negative as I anticipated on the possibility of somehow
> >> being able to prevent tampering of the guest. One example he mentioned is
> >> a research paper [1] about running the hypervisor itself inside an
> >> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
> >> with TDX using secure enclaves or some other mechanism?
> >
> > Or even just secureboot based root of trust?
>
> You mean host secureboot? Or guest?
>
> If it’s host, then the problem is detecting malicious tampering with
> host code (whether it’s kernel or hypervisor).

Host. Lots of existing systems do this. As an extreme boot a RO disk,
limit which packages are allowed.

> If it’s guest, at the moment at least, the measurements do not extend
> beyond the TCB.
>
> >
> > --
> > MST
> >
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>>
>>
>>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>
>>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
>>>> Finally, security considerations that apply irrespective of whether the
>>>> platform is confidential or not are also outside of the scope of this
>>>> document. This includes topics ranging from timing attacks to social
>>>> engineering.
>>>
>>> Why are timing attacks by hypervisor on the guest out of scope?
>>
>> Good point.
>>
>> I was thinking that mitigation against timing attacks is the same
>> irrespective of the source of the attack. However, because the HV
>> controls CPU time allocation, there are presumably attacks that
>> are made much easier through the HV. Those should be listed.
>
> Not just that, also because it can and does emulate some devices.
> For example, are disk encryption systems protected against timing of
> disk accesses?
> This is why some people keep saying "forget about emulated devices, require
> passthrough, include devices in the trust zone".
>
>>>
>>>> </doc>
>>>>
>>>> Feel free to comment and reword at will ;-)
>>>>
>>>>
>>>> 3/ PCI-as-a-threat: where does that come from
>>>>
>>>> Isn't there a fundamental difference, from a threat model perspective,
>>>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
>>>> should defeat) and compromised software feeding us bad data? I think there
>>>> is: at leats inside the TCB, we can detect bad software using measurements,
>>>> and prevent it from running using attestation. In other words, we first
>>>> check what we will run, then we run it. The security there is that we know
>>>> what we are running. The trust we have in the software is from testing,
>>>> reviewing or using it.
>>>>
>>>> This relies on a key aspect provided by TDX and SEV, which is that the
>>>> software being measured is largely tamper-resistant thanks to memory
>>>> encryption. In other words, after you have measured your guest software
>>>> stack, the host or hypervisor cannot willy-nilly change it.
>>>>
>>>> So this brings me to the next question: is there any way we could offer the
>>>> same kind of service for KVM and qemu? The measurement part seems relatively
>>>> easy. Thetamper-resistant part, on the other hand, seems quite difficult to
>>>> me. But maybe someone else will have a brilliant idea?
>>>>
>>>> So I'm asking the question, because if you could somehow prove to the guest
>>>> not only that it's running the right guest stack (as we can do today) but
>>>> also a known host/KVM/hypervisor stack, we would also switch the potential
>>>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
>>>> this is something which is evidently easier to deal with.
>>>
>>> Agree absolutely that's much easier.
>>>
>>>> I briefly discussed this with James, and he pointed out two interesting
>>>> aspects of that question:
>>>>
>>>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
>>>> care about either virtio devices, or physical ones being passed through
>>>> to the guest. Let's assume physical ones can be trusted, see above.
>>>> That leaves virtio devices. How much damage can a malicious virtio device
>>>> do to the guest kernel, and can this lead to secrets being leaked?
>>>>
>>>> 2/ He was not as negative as I anticipated on the possibility of somehow
>>>> being able to prevent tampering of the guest. One example he mentioned is
>>>> a research paper [1] about running the hypervisor itself inside an
>>>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
>>>> with TDX using secure enclaves or some other mechanism?
>>>
>>> Or even just secureboot based root of trust?
>>
>> You mean host secureboot? Or guest?
>>
>> If it’s host, then the problem is detecting malicious tampering with
>> host code (whether it’s kernel or hypervisor).
>
> Host. Lots of existing systems do this. As an extreme boot a RO disk,
> limit which packages are allowed.

Is that provable to the guest?

Consider a cloud provider doing that: how do they prove to their guest:

a) What firmware, kernel and kvm they run

b) That what they booted cannot be maliciouly modified, e.g. by a rogue
device driver installed by a rogue sysadmin

My understanding is that SecureBoot is only intended to prevent non-verified
operating systems from booting. So the proof is given to the cloud provider,
and the proof is that the system boots successfully.

After that, I think all bets are off. SecureBoot does little AFAICT
to prevent malicious modifications of the running system by someone with
root access, including deliberately loading a malicious kvm-zilog.ko

It does not mean it cannot be done, just that I don’t think we
have the tools at the moment.

>
>> If it’s guest, at the moment at least, the measurements do not extend
>> beyond the TCB.
>>
>>>
>>> --
>>> MST
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 01, 2023 at 02:15:10PM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>
>
> > On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
> >>
> >>
> >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>
> >>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
> >>>> Finally, security considerations that apply irrespective of whether the
> >>>> platform is confidential or not are also outside of the scope of this
> >>>> document. This includes topics ranging from timing attacks to social
> >>>> engineering.
> >>>
> >>> Why are timing attacks by hypervisor on the guest out of scope?
> >>
> >> Good point.
> >>
> >> I was thinking that mitigation against timing attacks is the same
> >> irrespective of the source of the attack. However, because the HV
> >> controls CPU time allocation, there are presumably attacks that
> >> are made much easier through the HV. Those should be listed.
> >
> > Not just that, also because it can and does emulate some devices.
> > For example, are disk encryption systems protected against timing of
> > disk accesses?
> > This is why some people keep saying "forget about emulated devices, require
> > passthrough, include devices in the trust zone".
> >
> >>>
> >>>> </doc>
> >>>>
> >>>> Feel free to comment and reword at will ;-)
> >>>>
> >>>>
> >>>> 3/ PCI-as-a-threat: where does that come from
> >>>>
> >>>> Isn't there a fundamental difference, from a threat model perspective,
> >>>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
> >>>> should defeat) and compromised software feeding us bad data? I think there
> >>>> is: at leats inside the TCB, we can detect bad software using measurements,
> >>>> and prevent it from running using attestation. In other words, we first
> >>>> check what we will run, then we run it. The security there is that we know
> >>>> what we are running. The trust we have in the software is from testing,
> >>>> reviewing or using it.
> >>>>
> >>>> This relies on a key aspect provided by TDX and SEV, which is that the
> >>>> software being measured is largely tamper-resistant thanks to memory
> >>>> encryption. In other words, after you have measured your guest software
> >>>> stack, the host or hypervisor cannot willy-nilly change it.
> >>>>
> >>>> So this brings me to the next question: is there any way we could offer the
> >>>> same kind of service for KVM and qemu? The measurement part seems relatively
> >>>> easy. Thetamper-resistant part, on the other hand, seems quite difficult to
> >>>> me. But maybe someone else will have a brilliant idea?
> >>>>
> >>>> So I'm asking the question, because if you could somehow prove to the guest
> >>>> not only that it's running the right guest stack (as we can do today) but
> >>>> also a known host/KVM/hypervisor stack, we would also switch the potential
> >>>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
> >>>> this is something which is evidently easier to deal with.
> >>>
> >>> Agree absolutely that's much easier.
> >>>
> >>>> I briefly discussed this with James, and he pointed out two interesting
> >>>> aspects of that question:
> >>>>
> >>>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
> >>>> care about either virtio devices, or physical ones being passed through
> >>>> to the guest. Let's assume physical ones can be trusted, see above.
> >>>> That leaves virtio devices. How much damage can a malicious virtio device
> >>>> do to the guest kernel, and can this lead to secrets being leaked?
> >>>>
> >>>> 2/ He was not as negative as I anticipated on the possibility of somehow
> >>>> being able to prevent tampering of the guest. One example he mentioned is
> >>>> a research paper [1] about running the hypervisor itself inside an
> >>>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
> >>>> with TDX using secure enclaves or some other mechanism?
> >>>
> >>> Or even just secureboot based root of trust?
> >>
> >> You mean host secureboot? Or guest?
> >>
> >> If it’s host, then the problem is detecting malicious tampering with
> >> host code (whether it’s kernel or hypervisor).
> >
> > Host. Lots of existing systems do this. As an extreme boot a RO disk,
> > limit which packages are allowed.
>
> Is that provable to the guest?
>
> Consider a cloud provider doing that: how do they prove to their guest:
>
> a) What firmware, kernel and kvm they run
>
> b) That what they booted cannot be maliciouly modified, e.g. by a rogue
> device driver installed by a rogue sysadmin
>
> My understanding is that SecureBoot is only intended to prevent non-verified
> operating systems from booting. So the proof is given to the cloud provider,
> and the proof is that the system boots successfully.

I think I should have said measured boot not secure boot.

>
> After that, I think all bets are off. SecureBoot does little AFAICT
> to prevent malicious modifications of the running system by someone with
> root access, including deliberately loading a malicious kvm-zilog.ko

So disable module loading then or don't allow root access?

>
> It does not mean it cannot be done, just that I don’t think we
> have the tools at the moment.

Phones, chromebooks do this all the time ...

> >
> >> If it’s guest, at the moment at least, the measurements do not extend
> >> beyond the TCB.
> >>
> >>>
> >>> --
> >>> MST
>
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On 2023-02-01 at 11:02 -05, "Michael S. Tsirkin" <mst@redhat.com> wrote...
> On Wed, Feb 01, 2023 at 02:15:10PM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>>
>>
>> > On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >
>> > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>> >>
>> >>
>> >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >>>
>> >>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
>> >>>> Finally, security considerations that apply irrespective of whether the
>> >>>> platform is confidential or not are also outside of the scope of this
>> >>>> document. This includes topics ranging from timing attacks to social
>> >>>> engineering.
>> >>>
>> >>> Why are timing attacks by hypervisor on the guest out of scope?
>> >>
>> >> Good point.
>> >>
>> >> I was thinking that mitigation against timing attacks is the same
>> >> irrespective of the source of the attack. However, because the HV
>> >> controls CPU time allocation, there are presumably attacks that
>> >> are made much easier through the HV. Those should be listed.
>> >
>> > Not just that, also because it can and does emulate some devices.
>> > For example, are disk encryption systems protected against timing of
>> > disk accesses?
>> > This is why some people keep saying "forget about emulated devices, require
>> > passthrough, include devices in the trust zone".
>> >
>> >>>
>> >>>> </doc>
>> >>>>
>> >>>> Feel free to comment and reword at will ;-)
>> >>>>
>> >>>>
>> >>>> 3/ PCI-as-a-threat: where does that come from
>> >>>>
>> >>>> Isn't there a fundamental difference, from a threat model perspective,
>> >>>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
>> >>>> should defeat) and compromised software feeding us bad data? I think there
>> >>>> is: at leats inside the TCB, we can detect bad software using measurements,
>> >>>> and prevent it from running using attestation. In other words, we first
>> >>>> check what we will run, then we run it. The security there is that we know
>> >>>> what we are running. The trust we have in the software is from testing,
>> >>>> reviewing or using it.
>> >>>>
>> >>>> This relies on a key aspect provided by TDX and SEV, which is that the
>> >>>> software being measured is largely tamper-resistant thanks to memory
>> >>>> encryption. In other words, after you have measured your guest software
>> >>>> stack, the host or hypervisor cannot willy-nilly change it.
>> >>>>
>> >>>> So this brings me to the next question: is there any way we could offer the
>> >>>> same kind of service for KVM and qemu? The measurement part seems relatively
>> >>>> easy. Thetamper-resistant part, on the other hand, seems quite difficult to
>> >>>> me. But maybe someone else will have a brilliant idea?
>> >>>>
>> >>>> So I'm asking the question, because if you could somehow prove to the guest
>> >>>> not only that it's running the right guest stack (as we can do today) but
>> >>>> also a known host/KVM/hypervisor stack, we would also switch the potential
>> >>>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
>> >>>> this is something which is evidently easier to deal with.
>> >>>
>> >>> Agree absolutely that's much easier.
>> >>>
>> >>>> I briefly discussed this with James, and he pointed out two interesting
>> >>>> aspects of that question:
>> >>>>
>> >>>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
>> >>>> care about either virtio devices, or physical ones being passed through
>> >>>> to the guest. Let's assume physical ones can be trusted, see above.
>> >>>> That leaves virtio devices. How much damage can a malicious virtio device
>> >>>> do to the guest kernel, and can this lead to secrets being leaked?
>> >>>>
>> >>>> 2/ He was not as negative as I anticipated on the possibility of somehow
>> >>>> being able to prevent tampering of the guest. One example he mentioned is
>> >>>> a research paper [1] about running the hypervisor itself inside an
>> >>>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
>> >>>> with TDX using secure enclaves or some other mechanism?
>> >>>
>> >>> Or even just secureboot based root of trust?
>> >>
>> >> You mean host secureboot? Or guest?
>> >>
>> >> If it’s host, then the problem is detecting malicious tampering with
>> >> host code (whether it’s kernel or hypervisor).
>> >
>> > Host. Lots of existing systems do this. As an extreme boot a RO disk,
>> > limit which packages are allowed.
>>
>> Is that provable to the guest?
>>
>> Consider a cloud provider doing that: how do they prove to their guest:
>>
>> a) What firmware, kernel and kvm they run
>>
>> b) That what they booted cannot be maliciouly modified, e.g. by a rogue
>> device driver installed by a rogue sysadmin
>>
>> My understanding is that SecureBoot is only intended to prevent non-verified
>> operating systems from booting. So the proof is given to the cloud provider,
>> and the proof is that the system boots successfully.
>
> I think I should have said measured boot not secure boot.

The problem again is how you prove to the guest that you are not lying?

We know how to do that from a guest [1], but you will note that in the
normal process, a trusted hardware component (e.g. the PSP for AMD SEV)
proves the validity of the measurements of the TCB by encrypting it with an
attestation signing key derived from some chip-unique secret. For AMD, this
is called the VCEK, and TDX has something similar. In the case of SEV, this
goes through firmware, and you have to tell the firmware each time you
insert data in the original TCB (using SNP_LAUNCH_UPDATE). This is all tied
to a VM execution context. I do not believe there is any provision to do the
same thing to measure host data. And again, it would be somewhat pointless
if there isn't also a mechanism to ensure the host data is not changed after
the measurement.

Now, I don't think it would be super-difficult to add a firmware service
that would let the host do some kind of equivalent to PVALIDATE, setting
some physical pages aside that then get measured and become inaccessible to
the host. The PSP or similar could then integrate these measurements as part
of the TCB, and the fact that the pages were "transferred" to this special
invariant block would ensure the guests that the code will not change after
being measured.

I am not aware that such a mechanism exists on any of the existing CC
platforms. Please feel free to enlighten me if I'm wrong.

[1] https://www.redhat.com/en/blog/understanding-confidential-containers-attestation-flow
>
>>
>> After that, I think all bets are off. SecureBoot does little AFAICT
>> to prevent malicious modifications of the running system by someone with
>> root access, including deliberately loading a malicious kvm-zilog.ko
>
> So disable module loading then or don't allow root access?

Who would do that?

The problem is that we have a host and a tenant, and the tenant does not
trust the host in principle. So it is not sufficient for the host to disable
module loading or carefully control root access. It is also necessary to
prove to the tenant(s) that this was done.

>
>>
>> It does not mean it cannot be done, just that I don’t think we
>> have the tools at the moment.
>
> Phones, chromebooks do this all the time ...

Indeed, but there, this is to prove to the phone's real owner (which,
surprise, is not the naive person who thought they'd get some kind of
ownership by buying the phone) that the software running on the phone has
not been replaced by some horribly jailbreaked goo.

In other words, the user of the phone gets no proof whatsoever of anything,
except that the phone appears to work. This is somewhat the situation in the
cloud today: the owners of the hardware get all sorts of useful checks, from
SecureBoot to error-correction for memory or I/O devices. However, someone
running in a VM on the cloud gets none of that, just like the user of your
phone.

--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
? 2023/2/1 19:01, Michael S. Tsirkin ??:
> On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
>>
>>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>
>>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
>>>> Finally, security considerations that apply irrespective of whether the
>>>> platform is confidential or not are also outside of the scope of this
>>>> document. This includes topics ranging from timing attacks to social
>>>> engineering.
>>> Why are timing attacks by hypervisor on the guest out of scope?
>> Good point.
>>
>> I was thinking that mitigation against timing attacks is the same
>> irrespective of the source of the attack. However, because the HV
>> controls CPU time allocation, there are presumably attacks that
>> are made much easier through the HV. Those should be listed.
> Not just that, also because it can and does emulate some devices.
> For example, are disk encryption systems protected against timing of
> disk accesses?
> This is why some people keep saying "forget about emulated devices, require
> passthrough, include devices in the trust zone".


One problem is that the device could be yet another emulated one that is
running in the SmartNIC/DPU itself.

Thanks
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On 2023-01-31 at 10:06 UTC, "Reshetova, Elena" <elena.reshetova@intel.com>
> wrote...
> > Hi Dinechin,
>
> Nit: My first name is actually Christophe ;-)

I am sorry, my automation of extracting names from emails failed here ((

>
> [snip]
>
> >> "The implementation of the #VE handler is simple and does not require an
> >> in-depth security audit or fuzzing since it is not the actual consumer of
> >> the host/VMM supplied untrusted data": The assumption there seems to be
> that
> >> the host will never be able to supply data (e.g. through a bounce buffer)
> >> that it can trick the guest into executing. If that is indeed the
> >> assumption, it is worth mentioning explicitly. I suspect it is a bit weak,
> >> since many earlier attacks were based on executing the wrong code. Notably,
> >> it is worth pointing out that I/O buffers are _not_ encrypted with the CPU
> >> key (as opposed to any device key e.g. for PCI encryption) in either
> >> TDX or SEV. Is there for example anything that precludes TDX or SEV from
> >> executing code in the bounce buffers?
> >
> > This was already replied by Kirill, any code execution out of shared memory
> generates
> > a #GP.
>
> Apologies for my wording. Everyone interpreted "executing" as "executing
> directly on the bounce buffer page", when what I meant is "consuming data
> fetched from the bounce buffers as code" (not necessarily directly).

I guess in theory it is possible, but we have not seen such usages in guest kernel code
in practice during our audit. This would be pretty ugly thing to do imo even if
you forget about confidential computing.


>
> For example, in the diagram in your document, the guest kernel is a
> monolithic piece. In reality, there are dynamically loaded components. In
> the original SEV implementation, with pre-attestation, the measurement could
> only apply before loading any DLKM (I believe, not really sure). As another
> example, SEVerity (CVE-2020-12967 [1]) worked by injecting a payload
> directly into the guest kernel using virtio-based network I/O. That is what
> I referred to when I wrote "many earlier attacks were based on executing the
> wrong code".

The above attack was only possible because an attacker was able to directly
modify the code execution pointer to an arbitrary guest memory address
(in that case guest NMI handler was substituted pointing to attacker payload).
This is an obvious hole in the integrity protection of the guest private memory
and its page table mappings. This is not possible with TDX and I believe with
new versions of AMD SEV also.

>
> The fact that I/O buffers are not encrypted matters here, because it gives
> the host ample latitude to observe or even corrupt all I/Os, as many others
> have pointed out. Notably, disk crypto may not be designed to resist to a
> host that can see and possibly change the I/Os.
>
> So let me rephrase my vague question as a few more precise ones:
>
> 1) What are the effects of semi-random kernel code injection?
>
> If the host knows that a given bounce buffer happens to be used later to
> execute some kernel code, it can start flipping bits in it to try and
> trigger arbitrary code paths in the guest. My understanding is that
> crypto alone (i.e. without additional layers like dm-integrity) will
> happily decrypt that into a code stream with pseudo-random instructions
> in it, not vehemently error out.
>
> So, while TDX precludes the host from writing into guest memory directly,
> since the bounce buffers are shared, TDX will not prevent the host from
> flipping bits there. It's then just a matter of guessing where the bits
> will go, and hoping that some bits execute at guest PL0. Of course, this
> can be mitigated by either only using static configs, or using
> dm-verity/dm-integrity, or maybe some other mechanisms.
>
> Shouldn't that be part of your document? To be clear: you mention under
> "Storage protection" that you use dm-crypt and dm-integrity, so I believe
> *you* know, but your readers may not figure out why dm-integrity is
> integral to the process, notably after you write "Users could use other
> encryption schemes".

Sure, I can elaborate in the storage protection section about the importance
of disk integrity protection.

>
> 2) What are the effects of random user code injection?
>
> It's the same as above, except that now you can target a much wider range
> of input data, including shell scripts, etc. So the attack surface is
> much larger.
>
> 3) What is the effect of data poisoning?
>
> You don't necessarily need to corrupt code. Being able to corrupt a
> system configuration file for example can be largely sufficient.
>
> 4) Are there I/O-based replay attacks that would work pre-attestation?
>
> My current mental model is that you load a "base" software stack into the
> TCB and then measure a relevant part of it. What you measure is somewhat
> implementation-dependent, but in the end, if the system is attested, you
> respond to a cryptographic challenge based on what was measured, and you
> then get relevant secrets, e.g. a disk decryption key, that let you make
> forward progress. However, what happens if every time you boot, the host
> feeds you bogus disk data just to try to steer the boot sequence along
> some specific path?

What you ideally want is a full disk encryption with additional integrity protection,
like aes-gcm authenticated encryption mode. Then there are no questions on the
disk integrity and many attacks are mitigated.

>
> I believe that the short answer is: the guest either:
>
> a) reaches attestation, but with bad in-memory data, so it fails the
> crypto exchange, and secrets are not leaked.
>
> b) does not reach attestation, so never gets the secrets, and therefore
> still fulfils the CC promise of not leaking secrets.
>
> So I personally feel this is OK, but it's worth writing up in your doc.
>

Yes, I will expand the storage section more on this.

>
> Back to the #VE handler, if I can find a way to inject malicious code into
> my guest, what you wrote in that paragraph as a justification for no
> in-depth security still seems like "not exactly defense in depth". I would
> just remove the sentence, audit and fuzz that code with the same energy as
> for anything else that could face bad input.

In fact most of our fuzzing hooks are inside #VE itself if you take a look on the
implementation. They just don’t cover things like the #VE info decoding (information
is provided by a trusted party - TDX module).

Best Regards,
Elena.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Tue, Jan 31, 2023 at 11:31:28AM +0000, Reshetova, Elena wrote:
> > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> > [...]
> > > > The big threat from most devices (including the thunderbolt
> > > > classes) is that they can DMA all over memory.  However, this isn't
> > > > really a threat in CC (well until PCI becomes able to do encrypted
> > > > DMA) because the device has specific unencrypted buffers set aside
> > > > for the expected DMA. If it writes outside that CC integrity will
> > > > detect it and if it reads outside that it gets unintelligible
> > > > ciphertext.  So we're left with the device trying to trick secrets
> > > > out of us by returning unexpected data.
> > >
> > > Yes, by supplying the input that hasn’t been expected. This is
> > > exactly the case we were trying to fix here for example:
> > > https://lore.kernel.org/all/20230119170633.40944-2-
> > alexander.shishkin@linux.intel.com/
> > > I do agree that this case is less severe when others where memory
> > > corruption/buffer overrun can happen, like here:
> > > https://lore.kernel.org/all/20230119135721.83345-6-
> > alexander.shishkin@linux.intel.com/
> > > But we are trying to fix all issues we see now (prioritizing the
> > > second ones though).
> >
> > I don't see how MSI table sizing is a bug in the category we've
> > defined. The very text of the changelog says "resulting in a kernel
> > page fault in pci_write_msg_msix()." which is a crash, which I thought
> > we were agreeing was out of scope for CC attacks?
>
> As I said this is an example of a crash and on the first look
> might not lead to the exploitable condition (albeit attackers are creative).
> But we noticed this one while fuzzing and it was common enough
> that prevented fuzzer going deeper into the virtio devices driver fuzzing.
> The core PCI/MSI doesn’t seem to have that many easily triggerable
> Other examples in virtio patchset are more severe.
>
> >
> > > >
> > > > If I set this as the problem, verifying device correct operation is
> > > > a possible solution (albeit hugely expensive) but there are likely
> > > > many other cheaper ways to defeat or detect a device trying to
> > > > trick us into revealing something.
> > >
> > > What do you have in mind here for the actual devices we need to
> > > enable for CC cases?
> >
> > Well, the most dangerous devices seem to be the virtio set a CC system
> > will rely on to boot up. After that, there are other ways (like SPDM)
> > to verify a real PCI device is on the other end of the transaction.
>
> Yes, it the future, but not yet. Other vendors will not necessary be
> using virtio devices at this point, so we will have non-virtio and not
> CC enabled devices that we want to securely add to the guest.
>
> >
> > > We have been using here a combination of extensive fuzzing and static
> > > code analysis.
> >
> > by fuzzing, I assume you mean fuzzing from the PCI configuration space?
> > Firstly I'm not so sure how useful a tool fuzzing is if we take Oopses
> > off the table because fuzzing primarily triggers those
>
> If you enable memory sanitizers you can detect more server conditions like
> out of bounds accesses and such. I think given that we have a way to
> verify that fuzzing is reaching the code locations we want it to reach, it
> can be pretty effective method to find at least low-hanging bugs. And these
> will be the bugs that most of the attackers will go after at the first place.
> But of course it is not a formal verification of any kind.
>
> so its hard to
> > see what else it could detect given the signal will be smothered by
> > oopses and secondly I think the PCI interface is likely the wrong place
> > to begin and you should probably begin on the virtio bus and the
> > hypervisor generated configuration space.
>
> This is exactly what we do. We don’t fuzz from the PCI config space,
> we supply inputs from the host/vmm via the legitimate interfaces that it can
> inject them to the guest: whenever guest requests a pci config space
> (which is controlled by host/hypervisor as you said) read operation,
> it gets input injected by the kafl fuzzer. Same for other interfaces that
> are under control of host/VMM (MSRs, port IO, MMIO, anything that goes
> via #VE handler in our case). When it comes to virtio, we employ
> two different fuzzing techniques: directly injecting kafl fuzz input when
> virtio core or virtio drivers gets the data received from the host
> (via injecting input in functions virtio16/32/64_to_cpu and others) and
> directly fuzzing DMA memory pages using kfx fuzzer.
> More information can be found in https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing
>
> Best Regards,
> Elena.

Hi Elena,

I think it might be a good idea to narrow down a configuration that *can*
reasonably be hardened to be suitable for confidential computing, before
proceeding with fuzzing. Eg. a lot of time was spent discussing PCI devices
in the context of virtualization, but what about taking PCI out of scope
completely by switching to virtio-mmio devices?

Jeremi
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On Tue, Jan 31, 2023 at 11:31:28AM +0000, Reshetova, Elena wrote:
> > > On Mon, 2023-01-30 at 07:42 +0000, Reshetova, Elena wrote:
> > > [...]
> > > > > The big threat from most devices (including the thunderbolt
> > > > > classes) is that they can DMA all over memory.  However, this isn't
> > > > > really a threat in CC (well until PCI becomes able to do encrypted
> > > > > DMA) because the device has specific unencrypted buffers set aside
> > > > > for the expected DMA. If it writes outside that CC integrity will
> > > > > detect it and if it reads outside that it gets unintelligible
> > > > > ciphertext.  So we're left with the device trying to trick secrets
> > > > > out of us by returning unexpected data.
> > > >
> > > > Yes, by supplying the input that hasn’t been expected. This is
> > > > exactly the case we were trying to fix here for example:
> > > > https://lore.kernel.org/all/20230119170633.40944-2-
> > > alexander.shishkin@linux.intel.com/
> > > > I do agree that this case is less severe when others where memory
> > > > corruption/buffer overrun can happen, like here:
> > > > https://lore.kernel.org/all/20230119135721.83345-6-
> > > alexander.shishkin@linux.intel.com/
> > > > But we are trying to fix all issues we see now (prioritizing the
> > > > second ones though).
> > >
> > > I don't see how MSI table sizing is a bug in the category we've
> > > defined. The very text of the changelog says "resulting in a kernel
> > > page fault in pci_write_msg_msix()." which is a crash, which I thought
> > > we were agreeing was out of scope for CC attacks?
> >
> > As I said this is an example of a crash and on the first look
> > might not lead to the exploitable condition (albeit attackers are creative).
> > But we noticed this one while fuzzing and it was common enough
> > that prevented fuzzer going deeper into the virtio devices driver fuzzing.
> > The core PCI/MSI doesn’t seem to have that many easily triggerable
> > Other examples in virtio patchset are more severe.
> >
> > >
> > > > >
> > > > > If I set this as the problem, verifying device correct operation is
> > > > > a possible solution (albeit hugely expensive) but there are likely
> > > > > many other cheaper ways to defeat or detect a device trying to
> > > > > trick us into revealing something.
> > > >
> > > > What do you have in mind here for the actual devices we need to
> > > > enable for CC cases?
> > >
> > > Well, the most dangerous devices seem to be the virtio set a CC system
> > > will rely on to boot up. After that, there are other ways (like SPDM)
> > > to verify a real PCI device is on the other end of the transaction.
> >
> > Yes, it the future, but not yet. Other vendors will not necessary be
> > using virtio devices at this point, so we will have non-virtio and not
> > CC enabled devices that we want to securely add to the guest.
> >
> > >
> > > > We have been using here a combination of extensive fuzzing and static
> > > > code analysis.
> > >
> > > by fuzzing, I assume you mean fuzzing from the PCI configuration space?
> > > Firstly I'm not so sure how useful a tool fuzzing is if we take Oopses
> > > off the table because fuzzing primarily triggers those
> >
> > If you enable memory sanitizers you can detect more server conditions like
> > out of bounds accesses and such. I think given that we have a way to
> > verify that fuzzing is reaching the code locations we want it to reach, it
> > can be pretty effective method to find at least low-hanging bugs. And these
> > will be the bugs that most of the attackers will go after at the first place.
> > But of course it is not a formal verification of any kind.
> >
> > so its hard to
> > > see what else it could detect given the signal will be smothered by
> > > oopses and secondly I think the PCI interface is likely the wrong place
> > > to begin and you should probably begin on the virtio bus and the
> > > hypervisor generated configuration space.
> >
> > This is exactly what we do. We don’t fuzz from the PCI config space,
> > we supply inputs from the host/vmm via the legitimate interfaces that it can
> > inject them to the guest: whenever guest requests a pci config space
> > (which is controlled by host/hypervisor as you said) read operation,
> > it gets input injected by the kafl fuzzer. Same for other interfaces that
> > are under control of host/VMM (MSRs, port IO, MMIO, anything that goes
> > via #VE handler in our case). When it comes to virtio, we employ
> > two different fuzzing techniques: directly injecting kafl fuzz input when
> > virtio core or virtio drivers gets the data received from the host
> > (via injecting input in functions virtio16/32/64_to_cpu and others) and
> > directly fuzzing DMA memory pages using kfx fuzzer.
> > More information can be found in https://intel.github.io/ccc-linux-guest-
> hardening-docs/tdx-guest-hardening.html#td-guest-fuzzing
> >
> > Best Regards,
> > Elena.
>
> Hi Elena,

Hi Jeremi,

>
> I think it might be a good idea to narrow down a configuration that *can*
> reasonably be hardened to be suitable for confidential computing, before
> proceeding with fuzzing. Eg. a lot of time was spent discussing PCI devices
> in the context of virtualization, but what about taking PCI out of scope
> completely by switching to virtio-mmio devices?

I agree that narrowing down is important and we spent a significant effort
in disabling various code we don’t need (including PCI code, like quirks,
early PCI, etc). The decision to use virtio over pci vs. mmio I believe comes
from performance and usage scenarios and we have to best we can with these
limitations.

Moreover, even if we could remove PCI for the virtio devices by
removing the transport dependency, this isn’t possible for other devices that we
know are used in some CC setups: not all CSPs are using virtio-based drivers,
so pretty quickly PCI comes back into hardening scope and we cannot just remove
it unfortunately.

Best Regards,
Elena.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Christophe de Dinechin (dinechin@redhat.com) wrote:
>
> On 2023-02-01 at 11:02 -05, "Michael S. Tsirkin" <mst@redhat.com> wrote...
> > On Wed, Feb 01, 2023 at 02:15:10PM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
> >>
> >>
> >> > On 1 Feb 2023, at 12:01, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >
> >> > On Wed, Feb 01, 2023 at 11:52:27AM +0100, Christophe de Dinechin Dupont de Dinechin wrote:
> >> >>
> >> >>
> >> >>> On 31 Jan 2023, at 18:39, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >>>
> >> >>> On Tue, Jan 31, 2023 at 04:14:29PM +0100, Christophe de Dinechin wrote:
> >> >>>> Finally, security considerations that apply irrespective of whether the
> >> >>>> platform is confidential or not are also outside of the scope of this
> >> >>>> document. This includes topics ranging from timing attacks to social
> >> >>>> engineering.
> >> >>>
> >> >>> Why are timing attacks by hypervisor on the guest out of scope?
> >> >>
> >> >> Good point.
> >> >>
> >> >> I was thinking that mitigation against timing attacks is the same
> >> >> irrespective of the source of the attack. However, because the HV
> >> >> controls CPU time allocation, there are presumably attacks that
> >> >> are made much easier through the HV. Those should be listed.
> >> >
> >> > Not just that, also because it can and does emulate some devices.
> >> > For example, are disk encryption systems protected against timing of
> >> > disk accesses?
> >> > This is why some people keep saying "forget about emulated devices, require
> >> > passthrough, include devices in the trust zone".
> >> >
> >> >>>
> >> >>>> </doc>
> >> >>>>
> >> >>>> Feel free to comment and reword at will ;-)
> >> >>>>
> >> >>>>
> >> >>>> 3/ PCI-as-a-threat: where does that come from
> >> >>>>
> >> >>>> Isn't there a fundamental difference, from a threat model perspective,
> >> >>>> between a bad actor, say a rogue sysadmin dumping the guest memory (which CC
> >> >>>> should defeat) and compromised software feeding us bad data? I think there
> >> >>>> is: at leats inside the TCB, we can detect bad software using measurements,
> >> >>>> and prevent it from running using attestation. In other words, we first
> >> >>>> check what we will run, then we run it. The security there is that we know
> >> >>>> what we are running. The trust we have in the software is from testing,
> >> >>>> reviewing or using it.
> >> >>>>
> >> >>>> This relies on a key aspect provided by TDX and SEV, which is that the
> >> >>>> software being measured is largely tamper-resistant thanks to memory
> >> >>>> encryption. In other words, after you have measured your guest software
> >> >>>> stack, the host or hypervisor cannot willy-nilly change it.
> >> >>>>
> >> >>>> So this brings me to the next question: is there any way we could offer the
> >> >>>> same kind of service for KVM and qemu? The measurement part seems relatively
> >> >>>> easy. Thetamper-resistant part, on the other hand, seems quite difficult to
> >> >>>> me. But maybe someone else will have a brilliant idea?
> >> >>>>
> >> >>>> So I'm asking the question, because if you could somehow prove to the guest
> >> >>>> not only that it's running the right guest stack (as we can do today) but
> >> >>>> also a known host/KVM/hypervisor stack, we would also switch the potential
> >> >>>> issues with PCI, MSRs and the like from "malicious" to merely "bogus", and
> >> >>>> this is something which is evidently easier to deal with.
> >> >>>
> >> >>> Agree absolutely that's much easier.
> >> >>>
> >> >>>> I briefly discussed this with James, and he pointed out two interesting
> >> >>>> aspects of that question:
> >> >>>>
> >> >>>> 1/ In the CC world, we don't really care about *virtual* PCI devices. We
> >> >>>> care about either virtio devices, or physical ones being passed through
> >> >>>> to the guest. Let's assume physical ones can be trusted, see above.
> >> >>>> That leaves virtio devices. How much damage can a malicious virtio device
> >> >>>> do to the guest kernel, and can this lead to secrets being leaked?
> >> >>>>
> >> >>>> 2/ He was not as negative as I anticipated on the possibility of somehow
> >> >>>> being able to prevent tampering of the guest. One example he mentioned is
> >> >>>> a research paper [1] about running the hypervisor itself inside an
> >> >>>> "outer" TCB, using VMPLs on AMD. Maybe something similar can be achieved
> >> >>>> with TDX using secure enclaves or some other mechanism?
> >> >>>
> >> >>> Or even just secureboot based root of trust?
> >> >>
> >> >> You mean host secureboot? Or guest?
> >> >>
> >> >> If it’s host, then the problem is detecting malicious tampering with
> >> >> host code (whether it’s kernel or hypervisor).
> >> >
> >> > Host. Lots of existing systems do this. As an extreme boot a RO disk,
> >> > limit which packages are allowed.
> >>
> >> Is that provable to the guest?
> >>
> >> Consider a cloud provider doing that: how do they prove to their guest:
> >>
> >> a) What firmware, kernel and kvm they run
> >>
> >> b) That what they booted cannot be maliciouly modified, e.g. by a rogue
> >> device driver installed by a rogue sysadmin
> >>
> >> My understanding is that SecureBoot is only intended to prevent non-verified
> >> operating systems from booting. So the proof is given to the cloud provider,
> >> and the proof is that the system boots successfully.
> >
> > I think I should have said measured boot not secure boot.
>
> The problem again is how you prove to the guest that you are not lying?
>
> We know how to do that from a guest [1], but you will note that in the
> normal process, a trusted hardware component (e.g. the PSP for AMD SEV)
> proves the validity of the measurements of the TCB by encrypting it with an
> attestation signing key derived from some chip-unique secret. For AMD, this
> is called the VCEK, and TDX has something similar. In the case of SEV, this
> goes through firmware, and you have to tell the firmware each time you
> insert data in the original TCB (using SNP_LAUNCH_UPDATE). This is all tied
> to a VM execution context. I do not believe there is any provision to do the
> same thing to measure host data. And again, it would be somewhat pointless
> if there isn't also a mechanism to ensure the host data is not changed after
> the measurement.
>
> Now, I don't think it would be super-difficult to add a firmware service
> that would let the host do some kind of equivalent to PVALIDATE, setting
> some physical pages aside that then get measured and become inaccessible to
> the host. The PSP or similar could then integrate these measurements as part
> of the TCB, and the fact that the pages were "transferred" to this special
> invariant block would ensure the guests that the code will not change after
> being measured.
>
> I am not aware that such a mechanism exists on any of the existing CC
> platforms. Please feel free to enlighten me if I'm wrong.
>
> [1] https://www.redhat.com/en/blog/understanding-confidential-containers-attestation-flow
> >
> >>
> >> After that, I think all bets are off. SecureBoot does little AFAICT
> >> to prevent malicious modifications of the running system by someone with
> >> root access, including deliberately loading a malicious kvm-zilog.ko
> >
> > So disable module loading then or don't allow root access?
>
> Who would do that?
>
> The problem is that we have a host and a tenant, and the tenant does not
> trust the host in principle. So it is not sufficient for the host to disable
> module loading or carefully control root access. It is also necessary to
> prove to the tenant(s) that this was done.
>
> >
> >>
> >> It does not mean it cannot be done, just that I don’t think we
> >> have the tools at the moment.
> >
> > Phones, chromebooks do this all the time ...
>
> Indeed, but there, this is to prove to the phone's real owner (which,
> surprise, is not the naive person who thought they'd get some kind of
> ownership by buying the phone) that the software running on the phone has
> not been replaced by some horribly jailbreaked goo.
>
> In other words, the user of the phone gets no proof whatsoever of anything,
> except that the phone appears to work. This is somewhat the situation in the
> cloud today: the owners of the hardware get all sorts of useful checks, from
> SecureBoot to error-correction for memory or I/O devices. However, someone
> running in a VM on the cloud gets none of that, just like the user of your
> phone.

Assuming you do a measured boot, the host OS and firmware is measured into the host TPM;
people have thought in the past about triggering attestations of the
host from the guest; then you could have something external attest the
host and only release keys to the guests disks if the attestation is
correct; or a key for the guests disks held in the hosts TPM.

Dave

> --
> Cheers,
> Christophe de Dinechin (https://c3d.github.io)
> Theory of Incomplete Measurements (https://c3d.github.io/TIM)
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On 1/25/23 6:28 AM, Reshetova, Elena wrote:
> Hi Greg,
>
> You mentioned couple of times (last time in this recent thread:
> https://lore.kernel.org/all/Y80WtujnO7kfduAZ@kroah.com/) that we ought to start
> discussing the updated threat model for kernel, so this email is a start in this direction.
>
> (Note: I tried to include relevant people from different companies, as well as linux-coco
> mailing list, but I hope everyone can help by including additional people as needed).
>
> As we have shared before in various lkml threads/conference presentations
> ([1], [2], [3] and many others), for the Confidential Computing guest kernel, we have a
> change in the threat model where guest kernel doesn’t anymore trust the hypervisor.
> This is a big change in the threat model and requires both careful assessment of the
> new (hypervisor <-> guest kernel) attack surface, as well as careful design of mitigations
> and security validation techniques. This is the activity that we have started back at Intel
> and the current status can be found in
>
> 1) Threat model and potential mitigations:
> https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html>
> 2) One of the described in the above doc mitigations is "hardening of the enabled
> code". What we mean by this, as well as techniques that are being used are
> described in this document: > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html

Regarding driver hardening, does anyone have a better filtering idea?

The current solution assumes the kernel command line is trusted and cannot
avoid the __init() functions that waste memory. I don't know if the
__exit() routines of the filtered devices are called, but it doesn't sound
much better to allocate memory and free it right after.

>
> 3) All the tools are open-source and everyone can start using them right away even
> without any special HW (readme has description of what is needed).
> Tools and documentation is here:
> https://github.com/intel/ccc-linux-guest-hardening
>
> 4) all not yet upstreamed linux patches (that we are slowly submitting) can be found
> here: https://github.com/intel/tdx/commits/guest-next
>
> So, my main question before we start to argue about the threat model, mitigations, etc,
> is what is the good way to get this reviewed to make sure everyone is aligned?
> There are a lot of angles and details, so what is the most efficient method?
> Should I split the threat model from https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html
> into logical pieces and start submitting it to mailing list for discussion one by one?
> Any other methods?
>
> The original plan we had in mind is to start discussing the relevant pieces when submitting the code,
> i.e. when submitting the device filter patches, we will include problem statement, threat model link,
> data, alternatives considered, etc.
>
> Best Regards,
> Elena.
>
> [1] https://lore.kernel.org/all/20210804174322.2898409-1-sathyanarayanan.kuppuswamy@linux.intel.com/
> [2] https://lpc.events/event/16/contributions/1328/
> [3] https://events.linuxfoundation.org/archive/2022/linux-security-summit-north-america/program/schedule/

Thanks,
Carlos
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Mon, Feb 06, 2023 at 06:27:48PM -0600, Carlos Bilbao wrote:
> On 1/25/23 6:28 AM, Reshetova, Elena wrote:
> > 2) One of the described in the above doc mitigations is "hardening of the enabled
> > code". What we mean by this, as well as techniques that are being used are
> > described in this document: > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html
>
> Regarding driver hardening, does anyone have a better filtering idea?
>
> The current solution assumes the kernel command line is trusted and cannot
> avoid the __init() functions that waste memory.

That is two different things (command line trust and __init()
functions), so I do not understand the relationship at all here. Please
explain it better.

Also, why would an __init() function waste memory? Memory usage isn't
an issue here, right?

> I don't know if the
> __exit() routines of the filtered devices are called, but it doesn't sound
> much better to allocate memory and free it right after.

What device has a __exit() function? Drivers have module init/exit
functions but they should do nothing but register themselves with the
relevant busses and they are only loaded if the device is found in the
system.

And what exactly is incorrect about allocating memory and then freeing
it when not needed?

So again, I don't understand the question, sorry.

thanks,

greg k-h
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On 2/7/23 00:03, Greg Kroah-Hartman wrote:

> On Mon, Feb 06, 2023 at 06:27:48PM -0600, Carlos Bilbao wrote:
>> On 1/25/23 6:28 AM, Reshetova, Elena wrote:
>>> 2) One of the described in the above doc mitigations is "hardening of the enabled
>>> code". What we mean by this, as well as techniques that are being used are
>>> described in this document: > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html
>> Regarding driver hardening, does anyone have a better filtering idea?
>>
>> The current solution assumes the kernel command line is trusted and cannot
>> avoid the __init() functions that waste memory.
> That is two different things (command line trust and __init()
> functions), so I do not understand the relationship at all here. Please
> explain it better.


No relation other than it would be nice to have a solution that does not
require kernel command line and that prevents __init()s.


>
> Also, why would an __init() function waste memory? Memory usage isn't
> an issue here, right?
>
>> I don't know if the
>> __exit() routines of the filtered devices are called, but it doesn't sound
>> much better to allocate memory and free it right after.
> What device has a __exit() function? Drivers have module init/exit
> functions but they should do nothing but register themselves with the
> relevant busses and they are only loaded if the device is found in the
> system.
>
> And what exactly is incorrect about allocating memory and then freeing
> it when not needed?


Currently proposed device filtering does not stop the __init() functions
from these drivers to be called. Whatever memory is allocated by
blacklisted drivers is wasted because those drivers cannot ever be used.
Sure, memory can be allocated and freed as soon as it is no longer needed,
but these memory would never be needed.


More pressing concern than wasted memory, which may be unimportant, there's
the issue of what are those driver init functions doing. For example, as
part of device setup, MMIO regs may be involved, which we cannot trust. It's
a lot more code to worry about from a CoCo perspective.


>
> So again, I don't understand the question, sorry.


Given the limitations of current approach, does anyone have any other ideas
for filtering devices prior to their initialization?


>
> thanks,
>
> greg k-h


Thanks,
Carlos
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Tue, Feb 07, 2023 at 01:53:34PM -0600, Carlos Bilbao wrote:
> Given the limitations of current approach, does anyone have any other ideas
> for filtering devices prior to their initialization?

/me mumbles ... something something ... bpf ...

--
MST
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Tue, Feb 07, 2023 at 01:53:34PM -0600, Carlos Bilbao wrote:
> Currently proposed device filtering does not stop the __init() functions
> from these drivers to be called. Whatever memory is allocated by
> blacklisted drivers is wasted because those drivers cannot ever be used.
> Sure, memory can be allocated and freed as soon as it is no longer needed,
> but these memory would never be needed.
>
>
> More pressing concern than wasted memory, which may be unimportant, there's
> the issue of what are those driver init functions doing. For example, as
> part of device setup, MMIO regs may be involved, which we cannot trust. It's
> a lot more code to worry about from a CoCo perspective.

Why not just simply compile a special CoCo kernel that doesn't have
any drivers that you don't trust. Now, the distros may be pushing
back in that they don't want to support a separate kernel image. But
this apparently really a pain allocation negotiation, isn't it? Intel
and other companies want to make $$$$$ with CoCo.

In order to make $$$$$, you need to push the costs onto various
different players in the ecosystem. This is cleverly disguised as
taking current perfectly acceptable design paradigm when the trust
boundary is in the traditional location, and causing all of the
assumptions which you have broken as "bugs" that must be fixed by
upstream developers.

But another place to push the costs is to the distro vendors, who
might need to maintain a separate CoCo kernel that is differently
configured. Now, Red Hat and company will no doubt push back. But
the usptream development community will also push back if you try to
dump too much work on *us*.

- Ted
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Tue, Feb 07, 2023 at 01:53:34PM -0600, Carlos Bilbao wrote:
> On 2/7/23 00:03, Greg Kroah-Hartman wrote:
>
> > On Mon, Feb 06, 2023 at 06:27:48PM -0600, Carlos Bilbao wrote:
> > > On 1/25/23 6:28 AM, Reshetova, Elena wrote:
> > > > 2) One of the described in the above doc mitigations is "hardening of the enabled
> > > > code". What we mean by this, as well as techniques that are being used are
> > > > described in this document: > https://intel.github.io/ccc-linux-guest-hardening-docs/tdx-guest-hardening.html
> > > Regarding driver hardening, does anyone have a better filtering idea?
> > >
> > > The current solution assumes the kernel command line is trusted and cannot
> > > avoid the __init() functions that waste memory.
> > That is two different things (command line trust and __init()
> > functions), so I do not understand the relationship at all here. Please
> > explain it better.
>
>
> No relation other than it would be nice to have a solution that does not
> require kernel command line and that prevents __init()s.

Again, __init() has nothing to do with the kernel command line so I do
not understand the relationship here. Have a specific example?

> > Also, why would an __init() function waste memory? Memory usage isn't
> > an issue here, right?
> >
> > > I don't know if the
> > > __exit() routines of the filtered devices are called, but it doesn't sound
> > > much better to allocate memory and free it right after.
> > What device has a __exit() function? Drivers have module init/exit
> > functions but they should do nothing but register themselves with the
> > relevant busses and they are only loaded if the device is found in the
> > system.
> >
> > And what exactly is incorrect about allocating memory and then freeing
> > it when not needed?
>
>
> Currently proposed device filtering does not stop the __init() functions
> from these drivers to be called. Whatever memory is allocated by
> blacklisted drivers is wasted because those drivers cannot ever be used.
> Sure, memory can be allocated and freed as soon as it is no longer needed,
> but these memory would never be needed.

Drivers are never even loaded if the hardware is not present, and a
driver init function should do nothing anyway if it is written properly,
so again, I do not understand what you are referring to here.

Again, a real example might help explain your concerns, pointers to the
code?

> More pressing concern than wasted memory, which may be unimportant, there's
> the issue of what are those driver init functions doing. For example, as
> part of device setup, MMIO regs may be involved, which we cannot trust. It's
> a lot more code to worry about from a CoCo perspective.

Again, specific example?

And if you don't want a driver to be loaded, don't build it into your
kernel as Ted said. Or better yet, use the in-kernel functionality to
prevent drivers from ever loading or binding to a device until you tell
it from userspace that it is safe to do so.

So I don't think this is a real issue unless you have pointers to code
you are concerned about.

> > So again, I don't understand the question, sorry.
>
> Given the limitations of current approach, does anyone have any other ideas
> for filtering devices prior to their initialization?

What is wrong with the functionality we have today for this very thing?
Does it not work properly for you? If so, why not, for what devices and
drivers and busses do you still have problems with?

thanks,

greg k-h
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Tue, Feb 07, 2023 at 08:51:56PM -0500, Theodore Ts'o wrote:
> Why not just simply compile a special CoCo kernel that doesn't have
> any drivers that you don't trust.

Or at least, start with that? You can then gradually expand that until
some config is both acceptable to distros and seems sufficiently trusty
to the CoCo project. Lots of kernel features got upstreamed this way.
Requirement to have an arbitrary config satisfy CoCo seems like a very
high bar to clear.

--
MST
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> No relation other than it would be nice to have a solution that does not
>require kernel command line and that prevents __init()s.

For __inits see below. For the command line, it is pretty straightforward to
measure it and attest its integrity later: we need to do it for other parts
anyhow as acpi tables, etc. So I don’t see why we need to do smth special
about it? In any case it is indeed very different from driver discussion and
goes into "what should be covered by attestation for CC guest" topic.

> More pressing concern than wasted memory, which may be unimportant, there's
> the issue of what are those driver init functions doing. For example, as
> part of device setup, MMIO regs may be involved, which we cannot trust. It's
> a lot more code to worry about from a CoCo perspective.

Yes, we have seen such cases in kernel where drivers or modules would access
MMIO or pci config space already in their __init() functions.
Some concrete examples from modules and drivers (there are more):

intel_iommu_init() -> init_dmars() -> check_tylersburg_isoch()
skx_init() -> get_all_munits()
skx_init() -> skx_register_mci() -> skx_get_dimm_config()
intel_rng_mod_init() -> intel_init_hw_struct()
i10nm_exit()->enable_retry_rd_err_log ->__enable_retry_rd_err_log()

However, this is how we address this from security point of view:

1. In order for a MMIO read to obtain data from a untrusted host, the memory
range must be shared with the host to begin with. We enforce that
all MMIO mappings are private by default to the CC guest unless it is
explicitly shared (and we do automatically share for the authorized devices
and their drivers from the allow list). This removes a problem of an
"unexpected MMIO region interaction"
(modulo acpi AML operation regions that we do have to share also unfortunately,
but acpi is a whole different difficult case on its own).

2. For pci config space, we limit any interaction with pci config
space only to authorized devices and their drivers (that are in the allow list).
As a result device drivers outside of the allow list are not able to access pci
config space even in their __init routines. It is done by setting the
to_pci_dev(dev)->error_state = pci_channel_io_perm_failure for non-authorized
devices.

So, even if host made the driver __init function to run
(by faking the device on the host side), it should not be able to supply any
malicious data to it via MMIO or pci config space, so running their __init
routines should be ok from security point of view or does anyone see any
holes here?

Best Regards,
Elena.
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On Tue, Feb 07, 2023 at 08:51:56PM -0500, Theodore Ts'o wrote:
> > Why not just simply compile a special CoCo kernel that doesn't have
> > any drivers that you don't trust.

Aside from complexity and scalability management of such a config that has
to change with every kernel release, what about the build-in platform drivers?
I am not a driver expert here but as far as I understand they cannot be disabled
via config. Please correct if this statement is wrong.

> In order to make $$$$$, you need to push the costs onto various
> different players in the ecosystem. This is cleverly disguised as
> taking current perfectly acceptable design paradigm when the trust
> boundary is in the traditional location, and causing all of the
> assumptions which you have broken as "bugs" that must be fixed by
> upstream developers.

The CC threat model does change the traditional linux trust boundary regardless of
what mitigations are used (kernel config vs. runtime filtering). Because for the
drivers that CoCo guest happens to need, there is no way to fix this problem by
either of these mechanisms (we cannot disable the code that we need), unless somebody
writes a totally new set of coco specific drivers (who needs another set of
CoCo specific virtio drivers in the kernel?).

So, if the path is to be able to use existing driver kernel code, then we need:

1. these selective CoCo guest required drivers (small set) needs to be hardened
(or whatever word people prefer to use here), which only means that in
the presence of malicious host/hypervisor that can manipulate pci config space,
port IO and MMIO, these drivers should not expose CC guest memory
confidentiality or integrity (including via privilege escalation into CC guest).
Please note that this only applies to a small set (in tdx virtio setup we have less
than 10 of them) of drivers and does not present invasive changes to the kernel
code. There is also an additional core pci/msi code that is involved with discovery
and configuration of these drivers, this code also falls into the category we need to
make robust.

2. rest of non-needed drivers must be disabled. Here we can argue about what
is the correct method of doing this and who should bare the costs of enforcing it.
But from pure security point of view: the method that is simple and clear, that
requires as little maintenance as possible usually has the biggest chance of
enforcing security.
And given that we already have the concept of authorized devices in Linux,
does this method really brings so much additional complexity to the kernel?
But hard to argue here without the code: we need to submit the filter proposal first
(under internal review still).

Best Regards,
Elena.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
>
>
> > On Tue, Feb 07, 2023 at 08:51:56PM -0500, Theodore Ts'o wrote:
> > > Why not just simply compile a special CoCo kernel that doesn't have
> > > any drivers that you don't trust.
>
> Aside from complexity and scalability management of such a config that has
> to change with every kernel release, what about the build-in platform drivers?

What do you mean by "built in platform drivers"? You are creating a
.config for a specific cloud platform, just only select the drivers for
that exact configuration and you should be fine.

And as for the management of such a config, distros do this just fine,
why can't you? It's not that hard to manage properly.

> I am not a driver expert here but as far as I understand they cannot be disabled
> via config. Please correct if this statement is wrong.

Again, which specific drivers are you referring to? And why are they a
problem?

> > In order to make $$$$$, you need to push the costs onto various
> > different players in the ecosystem. This is cleverly disguised as
> > taking current perfectly acceptable design paradigm when the trust
> > boundary is in the traditional location, and causing all of the
> > assumptions which you have broken as "bugs" that must be fixed by
> > upstream developers.
>
> The CC threat model does change the traditional linux trust boundary regardless of
> what mitigations are used (kernel config vs. runtime filtering). Because for the
> drivers that CoCo guest happens to need, there is no way to fix this problem by
> either of these mechanisms (we cannot disable the code that we need), unless somebody
> writes a totally new set of coco specific drivers (who needs another set of
> CoCo specific virtio drivers in the kernel?).

It sounds like you want such a set of drivers, why not just write them?
We have zillions of drivers already, it's not hard to write new ones, as
it really sounds like that's exactly what you want to have happen here
in the end as you don't trust the existing set of drivers you are using
for some reason.

> So, if the path is to be able to use existing driver kernel code, then we need:

Wait, again, why? Why not just have your own? That should be the
simplest thing overall. What's wrong with that?

> 1. these selective CoCo guest required drivers (small set) needs to be hardened
> (or whatever word people prefer to use here), which only means that in
> the presence of malicious host/hypervisor that can manipulate pci config space,
> port IO and MMIO, these drivers should not expose CC guest memory
> confidentiality or integrity (including via privilege escalation into CC guest).

Again, stop it please with the "hardened" nonsense, that means nothing.
Either the driver has bugs, or it doesn't. I welcome you to prove it
doesn't :)

> Please note that this only applies to a small set (in tdx virtio setup we have less
> than 10 of them) of drivers and does not present invasive changes to the kernel
> code. There is also an additional core pci/msi code that is involved with discovery
> and configuration of these drivers, this code also falls into the category we need to
> make robust.

Again, why wouldn't we all want "robust" drivers? This is not anything
new here, all you are somehow saying is that you are changing the thread
model that the kernel "must" support. And for that, you need to then
change the driver code to support that.

So again, why not just have your own drivers and driver subsystem that
meets your new requirements? Let's see what that looks like and if
there even is any overlap between that and the existing kernel driver
subsystems.

> 2. rest of non-needed drivers must be disabled. Here we can argue about what
> is the correct method of doing this and who should bare the costs of enforcing it.

You bare that cost. Or you get a distro to do that. That's not up to
us in the kernel community, sorry, we give you the option to do that if
you want to, that's all that we can do.

> But from pure security point of view: the method that is simple and clear, that
> requires as little maintenance as possible usually has the biggest chance of
> enforcing security.

Again, that's up to your configuration management. Please do it, tell
us what doesn't work and send changes if you find better ways to do it.
Again, this is all there for you to do today, nothing for us to have to
do for you.

> And given that we already have the concept of authorized devices in Linux,
> does this method really brings so much additional complexity to the kernel?

No idea, you tell us! :)

Again, I recommend you just having your own drivers, that will allow you
to show us all exactly what you mean by the terms you keep using. Why
not just submit that for review instead?

good luck!

greg k-h
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
> Because for the
> drivers that CoCo guest happens to need, there is no way to fix this problem by
> either of these mechanisms (we cannot disable the code that we need), unless somebody
> writes a totally new set of coco specific drivers (who needs another set of
> CoCo specific virtio drivers in the kernel?).

I think it's more about pci and all that jazz, no?
As a virtio maintainer I applied patches adding validation and intend to
do so in the future simply because for virtio specifically people
build all kind of weird setups out of software and so validating
everything is a good idea.

--
MST
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 08, 2023 at 10:16:14AM +0000, Reshetova, Elena wrote:
> > No relation other than it would be nice to have a solution that does not
> >require kernel command line and that prevents __init()s.
>
> For __inits see below. For the command line, it is pretty straightforward to
> measure it and attest its integrity later: we need to do it for other parts
> anyhow as acpi tables, etc. So I don’t see why we need to do smth special
> about it? In any case it is indeed very different from driver discussion and
> goes into "what should be covered by attestation for CC guest" topic.
>
> > More pressing concern than wasted memory, which may be unimportant, there's
> > the issue of what are those driver init functions doing. For example, as
> > part of device setup, MMIO regs may be involved, which we cannot trust. It's
> > a lot more code to worry about from a CoCo perspective.
>
> Yes, we have seen such cases in kernel where drivers or modules would access
> MMIO or pci config space already in their __init() functions.
> Some concrete examples from modules and drivers (there are more):
>
> intel_iommu_init() -> init_dmars() -> check_tylersburg_isoch()

An iommu driver. So maybe you want to use virtio iommu then?

> skx_init() -> get_all_munits()
> skx_init() -> skx_register_mci() -> skx_get_dimm_config()

A memory controller driver, right? And you need it in a VM? why?

> intel_rng_mod_init() -> intel_init_hw_struct()

And virtio iommu?

> i10nm_exit()->enable_retry_rd_err_log ->__enable_retry_rd_err_log()

Another memory controller driver? Can we decide on a single one?

> However, this is how we address this from security point of view:
>
> 1. In order for a MMIO read to obtain data from a untrusted host, the memory
> range must be shared with the host to begin with. We enforce that
> all MMIO mappings are private by default to the CC guest unless it is
> explicitly shared (and we do automatically share for the authorized devices
> and their drivers from the allow list). This removes a problem of an
> "unexpected MMIO region interaction"
> (modulo acpi AML operation regions that we do have to share also unfortunately,
> but acpi is a whole different difficult case on its own).

How does it remove the problem? You basically get trash from host, no?
But it seems that whether said trash is exploitable will really depend
on how it's used, e.g. if it's an 8 bit value host can just scan all
options in a couple of hundred attempts. What did I miss?


> 2. For pci config space, we limit any interaction with pci config
> space only to authorized devices and their drivers (that are in the allow list).
> As a result device drivers outside of the allow list are not able to access pci
> config space even in their __init routines. It is done by setting the
> to_pci_dev(dev)->error_state = pci_channel_io_perm_failure for non-authorized
> devices.

This seems to be assuming drivers check return code from pci config
space accesses, right? I doubt all drivers do though. Even if they do
that's unlikely to be a well tested path, right?

> So, even if host made the driver __init function to run
> (by faking the device on the host side), it should not be able to supply any
> malicious data to it via MMIO or pci config space, so running their __init
> routines should be ok from security point of view or does anyone see any
> holes here?
>
> Best Regards,
> Elena.

See above. I am not sure the argument that the bugs are unexploitable
sits well with the idea that all this effort is improving code quality.

--
MST
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
> 2. rest of non-needed drivers must be disabled. Here we can argue about what
> is the correct method of doing this and who should bare the costs of enforcing it.
> But from pure security point of view: the method that is simple and clear, that
> requires as little maintenance as possible usually has the biggest chance of
> enforcing security.
> And given that we already have the concept of authorized devices in Linux,
> does this method really brings so much additional complexity to the kernel?
> But hard to argue here without the code: we need to submit the filter proposal first
> (under internal review still).

I think the problem here is that we've had a lot of painful experience
where fuzzing produces a lot of false positives which then
security-types then insist that all kernel developers must fix so that
we can see the "important" security issues from the false positives.

So "as little maintenance as possible" and fuzzing have not
necessarily gone together. It might be less maintenance costs for
*you*, but it's not necessarily less maintenance work for *us*. I've
seen Red Hat principal engineers take completely bogus issues and
raise them to CVE "high" priority levels, when it was nothing like
that, thus forcing distro and data center people to be forced to do
global pushes to production because it's easier than trying to explain
to FEDramp auditors why the CVE SS is bogus --- and every single
unnecessary push to production has its own costs and risks.

I've seen the constant load of syzbot false positives that generate
noise in my inbox and in bug tracking issues assigned to me at $WORK.
I've seen the false positives generated by DEPT, which is why I've
pushed back on it. So if you are going to insist on fuzzing all of
the PCI config space, and treat them all as "bugs", there is going to
be huge pushback.

Even if the "fixes" are minor, and don't have any massive impact on
memory used or cache line misses or code/maintainability bloat, the
fact that we treat them as P3 quality of implementation issues, and
*you* treat them as P1 security bugs that must be fixed Now! Now!
Now! is going to cause friction. (This is especially true since CVE
SS scores are unidimentional, and what might be high security --- or
at least embarassing --- for CoCo, might be completely innocuous QOI
bugs for the rest of the world.)

So it might be that a simple, separate, kerenl config is going to be
the massively simpler way to go, instead of insisting that all PCI
device drivers must be fuzzed and be made CoCo safe, even if they will
never be used in a CoCo context. Again, please be cognizant about the
costs that CoCo may be imposing and pushing onto the rest of the
ecosystem.

Cheers,

- Ted
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On 2023-02-08 at 11:58 +01, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote...
> On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
>>
>> The CC threat model does change the traditional linux trust boundary regardless of
>> what mitigations are used (kernel config vs. runtime filtering). Because for the
>> drivers that CoCo guest happens to need, there is no way to fix this problem by
>> either of these mechanisms (we cannot disable the code that we need), unless somebody
>> writes a totally new set of coco specific drivers (who needs another set of
>> CoCo specific virtio drivers in the kernel?).
>
> It sounds like you want such a set of drivers, why not just write them?
> We have zillions of drivers already, it's not hard to write new ones, as
> it really sounds like that's exactly what you want to have happen here
> in the end as you don't trust the existing set of drivers you are using
> for some reason.

In the CC approach, the hypervisor is considered as hostile. The rest of the
system is not changed much. If we pass-through some existing NIC, we'd
rather use the existing driver for that NIC rather than reinvent
it. However, we need to also consider the possibility that someone
maliciously replaced the actual NIC with a cleverly crafted software
emulator designed to cause the driver to leak confidential data.


>> So, if the path is to be able to use existing driver kernel code, then we need:
>
> Wait, again, why? Why not just have your own? That should be the
> simplest thing overall. What's wrong with that?

That would require duplication for the majority of hardware drivers.



>> 1. these selective CoCo guest required drivers (small set) needs to be hardened
>> (or whatever word people prefer to use here), which only means that in
>> the presence of malicious host/hypervisor that can manipulate pci config space,
>> port IO and MMIO, these drivers should not expose CC guest memory
>> confidentiality or integrity (including via privilege escalation into CC guest).
>
> Again, stop it please with the "hardened" nonsense, that means nothing.
> Either the driver has bugs, or it doesn't. I welcome you to prove it
> doesn't :)

In a non-CC scenario, a driver is correct if, among other things, it does
not leak kernel data to user space. However, it assumes that PCI devices are
working correctly and according to spec.

In a CC scenario, an additional condition for correctness is that it must
not leak data from the trusted environment to the host. It assumes that a
_virtual_ PCI device can be implemented on the host side to cause an
existing driver to leak secrets to the host.

It is this additional condition that we are talking about.

Think of this as a bit similar to the introduction of IOMMUs, which meant
there was a new condition impacting _the entire kernel_ that you had to make
sure your DMA operations and IOMMU were in agreement. Here, it is a bit of a
similar situation: CC forbids some specific operations the same way an IOMMU
does, except instead of stray DMAs, it's stray accesses from the host.

Note that, as James Bottomley pointed out, a crash is not seen as a failure
of the CC model, unless it leads to a subsequent leak of confidential data.
Denial of service, through crash or otherwise, is so easy to do from host or
hypervisor side that it is entirely out of scope.


>
>> Please note that this only applies to a small set (in tdx virtio setup we have less
>> than 10 of them) of drivers and does not present invasive changes to the kernel
>> code. There is also an additional core pci/msi code that is involved with discovery
>> and configuration of these drivers, this code also falls into the category we need to
>> make robust.
>
> Again, why wouldn't we all want "robust" drivers? This is not anything
> new here,

What is new is that CC requires driver to be "robust" against a new kind of
attack "from below" (i.e. from the [virtual] hardware side).

> all you are somehow saying is that you are changing the thread
> model that the kernel "must" support. And for that, you need to then
> change the driver code to support that.

What is being argued is that CC is not robust unless we block host-side
attacks that can cause the guest to leak data to the host.

>
> So again, why not just have your own drivers and driver subsystem that
> meets your new requirements? Let's see what that looks like and if
> there even is any overlap between that and the existing kernel driver
> subsystems.

Would a "CC-aware PCI" subsystem fit your definition?

>
>> 2. rest of non-needed drivers must be disabled. Here we can argue about what
>> is the correct method of doing this and who should bare the costs of enforcing it.
>
> You bare that cost.

I believe the CC community understands that.

The first step before introducing modifications in the drivers is getting an
understanding of why we think that CC introduces a new condition for
robustness.

We will not magically turn all drivers into CC-safe drivers. It will take a
lot of time, and the patches are likely to come from the CC community. At
that stage, though, the question is: "do you understand the problem we are
trying to solve?". I hope that my IOMMU analogy above helps.


> Or you get a distro to do that.

Best a distro can do is have a minified kernel tuned for CC use-cases, or
enabling an hypothetical CONFIG_COCO_SAFETY configuration.
A distro cannot decide what work goes behing CONFIG_COCO_SAFETY.


> That's not up to us in the kernel community, sorry, we give you the option
> to do that if you want to, that's all that we can do.

I hope that the explanations above will help you change your mind on that
statement. That cannot be a config-only or custom-drivers-only solution.
(or maybe you can convince us it can ;-)

>
>> But from pure security point of view: the method that is simple and clear, that
>> requires as little maintenance as possible usually has the biggest chance of
>> enforcing security.
>
> Again, that's up to your configuration management. Please do it, tell
> us what doesn't work and send changes if you find better ways to do it.
> Again, this is all there for you to do today, nothing for us to have to
> do for you.
>
>> And given that we already have the concept of authorized devices in Linux,
>> does this method really brings so much additional complexity to the kernel?
>
> No idea, you tell us! :)
>
> Again, I recommend you just having your own drivers, that will allow you
> to show us all exactly what you mean by the terms you keep using. Why
> not just submit that for review instead?
>
> good luck!
>
> greg k-h


--
Cheers,
Christophe de Dinechin (https://c3d.github.io)
Theory of Incomplete Measurements (https://c3d.github.io/TIM)
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 08, 2023 at 05:19:37PM +0100, Christophe de Dinechin wrote:
>
> On 2023-02-08 at 11:58 +01, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote...
> > On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
> >>
> >> The CC threat model does change the traditional linux trust boundary regardless of
> >> what mitigations are used (kernel config vs. runtime filtering). Because for the
> >> drivers that CoCo guest happens to need, there is no way to fix this problem by
> >> either of these mechanisms (we cannot disable the code that we need), unless somebody
> >> writes a totally new set of coco specific drivers (who needs another set of
> >> CoCo specific virtio drivers in the kernel?).
> >
> > It sounds like you want such a set of drivers, why not just write them?
> > We have zillions of drivers already, it's not hard to write new ones, as
> > it really sounds like that's exactly what you want to have happen here
> > in the end as you don't trust the existing set of drivers you are using
> > for some reason.
>
> In the CC approach, the hypervisor is considered as hostile. The rest of the
> system is not changed much. If we pass-through some existing NIC, we'd
> rather use the existing driver for that NIC rather than reinvent
> it.

But that is not what was proposed. I thought this was all about virtio.
If not, again, someone needs to write a solid definition.

So if you want to use existing drivers, wonderful, please work on making
the needed changes to meet your goals to all of them. I was trying to
give you a simple way out :)

> >> 1. these selective CoCo guest required drivers (small set) needs to be hardened
> >> (or whatever word people prefer to use here), which only means that in
> >> the presence of malicious host/hypervisor that can manipulate pci config space,
> >> port IO and MMIO, these drivers should not expose CC guest memory
> >> confidentiality or integrity (including via privilege escalation into CC guest).
> >
> > Again, stop it please with the "hardened" nonsense, that means nothing.
> > Either the driver has bugs, or it doesn't. I welcome you to prove it
> > doesn't :)
>
> In a non-CC scenario, a driver is correct if, among other things, it does
> not leak kernel data to user space. However, it assumes that PCI devices are
> working correctly and according to spec.

And you also assume that your CPU is working properly. And what spec
exactly are you referring to? How can you validate any of that without
using the PCI authentication protocol already discussed in this thread?

> >> Please note that this only applies to a small set (in tdx virtio setup we have less
> >> than 10 of them) of drivers and does not present invasive changes to the kernel
> >> code. There is also an additional core pci/msi code that is involved with discovery
> >> and configuration of these drivers, this code also falls into the category we need to
> >> make robust.
> >
> > Again, why wouldn't we all want "robust" drivers? This is not anything
> > new here,
>
> What is new is that CC requires driver to be "robust" against a new kind of
> attack "from below" (i.e. from the [virtual] hardware side).

And as I have said multiple times, that is a totally new "requirement"
and one that Linux does not meet in any way at this point in time. If
you somehow feel this is a change that is ok to make for Linux, you will
need to do a lot of work to make this happen.

Anyway, you all are just spinning in circles now. I'll just mute this
thread until I see an actual code change as it seems to be full of
people not actually sending anything we can actually do anything with.

greg k-h
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> On Wed, Feb 08, 2023 at 05:19:37PM +0100, Christophe de Dinechin wrote:
> >
> > On 2023-02-08 at 11:58 +01, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote...
> > > On Wed, Feb 08, 2023 at 10:44:25AM +0000, Reshetova, Elena wrote:
> > >>
> > >> The CC threat model does change the traditional linux trust boundary regardless of
> > >> what mitigations are used (kernel config vs. runtime filtering). Because for the
> > >> drivers that CoCo guest happens to need, there is no way to fix this problem by
> > >> either of these mechanisms (we cannot disable the code that we need), unless somebody
> > >> writes a totally new set of coco specific drivers (who needs another set of
> > >> CoCo specific virtio drivers in the kernel?).
> > >
> > > It sounds like you want such a set of drivers, why not just write them?
> > > We have zillions of drivers already, it's not hard to write new ones, as
> > > it really sounds like that's exactly what you want to have happen here
> > > in the end as you don't trust the existing set of drivers you are using
> > > for some reason.
> >
> > In the CC approach, the hypervisor is considered as hostile. The rest of the
> > system is not changed much. If we pass-through some existing NIC, we'd
> > rather use the existing driver for that NIC rather than reinvent
> > it.
>
> But that is not what was proposed. I thought this was all about virtio.
> If not, again, someone needs to write a solid definition.

As I said in my reply to you a couple of weeks ago:

I'm not sure the request here isn't really to make sure *all* PCI devices
are safe; just the ones we care about in a CoCo guest (e.g. the virtual devices) -
and potentially ones that people will want to pass-through (which
generally needs a lot more work to make safe).
(I've not looked at these Intel tools to see what they cover)

so *mostly* virtio, and just a few of the other devices.

> So if you want to use existing drivers, wonderful, please work on making
> the needed changes to meet your goals to all of them. I was trying to
> give you a simple way out :)
>
> > >> 1. these selective CoCo guest required drivers (small set) needs to be hardened
> > >> (or whatever word people prefer to use here), which only means that in
> > >> the presence of malicious host/hypervisor that can manipulate pci config space,
> > >> port IO and MMIO, these drivers should not expose CC guest memory
> > >> confidentiality or integrity (including via privilege escalation into CC guest).
> > >
> > > Again, stop it please with the "hardened" nonsense, that means nothing.
> > > Either the driver has bugs, or it doesn't. I welcome you to prove it
> > > doesn't :)
> >
> > In a non-CC scenario, a driver is correct if, among other things, it does
> > not leak kernel data to user space. However, it assumes that PCI devices are
> > working correctly and according to spec.
>
> And you also assume that your CPU is working properly.

We require the CPU to give us a signed attestation to prove that it's a
trusted CPU, that someone external can validate. So, not quite
'assume'.

> And what spec
> exactly are you referring to? How can you validate any of that without
> using the PCI authentication protocol already discussed in this thread?

The PCI auth protocol looks promising and is possibly the right long
term answer. But for a pass through NIC for example, all we'd want is
that (with the help of the IOMMU) it can't get or corrupt any data the
guest doesn't give it - and then it's upto the guest to run encryption
over the protocols over the NIC.

>
> > >> Please note that this only applies to a small set (in tdx virtio setup we have less
> > >> than 10 of them) of drivers and does not present invasive changes to the kernel
> > >> code. There is also an additional core pci/msi code that is involved with discovery
> > >> and configuration of these drivers, this code also falls into the category we need to
> > >> make robust.
> > >
> > > Again, why wouldn't we all want "robust" drivers? This is not anything
> > > new here,
> >
> > What is new is that CC requires driver to be "robust" against a new kind of
> > attack "from below" (i.e. from the [virtual] hardware side).
>
> And as I have said multiple times, that is a totally new "requirement"
> and one that Linux does not meet in any way at this point in time.

Yes, that's a fair statement.

> If
> you somehow feel this is a change that is ok to make for Linux, you will
> need to do a lot of work to make this happen.
>
> Anyway, you all are just spinning in circles now. I'll just mute this
> thread until I see an actual code change as it seems to be full of
> people not actually sending anything we can actually do anything with.

I think the challenge will be to come up with non-intrusive, minimal
changes; obviously you don't want stuff shutgunned everywhere.

Dave

> greg k-h
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
On Wed, Feb 08 2023 at 18:02, David Alan Gilbert wrote:
> * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
>> Anyway, you all are just spinning in circles now. I'll just mute this
>> thread until I see an actual code change as it seems to be full of
>> people not actually sending anything we can actually do anything with.

There have been random patchs posted which finally caused this
discussion to start. Wrong order obviously :)

> I think the challenge will be to come up with non-intrusive, minimal
> changes; obviously you don't want stuff shutgunned everywhere.

That has been tried by doing random surgery, e.g. caching some
particular PCI config value. While that might not look intrusive on the
first glance, these kind of punctual changes are the begin of a whack a
mole game and will end up in an uncoordinated maze of tiny mitigations
which make the code harder to maintain.

The real challenge is to come up with threat classes and mechanisms
which squash the whole class. Done right, e.g. caching a range of config
space values (or all of it) might give a benefit even for the bare metal
or general virtualization case.

That's quite some work, but its much more palatable than a trickle of
"fixes" when yet another source of trouble has been detected by a tool
or human inspection.

It's also more future proof because with the current approach of
scratching the itch of the day the probability that the just "mitigated"
issue comes back due to unrelated changes is very close to 100%.

It's not any different than any other threat class problem.

Thanks,

tglx
RE: Linux guest kernel threat model for Confidential Computing [ In reply to ]
> On Wed, Feb 08, 2023 at 10:16:14AM +0000, Reshetova, Elena wrote:
> > > No relation other than it would be nice to have a solution that does not
> > >require kernel command line and that prevents __init()s.
> >
> > For __inits see below. For the command line, it is pretty straightforward to
> > measure it and attest its integrity later: we need to do it for other parts
> > anyhow as acpi tables, etc. So I don’t see why we need to do smth special
> > about it? In any case it is indeed very different from driver discussion and
> > goes into "what should be covered by attestation for CC guest" topic.
> >
> > > More pressing concern than wasted memory, which may be unimportant,
> there's
> > > the issue of what are those driver init functions doing. For example, as
> > > part of device setup, MMIO regs may be involved, which we cannot trust. It's
> > > a lot more code to worry about from a CoCo perspective.
> >
> > Yes, we have seen such cases in kernel where drivers or modules would access
> > MMIO or pci config space already in their __init() functions.
> > Some concrete examples from modules and drivers (there are more):
> >
> > intel_iommu_init() -> init_dmars() -> check_tylersburg_isoch()
>
> An iommu driver. So maybe you want to use virtio iommu then?
>
> > skx_init() -> get_all_munits()
> > skx_init() -> skx_register_mci() -> skx_get_dimm_config()
>
> A memory controller driver, right? And you need it in a VM? why?
>
> > intel_rng_mod_init() -> intel_init_hw_struct()
>
> And virtio iommu?
>
> > i10nm_exit()->enable_retry_rd_err_log ->__enable_retry_rd_err_log()
>
> Another memory controller driver? Can we decide on a single one?

We don’t need any of the above in CC guest. The point was to indicate that
we know that the current device filter design we have, we will not necessary
prevent the __init functions of drivers running in CC guest and we have seen
in Linux codebase the code paths that may potentially execute and consume
malicious host input already in __init functions (most of drivers luckily
do it in probe). However, the argument below I gave is why we think such
__init functions are not that big security problem in our case.


>
> > However, this is how we address this from security point of view:
> >
> > 1. In order for a MMIO read to obtain data from a untrusted host, the memory
> > range must be shared with the host to begin with. We enforce that
> > all MMIO mappings are private by default to the CC guest unless it is
> > explicitly shared (and we do automatically share for the authorized devices
> > and their drivers from the allow list). This removes a problem of an
> > "unexpected MMIO region interaction"
> > (modulo acpi AML operation regions that we do have to share also
> unfortunately,
> > but acpi is a whole different difficult case on its own).
>
> How does it remove the problem? You basically get trash from host, no?
> But it seems that whether said trash is exploitable will really depend
> on how it's used, e.g. if it's an 8 bit value host can just scan all
> options in a couple of hundred attempts. What did I miss?

No, it wont work like that. Guest code will never be able to consume
the garbage data written into its private memory by host: we will get a memory
integrity violation and guest is killed for safety reasons. The confidentiality
and integrity of private memory is guaranteed by CC technology itself.

>
>
> > 2. For pci config space, we limit any interaction with pci config
> > space only to authorized devices and their drivers (that are in the allow list).
> > As a result device drivers outside of the allow list are not able to access pci
> > config space even in their __init routines. It is done by setting the
> > to_pci_dev(dev)->error_state = pci_channel_io_perm_failure for non-
> authorized
> > devices.
>
> This seems to be assuming drivers check return code from pci config
> space accesses, right? I doubt all drivers do though. Even if they do
> that's unlikely to be a well tested path, right?

This is a good thing to double check, thank you for pointing this out!

Best Regards,
Elena.
Re: Linux guest kernel threat model for Confidential Computing [ In reply to ]
* Thomas Gleixner (tglx@linutronix.de) wrote:
> On Wed, Feb 08 2023 at 18:02, David Alan Gilbert wrote:
> > * Greg Kroah-Hartman (gregkh@linuxfoundation.org) wrote:
> >> Anyway, you all are just spinning in circles now. I'll just mute this
> >> thread until I see an actual code change as it seems to be full of
> >> people not actually sending anything we can actually do anything with.
>
> There have been random patchs posted which finally caused this
> discussion to start. Wrong order obviously :)
>
> > I think the challenge will be to come up with non-intrusive, minimal
> > changes; obviously you don't want stuff shutgunned everywhere.
>
> That has been tried by doing random surgery, e.g. caching some
> particular PCI config value. While that might not look intrusive on the
> first glance, these kind of punctual changes are the begin of a whack a
> mole game and will end up in an uncoordinated maze of tiny mitigations
> which make the code harder to maintain.
>
> The real challenge is to come up with threat classes and mechanisms
> which squash the whole class. Done right, e.g. caching a range of config
> space values (or all of it) might give a benefit even for the bare metal
> or general virtualization case.

Yeh, reasonable.

> That's quite some work, but its much more palatable than a trickle of
> "fixes" when yet another source of trouble has been detected by a tool
> or human inspection.
>
> It's also more future proof because with the current approach of
> scratching the itch of the day the probability that the just "mitigated"
> issue comes back due to unrelated changes is very close to 100%.
>
> It's not any different than any other threat class problem.

I wonder if trying to group/categorise the output of Intel's
tool would allow common problematic patterns to be found to then
try and come up with more concrete fixes for whole classes of issues.

Dave

> Thanks,
>
> tglx
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK