
[PATCH v19 011/130] KVM: Add new members to struct kvm_gfn_range to operate on
From: Isaku Yamahata <isaku.yamahata@intel.com>

Add new members to struct kvm_gfn_range to indicate which mapping
(private vs. shared) to operate on: only_private and only_shared. Update
the mmu notifier, the set memory attributes ioctl, and the KVM gmem callback
to initialize them.

It was premature for the set_memory_attributes ioctl to call
kvm_unmap_gfn_range(). Instead, let kvm_arch_ste_memory_attributes()
handle it and add a new x86 vendor callback to react to memory attribute
changes. [1]

- If it's from the mmu notifier, zap shared pages only
- If it's from the KVM gmem, zap private pages only
- If setting memory attributes, the vendor callback checks the new attributes
and makes decisions.
SNP would do nothing and handle it later with gmem callback
TDX callback would do as follows.
When it converts pages to shared, zap private pages only.
When it converts pages to private, zap shared pages only.

TDX needs to know which mapping to operate on. Shared-EPT vs. Secure-EPT.
The following sequence to convert the GPA to private doesn't work for TDX
because the page can already be private.

1) Update memory attributes to private in memory attributes xarray
2) Zap the GPA range irrespective of private-or-shared.
Even if the page is already private, zap the entry.
3) EPT violation on the GPA
4) Populate the GPA as private
The page is zeroed, and the guest has to accept the page again.

In step 2, TDX wants to zap only shared pages and skip private ones.
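
For illustration only (not part of this patch), a minimal sketch of how a TDX
zap path could consume the new flags; tdx_zap_private_mapping() and
tdx_zap_shared_mapping() are hypothetical helpers:

static bool tdx_zap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
        bool flush = false;

        /* Secure-EPT (private) entries are in play unless only_shared is set. */
        if (!range->only_shared)
                flush |= tdx_zap_private_mapping(kvm, range->slot,
                                                 range->start, range->end);

        /* Shared-EPT entries are in play unless only_private is set. */
        if (!range->only_private)
                flush |= tdx_zap_shared_mapping(kvm, range->slot,
                                                range->start, range->end);

        return flush;
}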

[1] https://lore.kernel.org/all/ZJX0hk+KpQP0KUyB@google.com/

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>

---
Changes v18:
- rebased to kvm-next

Changes v2 -> v3:
- Drop the KVM_GFN_RANGE flags
- Updated struct kvm_gfn_range
- Change kvm_arch_set_memory_attributes() to return bool for flush
- Added set_memory_attributes x86 op for vendor backends
- Refined commit message to describe the TDX case concretely

Changes v1 -> v2:
- consolidate KVM_GFN_RANGE_FLAGS_GMEM_{PUNCH_HOLE, RELEASE} into
KVM_GFN_RANGE_FLAGS_GMEM.
- Update the commit message to describe TDX more. Drop SEV_SNP.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
include/linux/kvm_host.h | 2 ++
virt/kvm/guest_memfd.c | 3 +++
virt/kvm/kvm_main.c | 17 +++++++++++++++++
3 files changed, 22 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..0520cd8d03cc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -264,6 +264,8 @@ struct kvm_gfn_range {
gfn_t start;
gfn_t end;
union kvm_mmu_notifier_arg arg;
+ bool only_private;
+ bool only_shared;
bool may_block;
};
bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 0f4e0cf4f158..3830d50b9b67 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -64,6 +64,9 @@ static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
.end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
.slot = slot,
.may_block = true,
+ /* guest memfd is relevant to only private mappings. */
+ .only_private = true,
+ .only_shared = false,
};

if (!found_memslot) {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 10bfc88a69f7..0349e1f241d1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -634,6 +634,12 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
*/
gfn_range.arg = range->arg;
gfn_range.may_block = range->may_block;
+ /*
+ * HVA-based notifications aren't relevant to private
+ * mappings as they don't have a userspace mapping.
+ */
+ gfn_range.only_private = false;
+ gfn_range.only_shared = true;

/*
* {gfn(page) | page intersects with [hva_start, hva_end)} =
@@ -2486,6 +2492,16 @@ static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
gfn_range.arg = range->arg;
gfn_range.may_block = range->may_block;

+ /*
+ * If/when KVM supports more attributes beyond private .vs shared, this
+ * _could_ set only_{private,shared} appropriately if the entire target
+ * range already has the desired private vs. shared state (it's unclear
+ * if that is a net win). For now, KVM reaches this point if and only
+ * if the private flag is being toggled, i.e. all mappings are in play.
+ */
+ gfn_range.only_private = false;
+ gfn_range.only_shared = false;
+
for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
slots = __kvm_memslots(kvm, i);

@@ -2542,6 +2558,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
struct kvm_mmu_notifier_range pre_set_range = {
.start = start,
.end = end,
+ .arg.attributes = attributes,
.handler = kvm_pre_set_memory_attributes,
.on_lock = kvm_mmu_invalidate_begin,
.flush_on_ret = true,
--
2.25.1
Re: [PATCH v19 011/130] KVM: Add new members to struct kvm_gfn_range to operate on
On 2/26/2024 4:25 PM, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Add new members to strut kvm_gfn_range to indicate which mapping
> (private-vs-shared) to operate on. only_private and only_shared. Update
> mmu notifier, set memory attributes ioctl or KVM gmem callback to
> initialize them.
>
> It was premature for set_memory_attributes ioctl to call
> kvm_unmap_gfn_range(). Instead, let kvm_arch_ste_memory_attributes()
"kvm_arch_ste_memory_attributes()" -> "kvm_vm_set_mem_attributes()" ?


> handle it and add a new x86 vendor callback to react to memory attribute
> change. [1]
Which new x86 vendor callback?


>
> - If it's from the mmu notifier, zap shared pages only
> - If it's from the KVM gmem, zap private pages only
> - If setting memory attributes, vendor callback checks new attributes
> and make decisions.
> SNP would do nothing and handle it later with gmem callback
> TDX callback would do as follows.
> When it converts pages to shared, zap private pages only.
> When it converts pages to private, zap shared pages only.
>
> TDX needs to know which mapping to operate on. Shared-EPT vs. Secure-EPT.
> The following sequence to convert the GPA to private doesn't work for TDX
> because the page can already be private.
>
> 1) Update memory attributes to private in memory attributes xarray
> 2) Zap the GPA range irrespective of private-or-shared.
> Even if the page is already private, zap the entry.
> 3) EPT violation on the GPA
> 4) Populate the GPA as private
> The page is zeroed, and the guest has to accept the page again.
>
> In step 2, TDX wants to zap only shared pages and skip private ones.
>
> [1] https://lore.kernel.org/all/ZJX0hk+KpQP0KUyB@google.com/
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>
> ---
> Changes v18:
> - rebased to kvm-next
>
> Changes v2 -> v3:
> - Drop the KVM_GFN_RANGE flags
> - Updated struct kvm_gfn_range
> - Change kvm_arch_set_memory_attributes() to return bool for flush
> - Added set_memory_attributes x86 op for vendor backends
> - Refined commit message to describe TDX care concretely
>
> Changes v1 -> v2:
> - consolidate KVM_GFN_RANGE_FLAGS_GMEM_{PUNCH_HOLE, RELEASE} into
> KVM_GFN_RANGE_FLAGS_GMEM.
> - Update the commit message to describe TDX more. Drop SEV_SNP.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
> include/linux/kvm_host.h | 2 ++
> virt/kvm/guest_memfd.c | 3 +++
> virt/kvm/kvm_main.c | 17 +++++++++++++++++
> 3 files changed, 22 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7e7fd25b09b3..0520cd8d03cc 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -264,6 +264,8 @@ struct kvm_gfn_range {
> gfn_t start;
> gfn_t end;
> union kvm_mmu_notifier_arg arg;
> + bool only_private;
> + bool only_shared;

IMO, an enum will be clearer than the two flags.

    enum {
        PROCESS_PRIVATE_AND_SHARED,
        PROCESS_ONLY_PRIVATE,
        PROCESS_ONLY_SHARED,
    };


> bool may_block;
> };
> bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 0f4e0cf4f158..3830d50b9b67 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -64,6 +64,9 @@ static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
> .end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
> .slot = slot,
> .may_block = true,
> + /* guest memfd is relevant to only private mappings. */
> + .only_private = true,
> + .only_shared = false,
> };
>
> if (!found_memslot) {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 10bfc88a69f7..0349e1f241d1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -634,6 +634,12 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
> */
> gfn_range.arg = range->arg;
> gfn_range.may_block = range->may_block;
> + /*
> + * HVA-based notifications aren't relevant to private
> + * mappings as they don't have a userspace mapping.
> + */
> + gfn_range.only_private = false;
> + gfn_range.only_shared = true;
>
> /*
> * {gfn(page) | page intersects with [hva_start, hva_end)} =
> @@ -2486,6 +2492,16 @@ static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
> gfn_range.arg = range->arg;
> gfn_range.may_block = range->may_block;
>
> + /*
> + * If/when KVM supports more attributes beyond private .vs shared, this
> + * _could_ set only_{private,shared} appropriately if the entire target
> + * range already has the desired private vs. shared state (it's unclear
> + * if that is a net win). For now, KVM reaches this point if and only
> + * if the private flag is being toggled, i.e. all mappings are in play.
> + */
> + gfn_range.only_private = false;
> + gfn_range.only_shared = false;
> +
> for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
> slots = __kvm_memslots(kvm, i);
>
> @@ -2542,6 +2558,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> struct kvm_mmu_notifier_range pre_set_range = {
> .start = start,
> .end = end,
> + .arg.attributes = attributes,
> .handler = kvm_pre_set_memory_attributes,
> .on_lock = kvm_mmu_invalidate_begin,
> .flush_on_ret = true,
Re: [PATCH v19 011/130] KVM: Add new members to struct kvm_gfn_range to operate on
On Tue, Mar 12, 2024 at 09:33:31PM +0800,
Binbin Wu <binbin.wu@linux.intel.com> wrote:

>
>
> On 2/26/2024 4:25 PM, isaku.yamahata@intel.com wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> >
> > Add new members to strut kvm_gfn_range to indicate which mapping
> > (private-vs-shared) to operate on. only_private and only_shared. Update
> > mmu notifier, set memory attributes ioctl or KVM gmem callback to
> > initialize them.
> >
> > It was premature for set_memory_attributes ioctl to call
> > kvm_unmap_gfn_range(). Instead, let kvm_arch_ste_memory_attributes()
> "kvm_arch_ste_memory_attributes()" -> "kvm_vm_set_mem_attributes()" ?

Yes, will fix it.

> > handle it and add a new x86 vendor callback to react to memory attribute
> > change. [1]
> Which new x86 vendor callback?

Now we don't have it. Will drop this sentence.

> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 7e7fd25b09b3..0520cd8d03cc 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -264,6 +264,8 @@ struct kvm_gfn_range {
> > gfn_t start;
> > gfn_t end;
> > union kvm_mmu_notifier_arg arg;
> > + bool only_private;
> > + bool only_shared;
>
> IMO, an enum will be clearer than the two flags.
>
>     enum {
>         PROCESS_PRIVATE_AND_SHARED,
>         PROCESS_ONLY_PRIVATE,
>         PROCESS_ONLY_SHARED,
>     };

The code will be ugly like
"if (== PRIVATE || == PRIVATE_AND_SHARED)" or
"if (== SHARED || == PRIVATE_AND_SHARED)"

two boolean (or two flags) is less error-prone.
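
For illustration only ("process", "range", and zap_private() are placeholders),
the two styles compare roughly as:

        /* enum: every private-side check must name two values */
        if (process == PROCESS_ONLY_PRIVATE ||
            process == PROCESS_PRIVATE_AND_SHARED)
                zap_private(range);

        /* two bools: a single negated test */
        if (!range->only_shared)
                zap_private(range);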

Thanks,
--
Isaku Yamahata <isaku.yamahata@intel.com>
Re: [PATCH v19 011/130] KVM: Add new members to struct kvm_gfn_range to operate on
On Wed, 2024-03-13 at 10:14 -0700, Isaku Yamahata wrote:
> > IMO, an enum will be clearer than the two flags.
> >
> >     enum {
> >         PROCESS_PRIVATE_AND_SHARED,
> >         PROCESS_ONLY_PRIVATE,
> >         PROCESS_ONLY_SHARED,
> >     };
>
> The code will be ugly like
> "if (== PRIVATE || == PRIVATE_AND_SHARED)" or
> "if (== SHARED || == PRIVATE_AND_SHARED)"
>
> two boolean (or two flags) is less error-prone.

Yes the enum would be awkward to handle. But I also thought the way
this is specified in struct kvm_gfn_range is a little strange.

It is ambiguous what it should mean if you set:
.only_private=true;
.only_shared=true;
...as happens later in the series (although it may be a mistake).

Reading the original conversation, it seems Sean suggested this
specifically. But it wasn't clear to me from the discussion what the
intention of the "only" semantics was. Like why not?
bool private;
bool shared;
Re: [PATCH v19 011/130] KVM: Add new members to struct kvm_gfn_range to operate on
On Mon, 2024-03-18 at 19:50 -0700, Rick Edgecombe wrote:
> On Wed, 2024-03-13 at 10:14 -0700, Isaku Yamahata wrote:
> > > IMO, an enum will be clearer than the two flags.
> > >
> > >     enum {
> > >         PROCESS_PRIVATE_AND_SHARED,
> > >         PROCESS_ONLY_PRIVATE,
> > >         PROCESS_ONLY_SHARED,
> > >     };
> >
> > The code will be ugly like
> > "if (== PRIVATE || == PRIVATE_AND_SHARED)" or
> > "if (== SHARED || == PRIVATE_AND_SHARED)"
> >
> > two boolean (or two flags) is less error-prone.
>
> Yes the enum would be awkward to handle. But I also thought the way
> this is specified in struct kvm_gfn_range is a little strange.
>
> It is ambiguous what it should mean if you set:
>  .only_private=true;
>  .only_shared=true;
> ...as happens later in the series (although it may be a mistake).
>
> Reading the original conversation, it seems Sean suggested this
> specifically. But it wasn't clear to me from the discussion what the
> intention of the "only" semantics was. Like why not?
>  bool private;
>  bool shared;

I see Binbin brought up this point on v18 as well:
https://lore.kernel.org/kvm/6220164a-aa1d-43d2-b918-6a6eaad769fb@linux.intel.com/#t

and helpfully dug up some other discussion with Sean where he agreed
the "_only" is confusing and proposed the the enum:
https://lore.kernel.org/kvm/ZUO1Giju0GkUdF0o@google.com/

He wanted the default value (in the case the caller forgets to set
them), to be to include both private and shared. I think the enum has
the issues that Isaku mentioned. What about?

bool exclude_private;
bool exclude_shared;

It will become onerous if more types of aliases grow, but it is clearer
semantically and has the safe default behavior.
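
Illustrative sketch of the default-value argument (the exclude_* fields as
proposed above; nothing here is existing code):

        struct kvm_gfn_range range = {
                .start = start,
                .end = end,
                /*
                 * .exclude_private and .exclude_shared default to false, so a
                 * caller that forgets to set them operates on both private and
                 * shared mappings, which is the safe behavior.
                 */
        };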
Re: [PATCH v19 011/130] KVM: Add new members to struct kvm_gfn_range to operate on
On Tue, Mar 19, 2024 at 02:47:47PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> On Mon, 2024-03-18 at 19:50 -0700, Rick Edgecombe wrote:
> > On Wed, 2024-03-13 at 10:14 -0700, Isaku Yamahata wrote:
> > > > IMO, an enum will be clearer than the two flags.
> > > >
> > > >     enum {
> > > >         PROCESS_PRIVATE_AND_SHARED,
> > > >         PROCESS_ONLY_PRIVATE,
> > > >         PROCESS_ONLY_SHARED,
> > > >     };
> > >
> > > The code will be ugly like
> > > "if (== PRIVATE || == PRIVATE_AND_SHARED)" or
> > > "if (== SHARED || == PRIVATE_AND_SHARED)"
> > >
> > > two boolean (or two flags) is less error-prone.
> >
> > Yes the enum would be awkward to handle. But I also thought the way
> > this is specified in struct kvm_gfn_range is a little strange.
> >
> > It is ambiguous what it should mean if you set:
> >  .only_private=true;
> >  .only_shared=true;
> > ...as happens later in the series (although it may be a mistake).
> >
> > Reading the original conversation, it seems Sean suggested this
> > specifically. But it wasn't clear to me from the discussion what the
> > intention of the "only" semantics was. Like why not?
> >  bool private;
> >  bool shared;
>
> I see Binbin brought up this point on v18 as well:
> https://lore.kernel.org/kvm/6220164a-aa1d-43d2-b918-6a6eaad769fb@linux.intel.com/#t
>
> and helpfully dug up some other discussion with Sean where he agreed
> the "_only" is confusing and proposed the the enum:
> https://lore.kernel.org/kvm/ZUO1Giju0GkUdF0o@google.com/
>
> He wanted the default value (in the case the caller forgets to set
> them), to be to include both private and shared. I think the enum has
> the issues that Isaku mentioned. What about?
>
> bool exclude_private;
> bool exclude_shared;
>
> It will become onerous if more types of aliases grow, but it clearer
> semantically and has the safe default behavior.

I'm fine with those names. Anyway, I'm fine with either way, two bools or an enum.
--
Isaku Yamahata <isaku.yamahata@intel.com>
Re: [PATCH v19 011/130] KVM: Add new members to struct kvm_gfn_range to operate on
Hi,

On Tue, Mar 19, 2024 at 9:50 PM Isaku Yamahata <isaku.yamahata@intel.com> wrote:
>
> On Tue, Mar 19, 2024 at 02:47:47PM +0000,
> "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:
>
> > On Mon, 2024-03-18 at 19:50 -0700, Rick Edgecombe wrote:
> > > On Wed, 2024-03-13 at 10:14 -0700, Isaku Yamahata wrote:
> > > > > IMO, an enum will be clearer than the two flags.
> > > > >
> > > > > enum {
> > > > > PROCESS_PRIVATE_AND_SHARED,
> > > > > PROCESS_ONLY_PRIVATE,
> > > > > PROCESS_ONLY_SHARED,
> > > > > };
> > > >
> > > > The code will be ugly like
> > > > "if (== PRIVATE || == PRIVATE_AND_SHARED)" or
> > > > "if (== SHARED || == PRIVATE_AND_SHARED)"
> > > >
> > > > two boolean (or two flags) is less error-prone.
> > >
> > > Yes the enum would be awkward to handle. But I also thought the way
> > > this is specified in struct kvm_gfn_range is a little strange.
> > >
> > > It is ambiguous what it should mean if you set:
> > > .only_private=true;
> > > .only_shared=true;
> > > ...as happens later in the series (although it may be a mistake).
> > >
> > > Reading the original conversation, it seems Sean suggested this
> > > specifically. But it wasn't clear to me from the discussion what the
> > > intention of the "only" semantics was. Like why not?
> > > bool private;
> > > bool shared;
> >
> > I see Binbin brought up this point on v18 as well:
> > https://lore.kernel.org/kvm/6220164a-aa1d-43d2-b918-6a6eaad769fb@linux.intel.com/#t
> >
> > and helpfully dug up some other discussion with Sean where he agreed
> > the "_only" is confusing and proposed the the enum:
> > https://lore.kernel.org/kvm/ZUO1Giju0GkUdF0o@google.com/
> >
> > He wanted the default value (in the case the caller forgets to set
> > them), to be to include both private and shared. I think the enum has
> > the issues that Isaku mentioned. What about?
> >
> > bool exclude_private;
> > bool exclude_shared;
> >
> > It will become onerous if more types of aliases grow, but it clearer
> > semantically and has the safe default behavior.
>
> I'm fine with those names. Anyway, I'm fine with wither way, two bools or enum.

I don't have a strong opinion, but I'd brought it up in a previous
patch series. I think that having two bools to encode three states is
less intuitive and potentially more bug prone, more so than the naming
itself (i.e., _only):
https://lore.kernel.org/all/ZUO1Giju0GkUdF0o@google.com/

Cheers,
/fuad

> --
> Isaku Yamahata <isaku.yamahata@intel.com>
>
Re: [PATCH v19 011/130] KVM: Add new members to struct kvm_gfn_range to operate on
On Fri, 2024-04-26 at 08:39 +0100, Fuad Tabba wrote:
> > I'm fine with those names. Anyway, I'm fine with wither way, two bools or
> > enum.
>
> I don't have a strong opinion, but I'd brought it up in a previous
> patch series. I think that having two bools to encode three states is
> less intuitive and potentially more bug prone, more so than the naming
> itself (i.e., _only):
> https://lore.kernel.org/all/ZUO1Giju0GkUdF0o@google.com/

Currently in our internal branch we switched to:
exclude_private
exclude_shared

It came together better in the code that uses it.

But I started to wonder if we actually really need exclude_shared. For TDX
zapping private memory has to be done with more care, because it cannot be re-
populated without guest coordination. But for shared memory if we are zapping a
range that includes both private and shared memory, I don't think it should hurt
to zap the shared memory.
Re: [PATCH v19 011/130] KVM: Add new members to struct kvm_gfn_range to operate on
On Fri, Apr 26, 2024, Rick P Edgecombe wrote:
> On Fri, 2024-04-26 at 08:39 +0100, Fuad Tabba wrote:
> > > I'm fine with those names. Anyway, I'm fine with wither way, two bools or
> > > enum.
> >
> > I don't have a strong opinion, but I'd brought it up in a previous
> > patch series. I think that having two bools to encode three states is
> > less intuitive and potentially more bug prone, more so than the naming
> > itself (i.e., _only):

Hmm, yeah, I buy that argument. We could even harden it further by poisoning '0'
to force KVM to be explicit. Aha! And maybe use a bitmap?

        enum {
                BUGGY_KVM_INVALIDATION          = 0,
                PROCESS_SHARED                  = BIT(0),
                PROCESS_PRIVATE                 = BIT(1),
                PROCESS_PRIVATE_AND_SHARED      = PROCESS_SHARED | PROCESS_PRIVATE,
        };
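
A rough sketch of how a consumer could use such a bitmask (illustrative only;
the "process" member, "flush", and the zap helpers below are hypothetical):

        if (WARN_ON_ONCE(!(range->process & PROCESS_PRIVATE_AND_SHARED)))
                return false;   /* a buggy zero-initialized range is caught here */

        if (range->process & PROCESS_PRIVATE)
                flush |= zap_private(range);
        if (range->process & PROCESS_SHARED)
                flush |= zap_shared(range);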

> > https://lore.kernel.org/all/ZUO1Giju0GkUdF0o@google.com/
>
> Currently in our internal branch we switched to:
> exclude_private
> exclude_shared
>
> It came together bettter in the code that uses it.

If the choice is between an enum and exclude_*, I would strongly prefer the enum.
Using exclude_* results in inverted polarity for the code that triggers invalidations.

> But I started to wonder if we actually really need exclude_shared. For TDX
> zapping private memory has to be done with more care, because it cannot be re-
> populated without guest coordination. But for shared memory if we are zapping a
> range that includes both private and shared memory, I don't think it should hurt
> to zap the shared memory.

Hell no, I am not risking taking on more baggage in KVM where userspace or some
other subsystem comes to rely on KVM spuriously zapping SPTEs in response to an
unrelated userspace action.
Re: [PATCH v19 011/130] KVM: Add new members to struct kvm_gfn_range to operate on
On Fri, 2024-04-26 at 08:28 -0700, Sean Christopherson wrote:
> Hmm, yeah, I buy that argument.  We could even harded further by poisoning '0'
> to force KVM to explicitly.  Aha!  And maybe use a bitmap?
>
>         enum {
>                 BUGGY_KVM_INVALIDATION          = 0,
>                 PROCESS_SHARED                  = BIT(0),
>                 PROCESS_PRIVATE                 = BIT(1),
>                 PROCESS_PRIVATE_AND_SHARED      = PROCESS_SHARED | PROCESS_PRIVATE,
>         };

Seems like it would work for all who have been concerned. The previous objection
to the enum (can't find the mail) was for requiring logic like:

if (zap == PROCESS_PRIVATE_AND_SHARED || zap == PROCESS_PRIVATE)
        do_private_zap_stuff();
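
With a bitmask encoding like the one above, that check collapses to a single
mask test (illustrative):

        if (zap & PROCESS_PRIVATE)
                do_private_zap_stuff();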


We are trying to tie things up internally so we can jointly have something to
stare at again, as the patches are diverging. But will make this adjustment.


>
> > > https://lore.kernel.org/all/ZUO1Giju0GkUdF0o@google.com/
> >
> > Currently in our internal branch we switched to:
> > exclude_private
> > exclude_shared
> >
> > It came together bettter in the code that uses it.
>
> If the choice is between an enum and exclude_*, I would strongly prefer the enum.
> Using exclude_* results in inverted polarity for the code that triggers
> invalidations.

Right, the awkwardness lands in that code.

The processing code looks nice though:
https://lore.kernel.org/kvm/5210e6e6e2eb73b04cb7039084015612479ae2fe.camel@intel.com/

>
> > But I started to wonder if we actually really need exclude_shared. For TDX
> > zapping private memory has to be done with more care, because it cannot be
> > re-populated without guest coordination. But for shared memory if we are
> > zapping a range that includes both private and shared memory, I don't think
> > it should hurt to zap the shared memory.
>
> Hell no, I am not risking taking on more baggage in KVM where userspace or some
> other subsystem comes to rely on KVM spuriously zapping SPTEs in response to an
> unrelated userspace action.

Hmm, I see the point. Thanks. This was just being left for later discussion
anyway.
Re: [PATCH v19 011/130] KVM: Add new members to struct kvm_gfn_range to operate on
On Fri, Apr 26, 2024, Rick P Edgecombe wrote:
> On Fri, 2024-04-26 at 08:28 -0700, Sean Christopherson wrote:
> > If the choice is between an enum and exclude_*, I would strongly prefer the
> > enum. Using exclude_* results in inverted polarity for the code that
> > triggers invalidations.
>
> Right, the awkwardness lands in that code.
>
> The processing code looks nice though:
> https://lore.kernel.org/kvm/5210e6e6e2eb73b04cb7039084015612479ae2fe.camel@intel.com/

Heh, where's your bitmask abuse spirit? It's a little evil (and by "evil" I mean
awesome), but the need to process different roots is another good argument for an
enum+bitmask.

enum tdp_mmu_root_types {
        KVM_SHARED_ROOTS = KVM_PROCESS_SHARED,
        KVM_PRIVATE_ROOTS = KVM_PROCESS_PRIVATE,
        KVM_VALID_ROOTS = BIT(2),
        KVM_ANY_VALID_ROOT = KVM_SHARED_ROOTS | KVM_PRIVATE_ROOTS | KVM_VALID_ROOTS,
        KVM_ANY_ROOT = KVM_SHARED_ROOTS | KVM_PRIVATE_ROOTS,
};
static_assert(!(KVM_SHARED_ROOTS & KVM_VALID_ROOTS));
static_assert(!(KVM_PRIVATE_ROOTS & KVM_VALID_ROOTS));
static_assert(KVM_PRIVATE_ROOTS == (KVM_SHARED_ROOTS << 1));

/*
 * Returns the next root after @prev_root (or the first root if @prev_root is
 * NULL).  A reference to the returned root is acquired, and the reference to
 * @prev_root is released (the caller obviously must hold a reference to
 * @prev_root if it's non-NULL).
 *
 * Returns NULL if the end of tdp_mmu_roots was reached.
 */
static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
                                              struct kvm_mmu_page *prev_root,
                                              enum tdp_mmu_root_types types)
{
        bool only_valid = types & KVM_VALID_ROOTS;
        struct kvm_mmu_page *next_root;

        /*
         * While the roots themselves are RCU-protected, fields such as
         * role.invalid are protected by mmu_lock.
         */
        lockdep_assert_held(&kvm->mmu_lock);

        rcu_read_lock();

        if (prev_root)
                next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
                                                  &prev_root->link,
                                                  typeof(*prev_root), link);
        else
                next_root = list_first_or_null_rcu(&kvm->arch.tdp_mmu_roots,
                                                   typeof(*next_root), link);

        while (next_root) {
                if ((!only_valid || !next_root->role.invalid) &&
                    (types & (KVM_SHARED_ROOTS << is_private_sp(next_root))) &&
                    kvm_tdp_mmu_get_root(next_root))
                        break;

                next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
                                &next_root->link, typeof(*next_root), link);
        }

        rcu_read_unlock();

        if (prev_root)
                kvm_tdp_mmu_put_root(kvm, prev_root);

        return next_root;
}

#define __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _types)        \
        for (_root = tdp_mmu_next_root(_kvm, NULL, _types);                    \
             ({ lockdep_assert_held(&(_kvm)->mmu_lock); }), _root;             \
             _root = tdp_mmu_next_root(_kvm, _root, _types))                   \
                if (_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) {      \
                } else

#define for_each_valid_tdp_mmu_root_yield_safe(_kvm, _root, _as_id)            \
        __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, KVM_ANY_VALID_ROOT)

#define for_each_tdp_mmu_root_yield_safe(_kvm, _root)                          \
        for (_root = tdp_mmu_next_root(_kvm, NULL, KVM_ANY_ROOT);              \
             ({ lockdep_assert_held(&(_kvm)->mmu_lock); }), _root;             \
             _root = tdp_mmu_next_root(_kvm, _root, KVM_ANY_ROOT))
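
For reference, the root-selection test relies on the static_asserts above:
is_private_sp() evaluates to 0 or 1, so shifting KVM_SHARED_ROOTS (BIT(0)) by
it produces the bit describing the root's type (illustrative):

        /* is_private_sp(next_root) == 0 -> tests KVM_SHARED_ROOTS (BIT(0))  */
        /* is_private_sp(next_root) == 1 -> tests KVM_PRIVATE_ROOTS (BIT(1)) */
        bool requested = types & (KVM_SHARED_ROOTS << is_private_sp(next_root));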
Re: [PATCH v19 011/130] KVM: Add new members to struct kvm_gfn_range to operate on
On Fri, 2024-04-26 at 09:49 -0700, Sean Christopherson wrote:
>
> Heh, where's your bitmask abuse spirit?  It's a little evil (and by "evil" I mean
> awesome), but the need to process different roots is another good argument for an
> enum+bitmask.

Haha. There seems to be some special love for bit math in KVM. I was just
relaying my struggle to understand permission_fault() the other day.

>
> enum tdp_mmu_root_types {
>         KVM_SHARED_ROOTS = KVM_PROCESS_SHARED,
>         KVM_PRIVATE_ROOTS = KVM_PROCESS_PRIVATE,
>         KVM_VALID_ROOTS = BIT(2),
>         KVM_ANY_VALID_ROOT = KVM_SHARED_ROOT | KVM_PRIVATE_ROOT | KVM_VALID_ROOT,
>         KVM_ANY_ROOT = KVM_SHARED_ROOT | KVM_PRIVATE_ROOT,
> }
> static_assert(!(KVM_SHARED_ROOTS & KVM_VALID_ROOTS));
> static_assert(!(KVM_PRIVATE_ROOTS & KVM_VALID_ROOTS));
> static_assert(KVM_PRIVATE_ROOTS == (KVM_SHARED_ROOTS << 1));
>
> /*
>  * Returns the next root after @prev_root (or the first root if @prev_root is
>  * NULL).  A reference to the returned root is acquired, and the reference to
>  * @prev_root is released (the caller obviously must hold a reference to
>  * @prev_root if it's non-NULL).
>  *
>  * Returns NULL if the end of tdp_mmu_roots was reached.
>  */
> static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
>                                               struct kvm_mmu_page *prev_root,
>                                               enum tdp_mmu_root_types types)
> {
>         bool only_valid = types & KVM_VALID_ROOTS;
>         struct kvm_mmu_page *next_root;
>
>         /*
>          * While the roots themselves are RCU-protected, fields such as
>          * role.invalid are protected by mmu_lock.
>          */
>         lockdep_assert_held(&kvm->mmu_lock);
>
>         rcu_read_lock();
>
>         if (prev_root)
>                 next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
>                                                   &prev_root->link,
>                                                   typeof(*prev_root), link);
>         else
>                 next_root = list_first_or_null_rcu(&kvm->arch.tdp_mmu_roots,
>                                                    typeof(*next_root), link);
>
>         while (next_root) {
>                 if ((!only_valid || !next_root->role.invalid) &&
>                     (types & (KVM_SHARED_ROOTS << is_private_sp(root))) &&
>                     kvm_tdp_mmu_get_root(next_root))
>                         break;
>
>                 next_root = list_next_or_null_rcu(&kvm->arch.tdp_mmu_roots,
>                                 &next_root->link, typeof(*next_root), link);
>         }
>
>         rcu_read_unlock();
>
>         if (prev_root)
>                 kvm_tdp_mmu_put_root(kvm, prev_root);
>
>         return next_root;
> }
>
> #define __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _types)        \
>         for (_root = tdp_mmu_next_root(_kvm, NULL, _types);                    \
>              ({ lockdep_assert_held(&(_kvm)->mmu_lock); }), _root;             \
>              _root = tdp_mmu_next_root(_kvm, _root, _types))                   \
>                 if (_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) {      \
>                 } else
>
> #define for_each_valid_tdp_mmu_root_yield_safe(_kvm, _root, _as_id)            \
>         __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, KVM_ANY_VALID_ROOT)
>
> #define for_each_tdp_mmu_root_yield_safe(_kvm, _root)                          \
>         for (_root = tdp_mmu_next_root(_kvm, NULL, KVM_ANY_ROOT);              \
>              ({ lockdep_assert_held(&(_kvm)->mmu_lock); }), _root;             \
>              _root = tdp_mmu_next_root(_kvm, _root, KVM_ANY_ROOT))

Ohh, yes move it into the iterators. I like it a lot.
Re: [PATCH v19 011/130] KVM: Add new members to struct kvm_gfn_range to operate on
On Fri, Apr 26, 2024, Rick P Edgecombe wrote:
> On Fri, 2024-04-26 at 09:49 -0700, Sean Christopherson wrote:
> >
> > Heh, where's your bitmask abuse spirit?  It's a little evil (and by "evil"
> > I mean awesome), but the need to process different roots is another good
> > argument for an enum+bitmask.
>
> Haha. There seems to be some special love for bit math in KVM. I was just
> relaying my struggle to understand permission_fault() the other day.

LOL, you and everyone else that's ever looked at that code. Just wait until you
run into one of Paolo's decimal-based bitmasks.