[xen master] x86/hvm: Improve hvm_set_guest_pat() code generation again
commit 715b92ba30f792e326bdd37b5a4969da9c5d4a6c
Author: Edwin Török <edvin.torok@citrix.com>
AuthorDate: Mon May 16 20:45:13 2022 +0100
Commit: Andrew Cooper <andrew.cooper3@citrix.com>
CommitDate: Fri Mar 24 12:16:31 2023 +0000

x86/hvm: Improve hvm_set_guest_pat() code generation again

Following on from cset 9ce0a5e207f3 ("x86/hvm: Improve hvm_set_guest_pat()
code generation"), and the discovery that Clang/LLVM makes some especially
disastrous code generation for the loop at -O2

https://github.com/llvm/llvm-project/issues/54644

Edvin decided to remove the loop entirely by fully vectorising it. This is
substantially more efficient than the loop, and rather harder for a typical
compiler to mess up.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
xen/arch/x86/hvm/hvm.c | 51 ++++++++++++++++++++++++++++++++++----------------
1 file changed, 35 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0c81e2afc7..7342408233 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -299,24 +299,43 @@ void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat)
         *guest_pat = v->arch.hvm.pat_cr;
 }

-int hvm_set_guest_pat(struct vcpu *v, uint64_t guest_pat)
+/*
+ * MSR_PAT takes 8 uniform fields, each of which must be a valid architectural
+ * memory type (0-1, 4-7). This is a fully vectorised form of the 8-iteration
+ * loop over bytes looking for X86_MT_* constants.
+ */
+static bool pat_valid(uint64_t val)
 {
-    unsigned int i;
-    uint64_t tmp;
+    /* Yields a non-zero value in any lane which had value greater than 7. */
+    uint64_t any_gt_7 = val & 0xf8f8f8f8f8f8f8f8ull;

-    for ( i = 0, tmp = guest_pat; i < 8; i++, tmp >>= 8 )
-        switch ( tmp & 0xff )
-        {
-        case X86_MT_UCM:
-        case X86_MT_UC:
-        case X86_MT_WB:
-        case X86_MT_WC:
-        case X86_MT_WP:
-        case X86_MT_WT:
-            break;
-        default:
-            return 0;
-        }
+    /*
+     * With the > 7 case covered, identify lanes with the value 0-3 by finding
+     * lanes with bit 2 clear.
+     *
+     * Yields bit 2 set in each lane which has a value <= 3.
+     */
+    uint64_t any_le_3 = ~val & 0x0404040404040404ull;
+
+    /*
+     * Logically, any_2_or_3 is "any_le_3 && bit 1 set".
+     *
+     * We could calculate any_gt_1 as val & 0x02 and resolve the two vectors
+     * of booleans (shift one of them until the mask lines up, then bitwise
+     * and), but that is unnecessary calculation.
+     *
+     * Shift any_le_3 so it becomes bit 1 in each lane which has a value <= 3,
+     * and look for bit 1 in a subset of lanes.
+     */
+    uint64_t any_2_or_3 = val & (any_le_3 >> 1);
+
+    return !(any_gt_7 | any_2_or_3);
+}
+
+int hvm_set_guest_pat(struct vcpu *v, uint64_t guest_pat)
+{
+    if ( !pat_valid(guest_pat) )
+        return 0;

     if ( !alternative_call(hvm_funcs.set_guest_pat, v, guest_pat) )
         v->arch.hvm.pat_cr = guest_pat;
--
generated by git-patchbot for /home/xen/git/xen.git#master