Mailing List Archive

[PATCH 0/8] use struct pt_regs based syscall calling for x86-64
Ingo,


On top of all the patches which remove in-kernel calls to syscall functions
merged in commit 642e7fd23353, it now becomes easy for achitectures to
re-define the syscall calling convention. For x86, this may be used to
merely decode those entries from struct pt_regs which are needed for a
specific syscall.

This approach avoids leaking random user-provided register content down
the call chain. Therefore, the seventh patch of this series extends the
register clearing in the entry path to a few more registers.

To exemplify: sys_recv() is a classic 4-parameter syscall. For this syscall,
the DEFINE_SYSCALL macro creates the following stub:

asmlinkage long sys_recv(struct pt_regs *regs)
{
return SyS_recv(regs->di, regs->si, regs->dx, regs->r10);
}

The assembly of that function then becomes, in slightly reordered fashion:

<sys_recv>:
callq <__fentry__>

/* decode regs->di, ->si, ->dx and ->r10 */
mov 0x70(%rdi),%rdi
mov 0x68(%rdi),%rsi
mov 0x60(%rdi),%rdx
mov 0x38(%rdi),%rcx

[. SyS_recv() is inlined here by the compiler, as it is tiny ]
/* clear %r9 and %r8, the 5th and 6th args */
xor %r9d,%r9d
xor %r8d,%r8d

/* do the actual work */
callq __sys_recvfrom

/* cleanup and return */
cltq
retq

For IA32_EMULATION and X32, additional care needs to be taken as they use
different registers to pass parameters to syscalls; vsyscalls need to be
modified to use this new calling convention as well.

This actual conversion of x86 syscalls is heavily based on a proof-of-concept
by Linus[*]. This patchset here differs, for example, as it provides a generic
config symbol ARCH_HAS_SYSCALL_WRAPPER, introduces <asm/syscall_wrapper.h>,
splits up the patch into several parts, and adds the actual register clearing.

[*] Accessible at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git WIP-syscall
It contains an additional patch
x86: avoid per-cpu system call trampoline
which is not included in my series as it addresses a different
issue, but may be of interest to the x86 maintainers as well.

Compared to v4.16-rc5 baseline and on a random kernel config, these patches
(in combination with the large do-not-call-syscalls-in-the-kernel series)
lead to a minisculue increase in text (+0.005%) and data (+0.11%) size on a
pure 64bit system,

text data bss dec hex filename
18853337 9535476 938380 29327193 1bf7f59 vmlinux-orig
18854227 9546100 938380 29338707 1bfac53 vmlinux,

with IA32_EMULATION and X32 enabled, the situation is just a little bit worse
for text size (+0.009%) and data (+0.38%) size.

text data bss dec hex filename
18902496 9603676 938444 29444616 1c14a08 vmlinux-orig
18904136 9640604 938444 29483184 1c1e0b0 vmlinux.

The 64bit part of this series has worked flawlessly on my local system for a
few weeks. IA32_EMULATION and x32 has passed some basic testing as well, but
has not yet been tested as extensively as x86-64. Pure i386 kernels are left
as-is, as they use a different asmlinkage anyway.


Changes since the series sent out to linux-kernel on March 30th:

all patches:
- rebase on top of commit 642e7fd23353

several patches:
- further extend and fix commentary; spelling fixes (e.g., nospec, 64-bit,
32-bit)

patch 3:
- do not clobber regs->dx on sys_getcpu() vsyscall

patch 5:
- rename __sys32_ia32_*() stubs to __sys_ia32_*()
- do not generate __sys_ia32_*() syscall table entries automatically, but
have them explicitely in arch/x86/entry/syscalls/syscall_32.tbl
- this means that there is no need to redefine SYSCALL_DEFINE0
- rename compat_sys_*() to __compat_sys_ia32_*(), as the calling convention
is different to "generic" compat_sys_*() [but see below]

patch 8: (your call...)
- introduce new patch 8: rename sys_*() to __sys_x86_*() -- while this
avoids symbol space overlap per your request, it doesn't improve the
code readibility by much. Moreover, if other architectures switch to
this syscall calling convention, there is no real "default" calling
convention any more. Therefore, I'd suggest *NOT* to apply this patch.

Thanks,
Dominik

Dominik Brodowski (7):
syscalls: introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER
syscalls/x86: use struct pt_regs based syscall calling for 64-bit
syscalls
syscalls: prepare ARCH_HAS_SYSCALL_WRAPPER for compat syscalls
syscalls/x86: use struct pt_regs based syscall calling for
IA32_EMULATION and x32
syscalls/x86: unconditionally enable struct pt_regs based syscalls on
x86_64
x86/entry/64: extend register clearing on syscall entry to lower
registers
syscalls/x86: rename struct pt_regs-based sys_*() to __sys_x86_*()

Linus Torvalds (1):
x86: don't pointlessly reload the system call number

arch/x86/Kconfig | 1 +
arch/x86/entry/calling.h | 2 +
arch/x86/entry/common.c | 20 +-
arch/x86/entry/entry_64.S | 3 +-
arch/x86/entry/entry_64_compat.S | 6 +
arch/x86/entry/syscall_32.c | 15 +-
arch/x86/entry/syscall_64.c | 6 +-
arch/x86/entry/syscalls/syscall_32.tbl | 724 +++++++++++++++++----------------
arch/x86/entry/syscalls/syscall_64.tbl | 712 ++++++++++++++++----------------
arch/x86/entry/vsyscall/vsyscall_64.c | 18 +-
arch/x86/include/asm/syscall.h | 4 +
arch/x86/include/asm/syscall_wrapper.h | 197 +++++++++
arch/x86/include/asm/syscalls.h | 17 +-
include/linux/compat.h | 22 +
include/linux/syscalls.h | 25 +-
init/Kconfig | 10 +
kernel/sys_ni.c | 10 +
kernel/time/posix-stubs.c | 10 +
18 files changed, 1054 insertions(+), 748 deletions(-)
create mode 100644 arch/x86/include/asm/syscall_wrapper.h

--
2.16.3
Re: [PATCH 0/8] use struct pt_regs based syscall calling for x86-64 [ In reply to ]
* Dominik Brodowski <linux@dominikbrodowski.net> wrote:

> Dominik Brodowski (7):
> syscalls: introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER
> syscalls/x86: use struct pt_regs based syscall calling for 64-bit
> syscalls
> syscalls: prepare ARCH_HAS_SYSCALL_WRAPPER for compat syscalls
> syscalls/x86: use struct pt_regs based syscall calling for
> IA32_EMULATION and x32
> syscalls/x86: unconditionally enable struct pt_regs based syscalls on
> x86_64
> x86/entry/64: extend register clearing on syscall entry to lower
> registers
> syscalls/x86: rename struct pt_regs-based sys_*() to __sys_x86_*()
>
> Linus Torvalds (1):
> x86: don't pointlessly reload the system call number
>
> arch/x86/Kconfig | 1 +
> arch/x86/entry/calling.h | 2 +
> arch/x86/entry/common.c | 20 +-
> arch/x86/entry/entry_64.S | 3 +-
> arch/x86/entry/entry_64_compat.S | 6 +
> arch/x86/entry/syscall_32.c | 15 +-
> arch/x86/entry/syscall_64.c | 6 +-
> arch/x86/entry/syscalls/syscall_32.tbl | 724 +++++++++++++++++----------------
> arch/x86/entry/syscalls/syscall_64.tbl | 712 ++++++++++++++++----------------
> arch/x86/entry/vsyscall/vsyscall_64.c | 18 +-
> arch/x86/include/asm/syscall.h | 4 +
> arch/x86/include/asm/syscall_wrapper.h | 197 +++++++++
> arch/x86/include/asm/syscalls.h | 17 +-
> include/linux/compat.h | 22 +
> include/linux/syscalls.h | 25 +-
> init/Kconfig | 10 +
> kernel/sys_ni.c | 10 +
> kernel/time/posix-stubs.c | 10 +
> 18 files changed, 1054 insertions(+), 748 deletions(-)
> create mode 100644 arch/x86/include/asm/syscall_wrapper.h

Ok, this series looks mostly good to me, but AFAICS this breaks the UML build:

make[2]: *** No rule to make target 'archheaders'. Stop.
arch/um/Makefile:119: recipe for target 'archheaders' failed
make[1]: *** [archheaders] Error 2
make[1]: *** Waiting for unfinished jobs....

Thanks,

Ingo
Re: [PATCH 0/8] use struct pt_regs based syscall calling for x86-64 [ In reply to ]
On Thu, Apr 05, 2018 at 05:19:33PM +0200, Ingo Molnar wrote:
> Ok, this series looks mostly good to me, but AFAICS this breaks the UML build:
>
> make[2]: *** No rule to make target 'archheaders'. Stop.
> arch/um/Makefile:119: recipe for target 'archheaders' failed
> make[1]: *** [archheaders] Error 2
> make[1]: *** Waiting for unfinished jobs....

Ah, that's caused by patch 8/8 which I did and do not like all that much
anyway: UML re-uses syscall_64.tbl which now has x86-specific entries like
__sys_x86_pread64, but expects the generic syscall stub sys_pread64
referenced there. Fixup patch below; could be folded with patch 8/8. Or
patch 8/8 could simply be dropped from the series altogether...

Thanks,
Dominik

--------------------------------------------------------------------------
From f5049ea4e1e5e7751e72a22cbc1b3a9389959a04 Mon Sep 17 00:00:00 2001
From: Dominik Brodowski <linux@dominikbrodowski.net>
Date: Thu, 5 Apr 2018 22:16:23 +0200
Subject: [PATCH] syscalls/x86: fix UML syscall table

To differentiate the different calling regime used on x86, the syscall
functions received __sys_x86_ as prefix (instead of sys_). This breaks
the build of UML on 64-bit x86, as it re-uses the same syscall table.

To fix this, replace the __sys_x86_ prefix with sys_ during the generation
of <asm/syscalls_64.h>.

Fixes: syscalls/x86: rename struct pt_regs-based sys_*() to __sys_x86_*()
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: x86@kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>

diff --git a/arch/x86/entry/syscalls/syscalltbl.sh b/arch/x86/entry/syscalls/syscalltbl.sh
index d71ef4bd3615..d6376b87ecb2 100644
--- a/arch/x86/entry/syscalls/syscalltbl.sh
+++ b/arch/x86/entry/syscalls/syscalltbl.sh
@@ -20,17 +20,47 @@ syscall_macro() {
echo "__SYSCALL_${abi}($nr, $real_entry, $qualifier)"
}

-emit() {
+emit64() {
abi="$1"
nr="$2"
entry="$3"
compat="$4"

- if [ "$abi" = "64" -a -n "$compat" ]; then
+ if [ "$abi" = "I386" ]; then
+ echo "emit64() called for a 32-bit syscall" >&2
+ exit 1
+ fi
+
+ if [ -n "$compat" ]; then
echo "a compat entry for a 64-bit syscall makes no sense" >&2
exit 1
fi

+ if [ -n "$entry" ]; then
+ # For CONFIG_UML, we need to strip the __sys_x86 prefix
+ if [ "${entry}" = "${entry#__compat_sys}" ]; then
+ umlentry="sys${entry#__sys_x86}"
+ fi
+
+ echo "#ifdef CONFIG_X86"
+ syscall_macro "$abi" "$nr" "$entry"
+ echo "#else /* CONFIG_UML */"
+ syscall_macro "$abi" "$nr" "$umlentry"
+ echo "#endif"
+ fi
+}
+
+emit32() {
+ abi="$1"
+ nr="$2"
+ entry="$3"
+ compat="$4"
+
+ if [ "$abi" = "64" ]; then
+ echo "emit32() called for a 64-bit syscall" >&2
+ exit 1
+ fi
+
if [ -z "$compat" ]; then
if [ -n "$entry" ]; then
syscall_macro "$abi" "$nr" "$entry"
@@ -53,14 +83,14 @@ grep '^[0-9]' "$in" | sort -n | (
# COMMON is the same as 64, except that we don't expect X32
# programs to use it. Our expectation has nothing to do with
# any generated code, so treat them the same.
- emit 64 "$nr" "$entry" "$compat"
+ emit64 64 "$nr" "$entry" "$compat"
elif [ "$abi" = "X32" ]; then
# X32 is equivalent to 64 on an X32-compatible kernel.
echo "#ifdef CONFIG_X86_X32_ABI"
- emit 64 "$nr" "$entry" "$compat"
+ emit64 64 "$nr" "$entry" "$compat"
echo "#endif"
elif [ "$abi" = "I386" ]; then
- emit "$abi" "$nr" "$entry" "$compat"
+ emit32 "$abi" "$nr" "$entry" "$compat"
else
echo "Unknown abi $abi" >&2
exit 1
Re: [PATCH 0/8] use struct pt_regs based syscall calling for x86-64 [ In reply to ]
On Fri, Apr 06, 2018 at 10:23:22AM +0200, Ingo Molnar wrote:
>
> * Dominik Brodowski <linux@dominikbrodowski.net> wrote:
>
> > On Thu, Apr 05, 2018 at 05:19:33PM +0200, Ingo Molnar wrote:
> > > Ok, this series looks mostly good to me, but AFAICS this breaks the UML build:
> > >
> > > make[2]: *** No rule to make target 'archheaders'. Stop.
> > > arch/um/Makefile:119: recipe for target 'archheaders' failed
> > > make[1]: *** [archheaders] Error 2
> > > make[1]: *** Waiting for unfinished jobs....
> >
> > Ah, that's caused by patch 8/8 which I did and do not like all that much
> > anyway: UML re-uses syscall_64.tbl which now has x86-specific entries like
> > __sys_x86_pread64, but expects the generic syscall stub sys_pread64
> > referenced there. Fixup patch below; could be folded with patch 8/8. Or
> > patch 8/8 could simply be dropped from the series altogether...
>
> I still like the 'truth in advertising' aspect. For example if I see this in the
> syscall table:
>
> 10 common mprotect __sys_x86_mprotect
>
> I can immediately find the _real_ syscall entry point:
>
> ffffffff81180a10 <__sys_x86_mprotect>:
> ffffffff81180a10: 48 8b 57 60 mov 0x60(%rdi),%rdx
> ffffffff81180a14: 48 8b 77 68 mov 0x68(%rdi),%rsi
> ffffffff81180a18: b9 ff ff ff ff mov $0xffffffff,%ecx
> ffffffff81180a1d: 48 8b 7f 70 mov 0x70(%rdi),%rdi
> ffffffff81180a21: e8 fa fc ff ff callq ffffffff81180720 <do_mprotect_pkey>
> ffffffff81180a26: 48 98 cltq
> ffffffff81180a28: c3 retq
> ffffffff81180a29: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
>
> If, on the other hand, I see this entry:
>
> 10 common mprotect sys_mprotect
>
> Then, as a first step, no symbol anywhere matches with this:
>
> triton:~/tip> grep sys_mprotect System.map
> triton:~/tip>
>
> "sys_mprotect" does not exist in any easily discoverable sense. You have to *know*
> to replace the sys_ prefix with __sys_x86_ to find it.
>
> Now arguably we could use a __sys_ prefix instead of the grep-barrier __sys_x86
> prefix - but that too would be somewhat confusing I think.
>
> I mean, the fact that we are passing in a ptregs pointer is a complexity of the
> x86 kernel that *exists*, why hide it and make it harder to discover what's
> happening, for something as important as system calls?
>
> In terms of UML breakage, UML arguably is tightly coupled to its host
> architecture:
>
> > Subject: [PATCH] syscalls/x86: fix UML syscall table
>
> Even with your patch applied I still see build failures:
>
> $ make ARCH=um defconfig
> $ make ARCH=um linux
> ...
> arch/um/os-Linux/signal.c: In function ‘hard_handler’:
> arch/um/os-Linux/signal.c:163:22: error: dereferencing pointer to incomplete type
> ‘struct ucontext’
> mcontext_t *mc = &uc->uc_mcontext;
> ^~
> scripts/Makefile.build:324: recipe for target 'arch/um/os-Linux/signal.o' failed
> make[1]: *** [arch/um/os-Linux/signal.o] Error 1

That build failure is already present in mainline as of 38c23685b273
(when building on Arch / gcc-7.3.1; building on Debian oldstable / gcc-4.9
works fine). And -- just checked -- this build failure also exists for
plain v4.16.

Thanks,
Dominik
Re: [PATCH 0/8] use struct pt_regs based syscall calling for x86-64 [ In reply to ]
* Dominik Brodowski <linux@dominikbrodowski.net> wrote:

> On Thu, Apr 05, 2018 at 05:19:33PM +0200, Ingo Molnar wrote:
> > Ok, this series looks mostly good to me, but AFAICS this breaks the UML build:
> >
> > make[2]: *** No rule to make target 'archheaders'. Stop.
> > arch/um/Makefile:119: recipe for target 'archheaders' failed
> > make[1]: *** [archheaders] Error 2
> > make[1]: *** Waiting for unfinished jobs....
>
> Ah, that's caused by patch 8/8 which I did and do not like all that much
> anyway: UML re-uses syscall_64.tbl which now has x86-specific entries like
> __sys_x86_pread64, but expects the generic syscall stub sys_pread64
> referenced there. Fixup patch below; could be folded with patch 8/8. Or
> patch 8/8 could simply be dropped from the series altogether...

I still like the 'truth in advertising' aspect. For example if I see this in the
syscall table:

10 common mprotect __sys_x86_mprotect

I can immediately find the _real_ syscall entry point:

ffffffff81180a10 <__sys_x86_mprotect>:
ffffffff81180a10: 48 8b 57 60 mov 0x60(%rdi),%rdx
ffffffff81180a14: 48 8b 77 68 mov 0x68(%rdi),%rsi
ffffffff81180a18: b9 ff ff ff ff mov $0xffffffff,%ecx
ffffffff81180a1d: 48 8b 7f 70 mov 0x70(%rdi),%rdi
ffffffff81180a21: e8 fa fc ff ff callq ffffffff81180720 <do_mprotect_pkey>
ffffffff81180a26: 48 98 cltq
ffffffff81180a28: c3 retq
ffffffff81180a29: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)

If, on the other hand, I see this entry:

10 common mprotect sys_mprotect

Then, as a first step, no symbol anywhere matches with this:

triton:~/tip> grep sys_mprotect System.map
triton:~/tip>

"sys_mprotect" does not exist in any easily discoverable sense. You have to *know*
to replace the sys_ prefix with __sys_x86_ to find it.

Now arguably we could use a __sys_ prefix instead of the grep-barrier __sys_x86
prefix - but that too would be somewhat confusing I think.

I mean, the fact that we are passing in a ptregs pointer is a complexity of the
x86 kernel that *exists*, why hide it and make it harder to discover what's
happening, for something as important as system calls?

In terms of UML breakage, UML arguably is tightly coupled to its host
architecture:

> Subject: [PATCH] syscalls/x86: fix UML syscall table

Even with your patch applied I still see build failures:

$ make ARCH=um defconfig
$ make ARCH=um linux
...
arch/um/os-Linux/signal.c: In function ‘hard_handler’:
arch/um/os-Linux/signal.c:163:22: error: dereferencing pointer to incomplete type
‘struct ucontext’
mcontext_t *mc = &uc->uc_mcontext;
^~
scripts/Makefile.build:324: recipe for target 'arch/um/os-Linux/signal.o' failed
make[1]: *** [arch/um/os-Linux/signal.o] Error 1

Thanks,

Ingo
Re: [PATCH 0/8] use struct pt_regs based syscall calling for x86-64 [ In reply to ]
On Fri, Apr 06, 2018 at 10:23:22AM +0200, Ingo Molnar wrote:
>
> * Dominik Brodowski <linux@dominikbrodowski.net> wrote:
>
> > On Thu, Apr 05, 2018 at 05:19:33PM +0200, Ingo Molnar wrote:
> > > Ok, this series looks mostly good to me, but AFAICS this breaks the UML build:
> > >
> > > make[2]: *** No rule to make target 'archheaders'. Stop.
> > > arch/um/Makefile:119: recipe for target 'archheaders' failed
> > > make[1]: *** [archheaders] Error 2
> > > make[1]: *** Waiting for unfinished jobs....
> >
> > Ah, that's caused by patch 8/8 which I did and do not like all that much
> > anyway: UML re-uses syscall_64.tbl which now has x86-specific entries like
> > __sys_x86_pread64, but expects the generic syscall stub sys_pread64
> > referenced there. Fixup patch below; could be folded with patch 8/8. Or
> > patch 8/8 could simply be dropped from the series altogether...
>
> I still like the 'truth in advertising' aspect. For example if I see this in the
> syscall table:
>
> 10 common mprotect __sys_x86_mprotect
>
> I can immediately find the _real_ syscall entry point:
>
> ffffffff81180a10 <__sys_x86_mprotect>:
> ffffffff81180a10: 48 8b 57 60 mov 0x60(%rdi),%rdx
> ffffffff81180a14: 48 8b 77 68 mov 0x68(%rdi),%rsi
> ffffffff81180a18: b9 ff ff ff ff mov $0xffffffff,%ecx
> ffffffff81180a1d: 48 8b 7f 70 mov 0x70(%rdi),%rdi
> ffffffff81180a21: e8 fa fc ff ff callq ffffffff81180720 <do_mprotect_pkey>
> ffffffff81180a26: 48 98 cltq
> ffffffff81180a28: c3 retq
> ffffffff81180a29: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
>
> If, on the other hand, I see this entry:
>
> 10 common mprotect sys_mprotect
>
> Then, as a first step, no symbol anywhere matches with this:
>
> triton:~/tip> grep sys_mprotect System.map
> triton:~/tip>
>
> "sys_mprotect" does not exist in any easily discoverable sense. You have to *know*
> to replace the sys_ prefix with __sys_x86_ to find it.
>
> Now arguably we could use a __sys_ prefix instead of the grep-barrier __sys_x86
> prefix - but that too would be somewhat confusing I think.

Well, if looking at the ARCH="um" kernel, you won't find the
__sys_x86_mprotect there in its System.map -- so we either have to
disentangle um and plain x86, or live with some cause for confusion.

__sys_mprotect as prefix won't work by the way, as the double-underscore
__sys_ variant is already used in net/* for internal syscall helpers.

Thanks,
Dominik
Re: [PATCH 0/8] use struct pt_regs based syscall calling for x86-64 [ In reply to ]
On Fri, Apr 06, 2018 at 11:20:46AM +0200, Ingo Molnar wrote:
>
> * Dominik Brodowski <linux@dominikbrodowski.net> wrote:
>
> > On Fri, Apr 06, 2018 at 10:23:22AM +0200, Ingo Molnar wrote:
> > >
> > > * Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> > >
> > > > On Thu, Apr 05, 2018 at 05:19:33PM +0200, Ingo Molnar wrote:
> > > > > Ok, this series looks mostly good to me, but AFAICS this breaks the UML build:
> > > > >
> > > > > make[2]: *** No rule to make target 'archheaders'. Stop.
> > > > > arch/um/Makefile:119: recipe for target 'archheaders' failed
> > > > > make[1]: *** [archheaders] Error 2
> > > > > make[1]: *** Waiting for unfinished jobs....
> > > >
> > > > Ah, that's caused by patch 8/8 which I did and do not like all that much
> > > > anyway: UML re-uses syscall_64.tbl which now has x86-specific entries like
> > > > __sys_x86_pread64, but expects the generic syscall stub sys_pread64
> > > > referenced there. Fixup patch below; could be folded with patch 8/8. Or
> > > > patch 8/8 could simply be dropped from the series altogether...
> > >
> > > I still like the 'truth in advertising' aspect. For example if I see this in the
> > > syscall table:
> > >
> > > 10 common mprotect __sys_x86_mprotect
> > >
> > > I can immediately find the _real_ syscall entry point:
> > >
> > > ffffffff81180a10 <__sys_x86_mprotect>:
> > > ffffffff81180a10: 48 8b 57 60 mov 0x60(%rdi),%rdx
> > > ffffffff81180a14: 48 8b 77 68 mov 0x68(%rdi),%rsi
> > > ffffffff81180a18: b9 ff ff ff ff mov $0xffffffff,%ecx
> > > ffffffff81180a1d: 48 8b 7f 70 mov 0x70(%rdi),%rdi
> > > ffffffff81180a21: e8 fa fc ff ff callq ffffffff81180720 <do_mprotect_pkey>
> > > ffffffff81180a26: 48 98 cltq
> > > ffffffff81180a28: c3 retq
> > > ffffffff81180a29: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
> > >
> > > If, on the other hand, I see this entry:
> > >
> > > 10 common mprotect sys_mprotect
> > >
> > > Then, as a first step, no symbol anywhere matches with this:
> > >
> > > triton:~/tip> grep sys_mprotect System.map
> > > triton:~/tip>
> > >
> > > "sys_mprotect" does not exist in any easily discoverable sense. You have to *know*
> > > to replace the sys_ prefix with __sys_x86_ to find it.
> > >
> > > Now arguably we could use a __sys_ prefix instead of the grep-barrier __sys_x86
> > > prefix - but that too would be somewhat confusing I think.
> >
> > Well, if looking at the ARCH="um" kernel, you won't find the __sys_x86_mprotect
> > there in its System.map -- so we either have to disentangle um and plain x86, or
> > live with some cause for confusion.
>
> I'm primarily concerned about everything making sense on x86 - UML is an entirely
> separate architecture with heavy tradeoffs and kludges.

Agreed.

> > __sys_mprotect as prefix won't work by the way, as the double-underscore __sys_
> > variant is already used in net/* for internal syscall helpers.
>
> Ok - then triple underscore - but overall I think it's more confusing.
>
> Btw., what was the problem with calling the x86 ptregs wrapper sys_mprotect?
>
> The only reason I suggested the __sys_x86_ prefix was because you originally
> suggested that there's symbol name overlap, but I don't think that's the case
> within the same kernel build, as the regular non-ptregs prototype:

Indeed, there's no symbol name overlap within the same kernel build, but
technically different stubs named the same. If that's fine, just drop patch
8/8 (including the UML fixup) and things should be fine, with the stub and
the entry in the syscall table both named sys_mprotect.

For IA32_EMULATION, we have __sys_ia32_mprotect as stub for the same
syscall, including this name as entry in syscall_32.tbl.

More problematic is the naming for the compat stubs for IA32_EMAULATION and
X32, where we have

__compat_sys_ia32_waitid
__compat_sys_x32_waitid

for example. We *could* rename one of those to compat_sys_waitid() and levae
the other as-is, but actually I prefer it now how it is.

Thanks,
Dominik
Re: [PATCH 0/8] use struct pt_regs based syscall calling for x86-64 [ In reply to ]
* Dominik Brodowski <linux@dominikbrodowski.net> wrote:

> On Fri, Apr 06, 2018 at 10:23:22AM +0200, Ingo Molnar wrote:
> >
> > * Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> >
> > > On Thu, Apr 05, 2018 at 05:19:33PM +0200, Ingo Molnar wrote:
> > > > Ok, this series looks mostly good to me, but AFAICS this breaks the UML build:
> > > >
> > > > make[2]: *** No rule to make target 'archheaders'. Stop.
> > > > arch/um/Makefile:119: recipe for target 'archheaders' failed
> > > > make[1]: *** [archheaders] Error 2
> > > > make[1]: *** Waiting for unfinished jobs....
> > >
> > > Ah, that's caused by patch 8/8 which I did and do not like all that much
> > > anyway: UML re-uses syscall_64.tbl which now has x86-specific entries like
> > > __sys_x86_pread64, but expects the generic syscall stub sys_pread64
> > > referenced there. Fixup patch below; could be folded with patch 8/8. Or
> > > patch 8/8 could simply be dropped from the series altogether...
> >
> > I still like the 'truth in advertising' aspect. For example if I see this in the
> > syscall table:
> >
> > 10 common mprotect __sys_x86_mprotect
> >
> > I can immediately find the _real_ syscall entry point:
> >
> > ffffffff81180a10 <__sys_x86_mprotect>:
> > ffffffff81180a10: 48 8b 57 60 mov 0x60(%rdi),%rdx
> > ffffffff81180a14: 48 8b 77 68 mov 0x68(%rdi),%rsi
> > ffffffff81180a18: b9 ff ff ff ff mov $0xffffffff,%ecx
> > ffffffff81180a1d: 48 8b 7f 70 mov 0x70(%rdi),%rdi
> > ffffffff81180a21: e8 fa fc ff ff callq ffffffff81180720 <do_mprotect_pkey>
> > ffffffff81180a26: 48 98 cltq
> > ffffffff81180a28: c3 retq
> > ffffffff81180a29: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
> >
> > If, on the other hand, I see this entry:
> >
> > 10 common mprotect sys_mprotect
> >
> > Then, as a first step, no symbol anywhere matches with this:
> >
> > triton:~/tip> grep sys_mprotect System.map
> > triton:~/tip>
> >
> > "sys_mprotect" does not exist in any easily discoverable sense. You have to *know*
> > to replace the sys_ prefix with __sys_x86_ to find it.
> >
> > Now arguably we could use a __sys_ prefix instead of the grep-barrier __sys_x86
> > prefix - but that too would be somewhat confusing I think.
>
> Well, if looking at the ARCH="um" kernel, you won't find the __sys_x86_mprotect
> there in its System.map -- so we either have to disentangle um and plain x86, or
> live with some cause for confusion.

I'm primarily concerned about everything making sense on x86 - UML is an entirely
separate architecture with heavy tradeoffs and kludges.

> __sys_mprotect as prefix won't work by the way, as the double-underscore __sys_
> variant is already used in net/* for internal syscall helpers.

Ok - then triple underscore - but overall I think it's more confusing.

Btw., what was the problem with calling the x86 ptregs wrapper sys_mprotect?

The only reason I suggested the __sys_x86_ prefix was because you originally
suggested that there's symbol name overlap, but I don't think that's the case
within the same kernel build, as the regular non-ptregs prototype:

asmlinkage long sys_mprotect(unsigned long start, size_t len, unsigned long prot);

... will only exist on !CONFIG_ARCH_HAS_SYSCALL_WRAPPER kernels.

So maybe that's the simplest and least confusing solution.

Thanks,

Ingo
Re: [PATCH 0/8] use struct pt_regs based syscall calling for x86-64 [ In reply to ]
* Dominik Brodowski <linux@dominikbrodowski.net> wrote:

> > > __sys_mprotect as prefix won't work by the way, as the double-underscore __sys_
> > > variant is already used in net/* for internal syscall helpers.
> >
> > Ok - then triple underscore - but overall I think it's more confusing.
> >
> > Btw., what was the problem with calling the x86 ptregs wrapper sys_mprotect?
> >
> > The only reason I suggested the __sys_x86_ prefix was because you originally
> > suggested that there's symbol name overlap, but I don't think that's the case
> > within the same kernel build, as the regular non-ptregs prototype:
>
> Indeed, there's no symbol name overlap within the same kernel build, but
> technically different stubs named the same. If that's fine, just drop patch
> 8/8 (including the UML fixup) and things should be fine, with the stub and
> the entry in the syscall table both named sys_mprotect.

Ok, I've dropped patch #8.

> For IA32_EMULATION, we have __sys_ia32_mprotect as stub for the same
> syscall, including this name as entry in syscall_32.tbl.
>
> More problematic is the naming for the compat stubs for IA32_EMAULATION and
> X32, where we have
>
> __compat_sys_ia32_waitid
> __compat_sys_x32_waitid
>
> for example. We *could* rename one of those to compat_sys_waitid() and levae
> the other as-is, but actually I prefer it now how it is.

yeah, this is more symmetric I think.

So right now we have these symbols:

triton:~/tip> grep waitid System.map

ffffffff8105f1e0 t kernel_waitid # common C function (64-bit kargs)
ffffffff8105f2b0 t SYSC_waitid # 64-bit uaddr args C function 352 bytes
ffffffff8105f410 T sys_waitid # 64-bit-ptregs -> C stub, 32 bytes
ffffffff8105f430 T __sys_ia32_waitid # 32-bit-ptregs -> C stub, 32 bytes
ffffffff8105f450 t C_SYSC_waitid # 32-bit uaddr args C function, 400 bytes
ffffffff8105f5e0 T __compat_sys_ia32_waitid # 32-bit-ptregs -> C stub, 32 bytes
ffffffff8105f600 T __compat_sys_x32_waitid # 64-bit-ptregs -> C stub, 32 bytes

BTW., what is the role of generating __sys_ia32_waitid()? It seems unused when a
syscall has a compat variant otherwise - like here.

Naming wise the odd thumb out is sys_waitid :-/

I'd argue that we should at minimum name it __sys_waitid:

ffffffff8105f1e0 t kernel_waitid # common C function (64-bit kargs)
ffffffff8105f2b0 t SYSC_waitid # 64-bit uaddr args C function
ffffffff8105f410 T __sys_waitid # 64-bit-ptregs -> C stub
ffffffff8105f430 T __sys_ia32_waitid # 32-bit-ptregs -> C stub
ffffffff8105f450 t C_SYSC_waitid # 32-bit uaddr args C function
ffffffff8105f5e0 T __compat_sys_ia32_waitid # 32-bit-ptregs -> C stub
ffffffff8105f600 T __compat_sys_x32_waitid # 64-bit-ptregs -> C stub

because that makes it all organized based on the same principle:

{__compat|_}_sys{_ia32|_x32|}_waittid

But arguably there are a whole lot more naming weirdnesses we could fix:

- I find it somewhat confusing that that 'C' in C_SYSC stands not for a C callign
convention, but for 'COMPAT' - i.e. COMPAT_SYSC would be better.

- Another detail is that why is it called 'SYSC', if the other functions use the
'sys' prefix? Wouldn't 'SYS' be more consistent?

- If we introduced a prefix for the regular 64-bit system call format as well,
we could have: x64, x32 and ia32.

- And finally, I think the argument format modifiers should be consistently
prefixes - right now they are a mixture of pre- and post-fixes.

I.e. I'd generate the names like this:

__{x64,x32,ia32}[_compat]_sys_waittid()

The fully consistent nomenclature would be someting like this:

ffffffff8105f1e0 t kernel_waitid # common C function (64-bit kargs)
ffffffff8105f2b0 t SYS_waitid # 64-bit uaddr args C function
ffffffff8105f410 T __x64_sys_waitid # 64-bit-ptregs -> C stub
ffffffff8105f430 T __ia32_sys_waitid # 32-bit-ptregs -> C stub
ffffffff8105f450 t COMPAT_SYS_waitid # 32-bit uaddr args C function
ffffffff8105f5e0 T __ia32_compat_sys_waitid # 32-bit-ptregs -> C stub
ffffffff8105f600 T __x32_compat_sys_waitid # 64-bit-ptregs -> C stub

Looks a lot tidier and a lot more logical, doesn't it?

Makes grepping easier as well, because (case-insensitive) patterns like
'sys_waittid' would identify all the variants.

Personally I'd also do a s/ia32/i32 rename:

ffffffff8105f1e0 t kernel_waitid # common C function (64-bit kargs)
ffffffff8105f2b0 t SYS_waitid # 64-bit uaddr args C function
ffffffff8105f410 T __x64_sys_waitid # 64-bit-ptregs -> C stub
ffffffff8105f430 T __i32_sys_waitid # 32-bit-ptregs -> C stub
ffffffff8105f450 t COMPAT_SYS_waitid # 32-bit uaddr args C function
ffffffff8105f5e0 T __i32_compat_sys_waitid # 32-bit-ptregs -> C stub
ffffffff8105f600 T __x32_compat_sys_waitid # 64-bit-ptregs -> C stub

... but maybe that's too much ;-)

Thanks,

Ingo
Re: [PATCH 0/8] use struct pt_regs based syscall calling for x86-64 [ In reply to ]
On Fri, Apr 06, 2018 at 02:34:20PM +0200, Ingo Molnar wrote:
>
> * Dominik Brodowski <linux@dominikbrodowski.net> wrote:
>
> > > > __sys_mprotect as prefix won't work by the way, as the double-underscore __sys_
> > > > variant is already used in net/* for internal syscall helpers.
> > >
> > > Ok - then triple underscore - but overall I think it's more confusing.
> > >
> > > Btw., what was the problem with calling the x86 ptregs wrapper sys_mprotect?
> > >
> > > The only reason I suggested the __sys_x86_ prefix was because you originally
> > > suggested that there's symbol name overlap, but I don't think that's the case
> > > within the same kernel build, as the regular non-ptregs prototype:
> >
> > Indeed, there's no symbol name overlap within the same kernel build, but
> > technically different stubs named the same. If that's fine, just drop patch
> > 8/8 (including the UML fixup) and things should be fine, with the stub and
> > the entry in the syscall table both named sys_mprotect.
>
> Ok, I've dropped patch #8.

Thanks!

> > For IA32_EMULATION, we have __sys_ia32_mprotect as stub for the same
> > syscall, including this name as entry in syscall_32.tbl.
> >
> > More problematic is the naming for the compat stubs for IA32_EMAULATION and
> > X32, where we have
> >
> > __compat_sys_ia32_waitid
> > __compat_sys_x32_waitid
> >
> > for example. We *could* rename one of those to compat_sys_waitid() and levae
> > the other as-is, but actually I prefer it now how it is.
>
> yeah, this is more symmetric I think.
>
> So right now we have these symbols:
>
> triton:~/tip> grep waitid System.map
>
> ffffffff8105f1e0 t kernel_waitid # common C function (64-bit kargs)
> ffffffff8105f2b0 t SYSC_waitid # 64-bit uaddr args C function 352 bytes
> ffffffff8105f410 T sys_waitid # 64-bit-ptregs -> C stub, 32 bytes
> ffffffff8105f430 T __sys_ia32_waitid # 32-bit-ptregs -> C stub, 32 bytes
> ffffffff8105f450 t C_SYSC_waitid # 32-bit uaddr args C function, 400 bytes
> ffffffff8105f5e0 T __compat_sys_ia32_waitid # 32-bit-ptregs -> C stub, 32 bytes
> ffffffff8105f600 T __compat_sys_x32_waitid # 64-bit-ptregs -> C stub, 32 bytes
>
> BTW., what is the role of generating __sys_ia32_waitid()? It seems unused when a
> syscall has a compat variant otherwise - like here.

Indeed it is unused -- but when compiling

SYSCALL_DEFINE5(waitid, ...)

we don't know yet that somewhere else in the kernel there is

COMPAT_SYSCALL_DEFINE5(waitid, ...)

So if we want to compile the __sys_ia32_ stub only iff it is used, we'd
need to have different macros for syscalls which have compat variants and
for those which do not. Or do you have an idea how we could trick the
compiler to do the right thing?

> Naming wise the odd thumb out is sys_waitid :-/
>
> I'd argue that we should at minimum name it __sys_waitid:

Can't do, as that namespace is already taken.

> ffffffff8105f1e0 t kernel_waitid # common C function (64-bit kargs)
> ffffffff8105f2b0 t SYSC_waitid # 64-bit uaddr args C function
> ffffffff8105f410 T __sys_waitid # 64-bit-ptregs -> C stub
> ffffffff8105f430 T __sys_ia32_waitid # 32-bit-ptregs -> C stub
> ffffffff8105f450 t C_SYSC_waitid # 32-bit uaddr args C function
> ffffffff8105f5e0 T __compat_sys_ia32_waitid # 32-bit-ptregs -> C stub
> ffffffff8105f600 T __compat_sys_x32_waitid # 64-bit-ptregs -> C stub
>
> because that makes it all organized based on the same principle:
>
> {__compat|_}_sys{_ia32|_x32|}_waittid
>
> But arguably there are a whole lot more naming weirdnesses we could fix:
>
> - I find it somewhat confusing that that 'C' in C_SYSC stands not for a C callign
> convention, but for 'COMPAT' - i.e. COMPAT_SYSC would be better.
>
> - Another detail is that why is it called 'SYSC', if the other functions use the
> 'sys' prefix? Wouldn't 'SYS' be more consistent?

It sounds totally reasonable to re-name those, but we should do it not only
for the x86 case, but in include/linux/syscalls.h as well.

> - If we introduced a prefix for the regular 64-bit system call format as well,
> we could have: x64, x32 and ia32.
>
> - And finally, I think the argument format modifiers should be consistently
> prefixes - right now they are a mixture of pre- and post-fixes.
>
> I.e. I'd generate the names like this:
>
> __{x64,x32,ia32}[_compat]_sys_waittid()
>
> The fully consistent nomenclature would be someting like this:
>
> ffffffff8105f1e0 t kernel_waitid # common C function (64-bit kargs)
> ffffffff8105f2b0 t SYS_waitid # 64-bit uaddr args C function
> ffffffff8105f410 T __x64_sys_waitid # 64-bit-ptregs -> C stub
> ffffffff8105f430 T __ia32_sys_waitid # 32-bit-ptregs -> C stub
> ffffffff8105f450 t COMPAT_SYS_waitid # 32-bit uaddr args C function
> ffffffff8105f5e0 T __ia32_compat_sys_waitid # 32-bit-ptregs -> C stub
> ffffffff8105f600 T __x32_compat_sys_waitid # 64-bit-ptregs -> C stub
>
> Looks a lot tidier and a lot more logical, doesn't it?

Indeed. Want me to prepare a new patch 8/8 on top which does the renaming
(for x86 and for the generic case), or will you do the re-naming while
merging my patches yourself?

> Personally I'd also do a s/ia32/i32 rename:

Uh, no, please. It makes things align more beautifully, yes, but the
32-bit sub-architecture commonly is referred to as either i386 or ia32...

Thanks,
Dominik
Re: [PATCH 0/8] use struct pt_regs based syscall calling for x86-64 [ In reply to ]
* Dominik Brodowski <linux@dominikbrodowski.net> wrote:

> > I.e. I'd generate the names like this:
> >
> > __{x64,x32,ia32}[_compat]_sys_waittid()
> >
> > The fully consistent nomenclature would be someting like this:
> >
> > ffffffff8105f1e0 t kernel_waitid # common C function (64-bit kargs)
> > ffffffff8105f2b0 t SYS_waitid # 64-bit uaddr args C function
> > ffffffff8105f410 T __x64_sys_waitid # 64-bit-ptregs -> C stub
> > ffffffff8105f430 T __ia32_sys_waitid # 32-bit-ptregs -> C stub
> > ffffffff8105f450 t COMPAT_SYS_waitid # 32-bit uaddr args C function
> > ffffffff8105f5e0 T __ia32_compat_sys_waitid # 32-bit-ptregs -> C stub
> > ffffffff8105f600 T __x32_compat_sys_waitid # 64-bit-ptregs -> C stub
> >
> > Looks a lot tidier and a lot more logical, doesn't it?
>
> Indeed. Want me to prepare a new patch 8/8 on top which does the renaming
> (for x86 and for the generic case), or will you do the re-naming while
> merging my patches yourself?

Please do an 8/8 patch that does the rename - I'll push out the first 7 patches so
they get more testing.

Note, I have not checked the above name space for namespace collisions - but
unless we are unlucky it should be fine.

BTW., is there any deep reason why some of these names are capitalized?

I.e. could we use:

ffffffff8105f1e0 t kernel_waitid # common C function (64-bit kargs)
ffffffff8105f2b0 t sys_waitid # 64-bit uaddr args C function
ffffffff8105f410 T __x64_sys_waitid # 64-bit-ptregs -> C stub
ffffffff8105f430 T __ia32_sys_waitid # 32-bit-ptregs -> C stub
ffffffff8105f450 t compat_sys_waitid # 32-bit uaddr args C function
ffffffff8105f5e0 T __ia32_compat_sys_waitid # 32-bit-ptregs -> C stub
ffffffff8105f600 T __x32_compat_sys_waitid # 64-bit-ptregs -> C stub

?

Note how this reduces naming complexity and increases the self-consistency even more.

Thanks,

Ingo