Mailing List Archive

[PATCH v2 0/8] x86emul: a few small steps towards disintegration
... of the huge monolithic source file. The series is largely code
movement and hence has the intention of not incurring any functional
change.

It has now been almost a year since the v1 submission, without having
had any feedback. Some re-basing was necessary in the meantime, and a
new patch (the last one) has been added - even if seemingly unrelated,
it was in this context where I did think of that possible adjustment
(which may want to be viewed somewhat RFC, as I know there are
reservations against the use of -Os).

1: split off opcode 0f01 handling
2: split off opcode 0fae handling
3: split off opcode 0fc7 handling
4: split off FPU opcode handling
5: split off insn decoding
6: move x86_emul_blk() to separate source file
7: move various utility functions to separate source files
8: build with -Os

Jan
RE: [PATCH v2 0/8] x86emul: a few small steps towards disintegration [ In reply to ]
Hi,

It seems that this series has been stale for nearly a month, with nothing heard
from maintainers. So I am sending this email as a gentle reminder for maintainers.
Thanks!

Kind regards,
Henry

> -----Original Message-----
> Subject: [PATCH v2 0/8] x86emul: a few small steps towards disintegration
>
> ... of the huge monolithic source file. The series is largely code
> movement and hence has the intention of not incurring any functional
> change.
>
> It has now been almost a year since the v1 submission, without having
> had any feedback. Some re-basing was necessary in the meantime, and a
> new patch (the last one) has been added - even if seemingly unrelated,
> it was in this context where I did think of that possible adjustment
> (which may want to be viewed somewhat RFC, as I know there are
> reservations against the use of -Os).
>
> 1: split off opcode 0f01 handling
> 2: split off opcode 0fae handling
> 3: split off opcode 0fc7 handling
> 4: split off FPU opcode handling
> 5: split off insn decoding
> 6: move x86_emul_blk() to separate source file
> 7: move various utility functions to separate source files
> 8: build with -Os
>
> Jan
Re: [PATCH v2 0/8] x86emul: a few small steps towards disintegration [ In reply to ]
On 06.07.2022 09:31, Henry Wang wrote:
> It seems that this series has been stale for nearly a month, with nothing heard
> from maintainers. So I am sending this email as a gentle reminder for maintainers.

A month? That's only since v2 submission. See ...

>> -----Original Message-----
>> Subject: [PATCH v2 0/8] x86emul: a few small steps towards disintegration
>>
>> ... of the huge monolithic source file. The series is largely code
>> movement and hence has the intention of not incurring any functional
>> change.
>>
>> It has now been almost a year since the v1 submission, without having
>> had any feedback.

... here.

Jan

>> Some re-basing was necessary in the meantime, and a
>> new patch (the last one) has been added - even if seemingly unrelated,
>> it was in this context where I did think of that possible adjustment
>> (which may want to be viewed somewhat RFC, as I know there are
>> reservations against the use of -Os).
>>
>> 1: split off opcode 0f01 handling
>> 2: split off opcode 0fae handling
>> 3: split off opcode 0fc7 handling
>> 4: split off FPU opcode handling
>> 5: split off insn decoding
>> 6: move x86_emul_blk() to separate source file
>> 7: move various utility functions to separate source files
>> 8: build with -Os
>>
>> Jan
>
RE: [PATCH v2 0/8] x86emul: a few small steps towards disintegration [ In reply to ]
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
>
> On 06.07.2022 09:31, Henry Wang wrote:
> > It seems that this series has been stale for nearly a month, with nothing
> heard
> > from maintainers. So I am sending this email as a gentle reminder for
> maintainers.
>
> A month? That's only since v2 submission. See ...

Oh, yes indeed. Thank you for the background information, normally I use
patchwork to track series I did missed the v1 as v1 is superseded by v2
there (thus "a month", but honestly this series is stale for more than a year...).

>
> >> -----Original Message-----
> >> Subject: [PATCH v2 0/8] x86emul: a few small steps towards disintegration
> >>
> >> ... of the huge monolithic source file. The series is largely code
> >> movement and hence has the intention of not incurring any functional
> >> change.
> >>
> >> It has now been almost a year since the v1 submission, without having
> >> had any feedback.

To possibly make the ping better, I will try add Wei Liu's another 2 emails in the
CC list (I was once told by him in this way he will be more likely to notice the email)
and put Andrew as TO.

Kind regards,
Henry

>
> ... here.
>
> Jan
>
> >> Some re-basing was necessary in the meantime, and a
> >> new patch (the last one) has been added - even if seemingly unrelated,
> >> it was in this context where I did think of that possible adjustment
> >> (which may want to be viewed somewhat RFC, as I know there are
> >> reservations against the use of -Os).
> >>
> >> 1: split off opcode 0f01 handling
> >> 2: split off opcode 0fae handling
> >> 3: split off opcode 0fc7 handling
> >> 4: split off FPU opcode handling
> >> 5: split off insn decoding
> >> 6: move x86_emul_blk() to separate source file
> >> 7: move various utility functions to separate source files
> >> 8: build with -Os
> >>
> >> Jan
> >
Re: [PATCH v2 0/8] x86emul: a few small steps towards disintegration [ In reply to ]
On Wed, Jun 15, 2022 at 11:57:54AM +0200, Jan Beulich wrote:
> ... of the huge monolithic source file. The series is largely code
> movement and hence has the intention of not incurring any functional
> change.

I take the intention is to make code simpler and easier to follow by
splitting it up into smaller files?

Thanks, Roger.
Re: [PATCH v2 0/8] x86emul: a few small steps towards disintegration [ In reply to ]
On 28.03.2023 16:19, Roger Pau Monné wrote:
> On Wed, Jun 15, 2022 at 11:57:54AM +0200, Jan Beulich wrote:
>> ... of the huge monolithic source file. The series is largely code
>> movement and hence has the intention of not incurring any functional
>> change.
>
> I take the intention is to make code simpler and easier to follow by
> splitting it up into smaller files?

Well, I can't say yes or no to "simpler" or "easier to follow", but
splitting is the goal, in the hope that these may end up as a side
effects. There's always the risk that scattering things around may
also make things less obvious. My main motivation, however, is the
observation that this huge source file alone consumes a fair part
of (non-parallelizable) build time. To the degree that with older
gcc building this one file takes ten (or so) times as long as the
entire rest of the hypervisor.

Jan
Re: [PATCH v2 0/8] x86emul: a few small steps towards disintegration [ In reply to ]
On Tue, Mar 28, 2023 at 04:48:10PM +0200, Jan Beulich wrote:
> On 28.03.2023 16:19, Roger Pau Monné wrote:
> > On Wed, Jun 15, 2022 at 11:57:54AM +0200, Jan Beulich wrote:
> >> ... of the huge monolithic source file. The series is largely code
> >> movement and hence has the intention of not incurring any functional
> >> change.
> >
> > I take the intention is to make code simpler and easier to follow by
> > splitting it up into smaller files?
>
> Well, I can't say yes or no to "simpler" or "easier to follow", but
> splitting is the goal, in the hope that these may end up as a side
> effects. There's always the risk that scattering things around may
> also make things less obvious. My main motivation, however, is the
> observation that this huge source file alone consumes a fair part
> of (non-parallelizable) build time. To the degree that with older
> gcc building this one file takes ten (or so) times as long as the

I wouldn't really trade compiler speed for clarity in a piece of code
like the x86 emulator, which is already very complex.

Do you have some figures of the performance difference this series
makes when building the emulator?

A couple of notes from someone that's not familiar with the x86
emulator. It would be clearer if the split files where prefixed with
opcode_ for those that deal with emulation (blk.c, 0f01.c, ...) IMO,
so that you clearly see this is an opcode family that has been split
into a separate file, or maybe insn_{opcode,blk,fpu,...}?

I've noticed in some of the newly introduced files the original
copyright notice from Keir is lost, I assume that's on purpose because
none of the code was contributed by Kier in that file? (see 0f01.c vs
0fae.c for example).

Do we expect to be able to split other opcode handling into separate
files?

I wonder how tied together are the remaining cases, and whether we
will be able to split more of them into separate units?

Overall I don't think the disintegration makes the code harder, as the split
cases seems to be fully isolated, I do however wonder whether further splits
might not be so isolated?

Thanks, Roger.
Re: [PATCH v2 0/8] x86emul: a few small steps towards disintegration [ In reply to ]
On 29.03.2023 16:17, Roger Pau Monné wrote:
> On Tue, Mar 28, 2023 at 04:48:10PM +0200, Jan Beulich wrote:
>> On 28.03.2023 16:19, Roger Pau Monné wrote:
>>> On Wed, Jun 15, 2022 at 11:57:54AM +0200, Jan Beulich wrote:
>>>> ... of the huge monolithic source file. The series is largely code
>>>> movement and hence has the intention of not incurring any functional
>>>> change.
>>>
>>> I take the intention is to make code simpler and easier to follow by
>>> splitting it up into smaller files?
>>
>> Well, I can't say yes or no to "simpler" or "easier to follow", but
>> splitting is the goal, in the hope that these may end up as a side
>> effects. There's always the risk that scattering things around may
>> also make things less obvious. My main motivation, however, is the
>> observation that this huge source file alone consumes a fair part
>> of (non-parallelizable) build time. To the degree that with older
>> gcc building this one file takes ten (or so) times as long as the
>
> I wouldn't really trade compiler speed for clarity in a piece of code
> like the x86 emulator, which is already very complex.

Of course, and I specifically said "main" motivation. The hope is that
by splitting things become less entangled / convoluted. I guess fpu.c
is a good example where certain non-trivial macros have isolated use,
and hence are no longer cluttering other parts of the emulator sources.

> Do you have some figures of the performance difference this series
> makes when building the emulator?

No, I don't. And the difference isn't going to be significant, I expect,
as the build being slow is - from all I can tell - directly connected to
the huge switch() statement. Yet the number of cases there shrinks only
marginally for now. The series is named "a few small steps" for this
reason, along with others. See below for a first bigger step, which may
then make a noticeable difference.

> A couple of notes from someone that's not familiar with the x86
> emulator. It would be clearer if the split files where prefixed with
> opcode_ for those that deal with emulation (blk.c, 0f01.c, ...) IMO,
> so that you clearly see this is an opcode family that has been split
> into a separate file, or maybe insn_{opcode,blk,fpu,...}?

Hmm. For one, "blk" isn't really dealing with any opcode family in
particular. It contains a helper function for code using the emulator.
So it falls more in the group of util*.c. For the others may main
objection would be that I'd prefer to keep file names short. At least
at this point of splitting I think file names are sufficiently
descriptive. Nevertheless, insn-0f01.c or opc-0f01.c may be options, if
we really think we want/need to group files visually. However, I don't
expect there are going to be more files paralleling 0f01.c et al: The
opcode groups split out are the ones that are large/heterogeneous
enough to warrant doing it on this basis. Of course new such groups may
appear in the ISA down the road.

FPU is isolated functionally, and I'd expect a simd.c to appear once
it becomes clear if/how to sensibly split out SIMD code. Unlike fpu.c
I'd further expect this to (long term) consist of more than just a
single function, hopefully replacing the massive use of "goto" within
that big switch statement by function calls (but as said, plans here
- if one can call it that way in the first place - are very vague at
this point).

> I've noticed in some of the newly introduced files the original
> copyright notice from Keir is lost, I assume that's on purpose because
> none of the code was contributed by Kier in that file? (see 0f01.c vs
> 0fae.c for example).

Right - 0fae.c contains only code which was added later (mostly by me),
if I'm not mistaken.

> Do we expect to be able to split other opcode handling into separate
> files?

As per above, "expect" is perhaps too optimistic. I'd say "hope", in
particular for SIMD code (which I guess is now the main part of the
ISA as well as the sources, at least number-of-insns-wise). One
possible (likely intermediate) approach might be to move all SIMD code
from the huge switch() statement to its own file/function, invoked
from that huge switch()'s default: case. It may then still be a big
switch() statement in (e.g.) simd.c, but we'd then at least have two
of them, each about half the size of what we have right now. Plus it
may allow some (most?) of the X86EMUL_NO_SIMD #ifdef-ary to be dropped,
as the new file - like fpu.c - could then itself be built only
conditionally.

> I wonder how tied together are the remaining cases, and whether we
> will be able to split more of them into separate units?

That's the big open question indeed. As per above - with some effort
I expect all SIMD code could collectively be moved out; further
splitting would likely end up more involved.

> Overall I don't think the disintegration makes the code harder, as the split
> cases seems to be fully isolated, I do however wonder whether further splits
> might not be so isolated?

And again - indeed. This series, while already a lot of code churn, is
only collecting some of the very low hanging fruit. But at least I
hope that the pieces now separated out won't need a lot of touching
later on, except of course to add support for new insns.

Jan
Re: [PATCH v2 0/8] x86emul: a few small steps towards disintegration [ In reply to ]
On Thu, Mar 30, 2023 at 09:53:23AM +0200, Jan Beulich wrote:
> On 29.03.2023 16:17, Roger Pau Monné wrote:
> > On Tue, Mar 28, 2023 at 04:48:10PM +0200, Jan Beulich wrote:
> >> On 28.03.2023 16:19, Roger Pau Monné wrote:
> >>> On Wed, Jun 15, 2022 at 11:57:54AM +0200, Jan Beulich wrote:
> >>>> ... of the huge monolithic source file. The series is largely code
> >>>> movement and hence has the intention of not incurring any functional
> >>>> change.
> >>>
> >>> I take the intention is to make code simpler and easier to follow by
> >>> splitting it up into smaller files?
> >>
> >> Well, I can't say yes or no to "simpler" or "easier to follow", but
> >> splitting is the goal, in the hope that these may end up as a side
> >> effects. There's always the risk that scattering things around may
> >> also make things less obvious. My main motivation, however, is the
> >> observation that this huge source file alone consumes a fair part
> >> of (non-parallelizable) build time. To the degree that with older
> >> gcc building this one file takes ten (or so) times as long as the
> >
> > I wouldn't really trade compiler speed for clarity in a piece of code
> > like the x86 emulator, which is already very complex.
>
> Of course, and I specifically said "main" motivation. The hope is that
> by splitting things become less entangled / convoluted. I guess fpu.c
> is a good example where certain non-trivial macros have isolated use,
> and hence are no longer cluttering other parts of the emulator sources.
>
> > Do you have some figures of the performance difference this series
> > makes when building the emulator?
>
> No, I don't. And the difference isn't going to be significant, I expect,
> as the build being slow is - from all I can tell - directly connected to
> the huge switch() statement. Yet the number of cases there shrinks only
> marginally for now. The series is named "a few small steps" for this
> reason, along with others. See below for a first bigger step, which may
> then make a noticeable difference.
>
> > A couple of notes from someone that's not familiar with the x86
> > emulator. It would be clearer if the split files where prefixed with
> > opcode_ for those that deal with emulation (blk.c, 0f01.c, ...) IMO,
> > so that you clearly see this is an opcode family that has been split
> > into a separate file, or maybe insn_{opcode,blk,fpu,...}?
>
> Hmm. For one, "blk" isn't really dealing with any opcode family in
> particular. It contains a helper function for code using the emulator.
> So it falls more in the group of util*.c. For the others may main
> objection would be that I'd prefer to keep file names short. At least
> at this point of splitting I think file names are sufficiently
> descriptive. Nevertheless, insn-0f01.c or opc-0f01.c may be options, if
> we really think we want/need to group files visually. However, I don't
> expect there are going to be more files paralleling 0f01.c et al: The
> opcode groups split out are the ones that are large/heterogeneous
> enough to warrant doing it on this basis. Of course new such groups may
> appear in the ISA down the road.
>
> FPU is isolated functionally, and I'd expect a simd.c to appear once
> it becomes clear if/how to sensibly split out SIMD code. Unlike fpu.c
> I'd further expect this to (long term) consist of more than just a
> single function, hopefully replacing the massive use of "goto" within
> that big switch statement by function calls (but as said, plans here
> - if one can call it that way in the first place - are very vague at
> this point).
>
> > I've noticed in some of the newly introduced files the original
> > copyright notice from Keir is lost, I assume that's on purpose because
> > none of the code was contributed by Kier in that file? (see 0f01.c vs
> > 0fae.c for example).
>
> Right - 0fae.c contains only code which was added later (mostly by me),
> if I'm not mistaken.

OK, just wanted to make sure this wasn't an oversight.

> > Do we expect to be able to split other opcode handling into separate
> > files?
>
> As per above, "expect" is perhaps too optimistic. I'd say "hope", in
> particular for SIMD code (which I guess is now the main part of the
> ISA as well as the sources, at least number-of-insns-wise). One
> possible (likely intermediate) approach might be to move all SIMD code
> from the huge switch() statement to its own file/function, invoked
> from that huge switch()'s default: case. It may then still be a big
> switch() statement in (e.g.) simd.c, but we'd then at least have two
> of them, each about half the size of what we have right now. Plus it
> may allow some (most?) of the X86EMUL_NO_SIMD #ifdef-ary to be dropped,
> as the new file - like fpu.c - could then itself be built only
> conditionally.

I don't like the handling of SIMD from a default case in the global
switch much, as we then could end up chaining all kind of random
handling in the default case. It's IMO clearer if we can detect and
forward insn to the SIMD code when we know it's a SIMD instruction.

I guess that's for another series anyway, so not really the point
here.

> > I wonder how tied together are the remaining cases, and whether we
> > will be able to split more of them into separate units?
>
> That's the big open question indeed. As per above - with some effort
> I expect all SIMD code could collectively be moved out; further
> splitting would likely end up more involved.
>
> > Overall I don't think the disintegration makes the code harder, as the split
> > cases seems to be fully isolated, I do however wonder whether further splits
> > might not be so isolated?
>
> And again - indeed. This series, while already a lot of code churn, is
> only collecting some of the very low hanging fruit. But at least I
> hope that the pieces now separated out won't need a lot of touching
> later on, except of course to add support for new insns.

OK, I'm a bit concerned that we end up growing duplicated switch
cases, in the sense that we will have the following:

switch ( insn )
{
case 0x100:
case 0x1f0:
case 0x200:
x86emul_foo();
...
}

x86emul_foo()
{
switch (insn )
{
case 0x100:
/* Handle. */
break;

case 0x1f0:
/* Handle. */
break;

case 0x200:
...
}
}

So that we would end up having to add the opcodes twice, once in the
generic switch, and then again at place the insn are actually handled.

So far the introduced splits seems fine in that they deal with a
contiguous range of opcodes.

For patches 1-7:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Patch 8 I'm unsure, I guess it should be up to the user to decide
whether to use -Os or some other optimization?

I expect introspection users likely care way more about the speed
rather than the size of the generated x86 emulator code?

Thanks, Roger.
Re: [PATCH v2 0/8] x86emul: a few small steps towards disintegration [ In reply to ]
On 30.03.2023 11:54, Roger Pau Monné wrote:
> On Thu, Mar 30, 2023 at 09:53:23AM +0200, Jan Beulich wrote:
>> On 29.03.2023 16:17, Roger Pau Monné wrote:
>>> Do we expect to be able to split other opcode handling into separate
>>> files?
>>
>> As per above, "expect" is perhaps too optimistic. I'd say "hope", in
>> particular for SIMD code (which I guess is now the main part of the
>> ISA as well as the sources, at least number-of-insns-wise). One
>> possible (likely intermediate) approach might be to move all SIMD code
>> from the huge switch() statement to its own file/function, invoked
>> from that huge switch()'s default: case. It may then still be a big
>> switch() statement in (e.g.) simd.c, but we'd then at least have two
>> of them, each about half the size of what we have right now. Plus it
>> may allow some (most?) of the X86EMUL_NO_SIMD #ifdef-ary to be dropped,
>> as the new file - like fpu.c - could then itself be built only
>> conditionally.
>
> I don't like the handling of SIMD from a default case in the global
> switch much, as we then could end up chaining all kind of random
> handling in the default case. It's IMO clearer if we can detect and
> forward insn to the SIMD code when we know it's a SIMD instruction.

Avoiding of which would require ...

>>> I wonder how tied together are the remaining cases, and whether we
>>> will be able to split more of them into separate units?
>>
>> That's the big open question indeed. As per above - with some effort
>> I expect all SIMD code could collectively be moved out; further
>> splitting would likely end up more involved.
>>
>>> Overall I don't think the disintegration makes the code harder, as the split
>>> cases seems to be fully isolated, I do however wonder whether further splits
>>> might not be so isolated?
>>
>> And again - indeed. This series, while already a lot of code churn, is
>> only collecting some of the very low hanging fruit. But at least I
>> hope that the pieces now separated out won't need a lot of touching
>> later on, except of course to add support for new insns.
>
> OK, I'm a bit concerned that we end up growing duplicated switch
> cases, in the sense that we will have the following:
>
> switch ( insn )
> {
> case 0x100:
> case 0x1f0:
> case 0x200:
> x86emul_foo();
> ...
> }
>
> x86emul_foo()
> {
> switch (insn )
> {
> case 0x100:
> /* Handle. */
> break;
>
> case 0x1f0:
> /* Handle. */
> break;
>
> case 0x200:
> ...
> }
> }
>
> So that we would end up having to add the opcodes twice, once in the
> generic switch, and then again at place the insn are actually handled.

... doing exactly this, which looks like we a agree we'd like to avoid.
I'm also not really concerned of "chaining all kind of random handling
in the default case" - where things are terminated with an error code
doesn't really matter. As first step here I'm literally thinking of
splitting the huge switch() into two. The only thing that I can see may
get in the way is the resulting SIMD function requiring like 20
parameters.

> So far the introduced splits seems fine in that they deal with a
> contiguous range of opcodes.
>
> For patches 1-7:
>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks!

> Patch 8 I'm unsure, I guess it should be up to the user to decide
> whether to use -Os or some other optimization?
>
> I expect introspection users likely care way more about the speed
> rather than the size of the generated x86 emulator code?

Perhaps. But do we want to introduce another Kconfig control just
for that? And wouldn't there likely be other performance concerns,
if performance really mattered in the introspection case?

Jan
Re: [PATCH v2 0/8] x86emul: a few small steps towards disintegration [ In reply to ]
On Thu, Mar 30, 2023 at 01:40:43PM +0200, Jan Beulich wrote:
> On 30.03.2023 11:54, Roger Pau Monné wrote:
> > On Thu, Mar 30, 2023 at 09:53:23AM +0200, Jan Beulich wrote:
> >> On 29.03.2023 16:17, Roger Pau Monné wrote:
> > Patch 8 I'm unsure, I guess it should be up to the user to decide
> > whether to use -Os or some other optimization?
> >
> > I expect introspection users likely care way more about the speed
> > rather than the size of the generated x86 emulator code?
>
> Perhaps. But do we want to introduce another Kconfig control just
> for that? And wouldn't there likely be other performance concerns,
> if performance really mattered in the introspection case?

I don't really have a strong opinion on the usage of -Os or not. It's
likely fine as long as we also stay consistent with the flag we use
when building the test harness and the fuzzer, just in case.

Thanks, Roger.
Re: [PATCH v2 0/8] x86emul: a few small steps towards disintegration [ In reply to ]
On 30.03.2023 16:27, Roger Pau Monné wrote:
> On Thu, Mar 30, 2023 at 01:40:43PM +0200, Jan Beulich wrote:
>> On 30.03.2023 11:54, Roger Pau Monné wrote:
>>> On Thu, Mar 30, 2023 at 09:53:23AM +0200, Jan Beulich wrote:
>>>> On 29.03.2023 16:17, Roger Pau Monné wrote:
>>> Patch 8 I'm unsure, I guess it should be up to the user to decide
>>> whether to use -Os or some other optimization?
>>>
>>> I expect introspection users likely care way more about the speed
>>> rather than the size of the generated x86 emulator code?
>>
>> Perhaps. But do we want to introduce another Kconfig control just
>> for that? And wouldn't there likely be other performance concerns,
>> if performance really mattered in the introspection case?
>
> I don't really have a strong opinion on the usage of -Os or not. It's
> likely fine as long as we also stay consistent with the flag we use
> when building the test harness and the fuzzer, just in case.

While they might be built with different compilers (and hence things can't
be compared directly anyway), I can certainly make those two use -Os as
well.

Jan