Mailing List Archive

Goto for microoptimisation
For performance sensitive tight loops, such as parsing and HTML
construction, to get the best performance it's necessary to think
about what PHP is doing on an opcode by opcode basis.

Certain flow control patterns cannot be implemented efficiently in PHP
without using "goto". The current example in Gerrit 708880
<https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545>
comes down to:

if ( $x == 1 ) {
action1();
} else {
action_not_1();
}
if ( $x == 2 ) {
action2();
} else {
action_not_2();
}

If $x==1 is true, we know that the $x==2 comparison is unnecessary and
is a waste of a couple of VM operations.

It's not feasible to just duplicate the actions, they are not as
simple as portrayed here and splitting them out to a separate function
would incur a function call overhead exceeding the proposed benefit.

I am proposing

if ( $x == 1 ) {
action1();
goto not_2; // avoid unnecessary comparison $x == 2
} else {
action_not_1();
}
if ( $x == 2 ) {
action2();
} else {
not_2:
action_not_2();
}

I'm familiar with the cultivated distaste for goto. Some people are
just parotting the textbook or their preferred authority, and others
are scarred by experience with other languages such as old BASIC
dialects. But I don't think either rationale really holds up to scrutiny.

I think goto is often easier to read than workarounds for the lack of
goto. For example, maybe you could do the current example with break:

do {
do {
if ( $x === 1 ) {
action1();
break;
} else {
action_not_1();
}
if ( $x === 2 ) {
action2();
break 2;
}
} while ( false );
action_not_2();
} while ( false );

But I don't think that's an improvement for readability.

You can certainly use goto in a way that makes things unreadable, but
that goes for a lot of things.

I am requesting that goto be considered acceptable for micro-optimisation.

When performance is not a concern, abstractions can be introduced
which restructure the code so that it flows in a more conventional
way. I understand that you might do a double-take when you see "goto"
in a function. Unfamiliarity slows down comprehension. That's why I'm
suggesting that it only be used when there is a performance justification.

-- Tim Starling
Re: Goto for microoptimisation [ In reply to ]
On Fri, Jul 30, 2021 at 10:10 PM Tim Starling <tstarling@wikimedia.org> wrote:
>
> For performance sensitive tight loops, such as parsing and HTML construction, to get the best performance it's necessary to think about what PHP is doing on an opcode by opcode basis.
>
> Certain flow control patterns cannot be implemented efficiently in PHP without using "goto". The current example in Gerrit 708880 comes down to:
>
> if ( $x == 1 ) {
> action1();
> } else {
> action_not_1();
> }
> if ( $x == 2 ) {
> action2();
> } else {
> action_not_2();
> }
>
> If $x==1 is true, we know that the $x==2 comparison is unnecessary and is a waste of a couple of VM operations.
>
> It's not feasible to just duplicate the actions, they are not as simple as portrayed here and splitting them out to a separate function would incur a function call overhead exceeding the proposed benefit.
>
> I am proposing
>
> if ( $x == 1 ) {
> action1();
> goto not_2; // avoid unnecessary comparison $x == 2
> } else {
> action_not_1();
> }
> if ( $x == 2 ) {
> action2();
> } else {
> not_2:
> action_not_2();
> }
>
> I'm familiar with the cultivated distaste for goto. Some people are just parotting the textbook or their preferred authority, and others are scarred by experience with other languages such as old BASIC dialects. But I don't think either rationale really holds up to scrutiny.

I feel that some people who have an absolutist stance on this issue
are directly or indirectly parroting the essay "Go To Statement
Considered Harmful" written by Edsger Dijkstra in 1968 [0][1]. I
wonder however how many who do so have read the short essay itself and
thought about it in the context of the time it was presented. In my
own reading of the essay I find Dijkstra arguing for structure and
practices in writing software that I think we all take for granted
today.

He is arguing for software to have a written structure which makes it
easier to form a mental model of how state will change and execution
will flow when the program is executed. To set up his argument,
Dijkstra states two 'remarks' from his own experience. I invite you to
read his original (dense academic) prose, but I will summarize the
premises as:

* A program must fulfill the business requirements to be useful.
* Human brains are better at static analysis than dynamic analysis,
and therefore code should be written to optimize for understandability
under static analysis.

These statements seem generally reasonable to me, and I believe that
many "standard practices" are in service of these premises. Unit,
end-to-end, and user acceptance testing are all tools to validate
fulfillment of business requirements. Our collective bias for smaller
functions, smaller classes, and 'separation of concerns', along with
linters and static checkers like the one commenting on Tim's gerrit
patch, are attempts to increase readability and comprehension of our
code. None of these things were in any way widely available in 1968,
but they are widely accepted tools and practices today.

Treating the title of the essay as dogma however, I feel misses some
of the nuance of the argument. This statement from the essay for me is
key: "The unbridled use of the go to statement has as an immediate
consequence that it becomes terribly hard to find a meaningful set of
coordinates in which to describe the process progress."

Dijkstra is arguing against what I would colloquially call 'spaghetti
code'; code where the flow of control jumps around a lot and in the
process leaves the reader confused about what is expected to happen
and why. The flow of execution becomes tangled much as a plate of
cooked noodles dumped from a pot. Finding both ends of a particular
noodle with a quick visual inspection is a mental and physical
challenge, not a trivial task.

> I think goto is often easier to read than workarounds for the lack of goto. For example, maybe you could do the current example with break:
>
> do {
> do {
> if ( $x === 1 ) {
> action1();
> break;
> } else {
> action_not_1();
> }
> if ( $x === 2 ) {
> action2();
> break 2;
> }
> } while ( false );
> action_not_2();
> } while ( false );
>
> But I don't think that's an improvement for readability.
>
> You can certainly use goto in a way that makes things unreadable, but that goes for a lot of things.

Tim's example here is nice in that it shows how other PHP language
constructs could be used, but that these are also not common
constructs and that they do not immediately yield a more
understandable function.

> I am requesting that goto be considered acceptable for micro-optimisation.
>
> When performance is not a concern, abstractions can be introduced which restructure the code so that it flows in a more conventional way. I understand that you might do a double-take when you see "goto" in a function. Unfamiliarity slows down comprehension. That's why I'm suggesting that it only be used when there is a performance justification.

I am in agreement with Tim. I do not think that any of us should adopt
goto as a commonly used tool. I do however think there are situations
where a goto and comments actually do produce understandable code
which also conforms to a business requirement of keeping wall clock
execution time as small as possible.

Now I guess my question to the group is, how can we describe this
nuance as a replacement for statements in current coding conventions
like "Do not use the goto() syntax introduced in 5.3. PHP may have
introduced the feature, but that does not mean we should use it." [2]?
Could it be as simple as stating the bias more like "The use of `goto`
should be exceedingly rare, always accompanied by comments explaining
why it is used (likely for performance), and the author should be
prepared for others to challenge the usage"?

[0]: https://dl.acm.org/doi/10.1145/362929.362947
[1]: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/EWD215.html
[2]: https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP#Other

Bryan
--
Bryan Davis Technical Engagement Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Re: Goto for microoptimisation [ In reply to ]
Re: Goto for microoptimisation [ In reply to ]
On Sat, Jul 31, 2021 at 6:10 AM Tim Starling <tstarling@wikimedia.org> wrote:
>
> For performance sensitive tight loops, such as parsing and HTML construction, to get the best performance it's necessary to think about what PHP is doing on an opcode by opcode basis.
...
> I am proposing
>
> if ( $x == 1 ) {
> action1();
> goto not_2; // avoid unnecessary comparison $x == 2
> } else {
> action_not_1();
> }
> if ( $x == 2 ) {
> action2();
> } else {
> not_2:
> action_not_2();
> }
...
> I am requesting that goto be considered acceptable for micro-optimisation.

ha, what question. the single goto and its target are 5 lines apart,
even me php incompetent person can understand it.

you triggered me reading more about it though. the commit comment
states it takes 30% less instructions:
Measuring instruction count per iteration with perf stat, averaged over
10M iterations, PS1. Test case:
Html::openElement('a', [ 'class' => [ 'foo', 'bar' ] ] )

* Baseline: 11160.7265433
* in_array(): 10390.3837233
* dropDefaults() changes: 9674.1248824
* expandAttributes() misc: 9248.1947500
* implode/explode and space check: 8318.9800417
* Sanitizer inline: 8021.7371794

does this mean these changes bring 30% speed improvement? that is
incredible! how often is this part called to retrieve one article?

now i understand why legoktm is prepared to rewrite mediawiki in rust
(https://www.mediawiki.org/wiki/Template:User_Rust), and why proposals
exist to extend php with rust (https://github.com/rethinkphp/php-rs ,
https://docs.rs/solder/0.1.6/solder/ ). tempted i was to use legoktm's
template on my user page, when i finally saw that php is amazing with
regular expressions by including pcre c library:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/regexredux.html
.

rupert
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Re: Goto for microoptimisation [ In reply to ]
It might be interesting to point out to discussions around use of goto in
Linux Kernel:
https://koblents.com/Ches/Links/Month-Mar-2013/20-Using-Goto-in-Linux-Kernel-Code/
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#centralized-exiting-of-functions

IIRC (from a presentation by Greg KH I watched but I can't find it now), in
Linux, goto forward is fine to use everywhere but goto backwards is only
used in the scheduler. And the scheduler is special (to put it mildly). So
I think using goto backwards should be highly discouraged and maybe
downright banned but forward should be okay.

HTH

On Sun, Aug 1, 2021 at 8:05 AM rupert THURNER <rupert.thurner@gmail.com>
wrote:

> On Sat, Jul 31, 2021 at 6:10 AM Tim Starling <tstarling@wikimedia.org>
> wrote:
> >
> > For performance sensitive tight loops, such as parsing and HTML
> construction, to get the best performance it's necessary to think about
> what PHP is doing on an opcode by opcode basis.
> ...
> > I am proposing
> >
> > if ( $x == 1 ) {
> > action1();
> > goto not_2; // avoid unnecessary comparison $x == 2
> > } else {
> > action_not_1();
> > }
> > if ( $x == 2 ) {
> > action2();
> > } else {
> > not_2:
> > action_not_2();
> > }
> ...
> > I am requesting that goto be considered acceptable for
> micro-optimisation.
>
> ha, what question. the single goto and its target are 5 lines apart,
> even me php incompetent person can understand it.
>
> you triggered me reading more about it though. the commit comment
> states it takes 30% less instructions:
> Measuring instruction count per iteration with perf stat, averaged over
> 10M iterations, PS1. Test case:
> Html::openElement('a', [ 'class' => [ 'foo', 'bar' ] ] )
>
> * Baseline: 11160.7265433
> * in_array(): 10390.3837233
> * dropDefaults() changes: 9674.1248824
> * expandAttributes() misc: 9248.1947500
> * implode/explode and space check: 8318.9800417
> * Sanitizer inline: 8021.7371794
>
> does this mean these changes bring 30% speed improvement? that is
> incredible! how often is this part called to retrieve one article?
>
> now i understand why legoktm is prepared to rewrite mediawiki in rust
> (https://www.mediawiki.org/wiki/Template:User_Rust), and why proposals
> exist to extend php with rust (https://github.com/rethinkphp/php-rs ,
> https://docs.rs/solder/0.1.6/solder/ ). tempted i was to use legoktm's
> template on my user page, when i finally saw that php is amazing with
> regular expressions by including pcre c library:
>
> https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/regexredux.html
> .
>
> rupert
> _______________________________________________
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>


--
Amir (he/him)
Re: Goto for microoptimisation [ In reply to ]
On 1/8/21 4:04 pm, rupert THURNER wrote:
> you triggered me reading more about it though. the commit comment
> states it takes 30% less instructions:
> Measuring instruction count per iteration with perf stat, averaged over
> 10M iterations, PS1. Test case:
> Html::openElement('a', [ 'class' => [ 'foo', 'bar' ] ] )
>
> * Baseline: 11160.7265433
> * in_array(): 10390.3837233
> * dropDefaults() changes: 9674.1248824
> * expandAttributes() misc: 9248.1947500
> * implode/explode and space check: 8318.9800417
> * Sanitizer inline: 8021.7371794
>
> does this mean these changes bring 30% speed improvement? that is
> incredible!

Well, 30% reduction in instruction count. Time reduction is about 25%,
although you can take the reciprocal of that (1/0.75) and call it 34%
speed improvement.

I used instruction count rather than time because you can get 4-5
significant figures of accuracy, i.e. the first 4-5 digits stay the
same between runs, despite background activity, so you can measure
small changes very accurately.

> how often is this part called to retrieve one article?

Errr... let's just say there's no need to name a second day after me.
It's a small change.

The broader context is T284274
<https://phabricator.wikimedia.org/T284274>-- I'm trying to make sure
you can view the history page with limit=5000 without seeing a
timeout. My change to Thanks probably cut render time for big history
pages by 50% -- that's how much the 95th and 99th percentile service
times dropped by. Html::openElement() is a smaller piece of a smaller
piece of the puzzle.

-- Tim Starling
Re: Goto for microoptimisation [ In reply to ]
I notice that sonarqubebot complains about the goto, and the reviewer
suggests adding "// NOSONAR" to suppress that. I suggest we retain that
behavior (frankly I'm not familiar enough with sonarqubebot to know if we
could globally suppress it if we wanted to...)

I don't object to a (very) limited use of goto in (very) strategic
situations. But IMO it should be unusual enough that forcing the developer
to add a comment to explicitly suppress the gripe is reasonable.

Bill Pirkle
Software Engineer
www.wikimediafoundation.org


On Sun, Aug 1, 2021 at 5:37 AM Tim Starling <tstarling@wikimedia.org> wrote:

> On 1/8/21 4:04 pm, rupert THURNER wrote:
>
> you triggered me reading more about it though. the commit comment
> states it takes 30% less instructions:
> Measuring instruction count per iteration with perf stat, averaged over
> 10M iterations, PS1. Test case:
> Html::openElement('a', [ 'class' => [ 'foo', 'bar' ] ] )
>
> * Baseline: 11160.7265433
> * in_array(): 10390.3837233
> * dropDefaults() changes: 9674.1248824
> * expandAttributes() misc: 9248.1947500
> * implode/explode and space check: 8318.9800417
> * Sanitizer inline: 8021.7371794
>
> does this mean these changes bring 30% speed improvement? that is
> incredible!
>
> Well, 30% reduction in instruction count. Time reduction is about 25%,
> although you can take the reciprocal of that (1/0.75) and call it 34% speed
> improvement.
>
> I used instruction count rather than time because you can get 4-5
> significant figures of accuracy, i.e. the first 4-5 digits stay the same
> between runs, despite background activity, so you can measure small changes
> very accurately.
>
> how often is this part called to retrieve one article?
>
> Errr... let's just say there's no need to name a second day after me. It's
> a small change.
>
> The broader context is T284274 <https://phabricator.wikimedia.org/T284274>--
> I'm trying to make sure you can view the history page with limit=5000
> without seeing a timeout. My change to Thanks probably cut render time for
> big history pages by 50% -- that's how much the 95th and 99th percentile
> service times dropped by. Html::openElement() is a smaller piece of a
> smaller piece of the puzzle.
>
> -- Tim Starling
> _______________________________________________
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Re: Goto for microoptimisation [ In reply to ]
On 7/31/21 6:09 AM, Tim Starling wrote:

> Certain flow control patterns cannot be implemented efficiently in
PHP without using "goto". The current example in Gerrit 708880
<https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545>
comes down to:

If `goto` really does help, please go to it with no objections from me!

But could you split the guard logic in your example, like this? The `||`
operator will short-circuit away the expensive test for "is 2" in the
case of "is 1".

$state_1 = ($x == 1);
$state_2 = !($state_1 || $x != 2);

if ( $state_1 ) {
    action1();
} else {
    action_not_1();
}
if ( $state_2 ) {
    action2();
} else {
    action_not_2();
}

-Adam W.



> For performance sensitive tight loops, such as parsing and HTML
> construction, to get the best performance it's necessary to think
> about what PHP is doing on an opcode by opcode basis.
>
> Certain flow control patterns cannot be implemented efficiently in PHP
> without using "goto". The current example in Gerrit 708880
> <https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545>
> comes down to:
>
> if ( $x == 1 ) {
> action1();
> } else {
> action_not_1();
> }
> if ( $x == 2 ) {
> action2();
> } else {
> action_not_2();
> }
>
> If $x==1 is true, we know that the $x==2 comparison is unnecessary and
> is a waste of a couple of VM operations.
>
> It's not feasible to just duplicate the actions, they are not as
> simple as portrayed here and splitting them out to a separate function
> would incur a function call overhead exceeding the proposed benefit.
>
> I am proposing
>
> if ( $x == 1 ) {
> action1();
> goto not_2; // avoid unnecessary comparison $x == 2
> } else {
> action_not_1();
> }
> if ( $x == 2 ) {
> action2();
> } else {
> not_2:
> action_not_2();
> }
>
> I'm familiar with the cultivated distaste for goto. Some people are
> just parotting the textbook or their preferred authority, and others
> are scarred by experience with other languages such as old BASIC
> dialects. But I don't think either rationale really holds up to scrutiny.
>
> I think goto is often easier to read than workarounds for the lack of
> goto. For example, maybe you could do the current example with break:
>
> do {
> do {
> if ( $x === 1 ) {
> action1();
> break;
> } else {
> action_not_1();
> }
> if ( $x === 2 ) {
> action2();
> break 2;
> }
> } while ( false );
> action_not_2();
> } while ( false );
>
> But I don't think that's an improvement for readability.
>
> You can certainly use goto in a way that makes things unreadable, but
> that goes for a lot of things.
>
> I am requesting that goto be considered acceptable for micro-optimisation.
>
> When performance is not a concern, abstractions can be introduced
> which restructure the code so that it flows in a more conventional
> way. I understand that you might do a double-take when you see "goto"
> in a function. Unfamiliarity slows down comprehension. That's why I'm
> suggesting that it only be used when there is a performance justification.
>
> -- Tim Starling
>
>
> _______________________________________________
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Re: Goto for microoptimisation [ In reply to ]
Along that same line of thinking, see if the data can be morphed outside of any tight loops so that inside the loop, you can check for empty(), isset(), or === null. From what I’ve read, those are a lot faster in PHP than comparing with an untyped variable.

From: Adam Wight <adam.wight@wikimedia.de>
Sent: August 2, 2021 4:55 AM
To: wikitech-l@lists.wikimedia.org
Subject: [Wikitech-l] Re: Goto for microoptimisation


On 7/31/21 6:09 AM, Tim Starling wrote:

> Certain flow control patterns cannot be implemented efficiently in PHP without using "goto". The current example in Gerrit 708880<https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545> comes down to:

If `goto` really does help, please go to it with no objections from me!

But could you split the guard logic in your example, like this? The `||` operator will short-circuit away the expensive test for "is 2" in the case of "is 1".

$state_1 = ($x == 1);
$state_2 = !($state_1 || $x != 2);

if ( $state_1 ) {
action1();
} else {
action_not_1();
}
if ( $state_2 ) {
action2();
} else {
action_not_2();
}

-Adam W.




For performance sensitive tight loops, such as parsing and HTML construction, to get the best performance it's necessary to think about what PHP is doing on an opcode by opcode basis.

Certain flow control patterns cannot be implemented efficiently in PHP without using "goto". The current example in Gerrit 708880<https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545> comes down to:

if ( $x == 1 ) {

action1();

} else {

action_not_1();

}

if ( $x == 2 ) {

action2();

} else {

action_not_2();

}

If $x==1 is true, we know that the $x==2 comparison is unnecessary and is a waste of a couple of VM operations.

It's not feasible to just duplicate the actions, they are not as simple as portrayed here and splitting them out to a separate function would incur a function call overhead exceeding the proposed benefit.

I am proposing

if ( $x == 1 ) {

action1();

goto not_2; // avoid unnecessary comparison $x == 2

} else {

action_not_1();

}

if ( $x == 2 ) {

action2();

} else {

not_2:

action_not_2();

}

I'm familiar with the cultivated distaste for goto. Some people are just parotting the textbook or their preferred authority, and others are scarred by experience with other languages such as old BASIC dialects. But I don't think either rationale really holds up to scrutiny.

I think goto is often easier to read than workarounds for the lack of goto. For example, maybe you could do the current example with break:

do {

do {

if ( $x === 1 ) {

action1();

break;

} else {

action_not_1();

}

if ( $x === 2 ) {

action2();

break 2;

}

} while ( false );

action_not_2();

} while ( false );

But I don't think that's an improvement for readability.

You can certainly use goto in a way that makes things unreadable, but that goes for a lot of things.

I am requesting that goto be considered acceptable for micro-optimisation.

When performance is not a concern, abstractions can be introduced which restructure the code so that it flows in a more conventional way. I understand that you might do a double-take when you see "goto" in a function. Unfamiliarity slows down comprehension. That's why I'm suggesting that it only be used when there is a performance justification.

-- Tim Starling



_______________________________________________

Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org<mailto:wikitech-l@lists.wikimedia.org>

To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org<mailto:wikitech-l-leave@lists.wikimedia.org>

https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Re: Goto for microoptimisation [ In reply to ]
For the record, I merged Tim's patch
<https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/> last week and
was unaware of this email thread.

My thinking was as follows:

1. The implementation does not depend on the goto statement.

That is, it is not used to write overly-clever or complicated logic. If you
remove the goto statement, the method behaves the exact same way. And thus
the moment it ceases to serve its use (performance optimisation) it can be
safely removed without further thought. think this is essential to keeping
the code easy to understand and maintain. This one principle actually
covers it all for me. The next three points are implied by this:
1a. This use of goto only jumps downward. Jumping backwards (up) would
likely violate point 1, and either way would imho be too complicated to
think about when debugging or maintaining the code in the future.
Especially the potential for an infinite loop.
1b. This use of goto only jumps to a statement within the same function.
(In fact, jumping to another file, class, or function is not supported by
PHP in the first place. This is literally the only way it can be used.
There is some sanity in the language after all!).
1c. This use of goto serves as a performance optimisation for a hot code
path. Similarly implied by point 1: If it doesn't change behaviour and
doesn't improve performance where it matters, we shouldn't bother using it.

2. An inline comment clearly stays it is a performance optimisation, and
explains why it is safe, and how we know the destination is where we would
end up regardless. (e.g. "we're inside a condition for X2, so we can skip
to the else of X1, and no other statements would run between here and
there").

-- Timo


On Sat, 31 Jul 2021 at 05:10, Tim Starling <tstarling@wikimedia.org> wrote:

> For performance sensitive tight loops, such as parsing and HTML
> construction, to get the best performance it's necessary to think about
> what PHP is doing on an opcode by opcode basis.
>
> Certain flow control patterns cannot be implemented efficiently in PHP
> without using "goto". The current example in Gerrit 708880
> <https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545>
> comes down to:
>
> if ( $x == 1 ) {
> action1();
> } else {
> action_not_1();
> }
> if ( $x == 2 ) {
> action2();
> } else {
> action_not_2();
> }
>
> If $x==1 is true, we know that the $x==2 comparison is unnecessary and is
> a waste of a couple of VM operations.
>
> It's not feasible to just duplicate the actions, they are not as simple as
> portrayed here and splitting them out to a separate function would incur a
> function call overhead exceeding the proposed benefit.
>
> I am proposing
>
> if ( $x == 1 ) {
> action1();
> goto not_2; // avoid unnecessary comparison $x == 2
> } else {
> action_not_1();
> }
> if ( $x == 2 ) {
> action2();
> } else {
> not_2:
> action_not_2();
> }
>
> I'm familiar with the cultivated distaste for goto. Some people are just
> parotting the textbook or their preferred authority, and others are scarred
> by experience with other languages such as old BASIC dialects. But I don't
> think either rationale really holds up to scrutiny.
>
> I think goto is often easier to read than workarounds for the lack of
> goto. For example, maybe you could do the current example with break:
>
> do {
> do {
> if ( $x === 1 ) {
> action1();
> break;
> } else {
> action_not_1();
> }
> if ( $x === 2 ) {
> action2();
> break 2;
> }
> } while ( false );
> action_not_2();
> } while ( false );
>
> But I don't think that's an improvement for readability.
>
> You can certainly use goto in a way that makes things unreadable, but that
> goes for a lot of things.
>
> I am requesting that goto be considered acceptable for micro-optimisation.
>
> When performance is not a concern, abstractions can be introduced which
> restructure the code so that it flows in a more conventional way. I
> understand that you might do a double-take when you see "goto" in a
> function. Unfamiliarity slows down comprehension. That's why I'm suggesting
> that it only be used when there is a performance justification.
>
> -- Tim Starling
> _______________________________________________
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Re: Goto for microoptimisation [ In reply to ]
Just to play the devils advocate. I want to point out some example risks
(which were of course included in the discussion in the earlier linked
literature):

a) someone will introduce

if ($x==1.5)

in the middle or

b) someone will make action1 modifying on $x which might result in $x==2

the behavior would change.

However it is likely that those mistakes will be discover during code
review, if the function is not too big and Gerrit hides the line with the
goto statement. Thus jumping many lines is more dangerous than jumping only
a few lines.

After all, I still think using goto is a good idea now until the php (8+$x)
jit compiler might do the job for us in the future.

All the best
Physikerwelt

On Sat, 14 Aug 2021, 05:16 Krinkle, <krinklemail@gmail.com> wrote:

> For the record, I merged Tim's patch
> <https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/> last week and
> was unaware of this email thread.
>
> My thinking was as follows:
>
> 1. The implementation does not depend on the goto statement.
>
> That is, it is not used to write overly-clever or complicated logic. If
> you remove the goto statement, the method behaves the exact same way. And
> thus the moment it ceases to serve its use (performance optimisation) it
> can be safely removed without further thought. think this is essential to
> keeping the code easy to understand and maintain. This one principle
> actually covers it all for me. The next three points are implied by this:
> 1a. This use of goto only jumps downward. Jumping backwards (up) would
> likely violate point 1, and either way would imho be too complicated to
> think about when debugging or maintaining the code in the future.
> Especially the potential for an infinite loop.
> 1b. This use of goto only jumps to a statement within the same function.
> (In fact, jumping to another file, class, or function is not supported by
> PHP in the first place. This is literally the only way it can be used.
> There is some sanity in the language after all!).
> 1c. This use of goto serves as a performance optimisation for a hot code
> path. Similarly implied by point 1: If it doesn't change behaviour and
> doesn't improve performance where it matters, we shouldn't bother using it.
>
> 2. An inline comment clearly stays it is a performance optimisation, and
> explains why it is safe, and how we know the destination is where we would
> end up regardless. (e.g. "we're inside a condition for X2, so we can skip
> to the else of X1, and no other statements would run between here and
> there").
>
> -- Timo
>
>
> On Sat, 31 Jul 2021 at 05:10, Tim Starling <tstarling@wikimedia.org>
> wrote:
>
>> For performance sensitive tight loops, such as parsing and HTML
>> construction, to get the best performance it's necessary to think about
>> what PHP is doing on an opcode by opcode basis.
>>
>> Certain flow control patterns cannot be implemented efficiently in PHP
>> without using "goto". The current example in Gerrit 708880
>> <https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545>
>> comes down to:
>>
>> if ( $x == 1 ) {
>> action1();
>> } else {
>> action_not_1();
>> }
>> if ( $x == 2 ) {
>> action2();
>> } else {
>> action_not_2();
>> }
>>
>> If $x==1 is true, we know that the $x==2 comparison is unnecessary and is
>> a waste of a couple of VM operations.
>>
>> It's not feasible to just duplicate the actions, they are not as simple
>> as portrayed here and splitting them out to a separate function would incur
>> a function call overhead exceeding the proposed benefit.
>>
>> I am proposing
>>
>> if ( $x == 1 ) {
>> action1();
>> goto not_2; // avoid unnecessary comparison $x == 2
>> } else {
>> action_not_1();
>> }
>> if ( $x == 2 ) {
>> action2();
>> } else {
>> not_2:
>> action_not_2();
>> }
>>
>> I'm familiar with the cultivated distaste for goto. Some people are just
>> parotting the textbook or their preferred authority, and others are scarred
>> by experience with other languages such as old BASIC dialects. But I don't
>> think either rationale really holds up to scrutiny.
>>
>> I think goto is often easier to read than workarounds for the lack of
>> goto. For example, maybe you could do the current example with break:
>>
>> do {
>> do {
>> if ( $x === 1 ) {
>> action1();
>> break;
>> } else {
>> action_not_1();
>> }
>> if ( $x === 2 ) {
>> action2();
>> break 2;
>> }
>> } while ( false );
>> action_not_2();
>> } while ( false );
>>
>> But I don't think that's an improvement for readability.
>>
>> You can certainly use goto in a way that makes things unreadable, but
>> that goes for a lot of things.
>>
>> I am requesting that goto be considered acceptable for micro-optimisation.
>>
>> When performance is not a concern, abstractions can be introduced which
>> restructure the code so that it flows in a more conventional way. I
>> understand that you might do a double-take when you see "goto" in a
>> function. Unfamiliarity slows down comprehension. That's why I'm suggesting
>> that it only be used when there is a performance justification.
>>
>> -- Tim Starling
>> _______________________________________________
>> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
>> To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
>>
>> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>
> _______________________________________________
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Re: Goto for microoptimisation [ In reply to ]
Just a minor point about the code: in expandAtributes(), $spaceSeparatedListAttributes should be initialized outside the loop or as a private static.

From: Krinkle <krinklemail@gmail.com>
Sent: August 13, 2021 11:15 PM
To: For developers discussing technical aspects and organization of Wikimedia projects <wikitech-l@lists.wikimedia.org>
Subject: [Wikitech-l] Re: Goto for microoptimisation

For the record, I merged Tim's patch<https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/> last week and was unaware of this email thread.

My thinking was as follows:

1. The implementation does not depend on the goto statement.

That is, it is not used to write overly-clever or complicated logic. If you remove the goto statement, the method behaves the exact same way. And thus the moment it ceases to serve its use (performance optimisation) it can be safely removed without further thought. think this is essential to keeping the code easy to understand and maintain. This one principle actually covers it all for me. The next three points are implied by this:
1a. This use of goto only jumps downward. Jumping backwards (up) would likely violate point 1, and either way would imho be too complicated to think about when debugging or maintaining the code in the future. Especially the potential for an infinite loop.
1b. This use of goto only jumps to a statement within the same function. (In fact, jumping to another file, class, or function is not supported by PHP in the first place. This is literally the only way it can be used. There is some sanity in the language after all!).
1c. This use of goto serves as a performance optimisation for a hot code path. Similarly implied by point 1: If it doesn't change behaviour and doesn't improve performance where it matters, we shouldn't bother using it.

2. An inline comment clearly stays it is a performance optimisation, and explains why it is safe, and how we know the destination is where we would end up regardless. (e.g. "we're inside a condition for X2, so we can skip to the else of X1, and no other statements would run between here and there").

-- Timo


On Sat, 31 Jul 2021 at 05:10, Tim Starling <tstarling@wikimedia.org<mailto:tstarling@wikimedia.org>> wrote:

For performance sensitive tight loops, such as parsing and HTML construction, to get the best performance it's necessary to think about what PHP is doing on an opcode by opcode basis.

Certain flow control patterns cannot be implemented efficiently in PHP without using "goto". The current example in Gerrit 708880<https://gerrit.wikimedia.org/r/c/mediawiki/core/+/708880/5/includes/Html.php#545> comes down to:

if ( $x == 1 ) {

action1();

} else {

action_not_1();

}

if ( $x == 2 ) {

action2();

} else {

action_not_2();

}

If $x==1 is true, we know that the $x==2 comparison is unnecessary and is a waste of a couple of VM operations.

It's not feasible to just duplicate the actions, they are not as simple as portrayed here and splitting them out to a separate function would incur a function call overhead exceeding the proposed benefit.

I am proposing

if ( $x == 1 ) {

action1();

goto not_2; // avoid unnecessary comparison $x == 2

} else {

action_not_1();

}

if ( $x == 2 ) {

action2();

} else {

not_2:

action_not_2();

}

I'm familiar with the cultivated distaste for goto. Some people are just parotting the textbook or their preferred authority, and others are scarred by experience with other languages such as old BASIC dialects. But I don't think either rationale really holds up to scrutiny.

I think goto is often easier to read than workarounds for the lack of goto. For example, maybe you could do the current example with break:

do {

do {

if ( $x === 1 ) {

action1();

break;

} else {

action_not_1();

}

if ( $x === 2 ) {

action2();

break 2;

}

} while ( false );

action_not_2();

} while ( false );

But I don't think that's an improvement for readability.

You can certainly use goto in a way that makes things unreadable, but that goes for a lot of things.

I am requesting that goto be considered acceptable for micro-optimisation.

When performance is not a concern, abstractions can be introduced which restructure the code so that it flows in a more conventional way. I understand that you might do a double-take when you see "goto" in a function. Unfamiliarity slows down comprehension. That's why I'm suggesting that it only be used when there is a performance justification.

-- Tim Starling
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org<mailto:wikitech-l@lists.wikimedia.org>
To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org<mailto:wikitech-l-leave@lists.wikimedia.org>
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/