Mailing List Archive

[issue41835] Speed up dict vectorcall creation using keywords
Marco Sulla <launchpad.net@marco.sulla.e4ward.com> added the comment:

Another benchmark:

python -m pyperf timeit --rigorous "dict(ihinvdono='doononon', gowwondwon='nwog', bdjbodbob='nidnnpn', nwonwno='vndononon', dooodbob='iohiwipwgpw', doidonooq='ndwnnpnpnp', fndionqinqn='ndjboqoqjb', nonoeoqgoqb='bdboboqbgoqeb', jdnvonvoddo='nvdjnvndvonoq', njnvodnoo='hiehgieba', nvdnvwnnp='wghgihpa', nvfnwnnq='nvdknnnqkm', ndonvnipnq='fndjnaobobvob', fjafosboab='ndjnodvobvojb', nownwnojwjw='nvknnndnow', niownviwnwnwi='nownvwinvwnwnwj')"

Result without the pull request:
Mean +- std dev: 486 ns +- 8 ns

Result with the pull request:
Mean +- std dev: 328 ns +- 4 ns

I compiled both with optimizations and LTO.

Some arch info:

python -VV
Python 3.10.0a1+ (heads/master-dirty:dde91b1953, Oct 22 2020, 14:00:51)
[GCC 10.1.1 20200718]

uname -a
Linux buzz 4.15.0-118-generic #119-Ubuntu SMP Tue Sep 8 12:30:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS

----------

[issue41835] Speed up dict vectorcall creation using keywords
Change by Inada Naoki <songofacandy@gmail.com>:


----------
keywords: +patch
pull_requests: +21840
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/22909

[issue41835] Speed up dict vectorcall creation using keywords
Inada Naoki <songofacandy@gmail.com> added the comment:

OK. The performance improvement comes from:

a. Presizing
b. Bypassing some checks in PyDict_SetItem
c. Avoiding the duplicate-key check

(b) is relatively small, so I focused on (a) and (c). See GH-22909.
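
To make that concrete, here is a minimal sketch of (a) and (c), assuming the vectorcall convention (values in `args`, names in the `kwnames` tuple). The function name is illustrative, not the code in GH-22909 or GH-22346, and it calls the public PyDict_SetItem only to stay compilable; the real patch also bypasses SetItem's generic checks (b) and the duplicate-key probe (c):

```
#include <Python.h>

/* Illustrative sketch only, not the actual code of the PRs. */
static PyObject *
dict_from_kwargs_sketch(PyObject *const *args, PyObject *kwnames)
{
    Py_ssize_t n = PyTuple_GET_SIZE(kwnames);
    PyObject *d = _PyDict_NewPresized(n);      /* (a) size the table once */
    if (d == NULL) {
        return NULL;
    }
    for (Py_ssize_t i = 0; i < n; i++) {
        PyObject *key = PyTuple_GET_ITEM(kwnames, i);
        /* Keyword names at a plain call site are distinct interned strings,
           which is what makes skipping the duplicate check (c) safe. */
        if (PyDict_SetItem(d, key, args[i]) < 0) {
            Py_DECREF(d);
            return NULL;
        }
    }
    return d;
}
```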

For simple keyword arguments, it is 10% faster than GH-22346:

```
$ ./python -m pyperf timeit --compare-to ./python-speedup_kw "dict(ihinvdono='doononon', gowwondwon='nwog', bdjbodbob='nidnnpn', nwonwno='vndononon', dooodbob='iohiwipwgpw', doidonooq='ndwnnpnpnp', fndionqinqn='ndjboqoqjb', nonoeoqgoqb='bdboboqbgoqeb', jdnvonvoddo='nvdjnvndvonoq', njnvodnoo='hiehgieba', nvdnvwnnp='wghgihpa', nvfnwnnq='nvdknnnqkm', ndonvnipnq='fndjnaobobvob', fjafosboab='ndjnodvobvojb', nownwnojwjw='nvknnndnow', niownviwnwnwi='nownvwinvwnwnwj')"
python-speedup_kw: ..................... 357 ns +- 10 ns
python: ..................... 323 ns +- 4 ns

Mean +- std dev: [python-speedup_kw] 357 ns +- 10 ns -> [python] 323 ns +- 4 ns: 1.11x faster (-10%)
```

For the `dict(d, key=val)` case, it is 8% slower than GH-22346, but still 8% faster than master.

```
$ ./python -m pyperf timeit --compare-to ./python-speedup_kw -s 'd={"foo":"bar"}' "dict(d, ihinvdono='doononon', gowwondwon='nwog', bdjbodbob='nidnnpn', nwonwno='vndononon', dooodbob='iohiwipwgpw', doidonooq='ndwnnpnpnp', fndionqinqn='ndjboqoqjb', nonoeoqgoqb='bdboboqbgoqeb', jdnvonvoddo='nvdjnvndvonoq', njnvodnoo='hiehgieba', nvdnvwnnp='wghgihpa', nvfnwnnq='nvdknnnqkm', ndonvnipnq='fndjnaobobvob', fjafosboab='ndjnodvobvojb', nownwnojwjw='nvknnndnow', niownviwnwnwi='nownvwinvwnwnwj')"
python-speedup_kw: ..................... 505 ns +- 15 ns
python: ..................... 546 ns +- 17 ns

Mean +- std dev: [python-speedup_kw] 505 ns +- 15 ns -> [python] 546 ns +- 17 ns: 1.08x slower (+8%)

$ ./python -m pyperf timeit --compare-to ./python-master -s 'd={"foo":"bar"}' "dict(d, ihinvdono='doononon', gowwondwon='nwog', bdjbodbob='nidnnpn', nwonwno='vndononon', dooodbob='iohiwipwgpw', doidonooq='ndwnnpnpnp', fndionqinqn='ndjboqoqjb', nonoeoqgoqb='bdboboqbgoqeb', jdnvonvoddo='nvdjnvndvonoq', njnvodnoo='hiehgieba', nvdnvwnnp='wghgihpa', nvfnwnnq='nvdknnnqkm', ndonvnipnq='fndjnaobobvob', fjafosboab='ndjnodvobvojb', nownwnojwjw='nvknnndnow', niownviwnwnwi='nownvwinvwnwnwj')"
python-master: ..................... 598 ns +- 10 ns
python: ..................... 549 ns +- 19 ns

Mean +- std dev: [python-master] 598 ns +- 10 ns -> [python] 549 ns +- 19 ns: 1.09x faster (-8%)
```

Additionally, I expect we can reuse this new code to optimize BUILD_CONST_KEY_MAP.

----------
stage: patch review ->

[issue41835] Speed up dict vectorcall creation using keywords
Inada Naoki <songofacandy@gmail.com> added the comment:

@Marco Sulla Please take a look at GH-22909. It is a simplified version of your PR. And I wrote another optimization based on it: #42126.

----------

[issue41835] Speed up dict vectorcall creation using keywords
Mark Shannon <mark@hotpy.org> added the comment:

Could we get a pyperformance benchmark run on this please?

----------
nosy: +Mark.Shannon

[issue41835] Speed up dict vectorcall creation using keywords
Marco Sulla <launchpad.net@marco.sulla.e4ward.com> added the comment:

@methane: well, to be honest, I don't see much difference between the two pull requests. The major difference is that you merged insertdict_init into dict_merge_init.

But I kept insertdict_init separate on purpose, because that function can be reused by other future creation-time-only functions. Furthermore, it's simpler to maintain, since it's nearly identical to insertdict.

----------

[issue41835] Speed up dict vectorcall creation using keywords
Marco Sulla <launchpad.net@marco.sulla.e4ward.com> added the comment:

@Mark.Shannon I tried to run pyperformance, but wheel does not work with Python 3.10. I get the error:

AssertionError: would build wheel with unsupported tag ('cp310', 'cp310', 'linux_x86_64')

----------

[issue41835] Speed up dict vectorcall creation using keywords
Inada Naoki <songofacandy@gmail.com> added the comment:

@Mark.Shannon I had seen some speedup in the tornado benchmark when I didn't use PGO+LTO, but it was noise.

Now I use PGO+LTO. master vs. PR 22909:

$ ./python -m pyperf compare_to master-opt.json speedup_kw-opt.json -G --min-speed=1
Slower (11):
- spectral_norm: 147 ms +- 1 ms -> 153 ms +- 2 ms: 1.04x slower (+4%)
- pickle_dict: 28.6 us +- 0.1 us -> 29.5 us +- 0.6 us: 1.03x slower (+3%)
- regex_compile: 199 ms +- 1 ms -> 204 ms +- 4 ms: 1.03x slower (+3%)
- chameleon: 9.75 ms +- 0.10 ms -> 9.99 ms +- 0.09 ms: 1.02x slower (+2%)
- logging_format: 10.9 us +- 0.2 us -> 11.1 us +- 0.2 us: 1.02x slower (+2%)
- sqlite_synth: 3.29 us +- 0.05 us -> 3.36 us +- 0.05 us: 1.02x slower (+2%)
- regex_v8: 26.1 ms +- 0.1 ms -> 26.5 ms +- 0.3 ms: 1.02x slower (+2%)
- json_dumps: 14.6 ms +- 0.1 ms -> 14.8 ms +- 0.1 ms: 1.02x slower (+2%)
- logging_simple: 9.88 us +- 0.18 us -> 10.0 us +- 0.2 us: 1.02x slower (+2%)
- nqueens: 105 ms +- 1 ms -> 107 ms +- 2 ms: 1.01x slower (+1%)
- raytrace: 511 ms +- 5 ms -> 517 ms +- 6 ms: 1.01x slower (+1%)

Faster (10):
- regex_dna: 233 ms +- 1 ms -> 229 ms +- 1 ms: 1.02x faster (-2%)
- unpickle: 14.7 us +- 0.1 us -> 14.5 us +- 0.2 us: 1.02x faster (-1%)
- deltablue: 8.17 ms +- 0.29 ms -> 8.06 ms +- 0.17 ms: 1.01x faster (-1%)
- mako: 16.8 ms +- 0.2 ms -> 16.6 ms +- 0.1 ms: 1.01x faster (-1%)
- xml_etree_iterparse: 117 ms +- 1 ms -> 116 ms +- 1 ms: 1.01x faster (-1%)
- scimark_monte_carlo: 117 ms +- 2 ms -> 115 ms +- 1 ms: 1.01x faster (-1%)
- xml_etree_parse: 164 ms +- 3 ms -> 162 ms +- 1 ms: 1.01x faster (-1%)
- unpack_sequence: 62.7 ns +- 0.7 ns -> 62.0 ns +- 0.7 ns: 1.01x faster (-1%)
- regex_effbot: 3.43 ms +- 0.01 ms -> 3.39 ms +- 0.02 ms: 1.01x faster (-1%)
- scimark_fft: 405 ms +- 4 ms -> 401 ms +- 1 ms: 1.01x faster (-1%)

Benchmark hidden because not significant (39)

----------

[issue41835] Speed up dict vectorcall creation using keywords
Inada Naoki <songofacandy@gmail.com> added the comment:

@Marco Sulla

> @methane: well, to be honest, I don't see much difference between the two pull requests. The major difference is that you merged insertdict_init into dict_merge_init.

Not only that, but also some simplifications, which make it 10% faster than GH-22346.

> But I kept insertdict_init separate on purpose, because that function can be reused by other future creation-time-only functions.

Where do you expect to use it? Would you implement some more optimizations based on your PR to demonstrate your idea?

I confirmed that GH-22909 can be used to optimize BUILD_CONST_KEY_MAP (GH-22911). That's why I merged the two functions.

> AssertionError: would build wheel with unsupported tag ('cp310', 'cp310', 'linux_x86_64')

Try `pip install pyperformance==1.0.0`.

----------

[issue41835] Speed up dict vectorcall creation using keywords
Inada Naoki <songofacandy@gmail.com> added the comment:

I confirmed that _PyDict_FromItems() can be used to optimize _PyStack_AsDict() too.
See https://github.com/methane/cpython/pull/25

But I cannot confirm a significant performance gain from it either.

----------

[issue41835] Speed up dict vectorcall creation using keywords
Marco Sulla <launchpad.net@marco.sulla.e4ward.com> added the comment:

I commented out sqlalchemy in requirements.txt in the pyperformance source code, and it worked. I also had to skip tornado:

pyperformance run -r -b,-sqlalchemy_declarative,-sqlalchemy_imperative,-tornado_http -o ../perf_master.json

This is my result:

pyperformance compare perf_master.json perf_dict_init.json -O table | grep Significant
| 2to3 | 356 ms | 348 ms | 1.02x faster | Significant (t=7.28) |
| fannkuch | 485 ms | 468 ms | 1.04x faster | Significant (t=9.68) |
| pathlib | 22.5 ms | 22.1 ms | 1.02x faster | Significant (t=13.02) |
| pickle_dict | 29.0 us | 30.3 us | 1.05x slower | Significant (t=-92.36) |
| pickle_list | 4.55 us | 4.64 us | 1.02x slower | Significant (t=-10.87) |
| pyflate | 735 ms | 702 ms | 1.05x faster | Significant (t=6.67) |
| regex_compile | 197 ms | 193 ms | 1.02x faster | Significant (t=2.81) |
| regex_v8 | 24.5 ms | 23.9 ms | 1.02x faster | Significant (t=17.63) |
| scimark_fft | 376 ms | 386 ms | 1.03x slower | Significant (t=-15.07) |
| scimark_lu | 154 ms | 158 ms | 1.03x slower | Significant (t=-12.94) |
| sqlite_synth | 3.35 us | 3.21 us | 1.04x faster | Significant (t=17.65) |
| telco | 6.54 ms | 7.14 ms | 1.09x slower | Significant (t=-8.51) |
| unpack_sequence | 58.8 ns | 61.5 ns | 1.04x slower | Significant (t=-19.66) |

It's strange that some benchmarks are slower, since the patch only adds two additional checks to dict_vectorcall. Maybe they use many small dicts?

@methane:
> Would you implement some more optimizations based on your PR to demonstrate your idea?

I have already done them; I'll open a PR.

----------

[issue41835] Speed up dict vectorcall creation using keywords
Marco Sulla <launchpad.net@marco.sulla.e4ward.com> added the comment:

Well, following your example, since split dicts seem to no longer be supported, I decided to be more drastic. If you look at the last push in PR 22346, I no longer check but always resize, so the dict is always combined. This seems to be especially good for the "unpack_sequence" benchmark, even though I do not know what it measures:

| chaos | 132 ms | 136 ms | 1.03x slower | Significant (t=-18.09) |
| crypto_pyaes | 136 ms | 141 ms | 1.03x slower | Significant (t=-11.60) |
| float | 133 ms | 137 ms | 1.03x slower | Significant (t=-16.94) |
| go | 276 ms | 282 ms | 1.02x slower | Significant (t=-11.58) |
| logging_format | 12.3 us | 12.6 us | 1.02x slower | Significant (t=-9.75) |
| logging_silent | 194 ns | 203 ns | 1.05x slower | Significant (t=-9.00) |
| logging_simple | 11.3 us | 11.6 us | 1.02x slower | Significant (t=-12.56) |
| mako | 16.5 ms | 17.4 ms | 1.05x slower | Significant (t=-17.34) |
| meteor_contest | 116 ms | 120 ms | 1.04x slower | Significant (t=-25.59) |
| nbody | 158 ms | 166 ms | 1.05x slower | Significant (t=-12.73) |
| nqueens | 107 ms | 111 ms | 1.03x slower | Significant (t=-11.39) |
| pickle_pure_python | 631 us | 619 us | 1.02x faster | Significant (t=6.28) |
| regex_compile | 206 ms | 214 ms | 1.04x slower | Significant (t=-24.24) |
| regex_v8 | 28.4 ms | 26.7 ms | 1.06x faster | Significant (t=10.92) |
| richards | 87.8 ms | 90.3 ms | 1.03x slower | Significant (t=-10.91) |
| scimark_lu | 165 ms | 162 ms | 1.02x faster | Significant (t=4.55) |
| scimark_sor | 210 ms | 215 ms | 1.02x slower | Significant (t=-10.14) |
| scimark_sparse_mat_mult | 6.45 ms | 6.64 ms | 1.03x slower | Significant (t=-6.66) |
| spectral_norm | 158 ms | 171 ms | 1.08x slower | Significant (t=-29.11) |
| sympy_expand | 599 ms | 619 ms | 1.03x slower | Significant (t=-21.93) |
| sympy_str | 376 ms | 389 ms | 1.04x slower | Significant (t=-23.80) |
| sympy_sum | 233 ms | 239 ms | 1.02x slower | Significant (t=-14.70) |
| telco | 7.40 ms | 7.61 ms | 1.03x slower | Significant (t=-10.08) |
| unpack_sequence | 70.0 ns | 56.1 ns | 1.25x faster | Significant (t=10.62) |
| xml_etree_generate | 108 ms | 106 ms | 1.02x faster | Significant (t=5.52) |
| xml_etree_iterparse | 133 ms | 130 ms | 1.02x faster | Significant (t=11.33) |
| xml_etree_parse | 208 ms | 204 ms | 1.02x faster | Significant (t=9.19) |

----------

[issue41835] Speed up dict vectorcall creation using keywords
Inada Naoki <songofacandy@gmail.com> added the comment:

unpack_sequence is a very sensitive benchmark. Its speed changes dramatically with code alignment. PGO+LTO reduces the noise, but some noise always remains.

I believe there is no significant performance change in the macro benchmarks from optimizing this part.

An insignificant result in macro benchmarks doesn't mean we must reject the optimization, because pyperformance doesn't cover every application in the world.
But it does mean that we must be conservative about the optimization.

----------

[issue41835] Speed up dict vectorcall creation using keywords
Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:

Both changes add a significant amount of code (100 and 85 lines respectively). Even if they speed up a particular case of the dict constructor, it is not a common use case.

I think it would be better to reject these changes. They make maintenance harder, the benefit seems insignificant, and there is always a danger that new code can slow down other code. The dict object is performance critical for Python, so it is better not to touch its code without need.

----------
nosy: +serhiy.storchaka

[issue41835] Speed up dict vectorcall creation using keywords
Inada Naoki <songofacandy@gmail.com> added the comment:

> Both changes add a significant amount of code (100 and 85 lines respectively). Even if they speed up a particular case of the dict constructor, it is not a common use case.

You are right, but please wait.

Marco is a new contributor who can write correct C code.
So before rejecting it, I am searching for parts that could be optimized with his code.

* bpo-42126, GH-22911: I can make dict displays (a.k.a. dict literals) 50% faster. But it introduces additional complexity to the compiler and ceval. So I will reject it unless I find real-world code that uses dict displays in a performance-critical part.

* _PyStack_AsDict (https://github.com/methane/cpython/pull/25): I thought this was a performance-critical function, but I could not see a significant performance gain in pyperformance.

* _PyEval_EvalCode (https://github.com/python/cpython/blob/master/Python/ceval.c#L4465): I am still not sure we can assume there are no duplicated keyword arguments here. If we can, we can optimize calls to functions that receive a **kwds argument.

These three parts are all I found. I will reject this issue if I fail to optimize _PyEval_EvalCode as well.

----------

[issue41835] Speed up dict vectorcall creation using keywords
Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:

Do not overestimate the importance of _PyStack_AsDict(). Most calls (~90-95% or so) use positional-only arguments, and most functions do not have a var-keyword parameter. So efforts in recent years were spent on optimizing the common cases, in particular avoiding creating a dict when it is not needed. _PyStack_AsDict() can affect perhaps 1% of code, or less, and those functions are usually not performance critical.

----------

[issue41835] Speed up dict vectorcall creation using keywords
Marco Sulla <launchpad.net@marco.sulla.e4ward.com> added the comment:

Well, actually Serhiy is right: it does not seem that the macro benchmarks showed anything significant. Maybe the code can be used in other parts of CPython, for example in _pickle, where dicts are loaded. But that would also require exposing, perhaps internally only, dictresize() and DICT_NEXT_VERSION(). I'm not sure that is desirable.

There's something I do not understand: the speedup in unpack_sequence. I checked the pyperformance code, and it's a microbenchmark for:

a, b = some_sequence

It should *not* be affected by the change. Anyway, I ran the benchmark another 10 times, and the lowest value without the PR never drops below 67.7 ns, while with the PR it reaches 53.5 ns. I do not understand why. Maybe it affects the creation of the dicts holding the local and global vars?

----------

[issue41835] Speed up dict vectorcall creation using keywords
Inada Naoki <songofacandy@gmail.com> added the comment:

> It should *not* be affected by the change. Anyway, I ran the benchmark another 10 times, and the lowest value without the PR never drops below 67.7 ns, while with the PR it reaches 53.5 ns. I do not understand why.

The benchmark is heavily affected by code placement.
Even adding a dead function changes the timings. Read vstinner's blog and presentation:

* https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
* https://speakerdeck.com/haypo/how-to-run-a-stable-benchmark?slide=9

That's why we recommend a PGO+LTO build for benchmarking.

----------

[issue41835] Speed up dict vectorcall creation using keywords
Marco Sulla <launchpad.net@marco.sulla.e4ward.com> added the comment:

I did PGO+LTO... --enable-optimizations --with-lto

----------

[issue41835] Speed up dict vectorcall creation using keywords
Change by Inada Naoki <songofacandy@gmail.com>:


----------
pull_requests: +22023
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/23106

[issue41835] Speed up dict vectorcall creation using keywords
Inada Naoki <songofacandy@gmail.com> added the comment:

> I did PGO+LTO... --enable-optimizations --with-lto

I'm sorry about that. PGO+LTO *reduces* noise, but some noise remains, and unpack_sequence is very fragile.
I tried your branch again, and unpack_sequence is 10% *slower* than the master branch.

I am running pyperformance with PR-23106, which simplifies your function and uses it from _PyStack_AsDict() and _PyEval_EvalCode().

----------
stage: patch review ->

[issue41835] Speed up dict vectorcall creation using keywords
Inada Naoki <songofacandy@gmail.com> added the comment:

Short result (--min-speed=2):

Slower (4):
- unpack_sequence: 65.2 ns +- 1.3 ns -> 69.2 ns +- 0.4 ns: 1.06x slower (+6%)
- unpickle_list: 5.21 us +- 0.04 us -> 5.44 us +- 0.02 us: 1.04x slower (+4%)
- chameleon: 9.80 ms +- 0.08 ms -> 10.0 ms +- 0.1 ms: 1.02x slower (+2%)
- logging_silent: 202 ns +- 5 ns -> 206 ns +- 5 ns: 1.02x slower (+2%)

Faster (9):
- pickle_dict: 30.7 us +- 0.1 us -> 29.0 us +- 0.1 us: 1.06x faster (-5%)
- scimark_lu: 169 ms +- 3 ms -> 163 ms +- 3 ms: 1.04x faster (-4%)
- sympy_str: 396 ms +- 8 ms -> 383 ms +- 5 ms: 1.04x faster (-3%)
- sqlite_synth: 3.46 us +- 0.08 us -> 3.34 us +- 0.04 us: 1.03x faster (-3%)
- scimark_fft: 415 ms +- 3 ms -> 405 ms +- 3 ms: 1.03x faster (-3%)
- pickle_list: 4.91 us +- 0.07 us -> 4.79 us +- 0.04 us: 1.03x faster (-3%)
- dulwich_log: 82.4 ms +- 0.8 ms -> 80.4 ms +- 0.8 ms: 1.02x faster (-2%)
- scimark_sparse_mat_mult: 5.49 ms +- 0.03 ms -> 5.37 ms +- 0.02 ms: 1.02x faster (-2%)
- spectral_norm: 157 ms +- 1 ms -> 153 ms +- 4 ms: 1.02x faster (-2%)

Benchmark hidden because not significant (47): ...

Geometric mean: 1.00 (faster)

The long result is attached.

----------
Added file: https://bugs.python.org/file49560/pr23106.txt

[issue41835] Speed up dict vectorcall creation using keywords
Inada Naoki <songofacandy@gmail.com> added the comment:

bench_kwcall.py (attached) is a microbenchmark for _PyEval_EvalCode.

$ cpython/release/python -m pyperf compare_to master.json kwcall-nodup.json

kwcall-3: Mean +- std dev: [master] 192 us +- 2 us -> [kwcall-nodup] 175 us +- 1 us: 1.09x faster (-9%)
kwcall-6: Mean +- std dev: [master] 327 us +- 6 us -> [kwcall-nodup] 291 us +- 4 us: 1.12x faster (-11%)
kwcall-9: Mean +- std dev: [master] 436 us +- 10 us -> [kwcall-nodup] 373 us +- 5 us: 1.17x faster (-14%)

Geometric mean: 0.89 (faster)

----------
Added file: https://bugs.python.org/file49561/bench_kwcall.py

[issue41835] Speed up dict vectorcall creation using keywords
Inada Naoki <songofacandy@gmail.com> added the comment:

While this is an interesting optimization, the gain is not enough.
I am closing this issue for now.

@Marco Sulla
Optimizing dict is a hard job. If you want to continue, I have an idea:
`dict(zip(keys, row))` is a common use case. It is used by asdict() in dataclasses, _asdict() in namedtuple, and csv.DictReader.
Sniffing the zip object and presizing the dict may be an interesting optimization.

But note that this idea has a low chance of being accepted too. We try many ideas like this and reject them ourselves, even without creating a pull request.
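
A rough sketch of the presizing half of that idea, using only public C API; the helper name is made up, and a zip object itself does not advertise a length hint, so real "sniffing" would have to look at the zip's source iterators instead of PyObject_LengthHint():

```
#include <Python.h>

/* Illustrative sketch only: presize a dict built from an iterable of
   (key, value) pairs whenever a length hint is available. */
static PyObject *
dict_from_pairs_presized(PyObject *iterable)
{
    Py_ssize_t hint = PyObject_LengthHint(iterable, 0);   /* 0 = unknown */
    if (hint < 0) {
        return NULL;                    /* __length_hint__ raised */
    }
    PyObject *d = hint > 0 ? _PyDict_NewPresized(hint) : PyDict_New();
    if (d == NULL) {
        return NULL;
    }
    /* Fill the presized table with the normal pair-merging loop. */
    if (PyDict_MergeFromSeq2(d, iterable, 1) < 0) {
        Py_DECREF(d);
        return NULL;
    }
    return d;
}
```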

----------
resolution: -> rejected
stage: -> resolved
status: open -> closed

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue41835>
_______________________________________