gh-150424: Widen JIT int fast paths to full int64 range#150425
Conversation
Replace the 30-bit compact-only range check (is_medium_int) with __builtin_add_overflow / sub_overflow / mul_overflow, widening the JIT arithmetic fast path from ±2^30 to ±2^62. Relax operand guards from _PyLong_CheckExactAndCompact to PyLong_CheckExact so non-compact inputs also stay in the JIT trace. Add inline _PyLong_AsInt64 for fast digit extraction from non-compact PyLongs (avoids calling the heavy PyLong_AsLongLongAndOverflow). Results (pyperf, ARM64, rigorous): intermediate_overflow 2.70x faster double_add 2.73x faster accumulate 1.65x faster geometric mean 1.52x faster Includes pyperf-based microbenchmark (Tools/scripts/jit_int_benchmark_pyperf.py) and a simpler timeit-based version (jit_int_benchmark.py).
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
The following commit authors need to sign the Contributor License Agreement: |
Do we actually support such builds? |
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
I suspect this has been entirely generated by an LLM (at least the summary has been generated by that). It's unclear what the issue is so please first wait for some feedbcak on the issue before opening the PR. This is the common process as highlighted in our devguide. |
Hi, that's incorrect, I'm talking to a core dev directly on the the PR and as noted by the status, it's still a draft and WIP |
15-bit builds are still supported: --enable-big-digits=15 is a documented configure option, and CPython still has both 15-bit and 30-bit PyLong layouts. I called it out because widening the JIT path to full int64_t exposed a real correctness issue on that https://github.com/python/cpython/blob/main/Doc/using/configure.rst#L229-L237 |
Who's the core dev? Ok for the 15-bit but please, first share a reproducer on the issue before jumping onto PRs. I don't know if you want to address correctness or performance. Those are different. And the described scope is not helpful as there are too many points on the issue. |
I'm addressing performance, the performance changes surfaced correctness issues on 15-bit by the test suite.
I'm working on updating the title and bodies, as I said, this is a draft / WIP, I still haven't settled the final body as I'm doing some cleanup |
|
Before doing any work, we should first understand the issue. It doesn't make sense to make PRs and update the issue accordingly. This is not how our worklow is. So please, first discuss the change on the issue by including a reproducer. |
Fixes gh-150424.
This widens the JIT
intfast paths from the compact-only range to the fullint64_trange, and fixes follow-up correctness issues resulting from this for non-compact exact ints and 15-bit builds.Changes:
int64_trangeint64_tPyLong_FromInt64()so 15-bit builds do not narrow throughstwodigitsTests run:
./python.exe -m unittest test.test_capi.test_opt.TestUopsOptimization./python.exe -m test test_generated_casesAdditional validation:
--with-pydebug --enable-experimental-jit --enable-big-digits=15Comparison: main (1310d2c) vs this branch (160d3cd).
Build: PGO + full LTO, installed binary, macOS arm64, Clang 22, PYTHON_JIT=0 (interpreter only).
Microbenchmarks target non-compact int arithmetic directly (values that exceed 2**30 but fit in int64).
pyperformance benchmarks are general workloads not specific to this change — included to confirm no regression.
pyperformance
Microbenchmarks (Tools/scripts/jit_int_benchmark_pyperf.py)