gh-150424: Widen JIT int fast paths to full int64 range#150425

Closed
KRRT7 wants to merge 5 commits into python:main from KRRT7:jit-wide-int-fastpath

KRRT7 commented May 25, 2026

Fixes gh-150424.

This widens the JIT int fast paths from the compact-only range to the full int64_t range, and fixes the correctness issues that the widening exposed for non-compact exact ints and on 15-bit builds.

Changes:

  • widen JIT integer add/subtract/multiply fast paths to operate across the full int64_t range
  • relax guards from compact-only exact ints to exact ints that fit in int64_t
  • add fast extraction/support code for non-compact exact ints in the widened range
  • keep specialized in-place mutation compact-only and fall back safely for non-compact inputs
  • handle widened integer comparisons without compact-only assumptions
  • construct widened arithmetic results with PyLong_FromInt64() so 15-bit builds do not narrow through stwodigits
  • add regression coverage for widened operations, non-compact exact ints, boundary cases, and overflow fallback
  • add benchmark scripts for measuring widened JIT integer fast-path performance
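
As a rough illustration of the first two bullets, here is a minimal Python model of an overflow-checked fast path. It is only a sketch: the real change lives in C and uses the compiler overflow builtins named in the commit message, and the names `fits_int64` and `fast_binary_op` are hypothetical helpers invented for this example.

```python
import operator

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# Hypothetical helper: the guard "exact int that fits in int64_t".
def fits_int64(value: int) -> bool:
    return INT64_MIN <= value <= INT64_MAX

_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def fast_binary_op(op: str, a: int, b: int):
    """Return (result, took_fast_path) for add/sub/mul.

    Models the idea only: operands outside int64 skip the fast path,
    and a result outside int64 counts as overflow and falls back.
    """
    result = _OPS[op](a, b)  # Python ints never overflow, so this is the reference answer
    if not (fits_int64(a) and fits_int64(b)):
        return result, False  # guard fails: generic path
    if not fits_int64(result):
        return result, False  # overflow detected: generic path
    return result, True       # stayed inside int64: fast path
```

For example, `fast_binary_op("add", 2**40, 2**40)` stays on the modelled fast path, while `fast_binary_op("mul", 2**40, 2**40)` falls back because the product needs more than 64 bits.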

Tests run:

  • ./python.exe -m unittest test.test_capi.test_opt.TestUopsOptimization
  • ./python.exe -m test test_generated_cases

Additional validation:

  • built and tested with --with-pydebug --enable-experimental-jit --enable-big-digits=15
  • ran targeted non-compact widened int regression tests on the 15-bit build

Comparison: main (1310d2c) vs this branch (160d3cd).
Build: PGO + full LTO, installed binary, macOS arm64, Clang 22, PYTHON_JIT=0 (interpreter only).
Microbenchmarks target non-compact int arithmetic directly (values that exceed 2**30 but fit in int64).
pyperformance benchmarks are general workloads not specific to this change — included to confirm no regression.
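
To make the value ranges concrete, the sketch below sorts an int into the three ranges this description talks about. It assumes that "compact" means the magnitude fits in a single PyLong digit, which is my reading of the current layout rather than something stated here; `classify` is a hypothetical helper name.

```python
import sys

# 30 on default builds, 15 when configured with --enable-big-digits=15.
DIGIT_BITS = sys.int_info.bits_per_digit

def classify(value: int) -> str:
    """Classify a value relative to the ranges discussed in the PR."""
    # Assumption: a compact int has a magnitude that fits in one digit.
    if abs(value) < 2**DIGIT_BITS:
        return "compact"
    if -(2**63) <= value <= 2**63 - 1:
        return "non-compact, fits int64"
    return "needs arbitrary precision"
```

The microbenchmark inputs described above would fall into the middle category.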

pyperformance

| Benchmark | main | branch | Change | Significant |
| --- | --- | --- | --- | --- |
| chaos | 33.2 ms ± 1.1 ms | 30.6 ms ± 0.2 ms | 1.08x faster | Yes (t=18.05) |
| float | 43.1 ms ± 0.6 ms | 40.1 ms ± 0.6 ms | 1.08x faster | Yes (t=26.37) |
| nbody | 67.3 ms ± 0.5 ms | 63.7 ms ± 0.6 ms | 1.06x faster | Yes (t=34.13) |
| pidigits | 228 ms ± 0 ms | 228 ms ± 0 ms | 1.00x | No |
| pyflate | 254 ms ± 2 ms | 247 ms ± 2 ms | 1.03x faster | Yes (t=19.73) |
| raytrace | 146 ms ± 1 ms | 138 ms ± 1 ms | 1.06x faster | Yes (t=64.71) |
| scimark_fft | 168 ms ± 2 ms | 157 ms ± 2 ms | 1.07x faster | Yes (t=29.51) |
| scimark_lu | 62.6 ms ± 7.4 ms | 60.3 ms ± 0.4 ms | 1.04x faster | Yes (t=2.44) |
| scimark_monte_carlo | 41.4 ms ± 4.2 ms | 37.5 ms ± 0.3 ms | 1.10x faster | Yes (t=7.09) |
| scimark_sor | 75.8 ms ± 10.2 ms | 66.8 ms ± 0.4 ms | 1.13x faster | Yes (t=6.81) |
| scimark_sparse_mat_mult | 2.53 ms ± 0.06 ms | 2.42 ms ± 0.02 ms | 1.05x faster | Yes (t=13.25) |
| spectral_norm | 51.8 ms ± 7.5 ms | 50.5 ms ± 0.5 ms | 1.03x faster | No |

Microbenchmarks (Tools/scripts/jit_int_benchmark_pyperf.py)

| Benchmark | main | branch | Change |
| --- | --- | --- | --- |
| jit_int_small | 209 ns ± 0 ns | 181 ns ± 0 ns | 1.16x faster |
| jit_int_intermediate_overflow | 1.28 µs ± 0.01 µs | 1.17 µs ± 0.01 µs | 1.10x faster |
| jit_int_double_add | 1.34 µs ± 0.01 µs | 1.26 µs ± 0.06 µs | 1.06x faster |
| jit_int_accumulate | 806 ns ± 33 ns | 664 ns ± 20 ns | 1.21x faster |
| jit_int_always_large | 459 ns ± 17 ns | 422 ns ± 10 ns | 1.09x faster |
| jit_int_mixed | 176 ns ± 5 ns | 153 ns ± 6 ns | 1.15x faster |
| Geometric mean | | | 1.13x faster |
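
The geometric mean row can be checked from the per-benchmark ratios. The sketch below uses the rounded "Change" values from the microbenchmark table as inputs, so it only confirms the summary at two decimal places; it says nothing about how the benchmark tooling computed it.

```python
import math

# Rounded speedup ratios copied from the "Change" column above.
reported_speedups = {
    "jit_int_small": 1.16,
    "jit_int_intermediate_overflow": 1.10,
    "jit_int_double_add": 1.06,
    "jit_int_accumulate": 1.21,
    "jit_int_always_large": 1.09,
    "jit_int_mixed": 1.15,
}

def geometric_mean(values):
    """n-th root of the product, computed in log space."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

overall = geometric_mean(reported_speedups.values())
print(f"{overall:.2f}x faster")
```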

KRRT7 added 4 commits May 25, 2026 12:52
Replace the 30-bit compact-only range check (is_medium_int) with
__builtin_add_overflow / sub_overflow / mul_overflow, widening
the JIT arithmetic fast path from ±2^30 to ±2^62.

Relax operand guards from _PyLong_CheckExactAndCompact to
PyLong_CheckExact so non-compact inputs also stay in the JIT trace.

Add inline _PyLong_AsInt64 for fast digit extraction from non-compact
PyLongs (avoids calling the heavy PyLong_AsLongLongAndOverflow).

Results (pyperf, ARM64, rigorous):
  intermediate_overflow  2.70x faster
  double_add             2.73x faster
  accumulate             1.65x faster
  geometric mean         1.52x faster

Includes pyperf-based microbenchmark (Tools/scripts/jit_int_benchmark_pyperf.py)
and a simpler timeit-based version (jit_int_benchmark.py).
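
The digit extraction mentioned in the commit message can be pictured with a small Python model. This is not the PR's `_PyLong_AsInt64`; it is a sketch assuming the usual sign-plus-magnitude layout with little-endian digits of `shift` bits each, and `split_digits` and `extract_int64` are hypothetical names.

```python
def split_digits(value: int, shift: int):
    """Split an int into (sign, little-endian digits of `shift` bits)."""
    sign = -1 if value < 0 else 1
    magnitude = abs(value)
    mask = (1 << shift) - 1
    digits = []
    while magnitude:
        digits.append(magnitude & mask)
        magnitude >>= shift
    return sign, digits

def extract_int64(sign: int, digits, shift: int):
    """Reassemble the digits; return None if the value does not fit in int64."""
    magnitude = 0
    for digit in reversed(digits):  # most significant digit first
        magnitude = (magnitude << shift) | digit
    value = sign * magnitude
    if -(2**63) <= value <= 2**63 - 1:
        return value
    return None  # caller would fall back to the generic path
```

With 30-bit digits, `split_digits(2**40 + 7, 30)` gives `(1, [7, 1024])`, and `extract_int64` on those digits returns the original value. A value such as `2**70` yields `None`, which models the overflow fallback.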

bedevere-app Bot commented May 25, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.


python-cla-bot Bot commented May 25, 2026

The following commit authors need to sign the Contributor License Agreement:

CLA not signed

KRRT7 changed the title from "gh-150424: Fix widened JIT int fast paths" to "gh-150424: Widen JIT int fast paths to full int64 range" May 25, 2026

picnixz (Member) commented May 25, 2026

> 15-bit builds

Do we actually support such builds?

picnixz (Member) commented May 25, 2026

I suspect this has been entirely generated by an LLM (at least the summary has been generated by that). It's unclear what the issue is, so please first wait for some feedback on the issue before opening the PR. This is the common process as highlighted in our devguide.

picnixz closed this May 25, 2026

KRRT7 (Author) commented May 25, 2026

> I suspect this has been entirely generated by an LLM (at least the summary has been generated by that). It's unclear what the issue is, so please first wait for some feedback on the issue before opening the PR. This is the common process as highlighted in our devguide.

Hi, that's incorrect. I'm talking to a core dev directly on the PR and, as noted by the status, it's still a draft and WIP.

KRRT7 (Author) commented May 25, 2026

> Do we actually support such builds?

15-bit builds are still supported: --enable-big-digits=15 is a documented configure option, and CPython still has both 15-bit and 30-bit PyLong layouts. I called it out because widening the JIT path to full int64_t exposed a real correctness issue on that

https://github.com/python/cpython/blob/main/Doc/using/configure.rst#L229-L237
https://github.com/python/cpython/blob/main/configure.ac#L6452-L6466
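
A quick way to see which layout a given interpreter uses, and why the two layouts differ for int64-sized values, is sketched below. `sys.int_info.bits_per_digit` is the documented way to read the digit size; `digits_needed` is a hypothetical helper that only does the ceiling division.

```python
import sys

def digits_needed(value: int, bits_per_digit: int) -> int:
    """Number of magnitude digits required to hold `value`."""
    bits = abs(value).bit_length()
    return (bits + bits_per_digit - 1) // bits_per_digit  # ceiling division

INT64_MAX = 2**63 - 1

# 15 or 30 depending on how this interpreter was configured.
print("this build uses", sys.int_info.bits_per_digit, "bits per digit")
print("INT64_MAX needs", digits_needed(INT64_MAX, 30), "digits at 30 bits")
print("INT64_MAX needs", digits_needed(INT64_MAX, 15), "digits at 15 bits")
```

A full int64 value therefore spans more digits on a 15-bit build than on a 30-bit build, so any code that assumes a fixed small digit count has to be checked on both.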

picnixz (Member) commented May 25, 2026

> Hi, that's incorrect. I'm talking to a core dev directly on the PR and, as noted by the status

Who's the core dev?


Ok for the 15-bit but please, first share a reproducer on the issue before jumping onto PRs. I don't know if you want to address correctness or performance. Those are different. And the described scope is not helpful as there are too many points on the issue.

KRRT7 (Author) commented May 25, 2026

> I don't know if you want to address correctness or performance.

I'm addressing performance. The performance changes surfaced correctness issues on 15-bit builds, which the test suite caught.

> the described scope is not helpful as there are too many points on the issue.

I'm working on updating the title and body. As I said, this is a draft / WIP, and I haven't settled the final body yet because I'm still doing some cleanup.

picnixz (Member) commented May 25, 2026

Before doing any work, we should first understand the issue. It doesn't make sense to open PRs and then update the issue to match. This is not how our workflow works. So please first discuss the change on the issue, including a reproducer.
