You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
tests/test_patches.py currently hashes botocore function bytes (SHA1 of source text). This catches every change — substantive behavior shifts AND cosmetic edits (docstrings, whitespace, type hints, import reorder) alike. The result: contributors can't tell from a hash-fail alone whether a real port is needed or just a hash bump.
Proposal: add an AST-level hash alongside the existing byte hash. Two signals, different intents:
Hash
What it catches
Enforcement meaning
Byte hash (existing)
All changes, including cosmetic
Aiobotocore should mirror 1:1 for diff minimality (the repo's core principle). A mismatch means the override may be out of sync even for docstrings/whitespace.
AST hash (new)
Substantive changes only — signature, control flow, call structure
A mismatch means a real functional port is needed, not just a mechanical mirror.
With both, a pure-docstring botocore change busts the byte hash but not the AST hash, so the contributor knows to apply the cosmetic mirror + bump the byte hash but doesn't need to do behavioral porting work. A real body change busts both, signaling both mirror and port.
A recent case (PR Bump botocore dependency specification #1480) had a mix: botocore's _apply_request_trailer_checksum had a real body change AND adjacent classes got docstrings/type hints. The byte hash fires on both; the classifier now correctly only port-flags the body change. An AST hash would let the test suite encode the same distinction.
Contributors doing a relax upgrade (no-port) with only cosmetic botocore changes upstream currently have to update hashes for no functional reason — looks like a port but isn't.
Proposed approach (sketch)
For each entry in test_patches.py track two hashes:
# before'hash': 'a1b2c3...'# SHA1 of source bytes# after'byte_hash': 'a1b2c3...'# unchanged; mirrors minimality principle'ast_hash': 'd4e5f6...'# normalized AST hash; behavior-only signal
AST normalization:
Load the botocore source module.
Extract the function/class via ast.parse → find by name.
Normalize the AST:
Strip module-level and function-level docstrings (first Expr(Constant(str))).
Strip pure type annotations (ast.arg.annotation, FunctionDef.returns).
Unparse back to source with a canonical formatter (ast.unparse).
Hash the normalized unparsed source.
On test failure, the contributor sees which hash(es) failed and gets a clear signal: byte-only = cosmetic mirror + hash bump, both = real port needed.
Risks / considerations
False negatives: an AST-equivalent refactor (if cond: return a\nreturn b → return a if cond else b) busts the byte hash but equals the AST hash. Usually that's fine — behavior is the same — but edge cases exist (walrus-in-expr-context, tuple-unpacking reorder). Worth auditing the worst 5-10 AST-equivalence classes against aiobotocore's actual overrides before committing.
Ordering stability: ast.unparse output can vary across Python versions. Pin the hashing-Python version in test_patches.py or bake it into the normalization.
Debuggability: when the AST hash fails, the raw byte diff is harder to map to. Need a small diff tool that shows normalized forms side-by-side on failure.
Non-goals
Replacing the byte hash. Aiobotocore's "minimize divergence from botocore" principle (see docs/override-patterns.md) means byte-level fidelity matters — the AST hash is additive signal, not replacement.
Replacing the check-async-need classifier. The hash tests and the classifier serve different purposes (mechanical coupling vs async-need judgment). All three signals should coexist.
Changing aiobotocore's override model. The hashing is CI-side enforcement, not a runtime concern.
Summary
tests/test_patches.pycurrently hashes botocore function bytes (SHA1 of source text). This catches every change — substantive behavior shifts AND cosmetic edits (docstrings, whitespace, type hints, import reorder) alike. The result: contributors can't tell from a hash-fail alone whether a real port is needed or just a hash bump.Proposal: add an AST-level hash alongside the existing byte hash. Two signals, different intents:
With both, a pure-docstring botocore change busts the byte hash but not the AST hash, so the contributor knows to apply the cosmetic mirror + bump the byte hash but doesn't need to do behavioral porting work. A real body change busts both, signaling both mirror and port.
Why this matters now
/aiobotocore-bot:check-async-needclassifier distinguishes substantive from cosmetic per Step 2/3 (refined in evals + sync-prompt: shared botocore cache, concurrency, and classifier refinements from first eval run #1565). The byte hash doesn't have that distinction baked in.botocoredependency specification #1480) had a mix: botocore's_apply_request_trailer_checksumhad a real body change AND adjacent classes got docstrings/type hints. The byte hash fires on both; the classifier now correctly only port-flags the body change. An AST hash would let the test suite encode the same distinction.Proposed approach (sketch)
For each entry in
test_patches.pytrack two hashes:AST normalization:
ast.parse→ find by name.Expr(Constant(str))).ast.arg.annotation,FunctionDef.returns).ast.unparse).On test failure, the contributor sees which hash(es) failed and gets a clear signal: byte-only = cosmetic mirror + hash bump, both = real port needed.
Risks / considerations
if cond: return a\nreturn b→return a if cond else b) busts the byte hash but equals the AST hash. Usually that's fine — behavior is the same — but edge cases exist (walrus-in-expr-context, tuple-unpacking reorder). Worth auditing the worst 5-10 AST-equivalence classes against aiobotocore's actual overrides before committing.ast.unparseoutput can vary across Python versions. Pin the hashing-Python version intest_patches.pyor bake it into the normalization.Non-goals
Related
/aiobotocore-bot:check-async-needwhich encodes similar logic in prompt form.botocoredependency specification #1480.CONTRIBUTING.rst§"Hashes of Botocore Code" is the current documentation for test_patches.py behavior.