Skip to content

Investigate AST-level hashing in test_patches.py to distinguish substantive from cosmetic changes #1566

@thehesiod

Description

@thehesiod

Summary

tests/test_patches.py currently hashes botocore function bytes (SHA1 of source text). This catches every change — substantive behavior shifts AND cosmetic edits (docstrings, whitespace, type hints, import reorder) alike. The result: contributors can't tell from a hash-fail alone whether a real port is needed or just a hash bump.

Proposal: add an AST-level hash alongside the existing byte hash. Two signals, different intents:

Hash What it catches Enforcement meaning
Byte hash (existing) All changes, including cosmetic Aiobotocore should mirror 1:1 for diff minimality (the repo's core principle). A mismatch means the override may be out of sync even for docstrings/whitespace.
AST hash (new) Substantive changes only — signature, control flow, call structure A mismatch means a real functional port is needed, not just a mechanical mirror.

With both, a pure-docstring botocore change busts the byte hash but not the AST hash, so the contributor knows to apply the cosmetic mirror + bump the byte hash but doesn't need to do behavioral porting work. A real body change busts both, signaling both mirror and port.

Why this matters now

  • The /aiobotocore-bot:check-async-need classifier distinguishes substantive from cosmetic per Step 2/3 (refined in evals + sync-prompt: shared botocore cache, concurrency, and classifier refinements from first eval run #1565). The byte hash doesn't have that distinction baked in.
  • A recent case (PR Bump botocore dependency specification #1480) had a mix: botocore's _apply_request_trailer_checksum had a real body change AND adjacent classes got docstrings/type hints. The byte hash fires on both; the classifier now correctly only port-flags the body change. An AST hash would let the test suite encode the same distinction.
  • Contributors doing a relax upgrade (no-port) with only cosmetic botocore changes upstream currently have to update hashes for no functional reason — looks like a port but isn't.

Proposed approach (sketch)

For each entry in test_patches.py track two hashes:

# before
'hash': 'a1b2c3...'  # SHA1 of source bytes

# after
'byte_hash': 'a1b2c3...'   # unchanged; mirrors minimality principle
'ast_hash':  'd4e5f6...'   # normalized AST hash; behavior-only signal

AST normalization:

  1. Load the botocore source module.
  2. Extract the function/class via ast.parse → find by name.
  3. Normalize the AST:
    • Strip module-level and function-level docstrings (first Expr(Constant(str))).
    • Strip pure type annotations (ast.arg.annotation, FunctionDef.returns).
    • Unparse back to source with a canonical formatter (ast.unparse).
  4. Hash the normalized unparsed source.

On test failure, the contributor sees which hash(es) failed and gets a clear signal: byte-only = cosmetic mirror + hash bump, both = real port needed.

Risks / considerations

  • False negatives: an AST-equivalent refactor (if cond: return a\nreturn breturn a if cond else b) busts the byte hash but equals the AST hash. Usually that's fine — behavior is the same — but edge cases exist (walrus-in-expr-context, tuple-unpacking reorder). Worth auditing the worst 5-10 AST-equivalence classes against aiobotocore's actual overrides before committing.
  • Ordering stability: ast.unparse output can vary across Python versions. Pin the hashing-Python version in test_patches.py or bake it into the normalization.
  • Debuggability: when the AST hash fails, the raw byte diff is harder to map to. Need a small diff tool that shows normalized forms side-by-side on failure.

Non-goals

  • Replacing the byte hash. Aiobotocore's "minimize divergence from botocore" principle (see docs/override-patterns.md) means byte-level fidelity matters — the AST hash is additive signal, not replacement.
  • Replacing the check-async-need classifier. The hash tests and the classifier serve different purposes (mechanical coupling vs async-need judgment). All three signals should coexist.
  • Changing aiobotocore's override model. The hashing is CI-side enforcement, not a runtime concern.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions