ci: build and validate the PyPI sdist/wheel #16

Merged: jeremyandrews merged 1 commit into main from ci/dist-validation on Jun 14, 2026.

@jeremyandrews (Member)

What

Adds a dist job to ci.yml that builds and validates the PyPI artifacts on every push/PR, plus a fix for real sdist bloat found while building it.

Publishing scolta is manual, and nothing in CI built the sdist/wheel — so packaging breakage or cruft only surfaced at twine upload time. The new job:

  1. uv build — builds both sdist and wheel exactly as a release would.
  2. twine check dist/* — metadata / long-description (PyPI render) validity.
  3. scripts/validate-dist.py — content + size asserts (runs locally too, after uv build).
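Step 3 first has to find what step 1 produced. One plausible approach for a validator like scripts/validate-dist.py is to fail closed unless `dist/` holds exactly one wheel and one sdist, so a stale artifact from an older version cannot be validated by mistake. This is a minimal sketch that assumes `uv build` writes into an otherwise clean `dist/`; `dist_problems` and `find_artifacts` are hypothetical names, not the script's actual API.

```python
from pathlib import Path


def dist_problems(names: list[str]) -> list[str]:
    """Fail-closed sanity check on a dist/ listing: exactly one wheel, one sdist."""
    wheels = sorted(n for n in names if n.endswith(".whl"))
    sdists = sorted(n for n in names if n.endswith(".tar.gz"))
    problems = []
    if len(wheels) != 1:
        problems.append(f"expected exactly 1 wheel in dist/, found {wheels}")
    if len(sdists) != 1:
        problems.append(f"expected exactly 1 sdist in dist/, found {sdists}")
    return problems


def find_artifacts(dist_dir: Path = Path("dist")) -> tuple[Path, Path]:
    """Return (wheel, sdist) paths, or exit non-zero if dist/ is empty or stale."""
    names = [p.name for p in dist_dir.iterdir()]
    problems = dist_problems(names)
    if problems:
        raise SystemExit("\n".join(problems) + "\nrun `uv build` into a clean dist/ first")
    wheel = next(n for n in names if n.endswith(".whl"))
    sdist = next(n for n in names if n.endswith(".tar.gz"))
    return dist_dir / wheel, dist_dir / sdist
```

Keeping the listing check a pure function over file names makes it testable without running a build.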

What the gate checks

Fail-closed on the file sets that are ours to enumerate; every failure message says what leaked/is missing and where the controlling filter lives (pyproject.toml hatch targets, or scripts/vendor_assets.py).
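A minimal sketch of that reporting pattern: every check runs, each failure records what leaked or is missing plus the filter to edit, and the exit code is non-zero if anything failed. Collecting rather than stopping at the first failure is what lets a single bad artifact surface all of its problems at once. The `Gate` class and constant names are hypothetical; only the two filter locations come from this PR.

```python
import sys
from dataclasses import dataclass, field

# Where the controlling filters live, per this PR (constant names are hypothetical).
HATCH_FILTER = "pyproject.toml [tool.hatch.build.targets.*]"
VENDOR_FILTER = "scripts/vendor_assets.py"


@dataclass
class Gate:
    """Collects every failure instead of stopping at the first one."""

    failures: list[str] = field(default_factory=list)

    def require(self, ok: bool, what: str, where: str) -> None:
        # Record a message naming the problem and the filter that controls it.
        if not ok:
            self.failures.append(f"{what} (fix via {where})")

    def exit_code(self) -> int:
        # Print all collected failures, then report pass (0) or fail (1).
        for msg in self.failures:
            print(f"FAIL: {msg}", file=sys.stderr)
        return 1 if self.failures else 0
```

A caller would end with something like `sys.exit(gate.exit_code())` after running every `require(...)`.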

Wheel:

  • Every vendored browser runtime asset is present. This is the failure mode that matters most here: a wheel that imports fine but renders no search UI because vendor_assets.py was never run or hatch dropped the data files. The required assets are:
    • scolta/assets/css/scolta.css
    • scolta/assets/js/scolta.js
    • scolta/assets/pagefind/pagefind-worker.js, pagefind.js, wasm.en.pagefind, wasm.unknown.pagefind
    • scolta/assets/wasm/scolta_core.js, scolta_core_bg.wasm
  • Nothing outside the scolta package + dist-info (no tests/, __pycache__, *.pyc, or .sha256/.d.ts/.map sidecars).
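The wheel rules above can be sketched as a pure function over the archive's member names, since a wheel is a zip file. The required list is copied from this PR and the suffix and `__pycache__` rules mirror the bullets; `check_wheel` and `wheel_names` are hypothetical names, and the real script may structure this differently.

```python
import zipfile

# Vendored runtime assets the wheel must carry (list taken from this PR).
REQUIRED_WHEEL_ASSETS = [
    "scolta/assets/css/scolta.css",
    "scolta/assets/js/scolta.js",
    "scolta/assets/pagefind/pagefind-worker.js",
    "scolta/assets/pagefind/pagefind.js",
    "scolta/assets/pagefind/wasm.en.pagefind",
    "scolta/assets/pagefind/wasm.unknown.pagefind",
    "scolta/assets/wasm/scolta_core.js",
    "scolta/assets/wasm/scolta_core_bg.wasm",
]
# Bytecode and build sidecars that should never ship.
FORBIDDEN_SUFFIXES = (".pyc", ".sha256", ".d.ts", ".map")


def check_wheel(names: list[str]) -> list[str]:
    """Return one failure message per problem; an empty list means the wheel passes."""
    failures = []
    present = set(names)
    for asset in REQUIRED_WHEEL_ASSETS:
        if asset not in present:
            failures.append(
                f"wheel is missing {asset} (run scripts/vendor_assets.py; "
                "check the hatch wheel target in pyproject.toml)"
            )
    for name in names:
        parts = name.split("/")
        # Only the import package and its metadata directory may appear at top level.
        if parts[0] != "scolta" and not parts[0].endswith(".dist-info"):
            failures.append(f"wheel leaks {name} outside scolta/ and *.dist-info")
        elif "__pycache__" in parts or name.endswith(FORBIDDEN_SUFFIXES):
            failures.append(f"wheel leaks {name} (tighten the hatch wheel target)")
    return failures


def wheel_names(path: str) -> list[str]:
    """A wheel is a zip archive, so its member list comes straight from zipfile."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()
```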

Sdist:

  • Buildable source set present (pyproject.toml, src/scolta/__init__.py, PKG-INFO, src/scolta/assets/).
  • No local build-dir junk: node_modules, cargo target/, caches, .idea/.vscode, .pyc, .DS_Store.
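The sdist rules can be sketched the same way over the tarball's member names. This sketch assumes members sit under a single `scolta-<version>/` root, treats any directory named `target` as Cargo output, and guesses the specific cache directory names; it reports each junk directory once rather than once per file inside it. Function names are hypothetical.

```python
import tarfile

# Buildable-source requirements and junk markers named in this PR; the exact
# cache directory names below are an assumption.
REQUIRED_SDIST_FILES = ["pyproject.toml", "PKG-INFO", "src/scolta/__init__.py"]
REQUIRED_SDIST_DIR = "src/scolta/assets/"
JUNK_DIRS = {"node_modules", "target", "__pycache__", ".pytest_cache",
             ".ruff_cache", ".mypy_cache", ".idea", ".vscode"}
JUNK_FILES = {".DS_Store"}


def check_sdist(names: list[str]) -> list[str]:
    """Return failure messages for an sdist's member names (empty list = pass)."""
    # sdist members live under a single 'scolta-<version>/' root; strip it.
    rel = [n.split("/", 1)[1] for n in names if "/" in n]
    present = set(rel)
    failures = [f"sdist is missing {f}" for f in REQUIRED_SDIST_FILES if f not in present]
    if not any(r.startswith(REQUIRED_SDIST_DIR) for r in rel):
        failures.append(f"sdist is missing {REQUIRED_SDIST_DIR}")
    leaked = set()
    for r in rel:
        parts = r.split("/")
        # Report a junk directory once, not once per file inside it.
        hit = next((i for i, p in enumerate(parts[:-1]) if p in JUNK_DIRS), None)
        if hit is not None:
            leaked.add("/".join(parts[: hit + 1]) + "/")
        elif parts[-1] in JUNK_FILES or r.endswith(".pyc"):
            leaked.add(r)
    for path in sorted(leaked):
        failures.append(f"sdist leaks {path} (add it to [tool.hatch.build.targets.sdist] exclude)")
    return failures


def sdist_names(path: str) -> list[str]:
    """An sdist is a gzipped tarball; list its member names."""
    with tarfile.open(path, "r:gz") as tf:
        return tf.getnames()
```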

Size caps (shared pattern — ~2x the measured good artifact, value documented in the script):

  • wheel: 1.5 MB cap vs measured ~712 KiB (729,525 B)
  • sdist: 4.7 MB cap vs measured ~2.24 MiB (2,349,924 B)
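The caps reduce to a comparison against a fixed byte count. This sketch assumes the caps are decimal megabytes (1 MB = 1,000,000 B), which makes the roughly 2x headroom check out: 1,500,000 / 729,525 ≈ 2.06 and 4,700,000 / 2,349,924 ≈ 2.00. Under that assumption the pre-fix 7.9 MB sdist would have failed its cap. Names are hypothetical.

```python
from pathlib import Path

# Caps as stated in this PR, assumed here to be decimal megabytes.
CAPS = {"wheel": 1_500_000, "sdist": 4_700_000}
# Measured sizes of the known-good build, from this PR.
MEASURED = {"wheel": 729_525, "sdist": 2_349_924}


def size_failures(kind: str, size: int) -> list[str]:
    """Fail if an artifact exceeds its cap; a jump usually means leaked files."""
    cap = CAPS[kind]
    if size > cap:
        return [f"{kind} is {size:,} B, over its {cap:,} B cap (look for leaked files first)"]
    return []


def artifact_size(path: str) -> int:
    """On-disk size of a built artifact in bytes."""
    return Path(path).stat().st_size
```

A cap at about 2x leaves room for organic growth while still catching the order-of-magnitude jumps that leaked directories cause.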

Real find fixed in the same PR

The sdist was 7.9 MB. Hatchling's source dist defaults to "everything on disk minus VCS-ignored", which does not prune nested untracked trees — it pulled in tests/js/node_modules (61 MB, including .idea IDE files from a vendored npm dep) and tools/stemmer-golden/target (Rust build artifacts). An enumerated [tool.hatch.build.targets.sdist].exclude list prunes those plus cache/IDE/pyc junk, dropping the sdist to ~2.35 MB while keeping the full ported test corpus and stemmer fixtures. The wheel was already clean (packages = ["src/scolta"]).
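Hatchling's include/exclude options take gitignore-style patterns. To show why an enumerated exclude list prunes the nested trees while keeping the test corpus and stemmer fixtures, here is a deliberately simplified model of that matching: a slash-free pattern matches any path component, and a pattern containing a slash is anchored at the project root. This is not hatchling's matcher, and the pattern list in the check below is illustrative rather than the one this PR added.

```python
from fnmatch import fnmatchcase


def excluded(path: str, patterns: list[str]) -> bool:
    """Simplified gitignore-style check (an approximation, not hatchling's code)."""
    parts = path.split("/")
    for pat in patterns:
        # This model ignores the directories-only meaning of a trailing slash.
        pat = pat.rstrip("/")
        if "/" in pat:
            # Anchored pattern: compare it with the path prefix of the same depth,
            # so excluding a directory also excludes everything beneath it.
            anchored = pat.lstrip("/")
            depth = anchored.count("/") + 1
            if len(parts) >= depth and fnmatchcase("/".join(parts[:depth]), anchored):
                return True
        elif any(fnmatchcase(p, pat) for p in parts):
            # Unanchored pattern: matches a file or directory name at any depth.
            return True
    return False
```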

Why

This is the same class of dist cruft as the scolta-wp 13 MB plugin-zip incident and the WP.org dist-cruft compliance flags, but caught in the PR instead of at publish time.

Tested

  • uv build + twine check dist/* — both PASSED.
  • uv run python scripts/validate-dist.py against the real build — passed (wheel 729,525 B, sdist ~2.35 MB).
  • Negative-tested the validator against a crafted bad wheel (missing assets, leaked tests/) and bad sdist (node_modules, cargo target/) — all 14 expected asserts fired with actionable messages.
  • uv run ruff check . and uv run ruff format --check . — green.
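Negative tests like these need deliberately broken artifacts. One way to craft them without touching `dist/`, assuming the validator's checks work on member-name lists, is to build small in-memory zip and gzipped-tar archives with the standard library. The helper names are hypothetical.

```python
import io
import tarfile
import zipfile


def crafted_wheel(names: list[str]) -> bytes:
    """Build an in-memory zip (a stand-in wheel) with empty members named `names`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    return buf.getvalue()


def crafted_sdist(names: list[str]) -> bytes:
    """Build an in-memory .tar.gz (a stand-in sdist) with empty members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name in names:
            tf.addfile(tarfile.TarInfo(name), io.BytesIO(b""))  # size defaults to 0
    return buf.getvalue()


def members(blob: bytes) -> list[str]:
    """List member names of either artifact kind, detecting zip vs tar.gz."""
    buf = io.BytesIO(blob)
    if zipfile.is_zipfile(buf):
        buf.seek(0)
        with zipfile.ZipFile(buf) as zf:
            return zf.namelist()
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:gz") as tf:
        return tf.getnames()
```

Feeding `members(...)` of a wheel crafted without the vendored assets and with a `tests/` file, or of an sdist carrying `node_modules` and `target/`, into the content rules should trip each expected failure.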

Publishing scolta to PyPI is manual and nothing in CI built the
distribution artifacts, so packaging breakage or cruft only surfaced at
`twine upload` time. Add a `dist` job that builds both artifacts
(`uv build`), runs `twine check dist/*`, and runs a new
`scripts/validate-dist.py` content gate that also runs locally.

The gate asserts (fail-closed on the file sets that are ours to
enumerate):

- wheel carries every vendored browser runtime asset (css/js/pagefind/
  wasm) — the failure mode that matters most: a wheel that imports but
  ships no search UI because vendor_assets.py was never run;
- wheel has nothing outside the scolta package + dist-info (no tests,
  __pycache__, *.pyc, .sha256/.d.ts/.map sidecars);
- sdist is a buildable source set with no local build-dir junk;
- both artifacts under size caps (~2x the measured good build).

Also fix a real sdist bloat: hatchling's source dist defaulted to
"everything on disk minus VCS-ignored", which pulled in
tests/js/node_modules (61 MB, incl. .idea files) and
tools/stemmer-golden/target (Rust artifacts) — a 7.9 MB sdist. An
enumerated [tool.hatch.build.targets.sdist] exclude list prunes those
and caches/IDE/pyc junk, dropping the sdist to ~2.35 MB. The wheel was
already clean.
@jeremyandrews merged commit 2c58ac4 into main on Jun 14, 2026
6 checks passed
@jeremyandrews deleted the ci/dist-validation branch on June 14, 2026 07:42