Skip to content

fix(py_venv): merge colliding package directories#1205

Open
tamird wants to merge 1 commit into
aspect-build:mainfrom
tamird:fix-regular-package-overlay-v1
Open

fix(py_venv): merge colliding package directories#1205
tamird wants to merge 1 commit into
aspect-build:mainfrom
tamird:fix-regular-package-overlay-v1

Conversation

@tamird

@tamird tamird commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Permissive top-level collisions currently select one whole-tree symlink and leave the other wheel on the pth fallback. When every claimant is directory-backed and one contributes a regular package, Python binds that package to the selected directory and hides unique children from the other wheel.

Carry RECORD-derived top_level_dirs and native_roots through generated whl_install metadata into PyWheelsInfo. Merge relocatable directory roots in provider order and mark their contributors covered so fallback cannot shadow the merged tree. Keep roots with native-library RECORD entries on direct wheel projections plus losing-wheel fallback, because copying those roots into a tree artifact changes their physical origin and can break origin-relative sibling lookup. When a regular package remains reachable through fallback, project the later regular claimant for both top-level and nested native conflicts; a namespace-only graft cannot override it and remains hidden. Pure sibling roots still merge.

The regression coverage keeps ordinary module winner behavior, covers regular and mixed directory unions, checks selected-wheel native metadata, and proves ordinary, top-level, and nested native collisions preserve runtime-consistent winner-only behavior while pure nested roots still union. The package-backed google 1.9.3 plus protobuf 6.33.5 e2e reproduces the top-level native mismatch through generated wheel metadata and asserts the concrete venv projection agrees with runtime import ownership while protobuf remains on fallback. The Airflow e2e asserts the resolved merged tree keeps core-owned and provider-only files. The focused py_venv, site_merge, whl_install, generated snapshot, Azure overlap, legacy pth fallback, google-native overlay, and Airflow suites pass.

@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown

py_binary startup benchmark

Version Mean (ms) Median (ms) ± stddev vs BCR vs main Build (s)
BCR 1.11.7 (baseline) 199.294 196.655 ±12.614 49.18
HEAD main 61.231 61.027 ±0.948 -69.3% 10.36
This PR 60.525 59.671 ±3.549 -69.6% -1.2% 6.91

Measured with hyperfine --warmup 5 --runs 50 on Linux
Gate: PR vs HEAD main (threshold: 10%). BCR is shown only as a historical baseline.
Build time: cold bazel build //:bench with isolated output base, no disk cache.

sys.path quality

Version sys.path entries distinct site-packages roots duplicate realpaths
BCR 1.11.7 (baseline) 6 1 0
HEAD main 7 2 0
This PR 7 2 0

sys.path quality measured by bench_syspath inside the assembled venv. Duplicate realpaths indicate symlink redundancy; many distinct site-packages roots suggest an inefficient venv layout.

✅ No regression detected (PR is -1.2% vs HEAD main)

@tamird tamird force-pushed the fix-regular-package-overlay-v1 branch from 9cad97c to 25718b2 Compare June 26, 2026 22:28

tamird commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

The first matrix exposed an existing Airflow e2e assertion that required airflow.file to stay under the selected wheel repository. That is no longer the right contract once the directory is physically merged. This reroll replaces that path assertion with core-owned plus provider-only module checks; the exact two Airflow targets pass locally.

@tamird tamird force-pushed the fix-regular-package-overlay-v1 branch from 25718b2 to 292b635 Compare June 26, 2026 23:04
@tamird

tamird commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

The rerolled matrix was 16/17 green; only the Bazel 8 e2e shard failed the two Airflow tests, while Bazel 9 passed. GitHub did not preserve the inner test logs. Both exact targets pass locally under Bazel 8.7.0 and the e2e 8.5.1 pin. I found the fixture was still using dotted find_spec calls, which import parent Airflow packages and test unrelated import side effects. This reroll checks the core and provider files in the resolved merged airflow directory instead.

@tamird tamird marked this pull request as ready for review June 26, 2026 23:10
@tamird

tamird commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

The rerolled full matrix is green at 292b635. @jbedard, ready for review when you have time.

@tamird tamird force-pushed the fix-regular-package-overlay-v1 branch from 292b635 to 16694eb Compare June 27, 2026 01:38

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16694ebac0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread py/private/py_venv/venv.bzl Outdated
@jbedard

jbedard commented Jun 27, 2026

Copy link
Copy Markdown
Member

What new test cases failed without this fix? If none then add a new e2e/cases/ scenario to reproduce the issue.

tamird commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

The new //py/tests/py_venv_conflict:native_nested_regular_first_collision_test fails against 16694eb before the reroll: the concrete venv projection resolves to native_graft.install while Python imports the earlier native_regular.install package. The reroll adds that reverse-order fixture alongside the existing native_nested_collision_test and asserts that the concrete projection and runtime import resolve to the same regular owner while pure sibling roots still union. I will post the reroll SHA when the public-repository push guard is cleared.

@tamird tamird force-pushed the fix-regular-package-overlay-v1 branch from 16694eb to 2ec6556 Compare June 27, 2026 11:22

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ec6556a18

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread py/private/py_venv/venv.bzl Outdated
@jbedard

jbedard commented Jun 28, 2026

Copy link
Copy Markdown
Member

Create a proper reproduction of this bug using packages in an e2e test case, not only in unit tests.

Permissive top-level collisions currently select one whole-tree symlink and leave the other wheel on the pth fallback. When every claimant is directory-backed and one contributes a regular package, Python binds that package to the selected directory and hides unique children from the other wheel.

Carry RECORD-derived top_level_dirs and native_roots through wheel metadata. Merge only relocatable directory roots. Keep roots with native-library RECORD entries on direct wheel projections plus losing-wheel fallback because copying them into a tree artifact changes their physical origin and can break origin-relative sibling lookup. When a regular package remains reachable through fallback, select its later claimant for both top-level and nested native conflicts; a namespace-only graft cannot override it and remains hidden.

Cover regular and mixed directory unions, selected-wheel native metadata, ordinary native winner behavior, both nested claimant orders, and the top-level regular-first namespace-graft case while pure sibling roots still merge. Reverse-order tests assert that concrete venv projection and runtime import resolution agree and that losing native wheels stay on fallback. A package-backed google 1.9.3 plus protobuf 6.33.5 e2e reproduces the top-level mismatch through generated wheel metadata: with the old winner selection, concrete venv google points at protobuf while runtime binds google. Regenerate wheel metadata snapshots canonically.

tamird commented Jun 28, 2026

Copy link
Copy Markdown
Contributor Author

Added e2e/cases/google-native-overlay with real google 1.9.3 and protobuf 6.33.5 wheels. google owns google/init.py; protobuf contributes native google/_upb/_message without a top-level init. With the 2ec6556 winner selection restored, the new target fails because the concrete venv projects protobuf/google and site-packages/google/init.py is absent while runtime still binds fallback google. The reroll selects the regular claimant, so concrete projection and runtime owner agree; protobuf stays on fallback and google.protobuf remains hidden. The focused e2e target passes at local 718f03a; I’ll post the public SHA after the manual push.

@tamird

tamird commented Jun 28, 2026

Copy link
Copy Markdown
Contributor Author

Downstream exact-head CI found an interaction with merged #1191. Two same-version pydantic_core wheels from different lock universes collide on a native root; one claimant stays on whole-wheel fallback, then the metadata pass rejects its duplicate dist-info even though it is the only fallback copy.

I have a focused follow-up ready: apply collision policy first; with no fallback project the normal winner, with one fallback project nothing and let that sole visible copy own metadata discovery, and with multiple fallback copies keep the hard failure. The regression covers those three cases plus package_collisions = error. I will send the PR as soon as I can publish the branch.

tamird commented Jun 28, 2026

Copy link
Copy Markdown
Contributor Author

The focused follow-up is #1215 at e595304, and its full matrix
is green. It keeps one visible metadata owner for the Stripe/default
pydantic_core collision while preserving the hard failure when multiple
fallback copies remain visible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants