fix(ci): run the health check before warming's storage cleanup#1300
Merged
Conversation
`aspect ci warming` cleaned the prior Bazel state under the runner storage mount *before* `setup_phase`, so the Workflows `agent_health_check` ran after the wipe. The health check must run first — it validates the runner before warming touches any of its state. Move the cleanup to after `setup_phase`. The health check (`ctx.bazel.health_check()`) starts a Bazel server in the runner `--output_base` under the same mount, so cleaning afterwards would wipe `<mount>/output/*` out from under a live server. Shut that server down first (`bazel <resolved-startup-flags> shutdown`, targeting the runner output base) before the wipe; best-effort, so a missing server is tolerated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e85280b476
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
✨ Aspect Workflows Tasks📅 Sun Jun 28 14:21:05 UTC 2026
|
Addressing Codex review on #1300: the cleanup shut the server down with a raw `command("bazel")`, which bypasses the runtime's launcher resolution. When Aspect runs with a resolved Bazel binary (BAZEL_REAL) or a `tools/bazel` wrapper, the health check and build go through `bazel_command()` (honoring BAZEL_REAL, setting ASPECT_CLI_RUNNING), so a bare `bazel` could invoke a different binary/wrapper that doesn't target the server the health check started — leaving it live while the cleanup wipes the output base. Add a `ctx.bazel.shutdown()` runtime method that goes through `bazel_command()` and replays the active run command's startup flags (like `build`/`info`/ `health_check`), and call it from warming instead of the raw command. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
aspect ci warmingcleaned the prior Bazel state under the runner storage mount (ASPECT_WORKFLOWS_RUNNER_STORAGE_PATH) beforesetup_phase, which is where the Workflowsagent_health_checkruns. So the health check ran after the wipe. The health check must run first — it validates the runner before warming touches any of its state.This moves the cleanup to after
setup_phase. Because the health check (ctx.bazel.health_check()) starts a Bazel server in the runner--output_baseunder that same mount, cleaning afterwards would wipe<mount>/output/*out from under a live server and poison it. So the cleanup now shuts that server down first —bazel <resolved-startup-flags> shutdown, replaying the RunCommand's--output_base/--output_user_rootso it targets the right server — then does therm. Best-effort throughout (a missing server is tolerated).