[ENG-603] Fix caching in care backend build actions#3696
Conversation
📝 WalkthroughWalkthroughAdds a plugin-hash resolver, switches reusable-test and deploy builds to GHCR-backed Buildx caches, passes the resolved hash into Docker builds, and adds a deploy job that deletes older GHCR build cache versions. ChangesBuild cache and plugin hash plumbing
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
…n staleness mitigation
- Replace actions/cache + local buildx cache with type=registry GHCR cache
- Use weekly-rotated cache tags (buildcache-{platform}-{year-Wweek})
- Add prune-cache job to delete tags older than 4 weeks on develop
- Add resolve_plugins.py to compute PLUGIN_RESOLVED_HASH from @Branch refs
for cache busting when plugin upstreams change
- Add PLUGIN_RESOLVED_HASH ARG/ENV to both Dockerfiles
- Remove buildkit-cache-dance and all GHA cache boilerplate from test workflow
- Conditional cache-to in reusable-test.yml (write only on push)
9c55d2e to
9d9a3ea
Compare
Greptile SummaryThis PR replaces the GHA filesystem cache (
Confidence Score: 4/5Safe to merge after fixing the unquoted ADDITIONAL_PLUGS arg; the change only affects CI infrastructure and does not touch application code. The --build-arg ADDITIONAL_PLUGS= line in reusable-test.yml is unquoted in a shell command. When ADDITIONAL_PLUGS is a non-empty JSON array, spaces inside the JSON cause shell word-splitting and break the docker buildx build invocation for any deployment that uses plugins. The rest of the changes look correct. reusable-test.yml — the unquoted ADDITIONAL_PLUGS build-arg in the docker buildx build shell command needs to be quoted before merging. Important Files Changed
Reviews (1): Last reviewed commit: "ci: switch Docker cache to GHCR registry..." | Re-trigger Greptile |
| except Exception: | ||
| sha = ref |
There was a problem hiding this comment.
When
git ls-remote fails (network timeout, auth error, etc.), the exception is silently swallowed and the branch name (e.g. main) is used as the "SHA". Any subsequent build that also fails the lookup will compute the same hash and get a stale cache hit — the plugin layer won't be invalidated even if the branch moved. Emitting a warning to stderr makes the fallback visible in CI logs without changing the exit behaviour.
| except Exception: | |
| sha = ref | |
| except Exception as exc: | |
| sys.stderr.write(f"warning: git ls-remote failed for {git_url!r} ({exc}); using ref name as cache key\n") | |
| sha = ref |
| --build-arg PLUGIN_RESOLVED_HASH=${{ steps.plugin-hash.outputs.resolved_hash }} \ | ||
| --build-arg ADDITIONAL_PLUGS=${{ env.ADDITIONAL_PLUGS }} \ |
There was a problem hiding this comment.
Unquoted
ADDITIONAL_PLUGS causes shell word-splitting
${{ env.ADDITIONAL_PLUGS }} is expanded by the Actions expression engine before the shell runs, so a JSON value that contains spaces (e.g. [{"package_name": "...", "version": "@main"}]) gets split into multiple tokens. Docker receives several malformed arguments instead of a single key=value pair and the build fails. The fix is to wrap the argument in double quotes: "--build-arg=ADDITIONAL_PLUGS=${{ env.ADDITIONAL_PLUGS }}".
Note: PLUGIN_RESOLVED_HASH is always a 16-char hex string or no-plugins, so it is safe unquoted — only the JSON-valued arg needs this treatment.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
.github/workflows/reusable-test.yml (1)
24-29: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low valueOptional: unpinned
docker/login-action@v3.zizmor flags Line 25 as unpinned per a blanket-hash policy. It's consistent with the rest of this workflow's tag-pinned actions, so this is only worth addressing if you intend to enforce SHA pinning repo-wide.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/reusable-test.yml around lines 24 - 29, The GitHub Container Registry login step uses an action pinned only by tag, which may violate the repo’s SHA-pin policy. If this workflow should follow the same enforcement as the other tag-pinned actions, update the Docker login step in the reusable test workflow to use a fixed commit SHA for docker/login-action instead of `@v3`, keeping the existing Login to GitHub Container Registry block and its inputs unchanged.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/scripts/resolve_plugins.py:
- Around line 26-42: The plugin ref resolution in resolve_plugins.py is silently
falling back to the raw branch/ref when git ls-remote fails or returns no
output, which makes values like pkg@main stop changing and hides resolution
problems; update the logic around the ref handling to surface the failure
instead of assigning sha from ref. Also fix the version defaulting in the
package version lookup so a null version is treated like missing input by using
the same default path as the versionless case, and make sure the
ver.startswith("@") branch in the resolver no longer crashes on null values.
In @.github/workflows/reusable-test.yml:
- Around line 42-54: The build step in the reusable workflow is passing
ADDITIONAL_PLUGS directly from the GitHub expression into the shell, which can
strip JSON quotes and split the value before docker buildx receives it. Move
ADDITIONAL_PLUGS into the step env and reference it as a quoted shell variable
in the Build images run block, using the existing docker buildx build invocation
and keeping PLUGIN_RESOLVED_HASH unchanged.
In `@docker/prod.Dockerfile`:
- Around line 37-40: The builder-stage plugin install step in
docker/prod.Dockerfile is not using PLUGIN_RESOLVED_HASH, so changes to the
resolved plugin set do not invalidate the install_plugins.py layer. Update the
RUN step that invokes install_plugins.py to reference PLUGIN_RESOLVED_HASH
alongside the existing ARGs, using the same pattern as the corresponding
dev.Dockerfile cache-busting fix, so the layer is rebuilt whenever the hash
changes.
---
Nitpick comments:
In @.github/workflows/reusable-test.yml:
- Around line 24-29: The GitHub Container Registry login step uses an action
pinned only by tag, which may violate the repo’s SHA-pin policy. If this
workflow should follow the same enforcement as the other tag-pinned actions,
update the Docker login step in the reusable test workflow to use a fixed commit
SHA for docker/login-action instead of `@v3`, keeping the existing Login to GitHub
Container Registry block and its inputs unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 8740dfbc-e1d4-4a9b-896b-a767c37f5131
📒 Files selected for processing (5)
.github/scripts/resolve_plugins.py.github/workflows/deploy.yml.github/workflows/reusable-test.ymldocker/dev.Dockerfiledocker/prod.Dockerfile
| if ver.startswith("@"): | ||
| ref = ver[1:] | ||
| git_url = pkg.removeprefix("git+") | ||
| try: | ||
| out = subprocess.run( # noqa: S603 | ||
| ["git", "ls-remote", git_url, ref], # noqa: S607 | ||
| check=False, | ||
| capture_output=True, | ||
| text=True, | ||
| timeout=15, | ||
| ) | ||
| sha = out.stdout.split()[0] if out.stdout else ref | ||
| except Exception: | ||
| sha = ref | ||
| resolved.append(f"{pkg}@{sha}") | ||
| else: | ||
| resolved.append(f"{pkg}{ver}") |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor
🧩 Analysis chain
🏁 Script executed:
python3 - <<'PY'
from pathlib import Path
p = Path('.github/scripts/resolve_plugins.py')
print(p.exists(), p)
print(p.read_text())
PYRepository: ohcnetwork/care
Length of output: 1395
🏁 Script executed:
python3 - <<'PY'
from pathlib import Path
p = Path('.github/scripts/resolve_plugins.py')
print(p.exists(), p)
print(p.read_text())
PYRepository: ohcnetwork/care
Length of output: 1395
Avoid falling back to the raw ref here. When git ls-remote fails or returns nothing, sha becomes ref, so @main hashes like pkg@main and stops busting the cache when the branch moves. Surface the failure instead of quietly pretending it resolved.
Also, p.get("version", "@main") does not cover "version": null; that still reaches ver.startswith("@") and crashes. Use p.get("version") or "@main" if null should mean default.
🧰 Tools
🪛 ast-grep (0.44.0)
[error] 29-35: Command coming from incoming request
Context: subprocess.run( # noqa: S603
["git", "ls-remote", git_url, ref], # noqa: S607
check=False,
capture_output=True,
text=True,
timeout=15,
)
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').
(subprocess-from-request)
🪛 Ruff (0.15.18)
[warning] 38-38: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/scripts/resolve_plugins.py around lines 26 - 42, The plugin ref
resolution in resolve_plugins.py is silently falling back to the raw branch/ref
when git ls-remote fails or returns no output, which makes values like pkg@main
stop changing and hides resolution problems; update the logic around the ref
handling to surface the failure instead of assigning sha from ref. Also fix the
version defaulting in the package version lookup so a null version is treated
like missing input by using the same default path as the versionless case, and
make sure the ver.startswith("@") branch in the resolver no longer crashes on
null values.
Source: Linters/SAST tools
| - name: Build images | ||
| run: | | ||
| CACHE_TO="" | ||
| if [[ '${{ github.event_name }}' == 'push' ]]; then | ||
| CACHE_TO="--cache-to=type=registry,ref=ghcr.io/${{ github.repository }}:$CACHE_TAG,mode=max" | ||
| fi | ||
| docker buildx build \ | ||
| --file docker/dev.Dockerfile \ | ||
| --tag care_local \ | ||
| --cache-from=type=local,src=${{ runner.temp }}/.buildx-cache \ | ||
| --cache-to=type=local,dest=${{ runner.temp }}/.buildx-cache-new,mode=max \ | ||
| --build-arg PLUGIN_RESOLVED_HASH=${{ steps.plugin-hash.outputs.resolved_hash }} \ | ||
| --build-arg ADDITIONAL_PLUGS=${{ env.ADDITIONAL_PLUGS }} \ | ||
| --cache-from=type=registry,ref=ghcr.io/${{ github.repository }}:$CACHE_TAG \ | ||
| $CACHE_TO \ |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the workflow and related references.
git ls-files .github/workflows/reusable-test.yml .github/workflows/deploy.yml
printf '\n--- reusable-test.yml (relevant section) ---\n'
cat -n .github/workflows/reusable-test.yml | sed -n '1,180p'
printf '\n--- search ADDITIONAL_PLUGS / plugin-hash / CACHE_TAG ---\n'
rg -n "ADDITIONAL_PLUGS|plugin-hash|CACHE_TAG|build-args|build-arg" .github/workflows -S
printf '\n--- deploy.yml relevant section ---\n'
cat -n .github/workflows/deploy.yml | sed -n '1,260p'Repository: ohcnetwork/care
Length of output: 15953
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find where ADDITIONAL_PLUGS is defined and what shape it has.
rg -n "ADDITIONAL_PLUGS" -S .
printf '\n--- files containing plugin hash step ---\n'
rg -n "plugin-hash|resolved_hash" -S .github/workflows .Repository: ohcnetwork/care
Length of output: 1501
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Probe shell behavior with representative JSON-like ADDITIONAL_PLUGS values.
python3 - <<'PY'
import shlex
samples = [
'[]',
'["a"]',
'["a b","c"]',
'["x","y z"]',
'["quote\\"here"]',
]
for s in samples:
print("SAMPLE:", s)
cmd = f'echo --build-arg ADDITIONAL_PLUGS={s}'
print("RAW:", cmd)
print("SPLIT:", shlex.split(cmd))
print()
PYRepository: ohcnetwork/care
Length of output: 782
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the plugin resolver and the workflow context around the build args.
ast-grep outline .github/scripts/resolve_plugins.py --view expanded
printf '\n--- resolve_plugins.py ---\n'
cat -n .github/scripts/resolve_plugins.py
printf '\n--- dev Dockerfile build-arg handling ---\n'
cat -n docker/dev.Dockerfile | sed -n '1,120p'Repository: ohcnetwork/care
Length of output: 3575
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Read the resolver to see whether the output is constrained enough to make line 51 harmless.
cat -n .github/scripts/resolve_plugins.py | sed -n '1,220p'Repository: ohcnetwork/care
Length of output: 1669
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Read the resolver to see whether the output is constrained enough to make line 51 harmless.
cat -n .github/scripts/resolve_plugins.py | sed -n '1,220p'Repository: ohcnetwork/care
Length of output: 1669
Quote ADDITIONAL_PLUGS before passing it to docker buildx
${{ env.ADDITIONAL_PLUGS }} is JSON, and dropping it straight into the run script lets the shell strip quotes and split words, so non-empty plugin lists turn into broken --build-arg values. Move it to env: and pass "$ADDITIONAL_PLUGS" instead. PLUGIN_RESOLVED_HASH is already shell-safe, so it can stay as-is.
🧰 Tools
🪛 zizmor (1.26.1)
[info] 51-51: code injection via template expansion (template-injection): may expand into attacker-controllable code
(template-injection)
[warning] 52-52: code injection via template expansion (template-injection): may expand into attacker-controllable code
(template-injection)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/reusable-test.yml around lines 42 - 54, The build step in
the reusable workflow is passing ADDITIONAL_PLUGS directly from the GitHub
expression into the shell, which can strip JSON quotes and split the value
before docker buildx receives it. Move ADDITIONAL_PLUGS into the step env and
reference it as a quoted shell variable in the Build images run block, using the
existing docker buildx build invocation and keeping PLUGIN_RESOLVED_HASH
unchanged.
Source: Linters/SAST tools
| ARG PLUGIN_RESOLVED_HASH | ||
| ARG ADDITIONAL_PLUGS="" | ||
| ENV ADDITIONAL_PLUGS=$ADDITIONAL_PLUGS | ||
| RUN python3 $APP_HOME/install_plugins.py |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Same unused-ARG cache-busting gap as dev.Dockerfile.
PLUGIN_RESOLVED_HASH is declared at Line 37 but never referenced by the install_plugins.py step (Line 40), so a changed hash won't invalidate the builder-stage plugin install layer. Thread the ARG into the RUN so it actually does its job.
🔧 Suggested fix
ARG PLUGIN_RESOLVED_HASH
ARG ADDITIONAL_PLUGS=""
ENV ADDITIONAL_PLUGS=$ADDITIONAL_PLUGS
-RUN python3 $APP_HOME/install_plugins.py
+RUN PLUGIN_RESOLVED_HASH=$PLUGIN_RESOLVED_HASH python3 $APP_HOME/install_plugins.py📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ARG PLUGIN_RESOLVED_HASH | |
| ARG ADDITIONAL_PLUGS="" | |
| ENV ADDITIONAL_PLUGS=$ADDITIONAL_PLUGS | |
| RUN python3 $APP_HOME/install_plugins.py | |
| ARG PLUGIN_RESOLVED_HASH | |
| ARG ADDITIONAL_PLUGS="" | |
| ENV ADDITIONAL_PLUGS=$ADDITIONAL_PLUGS | |
| RUN PLUGIN_RESOLVED_HASH=$PLUGIN_RESOLVED_HASH python3 $APP_HOME/install_plugins.py |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docker/prod.Dockerfile` around lines 37 - 40, The builder-stage plugin
install step in docker/prod.Dockerfile is not using PLUGIN_RESOLVED_HASH, so
changes to the resolved plugin set do not invalidate the install_plugins.py
layer. Update the RUN step that invokes install_plugins.py to reference
PLUGIN_RESOLVED_HASH alongside the existing ARGs, using the same pattern as the
corresponding dev.Dockerfile cache-busting fix, so the layer is rebuilt whenever
the hash changes.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #3696 +/- ##
========================================
Coverage 79.55% 79.56%
========================================
Files 479 479
Lines 22996 22996
Branches 2378 2378
========================================
+ Hits 18295 18297 +2
+ Misses 4096 4095 -1
+ Partials 605 604 -1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Proposed Changes
actions/cache) with native BuildKit registry cache backed by GHCR (type=registry) in bothdeploy.ymlandreusable-test.ymlreproducible-containers/buildkit-cache-dancethird-party action dependencyresolve_plugins.pyscript to resolve plugin branch refs to commit SHAs viagit ls-remoteand produce a deterministic cache-busting hash as a Docker build arg (PLUGIN_RESOLVED_HASH)prune-cachejob to deletebuildcache-*GHCR package versions older than 4 weeks after each successfuldeveloppush--cache-toon fork PRs (skipped onpull_requestevents to avoid GHCR auth failures)Associated Issue
ENG-603
Summary by CodeRabbit
New Features
Bug Fixes