Skip to content

[ENG-603] Fix caching in care backend build actions#3696

Open
tellmeY18 wants to merge 1 commit into
ohcnetwork:developfrom
tellmeY18:fix/workflow-cache
Open

[ENG-603] Fix caching in care backend build actions#3696
tellmeY18 wants to merge 1 commit into
ohcnetwork:developfrom
tellmeY18:fix/workflow-cache

Conversation

@tellmeY18

@tellmeY18 tellmeY18 commented Jun 26, 2026

Copy link
Copy Markdown
Member

Proposed Changes

  • Replace GHA filesystem cache (actions/cache) with native BuildKit registry cache backed by GHCR (type=registry) in both deploy.yml and reusable-test.yml
  • Remove reproducible-containers/buildkit-cache-dance third-party action dependency
  • Add resolve_plugins.py script to resolve plugin branch refs to commit SHAs via git ls-remote and produce a deterministic cache-busting hash as a Docker build arg (PLUGIN_RESOLVED_HASH)
  • Add prune-cache job to delete buildcache-* GHCR package versions older than 4 weeks after each successful develop push
  • Fix --cache-to on fork PRs (skipped on pull_request events to avoid GHCR auth failures)

Associated Issue

ENG-603

Summary by CodeRabbit

  • New Features

    • Improved build and test workflows to use more efficient remote caching, helping speed up repeated builds.
    • Added support for including extra plugin configuration during image builds.
  • Bug Fixes

    • Added safer handling for missing or invalid plugin settings so builds fail gracefully instead of producing unexpected results.
    • Added cache cleanup for older build artifacts to keep deployments healthier over time.

@tellmeY18 tellmeY18 requested a review from a team as a code owner June 26, 2026 09:48
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a plugin-hash resolver, switches reusable-test and deploy builds to GHCR-backed Buildx caches, passes the resolved hash into Docker builds, and adds a deploy job that deletes older GHCR build cache versions.

Changes

Build cache and plugin hash plumbing

Layer / File(s) Summary
Plugin hash resolver
.github/scripts/resolve_plugins.py
ADDITIONAL_PLUGS is parsed as JSON, invalid or empty values emit no-plugins, missing package_name fails, @ versions are resolved through git ls-remote, and a deterministic short hash is printed.
Registry-backed build cache
.github/workflows/reusable-test.yml, .github/workflows/deploy.yml, docker/dev.Dockerfile, docker/prod.Dockerfile
The build workflows compute weekly cache tags and a plugin hash, switch Buildx cache settings to GHCR registry references, pass PLUGIN_RESOLVED_HASH and ADDITIONAL_PLUGS into Docker builds, and remove the local cache move/save steps.
Cache pruning job
.github/workflows/deploy.yml
prune-cache lists GHCR package versions, filters buildcache-* tags older than the weekly cutoff, and deletes matching versions with gh api.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: fixing build-action caching in the care backend.
Description check ✅ Passed The description covers proposed changes and the associated issue, with only optional template sections omitted.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

…n staleness mitigation

- Replace actions/cache + local buildx cache with type=registry GHCR cache
- Use weekly-rotated cache tags (buildcache-{platform}-{year-Wweek})
- Add prune-cache job to delete tags older than 4 weeks on develop
- Add resolve_plugins.py to compute PLUGIN_RESOLVED_HASH from @Branch refs
  for cache busting when plugin upstreams change
- Add PLUGIN_RESOLVED_HASH ARG/ENV to both Dockerfiles
- Remove buildkit-cache-dance and all GHA cache boilerplate from test workflow
- Conditional cache-to in reusable-test.yml (write only on push)
@tellmeY18 tellmeY18 force-pushed the fix/workflow-cache branch from 9c55d2e to 9d9a3ea Compare June 26, 2026 09:50
@greptile-apps

greptile-apps Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR replaces the GHA filesystem cache (actions/cache + buildkit-cache-dance) with native BuildKit registry cache backed by GHCR for both the deploy and test workflows, and adds a Python helper that resolves plugin git refs to commit SHAs for deterministic cache-busting.

  • Registry cache migration: Both deploy.yml and reusable-test.yml now use type=registry for --cache-from/--cache-to, keyed by a weekly ISO tag (buildcache-<platform>-YYYY-WNN). --cache-to is correctly gated on push events in the test workflow to avoid GHCR auth failures on fork PRs.
  • resolve_plugins.py: New script uses git ls-remote to resolve plugin branch refs to commit SHAs before hashing; the resulting PLUGIN_RESOLVED_HASH build-arg is injected into both Dockerfiles at the right layer position to invalidate only the plugin-install layer when refs move.
  • prune-cache job: Added to deploy.yml to delete GHCR buildcache-* versions older than 4 weeks on each successful develop push.

Confidence Score: 4/5

Safe to merge after fixing the unquoted ADDITIONAL_PLUGS arg; the change only affects CI infrastructure and does not touch application code.

The --build-arg ADDITIONAL_PLUGS= line in reusable-test.yml is unquoted in a shell command. When ADDITIONAL_PLUGS is a non-empty JSON array, spaces inside the JSON cause shell word-splitting and break the docker buildx build invocation for any deployment that uses plugins. The rest of the changes look correct.

reusable-test.yml — the unquoted ADDITIONAL_PLUGS build-arg in the docker buildx build shell command needs to be quoted before merging.

Important Files Changed

Filename Overview
.github/workflows/reusable-test.yml Replaces filesystem cache with GHCR registry cache; unquoted ADDITIONAL_PLUGS build-arg in shell command will break when the env var contains JSON with spaces
.github/workflows/deploy.yml Migrates prod build to registry cache, adds weekly cache-tag rotation and prune-cache job; build-args use YAML syntax (safe from shell splitting)
.github/scripts/resolve_plugins.py New script resolves plugin git refs to commit SHAs for deterministic cache-busting; silently falls back to branch name on network failure with no warning emitted
docker/dev.Dockerfile Adds ARG PLUGIN_RESOLVED_HASH before ADDITIONAL_PLUGS to correctly bust the layer cache when plugin refs change
docker/prod.Dockerfile Same ARG PLUGIN_RESOLVED_HASH addition as dev.Dockerfile, positioned correctly before the install_plugins.py layer

Reviews (1): Last reviewed commit: "ci: switch Docker cache to GHCR registry..." | Re-trigger Greptile

Comment on lines +38 to +39
except Exception:
sha = ref

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 When git ls-remote fails (network timeout, auth error, etc.), the exception is silently swallowed and the branch name (e.g. main) is used as the "SHA". Any subsequent build that also fails the lookup will compute the same hash and get a stale cache hit — the plugin layer won't be invalidated even if the branch moved. Emitting a warning to stderr makes the fallback visible in CI logs without changing the exit behaviour.

Suggested change
except Exception:
sha = ref
except Exception as exc:
sys.stderr.write(f"warning: git ls-remote failed for {git_url!r} ({exc}); using ref name as cache key\n")
sha = ref

Comment on lines +51 to +52
--build-arg PLUGIN_RESOLVED_HASH=${{ steps.plugin-hash.outputs.resolved_hash }} \
--build-arg ADDITIONAL_PLUGS=${{ env.ADDITIONAL_PLUGS }} \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Unquoted ADDITIONAL_PLUGS causes shell word-splitting

${{ env.ADDITIONAL_PLUGS }} is expanded by the Actions expression engine before the shell runs, so a JSON value that contains spaces (e.g. [{"package_name": "...", "version": "@main"}]) gets split into multiple tokens. Docker receives several malformed arguments instead of a single key=value pair and the build fails. The fix is to wrap the argument in double quotes: "--build-arg=ADDITIONAL_PLUGS=${{ env.ADDITIONAL_PLUGS }}".

Note: PLUGIN_RESOLVED_HASH is always a 16-char hex string or no-plugins, so it is safe unquoted — only the JSON-valued arg needs this treatment.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
.github/workflows/reusable-test.yml (1)

24-29: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low value

Optional: unpinned docker/login-action@v3.

zizmor flags Line 25 as unpinned per a blanket-hash policy. It's consistent with the rest of this workflow's tag-pinned actions, so this is only worth addressing if you intend to enforce SHA pinning repo-wide.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/reusable-test.yml around lines 24 - 29, The GitHub
Container Registry login step uses an action pinned only by tag, which may
violate the repo’s SHA-pin policy. If this workflow should follow the same
enforcement as the other tag-pinned actions, update the Docker login step in the
reusable test workflow to use a fixed commit SHA for docker/login-action instead
of `@v3`, keeping the existing Login to GitHub Container Registry block and its
inputs unchanged.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/scripts/resolve_plugins.py:
- Around line 26-42: The plugin ref resolution in resolve_plugins.py is silently
falling back to the raw branch/ref when git ls-remote fails or returns no
output, which makes values like pkg@main stop changing and hides resolution
problems; update the logic around the ref handling to surface the failure
instead of assigning sha from ref. Also fix the version defaulting in the
package version lookup so a null version is treated like missing input by using
the same default path as the versionless case, and make sure the
ver.startswith("@") branch in the resolver no longer crashes on null values.

In @.github/workflows/reusable-test.yml:
- Around line 42-54: The build step in the reusable workflow is passing
ADDITIONAL_PLUGS directly from the GitHub expression into the shell, which can
strip JSON quotes and split the value before docker buildx receives it. Move
ADDITIONAL_PLUGS into the step env and reference it as a quoted shell variable
in the Build images run block, using the existing docker buildx build invocation
and keeping PLUGIN_RESOLVED_HASH unchanged.

In `@docker/prod.Dockerfile`:
- Around line 37-40: The builder-stage plugin install step in
docker/prod.Dockerfile is not using PLUGIN_RESOLVED_HASH, so changes to the
resolved plugin set do not invalidate the install_plugins.py layer. Update the
RUN step that invokes install_plugins.py to reference PLUGIN_RESOLVED_HASH
alongside the existing ARGs, using the same pattern as the corresponding
dev.Dockerfile cache-busting fix, so the layer is rebuilt whenever the hash
changes.

---

Nitpick comments:
In @.github/workflows/reusable-test.yml:
- Around line 24-29: The GitHub Container Registry login step uses an action
pinned only by tag, which may violate the repo’s SHA-pin policy. If this
workflow should follow the same enforcement as the other tag-pinned actions,
update the Docker login step in the reusable test workflow to use a fixed commit
SHA for docker/login-action instead of `@v3`, keeping the existing Login to GitHub
Container Registry block and its inputs unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8740dfbc-e1d4-4a9b-896b-a767c37f5131

📥 Commits

Reviewing files that changed from the base of the PR and between c8baced and 9d9a3ea.

📒 Files selected for processing (5)
  • .github/scripts/resolve_plugins.py
  • .github/workflows/deploy.yml
  • .github/workflows/reusable-test.yml
  • docker/dev.Dockerfile
  • docker/prod.Dockerfile

Comment on lines +26 to +42
if ver.startswith("@"):
ref = ver[1:]
git_url = pkg.removeprefix("git+")
try:
out = subprocess.run( # noqa: S603
["git", "ls-remote", git_url, ref], # noqa: S607
check=False,
capture_output=True,
text=True,
timeout=15,
)
sha = out.stdout.split()[0] if out.stdout else ref
except Exception:
sha = ref
resolved.append(f"{pkg}@{sha}")
else:
resolved.append(f"{pkg}{ver}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

python3 - <<'PY'
from pathlib import Path
p = Path('.github/scripts/resolve_plugins.py')
print(p.exists(), p)
print(p.read_text())
PY

Repository: ohcnetwork/care

Length of output: 1395


🏁 Script executed:

python3 - <<'PY'
from pathlib import Path
p = Path('.github/scripts/resolve_plugins.py')
print(p.exists(), p)
print(p.read_text())
PY

Repository: ohcnetwork/care

Length of output: 1395


Avoid falling back to the raw ref here. When git ls-remote fails or returns nothing, sha becomes ref, so @main hashes like pkg@main and stops busting the cache when the branch moves. Surface the failure instead of quietly pretending it resolved.

Also, p.get("version", "@main") does not cover "version": null; that still reaches ver.startswith("@") and crashes. Use p.get("version") or "@main" if null should mean default.

🧰 Tools
🪛 ast-grep (0.44.0)

[error] 29-35: Command coming from incoming request
Context: subprocess.run( # noqa: S603
["git", "ls-remote", git_url, ref], # noqa: S607
check=False,
capture_output=True,
text=True,
timeout=15,
)
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(subprocess-from-request)

🪛 Ruff (0.15.18)

[warning] 38-38: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/scripts/resolve_plugins.py around lines 26 - 42, The plugin ref
resolution in resolve_plugins.py is silently falling back to the raw branch/ref
when git ls-remote fails or returns no output, which makes values like pkg@main
stop changing and hides resolution problems; update the logic around the ref
handling to surface the failure instead of assigning sha from ref. Also fix the
version defaulting in the package version lookup so a null version is treated
like missing input by using the same default path as the versionless case, and
make sure the ver.startswith("@") branch in the resolver no longer crashes on
null values.

Source: Linters/SAST tools

Comment on lines 42 to +54
- name: Build images
run: |
CACHE_TO=""
if [[ '${{ github.event_name }}' == 'push' ]]; then
CACHE_TO="--cache-to=type=registry,ref=ghcr.io/${{ github.repository }}:$CACHE_TAG,mode=max"
fi
docker buildx build \
--file docker/dev.Dockerfile \
--tag care_local \
--cache-from=type=local,src=${{ runner.temp }}/.buildx-cache \
--cache-to=type=local,dest=${{ runner.temp }}/.buildx-cache-new,mode=max \
--build-arg PLUGIN_RESOLVED_HASH=${{ steps.plugin-hash.outputs.resolved_hash }} \
--build-arg ADDITIONAL_PLUGS=${{ env.ADDITIONAL_PLUGS }} \
--cache-from=type=registry,ref=ghcr.io/${{ github.repository }}:$CACHE_TAG \
$CACHE_TO \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the workflow and related references.
git ls-files .github/workflows/reusable-test.yml .github/workflows/deploy.yml
printf '\n--- reusable-test.yml (relevant section) ---\n'
cat -n .github/workflows/reusable-test.yml | sed -n '1,180p'
printf '\n--- search ADDITIONAL_PLUGS / plugin-hash / CACHE_TAG ---\n'
rg -n "ADDITIONAL_PLUGS|plugin-hash|CACHE_TAG|build-args|build-arg" .github/workflows -S
printf '\n--- deploy.yml relevant section ---\n'
cat -n .github/workflows/deploy.yml | sed -n '1,260p'

Repository: ohcnetwork/care

Length of output: 15953


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find where ADDITIONAL_PLUGS is defined and what shape it has.
rg -n "ADDITIONAL_PLUGS" -S .
printf '\n--- files containing plugin hash step ---\n'
rg -n "plugin-hash|resolved_hash" -S .github/workflows .

Repository: ohcnetwork/care

Length of output: 1501


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Probe shell behavior with representative JSON-like ADDITIONAL_PLUGS values.
python3 - <<'PY'
import shlex

samples = [
    '[]',
    '["a"]',
    '["a b","c"]',
    '["x","y z"]',
    '["quote\\"here"]',
]
for s in samples:
    print("SAMPLE:", s)
    cmd = f'echo --build-arg ADDITIONAL_PLUGS={s}'
    print("RAW:", cmd)
    print("SPLIT:", shlex.split(cmd))
    print()
PY

Repository: ohcnetwork/care

Length of output: 782


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the plugin resolver and the workflow context around the build args.
ast-grep outline .github/scripts/resolve_plugins.py --view expanded
printf '\n--- resolve_plugins.py ---\n'
cat -n .github/scripts/resolve_plugins.py

printf '\n--- dev Dockerfile build-arg handling ---\n'
cat -n docker/dev.Dockerfile | sed -n '1,120p'

Repository: ohcnetwork/care

Length of output: 3575


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Read the resolver to see whether the output is constrained enough to make line 51 harmless.
cat -n .github/scripts/resolve_plugins.py | sed -n '1,220p'

Repository: ohcnetwork/care

Length of output: 1669


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Read the resolver to see whether the output is constrained enough to make line 51 harmless.
cat -n .github/scripts/resolve_plugins.py | sed -n '1,220p'

Repository: ohcnetwork/care

Length of output: 1669


Quote ADDITIONAL_PLUGS before passing it to docker buildx

${{ env.ADDITIONAL_PLUGS }} is JSON, and dropping it straight into the run script lets the shell strip quotes and split words, so non-empty plugin lists turn into broken --build-arg values. Move it to env: and pass "$ADDITIONAL_PLUGS" instead. PLUGIN_RESOLVED_HASH is already shell-safe, so it can stay as-is.

🧰 Tools
🪛 zizmor (1.26.1)

[info] 51-51: code injection via template expansion (template-injection): may expand into attacker-controllable code

(template-injection)


[warning] 52-52: code injection via template expansion (template-injection): may expand into attacker-controllable code

(template-injection)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/reusable-test.yml around lines 42 - 54, The build step in
the reusable workflow is passing ADDITIONAL_PLUGS directly from the GitHub
expression into the shell, which can strip JSON quotes and split the value
before docker buildx receives it. Move ADDITIONAL_PLUGS into the step env and
reference it as a quoted shell variable in the Build images run block, using the
existing docker buildx build invocation and keeping PLUGIN_RESOLVED_HASH
unchanged.

Source: Linters/SAST tools

Comment thread docker/prod.Dockerfile
Comment on lines +37 to 40
ARG PLUGIN_RESOLVED_HASH
ARG ADDITIONAL_PLUGS=""
ENV ADDITIONAL_PLUGS=$ADDITIONAL_PLUGS
RUN python3 $APP_HOME/install_plugins.py

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Same unused-ARG cache-busting gap as dev.Dockerfile.

PLUGIN_RESOLVED_HASH is declared at Line 37 but never referenced by the install_plugins.py step (Line 40), so a changed hash won't invalidate the builder-stage plugin install layer. Thread the ARG into the RUN so it actually does its job.

🔧 Suggested fix
 ARG PLUGIN_RESOLVED_HASH
 ARG ADDITIONAL_PLUGS=""
 ENV ADDITIONAL_PLUGS=$ADDITIONAL_PLUGS
-RUN python3 $APP_HOME/install_plugins.py
+RUN PLUGIN_RESOLVED_HASH=$PLUGIN_RESOLVED_HASH python3 $APP_HOME/install_plugins.py
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ARG PLUGIN_RESOLVED_HASH
ARG ADDITIONAL_PLUGS=""
ENV ADDITIONAL_PLUGS=$ADDITIONAL_PLUGS
RUN python3 $APP_HOME/install_plugins.py
ARG PLUGIN_RESOLVED_HASH
ARG ADDITIONAL_PLUGS=""
ENV ADDITIONAL_PLUGS=$ADDITIONAL_PLUGS
RUN PLUGIN_RESOLVED_HASH=$PLUGIN_RESOLVED_HASH python3 $APP_HOME/install_plugins.py
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker/prod.Dockerfile` around lines 37 - 40, The builder-stage plugin
install step in docker/prod.Dockerfile is not using PLUGIN_RESOLVED_HASH, so
changes to the resolved plugin set do not invalidate the install_plugins.py
layer. Update the RUN step that invokes install_plugins.py to reference
PLUGIN_RESOLVED_HASH alongside the existing ARGs, using the same pattern as the
corresponding dev.Dockerfile cache-busting fix, so the layer is rebuilt whenever
the hash changes.

@codecov

codecov Bot commented Jun 26, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.56%. Comparing base (c8baced) to head (9d9a3ea).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3696   +/-   ##
========================================
  Coverage    79.55%   79.56%           
========================================
  Files          479      479           
  Lines        22996    22996           
  Branches      2378     2378           
========================================
+ Hits         18295    18297    +2     
+ Misses        4096     4095    -1     
+ Partials       605      604    -1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant