build: cosign retry resilience — add jitter and raise max-attempts default #450

Description

@bedatty

Split from #421 (Fix 1 resilience items). During the 2.2.0-beta.6 incident, attempt #1 failed because cosign signing hit a transient Rekor 404 (getLogEntryByUuidNotFound) on all 3 retry attempts. Rekor intermittency can last several minutes, so the current retry window is too short to ride it out.

Current state (develop)

  • build.yml inputs: cosign_max_attempts default 3, cosign_initial_delay default 5.
  • src/security/cosign-sign/action.yml: exponential backoff (delay ×3 per failed attempt), no jitter.
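
To make the size of the current retry window concrete, here is a rough sketch of the delay schedule implied by the defaults above. It assumes the delay is slept only between attempts (no sleep after the final failure) and that the first sleep equals cosign_initial_delay; the function name `backoff_delays` is hypothetical and does not come from action.yml.

```python
def backoff_delays(max_attempts, initial_delay, multiplier=3.0):
    """Return the sleeps (in seconds) between attempts.

    Assumption: one sleep after each failed attempt except the last,
    with the delay multiplied by `multiplier` each time.
    """
    return [initial_delay * multiplier ** i for i in range(max_attempts - 1)]


current = backoff_delays(max_attempts=3, initial_delay=5)
print(current, sum(current))    # [5.0, 15.0] 20.0

proposed = backoff_delays(max_attempts=5, initial_delay=5)
print(proposed, sum(proposed))  # [5.0, 15.0, 45.0, 135.0] 200.0
```

Under these assumptions, the current defaults spend about 20 seconds waiting in total, while 5 attempts would stretch that to a little over 3 minutes, which is closer to the multi-minute Rekor intermittency described above.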

Proposed

  • Add jitter (a randomized component) to the cosign retry delay to avoid a thundering herd when multiple jobs hit Rekor simultaneously.
  • Review/raise the cosign_max_attempts default (e.g. 3 → 5) and consider a higher backoff ceiling, to ride out multi-minute Rekor outages.
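
One possible shape for the proposed delay calculation is sketched below: exponential backoff capped at a ceiling, plus additive random jitter. This is only an illustration in Python, not the action's implementation; the names `jittered_delay`, `max_delay`, and `jitter_fraction`, and the values 120 and 0.5, are assumptions that do not exist as inputs today.

```python
import random
from typing import Optional


def jittered_delay(attempt: int,
                   initial_delay: float = 5.0,
                   multiplier: float = 3.0,
                   max_delay: float = 120.0,      # hypothetical backoff ceiling
                   jitter_fraction: float = 0.5,  # hypothetical jitter size
                   rng: Optional[random.Random] = None) -> float:
    """Delay in seconds to sleep after failed attempt number `attempt` (1-based)."""
    rng = rng or random.Random()
    # Exponential backoff, capped so late attempts do not wait unboundedly.
    base = min(initial_delay * multiplier ** (attempt - 1), max_delay)
    # Additive jitter in [0, jitter_fraction * base] spreads out concurrent jobs.
    return base + rng.uniform(0, jitter_fraction * base)


# With max attempts = 5 there are 4 sleeps between attempts.
for attempt in range(1, 5):
    print(attempt, round(jittered_delay(attempt), 1))
```

Additive jitter keeps each delay at or above the deterministic backoff value, so the total retry window never shrinks; "full jitter" (a uniform draw between 0 and the base) is an alternative that spreads load more aggressively at the cost of shorter average waits.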

Scope notes

Related: #421

Metadata

Labels

  • bug: Something is not working as expected
  • triage: Needs initial assessment by the DevOps team
