gitops-update: argocd app wait 180s timeout causes false-failure on apps with long PostSync hooks

## Summary

`gitops-update.yml` reports `ArgoCD Sync` failure for downstream apps even when ArgoCD itself completes the sync successfully a short time later. The culprit is the hard-coded 180s timeout on `argocd app wait`, which is shorter than the time some apps take to finish their PostSync hooks on slower / busier clusters.

## Symptom

Workflow runs that legitimately update GitOps end with `❌ ArgoCD Sync (<server>/<env>)` jobs, while the corresponding ArgoCD Application transitions to `Synced` + `Healthy` shortly after the GHA timeout (7s and 17s later in the run cited under Evidence). This produces a steady stream of false-failure notifications for owners of apps with multi-stage sync hooks (PreSync migrations + PostSync init jobs).

## Evidence

Run: https://github.com/LerianStudio/plugin-access-manager/actions/runs/26538953954 (tag `v3.0.0-beta.7`, triggered from `build-auth-init.yml`)

| server/env | GHA conclusion | ArgoCD timeline (UTC, from `argocd app history`) | Current state |
|---|---|---|---|
| firmino/dev | ✅ success | (within timeout) | Synced/Healthy |
| benedita/dev | ✅ success | (within timeout) | Synced/Healthy |
| clotilde/dev | ❌ failure (timeout at `21:22:31Z`) | sync to `5038e5c` started `21:21:09Z`, completed `21:22:38Z` | Synced/Healthy |
| anacleto/dev | ❌ failure (timeout at `21:19:51Z`) | sync to `5038e5c` started `21:17:52Z`, completed `21:20:08Z` | Synced/Healthy |

Failure messages in both failed jobs are identical and benign:

```
Sync Status:  Synced to main (5038e5c)
Health Status: Degraded
Message:      waiting for completion of hook batch/Job/plugin-access-manager-auth-init-user
##[error]Timeout waiting for sync completion of <app>
```

The PreSync `auth-backend-migrations` Job completed in both clusters; the PostSync `auth-init-user` Job (which seeds Casdoor via API) was still running on clotilde/anacleto when the GHA wait gave up. Per the table above, ArgoCD itself converged 7s (clotilde) and 17s (anacleto) after the GHA timeout.
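
The gap between the GHA timeout and ArgoCD's recorded completion can be recomputed from the table timestamps. A minimal sketch, assuming both timestamps in a row fall on the same UTC day; the helper names `to_secs` and `gap_after_timeout` are illustrative and not part of the workflow:

```shell
#!/usr/bin/env bash
# Convert an HH:MM:SS[Z] timestamp to seconds since midnight.
# Assumes both timestamps being compared are on the same UTC day.
to_secs() {
  local h m s
  IFS=: read -r h m s <<< "${1%Z}"
  # 10# forces base 10 so values like "08" are not parsed as octal.
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Seconds between the GHA timeout ($1) and ArgoCD's sync completion ($2).
gap_after_timeout() {
  echo $(( $(to_secs "$2") - $(to_secs "$1") ))
}

# Timestamps copied from the Evidence table.
echo "clotilde/dev: $(gap_after_timeout 21:22:31Z 21:22:38Z)s"
echo "anacleto/dev: $(gap_after_timeout 21:19:51Z 21:20:08Z)s"
```

Both failed jobs missed a successful sync by well under a minute, which is why a modest increase of the wait timeout should be enough to clear them.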

## Root cause

`.github/workflows/gitops-update.yml` (v1.30.0):

```yaml
# line 808
argocd app sync "$APP_NAME" ... --async --timeout 180 $PRUNE_FLAG
# line 822
argocd app wait "$APP_NAME" ... --timeout 180
```

The 180s `app wait` timeout is shorter than the worst-case PostSync hook duration for apps like `plugin-access-manager` on slower clusters. There is no input to override it, so callers cannot tune per app/server.

## Proposed solution

1. Expose two new inputs with sensible defaults:
   - `argocd_sync_timeout` (default `180`, applied to `argocd app sync --timeout`)
   - `argocd_wait_timeout` (default `600`, applied to `argocd app wait --timeout`)
2. Raise the default `wait` timeout to **600s** so apps with PostSync seed jobs stop producing false failures out of the box. The existing 5-attempt retry loop with 30s sleep already covers transient sync failures; the new value only affects how long each attempt blocks on health.
3. (Optional) Add an `argocd_skip_wait_on_hooks` input that makes `app wait` return as soon as resources are Synced, ignoring hook health. Useful for callers that explicitly do not want to gate the workflow on long-running seed jobs.
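
A rough sketch of how the sync/wait step could consume these inputs. Everything here is an assumption for illustration, not the current workflow code: the env var names (`ARGOCD_SYNC_TIMEOUT`, `ARGOCD_WAIT_TIMEOUT`, `ARGOCD_SKIP_WAIT_ON_HOOKS`, `ARGOCD_BIN`) are hypothetical and would need to be populated from the new inputs, the flags elided as `...` in the workflow excerpt are left out, and mapping the optional input to `argocd app wait --sync` assumes that flag limits the wait to sync status on the CLI version in use.

```shell
#!/usr/bin/env bash

# Accept only a positive integer number of seconds.
validate_timeout() {
  [[ "$1" =~ ^[1-9][0-9]*$ ]] || { echo "invalid timeout: $1" >&2; return 1; }
}

# Sync an app, then wait for it, using timeouts taken from
# hypothetical env vars (defaults match the proposal: 180s / 600s).
sync_and_wait() {
  local app="$1"
  local sync_timeout="${ARGOCD_SYNC_TIMEOUT:-180}"
  local wait_timeout="${ARGOCD_WAIT_TIMEOUT:-600}"
  local argocd="${ARGOCD_BIN:-argocd}"   # overridable for dry runs
  local wait_flags=(--timeout "$wait_timeout")

  validate_timeout "$sync_timeout" || return 1
  validate_timeout "$wait_timeout" || return 1

  # Assumption: --sync makes `app wait` check sync status only,
  # so it does not block on PostSync hook health.
  if [[ "${ARGOCD_SKIP_WAIT_ON_HOOKS:-false}" == "true" ]]; then
    wait_flags+=(--sync)
  fi

  # Flags elided in the issue excerpt (and $PRUNE_FLAG) are omitted here.
  "$argocd" app sync "$app" --async --timeout "$sync_timeout" || return 1
  "$argocd" app wait "$app" "${wait_flags[@]}"
}
```

Calling `ARGOCD_BIN=echo sync_and_wait my-app` prints the two commands instead of running them, which is a cheap way to check the resolved timeouts. Existing callers that set none of the variables get 180s for sync and 600s for wait.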

## Acceptance criteria

- Re-running the linked plugin-access-manager release with the new defaults produces ✅ on all four servers without any change to the downstream apps.
- The new inputs are documented in `docs/gitops-update-workflow.md`.
- Existing callers (e.g. `plugin-br-bank-transfer`, `midaz`, `ungoliant-controller`) continue to work without changes.

## References

- Failed run: https://github.com/LerianStudio/plugin-access-manager/actions/runs/26538953954
- Workflow file: https://github.com/LerianStudio/github-actions-shared-workflows/blob/v1.30.0/.github/workflows/gitops-update.yml (lines 806–823)

