fix(api): bound Celery worker concurrency to a configurable default#11075
fix(api): bound Celery worker concurrency to a configurable default#11075b-abderrahmane wants to merge 3 commits into
Conversation
|
✅ Conflict Markers Resolved All conflict markers have been successfully resolved in this pull request. |
cesararroba
left a comment
There was a problem hiding this comment.
Thanks for the contribution!
| ### 🐞 Fixed | ||
|
|
||
| - Celery worker concurrency now defaults to 4 (configurable via `DJANGO_CELERY_WORKER_CONCURRENCY`) so memory is predictable regardless of host CPU count [(#10968)](https://github.com/prowler-cloud/prowler/issues/10968) | ||
|
|
There was a problem hiding this comment.
Please:
- Move this to the next release as API 1.27.0 was released as part of Prowler v5.26.0.
- Include it in
### 🚀 Added
Without an explicit `worker_concurrency` setting, Celery falls back to
`os.cpu_count()` of the host. On Kubernetes nodes with many CPUs (16+)
the worker container spawns one prefork child per CPU, each loading the
full SDK working set, and routinely OOMKills under the typical 4 GiB
limit even when idle.
Add `CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4)`
in `config/settings/celery.py`. The Celery app already loads Django
settings via `config_from_object("django.conf:settings", namespace="CELERY")`,
so the setting flows through to `app.conf.worker_concurrency` and the
prefork pool size with no entrypoint changes. Default of 4 matches the
per-process SDK memory footprint in typical deployments; operators size
up via `DJANGO_CELERY_WORKER_CONCURRENCY`.
Closes prowler-cloud#10968
56f188c to
07f472f
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #11075 +/- ##
=======================================
Coverage 93.97% 93.97%
=======================================
Files 237 237
Lines 34829 34873 +44
=======================================
+ Hits 32729 32772 +43
- Misses 2100 2101 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
AdriiiPRodri
left a comment
There was a problem hiding this comment.
Thanks for the PR. The configurability is useful, but this currently changes default runtime behavior. Please address the inline comments below before merge.
|
|
||
| CELERY_DEADLOCK_ATTEMPTS = env.int("DJANGO_CELERY_DEADLOCK_ATTEMPTS", default=5) | ||
|
|
||
| CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4) |
There was a problem hiding this comment.
This introduces a behavior change by forcing concurrency to 4 when the env var is not set.
Could we keep the current default behavior (Celery auto-detected concurrency) and only set CELERY_WORKER_CONCURRENCY when DJANGO_CELERY_WORKER_CONCURRENCY is explicitly provided?
| # Decide whether to allow Django manage database table partitions | ||
| DJANGO_MANAGE_DB_PARTITIONS=[True|False] | ||
| DJANGO_CELERY_DEADLOCK_ATTEMPTS=5 | ||
| DJANGO_CELERY_WORKER_CONCURRENCY=4 |
There was a problem hiding this comment.
Suggestion: make this variable clearly optional (e.g., commented out with an explanation).
Keeping it active in .env.example makes 4 look like the default expected runtime behavior, which changes current behavior.
| ### 🚀 Added | ||
|
|
||
| - `okta` provider support [(#11184)](https://github.com/prowler-cloud/prowler/pull/11184) | ||
| - `DJANGO_CELERY_WORKER_CONCURRENCY` setting (default `4`) bounds the Celery worker prefork pool so memory is predictable regardless of host CPU count [(#11075)](https://github.com/prowler-cloud/prowler/pull/11075) |
There was a problem hiding this comment.
The changelog text says "default 4", but that implies a global behavior change.
Could we reword this to: optional setting that bounds concurrency when explicitly configured?
|
|
||
| CELERY_DEADLOCK_ATTEMPTS = env.int("DJANGO_CELERY_DEADLOCK_ATTEMPTS", default=5) | ||
|
|
||
| CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4) |
There was a problem hiding this comment.
Could we add tests for the new setting behavior?
- env var missing/empty -> concurrency is not forced
- valid numeric value -> concurrency is applied
- invalid value -> explicit error path
Dismissing duplicate review entry to keep a single clean review state.
AdriiiPRodri
left a comment
There was a problem hiding this comment.
Please address the inline comments before merge.
Restore Celery's default (os.cpu_count()) when DJANGO_CELERY_WORKER_CONCURRENCY is unset; only apply the override when the env var is explicitly provided.
Good catch @AdriiiPRodri |
Context
Fix #10968.
Without an explicit
worker_concurrencysetting, Celery falls back toos.cpu_count()of the host. On Kubernetes nodes with many CPUs (16+) the worker container spawns one prefork child per CPU, each loading the full SDK working set, and routinely OOMKills under typical 4 GiB pod limits even when idle. The existing--max-tasks-per-child 1recycles each child after one task — it bounds per-task leaks but not the number of concurrent slots.Description
Adds
CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4)inconfig/settings/celery.py. The Celery app already loads Django settings viaconfig_from_object("django.conf:settings", namespace="CELERY"), so the setting flows through toapp.conf.worker_concurrencyand the prefork pool size with no entrypoint changes.A default of
4matches the per-process SDK memory footprint observed in typical deployments; operators size up viaDJANGO_CELERY_WORKER_CONCURRENCY(also added toapi/.env.examplenext to the siblingDJANGO_CELERY_DEADLOCK_ATTEMPTS).Changes:
api/src/backend/config/settings/celery.py— one Django setting, matching the existingenv.int("DJANGO_*", default=...)convention used byDJANGO_CELERY_DEADLOCK_ATTEMPTS.api/.env.example— document the new env var alongside its siblings.api/CHANGELOG.md— entry under[1.29.0] (Prowler UNRELEASED)→🚀 Added.Steps to review
CELERY_DEADLOCK_ATTEMPTS = env.int("DJANGO_CELERY_DEADLOCK_ATTEMPTS", default=5)(existing)CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4)(this PR)Evidence
Validated end-to-end on a 20-core AKS node with the worker image rebuilt from this branch (
prowler-api, dockerized fromapi/). Worker Deployment patched to use the test image; same pod spec (requests.memory=1Gi,limits.memory=4Gi).Before vs after
4)settings.CELERY_WORKER_CONCURRENCY4app.conf.worker_concurrencyos.cpu_count()=204204~3.4 GiB~880 MiB4and8(OOMKilled)0Process tree (after) — 1 parent + 4 ForkPool children:
Wiring proof — Django setting → Celery config:
Override demonstration —
DJANGO_CELERY_WORKER_CONCURRENCY=2propagates through:Process count goes to 1 parent + 2 children (with brief overlaps from
--max-tasks-per-child 1recycling, expected).Checklist
env.int()(the same pattern already exercised forDJANGO_CELERY_DEADLOCK_ATTEMPTSwith no dedicated test inapi/src/backend/api/tests/test_celery_settings.py); validated in a real cluster as documented above.[1.29.0]→🚀 Added.API
License
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.