Skip to content

fix(api): bound Celery worker concurrency to a configurable default#11075

Open
b-abderrahmane wants to merge 3 commits into
prowler-cloud:masterfrom
b-abderrahmane:fix/celery-worker-concurrency-default
Open

fix(api): bound Celery worker concurrency to a configurable default#11075
b-abderrahmane wants to merge 3 commits into
prowler-cloud:masterfrom
b-abderrahmane:fix/celery-worker-concurrency-default

Conversation

@b-abderrahmane
Copy link
Copy Markdown

@b-abderrahmane b-abderrahmane commented May 7, 2026

Context

Fix #10968.

Without an explicit worker_concurrency setting, Celery falls back to os.cpu_count() of the host. On Kubernetes nodes with many CPUs (16+) the worker container spawns one prefork child per CPU, each loading the full SDK working set, and routinely OOMKills under typical 4 GiB pod limits even when idle. The existing --max-tasks-per-child 1 recycles each child after one task — it bounds per-task leaks but not the number of concurrent slots.

Description

Adds CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4) in config/settings/celery.py. The Celery app already loads Django settings via config_from_object("django.conf:settings", namespace="CELERY"), so the setting flows through to app.conf.worker_concurrency and the prefork pool size with no entrypoint changes.

A default of 4 matches the per-process SDK memory footprint observed in typical deployments; operators size up via DJANGO_CELERY_WORKER_CONCURRENCY (also added to api/.env.example next to the sibling DJANGO_CELERY_DEADLOCK_ATTEMPTS).

Changes:

  • api/src/backend/config/settings/celery.py — one Django setting, matching the existing env.int("DJANGO_*", default=...) convention used by DJANGO_CELERY_DEADLOCK_ATTEMPTS.
  • api/.env.example — document the new env var alongside its siblings.
  • api/CHANGELOG.md — entry under [1.29.0] (Prowler UNRELEASED)🚀 Added.

Steps to review

  1. Read the 1-line settings change and the changelog/env-example additions — three files, +7 lines.
  2. Confirm the setting name/namespace is consistent with existing siblings:
    • CELERY_DEADLOCK_ATTEMPTS = env.int("DJANGO_CELERY_DEADLOCK_ATTEMPTS", default=5) (existing)
    • CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4) (this PR)
  3. Optionally reproduce the validation below in any K8s deployment.

Evidence

Validated end-to-end on a 20-core AKS node with the worker image rebuilt from this branch (prowler-api, dockerized from api/). Worker Deployment patched to use the test image; same pod spec (requests.memory=1Gi, limits.memory=4Gi).

Before vs after

Metric Before (master, no setting) After (this PR, default 4)
Django settings.CELERY_WORKER_CONCURRENCY (unset → falls through) 4
app.conf.worker_concurrency os.cpu_count() = 20 4
Prefork children per pod 20 4
Idle resident memory ~3.4 GiB ~880 MiB
Restart count (over 26h soak) 4 and 8 (OOMKilled) 0

Process tree (after) — 1 parent + 4 ForkPool children:

$ kubectl exec prowler-worker-7c56d9dd6d-4lcdt -- ls /proc/[0-9]*/comm | grep -E celery
pid=10  python -m celery -A config.celery worker -n c5...
pid=38  python -m celery -A config.celery worker -n c5...   # ForkPoolWorker-1
pid=39  python -m celery -A config.celery worker -n c5...   # ForkPoolWorker-2
pid=40  python -m celery -A config.celery worker -n c5...   # ForkPoolWorker-3
pid=41  python -m celery -A config.celery worker -n c5...   # ForkPoolWorker-4

Wiring proof — Django setting → Celery config:

$ kubectl exec prowler-worker-... -- manage.py shell -c "
    from django.conf import settings; from config.celery import celery_app
    print(settings.CELERY_WORKER_CONCURRENCY)
    print(celery_app.conf.worker_concurrency)
  "
4
4

Override demonstrationDJANGO_CELERY_WORKER_CONCURRENCY=2 propagates through:

$ kubectl set env deploy/prowler-worker DJANGO_CELERY_WORKER_CONCURRENCY=2
$ kubectl exec <new-pod> -- printenv DJANGO_CELERY_WORKER_CONCURRENCY
2
$ kubectl exec <new-pod> -- manage.py shell -c "..."
settings.CELERY_WORKER_CONCURRENCY=2
celery_app.conf.worker_concurrency=2
$ kubectl top pod <new-pod>
NAME                              CPU(cores)   MEMORY(bytes)
prowler-worker-5cb6bb6bc4-2b9q7   521m         564Mi          # was 3.4 GiB

Process count goes to 1 parent + 2 children (with brief overlaps from --max-tasks-per-child 1 recycling, expected).

Checklist

  • Review if the code is being covered by tests.
    • The change is a single Django settings line wrapping env.int() (the same pattern already exercised for DJANGO_CELERY_DEADLOCK_ATTEMPTS with no dedicated test in api/src/backend/api/tests/test_celery_settings.py); validated in a real cluster as documented above.
  • Review if code is being documented following pyguide §3.8.
  • Review if backport is needed.
  • Review if is needed to change the Readme.md — N/A.
  • Ensure new entries are added to CHANGELOG.md — added under [1.29.0]🚀 Added.

API

  • All issue/task requirements work as expected on the API — see Evidence above.
  • Endpoint response output (if applicable) — N/A (worker-side change, no endpoints touched).
  • EXPLAIN ANALYZE output for new/modified queries or indexes (if applicable) — N/A.
  • Performance test results (if applicable) — included above (memory + process-count).
  • Any other relevant evidence of the implementation (if applicable) — included above.
  • Verify if API specs need to be regenerated — N/A.
  • Check if version updates are required — N/A (no dependency or spec changes).
  • Ensure new entries are added to CHANGELOG.md.

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@b-abderrahmane b-abderrahmane requested a review from a team as a code owner May 7, 2026 19:41
@github-actions github-actions Bot added component/api community Opened by the Community labels May 7, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

Conflict Markers Resolved

All conflict markers have been successfully resolved in this pull request.

cesararroba
cesararroba previously approved these changes May 13, 2026
Copy link
Copy Markdown
Contributor

@cesararroba cesararroba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

Comment thread api/CHANGELOG.md
Comment on lines +13 to +16
### 🐞 Fixed

- Celery worker concurrency now defaults to 4 (configurable via `DJANGO_CELERY_WORKER_CONCURRENCY`) so memory is predictable regardless of host CPU count [(#10968)](https://github.com/prowler-cloud/prowler/issues/10968)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please:

  • Move this to the next release as API 1.27.0 was released as part of Prowler v5.26.0.
  • Include it in ### 🚀 Added

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jfagoagas It should be good now.

Without an explicit `worker_concurrency` setting, Celery falls back to
`os.cpu_count()` of the host. On Kubernetes nodes with many CPUs (16+)
the worker container spawns one prefork child per CPU, each loading the
full SDK working set, and routinely OOMKills under the typical 4 GiB
limit even when idle.

Add `CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4)`
in `config/settings/celery.py`. The Celery app already loads Django
settings via `config_from_object("django.conf:settings", namespace="CELERY")`,
so the setting flows through to `app.conf.worker_concurrency` and the
prefork pool size with no entrypoint changes. Default of 4 matches the
per-process SDK memory footprint in typical deployments; operators size
up via `DJANGO_CELERY_WORKER_CONCURRENCY`.

Closes prowler-cloud#10968
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.97%. Comparing base (7d03bc5) to head (8a78b42).
⚠️ Report is 21 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #11075   +/-   ##
=======================================
  Coverage   93.97%   93.97%           
=======================================
  Files         237      237           
  Lines       34829    34873   +44     
=======================================
+ Hits        32729    32772   +43     
- Misses       2100     2101    +1     
Flag Coverage Δ
api 93.97% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
prowler ∅ <ø> (∅)
api 93.97% <100.00%> (+<0.01%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@AdriiiPRodri AdriiiPRodri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. The configurability is useful, but this currently changes default runtime behavior. Please address the inline comments below before merge.


CELERY_DEADLOCK_ATTEMPTS = env.int("DJANGO_CELERY_DEADLOCK_ATTEMPTS", default=5)

CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces a behavior change by forcing concurrency to 4 when the env var is not set.
Could we keep the current default behavior (Celery auto-detected concurrency) and only set CELERY_WORKER_CONCURRENCY when DJANGO_CELERY_WORKER_CONCURRENCY is explicitly provided?

Comment thread api/.env.example Outdated
# Decide whether to allow Django manage database table partitions
DJANGO_MANAGE_DB_PARTITIONS=[True|False]
DJANGO_CELERY_DEADLOCK_ATTEMPTS=5
DJANGO_CELERY_WORKER_CONCURRENCY=4
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: make this variable clearly optional (e.g., commented out with an explanation).
Keeping it active in .env.example makes 4 look like the default expected runtime behavior, which changes current behavior.

Comment thread api/CHANGELOG.md Outdated
### 🚀 Added

- `okta` provider support [(#11184)](https://github.com/prowler-cloud/prowler/pull/11184)
- `DJANGO_CELERY_WORKER_CONCURRENCY` setting (default `4`) bounds the Celery worker prefork pool so memory is predictable regardless of host CPU count [(#11075)](https://github.com/prowler-cloud/prowler/pull/11075)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changelog text says "default 4", but that implies a global behavior change.
Could we reword this to: optional setting that bounds concurrency when explicitly configured?


CELERY_DEADLOCK_ATTEMPTS = env.int("DJANGO_CELERY_DEADLOCK_ATTEMPTS", default=5)

CELERY_WORKER_CONCURRENCY = env.int("DJANGO_CELERY_WORKER_CONCURRENCY", default=4)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add tests for the new setting behavior?

  1. env var missing/empty -> concurrency is not forced
  2. valid numeric value -> concurrency is applied
  3. invalid value -> explicit error path

@AdriiiPRodri AdriiiPRodri dismissed their stale review May 22, 2026 10:25

Dismissing duplicate review entry to keep a single clean review state.

Copy link
Copy Markdown
Contributor

@AdriiiPRodri AdriiiPRodri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the inline comments before merge.

Restore Celery's default (os.cpu_count()) when DJANGO_CELERY_WORKER_CONCURRENCY
is unset; only apply the override when the env var is explicitly provided.
@b-abderrahmane
Copy link
Copy Markdown
Author

Please address the inline comments before merge.

Good catch @AdriiiPRodri
Restored default behavior and added tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community Opened by the Community component/api

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Celery workers OOM on hosts with many CPUs — --concurrency is unbounded by default

4 participants