feat: support multi-pool worker deployments #6

Draft
nishantmunjal7 wants to merge 1 commit into atlanhq:main from nishantmunjal7:feat/multi-pool-deployment

@nishantmunjal7


Problem

Today, 1 TemporalWorkerDeployment (TWD) = 1 Temporal worker deployment = 1 task queue = 1 pod pool. To run different activities on different task queues / node types (e.g. a metastore-write activity on its own queue + on-demand nodes, with everything else on spot), you must create separate TWDs → separate Temporal worker deployments.

But PINNED workflows can only dispatch activities to the deployment they're pinned to. If an activity is routed to a different deployment's task queue, it gets stuck SCHEDULED forever — there is no worker of the pinned build polling that queue.

We need one Temporal worker deployment that spans multiple pools (queue + node-placement combos), versioned together, so every version's pods exist in all pools and a pinned workflow always finds a worker of its build on every queue.

Design

Add an optional spec.pools []PoolSpec. A "pool" is a (task queue + pod placement/sizing) combination.

spec:
  replicas: 3
  template: { ... }            # base pod template (shared code/image)
  workerOptions: { ... }
  pools:                       # OPTIONAL — omit for today's behavior
    - name: spot               # required, unique (DNS-label)
      taskQueue: default-tq    # queue this pool polls (optional)
      nodeSelector: { pool: spot }
    - name: metastore
      taskQueue: metastore-tq
      replicas: 1              # per-pool override of spec.replicas
      nodeSelector: { pool: on-demand }
      tolerations: [ ... ]
      affinity: { ... }
      resources:               # applied to every container in this pool
        requests: { cpu: "2" }

Each PoolSpec carries: name (required, unique — listType=map keyed on name), taskQueue, and per-pool pod placement/sizing: replicas, nodeSelector, affinity, tolerations, resources.
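As a rough illustration of the shape described above, here is a dependency-free sketch of a pool entry and the naming rules it implies. The struct is a simplified stand-in, not the PR's actual type: affinity, tolerations and resources are omitted, and `validatePools` is a hypothetical helper (in the real CRD, uniqueness would presumably be enforced by the listType=map marker rather than by hand-written code).

```go
package main

import (
	"fmt"
	"regexp"
)

// PoolSpec is a simplified, hypothetical stand-in for the PR's type.
// Affinity, tolerations and resources are left out to avoid k8s imports.
type PoolSpec struct {
	Name         string            // required, unique, DNS label
	TaskQueue    string            // optional queue this pool polls
	Replicas     *int32            // optional override of spec.replicas
	NodeSelector map[string]string // optional node placement
}

// dnsLabel matches an RFC 1123 DNS label: lowercase alphanumerics and '-',
// starting and ending with an alphanumeric.
var dnsLabel = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validatePools (hypothetical helper) checks that every pool name is a
// DNS label of at most 63 characters and that no name repeats.
func validatePools(pools []PoolSpec) error {
	seen := make(map[string]bool, len(pools))
	for i, p := range pools {
		if len(p.Name) == 0 || len(p.Name) > 63 || !dnsLabel.MatchString(p.Name) {
			return fmt.Errorf("pools[%d]: name %q is not a valid DNS label", i, p.Name)
		}
		if seen[p.Name] {
			return fmt.Errorf("pools[%d]: duplicate name %q", i, p.Name)
		}
		seen[p.Name] = true
	}
	return nil
}

func main() {
	one := int32(1)
	pools := []PoolSpec{
		{Name: "spot", TaskQueue: "default-tq", NodeSelector: map[string]string{"pool": "spot"}},
		{Name: "metastore", TaskQueue: "metastore-tq", Replicas: &one,
			NodeSelector: map[string]string{"pool": "on-demand"}},
	}
	fmt.Println("validation error:", validatePools(pools))
}
```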

Key invariants

  • One Temporal worker deployment name per TWD (ComputeWorkerDeploymentName, unchanged) — all pools share it.
  • Build ID is per-version, computed from the base template (image/code) and is the same across pools — pools differ only in queue + node placement, not code.
  • For each version we now create one k8s Deployment per pool (instead of one per version). Each pool Deployment gets TEMPORAL_DEPLOYMENT_NAME = the shared deployment name, TEMPORAL_WORKER_BUILD_ID = the version's build ID, TEMPORAL_TASK_QUEUE = the pool's task queue, and the pool's placement/resources. A temporal.io/pool label makes Deployments findable per (version, pool).
  • Temporal version-management calls (SetCurrentVersion/SetRampingVersion/drain) are unchanged — they operate on the single deployment name. We only fan pods across pools and aggregate readiness/drain across pools.
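The invariants above can be sketched as two small functions that derive a pool Deployment's env vars and labels. This is only an illustration: `EnvVar`, `Pool`, `poolEnv` and `poolLabels` are hypothetical names, and the handling of a named pool with no taskQueue (no TEMPORAL_TASK_QUEUE injected) is an assumption rather than something the PR states.

```go
package main

import "fmt"

// EnvVar and Pool are simplified, hypothetical stand-ins for the k8s and
// API types used by the controller.
type EnvVar struct{ Name, Value string }

type Pool struct {
	Name      string // "" denotes the implicit pool
	TaskQueue string
}

const poolLabel = "temporal.io/pool"

// poolEnv returns the Temporal env vars for one pool's Deployment.
// The deployment name and build ID are shared by all pools of a version;
// only the task queue differs. Assumption: an empty TaskQueue means the
// variable is not injected at all.
func poolEnv(deploymentName, buildID string, p Pool) []EnvVar {
	env := []EnvVar{
		{"TEMPORAL_DEPLOYMENT_NAME", deploymentName},
		{"TEMPORAL_WORKER_BUILD_ID", buildID},
	}
	if p.TaskQueue != "" {
		env = append(env, EnvVar{"TEMPORAL_TASK_QUEUE", p.TaskQueue})
	}
	return env
}

// poolLabels copies the base labels and adds the pool label for named
// pools only, so the implicit pool keeps its pre-pools label set.
func poolLabels(base map[string]string, p Pool) map[string]string {
	out := make(map[string]string, len(base)+1)
	for k, v := range base {
		out[k] = v
	}
	if p.Name != "" {
		out[poolLabel] = p.Name
	}
	return out
}

func main() {
	pools := []Pool{
		{Name: "spot", TaskQueue: "default-tq"},
		{Name: "metastore", TaskQueue: "metastore-tq"},
		{}, // implicit pool
	}
	for _, p := range pools {
		// "my-worker" and "build-abc" are placeholder values.
		fmt.Println(poolLabels(nil, p), poolEnv("my-worker", "build-abc", p))
	}
}
```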

Reconcile / readiness / status

  • Create plan iterates versions × pools: one create per missing pool of the target version.
  • A version is healthy only when every one of its pools' Deployments is healthy (HealthySince = the most recent of the pools' healthy-transition times, i.e. the moment the last pool became healthy).
  • Delete/scale/connection-drift updates fan across all pool Deployments of a version.
  • mapToStatus groups by version, aggregating pool health into the single per-version status entry.
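A minimal sketch of the two aggregation rules in this list, under the assumption that per-pool state can be reduced to a healthy flag and a timestamp. `PoolStatus`, `missingPools` and `versionHealth` are hypothetical names and do not mirror the PR's mapToStatus code.

```go
package main

import (
	"fmt"
	"time"
)

// PoolStatus is a hypothetical per-pool view of one version's Deployment.
type PoolStatus struct {
	Healthy      bool
	HealthySince time.Time
}

// unixTime is a small helper for building deterministic timestamps.
func unixTime(sec int64) time.Time { return time.Unix(sec, 0).UTC() }

// missingPools returns, in spec order, the desired pools that have no
// Deployment yet for the target version; the create plan would emit one
// create per returned name.
func missingPools(desired []string, existing map[string]PoolStatus) []string {
	var missing []string
	for _, name := range desired {
		if _, ok := existing[name]; !ok {
			missing = append(missing, name)
		}
	}
	return missing
}

// versionHealth reports a version as healthy only if every desired pool
// exists and is healthy. The returned time is the latest per-pool
// HealthySince, i.e. when the last pool became healthy.
func versionHealth(desired []string, existing map[string]PoolStatus) (bool, time.Time) {
	if len(desired) == 0 {
		return false, time.Time{}
	}
	var latest time.Time
	for _, name := range desired {
		st, ok := existing[name]
		if !ok || !st.Healthy {
			return false, time.Time{}
		}
		if st.HealthySince.After(latest) {
			latest = st.HealthySince
		}
	}
	return true, latest
}

func main() {
	desired := []string{"spot", "metastore"}
	existing := map[string]PoolStatus{
		"spot": {Healthy: true, HealthySince: unixTime(100)},
	}
	fmt.Println("to create:", missingPools(desired, existing))
	healthy, since := versionHealth(desired, existing)
	fmt.Println("healthy:", healthy, "since:", since)
}
```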

Backward compatibility (guaranteed)

When spec.pools is empty/absent, the code routes through a single implicit pool (Name == "") that:

  • carries no temporal.io/pool label (selector identical to pre-pools),
  • injects no TEMPORAL_TASK_QUEUE env var,
  • uses the base template's placement/resources and spec.replicas,
  • produces the same Deployment name as ComputeVersionedDeploymentName (pre-pools).

DeploymentState.PoolDeployments(buildID) also falls back to the flat Deployments[buildID] map when the per-pool map is absent, so existing call paths and tests behave identically. There is a unit test asserting the no-pools path is byte-compatible with the legacy constructor.
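The fallback described here might look roughly like the following. The function and method names follow the ones mentioned in this description (the test name TestEffectivePools, DeploymentState.PoolDeployments), but the signatures, the `ByPool` field, and the use of plain strings in place of Deployment objects are assumptions made for the sake of a self-contained example.

```go
package main

import "fmt"

// PoolSpec is a trimmed, hypothetical version of the API type.
type PoolSpec struct {
	Name      string
	TaskQueue string
}

// effectivePools returns the configured pools, or a single implicit pool
// (Name == "") when none are configured, so callers can always iterate.
func effectivePools(pools []PoolSpec) []PoolSpec {
	if len(pools) == 0 {
		return []PoolSpec{{}}
	}
	return pools
}

// DeploymentState is simplified: Deployment objects are represented by
// their names. ByPool is an assumed field name for the per-pool map.
type DeploymentState struct {
	Deployments map[string]string            // buildID -> deployment (legacy, flat)
	ByPool      map[string]map[string]string // buildID -> pool name -> deployment
}

// PoolDeployments returns the per-pool Deployments of a build. When the
// per-pool map has no entry it falls back to the flat map and exposes
// that Deployment under the implicit pool name "".
func (s DeploymentState) PoolDeployments(buildID string) map[string]string {
	if perPool, ok := s.ByPool[buildID]; ok && len(perPool) > 0 {
		return perPool
	}
	if d, ok := s.Deployments[buildID]; ok {
		return map[string]string{"": d}
	}
	return nil
}

func main() {
	fmt.Printf("%+v\n", effectivePools(nil))

	// Placeholder build ID and deployment name.
	legacy := DeploymentState{Deployments: map[string]string{"build-abc": "my-worker-build-abc"}}
	fmt.Println(legacy.PoolDeployments("build-abc"))
}
```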

What's tested

  • go build ./..., go vet ./... — clean.
  • go test ./... — all root-module unit tests pass (planner, k8s, controller/state_mapper, temporal, api).
  • New unit tests:
    • (a) multi-pool produces N Deployments per version, all sharing one deployment name + build ID, each with its own task queue / nodeSelector / replicas / resources (TestNewPoolDeployments_MultiPool).
    • (b) version readiness requires all pools ready; HealthySince is the latest pool's time (TestMapToStatus_VersionReadinessRequiresAllPools).
    • (c) no-pools path is unchanged — single Deployment, no pool label, no task-queue env, name/replicas identical to legacy (TestNewPoolDeployments_NoPoolsBackwardCompatible, TestEffectivePools).
  • Regenerated CRD manifest (helm/.../temporal.io_temporalworkerdeployments.yaml) and zz_generated.deepcopy.go via controller-gen v0.19.0.

Note: the internal/tests integration suite (separate module, envtest + a live Temporal dev server) was not run locally — that environment isn't available here. It compiles/vets clean against the new API. Please run make test-integration in CI/locally.

Needs review / open questions

  1. Drain & delete aggregation across pools. Temporal drains the version (one deployment name), but k8s now has N Deployments per version. This PR deletes/scales each pool Deployment independently once the version's drain criteria are met, and treats a version as gone only when all pool Deployments are gone. The per-pool delete is gated on replicas == 0 per pool, but I did not add a cross-pool "all pools drained AND all scaled to zero before deleting any" barrier. If a pool can keep versioned pollers alive after others are gone, we may want to hold deletion of the whole version until every pool is drained. Please sanity-check whether per-pool independent deletion is safe, or whether we need an explicit all-pools barrier. (Marked inline as the main area to review.)

  2. Worker side must honor TEMPORAL_TASK_QUEUE. The controller now injects TEMPORAL_TASK_QUEUE per pool, but the worker process must actually read it and poll that queue (and register the versioned deployment). The demo worker / SDK wiring is out of scope here. Confirm the worker bootstrap reads this env var.

  3. Pool task-queue → test/gate workflows. Gate (test) workflows are started per task queue discovered from Temporal. With multiple pools each polling a distinct queue, the existing gate-workflow logic should naturally fan out, but I did not add explicit multi-pool gate tests. Worth verifying gate rollouts behave with >1 queue.

  4. Removing/renaming a pool. A pool removed from spec.pools leaves orphan Deployments labeled with the old pool name. They will scale via the spec.replicas fallback but are not explicitly garbage-collected by name. Decide whether removed-pool Deployments should be actively deleted.

  5. Controller identity / rollout interactions. The version-management calls are unchanged, so the existing last-modifier/identity guarding still applies at the deployment level. No new identity surface was added, but confirm multi-pool doesn't change assumptions in the ownership runbook.

  6. unsafeCustomBuildID + pools. Pod-template-drift detection (stable build ID path) now reapplies pool placement on update. Verify the drift hash (ComputePodTemplateSpecHash, computed from the base template) is still the right signal when pools differ only in placement — placement overrides are applied after hashing, so a pure pool-placement change does not bump the build ID (intended), but also won't trigger a rolling update on its own. Confirm that's the desired behavior.
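To make questions 1 and 4 more concrete, here is one possible shape for an explicit all-pools deletion barrier and for detecting Deployments left behind by a removed pool. Neither function exists in this PR; `PoolDeployment`, `canDeleteVersion` and `orphanedPools` are hypothetical, and whether such a barrier is actually needed is exactly the open question.

```go
package main

import "fmt"

// PoolDeployment is a hypothetical summary of one pool's k8s Deployment
// for a given version.
type PoolDeployment struct {
	Pool     string // value of the temporal.io/pool label ("" if absent)
	Replicas int32
}

// canDeleteVersion sketches an all-pools barrier: nothing is deleted
// until Temporal reports the version drained AND every pool Deployment
// has been scaled to zero.
func canDeleteVersion(versionDrained bool, pools []PoolDeployment) bool {
	if !versionDrained {
		return false
	}
	for _, p := range pools {
		if p.Replicas != 0 {
			return false
		}
	}
	return true
}

// orphanedPools returns the pool names found on existing Deployments that
// are no longer desired. Callers are expected to pass the effective pool
// names, including "" for the implicit pool.
func orphanedPools(desired []string, existing []PoolDeployment) []string {
	want := make(map[string]bool, len(desired))
	for _, name := range desired {
		want[name] = true
	}
	var orphans []string
	for _, d := range existing {
		if !want[d.Pool] {
			orphans = append(orphans, d.Pool)
		}
	}
	return orphans
}

func main() {
	existing := []PoolDeployment{
		{Pool: "spot", Replicas: 0},
		{Pool: "metastore", Replicas: 1},
	}
	fmt.Println("delete now:", canDeleteVersion(true, existing))
	fmt.Println("orphans:", orphanedPools([]string{"spot"}, existing))
}
```

With independent per-pool deletion, the `spot` Deployment in this example would already be removable; with the barrier it waits until `metastore` also reaches zero replicas.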

🤖 Generated with Claude Code

Add optional spec.pools so one Temporal worker deployment can span
multiple (task queue + pod placement) pools, versioned together. Every
version's pods exist in all pools sharing one deployment name + build ID,
so a PINNED workflow always finds a worker of its build on every queue.

When spec.pools is empty the controller behaves exactly as before (a
single implicit pool using the base template).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@snykgituser

snykgituser commented Jun 19, 2026


Snyk checks have passed. No issues have been found so far.

| Scan Engine | Critical | High | Medium | Low | Total (0) |
| --- | --- | --- | --- | --- | --- |
| Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| Licenses | 0 | 0 | 0 | 0 | 0 issues |
