feat: support multi-pool worker deployments#6
Draft
nishantmunjal7 wants to merge 1 commit into
Draft
Conversation
Add optional spec.pools so one Temporal worker deployment can span multiple (task queue + pod placement) pools, versioned together. Every version's pods exist in all pools sharing one deployment name + build ID, so a PINNED workflow always finds a worker of its build on every queue. When spec.pools is empty the controller behaves exactly as before (a single implicit pool using the base template). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Today, 1
TemporalWorkerDeployment(TWD) = 1 Temporal worker deployment = 1 task queue = 1 pod pool. To run different activities on different task queues / node types (e.g. a metastore-write activity on its own queue + on-demand nodes, with everything else on spot), you must create separate TWDs → separate Temporal worker deployments.But PINNED workflows can only dispatch activities to the deployment they're pinned to. If an activity is routed to a different deployment's task queue, it gets stuck SCHEDULED forever — there is no worker of the pinned build polling that queue.
We need one Temporal worker deployment that spans multiple pools (queue + node-placement combos), versioned together, so every version's pods exist in all pools and a pinned workflow always finds a worker of its build on every queue.
Design
Add an optional
spec.pools []PoolSpec. A "pool" is a (task queue + pod placement/sizing) combination.Each
PoolSpeccarries:name(required, unique —listType=mapkeyed on name),taskQueue, and per-pool pod placement/sizing:replicas,nodeSelector,affinity,tolerations,resources.Key invariants
ComputeWorkerDeploymentName, unchanged) — all pools share it.TEMPORAL_DEPLOYMENT_NAME= the shared name,TEMPORAL_WORKER_BUILD_ID= the version build, the pool's task queue (TEMPORAL_TASK_QUEUE), and the pool's placement/resources. Atemporal.io/poollabel makes Deployments findable per (version, pool).SetCurrentVersion/SetRampingVersion/drain) are unchanged — they operate on the single deployment name. We only fan pods across pools and aggregate readiness/drain across pools.Reconcile / readiness / status
HealthySince= the latest pool's transition time).mapToStatusgroups by version, aggregating pool health into the single per-version status entry.Backward compatibility (guaranteed)
When
spec.poolsis empty/absent, the code routes through a single implicit pool (Name == "") that:temporal.io/poollabel (selector identical to pre-pools),TEMPORAL_TASK_QUEUEenv var,spec.replicas,ComputeVersionedDeploymentName(pre-pools).DeploymentState.PoolDeployments(buildID)also falls back to the flatDeployments[buildID]map when the per-pool map is absent, so existing call paths and tests behave identically. There is a unit test asserting the no-pools path is byte-compatible with the legacy constructor.What's tested
go build ./...,go vet ./...— clean.go test ./...— all root-module unit tests pass (planner, k8s, controller/state_mapper, temporal, api).TestNewPoolDeployments_MultiPool).HealthySinceis the latest pool's time (TestMapToStatus_VersionReadinessRequiresAllPools).TestNewPoolDeployments_NoPoolsBackwardCompatible,TestEffectivePools).helm/.../temporal.io_temporalworkerdeployments.yaml) andzz_generated.deepcopy.govia controller-gen v0.19.0.Needs review / open questions
Drain & delete aggregation across pools. Temporal drains the version (one deployment name), but k8s now has N Deployments per version. This PR deletes/scales each pool Deployment independently once the version's drain criteria are met, and treats a version as gone only when all pool Deployments are gone. The per-pool delete is gated on
replicas == 0per pool, but I did not add a cross-pool "all pools drained AND all scaled to zero before deleting any" barrier. If a pool can keep versioned pollers alive after others are gone, we may want to hold deletion of the whole version until every pool is drained. Please sanity-check whether per-pool independent deletion is safe, or whether we need an explicit all-pools barrier. (Marked inline as the main area to review.)Worker side must honor
TEMPORAL_TASK_QUEUE. The controller now injectsTEMPORAL_TASK_QUEUEper pool, but the worker process must actually read it and poll that queue (and register the versioned deployment). The demo worker / SDK wiring is out of scope here. Confirm the worker bootstrap reads this env var.Pool task-queue → test/gate workflows. Gate (test) workflows are started per task queue discovered from Temporal. With multiple pools each polling a distinct queue, the existing gate-workflow logic should naturally fan out, but I did not add explicit multi-pool gate tests. Worth verifying gate rollouts behave with >1 queue.
Removing/renaming a pool. A pool removed from
spec.poolsleaves orphan Deployments labeled with the old pool name. They will scale via thespec.replicasfallback but are not explicitly garbage-collected by name. Decide whether removed-pool Deployments should be actively deleted.Controller identity / rollout interactions. The version-management calls are unchanged, so the existing last-modifier/identity guarding still applies at the deployment level. No new identity surface was added, but confirm multi-pool doesn't change assumptions in the ownership runbook.
unsafeCustomBuildID+ pools. Pod-template-drift detection (stable build ID path) now reapplies pool placement on update. Verify the drift hash (ComputePodTemplateSpecHash, computed from the base template) is still the right signal when pools differ only in placement — placement overrides are applied after hashing, so a pure pool-placement change does not bump the build ID (intended), but also won't trigger a rolling update on its own. Confirm that's the desired behavior.🤖 Generated with Claude Code