Bug: HorizontalPodAutoscaler continuously rescales due to controller interference
Summary
When using WorkerResourceTemplate to configure HPA for a TemporalWorkerDeployment, the controller continuously resets the underlying Deployment's replica count to 1, directly conflicting with HPA scaling decisions. This results in a cascading rescale loop where the HPA scales up to minReplicas every ~10-15 seconds, only to be reset by the controller milliseconds later.
Severity
HIGH - HPA is completely non-functional for TemporalWorkerDeployment resources.
Environment
- temporal-worker-controller chart version:
0.24.1
- temporal-worker-controller container image version:
v1.5.2
- Kubernetes version: 1.33
- Container platform: EKS with Karpenter
- Reproducer: Minimal example provided below
Problem Description
Observed Behavior
- Deployment is locked at 1 replica despite HPA configuration specifying
minReplicas: 3
- HPA generates continuous rescale events — 11,087+ "SuccessfulRescale" events in 24 hours with message: "Current number of replicas below Spec.MinReplicas"
- Controller logs show active interference:
- Repeated
"msg":"scaling deployment" entries every 10-15 seconds
- Each entry explicitly sets
"replicas":1
- Followed by deployment error on ObjectReference handling
- HPA status confirms the conflict:
status:
currentReplicas: 1 # What's running
desiredReplicas: 3 # What HPA wants
Expected Behavior
When a WorkerResourceTemplate with HPA is created for a TemporalWorkerDeployment, the controller should:
- Recognize that HPA is managing the Deployment
- NOT modify the
spec.replicas field (allow HPA full control)
- Allow HPA to scale between
minReplicas and maxReplicas based on metrics
- Maintain steady state at the desired replica count
Configuration & Reproduction
Minimal TemporalWorkerDeployment
apiVersion: temporal.io/v1alpha1
kind: TemporalWorkerDeployment
metadata:
name: temporal-worker-example
namespace: default
spec:
# NOTE: No replicas field — HPA should manage scaling
rollout:
strategy: AllAtOnce
sunset:
deleteDelay: 2h
scaledownDelay: 15m
template:
metadata:
labels:
app: temporal-worker-example
spec:
containers:
- name: temporal-worker-example
image: myrepo/myapp:v1.0.0
resources:
limits:
cpu: 1000m
memory: 1Gi
requests:
cpu: 500m
memory: 512Mi
serviceAccountName: default
workerOptions:
connectionRef:
name: temporal-connection
temporalNamespace: default.local
WorkerResourceTemplate with HPA (Desired Scaling Config)
apiVersion: temporal.io/v1alpha1
kind: WorkerResourceTemplate
metadata:
name: temporal-worker-example-hpa
namespace: default
spec:
temporalWorkerDeploymentRef:
name: temporal-worker-example
template:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
scaleTargetRef: {} # Controller should populate this
minReplicas: 3
maxReplicas: 10
behavior:
scaleUp:
stabilizationWindowSeconds: 30
policies:
- type: Pods
value: 2
periodSeconds: 60
- type: Percent
value: 50
periodSeconds: 60
selectPolicy: Max
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 75
Steps to Reproduce
- Apply the above TemporalWorkerDeployment (without
spec.replicas)
- Apply the WorkerResourceTemplate with HPA configuration
- Observe the underlying Deployment created by the controller
- Expected: Deployment scales to 3+ replicas based on CPU/memory
- Actual: Deployment stays at 1 replica, HPA generates continuous SuccessfulRescale events
Validation
Check the generated HPA status:
kubectl get hpa -n default -l temporal.io/deployment-name=temporal-worker-example -o yaml
You'll see:
status:
currentReplicas: 1
desiredReplicas: 3 # or higher based on metrics
conditions:
- reason: SucceededRescale
message: the HPA controller was able to update the target scale to 3
But the actual Deployment will show only 1 replica running.
Controller Log Evidence
From temporal-worker-controller logs, the interference is clear:
Pattern: Controller resetting replicas
{
"level": "info",
"msg": "scaling deployment",
"controller": "temporalworkerdeployment",
"TemporalWorkerDeployment": {
"name": "temporal-worker-zili",
"namespace": "byx-originacao"
},
"deployment": {
"apiVersion": "apps/v1",
"kind": "Deployment"
},
"deploymentError": "got runtime.Object without object metadata: &ObjectReference{...}",
"replicas": 1
}
This repeats every 10-15 seconds, indicating the controller is:
- Regularly reconciling the TemporalWorkerDeployment
- Explicitly setting
replicas: 1 on the underlying Deployment
- Failing due to an ObjectReference deserialization error (secondary issue)
- But the
replicas: 1 field is still propagated despite the error
Secondary errors in logs
{
"level": "info",
"msg": "unknown field \"status.conditions\"",
"controller": "temporalworkerdeployment"
}
This suggests the controller has issues with status field management, which may be related to the deployment error above.
Root Cause Analysis
The controller's reconciliation logic appears to:
- Always set
spec.replicas on the underlying Deployment (defaulting to 1 or from TemporalWorkerDeployment spec)
- Not check if a WorkerResourceTemplate with HPA is managing this Deployment
- Override HPA decisions on every reconciliation cycle
This creates an irreconcilable conflict:
- HPA: "Target 3+ replicas based on metrics" → scales Deployment
- Controller: "TemporalWorkerDeployment says 1 replica" → resets Deployment
- HPA: Detects replicas < minReplicas → scales again
- Loop: Repeats indefinitely
Questions for the Maintainers
- Should the controller implement HPA-awareness (check for WorkerResourceTemplate presence and skip replica management)?
- What is the intended workflow for autoscaling TemporalWorkerDeployments?
- Can the ObjectReference deserialization error be addressed (secondary issue)?
Possible Solutions
Option 1: Controller Detects HPA and Defers Replica Management (Recommended)
if workerResourceTemplateExists(ctx, client, deployment) {
// WorkerResourceTemplate is managing this deployment
// Don't touch spec.replicas - let HPA manage it
skip(deployment.Spec.Replicas)
} else {
// Set replicas normally from TemporalWorkerDeployment spec
deployment.Spec.Replicas = ptr.To(int32(twd.Spec.Replicas))
}
Option 2: Explicit Validation
Reject TemporalWorkerDeployments that have both:
- No explicit
spec.replicas AND
- An associated WorkerResourceTemplate with HPA
With a clear error: "Cannot use HPA-managed WorkerResourceTemplate without explicit replicas configuration"
Option 3: User-facing Workaround Documentation
Document that HPA + TemporalWorkerDeployment currently requires:
- Explicit
spec.replicas set to minReplicas value
- HPA configured to respect this as the baseline
- Understanding that controller will manage replicas, not HPA (defeats the purpose)
Workarounds Attempted (All Failed)
- ❌ Omitting
spec.replicas: Controller defaults to 1, HPA can't override
- ❌ Setting
spec.replicas: 1: Controller locks it, HPA can't scale
- ❌ Setting
spec.replicas: 3: Controller overrides HPA, no dynamic scaling
- ❌ Setting
spec.replicas: null: Controller treats as 1
Conclusion: There is currently no working workaround. HPA + TemporalWorkerDeployment are fundamentally broken together.
Impact
- Affected Users: Anyone attempting to use WorkerResourceTemplate with HPA for autoscaling
- Scope: All TemporalWorkerDeployment resources configured with HPA
- Duration: Deployment becomes unscalable; stuck at default replica count
- Resource Waste: Continuous HPA reconciliation cycles create unnecessary API load and logging overhead
Additional Context
The HPA itself is functioning correctly:
- It successfully creates the HorizontalPodAutoscaler resource
- It correctly identifies when replicas are below minReplicas
- It successfully patches the Deployment to scale up
- But: Within milliseconds, the controller re-patches it back to 1
Bug: HorizontalPodAutoscaler continuously rescales due to controller interference
Summary
When using
WorkerResourceTemplateto configure HPA for aTemporalWorkerDeployment, the controller continuously resets the underlying Deployment's replica count to 1, directly conflicting with HPA scaling decisions. This results in a cascading rescale loop where the HPA scales up tominReplicasevery ~10-15 seconds, only to be reset by the controller milliseconds later.Severity
HIGH - HPA is completely non-functional for TemporalWorkerDeployment resources.
Environment
0.24.1v1.5.2Problem Description
Observed Behavior
minReplicas: 3"msg":"scaling deployment"entries every 10-15 seconds"replicas":1Expected Behavior
When a
WorkerResourceTemplatewith HPA is created for aTemporalWorkerDeployment, the controller should:spec.replicasfield (allow HPA full control)minReplicasandmaxReplicasbased on metricsConfiguration & Reproduction
Minimal TemporalWorkerDeployment
WorkerResourceTemplate with HPA (Desired Scaling Config)
Steps to Reproduce
spec.replicas)Validation
Check the generated HPA status:
You'll see:
But the actual Deployment will show only 1 replica running.
Controller Log Evidence
From
temporal-worker-controllerlogs, the interference is clear:Pattern: Controller resetting replicas
{ "level": "info", "msg": "scaling deployment", "controller": "temporalworkerdeployment", "TemporalWorkerDeployment": { "name": "temporal-worker-zili", "namespace": "byx-originacao" }, "deployment": { "apiVersion": "apps/v1", "kind": "Deployment" }, "deploymentError": "got runtime.Object without object metadata: &ObjectReference{...}", "replicas": 1 }This repeats every 10-15 seconds, indicating the controller is:
replicas: 1on the underlying Deploymentreplicas: 1field is still propagated despite the errorSecondary errors in logs
{ "level": "info", "msg": "unknown field \"status.conditions\"", "controller": "temporalworkerdeployment" }This suggests the controller has issues with status field management, which may be related to the deployment error above.
Root Cause Analysis
The controller's reconciliation logic appears to:
spec.replicason the underlying Deployment (defaulting to 1 or from TemporalWorkerDeployment spec)This creates an irreconcilable conflict:
Questions for the Maintainers
Possible Solutions
Option 1: Controller Detects HPA and Defers Replica Management (Recommended)
Option 2: Explicit Validation
Reject TemporalWorkerDeployments that have both:
spec.replicasANDWith a clear error: "Cannot use HPA-managed WorkerResourceTemplate without explicit replicas configuration"
Option 3: User-facing Workaround Documentation
Document that HPA + TemporalWorkerDeployment currently requires:
spec.replicasset tominReplicasvalueWorkarounds Attempted (All Failed)
spec.replicas: Controller defaults to 1, HPA can't overridespec.replicas: 1: Controller locks it, HPA can't scalespec.replicas: 3: Controller overrides HPA, no dynamic scalingspec.replicas: null: Controller treats as 1Conclusion: There is currently no working workaround. HPA + TemporalWorkerDeployment are fundamentally broken together.
Impact
Additional Context
The HPA itself is functioning correctly: