[Bug] HorizontalPodAutoscaler continuously rescales due to controller interference

# Bug: HorizontalPodAutoscaler continuously rescales due to controller interference

## Summary
When using `WorkerResourceTemplate` to configure HPA for a `TemporalWorkerDeployment`, the controller continuously resets the underlying Deployment's replica count to 1, directly conflicting with HPA scaling decisions. This results in a cascading rescale loop where the HPA scales up to `minReplicas` every ~10-15 seconds, only to be reset by the controller milliseconds later.

## Severity
**HIGH** - HPA is completely non-functional for TemporalWorkerDeployment resources.

---

## Environment
- **temporal-worker-controller chart version**: `0.24.1`
- **temporal-worker-controller container image version**: `v1.5.2`
- **Kubernetes version**: 1.33
- **Container platform**: EKS with Karpenter
- **Reproducer**: Minimal example provided below

---

## Problem Description

### Observed Behavior
1. **Deployment is locked at 1 replica** despite HPA configuration specifying `minReplicas: 3`
2. **HPA generates continuous rescale events** — 11,087+ "SuccessfulRescale" events in 24 hours with message: "Current number of replicas below Spec.MinReplicas"
3. **Controller logs show active interference**:
   - Repeated `"msg":"scaling deployment"` entries every 10-15 seconds
   - Each entry explicitly sets `"replicas":1`
   - Followed by deployment error on ObjectReference handling
4. **HPA status confirms the conflict**:
   ```yaml
   status:
     currentReplicas: 1       # What's running
     desiredReplicas: 3       # What HPA wants
   ```

### Expected Behavior
When a `WorkerResourceTemplate` with HPA is created for a `TemporalWorkerDeployment`, the controller should:
1. Recognize that HPA is managing the Deployment
2. **NOT modify** the `spec.replicas` field (allow HPA full control)
3. Allow HPA to scale between `minReplicas` and `maxReplicas` based on metrics
4. Maintain steady state at the desired replica count

---

## Configuration & Reproduction

### Minimal TemporalWorkerDeployment
```yaml
apiVersion: temporal.io/v1alpha1
kind: TemporalWorkerDeployment
metadata:
  name: temporal-worker-example
  namespace: default
spec:
  # NOTE: No replicas field — HPA should manage scaling
  rollout:
    strategy: AllAtOnce
  sunset:
    deleteDelay: 2h
    scaledownDelay: 15m
  template:
    metadata:
      labels:
        app: temporal-worker-example
    spec:
      containers:
        - name: temporal-worker-example
          image: myrepo/myapp:v1.0.0
          resources:
            limits:
              cpu: 1000m
              memory: 1Gi
            requests:
              cpu: 500m
              memory: 512Mi
      serviceAccountName: default
  workerOptions:
    connectionRef:
      name: temporal-connection
    temporalNamespace: default.local
```

### WorkerResourceTemplate with HPA (Desired Scaling Config)
```yaml
apiVersion: temporal.io/v1alpha1
kind: WorkerResourceTemplate
metadata:
  name: temporal-worker-example-hpa
  namespace: default
spec:
  temporalWorkerDeploymentRef:
    name: temporal-worker-example
  template:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    spec:
      scaleTargetRef: {}  # Controller should populate this
      minReplicas: 3
      maxReplicas: 10
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30
          policies:
            - type: Pods
              value: 2
              periodSeconds: 60
            - type: Percent
              value: 50
              periodSeconds: 60
          selectPolicy: Max
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 75
```

### Steps to Reproduce
1. Apply the above TemporalWorkerDeployment (without `spec.replicas`)
2. Apply the WorkerResourceTemplate with HPA configuration
3. Observe the underlying Deployment created by the controller
4. **Expected**: Deployment scales to 3+ replicas based on CPU/memory
5. **Actual**: Deployment stays at 1 replica, HPA generates continuous SuccessfulRescale events

### Validation
Check the generated HPA status:
```bash
kubectl get hpa -n default -l temporal.io/deployment-name=temporal-worker-example -o yaml
```

You'll see:
```yaml
status:
  currentReplicas: 1
  desiredReplicas: 3  # or higher based on metrics
  conditions:
    - reason: SucceededRescale
      message: the HPA controller was able to update the target scale to 3
```

But the actual Deployment will show only 1 replica running.

---

## Controller Log Evidence

From `temporal-worker-controller` logs, the interference is clear:

### Pattern: Controller resetting replicas
```json
{
  "level": "info",
  "msg": "scaling deployment",
  "controller": "temporalworkerdeployment",
  "TemporalWorkerDeployment": {
    "name": "temporal-worker-zili",
    "namespace": "byx-originacao"
  },
  "deployment": {
    "apiVersion": "apps/v1",
    "kind": "Deployment"
  },
  "deploymentError": "got runtime.Object without object metadata: &ObjectReference{...}",
  "replicas": 1
}
```

This repeats every 10-15 seconds, indicating the controller is:
1. Regularly reconciling the TemporalWorkerDeployment
2. Explicitly setting `replicas: 1` on the underlying Deployment
3. Failing due to an ObjectReference deserialization error (secondary issue)
4. But the `replicas: 1` field is still propagated despite the error

### Secondary errors in logs
```json
{
  "level": "info",
  "msg": "unknown field \"status.conditions\"",
  "controller": "temporalworkerdeployment"
}
```

This suggests the controller has issues with status field management, which may be related to the deployment error above.

---

## Root Cause Analysis

The controller's reconciliation logic appears to:
1. **Always set `spec.replicas` on the underlying Deployment** (defaulting to 1 or from TemporalWorkerDeployment spec)
2. **Not check if a WorkerResourceTemplate with HPA is managing this Deployment**
3. **Override HPA decisions on every reconciliation cycle**

This creates an irreconcilable conflict:
- **HPA**: "Target 3+ replicas based on metrics" → scales Deployment
- **Controller**: "TemporalWorkerDeployment says 1 replica" → resets Deployment
- **HPA**: Detects replicas < minReplicas → scales again
- **Loop**: Repeats indefinitely

---

## Questions for the Maintainers

1. Should the controller implement HPA-awareness (check for WorkerResourceTemplate presence and skip replica management)?
2. What is the intended workflow for autoscaling TemporalWorkerDeployments?
3. Can the ObjectReference deserialization error be addressed (secondary issue)?

---

## Possible Solutions

### Option 1: Controller Detects HPA and Defers Replica Management (Recommended)
```go
if workerResourceTemplateExists(ctx, client, deployment) {
    // WorkerResourceTemplate is managing this deployment
    // Don't touch spec.replicas - let HPA manage it
    skip(deployment.Spec.Replicas)
} else {
    // Set replicas normally from TemporalWorkerDeployment spec
    deployment.Spec.Replicas = ptr.To(int32(twd.Spec.Replicas))
}
```

### Option 2: Explicit Validation
Reject TemporalWorkerDeployments that have both:
- No explicit `spec.replicas` AND
- An associated WorkerResourceTemplate with HPA

With a clear error: "Cannot use HPA-managed WorkerResourceTemplate without explicit replicas configuration"

### Option 3: User-facing Workaround Documentation
Document that HPA + TemporalWorkerDeployment currently requires:
- Explicit `spec.replicas` set to `minReplicas` value
- HPA configured to respect this as the baseline
- Understanding that controller will manage replicas, not HPA (defeats the purpose)

---

## Workarounds Attempted (All Failed)

1. ❌ **Omitting `spec.replicas`**: Controller defaults to 1, HPA can't override
2. ❌ **Setting `spec.replicas: 1`**: Controller locks it, HPA can't scale
3. ❌ **Setting `spec.replicas: 3`**: Controller overrides HPA, no dynamic scaling
4. ❌ **Setting `spec.replicas: null`**: Controller treats as 1

**Conclusion**: There is currently no working workaround. HPA + TemporalWorkerDeployment are fundamentally broken together.

---

## Impact

- **Affected Users**: Anyone attempting to use WorkerResourceTemplate with HPA for autoscaling
- **Scope**: All TemporalWorkerDeployment resources configured with HPA
- **Duration**: Deployment becomes unscalable; stuck at default replica count
- **Resource Waste**: Continuous HPA reconciliation cycles create unnecessary API load and logging overhead

---

## Additional Context

The HPA itself is functioning correctly:
- It successfully creates the HorizontalPodAutoscaler resource
- It correctly identifies when replicas are below minReplicas
- It successfully patches the Deployment to scale up
- **But**: Within milliseconds, the controller re-patches it back to 1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug] HorizontalPodAutoscaler continuously rescales due to controller interference #350

Bug: HorizontalPodAutoscaler continuously rescales due to controller interference

Summary

Severity

Environment

Problem Description

Observed Behavior

Expected Behavior

Configuration & Reproduction

Minimal TemporalWorkerDeployment

WorkerResourceTemplate with HPA (Desired Scaling Config)

Steps to Reproduce

Validation

Controller Log Evidence

Pattern: Controller resetting replicas

Secondary errors in logs

Root Cause Analysis

Questions for the Maintainers

Possible Solutions

Option 1: Controller Detects HPA and Defers Replica Management (Recommended)

Option 2: Explicit Validation

Option 3: User-facing Workaround Documentation

Workarounds Attempted (All Failed)

Impact

Additional Context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[Bug] HorizontalPodAutoscaler continuously rescales due to controller interference #350

Description

Bug: HorizontalPodAutoscaler continuously rescales due to controller interference

Summary

Severity

Environment

Problem Description

Observed Behavior

Expected Behavior

Configuration & Reproduction

Minimal TemporalWorkerDeployment

WorkerResourceTemplate with HPA (Desired Scaling Config)

Steps to Reproduce

Validation

Controller Log Evidence

Pattern: Controller resetting replicas

Secondary errors in logs

Root Cause Analysis

Questions for the Maintainers

Possible Solutions

Option 1: Controller Detects HPA and Defers Replica Management (Recommended)

Option 2: Explicit Validation

Option 3: User-facing Workaround Documentation

Workarounds Attempted (All Failed)

Impact

Additional Context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions