21 changes: 18 additions & 3 deletions src/apps/competitions/tasks.py
@@ -795,11 +795,26 @@ def update_phase_statuses():

 @app.task(queue='site-worker')
 def submission_status_cleanup():
-    submissions = Submission.objects.filter(status=Submission.RUNNING, has_children=False).select_related('phase', 'parent')
+    # Recover submissions stuck in any non-terminal state
+    non_terminal_statuses = [
+        Submission.SUBMITTED,
+        Submission.PREPARING,
+        Submission.RUNNING,
+        Submission.SCORING,
+    ]
+    submissions = Submission.objects.filter(
+        status__in=non_terminal_statuses,
+        has_children=False,
+    ).select_related('phase', 'parent')

     for sub in submissions:
-        # Check if the submission has been running for 24 hours longer than execution_time_limit
-        if sub.started_when < now() - timedelta(milliseconds=(3600000 * 24) + sub.phase.execution_time_limit):
+        # Use started_when for Running submissions, created_when as fallback for others
+        reference_time = sub.started_when if sub.started_when else sub.created_when
+        deadline = reference_time + timedelta(
+            milliseconds=(3600000 * 24) + sub.phase.execution_time_limit
+        )
+
+        if now() > deadline:
             if sub.parent is not None:
                 sub.parent.cancel(status=Submission.FAILED)
             else:
45 changes: 45 additions & 0 deletions src/apps/competitions/tests/test_submissions.py
@@ -427,6 +427,51 @@ def test_submissions_are_cancelled_if_running_24_hours_past_execution_time_limit
         assert self.submission_pass.status == Submission.RUNNING
         assert self.submission_fail.status == Submission.FAILED

+    def test_cleanup_recovers_stuck_submitted_submissions(self):
+        """Submissions stuck in Submitted should be recovered by cleanup."""
+        sub = self.make_submission()
+        sub.status = Submission.SUBMITTED
+        sub.created_when = timezone.now() - timedelta(hours=48)
+        sub.save(ignore_submission_limit=True)
+
+        submission_status_cleanup()
+        sub.refresh_from_db()
+        assert sub.status == Submission.FAILED
+
+    def test_cleanup_recovers_stuck_preparing_submissions(self):
+        """Submissions stuck in Preparing should be recovered by cleanup."""
+        sub = self.make_submission()
+        sub.status = Submission.PREPARING
+        sub.created_when = timezone.now() - timedelta(hours=48)
+        sub.save(ignore_submission_limit=True)
+
+        submission_status_cleanup()
+        sub.refresh_from_db()
+        assert sub.status == Submission.FAILED
+
+    def test_cleanup_recovers_stuck_scoring_submissions(self):
+        """Submissions stuck in Scoring should be recovered by cleanup."""
+        sub = self.make_submission()
+        sub.status = Submission.SCORING
+        sub.created_when = timezone.now() - timedelta(hours=48)
+        sub.save(ignore_submission_limit=True)
+
+        submission_status_cleanup()
+        sub.refresh_from_db()
+        assert sub.status == Submission.FAILED
+
+    def test_cleanup_does_not_touch_recent_non_terminal_submissions(self):
+        """Recent submissions in non-terminal states should NOT be cleaned up."""
+        for status in [Submission.SUBMITTED, Submission.PREPARING, Submission.SCORING]:
+            sub = self.make_submission()
+            sub.status = status
+            sub.created_when = timezone.now()
+            sub.save(ignore_submission_limit=True)
+
+            submission_status_cleanup()
+            sub.refresh_from_db()
+            assert sub.status == status, f"Recent {status} submission should not be cleaned up"
+
     def test_cancelling_parent_submission_cancels_all_children(self):
         self.parent_submission = self.make_submission()
         self.parent_submission.has_children = True
140 changes: 140 additions & 0 deletions tests/k6/README_cleanup_tests.md
@@ -0,0 +1,140 @@
# Stuck Submission Cleanup Integration Tests

## Overview

These tests validate that the `submission_status_cleanup()` task correctly recovers submissions stuck in **any** non-terminal state, not just `Running`.

### Problem

The original cleanup task only recovered submissions stuck in `Running` state. Submissions that never reached `Running` (stuck in `Submitted`, `Preparing`, or `Scoring`) would hang forever.

### Solution

The fix extends cleanup to cover all non-terminal states:
- `Submitted`
- `Preparing`
- `Running`
- `Scoring`

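The task uses `created_when` as a fallback when `started_when` is null, which is the case for submissions that never reached `Running`. A submission is marked as failed once 24 hours plus the phase's `execution_time_limit` have elapsed since that reference time.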
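
The deadline rule can be sketched in isolation as follows. This is a minimal, framework-free illustration rather than the task itself: `is_stuck`, `GRACE_PERIOD`, and the parameter names are hypothetical, and `execution_time_limit_ms` assumes the phase limit is stored in milliseconds, as the `timedelta(milliseconds=...)` call in the task suggests.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD = timedelta(hours=24)  # same 24-hour margin the task adds


def is_stuck(
    created_when: datetime,
    started_when: Optional[datetime],
    execution_time_limit_ms: int,
    current_time: datetime,
) -> bool:
    """Return True once a non-terminal submission is past its deadline."""
    # Prefer started_when; fall back to created_when if the run never started.
    reference_time = started_when if started_when else created_when
    deadline = (
        reference_time
        + GRACE_PERIOD
        + timedelta(milliseconds=execution_time_limit_ms)
    )
    return current_time > deadline


now = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)
ten_minutes_ms = 10 * 60 * 1000

# Created 48 hours ago and never started: past the deadline.
print(is_stuck(now - timedelta(hours=48), None, ten_minutes_ms, now))
# Created just now: still inside the grace period.
print(is_stuck(now, None, ten_minutes_ms, now))
```

Keeping the comparison in a pure function like this would make the threshold easy to unit-test without a database, though the actual task computes it inline.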

## Test Suite

### Files

- **`run_cleanup_test.sh`**: End-to-end orchestrator
  - Submits N submissions
  - Stops compute_worker to simulate stuck state
  - Ages submissions (sets created_when to 48h ago)
  - Runs cleanup task
  - Verifies all submissions recovered

- **`test_stuck_submissions.js`**: K6 script for recovery verification
  - Creates submissions and polls until cleanup runs
  - Verifies all reach terminal state

- **`test_cleanup_conservation.js`**: K6 conservation harness
  - Submits N, asserts N terminal
  - Conservation rate must be 100%
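
The conservation check described above amounts to counting how many submitted items ended in a terminal state. The sketch below illustrates that arithmetic in Python; it is not the K6 script. `conservation_rate` is a hypothetical helper, and the labels in `TERMINAL_STATUSES` are assumptions that should be adjusted to match the `Submission` model.

```python
from typing import Iterable

# Assumed terminal status labels; adjust to match the Submission model.
TERMINAL_STATUSES = {"Finished", "Failed", "Cancelled"}


def conservation_rate(statuses: Iterable[str]) -> float:
    """Percentage of submissions that ended in a terminal state."""
    statuses = list(statuses)
    if not statuses:
        return 100.0  # nothing submitted, so nothing was lost
    terminal = sum(1 for s in statuses if s in TERMINAL_STATUSES)
    return 100.0 * terminal / len(statuses)


# Example: two submissions were recovered, one is still stuck.
observed = ["Failed", "Failed", "Submitted"]
rate = conservation_rate(observed)
lost = [s for s in observed if s not in TERMINAL_STATUSES]
print(f"Conservation rate: {rate:.1f}%  Lost submissions: {len(lost)}")
```

A run passes only when the rate is exactly 100 and the list of lost submissions is empty.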

## Running the Tests

### Prerequisites

- Docker Compose environment running
- K6 installed (`brew install k6`)
- `codabench` user created with password `codabench`
- Competition with at least one phase exists

### Run Full Test Suite

```bash
cd tests/k6
./run_cleanup_test.sh [PHASE_ID]
```

If `PHASE_ID` is not provided, the script auto-detects the first available phase.

### Environment Variables

- `BASE_URL` (default: `http://localhost`)
- `USERNAME` (default: `codabench`)
- `PASSWORD` (default: `codabench`)
- `SUBMISSION_COUNT` (default: `3`)
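
As an illustration of how these defaults resolve, the following Python sketch reads the same four variables and falls back to the documented values. It is not how `run_cleanup_test.sh` or the K6 scripts read them; `load_settings` is a hypothetical helper.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve test settings, falling back to the documented defaults."""
    return {
        "base_url": env.get("BASE_URL", "http://localhost"),
        "username": env.get("USERNAME", "codabench"),
        "password": env.get("PASSWORD", "codabench"),
        # Environment values are strings, so convert the count explicitly.
        "submission_count": int(env.get("SUBMISSION_COUNT", "3")),
    }


# Passing an explicit mapping avoids depending on the real environment.
print(load_settings({"SUBMISSION_COUNT": "5"}))
```

Note that some operating systems set `USERNAME` themselves, so exporting it explicitly may be safer than relying on the default.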

### Example

```bash
BASE_URL=http://localhost:8000 SUBMISSION_COUNT=5 ./run_cleanup_test.sh
```

## Expected Output

### Success

```
═══════════════════════════════════════════════════════
Stuck Submission Cleanup — Integration Test
═══════════════════════════════════════════════════════

▸ Step 1: Checking prerequisites...
✓ Docker Compose is running
✓ K6 is installed

▸ Step 2: Auto-detecting phase...
✓ Found phase ID: 123

▸ Step 6: Running cleanup and verifying recovery
══ Results ══
Created: 3
Recovered: 3
Still stuck: 0
Recovery rate: 100.0%

▸ Step 8: Running K6 conservation harness
Conservation rate: 100%
Lost submissions: 0

✓ PASS — All stuck submissions recovered
```

### Failure

```
❌ FAIL — 2 submissions still stuck

This means submission_status_cleanup() is not covering
all non-terminal states (bug not fixed).
```

## What This Tests

### Directly Tested

- ✅ Cleanup recovers submissions stuck in `Submitted`
- ✅ Cleanup recovers submissions stuck in `Preparing`
- ✅ Cleanup recovers submissions stuck in `Running`
- ✅ Cleanup recovers submissions stuck in `Scoring`
- ✅ Recent submissions (younger than 24 hours plus the phase's `execution_time_limit`) are NOT cleaned up
- ✅ Conservation: all submissions reach terminal state

### Integration with CI/CD

```yaml
# .github/workflows/integration-tests.yml
- name: Run cleanup integration test
  run: |
    docker compose up -d
    docker compose exec django python manage.py generate_data
    tests/k6/run_cleanup_test.sh
  env:
    SUBMISSION_COUNT: 5
```

## Related Tests

Unit tests in `src/apps/competitions/tests/test_submissions.py`:
- `test_cleanup_recovers_stuck_submitted_submissions`
- `test_cleanup_recovers_stuck_preparing_submissions`
- `test_cleanup_recovers_stuck_scoring_submissions`
- `test_cleanup_does_not_touch_recent_non_terminal_submissions`