Fix submission cleanup: recover all non-terminal states, not just Running #2414
Open
hanane-ca wants to merge 1 commit into
…ning

Problem:
- submission_status_cleanup() only recovered Running submissions
- Submissions stuck in Submitted, Preparing, or Scoring would hang forever
- No fallback for submissions that never reached Running (started_when null)

Solution:
- Extend cleanup to cover all non-terminal states: Submitted, Preparing, Running, Scoring
- Use created_when as fallback when started_when is null
- All non-terminal submissions now recovered after 24h + execution_time_limit

Changes:
- src/apps/competitions/tasks.py:
  * Extended non_terminal_statuses list to include all states
  * Added created_when fallback logic for reference_time
  * Cleaned up comments per Codabench guidelines
- src/apps/competitions/tests/test_submissions.py:
  * Added 4 unit tests covering Submitted, Preparing, Scoring states
  * Added negative test for recent non-terminal submissions
  * Cleaned up docstrings (removed M3 references)
- tests/k6/:
  * run_cleanup_test.sh: End-to-end orchestrator
  * test_stuck_submissions.js: K6 recovery verification
  * test_cleanup_conservation.js: K6 conservation harness
  * README_cleanup_tests.md: Test documentation
  * All files cleaned up (removed M3 references per guidelines)

Tests validate:
- All non-terminal states recovered after deadline
- Recent submissions NOT cleaned up
- 100% conservation rate

Fixes codalab#2413
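The recovery rule in the commit message above can be sketched in plain Python. This is only an illustration, not the code in `tasks.py`: the `Submission` dataclass and the `is_stuck` helper are hypothetical stand-ins for the Django model and the task body, the status strings are copied from the PR text, and `execution_time_limit` is assumed to be a number of seconds. The `reference_time` expression and the "24h + execution_time_limit" deadline are the parts taken from the PR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Status names as written in the PR text; the real model may store them differently.
NON_TERMINAL_STATUSES = {"Submitted", "Preparing", "Running", "Scoring"}


@dataclass
class Submission:  # hypothetical stand-in for the Django model
    status: str
    created_when: datetime
    started_when: Optional[datetime] = None
    execution_time_limit: int = 600  # assumption: expressed in seconds


def is_stuck(sub: Submission, now: datetime) -> bool:
    """Return True if a non-terminal submission is past its recovery deadline."""
    if sub.status not in NON_TERMINAL_STATUSES:
        return False
    # Fallback from the PR: submissions that never ran have no started_when.
    reference_time = sub.started_when if sub.started_when else sub.created_when
    deadline = reference_time + timedelta(hours=24, seconds=sub.execution_time_limit)
    return now > deadline
```

Under these assumptions, a `Submitted` row created two days ago with no `started_when` is reported as stuck, while a `Preparing` row created an hour ago is left alone, which is the behavior the negative test in the commit message describes.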
fdc11a7 to 1e6bba4
Reviewers
@codalab/maintainers
Description
Fixes a bug where submissions stuck in non-terminal states (Submitted, Preparing, Scoring) would hang forever instead of being recovered by the cleanup task.
Problem: The `submission_status_cleanup()` task only recovered submissions stuck in `Running` state. Submissions that never reached `Running` (stuck in `Submitted`, `Preparing`, or `Scoring`) would never be cleaned up.
Root cause:
- Cleanup only considered submissions in `Running` status
- No fallback for submissions without `started_when` (those that never reached Running)
Solution:
- Extend cleanup to cover all non-terminal states: `Submitted`, `Preparing`, `Running`, `Scoring`
- Use `created_when` as fallback when `started_when` is null
Code changes:
- `src/apps/competitions/tasks.py`: `reference_time = started_when if started_when else created_when`
- `src/apps/competitions/tests/test_submissions.py`
- `tests/k6/`
Issues this PR resolves
Fixes #2413
Background
This bug was discovered during the EEG Foundation Challenge incident analysis where submissions were observed stuck in non-Running states for extended periods with no recovery mechanism.
Checklist for hand testing
docker compose exec django python manage.py shell -c "from competitions.tasks import submission_status_cleanup; submission_status_cleanup()"
Relevant files for testing
Integration test suite in `tests/k6/`:
- `run_cleanup_test.sh`: End-to-end orchestrator
- `test_stuck_submissions.js`: K6 recovery verification
- `test_cleanup_conservation.js`: K6 conservation harness
- `README_cleanup_tests.md`: Test documentation
Run tests:
cd tests/k6
./run_cleanup_test.sh
Checklist