This repository was archived by the owner on Jun 3, 2026. It is now read-only.

Add v2 Temporal durable workflows #208

Merged
ishaanxgupta merged 18 commits into main from codex/v2-temporal-durable-workflows on Jun 1, 2026
Conversation

@ishaanxgupta

Contributor

Summary

  • Move durable v2 memory routes into src/api/routes/v2/ while leaving existing v1 memory/scanner route contracts intact.
  • Add Temporal-backed v2 workflows, activities, worker entrypoint, enqueue/status/retry/cancel/dead-letter APIs, and scanner durable scan routes.
  • Extend Mongo durable job records with workflow ids, progress, cancellation, retry/dead-letter metadata, plus Temporal local compose/config.

Tests

  • python -B -c "import ast, pathlib; paths=[...]; [ast.parse(pathlib.Path(p).read_text(encoding='utf-8'), filename=p) for p in paths]; print('ast ok')"
  • ENVIRONMENT=test .venv\\Scripts\\python.exe -m pytest tests\\api\\test_memory_versioning.py tests\\test_durable_jobs.py -> 10 passed
  • ENVIRONMENT=test .venv\\Scripts\\python.exe -c "from src.api.app import create_app; app=create_app(); ..." confirmed /v2/memory, /v2/scanner, and /v2/jobs routes register

Notes

  • temporalio>=1.10.0 is declared, but the local venv does not currently have the SDK installed, so full worker smoke testing still needs a dependency sync plus Temporal server.

@github-actions

github-actions Bot commented May 31, 2026

Contributor
Fails
🚫

🔐 This PR modifies sensitive files: src/config/settings.py. These require review by a core maintainer (@ishaanxgupta or @ved015) before merging.

Warnings
⚠️

📦 This PR changes 7473 lines (additions + deletions). Large PRs are harder to review thoroughly — consider splitting it.

⚠️

📦 pyproject.toml or requirements.txt was modified. Make sure uv.lock is updated (uv lock) and the security audit passes.

Messages
📖

✅ Targeting main. Please squash commits before merging to keep the git history clean.

Generated by 🚫 dangerJS against 2ea7650

@github-actions

Contributor

✅ Staging Deployment Report

Item Value
Branch codex/v2-temporal-durable-workflows
Commit 546e86e
Environment Staging
Health http://3.6.255.148:8001/health
API Docs http://3.6.255.148:8001/docs
Smoke Tests success

🟢 Staging is live and healthy! Test your changes at the staging URL above.

Ready to ship? Comment /promote on this PR to merge to main and deploy to production.

Comment thread src/api/routes/v2/jobs.py Fixed
Comment thread src/api/routes/v2/memory.py Fixed
Comment thread src/api/routes/v2/scanner.py Fixed
Comment thread src/api/routes/v2/activities.py Fixed
@github-actions

Contributor

🔍 API Schema Diff

---REPORT---

🔄 Modified

  • 🟡 CHANGED: root['paths']['/v2/memory/ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/batch-ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/ingest/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/jobs/{job_id}/status']['get']['tags'][0]

Auto-generated by API Schema Diff workflow

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request introduces Version 2 of the API routes, integrating them with Temporal workflows for durable execution. It adds a local Temporal service setup, registers new v2 routers for memory, jobs, and scanner operations, and implements the corresponding Temporal workflows, activities, and worker. The review feedback highlights several critical robustness and performance issues, including a timezone mismatch TypeError and a potential None float conversion in scanner.py, a performance bottleneck from connecting to the Temporal client on every request, missing error handling for branch tip retrieval, unhandled RPCErrors during workflow cancellation, and thread-safety concerns when running Playwright's sync API via asyncio.to_thread.
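Two of the issues named in this review, the timezone mismatch TypeError and the None float conversion, can be reproduced with the standard library alone. The sketch below is not taken from scanner.py. `to_aware_utc` and `safe_float` are hypothetical helper names, and treating naive timestamps as UTC is an assumption that may not match how the scanner stores its dates.

```python
from datetime import datetime, timezone
from typing import Optional


def to_aware_utc(value: datetime) -> datetime:
    """Return a timezone-aware UTC datetime.

    Naive values are assumed to already be in UTC (an assumption for this sketch).
    """
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def safe_float(value: Optional[object], default: float = 0.0) -> float:
    """Convert to float, falling back to a default when the value is None."""
    if value is None:
        return default
    return float(value)


naive = datetime(2026, 5, 31, 12, 0, 0)
aware = datetime(2026, 5, 31, 12, 0, 0, tzinfo=timezone.utc)

# Comparing a naive and an aware datetime raises TypeError.
try:
    naive < aware
    mismatch_raised = False
except TypeError:
    mismatch_raised = True

# After normalizing both sides, arithmetic and comparison are safe.
elapsed = to_aware_utc(aware) - to_aware_utc(naive)
```

Normalizing at the boundary (when a timestamp is read from storage or a request) keeps the rest of the code free of per-call checks.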

Comment thread src/api/routes/v2/scanner.py
Comment thread src/api/routes/v2/temporal_client.py Outdated
Comment on lines +27 to +38
async def get_temporal_client():
    try:
        from temporalio.client import Client
    except Exception as exc:  # pragma: no cover - depends on optional SDK import
        raise TemporalUnavailable(
            "temporalio is not installed. Install project dependencies first."
        ) from exc

    return await Client.connect(
        settings.temporal_address,
        namespace=settings.temporal_namespace,
    )

Contributor


medium

Performance Bottleneck: Creating a new Temporal Client connection via Client.connect on every single API request (e.g., starting or cancelling a workflow) is highly inefficient. It introduces significant network latency, overhead, and risks port exhaustion under load.

The Temporal Client is thread-safe and designed to be shared as a singleton across the entire application. You should cache and reuse the client instance.

_temporal_client = None


async def get_temporal_client():
    global _temporal_client
    try:
        from temporalio.client import Client
    except Exception as exc:  # pragma: no cover - depends on optional SDK import
        raise TemporalUnavailable(
            "temporalio is not installed. Install project dependencies first."
        ) from exc

    if _temporal_client is None:
        _temporal_client = await Client.connect(
            settings.temporal_address,
            namespace=settings.temporal_namespace,
        )
    return _temporal_client
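One caveat with the suggestion as written: two requests that arrive before the first connect finishes can both see `None` and both connect. The Greptile summary further down says the merged temporal_client.py uses double-checked locking, but that code is not shown in this thread, so the following is only a sketch of the idea. `CachedClient` and `fake_connect` are hypothetical names, and `fake_connect` stands in for `Client.connect` so the example runs without the Temporal SDK.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class CachedClient:
    """Cache the result of an async connect call, connecting at most once."""

    def __init__(self, connect: Callable[[], Awaitable[Any]]) -> None:
        self._connect = connect
        self._client: Optional[Any] = None
        self._lock: Optional[asyncio.Lock] = None

    async def get(self) -> Any:
        # Fast path: no locking once the client exists.
        if self._client is not None:
            return self._client
        # Create the lock lazily so it belongs to the running event loop.
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:
            # Second check: another task may have connected while we waited.
            if self._client is None:
                self._client = await self._connect()
        return self._client


async def _demo() -> tuple:
    calls = 0

    async def fake_connect() -> object:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulate connection latency
        return object()

    cache = CachedClient(fake_connect)
    clients = await asyncio.gather(*(cache.get() for _ in range(5)))
    return calls, len({id(c) for c in clients})


connect_calls, distinct_clients = asyncio.run(_demo())
```

Five concurrent callers share one connection here. If a connect attempt raises, nothing is cached and the next caller tries again.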

Comment thread src/api/routes/v2/scanner.py
Comment thread src/api/routes/v2/temporal_client.py Outdated
Comment thread src/api/routes/v2/activities.py Outdated
Comment thread src/api/routes/v2/scanner.py Outdated
@ishaanxgupta ishaanxgupta marked this pull request as ready for review May 31, 2026 16:11
@ishaanxgupta ishaanxgupta requested a review from ved015 as a code owner May 31, 2026 16:11
@greptile-apps

greptile-apps Bot commented May 31, 2026

Contributor

Greptile Summary

This PR moves v2 durable memory and scanner routes from src/api/routes/memory.py into a dedicated src/api/routes/v2/ package and replaces the previous background-thread job scheduler with Temporal-backed workflows, activities, a worker entrypoint, and enqueue/retry/cancel/dead-letter APIs. It also extends the durable job Mongo schema with workflow_id, run_id, attempt_count, progress, cancelled_at, and adds encrypted secret storage for GitHub PATs.

  • Temporal integration: New workflows.py (five workflow classes), activities.py, worker.py, and temporal_client.py implement the full Temporal job lifecycle; Temporal is added to docker-compose.local.yml.
  • State machine hardening: CANCELLED is added to TERMINAL_STATUSES; mark_running, mark_cancelled, mark_dead_letter, and mark_succeeded all carry $nin TERMINAL_STATUSES DB-level guards; reserve_workflow_start prevents duplicate workflow starts.
  • PAT security: Scanner credentials are encrypted with a dedicated Fernet key (XMEM_SECRET_ENCRYPTION_KEY), stored in a separate durable_job_secrets collection, and referenced only by a github_credential_ref token in the Temporal payload.
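The "$nin TERMINAL_STATUSES" guard mentioned above can be pictured as a conditional update: the write only matches a job whose current status is not terminal. The sketch below uses an in-memory dict in place of Mongo. The status strings, the `job_id` and `status` field names, and both function names are assumptions for illustration, not the contents of src/jobs/durable.py.

```python
from typing import Dict

# Assumed status values; the PR only confirms that CANCELLED is terminal.
TERMINAL_STATUSES = ("succeeded", "dead_letter", "cancelled")


def guarded_filter(job_id: str) -> dict:
    """Build a Mongo-style filter that skips jobs already in a terminal state."""
    return {"job_id": job_id, "status": {"$nin": list(TERMINAL_STATUSES)}}


def apply_guarded_update(store: Dict[str, dict], job_id: str, new_status: str) -> bool:
    """In-memory stand-in for a conditional update; True means the write applied.

    With pymongo the equivalent would be roughly
    collection.update_one(guarded_filter(job_id), {"$set": {"status": new_status}})
    followed by a check of the result's modified_count.
    """
    job = store.get(job_id)
    excluded = guarded_filter(job_id)["status"]["$nin"]
    if job is None or job["status"] in excluded:
        return False
    job["status"] = new_status
    return True


store = {"job-1": {"status": "running"}}
first_write = apply_guarded_update(store, "job-1", "cancelled")   # applies
second_write = apply_guarded_update(store, "job-1", "succeeded")  # rejected
```

Because the check and the write happen in one database operation, a late activity result cannot overwrite a cancellation that landed first.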

Confidence Score: 4/5

The main state-machine hardening and PAT-security concerns from the previous round are addressed; remaining findings are style and quality nits.

The previous iteration left several open correctness issues (stranded jobs on workflow start failure, missing cancel guards, JWT key reuse for PAT encryption, scanner records stuck in running). This revision addresses all of them with DB-level guards and try/except wrapping at every workflow start site. The Temporal SDK and worker are not yet exercised end-to-end (deferred per PR notes), so the workflow execution paths remain untested in CI.

src/api/routes/v2/temporal_client.py (fragile exception string-matching in cancel_job_workflow) and src/api/routes/v2/workflows.py (sequential domain activities in MemoryIngestWorkflow)
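For the second item, the difference between sequential and parallel activity execution can be shown with plain asyncio. This is not the code in workflows.py: `fake_domain_activity` and the domain names are hypothetical, and whether the real domain activities are independent enough to run concurrently is an assumption the authors would need to confirm.

```python
import asyncio
from typing import List

in_flight = 0
peak_in_flight = 0


async def fake_domain_activity(name: str) -> str:
    """Stand-in for one domain activity; tracks how many run at the same time."""
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.01)  # simulate I/O-bound work
    in_flight -= 1
    return f"{name}:done"


async def run_sequential(domains: List[str]) -> List[str]:
    # Each activity starts only after the previous one has finished.
    return [await fake_domain_activity(d) for d in domains]


async def run_parallel(domains: List[str]) -> List[str]:
    # All activities start together; gather returns results in input order.
    return list(await asyncio.gather(*(fake_domain_activity(d) for d in domains)))


async def _demo() -> tuple:
    global peak_in_flight
    domains = ["domain_a", "domain_b", "domain_c"]  # hypothetical names
    seq = await run_sequential(domains)
    seq_peak = peak_in_flight
    peak_in_flight = 0
    par = await run_parallel(domains)
    return seq, seq_peak, par, peak_in_flight


seq_results, seq_peak, par_results, par_peak = asyncio.run(_demo())
```

Both versions return the same results in the same order; only the peak concurrency differs. With the default `asyncio.gather` behavior, the first failure propagates to the caller while the remaining awaitables keep running, which matters if the activities have side effects.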

Important Files Changed

Filename Overview
src/api/routes/v2/temporal_client.py Adds singleton Temporal client with double-checked locking; cancel uses fragile string matching on exception messages instead of Temporal exception types.
src/api/routes/v2/workflows.py Five workflow classes (memory ingest, batch ingest, scrape, scanner scan, scanner phase2); domain activities in MemoryIngestWorkflow run sequentially rather than in parallel.
src/api/routes/v2/scanner.py start_scan_v2 fully wraps start_job_workflow in try/except and resets the scanner code-store on failure; scan_status_v2 uses org_id query param while start_scan_v2 exposes the field as org in responses.
src/jobs/durable.py Adds CANCELLED to TERMINAL_STATUSES, $nin guards on all terminal-state writes, two-step mark_running using attempt_count/retry_count split, reserve_workflow_start, and reset_for_retry with clear_workflow flag.
src/api/routes/v2/secrets.py Fernet encryption for GitHub PATs; _fernet() raises RuntimeError instead of falling back to JWT secret, removing the key-coupling concern from the previous review.
src/api/routes/v2/jobs.py cancel_job now has status guard (queued/running only), try/except around cancel_job_workflow, and calls _mark_scanner_job_cancelled; retry_job wraps start_job_workflow in try/except with mark_failed on failure.
src/api/routes/v2/activities.py All Temporal activities defined; scanner_scan_activity now calls _mark_scanner_scan_failed on PAT resolution failure before re-raising, fixing the stuck-running scanner record issue.
src/api/routes/v2/memory.py _enqueue_and_start helper uses reserve_workflow_start to prevent duplicate workflow starts; WorkflowStartFailed returns the durable job handle in the 503 response so the caller can poll status.
tests/test_v2_review_fixes.py New test file covering PAT round-trip, cancel state transitions, phase status derivation, and transient cancel error handling.
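The "fragile string matching" noted for cancel_job_workflow can be illustrated without the Temporal SDK. In the sketch below, `FakeRPCError` and `StatusCode` are hypothetical stand-ins for an RPC error that carries a structured status code; the message texts and the two result labels are also invented for the example.

```python
from enum import Enum


class StatusCode(Enum):
    """Hypothetical stand-in for an RPC status enum."""
    NOT_FOUND = "not_found"
    UNAVAILABLE = "unavailable"


class FakeRPCError(Exception):
    """Hypothetical error type that carries a structured status code."""

    def __init__(self, message: str, status: StatusCode) -> None:
        super().__init__(message)
        self.status = status


def classify_by_message(exc: Exception) -> str:
    # Fragile: depends on the exact wording of the error text.
    return "already_gone" if "not found" in str(exc).lower() else "transient"


def classify_by_type(exc: Exception) -> str:
    # Sturdier: depends on the exception type and its status code.
    if isinstance(exc, FakeRPCError) and exc.status is StatusCode.NOT_FOUND:
        return "already_gone"
    return "transient"


# Same condition, but the server worded the message differently.
reworded = FakeRPCError("workflow execution does not exist", StatusCode.NOT_FOUND)
by_message = classify_by_message(reworded)  # misclassified as transient
by_type = classify_by_type(reworded)        # still recognized
```

A rewording of the server message changes the string-based result but not the type-based one, which is why matching on exception types is usually the safer choice for cancel handling.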


Reviews (16): Last reviewed commit: "Return job handles on workflow start fai..."

Comment thread src/api/routes/v2/jobs.py
Comment thread src/api/routes/v2/jobs.py
Comment thread src/jobs/durable.py
Comment thread src/api/routes/v2/jobs.py Outdated
@github-actions

Contributor

✅ Staging Deployment Report

Item Value
Branch codex/v2-temporal-durable-workflows
Commit 5feac00
Environment Staging
Health http://3.6.255.148:8001/health
API Docs http://3.6.255.148:8001/docs
Smoke Tests success

🟢 Staging is live and healthy! Test your changes at the staging URL above.

Ready to ship? Comment /promote on this PR to merge to main and deploy to production.

@github-actions

Contributor

🔍 API Schema Diff

---REPORT---

🔄 Modified

  • 🟡 CHANGED: root['paths']['/v2/memory/batch-ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/jobs/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/ingest/{job_id}/status']['get']['tags'][0]

Auto-generated by API Schema Diff workflow

@github-actions

Contributor

✅ Staging Deployment Report

Item Value
Branch codex/v2-temporal-durable-workflows
Commit 99a6956
Environment Staging
Health http://3.6.255.148:8001/health
API Docs http://3.6.255.148:8001/docs
Smoke Tests success

🟢 Staging is live and healthy! Test your changes at the staging URL above.

Ready to ship? Comment /promote on this PR to merge to main and deploy to production.

@github-actions

Contributor

🔍 API Schema Diff

---REPORT---

🔄 Modified

  • 🟡 CHANGED: root['paths']['/v2/memory/ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/jobs/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/ingest/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/batch-ingest']['post']['tags'][0]

Auto-generated by API Schema Diff workflow

@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

✅ Staging Deployment Report

Item Value
Branch codex/v2-temporal-durable-workflows
Commit 552c640
Environment Staging
Health http://3.6.255.148:8001/health
API Docs http://3.6.255.148:8001/docs
Smoke Tests success

🟢 Staging is live and healthy! Test your changes at the staging URL above.

Ready to ship? Comment /promote on this PR to merge to main and deploy to production.

@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

🔍 API Schema Diff

---REPORT---

🔄 Modified

  • 🟡 CHANGED: root['paths']['/v2/memory/ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/batch-ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/jobs/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/ingest/{job_id}/status']['get']['tags'][0]

Auto-generated by API Schema Diff workflow

Contributor Author

@greptileai

@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

✅ Staging Deployment Report

Item Value
Branch codex/v2-temporal-durable-workflows
Commit f3aad60
Environment Staging
Health http://3.6.255.148:8001/health
API Docs http://3.6.255.148:8001/docs
Smoke Tests success

🟢 Staging is live and healthy! Test your changes at the staging URL above.

Ready to ship? Comment /promote on this PR to merge to main and deploy to production.

@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

🔍 API Schema Diff

---REPORT---

🔄 Modified

  • 🟡 CHANGED: root['paths']['/v2/memory/batch-ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/ingest/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/jobs/{job_id}/status']['get']['tags'][0]

Auto-generated by API Schema Diff workflow

@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

✅ Staging Deployment Report

Item Value
Branch codex/v2-temporal-durable-workflows
Commit 83a07e3
Environment Staging
Health http://3.6.255.148:8001/health
API Docs http://3.6.255.148:8001/docs
Smoke Tests success

🟢 Staging is live and healthy! Test your changes at the staging URL above.

Ready to ship? Comment /promote on this PR to merge to main and deploy to production.

@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

🔍 API Schema Diff

---REPORT---

🔄 Modified

  • 🟡 CHANGED: root['paths']['/v2/memory/ingest/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/batch-ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/jobs/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape']['post']['tags'][0]

Auto-generated by API Schema Diff workflow

Contributor Author

@greptileai

@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

✅ Staging Deployment Report

Item Value
Branch codex/v2-temporal-durable-workflows
Commit 370d2aa
Environment Staging
Health http://3.6.255.148:8001/health
API Docs http://3.6.255.148:8001/docs
Smoke Tests success

🟢 Staging is live and healthy! Test your changes at the staging URL above.

Ready to ship? Comment /promote on this PR to merge to main and deploy to production.

@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

🔍 API Schema Diff

---REPORT---

🔄 Modified

  • 🟡 CHANGED: root['paths']['/v2/memory/ingest/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/jobs/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/batch-ingest']['post']['tags'][0]

Auto-generated by API Schema Diff workflow

Contributor Author

@greptileai

@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

✅ Staging Deployment Report

Item Value
Branch codex/v2-temporal-durable-workflows
Commit 9eff4fe
Environment Staging
Health http://3.6.255.148:8001/health
API Docs http://3.6.255.148:8001/docs
Smoke Tests success

🟢 Staging is live and healthy! Test your changes at the staging URL above.

Ready to ship? Comment /promote on this PR to merge to main and deploy to production.

@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

🔍 API Schema Diff

---REPORT---

🔄 Modified

  • 🟡 CHANGED: root['paths']['/v2/memory/scrape']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/batch-ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/ingest']['post']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/scrape/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/jobs/{job_id}/status']['get']['tags'][0]
  • 🟡 CHANGED: root['paths']['/v2/memory/ingest/{job_id}/status']['get']['tags'][0]

Auto-generated by API Schema Diff workflow

@ishaanxgupta ishaanxgupta merged commit eb10dd2 into main Jun 1, 2026
16 of 17 checks passed