feat(asr): replace WhisperLiveKit with coro (OpenAI SDK + SSE) by jedzill4 · Pull Request #85 · AymurAI/backend

jedzill4 · 2026-06-17T05:41:48Z

Summary

Replaces the WhisperLiveKit WebSocket ASR integration with coro, an OpenAI-compatible ASR + speaker-diarization server. aymurai now transcribes via the official openai SDK over SSE (stream=True), and coro runs as an isolated host process via uv tool/uvx (no dependency conflicts). The public API, DB schema, ASRParagraph/ASRDocument, and the anonymizer consumer are unchanged.

Changes

Client: aymurai/audio/asr_client.py rewritten (186→62 lines) — transcribe_audio_bytes(payload, filename, content_type) streams to coro's /v1/audio/transcriptions and parses the transcript.text.done frame into list[CoroSegment]. Maps openai APIError/APIConnectionError → RuntimeError.
Schemas: deleted aymurai/api/meta/asr/websocket.py (all WLKMessage*); added aymurai/api/meta/asr/coro.py with the CoroSegment boundary model + _parse_hhmmss. Folded TranscriptionItem into ASRParagraph.
Router/settings: get_transcribe_base_url + CoroSegment → ASRParagraph mapping; TRANSCRIBE_WS_URI → TRANSCRIBE_BASE_URL (+ TRANSCRIBE_API_KEY).
Deps: added openai>=1.65.2 (capped <2 by marker-pdf); removed librosa (only the old client used it).
Infra/docs: removed the whisperlivekit docker-compose service; added scripts/run-coro.sh (cpu/gpu, pins uvx --python 3.12) and scripts/smoke_coro.py; README section.
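
The client change listed above can be pictured with a minimal sketch. This is not the code in aymurai/audio/asr_client.py. It assumes the stream yields events exposing `type` and `text` attributes, that the transcript.text.done frame carries a JSON array of segments in `text`, and that each segment has `speaker`, `start`, `end`, and `text` keys. `Segment`, `FakeEvent`, `collect_segments`, and `fake_stream` are hypothetical names used only for illustration; in the PR the events come from the openai SDK called with stream=True.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import AsyncIterator

DONE_EVENT_TYPE = "transcript.text.done"


@dataclass
class Segment:
    """Hypothetical stand-in for the PR's CoroSegment model."""
    speaker: str
    start: str
    end: str
    text: str


@dataclass
class FakeEvent:
    """Stand-in for an SDK stream event (assumed to expose .type and .text)."""
    type: str
    text: str = ""


async def collect_segments(events: AsyncIterator[FakeEvent]) -> list[Segment]:
    """Skip intermediate frames and parse the final done frame into segments."""
    async for event in events:
        if event.type == DONE_EVENT_TYPE:
            rows = json.loads(event.text)
            return [
                Segment(
                    speaker=str(row["speaker"]),
                    start=str(row["start"]),
                    end=str(row["end"]),
                    text=row["text"],
                )
                for row in rows
            ]
    # Fail loudly when the stream ends without a done frame.
    raise RuntimeError("stream ended without a transcript.text.done frame")


async def fake_stream() -> AsyncIterator[FakeEvent]:
    """Emit one delta frame followed by a done frame with a single segment."""
    yield FakeEvent("transcript.text.delta")
    payload = [{"speaker": "0", "start": "00:00:00", "end": "00:00:02", "text": "hola"}]
    yield FakeEvent(DONE_EVENT_TYPE, json.dumps(payload))


segments = asyncio.run(collect_segments(fake_stream()))
```

Keeping the frame parsing separate from the SDK call makes it possible to test the done-frame handling without a running coro server.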

Test Plan

ruff format --check + ruff check clean on changed files
pyrefly check — 0 errors
pytest test/audio/test_asr_client.py test/api/endpoints/routers/asr — 11 passed (client error-mapping + endpoint + validation)
anonymizer audio consumer test passes (ASRDocument/ASRParagraph unchanged)
End-to-end smoke test (scripts/smoke_coro.py cpu): started coro (parakeet + NeMo diarization), transcribed a Spanish sample over SSE via both the raw OpenAI SDK and aymurai's client — identical 5-segment, diarized output, HTTP 200
Integration test test/integration/test_asr_pipeline.py requires a live coro server (set TRANSCRIBE_BASE_URL); excluded from normal runs
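
The error-mapping tests mentioned in the plan above could look roughly like the following sketch, which drives the coroutine with asyncio.run because the repo has no pytest-asyncio. `UpstreamConnectionError` is a stand-in for the SDK's connection error, and `transcribe_with_mapping`, `failing_call`, and `run_and_capture` are hypothetical names, not functions from the PR.

```python
import asyncio
from typing import Awaitable, Callable


class UpstreamConnectionError(Exception):
    """Stand-in for the SDK's connection error (an assumption for this sketch)."""


async def transcribe_with_mapping(call: Callable[[], Awaitable[str]]) -> str:
    """Run an async transcription call, re-raising upstream failures as RuntimeError."""
    try:
        return await call()
    except UpstreamConnectionError as exc:
        raise RuntimeError(f"ASR server unreachable: {exc}") from exc


async def failing_call() -> str:
    """Simulate a server that refuses the connection."""
    raise UpstreamConnectionError("connection refused")


def run_and_capture() -> str:
    """Drive the coroutine with asyncio.run, as a plain synchronous test would."""
    try:
        asyncio.run(transcribe_with_mapping(failing_call))
    except RuntimeError as exc:
        return str(exc)
    return "no error"


message = run_and_capture()
```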

Notes

coro runs separately: ./scripts/run-coro.sh [cpu|gpu], then set TRANSCRIBE_BASE_URL=http://localhost:8000/v1. Requires host ffmpeg.
docker-compose.yml also includes incidental YAML formatting normalization already present on the branch.
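
Because coro runs as a separate host process, a caller may want to wait for it before sending audio. The reviewer's guide says scripts/smoke_coro.py waits for /health; the sketch below shows one way such a wait could be written. It is not the script's code: `http_ok` and `wait_until_healthy` are hypothetical helpers, and the URL in the trailing comment assumes the health route sits at the server root on port 8000.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True when a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(
    probe: Callable[[], bool], attempts: int = 30, delay: float = 1.0
) -> bool:
    """Poll probe until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False


# Assumed usage against a locally started server:
# wait_until_healthy(lambda: http_ok("http://localhost:8000/health"))
```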

Summary by Sourcery

Replace the WebSocket-based WhisperLiveKit ASR integration with a coro-based SSE/OpenAI SDK transcription flow while preserving the public ASR API and document schema.

New Features:

Add coro-based ASR client that streams audio to an OpenAI-compatible /v1/audio/transcriptions endpoint over SSE and returns speaker-attributed segments.
Introduce CoroSegment metadata model and helper time parsing to represent coro transcription segments.
Add scripts to run the coro ASR server in CPU or GPU mode and a smoke test script to validate end-to-end SSE transcription via both the OpenAI SDK and the internal client.

Enhancements:

Update ASR transcription endpoint to consume coro segments, map them into ASRParagraphs, and use a base URL configuration instead of WebSocket URIs.
Inline the ASRParagraph schema instead of inheriting from TranscriptionItem and add robust HH:MM:SS/ISO8601 time parsing for start/end fields.
Simplify error handling by mapping transcription RuntimeErrors directly to UpstreamServiceError at the API layer.
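
The time parsing described above accepts three input shapes: HH:MM:SS, ISO 8601 PT#H#M#S, and numeric seconds. A minimal sketch of such a parser follows. It is not the PR's _parse_hhmmss; `parse_duration` is a hypothetical name, and the exact edge-case behaviour of the real helper is not known from this page.

```python
import re
from datetime import timedelta

# ISO 8601 time-only durations such as PT1H2M3S or PT4.5S.
_ISO_DURATION = re.compile(
    r"^PT(?:(?P<h>\d+(?:\.\d+)?)H)?(?:(?P<m>\d+(?:\.\d+)?)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?$"
)


def parse_duration(value) -> timedelta:
    """Parse HH:MM:SS, ISO 8601 PT#H#M#S, or numeric seconds into a timedelta."""
    if isinstance(value, timedelta):
        return value
    if isinstance(value, (int, float)):
        return timedelta(seconds=value)
    text = str(value).strip()
    match = _ISO_DURATION.match(text)
    if match and any(match.groupdict().values()):
        parts = {k: float(v) if v else 0.0 for k, v in match.groupdict().items()}
        return timedelta(hours=parts["h"], minutes=parts["m"], seconds=parts["s"])
    if ":" in text:
        # Exactly three fields are expected; anything else raises ValueError.
        hours, minutes, seconds = text.split(":")
        return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))
    return timedelta(seconds=float(text))
```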

Build:

Add openai as a runtime dependency and remove unused librosa from the project dependencies.

Deployment:

Document and configure the external coro ASR server, including environment variables for TRANSCRIBE_BASE_URL and spill directory, and remove the legacy whisperlivekit service from docker-compose.

Documentation:

Extend the README with instructions for running the coro ASR server, configuring the API connection, and running the end-to-end smoke test.

Tests:

Add unit tests for the coro-based ASR client behavior and error handling, and update ASR endpoint and integration tests to use TRANSCRIBE_BASE_URL instead of WebSocket URIs.
Add a smoke test script to verify the full coro ASR pipeline via SSE and the internal client.

Chores:

Normalize docker-compose YAML formatting for commands and healthchecks.

sourcery-ai · 2026-06-17T05:41:55Z

Reviewer's Guide

Replace the WhisperLiveKit WebSocket-based ASR integration with a coro-based OpenAI SSE transcription flow, introduce a CoroSegment schema and ASRParagraph refactor, wire the HTTP/SSE client into the ASR router and settings, adjust tests and dependencies, and add scripts/docs for running and smoke-testing coro.

Sequence diagram for coro-based SSE transcription flow

sequenceDiagram
    participant Client
    participant ASRRouter as ASR_transcribe_endpoint
    participant ASRService as _transcribe_audio_bytes_with_error_handling
    participant ASRClient as transcribe_audio_bytes
    participant OpenAIClient as AsyncOpenAI
    participant Coro as coro_server

    Client->>ASRRouter: POST /asr/transcribe (UploadFile)
    ASRRouter->>ASRService: _transcribe_audio_bytes_with_error_handling(data, filename, content_type)
    ASRService->>ASRClient: transcribe_audio_bytes(payload, filename, content_type)
    ASRClient->>OpenAIClient: AsyncOpenAI(base_url=TRANSCRIBE_BASE_URL)
    OpenAIClient->>Coro: audio.transcriptions.create(stream=True)
    loop SSE stream
        Coro-->>OpenAIClient: transcript.text.delta
    end
    Coro-->>OpenAIClient: transcript.text.done
    OpenAIClient-->>ASRClient: done event
    ASRClient->>ASRClient: json.loads(event.text) -> list[CoroSegment]
    ASRClient-->>ASRService: list[CoroSegment]
    ASRService->>ASRService: map CoroSegment -> ASRParagraph
    ASRService-->>ASRRouter: list[ASRParagraph]
    ASRRouter-->>Client: ASRDocument(document=list[ASRParagraph])
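
The mapping step in the diagram (CoroSegment to ASRParagraph) converts the speaker string to an int and passes start, end, and text through. The sketch below illustrates that step under stated assumptions: `Segment` and `Paragraph` are simplified stand-ins for the PR's models, and the speaker label format (a bare number or a label such as "SPEAKER_02") is a guess, since the page does not show what coro emits.

```python
import re
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Segment:
    """Simplified stand-in for CoroSegment."""
    speaker: str
    start: timedelta
    end: timedelta
    text: str


@dataclass
class Paragraph:
    """Simplified stand-in for ASRParagraph (other fields omitted)."""
    speaker_no: int
    start: timedelta
    end: timedelta
    text: str


def speaker_to_int(label: str) -> int:
    """Extract the first number in a speaker label (assumed label formats)."""
    match = re.search(r"\d+", label)
    if match is None:
        raise ValueError(f"speaker label has no number: {label!r}")
    return int(match.group())


def to_paragraphs(segments: list[Segment]) -> list[Paragraph]:
    """Map each segment to a paragraph, keeping timing and text unchanged."""
    return [
        Paragraph(speaker_to_int(s.speaker), s.start, s.end, s.text)
        for s in segments
    ]


sample = [
    Segment("SPEAKER_02", timedelta(seconds=0), timedelta(seconds=3), "buenos días"),
    Segment("1", timedelta(seconds=3), timedelta(seconds=5), "hola"),
]
paragraphs = to_paragraphs(sample)
```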

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Replace WebSocket WhisperLiveKit client with coro/OpenAI SSE-based async transcription client. | Remove librosa/websockets-based streaming logic and WebSocket message parsing/status handling. Introduce AsyncOpenAI-based client configured via TRANSCRIBE_BASE_URL/TRANSCRIBE_API_KEY and DONE_EVENT_TYPE constant. Implement streaming transcription to /v1/audio/transcriptions with stream=True and parse transcript.text.done into CoroSegment instances. Map OpenAI APIError/APIConnectionError to RuntimeError and raise when no done frame is received. | `aymurai/audio/asr_client.py` |
| Introduce coro-specific ASR schema and shared time parsing, and simplify ASRParagraph. | Add CoroSegment Pydantic model representing speaker-attributed segments from coro done frames, ignoring extra fields. Implement _parse_hhmmss to support HH:MM:SS, ISO 8601 PT#H#M#S, and numeric seconds as timedeltas. Refactor ASRParagraph into a standalone Pydantic model with speaker_no, optional speaker_name, start/end timedeltas parsed via _parse_hhmmss, and computed paragraph_id. Remove obsolete WhisperLiveKit WebSocket message schema module. | `aymurai/api/meta/asr/coro.py`, `aymurai/meta/api_interfaces.py`, `aymurai/api/meta/asr/websocket.py` |
| Wire coro-based transcription into the ASR router and settings, replacing WebSocket configuration. | Replace get_transcribe_ws_uri with get_transcribe_base_url and depend on TRANSCRIBE_BASE_URL in the transcribe endpoint. Change _transcribe_audio_bytes_with_error_handling to call the new transcribe_audio_bytes(payload, filename, content_type) and map RuntimeError directly to UpstreamServiceError. Map CoroSegment list into ASRParagraph list, converting speaker string to int and passing through start/end/text. Update tests and fixtures to use TRANSCRIBE_BASE_URL and CoroSegment-based mocking instead of WLKMessageStatus/lines. Update integration test skip condition to check TRANSCRIBE_BASE_URL instead of TRANSCRIBE_WS_URI. Add TRANSCRIBE_BASE_URL and TRANSCRIBE_API_KEY settings and remove TRANSCRIBE_WS_URI. | `aymurai/api/endpoints/routers/asr/transcribe.py`, `test/api/endpoints/routers/asr/test_transcribe.py`, `test/api/endpoints/routers/asr/conftest.py`, `test/integration/test_asr_pipeline.py`, `aymurai/settings.py` |
| Add smoke-testing and runtime tooling for coro and update documentation. | Add run-coro.sh script to start coro via uvx with configurable cpu/gpu mode, transcript spill dir, and port. Add smoke_coro.py script that starts coro, waits for /health, and transcribes a sample via both the OpenAI SDK and the aymurai client. Extend README with instructions for running coro, configuring TRANSCRIBE_BASE_URL, and running the smoke test. | `scripts/run-coro.sh`, `scripts/smoke_coro.py`, `README.md` |
| Add unit tests for the new ASR client behavior and adjust dependencies/infra. | Introduce test/audio/test_asr_client.py to cover happy-path done-frame parsing, missing done frame, missing base URL, and APIConnectionError mapping to RuntimeError. Adjust pyproject.toml dependencies by removing librosa and adding openai>=1.65.2 (capped by marker-pdf). Apply minor docker-compose.yml YAML normalization changes (array formatting and multiline command wrapping). | `test/audio/test_asr_client.py`, `pyproject.toml`, `docker-compose.yml`, `uv.lock` |

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

In aymurai/audio/asr_client.py, AsyncOpenAI is instantiated on every transcribe_audio_bytes call; consider reusing a single client instance (or injecting it) to avoid repeated connection setup overhead in high-traffic scenarios.
The transcribe router depends on get_transcribe_base_url but then ignores the base_url argument and has transcribe_audio_bytes re-read and re-validate settings.TRANSCRIBE_BASE_URL; it would be cleaner either to pass the resolved base_url (and potentially api_key) into the client, or to drop the unused dependency to avoid duplicated configuration checks.
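
Both points of feedback above could be addressed together by building the client once per configuration and passing the resolved base URL in from the router. The sketch below shows the idea with a stand-in class, since the real client type and its construction in the PR are not shown here. `TranscriptionClient`, `get_client`, and `resolve_client` are hypothetical names.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class TranscriptionClient:
    """Stand-in for the SDK client; a real one would hold a connection pool."""
    base_url: str
    api_key: str


@lru_cache(maxsize=8)
def get_client(base_url: str, api_key: str) -> TranscriptionClient:
    """Build one client per (base_url, api_key) pair and reuse it afterwards."""
    return TranscriptionClient(base_url=base_url, api_key=api_key)


def resolve_client(base_url: str, api_key: str = "unused") -> TranscriptionClient:
    """Use the injected base_url instead of re-reading settings in the client."""
    if not base_url:
        raise RuntimeError("TRANSCRIBE_BASE_URL is not configured")
    return get_client(base_url, api_key)


first = resolve_client("http://localhost:8000/v1")
second = resolve_client("http://localhost:8000/v1")
```

With this shape the configuration check lives in one place, and a test can override the router dependency to point at a different server. Whether a cached async client behaves well across event loops would need checking before adopting this approach.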

sourcery-ai · 2026-06-17T05:45:37Z

    file: UploadFile,
    use_cache: bool = True,
-    ws_uri: str = Depends(get_transcribe_ws_uri),
+    base_url: str = Depends(get_transcribe_base_url),


suggestion: The injected base_url dependency isn’t used, so configuration can’t be overridden per-request.

The endpoint now depends on get_transcribe_base_url, but base_url isn’t used and transcribe_audio_bytes still reads settings.TRANSCRIBE_BASE_URL directly. This prevents per-request overrides and leaves base_url as a dead parameter. Either pass base_url into transcribe_audio_bytes (and stop reading from settings there) or remove this dependency to avoid confusion.

sourcery-ai

New security issues found

sourcery-ai · 2026-06-22T00:10:57Z

+            result = subprocess.run(
+                [*_FFPROBE_ARGS, handle.name],
+                capture_output=True,
+                timeout=_FFPROBE_TIMEOUT_SECONDS,
+                check=True,
+            )


security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
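
For context on the finding above: when subprocess.run receives a list of arguments and no shell is involved, each element reaches the program as a single argument and is not re-parsed. Note also that Python's shlex module provides quote(), not escape(); quoting only matters when a command string is handed to a shell. The sketch below illustrates the list-argument form. The ffprobe flags are an assumption for illustration and are not the PR's _FFPROBE_ARGS; `probe_duration_command` and `run_argv` are hypothetical names.

```python
import subprocess


def probe_duration_command(path: str) -> list[str]:
    """Build an ffprobe argument list; the path stays one argv element."""
    return [
        "ffprobe",
        "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def run_argv(argv: list[str], timeout: float = 10.0) -> str:
    """Run argv without a shell and return its trimmed standard output."""
    result = subprocess.run(
        argv,
        capture_output=True,
        timeout=timeout,
        check=True,
        text=True,
    )
    return result.stdout.strip()
```

A call such as `run_argv(probe_duration_command(handle.name))` would then need ffprobe on the host, which matches the PR's note about requiring ffmpeg.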

Source: opengrep

- request response_format=diarized_json in both coro transcription calls so the server returns speaker-attributed segments
- bump openai>=2.43.0 (typed diarized_json literal) and override marker-pdf's openai<2.0.0 cap via tool.uv.override-dependencies (marker's openai LLM services are unused by aymurai)
- install ffmpeg in the API image for ffprobe-based audio duration probing
- add scripts/smoke_coro_diarization.py to verify diarization wiring against a running coro server with a 30s+ multi-speaker clip

jedzill4 added 14 commits June 17, 2026 00:56

feat(asr): replace TRANSCRIBE_WS_URI with TRANSCRIBE_BASE_URL/API_KEY

d344799

feat(asr): add coro boundary schema, remove WLK websocket schemas

979b4bb

refactor(asr): fold TranscriptionItem into ASRParagraph

9b0dace

docs(asr): add ASRParagraph docstring

93e1bdf

feat(asr): rewrite client to consume coro over openai SDK SSE

ce2f21c

test(asr): drive async client test with asyncio.run (repo has no pytest-asyncio)

46dda72

feat(asr): map coro segments to ASRParagraph in transcribe router

cb5f78b

test(asr): update transcribe tests for coro client

3e70e72

test(asr): gate integration test on TRANSCRIBE_BASE_URL

aeeca23

build(asr): add openai dep (1.x, capped by marker-pdf), drop librosa

6613bfa

chore(asr): drop whisperlivekit compose service (coro runs via uv tool)

8570bb7

ASR now runs as a host process via scripts/run-coro.sh; .env points at TRANSCRIBE_BASE_URL. Includes pre-existing compose YAML formatting normalization already present in the working tree on this branch.

docs(asr): add run-coro.sh (cpu/gpu) and README instructions

1e44ae8

test(asr): cover openai connection error -> RuntimeError mapping

1858149

test(asr): add coro SSE smoke test; pin uvx --python 3.12 in run-coro.sh

b147c49

sourcery-ai Bot reviewed Jun 17, 2026

jansaldo self-assigned this Jun 18, 2026

jansaldo approved these changes Jun 18, 2026

jansaldo and others added 3 commits June 19, 2026 17:20

feat(asr): add Argentine Spanish ASR evaluation notebook with transcription and metrics

3302a59

feat(asr): implement streaming transcription with progress estimation and SSE support

1c21990

Merge branch 'feat/coroasr-integration' of github.com:AymurAI/backend into feat/coroasr-integration

cd001a2

sourcery-ai Bot reviewed Jun 22, 2026

jedzill4 wants to merge 18 commits into release/v2.0.0 from feat/coroasr-integration


2 participants
