feat(asr): replace WhisperLiveKit with coro (OpenAI SDK + SSE) #85
jedzill4 wants to merge 18 commits into
ASR now runs as a host process via `scripts/run-coro.sh`, and `.env` sets `TRANSCRIBE_BASE_URL` to point at it. The branch also includes pre-existing compose YAML formatting normalization that was already present in the working tree.
Reviewer's Guide
Replace the WhisperLiveKit WebSocket-based ASR integration with a coro-based OpenAI SSE transcription flow, introduce a `CoroSegment` schema and `ASRParagraph` refactor, wire the HTTP/SSE client into the ASR router and settings, adjust tests and dependencies, and add scripts/docs for running and smoke-testing coro.
Sequence diagram for coro-based SSE transcription flow
sequenceDiagram
participant Client
participant ASRRouter as ASR_transcribe_endpoint
participant ASRService as _transcribe_audio_bytes_with_error_handling
participant ASRClient as transcribe_audio_bytes
participant OpenAIClient as AsyncOpenAI
participant Coro as coro_server
Client->>ASRRouter: POST /asr/transcribe (UploadFile)
ASRRouter->>ASRService: _transcribe_audio_bytes_with_error_handling(data, filename, content_type)
ASRService->>ASRClient: transcribe_audio_bytes(payload, filename, content_type)
ASRClient->>OpenAIClient: AsyncOpenAI(base_url=TRANSCRIBE_BASE_URL)
OpenAIClient->>Coro: audio.transcriptions.create(stream=True)
loop SSE stream
Coro-->>OpenAIClient: transcript.text.delta
end
Coro-->>OpenAIClient: transcript.text.done
OpenAIClient-->>ASRClient: done event
ASRClient->>ASRClient: json.loads(event.text) -> list[CoroSegment]
ASRClient-->>ASRService: list[CoroSegment]
ASRService->>ASRService: map CoroSegment -> ASRParagraph
ASRService-->>ASRRouter: list[ASRParagraph]
ASRRouter-->>Client: ASRDocument(document=list[ASRParagraph])
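The `ASRClient` step in the diagram could look roughly like the sketch below. It assumes the `openai` async streaming API and that coro's `transcript.text.done` frame carries a JSON array of segments. The `CoroSegment` fields (`speaker`, `start`, `end`, `text`) and the model name `"coro"` are hypothetical. The event consumer is duck-typed so it can be exercised without a live server, and it drains the whole stream before parsing so the HTTP response is fully consumed. Like the diagram, the sketch builds an `AsyncOpenAI` client per call.

```python
import json
from collections.abc import AsyncIterable
from dataclasses import dataclass
from typing import Any


@dataclass
class CoroSegment:
    # Hypothetical fields; the real boundary model lives in aymurai/api/meta/asr/coro.py.
    speaker: str
    start: str  # "HH:MM:SS" timestamp as sent by the server
    end: str
    text: str


async def collect_segments(events: AsyncIterable[Any]) -> list[CoroSegment]:
    """Drain an SSE event stream and parse the JSON carried by the done frame."""
    done_text: str | None = None
    async for event in events:
        # transcript.text.delta frames only carry incremental text; the full
        # segment list arrives once, in the transcript.text.done frame.
        if getattr(event, "type", None) == "transcript.text.done":
            done_text = event.text
    if done_text is None:
        raise RuntimeError("stream ended without a transcript.text.done event")
    return [CoroSegment(**item) for item in json.loads(done_text)]


async def transcribe_audio_bytes(
    payload: bytes,
    filename: str,
    content_type: str,
    *,
    base_url: str,
    api_key: str = "EMPTY",  # explicit placeholder so the SDK does not require OPENAI_API_KEY
    model: str = "coro",  # hypothetical model identifier
) -> list[CoroSegment]:
    # Imported lazily so collect_segments is usable without openai installed.
    from openai import APIError, AsyncOpenAI

    client = AsyncOpenAI(base_url=base_url, api_key=api_key)
    try:
        stream = await client.audio.transcriptions.create(
            model=model,
            file=(filename, payload, content_type),
            stream=True,
        )
        return await collect_segments(stream)
    except APIError as exc:  # APIConnectionError is a subclass of APIError
        raise RuntimeError(f"coro transcription failed: {exc}") from exc
```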
Hey - I've found 1 issue, and left some high level feedback:
- In `aymurai/audio/asr_client.py`, `AsyncOpenAI` is instantiated on every `transcribe_audio_bytes` call; consider reusing a single client instance (or injecting it) to avoid repeated connection setup overhead in high-traffic scenarios.
- The `transcribe` router depends on `get_transcribe_base_url` but then ignores the `base_url` argument and has `transcribe_audio_bytes` re-read and re-validate `settings.TRANSCRIBE_BASE_URL`; it would be cleaner either to pass the resolved `base_url` (and potentially `api_key`) into the client, or to drop the unused dependency to avoid duplicated configuration checks.
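One way to address the client-reuse point is a small registry that builds one long-lived client per `(base_url, api_key)` pair. This is a sketch: `ASRClientRegistry` and its factory hook are hypothetical names, and the factory is injectable so the cache can be tested without `openai`. Under asyncio there is no `await` between the lookup and the insert in `get`, so concurrent coroutines cannot create duplicate clients.

```python
from collections.abc import Callable
from typing import Any

ClientFactory = Callable[[str, str], Any]


def _openai_factory(base_url: str, api_key: str) -> Any:
    # Imported lazily so the registry itself has no hard dependency on openai.
    from openai import AsyncOpenAI

    return AsyncOpenAI(base_url=base_url, api_key=api_key)


class ASRClientRegistry:
    """Hands out one long-lived client per (base_url, api_key) pair.

    Reusing the client keeps its HTTP connection pool alive instead of
    rebuilding it on every transcription request.
    """

    def __init__(self, factory: ClientFactory = _openai_factory) -> None:
        self._factory = factory
        self._clients: dict[tuple[str, str], Any] = {}

    def get(self, base_url: str, api_key: str = "EMPTY") -> Any:
        key = (base_url, api_key)
        if key not in self._clients:
            self._clients[key] = self._factory(base_url, api_key)
        return self._clients[key]

    async def aclose(self) -> None:
        # Intended for application shutdown, e.g. a FastAPI lifespan handler.
        for client in self._clients.values():
            await client.close()
        self._clients.clear()
```

A module-level `ASRClientRegistry()` (or one stored on app state) would then replace the per-call `AsyncOpenAI(...)` construction.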
## Individual Comments
### Comment 1
<location path="aymurai/api/endpoints/routers/asr/transcribe.py" line_range="91" />
<code_context>
file: UploadFile,
use_cache: bool = True,
- ws_uri: str = Depends(get_transcribe_ws_uri),
+ base_url: str = Depends(get_transcribe_base_url),
session: Session = Depends(get_session),
) -> ASRDocument:
</code_context>
<issue_to_address>
**suggestion:** The injected `base_url` dependency isn’t used, so configuration can’t be overridden per-request.
The endpoint now depends on `get_transcribe_base_url`, but `base_url` isn’t used and `transcribe_audio_bytes` still reads `settings.TRANSCRIBE_BASE_URL` directly. This prevents per-request overrides and leaves `base_url` as a dead parameter. Either pass `base_url` into `transcribe_audio_bytes` (and stop reading from settings there) or remove this dependency to avoid confusion.
</issue_to_address>
…iption and metrics
… into feat/coroasr-integration
result = subprocess.run(
    [*_FFPROBE_ARGS, handle.name],
    capture_output=True,
    timeout=_FFPROBE_TIMEOUT_SECONDS,
    check=True,
)
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
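This finding is likely a false positive, assuming `_FFPROBE_ARGS` is a static tuple and `handle.name` comes from `tempfile` rather than from the uploaded filename. With a list argv and the default `shell=False`, `subprocess.run` executes the program directly, so no shell ever parses the path. Quoting (the `shlex` module provides `shlex.quote()` for this) only matters for strings handed to a shell. The temp path is also absolute, so it cannot be mistaken for an ffprobe option. The sketch below is hypothetical: the argument list, timeout, and helper names are not taken from the PR.

```python
import json
import subprocess
import tempfile

# Hypothetical static argument list; the PR's actual _FFPROBE_ARGS may differ.
FFPROBE_ARGS = (
    "ffprobe",
    "-v", "error",
    "-show_entries", "format=duration",
    "-of", "json",
)
FFPROBE_TIMEOUT_SECONDS = 10.0


def build_ffprobe_argv(path: str) -> list[str]:
    # The path is a single argv element; with no shell involved, characters
    # such as ";" or "$(...)" in it are never interpreted as shell syntax.
    return [*FFPROBE_ARGS, path]


def parse_duration(stdout: bytes) -> float:
    # ffprobe's JSON writer reports format.duration as a string, e.g. "31.250000".
    return float(json.loads(stdout)["format"]["duration"])


def probe_duration_seconds(payload: bytes, suffix: str = ".wav") -> float:
    # The file name is generated by tempfile, not taken from the client upload.
    with tempfile.NamedTemporaryFile(suffix=suffix) as handle:
        handle.write(payload)
        handle.flush()
        result = subprocess.run(
            build_ffprobe_argv(handle.name),
            capture_output=True,
            timeout=FFPROBE_TIMEOUT_SECONDS,
            check=True,
        )
    return parse_duration(result.stdout)
```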
- request `response_format=diarized_json` in both coro transcription calls so the server returns speaker-attributed segments
- bump `openai>=2.43.0` (typed `diarized_json` literal) and override marker-pdf's `openai<2.0.0` cap via `tool.uv.override-dependencies` (marker's openai LLM services are unused by aymurai)
- install `ffmpeg` in the API image for ffprobe-based audio duration probing
- add `scripts/smoke_coro_diarization.py` to verify diarization wiring against a running coro server with a 30s+ multi-speaker clip
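To keep both call sites asking for the same format, the request arguments can be built in one place. Below is a sketch of that, plus the sort of multi-speaker check a diarization smoke test might perform. `build_transcription_kwargs`, `check_multi_speaker`, the `"coro"` model name, and the `"speaker"` segment key are all hypothetical.

```python
from typing import Any


def build_transcription_kwargs(
    payload: bytes,
    filename: str,
    content_type: str,
    model: str = "coro",  # hypothetical model identifier
) -> dict[str, Any]:
    """Shared kwargs for client.audio.transcriptions.create (hypothetical helper)."""
    return {
        "model": model,
        "file": (filename, payload, content_type),
        "response_format": "diarized_json",  # ask for speaker-attributed segments
        "stream": True,  # deliver transcript.text.* SSE events
    }


def check_multi_speaker(segments: list[dict[str, Any]], min_speakers: int = 2) -> set[str]:
    """Smoke-test style check that diarization produced several speakers."""
    # Assumes each segment dict carries a "speaker" label (hypothetical key).
    speakers = {seg["speaker"] for seg in segments if seg.get("speaker")}
    if len(speakers) < min_speakers:
        raise AssertionError(
            f"expected at least {min_speakers} speakers, got {sorted(speakers)}"
        )
    return speakers
```

Call sites would then use `await client.audio.transcriptions.create(**build_transcription_kwargs(payload, filename, content_type))`.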
Summary
Replaces the WhisperLiveKit WebSocket ASR integration with coro, an OpenAI-compatible ASR + speaker-diarization server. aymurai now transcribes via the official `openai` SDK over SSE (`stream=True`), and coro runs as an isolated host process via `uv tool`/`uvx` (no dependency conflicts). The public API, DB schema, `ASRParagraph`/`ASRDocument`, and the anonymizer consumer are unchanged.
Changes
- `aymurai/audio/asr_client.py` rewritten (186→62 lines): `transcribe_audio_bytes(payload, filename, content_type)` streams to coro's `/v1/audio/transcriptions` and parses the `transcript.text.done` frame into `list[CoroSegment]`. Maps `openai` `APIError`/`APIConnectionError` → `RuntimeError`.
- Removed `aymurai/api/meta/asr/websocket.py` (all `WLKMessage*`); added `aymurai/api/meta/asr/coro.py` with the `CoroSegment` boundary model + `_parse_hhmmss`. Folded `TranscriptionItem` into `ASRParagraph`.
- `get_transcribe_base_url` + `CoroSegment → ASRParagraph` mapping; `TRANSCRIBE_WS_URI` → `TRANSCRIBE_BASE_URL` (+ `TRANSCRIBE_API_KEY`).
- `openai>=1.65.2` (capped `<2` by `marker-pdf`); removed `librosa` (only the old client used it).
- Removed the `whisperlivekit` docker-compose service; added `scripts/run-coro.sh` (cpu/gpu, pins `uvx --python 3.12`), `scripts/smoke_coro.py`, and a README section.
Test Plan
- `ruff format --check` + `ruff check` clean on changed files
- `pyrefly check`: 0 errors
- `pytest test/audio/test_asr_client.py test/api/endpoints/routers/asr`: 11 passed (client error-mapping + endpoint + validation)
- … `ASRDocument`/`ASRParagraph` unchanged)
- … `scripts/smoke_coro.py cpu`): started coro (parakeet + NeMo diarization) and transcribed a Spanish sample over SSE via both the raw OpenAI SDK and aymurai's client; both produced identical 5-segment, diarized output with HTTP 200
- `test/integration/test_asr_pipeline.py` requires a live coro server (set `TRANSCRIBE_BASE_URL`); excluded from normal runs
Notes
- Run `./scripts/run-coro.sh [cpu|gpu]`, then set `TRANSCRIBE_BASE_URL=http://localhost:8000/v1`. Requires host `ffmpeg`.
- `docker-compose.yml` also includes incidental YAML formatting normalization already present on the branch.
Summary by Sourcery
Replace the WebSocket-based WhisperLiveKit ASR integration with a coro-based SSE/OpenAI SDK transcription flow while preserving the public ASR API and document schema.
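The schema-preserving boundary described here (coro segments mapped onto the unchanged `ASRParagraph`, with `_parse_hhmmss` converting timestamps) might look roughly like this sketch. `parse_hhmmss`, `Paragraph`, and the segment keys are hypothetical stand-ins for the real `_parse_hhmmss`, `ASRParagraph`, and `CoroSegment`, whose exact fields are not shown in this PR.

```python
from dataclasses import dataclass


def parse_hhmmss(value: str) -> float:
    """Convert "HH:MM:SS" or "HH:MM:SS.fff" to seconds (stand-in for _parse_hhmmss)."""
    parts = value.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected HH:MM:SS, got {value!r}")
    hours, minutes, seconds = int(parts[0]), int(parts[1]), float(parts[2])
    if hours < 0 or not (0 <= minutes < 60) or not (0 <= seconds < 60):
        raise ValueError(f"out-of-range timestamp: {value!r}")
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class Paragraph:
    # Hypothetical subset of ASRParagraph fields.
    speaker: str
    start: float  # seconds from the start of the audio
    end: float
    text: str


def to_paragraph(segment: dict[str, str]) -> Paragraph:
    # Only this boundary function knows the coro wire format; downstream
    # consumers keep seeing the same paragraph shape as before.
    return Paragraph(
        speaker=segment["speaker"],
        start=parse_hhmmss(segment["start"]),
        end=parse_hhmmss(segment["end"]),
        text=segment["text"].strip(),
    )
```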