feat(eot): add audio eot model support by chenghao-mou · Pull Request #1613 · livekit/agents-js

chenghao-mou · 2026-05-27T00:16:23Z

Description

Add an audio end-of-turn (EOT) model with local inference support, and deprecate the silero and turn detector plugins.

Changes Made

Add audio EOT model support for both cloud and local inference
Deprecate the silero and turn detector plugins

Pre-Review Checklist

Build passes: All builds (lint, typecheck, tests) pass locally
AI-generated code reviewed: Removed unnecessary comments and ensured code quality
Changes explained: All changes are properly documented and justified above
Scope appropriate: All changes relate to the PR title, or explanations provided for why they're included
Video demo: A small video demo showing changes works as expected and did not break any existing functionality using Agent Playground (if applicable)

Testing

Automated tests added/updated (if applicable)
All tests pass
Make sure both restaurant_agent.ts and realtime_agent.ts work properly (for major changes)

Additional Notes

Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.

changeset-bot · 2026-05-27T00:16:29Z

🦋 Changeset detected

Latest commit: 86d8aa7

The changes in this PR will be included in the next version bump.


toubatbrian · 2026-05-27T22:23:22Z

+ *
+ * Port of Python `livekit.agents.inference.eot.detector`.
+ */
+import type { InferenceExecutor } from '../../ipc/inference_executor.js';


I wonder whether there will be any code overlap between the EOT under inference and the turn detector plugins. Or is this specific to audio-based EOT?

Both VAD and the turn detector plugins are getting deprecated, so the overlap will be short-term.

…frame

The AudioFrame emitted on START_OF_SPEECH / END_OF_SPEECH sliced off the prefix-padding samples but still reported `samplesPerChannel = speechBufferIndex`, so the frame's metadata claimed more samples than its data contained and downstream consumers (STT, transcription) lost the pre-roll context the buffer machinery is designed to preserve.

Slice from 0 instead so data length matches samplesPerChannel and the prefix-padding pre-roll is delivered, matching the Python original.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
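The invariant this commit restores can be illustrated with a minimal sketch. The `Frame` interface and both helper functions are hypothetical stand-ins for illustration (mono 16-bit audio assumed); they are not the actual `AudioFrame` class or the code changed in this PR.

```typescript
// Hypothetical, simplified stand-in for an audio frame (mono, 16-bit PCM).
interface Frame {
  data: Int16Array;
  samplesPerChannel: number;
}

// Buggy variant: drops the prefix-padding samples from the data but still
// reports the full buffer index as the sample count.
function sliceSkippingPrefix(
  buf: Int16Array,
  speechBufferIndex: number,
  prefixSamples: number,
): Frame {
  return {
    data: buf.subarray(prefixSamples, speechBufferIndex),
    samplesPerChannel: speechBufferIndex,
  };
}

// Fixed variant: slices from 0, so the pre-roll is kept and the data length
// agrees with the reported sample count.
function sliceFromStart(buf: Int16Array, speechBufferIndex: number): Frame {
  return {
    data: buf.subarray(0, speechBufferIndex),
    samplesPerChannel: speechBufferIndex,
  };
}

const buffer = new Int16Array(1000);
const buggy = sliceSkippingPrefix(buffer, 800, 200);
const fixed = sliceFromStart(buffer, 800);

console.log(buggy.data.length, buggy.samplesPerChannel); // 600 800 (mismatch)
console.log(fixed.data.length, fixed.samplesPerChannel); // 800 800 (consistent)
```

A consumer that trusts `samplesPerChannel` would over-read the buggy frame by the number of prefix samples, which is the mismatch the commit message describes.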

devin-ai-integration

Devin Review found 1 new potential issue.

View 21 additional findings in Devin Review.

devin-ai-integration · 2026-05-29T12:40:57Z

+  protected override async _run(): Promise<void> {
+    while (true) {
+      try {
+        await this._raceWithSwap(this._transport.run());
+        return;
+      } catch (err) {
+        if (err instanceof SwapAbortError) {
+          if (this._closing) return;
+          // A swap already happened (e.g. predict timeout → fallback).
+          // The new transport is mounted; loop and run it. Routing the
+          // swap through `SwapAbortError` (rather than through the
+          // cloud/local branch below) is what prevents the "timeout
+          // flips backend mid-await" misclassification — the catch
+          // exits early before ever consulting `_backend`.
+          continue;
+        }
+        const e = err instanceof Error ? err : new Error(String(err));
+        if (this._backend === 'cloud') {
+          this._fallBackToLocal(e);
+          continue;
+        }
+        this._onLocalFailure(e);
+        return;
+      }
+    }


🟡 ReadableStream reader lock conflict causes local transport to crash after cloud→local fallback

After a cloud→local fallback (triggered by either a transport error or predictEndOfTurn timeout), the LocalTransport.run() throws a TypeError because the audio channel's ReadableStream is already locked by the orphaned cloud transport's drain task.

The flow: CloudTransport._runOnce() creates a drainAudioTask that calls stream._drainAudioChannel() (agents/src/inference/eot/base.ts:610-626), which holds a reader lock on this._audioChannel.stream(). When _fallBackToLocal fires (agents/src/inference/eot/detector.ts:231), it calls this._transport.detach() — but CloudTransport.detach() (agents/src/inference/eot/transports.ts:342-345) only closes the send channel and clears _ws; it does NOT cancel or release the drain task's reader. The old drain task's reader.read() remains pending indefinitely.

When the _run loop continues and runs LocalTransport.run() → stream._drainAudioChannel(), it calls this._audioChannel.stream().getReader() on the same ReadableStream instance (since createStreamChannel().stream() always returns the same underlying transform.readable — see agents/src/stream/stream_channel.ts:23). This throws TypeError ("ReadableStream is locked").
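The locking behaviour described here is standard Web Streams semantics and can be reproduced in isolation. This is a minimal sketch assuming a runtime with a global `ReadableStream` (for example a recent Node.js); it does not use the PR's `createStreamChannel` and only shows that a second `getReader()` on a locked stream throws.

```typescript
// A single ReadableStream instance can have at most one active reader.
const stream = new ReadableStream<number>({
  start(controller) {
    controller.enqueue(1);
  },
});

const first = stream.getReader(); // acquires the lock

let lockError: unknown;
try {
  stream.getReader(); // second reader on the same, still-locked instance
} catch (err) {
  lockError = err;
}
const lockedWhileHeld = stream.locked;
console.log(lockedWhileHeld, lockError instanceof TypeError);

// Once the first reader releases its lock, another reader can attach.
first.releaseLock();
const lockedAfterRelease = stream.locked;
const second = stream.getReader();
second.releaseLock();
console.log(lockedAfterRelease);
```

The same mechanism applies when two drain loops call `getReader()` on one shared stream: whichever arrives second fails until the first releases its lock.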

Impact and partial self-recovery

The TypeError is caught by the _run loop as a local failure, which logs a misleading "local audio eot mini failed" warning and resolves the in-flight prediction with 1.0. However, the system partially recovers because the orphaned cloud drain task continues calling this._transport.pushFrame(value) (where this._transport is now the local transport) — so the local ring buffer IS still fed with audio. Subsequent warmup() → startInference() calls snapshot the buffer and run inference successfully. Net effect: one spurious warning + one prediction defaulting to 1.0 on the first turn after fallback.

Prompt for agents

The root cause is that _drainAudioChannel() (base.ts:610) acquires a reader lock on the stream's _audioChannel that is never released when the cloud transport is detached during fallback. The local transport then cannot get its own reader.

Possible approaches:

1. Make _drainAudioChannel signal-aware: accept an AbortSignal, and when aborted, break out of the while loop so the finally block releases the reader. The cloud transport's _runOnce drainAudioTask should pass its controller's signal, and detach() should abort it.
2. Alternatively, have _fallBackToLocal close and recreate the _audioChannel, so the local transport gets a fresh stream. This would require draining any buffered-but-unread frames first.
3. Skip calling stream._drainAudioChannel() in LocalTransport.run() when the stream detects that another drain is already active (e.g. a flag). Since the orphaned drain already feeds the local buffer, the local transport doesn't strictly need its own drain loop.

Approach 1 is cleanest and aligns with the existing Task cancellation pattern: the drain task already receives a controller via Task.from, but _drainAudioChannel never consults it.
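Approach 1 could look roughly like the sketch below. The `drain` and `demo` functions are hypothetical and self-contained; they are not the PR's `_drainAudioChannel`, and the sketch assumes a runtime that follows the current Streams specification, where releasing a lock with a read still pending rejects that read instead of throwing.

```typescript
// Hypothetical signal-aware drain loop: reads until the stream ends or the
// signal aborts, and always releases the reader lock on the way out.
async function drain<T>(
  stream: ReadableStream<T>,
  onValue: (value: T) => void,
  signal: AbortSignal,
): Promise<void> {
  const reader = stream.getReader();
  // Resolves with null (never rejects) once the signal aborts.
  const aborted = new Promise<null>((resolve) => {
    if (signal.aborted) resolve(null);
    else signal.addEventListener('abort', () => resolve(null), { once: true });
  });
  try {
    while (true) {
      const result = await Promise.race([reader.read(), aborted]);
      if (result === null || result.done) break;
      onValue(result.value);
    }
  } finally {
    // Frees the stream so a replacement consumer can call getReader().
    reader.releaseLock();
  }
}

// Demo: the first consumer is aborted, then a second reader attaches.
async function demo(): Promise<{
  seen: number[];
  lockedAfterAbort: boolean;
  next: number | undefined;
}> {
  let ctrl!: ReadableStreamDefaultController<number>;
  const stream = new ReadableStream<number>({
    start(c) {
      ctrl = c;
    },
  });
  const ac = new AbortController();
  const seen: number[] = [];
  const task = drain(stream, (v) => seen.push(v), ac.signal);

  ctrl.enqueue(1);
  await new Promise((r) => setTimeout(r, 10));
  ac.abort(); // analogous to detach() cancelling the old drain
  await task;
  const lockedAfterAbort = stream.locked;

  ctrl.enqueue(2);
  const reader = stream.getReader(); // succeeds: the lock was released
  const { value } = await reader.read();
  reader.releaseLock();
  return { seen, lockedAfterAbort, next: value };
}
```

Because the abort is observed between reads, a chunk enqueued after the abort stays in the stream's queue and is delivered to the next reader rather than being dropped.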


…dal-eou # Conflicts: # agents/src/voice/agent_activity.ts # agents/src/voice/agent_session.ts # agents/src/voice/audio_recognition.ts # examples/src/gemini_realtime_agent.ts # examples/src/runway_avatar.ts

devin-ai-integration

Devin Review found 1 new potential issue.

View 22 additional findings in Devin Review.

devin-ai-integration · 2026-05-29T16:50:23Z

+    let speakingWon = false;
+    try {
+      await Promise.race([
+        inner(innerController),
+        this.userSpeakingEvent.waitOnce(controller.signal).then(() => {
+          speakingWon = true;
+        }),
+      ]);
+      if (speakingWon) {
+        this.logger.debug(context, 'user spoke during endpointing, cancelling end of turn task');
+      }
+    } finally {
+      controller.signal.removeEventListener('abort', onOuterAbort);
+      // If the speaking-event branch won (or the outer was aborted), tear
+      // down the inner bounce so it doesn't keep awaiting the delay.
+      innerController.abort();
+    }


🟡 Unhandled promise rejection when speaking-guard aborts the inner bounce task

In bounceEOUTaskWithSpeakingGuard, when the userSpeakingEvent wins the Promise.race at line 630, the inner(innerController) promise is orphaned (no one awaits it). The finally block at line 643 then calls innerController.abort(), which causes the inner function's await delay(...) (at the original bounceEOUTask line ~1492) to reject with an AbortError. Because the race already resolved, this rejection is unhandled and will produce UnhandledPromiseRejection warnings in Node.js. The outer Task's .catch() at agents/src/voice/audio_recognition.ts:1576 only wraps the guard function itself, not the orphaned inner promise.

Prompt for agents

In bounceEOUTaskWithSpeakingGuard (audio_recognition.ts around line 610-644), the Promise.race between inner(innerController) and the speaking-event waiter leaves the inner promise unhandled when speaking wins. When innerController.abort() fires in the finally block, the inner function's delay() rejects with AbortError that nobody catches.

Fix: capture the inner promise in a variable before the race, and add a `.catch(() => {})` to suppress the expected AbortError after the race completes (similar to how fire-and-forget promises are handled elsewhere in the codebase). For example: `const innerPromise = inner(innerController);` then `innerPromise.catch(() => {});` to prevent an unhandled rejection when the inner task is aborted after speaking wins, then `await Promise.race([innerPromise, ...]);`.

Alternatively, await the innerPromise in a try/catch after the race to ensure the rejection is consumed: `try { await innerPromise; } catch { /* expected abort */ }`.

The second approach is cleaner because it ensures the inner task has fully torn down before the guard function returns.
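The second approach can be sketched in isolation as follows. `delay` and `guardedBounce` are hypothetical helpers written for this example, not the functions in audio_recognition.ts. However the runtime reports the late rejection, awaiting the inner promise after the abort guarantees that the inner task has settled before the guard returns.

```typescript
// Hypothetical abortable delay: rejects with an error named 'AbortError'
// if the signal aborts before the timer fires.
function delay(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(resolve, ms);
    signal.addEventListener(
      'abort',
      () => {
        clearTimeout(timer);
        const err = new Error('aborted');
        err.name = 'AbortError';
        reject(err);
      },
      { once: true },
    );
  });
}

// Hypothetical guard: races the inner task against a "user spoke" promise,
// then aborts the inner task and waits for it to settle.
async function guardedBounce(
  inner: (controller: AbortController) => Promise<void>,
  userSpoke: Promise<void>,
): Promise<boolean> {
  const innerController = new AbortController();
  const innerPromise = inner(innerController);
  let speakingWon = false;
  try {
    await Promise.race([
      innerPromise,
      userSpoke.then(() => {
        speakingWon = true;
      }),
    ]);
  } finally {
    innerController.abort();
    try {
      await innerPromise; // consume the rejection and wait for teardown
    } catch {
      /* expected abort */
    }
  }
  return speakingWon;
}

// Demo: the user speaks after 10 ms while the inner task would wait 1 s.
async function demoSpeakingWins(): Promise<boolean> {
  const userSpoke = new Promise<void>((resolve) => setTimeout(resolve, 10));
  return guardedBounce((c) => delay(1000, c.signal), userSpoke);
}

// Demo: the inner task finishes first; the user never speaks.
async function demoInnerWins(): Promise<boolean> {
  const never = new Promise<void>(() => {});
  return guardedBounce((c) => delay(5, c.signal), never);
}
```

If the inner task fails with a real error before the race settles, that error still propagates from the `try` block; the `catch` in `finally` only consumes the second observation of the same settled promise.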


feat(eot): add audio eot model support

b4ad9eb

add audio eot model and local inference support, deprecating silero and turn detector plugins

Create busy-aliens-wink.md

96c4563

chenghao-mou requested a review from a team May 27, 2026 09:27

more clean up and refactoring

80e6ab3

chenghao-mou marked this pull request as ready for review May 27, 2026 18:24

toubatbrian reviewed May 27, 2026

chenghao-mou added 2 commits May 28, 2026 16:05

more refactoring and clean up

4dc88ce

more refactoring and clean up

e7fdb49

address comment

efaed38

address comment

242a5ff

devin-ai-integration Bot reviewed May 29, 2026

rename backend to model

eec5078

Merge remote-tracking branch 'origin/main' into feat/AGT-2520-multimo…

2832616

…dal-eou # Conflicts: # agents/src/voice/agent_activity.ts # agents/src/voice/agent_session.ts # agents/src/voice/audio_recognition.ts # examples/src/gemini_realtime_agent.ts # examples/src/runway_avatar.ts

devin-ai-integration Bot reviewed May 29, 2026

address comments

86d8aa7

chenghao-mou wants to merge 11 commits into main from feat/AGT-2520-multimodal-eou


2 participants
