AUDIO-1: audio output service specification by JarbasAl · Pull Request #38 · OpenVoiceOS/architecture

JarbasAl · 2026-05-27T22:25:58Z

Companion issue: #49

Summary

Defines the audio output service — the pipeline's output-side counterpart that consumes ovos.utterance.speak and renders natural-language responses as audio.

What the spec covers

§3 — Rendering pipeline: dialog-transformer chain → TTS synthesis → TTS-transformer chain → playback queue
§4 — Sequential playback queue shared between TTS speech (ovos.utterance.speak) and sound effects
§4.1 — Queued sounds: ovos.audio.queue for scheduled playback in queue order
§4.2 — Immediate sounds: ovos.audio.play_sound (plays without queuing)
§5 — Output lifecycle signals: ovos.audio.output.started / ovos.audio.output.ended (session identity from context.session.session_id)
§5.3 — Speaking-status query: ovos.audio.is_speaking (session-scoped via context, not data)
§6 — Stop integration: ovos.audio.stop and universal ovos.stop; MAY scope response to session
§7 — Listen trigger: ovos.mic.listen emitted after playback ends when listen: true on the speak message
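
A minimal Python sketch of how the stages listed above might fit together. Only the topic names, the `listen` flag, and `context.session.session_id` come from this description. The `Message` class, the `AudioOutputSketch` name, the `utterance` field name, and the choice to emit one started/ended pair per queued item are illustrative assumptions, not the spec's wording.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for a bus message (topic, data, context)."""
    topic: str
    data: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)


class AudioOutputSketch:
    """Toy output service: transform, synthesise, queue, play, signal."""

    def __init__(self, emit, dialog_transformers=(), tts_transformers=()):
        self.emit = emit  # callback that publishes a Message on the bus
        self.dialog_transformers = list(dialog_transformers)  # str -> str
        self.tts_transformers = list(tts_transformers)        # bytes -> bytes
        self.queue = deque()  # FIFO of (audio, originating request)

    def synthesise(self, text):
        # Placeholder for a real TTS engine: text in, audio bytes out.
        return text.encode("utf-8")

    def play(self, audio):
        # Placeholder for blocking playback on a real audio device.
        pass

    def handle_speak(self, request):
        """Run the rendering stages in order, then enqueue the audio."""
        text = request.data["utterance"]  # assumed field name
        for transform in self.dialog_transformers:
            text = transform(text)
        audio = self.synthesise(text)
        for transform in self.tts_transformers:
            audio = transform(audio)
        self.queue.append((audio, request))

    def drain(self):
        """Play queued items one at a time, oldest first."""
        while self.queue:
            audio, request = self.queue.popleft()
            ctx = request.context  # carries session.session_id
            self.emit(Message("ovos.audio.output.started", context=ctx))
            self.play(audio)
            self.emit(Message("ovos.audio.output.ended", context=ctx))
            if request.data.get("listen"):
                # Emitted only after playback has ended.
                self.emit(Message("ovos.mic.listen", context=ctx))
```

Passing `sent.append` as `emit` for some list `sent` makes the ordering easy to inspect: a speak request with `listen: true` should leave started, ended, then `ovos.mic.listen` in that list, all carrying the request's context.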

Bus surface

Topic	Purpose
`ovos.utterance.speak`	TTS request (PIPELINE-1 §9.6)
`ovos.audio.queue`	Queue sound for sequential playback
`ovos.audio.play_sound`	Play sound immediately
`ovos.audio.stop`	Stop playback and clear queue
`ovos.audio.is_speaking`	Speaking-status query
`ovos.audio.output.started`	Playback session began
`ovos.audio.output.ended`	Playback session ended
`ovos.mic.listen`	Signal listener to start capture
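
The difference between the queued, immediate, and stop topics in the table above can be sketched as a small router. This is an illustration only: the `SoundRouter` name, the `play_now` callback, and the `uri` payload field are assumptions, and a real service would also interrupt whatever is currently playing on stop.

```python
from collections import deque


class SoundRouter:
    """Routes the three sound-related topics to different behaviours."""

    def __init__(self, play_now):
        self.play_now = play_now  # hypothetical callback: plays a URI at once
        self.queue = deque()      # pending URIs, oldest first

    def handle(self, topic, data):
        if topic == "ovos.audio.queue":
            # Queued sounds wait their turn behind earlier items.
            self.queue.append(data["uri"])
        elif topic == "ovos.audio.play_sound":
            # Immediate sounds bypass the queue entirely.
            self.play_now(data["uri"])
        elif topic == "ovos.audio.stop":
            # Stop drops everything still pending.
            self.queue.clear()
```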

Summary by CodeRabbit

Documentation
- Added comprehensive Audio Output Service specification defining audio delivery modes (local queued playback and remote client delivery).
- Documented lifecycle signals for audio output (started/ended events).
- Specified queue behavior, stop integration, and listen-triggering mechanics.
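
One way a client could answer a session-scoped speaking-status query from the started/ended lifecycle signals is sketched below. The `SpeakingStatus` class and the `"default"` fallback session id are assumptions made for illustration; the description only states that session identity is read from `context.session.session_id` rather than from the message data.

```python
class SpeakingStatus:
    """Tracks which sessions currently have audio playing."""

    def __init__(self):
        self._active = set()

    @staticmethod
    def _session_id(context):
        # Session identity is read from the context, not the data payload.
        return context.get("session", {}).get("session_id", "default")

    def on_started(self, context):
        # Call when ovos.audio.output.started is observed.
        self._active.add(self._session_id(context))

    def on_ended(self, context):
        # Call when ovos.audio.output.ended is observed.
        self._active.discard(self._session_id(context))

    def is_speaking(self, context):
        return self._session_id(context) in self._active
```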

coderabbitai · 2026-05-27T22:26:04Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5d491844-e859-49f4-bbe8-d3e148e424b7

📥 Commits

Reviewing files that changed from the base of the PR and between 62cbdf6 and 52ef5aa.

📒 Files selected for processing (4)

CHANGELOG.md
appendix/divergences.md
appendix/rationale.md
audio-out.md

✅ Files skipped from review due to trivial changes (3)

appendix/rationale.md
CHANGELOG.md
appendix/divergences.md

📝 Walkthrough

Adds the OVOS-AUDIO-1 Audio Output Service specification (audio-out.md) defining rendering modes, FIFO playback queue, instant sound playback, lifecycle signals, listen/stop semantics, and conformance requirements. Supporting entries are added to CHANGELOG.md, appendix/divergences.md, and appendix/rationale.md.

Changes

OVOS-AUDIO-1 Audio Output Service Specification

Layer / File(s)	Summary
Audio Output Service Specification and Changelog `audio-out.md`, `CHANGELOG.md`	`audio-out.md` introduces OVOS-AUDIO-1 (Spec Version 2) covering two rendering modes (`ovos.utterance.speak` and `ovos.utterance.speak.b64`), dialog/TTS transformer stages, sequential FIFO queue (`ovos.audio.queue`) and fire-and-forget instant sounds (`ovos.audio.play_sound`), lifecycle signals (`ovos.audio.output.started`/`ended`), `listen` flag propagation, stop integration semantics, and MUST/SHOULD/MAY conformance requirements. `CHANGELOG.md` records this new specification.
Appendix: Rationale and Bus Topic Divergences `appendix/rationale.md`, `appendix/divergences.md`	`rationale.md` adds §4.9 AUDIO-1 explaining sentence-boundary TTS segmentation as an internal latency optimization while preserving external event ordering and `listen` timing. `divergences.md` registers new bus topics (`ovos.utterance.speak.b64`, `ovos.audio.speech`, `ovos.audio.queue`, `ovos.audio.play_sound`) under §5.5.
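
The sentence-boundary segmentation mentioned for `rationale.md` §4.9 can be pictured with the sketch below: the text is split internally, yet observers still see a single started/ended pair and `ovos.mic.listen` only after the last chunk. The splitting rule, the function names, and the string-only `emit` callback are assumptions chosen to keep the example short, not the spec's algorithm.

```python
import re


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (a naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def render(text, synthesise, emit, listen=False):
    """Synthesise sentence by sentence, but signal as one utterance."""
    emit("ovos.audio.output.started")
    chunks = []
    for sentence in split_sentences(text):
        # A real service would start playing each chunk as soon as it is ready.
        chunks.append(synthesise(sentence))
    emit("ovos.audio.output.ended")
    if listen:
        # The listen trigger follows the final chunk, not each sentence.
        emit("ovos.mic.listen")
    return chunks
```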

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Spec proposal: OVOS-AUDIO-1 — Audio Output Service #49: This PR implements the OVOS-AUDIO-1 Audio Output Service specification proposed in issue #49, delivering the specification document, changelog, rationale, and bus-topic divergences entries described in that proposal.

Possibly related PRs

OpenVoiceOS/architecture#35: Both PRs extend appendix/divergences.md with new bus-topic entries in the utterance/speech naming space.

Poem

🐇 Hop, hop, the audio queue is clear,
Each sentence segmented, latency no fear.
Base64 bytes bounce to clients remote,
ovos.audio.output.started — now take note!
The listen flag fires when the last word ends,
A spec so crisp, the rabbit recommends! 🎵

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and accurately summarizes the main change—introduction of the AUDIO-1 audio output service specification document.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@audio.md`:
- Line 143: The spec mixes "synthesise/synthesizes/synthesised" and
"synthesize/synthesizes/synthesized"; pick one spelling (e.g., American
"synthesize/synthesizes/synthesized" or British
"synthesise/synthesises/synthesised") and normalize all occurrences accordingly
(including the instance at line with "The audio output service synthesises the
utterance text into audio." and the occurrences around 385-386), updating
headings, body text, and examples so the chosen variant is used consistently
throughout the document.
- Around line 115-129: Add a language identifier to the fenced flow-diagram
block that begins with "ovos.utterance.speak" so markdown tooling treats it as
plain text: change the opening triple-backtick fence to use "text" (i.e.,
```text) for the block that contains the lines "[dialog transformers] ←
OVOS-TRANSFORM-1 §3.5", "[tts transformers] ← OVOS-TRANSFORM-1 §3.6", and
"scheduled playback queue → audio output" so the diagram is correctly typed by
renderers.

In `@ovos-pipeline-1.md`:
- Around line 1150-1158: The spec has a normative conflict: the `listen: true`
MUST on messages emitted as `ovos.utterance.speak` in a `get_response` flow
(OVOS-CONVERSE-1 §5) conflicts with the later statement that handlers have “no
normative obligation” (currently §11); resolve by choosing one approach and
updating the doc accordingly — either move the `listen` requirement out of
handler text into the orchestrator/framework contract (mentioning
`get_response`, `ovos.utterance.speak`, and `listen` so handlers remain
implementation-neutral), or keep the handler-level MUST and update §11 to add
explicit handler conformance obligations requiring handlers to set `listen:
true` when emitting `ovos.utterance.speak` in `get_response` flows; apply the
same change consistently where the spec references handler obligations.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 32f3ebb8-0636-4bc4-be10-56cf26a76b09

📥 Commits

Reviewing files that changed from the base of the PR and between cec6c7d and 08c848e.

📒 Files selected for processing (2)

audio.md
ovos-pipeline-1.md

coderabbitai · 2026-05-28T16:08:04Z

+```
+ovos.utterance.speak
+    │
+    ▼
+[dialog transformers]                     ← OVOS-TRANSFORM-1 §3.5
+    │
+    ▼
+TTS synthesis (text → audio data)
+    │
+    ▼
+[tts transformers]                        ← OVOS-TRANSFORM-1 §3.6
+    │
+    ▼
+scheduled playback queue → audio output
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced block.

The flow diagram fence is untyped; please mark it as text for markdown tooling compatibility.

Proposed edit

-```
+```text
 ovos.utterance.speak
     │
     ▼
 [dialog transformers]                     ← OVOS-TRANSFORM-1 §3.5
@@
 scheduled playback queue → audio output

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 115-115: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@audio.md` around lines 115 - 129, Add a language identifier to the fenced flow-diagram block that begins with "ovos.utterance.speak" so markdown tooling treats it as plain text: change the opening triple-backtick fence to use "text" (i.e., ```text) for the block that contains the lines "[dialog transformers] ← OVOS-TRANSFORM-1 §3.5", "[tts transformers] ← OVOS-TRANSFORM-1 §3.6", and "scheduled playback queue → audio output" so the diagram is correctly typed by renderers.

coderabbitai · 2026-05-28T19:04:04Z

Actionable comments posted: 0

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

audio-out.md (1)

137-398: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use one spelling family for “synthesise/synthesize” across the file.

Line 137 uses synthesises while Line 398 uses synthesized. Please normalize to one variant throughout the spec.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@audio-out.md` around lines 137 - 398, The document inconsistently uses the
British "synthesise/synthesises" and American "synthesize/synthesized"
spellings; pick one spelling family and normalize every occurrence (e.g.,
replace "synthesises", "synthesise" and "synthesised" or alternatively
"synthesizes", "synthesize" and "synthesized") across the file so all mentions
(including headings like "TTS transformer stage", sentence bodies such as the
first paragraph and the section titles/notes) use the chosen variant
consistently.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@audio-out.md`:
- Around line 238-255: The subsection numbering jumps from "### 4.3 Synthesised
audio delivery — `ovos.audio.speech`" to "### 4.5 Listen flag", leaving out 4.4;
update the heading "### 4.5 Listen flag" to "### 4.4 Listen flag" (or renumber
subsequent headings accordingly) so section references are consistent, and
verify any cross-references in the document that mention 4.5/4.4 are adjusted to
the new number.

---

Outside diff comments:
In `@audio-out.md`:
- Around line 137-398: The document inconsistently uses the British
"synthesise/synthesises" and American "synthesize/synthesized" spellings; pick
one spelling family and normalize every occurrence (e.g., replace "synthesises",
"synthesise" and "synthesised" or alternatively "synthesizes", "synthesize" and
"synthesized") across the file so all mentions (including headings like "TTS
transformer stage", sentence bodies such as the first paragraph and the section
titles/notes) use the chosen variant consistently.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f3c644e8-83c2-48a0-8ba5-435824deff4a

📥 Commits

Reviewing files that changed from the base of the PR and between 738e32d and 62cbdf6.

📒 Files selected for processing (4)

appendix/divergences.md
appendix/rationale.md
audio-out.md
ovos-pipeline-1.md

✅ Files skipped from review due to trivial changes (1)

appendix/divergences.md

🚧 Files skipped from review as they are similar to previous changes (1)

ovos-pipeline-1.md

JarbasAl · 2026-06-10T18:58:54Z

Merge-ready (MERGEABLE, dev merged in). Template conformance: header present (OVOS-AUDIO-1 v1 Draft), RFC-2119 boilerplate present. Fixed: OVOS-AUDIO-1 was absent from README spec table and CHANGELOG — both added. Spec filename audio-out.md is already using clean naming (consistent with #55). Note: after #55 merges, GLOSSARY.md in this branch will need ovos-intent-*.md links updated.

…isten field

Consolidates the PIPELINE-1 companion edits previously bundled into the union-slots (#56), FALLBACK-1 (#39), COMMON-QUERY-1 (#40) and AUDIO-1 (#38) feature PRs into a single one-file change to ovos-pipeline-1.md.

- §6.1/§6.2 — orchestrator backstop for required_slots (INTENT-3 §5.3): the orchestrator treats a Match as declined if any required slot is absent. Second line of defense behind engine-side enforcement.
- §7.3 — reserve intent_names "fallback" (FALLBACK-1 §6.3) and "common_query" (COMMON-QUERY-1 §3). COMMON-QUERY-1 asserted the reservation but never registered the row; this closes that gap.
- §9.6 — add the OPTIONAL listen field to ovos.utterance.speak; the output-side behaviour is owned by AUDIO-1.

All additions are backwards-compatible. PIPELINE-1 is already V2 (its namespaced topics replace the pre-spec names); these refinements do not change the class, so the Version stays 2. Adds the missing PIPELINE-1 CHANGELOG section.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
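
The §6.1/§6.2 backstop described in the commit message above amounts to a presence check. A minimal sketch follows, assuming that "absent" means the slot name is missing from the match's slot mapping; the function name and argument shapes are hypothetical.

```python
def is_declined(required_slots, match_slots):
    """Return True if any required slot name is missing from the match."""
    return any(name not in match_slots for name in required_slots)
```

Under that assumption, a match carrying only `{"city": "Lisbon"}` would be declined when both `city` and `date` are required.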

JarbasAl force-pushed the spec/audio branch 4 times, most recently from 607cf56 to 4a35082 Compare May 27, 2026 22:40

JarbasAl mentioned this pull request May 28, 2026

Spec proposal: OVOS-AUDIO-1 — Audio Output Service #49

Open

JarbasAl changed the title ~~OVOS-AUDIO-1: Audio Output Service Specification (v1 draft)~~ OVOS-AUDIO-1: Audio Output Service Specification May 28, 2026

JarbasAl mentioned this pull request May 28, 2026

Adoption: move the OVOS specifications out of Draft status #5

Open

3 tasks

JarbasAl marked this pull request as ready for review May 28, 2026 16:03

coderabbitai Bot reviewed May 28, 2026

Comment thread audio-out.md

JarbasAl mentioned this pull request Jun 10, 2026

docs: drop the ovos- filename prefix; refresh the spec index #55

Merged

JarbasAl mentioned this pull request Jun 10, 2026

Classify every spec as V1 or V2 before launch day #60

Open

JarbasAl mentioned this pull request Jun 22, 2026

PIPELINE-1: required_slots backstop, reserve fallback/common_query, listen field #66

Draft

JarbasAl changed the title ~~OVOS-AUDIO-1: Audio Output Service Specification~~ AUDIO-1: audio output service specification Jun 22, 2026

JarbasAl force-pushed the spec/audio branch 2 times, most recently from 30743e1 to bda203b Compare June 22, 2026 18:04

AUDIO-1: audio output service specification

52ef5aa

JarbasAl force-pushed the spec/audio branch from bda203b to 52ef5aa Compare June 23, 2026 05:18

AUDIO-1: audio output service specification #38
JarbasAl wants to merge 1 commit into dev from spec/audio
