AUDIO-1: audio output service specification #38

Open
JarbasAl wants to merge 1 commit into dev from spec/audio

@JarbasAl JarbasAl commented May 27, 2026

Member

Companion issue: #49

Summary

Defines the audio output service — the pipeline's output-side counterpart that consumes ovos.utterance.speak and renders natural-language responses as audio.

What the spec covers

  • §3 — Rendering pipeline: dialog-transformer chain → TTS synthesis → TTS-transformer chain → playback queue
  • §4 — Sequential playback queue shared between TTS speech (ovos.utterance.speak) and sound effects
  • §4.1 — Queued sounds: ovos.audio.queue for scheduled playback in queue order
  • §4.2 — Immediate sounds: ovos.audio.play_sound (plays without queuing)
  • §5 — Output lifecycle signals: ovos.audio.output.started / ovos.audio.output.ended (session identity from context.session.session_id)
  • §5.3 — Speaking-status query: ovos.audio.is_speaking (session-scoped via context, not data)
  • §6 — Stop integration: ovos.audio.stop and universal ovos.stop; MAY scope response to session
  • §7 — Listen trigger: ovos.mic.listen emitted after playback ends when listen: true on the speak message
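The queue, immediate-sound, stop, and listen behaviours listed above can be sketched roughly as follows. This is a minimal illustration, not the spec's reference implementation: the class `AudioOutputSketch`, its method names, and the choice to emit one started/ended pair per queue item are all assumptions made for the example. Only the topic names come from the PR description.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueueItem:
    kind: str                    # "speech" or "sound" (hypothetical labels)
    payload: str                 # utterance text or a sound URI
    session_id: str = "default"  # assumed fallback session id
    listen: bool = False


@dataclass
class AudioOutputSketch:
    emitted: list = field(default_factory=list)  # (topic, session_id) pairs
    played: list = field(default_factory=list)   # payloads in playback order
    _queue: deque = field(default_factory=deque)

    def handle_speak(self, text, session_id="default", listen=False):
        # ovos.utterance.speak: speech joins the shared sequential queue
        self._queue.append(QueueItem("speech", text, session_id, listen))

    def handle_queue(self, uri, session_id="default"):
        # ovos.audio.queue: sound effects share the same queue as speech
        self._queue.append(QueueItem("sound", uri, session_id))

    def handle_play_sound(self, uri):
        # ovos.audio.play_sound: plays at once, never enters the queue
        self.played.append(uri)

    def handle_stop(self):
        # ovos.audio.stop / ovos.stop: drop everything still pending
        self._queue.clear()

    def drain(self):
        # Play queued items in FIFO order, wrapping each in lifecycle signals
        # (assumption: one started/ended pair per item).
        while self._queue:
            item = self._queue.popleft()
            self.emitted.append(("ovos.audio.output.started", item.session_id))
            self.played.append(item.payload)
            self.emitted.append(("ovos.audio.output.ended", item.session_id))
            if item.listen:
                # listen fires only after playback of that item has ended
                self.emitted.append(("ovos.mic.listen", item.session_id))
```

In this sketch an immediate sound is heard before anything already queued, while queued speech and sounds keep their arrival order.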

Bus surface

| Topic | Purpose |
| --- | --- |
| ovos.utterance.speak | TTS request (PIPELINE-1 §9.6) |
| ovos.audio.queue | Queue sound for sequential playback |
| ovos.audio.play_sound | Play sound immediately |
| ovos.audio.stop | Stop playback and clear queue |
| ovos.audio.is_speaking | Speaking-status query |
| ovos.audio.output.started | Playback session began |
| ovos.audio.output.ended | Playback session ended |
| ovos.mic.listen | Signal listener to start capture |
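To make the session scoping concrete, here is a hedged sketch of what messages on these topics might look like as plain dictionaries. The `type` / `data` / `context` envelope and the `utterance` field name are assumptions for illustration; the PR description only confirms the topic names, the `listen` flag on the speak message, and that session identity is read from `context.session.session_id` rather than from `data`. The helper names are hypothetical.

```python
def make_speak_message(utterance, session_id, listen=False):
    """Build a hypothetical ovos.utterance.speak message."""
    return {
        "type": "ovos.utterance.speak",
        # "utterance" is an assumed field name; "listen" is named in the PR
        "data": {"utterance": utterance, "listen": listen},
        "context": {"session": {"session_id": session_id}},
    }


def make_is_speaking_query(session_id):
    """Build a hypothetical ovos.audio.is_speaking query.

    The session travels in context, so data stays empty.
    """
    return {
        "type": "ovos.audio.is_speaking",
        "data": {},
        "context": {"session": {"session_id": session_id}},
    }


def session_id_of(message, default="default"):
    """Read the session id from context, never from data.

    The "default" fallback is an assumption of this sketch.
    """
    return message.get("context", {}).get("session", {}).get("session_id", default)
```

A consumer written this way ignores any `session_id` that a sender might place in `data`, which matches the "via context, not data" wording for §5.3.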

Summary by CodeRabbit

  • Documentation
    • Added comprehensive Audio Output Service specification defining audio delivery modes (local queued playback and remote client delivery).
    • Documented lifecycle signals for audio output (started/ended events).
    • Specified queue behavior, stop integration, and listen-triggering mechanics.

@coderabbitai

coderabbitai Bot commented May 27, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5d491844-e859-49f4-bbe8-d3e148e424b7

📥 Commits

Reviewing files that changed from the base of the PR and between 62cbdf6 and 52ef5aa.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • appendix/divergences.md
  • appendix/rationale.md
  • audio-out.md
✅ Files skipped from review due to trivial changes (3)
  • appendix/rationale.md
  • CHANGELOG.md
  • appendix/divergences.md

📝 Walkthrough

Adds the OVOS-AUDIO-1 Audio Output Service specification (audio-out.md) defining rendering modes, FIFO playback queue, instant sound playback, lifecycle signals, listen/stop semantics, and conformance requirements. Supporting entries are added to CHANGELOG.md, appendix/divergences.md, and appendix/rationale.md.

Changes

OVOS-AUDIO-1 Audio Output Service Specification

| Layer / File(s) | Summary |
| --- | --- |
| Audio Output Service Specification and Changelog (audio-out.md, CHANGELOG.md) | audio-out.md introduces OVOS-AUDIO-1 (Spec Version 2) covering two rendering modes (ovos.utterance.speak and ovos.utterance.speak.b64), dialog/TTS transformer stages, sequential FIFO queue (ovos.audio.queue) and fire-and-forget instant sounds (ovos.audio.play_sound), lifecycle signals (ovos.audio.output.started/ended), listen flag propagation, stop integration semantics, and MUST/SHOULD/MAY conformance requirements. CHANGELOG.md records this new specification. |
| Appendix: Rationale and Bus Topic Divergences (appendix/rationale.md, appendix/divergences.md) | rationale.md adds §4.9 AUDIO-1 explaining sentence-boundary TTS segmentation as an internal latency optimization while preserving external event ordering and listen timing. divergences.md registers new bus topics (ovos.utterance.speak.b64, ovos.audio.speech, ovos.audio.queue, ovos.audio.play_sound) under §5.5. |
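The sentence-boundary segmentation mentioned for rationale.md §4.9 can be illustrated with a short sketch. It assumes a naive regex splitter and a caller-supplied `synthesize` callable, both hypothetical; the point is only that chunking stays internal, so an observer still sees a single started/ended pair and the listen trigger only after the last chunk.

```python
import re


def split_sentences(text):
    """Split on whitespace that follows ., ! or ? (a deliberately naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def render(text, synthesize, listen=False):
    """Return the externally visible event order for one speak request.

    `synthesize` is a hypothetical callable mapping a sentence to audio data.
    """
    events = ["ovos.audio.output.started"]
    for sentence in split_sentences(text):
        # Each sentence is synthesized and played as its own chunk, so the
        # first audio can start before the whole utterance is synthesized.
        events.append(("play", synthesize(sentence)))
    events.append("ovos.audio.output.ended")
    if listen:
        # Listen timing is unchanged: it follows the end of the last chunk.
        events.append("ovos.mic.listen")
    return events
```

A real splitter would need to handle abbreviations, decimals, and languages with other sentence terminators; this one does not.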

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • OpenVoiceOS/architecture#35: Both PRs extend appendix/divergences.md with new bus-topic entries in the utterance/speech naming space.

Poem

🐇 Hop, hop, the audio queue is clear,
Each sentence segmented, latency no fear.
Base64 bytes bounce to clients remote,
ovos.audio.output.started — now take note!
The listen flag fires when the last word ends,
A spec so crisp, the rabbit recommends! 🎵

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and accurately summarizes the main change: introduction of the AUDIO-1 audio output service specification document. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@JarbasAl JarbasAl force-pushed the spec/audio branch 4 times, most recently from 607cf56 to 4a35082 Compare May 27, 2026 22:40
@JarbasAl JarbasAl changed the title OVOS-AUDIO-1: Audio Output Service Specification (v1 draft) OVOS-AUDIO-1: Audio Output Service Specification May 28, 2026
@JarbasAl JarbasAl marked this pull request as ready for review May 28, 2026 16:03

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@audio.md`:
- Line 143: The spec mixes "synthesise/synthesizes/synthesised" and
"synthesize/synthesizes/synthesized"; pick one spelling (e.g., American
"synthesize/synthesizes/synthesized" or British
"synthesise/synthesises/synthesised") and normalize all occurrences accordingly
(including the instance at line with "The audio output service synthesises the
utterance text into audio." and the occurrences around 385-386), updating
headings, body text, and examples so the chosen variant is used consistently
throughout the document.
- Around line 115-129: Add a language identifier to the fenced flow-diagram
block that begins with "ovos.utterance.speak" so markdown tooling treats it as
plain text: change the opening triple-backtick fence to use "text" (i.e.,
```text) for the block that contains the lines "[dialog transformers] ←
OVOS-TRANSFORM-1 §3.5", "[tts transformers] ← OVOS-TRANSFORM-1 §3.6", and
"scheduled playback queue → audio output" so the diagram is correctly typed by
renderers.

In `@ovos-pipeline-1.md`:
- Around line 1150-1158: The spec has a normative conflict: the `listen: true`
MUST on messages emitted as `ovos.utterance.speak` in a `get_response` flow
(OVOS-CONVERSE-1 §5) conflicts with the later statement that handlers have “no
normative obligation” (currently §11); resolve by choosing one approach and
updating the doc accordingly — either move the `listen` requirement out of
handler text into the orchestrator/framework contract (mentioning
`get_response`, `ovos.utterance.speak`, and `listen` so handlers remain
implementation-neutral), or keep the handler-level MUST and update §11 to add
explicit handler conformance obligations requiring handlers to set `listen:
true` when emitting `ovos.utterance.speak` in `get_response` flows; apply the
same change consistently where the spec references handler obligations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 32f3ebb8-0636-4bc4-be10-56cf26a76b09

📥 Commits

Reviewing files that changed from the base of the PR and between cec6c7d and 08c848e.

📒 Files selected for processing (2)
  • audio.md
  • ovos-pipeline-1.md

Comment thread audio-out.md
Comment on lines +115 to +129
```
ovos.utterance.speak
[dialog transformers] ← OVOS-TRANSFORM-1 §3.5
TTS synthesis (text → audio data)
[tts transformers] ← OVOS-TRANSFORM-1 §3.6
scheduled playback queue → audio output
```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced block.

The flow diagram fence is untyped; please mark it as text for markdown tooling compatibility.

Proposed edit
-```
+```text
 ovos.utterance.speak
     │
     ▼
 [dialog transformers]                     ← OVOS-TRANSFORM-1 §3.5
@@
 scheduled playback queue → audio output

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 115-115: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


Comment thread audio-out.md
Comment thread ovos-pipeline-1.md Outdated
@coderabbitai

coderabbitai Bot commented May 28, 2026


Actionable comments posted: 0

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
audio-out.md (1)

137-398: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use one spelling family for “synthesise/synthesize” across the file.

Line 137 uses synthesises while Line 398 uses synthesized. Please normalize to one variant throughout the spec.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@audio-out.md` around lines 137 - 398, The document inconsistently uses the
British "synthesise/synthesises" and American "synthesize/synthesized"
spellings; pick one spelling family and normalize every occurrence (e.g.,
replace "synthesises", "synthesise" and "synthesised" or alternatively
"synthesizes", "synthesize" and "synthesized") across the file so all mentions
(including headings like "TTS transformer stage", sentence bodies such as the
first paragraph and the section titles/notes) use the chosen variant
consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@audio-out.md`:
- Around line 238-255: The subsection numbering jumps from "### 4.3 Synthesised
audio delivery — `ovos.audio.speech`" to "### 4.5 Listen flag", leaving out 4.4;
update the heading "### 4.5 Listen flag" to "### 4.4 Listen flag" (or renumber
subsequent headings accordingly) so section references are consistent, and
verify any cross-references in the document that mention 4.5/4.4 are adjusted to
the new number.
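
A numbering gap like the 4.3 to 4.5 jump can be caught mechanically. The following is a small sketch of one way to do it, assuming numbered markdown headings of the form `### 4.3 Title`; the function name and regex are hypothetical and it does not skip headings that appear inside fenced code blocks.

```python
import re

# Matches "## 4 Title", "### 4.3 Title", etc. and captures the section number.
HEADING = re.compile(r"^#{2,6}\s+(\d+(?:\.\d+)*)\b")


def find_numbering_gaps(markdown_text):
    """Return (found, expected) pairs where a sibling section number is skipped."""
    gaps = []
    last_seen = {}  # parent number tuple -> last child index seen
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if not match:
            continue
        parts = [int(p) for p in match.group(1).split(".")]
        parent, index = tuple(parts[:-1]), parts[-1]
        previous = last_seen.get(parent)
        if previous is not None and index > previous + 1:
            expected = ".".join(str(n) for n in parent + (previous + 1,))
            gaps.append((match.group(1), expected))
        last_seen[parent] = index
    return gaps
```

Run against a document containing `### 4.3` followed by `### 4.5`, it reports that 4.4 was expected where 4.5 was found.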


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f3c644e8-83c2-48a0-8ba5-435824deff4a

📥 Commits

Reviewing files that changed from the base of the PR and between 738e32d and 62cbdf6.

📒 Files selected for processing (4)
  • appendix/divergences.md
  • appendix/rationale.md
  • audio-out.md
  • ovos-pipeline-1.md
✅ Files skipped from review due to trivial changes (1)
  • appendix/divergences.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • ovos-pipeline-1.md

Comment thread audio-out.md
@JarbasAl

Member Author

Merge-ready (MERGEABLE, dev merged in). Template conformance: header present (OVOS-AUDIO-1 v1 Draft), RFC-2119 boilerplate present. Fixed: OVOS-AUDIO-1 was absent from README spec table and CHANGELOG — both added. Spec filename audio-out.md is already using clean naming (consistent with #55). Note: after #55 merges, GLOSSARY.md in this branch will need ovos-intent-*.md links updated.

JarbasAl added a commit that referenced this pull request Jun 22, 2026
…isten field

Consolidates the PIPELINE-1 companion edits previously bundled into the
union-slots (#56), FALLBACK-1 (#39), COMMON-QUERY-1 (#40) and AUDIO-1 (#38)
feature PRs into a single one-file change to ovos-pipeline-1.md.

- §6.1/§6.2 — orchestrator backstop for required_slots (INTENT-3 §5.3):
  the orchestrator treats a Match as declined if any required slot is
  absent. Second line of defense behind engine-side enforcement.
- §7.3 — reserve intent_names "fallback" (FALLBACK-1 §6.3) and
  "common_query" (COMMON-QUERY-1 §3). COMMON-QUERY-1 asserted the
  reservation but never registered the row; this closes that gap.
- §9.6 — add the OPTIONAL listen field to ovos.utterance.speak; the
  output-side behaviour is owned by AUDIO-1.

All additions are backwards-compatible. PIPELINE-1 is already V2 (its
namespaced topics replace the pre-spec names); these refinements do not
change the class, so the Version stays 2. Adds the missing PIPELINE-1
CHANGELOG section.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@JarbasAl JarbasAl changed the title OVOS-AUDIO-1: Audio Output Service Specification AUDIO-1: audio output service specification Jun 22, 2026
@JarbasAl JarbasAl force-pushed the spec/audio branch 2 times, most recently from 30743e1 to bda203b Compare June 22, 2026 18:04