Skip to content

Wanna Discuss Add Voice Text#578

Merged
arul28 merged 5 commits into
mainfrom
ade/wanna-discuss-add-voice-text-46c24910
Jun 15, 2026
Merged

Wanna Discuss Add Voice Text#578
arul28 merged 5 commits into
mainfrom
ade/wanna-discuss-add-voice-text-46c24910

Conversation

@arul28

@arul28 arul28 commented Jun 15, 2026

Copy link
Copy Markdown
Owner

Summary

Describe the change.

What Changed

Key files and behaviors.

Validation

How you tested.

Risks

Anything to watch.

ADE   Open in ADE  ·  ade/wanna-discuss-add-voice-text-46c24910 branch  ·  PR #578

Summary by CodeRabbit

  • New Features
    • Added on-device voice dictation to chat composers with microphone input controls
    • Added voice input toggle to app settings (Desktop and iOS)
    • Added live recording indicator displaying elapsed time and audio levels
    • Integrated voice glossary for improved speech-to-text accuracy with contextual term recognition and automatic corrections
    • Added microphone permission handling and status indicators

Greptile Summary

This PR adds on-device voice dictation to both the desktop (Electron/whisper.cpp) and iOS (iOS 26 SpeechAnalyzer) chat composers, with a shared deterministic cleanup pipeline, a Dynamic Island Live Activity, and a voice glossary for domain-term recognition.

  • Desktop: A module-level GlobalVoiceRecorder singleton captures 16 kHz mono PCM via Web Audio, ships it to a main-process TranscriptionService that shells out to a bundled whisper.cpp binary, then runs a deterministic cleanup pass (filler removal → corrections → capitalization → spacing) before inserting into the registered composer and copying to clipboard as a recovery net.
  • iOS (iOS 26+): A SpeechDictationService streams buffers into a SpeechAnalyzer/SpeechTranscriber pipeline; a DictationController singleton lifts recording above individual composers so it survives navigation, drives a Live Activity Dynamic Island, and routes the cleaned transcript to whichever composer is visible.
  • Shared glossary: A single voice-glossary.json (corrections + contextual terms + fillers) is bundled on both platforms; the deterministic cleanup algorithms on both sides now agree on capitalizeSentences behavior including non-letter leading characters.

Confidence Score: 5/5

Safe to merge — all previously flagged correctness issues are resolved; remaining findings are documentation and tooling quality concerns only.

Large feature addition with well-structured platform isolation; core recording lifecycles (generation guards, teardown paths, single-flight transcription queue) are correctly implemented on both platforms. The generation-based cancel guard in GlobalVoiceRecorder correctly handles all cancel-during-getUserMedia race windows. The iOS SpeechDictationService isStarting guard, defer teardown, and double-safe cancelIOS26 calls are all sound. The cleanup pipelines on both platforms now agree. The three flagged items are documentation and tooling quality concerns, not correctness bugs in the shipped feature.

apps/desktop/src/renderer/services/globalVoiceRecorder.ts (inaccurate IPC comment), apps/ios/ADE/Services/Dictation/SpeechDictationService.swift (locale fallback comment vs. implementation), apps/desktop/scripts/materialize-whisper-resources.mjs (build-step timeout)

Important Files Changed

Filename Overview
apps/desktop/src/renderer/services/globalVoiceRecorder.ts New module-level singleton for Web Audio PCM capture; generation-based cancel guard correctly aborts getUserMedia and prevents orphaned audio graphs; misleading "zero-copy" comment for the IPC ArrayBuffer transfer.
apps/desktop/src/main/services/transcription/transcriptionService.ts Well-implemented whisper.cpp wrapper: serialized single-flight queue, proper temp-file cleanup, WAV header written without native deps, typed error codes, configurable process timeout with SIGKILL fallback.
apps/ios/ADE/Services/Dictation/SpeechDictationService.swift iOS 26 SpeechAnalyzer integration with generation-based cancellation, proper defer/catch teardown, and isStarting guard; preferredLocale comment claims a device-locale fallback that isn't implemented.
apps/ios/ADE/Services/Dictation/DictationController.swift App-level singleton wiring dictation lifecycle to Live Activity and insertion targets; isFinishing guard, Combine republishing pattern, and double-safe cancelIOS26 calls are all correct.
apps/desktop/src/main/services/transcription/dictationCleanup.ts Deterministic transcript cleanup (trim → fillers → corrections → capitalize → spacing); longest-first sort, module-level cache by path, and test-only reset hook are all correct.
apps/ios/ADE/Services/Dictation/DictationCleanup.swift Swift counterpart to the TS cleanup pipeline; capitalizeSentences now correctly preserves pending capitalization through non-letter leading chars (matches TS behavior).
apps/desktop/scripts/materialize-whisper-resources.mjs Download + build script for whisper.cpp; downloadFile now has a configurable timeout, but spawnStep used for git clone and cmake has no timeout protection against hung builds.
apps/desktop/src/renderer/components/chat/AgentChatComposer.tsx Voice dictation wired into composer via ref-based insertion target registration; stale-closure issue on draft is correctly avoided using insertDictatedTextRef; shimmer animation injects CSS once via a style tag id guard.

Sequence Diagram

sequenceDiagram
    participant UI as Composer UI
    participant GVR as GlobalVoiceRecorder
    participant Store as Root App Store
    participant IPC as Preload IPC
    participant TS as TranscriptionService (main)
    participant Whisper as whisper.cpp

    UI->>GVR: start()
    GVR->>Store: setDictationPhase("recording") [optimistic]
    GVR->>IPC: requestMicAccess()
    IPC-->>GVR: "{status: "granted"}"
    GVR->>GVR: getUserMedia() + build AudioGraph
    note over GVR: ScriptProcessorNode collects Float32 chunks

    UI->>GVR: finish()
    GVR->>Store: setDictationPhase("transcribing")
    GVR->>GVR: downsampleToInt16(merged, srcRate)
    GVR->>IPC: "transcribe(pcm.buffer, {format:"int16"})"
    IPC->>TS: ipcMain.handle(transcriptionTranscribe)
    TS->>TS: pcmToWavBuffer() → write tmp WAV
    TS->>Whisper: spawn whisper-cli -m model -f wav -oj -np
    Whisper-->>TS: WAV.json sidecar
    TS->>TS: parseWhisperJson() + cleanTranscript()
    TS-->>IPC: "{raw, cleaned}"
    IPC-->>GVR: TranscriptionResult
    GVR->>Store: activeDictationTarget.insertText(cleaned)
    GVR->>Store: resetDictationSession() → "idle"
Loading

Reviews (5): Last reviewed commit: "ship: iteration 4 - address dictation re..." | Re-trigger Greptile

@vercel

vercel Bot commented Jun 15, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
ade Ignored Ignored Preview Jun 15, 2026 8:38am

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@arul28, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 8 minutes and 26 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8ff0d988-c84b-4a79-993b-6d0c67b18f74

📥 Commits

Reviewing files that changed from the base of the PR and between b9006de and f996a3e.

⛔ Files ignored due to path filters (4)
  • apps/ios/ADE.xcodeproj/project.pbxproj is excluded by !**/*.xcodeproj/project.pbxproj
  • docs/features/chat/composer-and-ui.md is excluded by !docs/**
  • docs/features/onboarding-and-settings/README.md is excluded by !docs/**
  • docs/features/sync-and-multi-device/ios-companion.md is excluded by !docs/**
📒 Files selected for processing (55)
  • .gitignore
  • apps/desktop/build/entitlements.mac.inherit.plist
  • apps/desktop/build/entitlements.mac.plist
  • apps/desktop/package.json
  • apps/desktop/resources/voice/voice-glossary.json
  • apps/desktop/resources/whisper/.gitkeep
  • apps/desktop/resources/whisper/README.md
  • apps/desktop/scripts/materialize-whisper-resources.mjs
  • apps/desktop/scripts/validate-whisper-resources.mjs
  • apps/desktop/src/main/main.ts
  • apps/desktop/src/main/services/ipc/ipcTimeouts.test.ts
  • apps/desktop/src/main/services/ipc/ipcTimeouts.ts
  • apps/desktop/src/main/services/ipc/registerIpc.ts
  • apps/desktop/src/main/services/transcription/bundledResources.ts
  • apps/desktop/src/main/services/transcription/dictationCleanup.ts
  • apps/desktop/src/main/services/transcription/transcriptionService.test.ts
  • apps/desktop/src/main/services/transcription/transcriptionService.ts
  • apps/desktop/src/preload/global.d.ts
  • apps/desktop/src/preload/preload.ts
  • apps/desktop/src/renderer/components/app/App.tsx
  • apps/desktop/src/renderer/components/app/App.workKeepAlive.test.tsx
  • apps/desktop/src/renderer/components/app/TopBar.tsx
  • apps/desktop/src/renderer/components/chat/AgentChatComposer.tsx
  • apps/desktop/src/renderer/components/chat/VoiceDictationButton.tsx
  • apps/desktop/src/renderer/components/settings/AppearanceSection.tsx
  • apps/desktop/src/renderer/components/settings/DictationSection.tsx
  • apps/desktop/src/renderer/components/voice/GlobalVoiceCaptureIndicator.tsx
  • apps/desktop/src/renderer/components/voice/RecordingPill.tsx
  • apps/desktop/src/renderer/hooks/useVoiceModelInstalled.ts
  • apps/desktop/src/renderer/services/globalVoiceRecorder.ts
  • apps/desktop/src/renderer/state/appStore.ts
  • apps/desktop/src/shared/ipc.ts
  • apps/ios/ADE/App/ADEApp.swift
  • apps/ios/ADE/App/ContentView.swift
  • apps/ios/ADE/Info.plist
  • apps/ios/ADE/Resources/VoiceGlossary.json
  • apps/ios/ADE/Services/Dictation/DictationCleanup.swift
  • apps/ios/ADE/Services/Dictation/DictationController.swift
  • apps/ios/ADE/Services/Dictation/SpeechDictationService.swift
  • apps/ios/ADE/Services/Dictation/VoiceGlossary.swift
  • apps/ios/ADE/Shared/DictationActivityShared.swift
  • apps/ios/ADE/Views/Components/DictationMicButton.swift
  • apps/ios/ADE/Views/Components/GlobalDictationPill.swift
  • apps/ios/ADE/Views/Components/RecordingPill.swift
  • apps/ios/ADE/Views/Settings/ConnectionSettingsView.swift
  • apps/ios/ADE/Views/Settings/SettingsVoiceInputSection.swift
  • apps/ios/ADE/Views/Work/WorkChatComposerAndInputViews.swift
  • apps/ios/ADE/Views/Work/WorkChatSessionView.swift
  • apps/ios/ADE/Views/Work/WorkNewChatScreen.swift
  • apps/ios/ADE/Views/Work/WorkNewChatSheet.swift
  • apps/ios/ADE/Views/Work/WorkPreviews.swift
  • apps/ios/ADE/Views/Work/WorkRootScreen.swift
  • apps/ios/ADETests/ADETests.swift
  • apps/ios/ADEWidgets/ADEWidgetBundle.swift
  • apps/ios/ADEWidgets/DictationLiveActivity.swift
📝 Walkthrough

Walkthrough

Adds end-to-end voice dictation to both the Electron desktop app and the iOS app. On desktop, a Node.js script materializes whisper.cpp binaries and a base English model, a main-process transcription service encodes PCM to WAV and spawns whisper-cli, and three new IPC channels expose transcription, status, and mic-access to the renderer. On iOS, a new SpeechDictationService drives the iOS 26 SpeechAnalyzer/SpeechTranscriber pipeline and a DictationController routes transcripts to composer insertion targets or clipboard. Both platforms share a deterministic cleanup pipeline (filler removal, longest-first corrections, sentence capitalization) driven by a shared VoiceGlossary.json resource.

Changes

Desktop Voice Dictation

Layer / File(s) Summary
Build infrastructure: entitlements, packaging, gitignore
.gitignore, apps/desktop/build/entitlements.mac*.plist, apps/desktop/package.json, apps/desktop/resources/whisper/README.md
Adds com.apple.security.device.audio-input to both macOS entitlement plists, updates electron-builder extraResources to package voice/ and whisper/ directories, expands the x64ArchFiles glob, wires materialize/validate steps into all dist scripts, and gitignores built whisper binaries.
Whisper resource materialization and validation scripts
apps/desktop/scripts/materialize-whisper-resources.mjs, apps/desktop/scripts/validate-whisper-resources.mjs, apps/desktop/resources/voice/voice-glossary.json
materialize-whisper-resources.mjs downloads ggml-base.en.bin and a platform-specific whisper-cli (or builds from source via cmake/git). validate-whisper-resources.mjs checks file presence, minimum size, and executable bits. Adds the voice glossary JSON with contextual terms, corrections, and fillers.
Transcription service: bundled resource resolution, cleanup, and whisper execution
apps/desktop/src/main/services/transcription/bundledResources.ts, apps/desktop/src/main/services/transcription/dictationCleanup.ts, apps/desktop/src/main/services/transcription/transcriptionService.ts, apps/desktop/src/main/services/transcription/transcriptionService.test.ts
resolveBundledResource handles packaged vs dev path lookup. dictationCleanup defines VoiceGlossary/PreparedGlossary types and a cleanTranscript pipeline (filler removal, longest-first corrections, sentence capitalization, whitespace normalization). transcriptionService encodes PCM to 16kHz WAV, spawns whisper-cli with -oj, parses JSON sidecar output, serializes requests via a promise queue, and exposes typed TranscriptionError codes. Tests validate cleanup behavior and buildWhisperArgs.
IPC channels, main-process wiring, and preload bridge
apps/desktop/src/shared/ipc.ts, apps/desktop/src/main/main.ts, apps/desktop/src/main/services/ipc/registerIpc.ts, apps/desktop/src/preload/preload.ts, apps/desktop/src/preload/global.d.ts
Adds three IPC channel constants. main.ts lazily constructs a shared TranscriptionService singleton and disposes it on will-quit. registerIpc adds transcriptionTranscribe (PCM normalization, typed error re-throw), transcriptionStatus, and transcriptionRequestMicAccess (macOS systemPreferences mic-access flow). The preload contextBridge and global.d.ts expose window.ade.transcription.
Renderer Zustand store: voiceInputEnabled and dictation session state
apps/desktop/src/renderer/state/appStore.ts
Adds persisted voiceInputEnabled to PersistedUserPreferences and propagates it to project stores. Adds ephemeral dictation session state (DictationPhase, dictationElapsed, dictationLevels, activeDictationTarget) with full setter suite, resetDictationSession, and register/unregisterDictationTarget. Exports rootAppStoreApi and useRootAppStore.
GlobalVoiceRecorder singleton: Web Audio capture, waveform, and transcription
apps/desktop/src/renderer/services/globalVoiceRecorder.ts
Module-level singleton managing mic capture via Web Audio (MediaStreamSource → Analyser + ScriptProcessorNode). start() optionally checks macOS mic permission, publishes waveform levels and elapsed time to the root store. finish() downsamples PCM to 16kHz Int16, calls window.ade.transcription.transcribe, inserts cleaned text into the active dictation target, and copies to clipboard as fallback.
RecordingPill, GlobalVoiceCaptureIndicator, and VoiceDictationButton
apps/desktop/src/renderer/components/voice/RecordingPill.tsx, apps/desktop/src/renderer/components/voice/GlobalVoiceCaptureIndicator.tsx, apps/desktop/src/renderer/components/chat/VoiceDictationButton.tsx, apps/desktop/src/renderer/components/app/TopBar.tsx
RecordingPill renders idle/transcribing/recording states with a waveform bar visualization, prefersReducedMotion support, and cancel/done controls. GlobalVoiceCaptureIndicator subscribes to root store dictation state and renders the pill in TopBar when not idle. VoiceDictationButton maps typed recorder errors to user messages and renders either the pill or a mic button.
AgentChatComposer integration and DictationSection settings
apps/desktop/src/renderer/components/chat/AgentChatComposer.tsx, apps/desktop/src/renderer/hooks/useVoiceModelInstalled.ts, apps/desktop/src/renderer/components/settings/DictationSection.tsx, apps/desktop/src/renderer/components/settings/AppearanceSection.tsx, apps/desktop/src/renderer/components/app/App.tsx, apps/desktop/src/renderer/components/app/App.workKeepAlive.test.tsx
AgentChatComposer injects shimmer keyframes, registers itself as the active dictation target via rootAppStoreApi, inserts dictated text at the caret/saved selection, and renders a dismissible error banner plus a VoiceDictationButton or disabled mic icon. useVoiceModelInstalled probes window.ade.transcription.status. DictationSection renders a voice-input checkbox with a "model not installed" notice. voiceInputEnabled is added to rootPrefs hydration.

iOS Voice Dictation

Layer / File(s) Summary
iOS permissions and app-level bootstrap
apps/ios/ADE/Info.plist, apps/ios/ADE/App/ADEApp.swift, apps/ios/ADE/App/ContentView.swift
Declares microphone and speech recognition usage strings and the audio background mode in Info.plist. ADEApp owns a @StateObject DictationController injected as an environment object. ContentView adds GlobalDictationPill() as a top safe-area inset.
VoiceGlossary model, loading, and DictationCleanup pipeline
apps/ios/ADE/Resources/VoiceGlossary.json, apps/ios/ADE/Services/Dictation/VoiceGlossary.swift, apps/ios/ADE/Services/Dictation/DictationCleanup.swift, apps/ios/ADETests/ADETests.swift
Adds VoiceGlossary.json with contextual terms, corrections, and fillers. VoiceGlossary.swift defines the model with a cached .shared loader that sorts corrections longest-first. DictationCleanup.clean applies the same deterministic pipeline as desktop: filler removal, corrections, sentence capitalization, and spacing normalization.
SpeechDictationService: AVAudioEngine + iOS 26 SpeechAnalyzer
apps/ios/ADE/Services/Dictation/SpeechDictationService.swift
SpeechDictationService (@MainActor ObservableObject) drives AVAudioEngine and the iOS 26 SpeechAnalyzer/SpeechTranscriber pipeline. startIOS26 installs locale model assets, injects contextual vocabulary into AnalysisContext, starts the analyzer with an AsyncStream, and installs an audio input tap. DictationBufferConverter handles mic-to-analyzer format conversion. Exponential-moving-average RMS levels power the waveform.
DictationController orchestration and Live Activity shared contract
apps/ios/ADE/Services/Dictation/DictationController.swift, apps/ios/ADE/Shared/DictationActivityShared.swift
DictationActivityAttributes/ContentState define the Live Activity contract (waveform levels, elapsed time, isFinishing) shared between the app and widget targets. DictationDoneIntent/DictationCancelIntent dispatch through DictationActivityActionRegistry. DictationController owns SpeechDictationService, maintains an insertion-target registry with visibility tracking, routes finished transcripts to the visible target or clipboard, and manages throttled ActivityKit Live Activity updates.
RecordingPill, DictationMicButton, GlobalDictationPill
apps/ios/ADE/Views/Components/RecordingPill.swift, apps/ios/ADE/Views/Components/DictationMicButton.swift, apps/ios/ADE/Views/Components/GlobalDictationPill.swift
RecordingPill shows elapsed time, a DictationWaveform with Canvas/TimelineView easing (Reduce Motion aware), and cancel/done controls. DictationMicButton registers with DictationController, inserts cleaned transcripts with smart leading-space and a UTF-16 undo span, shows a timed DictationRawUndoChip. GlobalDictationPill shows the pill globally when the active target is not visible.
Composer surface integrations
apps/ios/ADE/Views/Work/WorkChatSessionView.swift, apps/ios/ADE/Views/Work/WorkChatComposerAndInputViews.swift, apps/ios/ADE/Views/Work/WorkNewChatScreen.swift, apps/ios/ADE/Views/Work/WorkNewChatSheet.swift, apps/ios/ADE/Views/Work/WorkRootScreen.swift, apps/ios/ADE/Views/Work/WorkPreviews.swift
Threads dictationTargetId through WorkChatComposerCardWorkChatComposerDraftInput. Adds DictationMicButton + DictationInsertionCoordinator to WorkChatComposerDraftInput, WorkQueuedSteerRow, WorkNewChatComposerBar, and WorkNewChatSheet. WorkRootScreen re-injects dictationController into navigationDestination pushed views.
Settings section and Live Activity widget
apps/ios/ADE/Views/Settings/SettingsVoiceInputSection.swift, apps/ios/ADE/Views/Settings/ConnectionSettingsView.swift, apps/ios/ADEWidgets/DictationLiveActivity.swift, apps/ios/ADEWidgets/ADEWidgetBundle.swift
Adds SettingsVoiceInputSection (AppStorage-backed voice input toggle) into ConnectionSettingsView. DictationLiveActivity implements an iOS 17+ Live Activity with a Dynamic Island layout (mic glyph, waveform, timer, interactive DictationDoneIntent Done button) and a lock-screen view. Registered in ADEWidgetBundle behind #available(iOS 17.0, *).

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested labels

desktop, ios

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ade/wanna-discuss-add-voice-text-46c24910

@mintlify

mintlify Bot commented Jun 15, 2026

Copy link
Copy Markdown

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
ade-ac1c6011 🟢 Ready View Preview Jun 15, 2026, 7:09 AM

💡 Tip: Enable Workflows to automatically generate PRs for you.

@arul28

arul28 commented Jun 15, 2026

Copy link
Copy Markdown
Owner Author

@copilot review but do not make fixes

Comment thread apps/desktop/src/renderer/services/globalVoiceRecorder.ts
Comment thread apps/desktop/src/renderer/services/globalVoiceRecorder.ts
Comment thread apps/ios/ADE/Services/Dictation/DictationCleanup.swift
Comment thread apps/desktop/src/renderer/services/globalVoiceRecorder.ts
Comment thread apps/desktop/scripts/materialize-whisper-resources.mjs Outdated
@arul28

arul28 commented Jun 15, 2026

Copy link
Copy Markdown
Owner Author

@codex review

Comment thread apps/ios/ADE/Services/Dictation/SpeechDictationService.swift

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 696f9fc0a0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

didYouKnowEnabled: rootState.didYouKnowEnabled,
launchPromptClipboardEnabled: rootState.launchPromptClipboardEnabled,
launchPromptClipboardNoticeEnabled: rootState.launchPromptClipboardNoticeEnabled,
voiceInputEnabled: rootState.voiceInputEnabled,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep voice preference synced into project stores

When settings or another project surface toggles voice input while a project store already exists, this initial copy stays stale because ProjectTabHost only re-hydrates open project stores from its rootPrefs selector, and that selector does not include voiceInputEnabled. As a result, the new Settings checkbox and chat composers inside the active project can continue reading the old value until the project surface is recreated/reloaded, so disabling voice input may leave the mic visible/enabled (or enabling it may not show the mic). Add this preference to the root-pref hydration path or read it from the root store like the dictation session state.

Useful? React with 👍 / 👎.

@arul28

arul28 commented Jun 15, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b9006de324

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +491 to +493
if converter == nil || converter?.outputFormat != format {
converter = AVAudioConverter(from: inputFormat, to: format)
converter?.primeMethod = .none

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Recreate the converter when the mic format changes

If a user records once, then changes audio route or sample rate before the next recording (for example switching to AirPods/Bluetooth), this cached converter is reused as long as the analyzer output format is unchanged. The converter was created for the old inputFormat, so subsequent buffers with a different format can fail conversion and get dropped by try?, leaving the analyzer with little or no audio. Include the input format in the cache check or reset the converter between sessions.

Useful? React with 👍 / 👎.

@arul28

arul28 commented Jun 15, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/desktop/scripts/materialize-whisper-resources.mjs`:
- Around line 48-55: The whisperBinarySpecForHost function currently selects the
whisper-cli binary based on the host architecture, which can produce an
arm64-only binary even when building universal macOS artifacts, causing exec
format errors on Intel Macs. Modify the function to detect when building for
universal macOS (check for an environment variable or build context that
indicates universal build mode) and in that case, use a universal binary target
(such as darwin-universal) instead of the host architecture-based target. This
ensures that the universal build gets a compatible binary regardless of which
architecture the build runs on.
- Around line 141-142: The console.log statements are logging full URLs that may
contain sensitive query tokens from signed URLs, risking secret exposure in CI
logs. Remove or redact the MODEL_URL parameter from the log messages to prevent
secrets from being exposed. Keep the MODEL_BASENAME and other descriptive
information visible, but sanitize the URL by either omitting the query
parameters, replacing the URL with a redacted placeholder, or extracting only
the base path without query strings. Apply this same fix to both logging
locations mentioned in the comment (the downloadFile call and the additional
location at lines 270-271).
- Around line 204-210: The code is currently falling back to cloning the default
branch when the pinned WHISPER_SRC_REF cannot be cloned, which breaks
reproducibility. Instead of the fallback behavior, remove the spawnStep call
that clones WHISPER_SRC_REPO without the pinned ref and replace it with a throw
statement or process.exit call after the warning log. This ensures the script
fails if the pinned ref cannot be cloned, maintaining reproducibility of release
artifacts.

In `@apps/desktop/scripts/validate-whisper-resources.mjs`:
- Around line 50-52: The statFile call for the voice glossary only validates
file existence but not JSON structure. After the statFile check for the glossary
file path (voice-glossary.json in voiceRoot), add additional validation to read
and parse the JSON content to ensure it is well-formed. If the JSON parsing
fails or the structure is invalid, throw an error so the build fails early
rather than allowing malformed JSON to pass through and degrade behavior at
runtime.

In `@apps/desktop/src/main/main.ts`:
- Around line 5933-5936: The sharedTranscriptionService disposal code currently
lives only in the will-quit handler, but when the app is not ready during
shutdown, finalizeAppExit calls process.exit() directly, which bypasses the
will-quit event and leaves the service undisposed. Move the
sharedTranscriptionService?.dispose() and sharedTranscriptionService = null
statements from the will-quit handler to the runImmediateProcessCleanup function
so the disposal runs before both app.exit() (in the ready path) and
process.exit() (in the non-ready path) calls, ensuring the transcription service
and its child processes are properly cleaned up regardless of app readiness
state.

In `@apps/desktop/src/main/services/ipc/registerIpc.ts`:
- Around line 6198-6222: The IPC.transcriptionTranscribe handler returns
transcription results containing raw and cleaned text that can be exposed in
main-process logs through the global IPC wrapper's result logging. You need to
add redaction for this channel to prevent sensitive dictated text from appearing
in logs. First, add result redaction for the TranscriptionResult return value
(which contains the raw and cleaned transcript text) to mask the transcribed
content. Second, explicitly redact the PCM audio argument in the handler to
prevent raw audio data from being logged during IPC tracing. This redaction
should be applied at the IPC wrapper level where results and arguments are
logged for slow or verbose calls, not just at the handler implementation itself.
- Around line 6210-6222: The code in the PCM buffer handling logic accepts
renderer-controlled input without sufficient validation before constructing
typed arrays and forwarding to the transcribe service. Add validation checks
before creating the typed arrays: validate that the format parameter is one of
the expected values (such as "float32" or "int16"), verify that the sampleRate
is a finite positive number, check that the buffer byte alignment is correct for
the specified format, and enforce reasonable limits on the total audio duration
or buffer size to prevent resource exhaustion. Perform these validations after
checking for an empty buffer but before constructing the Float32Array or
Int16Array from the buffer, throwing descriptive errors for any validation
failures.

In `@apps/desktop/src/main/services/transcription/dictationCleanup.ts`:
- Around line 184-187: In the catch block that handles read/parse failures
(lines 184-187), remove the lines that cache the fallback EMPTY_GLOSSARY and its
path (the assignments to cachedGlossary and cachedGlossaryPath). Instead, only
return EMPTY_GLOSSARY without caching it, so that transient read/parse errors
allow retry attempts on subsequent calls rather than permanently disabling
glossary cleanup until process restart.
- Line 67: The sort comparator in the dictationCleanup.ts file at the sorting of
replacement phrases currently only sorts by length, which makes the order of
equal-length phrases non-deterministic and inconsistent with iOS behavior.
Modify the sort function to add a deterministic tie-break: when two phrases have
the same length (when b.from.length - a.from.length equals zero), add a
secondary comparison to sort them alphabetically by their from string value.
This ensures consistent replacement order for overlapping entries across
platforms.

In `@apps/desktop/src/main/services/transcription/transcriptionService.ts`:
- Around line 314-320: The sampleRate variable extracted from IPC options is not
validated before being used in the pcmToWavBuffer function call. Add validation
after the sampleRate assignment (where it defaults to TARGET_SAMPLE_RATE if not
provided) to ensure the value is a finite positive number. If validation fails,
throw a typed TranscriptionError instead of allowing invalid values to propagate
to pcmToWavBuffer, which could cause unhandled exceptions during WAV header
writes and bypass proper error handling.
- Around line 251-291: Add a hard timeout mechanism to prevent the whisper child
process from hanging indefinitely. In the Promise returned by this block, set a
timeout using setTimeout after spawning the child process. If the timeout
expires before the child process closes, kill the child process using
child.kill(), remove it from activeChildren, and reject the promise with an
appropriate TranscriptionError. Make sure to clear the timeout in the
child.on("close") handler to prevent it from firing after the process has
already completed successfully.

In `@apps/desktop/src/renderer/services/globalVoiceRecorder.ts`:
- Around line 274-287: The insertText call on target is not guarded by
try-catch, so if it throws an exception, control will jump out and skip the
clipboard fallback write below, causing the transcript to be lost. Wrap the
target.insertText(cleaned) call in its own try-catch block to ensure any
exceptions it throws are caught and do not prevent the clipboard write fallback
from executing. This preserves the intended behavior where the clipboard write
acts as a recovery mechanism.

In `@apps/ios/ADE/Resources/VoiceGlossary.json`:
- Around line 4-110: The contextualTerms array in VoiceGlossary.json currently
contains 105 items, exceeding the file's documented constraint of approximately
100 phrases. Reduce the array to align with this stated limit by removing or
deprioritizing less critical terms. Focus on keeping the most essential and
frequently used contextual terms for voice recognition, particularly those
related to the core ADE functionality and widely-used development tools and
frameworks.

In `@apps/ios/ADE/Services/Dictation/DictationCleanup.swift`:
- Around line 94-96: The line constructing
Character(String(character).uppercased()) in the capitalizeNext block can trap
at runtime when uppercase mapping expands to multiple characters (for example,
"ß" becomes "SS"). Instead of appending a Character initialized with the
uppercased string, directly append the uppercased String value to the result
variable to safely handle multi-character uppercase expansions.

In `@apps/ios/ADE/Services/Dictation/DictationController.swift`:
- Around line 64-71: The DictationController is mirroring audioLevel,
elapsedTime, and isRecording from SpeechDictationService as `@Published`
properties, but isPreparing and isStarting are not mirrored, leaving the UI
unaware of startup and download phases. Add `@Published` private(set) properties
for isPreparing and isStarting in DictationController to mirror these state
values from SpeechDictationService, then update DictationMicButton to include
these mirrored properties in its UI state gating and disable logic so users
receive deterministic preparation feedback during startup.

In `@apps/ios/ADE/Services/Dictation/SpeechDictationService.swift`:
- Around line 487-493: The converter recreation condition in the convert method
only checks if the outputFormat has changed, but during audio route changes the
inputFormat can also change independently. Update the condition that checks
whether to recreate the converter to also validate that the current converter's
inputFormat matches the buffer's inputFormat, in addition to the existing
outputFormat check. This ensures the cached converter is recreated when either
the input or output format changes, preventing stale converters from causing
conversion failures and dropped transcription buffers.

In `@apps/ios/ADE/Views/Work/WorkNewChatSheet.swift`:
- Around line 22-24: The sheet actions can still be triggered while dictation is
active because the `canStartChat` condition does not account for the
`isDictating` state. Modify the `canStartChat` logic to include a check that
prevents the Start action from firing when `isDictating` is true. This will
block sheet dismissal while an active recording is in progress and prevent loss
or redirection of the transcript insertion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0453fa03-a6b0-42e2-b1f4-e7307ccbf85a

📥 Commits

Reviewing files that changed from the base of the PR and between 1603520 and b9006de.

⛔ Files ignored due to path filters (4)
  • apps/ios/ADE.xcodeproj/project.pbxproj is excluded by !**/*.xcodeproj/project.pbxproj
  • docs/features/chat/composer-and-ui.md is excluded by !docs/**
  • docs/features/onboarding-and-settings/README.md is excluded by !docs/**
  • docs/features/sync-and-multi-device/ios-companion.md is excluded by !docs/**
📒 Files selected for processing (53)
  • .gitignore
  • apps/desktop/build/entitlements.mac.inherit.plist
  • apps/desktop/build/entitlements.mac.plist
  • apps/desktop/package.json
  • apps/desktop/resources/voice/voice-glossary.json
  • apps/desktop/resources/whisper/.gitkeep
  • apps/desktop/resources/whisper/README.md
  • apps/desktop/scripts/materialize-whisper-resources.mjs
  • apps/desktop/scripts/validate-whisper-resources.mjs
  • apps/desktop/src/main/main.ts
  • apps/desktop/src/main/services/ipc/registerIpc.ts
  • apps/desktop/src/main/services/transcription/bundledResources.ts
  • apps/desktop/src/main/services/transcription/dictationCleanup.ts
  • apps/desktop/src/main/services/transcription/transcriptionService.test.ts
  • apps/desktop/src/main/services/transcription/transcriptionService.ts
  • apps/desktop/src/preload/global.d.ts
  • apps/desktop/src/preload/preload.ts
  • apps/desktop/src/renderer/components/app/App.tsx
  • apps/desktop/src/renderer/components/app/App.workKeepAlive.test.tsx
  • apps/desktop/src/renderer/components/app/TopBar.tsx
  • apps/desktop/src/renderer/components/chat/AgentChatComposer.tsx
  • apps/desktop/src/renderer/components/chat/VoiceDictationButton.tsx
  • apps/desktop/src/renderer/components/settings/AppearanceSection.tsx
  • apps/desktop/src/renderer/components/settings/DictationSection.tsx
  • apps/desktop/src/renderer/components/voice/GlobalVoiceCaptureIndicator.tsx
  • apps/desktop/src/renderer/components/voice/RecordingPill.tsx
  • apps/desktop/src/renderer/hooks/useVoiceModelInstalled.ts
  • apps/desktop/src/renderer/services/globalVoiceRecorder.ts
  • apps/desktop/src/renderer/state/appStore.ts
  • apps/desktop/src/shared/ipc.ts
  • apps/ios/ADE/App/ADEApp.swift
  • apps/ios/ADE/App/ContentView.swift
  • apps/ios/ADE/Info.plist
  • apps/ios/ADE/Resources/VoiceGlossary.json
  • apps/ios/ADE/Services/Dictation/DictationCleanup.swift
  • apps/ios/ADE/Services/Dictation/DictationController.swift
  • apps/ios/ADE/Services/Dictation/SpeechDictationService.swift
  • apps/ios/ADE/Services/Dictation/VoiceGlossary.swift
  • apps/ios/ADE/Shared/DictationActivityShared.swift
  • apps/ios/ADE/Views/Components/DictationMicButton.swift
  • apps/ios/ADE/Views/Components/GlobalDictationPill.swift
  • apps/ios/ADE/Views/Components/RecordingPill.swift
  • apps/ios/ADE/Views/Settings/ConnectionSettingsView.swift
  • apps/ios/ADE/Views/Settings/SettingsVoiceInputSection.swift
  • apps/ios/ADE/Views/Work/WorkChatComposerAndInputViews.swift
  • apps/ios/ADE/Views/Work/WorkChatSessionView.swift
  • apps/ios/ADE/Views/Work/WorkNewChatScreen.swift
  • apps/ios/ADE/Views/Work/WorkNewChatSheet.swift
  • apps/ios/ADE/Views/Work/WorkPreviews.swift
  • apps/ios/ADE/Views/Work/WorkRootScreen.swift
  • apps/ios/ADETests/ADETests.swift
  • apps/ios/ADEWidgets/ADEWidgetBundle.swift
  • apps/ios/ADEWidgets/DictationLiveActivity.swift

Comment thread apps/desktop/scripts/materialize-whisper-resources.mjs
Comment thread apps/desktop/scripts/materialize-whisper-resources.mjs Outdated
Comment thread apps/desktop/scripts/materialize-whisper-resources.mjs Outdated
Comment thread apps/desktop/scripts/validate-whisper-resources.mjs
Comment thread apps/desktop/src/main/main.ts Outdated
Comment thread apps/ios/ADE/Resources/VoiceGlossary.json
Comment thread apps/ios/ADE/Services/Dictation/DictationCleanup.swift
Comment thread apps/ios/ADE/Services/Dictation/DictationController.swift
Comment thread apps/ios/ADE/Services/Dictation/SpeechDictationService.swift
Comment thread apps/ios/ADE/Views/Work/WorkNewChatSheet.swift

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 016ec16313

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +212 to +214
func cancelRecording() {
guard service.isRecording else { return }
ADEHaptics.warning()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allow canceling a pending dictation start

When first use has to request/download iOS 26 Speech assets, startRecording has already set activeTargetId, but SpeechDictationService.start() does not set isRecording until after ensureModel and audio setup complete. This guard makes cancelRecording() a no-op throughout that isStarting/isPreparing window, so a user who leaves the composer or tries to stop the pending capture can still have recording begin later against their intent. Include the starting/preparing state in the cancellation path or keep the start task cancellable.

Useful? React with 👍 / 👎.

Comment on lines +47 to +50
HStack(spacing: 10) {
DictationIslandTimer(state: ctx.state)
DictationIslandDoneButton(isFinishing: ctx.state.isFinishing)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Wire Cancel into the Live Activity controls

The background/Dynamic Island dictation surface renders only the Done intent even though DictationCancelIntent and cancelFromLiveActivity() are defined. When ADE is backgrounded or locked, users can finish but cannot discard a recording without reopening the app, leaving an unwanted mic session running; add a cancel button/intent to the Live Activity surfaces where cancellation is promised.

Useful? React with 👍 / 👎.

Comment on lines +292 to +298
const raw = error instanceof Error ? error.message : String(error);
this.emitError(
raw.startsWith("model_not_installed:")
? "model_not_installed"
: raw.startsWith("empty_audio:")
? "no_audio"
: "transcribe_failed",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Match typed transcription errors after IPC wrapping

When transcription.transcribe throws from ipcMain.handle, Electron rejects ipcRenderer.invoke with a message prefixed by Error invoking remote method ..., so these startsWith checks miss the typed prefixes emitted by main. In no-audio or model-missing cases the composer reports the generic transcription failure instead of the intended actionable message; unwrap/search the message or return a structured error code.

Useful? React with 👍 / 👎.

@arul28 arul28 force-pushed the ade/wanna-discuss-add-voice-text-46c24910 branch from 016ec16 to f996a3e Compare June 15, 2026 08:38
@arul28

arul28 commented Jun 15, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f996a3ef71

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +23 to +25

const onCancel = useCallback(() => globalVoiceRecorder.cancel(), []);
const onDone = useCallback(() => {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Surface dictation errors when the composer unmounts

When a user starts dictation in a composer, navigates away so VoiceDictationButton unmounts, and then taps Done from this global pill, globalVoiceRecorder.finish() can emit no_audio or transcribe_failed but there is no mounted onError listener because the only listener is registered by the composer button. The pill just disappears on failure, so the recording is lost without any explanation; keep a root/global error listener or surface failures from this control.

Useful? React with 👍 / 👎.

@arul28 arul28 merged commit c3a663a into main Jun 15, 2026
27 checks passed
@arul28 arul28 deleted the ade/wanna-discuss-add-voice-text-46c24910 branch June 15, 2026 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant