Wanna Discuss Add Voice Text#578
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
|
Warning Review limit reached
More reviews will be available in 8 minutes and 26 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (4)
📒 Files selected for processing (55)
📝 WalkthroughWalkthroughAdds end-to-end voice dictation to both the Electron desktop app and the iOS app. On desktop, a Node.js script materializes whisper.cpp binaries and a base English model, a main-process transcription service encodes PCM to WAV and spawns whisper-cli, and three new IPC channels expose transcription, status, and mic-access to the renderer. On iOS, a new ChangesDesktop Voice Dictation
iOS Voice Dictation
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Suggested labels
✨ Finishing Touches🧪 Generate unit tests (beta)
|
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
|
@copilot review but do not make fixes |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 696f9fc0a0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| didYouKnowEnabled: rootState.didYouKnowEnabled, | ||
| launchPromptClipboardEnabled: rootState.launchPromptClipboardEnabled, | ||
| launchPromptClipboardNoticeEnabled: rootState.launchPromptClipboardNoticeEnabled, | ||
| voiceInputEnabled: rootState.voiceInputEnabled, |
There was a problem hiding this comment.
Keep voice preference synced into project stores
When settings or another project surface toggles voice input while a project store already exists, this initial copy stays stale because ProjectTabHost only re-hydrates open project stores from its rootPrefs selector, and that selector does not include voiceInputEnabled. As a result, the new Settings checkbox and chat composers inside the active project can continue reading the old value until the project surface is recreated/reloaded, so disabling voice input may leave the mic visible/enabled (or enabling it may not show the mic). Add this preference to the root-pref hydration path or read it from the root store like the dictation session state.
Useful? React with 👍 / 👎.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b9006de324
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if converter == nil || converter?.outputFormat != format { | ||
| converter = AVAudioConverter(from: inputFormat, to: format) | ||
| converter?.primeMethod = .none |
There was a problem hiding this comment.
Recreate the converter when the mic format changes
If a user records once, then changes audio route or sample rate before the next recording (for example switching to AirPods/Bluetooth), this cached converter is reused as long as the analyzer output format is unchanged. The converter was created for the old inputFormat, so subsequent buffers with a different format can fail conversion and get dropped by try?, leaving the analyzer with little or no audio. Include the input format in the cache check or reset the converter between sessions.
Useful? React with 👍 / 👎.
|
@codex review |
There was a problem hiding this comment.
Actionable comments posted: 17
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/desktop/scripts/materialize-whisper-resources.mjs`:
- Around line 48-55: The whisperBinarySpecForHost function currently selects the
whisper-cli binary based on the host architecture, which can produce an
arm64-only binary even when building universal macOS artifacts, causing exec
format errors on Intel Macs. Modify the function to detect when building for
universal macOS (check for an environment variable or build context that
indicates universal build mode) and in that case, use a universal binary target
(such as darwin-universal) instead of the host architecture-based target. This
ensures that the universal build gets a compatible binary regardless of which
architecture the build runs on.
- Around line 141-142: The console.log statements are logging full URLs that may
contain sensitive query tokens from signed URLs, risking secret exposure in CI
logs. Remove or redact the MODEL_URL parameter from the log messages to prevent
secrets from being exposed. Keep the MODEL_BASENAME and other descriptive
information visible, but sanitize the URL by either omitting the query
parameters, replacing the URL with a redacted placeholder, or extracting only
the base path without query strings. Apply this same fix to both logging
locations mentioned in the comment (the downloadFile call and the additional
location at lines 270-271).
- Around line 204-210: The code is currently falling back to cloning the default
branch when the pinned WHISPER_SRC_REF cannot be cloned, which breaks
reproducibility. Instead of the fallback behavior, remove the spawnStep call
that clones WHISPER_SRC_REPO without the pinned ref and replace it with a throw
statement or process.exit call after the warning log. This ensures the script
fails if the pinned ref cannot be cloned, maintaining reproducibility of release
artifacts.
In `@apps/desktop/scripts/validate-whisper-resources.mjs`:
- Around line 50-52: The statFile call for the voice glossary only validates
file existence but not JSON structure. After the statFile check for the glossary
file path (voice-glossary.json in voiceRoot), add additional validation to read
and parse the JSON content to ensure it is well-formed. If the JSON parsing
fails or the structure is invalid, throw an error so the build fails early
rather than allowing malformed JSON to pass through and degrade behavior at
runtime.
In `@apps/desktop/src/main/main.ts`:
- Around line 5933-5936: The sharedTranscriptionService disposal code currently
lives only in the will-quit handler, but when the app is not ready during
shutdown, finalizeAppExit calls process.exit() directly, which bypasses the
will-quit event and leaves the service undisposed. Move the
sharedTranscriptionService?.dispose() and sharedTranscriptionService = null
statements from the will-quit handler to the runImmediateProcessCleanup function
so the disposal runs before both app.exit() (in the ready path) and
process.exit() (in the non-ready path) calls, ensuring the transcription service
and its child processes are properly cleaned up regardless of app readiness
state.
In `@apps/desktop/src/main/services/ipc/registerIpc.ts`:
- Around line 6198-6222: The IPC.transcriptionTranscribe handler returns
transcription results containing raw and cleaned text that can be exposed in
main-process logs through the global IPC wrapper's result logging. You need to
add redaction for this channel to prevent sensitive dictated text from appearing
in logs. First, add result redaction for the TranscriptionResult return value
(which contains the raw and cleaned transcript text) to mask the transcribed
content. Second, explicitly redact the PCM audio argument in the handler to
prevent raw audio data from being logged during IPC tracing. This redaction
should be applied at the IPC wrapper level where results and arguments are
logged for slow or verbose calls, not just at the handler implementation itself.
- Around line 6210-6222: The code in the PCM buffer handling logic accepts
renderer-controlled input without sufficient validation before constructing
typed arrays and forwarding to the transcribe service. Add validation checks
before creating the typed arrays: validate that the format parameter is one of
the expected values (such as "float32" or "int16"), verify that the sampleRate
is a finite positive number, check that the buffer byte alignment is correct for
the specified format, and enforce reasonable limits on the total audio duration
or buffer size to prevent resource exhaustion. Perform these validations after
checking for an empty buffer but before constructing the Float32Array or
Int16Array from the buffer, throwing descriptive errors for any validation
failures.
In `@apps/desktop/src/main/services/transcription/dictationCleanup.ts`:
- Around line 184-187: In the catch block that handles read/parse failures
(lines 184-187), remove the lines that cache the fallback EMPTY_GLOSSARY and its
path (the assignments to cachedGlossary and cachedGlossaryPath). Instead, only
return EMPTY_GLOSSARY without caching it, so that transient read/parse errors
allow retry attempts on subsequent calls rather than permanently disabling
glossary cleanup until process restart.
- Line 67: The sort comparator in the dictationCleanup.ts file at the sorting of
replacement phrases currently only sorts by length, which makes the order of
equal-length phrases non-deterministic and inconsistent with iOS behavior.
Modify the sort function to add a deterministic tie-break: when two phrases have
the same length (when b.from.length - a.from.length equals zero), add a
secondary comparison to sort them alphabetically by their from string value.
This ensures consistent replacement order for overlapping entries across
platforms.
In `@apps/desktop/src/main/services/transcription/transcriptionService.ts`:
- Around line 314-320: The sampleRate variable extracted from IPC options is not
validated before being used in the pcmToWavBuffer function call. Add validation
after the sampleRate assignment (where it defaults to TARGET_SAMPLE_RATE if not
provided) to ensure the value is a finite positive number. If validation fails,
throw a typed TranscriptionError instead of allowing invalid values to propagate
to pcmToWavBuffer, which could cause unhandled exceptions during WAV header
writes and bypass proper error handling.
- Around line 251-291: Add a hard timeout mechanism to prevent the whisper child
process from hanging indefinitely. In the Promise returned by this block, set a
timeout using setTimeout after spawning the child process. If the timeout
expires before the child process closes, kill the child process using
child.kill(), remove it from activeChildren, and reject the promise with an
appropriate TranscriptionError. Make sure to clear the timeout in the
child.on("close") handler to prevent it from firing after the process has
already completed successfully.
In `@apps/desktop/src/renderer/services/globalVoiceRecorder.ts`:
- Around line 274-287: The insertText call on target is not guarded by
try-catch, so if it throws an exception, control will jump out and skip the
clipboard fallback write below, causing the transcript to be lost. Wrap the
target.insertText(cleaned) call in its own try-catch block to ensure any
exceptions it throws are caught and do not prevent the clipboard write fallback
from executing. This preserves the intended behavior where the clipboard write
acts as a recovery mechanism.
In `@apps/ios/ADE/Resources/VoiceGlossary.json`:
- Around line 4-110: The contextualTerms array in VoiceGlossary.json currently
contains 105 items, exceeding the file's documented constraint of approximately
100 phrases. Reduce the array to align with this stated limit by removing or
deprioritizing less critical terms. Focus on keeping the most essential and
frequently used contextual terms for voice recognition, particularly those
related to the core ADE functionality and widely-used development tools and
frameworks.
In `@apps/ios/ADE/Services/Dictation/DictationCleanup.swift`:
- Around line 94-96: The line constructing
Character(String(character).uppercased()) in the capitalizeNext block can trap
at runtime when uppercase mapping expands to multiple characters (for example,
"ß" becomes "SS"). Instead of appending a Character initialized with the
uppercased string, directly append the uppercased String value to the result
variable to safely handle multi-character uppercase expansions.
In `@apps/ios/ADE/Services/Dictation/DictationController.swift`:
- Around line 64-71: The DictationController is mirroring audioLevel,
elapsedTime, and isRecording from SpeechDictationService as `@Published`
properties, but isPreparing and isStarting are not mirrored, leaving the UI
unaware of startup and download phases. Add `@Published` private(set) properties
for isPreparing and isStarting in DictationController to mirror these state
values from SpeechDictationService, then update DictationMicButton to include
these mirrored properties in its UI state gating and disable logic so users
receive deterministic preparation feedback during startup.
In `@apps/ios/ADE/Services/Dictation/SpeechDictationService.swift`:
- Around line 487-493: The converter recreation condition in the convert method
only checks if the outputFormat has changed, but during audio route changes the
inputFormat can also change independently. Update the condition that checks
whether to recreate the converter to also validate that the current converter's
inputFormat matches the buffer's inputFormat, in addition to the existing
outputFormat check. This ensures the cached converter is recreated when either
the input or output format changes, preventing stale converters from causing
conversion failures and dropped transcription buffers.
In `@apps/ios/ADE/Views/Work/WorkNewChatSheet.swift`:
- Around line 22-24: The sheet actions can still be triggered while dictation is
active because the `canStartChat` condition does not account for the
`isDictating` state. Modify the `canStartChat` logic to include a check that
prevents the Start action from firing when `isDictating` is true. This will
block sheet dismissal while an active recording is in progress and prevent loss
or redirection of the transcript insertion.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0453fa03-a6b0-42e2-b1f4-e7307ccbf85a
⛔ Files ignored due to path filters (4)
apps/ios/ADE.xcodeproj/project.pbxprojis excluded by!**/*.xcodeproj/project.pbxprojdocs/features/chat/composer-and-ui.mdis excluded by!docs/**docs/features/onboarding-and-settings/README.mdis excluded by!docs/**docs/features/sync-and-multi-device/ios-companion.mdis excluded by!docs/**
📒 Files selected for processing (53)
.gitignoreapps/desktop/build/entitlements.mac.inherit.plistapps/desktop/build/entitlements.mac.plistapps/desktop/package.jsonapps/desktop/resources/voice/voice-glossary.jsonapps/desktop/resources/whisper/.gitkeepapps/desktop/resources/whisper/README.mdapps/desktop/scripts/materialize-whisper-resources.mjsapps/desktop/scripts/validate-whisper-resources.mjsapps/desktop/src/main/main.tsapps/desktop/src/main/services/ipc/registerIpc.tsapps/desktop/src/main/services/transcription/bundledResources.tsapps/desktop/src/main/services/transcription/dictationCleanup.tsapps/desktop/src/main/services/transcription/transcriptionService.test.tsapps/desktop/src/main/services/transcription/transcriptionService.tsapps/desktop/src/preload/global.d.tsapps/desktop/src/preload/preload.tsapps/desktop/src/renderer/components/app/App.tsxapps/desktop/src/renderer/components/app/App.workKeepAlive.test.tsxapps/desktop/src/renderer/components/app/TopBar.tsxapps/desktop/src/renderer/components/chat/AgentChatComposer.tsxapps/desktop/src/renderer/components/chat/VoiceDictationButton.tsxapps/desktop/src/renderer/components/settings/AppearanceSection.tsxapps/desktop/src/renderer/components/settings/DictationSection.tsxapps/desktop/src/renderer/components/voice/GlobalVoiceCaptureIndicator.tsxapps/desktop/src/renderer/components/voice/RecordingPill.tsxapps/desktop/src/renderer/hooks/useVoiceModelInstalled.tsapps/desktop/src/renderer/services/globalVoiceRecorder.tsapps/desktop/src/renderer/state/appStore.tsapps/desktop/src/shared/ipc.tsapps/ios/ADE/App/ADEApp.swiftapps/ios/ADE/App/ContentView.swiftapps/ios/ADE/Info.plistapps/ios/ADE/Resources/VoiceGlossary.jsonapps/ios/ADE/Services/Dictation/DictationCleanup.swiftapps/ios/ADE/Services/Dictation/DictationController.swiftapps/ios/ADE/Services/Dictation/SpeechDictationService.swiftapps/ios/ADE/Services/Dictation/VoiceGlossary.swiftapps/ios/ADE/Shared/DictationActivityShared.swiftapps/ios/ADE/Views/Components/DictationMicButton.swiftapps/ios/ADE/Views/Components/GlobalDictationPill.swiftapps/ios/ADE/Views/Components/RecordingPill.swiftapps/ios/ADE/Views/Settings/ConnectionSettingsView.swiftapps/ios/ADE/Views/Settings/SettingsVoiceInputSection.swiftapps/ios/ADE/Views/Work/WorkChatComposerAndInputViews.swiftapps/ios/ADE/Views/Work/WorkChatSessionView.swiftapps/ios/ADE/Views/Work/WorkNewChatScreen.swiftapps/ios/ADE/Views/Work/WorkNewChatSheet.swiftapps/ios/ADE/Views/Work/WorkPreviews.swiftapps/ios/ADE/Views/Work/WorkRootScreen.swiftapps/ios/ADETests/ADETests.swiftapps/ios/ADEWidgets/ADEWidgetBundle.swiftapps/ios/ADEWidgets/DictationLiveActivity.swift
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 016ec16313
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| func cancelRecording() { | ||
| guard service.isRecording else { return } | ||
| ADEHaptics.warning() |
There was a problem hiding this comment.
Allow canceling a pending dictation start
When first use has to request/download iOS 26 Speech assets, startRecording has already set activeTargetId, but SpeechDictationService.start() does not set isRecording until after ensureModel and audio setup complete. This guard makes cancelRecording() a no-op throughout that isStarting/isPreparing window, so a user who leaves the composer or tries to stop the pending capture can still have recording begin later against their intent. Include the starting/preparing state in the cancellation path or keep the start task cancellable.
Useful? React with 👍 / 👎.
| HStack(spacing: 10) { | ||
| DictationIslandTimer(state: ctx.state) | ||
| DictationIslandDoneButton(isFinishing: ctx.state.isFinishing) | ||
| } |
There was a problem hiding this comment.
Wire Cancel into the Live Activity controls
The background/Dynamic Island dictation surface renders only the Done intent even though DictationCancelIntent and cancelFromLiveActivity() are defined. When ADE is backgrounded or locked, users can finish but cannot discard a recording without reopening the app, leaving an unwanted mic session running; add a cancel button/intent to the Live Activity surfaces where cancellation is promised.
Useful? React with 👍 / 👎.
| const raw = error instanceof Error ? error.message : String(error); | ||
| this.emitError( | ||
| raw.startsWith("model_not_installed:") | ||
| ? "model_not_installed" | ||
| : raw.startsWith("empty_audio:") | ||
| ? "no_audio" | ||
| : "transcribe_failed", |
There was a problem hiding this comment.
Match typed transcription errors after IPC wrapping
When transcription.transcribe throws from ipcMain.handle, Electron rejects ipcRenderer.invoke with a message prefixed by Error invoking remote method ..., so these startsWith checks miss the typed prefixes emitted by main. In no-audio or model-missing cases the composer reports the generic transcription failure instead of the intended actionable message; unwrap/search the message or return a structured error code.
Useful? React with 👍 / 👎.
016ec16 to
f996a3e
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f996a3ef71
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| const onCancel = useCallback(() => globalVoiceRecorder.cancel(), []); | ||
| const onDone = useCallback(() => { |
There was a problem hiding this comment.
Surface dictation errors when the composer unmounts
When a user starts dictation in a composer, navigates away so VoiceDictationButton unmounts, and then taps Done from this global pill, globalVoiceRecorder.finish() can emit no_audio or transcribe_failed but there is no mounted onError listener because the only listener is registered by the composer button. The pill just disappears on failure, so the recording is lost without any explanation; keep a root/global error listener or surface failures from this control.
Useful? React with 👍 / 👎.
Summary
Describe the change.
What Changed
Key files and behaviors.
Validation
How you tested.
Risks
Anything to watch.
Summary by CodeRabbit
Greptile Summary
This PR adds on-device voice dictation to both the desktop (Electron/whisper.cpp) and iOS (iOS 26 SpeechAnalyzer) chat composers, with a shared deterministic cleanup pipeline, a Dynamic Island Live Activity, and a voice glossary for domain-term recognition.
GlobalVoiceRecordersingleton captures 16 kHz mono PCM via Web Audio, ships it to a main-processTranscriptionServicethat shells out to a bundled whisper.cpp binary, then runs a deterministic cleanup pass (filler removal → corrections → capitalization → spacing) before inserting into the registered composer and copying to clipboard as a recovery net.SpeechDictationServicestreams buffers into aSpeechAnalyzer/SpeechTranscriberpipeline; aDictationControllersingleton lifts recording above individual composers so it survives navigation, drives a Live Activity Dynamic Island, and routes the cleaned transcript to whichever composer is visible.voice-glossary.json(corrections + contextual terms + fillers) is bundled on both platforms; the deterministic cleanup algorithms on both sides now agree oncapitalizeSentencesbehavior including non-letter leading characters.Confidence Score: 5/5
Safe to merge — all previously flagged correctness issues are resolved; remaining findings are documentation and tooling quality concerns only.
Large feature addition with well-structured platform isolation; core recording lifecycles (generation guards, teardown paths, single-flight transcription queue) are correctly implemented on both platforms. The generation-based cancel guard in GlobalVoiceRecorder correctly handles all cancel-during-getUserMedia race windows. The iOS SpeechDictationService isStarting guard, defer teardown, and double-safe cancelIOS26 calls are all sound. The cleanup pipelines on both platforms now agree. The three flagged items are documentation and tooling quality concerns, not correctness bugs in the shipped feature.
apps/desktop/src/renderer/services/globalVoiceRecorder.ts (inaccurate IPC comment), apps/ios/ADE/Services/Dictation/SpeechDictationService.swift (locale fallback comment vs. implementation), apps/desktop/scripts/materialize-whisper-resources.mjs (build-step timeout)
Important Files Changed
Sequence Diagram
sequenceDiagram participant UI as Composer UI participant GVR as GlobalVoiceRecorder participant Store as Root App Store participant IPC as Preload IPC participant TS as TranscriptionService (main) participant Whisper as whisper.cpp UI->>GVR: start() GVR->>Store: setDictationPhase("recording") [optimistic] GVR->>IPC: requestMicAccess() IPC-->>GVR: "{status: "granted"}" GVR->>GVR: getUserMedia() + build AudioGraph note over GVR: ScriptProcessorNode collects Float32 chunks UI->>GVR: finish() GVR->>Store: setDictationPhase("transcribing") GVR->>GVR: downsampleToInt16(merged, srcRate) GVR->>IPC: "transcribe(pcm.buffer, {format:"int16"})" IPC->>TS: ipcMain.handle(transcriptionTranscribe) TS->>TS: pcmToWavBuffer() → write tmp WAV TS->>Whisper: spawn whisper-cli -m model -f wav -oj -np Whisper-->>TS: WAV.json sidecar TS->>TS: parseWhisperJson() + cleanTranscript() TS-->>IPC: "{raw, cleaned}" IPC-->>GVR: TranscriptionResult GVR->>Store: activeDictationTarget.insertText(cleaned) GVR->>Store: resetDictationSession() → "idle"Reviews (5): Last reviewed commit: "ship: iteration 4 - address dictation re..." | Re-trigger Greptile