feat(ssh): classify SSH connection-failure stderr into actionable banner labels by david-engelmann · Pull Request #12917 · warpdotdev/warp

david-engelmann · 2026-06-22T17:27:13Z

Closes #3317
Refs #1922, #2931

Summary

When the SSH extension fails to start, the terminal shows a banner that's currently always titled "Failed to start SSH extension" with the raw stderr in the detail field. For the user this is "something broke — read the wall of text and figure it out." This PR splits that single label into structured failure modes so the banner can lead with the actual cause and a suggested fix.

After this PR a permission-denied looks like:

Public-key auth rejected
Make sure your key is loaded (ssh-add -l), the host has it in authorized_keys, and your SSH agent is reachable.
Error: <original error>
SSH output: <raw stderr, truncated>

A host-key-changed looks like:

Host key CHANGED — investigate
The server's host key doesn't match what's in known_hosts. Either the host was reinstalled or your connection is being intercepted. Don't bypass this until you've confirmed which.

…and 16 more (one per ConnectionFailureMode variant). When stderr doesn't match any known mode, the banner falls back to the existing "Failed to start SSH extension" copy — so unrecognized failures keep the current behavior verbatim.

Issues addressed (partial fix angle)

SSH does not work with remote server settings #3317 — SSH does not work with remote server settings
Shell on remote server will not load #1922 — Shell on remote server will not load
Error when running commands on remote servers #2931 — Error when running commands on remote servers

The classification doesn't fix any of these underlying bugs, but it dramatically narrows down what each one is — which is what reporters keep asking for in the threads. All three issues carry `ready-to-implement`.

Structure

Two commits, separable:

1. `feat(ssh-diagnostics): new crate for SSH failure-mode taxonomy + agent/identity probing` (8a6f0e4f)

Adds a new `crates/warp_ssh_diagnostics` crate:

`failure_modes` — `ConnectionFailureMode` enum (18 variants), `classify_ssh_stderr(stderr) -> Option`, plus `label()` / `suggested_fix()` / `is_transient()` accessors per mode
`agent` — `ssh_agent_status()` with an `SshAddRunner` trait so unit tests don't shell out
`identities` — enumerate `~/.ssh` for known identity files

Pure library code. 43 unit tests. No consumer dependency outside this PR's banner integration. Wasm-gated where it shells out.

2. `feat(ssh): classify SSH connection-failure errors into actionable banner labels` (d35c51c5)

Wires `classify_ssh_stderr` into the path in `app/src/terminal/view.rs` that builds the `RemoteServerManagerEvent::SessionConnectionFailed` banner. The classification helper lives in `app/src/terminal/ssh/failure_classification.rs` as a pure function so it's unit-testable without a transport setup. 10 tests on the consumer side.

Adjacency notes

Resume/retry cloud runs on transport failures instead of failing them #12431 / Suppress transient "lost connection" banner during auto-recovery #12757 suppress the cloud-agent "lost connection" banner during auto-recovery. That's a different code path (MAA stream, not SSH transport). The `SessionConnectionFailed` event this PR classifies only fires when SSH actually fails to establish — not during a recovery window — so the two changes don't overlap. Happy to gate explicitly if the team prefers.
Add setting to enable reusing user's existing control master #12465 added the `ControlPath` enum (WarpManaged / UserOwned / None). This PR doesn't touch that surface; a follow-up could surface `ControlPath` state alongside the classified mode in the same banner.
Remove TMUX based SSH warpification flow #12478 removed the legacy `SSHTmuxWrapper` flow, leaving the SSH extension as the single warpify path — exactly the path this PR's banner sits on.

Test plan

`cargo test -p warp_ssh_diagnostics --lib` (43 pass)
`cargo test -p warp --lib terminal::ssh::failure_classification` (10 pass)
`cargo check -p warp --bin warp-oss` (clean)
`cargo fmt -p warp_ssh_diagnostics -p warp`
Manual repro on each `ConnectionFailureMode` variant — happy to record a Loom for each major class (auth-rejected, host-key-changed, port-closed) if useful; let me know.

Things explicitly NOT in this PR

No new feature flag — classification is behavior-preserving for unrecognized stderr, additive for recognized stderr. Happy to gate behind a flag if reviewers prefer.
No change to the other two `show_ssh_remote_server_failed_banner` call sites (`BinaryInstallComplete` / `BinaryCheckComplete`); those already go through `error.user_facing_error(SetupStage::...)` which is a separate well-typed surface.
No SSH-hosts UI integration of the diagnostics crate — that's a follow-up PR (R3.5).
No reconnect / session-journal work — that's a separate scope, would link to Can not restore normal session when accident disconnect remote server #3230.

…t/identity probing (R3.3a foundation) Self-contained data layer that turns opaque ssh client output into structured, user-actionable diagnostics. Built so it can plug into either the diagnostics chip UI (display side) or the auto-reconnect policy (decision side) without re-implementing the classification in either consumer. Three pieces: 1. `ConnectionFailureMode` + `classify_ssh_stderr(stderr) -> Option<Mode>` — pattern matcher for ~18 common ssh failure modes. Each variant carries a `label()`, `suggested_fix()`, and `is_transient()` flag. Pure logic, no IO. Returns `None` for unknown patterns so the UI can fall back to raw text instead of misclassifying. Modes covered: HostnameUnresolved, NetworkUnreachable, PortClosed, ConnectTimeout, SessionTimeout, BrokenPipe, AuthPublicKeyRejected, AuthPasswordRejected, AuthRejected, HostKeyMismatch, HostKeyChanged (separate so the UI can surface the stronger MitM warning), KexClosedByRemote, ProxyJumpFailed, IdentityFileBadPermissions, IdentityFileMissing, SshConfigSyntaxError, TooManyAuthFailures, ConnectionResetByPeer. The classifier deliberately picks specific-over-general: a stderr that contains both `UNPROTECTED PRIVATE KEY FILE` *and* a downstream `Permission denied` classifies as `IdentityFileBadPermissions` (the actionable cause) rather than `AuthPublicKeyRejected` (the symptom). Same for `HostKeyChanged` beating `HostKeyMismatch`. 2. `SshIdentity` + `list_ssh_identities()` — enumerate visible public keys in `~/.ssh` with a private-key-presence flag for each. Deliberately does NOT read the private key (would require passphrase handling); only stat-checks the matching path. UI uses the flag to highlight `id_rsa.pub` orphans where the private key isn't on this machine. 3. `SshAgentStatus` + `ssh_agent_status()` — probe ssh-agent health via `ssh-add -l` and classify as Available / NotConfigured / Stale. Driven through the `SshAddRunner` trait so tests substitute canned outputs; production uses `ProcessSshAddRunner` with a 750ms timeout and a non-blocking poll loop so a wedged socket doesn't stall the diagnostics chip refresh. Cross-platform: - All three pieces compile to wasm. The subprocess-spawning bits (`ProcessSshAddRunner`, `ssh_agent_status`, `run_with_timeout`) are non-wasm only — wasm callers stay with `ssh_agent_status_with` + a stubbed runner if they need the type-level surface. - Uses `command::blocking::Command` (Warp's wrapper that avoids the Windows terminal-flash on subprocess spawn) instead of `std::process::Command`. - Uses `instant::Instant` (wasm-compatible) instead of `std::time::Instant`. Tests: 43 unit tests covering every classifier variant, the specific-over-general precedence rules, identity enumeration edge cases (missing dir, empty dir, orphan pub key, non-pub files ignored, sorted output), and the agent state machine (NotConfigured on missing/empty socket, Available on exit 0 and exit 1 "no identities" path, Stale on other exits and spawn failures). (Fork-only commit; no PR until R3 scope has a maintainer green-light.)

…ner labels Wires `warp_ssh_diagnostics::classify_ssh_stderr` into the path that builds the `RemoteServerManagerEvent::SessionConnectionFailed` banner in the terminal view. The banner previously always said "Failed to start SSH extension" with the raw stderr in the detail field; classified failure modes now surface as the banner body itself with an actionable suggested-fix line in the detail. Banner labels include: - "Public-key auth rejected" — for `Permission denied (publickey)` - "Host key CHANGED — investigate" — for the openssh MitM warning - "Identity file permissions too open" — for UNPROTECTED PRIVATE KEY - "Hostname could not be resolved" — for DNS failures - "SSH port closed" — for `Connection refused` - ... and 13 more (see `ConnectionFailureMode` taxonomy) The helper is a pure function in `app/src/terminal/ssh/failure_classification.rs` that takes the raw error + proxy_stderr from the manager event and returns a structured `UserFacingError`. Pure code; unit-testable without a transport setup. 10 tests cover: - classified-path bodies for auth / network / host-key / identity-permissions modes - precedence (UNPROTECTED PRIVATE KEY beats downstream auth failure; HOST IDENTIFICATION CHANGED beats generic mismatch) - fallback path (unrecognized stderr keeps the pre-existing banner shape) - empty-error / empty-stderr / `Some("")` edge cases - UTF-8-safe truncation at MAX_RAW_TAIL_CHARS When classification doesn't match, the banner falls back to the original "Failed to start SSH extension" body with the raw error text as detail -- behavior-preserving for unrecognized failures, additive for recognized ones. Addresses (partial fix angle): - warpdotdev#3317 — SSH does not work with remote server settings - warpdotdev#1922 — Shell on remote server will not load - warpdotdev#2931 — Error when running commands on remote servers

oz-for-oss · 2026-06-22T17:27:22Z

@david-engelmann

Every PR must be linked to a same-repo issue before Oz can review it.

Next step: open or find a same-repo issue describing this change, then link it to this PR by adding Closes #123 to the PR description (or using the "Development" sidebar on GitHub). A maintainer will mark the issue ready-to-implement when it is ready. Once it is marked, comment /oz-review to re-trigger review.

See the contribution guidelines for the full readiness model.

Powered by Oz

oz-for-oss

@david-engelmann

Every PR must be linked to a same-repo issue before Oz can review it.

Next step: open or find a same-repo issue describing this change, then link it to this PR by adding Closes #123 to the PR description (or using the "Development" sidebar on GitHub). A maintainer will mark the issue ready-to-implement when it is ready. Once it is marked, comment /oz-review to re-trigger review.

See the contribution guidelines for the full readiness model.

Powered by Oz

david-engelmann · 2026-06-22T20:54:33Z

Updated PR body to add Closes #3317 at the top (and Refs #1922, #2931 for the other related issues — each carries ready-to-implement, and this PR is partial-fix shape for the broader "SSH does not work" report in #3317). Happy to switch the linkage to a different issue or split into per-issue PRs if the team prefers; #3317 felt closest to the user-facing problem the classification surface solves.

/oz-review

david-engelmann added 2 commits June 22, 2026 12:46

cla-bot Bot added the cla-signed label Jun 22, 2026

github-actions Bot added the external-contributor Indicates that a PR has been opened by someone outside the Warp team. label Jun 22, 2026

oz-for-oss Bot requested changes Jun 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ssh): classify SSH connection-failure stderr into actionable banner labels#12917

feat(ssh): classify SSH connection-failure stderr into actionable banner labels#12917
david-engelmann wants to merge 2 commits into
warpdotdev:masterfrom
david-engelmann:david/ssh-failure-classification

david-engelmann commented Jun 22, 2026 •

edited

Loading

Uh oh!

oz-for-oss Bot commented Jun 22, 2026

Uh oh!

oz-for-oss Bot left a comment

Uh oh!

david-engelmann commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

david-engelmann commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Issues addressed (partial fix angle)

Structure

Adjacency notes

Test plan

Things explicitly NOT in this PR

Uh oh!

oz-for-oss Bot commented Jun 22, 2026

Uh oh!

oz-for-oss Bot left a comment

Choose a reason for hiding this comment

Uh oh!

david-engelmann commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

david-engelmann commented Jun 22, 2026 •

edited

Loading