feat(ssh): classify SSH connection-failure stderr into actionable banner labels#12917
feat(ssh): classify SSH connection-failure stderr into actionable banner labels#12917david-engelmann wants to merge 2 commits into
Conversation
…t/identity probing (R3.3a foundation) Self-contained data layer that turns opaque ssh client output into structured, user-actionable diagnostics. Built so it can plug into either the diagnostics chip UI (display side) or the auto-reconnect policy (decision side) without re-implementing the classification in either consumer. Three pieces: 1. `ConnectionFailureMode` + `classify_ssh_stderr(stderr) -> Option<Mode>` — pattern matcher for ~18 common ssh failure modes. Each variant carries a `label()`, `suggested_fix()`, and `is_transient()` flag. Pure logic, no IO. Returns `None` for unknown patterns so the UI can fall back to raw text instead of misclassifying. Modes covered: HostnameUnresolved, NetworkUnreachable, PortClosed, ConnectTimeout, SessionTimeout, BrokenPipe, AuthPublicKeyRejected, AuthPasswordRejected, AuthRejected, HostKeyMismatch, HostKeyChanged (separate so the UI can surface the stronger MitM warning), KexClosedByRemote, ProxyJumpFailed, IdentityFileBadPermissions, IdentityFileMissing, SshConfigSyntaxError, TooManyAuthFailures, ConnectionResetByPeer. The classifier deliberately picks specific-over-general: a stderr that contains both `UNPROTECTED PRIVATE KEY FILE` *and* a downstream `Permission denied` classifies as `IdentityFileBadPermissions` (the actionable cause) rather than `AuthPublicKeyRejected` (the symptom). Same for `HostKeyChanged` beating `HostKeyMismatch`. 2. `SshIdentity` + `list_ssh_identities()` — enumerate visible public keys in `~/.ssh` with a private-key-presence flag for each. Deliberately does NOT read the private key (would require passphrase handling); only stat-checks the matching path. UI uses the flag to highlight `id_rsa.pub` orphans where the private key isn't on this machine. 3. `SshAgentStatus` + `ssh_agent_status()` — probe ssh-agent health via `ssh-add -l` and classify as Available / NotConfigured / Stale. Driven through the `SshAddRunner` trait so tests substitute canned outputs; production uses `ProcessSshAddRunner` with a 750ms timeout and a non-blocking poll loop so a wedged socket doesn't stall the diagnostics chip refresh. Cross-platform: - All three pieces compile to wasm. The subprocess-spawning bits (`ProcessSshAddRunner`, `ssh_agent_status`, `run_with_timeout`) are non-wasm only — wasm callers stay with `ssh_agent_status_with` + a stubbed runner if they need the type-level surface. - Uses `command::blocking::Command` (Warp's wrapper that avoids the Windows terminal-flash on subprocess spawn) instead of `std::process::Command`. - Uses `instant::Instant` (wasm-compatible) instead of `std::time::Instant`. Tests: 43 unit tests covering every classifier variant, the specific-over-general precedence rules, identity enumeration edge cases (missing dir, empty dir, orphan pub key, non-pub files ignored, sorted output), and the agent state machine (NotConfigured on missing/empty socket, Available on exit 0 and exit 1 "no identities" path, Stale on other exits and spawn failures). (Fork-only commit; no PR until R3 scope has a maintainer green-light.)
…ner labels
Wires `warp_ssh_diagnostics::classify_ssh_stderr` into the path that
builds the `RemoteServerManagerEvent::SessionConnectionFailed` banner
in the terminal view. The banner previously always said "Failed to
start SSH extension" with the raw stderr in the detail field;
classified failure modes now surface as the banner body itself with
an actionable suggested-fix line in the detail.
Banner labels include:
- "Public-key auth rejected" — for `Permission denied (publickey)`
- "Host key CHANGED — investigate" — for the openssh MitM warning
- "Identity file permissions too open" — for UNPROTECTED PRIVATE KEY
- "Hostname could not be resolved" — for DNS failures
- "SSH port closed" — for `Connection refused`
- ... and 13 more (see `ConnectionFailureMode` taxonomy)
The helper is a pure function in `app/src/terminal/ssh/failure_classification.rs`
that takes the raw error + proxy_stderr from the manager event and
returns a structured `UserFacingError`. Pure code; unit-testable
without a transport setup.
10 tests cover:
- classified-path bodies for auth / network / host-key / identity-permissions modes
- precedence (UNPROTECTED PRIVATE KEY beats downstream auth failure;
HOST IDENTIFICATION CHANGED beats generic mismatch)
- fallback path (unrecognized stderr keeps the pre-existing banner shape)
- empty-error / empty-stderr / `Some("")` edge cases
- UTF-8-safe truncation at MAX_RAW_TAIL_CHARS
When classification doesn't match, the banner falls back to the
original "Failed to start SSH extension" body with the raw error text
as detail -- behavior-preserving for unrecognized failures, additive
for recognized ones.
Addresses (partial fix angle):
- warpdotdev#3317 — SSH does not work with remote server settings
- warpdotdev#1922 — Shell on remote server will not load
- warpdotdev#2931 — Error when running commands on remote servers
|
Every PR must be linked to a same-repo issue before Oz can review it. Next step: open or find a same-repo issue describing this change, then link it to this PR by adding See the contribution guidelines for the full readiness model. Powered by Oz |
There was a problem hiding this comment.
Every PR must be linked to a same-repo issue before Oz can review it.
Next step: open or find a same-repo issue describing this change, then link it to this PR by adding Closes #123 to the PR description (or using the "Development" sidebar on GitHub). A maintainer will mark the issue ready-to-implement when it is ready. Once it is marked, comment /oz-review to re-trigger review.
See the contribution guidelines for the full readiness model.
Powered by Oz
|
Updated PR body to add
|
Closes #3317
Refs #1922, #2931
Summary
When the SSH extension fails to start, the terminal shows a banner that's currently always titled "Failed to start SSH extension" with the raw stderr in the detail field. For the user this is "something broke — read the wall of text and figure it out." This PR splits that single label into structured failure modes so the banner can lead with the actual cause and a suggested fix.
After this PR a permission-denied looks like:
A host-key-changed looks like:
…and 16 more (one per
ConnectionFailureModevariant). When stderr doesn't match any known mode, the banner falls back to the existing "Failed to start SSH extension" copy — so unrecognized failures keep the current behavior verbatim.Issues addressed (partial fix angle)
The classification doesn't fix any of these underlying bugs, but it dramatically narrows down what each one is — which is what reporters keep asking for in the threads. All three issues carry `ready-to-implement`.
Structure
Two commits, separable:
1. `feat(ssh-diagnostics): new crate for SSH failure-mode taxonomy + agent/identity probing` (
8a6f0e4f)Adds a new `crates/warp_ssh_diagnostics` crate:
Pure library code. 43 unit tests. No consumer dependency outside this PR's banner integration. Wasm-gated where it shells out.
2. `feat(ssh): classify SSH connection-failure errors into actionable banner labels` (
d35c51c5)Wires `classify_ssh_stderr` into the path in `app/src/terminal/view.rs` that builds the `RemoteServerManagerEvent::SessionConnectionFailed` banner. The classification helper lives in `app/src/terminal/ssh/failure_classification.rs` as a pure function so it's unit-testable without a transport setup. 10 tests on the consumer side.
Adjacency notes
Test plan
Things explicitly NOT in this PR