feat(telemetry): add OTLP export writer by EhabY · Pull Request #961 · coder/vscode-coder

EhabY · 2026-05-17T17:10:39Z

Summary

add OTLP/JSON mapping for telemetry logs, spans, and metric-shaped events
add .otlp.zip export writing with POST-ready logs.json, traces.json, and metrics.json
extend writer tests to cover signal routing and OTLP zip contents

Refs #903.

Stack: 3 / 4. Base: #960. Next: #953.

Review follow-up

exports spans as internal spans instead of unspecified spans
includes window start timestamps for HTTP percentile gauges
closes partially opened OTLP writers when setup fails
drains queued zip writes before closing file handles on error paths

- Move src/telemetry/export/writers.ts to writers/json.ts and the matching test to writers/json.test.ts. The writer keeps the same shape; the doc comment is tightened and the test drops a redundant atomic-failure assertion already covered by writeAtomically's own tests. - Extract parseTelemetryTimestampMs from range.ts's isTimestampInRange so other export writers can reuse it. - Add a trivial asyncIterable test helper to test/mocks/. - Inline the duplicated `vi.mock("node:fs", ...)` / `vi.mock("node:fs/promises", ...)` factory pair (9 lines per file) as a 2-line dynamic import in the affected test files. Upstream-compatible groundwork for the OTLP writer branch (#961); isolates move + helper-extraction churn so that PR's diff stays focused on OTLP-specific work.

EhabY · 2026-05-26T14:04:07Z

/coder-agents-review

coder-agents-review · 2026-05-26T14:04:11Z

Review posted | Chat
Requested: 2026-05-26 15:47 UTC by @EhabY
Spend: $49.01 / $100.00

Review history

R1 (2026-05-26): 14 reviewers, 3 Nit, 1 P1, 2 P2, 11 P3, REQUEST_CHANGES. Review
R2 (2026-05-26): 5 reviewers, 5 Nit, 1 P1, 1 P2, 13 P3, APPROVE. Review

deep-review v0.5.0 | Round 2 | 87a6831..b90a626

Last posted: Round 2, 20 findings (1 P1, 1 P2, 13 P3, 5 Nit), APPROVE. Review

Finding inventory

Findings

#	Sev	Status	Location	Summary	Round	Reviewer	Posted
CRF-1	P3	Author fixed (`b90a626`)	envelope.ts:46	openEnvelopeFile leaks fd when prefix write fails	R1	Netero	Yes
CRF-2	P3	Author fixed (`b90a626`)	writer.ts:113	openChannels does not close already-opened channels on partial open failure	R1	Netero	Yes
CRF-3	P1	Author fixed (`b90a626`)	metrics.ts:64	Counter prefix "count_" does not match production "count."	R1	Hisoka P1, Mafuuu P1, Melody P1, Pariston P1	Yes
CRF-4	P3	Author fixed (`b90a626`)	records.ts:225	Cumulative key serialization collision on pipe/equals in property values	R1	Hisoka P3, Mafuuu P3, Kite P3, Zoro Note	Yes
CRF-5	P3	Author fixed (`b90a626`)	envelope.ts:62	close() leaks fd when suffix write fails (sibling of CRF-1)	R1	Meruem P3, Knov P3	Yes
CRF-6	P3	Author fixed (`b90a626`)	writer.ts:150	appendRecords counts events, not records; misleading when metrics produce 0 OTLP records	R1	Chopper P3, Knov P3, Meruem Note	Yes
CRF-7	P2	Author fixed (`b90a626`)	records.test.ts:254	Cumulative counter property-based key separation is untested	R1	Bisky	Yes
CRF-8	P3	Author fixed (`b90a626`)	records.ts:113	spanRecord accepts events without traceId, produces non-conformant OTLP	R1	Knov	Yes
CRF-9	P3	Author fixed (`b90a626`)	records.ts:240	isMonotonic: true declared but negative values not clamped	R1	Knov	Yes
CRF-10	P3	Author fixed (`b90a626`)	metrics.ts:24	ssh.network.info in METRIC_EVENT_NAMES but never emitted	R1	Melody P3, Hisoka Note, Pariston Note	Yes
CRF-11	P3	Author fixed (`b90a626`)	writer.ts:35	zipAsync manually promisifies fflate.zip; supportBundleLogs.ts already uses promisify(zip)	R1	Robin	Yes
CRF-12	P3	Author fixed (`b90a626`)	writer.test.ts:47	Empty event stream not tested	R1	Bisky	Yes
CRF-13	P3	Author fixed (`b90a626`)	writer.test.ts:63	Routing priority for metric events with traceId not tested	R1	Bisky	Yes
CRF-14	P2	Author fixed (`b90a626`)	PR description	Validation command runs 0 of 46 new tests (wrong file paths)	R1	Leorio	No
CRF-15	Nit	Author fixed (`b90a626`)	metrics.test.ts:16	Format string %p/%s placeholders swapped in test title	R1	Bisky	Yes
CRF-16	Nit	Author fixed (`b90a626`)	records.ts:120	SpanKind.INTERNAL + 1 magic offset; named constant would be clearer	R1	Gon	Yes
CRF-17	Nit	Author fixed (`b90a626`)	metrics.ts:3,38; envelope.ts:5; writer.ts:42	Doc comments restate type/function names	R1	Gon	Yes
CRF-18	P3	Open	records.test.ts:283	Negative counter clamping (CRF-9 fix) untested	R2	Bisky	Yes
CRF-19	P3	Open	records.test.ts:140	Span status table missing combined result:error + error object case	R2	Bisky	Yes
CRF-20	Nit	Open	metrics.test.ts:15	%p is not a Vitest format specifier (incomplete CRF-15 fix)	R2	Bisky	Yes
CRF-21	Nit	Open	envelope.test.ts:104	stream.end failure test missing stream.destroyed assertion	R2	Bisky	Yes

Contested and acknowledged

(none)

Round log

Round 1

Panel. 1 P1, 2 P2, 10 P3, 3 Nit. 0 dropped. Reviewed against 87a6831..4574a38. Panel: Bisky, Hisoka, Mafuuu, Pariston, Gon, Leorio, Ging-TS, Chopper, Meruem, Melody, Zoro, Robin, Knov, Kite. Wildcards: Knov, Kite. Ging-TS: no findings.

Round 2

Churn guard: PROCEED. 17/17 addressed (b90a626). Panel: Bisky, Mafuuu, Meruem, Melody, Kite. Wildcard: Kite. 2 P3, 2 Nit new. Reviewed against 87a6831..b90a626.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer	Focus
Bisky	tests
Chopper	ops/errors
Churn-guard	change verification
Ging	language modernization
Gon	naming
Hisoka	edge cases
Killua	perf
Kite	change integrity
Knov	contracts
Knuckle	SQL
Kurapika	security
Leorio	docs
Luffy	product
Mafu-san	process
Mafuuu	contracts
Melody	dispatch/pairing
Meruem	structural
Nami	frontend
Netero	mechanical checks
Pariston	premise testing
Pen-botter	product gaps
Razor	verification
Robin	duplication
Ryosuke	Go arch
Takumi	concurrency
Zoro	shape

🤖 Managed by Coder Agents.

coder-agents-review

Well-structured OTLP/JSON export with clean layering: metric classification, record mapping, streaming envelope writer, zip orchestrator. Cumulative counter accumulation, out-of-order timestamp clamping, zero-counter suppression, and the wrapError diagnostic chain are all solid. The error-context pattern (operation + target + cause) throughout is exemplary.

One P1, two P2, ten P3, three Nits.

The P1 is a producer/consumer name mismatch: the counter classifier checks "count_" but production emits "count.". Four reviewers independently verified this against httpRequestsTelemetry.ts. The entire cumulative sum machinery is dead code for real data. The test fixtures mask it by using synthetic names that happen to match the wrong prefix.

The P2s: (1) the cumulative counter property-based key separation has zero test coverage; you could delete the property serialization and all tests pass; (2) P2 [CRF-14] the PR validation command lists files that don't exist (writers.test.ts) and runs none of the 46 new tests.

As Melody put it after walking the signal routing chain: "Every real http.requests counter will be exported as an OTLP gauge instead of a cumulative sum."

🤖 This review was automatically generated with Coder Agents.

EhabY · 2026-05-26T15:47:36Z

/coder-agents-review

coder-agents-review

All 17 R1 findings addressed in b90a626. Five reviewers verified the fixes; Mafuuu, Meruem, Melody, and Kite found no new issues. Meruem traced error paths, concurrency interleaving, and test layer boundaries with no findings. Melody walked every pairing chain (dispatch, wire format, transport lifecycle, enumeration) and confirmed all sides echo.

Two P3 test gaps and two Nits remain, all from Bisky. The production code is correct; these are test-quality improvements.

As Meruem put it: "The structural design is clean: error paths release resources, cumulative state is bounded and clamped, the streaming protocol is correct, and test coverage matches the abstraction layers."

🤖 This review was automatically generated with Coder Agents.

Addresses R2 review feedback on PR #961: - records: assert the Math.max(0, ...) negative-counter clamp directly (existing Infinity case only exercised the isFinite guard). - records: cover the result="error" + error object combination so a regression on the spread of error.message is caught. - envelope: also assert stream.destroyed on the stream.end failure path (matches the sibling suffix-write test). - metrics: swap %p (not a Vitest format specifier) for %s in it.each titles so boolean rows render.

- Restructure: writers.ts → writers/json.ts; combine OTLP mapping and writer into writers/otlp.ts; lift signal classification into export/signal.ts. - Group all records of each signal under a single Resource block per file (was one Resource per event), matching the OTLP spec's recommended shape and shrinking exports at scale. - Adopt OTLP enums from @opentelemetry/api + api-logs; document the +1 wire offset on SpanKind. - Tighten the public surface: only writeOtlpZipExport and OtlpExportCounts are exported; record mappers are module-private. - Hoist per-event metric attributes once; single-pass partition of http.requests measurements into counts and gauges. - Share parseTelemetryTimestampMs between range.ts and the OTLP writer's nanos conversion (was duplicated with identical error). - Extract asyncIterable test helper to test/mocks/; inline the memfs vi.mock pair across affected test files. - Tests now exercise the public API only, dropping side-effect assertions about staging cleanup and atomic semantics already covered by writeAtomically's own tests.

Wire-format fixes: - Switch http.requests count_* sums from DELTA to CUMULATIVE temporality (Prom/Mimir/Grafana Cloud reject the delta+sum combination). - Maintain per-series running totals across the export, anchored at the first observed window's startTimeUnixNano. - Suppress zero-cumulative counters so routes that never errored don't ship empty data points. Semantic-convention compliance: - Use OTel-standard `service.instance.id` (was `coder.session.id`) and `host.id` (was `coder.machine.id`) for portability with OTel-aware backends. The previous vendor keys were pure aliases. - Add `schemaUrl` to each ResourceLogs/ResourceSpans/ResourceMetrics container and each scope container. - Stamp scope.version with the extension version.

Group OTLP code into src/telemetry/export/writers/otlp/ with one file per concern, and hoist metric classification to src/telemetry/export/metrics.ts so it can be shared by future writers. Layout: - metrics.ts: domain (isMetricEvent, describeMetricEvent, units) - otlp/writer.ts: public API + orchestration (signal routing, packZip) - otlp/envelope.ts: streaming JSON envelope writer + error wrapping - otlp/records.ts: log/span/metric record builders + helpers - otlp/types.ts: OTLP wire-format interfaces (pinned to proto v1.10.0) Safety fixes surfaced by the layer split: - envelope: listen for stream 'error' events so failed opens reject pending writes instead of hanging; close() is idempotent; suffix-write failures during close are labeled as close failures. - writer: on loop failure, use Promise.allSettled to close streams without masking the original error. - records: coerce NaN/Infinity numeric inputs to 0n instead of throwing from BigInt(); clamp startTimeUnixNano <= timeUnixNano for out-of-order events; include eventName in the cumulative-series key. Each layer has its own test file; total grows from 99 to 131 cases.

Apply review findings to the OTLP writer modules and their tests. writer.ts - Collapse the triplicate `{logs, traces, metrics}` shape into one `Record<Signal, Channel>` driven from `ENVELOPES`. - Extract `appendRecords(channel, records)`; three route branches collapse from four lines each to one and count mutation lives in one place. - Replace try/catch + duplicated close calls with a `succeeded` flag + try/finally; the close list builds once. - Collapse `packZip`'s two try blocks into one. - Type `zipAsync` properly (was `any` from `promisify(zip)`). - Drop `Object.fromEntries(map(async))` + `as Record<Signal, Channel>` cast in `openChannels` in favour of a direct destructured `Promise.all`. records.ts - Move OTLP envelope metadata (`ENVELOPES`, `envelopePrefix`, schema URL, suffix) here so wire-format knowledge stops leaking into writer.ts. - Carry nanosecond timestamps as `bigint` internally; `String()` only at the JSON boundary. Eliminates repeated `BigInt(string)` reparses. - Rename `ExportState`/`newExportState`/`cumulativeStart` to `CumulativeState`/`newCumulativeState`/`anchor`. - Extract a `MetricContext` so `gaugeMetric(ctx, m)` and `sumMetric(ctx, m, state)` replace the five-arg `gaugeRecord` and `cumulativeSum`. - Convert `spanStatus` if-chain to a switch on `event.properties.result`. - Convert `metricRecords` for-loop to `flatMap` with a normalized 0-or-1 record return. - Have `toNonNegativeBigInt` delegate to `toIntegerBigInt` so the rounding rule lives in one place. envelope.ts - Drop the `.cause` archaeology in `close()`. Collapse `stream.write` and `stream.end` plumbing into one `awaitOp` helper that handles the errRef pre-check and post-callback merge once. - Move error labelling to the three API boundaries (prefix, append, close) so the verb stays at the call site rather than threading through parameters. Drop the trailing-underscore `reject_`. metrics.ts - Replace the `measurementUnit` if-chain with a lookup table. errorUtils.ts - Add a shared `wrapError(verb, target, cause)` helper to replace six near-identical `Failed to ... ${path}` throws across envelope.ts and writer.ts. tests - Extract a shared `helpers.ts` with `TRACE_ID`, `attrs()`, and a signal-driven `parseEnvelope(files, signal)` that reads keys from the production `ENVELOPES` table. - Drop the duplicate `TRACE_ID`/`attrs` definitions from writer.test.ts and records.test.ts. - Move `gauge`/`counter` measurement builders to module scope and type them as `MetricMeasurement`. - writer.test.ts derives the file-list assertion from `ENVELOPES`; resource-consistency test now uses cross-envelope equality + a `toMatchObject` spot-check, deferring full shape to records.test.ts. - Align `makeEvent`/`context` in writer.test.ts with records.test.ts (module-level constants instead of let + beforeEach). Delete comments that restate code; keep only ones recording a non-obvious WHY.

- match production "count." prefix in the metric classifier so HTTP request counts export as cumulative sums instead of dimensionless gauges - close opened envelope fds on prefix/suffix write failure and on partial openChannels rejection - collision-proof the cumulative counter series key with JSON.stringify - clamp negative counter values to preserve the monotonic-sum invariant - count records actually written, not events routed - narrow spanRecord to require a traceId at the type level - drop the stale ssh.network.info entry and tighten doc comments - use promisify(zip) to match supportBundleLogs.ts

Addresses R2 review feedback on PR #961: - records: assert the Math.max(0, ...) negative-counter clamp directly (existing Infinity case only exercised the isFinite guard). - records: cover the result="error" + error object combination so a regression on the spread of error.message is caught. - envelope: also assert stream.destroyed on the stream.end failure path (matches the sibling suffix-write test). - metrics: swap %p (not a Vitest format specifier) for %s in it.each titles so boolean rows render.

- Stamp coder.event.{extension_version,session_id,deployment_url} on every log/span/metric record so multi-session and multi-deployment exports preserve the original producer's identity. The resource block remains an export-tool snapshot. - Stage envelopes in os.tmpdir() instead of next to the user's save path so cloud-sync agents do not ingest the intermediate uncompressed logs.json/traces.json/metrics.json. - Replace the in-memory fflate.zip with streaming Zip + ZipDeflate so multi-GB exports do not hold every envelope plus the zipped output in the V8 heap at once. - Forward staging-cleanup failures via OtlpWriteOptions.onStagingCleanupError instead of letting a Windows EBUSY in finally mask an otherwise-successful export (which writeAtomically would then erase via temp-file cleanup). - Increment metric channel counts per event rather than per non-empty record batch so metric events with all-zero counters stay in the user-visible total and JSON/OTLP exports of the same range agree. - Accept an optional AbortSignal so callers can cancel a long-running export between events and between files.

mtojek

This is looking good! Left minor comments, but would like to have another pair of eyes to review the contribution.

mtojek · 2026-05-27T12:36:02Z

+}
+
+/** Streams `<prefix>v1,v2,...<suffix>` JSON into `filePath`. */
+export async function openEnvelopeFile(


nice design :)

mtojek · 2026-05-27T12:36:46Z

+				await writeChunk(suffix);
+				await awaitOp((cb) => stream.end(cb));
+			} catch (err) {
+				stream.destroy();


just double-check: no way destroy will throw an error?

mtojek · 2026-05-27T12:53:53Z

+/**
+ * Local TypeScript shapes for the OTLP/JSON wire format. Upstream TS types
+ * live in `@opentelemetry/otlp-transformer`, which pulls in the OTel SDK; we
+ * mirror the proto schema instead so the writer stays SDK-free.


https://github.com/open-telemetry/opentelemetry-proto/tree/ca839c51f706f5d53bfb46f06c3e90c3af3a52c6/opentelemetry/proto

is it a huge problem to depend on SDK?

mtojek · 2026-05-27T12:56:46Z

@@ -0,0 +1,128 @@
+import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";


Could we simplify unit tests by relying on golden files? Also, such files are easier to review in terms of schema changes

mtojek · 2026-05-27T12:57:36Z

+	count: number;
+}
+
+const READ_HWM_BYTES = 256 * 1024;


nit: add comment HWM - high water mark

EhabY mentioned this pull request May 17, 2026

feat(telemetry): add export command #953

Open

EhabY self-assigned this May 17, 2026

EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 199d2e7 to 3a29ac9 Compare May 17, 2026 19:42

EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from ab7dcf4 to acadd7e Compare May 17, 2026 19:42

EhabY mentioned this pull request May 17, 2026

feat(telemetry): add streaming JSON export writer #960

Merged

EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 3a29ac9 to eb776a7 Compare May 18, 2026 17:03

EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from acadd7e to 29c269d Compare May 18, 2026 17:03

EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch 2 times, most recently from ba095a9 to 8066586 Compare May 19, 2026 10:00

EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch 2 times, most recently from 5a186aa to 5ee4862 Compare May 19, 2026 10:44

EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 3d77d32 to c9c3b74 Compare May 19, 2026 11:02

EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from 5ee4862 to 539131f Compare May 19, 2026 11:02

EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from c9c3b74 to 9c5a19d Compare May 20, 2026 10:28

EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from 539131f to fe6675e Compare May 20, 2026 10:29

EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 9c5a19d to 914025b Compare May 20, 2026 13:49

EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from fe6675e to 94ac895 Compare May 20, 2026 13:50

EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from 55cccf2 to 71bcce7 Compare May 21, 2026 09:55

EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 034f5bc to 5a9edee Compare May 21, 2026 15:13

EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from 7ce5f22 to d9f955a Compare May 21, 2026 15:14

EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch 3 times, most recently from 1671ea2 to 5cbd8ba Compare May 21, 2026 15:51

EhabY force-pushed the feat/issue-903-export-telemetry-json-writer branch from 5cbd8ba to d8a9b52 Compare May 21, 2026 15:54

EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch 3 times, most recently from 883e121 to d7db126 Compare May 26, 2026 13:57

Base automatically changed from feat/issue-903-export-telemetry-json-writer to main May 26, 2026 14:01

EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch 2 times, most recently from 3187fae to 4574a38 Compare May 26, 2026 14:03

coder-agents-review Bot suggested changes May 26, 2026

View reviewed changes

coder-agents-review Bot approved these changes May 26, 2026

View reviewed changes

Comment thread test/unit/telemetry/export/writers/otlp/records.test.ts

Comment thread test/unit/telemetry/export/writers/otlp/records.test.ts

Comment thread test/unit/telemetry/export/metrics.test.ts Outdated

Comment thread test/unit/telemetry/export/writers/otlp/envelope.test.ts Outdated

EhabY added 7 commits May 26, 2026 19:28

feat(telemetry): add OTLP export writer

7e4348a

EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch 5 times, most recently from d8aa467 to 4929d92 Compare May 26, 2026 22:19

EhabY force-pushed the feat/issue-903-export-telemetry-otlp-writer branch from 4929d92 to 22d73a9 Compare May 26, 2026 22:27

mtojek requested review from mafredri and mtojek May 27, 2026 11:32

mtojek reviewed May 27, 2026

View reviewed changes

		@@ -0,0 +1,128 @@
		import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";

Conversation

EhabY commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Review follow-up

Uh oh!

EhabY commented May 26, 2026

Uh oh!

coder-agents-review Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Findings

Contested and acknowledged

Round log

Round 1

Round 2

Uh oh!

coder-agents-review Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

EhabY commented May 26, 2026

Uh oh!

coder-agents-review Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mtojek left a comment

Choose a reason for hiding this comment

Uh oh!

mtojek May 27, 2026

Choose a reason for hiding this comment

Uh oh!

mtojek May 27, 2026

Choose a reason for hiding this comment

Uh oh!

mtojek May 27, 2026

Choose a reason for hiding this comment

Uh oh!

mtojek May 27, 2026

Choose a reason for hiding this comment

Uh oh!

mtojek May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

EhabY commented May 17, 2026 •

edited

Loading

coder-agents-review Bot commented May 26, 2026 •

edited

Loading