feat(telemetry): add OTLP export writer#961
Conversation
199d2e7 to
3a29ac9
Compare
ab7dcf4 to
acadd7e
Compare
3a29ac9 to
eb776a7
Compare
acadd7e to
29c269d
Compare
ba095a9 to
8066586
Compare
5a186aa to
5ee4862
Compare
3d77d32 to
c9c3b74
Compare
5ee4862 to
539131f
Compare
c9c3b74 to
9c5a19d
Compare
539131f to
fe6675e
Compare
9c5a19d to
914025b
Compare
fe6675e to
94ac895
Compare
- Move src/telemetry/export/writers.ts to writers/json.ts and the
matching test to writers/json.test.ts. The writer keeps the same
shape; the doc comment is tightened and the test drops a
redundant atomic-failure assertion already covered by
writeAtomically's own tests.
- Extract parseTelemetryTimestampMs from range.ts's
isTimestampInRange so other export writers can reuse it.
- Add a trivial asyncIterable test helper to test/mocks/.
- Inline the duplicated `vi.mock("node:fs", ...)` /
`vi.mock("node:fs/promises", ...)` factory pair (9 lines per file)
as a 2-line dynamic import in the affected test files.
Upstream-compatible groundwork for the OTLP writer branch (#961);
isolates move + helper-extraction churn so that PR's diff stays
focused on OTLP-specific work.
55cccf2 to
71bcce7
Compare
- Move src/telemetry/export/writers.ts to writers/json.ts and the
matching test to writers/json.test.ts. The writer keeps the same
shape; the doc comment is tightened and the test drops a
redundant atomic-failure assertion already covered by
writeAtomically's own tests.
- Extract parseTelemetryTimestampMs from range.ts's
isTimestampInRange so other export writers can reuse it.
- Add a trivial asyncIterable test helper to test/mocks/.
- Inline the duplicated `vi.mock("node:fs", ...)` /
`vi.mock("node:fs/promises", ...)` factory pair (9 lines per file)
as a 2-line dynamic import in the affected test files.
Upstream-compatible groundwork for the OTLP writer branch (#961);
isolates move + helper-extraction churn so that PR's diff stays
focused on OTLP-specific work.
034f5bc to
5a9edee
Compare
7ce5f22 to
d9f955a
Compare
1671ea2 to
5cbd8ba
Compare
- Move src/telemetry/export/writers.ts to writers/json.ts and the
matching test to writers/json.test.ts. The writer keeps the same
shape; the doc comment is tightened and the test drops a
redundant atomic-failure assertion already covered by
writeAtomically's own tests.
- Extract parseTelemetryTimestampMs from range.ts's
isTimestampInRange so other export writers can reuse it.
- Add a trivial asyncIterable test helper to test/mocks/.
- Inline the duplicated `vi.mock("node:fs", ...)` /
`vi.mock("node:fs/promises", ...)` factory pair (9 lines per file)
as a 2-line dynamic import in the affected test files.
Upstream-compatible groundwork for the OTLP writer branch (#961);
isolates move + helper-extraction churn so that PR's diff stays
focused on OTLP-specific work.
5cbd8ba to
d8a9b52
Compare
883e121 to
d7db126
Compare
3187fae to
4574a38
Compare
|
/coder-agents-review |
|
Review posted | Chat Review historydeep-review v0.5.0 | Round 2 | Last posted: Round 2, 20 findings (1 P1, 1 P2, 13 P3, 5 Nit), APPROVE. Review Finding inventoryFindings
Contested and acknowledged(none) Round logRound 1Panel. 1 P1, 2 P2, 10 P3, 3 Nit. 0 dropped. Reviewed against 87a6831..4574a38. Panel: Bisky, Hisoka, Mafuuu, Pariston, Gon, Leorio, Ging-TS, Chopper, Meruem, Melody, Zoro, Robin, Knov, Kite. Wildcards: Knov, Kite. Ging-TS: no findings. Round 2Churn guard: PROCEED. 17/17 addressed (b90a626). Panel: Bisky, Mafuuu, Meruem, Melody, Kite. Wildcard: Kite. 2 P3, 2 Nit new. Reviewed against 87a6831..b90a626. About deep-reviewCRF = Coder Review Finding (P0-P4, Nit, Note)
|
There was a problem hiding this comment.
Well-structured OTLP/JSON export with clean layering: metric classification, record mapping, streaming envelope writer, zip orchestrator. Cumulative counter accumulation, out-of-order timestamp clamping, zero-counter suppression, and the wrapError diagnostic chain are all solid. The error-context pattern (operation + target + cause) throughout is exemplary.
One P1, two P2, ten P3, three Nits.
The P1 is a producer/consumer name mismatch: the counter classifier checks "count_" but production emits "count.". Four reviewers independently verified this against httpRequestsTelemetry.ts. The entire cumulative sum machinery is dead code for real data. The test fixtures mask it by using synthetic names that happen to match the wrong prefix.
The P2s: (1) the cumulative counter property-based key separation has zero test coverage; you could delete the property serialization and all tests pass; (2) P2 [CRF-14] the PR validation command lists files that don't exist (writers.test.ts) and runs none of the 46 new tests.
As Melody put it after walking the signal routing chain: "Every real http.requests counter will be exported as an OTLP gauge instead of a cumulative sum."
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
All 17 R1 findings addressed in b90a626. Five reviewers verified the fixes; Mafuuu, Meruem, Melody, and Kite found no new issues. Meruem traced error paths, concurrency interleaving, and test layer boundaries with no findings. Melody walked every pairing chain (dispatch, wire format, transport lifecycle, enumeration) and confirmed all sides echo.
Two P3 test gaps and two Nits remain, all from Bisky. The production code is correct; these are test-quality improvements.
As Meruem put it: "The structural design is clean: error paths release resources, cumulative state is bounded and clamped, the streaming protocol is correct, and test coverage matches the abstraction layers."
🤖 This review was automatically generated with Coder Agents.
Addresses R2 review feedback on PR #961: - records: assert the Math.max(0, ...) negative-counter clamp directly (existing Infinity case only exercised the isFinite guard). - records: cover the result="error" + error object combination so a regression on the spread of error.message is caught. - envelope: also assert stream.destroyed on the stream.end failure path (matches the sibling suffix-write test). - metrics: swap %p (not a Vitest format specifier) for %s in it.each titles so boolean rows render.
- Restructure: writers.ts → writers/json.ts; combine OTLP mapping and writer into writers/otlp.ts; lift signal classification into export/signal.ts. - Group all records of each signal under a single Resource block per file (was one Resource per event), matching the OTLP spec's recommended shape and shrinking exports at scale. - Adopt OTLP enums from @opentelemetry/api + api-logs; document the +1 wire offset on SpanKind. - Tighten the public surface: only writeOtlpZipExport and OtlpExportCounts are exported; record mappers are module-private. - Hoist per-event metric attributes once; single-pass partition of http.requests measurements into counts and gauges. - Share parseTelemetryTimestampMs between range.ts and the OTLP writer's nanos conversion (was duplicated with identical error). - Extract asyncIterable test helper to test/mocks/; inline the memfs vi.mock pair across affected test files. - Tests now exercise the public API only, dropping side-effect assertions about staging cleanup and atomic semantics already covered by writeAtomically's own tests.
Wire-format fixes: - Switch http.requests count_* sums from DELTA to CUMULATIVE temporality (Prom/Mimir/Grafana Cloud reject the delta+sum combination). - Maintain per-series running totals across the export, anchored at the first observed window's startTimeUnixNano. - Suppress zero-cumulative counters so routes that never errored don't ship empty data points. Semantic-convention compliance: - Use OTel-standard `service.instance.id` (was `coder.session.id`) and `host.id` (was `coder.machine.id`) for portability with OTel-aware backends. The previous vendor keys were pure aliases. - Add `schemaUrl` to each ResourceLogs/ResourceSpans/ResourceMetrics container and each scope container. - Stamp scope.version with the extension version.
Group OTLP code into src/telemetry/export/writers/otlp/ with one file per concern, and hoist metric classification to src/telemetry/export/metrics.ts so it can be shared by future writers. Layout: - metrics.ts: domain (isMetricEvent, describeMetricEvent, units) - otlp/writer.ts: public API + orchestration (signal routing, packZip) - otlp/envelope.ts: streaming JSON envelope writer + error wrapping - otlp/records.ts: log/span/metric record builders + helpers - otlp/types.ts: OTLP wire-format interfaces (pinned to proto v1.10.0) Safety fixes surfaced by the layer split: - envelope: listen for stream 'error' events so failed opens reject pending writes instead of hanging; close() is idempotent; suffix-write failures during close are labeled as close failures. - writer: on loop failure, use Promise.allSettled to close streams without masking the original error. - records: coerce NaN/Infinity numeric inputs to 0n instead of throwing from BigInt(); clamp startTimeUnixNano <= timeUnixNano for out-of-order events; include eventName in the cumulative-series key. Each layer has its own test file; total grows from 99 to 131 cases.
Apply review findings to the OTLP writer modules and their tests.
writer.ts
- Collapse the triplicate `{logs, traces, metrics}` shape into one
`Record<Signal, Channel>` driven from `ENVELOPES`.
- Extract `appendRecords(channel, records)`; three route branches
collapse from four lines each to one and count mutation lives in
one place.
- Replace try/catch + duplicated close calls with a `succeeded` flag
+ try/finally; the close list builds once.
- Collapse `packZip`'s two try blocks into one.
- Type `zipAsync` properly (was `any` from `promisify(zip)`).
- Drop `Object.fromEntries(map(async))` + `as Record<Signal, Channel>`
cast in `openChannels` in favour of a direct destructured
`Promise.all`.
records.ts
- Move OTLP envelope metadata (`ENVELOPES`, `envelopePrefix`, schema
URL, suffix) here so wire-format knowledge stops leaking into
writer.ts.
- Carry nanosecond timestamps as `bigint` internally; `String()` only
at the JSON boundary. Eliminates repeated `BigInt(string)` reparses.
- Rename `ExportState`/`newExportState`/`cumulativeStart` to
`CumulativeState`/`newCumulativeState`/`anchor`.
- Extract a `MetricContext` so `gaugeMetric(ctx, m)` and
`sumMetric(ctx, m, state)` replace the five-arg `gaugeRecord` and
`cumulativeSum`.
- Convert `spanStatus` if-chain to a switch on
`event.properties.result`.
- Convert `metricRecords` for-loop to `flatMap` with a normalized
0-or-1 record return.
- Have `toNonNegativeBigInt` delegate to `toIntegerBigInt` so the
rounding rule lives in one place.
envelope.ts
- Drop the `.cause` archaeology in `close()`. Collapse `stream.write`
and `stream.end` plumbing into one `awaitOp` helper that handles
the errRef pre-check and post-callback merge once.
- Move error labelling to the three API boundaries (prefix, append,
close) so the verb stays at the call site rather than threading
through parameters. Drop the trailing-underscore `reject_`.
metrics.ts
- Replace the `measurementUnit` if-chain with a lookup table.
errorUtils.ts
- Add a shared `wrapError(verb, target, cause)` helper to replace
six near-identical `Failed to ... ${path}` throws across
envelope.ts and writer.ts.
tests
- Extract a shared `helpers.ts` with `TRACE_ID`, `attrs()`, and a
signal-driven `parseEnvelope(files, signal)` that reads keys from
the production `ENVELOPES` table.
- Drop the duplicate `TRACE_ID`/`attrs` definitions from
writer.test.ts and records.test.ts.
- Move `gauge`/`counter` measurement builders to module scope and
type them as `MetricMeasurement`.
- writer.test.ts derives the file-list assertion from `ENVELOPES`;
resource-consistency test now uses cross-envelope equality + a
`toMatchObject` spot-check, deferring full shape to records.test.ts.
- Align `makeEvent`/`context` in writer.test.ts with records.test.ts
(module-level constants instead of let + beforeEach).
Delete comments that restate code; keep only ones recording a
non-obvious WHY.
- match production "count." prefix in the metric classifier so HTTP request counts export as cumulative sums instead of dimensionless gauges - close opened envelope fds on prefix/suffix write failure and on partial openChannels rejection - collision-proof the cumulative counter series key with JSON.stringify - clamp negative counter values to preserve the monotonic-sum invariant - count records actually written, not events routed - narrow spanRecord to require a traceId at the type level - drop the stale ssh.network.info entry and tighten doc comments - use promisify(zip) to match supportBundleLogs.ts
Addresses R2 review feedback on PR #961: - records: assert the Math.max(0, ...) negative-counter clamp directly (existing Infinity case only exercised the isFinite guard). - records: cover the result="error" + error object combination so a regression on the spread of error.message is caught. - envelope: also assert stream.destroyed on the stream.end failure path (matches the sibling suffix-write test). - metrics: swap %p (not a Vitest format specifier) for %s in it.each titles so boolean rows render.
d8aa467 to
4929d92
Compare
- Stamp coder.event.{extension_version,session_id,deployment_url} on
every log/span/metric record so multi-session and multi-deployment
exports preserve the original producer's identity. The resource block
remains an export-tool snapshot.
- Stage envelopes in os.tmpdir() instead of next to the user's save
path so cloud-sync agents do not ingest the intermediate uncompressed
logs.json/traces.json/metrics.json.
- Replace the in-memory fflate.zip with streaming Zip + ZipDeflate so
multi-GB exports do not hold every envelope plus the zipped output in
the V8 heap at once.
- Forward staging-cleanup failures via
OtlpWriteOptions.onStagingCleanupError instead of letting a Windows
EBUSY in finally mask an otherwise-successful export (which
writeAtomically would then erase via temp-file cleanup).
- Increment metric channel counts per event rather than per non-empty
record batch so metric events with all-zero counters stay in the
user-visible total and JSON/OTLP exports of the same range agree.
- Accept an optional AbortSignal so callers can cancel a long-running
export between events and between files.
4929d92 to
22d73a9
Compare
mtojek
left a comment
There was a problem hiding this comment.
This is looking good! Left minor comments, but would like to have another pair of eyes to review the contribution.
| } | ||
|
|
||
| /** Streams `<prefix>v1,v2,...<suffix>` JSON into `filePath`. */ | ||
| export async function openEnvelopeFile( |
| await writeChunk(suffix); | ||
| await awaitOp((cb) => stream.end(cb)); | ||
| } catch (err) { | ||
| stream.destroy(); |
There was a problem hiding this comment.
just double-check: no way destroy will throw an error?
| /** | ||
| * Local TypeScript shapes for the OTLP/JSON wire format. Upstream TS types | ||
| * live in `@opentelemetry/otlp-transformer`, which pulls in the OTel SDK; we | ||
| * mirror the proto schema instead so the writer stays SDK-free. |
There was a problem hiding this comment.
is it a huge problem to depend on SDK?
| @@ -0,0 +1,128 @@ | |||
| import { afterEach, beforeEach, describe, expect, it, vi } from "vitest"; | |||
There was a problem hiding this comment.
Could we simplify unit tests by relying on golden files? Also, such files are easier to review in terms of schema changes
| count: number; | ||
| } | ||
|
|
||
| const READ_HWM_BYTES = 256 * 1024; |
There was a problem hiding this comment.
nit: add comment HWM - high water mark
Summary
.otlp.zipexport writing with POST-readylogs.json,traces.json, andmetrics.jsonRefs #903.
Stack: 3 / 4. Base: #960. Next: #953.
Review follow-up