probes: BPF uprobe service for entry/exit duration tracking via OTLP logs #3181
Open
gnurizen wants to merge 1 commit into
Add a probes package that:
- parses a YAML config of (symbol, file_match regex) pairs and assigns
  a 1-based spec_id per entry;
- loads an embedded BPF uprobe program (probe.bpf.amd64, built by
  `make probes-bpf`) that emits one ringbuf record per fire carrying
  ktime/pid/tid/comm/spec_id;
- on each newly-observed executable, regex-matches its path and
  attaches an exec.Uprobe per matching spec, encoding the spec_id in
  the uprobe cookie;
- drains the ringbuf in a goroutine and forwards each event as a
  reporter.LogEvent (Body=symbol, attrs=pid/tid/comm/spec_id) via
  reporter.ParcaReporter.ReportLogEvents. The BPF service no longer
  owns the Arrow log stream; that lives in the reporter package now.
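The spec_id assignment and path matching described in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes the YAML has already been decoded into a slice, and the type and function names (`SpecConfig`, `Spec`, `CompileSpecs`, `MatchingSpecs`) and the sample symbols are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// SpecConfig mirrors one decoded config entry: a symbol plus a file_match
// regex. Hypothetical type; the PR's actual field names may differ.
type SpecConfig struct {
	Symbol    string
	FileMatch string
}

// Spec is a compiled entry together with its 1-based spec_id.
type Spec struct {
	ID     uint64
	Symbol string
	Match  *regexp.Regexp
}

// CompileSpecs compiles each regex and assigns spec_id = index + 1, so that
// 0 never denotes a valid spec.
func CompileSpecs(cfg []SpecConfig) ([]Spec, error) {
	specs := make([]Spec, 0, len(cfg))
	for i, c := range cfg {
		re, err := regexp.Compile(c.FileMatch)
		if err != nil {
			return nil, fmt.Errorf("spec %d (%s): %w", i+1, c.Symbol, err)
		}
		specs = append(specs, Spec{ID: uint64(i + 1), Symbol: c.Symbol, Match: re})
	}
	return specs, nil
}

// MatchingSpecs returns the specs whose regex matches the executable path.
func MatchingSpecs(specs []Spec, path string) []Spec {
	var out []Spec
	for _, s := range specs {
		if s.Match.MatchString(path) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	specs, err := CompileSpecs([]SpecConfig{
		{Symbol: "hypothetical_symbol_a", FileMatch: `/node$`},
		{Symbol: "hypothetical_symbol_b", FileMatch: `libfoo\.so`},
	})
	if err != nil {
		panic(err)
	}
	// For a newly observed executable, each matching spec would get one
	// uprobe whose cookie carries the spec_id.
	for _, s := range MatchingSpecs(specs, "/usr/bin/node") {
		fmt.Printf("attach %s with cookie %d\n", s.Symbol, s.ID)
	}
}
```

Keeping the IDs 1-based means a zero cookie can be treated as "no spec" on the consuming side, which is presumably why the commit message calls it out.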
Reporter integration is via two small additions: a ProbesHook interface
(OnExecutable) plus a SetProbes setter on arrowReporter so ReportExecutable
can notify the BPF service to attach to fresh binaries.
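A rough sketch of the hook wiring described above, under stated assumptions: the names `ProbesHook`, `OnExecutable`, `SetProbes` and `ReportExecutable` come from the commit message, but the signatures (a plain path string), the `toyReporter` stand-in for `arrowReporter`, and `recordingHook` are hypothetical. Locking is omitted for brevity.

```go
package main

import "fmt"

// ProbesHook is notified about newly reported executables. The method name
// is from the commit message; the parameter list is an assumption.
type ProbesHook interface {
	OnExecutable(path string)
}

// toyReporter stands in for arrowReporter (hypothetical, heavily reduced).
type toyReporter struct {
	probes ProbesHook // nil when no probe service is configured
}

// SetProbes installs the hook; passing nil disables notifications.
func (r *toyReporter) SetProbes(h ProbesHook) { r.probes = h }

// ReportExecutable does its usual work (elided here) and then notifies the
// hook, if one is set, so it can attach to the fresh binary.
func (r *toyReporter) ReportExecutable(path string) {
	if r.probes != nil {
		r.probes.OnExecutable(path)
	}
}

// recordingHook is a test double that remembers the paths it was given.
type recordingHook struct{ seen []string }

func (h *recordingHook) OnExecutable(path string) { h.seen = append(h.seen, path) }

func main() {
	r := &toyReporter{}
	r.ReportExecutable("/usr/bin/ls") // no hook installed: nothing to notify

	h := &recordingHook{}
	r.SetProbes(h)
	r.ReportExecutable("/usr/bin/node")
	fmt.Println(h.seen)
}
```

An interface plus a setter keeps the reporter free of a direct dependency on the probes package, which fits the "two small additions" framing.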
Wires the existing --probe-config flag through main.go: when set, the
service is started with the parca reporter; offline mode is rejected
since log streaming needs a gRPC conn.
Summary
Adds a generic eBPF uprobe service that attaches paired entry/exit probes to declared symbols and emits per-call duration records through the OTLP logs pipeline introduced in #3190.
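To make the entry/exit pairing concrete, here is a userspace Go model of the logic. It is only a sketch: in the PR this runs in BPF, and the per-thread nesting counter, the type names, and the way the main-thread and minimum-duration filters are applied are assumptions derived from the description, not the actual program.

```go
package main

import "fmt"

// Event is one probe fire. TGID is the process ID, TID the thread ID.
type Event struct {
	KtimeNs uint64
	TGID    uint32
	TID     uint32
	Entry   bool // true for the entry probe, false for the exit probe
}

// Record is emitted once per completed outermost scope.
type Record struct {
	TGID, TID                  uint32
	StartNs, EndNs, DurationNs uint64
}

type scopeState struct {
	depth   int    // current nesting depth on this thread
	startNs uint64 // entry time of the outermost open scope
}

// Tracker pairs entries with exits per thread (hypothetical helper).
type Tracker struct {
	MainThreadOnly bool
	MinDurationNs  uint64
	open           map[uint32]*scopeState
}

func NewTracker(mainOnly bool, minNs uint64) *Tracker {
	return &Tracker{MainThreadOnly: mainOnly, MinDurationNs: minNs, open: map[uint32]*scopeState{}}
}

// Handle consumes one event and reports a Record when an outermost scope
// closes and passes the filters.
func (t *Tracker) Handle(ev Event) (Record, bool) {
	if t.MainThreadOnly && ev.TID != ev.TGID {
		return Record{}, false // not the main thread: drop silently
	}
	st := t.open[ev.TID]
	if ev.Entry {
		if st == nil {
			st = &scopeState{}
			t.open[ev.TID] = st
		}
		if st.depth == 0 {
			st.startNs = ev.KtimeNs
		}
		st.depth++
		return Record{}, false
	}
	if st == nil || st.depth == 0 {
		return Record{}, false // exit without a seen entry, e.g. attached mid-call
	}
	st.depth--
	if st.depth > 0 {
		return Record{}, false // inner scope closed; the outer one is still open
	}
	delete(t.open, ev.TID)
	dur := ev.KtimeNs - st.startNs
	if dur < t.MinDurationNs {
		return Record{}, false
	}
	return Record{TGID: ev.TGID, TID: ev.TID, StartNs: st.startNs, EndNs: ev.KtimeNs, DurationNs: dur}, true
}

func main() {
	tr := NewTracker(true, 1_000_000) // main thread only, at least 1 ms
	tr.Handle(Event{KtimeNs: 1_000, TGID: 100, TID: 100, Entry: true})
	if rec, ok := tr.Handle(Event{KtimeNs: 200_001_000, TGID: 100, TID: 100}); ok {
		fmt.Printf("tid=%d duration_ns=%d\n", rec.TID, rec.DurationNs)
	}
}
```

Computing the duration at the exit probe, rather than shipping two events and pairing them later, is what lets short scopes be filtered out before they cost any ring buffer space.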
The motivating use case is finding long JS execution blocks in Node.js applications. A 200 ms callback hogging the libuv event loop is exactly the "phantom" tail latency that doesn't show up in CPU profiles. The reference config attaches to `node::InternalCallbackScope`'s ctor + dtor, brackets each outer libuv callback, and emits one record per scope close with a precise duration measured in BPF.
What's added
- `probes/` package: `{id, file_match (regex), entry_symbol, exit_symbol, main_thread_only, min_duration_ms}`.
- `tid == tgid` directly in BPF; libuv worker-pool / V8 background threads are silently dropped.
- `log.Record` per completed outer scope via `Logger("parca-agent.probes")`, so consumers can filter on `attributes_scope.name` to slice probe records vs other agent logs.
- `--probe-config <path>` CLI flag (default: disabled). Requires a remote-store.
- `probes/testdata/probes.yaml.sample` with the Node.js `InternalCallbackScope` ctor + dtor pair pre-verified on Node v18.20 / v22.17 / v24.4 (same mangled symbol on all three, exported via `.dynsym` so stripped builds work too).
- `probe.bpf.o` (no `PT_REGS_*` macros used).
- `Makefile` + `.goreleaser.yml` `before:` hook + CI workflow + `probes/bpf/README.md` set up to produce it on every release.
Data on the wire
Each probe-fire record:

| Field | Value |
| --- | --- |
| body | `"node.callback_scope"` (stable; queryable) |
| timestamp | `times.KTime` offset (aligned with CPU sample timestamps; range queries over `[start_ns, end_ns]` against the profile-samples table don't drift) |
| attributes | `start_ns`, `end_ns`, `duration_ns`, `pid`, `tid`, `comm`, `is_main`, `spec_id`, `probe_id`, `level` |
| resource | LoggerProvider: `service.name`, `service.version`, `host.name` |
| scope | `"parca-agent.probes"` |

Test plan
- `make probes-bpf` produces `probes/bpf/probe.bpf.o`
- `go build ./...` clean
- `go test ./probes/... ./reporter/... -count=1`: all pass
- `node blocker.js` doing synthetic 200 ms blocks. Agent attached, fired 142 times over 8 seconds. Duration histogram: 112 sub-ms callbacks (libuv noise), 27 records at 199 ms and 3 at 200 ms (the blocker). Exactly the expected signal.
Stacked on
Was stacked on #3190 (OTel logs support), which has since merged. This branch is now rebased directly on top of main.
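For readers who want to reproduce the kind of duration histogram quoted in the test plan, one way a consumer might bucket `duration_ns` values is sketched below. The helper name `BucketByMs` and the input values are hypothetical; the numbers are synthetic and are not the measurements from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// BucketByMs counts durations per whole-millisecond bucket, rounding down,
// so every sub-millisecond callback lands in bucket 0.
func BucketByMs(durationsNs []uint64) map[uint64]int {
	h := make(map[uint64]int)
	for _, d := range durationsNs {
		h[d/1_000_000]++
	}
	return h
}

func main() {
	// Synthetic duration_ns values for illustration only.
	durations := []uint64{120_000, 450_000, 199_700_000, 199_900_000, 200_100_000}
	h := BucketByMs(durations)

	// Print buckets in ascending order; map iteration order is random.
	keys := make([]uint64, 0, len(h))
	for k := range h {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	for _, k := range keys {
		fmt.Printf("%3d ms: %d\n", k, h[k])
	}
}
```

With floor bucketing, a block measured slightly under 200 ms falls into the 199 ms bucket, which is consistent with a histogram split between 199 ms and 200 ms.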