Add native GET /internal/metrics#188
Conversation
Expose the same operational data as /internal/status in Prometheus text format so scrapers can ingest Olla metrics without an external wrapper.
…lity. Export /internal/stats/models data alongside status metrics, keep the status label only on olla_endpoint_up, and document the built-in scrape endpoint.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (4)
WalkthroughAdds ChangesPrometheus metrics endpoint
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
internal/app/handlers/handler_metrics.go (1)
214-225: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy liftSame interleaving issue in the per-endpoint model loop.
This nested loop emits
olla_model_endpoint_requests_total,_success_rate_percent,_avg_latency_ms, and_consecutive_errorsinterleaved per (model, endpoint) pair, again breaking the single-uninterrupted-group requirement of the text format. Group each family's samples together as with the endpoint metrics above.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/app/handlers/handler_metrics.go` around lines 214 - 225, The nested model endpoint metrics loop in handler_metrics.go is interleaving different metric families per (model, endpoint) pair, which breaks the required grouping in the text format. Update the modelEndpointStats section so writePrometheusLabeledGauge emits all olla_model_endpoint_requests_total samples first, then all olla_model_endpoint_success_rate_percent, then all olla_model_endpoint_avg_latency_ms, and finally all olla_model_endpoint_consecutive_errors, using the existing modelName and epName values in the loop.
🧹 Nitpick comments (1)
internal/app/handlers/handler_metrics_test.go (1)
78-88: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winTests do not validate metric-family grouping.
The
assert.Containschecks pass regardless of line ordering, so they would not catch the interleaved-family issue flagged inhandler_metrics.go. Consider adding an assertion that, for a multi-endpoint exposition, all samples of a family are contiguous (or parse the body with a Prometheus text parser) to lock in spec compliance.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/app/handlers/handler_metrics_test.go` around lines 78 - 88, The current assertions in handler_metrics_test only check for substring presence and won’t catch interleaved metric families. Update the test around the handler response body check to verify family contiguity for a multi-endpoint exposition, either by asserting the ordering/grouping of samples or by parsing the output with a Prometheus text parser. Use the existing test case in handler_metrics_test and the handler_metrics.go output as the reference point so the spec-compliance regression is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/content/configuration/practices/monitoring.md`:
- Line 16: Broaden the summary for the /internal/metrics endpoint so it reflects
all exported data, not just that it mirrors /internal/status. Update the
monitoring docs entry to mention the additional model series coming from
/internal/stats/models, using the existing /internal/metrics description as the
reference point.
In `@internal/app/handlers/handler_metrics.go`:
- Around line 120-156: The metrics output in the handler loop is interleaving
different `olla_endpoint_*` families by iterating per endpoint in
`collectMetrics`/`writePrometheusLabeledGauge`, which breaks Prometheus
text-format grouping. Refactor the export logic so each metric family is emitted
in one contiguous block for all endpoints before moving to the next family,
keeping HELP/TYPE and all samples for a family together. Use the existing `all`,
`statsMap`, `modelMap`, and `connectionStats` data, but change the loop
structure to be family-oriented rather than endpoint-oriented.
In `@internal/app/handlers/handler_status.go`:
- Around line 102-123: The nil-handling for statsCollector is inconsistent
between gatherStatusSnapshot and metricsHandler, and gatherStatusSnapshot can
panic before the later guard runs. Update gatherStatusSnapshot in Application to
defensively check a.statsCollector before calling GetEndpointStats,
GetProxyStats, GetSecurityStats, and GetConnectionStats, matching the pattern
used in metricsHandler. If the collector is nil, return a statusSnapshot with
safe empty/default stats instead of dereferencing it.
---
Duplicate comments:
In `@internal/app/handlers/handler_metrics.go`:
- Around line 214-225: The nested model endpoint metrics loop in
handler_metrics.go is interleaving different metric families per (model,
endpoint) pair, which breaks the required grouping in the text format. Update
the modelEndpointStats section so writePrometheusLabeledGauge emits all
olla_model_endpoint_requests_total samples first, then all
olla_model_endpoint_success_rate_percent, then all
olla_model_endpoint_avg_latency_ms, and finally all
olla_model_endpoint_consecutive_errors, using the existing modelName and epName
values in the loop.
---
Nitpick comments:
In `@internal/app/handlers/handler_metrics_test.go`:
- Around line 78-88: The current assertions in handler_metrics_test only check
for substring presence and won’t catch interleaved metric families. Update the
test around the handler response body check to verify family contiguity for a
multi-endpoint exposition, either by asserting the ordering/grouping of samples
or by parsing the output with a Prometheus text parser. Use the existing test
case in handler_metrics_test and the handler_metrics.go output as the reference
point so the spec-compliance regression is covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 22de9af2-3f03-4e81-b6b8-c406afcd8e61
📒 Files selected for processing (11)
docs/content/api-reference/overview.mddocs/content/api-reference/system.mddocs/content/configuration/practices/monitoring.mdinternal/app/handlers/handler_metrics.gointernal/app/handlers/handler_metrics_test.gointernal/app/handlers/handler_status.gointernal/app/handlers/handler_status_endpoints_test.gointernal/app/handlers/handler_version.gointernal/app/handlers/server_routes.gointernal/core/constants/content.gointernal/core/constants/endpoint.go
Group metric families contiguously per exposition format, guard gatherStatusSnapshot against a nil stats collector, broaden monitoring docs, and assert family grouping in tests.
Summary
Add native
GET /internal/metricsin Prometheus text format, exposing operationaldata from the same builders as
/internal/status(system, endpoint, security) and/internal/stats/models(model usage and routing). No external Python exporterrequired for core proxy monitoring.
Status metrics use
gatherStatusSnapshot/buildStatusResponse; model metricsuse
buildModelStats/buildSummary.Type of change
How was this tested?
Built via
make docker-build-localand tested locallyAdded new tests for
/internal/metricsendpoint and sucessfully passed all tests and lintsChecklist
make readypasses (test-short + test-race + fmt + vet + lint + align)./docs) if behaviour or config changedSummary by CodeRabbit
New Features
/internal/metrics) for monitoring app health, requests, models, and routing activity.Documentation
/internal/metrics, including curl/scrape examples and metric series details (and how it relates to other system endpoints).Bug Fixes