Add native `GET /internal/metrics` by Puupuls · Pull Request #188 · thushan/olla

Puupuls · 2026-06-25T11:48:48Z

Summary

Add native GET /internal/metrics in Prometheus text format, exposing operational
data from the same builders as /internal/status (system, endpoint, security) and
/internal/stats/models (model usage and routing). No external Python exporter
required for core proxy monitoring.

Status metrics use gatherStatusSnapshot / buildStatusResponse; model metrics
use buildModelStats / buildSummary.

Type of change

How was this tested?

Built via make docker-build-local and tested locally
Added new tests for /internal/metrics endpoint and sucessfully passed all tests and lints

Checklist

make ready passes (test-short + test-race + fmt + vet + lint + align)
Tests added or updated for the change
Documentation updated (./docs) if behaviour or config changed
No new dependencies added (or called out and justified below)

Summary by CodeRabbit

New Features
- Added a new Prometheus metrics endpoint (/internal/metrics) for monitoring app health, requests, models, and routing activity.
- Exposed the metrics endpoint in the API version response and registered it in system routes.
Documentation
- Updated the API reference and monitoring guides with /internal/metrics, including curl/scrape examples and metric series details (and how it relates to other system endpoints).
Bug Fixes
- Improved status snapshot assembly and reliability when model/proxy stats are unavailable, and added comprehensive tests for metrics formatting and escaping.

Expose the same operational data as /internal/status in Prometheus text format so scrapers can ingest Olla metrics without an external wrapper.

…lity. Export /internal/stats/models data alongside status metrics, keep the status label only on olla_endpoint_up, and document the built-in scrape endpoint.

coderabbitai · 2026-06-25T11:49:00Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1a6c350b-a445-4b33-bbd4-32bbf435e2e7

📥 Commits

Reviewing files that changed from the base of the PR and between 7f060d3 and 8c450db.

📒 Files selected for processing (4)

docs/content/configuration/practices/monitoring.md
internal/app/handlers/handler_metrics.go
internal/app/handlers/handler_metrics_test.go
internal/app/handlers/handler_status.go

🚧 Files skipped from review as they are similar to previous changes (4)

internal/app/handlers/handler_status.go
internal/app/handlers/handler_metrics_test.go
docs/content/configuration/practices/monitoring.md
internal/app/handlers/handler_metrics.go

Walkthrough

Adds /internal/metrics, wires it into routing and version metadata, refactors status snapshot collection, implements Prometheus text rendering for system, endpoint, and model metrics, and updates API reference and monitoring documentation.

Changes

Prometheus metrics endpoint

Layer / File(s)	Summary
Endpoint surface wiring `internal/core/constants/endpoint.go`, `internal/core/constants/content.go`, `internal/app/handlers/server_routes.go`, `internal/app/handlers/handler_version.go`, `docs/content/api-reference/overview.md`	Adds `/internal/metrics` constants, registers the GET route, exposes the endpoint in version metadata, and lists it in the API overview.
Status snapshot plumbing `internal/app/handlers/handler_status.go`, `internal/app/handlers/handler_status_endpoints_test.go`	Centralises status snapshot gathering and response construction, and expands the status test collector to supply proxy and model stats.
Prometheus handler `internal/app/handlers/handler_metrics.go`, `internal/app/handlers/handler_metrics_test.go`	Implements Prometheus text rendering for system, endpoint, and model metrics, with tests covering handler output and helper formatting.
Documentation updates `docs/content/api-reference/system.md`, `docs/content/configuration/practices/monitoring.md`	Updates the API reference and monitoring guide to describe `/internal/metrics`, show the response mapping, and include scrape configuration examples.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly matches the main change: adding a native `GET /internal/metrics` endpoint.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

internal/app/handlers/handler_metrics.go (1)
214-225: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Same interleaving issue in the per-endpoint model loop.

This nested loop emits olla_model_endpoint_requests_total, _success_rate_percent, _avg_latency_ms, and _consecutive_errors interleaved per (model, endpoint) pair, again breaking the single-uninterrupted-group requirement of the text format. Group each family's samples together as with the endpoint metrics above.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/app/handlers/handler_metrics.go` around lines 214 - 225, The nested
model endpoint metrics loop in handler_metrics.go is interleaving different
metric families per (model, endpoint) pair, which breaks the required grouping
in the text format. Update the modelEndpointStats section so
writePrometheusLabeledGauge emits all olla_model_endpoint_requests_total samples
first, then all olla_model_endpoint_success_rate_percent, then all
olla_model_endpoint_avg_latency_ms, and finally all
olla_model_endpoint_consecutive_errors, using the existing modelName and epName
values in the loop.

🧹 Nitpick comments (1)

internal/app/handlers/handler_metrics_test.go (1)
78-88: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Tests do not validate metric-family grouping.

The assert.Contains checks pass regardless of line ordering, so they would not catch the interleaved-family issue flagged in handler_metrics.go. Consider adding an assertion that, for a multi-endpoint exposition, all samples of a family are contiguous (or parse the body with a Prometheus text parser) to lock in spec compliance.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/app/handlers/handler_metrics_test.go` around lines 78 - 88, The
current assertions in handler_metrics_test only check for substring presence and
won’t catch interleaved metric families. Update the test around the handler
response body check to verify family contiguity for a multi-endpoint exposition,
either by asserting the ordering/grouping of samples or by parsing the output
with a Prometheus text parser. Use the existing test case in
handler_metrics_test and the handler_metrics.go output as the reference point so
the spec-compliance regression is covered.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/content/configuration/practices/monitoring.md`:
- Line 16: Broaden the summary for the /internal/metrics endpoint so it reflects
all exported data, not just that it mirrors /internal/status. Update the
monitoring docs entry to mention the additional model series coming from
/internal/stats/models, using the existing /internal/metrics description as the
reference point.

In `@internal/app/handlers/handler_metrics.go`:
- Around line 120-156: The metrics output in the handler loop is interleaving
different `olla_endpoint_*` families by iterating per endpoint in
`collectMetrics`/`writePrometheusLabeledGauge`, which breaks Prometheus
text-format grouping. Refactor the export logic so each metric family is emitted
in one contiguous block for all endpoints before moving to the next family,
keeping HELP/TYPE and all samples for a family together. Use the existing `all`,
`statsMap`, `modelMap`, and `connectionStats` data, but change the loop
structure to be family-oriented rather than endpoint-oriented.

In `@internal/app/handlers/handler_status.go`:
- Around line 102-123: The nil-handling for statsCollector is inconsistent
between gatherStatusSnapshot and metricsHandler, and gatherStatusSnapshot can
panic before the later guard runs. Update gatherStatusSnapshot in Application to
defensively check a.statsCollector before calling GetEndpointStats,
GetProxyStats, GetSecurityStats, and GetConnectionStats, matching the pattern
used in metricsHandler. If the collector is nil, return a statusSnapshot with
safe empty/default stats instead of dereferencing it.

---

Duplicate comments:
In `@internal/app/handlers/handler_metrics.go`:
- Around line 214-225: The nested model endpoint metrics loop in
handler_metrics.go is interleaving different metric families per (model,
endpoint) pair, which breaks the required grouping in the text format. Update
the modelEndpointStats section so writePrometheusLabeledGauge emits all
olla_model_endpoint_requests_total samples first, then all
olla_model_endpoint_success_rate_percent, then all
olla_model_endpoint_avg_latency_ms, and finally all
olla_model_endpoint_consecutive_errors, using the existing modelName and epName
values in the loop.

---

Nitpick comments:
In `@internal/app/handlers/handler_metrics_test.go`:
- Around line 78-88: The current assertions in handler_metrics_test only check
for substring presence and won’t catch interleaved metric families. Update the
test around the handler response body check to verify family contiguity for a
multi-endpoint exposition, either by asserting the ordering/grouping of samples
or by parsing the output with a Prometheus text parser. Use the existing test
case in handler_metrics_test and the handler_metrics.go output as the reference
point so the spec-compliance regression is covered.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 22de9af2-3f03-4e81-b6b8-c406afcd8e61

📥 Commits

Reviewing files that changed from the base of the PR and between a595ca7 and 7f060d3.

📒 Files selected for processing (11)

docs/content/api-reference/overview.md
docs/content/api-reference/system.md
docs/content/configuration/practices/monitoring.md
internal/app/handlers/handler_metrics.go
internal/app/handlers/handler_metrics_test.go
internal/app/handlers/handler_status.go
internal/app/handlers/handler_status_endpoints_test.go
internal/app/handlers/handler_version.go
internal/app/handlers/server_routes.go
internal/core/constants/content.go
internal/core/constants/endpoint.go

Group metric families contiguously per exposition format, guard gatherStatusSnapshot against a nil stats collector, broaden monitoring docs, and assert family grouping in tests.

Puupuls added 2 commits June 25, 2026 12:57

Add native Prometheus /internal/metrics endpoint.

c8c8bfb

Expose the same operational data as /internal/status in Prometheus text format so scrapers can ingest Olla metrics without an external wrapper.

Extend Prometheus metrics with model stats and lower endpoint cardina…

7f060d3

…lity. Export /internal/stats/models data alongside status metrics, keep the status label only on olla_endpoint_up, and document the built-in scrape endpoint.

Puupuls requested a review from thushan as a code owner June 25, 2026 11:48

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread docs/content/configuration/practices/monitoring.md Outdated

Comment thread internal/app/handlers/handler_metrics.go

Comment thread internal/app/handlers/handler_status.go

Address CodeRabbit review on Prometheus metrics export.

8c450db

Group metric families contiguously per exposition format, guard gatherStatusSnapshot against a nil stats collector, broaden monitoring docs, and assert family grouping in tests.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add native `GET /internal/metrics`#188

Add native `GET /internal/metrics`#188
Puupuls wants to merge 3 commits into
thushan:mainfrom
asya-ai:main

Puupuls commented Jun 25, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Puupuls commented Jun 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Type of change

How was this tested?

Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Puupuls commented Jun 25, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading