Skip to content

Add native GET /internal/metrics#188

Open
Puupuls wants to merge 3 commits into
thushan:mainfrom
asya-ai:main
Open

Add native GET /internal/metrics#188
Puupuls wants to merge 3 commits into
thushan:mainfrom
asya-ai:main

Conversation

@Puupuls

@Puupuls Puupuls commented Jun 25, 2026

Copy link
Copy Markdown

Summary

Add native GET /internal/metrics in Prometheus text format, exposing operational
data from the same builders as /internal/status (system, endpoint, security) and
/internal/stats/models (model usage and routing). No external Python exporter
required for core proxy monitoring.

Status metrics use gatherStatusSnapshot / buildStatusResponse; model metrics
use buildModelStats / buildSummary.

Type of change

  • Bug fix
  • New feature
  • Backend / integration support
  • Documentation
  • Refactor / internal (no behaviour change)

How was this tested?

Built via make docker-build-local and tested locally
Added new tests for /internal/metrics endpoint and sucessfully passed all tests and lints

Checklist

  • make ready passes (test-short + test-race + fmt + vet + lint + align)
  • Tests added or updated for the change
  • Documentation updated (./docs) if behaviour or config changed
  • No new dependencies added (or called out and justified below)

Summary by CodeRabbit

  • New Features

    • Added a new Prometheus metrics endpoint (/internal/metrics) for monitoring app health, requests, models, and routing activity.
    • Exposed the metrics endpoint in the API version response and registered it in system routes.
  • Documentation

    • Updated the API reference and monitoring guides with /internal/metrics, including curl/scrape examples and metric series details (and how it relates to other system endpoints).
  • Bug Fixes

    • Improved status snapshot assembly and reliability when model/proxy stats are unavailable, and added comprehensive tests for metrics formatting and escaping.

Puupuls added 2 commits June 25, 2026 12:57
Expose the same operational data as /internal/status in Prometheus text format so scrapers can ingest Olla metrics without an external wrapper.
…lity.

Export /internal/stats/models data alongside status metrics, keep the status label only on olla_endpoint_up, and document the built-in scrape endpoint.
@Puupuls Puupuls requested a review from thushan as a code owner June 25, 2026 11:48
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1a6c350b-a445-4b33-bbd4-32bbf435e2e7

📥 Commits

Reviewing files that changed from the base of the PR and between 7f060d3 and 8c450db.

📒 Files selected for processing (4)
  • docs/content/configuration/practices/monitoring.md
  • internal/app/handlers/handler_metrics.go
  • internal/app/handlers/handler_metrics_test.go
  • internal/app/handlers/handler_status.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • internal/app/handlers/handler_status.go
  • internal/app/handlers/handler_metrics_test.go
  • docs/content/configuration/practices/monitoring.md
  • internal/app/handlers/handler_metrics.go

Walkthrough

Adds /internal/metrics, wires it into routing and version metadata, refactors status snapshot collection, implements Prometheus text rendering for system, endpoint, and model metrics, and updates API reference and monitoring documentation.

Changes

Prometheus metrics endpoint

Layer / File(s) Summary
Endpoint surface wiring
internal/core/constants/endpoint.go, internal/core/constants/content.go, internal/app/handlers/server_routes.go, internal/app/handlers/handler_version.go, docs/content/api-reference/overview.md
Adds /internal/metrics constants, registers the GET route, exposes the endpoint in version metadata, and lists it in the API overview.
Status snapshot plumbing
internal/app/handlers/handler_status.go, internal/app/handlers/handler_status_endpoints_test.go
Centralises status snapshot gathering and response construction, and expands the status test collector to supply proxy and model stats.
Prometheus handler
internal/app/handlers/handler_metrics.go, internal/app/handlers/handler_metrics_test.go
Implements Prometheus text rendering for system, endpoint, and model metrics, with tests covering handler output and helper formatting.
Documentation updates
docs/content/api-reference/system.md, docs/content/configuration/practices/monitoring.md
Updates the API reference and monitoring guide to describe /internal/metrics, show the response mapping, and include scrape configuration examples.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the main change: adding a native GET /internal/metrics endpoint.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
internal/app/handlers/handler_metrics.go (1)

214-225: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Same interleaving issue in the per-endpoint model loop.

This nested loop emits olla_model_endpoint_requests_total, _success_rate_percent, _avg_latency_ms, and _consecutive_errors interleaved per (model, endpoint) pair, again breaking the single-uninterrupted-group requirement of the text format. Group each family's samples together as with the endpoint metrics above.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/app/handlers/handler_metrics.go` around lines 214 - 225, The nested
model endpoint metrics loop in handler_metrics.go is interleaving different
metric families per (model, endpoint) pair, which breaks the required grouping
in the text format. Update the modelEndpointStats section so
writePrometheusLabeledGauge emits all olla_model_endpoint_requests_total samples
first, then all olla_model_endpoint_success_rate_percent, then all
olla_model_endpoint_avg_latency_ms, and finally all
olla_model_endpoint_consecutive_errors, using the existing modelName and epName
values in the loop.
🧹 Nitpick comments (1)
internal/app/handlers/handler_metrics_test.go (1)

78-88: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Tests do not validate metric-family grouping.

The assert.Contains checks pass regardless of line ordering, so they would not catch the interleaved-family issue flagged in handler_metrics.go. Consider adding an assertion that, for a multi-endpoint exposition, all samples of a family are contiguous (or parse the body with a Prometheus text parser) to lock in spec compliance.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/app/handlers/handler_metrics_test.go` around lines 78 - 88, The
current assertions in handler_metrics_test only check for substring presence and
won’t catch interleaved metric families. Update the test around the handler
response body check to verify family contiguity for a multi-endpoint exposition,
either by asserting the ordering/grouping of samples or by parsing the output
with a Prometheus text parser. Use the existing test case in
handler_metrics_test and the handler_metrics.go output as the reference point so
the spec-compliance regression is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/content/configuration/practices/monitoring.md`:
- Line 16: Broaden the summary for the /internal/metrics endpoint so it reflects
all exported data, not just that it mirrors /internal/status. Update the
monitoring docs entry to mention the additional model series coming from
/internal/stats/models, using the existing /internal/metrics description as the
reference point.

In `@internal/app/handlers/handler_metrics.go`:
- Around line 120-156: The metrics output in the handler loop is interleaving
different `olla_endpoint_*` families by iterating per endpoint in
`collectMetrics`/`writePrometheusLabeledGauge`, which breaks Prometheus
text-format grouping. Refactor the export logic so each metric family is emitted
in one contiguous block for all endpoints before moving to the next family,
keeping HELP/TYPE and all samples for a family together. Use the existing `all`,
`statsMap`, `modelMap`, and `connectionStats` data, but change the loop
structure to be family-oriented rather than endpoint-oriented.

In `@internal/app/handlers/handler_status.go`:
- Around line 102-123: The nil-handling for statsCollector is inconsistent
between gatherStatusSnapshot and metricsHandler, and gatherStatusSnapshot can
panic before the later guard runs. Update gatherStatusSnapshot in Application to
defensively check a.statsCollector before calling GetEndpointStats,
GetProxyStats, GetSecurityStats, and GetConnectionStats, matching the pattern
used in metricsHandler. If the collector is nil, return a statusSnapshot with
safe empty/default stats instead of dereferencing it.

---

Duplicate comments:
In `@internal/app/handlers/handler_metrics.go`:
- Around line 214-225: The nested model endpoint metrics loop in
handler_metrics.go is interleaving different metric families per (model,
endpoint) pair, which breaks the required grouping in the text format. Update
the modelEndpointStats section so writePrometheusLabeledGauge emits all
olla_model_endpoint_requests_total samples first, then all
olla_model_endpoint_success_rate_percent, then all
olla_model_endpoint_avg_latency_ms, and finally all
olla_model_endpoint_consecutive_errors, using the existing modelName and epName
values in the loop.

---

Nitpick comments:
In `@internal/app/handlers/handler_metrics_test.go`:
- Around line 78-88: The current assertions in handler_metrics_test only check
for substring presence and won’t catch interleaved metric families. Update the
test around the handler response body check to verify family contiguity for a
multi-endpoint exposition, either by asserting the ordering/grouping of samples
or by parsing the output with a Prometheus text parser. Use the existing test
case in handler_metrics_test and the handler_metrics.go output as the reference
point so the spec-compliance regression is covered.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 22de9af2-3f03-4e81-b6b8-c406afcd8e61

📥 Commits

Reviewing files that changed from the base of the PR and between a595ca7 and 7f060d3.

📒 Files selected for processing (11)
  • docs/content/api-reference/overview.md
  • docs/content/api-reference/system.md
  • docs/content/configuration/practices/monitoring.md
  • internal/app/handlers/handler_metrics.go
  • internal/app/handlers/handler_metrics_test.go
  • internal/app/handlers/handler_status.go
  • internal/app/handlers/handler_status_endpoints_test.go
  • internal/app/handlers/handler_version.go
  • internal/app/handlers/server_routes.go
  • internal/core/constants/content.go
  • internal/core/constants/endpoint.go

Comment thread docs/content/configuration/practices/monitoring.md Outdated
Comment thread internal/app/handlers/handler_metrics.go
Comment thread internal/app/handlers/handler_status.go
Group metric families contiguously per exposition format, guard gatherStatusSnapshot against a nil stats collector, broaden monitoring docs, and assert family grouping in tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant