feat(metrics): add model_name label and new throughput/cache metrics by sufubao · Pull Request #1344 · ModelTC/LightLLM

sufubao · 2026-06-11T08:20:38Z

Summary

Ported metrics improvements from the qwen35 branch to main.

Changes

lightllm/server/metrics/metrics.py

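All Prometheus metrics (Histogram, Counter, Gauge) now carry a model_name label, so monitoring data can be told apart per model in multi-model deployments.
Added a counter_inc_by(name, amount) method that increments a counter by an arbitrary amount.
Added 5 new metrics:
- lightllm_prompt_tokens_total: cumulative number of prefill tokens
- lightllm_generation_tokens_total: cumulative number of generation tokens
- lightllm_cache_hit_rate: prefix cache hit rate
- lightllm_gen_throughput: generation throughput (tokens/s)
- lightllm_num_running_reqs: number of requests currently running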
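
A minimal sketch of how a per-model label and a counter_inc_by method might fit together. It deliberately uses a small in-memory stand-in instead of prometheus_client so that it runs with the standard library alone. The `LabeledCounter` class, the shape of `Monitor`, and the model name "model-a" are assumptions for illustration, not code from this PR.

```python
from collections import defaultdict


class LabeledCounter:
    """In-memory stand-in for a Prometheus counter with one model_name label."""

    def __init__(self, name):
        self.name = name
        self._values = defaultdict(float)  # model_name -> running total

    def inc(self, model_name, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self._values[model_name] += amount

    def value(self, model_name):
        return self._values[model_name]


class Monitor:
    """Hypothetical monitor that stamps every update with its model name."""

    def __init__(self, model_name):
        self.model_name = model_name
        self.counters = {
            name: LabeledCounter(name)
            for name in (
                "lightllm_prompt_tokens_total",
                "lightllm_generation_tokens_total",
            )
        }

    def counter_inc(self, name):
        # Incrementing by one is just a special case of counter_inc_by.
        self.counter_inc_by(name, 1)

    def counter_inc_by(self, name, amount):
        self.counters[name].inc(self.model_name, amount)


monitor = Monitor("model-a")
monitor.counter_inc_by("lightllm_prompt_tokens_total", 512)
monitor.counter_inc_by("lightllm_prompt_tokens_total", 128)
monitor.counter_inc("lightllm_generation_tokens_total")
```

Token counters are a natural fit for an "increment by N" call, since a single prefill batch adds many tokens at once and calling a plain increment per token would be wasteful.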

lightllm/server/metrics/manager.py

MetricServer gains an exposed_counter_inc_by RPC method.
MetricClient gains an asynchronous counter_inc_by call method.
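
The server/client pairing can be sketched roughly as follows. This is not the PR's implementation: the "remote" server here is an ordinary in-process object, and the use of `asyncio.to_thread` to keep a blocking call off the event loop is an assumption about one reasonable way to make the client method awaitable. Only the method names exposed_counter_inc_by and counter_inc_by come from the PR description.

```python
import asyncio


class MetricServer:
    """Hypothetical server side; the exposed_ prefix mirrors the PR's naming."""

    def __init__(self):
        self.counters = {}

    def exposed_counter_inc_by(self, name, amount):
        self.counters[name] = self.counters.get(name, 0) + amount


class MetricClient:
    """Hypothetical client that forwards increments without blocking the loop."""

    def __init__(self, server):
        self._server = server  # stands in for a remote connection

    async def counter_inc_by(self, name, amount):
        # Run the (potentially blocking) call in a worker thread.
        await asyncio.to_thread(self._server.exposed_counter_inc_by, name, amount)


async def main():
    server = MetricServer()
    client = MetricClient(server)
    await client.counter_inc_by("lightllm_prompt_tokens_total", 300)
    await client.counter_inc_by("lightllm_prompt_tokens_total", 200)
    return server.counters


counters = asyncio.run(main())
```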

Test plan

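After starting the service, visit the /metrics endpoint and confirm that the new metrics appear and carry the model_name label.
After sending requests, confirm that lightllm_prompt_tokens_total / lightllm_generation_tokens_total increase correctly.
Verify label compatibility for existing metrics (e.g. lightllm_request_duration).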
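
The first check in the test plan could be automated along these lines. The sketch parses text in the Prometheus exposition format and lists any lightllm_ samples that lack a model_name label. The function name `samples_missing_label` and the sample payload are made up for illustration; in practice the text would come from the running service's /metrics endpoint.

```python
import re

# name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def samples_missing_label(metrics_text, label="model_name", prefix="lightllm_"):
    """Return names of samples with the given prefix that lack the label."""
    missing = []
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or not match.group("name").startswith(prefix):
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        if label not in labels:
            missing.append(match.group("name"))
    return missing


# Made-up payload: the last sample intentionally has no labels.
example = '''# HELP lightllm_prompt_tokens_total Total prefill tokens
# TYPE lightllm_prompt_tokens_total counter
lightllm_prompt_tokens_total{model_name="model-a"} 640.0
lightllm_num_running_reqs{model_name="model-a"} 3.0
lightllm_gen_throughput 41.5
'''

missing = samples_missing_label(example)
```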

gemini-code-assist

Code Review

This pull request introduces several new metrics (such as prompt/generation token totals, cache hit rate, throughput, and running requests) and updates the metrics manager to support incrementing counters by a specific amount. Additionally, all metrics are updated to include a model_name label. The review feedback highlights a critical issue where args.model_name could be None or missing, which would cause runtime errors in prometheus_client when updating metrics. It is recommended to use a safe fallback value like 'unknown' to prevent crashes.

gemini-code-assist · 2026-06-11T08:22:03Z

        self.init_metrics(args)

    def init_metrics(self, args):
+        self.model_name = args.model_name


If args.model_name is None (e.g., if the --model_name argument is not provided at startup) or if the attribute does not exist on args, self.model_name will be None or raise an AttributeError. In prometheus_client, passing None as a label value (e.g., model_name=None) will raise a ValueError: Invalid label value: None at runtime when any metric is updated, crashing the metric server or the background metric thread. To prevent this, use getattr with a safe fallback string like 'unknown'.

Suggested change

-        self.model_name = args.model_name
+        self.model_name = getattr(args, "model_name", None) or "unknown"
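
The behavior of the suggested fallback can be checked in isolation. The helper name `resolve_model_name` and the example value "qwen" are hypothetical; the expression inside it is the one from the suggestion. Note that `or` also replaces an empty string, not just None.

```python
from argparse import Namespace


def resolve_model_name(args, default="unknown"):
    """Return args.model_name, or the default if it is missing, None, or empty."""
    return getattr(args, "model_name", None) or default


provided = resolve_model_name(Namespace(model_name="qwen"))
explicit_none = resolve_model_name(Namespace(model_name=None))
attribute_missing = resolve_model_name(Namespace())
empty_string = resolve_model_name(Namespace(model_name=""))
```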

- Add model_name label to all Prometheus metrics (histograms, counters, gauges) so metrics can be distinguished when multiple models are served
- Add counter_inc_by() method to Monitor, MetricServer and MetricClient for incrementing counters by arbitrary amounts
- Add new metrics:
  - lightllm_prompt_tokens_total: total prefill tokens processed
  - lightllm_generation_tokens_total: total generation tokens processed
  - lightllm_cache_hit_rate: prefix cache hit rate
  - lightllm_gen_throughput: generation throughput (tokens/s)
  - lightllm_num_running_reqs: number of running requests

Ported from qwen35 branch.

Port the metric-reporting part of qwen35's SystemStatusReporter so the new metrics actually receive values on main:
- lightllm_prompt_tokens_total: incremented with batch.input_tokens() when a prefill batch is dispatched
- lightllm_generation_tokens_total: incremented per decode step with the number of running requests
- lightllm_cache_hit_rate / lightllm_gen_throughput / lightllm_num_running_reqs: gauges refreshed every log_stats_interval seconds (min 5s), same cadence and semantics as the qwen35 branch

Unlike qwen35, main's existing router logging is left untouched; only the /metrics reporting is ported.
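
The reporting cadence described in this commit message can be sketched as below. This is not the SystemStatusReporter from either branch. The class `StatusReporter`, its method names, the injectable clock, and in particular the definition of hit rate as cached prompt tokens divided by prompt tokens within the current window are all assumptions made for illustration. What is taken from the commit message is the per-prefill and per-decode-step counter updates and the 5-second floor on the refresh interval.

```python
import time


class StatusReporter:
    """Hypothetical reporter: counters accumulate, gauges refresh on an interval."""

    def __init__(self, log_stats_interval, clock=time.monotonic):
        self.interval = max(5.0, float(log_stats_interval))  # 5 s floor
        self.clock = clock
        self.prompt_tokens_total = 0
        self.generation_tokens_total = 0
        self.gauges = {}
        self._reset_window()

    def _reset_window(self):
        self._window_start = self.clock()
        self._gen = self._prompt = self._cached = 0

    def on_prefill(self, input_tokens, cached_tokens=0):
        # Called when a prefill batch is dispatched.
        self.prompt_tokens_total += input_tokens
        self._prompt += input_tokens
        self._cached += cached_tokens

    def on_decode_step(self, num_running_reqs):
        # Each running request yields one token per decode step.
        self.generation_tokens_total += num_running_reqs
        self._gen += num_running_reqs

    def maybe_refresh(self, num_running_reqs):
        elapsed = self.clock() - self._window_start
        if elapsed < self.interval:
            return False
        self.gauges = {
            "lightllm_gen_throughput": self._gen / elapsed,
            "lightllm_cache_hit_rate": self._cached / self._prompt if self._prompt else 0.0,
            "lightllm_num_running_reqs": num_running_reqs,
        }
        self._reset_window()
        return True


now = [0.0]  # fake clock so the example is deterministic
reporter = StatusReporter(log_stats_interval=1, clock=lambda: now[0])
reporter.on_prefill(input_tokens=100, cached_tokens=25)
for _ in range(10):
    reporter.on_decode_step(num_running_reqs=4)
now[0] = 2.0
too_early = reporter.maybe_refresh(num_running_reqs=4)
now[0] = 10.0
refreshed = reporter.maybe_refresh(num_running_reqs=4)
```

Even though the interval is configured as 1 second here, the refresh at t=2 is skipped because of the 5-second floor; the gauges are only populated at t=10.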

gemini-code-assist Bot reviewed Jun 11, 2026

sufubao force-pushed the feat/metrics-model-name-and-new-metrics branch 3 times, most recently from 7157c37 to cc5a4df Compare June 11, 2026 08:24

sufubao force-pushed the feat/metrics-model-name-and-new-metrics branch from cc5a4df to 075b405 Compare June 11, 2026 08:27

sufubao and others added 4 commits June 11, 2026 08:42

fix(metrics): report total paused req num for lightllm_batch_pause_size

6c11387

Previously the gauge was set inside the per-dp debug loop, so in multi-dp deployments it only held the last dp's paused count. Align with qwen35 by reporting the total via _get_paused_req_num().
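
The difference between the two behaviors is easy to see in a toy example. The function names and the per-dp counts below are hypothetical; the real fix reports the total via _get_paused_req_num(), whose implementation is not shown in this PR page.

```python
def last_dp_paused_count(paused_per_dp):
    """Mimics setting a gauge inside a per-dp loop: each write overwrites the last."""
    gauge = 0
    for paused in paused_per_dp:
        gauge = paused
    return gauge


def total_paused_count(paused_per_dp):
    """Reports the sum across all dp ranks, which is what the gauge should hold."""
    return sum(paused_per_dp)


paused_per_dp = [3, 0, 5, 1]  # made-up paused request counts for four dp ranks
before_fix = last_dp_paused_count(paused_per_dp)
after_fix = total_paused_count(paused_per_dp)
```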

simple the metrics

f3084e2

fix tool_check

f9f22d4

shihaobai merged commit d471c21 into main Jun 11, 2026
1 check passed

shihaobai deleted the feat/metrics-model-name-and-new-metrics branch June 11, 2026 11:45


shihaobai merged 5 commits into main from feat/metrics-model-name-and-new-metrics
