feat(metrics): add model_name label and new throughput/cache metrics #1344
Code Review
This pull request introduces several new metrics (such as prompt/generation token totals, cache hit rate, throughput, and running requests) and updates the metrics manager to support incrementing counters by a specific amount. Additionally, all metrics are updated to include a model_name label. The review feedback highlights a critical issue where args.model_name could be None or missing, which would cause runtime errors in prometheus_client when updating metrics. It is recommended to use a safe fallback value like 'unknown' to prevent crashes.
        self.init_metrics(args)

    def init_metrics(self, args):
        self.model_name = args.model_name
If args.model_name is None (e.g., if the --model_name argument is not provided at startup) or if the attribute does not exist on args, self.model_name will be None or raise an AttributeError. In prometheus_client, passing None as a label value (e.g., model_name=None) will raise a ValueError: Invalid label value: None at runtime when any metric is updated, crashing the metric server or the background metric thread. To prevent this, use getattr with a safe fallback string like 'unknown'.
Suggested change:
-        self.model_name = args.model_name
+        self.model_name = getattr(args, "model_name", None) or "unknown"
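A minimal sketch of the suggested fallback, using argparse.Namespace as a stand-in for the server's args object (an assumption; the real args type is not shown in this review). The helper name resolve_model_name is hypothetical; the suggested change inlines the same expression in init_metrics. It covers the three cases the comment distinguishes: attribute missing, attribute set to None, and attribute set.

```python
from argparse import Namespace


def resolve_model_name(args, default="unknown"):
    """Return a usable label value even if model_name is missing or None.

    Hypothetical helper for illustration only.
    """
    # getattr's third argument covers a missing attribute; `or` covers None
    # (and also an empty string, which would make a poor label value).
    return getattr(args, "model_name", None) or default


missing = resolve_model_name(Namespace())                 # no attribute at all
unset = resolve_model_name(Namespace(model_name=None))    # attribute is None
given = resolve_model_name(Namespace(model_name="qwen"))  # attribute provided
```

With this expression the label value is always a non-empty string, so the metric code never has to handle None.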
7157c37 to cc5a4df
- Add model_name label to all Prometheus metrics (histograms, counters, gauges) so metrics can be distinguished when multiple models are served
- Add counter_inc_by() method to Monitor, MetricServer and MetricClient for incrementing counters by arbitrary amounts
- Add new metrics:
  - lightllm_prompt_tokens_total: total prefill tokens processed
  - lightllm_generation_tokens_total: total generation tokens processed
  - lightllm_cache_hit_rate: prefix cache hit rate
  - lightllm_gen_throughput: generation throughput (tokens/s)
  - lightllm_num_running_reqs: number of running requests

Ported from qwen35 branch.
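To illustrate what an increment-by-amount method adds over a plain increment, here is a sketch built on an in-memory stand-in rather than prometheus_client. TinyMonitor and its internals are hypothetical; only the method name counter_inc_by, the model_name label, and the metric name come from the commit message.

```python
from collections import defaultdict


class TinyMonitor:
    """Hypothetical in-memory stand-in for the real Monitor class."""

    def __init__(self, model_name):
        self.model_name = model_name
        # Keyed by (metric name, model_name label value).
        self._counters = defaultdict(float)

    def counter_inc_by(self, name, amount):
        # Counters are monotonic, so negative amounts are rejected.
        if amount < 0:
            raise ValueError("counter increments must be non-negative")
        self._counters[(name, self.model_name)] += amount

    def value(self, name):
        return self._counters[(name, self.model_name)]


monitor = TinyMonitor(model_name="demo-model")
# One call per prefill batch, adding that batch's token count in one step.
monitor.counter_inc_by("lightllm_prompt_tokens_total", 128)
monitor.counter_inc_by("lightllm_prompt_tokens_total", 64)
prompt_total = monitor.value("lightllm_prompt_tokens_total")
```

Incrementing by the batch's token count in a single call avoids one call per token, which matters when the increment crosses a process boundary as it does between MetricClient and MetricServer.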
cc5a4df to 075b405
Port the metric-reporting part of qwen35's SystemStatusReporter so the new metrics actually receive values on main:

- lightllm_prompt_tokens_total: incremented with batch.input_tokens() when a prefill batch is dispatched
- lightllm_generation_tokens_total: incremented per decode step with the number of running requests
- lightllm_cache_hit_rate / lightllm_gen_throughput / lightllm_num_running_reqs: gauges refreshed every log_stats_interval seconds (min 5s), same cadence and semantics as the qwen35 branch

Unlike qwen35, main's existing router logging is left untouched; only the /metrics reporting is ported.
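The reporting cadence described above can be sketched roughly as follows. This is not the SystemStatusReporter code: WindowStats, effective_interval, and the formulas for throughput and hit rate (tokens generated divided by window length, cached tokens divided by prompt tokens) are assumptions made for illustration. Only the 5-second minimum and the "one token per running request per decode step" accounting come from the commit message.

```python
def effective_interval(log_stats_interval, minimum=5.0):
    """Clamp the refresh interval to at least `minimum` seconds."""
    return max(float(log_stats_interval), minimum)


class WindowStats:
    """Hypothetical accumulator that is reset each time the gauges refresh."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.prompt_tokens = 0
        self.cached_tokens = 0
        self.generated_tokens = 0

    def on_prefill(self, input_tokens, cached_tokens=0):
        self.prompt_tokens += input_tokens
        self.cached_tokens += cached_tokens

    def on_decode_step(self, num_running_reqs):
        # Each running request produces one token per decode step.
        self.generated_tokens += num_running_reqs

    def snapshot(self, elapsed_s):
        # Assumed definitions; guard against division by zero.
        throughput = self.generated_tokens / elapsed_s if elapsed_s > 0 else 0.0
        hit_rate = self.cached_tokens / self.prompt_tokens if self.prompt_tokens else 0.0
        self.reset()
        return {"gen_throughput": throughput, "cache_hit_rate": hit_rate}


stats = WindowStats()
stats.on_prefill(input_tokens=100, cached_tokens=25)
for _ in range(10):
    stats.on_decode_step(num_running_reqs=4)
# A configured interval of 2s is raised to the 5s minimum.
gauges = stats.snapshot(elapsed_s=effective_interval(2))
```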
Previously the gauge was set inside the per-dp debug loop, so in multi-dp deployments it only held the last dp's paused count. Align with qwen35 by reporting the total via _get_paused_req_num().
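The difference between the old and new behaviour can be shown with plain numbers. The per-dp counts below are made up, and the variable names are hypothetical; the real total comes from _get_paused_req_num().

```python
# Made-up paused-request counts for three dp ranks.
paused_per_dp = [3, 0, 2]

# Old behaviour: the gauge is set inside the per-dp loop, so every
# iteration overwrites the previous value and only the last dp survives.
gauge_last_only = None
for paused in paused_per_dp:
    gauge_last_only = paused

# New behaviour: the gauge is set once, to the total across all dp ranks.
gauge_total = sum(paused_per_dp)
```

With these numbers the old code would report 2 while 5 requests are actually paused.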
Summary
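Ported metrics improvements from the qwen35 branch to main.

Changes

lightllm/server/metrics/metrics.py
- Add a model_name label so that monitoring data can be told apart per model in multi-model deployments
- Add a counter_inc_by(name, amount) method for incrementing a counter by an arbitrary amount
- New metrics:
  - lightllm_prompt_tokens_total: cumulative prefill token count
  - lightllm_generation_tokens_total: cumulative generation token count
  - lightllm_cache_hit_rate: prefix cache hit rate
  - lightllm_gen_throughput: generation throughput (tokens/s)
  - lightllm_num_running_reqs: number of requests currently running

lightllm/server/metrics/manager.py
- MetricServer gains an exposed_counter_inc_by RPC method
- MetricClient gains a counter_inc_by async call method

Test plan

- /metrics endpoint: confirm the new metrics appear and carry the model_name label
- lightllm_prompt_tokens_total / lightllm_generation_tokens_total increment correctly
- Label compatibility of existing metrics (such as lightllm_request_duration)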
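The first test-plan item can be checked mechanically. The sketch below parses Prometheus text exposition output with the standard library and lists any sample that lacks a model_name label. SAMPLE is made-up output, not captured from a real server, and parse_samples is a hypothetical helper that handles only the simple line shapes shown here (no timestamps, no exemplars).

```python
import re

# Made-up /metrics output for illustration.
SAMPLE = '''\
# HELP lightllm_prompt_tokens_total Total prefill tokens processed
# TYPE lightllm_prompt_tokens_total counter
lightllm_prompt_tokens_total{model_name="demo-model"} 192.0
lightllm_num_running_reqs{model_name="demo-model"} 4.0
'''

_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


samples = parse_samples(SAMPLE)
# Names of any samples that are missing the model_name label.
unlabeled = [name for name, labels, _ in samples if "model_name" not in labels]
names = {name for name, _, _ in samples}
```

Against a running server, the same function could be fed the body of an HTTP GET on the /metrics endpoint; an empty unlabeled list means every sample carries the label.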