feat: add RateLimitLayer middleware #566
Closed
Linuxdazhao wants to merge 2 commits into
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Uses governor for RPM limiting, placed below the retry layer so retries are also throttled. Reads x-ratelimit-remaining-requests and x-ratelimit-reset-requests headers to apply backpressure when the server quota is exhausted. Gated behind the rate-limit feature.
Closing; some files need to be cleaned up.
Following up on the discussion in #320, I have implemented the RPM rate-limiting layer.
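The layer sits below retry. It uses governor for RPM limiting and also reads the server-side rate-limit state from the x-ratelimit-remaining-requests and x-ratelimit-reset-requests headers. When remaining reaches 0, poll_ready stays pending until the reset time and only then lets requests through, which avoids sending unnecessary requests. The governor bucket and the backpressure state are shared across clones, so retries go through the same bucket.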
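To make the header-driven backpressure concrete, here is a minimal std-only sketch, not the code from this PR. The names `parse_reset` and `Backpressure` are hypothetical, and the reset format ("250ms", "1s", "6m0s") is an assumption based on OpenAI-style duration strings; the PR does not state which format it parses.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Parses a reset value such as "250ms", "1.5s" or "6m0s".
/// ASSUMPTION: the header uses this duration-string format.
fn parse_reset(value: &str) -> Option<Duration> {
    let mut rest = value.trim();
    if rest.is_empty() {
        return None;
    }
    let mut total_ms = 0.0_f64;
    while !rest.is_empty() {
        // Split the leading number (digits, optional '.') from its unit.
        let num_len = rest.find(|c: char| !(c.is_ascii_digit() || c == '.'))?;
        let (num, tail) = rest.split_at(num_len);
        let n: f64 = num.parse().ok()?;
        // Check "ms" first so it is not misread as minutes.
        let (unit_ms, unit_len) = if tail.starts_with("ms") {
            (1.0, 2)
        } else if tail.starts_with('s') {
            (1_000.0, 1)
        } else if tail.starts_with('m') {
            (60_000.0, 1)
        } else if tail.starts_with('h') {
            (3_600_000.0, 1)
        } else {
            return None;
        };
        total_ms += n * unit_ms;
        rest = &tail[unit_len..];
    }
    Duration::try_from_secs_f64(total_ms / 1_000.0).ok()
}

/// Hypothetical backpressure state. Cloning shares the same inner
/// deadline, so every clone of a service sees the same server quota.
#[derive(Clone, Default)]
struct Backpressure {
    blocked_until: Arc<Mutex<Option<Instant>>>,
}

impl Backpressure {
    /// Updates the state from the two response header values.
    fn observe(&self, remaining: Option<&str>, reset: Option<&str>, now: Instant) {
        let remaining = remaining.and_then(|v| v.trim().parse::<u64>().ok());
        let reset = reset.and_then(parse_reset);
        let mut guard = self.blocked_until.lock().unwrap();
        match (remaining, reset) {
            // Quota exhausted: block until the reset time.
            (Some(0), Some(wait)) => *guard = now.checked_add(wait),
            // Quota available again: lift the block.
            (Some(n), _) if n > 0 => *guard = None,
            // Missing or malformed headers: keep the previous state.
            _ => {}
        }
    }

    /// Returns how long a caller should still wait at `now`, if at all.
    fn wait_for(&self, now: Instant) -> Option<Duration> {
        let until = (*self.blocked_until.lock().unwrap())?;
        until.checked_duration_since(now).filter(|d| !d.is_zero())
    }
}
```

In a tower-style service, `poll_ready` would consult something like `wait_for` and stay pending while it returns `Some`. Because the deadline lives behind an `Arc`, a clone created for a retry observes the same block as the original.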
On native targets the wait uses tokio::sleep, and the sleep future is stored in the service so the waker is not lost. On wasm only governor's local counting applies and no delay is performed.
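The reason for storing the sleep future can be illustrated with a small std-only sketch. This is not the PR's code: `FakeSleep`, `TimerSlot`, `Throttled`, `CountingWaker`, `fire` and `demo` are all hypothetical, and `FakeSleep` only imitates a timer such as tokio's sleep, under the assumption that dropping a timer also drops its waker registration.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Shared timer state: whether it fired, and the waker to notify.
#[derive(Default)]
struct TimerSlot {
    fired: bool,
    waker: Option<Waker>,
}

/// Hypothetical stand-in for a timer future.
struct FakeSleep(Arc<Mutex<TimerSlot>>);

impl Future for FakeSleep {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut slot = self.0.lock().unwrap();
        if slot.fired {
            Poll::Ready(())
        } else {
            // Register the caller's waker so the timer can wake the task.
            slot.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

impl Drop for FakeSleep {
    // Dropping the timer cancels it, so its registration is lost too.
    fn drop(&mut self) {
        self.0.lock().unwrap().waker = None;
    }
}

/// Hypothetical service that keeps the pending sleep between polls.
struct Throttled {
    timer: Arc<Mutex<TimerSlot>>,
    sleep: Option<Pin<Box<FakeSleep>>>,
}

impl Throttled {
    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        let timer = self.timer.clone();
        // Reuse the stored future instead of creating one per poll.
        let sleep = self.sleep.get_or_insert_with(|| Box::pin(FakeSleep(timer)));
        match sleep.as_mut().poll(cx) {
            Poll::Ready(()) => {
                self.sleep = None;
                Poll::Ready(())
            }
            Poll::Pending => Poll::Pending,
        }
    }
}

/// Waker that counts how often it was woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Simulates the timer elapsing: marks it fired and wakes the task.
fn fire(timer: &Arc<Mutex<TimerSlot>>) {
    let mut slot = timer.lock().unwrap();
    slot.fired = true;
    if let Some(waker) = slot.waker.take() {
        waker.wake();
    }
}

/// Returns (pending before fire, wake count after fire, ready after fire).
fn demo() -> (bool, usize, bool) {
    let timer = Arc::new(Mutex::new(TimerSlot::default()));
    let mut svc = Throttled { timer: timer.clone(), sleep: None };
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);

    let pending = svc.poll_ready(&mut cx).is_pending();
    fire(&timer);
    let wakes = counter.0.load(Ordering::SeqCst);
    let ready = svc.poll_ready(&mut cx).is_ready();
    (pending, wakes, ready)
}
```

If `poll_ready` built the sleep as a local value and dropped it before returning `Pending`, the registration would be cleared by `Drop`, `fire` would find no waker, and the task would never be polled again. Keeping the future in the service keeps the registration alive until the timer fires.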
TPM limiting needs token usage extracted from the response body, so it will be done separately later.
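The layer only takes effect when the rate-limit feature is enabled, so existing users are unaffected. 13 unit tests cover header parsing, backpressure state, waker registration, governor bucket sharing, and retries passing through the rate-limit layer.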