feat: add RateLimitLayer middleware #567
Uses governor for requests-per-minute (RPM) limiting. The layer sits below the retry layer, so retried requests are throttled too. It also reads the x-ratelimit-remaining-requests and x-ratelimit-reset-requests headers and applies backpressure when the server-side quota is exhausted. Gated behind the rate-limit feature.
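Placing the limiter below retry means every retry attempt has to spend its own permit. The sketch below illustrates that ordering with std only. `Limiter`, `call_with_retry` and the plain per-minute counter are hypothetical stand-ins: governor itself uses GCRA, and the real layer waits in poll_ready rather than returning an error.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for governor's RPM bucket: a shared count of permits
/// left in the current minute (window reset omitted for brevity).
#[derive(Clone)]
struct Limiter {
    permits: Arc<AtomicU32>,
}

impl Limiter {
    fn new(rpm: u32) -> Self {
        Limiter { permits: Arc::new(AtomicU32::new(rpm)) }
    }

    /// Takes one permit, or returns false when the budget is spent.
    fn try_acquire(&self) -> bool {
        self.permits
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |p| p.checked_sub(1))
            .is_ok()
    }
}

/// Retry sits *outside* the limiter, so every attempt (not just the first)
/// must acquire a permit before `send` runs. Erroring instead of waiting
/// keeps this sketch synchronous.
fn call_with_retry(
    limiter: &Limiter,
    max_attempts: u32,
    mut send: impl FnMut() -> Result<(), ()>,
) -> Result<(), &'static str> {
    for _ in 0..max_attempts {
        if !limiter.try_acquire() {
            return Err("rate limited");
        }
        if send().is_ok() {
            return Ok(());
        }
    }
    Err("retries exhausted")
}
```

With a budget of 3 and an always-failing `send`, the fourth attempt is stopped by the limiter rather than reaching the server.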
Updated from 462e867 to bbdbfef.
The ServerBackpressure struct, its methods, and its helper functions are used only by the non-wasm Service impl. On wasm32 they became dead code, which failed the build under -D warnings. Gate them with #[cfg(not(target_family = "wasm"))] and add a wasm-only empty zero-sized type (ZST) in their place so the field type still compiles.
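A minimal sketch of this gating pattern, assuming a `ServerBackpressure` that only tracks a deadline. The field and method names (`blocked_until`, `block_until`, `is_blocked`, `RateLimitService`) are hypothetical, not the PR's exact code.

```rust
#[cfg(not(target_family = "wasm"))]
use std::time::Instant;

// Native targets: the real state plus its helpers (hypothetical fields).
#[cfg(not(target_family = "wasm"))]
#[derive(Default)]
struct ServerBackpressure {
    blocked_until: Option<Instant>,
}

#[cfg(not(target_family = "wasm"))]
impl ServerBackpressure {
    fn block_until(&mut self, until: Instant) {
        self.blocked_until = Some(until);
    }

    fn is_blocked(&self) -> bool {
        self.blocked_until.map_or(false, |t| Instant::now() < t)
    }
}

// wasm targets: an empty zero-sized type with no methods, so nothing here can
// trip the dead-code lint, yet the service's field still has a type.
#[cfg(target_family = "wasm")]
#[derive(Default)]
struct ServerBackpressure;

#[derive(Default)]
struct RateLimitService {
    // On wasm the field is never read, so allow that lint there only.
    #[cfg_attr(target_family = "wasm", allow(dead_code))]
    backpressure: ServerBackpressure,
}
```

On wasm the methods simply do not exist, so there is nothing left for -D warnings to reject; the field keeps the same name on both targets, which keeps the shared struct definition free of cfg branches.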
Thanks for this. Would appreciate a few changes:
Following up on the discussion in #320, I've implemented the RPM rate-limiting layer.
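It sits below the retry layer: governor handles RPM limiting, and the layer also reads the server's rate-limit state from the x-ratelimit-remaining-requests and x-ratelimit-reset-requests headers. When remaining reaches 0, poll_ready stays pending until the reset time before letting requests through, to avoid sending unnecessary requests. The governor bucket and the backpressure state are shared across clones, so retries go through the same bucket.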
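The header handling and the shared state can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `parse_reset`, `SharedBackpressure`, `observe` and `wait_time` are hypothetical names, and the reset value is assumed to be an OpenAI-style duration string such as `6m0s`, `1.5s` or `20ms`. Because the state lives behind an `Arc`, any clone (for example the one a retry goes through) sees a block recorded by another clone.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Parses a reset value such as "6m0s", "1.5s" or "20ms".
/// ASSUMPTION: OpenAI-style duration strings; other servers may differ.
fn parse_reset(value: &str) -> Option<Duration> {
    let mut rest = value.trim();
    if rest.is_empty() {
        return None;
    }
    let mut secs = 0.0_f64;
    while !rest.is_empty() {
        // A number (digits and '.') followed by a unit suffix.
        let num_len = rest.find(|c: char| !(c.is_ascii_digit() || c == '.'))?;
        let n: f64 = rest[..num_len].parse().ok()?;
        rest = &rest[num_len..];
        let unit_len = rest.find(|c: char| c.is_ascii_digit()).unwrap_or(rest.len());
        let scale = match &rest[..unit_len] {
            "h" => 3600.0,
            "m" => 60.0,
            "s" => 1.0,
            "ms" => 0.001,
            _ => return None,
        };
        secs += n * scale;
        rest = &rest[unit_len..];
    }
    // Round to whole nanoseconds so float error cannot turn 20ms into 19.99ms.
    Some(Duration::from_nanos((secs * 1e9).round() as u64))
}

/// Backpressure state shared by every clone of the service (hypothetical).
#[derive(Clone, Default)]
struct SharedBackpressure {
    blocked_until: Arc<Mutex<Option<Instant>>>,
}

impl SharedBackpressure {
    /// Feeds one response's headers; blocks all clones when remaining is 0.
    fn observe(&self, remaining: Option<&str>, reset: Option<&str>) {
        let remaining: Option<u64> = remaining.and_then(|r| r.trim().parse().ok());
        if remaining != Some(0) {
            return;
        }
        if let Some(wait) = reset.and_then(parse_reset) {
            let until = Instant::now() + wait;
            let mut slot = self.blocked_until.lock().unwrap();
            let current = *slot;
            // Keep the later deadline if concurrent responses race.
            if current.map_or(true, |t| until > t) {
                *slot = Some(until);
            }
        }
    }

    /// How long poll_ready would wait right now; None means ready.
    fn wait_time(&self) -> Option<Duration> {
        let mut slot = self.blocked_until.lock().unwrap();
        let now = Instant::now();
        let current = *slot;
        match current {
            Some(t) if t > now => Some(t - now),
            _ => {
                *slot = None; // deadline passed or never set: clear it
                None
            }
        }
    }
}
```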
On native targets the wait uses tokio::time::sleep, and the sleep future is stored in the service so its waker is not lost. On wasm only governor's local counting applies, with no delay.
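The reason for storing the sleep future can be shown without tokio. In this sketch `Delay` is a hypothetical thread-backed stand-in for tokio::time::Sleep, and `Throttled` and `block_until_ready` are illustrative names. poll_ready creates the sleep once, keeps it in the service, and polls that same future on every call until it completes.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::Instant;

/// Stand-in for tokio::time::Sleep (assumption: std-only so the sketch runs
/// without tokio). A helper thread wakes the latest registered waker at `deadline`.
struct Delay {
    deadline: Instant,
    waker: Arc<Mutex<Option<Waker>>>,
    timer_started: bool,
}

impl Delay {
    fn new(deadline: Instant) -> Self {
        Delay { deadline, waker: Arc::new(Mutex::new(None)), timer_started: false }
    }
}

impl Future for Delay {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if Instant::now() >= self.deadline {
            return Poll::Ready(());
        }
        // Remember whom to wake; a later poll may carry a different waker.
        *self.waker.lock().unwrap() = Some(cx.waker().clone());
        if !self.timer_started {
            self.timer_started = true;
            let deadline = self.deadline;
            let slot = Arc::clone(&self.waker);
            thread::spawn(move || {
                thread::sleep(deadline.saturating_duration_since(Instant::now()));
                if let Some(waker) = slot.lock().unwrap().take() {
                    waker.wake();
                }
            });
        }
        Poll::Pending
    }
}

/// Hypothetical service; poll_ready mirrors tower's Service::poll_ready minus the error.
struct Throttled {
    blocked_until: Option<Instant>,
    // Kept across calls: with tokio, dropping a Sleep deregisters its timer, so a
    // sleep created and dropped inside a single poll would never wake the task.
    // Boxed and pinned because tokio's Sleep is !Unpin.
    sleep: Option<Pin<Box<Delay>>>,
}

impl Throttled {
    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        let Some(until) = self.blocked_until else {
            return Poll::Ready(());
        };
        let sleep = self.sleep.get_or_insert_with(|| Box::pin(Delay::new(until)));
        match sleep.as_mut().poll(cx) {
            Poll::Ready(()) => {
                self.sleep = None;
                self.blocked_until = None;
                Poll::Ready(())
            }
            Poll::Pending => Poll::Pending,
        }
    }
}

/// Minimal executor: park the thread until the Delay's timer unparks it.
fn block_until_ready(svc: &mut Throttled) {
    struct Unparker(thread::Thread);
    impl Wake for Unparker {
        fn wake(self: Arc<Self>) {
            self.0.unpark();
        }
    }
    let waker = Waker::from(Arc::new(Unparker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    while svc.poll_ready(&mut cx).is_pending() {
        thread::park();
    }
}
```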
TPM (tokens-per-minute) limiting requires extracting token usage from the response body and will be done separately later.
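It only takes effect when the rate-limit feature is enabled, so existing users are unaffected. Thirteen unit tests cover header parsing, backpressure state, waker registration, governor bucket sharing, and retries passing through the rate-limit layer.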