test: add vLLM HTTP logprobs contract test for NeMo-Gym capture#2845
test: add vLLM HTTP logprobs contract test for NeMo-Gym capture#2845ananthsub wants to merge 1 commit into
Conversation
NeMo-Gym's vllm_model forwards logprobs=True and return_tokens_as_token_ids=True
and pins top_logprobs=0 to extract per-token ids and logprobs for training. vLLM
computes `logprobs = top_logprobs if logprobs else None`, so omitting top_logprobs
(default 0) or sending 0 returns logprobs, while an explicit null returns none and
silently empties the captured token ids (freezing training).
Add a test that drives the real vLLM OpenAI HTTP server (via the existing
nemo_gym_vllm_generation fixture) over top_logprobs in {omitted, 0, null}, where
the logprobs/top_logprobs -> sampling-params translation actually lives. The
offline LLM API does not exercise that path. This pins the contract against the
supported vLLM so a version bump that changes it fails here instead of silently
emptying training token ids.
Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test 7b508e5 |
terrykong
left a comment
There was a problem hiding this comment.
Thanks for adding this — it's a valuable contract test that pins the vLLM HTTP request→SamplingParams behavior (the offline LLM API tests don't exercise this path) and is designed to fail loudly on a vLLM bump that changes top_logprobs semantics, right where the silent training-freeze lives.
Verified the core reasoning against source:
- vLLM v0.20.0
logprobs = self.top_logprobs if self.logprobs else None(protocol.py#L549),top_logprobsdefault0; an explicitnullskips validation and returns 200 withlogprobs=None(validator L625-637). - The RL HTTP wrapper does assert
temperature/top_pequality (vllm_worker_async.py#L718-L719), so passing them is required.
Note: I couldn't get a green local run — the shared nemo_gym_vllm_generation fixture failed in Ray worker bootstrap (_get_node_ip_and_free_port) during setup in my multi-worktree box, before the test body ran. That's environmental (affects all nemo-gym vLLM tests), not specific to this test — leaving the green run to CI/L0.
One non-blocking suggestion inline.
Generated by Claude Code
| null_resp = _chat(None) | ||
| if null_resp.status_code == 200: | ||
| assert null_resp.json()["choices"][0].get("logprobs") is None |
There was a problem hiding this comment.
tests/unit/environments/test_nemo_gym.py:362
Null case can pass vacuously on a non-200. The branch only asserts the contract when status_code == 200; for any non-200 the test passes with zero assertions. The docstring allows "rejects the request outright" as acceptable, but as written a future vLLM 5xx (or an unrelated server error) would also pass silently — the mirror image of the 200-path check this test is built around. At vLLM v0.20.0 explicit null returns 200 with logprobs=None (protocol.py#L549), so the assertion does fire today; this just hardens the rejection branch for future bumps.
| null_resp = _chat(None) | |
| if null_resp.status_code == 200: | |
| assert null_resp.json()["choices"][0].get("logprobs") is None | |
| null_resp = _chat(None) | |
| if null_resp.status_code == 200: | |
| assert null_resp.json()["choices"][0].get("logprobs") is None | |
| else: | |
| # A rejection must be a client-side validation error, not an unrelated server failure. | |
| assert 400 <= null_resp.status_code < 500, ( | |
| f"expected null top_logprobs accepted-with-None or rejected as 4xx, " | |
| f"got {null_resp.status_code}: {null_resp.text}" | |
| ) |
Optional nit (wording only): the docstring (L303) and the L353 comment say NeMo-Gym "pins top_logprobs=0". At the pinned Gym submodule SHA f82b601 Gym actually omits top_logprobs on the capture path (vllm_model/app.py#L320-L330) — the =0 pin is what Gym PR #1612 adds. Functionally identical (omit → default 0), so purely a future-reader clarity nit; e.g. "Gym sets logprobs=True/return_tokens_as_token_ids=True; Gym PR #1612 additionally pins top_logprobs=0."
Summary
Adds a GPU contract test that pins the vLLM OpenAI HTTP logprobs behavior NeMo-Gym capture depends on, complementing the upstream Gym fix for
top_logprobs=null: NVIDIA-NeMo/Gym#1612With that change^, NeMo Gym's
vllm_modelforwardslogprobs=True+return_tokens_as_token_ids=Trueand pinstop_logprobs=0to extract per-token ids/logprobs for training. Currnet vLLM versions computelogprobs = top_logprobs if logprobs else None, so omittingtop_logprobs(default0) or sending0returns logprobs, while an explicitnullreturns none — silently emptying captured token ids and freezing training (zero loss mask /grad_norm≈0).The bug lives in vLLM's request→
SamplingParamstranslation, which is only exercised over the HTTP path (the offlineLLM/SamplingParamsAPI used by the other vllm tests does not have this issue). This test drives the real server via the existingnemo_gym_vllm_generationfixture so a vLLM bump that changes the contract fails here instead of silently freezing training.Changes
tests/unit/environments/test_nemo_gym.py: new@pytest.mark.nemo_gymtesttest_vllm_http_logprobs_contract, exercisingtop_logprobs ∈ {omitted, 0, null}againstdp_openai_server_base_urls. Omitted/0must return per-token logprobs whose"token_id:<int>"tokens parse to ints;nullmust return no logprobs (the divergence motivating the Gym fix — if a future vLLM makesnullbehave like0, this fails and signals the workaround can be relaxed).