Skip to content

test: add vLLM HTTP logprobs contract test for NeMo-Gym capture#2845

Open
ananthsub wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
ananthsub:ansubramania/vllm-logprobs-contract-test
Open

test: add vLLM HTTP logprobs contract test for NeMo-Gym capture#2845
ananthsub wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
ananthsub:ansubramania/vllm-logprobs-contract-test

Conversation

@ananthsub

@ananthsub ananthsub commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a GPU contract test that pins the vLLM OpenAI HTTP logprobs behavior NeMo-Gym capture depends on, complementing the upstream Gym fix for top_logprobs=null: NVIDIA-NeMo/Gym#1612

With that change^, NeMo Gym's vllm_model forwards logprobs=True + return_tokens_as_token_ids=True and pins top_logprobs=0 to extract per-token ids/logprobs for training. Currnet vLLM versions compute logprobs = top_logprobs if logprobs else None, so omitting top_logprobs (default 0) or sending 0 returns logprobs, while an explicit null returns none — silently emptying captured token ids and freezing training (zero loss mask / grad_norm≈0).

The bug lives in vLLM's request→SamplingParams translation, which is only exercised over the HTTP path (the offline LLM/SamplingParams API used by the other vllm tests does not have this issue). This test drives the real server via the existing nemo_gym_vllm_generation fixture so a vLLM bump that changes the contract fails here instead of silently freezing training.

Changes

  • tests/unit/environments/test_nemo_gym.py: new @pytest.mark.nemo_gym test test_vllm_http_logprobs_contract, exercising top_logprobs ∈ {omitted, 0, null} against dp_openai_server_base_urls. Omitted/0 must return per-token logprobs whose "token_id:<int>" tokens parse to ints; null must return no logprobs (the divergence motivating the Gym fix — if a future vLLM makes null behave like 0, this fails and signals the workaround can be relaxed).

NeMo-Gym's vllm_model forwards logprobs=True and return_tokens_as_token_ids=True
and pins top_logprobs=0 to extract per-token ids and logprobs for training. vLLM
computes `logprobs = top_logprobs if logprobs else None`, so omitting top_logprobs
(default 0) or sending 0 returns logprobs, while an explicit null returns none and
silently empties the captured token ids (freezing training).

Add a test that drives the real vLLM OpenAI HTTP server (via the existing
nemo_gym_vllm_generation fixture) over top_logprobs in {omitted, 0, null}, where
the logprobs/top_logprobs -> sampling-params translation actually lives. The
offline LLM API does not exercise that path. This pins the contract against the
supported vLLM so a version bump that changes it fails here instead of silently
emptying training token ids.

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
@ananthsub ananthsub requested a review from a team as a code owner June 16, 2026 14:47
@copy-pr-bot

copy-pr-bot Bot commented Jun 16, 2026

Copy link
Copy Markdown

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ananthsub ananthsub added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label Jun 16, 2026
@ananthsub

Copy link
Copy Markdown
Contributor Author

/ok to test 7b508e5

@terrykong terrykong left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this — it's a valuable contract test that pins the vLLM HTTP request→SamplingParams behavior (the offline LLM API tests don't exercise this path) and is designed to fail loudly on a vLLM bump that changes top_logprobs semantics, right where the silent training-freeze lives.

Verified the core reasoning against source:

  • vLLM v0.20.0 logprobs = self.top_logprobs if self.logprobs else None (protocol.py#L549), top_logprobs default 0; an explicit null skips validation and returns 200 with logprobs=None (validator L625-637).
  • The RL HTTP wrapper does assert temperature/top_p equality (vllm_worker_async.py#L718-L719), so passing them is required.

Note: I couldn't get a green local run — the shared nemo_gym_vllm_generation fixture failed in Ray worker bootstrap (_get_node_ip_and_free_port) during setup in my multi-worktree box, before the test body ran. That's environmental (affects all nemo-gym vLLM tests), not specific to this test — leaving the green run to CI/L0.

One non-blocking suggestion inline.

Generated by Claude Code

Comment on lines +362 to +364
null_resp = _chat(None)
if null_resp.status_code == 200:
assert null_resp.json()["choices"][0].get("logprobs") is None

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests/unit/environments/test_nemo_gym.py:362

Null case can pass vacuously on a non-200. The branch only asserts the contract when status_code == 200; for any non-200 the test passes with zero assertions. The docstring allows "rejects the request outright" as acceptable, but as written a future vLLM 5xx (or an unrelated server error) would also pass silently — the mirror image of the 200-path check this test is built around. At vLLM v0.20.0 explicit null returns 200 with logprobs=None (protocol.py#L549), so the assertion does fire today; this just hardens the rejection branch for future bumps.

Suggested change
null_resp = _chat(None)
if null_resp.status_code == 200:
assert null_resp.json()["choices"][0].get("logprobs") is None
null_resp = _chat(None)
if null_resp.status_code == 200:
assert null_resp.json()["choices"][0].get("logprobs") is None
else:
# A rejection must be a client-side validation error, not an unrelated server failure.
assert 400 <= null_resp.status_code < 500, (
f"expected null top_logprobs accepted-with-None or rejected as 4xx, "
f"got {null_resp.status_code}: {null_resp.text}"
)

Optional nit (wording only): the docstring (L303) and the L353 comment say NeMo-Gym "pins top_logprobs=0". At the pinned Gym submodule SHA f82b601 Gym actually omits top_logprobs on the capture path (vllm_model/app.py#L320-L330) — the =0 pin is what Gym PR #1612 adds. Functionally identical (omit → default 0), so purely a future-reader clarity nit; e.g. "Gym sets logprobs=True/return_tokens_as_token_ids=True; Gym PR #1612 additionally pins top_logprobs=0."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants