fix: improve Responses API streaming event lifecycle and input handling#63
fix: improve Responses API streaming event lifecycle and input handling#63ivanopcode wants to merge 9 commits into
Conversation
Review Summary by QodoImprove Responses API streaming lifecycle and input handling
WalkthroughsDescription• Improved input handling for function_call, function_call_output, and developer message types in Responses API requests • Enhanced streaming event lifecycle with response.output_item.added, response.output_item.done, response.output_text.done, and response.content_part.done events • Fixed item_id consistency across streamed tool-call events (changed from id to item_id in ToolCallArgumentsDone) • Replaced non-spec ready status with in_progress/completed for tool calls • Added reasoning_content caching and reinject mechanism across tool-call turns for CoT preservation Diagramflowchart LR
A["Input Processing"] -->|function_call| B["Convert to Assistant Message"]
A -->|function_call_output| C["Convert to Tool Message"]
A -->|developer| D["Convert to System Message"]
B --> E["Cache reasoning_content"]
C --> E
E --> F["Stream Events"]
F -->|output_item.added| G["Tool Call Created"]
F -->|function_call_arguments.delta| H["Arguments Streaming"]
H -->|function_call_arguments.done| I["Arguments Complete"]
I -->|output_item.done| J["Tool Call Done"]
F -->|output_text.delta| K["Text Streaming"]
K -->|output_text.done| L["Text Complete"]
L -->|output_item.done| M["Message Done"]
E -->|Next Turn| N["Reinject reasoning_content"]
File Changes1. src/open_responses_server/models/responses_models.py
|
Code Review by Qodo
1.
|
There was a problem hiding this comment.
Pull request overview
This PR improves the Responses adapter to better match the Responses API, especially for tool-calling clients and models that emit separate reasoning_content.
Changes:
- Expanded
inputitem conversion to supportfunction_call,function_call_output(incl.idfallback + non-string output normalization), anddevelopermessages. - Implemented a richer streaming event lifecycle for tool calls and message items (added/done events, arguments.done, text done events) and standardized status values (
in_progress/completed). - Added a bounded
reasoning_contentcache to reinject model reasoning across tool-call turns (“CoT passback”).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
tests/test_responses_service.py |
Adds regression tests for reasoning cache eviction, tool output normalization, and stricter streaming lifecycle expectations. |
src/open_responses_server/responses_service.py |
Updates request input handling, adds CoT caching, and significantly expands streaming event lifecycle emissions. |
src/open_responses_server/models/responses_models.py |
Updates/extends streaming event models (e.g., output_item.*, output_text.done) and renames ToolCallArgumentsDone.id → item_id. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Addressed the latest review comments in two follow-up commits:
|
Problem
ORS did not fully support some Responses API request and event patterns used
by tool-calling clients and open-weight reasoning models such as gpt-oss.
In practice this caused three classes of problems:
Input history was reconstructed incompletely.
Clients send prior tool calls, tool results, and developer messages as
inputitems on each turn. ORS only handled a subset of these items, soparts of the conversation history were dropped before reaching the backend.
The streamed Responses event lifecycle was incomplete.
Several expected events and state transitions were missing or inconsistent,
especially around tool calls and text output items.
reasoning_contentwas not preserved across tool-call turns.For models that emit reasoning separately from the final answer, losing that
context degraded multi-step tool use. This is sometimes referred to in the
community as "CoT passback".
These issues were reproduced with Codex CLI, but the fixes bring ORS closer
to the Responses API model more generally.
Changes
This MR updates the Responses adapter to:
function_call,function_call_output, anddeveloperinput itemsinto the corresponding chat-completions message structure
output, including
response.output_item.added,response.function_call_arguments.done,response.output_text.done, andresponse.output_item.doneitem_idvalues across streamed tool-call eventsreadystatus within_progress/completedreasoning_contentacross tool-call turns when the modelprovides it
Testing
Tested with:
uv run pytest tests/test_responses_service.pyAlso verified manually with Codex CLI against local llama.cpp-backed models,
including multi-turn tool-calling flows where prior tool calls and reasoning
need to survive across turns.