Skip to content

fix: improve Responses API streaming event lifecycle and input handling#63

Open
ivanopcode wants to merge 9 commits into
teabranch:mainfrom
ivanopcode:fix/responses-api-streaming-lifecycle
Open

fix: improve Responses API streaming event lifecycle and input handling#63
ivanopcode wants to merge 9 commits into
teabranch:mainfrom
ivanopcode:fix/responses-api-streaming-lifecycle

Conversation

@ivanopcode

@ivanopcode ivanopcode commented Apr 9, 2026

Copy link
Copy Markdown

Problem

ORS did not fully support some Responses API request and event patterns used
by tool-calling clients and open-weight reasoning models such as gpt-oss.

In practice this caused three classes of problems:

  1. Input history was reconstructed incompletely.
    Clients send prior tool calls, tool results, and developer messages as
    input items on each turn. ORS only handled a subset of these items, so
    parts of the conversation history were dropped before reaching the backend.

  2. The streamed Responses event lifecycle was incomplete.
    Several expected events and state transitions were missing or inconsistent,
    especially around tool calls and text output items.

  3. reasoning_content was not preserved across tool-call turns.
    For models that emit reasoning separately from the final answer, losing that
    context degraded multi-step tool use. This is sometimes referred to in the
    community as "CoT passback".

These issues were reproduced with Codex CLI, but the fixes bring ORS closer
to the Responses API model more generally.

Changes

This MR updates the Responses adapter to:

  • convert function_call, function_call_output, and developer input items
    into the corresponding chat-completions message structure
  • emit a more complete Responses streaming lifecycle for tool calls and text
    output, including response.output_item.added,
    response.function_call_arguments.done,
    response.output_text.done, and response.output_item.done
  • use consistent item_id values across streamed tool-call events
  • replace the non-spec ready status with in_progress / completed
  • cache and reinject reasoning_content across tool-call turns when the model
    provides it

Testing

Tested with:

  • uv run pytest tests/test_responses_service.py

Also verified manually with Codex CLI against local llama.cpp-backed models,
including multi-turn tool-calling flows where prior tool calls and reasoning
need to survive across turns.

@qodo-code-review

Copy link
Copy Markdown

Review Summary by Qodo

Improve Responses API streaming lifecycle and input handling

🐞 Bug fix ✨ Enhancement

Grey Divider

Walkthroughs

Description
• Improved input handling for function_call, function_call_output, and developer message types
  in Responses API requests
• Enhanced streaming event lifecycle with response.output_item.added, response.output_item.done,
  response.output_text.done, and response.content_part.done events
• Fixed item_id consistency across streamed tool-call events (changed from id to item_id in
  ToolCallArgumentsDone)
• Replaced non-spec ready status with in_progress/completed for tool calls
• Added reasoning_content caching and reinject mechanism across tool-call turns for CoT
  preservation
Diagram
flowchart LR
  A["Input Processing"] -->|function_call| B["Convert to Assistant Message"]
  A -->|function_call_output| C["Convert to Tool Message"]
  A -->|developer| D["Convert to System Message"]
  B --> E["Cache reasoning_content"]
  C --> E
  E --> F["Stream Events"]
  F -->|output_item.added| G["Tool Call Created"]
  F -->|function_call_arguments.delta| H["Arguments Streaming"]
  H -->|function_call_arguments.done| I["Arguments Complete"]
  I -->|output_item.done| J["Tool Call Done"]
  F -->|output_text.delta| K["Text Streaming"]
  K -->|output_text.done| L["Text Complete"]
  L -->|output_item.done| M["Message Done"]
  E -->|Next Turn| N["Reinject reasoning_content"]
Loading

Grey Divider

File Changes

1. src/open_responses_server/models/responses_models.py ✨ Enhancement +19/-2

Add new event models and fix field naming

• Changed ToolCallArgumentsDone.id field to item_id for consistency with other event types
• Added three new event model classes: OutputItemAdded, OutputItemDone, and OutputTextDone to
 support complete streaming lifecycle
• Fixed trailing whitespace in ResponseCompleted class

src/open_responses_server/models/responses_models.py


2. src/open_responses_server/responses_service.py 🐞 Bug fix +289/-211

Enhance input handling and streaming event lifecycle

• Added global reasoning_content_cache dictionary to store and retrieve CoT (chain-of-thought)
 reasoning across tool-call turns
• Refactored convert_responses_to_chat_completions() to handle function_call,
 function_call_output, and developer input item types with proper message role conversion
• Enhanced process_chat_completions_stream() to emit complete event lifecycle:
 response.output_item.added, response.output_item.done, response.output_text.done,
 response.content_part.added, and response.content_part.done
• Changed tool call status from ready to completed for consistency
• Implemented reasoning_content caching keyed by call_id with LRU-style trimming to prevent
 unbounded growth
• Fixed item_id usage in tool call events to use tool_call["id"] instead of separate item_id
 field

src/open_responses_server/responses_service.py


3. tests/test_responses_service.py 🧪 Tests +16/-11

Update test expectations for new event types

• Updated test expectations for tool call status from ready to completed in two test cases
• Changed test assertion for test_tool_calls_created_event_emitted to verify
 response.output_item.added and response.output_item.done events instead of
 response.in_progress
• Updated test documentation and assertions for empty output handling to expect empty string instead
 of fallback text

tests/test_responses_service.py


Grey Divider

Qodo Logo

@qodo-code-review

qodo-code-review Bot commented Apr 9, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Tool output missing call_id ✓ Resolved 🐞 Bug ≡ Correctness
Description
convert_responses_to_chat_completions reads function_call_output.call_id without a fallback and can
create tool/assistant messages with tool_call_id/id=None, which then fails matching/validation and
can drop tool outputs or produce invalid tool_calls structures.
Code

src/open_responses_server/responses_service.py[R215-257]

Evidence
function_call_output handling uses call_id=item.get("call_id") with no fallback to item.id and then
uses it as both tool_call_id and tool_calls[].id in the synthetic assistant fallback;
validate_message_sequence only considers a tool message valid if it can find a preceding assistant
tool_calls entry with a matching id, so missing/mismatched IDs cause tool outputs to be treated as
orphaned and removed.

src/open_responses_server/responses_service.py[25-75]
src/open_responses_server/responses_service.py[214-260]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`function_call_output` input items can lack `call_id` (or use `id` instead). Current code sets `call_id = item.get('call_id')` and may generate tool messages / synthetic assistant tool_calls with `id=None`, which can be dropped by validation or cause invalid tool call structures.
### Issue Context
`function_call` items already use a fallback (`call_id` -> `id` -> generated). `function_call_output` should mirror this behavior.
### Fix Focus Areas
- src/open_responses_server/responses_service.py[214-260]
- src/open_responses_server/responses_service.py[25-75]
### Suggested changes
- Derive `call_id = item.get('call_id') or item.get('id')` (and if still missing, generate `call_{uuid...}`) to ensure a non-null string.
- Ensure tool message `content` is a string (e.g., JSON-dump dict/list outputs) before passing into chat-completions.
- In the synthetic assistant+tool fallback, only create tool_calls with a valid string id; otherwise skip and log a warning.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Stop history text mismatch ✓ Resolved 🐞 Bug ≡ Correctness
Description
On finish_reason=='stop' with no streamed text, the response output now contains an empty string,
but conversation history still stores "(No update)", changing subsequent turns when clients use
previous_response_id.
Code

src/open_responses_server/responses_service.py[R922-954]

Evidence
The stop handler builds the final response item using final_text = output_text_content or '', but
the history-saving path still appends output_text_content or '(No update)';
convert_responses_to_chat_completions later loads conversation_history for previous_response_id and
forwards it as messages, so this placeholder becomes part of the next model prompt even though the
client saw an empty assistant message.

src/open_responses_server/responses_service.py[918-966]
src/open_responses_server/responses_service.py[101-107]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When a stop completion has no content, the response output now correctly uses `""`, but conversation history still stores `"(No update)"`. This diverges client-visible output from server-side history and can pollute subsequent turns when using `previous_response_id`.
### Issue Context
Stop handler:
- Uses `final_text = output_text_content or ""` for the completed response item.
- Uses `output_text_content or "(No update)"` for conversation history.
### Fix Focus Areas
- src/open_responses_server/responses_service.py[918-966]
- src/open_responses_server/responses_service.py[101-107]
### Suggested changes
- In the stop handler history append, store `final_text` (or `output_text_content or ""`) instead of `"(No update)"`.
- Optionally add/adjust a test asserting history content matches the emitted completed output text for the empty-stop case.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Done without added event ✓ Resolved 🐞 Bug ≡ Correctness
Description
The stop handler always emits response.output_item.done even when no text was streamed, so
response.output_item.added/content_part.added were never emitted, producing an invalid event
sequence for empty outputs.
Code

src/open_responses_server/responses_service.py[R918-946]

Evidence
Message output_item.added/content_part.added are only emitted on the first content delta; if a stop
arrives with no content deltas, the stop path still yields response.output_item.done
unconditionally, meaning clients can receive a done event for an item they never saw added.

src/open_responses_server/responses_service.py[595-616]
src/open_responses_server/responses_service.py[918-946]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
For empty stop responses (no `delta.content`), the stream emits `response.output_item.done` but never emitted `response.output_item.added` / `response.content_part.added`. This breaks expected lifecycle ordering.
### Issue Context
- `output_item.added` for message output is only emitted when the first content chunk arrives.
- Stop handler emits `output_item.done` regardless of whether any message output item was added.
### Fix Focus Areas
- src/open_responses_server/responses_service.py[595-616]
- src/open_responses_server/responses_service.py[918-946]
### Suggested changes
Choose one:
1) If `final_text == ""`, emit `response.output_item.added` (and optionally `response.content_part.added`) before `response.output_item.done`, OR
2) If `final_text == ""` and no message item was ever added, skip emitting `response.output_item.done` (and keep only `response.completed`).
Add a test asserting the lifecycle ordering for the empty-stop case.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (1)
4. Duplicate tool output_index ✓ Resolved 🐞 Bug ≡ Correctness
Description
Tool call initialization assigns output_index=tool_call_counter but only increments
tool_call_counter when the first delta contains a function name; if name arrives later, multiple
tool calls can share the same output_index and output_item events can reference non-existent items.
Code

src/open_responses_server/responses_service.py[R536-579]

Evidence
The code sets output_index at tool-call creation using the current counter, but increments the
counter only inside the block that emits output_item.added (which is gated on the name being present
in that first delta). Later, finish_reason=='tool_calls' emits arguments.done and output_item.done
for every tool_call regardless of whether an output item was ever added, so the emitted
indices/items can collide or be out of sync with response_obj.output.

src/open_responses_server/responses_service.py[533-579]
src/open_responses_server/responses_service.py[758-801]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`tool_call_counter` is only incremented when an `output_item.added` is emitted (requires tool name present). If a model streams tool deltas without the name in the first fragment, multiple tool calls may get the same `output_index`, and later events (`arguments.done`, `output_item.done`) can refer to an item that was never added.
### Issue Context
This can happen if tool call fields arrive over multiple deltas (id/type first, name later).
### Fix Focus Areas
- src/open_responses_server/responses_service.py[533-579]
- src/open_responses_server/responses_service.py[758-801]
### Suggested changes
- Increment `tool_call_counter` immediately when creating a new `tool_calls[index]` entry, not conditionally.
- Emit an `output_item.added` as soon as the tool call is created (even if name is empty), then patch/update the stored item when name arrives.
- Alternatively, track a boolean like `tool_calls[index]['added_emitted']` and when name becomes available later, emit the missing `output_item.added` (without reusing an already-assigned `output_index`).
- Add a test where the first tool_calls delta lacks `function.name` and ensure unique output_index values and correct event ordering.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

5. Reasoning cache eviction incorrect ✓ Resolved 🐞 Bug ☼ Reliability
Description
reasoning_content_cache is trimmed by sorting random call_id strings (not by recency) and is only
size-bounded in the tool_calls finish path, so it can grow unbounded and eviction does not match the
intended "keep last 200" behavior.
Code

src/open_responses_server/responses_service.py[R842-853]

Evidence
The cache trimming logic deletes sorted(reasoning_content_cache.keys())[:excess], but call_id keys
are not ordered by insertion time, so this does not remove the oldest entries. Additionally, caching
also occurs in the finish_reason=='function_call' path without any trimming, so repeated requests
can grow the cache indefinitely.

src/open_responses_server/responses_service.py[17-21]
src/open_responses_server/responses_service.py[695-699]
src/open_responses_server/responses_service.py[842-853]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`reasoning_content_cache` eviction is not based on recency and is not applied in all insertion paths, so memory can grow unbounded and retained entries are effectively arbitrary.
### Issue Context
- Current eviction: sort keys (random strings) and delete first N.
- Trimming happens only in the `finish_reason == 'tool_calls'` path; other caching paths don’t trim.
### Fix Focus Areas
- src/open_responses_server/responses_service.py[17-21]
- src/open_responses_server/responses_service.py[695-699]
- src/open_responses_server/responses_service.py[842-853]
### Suggested changes
- Replace the dict with an insertion-ordered structure (`collections.OrderedDict`) or track insertion timestamps.
- Enforce the size cap on every insertion (both `function_call` and `tool_calls` paths).
- Implement true LRU/FIFO eviction (e.g., `popitem(last=False)` for FIFO, or move-to-end on read if LRU).
- Consider namespacing by response/session if cross-request mixing is undesired.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Comment thread src/open_responses_server/responses_service.py
Comment thread src/open_responses_server/responses_service.py
Comment thread src/open_responses_server/responses_service.py Outdated
Comment thread src/open_responses_server/responses_service.py
@ivanopcode

Copy link
Copy Markdown
Author

Heads up: this PR and #53 are functionally different, but both modify responses_service.py around tool-call handling. If both changes are merged, that integration will need careful manual conflict resolution to preserve both the Responses lifecycle changes here and the skill-tool routing from #53.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the Responses adapter to better match the Responses API, especially for tool-calling clients and models that emit separate reasoning_content.

Changes:

  • Expanded input item conversion to support function_call, function_call_output (incl. id fallback + non-string output normalization), and developer messages.
  • Implemented a richer streaming event lifecycle for tool calls and message items (added/done events, arguments.done, text done events) and standardized status values (in_progress / completed).
  • Added a bounded reasoning_content cache to reinject model reasoning across tool-call turns (“CoT passback”).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
tests/test_responses_service.py Adds regression tests for reasoning cache eviction, tool output normalization, and stricter streaming lifecycle expectations.
src/open_responses_server/responses_service.py Updates request input handling, adds CoT caching, and significantly expands streaming event lifecycle emissions.
src/open_responses_server/models/responses_models.py Updates/extends streaming event models (e.g., output_item.*, output_text.done) and renames ToolCallArgumentsDone.iditem_id.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/open_responses_server/responses_service.py
Comment thread src/open_responses_server/responses_service.py Outdated
Comment thread src/open_responses_server/responses_service.py
@ivanopcode

Copy link
Copy Markdown
Author

Addressed the latest review comments in two follow-up commits:

  • fix: keep response output indexes consistent

    • uses a single monotonic output index allocator across message and tool-call output items
    • makes [DONE] without prior output emit a valid empty message lifecycle
    • adds regression coverage for mixed text/tool output indexes and empty [DONE]
  • fix: scope reasoning cache entries

    • scopes cached reasoning by response scope when previous_response_id is available
    • falls back to a tool-call signature namespace for full-history clients
    • keeps bounded insertion-order eviction

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants