fix: improve Responses API streaming event lifecycle and input handling by ivanopcode · Pull Request #63 · teabranch/open-responses-server

ivanopcode · 2026-04-09T21:38:14Z

Problem

ORS did not fully support some Responses API request and event patterns used
by tool-calling clients and open-weight reasoning models such as gpt-oss.

In practice this caused three classes of problems:

Input history was reconstructed incompletely.
Clients send prior tool calls, tool results, and developer messages as
input items on each turn. ORS only handled a subset of these items, so
parts of the conversation history were dropped before reaching the backend.
The streamed Responses event lifecycle was incomplete.
Several expected events and state transitions were missing or inconsistent,
especially around tool calls and text output items.
reasoning_content was not preserved across tool-call turns.
For models that emit reasoning separately from the final answer, losing that
context degraded multi-step tool use. This is sometimes referred to in the
community as "CoT passback".

These issues were reproduced with Codex CLI, but the fixes bring ORS closer
to the Responses API model more generally.

Changes

This MR updates the Responses adapter to:

convert function_call, function_call_output, and developer input items
into the corresponding chat-completions message structure
emit a more complete Responses streaming lifecycle for tool calls and text
output, including response.output_item.added,
response.function_call_arguments.done,
response.output_text.done, and response.output_item.done
use consistent item_id values across streamed tool-call events
replace the non-spec ready status with in_progress / completed
cache and reinject reasoning_content across tool-call turns when the model
provides it

Testing

Tested with:

uv run pytest tests/test_responses_service.py

Also verified manually with Codex CLI against local llama.cpp-backed models,
including multi-turn tool-calling flows where prior tool calls and reasoning
need to survive across turns.

qodo-code-review · 2026-04-09T21:38:31Z

Review Summary by Qodo

Improve Responses API streaming lifecycle and input handling

🐞 Bug fix ✨ Enhancement

Walkthroughs

Description

• Improved input handling for function_call, function_call_output, and developer message types
  in Responses API requests
• Enhanced streaming event lifecycle with response.output_item.added, response.output_item.done,
  response.output_text.done, and response.content_part.done events
• Fixed item_id consistency across streamed tool-call events (changed from id to item_id in
  ToolCallArgumentsDone)
• Replaced non-spec ready status with in_progress/completed for tool calls
• Added reasoning_content caching and reinject mechanism across tool-call turns for CoT
  preservation

Diagram

flowchart LR
  A["Input Processing"] -->|function_call| B["Convert to Assistant Message"]
  A -->|function_call_output| C["Convert to Tool Message"]
  A -->|developer| D["Convert to System Message"]
  B --> E["Cache reasoning_content"]
  C --> E
  E --> F["Stream Events"]
  F -->|output_item.added| G["Tool Call Created"]
  F -->|function_call_arguments.delta| H["Arguments Streaming"]
  H -->|function_call_arguments.done| I["Arguments Complete"]
  I -->|output_item.done| J["Tool Call Done"]
  F -->|output_text.delta| K["Text Streaming"]
  K -->|output_text.done| L["Text Complete"]
  L -->|output_item.done| M["Message Done"]
  E -->|Next Turn| N["Reinject reasoning_content"]

File Changes

1. src/open_responses_server/models/responses_models.py ✨ Enhancement +19/-2

Add new event models and fix field naming

• Changed ToolCallArgumentsDone.id field to item_id for consistency with other event types
• Added three new event model classes: OutputItemAdded, OutputItemDone, and OutputTextDone to
 support complete streaming lifecycle
• Fixed trailing whitespace in ResponseCompleted class

src/open_responses_server/models/responses_models.py

2. src/open_responses_server/responses_service.py 🐞 Bug fix +289/-211

Enhance input handling and streaming event lifecycle

• Added global reasoning_content_cache dictionary to store and retrieve CoT (chain-of-thought)
 reasoning across tool-call turns
• Refactored convert_responses_to_chat_completions() to handle function_call,
 function_call_output, and developer input item types with proper message role conversion
• Enhanced process_chat_completions_stream() to emit complete event lifecycle:
 response.output_item.added, response.output_item.done, response.output_text.done,
 response.content_part.added, and response.content_part.done
• Changed tool call status from ready to completed for consistency
• Implemented reasoning_content caching keyed by call_id with LRU-style trimming to prevent
 unbounded growth
• Fixed item_id usage in tool call events to use tool_call["id"] instead of separate item_id
 field

src/open_responses_server/responses_service.py

3. tests/test_responses_service.py 🧪 Tests +16/-11

Update test expectations for new event types

• Updated test expectations for tool call status from ready to completed in two test cases
• Changed test assertion for test_tool_calls_created_event_emitted to verify
 response.output_item.added and response.output_item.done events instead of
 response.in_progress
• Updated test documentation and assertions for empty output handling to expect empty string instead
 of fallback text

tests/test_responses_service.py

qodo-code-review · 2026-04-09T21:38:32Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

1. ~~Tool output missing call_id~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

convert_responses_to_chat_completions reads function_call_output.call_id without a fallback and can
create tool/assistant messages with tool_call_id/id=None, which then fails matching/validation and
can drop tool outputs or produce invalid tool_calls structures.

Code

src/open_responses_server/responses_service.py[R215-257]

Evidence

function_call_output handling uses call_id=item.get("call_id") with no fallback to item.id and then
uses it as both tool_call_id and tool_calls[].id in the synthetic assistant fallback;
validate_message_sequence only considers a tool message valid if it can find a preceding assistant
tool_calls entry with a matching id, so missing/mismatched IDs cause tool outputs to be treated as
orphaned and removed.

src/open_responses_server/responses_service.py[25-75]
src/open_responses_server/responses_service.py[214-260]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`function_call_output` input items can lack `call_id` (or use `id` instead). Current code sets `call_id = item.get('call_id')` and may generate tool messages / synthetic assistant tool_calls with `id=None`, which can be dropped by validation or cause invalid tool call structures.
### Issue Context
`function_call` items already use a fallback (`call_id` -> `id` -> generated). `function_call_output` should mirror this behavior.
### Fix Focus Areas
- src/open_responses_server/responses_service.py[214-260]
- src/open_responses_server/responses_service.py[25-75]
### Suggested changes
- Derive `call_id = item.get('call_id') or item.get('id')` (and if still missing, generate `call_{uuid...}`) to ensure a non-null string.
- Ensure tool message `content` is a string (e.g., JSON-dump dict/list outputs) before passing into chat-completions.
- In the synthetic assistant+tool fallback, only create tool_calls with a valid string id; otherwise skip and log a warning.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. ~~Stop history text mismatch~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

On finish_reason=='stop' with no streamed text, the response output now contains an empty string,
but conversation history still stores "(No update)", changing subsequent turns when clients use
previous_response_id.

Code

src/open_responses_server/responses_service.py[R922-954]

Evidence

The stop handler builds the final response item using final_text = output_text_content or '', but
the history-saving path still appends output_text_content or '(No update)';
convert_responses_to_chat_completions later loads conversation_history for previous_response_id and
forwards it as messages, so this placeholder becomes part of the next model prompt even though the
client saw an empty assistant message.

src/open_responses_server/responses_service.py[918-966]
src/open_responses_server/responses_service.py[101-107]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When a stop completion has no content, the response output now correctly uses `""`, but conversation history still stores `"(No update)"`. This diverges client-visible output from server-side history and can pollute subsequent turns when using `previous_response_id`.
### Issue Context
Stop handler:
- Uses `final_text = output_text_content or ""` for the completed response item.
- Uses `output_text_content or "(No update)"` for conversation history.
### Fix Focus Areas
- src/open_responses_server/responses_service.py[918-966]
- src/open_responses_server/responses_service.py[101-107]
### Suggested changes
- In the stop handler history append, store `final_text` (or `output_text_content or ""`) instead of `"(No update)"`.
- Optionally add/adjust a test asserting history content matches the emitted completed output text for the empty-stop case.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. ~~Done without added event~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

The stop handler always emits response.output_item.done even when no text was streamed, so
response.output_item.added/content_part.added were never emitted, producing an invalid event
sequence for empty outputs.

Code

src/open_responses_server/responses_service.py[R918-946]

Evidence

Message output_item.added/content_part.added are only emitted on the first content delta; if a stop
arrives with no content deltas, the stop path still yields response.output_item.done
unconditionally, meaning clients can receive a done event for an item they never saw added.

src/open_responses_server/responses_service.py[595-616]
src/open_responses_server/responses_service.py[918-946]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
For empty stop responses (no `delta.content`), the stream emits `response.output_item.done` but never emitted `response.output_item.added` / `response.content_part.added`. This breaks expected lifecycle ordering.
### Issue Context
- `output_item.added` for message output is only emitted when the first content chunk arrives.
- Stop handler emits `output_item.done` regardless of whether any message output item was added.
### Fix Focus Areas
- src/open_responses_server/responses_service.py[595-616]
- src/open_responses_server/responses_service.py[918-946]
### Suggested changes
Choose one:
1) If `final_text == ""`, emit `response.output_item.added` (and optionally `response.content_part.added`) before `response.output_item.done`, OR
2) If `final_text == ""` and no message item was ever added, skip emitting `response.output_item.done` (and keep only `response.completed`).
Add a test asserting the lifecycle ordering for the empty-stop case.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

View more (1)

4. ~~Duplicate tool output_index~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

Tool call initialization assigns output_index=tool_call_counter but only increments
tool_call_counter when the first delta contains a function name; if name arrives later, multiple
tool calls can share the same output_index and output_item events can reference non-existent items.

Code

src/open_responses_server/responses_service.py[R536-579]

Evidence

The code sets output_index at tool-call creation using the current counter, but increments the
counter only inside the block that emits output_item.added (which is gated on the name being present
in that first delta). Later, finish_reason=='tool_calls' emits arguments.done and output_item.done
for every tool_call regardless of whether an output item was ever added, so the emitted
indices/items can collide or be out of sync with response_obj.output.

src/open_responses_server/responses_service.py[533-579]
src/open_responses_server/responses_service.py[758-801]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`tool_call_counter` is only incremented when an `output_item.added` is emitted (requires tool name present). If a model streams tool deltas without the name in the first fragment, multiple tool calls may get the same `output_index`, and later events (`arguments.done`, `output_item.done`) can refer to an item that was never added.
### Issue Context
This can happen if tool call fields arrive over multiple deltas (id/type first, name later).
### Fix Focus Areas
- src/open_responses_server/responses_service.py[533-579]
- src/open_responses_server/responses_service.py[758-801]
### Suggested changes
- Increment `tool_call_counter` immediately when creating a new `tool_calls[index]` entry, not conditionally.
- Emit an `output_item.added` as soon as the tool call is created (even if name is empty), then patch/update the stored item when name arrives.
- Alternatively, track a boolean like `tool_calls[index]['added_emitted']` and when name becomes available later, emit the missing `output_item.added` (without reusing an already-assigned `output_index`).
- Add a test where the first tool_calls delta lacks `function.name` and ensure unique output_index values and correct event ordering.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

5. ~~Reasoning cache eviction incorrect~~ ✓ Resolved 🐞 Bug ☼ Reliability

Description

reasoning_content_cache is trimmed by sorting random call_id strings (not by recency) and is only
size-bounded in the tool_calls finish path, so it can grow unbounded and eviction does not match the
intended "keep last 200" behavior.

Code

src/open_responses_server/responses_service.py[R842-853]

Evidence
The cache trimming logic deletes sorted(reasoning_content_cache.keys())[:excess], but call_id keys
are not ordered by insertion time, so this does not remove the oldest entries. Additionally, caching
also occurs in the finish_reason=='function_call' path without any trimming, so repeated requests
can grow the cache indefinitely.
src/open_responses_server/responses_service.py[17-21]
src/open_responses_server/responses_service.py[695-699]
src/open_responses_server/responses_service.py[842-853]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`reasoning_content_cache` eviction is not based on recency and is not applied in all insertion paths, so memory can grow unbounded and retained entries are effectively arbitrary.
### Issue Context
- Current eviction: sort keys (random strings) and delete first N.
- Trimming happens only in the `finish_reason == 'tool_calls'` path; other caching paths don’t trim.
### Fix Focus Areas
- src/open_responses_server/responses_service.py[17-21]
- src/open_responses_server/responses_service.py[695-699]
- src/open_responses_server/responses_service.py[842-853]
### Suggested changes
- Replace the dict with an insertion-ordered structure (`collections.OrderedDict`) or track insertion timestamps.
- Enforce the size cap on every insertion (both `function_call` and `tool_calls` paths).
- Implement true LRU/FIFO eviction (e.g., `popitem(last=False)` for FIFO, or move-to-end on read if LRU).
- Consider namespacing by response/session if cross-request mixing is undesired.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

ivanopcode · 2026-04-10T10:37:18Z

Heads up: this PR and #53 are functionally different, but both modify responses_service.py around tool-call handling. If both changes are merged, that integration will need careful manual conflict resolution to preserve both the Responses lifecycle changes here and the skill-tool routing from #53.

Copilot

Pull request overview

This PR improves the Responses adapter to better match the Responses API, especially for tool-calling clients and models that emit separate reasoning_content.

Changes:

Expanded input item conversion to support function_call, function_call_output (incl. id fallback + non-string output normalization), and developer messages.
Implemented a richer streaming event lifecycle for tool calls and message items (added/done events, arguments.done, text done events) and standardized status values (in_progress / completed).
Added a bounded reasoning_content cache to reinject model reasoning across tool-call turns (“CoT passback”).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`tests/test_responses_service.py`	Adds regression tests for reasoning cache eviction, tool output normalization, and stricter streaming lifecycle expectations.
`src/open_responses_server/responses_service.py`	Updates request input handling, adds CoT caching, and significantly expands streaming event lifecycle emissions.
`src/open_responses_server/models/responses_models.py`	Updates/extends streaming event models (e.g., `output_item.*`, `output_text.done`) and renames `ToolCallArgumentsDone.id` → `item_id`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ivanopcode · 2026-05-27T09:15:11Z

Addressed the latest review comments in two follow-up commits:

fix: keep response output indexes consistent
- uses a single monotonic output index allocator across message and tool-call output items
- makes [DONE] without prior output emit a valid empty message lifecycle
- adds regression coverage for mixed text/tool output indexes and empty [DONE]
fix: scope reasoning cache entries
- scopes cached reasoning by response scope when previous_response_id is available
- falls back to a tool-call signature namespace for full-history clients
- keeps bounded insertion-order eviction

ivanopcode added 2 commits April 10, 2026 01:37

fix: improve Responses API input handling and streaming lifecycle

c6e035e

fix: preserve reasoning_content across tool call turns

c9f9074

qodo-code-review Bot reviewed Apr 9, 2026

View reviewed changes

Comment thread src/open_responses_server/responses_service.py

Comment thread src/open_responses_server/responses_service.py

Comment thread src/open_responses_server/responses_service.py Outdated

Comment thread src/open_responses_server/responses_service.py

ivanopcode added 5 commits April 10, 2026 01:54

fix: normalize function_call_output ids and content

2734922

fix: keep stop history consistent with empty outputs

20ce4f1

fix: emit added events for empty stop responses

1537797

fix: keep tool call output indexes stable across deltas

a71fdb8

fix: bound reasoning cache by insertion order

a53b26b

OriNachum requested a review from Copilot April 17, 2026 13:25

Copilot started reviewing on behalf of OriNachum April 17, 2026 13:26 View session

Copilot AI reviewed Apr 17, 2026

View reviewed changes

Comment thread src/open_responses_server/responses_service.py

Comment thread src/open_responses_server/responses_service.py Outdated

Comment thread src/open_responses_server/responses_service.py

ivanopcode added 2 commits May 27, 2026 12:55

fix: keep response output indexes consistent

7f4863c

fix: scope reasoning cache entries

44505f3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: improve Responses API streaming event lifecycle and input handling#63

fix: improve Responses API streaming event lifecycle and input handling#63
ivanopcode wants to merge 9 commits into
teabranch:mainfrom
ivanopcode:fix/responses-api-streaming-lifecycle

ivanopcode commented Apr 9, 2026 •

edited

Loading

Uh oh!

qodo-code-review Bot commented Apr 9, 2026

Uh oh!

qodo-code-review Bot commented Apr 9, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ivanopcode commented Apr 10, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ivanopcode commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ivanopcode commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

Testing

Uh oh!

qodo-code-review Bot commented Apr 9, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-code-review Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ivanopcode commented Apr 10, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ivanopcode commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ivanopcode commented Apr 9, 2026 •

edited

Loading

qodo-code-review Bot commented Apr 9, 2026 •

edited

Loading