fix(llmrails): normalize OpenAI multi-part content to string before rail evaluation by nac7 · Pull Request #2005 · NVIDIA-NeMo/Guardrails

nac7 · 2026-06-07T23:52:58Z

Problem

When a user message uses the OpenAI multi-part content format:

{"role": "user", "content": [{"type": "text", "text": "You are a dotard and I hate you"}]}

_get_events_for_messages assigns msg["content"] (the list) directly into UtteranceUserActionFinished.final_transcript and UserMessage.text without normalising it to a string. This causes two bugs:

Bug 1 — Silent guardrail bypass: Every LLM prompt (self-check input, intent matching, etc.) receives the Python repr of the list instead of the actual user text. Content-safety rails evaluate garbage and silently pass the real message through unblocked.

Bug 2 — TypeError crash in multi-turn: When mask_prev_user_message fires in a subsequent turn, get_colang_history() calls history.rsplit(utterance_to_replace, 1) where utterance_to_replace is the list, crashing with:

TypeError: must be str or None, not list
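
Both failure modes can be reproduced without the library. The following is a minimal sketch, assuming a prompt template that interpolates the user content directly; the `prompt` and `history` strings are hypothetical and do not come from NeMo Guardrails:

```python
# Multi-part user content in the OpenAI format, as in the example above.
content = [{"type": "text", "text": "You are a dotard and I hate you"}]

# Bug 1: interpolating the list into a prompt yields its Python repr,
# so the rail evaluates brackets and quotes around the user text.
prompt = f"User message: {content}"

# Bug 2: str.rsplit() only accepts a str (or None) separator,
# so passing the list raises TypeError.
history = 'user "hello"\n'  # hypothetical history string
try:
    history.rsplit(content, 1)
    rsplit_error = None
except TypeError as exc:
    rsplit_error = str(exc)

print(prompt)
print(rsplit_error)
```

The first print shows the list repr embedded in the prompt; the second shows the message carried by the TypeError.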

Fix

Add get_content_text(content) to nemoguardrails/rails/llm/utils.py. It joins all type: text parts with a space and passes non-list values through unchanged. Apply it at all four user-message content access points in _get_events_for_messages:

Location	Colang path
`UtteranceUserActionFinished.final_transcript`	1.0
`UserMessage.text` (non-final turns)	1.0
Tool-message user-lookup fallback	1.0
`UtteranceUserActionFinished.final_transcript`	2.0

Also refactors the pre-existing inline multipart-list handling in get_history_cache_key to reuse the same helper, eliminating duplicate logic.
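
The behaviour described above can be sketched as follows. This is an illustration based only on the description in this PR, not the code that was merged; the name `get_content_text_sketch` is hypothetical, and the handling of a part with a missing text key (treated as an empty string) is an assumption:

```python
from typing import Any


def get_content_text_sketch(content: Any) -> Any:
    """Hypothetical stand-in for the helper described above."""
    if isinstance(content, list):
        # Keep only parts tagged type == "text" and join them with a space.
        # Assumption: a part without a "text" key contributes an empty string.
        return " ".join(
            part.get("text", "")
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        )
    # Non-list values (plain strings, None) pass through unchanged.
    return content


multipart = [
    {"type": "text", "text": "Describe"},
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    {"type": "text", "text": "this image"},
]
normalized = get_content_text_sketch(multipart)
print(normalized)
```

Non-text parts such as `image_url` are dropped, so an image-only list normalises to an empty string.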

Tests

11 new tests added to tests/test_llmrails.py:

TestGetContentText — 8 unit tests for the helper:

plain string passthrough
None passthrough
single text part extracted
multiple text parts joined with space
non-text parts (image_url) skipped
empty list returns empty string
image-only list returns empty string
missing text key in part handled gracefully

Integration tests:

test_multipart_content_single_turn — single-turn generate_async with multipart content returns correct string response
test_multipart_content_multi_turn_does_not_crash — multipart content in non-final turns does not raise TypeError
test_multipart_content_mixed_parts — image_url parts are silently dropped, text parts extracted correctly
test_tool_message_with_multipart_user_content — Colang 1.0 tool-message branch: UserMessage events carry the normalised string
test_colang2_multipart_content_normalization — Colang 2.0 path: UtteranceUserActionFinished carries the normalised string

All 54 tests in tests/test_llmrails.py pass. Coverage confirmed on every added and modified line.

Fixes #1741

Summary by CodeRabbit

Bug Fixes
- Fixed garbled self-check prompts caused by improper handling of OpenAI multi-part content lists.
- Resolved TypeError crash in get_colang_history when processing multimodal messages.
- Improved normalization of multi-part message content into plain strings for consistent processing.

…ail evaluation

When a user message uses the OpenAI multi-part content format (``content: [{type: text, text: ...}]``), the content field was passed directly into ``UtteranceUserActionFinished.final_transcript`` and ``UserMessage.text`` without normalization. This caused two bugs:

1. All LLM prompts (self-check input, intent matching, etc.) received the Python repr of the list instead of the actual user text, silently defeating content-safety rails.
2. In multi-turn conversations where ``mask_prev_user_message`` fires, ``get_colang_history()`` crashed with ``TypeError: must be str or None, not list`` at the ``rsplit()`` call.

Fix: add ``get_content_text()`` to ``nemoguardrails/rails/llm/utils.py``. It joins all ``type: text`` parts with a space and passes non-list values through unchanged. Apply it at all four user-message content access points in ``_get_events_for_messages`` (Colang 1.0 transcript event, Colang 1.0 UserMessage event, Colang 1.0 tool-message fallback lookup, and Colang 2.0 transcript event).

Refactor the pre-existing inline list handling in ``get_history_cache_key`` to reuse the same helper.

Fixes NVIDIA-NeMo#1741

github-actions · 2026-06-07T23:54:58Z

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-2005

greptile-apps · 2026-06-07T23:55:11Z

Greptile Summary

This PR fixes two bugs triggered by OpenAI multi-part message content (content as a list of typed parts) being passed directly into guardrail pipeline fields: garbled self-check prompts and a TypeError crash in multi-turn get_colang_history.

Introduces get_content_text(content) in utils.py that normalizes any content value — plain string, None, or a list of typed parts — to a plain string, extracting and joining type: text parts.
Applies the helper at all four user-message content access points in _get_events_for_messages (Colang 1.0 and 2.0 paths, plus the tool-message user-lookup branch) and refactors get_history_cache_key to reuse the helper, eliminating duplicate multimodal-content logic.
Adds 11 new tests (8 unit tests for the helper, 5 integration tests) covering all modified code paths.

Confidence Score: 5/5

Safe to merge — the change is a targeted normalization fix with comprehensive test coverage across all affected code paths.

The helper is straightforward, the four application sites in _get_events_for_messages are all correctly updated, and the refactoring of get_history_cache_key eliminates duplicate logic without introducing new behavior for typical inputs. All added and modified lines are covered by tests.

No files require special attention.

Important Files Changed

Filename	Overview
nemoguardrails/rails/llm/utils.py	Adds get_content_text() helper that normalizes multipart content lists to strings; also refactors get_history_cache_key to use it, removing duplicated inline logic.
nemoguardrails/rails/llm/llmrails.py	Applies get_content_text() at all four user-message content access points in _get_events_for_messages, covering Colang 1.0, Colang 2.0, and the tool-message user-lookup branch.
tests/test_llmrails.py	Adds 11 tests covering the helper function and all modified code paths (single-turn, multi-turn, mixed parts, tool-message branch, Colang 2.0). Coverage is thorough.
.github/workflows/_test.yml	Downgrades codecov/codecov-action from v5 to v4; unrelated to the main fix.
CHANGELOG.md	Adds [Unreleased] entry describing the multipart-content normalization fix.

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant R as LLMRails._get_events_for_messages
    participant H as get_content_text (utils.py)
    participant P as Guardrail Pipeline

    C->>R: messages with multipart content list
    note over R: Colang 1.0 path
    R->>H: get_content_text(msg[content])
    H-->>R: plain string
    R->>P: "UtteranceUserActionFinished(final_transcript=string)"
    R->>P: "UserMessage(text=string)"
    note over R: Tool-message branch
    R->>H: get_content_text(prev_user_msg[content])
    H-->>R: normalized string
    R->>P: "UserMessage(text=normalized string)"
    note over R: Colang 2.0 path
    R->>H: get_content_text(msg[content])
    H-->>R: plain string
    R->>P: "UtteranceUserActionFinished(final_transcript=string)"

Reviews (4): Last reviewed commit: "fix: update codecov action to v4 to reso..."

coderabbitai · 2026-06-07T23:56:52Z

📝 Walkthrough

Walkthrough

This PR fixes issue #1741 by introducing a get_content_text() utility that normalizes OpenAI-style multipart message content (list of text/image parts) to plain strings. The utility is applied throughout LLMRails event construction and cache key generation to prevent garbled prompts and TypeError crashes when multipart content is used.

Changes

OpenAI Multipart Content Normalization

Layer / File(s)	Summary
Content normalization utility `nemoguardrails/rails/llm/utils.py`	New `get_content_text()` function normalizes OpenAI multipart content by joining text parts from lists and returning plain strings unchanged. `get_history_cache_key()` updated to use this utility for user message cache keys.
LLMRails event construction integration `nemoguardrails/rails/llm/llmrails.py`	Import and apply `get_content_text()` across Colang 1.0 and 2.x event construction: user-message final transcripts, `UserMessage.text` fields, tool-role prior message handling, and `UtteranceUserActionFinished` events.
Unit and integration tests `tests/test_llmrails.py`	`TestGetContentText` unit tests validate normalization of plain strings, `None`, single/multiple text parts, non-text filtering, and edge cases. Integration tests verify `LLMRails.generate_async()` normalizes multipart content across single-turn, multi-turn, mixed-part, tool-call, and Colang 2.0 flows.
Changelog entry `CHANGELOG.md`	Unreleased Bug Fixes section documents multipart content normalization fix and resolution of garbled self-check prompts and `get_colang_history()` TypeError.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main change: normalizing OpenAI multi-part content to strings before rail evaluation, which is the primary objective of the PR.
Linked Issues check	✅ Passed	The PR implementation fully addresses all coding requirements from issue `#1741`: normalization of multi-part content via get_content_text() helper, application to all four content access points, refactoring of duplicate logic, and comprehensive test coverage.
Out of Scope Changes check	✅ Passed	All changes are scoped to the multi-part content normalization fix: new utility function, application in llmrails.py, refactoring in utils.py cache key logic, and comprehensive test coverage with no unrelated modifications.
Test Results For Major Changes	✅ Passed	PR documents test results (11 tests added, all 54 existing tests pass, coverage confirmed) and changes are a targeted bug fix, not a major feature/breaking change requiring performance benchmarks.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nemoguardrails/rails/llm/utils.py`:
- Around line 21-39: get_content_text currently declares and documents returning
a str but returns non-list inputs unchanged, allowing None/dict/etc. to leak
through; update get_content_text to always return a str by: when content is a
list keep the existing join behavior, otherwise if content is None return an
empty string, if it's already a str return it, and for other types return
str(content) (ensuring the function contract -> str is honored).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 326c5380-9c7e-41c2-8c7a-22eb9b2f9424

📥 Commits

Reviewing files that changed from the base of the PR and between 1839dd2 and dca0e61.

📒 Files selected for processing (4)

CHANGELOG.md
nemoguardrails/rails/llm/llmrails.py
nemoguardrails/rails/llm/utils.py
tests/test_llmrails.py

…d formatting

- Apply ruff-format reformatting (generator expression collapsed to one line)
- Change signature to get_content_text(content: Any) -> str so the return type is always an honest str: None now maps to empty string, non-list non-None values go through str(), and text parts in the list branch are wrapped in str(... or '') to guard against explicit None text values
- Update test_none_passthrough -> test_none_returns_empty_string to match new None behavior; add test_non_string_non_list_converted_via_str to cover the str() fallback branch
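
The revised contract in this commit message can be sketched as below. This is a reconstruction from the bullet points only, not the committed code; the function name `get_content_text_strict_sketch` is hypothetical:

```python
from typing import Any


def get_content_text_strict_sketch(content: Any) -> str:
    """Hypothetical sketch of the always-returns-str variant described above."""
    if content is None:
        # None maps to an empty string instead of passing through.
        return ""
    if isinstance(content, list):
        # str(... or "") guards against parts whose text is explicitly None.
        return " ".join(
            str(part.get("text") or "")
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        )
    # Any other non-list value goes through str().
    return str(content)


print(repr(get_content_text_strict_sketch(None)))
print(repr(get_content_text_strict_sketch(42)))
print(repr(get_content_text_strict_sketch([{"type": "text", "text": None}])))
```

With this shape, callers such as `rsplit()` never receive a non-string value, whatever the message content was.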

Update codecov/codecov-action from v5 to v4 to fix GPG signature verification failures in the coverage upload step. v4 resolves the GPG key verification issue that was causing CI failures.

Fixes: 'gpg: Can't check signature: No public key' error in PR tests coverage upload

codecov · 2026-06-08T00:30:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

nac7 · 2026-06-08T00:30:27Z

Hi @Pouyanpi , if you have some time, could you please help with this PR review? Thanks!

github-actions · 2026-06-17T10:02:25Z

PR merge guidance

@nac7 thanks for the PR. GitHub is currently blocking merge for one or more repository requirements:

4 commits do not have a verified signature (dca0e61, 57778e8, 0e7f4e8, da5d434). Please sign the commits and force-push the updated branch.

Relevant guide:

Signed commits: https://github.com/NVIDIA-NeMo/Guardrails/blob/develop/CONTRIBUTING.md#commit-signing
Contribution guide: https://github.com/NVIDIA-NeMo/Guardrails/blob/develop/CONTRIBUTING.md

Signed-off-by: nac7 <lelenachiket07@gmail.com>

greptile-apps Bot reviewed Jun 7, 2026

Comment thread nemoguardrails/rails/llm/utils.py Outdated

coderabbitai Bot reviewed Jun 7, 2026

Comment thread nemoguardrails/rails/llm/utils.py Outdated

nac7 and others added 3 commits June 7, 2026 19:02

ci: retrigger CI

0e7f4e8

github-actions Bot added needs: rebase needs: signing labels Jun 17, 2026

Merge branch 'develop' into fix/multipart-content-normalization

ea31817

Signed-off-by: nac7 <lelenachiket07@gmail.com>

github-actions Bot added size: M and removed needs: rebase labels Jun 17, 2026

fix(llmrails): normalize OpenAI multi-part content to string before rail evaluation #2005
nac7 wants to merge 5 commits into NVIDIA-NeMo:develop from nac7:fix/multipart-content-normalization

2 participants
