Skip to content

API: surface third-party errors in /llm/call and /collections#906

Open
Ayush8923 wants to merge 14 commits into
mainfrom
feat/api-status-code
Open

API: surface third-party errors in /llm/call and /collections#906
Ayush8923 wants to merge 14 commits into
mainfrom
feat/api-status-code

Conversation

@Ayush8923
Copy link
Copy Markdown
Collaborator

@Ayush8923 Ayush8923 commented Jun 2, 2026

Issue: #874

Summary

In this PR, Improves error visibility for two endpoints that proxy third-party providers so clients(who using these api endpoints) can branch on what actually went wrong instead of parsing a generic message string.

1. /api/v1/collections

When OpenAI vector-store batch upload partially fails, the error string now includes per-file reasons pulled from the batch's failed-file list.

  • Before: OpenAI document processing error: 0/3 files completed
  • After: OpenAI document processing error: 0/3 files completed. Failed files: file-abc (Unsupported file type), file-xyz (File too large)

Capped at 10 files / 600 chars to keep callback payloads sane. Falls back to the old count-only message if the follow-up list_files lookup itself errors, so no regression on transient OpenAI failures.

2. /api/v1/llm/call

The failure callback's data field is now populated with a structured LLMCallErrorDetail payload instead of null. The top-level error string is unchanged, so existing clients keep working.

  • New model: conversation_id, provider, provider_status_code, error_type, message
  • conversation_id is echoed from the request, so clients chaining turns by
    thread_id no longer have their conversation break on every error
  • OpenAI errors are classified into rate_limit | authentication | timeout | invalid_request | provider_error, with provider_status_code from OpenAIError.status_code

Other providers (Anthropic, Google, Sarvam, ElevenLabs) still flow through the new path with conversation_id + provider populated but error_type="provider_error" and provider_status_code=null — same 5-line treatment can be added in a follow-up commit per provider.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Uses a shared native-provider suffix constant; enriches OpenAI vector-store upload errors with targeted exception mapping and failed-file summaries; refactors provider-level OpenAI exception handling into explicit category branches with matching tests.

Changes

OpenAI Error Handling Enhancement

Layer / File(s) Summary
Provider constant refactoring
backend/app/crud/model_config.py
validate_blob_model_or_raise now uses NATIVE_PROVIDER_SUFFIX instead of the hard-coded "-native" literal.
RAG vector store error enrichment
backend/app/crud/rag/open_ai.py, backend/app/tests/crud/rag/test_open_ai.py
OpenAIVectorStoreCrud.update wraps upload_and_poll with specific OpenAI exception handlers, and on partial completion calls list_files(filter='failed', limit=10) to collect failed-file messages (joined and truncated to 600 chars) appended to the raised InterruptedError. Tests cover success, partial failures, exception mapping, truncation/pagination behavior, and constructor validation.
OpenAI provider exception mapping
backend/app/services/llm/providers/oai.py, backend/app/tests/services/llm/providers/test_openai.py
OpenAIProvider.execute now handles concrete openai.* error subclasses (rate limit, auth, not found, bad request, unprocessable entity, internal server error, timeout) with category-prefixed messages and logging; generic OpenAIError and Exception fallbacks include raw exception text. Tests assert exact returned messages for each case.

Sequence Diagram(s)

sequenceDiagram
  participant Test
  participant OpenAIVectorStoreCrud
  participant OpenAI as OpenAI Client
  participant Storage as File Storage
  Test->>OpenAIVectorStoreCrud: update(docs)
  OpenAIVectorStoreCrud->>OpenAI: upload_and_poll(files)
  alt Upload Success
    OpenAI-->>OpenAIVectorStoreCrud: completed == total
    OpenAIVectorStoreCrud-->>Test: yield docs
  else Upload Partial Failure
    OpenAI-->>OpenAIVectorStoreCrud: completed < total
    OpenAIVectorStoreCrud->>Storage: list_files(filter='failed', limit=10)
    Storage-->>OpenAIVectorStoreCrud: failed file details with messages
    OpenAIVectorStoreCrud->>OpenAIVectorStoreCrud: enrich error summary (truncate to 600 chars)
    OpenAIVectorStoreCrud-->>Test: raise InterruptedError(enriched message)
  else OpenAI Exception
    OpenAI-->>OpenAIVectorStoreCrud: raise openai.*Error
    OpenAIVectorStoreCrud-->>Test: raise InterruptedError(category-prefixed message)
  end
Loading
sequenceDiagram
  participant Test
  participant OpenAIProvider
  participant OpenAI as OpenAI Client
  Test->>OpenAIProvider: execute(request)
  OpenAIProvider->>OpenAI: api_call()
  alt RateLimitError
    OpenAI-->>OpenAIProvider: RateLimitError
    OpenAIProvider-->>Test: None, "OpenAI API rate limit exceeded: ..."
  else AuthenticationError
    OpenAI-->>OpenAIProvider: AuthenticationError
    OpenAIProvider-->>Test: None, "OpenAI authentication failed: ..."
  else APIStatusError
    OpenAI-->>OpenAIProvider: APIStatusError(status_code=418)
    OpenAIProvider-->>Test: None, "OpenAI API status error (418): ..."
  else TimeoutError
    OpenAI-->>OpenAIProvider: APITimeoutError
    OpenAIProvider-->>Test: None, "OpenAI request timed out: ..."
  else ConnectionError
    OpenAI-->>OpenAIProvider: APIConnectionError
    OpenAIProvider-->>Test: None, "OpenAI connection error: ..."
  else Generic OpenAIError
    OpenAI-->>OpenAIProvider: OpenAIError
    OpenAIProvider-->>Test: None, "OpenAI error: ..."
  else Unexpected Exception
    OpenAI-->>OpenAIProvider: Exception
    OpenAIProvider-->>Test: None, "Unexpected error: ..."
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • AkhileshNegi
  • Prajna1999

Poem

🐰 A rabbit's ode to clearer logs

When uploads stall and errors chime,
I list the files and save you time.
From rate limits, auth, to timeout woes,
Each problem now a message that shows. 🎋

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 64.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main objective of the PR: improving error visibility by surfacing third-party errors in two specific API endpoints (/llm/call and /collections).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/api-status-code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Ayush8923 Ayush8923 self-assigned this Jun 2, 2026
@sentry
Copy link
Copy Markdown

sentry Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown
Collaborator

@Prajna1999 Prajna1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes would touch upon

  1. services/llm/providers/{provider}.py
  2. similar to doc_transform celery job
  3. jobs.py (execute_llm, and doc_transform_job, execute_chain_job etc)
  4. Avoid wrappers/utils/global catch-all error classes.
  5. Favor simpler if-else or switch-case if error cases are large.
  6. See how can we surface error codes with the webhook payload, since from Kaapi perspective every success callback (even if contains errors) is considered default 200.
  7. Check the top 5 payloads for sample error messages and others for success msgs
    https://webhook.site/#!/view/5a612527-4251-48dc-9f26-b45438802f41/2793b530-434a-4033-9175-0ddf95f0e9d6/1
  8. This is the gemini TTS error that comes up

{
"message": "Failed to extract audio from response: 'NoneType' object has no attribute 'parts'",
"success": false
}

logger = logging.getLogger(__name__)


def _classify_openai_error(status_code: int | None, error_code: str | None) -> str:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Avoid wrappers and use explicit in-place error messages for each error code.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done.

Comment thread backend/app/models/llm/response.py Outdated
)


class LLMCallErrorDetail(SQLModel):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Avoid top level catch-all classes and use if--else in place

@@ -138,17 +158,31 @@ def execute(

except openai.OpenAIError as e:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This is ideal. except openai.OpenAIError as e: and then use their native errors (my guess is command + clicking inside OpenAIError we can see specific error messages) and use as is if they are already human readable or make them in place

Comment thread backend/app/services/llm/errors.py Outdated
)


def set_provider_error_meta(meta: ProviderErrorMeta) -> None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise. No complex wrappers.

Comment thread backend/app/services/llm/jobs.py Outdated
)


def _finalize_error_detail(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Comment thread backend/app/services/llm/errors.py Outdated
@@ -0,0 +1,37 @@
"""Shared error-capture plumbing for LLM provider calls.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this errors.py altogether. Dont use special utils/classes to wrape errors. Write errors in-place.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

OpenAPI changes   ⚪ No API surface changes

Note

This PR does not modify the API contract.

main2080e0a0 · generated by oasdiff

Comment thread backend/app/crud/rag/open_ai.py Outdated
Comment on lines +143 to +150
except openai.RateLimitError as e:
raise InterruptedError(f"OpenAI rate limit exceeded: {e.message}")
except openai.AuthenticationError as e:
raise InterruptedError(f"OpenAI authentication failed: {e.message}")
except openai.PermissionDeniedError as e:
raise InterruptedError(f"OpenAI permission denied: {e.message}")
except openai.NotFoundError as e:
raise InterruptedError(f"OpenAI resource not found: {e.message}")
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Prajna1999 Added these types of the exception handling instead of wrapper.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit too large for the problem it’s solving. I think using the wrapper function here would be a cleaner and more maintainable approach than handling multiple exception cases individually.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove PermissionDeniedError, ConflictError, APIConnectionError, APIStatusError,. It;s alright if we have so many error classes. Can add one two line to each error message on what to do like OpenAI rate limit exceeded. Try again in 1 minute. If issue persists, contact Kaapi to all messages. It enables the user to take a call on what to do next. Similarly for Authentication error Check your openAI API Key is valid or not.. etc.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
backend/app/tests/crud/rag/test_open_ai.py (1)

15-59: 💤 Low value

Add type hints to the new fixtures and builder helpers.

The fixtures (mock_client, crud, mock_storage, docs_batch) and builders (_batch_result, _failed_file_page, _failed_file) are missing return type hints, and _failed_file_page(files, ...) leaves files untyped.

As per coding guidelines: "Always add type hints to all function parameters and return values in Python code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/tests/crud/rag/test_open_ai.py` around lines 15 - 59, Add
explicit type hints to all fixtures and helper builders: annotate mock_client()
-> MagicMock, crud(mock_client) -> OpenAIVectorStoreCrud, mock_storage() ->
MagicMock, docs_batch() -> List[List[MagicMock]] (or list[list[MagicMock]]),
_batch_result(...) -> MagicMock, _failed_file_page(files: Iterable[MagicMock],
has_more: bool = False) -> MagicMock, and _failed_file(file_id: str,
error_message: Optional[str]) -> MagicMock; import needed typing names (List,
Iterable, Optional) and ensure return annotations match the MagicMock types used
in tests so static checkers and linters accept the fixtures and helpers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@backend/app/tests/crud/rag/test_open_ai.py`:
- Around line 15-59: Add explicit type hints to all fixtures and helper
builders: annotate mock_client() -> MagicMock, crud(mock_client) ->
OpenAIVectorStoreCrud, mock_storage() -> MagicMock, docs_batch() ->
List[List[MagicMock]] (or list[list[MagicMock]]), _batch_result(...) ->
MagicMock, _failed_file_page(files: Iterable[MagicMock], has_more: bool = False)
-> MagicMock, and _failed_file(file_id: str, error_message: Optional[str]) ->
MagicMock; import needed typing names (List, Iterable, Optional) and ensure
return annotations match the MagicMock types used in tests so static checkers
and linters accept the fixtures and helpers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 62a78d49-f0f3-42a1-9923-377d88e2d42b

📥 Commits

Reviewing files that changed from the base of the PR and between 901517d and c65d610.

📒 Files selected for processing (6)
  • backend/app/crud/model_config.py
  • backend/app/crud/rag/open_ai.py
  • backend/app/services/llm/providers/oai.py
  • backend/app/tests/crud/rag/__init__.py
  • backend/app/tests/crud/rag/test_open_ai.py
  • backend/app/tests/services/llm/providers/test_openai.py

Comment thread backend/app/crud/rag/open_ai.py Outdated
Comment on lines +143 to +150
except openai.RateLimitError as e:
raise InterruptedError(f"OpenAI rate limit exceeded: {e.message}")
except openai.AuthenticationError as e:
raise InterruptedError(f"OpenAI authentication failed: {e.message}")
except openai.PermissionDeniedError as e:
raise InterruptedError(f"OpenAI permission denied: {e.message}")
except openai.NotFoundError as e:
raise InterruptedError(f"OpenAI resource not found: {e.message}")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove PermissionDeniedError, ConflictError, APIConnectionError, APIStatusError,. It;s alright if we have so many error classes. Can add one two line to each error message on what to do like OpenAI rate limit exceeded. Try again in 1 minute. If issue persists, contact Kaapi to all messages. It enables the user to take a call on what to do next. Similarly for Authentication error Check your openAI API Key is valid or not.. etc.

except openai.RateLimitError as e:
error_message = f"OpenAI rate limit exceeded: {e.message}"
logger.warning(
f"[OpenAIProvider.execute] {error_message} | "
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. One line explaining what to do next.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

)
return None, error_message

except openai.ConflictError as e:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment.

Comment thread backend/app/crud/rag/open_ai.py Outdated
files=files,
)
except openai.RateLimitError as e:
raise InterruptedError(f"OpenAI rate limit exceeded: {e.message}")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also add error code

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
backend/app/tests/services/llm/providers/test_openai.py (1)

280-290: ⚡ Quick win

Type-annotate the updated parametrized provider test signature.

The modified method uses new arguments but still omits parameter and return type hints.

Suggested patch
+from collections.abc import Callable
@@
     def test_execute_specific_openai_exceptions_use_category_prefix(
         self,
-        provider,
-        mock_client,
-        completion_config,
-        query_params,
-        exception_factory,
-        expected_prefix,
-        expected_status,
-        original_message,
-    ):
+        provider: OpenAIProvider,
+        mock_client: MagicMock,
+        completion_config: NativeCompletionConfig,
+        query_params: QueryParams,
+        exception_factory: Callable[[], Exception],
+        expected_prefix: str,
+        expected_status: int,
+        original_message: str,
+    ) -> None:

As per coding guidelines, **/*.py: Always add type hints to all function parameters and return values in Python code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/tests/services/llm/providers/test_openai.py` around lines 280 -
290, Add proper type hints to the test function signature
test_execute_specific_openai_exceptions_use_category_prefix by annotating each
parameter and the return type (-> None). For example, annotate fixtures like
provider and mock_client as Any, completion_config and query_params as dict,
exception_factory as Callable[..., Exception], expected_prefix as str,
expected_status as int and original_message as str, and add the necessary
imports from typing (Any, Callable) at the top of the test module; keep the
function name unchanged.
backend/app/tests/crud/rag/test_open_ai.py (1)

281-291: ⚡ Quick win

Add type hints to the updated parametrized test method.

The modified method signature is still untyped. Please annotate parameters and return type to keep test code aligned with the repository-wide typing rule.

Suggested patch
+from collections.abc import Callable
@@
     def test_specific_openai_exception_maps_to_category_prefix(
         self,
-        crud,
-        mock_client,
-        mock_storage,
-        docs_batch,
-        exception_factory,
-        expected_prefix,
-        expected_status,
-        original_message,
-    ):
+        crud: OpenAIVectorStoreCrud,
+        mock_client: MagicMock,
+        mock_storage: MagicMock,
+        docs_batch: list[list[MagicMock]],
+        exception_factory: Callable[[], Exception],
+        expected_prefix: str,
+        expected_status: int,
+        original_message: str,
+    ) -> None:

As per coding guidelines, **/*.py: Always add type hints to all function parameters and return values in Python code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/tests/crud/rag/test_open_ai.py` around lines 281 - 291, Add type
annotations to the test function signature for
test_specific_openai_exception_maps_to_category_prefix: annotate fixture-like
parameters crud, mock_client, mock_storage, docs_batch as Any (from typing),
annotate exception_factory as Callable[..., Exception], annotate expected_prefix
as str, expected_status as int, original_message as str, and annotate the
function return type as None; update the def line for
test_specific_openai_exception_maps_to_category_prefix accordingly and import
Any and Callable from typing if not already imported.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@backend/app/tests/crud/rag/test_open_ai.py`:
- Around line 281-291: Add type annotations to the test function signature for
test_specific_openai_exception_maps_to_category_prefix: annotate fixture-like
parameters crud, mock_client, mock_storage, docs_batch as Any (from typing),
annotate exception_factory as Callable[..., Exception], annotate expected_prefix
as str, expected_status as int, original_message as str, and annotate the
function return type as None; update the def line for
test_specific_openai_exception_maps_to_category_prefix accordingly and import
Any and Callable from typing if not already imported.

In `@backend/app/tests/services/llm/providers/test_openai.py`:
- Around line 280-290: Add proper type hints to the test function signature
test_execute_specific_openai_exceptions_use_category_prefix by annotating each
parameter and the return type (-> None). For example, annotate fixtures like
provider and mock_client as Any, completion_config and query_params as dict,
exception_factory as Callable[..., Exception], expected_prefix as str,
expected_status as int and original_message as str, and add the necessary
imports from typing (Any, Callable) at the top of the test module; keep the
function name unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a1a41b1e-053c-4b6d-93c5-54ae980ed6a9

📥 Commits

Reviewing files that changed from the base of the PR and between 71692bb and 6d99f27.

📒 Files selected for processing (4)
  • backend/app/crud/rag/open_ai.py
  • backend/app/services/llm/providers/oai.py
  • backend/app/tests/crud/rag/test_open_ai.py
  • backend/app/tests/services/llm/providers/test_openai.py

@Ayush8923 Ayush8923 requested a review from Prajna1999 June 5, 2026 05:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants