API: surface third-party errors in /llm/call and /collections#906
API: surface third-party errors in /llm/call and /collections#906Ayush8923 wants to merge 14 commits into
Conversation
📝 WalkthroughWalkthroughUses a shared native-provider suffix constant; enriches OpenAI vector-store upload errors with targeted exception mapping and failed-file summaries; refactors provider-level OpenAI exception handling into explicit category branches with matching tests. ChangesOpenAI Error Handling Enhancement
Sequence Diagram(s)sequenceDiagram
participant Test
participant OpenAIVectorStoreCrud
participant OpenAI as OpenAI Client
participant Storage as File Storage
Test->>OpenAIVectorStoreCrud: update(docs)
OpenAIVectorStoreCrud->>OpenAI: upload_and_poll(files)
alt Upload Success
OpenAI-->>OpenAIVectorStoreCrud: completed == total
OpenAIVectorStoreCrud-->>Test: yield docs
else Upload Partial Failure
OpenAI-->>OpenAIVectorStoreCrud: completed < total
OpenAIVectorStoreCrud->>Storage: list_files(filter='failed', limit=10)
Storage-->>OpenAIVectorStoreCrud: failed file details with messages
OpenAIVectorStoreCrud->>OpenAIVectorStoreCrud: enrich error summary (truncate to 600 chars)
OpenAIVectorStoreCrud-->>Test: raise InterruptedError(enriched message)
else OpenAI Exception
OpenAI-->>OpenAIVectorStoreCrud: raise openai.*Error
OpenAIVectorStoreCrud-->>Test: raise InterruptedError(category-prefixed message)
end
sequenceDiagram
participant Test
participant OpenAIProvider
participant OpenAI as OpenAI Client
Test->>OpenAIProvider: execute(request)
OpenAIProvider->>OpenAI: api_call()
alt RateLimitError
OpenAI-->>OpenAIProvider: RateLimitError
OpenAIProvider-->>Test: None, "OpenAI API rate limit exceeded: ..."
else AuthenticationError
OpenAI-->>OpenAIProvider: AuthenticationError
OpenAIProvider-->>Test: None, "OpenAI authentication failed: ..."
else APIStatusError
OpenAI-->>OpenAIProvider: APIStatusError(status_code=418)
OpenAIProvider-->>Test: None, "OpenAI API status error (418): ..."
else TimeoutError
OpenAI-->>OpenAIProvider: APITimeoutError
OpenAIProvider-->>Test: None, "OpenAI request timed out: ..."
else ConnectionError
OpenAI-->>OpenAIProvider: APIConnectionError
OpenAIProvider-->>Test: None, "OpenAI connection error: ..."
else Generic OpenAIError
OpenAI-->>OpenAIProvider: OpenAIError
OpenAIProvider-->>Test: None, "OpenAI error: ..."
else Unexpected Exception
OpenAI-->>OpenAIProvider: Exception
OpenAIProvider-->>Test: None, "Unexpected error: ..."
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
…kend into feat/api-status-code
Prajna1999
left a comment
There was a problem hiding this comment.
The changes would touch upon
- services/llm/providers/{provider}.py
- similar to doc_transform celery job
- jobs.py (execute_llm, and doc_transform_job, execute_chain_job etc)
- Avoid wrappers/utils/global catch-all error classes.
- Favor simpler if-else or switch-case if error cases are large.
- See how can we surface error codes with the webhook payload, since from Kaapi perspective every success callback (even if contains errors) is considered default 200.
- Check the top 5 payloads for sample error messages and others for success msgs
https://webhook.site/#!/view/5a612527-4251-48dc-9f26-b45438802f41/2793b530-434a-4033-9175-0ddf95f0e9d6/1 - This is the gemini TTS error that comes up
{
"message": "Failed to extract audio from response: 'NoneType' object has no attribute 'parts'",
"success": false
}
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def _classify_openai_error(status_code: int | None, error_code: str | None) -> str: |
There was a problem hiding this comment.
- Avoid wrappers and use explicit in-place error messages for each error code.
| ) | ||
|
|
||
|
|
||
| class LLMCallErrorDetail(SQLModel): |
There was a problem hiding this comment.
- Avoid top level catch-all classes and use if--else in place
| @@ -138,17 +158,31 @@ def execute( | |||
|
|
|||
| except openai.OpenAIError as e: | |||
There was a problem hiding this comment.
- This is ideal. except
openai.OpenAIErroras e: and then use their native errors (my guess is command + clicking insideOpenAIErrorwe can see specific error messages) and use as is if they are already human readable or make them in place
| ) | ||
|
|
||
|
|
||
| def set_provider_error_meta(meta: ProviderErrorMeta) -> None: |
There was a problem hiding this comment.
likewise. No complex wrappers.
| ) | ||
|
|
||
|
|
||
| def _finalize_error_detail( |
| @@ -0,0 +1,37 @@ | |||
| """Shared error-capture plumbing for LLM provider calls. | |||
There was a problem hiding this comment.
Remove this errors.py altogether. Dont use special utils/classes to wrape errors. Write errors in-place.
OpenAPI changes ⚪ No API surface changesNote This PR does not modify the API contract.
|
| except openai.RateLimitError as e: | ||
| raise InterruptedError(f"OpenAI rate limit exceeded: {e.message}") | ||
| except openai.AuthenticationError as e: | ||
| raise InterruptedError(f"OpenAI authentication failed: {e.message}") | ||
| except openai.PermissionDeniedError as e: | ||
| raise InterruptedError(f"OpenAI permission denied: {e.message}") | ||
| except openai.NotFoundError as e: | ||
| raise InterruptedError(f"OpenAI resource not found: {e.message}") |
There was a problem hiding this comment.
@Prajna1999 Added these types of the exception handling instead of wrapper.
There was a problem hiding this comment.
This feels a bit too large for the problem it’s solving. I think using the wrapper function here would be a cleaner and more maintainable approach than handling multiple exception cases individually.
There was a problem hiding this comment.
can remove PermissionDeniedError, ConflictError, APIConnectionError, APIStatusError,. It;s alright if we have so many error classes. Can add one two line to each error message on what to do like OpenAI rate limit exceeded. Try again in 1 minute. If issue persists, contact Kaapi to all messages. It enables the user to take a call on what to do next. Similarly for Authentication error Check your openAI API Key is valid or not.. etc.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
backend/app/tests/crud/rag/test_open_ai.py (1)
15-59: 💤 Low valueAdd type hints to the new fixtures and builder helpers.
The fixtures (
mock_client,crud,mock_storage,docs_batch) and builders (_batch_result,_failed_file_page,_failed_file) are missing return type hints, and_failed_file_page(files, ...)leavesfilesuntyped.As per coding guidelines: "Always add type hints to all function parameters and return values in Python code".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/tests/crud/rag/test_open_ai.py` around lines 15 - 59, Add explicit type hints to all fixtures and helper builders: annotate mock_client() -> MagicMock, crud(mock_client) -> OpenAIVectorStoreCrud, mock_storage() -> MagicMock, docs_batch() -> List[List[MagicMock]] (or list[list[MagicMock]]), _batch_result(...) -> MagicMock, _failed_file_page(files: Iterable[MagicMock], has_more: bool = False) -> MagicMock, and _failed_file(file_id: str, error_message: Optional[str]) -> MagicMock; import needed typing names (List, Iterable, Optional) and ensure return annotations match the MagicMock types used in tests so static checkers and linters accept the fixtures and helpers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@backend/app/tests/crud/rag/test_open_ai.py`:
- Around line 15-59: Add explicit type hints to all fixtures and helper
builders: annotate mock_client() -> MagicMock, crud(mock_client) ->
OpenAIVectorStoreCrud, mock_storage() -> MagicMock, docs_batch() ->
List[List[MagicMock]] (or list[list[MagicMock]]), _batch_result(...) ->
MagicMock, _failed_file_page(files: Iterable[MagicMock], has_more: bool = False)
-> MagicMock, and _failed_file(file_id: str, error_message: Optional[str]) ->
MagicMock; import needed typing names (List, Iterable, Optional) and ensure
return annotations match the MagicMock types used in tests so static checkers
and linters accept the fixtures and helpers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 62a78d49-f0f3-42a1-9923-377d88e2d42b
📒 Files selected for processing (6)
backend/app/crud/model_config.pybackend/app/crud/rag/open_ai.pybackend/app/services/llm/providers/oai.pybackend/app/tests/crud/rag/__init__.pybackend/app/tests/crud/rag/test_open_ai.pybackend/app/tests/services/llm/providers/test_openai.py
| except openai.RateLimitError as e: | ||
| raise InterruptedError(f"OpenAI rate limit exceeded: {e.message}") | ||
| except openai.AuthenticationError as e: | ||
| raise InterruptedError(f"OpenAI authentication failed: {e.message}") | ||
| except openai.PermissionDeniedError as e: | ||
| raise InterruptedError(f"OpenAI permission denied: {e.message}") | ||
| except openai.NotFoundError as e: | ||
| raise InterruptedError(f"OpenAI resource not found: {e.message}") |
There was a problem hiding this comment.
can remove PermissionDeniedError, ConflictError, APIConnectionError, APIStatusError,. It;s alright if we have so many error classes. Can add one two line to each error message on what to do like OpenAI rate limit exceeded. Try again in 1 minute. If issue persists, contact Kaapi to all messages. It enables the user to take a call on what to do next. Similarly for Authentication error Check your openAI API Key is valid or not.. etc.
| except openai.RateLimitError as e: | ||
| error_message = f"OpenAI rate limit exceeded: {e.message}" | ||
| logger.warning( | ||
| f"[OpenAIProvider.execute] {error_message} | " |
There was a problem hiding this comment.
Same as above. One line explaining what to do next.
There was a problem hiding this comment.
| ) | ||
| return None, error_message | ||
|
|
||
| except openai.ConflictError as e: |
| files=files, | ||
| ) | ||
| except openai.RateLimitError as e: | ||
| raise InterruptedError(f"OpenAI rate limit exceeded: {e.message}") |
There was a problem hiding this comment.
Should also add error code
There was a problem hiding this comment.
🧹 Nitpick comments (2)
backend/app/tests/services/llm/providers/test_openai.py (1)
280-290: ⚡ Quick winType-annotate the updated parametrized provider test signature.
The modified method uses new arguments but still omits parameter and return type hints.
Suggested patch
+from collections.abc import Callable @@ def test_execute_specific_openai_exceptions_use_category_prefix( self, - provider, - mock_client, - completion_config, - query_params, - exception_factory, - expected_prefix, - expected_status, - original_message, - ): + provider: OpenAIProvider, + mock_client: MagicMock, + completion_config: NativeCompletionConfig, + query_params: QueryParams, + exception_factory: Callable[[], Exception], + expected_prefix: str, + expected_status: int, + original_message: str, + ) -> None:As per coding guidelines,
**/*.py: Always add type hints to all function parameters and return values in Python code.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/tests/services/llm/providers/test_openai.py` around lines 280 - 290, Add proper type hints to the test function signature test_execute_specific_openai_exceptions_use_category_prefix by annotating each parameter and the return type (-> None). For example, annotate fixtures like provider and mock_client as Any, completion_config and query_params as dict, exception_factory as Callable[..., Exception], expected_prefix as str, expected_status as int and original_message as str, and add the necessary imports from typing (Any, Callable) at the top of the test module; keep the function name unchanged.backend/app/tests/crud/rag/test_open_ai.py (1)
281-291: ⚡ Quick winAdd type hints to the updated parametrized test method.
The modified method signature is still untyped. Please annotate parameters and return type to keep test code aligned with the repository-wide typing rule.
Suggested patch
+from collections.abc import Callable @@ def test_specific_openai_exception_maps_to_category_prefix( self, - crud, - mock_client, - mock_storage, - docs_batch, - exception_factory, - expected_prefix, - expected_status, - original_message, - ): + crud: OpenAIVectorStoreCrud, + mock_client: MagicMock, + mock_storage: MagicMock, + docs_batch: list[list[MagicMock]], + exception_factory: Callable[[], Exception], + expected_prefix: str, + expected_status: int, + original_message: str, + ) -> None:As per coding guidelines,
**/*.py: Always add type hints to all function parameters and return values in Python code.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/tests/crud/rag/test_open_ai.py` around lines 281 - 291, Add type annotations to the test function signature for test_specific_openai_exception_maps_to_category_prefix: annotate fixture-like parameters crud, mock_client, mock_storage, docs_batch as Any (from typing), annotate exception_factory as Callable[..., Exception], annotate expected_prefix as str, expected_status as int, original_message as str, and annotate the function return type as None; update the def line for test_specific_openai_exception_maps_to_category_prefix accordingly and import Any and Callable from typing if not already imported.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@backend/app/tests/crud/rag/test_open_ai.py`:
- Around line 281-291: Add type annotations to the test function signature for
test_specific_openai_exception_maps_to_category_prefix: annotate fixture-like
parameters crud, mock_client, mock_storage, docs_batch as Any (from typing),
annotate exception_factory as Callable[..., Exception], annotate expected_prefix
as str, expected_status as int, original_message as str, and annotate the
function return type as None; update the def line for
test_specific_openai_exception_maps_to_category_prefix accordingly and import
Any and Callable from typing if not already imported.
In `@backend/app/tests/services/llm/providers/test_openai.py`:
- Around line 280-290: Add proper type hints to the test function signature
test_execute_specific_openai_exceptions_use_category_prefix by annotating each
parameter and the return type (-> None). For example, annotate fixtures like
provider and mock_client as Any, completion_config and query_params as dict,
exception_factory as Callable[..., Exception], expected_prefix as str,
expected_status as int and original_message as str, and add the necessary
imports from typing (Any, Callable) at the top of the test module; keep the
function name unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a1a41b1e-053c-4b6d-93c5-54ae980ed6a9
📒 Files selected for processing (4)
backend/app/crud/rag/open_ai.pybackend/app/services/llm/providers/oai.pybackend/app/tests/crud/rag/test_open_ai.pybackend/app/tests/services/llm/providers/test_openai.py
Issue: #874
Summary
In this PR, Improves error visibility for two endpoints that proxy third-party providers so clients(who using these api endpoints) can branch on what actually went wrong instead of parsing a generic message string.
1.
/api/v1/collectionsWhen OpenAI vector-store batch upload partially fails, the error string now includes per-file reasons pulled from the batch's failed-file list.
OpenAI document processing error: 0/3 files completedOpenAI document processing error: 0/3 files completed. Failed files: file-abc (Unsupported file type), file-xyz (File too large)Capped at 10 files / 600 chars to keep callback payloads sane. Falls back to the old count-only message if the follow-up
list_fileslookup itself errors, so no regression on transient OpenAI failures.2.
/api/v1/llm/callThe failure callback's
datafield is now populated with a structuredLLMCallErrorDetailpayload instead ofnull. The top-levelerrorstring is unchanged, so existing clients keep working.conversation_id,provider,provider_status_code,error_type,messageconversation_idis echoed from the request, so clients chaining turns bythread_idno longer have their conversation break on every errorrate_limit | authentication | timeout | invalid_request | provider_error, withprovider_status_codefromOpenAIError.status_codeOther providers (Anthropic, Google, Sarvam, ElevenLabs) still flow through the new path with
conversation_id+providerpopulated buterror_type="provider_error"andprovider_status_code=null— same 5-line treatment can be added in a follow-up commit per provider.