[Platform][Cache] Cache content-object inputs via CacheKeyGenerator strategy #2160
tacman wants to merge 1 commit
…point (tacman, chr-hertel)

This PR was squashed before being merged into the main branch.

Discussion
----------

[Platform][Mistral] Add OCR support via the /v1/ocr endpoint

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes
| Docs?         | yes
| Issues        | Fix #2072
| License       | MIT

Adds OCR support to the Mistral bridge via the dedicated `/v1/ocr` endpoint, as discussed in #2072.

Mistral's `mistral-ocr-latest` model is not a chat completion: it uses a separate endpoint and returns structured per-page output, so it can't be routed through `MessageBag`/chat. Following the existing Whisper bridge pattern, this adds a dedicated `Ocr` model with its own `ModelClient` and `ResultConverter`:

```php
use Symfony\AI\Platform\Bridge\Mistral\Factory;
use Symfony\AI\Platform\Bridge\Mistral\Ocr\Result\OcrResult;
use Symfony\AI\Platform\Message\Content\DocumentUrl;

$platform = Factory::createPlatform($apiKey);
$result = $platform->invoke('mistral-ocr-latest', new DocumentUrl('https://example.com/document.pdf'));

$ocr = $result->asObject();
\assert($ocr instanceof OcrResult);

echo $ocr->getMarkdown();
```

The result is a typed `OcrResult` (pages with markdown, layout images + bounding boxes, per-page annotations, usage info) via `ObjectResult`/`asObject()`, not a text blob. Document URL, binary PDF and image inputs all work: the existing `Document`, `DocumentUrl` and `ImageUrl` contract normalizers were widened to also accept the `Ocr` model. Catalog entries (`mistral-ocr-latest`, `mistral-ocr-2505`) plus a generator rule keep OCR models out of the chat class.

Verified end-to-end against the live API (a 29-page arXiv PDF by URL and a binary PDF by base64), plus unit tests for the client, converter and catalog.

Also includes a `Document OCR` demo (`demo/`) showing the feature as a chat that extracts a document's text and answers questions about it.

Docs: `docs/components/platform.rst`, the bridge `README.md`, examples under `examples/mistral/`, and a `CHANGELOG.md` entry.

Companion PR #2160 (`CacheableInputInterface`) makes the `ai.platform.cache.*` decorator able to cache content-object inputs like `DocumentUrl`, so OCR results can be cached at the platform layer, independent of this PR.

Commits
-------

e9dab92 [Platform][Mistral] Add OCR support via the /v1/ocr endpoint
chr-hertel
left a comment
From an architecture point of view, I'm not a real fan here, since requirements of the cache bridge are leaking into the main Platform component. You made that less obvious by moving the CacheableInputInterface into the Platform namespace, but semantically it's still leaky. Can we isolate this in the bridge itself?
Yeah, you're right: keying inputs is the bridge's job, not something core content classes should advertise. The fix I have in mind keeps an interface (so it stays extensible) but moves it into the bridge as a strategy:

```php
namespace Symfony\AI\Platform\Bridge\Cache;

interface CacheKeyGenerator
{
    public function supports(object $input): bool;

    public function generate(object $input): string;
}
```
One thing I want your call on before I push, since it shapes the implementation: should …
@chr-hertel thoughts on this? I'd love to get it in before 0.11; the Mistral calls are relatively expensive, so I'd love to cache them.
that pattern looks like a good fit to me 👍
CacheKeyGenerator, with MessageBag going through it as well? I'll submit a PR shortly.
d39d468 to 252fc09
Pushed the …
252fc09 to 0c00151
…trategy

`CachePlatform` could only build a cache key for `string`, `array` and `MessageBag` inputs and threw `Unsupported input type` on anything else, so it could not decorate platform tasks whose top-level input is a content object: audio transcription (`Audio`), OCR (`DocumentUrl`/`Document`) and single-image vision (`ImageUrl`/`Image`), which are the slow/expensive calls most worth caching.

Add a `CacheKeyGenerator` strategy that lives in the Cache bridge, so the knowledge of how to key an input stays in the bridge instead of leaking into the Platform component or its content classes. `CachePlatform` holds an ordered list of generators (default set: `MessageBag`, `DocumentUrl`, `ImageUrl`, `File`/`Audio`) and the first one that `supports()` an input keys it. Custom input types opt in by registering their own generator, with no change to `CachePlatform` or to core content classes.

`MessageBag` now routes through the same mechanism, so all object inputs go through one consistent path.

Fix symfony#2159
0c00151 to 0a91836
Green now; the last red was a Fabbot nit (exception value needed double-quoting in …
`CachePlatform` (the `ai.platform.cache.*` decorator) could only build a cache key for `string`, `array` and `MessageBag` inputs; it threw `Unsupported input type` on anything else. That means it could not decorate any platform task whose top-level input is a content object, which is the normal shape for the slow/expensive calls most worth caching:

- `$platform->invoke('whisper-1', Audio::fromFile(...))`
- `$platform->invoke('mistral-ocr-latest', new DocumentUrl(...))`
- `$platform->invoke($model, new ImageUrl(...))`

Approach

Per @chr-hertel's review (keying inputs is the bridge's job; it shouldn't leak into the Platform component or its content classes), this keeps the knowledge of how to key an input inside the Cache bridge via a small strategy interface, `CacheKeyGenerator`.
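As a rough sketch of what that strategy looks like in practice, here is the interface as quoted earlier in this thread, together with one possible URL-based generator. The `DocumentUrl` class below is a local stand-in so the snippet runs on its own; the generator class name, the `getUrl()` accessor and the exception type are assumptions for illustration, not taken from the PR.

```php
// The strategy interface, as quoted earlier in this thread.
interface CacheKeyGenerator
{
    public function supports(object $input): bool;

    public function generate(object $input): string;
}

// Stand-in for the real content class (assumption: it exposes its URL via getUrl()).
final class DocumentUrl
{
    public function __construct(private readonly string $url)
    {
    }

    public function getUrl(): string
    {
        return $this->url;
    }
}

// Hypothetical generator: keys a DocumentUrl on an xxh128 hash of its URL.
final class DocumentUrlCacheKeyGenerator implements CacheKeyGenerator
{
    public function supports(object $input): bool
    {
        return $input instanceof DocumentUrl;
    }

    public function generate(object $input): string
    {
        if (!$input instanceof DocumentUrl) {
            throw new InvalidArgumentException(sprintf('Unsupported input type "%s".', $input::class));
        }

        // Hashing keeps arbitrary URL characters out of the final cache key.
        return hash('xxh128', $input->getUrl());
    }
}
```

The `xxh128` algorithm requires PHP 8.1 or newer, which the `readonly` property already implies.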
`CachePlatform` holds an ordered list of generators, and the first one that `supports()` an input keys it.

The default set covers `MessageBag`, `DocumentUrl`, `ImageUrl` and `File` (and its `Audio`/`Image`/`Video`/`Document` subclasses). `DocumentUrl` and `ImageUrl` key on the URL; `File` keys on a hash of its bytes. `getCacheKey()` values are used verbatim in a cache key, so the URL/byte arms hash their value (`xxh128`) the same way the existing `string` arm uses `md5()`.

`MessageBag` now routes through the same mechanism (`MessageBagCacheKeyGenerator`), so all object inputs go through one consistent path instead of a special-cased inline arm.
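The "first generator that supports the input wins" dispatch can be sketched roughly as follows. `resolveCacheKey()` is a hypothetical free function used only for illustration, not a method of `CachePlatform`, and the exception class is an assumption; only the `Unsupported input type` message comes from this PR.

```php
interface CacheKeyGenerator
{
    public function supports(object $input): bool;

    public function generate(object $input): string;
}

/**
 * Hypothetical helper: returns the key from the first generator that supports the input.
 *
 * @param iterable<CacheKeyGenerator> $generators
 */
function resolveCacheKey(iterable $generators, object $input): string
{
    foreach ($generators as $generator) {
        if ($generator->supports($input)) {
            return $generator->generate($input);
        }
    }

    throw new InvalidArgumentException(sprintf('Unsupported input type "%s".', get_debug_type($input)));
}

// Illustrative generator that only accepts ArrayObject instances.
$specific = new class implements CacheKeyGenerator {
    public function supports(object $input): bool
    {
        return $input instanceof ArrayObject;
    }

    public function generate(object $input): string
    {
        return 'specific';
    }
};

// Illustrative catch-all generator that accepts any object.
$fallback = new class implements CacheKeyGenerator {
    public function supports(object $input): bool
    {
        return true;
    }

    public function generate(object $input): string
    {
        return 'fallback';
    }
};

// Order matters: the specific generator is consulted before the catch-all.
$key = resolveCacheKey([$specific, $fallback], new ArrayObject());
```

Because the list is ordered, a broad generator placed first would shadow every narrower one behind it, so more specific generators belong earlier in the list.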
A userland content type opts in by registering its own generator, with no change to `CachePlatform` or to core content classes.

Scope

Fully isolated in the bridge: no changes to the Platform component or its content classes, so the dependency points bridge → component only. This is also additive: behavior for `string`/`array`/`MessageBag` inputs is unchanged.

Tests added to `CachePlatformTest` (content-object cached once; different inputs not shared; unsupported object still throws; custom generator honored). Surfaced while adding a Mistral OCR bridge (#2072), but independent of it: the gap already affects Whisper today.
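For the userland opt-in described under Approach, a custom generator might look like the sketch below. `ScannedReceipt` and `ScannedReceiptCacheKeyGenerator` are hypothetical names invented for this example, and how the generator is handed to `CachePlatform` is not shown in this description, so registration is left out.

```php
interface CacheKeyGenerator
{
    public function supports(object $input): bool;

    public function generate(object $input): string;
}

// Hypothetical userland content type, not part of Symfony AI.
final class ScannedReceipt
{
    public function __construct(
        public readonly string $bytes,
        public readonly string $locale,
    ) {
    }
}

// Hypothetical generator for that type.
final class ScannedReceiptCacheKeyGenerator implements CacheKeyGenerator
{
    public function supports(object $input): bool
    {
        return $input instanceof ScannedReceipt;
    }

    public function generate(object $input): string
    {
        if (!$input instanceof ScannedReceipt) {
            throw new InvalidArgumentException(sprintf('Unsupported input type "%s".', $input::class));
        }

        // Hash locale and bytes together, separated by a NUL byte, so the
        // same scan requested in another locale gets its own cache entry.
        return hash('xxh128', $input->locale."\0".$input->bytes);
    }
}
```

Everything that changes the platform's answer should feed into the key; here that is the raw bytes plus the locale.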