[Platform][Cache] Cache content-object inputs via CacheKeyGenerator strategy #2160

Open
tacman wants to merge 1 commit into symfony:main from tacman:platform-cacheable-input

@tacman tacman commented Jun 12, 2026

Contributor
| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes
| Docs?         | no
| Issues        | Fix #2159
| License       | MIT

CachePlatform (the ai.platform.cache.* decorator) could only build a cache
key for string, array and MessageBag inputs — it threw
Unsupported input type on anything else. That means it could not decorate any
platform task whose top-level input is a content object, which is the normal
shape for the slow/expensive calls most worth caching:

  • audio transcription — $platform->invoke('whisper-1', Audio::fromFile(...))
  • OCR — $platform->invoke('mistral-ocr-latest', new DocumentUrl(...))
  • single-image vision — $platform->invoke($model, new ImageUrl(...))

Approach

Per @chr-hertel's review (keying inputs is the bridge's job — it shouldn't leak
into the Platform component or its content classes), this keeps the knowledge of
how to key an input inside the Cache bridge via a small strategy
interface:

namespace Symfony\AI\Platform\Bridge\Cache;

interface CacheKeyGenerator
{
    public function supports(object $input): bool;
    public function generate(object $input): string;
}

CachePlatform holds an ordered list of generators and the first one that
supports() an input keys it:

$normalizedInput = match (true) {
    \is_string($input) => md5($input),
    \is_array($input) => json_encode($input),
    default => $this->generateInputCacheKey($input), // tries each CacheKeyGenerator
};
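
The first-match lookup behind the default arm can be sketched roughly as below. This is a minimal illustration, not the PR's actual code: the interface is redeclared locally so the snippet runs on its own, the helper takes the generator list as a parameter rather than reading it from CachePlatform, and the exception class and message are assumptions.

```php
// Local stand-in for the bridge interface quoted above, so the sketch is self-contained.
interface CacheKeyGenerator
{
    public function supports(object $input): bool;
    public function generate(object $input): string;
}

/**
 * Hypothetical free-function version of generateInputCacheKey():
 * walk the generators in order; the first one that supports the input keys it.
 *
 * @param iterable<CacheKeyGenerator> $generators
 */
function generateInputCacheKey(object $input, iterable $generators): string
{
    foreach ($generators as $generator) {
        if ($generator->supports($input)) {
            return $generator->generate($input);
        }
    }

    // No generator claimed the input. The exception type and wording here are assumed.
    throw new \InvalidArgumentException(sprintf('Unsupported input type "%s".', get_debug_type($input)));
}

// Illustrative generator (not part of the PR) that keys any \Stringable object.
$stringableGenerator = new class implements CacheKeyGenerator {
    public function supports(object $input): bool
    {
        return $input instanceof \Stringable;
    }

    public function generate(object $input): string
    {
        return hash('xxh128', (string) $input);
    }
};
```

Because the list is ordered, a more specific generator placed earlier takes precedence over a broader one placed later.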

The default set covers MessageBag, DocumentUrl, ImageUrl and
File (and its Audio/Image/Video/Document subclasses). DocumentUrl
and ImageUrl key on the URL; File keys on a hash of its bytes. Generated keys
are used verbatim in the cache key, so the URL and byte generators hash their
value (xxh128), the same way the existing string arm uses md5().
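
To make the URL and byte keying concrete, here is a hedged sketch that assumes PHP 8.1+ (for `xxh128` and readonly properties). `UrlContent` and `BinaryContent` are hypothetical stand-ins for the real content classes, whose accessors may differ, and the generator class names are likewise illustrative.

```php
// Local stand-in for the bridge interface, so the sketch is self-contained.
interface CacheKeyGenerator
{
    public function supports(object $input): bool;
    public function generate(object $input): string;
}

// Hypothetical stand-ins for URL-based and file-based content objects.
final class UrlContent
{
    public function __construct(public readonly string $url)
    {
    }
}

final class BinaryContent
{
    public function __construct(public readonly string $bytes)
    {
    }
}

final class UrlCacheKeyGenerator implements CacheKeyGenerator
{
    public function supports(object $input): bool
    {
        return $input instanceof UrlContent;
    }

    public function generate(object $input): string
    {
        \assert($input instanceof UrlContent);

        // Hash rather than use the raw URL: characters such as "/" and ":" are reserved in PSR-6 keys.
        return hash('xxh128', $input->url);
    }
}

final class BinaryCacheKeyGenerator implements CacheKeyGenerator
{
    public function supports(object $input): bool
    {
        return $input instanceof BinaryContent;
    }

    public function generate(object $input): string
    {
        \assert($input instanceof BinaryContent);

        // Key on the content itself, so identical bytes share one cache entry.
        return hash('xxh128', $input->bytes);
    }
}
```

Either way the key is a fixed-length hex string, so long URLs and large files do not inflate the cache key.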

MessageBag now routes through the same mechanism
(MessageBagCacheKeyGenerator), so all object inputs go through one
consistent path instead of a special-cased inline arm.

$cached = new CachePlatform($platform, cache: $pool);
$cached->invoke('mistral-ocr-latest', new DocumentUrl('https://…/doc.pdf'), [
    'prompt_cache_key' => 'ocr',
]); // now cached instead of throwing

A userland content type opts in by registering its own generator — no change to
CachePlatform or to core content classes:

new CachePlatform($platform, cache: $pool, cacheKeyGenerators: [
    new MyContentCacheKeyGenerator(),
    new MessageBagCacheKeyGenerator(),
    // …
]);
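
For illustration, a userland generator such as MyContentCacheKeyGenerator might look like the sketch below. `MyContent` and its `id`/`revision` fields are hypothetical, and the interface is redeclared locally so the snippet stands alone (PHP 8.1+ assumed).

```php
// Local stand-in for the bridge interface, so the sketch is self-contained.
interface CacheKeyGenerator
{
    public function supports(object $input): bool;
    public function generate(object $input): string;
}

// Hypothetical userland content type, identified by an id and a revision number.
final class MyContent
{
    public function __construct(
        public readonly int $id,
        public readonly int $revision,
    ) {
    }
}

final class MyContentCacheKeyGenerator implements CacheKeyGenerator
{
    public function supports(object $input): bool
    {
        return $input instanceof MyContent;
    }

    public function generate(object $input): string
    {
        \assert($input instanceof MyContent);

        // Include every field that changes the result; a new revision must not hit the old entry.
        return hash('xxh128', sprintf('my-content:%d:%d', $input->id, $input->revision));
    }
}
```

Since the first supporting generator wins, listing the custom generator ahead of the defaults lets it take precedence for its own type.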

Scope

Fully isolated in the bridge: no changes to the Platform component or its
content classes, so the dependency points bridge → component only. This is
also additive: behavior for string/array/MessageBag inputs is unchanged.

Tests added to CachePlatformTest (content-object cached once; different inputs
not shared; unsupported object still throws; custom generator honored). Surfaced
while adding a Mistral OCR bridge (#2072), but independent of it — the gap
already affects Whisper today.

Supersedes the earlier CacheableInputInterface-in-Platform approach from
this PR's first revision, which leaked the bridge's caching concern into the
core content classes.

@carsonbot carsonbot added the Feature, Platform and Status: Needs Review labels Jun 12, 2026
chr-hertel added a commit that referenced this pull request Jun 21, 2026
…point (tacman, chr-hertel)

This PR was squashed before being merged into the main branch.

Discussion
----------

[Platform][Mistral] Add OCR support via the /v1/ocr endpoint

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes
| Docs?         | yes
| Issues        | Fix #2072
| License       | MIT

Adds OCR support to the Mistral bridge via the dedicated `/v1/ocr` endpoint, as
discussed in #2072.

Mistral's `mistral-ocr-latest` model is not a chat completion — it uses a
separate endpoint and returns structured per-page output, so it can't be routed
through `MessageBag`/chat. Following the existing Whisper bridge pattern, this
adds a dedicated `Ocr` model with its own `ModelClient` and `ResultConverter`:

```php
use Symfony\AI\Platform\Bridge\Mistral\Factory;
use Symfony\AI\Platform\Bridge\Mistral\Ocr\Result\OcrResult;
use Symfony\AI\Platform\Message\Content\DocumentUrl;

$platform = Factory::createPlatform($apiKey);

$result = $platform->invoke('mistral-ocr-latest', new DocumentUrl('https://example.com/document.pdf'));

$ocr = $result->asObject();
\assert($ocr instanceof OcrResult);

echo $ocr->getMarkdown();
```

The result is a typed `OcrResult` (pages with markdown, layout images + bounding
boxes, per-page annotations, usage info) via `ObjectResult`/`asObject()`, not a
text blob. Document URL, binary PDF and image inputs all work — the existing
`Document`, `DocumentUrl` and `ImageUrl` contract normalizers were widened to
also accept the `Ocr` model. Catalog entries (`mistral-ocr-latest`,
`mistral-ocr-2505`) plus a generator rule keep OCR models out of the chat class.

Verified end-to-end against the live API (a 29-page arXiv PDF by URL and a
binary PDF by base64), plus unit tests for the client, converter and catalog.

Also includes a `Document OCR` demo (`demo/`) showing the feature as a chat that
extracts a document's text and answers questions about it.

Docs: `docs/components/platform.rst`, the bridge `README.md`, examples under
`examples/mistral/`, and a `CHANGELOG.md` entry.

Companion PR #2160 (`CacheableInputInterface`) makes the `ai.platform.cache.*`
decorator able to cache content-object inputs like `DocumentUrl`, so OCR results
can be cached at the platform layer — independent of this PR.

Commits
-------

e9dab92 [Platform][Mistral] Add OCR support via the /v1/ocr endpoint

@chr-hertel chr-hertel left a comment

Member

From an architecture point of view I'm not a real fan here, since requirements of the cache bridge are leaking into the main Platform component. Moving the CacheableInputInterface into the Platform namespace makes that less obvious, but semantically it's still leaky. Can we isolate this in the bridge itself?

tacman commented Jun 22, 2026

Contributor Author

Yeah, you're right — keying inputs is the bridge's job, not something core content classes should advertise. The fix I have in mind keeps an interface (so it stays extensible) but moves it into the bridge as a strategy:

namespace Symfony\AI\Platform\Bridge\Cache;

interface CacheKeyGenerator
{
    public function supports(object $input): bool;
    public function generate(object $input): string;
}

CachePlatform holds an ordered list of these (default set covers DocumentUrl/ImageUrl/File), so the dependency points bridge → component, content classes go back to pure, and a userland content type can register its own generator instead of implementing a core interface. CacheableInputInterface and the getCacheKey() methods get dropped.

One thing I want your call on before I push, since it shapes the implementation: should MessageBag route through the same generator mechanism, or stay inline? Today it's keyed inline ($input->getId()). I'd lean toward folding it into a MessageBagCacheKeyGenerator so all object inputs go through one consistent path — otherwise we've got two parallel mechanisms for objects, which is a bit of the same smell. But if you'd rather keep MessageBag special-cased inline and only route the content objects through generators, that's an easy variant. Which do you prefer?

tacman commented Jun 24, 2026

Contributor Author

@chr-hertel thoughts on this? I'd love to get it in before 0.11 -- the mistral calls are relatively expensive, so I'd love to cache them.

@chr-hertel

Member

that pattern looks like a good fit to me 👍

tacman commented Jun 24, 2026

Contributor Author

CacheKeyGenerator, with MessageBag going through it as well? I'll submit a PR shortly.

@tacman tacman force-pushed the platform-cacheable-input branch from d39d468 to 252fc09 on June 24, 2026 15:33
@tacman tacman changed the title from "[Platform][Cache] Cache content-object inputs via CacheableInputInterface" to "[Platform][Cache] Cache content-object inputs via CacheKeyGenerator strategy" Jun 24, 2026
tacman commented Jun 24, 2026

Contributor Author

Pushed the CacheKeyGenerator strategy as discussed — it lives entirely in the Cache bridge now, so there are no changes to the Platform component or its content classes (CacheableInputInterface and the getCacheKey() methods are gone). MessageBag routes through the same mechanism (MessageBagCacheKeyGenerator), so all object inputs share one consistent path, and custom input types opt in by registering their own generator. Default set: MessageBag, DocumentUrl, ImageUrl, File/Audio. CHANGELOG is under 0.11.

@tacman tacman force-pushed the platform-cacheable-input branch from 252fc09 to 0c00151 on June 24, 2026 16:21
…trategy

`CachePlatform` could only build a cache key for `string`, `array` and
`MessageBag` inputs and threw `Unsupported input type` on anything else, so
it could not decorate platform tasks whose top-level input is a content
object — audio transcription (`Audio`), OCR (`DocumentUrl`/`Document`) and
single-image vision (`ImageUrl`/`Image`), which are the slow/expensive calls
most worth caching.

Add a `CacheKeyGenerator` strategy that lives in the Cache bridge, so the
knowledge of how to key an input stays in the bridge instead of leaking into
the Platform component or its content classes. `CachePlatform` holds an
ordered list of generators (default set: `MessageBag`, `DocumentUrl`,
`ImageUrl`, `File`/`Audio`) and the first one that `supports()` an input
keys it. Custom input types opt in by registering their own generator —
no change to `CachePlatform` or to core content classes.

`MessageBag` now routes through the same mechanism, so all object inputs go
through one consistent path.

Fix symfony#2159
@tacman tacman force-pushed the platform-cacheable-input branch from 0c00151 to 0a91836 on June 25, 2026 18:01
tacman commented Jun 25, 2026

Contributor Author

Green now — the last red was a Fabbot nit (exception value needed double-quoting in CachePlatform), fixed. Recap of where this landed vs your review: the caching concern is fully isolated in the bridge via the CacheKeyGenerator strategy you sketched — nothing leaks into the Platform component or its content classes, and MessageBag routes through the same path. I credited you as co-author on the CacheKeyGenerator interface since the shape was your call. Mind giving it another look when you have a sec? 🙏

Development

Successfully merging this pull request may close these issues.

[Platform][Cache] CachePlatform cannot cache content-object inputs (Whisper audio, Mistral OCR)

3 participants