feat(js/testing): mockModel + echoModel for genkit/testing #5475
cabljac wants to merge 7 commits
Promote Genkit's internal model-mocking helpers into the public
`genkit/testing` surface so app developers can unit-test flows,
prompts, tools, and chat deterministically — without a live model,
network, or API key.
- mockModel(ai, { respond }): a programmable mock model that drives each
call's response (text, structured, tool requests, streamed chunks) and
records calls with typed inspection (lastRequest / requests /
requestCount), replacing the untyped `(model as any).__test__*` hack.
- echoModel(ai): a zero-config model that echoes the rendered request as
text, for asserting prompt/message assembly.
- Re-exported from `@genkit-ai/ai/testing` and `genkit/testing`.
- Tests at js/genkit/tests/mock-model_test.ts.
- Sample app js/testapps/testing-sample demonstrating flow, tool,
streaming, and prompt-assembly tests.
Implements the JS proposal in docs/js-testing-module-rfc.md (RFC #20).
autoModel (zero-config schema-aware tier) is left for a follow-up per
RFC open question 5.
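The pattern described above (a programmable response plus typed, recorded requests) can be sketched without Genkit at all. The following is a minimal stand-in, not the `mockModel` implementation: `createMockModel`, `ModelRequest`, and `greet` are hypothetical names, the request shape is simplified, and calls are synchronous for brevity, whereas real model calls are asynchronous.

```typescript
// Hypothetical, simplified stand-in for the mock-model pattern.
// None of these names are the Genkit API; calls are synchronous for brevity.
interface Message {
  role: 'system' | 'user' | 'model';
  text: string;
}
interface ModelRequest {
  messages: Message[];
}
type Respond = (request: ModelRequest) => string;

function createMockModel(respond: Respond) {
  const requests: ModelRequest[] = [];
  return {
    generate(request: ModelRequest): string {
      // Snapshot first so later mutation by the caller cannot rewrite history.
      requests.push(structuredClone(request));
      return respond(request);
    },
    get requests(): readonly ModelRequest[] {
      return requests;
    },
    get requestCount(): number {
      return requests.length;
    },
    get lastRequest(): ModelRequest | undefined {
      return requests[requests.length - 1];
    },
  };
}

// App logic under test: it builds the prompt and post-processes the reply.
function greet(
  model: { generate(request: ModelRequest): string },
  name: string
): string {
  const reply = model.generate({
    messages: [
      { role: 'system', text: 'Be brief.' },
      { role: 'user', text: `Say hello to ${name}.` },
    ],
  });
  return reply.trim().toUpperCase();
}

const mock = createMockModel(() => '  hello, Ada  ');
const greeting = greet(mock, 'Ada');
```

A test can then assert on both sides of the call: `greeting` pins down the app's post-processing, while `mock.lastRequest` and `mock.requestCount` pin down what the app sent, with no cast to `any`.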
…view, fix sample scripts
- echoModel: render non-text parts (media, tool requests/responses, reasoning, data) as labelled placeholders instead of silently dropping them, so the echo reflects the full request the model would have seen.
- mockModel: snapshot requests with structuredClone instead of JSON.parse(JSON.stringify(...)); this preserves more types and avoids silently stripping undefined fields.
- mockModel: forward only the ModelInfo fields defineModel accepts (versions/label/supports) instead of spreading via `as any`.
- testing-sample: expose a createMenuApp() factory and build a fresh, isolated app per test so mocks no longer re-register the default model name on a shared registry (removes registry-overwrite log spam).
- bump copyright headers on new files to 2026.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
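The difference between the two snapshot strategies can be seen on a small object. This sketch uses only built-in JavaScript behaviour; the object and its field names are made up for illustration and are not a Genkit request.

```typescript
// Illustrative object; the field names are assumptions, not a Genkit request shape.
const original = {
  createdAt: new Date(0),
  temperature: undefined as number | undefined,
  tags: new Set(['a', 'b']),
};

const viaJson = JSON.parse(JSON.stringify(original));
const viaClone = structuredClone(original);

// JSON round trip: the Date becomes an ISO string, the undefined field is
// dropped, and the Set collapses to an empty object.
const jsonKeepsDate = viaJson.createdAt instanceof Date; // false
const jsonKeepsUndefinedField = 'temperature' in viaJson; // false
const jsonKeepsSet = viaJson.tags instanceof Set; // false

// structuredClone: all three survive as their original types.
const cloneKeepsDate = viaClone.createdAt instanceof Date; // true
const cloneKeepsUndefinedField = 'temperature' in viaClone; // true
const cloneKeepsSet = viaClone.tags instanceof Set && viaClone.tags.has('b'); // true
```

The trade-off is that `structuredClone` throws on values it cannot clone, such as functions, where the JSON round trip would silently omit them.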
Fresh-context review caught that renderPart still dropped two Part variants (ResourcePart, CustomPart) to empty string, the same silent-drop class the prior commit fixed for media/tool/reasoning/data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous sample exercised mostly framework behavior because the flow was a passthrough. Reworked it around an app with logic worth testing:
- recommendPrompt: a dotprompt (Handlebars template + system + tool), asserted via echoModel (system + rendered template).
- recommendDish: a flow that requests STRUCTURED output, validates it, and derives `withinBudget` itself; tests pin down that derivation. The same model output with a lower budget flips the result, proving the test exercises app logic, not the model. Adds an error-path test.
- streamRecommendation: a streaming flow driven via flow.stream(), so the streaming test goes through the flow instead of bypassing it.

Structured output schema is supplied at the flow's call site (not baked into the prompt) so the prompt stays text-renderable for echoModel, which a strict output schema would reject. README updated. node:test 6/6, vitest 5/5, tsc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
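The "same model output, different budget" idea can be illustrated with a small stand-in. This is a hypothetical sketch, not the sample app's code: `recommendDish` here is a plain function rather than a Genkit flow, and the `Recommendation` shape, the prompt string, and the validation are assumptions made for the example.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the sample's code.
interface Recommendation {
  dish: string;
  price: number;
}
type StructuredModel = (prompt: string) => unknown;

function recommendDish(model: StructuredModel, budget: number) {
  const raw = model(`Recommend a dish for a budget of ${budget}.`);
  // Validate the structured output before using it.
  if (
    typeof raw !== 'object' ||
    raw === null ||
    typeof (raw as Recommendation).dish !== 'string' ||
    typeof (raw as Recommendation).price !== 'number'
  ) {
    throw new Error('model returned an invalid recommendation');
  }
  const { dish, price } = raw as Recommendation;
  // Derived by the app itself, never taken from the model output.
  return { dish, price, withinBudget: price <= budget };
}

// The mocked model output is fixed; only the budget changes between calls.
const fixedModel: StructuredModel = () => ({ dish: 'Ramen', price: 14 });
const generous = recommendDish(fixedModel, 20); // withinBudget: true
const tight = recommendDish(fixedModel, 10); // withinBudget: false

// Error path: output that fails validation is rejected.
let errorMessage = '';
try {
  recommendDish(() => ({ dish: 'Ramen' }), 10);
} catch (e) {
  errorMessage = (e as Error).message;
}
```

Because the model output is held constant, any change in `withinBudget` can only come from the app's own derivation, which is what the test is meant to pin down.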
…stText

echoModel returns text, which can't satisfy a structured output schema: Genkit derives `output` by parsing the response text and validating it. Previously that surfaced as a cryptic ajv "must have required property" error. Now:
- echoModel declares native constrained support, so the framework hands the schema to the model in `request.output.schema` (instead of injecting it as prompt text). echoModel detects it and throws an explanatory error pointing to the right pattern.
- Add `model.lastRequestText` to every mock: the full assembled request (system + all messages) flattened to a string. This gives echo-style prompt-assembly assertions on ANY mock, including structured-output paths where echoModel can't be used, as pure request inspection, decoupled from the response shape.
- Share the request-flattening logic (renderRequestText) between echoModel and lastRequestText; reframe echoModel's jsdoc as a text-path preset.

Tests: echoModel throw-under-schema, lastRequestText on a structured path. Sample gains a lastRequestText assertion on the structured flow. README updated. genkit 9/9, sample node 7/7 + vitest 5/5, tsc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
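The split between "echo as a text-path preset" and "request text as pure inspection" might look roughly like the sketch below. It is a simplified stand-in under stated assumptions: `Req`, `flatten`, `echo`, and `createStructuredMock` are hypothetical names, parts carry text only, the error wording is invented, and messages are joined with a newline, which is a choice made here rather than something this commit specifies.

```typescript
// Hypothetical, text-only request shape; not the Genkit GenerateRequest type.
interface Part {
  text?: string;
}
interface Msg {
  role: 'system' | 'user' | 'model' | 'tool';
  content: Part[];
}
interface Req {
  messages: Msg[];
  output?: { schema?: unknown };
}

// Shared flattening logic used by both the echo and the inspection helper.
function flatten(request: Req): string {
  return request.messages
    .map(
      (m) =>
        (m.role === 'user' || m.role === 'model' ? '' : `${m.role}: `) +
        m.content.map((p) => p.text ?? '').join('')
    )
    .join('\n'); // separator is an assumption made for this sketch
}

// Echo: text-only, so it refuses requests that carry an output schema.
function echo(request: Req): string {
  if (request.output?.schema !== undefined) {
    throw new Error(
      'echo returns plain text and cannot satisfy an output schema; ' +
        'use a programmable mock and assert on the flattened request instead'
    );
  }
  return flatten(request);
}

// Programmable mock: returns structured data, yet still exposes request text.
function createStructuredMock<T>(output: T) {
  let lastRequestText: string | undefined;
  return {
    generate(request: Req): T {
      lastRequestText = flatten(request);
      return output;
    },
    get lastRequestText(): string | undefined {
      return lastRequestText;
    },
  };
}

const structuredRequest: Req = {
  messages: [
    { role: 'system', content: [{ text: 'You are a waiter.' }] },
    { role: 'user', content: [{ text: 'Budget: 20' }] },
  ],
  output: { schema: { type: 'object' } },
};

const structuredMock = createStructuredMock({ dish: 'Ramen', price: 14 });
const structuredOutput = structuredMock.generate(structuredRequest);
```

Here the prompt-assembly assertion reads `structuredMock.lastRequestText`, so it works on the structured path where calling `echo(structuredRequest)` would throw.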
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Code Review
This pull request introduces deterministic testing utilities for Genkit, specifically mockModel and echoModel, along with a sample application demonstrating their usage with both node:test and vitest. The review feedback highlights several key robustness and usability improvements in mockModel: handling potential DataCloneErrors when cloning requests with non-serializable properties, joining multi-turn messages with newlines in renderRequestText for better readability, and adding defensive checks in toResponseData to prevent runtime errors when a mock response is falsy.
async (request, { sendChunk }) => {
  // Snapshot so later mutation of the request can't alter recorded history.
  requests.push(structuredClone(request));
Using structuredClone directly on the request object can throw a DataCloneError if the request contains non-serializable properties. In Genkit, the context or config objects often carry complex, non-serializable values (such as database connections, Express request/response objects, or custom class instances).
To make this robust, we should wrap the cloning in a helper that falls back to a safe shallow/semi-deep clone of the messages and parts if structuredClone fails.
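The failure mode and the proposed fallback can be reproduced in isolation. This is a sketch under assumptions: `Request` is a simplified stand-in for `GenerateRequest`, `snapshot` is a hypothetical helper name, and the `onToken` callback in `config` is invented to provide a non-cloneable value.

```typescript
// Simplified stand-in for GenerateRequest; field names are assumptions.
interface Part {
  text?: string;
}
interface Message {
  role: string;
  content: Part[];
}
interface Request {
  messages: Message[];
  config?: Record<string, unknown>;
}

function snapshot(request: Request): Request {
  try {
    return structuredClone(request);
  } catch {
    // Fallback: copy down to the part level and share the rest by reference.
    return {
      ...request,
      messages: request.messages.map((m) => ({
        ...m,
        content: m.content.map((c) => ({ ...c })),
      })),
    };
  }
}

const request: Request = {
  messages: [{ role: 'user', content: [{ text: 'hi' }] }],
  config: { onToken: () => {} }, // functions cannot be structured-cloned
};

// Calling structuredClone directly fails on the function value.
let cloneErrorName = '';
try {
  structuredClone(request);
} catch (e) {
  cloneErrorName = (e as Error).name; // 'DataCloneError'
}

// The helper still records a snapshot that later mutation cannot alter.
const recorded = snapshot(request);
request.messages[0].content[0].text = 'mutated later';
const recordedText = recorded.messages[0].content[0].text; // still 'hi'
```

Two caveats apply. The fallback shares `config` and any nested values inside a part by reference, so it protects message and part fields only. Also, functions, among other values, trigger the error, whereas a class instance holding only cloneable fields does not throw: `structuredClone` copies it as a plain object and drops its prototype.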
async (request, { sendChunk }) => {
  // Snapshot so later mutation of the request can't alter recorded history.
  let cloned: GenerateRequest;
  try {
    cloned = structuredClone(request);
  } catch {
    cloned = {
      ...request,
      messages: request.messages.map((m) => ({
        ...m,
        content: m.content.map((c) => ({ ...c })),
      })),
    };
  }
  requests.push(cloned);

function renderRequestText(request: GenerateRequest): string {
  return request.messages
    .map(
      (m) =>
        (m.role === 'user' || m.role === 'model' ? '' : `${m.role}: `) +
        m.content.map(renderPart).join('')
    )
    .join('');
}
Currently, renderRequestText joins all messages with an empty string (.join('')). This causes multi-turn conversations to run together (e.g., user: hello and model: hi becomes hellohi), making assertions and debugging output hard to read.
Joining the messages with a newline (\n) instead makes the flattened request text much more readable and easier to assert on.
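The effect of the separator is easy to see on a three-message conversation. In this sketch `render` is a hypothetical, text-only version of `renderRequestText` that takes the separator as a parameter so both behaviours can be compared; it does not handle non-text parts.

```typescript
// Text-only stand-ins for the message types; not the Genkit definitions.
interface Part {
  text?: string;
}
interface Message {
  role: 'system' | 'user' | 'model' | 'tool';
  content: Part[];
}

// Same role-prefix rule as the code under review, with a configurable separator.
function render(messages: Message[], separator: string): string {
  return messages
    .map(
      (m) =>
        (m.role === 'user' || m.role === 'model' ? '' : `${m.role}: `) +
        m.content.map((p) => p.text ?? '').join('')
    )
    .join(separator);
}

const conversation: Message[] = [
  { role: 'system', content: [{ text: 'Be brief.' }] },
  { role: 'user', content: [{ text: 'hello' }] },
  { role: 'model', content: [{ text: 'hi' }] },
];

const runTogether = render(conversation, ''); // 'system: Be brief.hellohi'
const lineSeparated = render(conversation, '\n'); // 'system: Be brief.\nhello\nhi'
```

With the newline separator, a test can split on `'\n'` and assert on individual turns instead of matching substrings in one run-together string.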
function renderRequestText(request: GenerateRequest): string {
  return request.messages
    .map(
      (m) =>
        (m.role === 'user' || m.role === 'model' ? '' : `${m.role}: `) +
        m.content.map(renderPart).join('')
    )
    .join('\n');
}

function toResponseData(response: MockResponse): GenerateResponseData {
  if (typeof response === 'string') {
    return {
      message: { role: 'model', content: [{ text: response }] },
      finishReason: 'stop',
    };
  }
  if ('message' in response && response.message) {
    return { finishReason: 'stop', ...(response as GenerateResponseData) };
  }
If the user-defined respond function returns undefined or null (for example, if they only stream chunks or have a void callback), toResponseData will throw a TypeError: Cannot use 'in' operator to search for 'message' in undefined.
We should add a defensive check to handle falsy responses gracefully by returning an empty model response.
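Both the crash and a guarded variant can be shown on simplified types. This sketch makes some assumptions: `ResponseData` and `MockResponse` are reduced stand-ins for the Genkit types, the final `throw` is invented because the rest of the real function is not shown here, and the guard checks for `null`/`undefined` explicitly rather than using `!response`.

```typescript
// Reduced stand-ins for GenerateResponseData and MockResponse (assumptions).
interface ModelMessage {
  role: 'model';
  content: { text: string }[];
}
interface ResponseData {
  message: ModelMessage;
  finishReason: string;
}
type MockResponse =
  | string
  | { message?: ModelMessage; finishReason?: string }
  | null
  | undefined;

function toResponseData(response: MockResponse): ResponseData {
  // Explicit nullish check: an empty string is still a valid text response.
  if (response === null || response === undefined) {
    return { message: { role: 'model', content: [] }, finishReason: 'stop' };
  }
  if (typeof response === 'string') {
    return {
      message: { role: 'model', content: [{ text: response }] },
      finishReason: 'stop',
    };
  }
  if (response.message) {
    return {
      message: response.message,
      finishReason: response.finishReason ?? 'stop',
    };
  }
  // Placeholder for the remaining cases, which are outside this sketch.
  throw new Error('unsupported mock response shape');
}

// What the unguarded `'message' in response` check does for a void callback.
let unguardedErrorName = '';
try {
  void ('message' in (undefined as unknown as object));
} catch (e) {
  unguardedErrorName = (e as Error).name; // 'TypeError'
}

const fromVoid = toResponseData(undefined); // empty content
const fromEmptyString = toResponseData(''); // one empty text part
```

One design point worth weighing: a plain `!response` guard also matches the empty string, so `respond: () => ''` would produce an empty `content` array instead of a single empty text part. Whether that distinction matters depends on how callers assert on the response.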
function toResponseData(response: MockResponse): GenerateResponseData {
  if (!response) {
    return {
      message: { role: 'model', content: [] },
      finishReason: 'stop',
    };
  }
  if (typeof response === 'string') {
    return {
      message: { role: 'model', content: [{ text: response }] },
      finishReason: 'stop',
    };
  }
  if (typeof response === 'object' && 'message' in response && response.message) {
    return { finishReason: 'stop', ...(response as GenerateResponseData) };
  }
Implementation of the JS testing-module RFC (#5438).
What
Promotes Genkit's internal model-mocking helpers into the public `genkit/testing` surface so app developers can unit-test flows, prompts, tools, and chat deterministically, with no live model, network, or API key.
- `mockModel(ai, { respond })` - programmable mock. `respond(req, { sendChunk })` drives each call's response, including text, structured output, tool requests, or a stream. Typed inspection includes `model.lastMessage`, `model.lastRequest`, `model.requests`, and `model.requestCount`.
- `echoModel(ai)` - zero-config model that echoes the rendered request as text, for asserting prompt/message assembly.
- Re-exported from `@genkit-ai/ai/testing` and `genkit/testing`, alongside the existing `testModels`.

Inspection reads in Genkit idiom: `model.lastMessage!.text` instead of reaching into raw request message content.
Files
- `js/ai/src/testing/mock-model.ts`
- `js/ai/src/testing/index.ts`, `js/genkit/src/testing.ts`
- `js/genkit/tests/mock-model_test.ts`
- `js/testapps/testing-sample/`: `recommendDish` flow and `dailySpecial` tool
- `js/testapps/testing-sample/tests/menu_test.ts`: `node:test` coverage
- `js/testapps/testing-sample/tests/menu.vitest.test.ts`
Why a dedicated fake model, not middleware
The RFC alternatives section weighs model middleware as the closest mechanism. This PR implements `mockModel` and `echoModel` as real `defineModel` actions using apiVersion v2, mirroring the existing internal `defineProgrammableModel` approach.
- `Function.length` streaming behavior in model middleware dispatch.
Scope notes
`autoModel` is not included; it remains an RFC open question for follow-up.
Refs #5438.