
feat(js/testing): mockModel + echoModel for genkit/testing#5475

Draft
cabljac wants to merge 7 commits into main from feat/js-testing-module

@cabljac cabljac commented Jun 5, 2026

Implementation of the JS testing-module RFC (#5438).

What

Promotes Genkit internal model-mocking helpers into the public genkit/testing surface so app developers can unit-test flows, prompts, tools, and chat deterministically, with no live model, network, or API key.

  • mockModel(ai, { respond }) - programmable mock. respond(req, { sendChunk }) drives each call's response: text, structured output, tool requests, or a stream. Typed inspection is available through model.lastMessage, model.lastRequest, model.requests, and model.requestCount.
  • echoModel(ai) - zero-config model that echoes the rendered request as text, for asserting prompt/message assembly.
  • Re-exported from @genkit-ai/ai/testing and genkit/testing, alongside the existing testModels.

Inspection reads in Genkit idiom: model.lastMessage!.text instead of reaching into raw request message content.
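
To make the shape of this concrete, here is a minimal, self-contained sketch of the record-and-respond pattern described above. It does not import Genkit: `Req`, `Res`, `Respond`, and `makeRecordingMock` are hypothetical stand-ins rather than the PR's implementation, and only the inspection names (`requests`, `requestCount`, `lastRequest`) mirror the description. It is kept synchronous for brevity, whereas a real model action would be async.

```typescript
// Hypothetical stand-in types; Genkit's real request/response types are richer.
type Part = { text: string };
type Message = { role: 'system' | 'user' | 'model' | 'tool'; content: Part[] };
type Req = { messages: Message[] };
type Res = { message: Message; finishReason: 'stop' };
type Respond = (req: Req, opts: { sendChunk: (text: string) => void }) => string;

function makeRecordingMock(respond: Respond) {
  const requests: Req[] = [];
  const chunks: string[] = [];
  return {
    // Invoke the mock roughly the way a framework would invoke a model action.
    call(req: Req): Res {
      // Snapshot so later mutation of the request can't alter recorded history.
      requests.push(structuredClone(req));
      const text = respond(req, { sendChunk: (c) => chunks.push(c) });
      return { message: { role: 'model', content: [{ text }] }, finishReason: 'stop' };
    },
    get requests(): readonly Req[] { return requests; },
    get requestCount(): number { return requests.length; },
    get lastRequest(): Req | undefined { return requests[requests.length - 1]; },
    get chunks(): readonly string[] { return chunks; },
  };
}

// Usage: drive one call, then inspect what the "model" saw.
const model = makeRecordingMock((_req, { sendChunk }) => {
  sendChunk('Hel');
  return 'Hello';
});
const request: Req = { messages: [{ role: 'user', content: [{ text: 'hi' }] }] };
const response = model.call(request);
request.messages[0].content[0].text = 'mutated after the call';
```

Because the request is snapshotted on entry, `model.lastRequest` still holds the original `'hi'` text even though the caller mutated `request` afterwards.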

Files

| File | What |
| --- | --- |
| js/ai/src/testing/mock-model.ts | implementation + types |
| js/ai/src/testing/index.ts, js/genkit/src/testing.ts | re-exports |
| js/genkit/tests/mock-model_test.ts | unit tests for text, streaming, tool round-trip, inspection, and echo |
| js/testapps/testing-sample/ | sample app with a recommendDish flow and dailySpecial tool |
| js/testapps/testing-sample/tests/menu_test.ts | Node node:test coverage |
| js/testapps/testing-sample/tests/menu.vitest.test.ts | vitest coverage |

Why a dedicated fake model, not middleware

The RFC alternatives section weighs model middleware as the closest mechanism. This PR implements mockModel and echoModel as real defineModel actions using apiVersion v2, mirroring the existing internal defineProgrammableModel approach.

  • Avoids Function.length streaming behavior in model middleware dispatch.
  • Keeps typed inspection on the returned model action.
  • Promotes existing fake-model logic rather than rebuilding it on a different substrate.
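
For context on the first bullet, the snippet below is a generic illustration of why arity-based dispatch is fragile. It is not Genkit's dispatch code: `looksStreaming` and its "three or more parameters" rule are assumptions made up for this sketch. The language-level fact it relies on is that `Function.length` counts only the parameters before the first default value or rest parameter.

```typescript
// Hypothetical arity check of the kind an arity-based dispatcher might perform.
// Assumption for this sketch: 3+ declared parameters means "streaming-aware".
function looksStreaming(fn: (...args: any[]) => unknown): boolean {
  return fn.length >= 3;
}

const plain = (req: unknown, next: unknown) => next;
const streaming = (req: unknown, onChunk: unknown, next: unknown) => next;
// A default value stops the count at the parameter before it.
const withDefault = (req: unknown, onChunk: unknown = undefined, next?: unknown) => next;
// Rest parameters are not counted at all.
const withRest = (...args: unknown[]) => args[2];

// Declared arities, in order: 2, 3, 1, 0.
const arities = [plain, streaming, withDefault, withRest].map((f) => f.length);
```

Under this rule `withDefault` and `withRest` would be misclassified as non-streaming even though both can receive three arguments, which is the kind of surprise a dedicated model action sidesteps.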

Scope notes

  • autoModel is not included; it remains an RFC open question for follow-up.
  • The global no-real-model-calls guard is not included in this PR.

Refs #5438.

cabljac and others added 7 commits June 2, 2026 13:42
Promote Genkit's internal model-mocking helpers into the public
`genkit/testing` surface so app developers can unit-test flows,
prompts, tools, and chat deterministically — without a live model,
network, or API key.

- mockModel(ai, { respond }): a programmable mock model that drives each
  call's response (text, structured, tool requests, streamed chunks) and
  records calls with typed inspection (lastRequest / requests /
  requestCount), replacing the untyped `(model as any).__test__*` hack.
- echoModel(ai): a zero-config model that echoes the rendered request as
  text, for asserting prompt/message assembly.
- Re-exported from `@genkit-ai/ai/testing` and `genkit/testing`.
- Tests at js/genkit/tests/mock-model_test.ts.
- Sample app js/testapps/testing-sample demonstrating flow, tool,
  streaming, and prompt-assembly tests.

Implements the JS proposal in docs/js-testing-module-rfc.md (RFC #20).
autoModel (zero-config schema-aware tier) is left for a follow-up per
RFC open question 5.
- echoModel: render non-text parts (media, tool requests/responses,
  reasoning, data) as labelled placeholders instead of silently dropping
  them, so the echo reflects the full request the model would have seen.
- mockModel: snapshot requests with structuredClone instead of
  JSON.parse(JSON.stringify(...)) — preserves more types and avoids
  silently stripping undefined fields.
- mockModel: forward only the ModelInfo fields defineModel accepts
  (versions/label/supports) instead of spreading via `as any`.
- testing-sample: expose a createMenuApp() factory and build a fresh,
  isolated app per test so mocks no longer re-register the default model
  name on a shared registry (removes registry-overwrite log spam).
- bump copyright headers on new files to 2026.
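
The structuredClone bullet above can be checked with a small standalone comparison. The object below is a hypothetical request-like value (its field names are illustrative, not Genkit's), used only to show what each cloning strategy keeps.

```typescript
// Hypothetical request-like value; field names are illustrative only.
const original = {
  prompt: 'hi',
  temperature: undefined as number | undefined,
  sentAt: new Date(0),
  tags: new Set(['a']),
};

const viaJson = JSON.parse(JSON.stringify(original));
const viaClone = structuredClone(original);

// JSON round-trip: the undefined field disappears and the Date becomes a string.
const jsonKeptTemperature = 'temperature' in viaJson;     // false
const jsonDateIsDate = viaJson.sentAt instanceof Date;    // false

// structuredClone: the key survives (value undefined); Date and Set keep their types.
const cloneKeptTemperature = 'temperature' in viaClone;   // true
const cloneDateIsDate = viaClone.sentAt instanceof Date;  // true
const cloneSetIsSet = viaClone.tags instanceof Set;       // true
```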

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fresh-context review caught that renderPart still dropped two Part
variants (ResourcePart, CustomPart) to empty string — the same
silent-drop class the prior commit fixed for media/tool/reasoning/data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous sample exercised mostly framework behavior because the flow
was a passthrough. Reworked it around an app with logic worth testing:

- recommendPrompt: a dotprompt (Handlebars template + system + tool),
  asserted via echoModel (system + rendered template).
- recommendDish: a flow that requests STRUCTURED output, validates it,
  and derives `withinBudget` itself — tests pin down that derivation. The
  same model output with a lower budget flips the result, proving the
  test exercises app logic, not the model. Adds an error-path test.
- streamRecommendation: a streaming flow driven via flow.stream(), so the
  streaming test goes through the flow instead of bypassing it.

Structured output schema is supplied at the flow's call site (not baked
into the prompt) so the prompt stays text-renderable for echoModel, which
a strict output schema would reject. README updated.

node:test 6/6, vitest 5/5, tsc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…stText

echoModel returns text, which can't satisfy a structured output schema —
Genkit derives `output` by parsing the response text and validating it.
Previously that surfaced as a cryptic ajv "must have required property"
error. Now:

- echoModel declares native constrained support, so the framework hands
  the schema to the model in `request.output.schema` (instead of injecting
  it as prompt text). echoModel detects it and throws an explanatory error
  pointing to the right pattern.
- Add `model.lastRequestText` to every mock: the full assembled request
  (system + all messages) flattened to a string. This gives echo-style
  prompt-assembly assertions on ANY mock — including structured-output
  paths where echoModel can't be used — as pure request inspection,
  decoupled from the response shape.
- Share the request-flattening logic (renderRequestText) between echoModel
  and lastRequestText; reframe echoModel's jsdoc as a text-path preset.
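
A rough sketch of how the pieces in this commit could fit together, under stated assumptions: the types are hypothetical stand-ins for Genkit's request shape, the placeholder format and error wording are invented for illustration, and messages are joined with a newline here purely for readability. Only the names `renderPart`, `renderRequestText`, and `request.output.schema` come from the commit message.

```typescript
// Hypothetical stand-ins for the request shape; not Genkit's real types.
type Part = { text?: string; media?: { url: string } };
type Message = { role: 'system' | 'user' | 'model' | 'tool'; content: Part[] };
type Req = { messages: Message[]; output?: { schema?: object } };

// Render a part, using a labelled placeholder for non-text content.
function renderPart(p: Part): string {
  if (p.text !== undefined) return p.text;
  if (p.media) return `[media: ${p.media.url}]`;
  return '[unknown part]';
}

// Shared flattening, usable both for an echo response and for request inspection.
function renderRequestText(req: Req): string {
  return req.messages
    .map(
      (m) =>
        (m.role === 'user' || m.role === 'model' ? '' : `${m.role}: `) +
        m.content.map(renderPart).join('')
    )
    .join('\n');
}

// Echo handler: refuses structured-output requests with an explanatory error.
function echo(req: Req): string {
  if (req.output?.schema) {
    throw new Error(
      'echo returns plain text and cannot satisfy an output schema; ' +
        'use a programmable mock and assert on the flattened request text instead.'
    );
  }
  return renderRequestText(req);
}
```

Keeping the flattening separate from the echo response is what lets a structured-output test assert on prompt assembly without the response also having to be text.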

Tests: echoModel throw-under-schema, lastRequestText on a structured path.
Sample gains a lastRequestText assertion on the structured flow. README
updated. genkit 9/9, sample node 7/7 + vitest 5/5, tsc clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the docs, js, config, and test labels Jun 5, 2026

google-cla Bot commented Jun 5, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces deterministic testing utilities for Genkit, specifically mockModel and echoModel, along with a sample application demonstrating their usage with both node:test and vitest. The review feedback highlights several key robustness and usability improvements in mockModel: handling potential DataCloneErrors when cloning requests with non-serializable properties, joining multi-turn messages with newlines in renderRequestText for better readability, and adding defensive checks in toResponseData to prevent runtime errors when a mock response is falsy.

Comment on lines +241 to +243
    async (request, { sendChunk }) => {
      // Snapshot so later mutation of the request can't alter recorded history.
      requests.push(structuredClone(request));

high

Using structuredClone directly on the request object can throw a DataCloneError if the request contains non-serializable properties. In Genkit, the context or config objects often carry complex, non-serializable values (such as database connections, Express request/response objects, or custom class instances).

To make this robust, we should wrap the cloning in a helper that falls back to a safe shallow/semi-deep clone of the messages and parts if structuredClone fails.

    async (request, { sendChunk }) => {
      // Snapshot so later mutation of the request can't alter recorded history.
      let cloned: GenerateRequest;
      try {
        cloned = structuredClone(request);
      } catch {
        cloned = {
          ...request,
          messages: request.messages.map((m) => ({
            ...m,
            content: m.content.map((c) => ({ ...c })),
          })),
        };
      }
      requests.push(cloned);
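
As a standalone check of this suggestion, the sketch below reproduces the failure and the fallback outside Genkit. `Req` and `snapshot` are hypothetical names, and the `config.onDone` callback is just a convenient non-cloneable value (functions cannot be structured-cloned).

```typescript
// Hypothetical request shape with room for a non-cloneable value in config.
type Part = { text: string };
type Req = {
  messages: { role: string; content: Part[] }[];
  config?: Record<string, unknown>;
};

function snapshot(req: Req): Req {
  try {
    return structuredClone(req);
  } catch {
    // Fallback: copy messages and parts; every other field is shared by reference.
    return {
      ...req,
      messages: req.messages.map((m) => ({
        ...m,
        content: m.content.map((c) => ({ ...c })),
      })),
    };
  }
}

const req: Req = {
  messages: [{ role: 'user', content: [{ text: 'hi' }] }],
  config: { onDone: () => {} }, // makes structuredClone throw a DataCloneError
};
const snap = snapshot(req);
req.messages[0].content[0].text = 'mutated';
```

Here `snap` keeps the original `'hi'` text, but `snap.config` is the same object as `req.config`, so the fallback protects recorded messages only; later mutation of `config` would still show up in the recorded history.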

Comment on lines +156 to +164
function renderRequestText(request: GenerateRequest): string {
  return request.messages
    .map(
      (m) =>
        (m.role === 'user' || m.role === 'model' ? '' : `${m.role}: `) +
        m.content.map(renderPart).join('')
    )
    .join('');
}

medium

Currently, renderRequestText joins all messages with an empty string (.join('')). This causes multi-turn conversations to run together (e.g., user: hello and model: hi becomes hellohi), making assertions and debugging output hard to read.

Joining the messages with a newline (\n) instead makes the flattened request text much more readable and easier to assert on.

function renderRequestText(request: GenerateRequest): string {
  return request.messages
    .map(
      (m) =>
        (m.role === 'user' || m.role === 'model' ? '' : `${m.role}: `) +
        m.content.map(renderPart).join('')
    )
    .join('\n');
}
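
The difference is easy to see in isolation. The sketch below uses hypothetical text-only `Part` and `Message` types and a `flatten` helper with a configurable separator, so both behaviours can be compared on the same conversation.

```typescript
// Hypothetical text-only stand-ins for the message types.
type Part = { text: string };
type Message = { role: 'system' | 'user' | 'model' | 'tool'; content: Part[] };

// Flatten messages with a configurable separator to compare the two behaviours.
function flatten(messages: Message[], sep: string): string {
  return messages
    .map(
      (m) =>
        (m.role === 'user' || m.role === 'model' ? '' : `${m.role}: `) +
        m.content.map((p) => p.text).join('')
    )
    .join(sep);
}

const turns: Message[] = [
  { role: 'system', content: [{ text: 'Be brief.' }] },
  { role: 'user', content: [{ text: 'hello' }] },
  { role: 'model', content: [{ text: 'hi' }] },
];

const runTogether = flatten(turns, '');  // 'system: Be brief.hellohi'
const separated = flatten(turns, '\n');  // 'system: Be brief.\nhello\nhi'
```

Note that user and model turns stay unprefixed either way, so the newline makes turn boundaries visible but does not by itself say which role produced each line.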

Comment on lines +166 to +175
function toResponseData(response: MockResponse): GenerateResponseData {
  if (typeof response === 'string') {
    return {
      message: { role: 'model', content: [{ text: response }] },
      finishReason: 'stop',
    };
  }
  if ('message' in response && response.message) {
    return { finishReason: 'stop', ...(response as GenerateResponseData) };
  }

medium

If the user-defined respond function returns undefined or null (for example, if they only stream chunks or have a void callback), toResponseData will throw a TypeError: Cannot use 'in' operator to search for 'message' in undefined.

We should add a defensive check to handle falsy responses gracefully by returning an empty model response.

function toResponseData(response: MockResponse): GenerateResponseData {
  if (!response) {
    return {
      message: { role: 'model', content: [] },
      finishReason: 'stop',
    };
  }
  if (typeof response === 'string') {
    return {
      message: { role: 'model', content: [{ text: response }] },
      finishReason: 'stop',
    };
  }
  if (typeof response === 'object' && 'message' in response && response.message) {
    return { finishReason: 'stop', ...(response as GenerateResponseData) };
  }
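
One detail worth weighing with this guard: `!response` is also true for the empty string, so a mock that deliberately returns `''` would get an empty `content` array instead of a single empty text part. The variant below is a sketch with hypothetical, simplified types (not the PR's `MockResponse` or `GenerateResponseData`) that checks only for `null` and `undefined`.

```typescript
// Hypothetical stand-ins; the PR's MockResponse and GenerateResponseData are richer.
type Message = { role: 'model'; content: { text: string }[] };
type ResponseData = { message: Message; finishReason: 'stop' };
type MockResponse = string | { message: Message } | null | undefined;

function toResponseData(response: MockResponse): ResponseData {
  // Guard first: evaluating `'message' in undefined` would throw a TypeError.
  if (response === null || response === undefined) {
    return { message: { role: 'model', content: [] }, finishReason: 'stop' };
  }
  // An empty string is still a string, so it yields one empty text part.
  if (typeof response === 'string') {
    return { message: { role: 'model', content: [{ text: response }] }, finishReason: 'stop' };
  }
  return { finishReason: 'stop' as const, ...response };
}
```

Whether an empty string should mean "no content" or "one empty text part" is a design choice for the PR; the sketch only shows that the two checks are not equivalent.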
