One line in. A whole production plan out — shot list, scored model routing, continuity bible, second-by-second camera choreography, synced dialogue, genre typography, and a deterministic title/caption render lane. Pure, deterministic, BYOK.
Oberon doesn't stop at "prompt → clip." It runs the part a real film crew runs: it breaks a brief into sequences → scenes → beats → shots, locks continuity, picks the right video model per shot with a transparent score, and composites titles/subtitles with code instead of begging a diffusion model to spell. The human supplies direction and the cost/rights gate; the agents do the decomposition, routing, and assembly.
The planning engine is pure, dependency-free TypeScript and deterministic — the same brief always yields the same plan, no API key required. Generation backends (Veo, Seedance, Runway, Luma, …) sit behind an adapter boundary so a sunset model is a one-line swap.
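As a rough illustration of what such an adapter boundary can look like, here is a minimal sketch. Every name in it (`VideoProviderAdapter`, `stubAdapter`, `adapterFor`, the provider ids) is hypothetical and not taken from Oberon's source; the stub backends make no network calls.

```typescript
// Hypothetical adapter boundary; all names here are illustrative, not Oberon's API.
interface ShotRequest {
  shotId: string;
  prompt: string;
  durationSec: number;
}

interface GenerationJob {
  provider: string;
  shotId: string;
  status: "queued" | "done" | "failed";
}

interface VideoProviderAdapter {
  readonly id: string;
  submit(request: ShotRequest): Promise<GenerationJob>;
}

// Stub backend: no network and no keys, it only echoes a queued job.
function stubAdapter(id: string): VideoProviderAdapter {
  return {
    id,
    submit(request: ShotRequest): Promise<GenerationJob> {
      const job: GenerationJob = { provider: id, shotId: request.shotId, status: "queued" };
      return Promise.resolve(job);
    },
  };
}

// The registry is the only place that names concrete backends,
// so retiring one means editing a single entry here.
const adapters = new Map<string, VideoProviderAdapter>([
  ["veo-3.1", stubAdapter("veo-3.1")],
  ["luma-ray-2", stubAdapter("luma-ray-2")],
]);

function adapterFor(providerId: string): VideoProviderAdapter {
  const adapter = adapters.get(providerId);
  if (!adapter) throw new Error(`no adapter registered for ${providerId}`);
  return adapter;
}
```

Callers depend only on the interface, so the planner and router never import a vendor SDK directly.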
brief.json ──► oberon plan ──► production.json (16 shots, routing decisions, prompts,
continuity bible, typography, subtitles)
│
├─ oberon route → why each shot got the model it got (7-dim scores)
├─ oberon export → shot-list CSV, prompt pack, EDL, SRT/VTT, …
└─ oberon titles → burn title cards + lower-thirds + subtitles into a cut
git clone https://github.com/agentlas-ai/oberon.git
cd oberon
npm install # builds via the prepare script (zero runtime deps for planning)
# 1) one-line brief → full production plan (no API key, fully deterministic)
node bin/oberon.js plan examples/brief.commercial.json -o production.json
# 2) see WHY each shot got the model it got
node bin/oberon.js route examples/brief.commercial.json
# 3) export usable artifacts for any external video tool
node bin/oberon.js export production.json --all -o out/
# 4) (optional) deterministically burn titles/subtitles into a finished cut
# needs ffmpeg on PATH + a headless browser:
npm i -D playwright && npx playwright install chromium
node bin/oberon.js titles my_cut.mp4 production.json -o out/

Try a built-in preset instead of writing a brief:
node bin/oberon.js presets
node bin/oberon.js plan "MIDNIGHT BLOOM" -o production.json

`plan`, `route`, and `export` need no API keys and no network. Only `titles` (ffmpeg +
headless Chromium) and actual video generation touch the outside world.
oberon plan produces a FilmProduction: a hierarchy of sequences/scenes/beats/shots where
every shot carries:
- Camera: size / angle / movement / lens, plus a `motionBeats[]` second-by-second choreography (entry → develop → cut handle) with speed ramps.
- A generation prompt re-synthesised from continuity + choreography + audio direction: the one string that flows to both the keyframe image and the video model.
- A routing decision: the chosen provider, runner-up, margin, and the full 7-dimension score breakdown (see below).
- Continuity: a global bible (locked character/wardrobe/prop traits + do-not-change list) and a sequential carry chain (each shot inherits the prior shot's exit state; 180°, eyeline, 30° and match-on-action rules applied).
- Dialogue & audio: structured lines (speaker, emotion, delivery) with native-audio lip-sync direction, plus an ambience/SFX/music bed.
- Typography & subtitles: a genre/mood-matched font kit and post burn-in SRT/VTT cues (never baked into the generated frame).
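The sequential carry chain in the continuity bullet above can be sketched as a simple fold: each shot's entry state is the previous shot's exit state, and a shot only declares what it deliberately changes. The state fields and function names below are assumptions made for illustration, not Oberon's actual schema.

```typescript
// Hypothetical continuity state; field names are illustrative only.
interface ContinuityState {
  wardrobe: string;
  screenDirection: "left-to-right" | "right-to-left";
  heldProp: string | null;
}

interface ShotPlan {
  id: string;
  // Only what this shot deliberately changes by its final frame.
  changes: Partial<ContinuityState>;
}

interface ChainedShot {
  id: string;
  entry: ContinuityState;
  exit: ContinuityState;
}

// Walk the shots in order, carrying each exit state into the next entry state.
function buildCarryChain(initial: ContinuityState, shots: ShotPlan[]): ChainedShot[] {
  const chained: ChainedShot[] = [];
  let carried = initial;
  for (const shot of shots) {
    const exit: ContinuityState = { ...carried, ...shot.changes };
    chained.push({ id: shot.id, entry: carried, exit });
    carried = exit;
  }
  return chained;
}
```

Because the function is pure and order-dependent only on its input array, the same shot list always produces the same chain.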
Most pipelines route with a first-match `if` ladder ("has dialogue? → Veo"). Oberon scores
every candidate model on 7 weighted dimensions and picks the best, then keeps the
receipt:
| dimension | what it measures |
|---|---|
| task_fit | how well the model matches this shot's hard requirements (dialogue, precise keyframes, motion) |
| quality | absolute fidelity / realism |
| control | precise control surface (first/last frame, references, editing tooling) |
| reliability | output stability + tooling maturity |
| cost | cheaper scores higher (zeroed in premium) |
| latency | faster scores higher |
| continuity | identity / prop consistency across shots |
A balanced profile (default) favours task-fit, reliability and cost so work doesn't collapse
onto a single max-quality model; a premium profile zeroes cost and lets quality/continuity
dominate. Hero shots (dialogue lip-sync, precise keyframe close-ups) shift the cost weight
into task-fit so the right specialist wins even when it's pricier.
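To make the profile mechanics concrete, here is a minimal sketch of a weighted 7-dimension score with a hero-shot adjustment. The weight values are invented for illustration (Oberon's real numbers live in src/providers.ts), and `totalScore`, `heroWeights` and `rankCandidates` are hypothetical names, not the project's API.

```typescript
type Dimension =
  | "task_fit" | "quality" | "control" | "reliability"
  | "cost" | "latency" | "continuity";

type Weights = Record<Dimension, number>;
type Scores = Record<Dimension, number>; // each value in [0, 1]

// Illustrative weights only; assumed, not copied from Oberon.
const BALANCED: Weights = {
  task_fit: 0.25, quality: 0.1, control: 0.1, reliability: 0.2,
  cost: 0.2, latency: 0.05, continuity: 0.1,
};
const PREMIUM: Weights = {
  task_fit: 0.2, quality: 0.3, control: 0.1, reliability: 0.1,
  cost: 0, latency: 0.05, continuity: 0.25,
};

// Hero shots move the whole cost weight into task-fit.
function heroWeights(base: Weights): Weights {
  return { ...base, task_fit: base.task_fit + base.cost, cost: 0 };
}

// Weighted mean of the dimension scores, scaled to 0-100.
function totalScore(scores: Scores, weights: Weights): number {
  let weighted = 0;
  let weightSum = 0;
  for (const dim of Object.keys(weights) as Dimension[]) {
    weighted += weights[dim] * scores[dim];
    weightSum += weights[dim];
  }
  return weightSum === 0 ? 0 : (100 * weighted) / weightSum;
}

interface Candidate { provider: string; scores: Scores; }

// Score every candidate and sort best-first, keeping the totals as the receipt.
function rankCandidates(candidates: Candidate[], weights: Weights) {
  return candidates
    .map((c) => ({ provider: c.provider, total: totalScore(c.scores, weights) }))
    .sort((a, b) => b.total - a.total);
}
```

With these made-up weights, a cheap generalist can beat a pricier specialist under `BALANCED`, while `heroWeights(BALANCED)` flips the result in favour of the specialist, which is the behaviour the paragraph above describes.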
$ oberon route examples/brief.commercial.json
● sc01_bt01_sh001 → Seedance 2.0 (80.2) ⚠ close call
▶ Seedance 2.0 80.2 (task-fit 0.84 · control 0.82 · quality 1.00)
Luma Ray 2 79.9 (task-fit 0.80 · cost 1.00 · reliability 0.85)
Google Veo 3.1 69.6 (task-fit 0.82 · reliability 0.88 · control 0.90)
⚠ winner/runner-up margin 0.3 — Luma Ray 2 is a viable alternative
Every decision is auditable, and close calls (<4 pt margin) are flagged so a human or a downstream agent can override.
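The close-call rule can be sketched as a small function over the ranked totals. The `RoutingDecision` shape and the `decide` name are assumptions for illustration; only the 4-point threshold and the sample totals come from the text and output above.

```typescript
interface RankedCandidate {
  provider: string;
  total: number; // 0-100 score
}

interface RoutingDecision {
  winner: string;
  runnerUp: string | null;
  margin: number | null;
  closeCall: boolean;
}

// Threshold in points, taken from the "<4 pt margin" rule.
const CLOSE_CALL_MARGIN = 4;

function decide(ranked: RankedCandidate[]): RoutingDecision {
  // Sort a copy so the caller's array is left untouched.
  const [winner, runnerUp] = [...ranked].sort((a, b) => b.total - a.total);
  if (winner === undefined) throw new Error("no candidates to route");
  if (runnerUp === undefined) {
    return { winner: winner.provider, runnerUp: null, margin: null, closeCall: false };
  }
  // Round to one decimal to match the displayed scores.
  const margin = Math.round((winner.total - runnerUp.total) * 10) / 10;
  return {
    winner: winner.provider,
    runnerUp: runnerUp.provider,
    margin,
    closeCall: margin < CLOSE_CALL_MARGIN,
  };
}
```

Fed the three totals from the sample output (80.2, 79.9, 69.6), this yields a margin of 0.3 and `closeCall: true`, so a human or downstream agent knows the runner-up is worth a look.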
Generative video models can't reliably render text — so Oberon doesn't ask them to. Titles, lower-thirds and subtitles are composited by code:
TypographyKit → HTML → headless Chromium PNG → ffmpeg overlay / concat → *_titled.mp4
It's deterministic (same input, same output), and crucially it uses only ffmpeg core
filters (overlay, concat, color, fade), not drawtext/subtitles. Those two filters depend on
libfreetype/libass, which many ffmpeg builds (Homebrew included) are compiled without. So the
text lane works on any ffmpeg, on any platform, and Korean/CJK text and web fonts render
correctly because Chromium does the typesetting. The clean master.mp4 stays text-free; the
burned version ships as an additive *_titled.mp4.
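A minimal sketch of the overlay step, assuming each title has already been rasterized to a full-frame transparent PNG. It only builds the ffmpeg argument list, chaining one `overlay` filter per cue and gating each with the filter's timeline `enable` option. The function name, cue shape and file names are hypothetical, and the real compositor may also use concat, color and fade.

```typescript
interface OverlayCue {
  png: string;      // full-frame transparent PNG (assumed pre-rendered)
  startSec: number; // when the overlay appears
  endSec: number;   // when it disappears
}

// Build ffmpeg args that burn each PNG over the video during its time window.
function buildOverlayArgs(input: string, cues: OverlayCue[], output: string): string[] {
  if (cues.length === 0) throw new Error("at least one overlay cue is required");
  const args = ["-i", input];
  const steps: string[] = [];
  let previous = "0:v";
  cues.forEach((cue, index) => {
    args.push("-i", cue.png);
    const label = `v${index + 1}`;
    // Input 0 is the video; PNG inputs start at index 1.
    steps.push(
      `[${previous}][${index + 1}:v]overlay=0:0:` +
        `enable='between(t,${cue.startSec},${cue.endSec})'[${label}]`,
    );
    previous = label;
  });
  args.push(
    "-filter_complex", steps.join(";"),
    "-map", `[${previous}]`,
    "-map", "0:a?", // keep audio if the source has any
    "-c:a", "copy",
    output,
  );
  return args;
}
```

The returned array can be passed to a process spawner as the arguments of `ffmpeg`; because it is built from plain data, the same cues always produce the same command.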
The rasterizer is injectable: use the bundled playwrightRasterizer() or electronRasterizer(), or
supply your own (puppeteer, etc.) that matches the same one-function RasterizeFn shape.
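The injection pattern can be sketched as follows. The signature shown here (HTML string and frame size in, PNG bytes out) is an assumption, so the type is named `RasterizeFnSketch` to avoid suggesting it matches Oberon's actual `RasterizeFn`; `renderCards` and `fakeRasterizer` are likewise hypothetical.

```typescript
// Assumed shape: HTML in, PNG bytes out. Oberon's real RasterizeFn may differ.
type RasterizeFnSketch = (
  html: string,
  size: { width: number; height: number },
) => Promise<Uint8Array>;

interface TitleCard {
  id: string;
  html: string;
}

// The compositor depends only on the function shape, never on a browser library.
async function renderCards(
  cards: TitleCard[],
  rasterize: RasterizeFnSketch,
  size = { width: 1920, height: 1080 },
): Promise<Map<string, Uint8Array>> {
  const rendered = await Promise.all(
    cards.map(async (card) => ({ id: card.id, png: await rasterize(card.html, size) })),
  );
  return new Map(rendered.map((r) => [r.id, r.png] as const));
}

// Test double: records what it was asked to render and returns the PNG magic bytes.
function fakeRasterizer(calls: string[]): RasterizeFnSketch {
  return (html) => {
    calls.push(html);
    return Promise.resolve(new Uint8Array([0x89, 0x50, 0x4e, 0x47]));
  };
}
```

Swapping in a Playwright, Electron or Puppeteer implementation then means passing a different function, and tests can run with the fake and no browser at all.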
The whole engine is a CLI. Author a brief (by hand, or let an agent fill the shot prompts),
then plan / route / export / titles from a shell — no GUI required. The manifest is the
contract and the agent writes the content: pipe a brief through oberon plan, let a coding
assistant fill the shot prompts, and the rest of the pipeline is deterministic.
Oberon runs a gated P0–P10 pipeline in which each stage is an agent with a quality gate that must
pass before the next stage unlocks (see agent/AGENT.md for the full contract and per-agent system prompts):
P0 Creative Brief genre/tone/refs → visual DNA
P1 Script & Beats runtime → sequences → scenes → beats
P2 Shot Planner (DP) coverage patterns + second-by-second camera choreography
P3 Continuity Bible global locks + sequential shot-to-shot memory chain
P4 Keyframe Director first/last frame locks (identity before motion)
── [HUMAN GATE] Cost / Rights / Safety hard stop before expensive generation
P5 Provider Router 7-dimension scored routing + decision log ← upgrade #1
P6 Generation Worker submit / poll / retry; 2–5 takes per shot
P7 Vision QA 8-axis scoring + auto-retry (provider swap on identity fail)
P8 Editor / Timeline best-take selection → EDL, J/L-cuts, match cuts
P9 Audio dialogue/ambience/SFX/music + SRT/VTT cues
P10 Delivery multi-aspect masters + deterministic title burn-in ← upgrade #2
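The gating idea in the stage list above can be sketched as a runner that refuses to advance when a stage's gate fails, with the human cost/rights/safety stop modelled as just another gate. The `Stage` interface and `runPipeline` function are hypothetical and synchronous for brevity; real stages would be asynchronous agents.

```typescript
interface GateResult {
  pass: boolean;
  reason?: string;
}

interface Stage<S> {
  id: string;
  run(state: S): S;           // do the stage's work
  gate(state: S): GateResult; // decide whether the next stage may unlock
}

interface PipelineOutcome<S> {
  state: S;
  completed: string[];
  blockedAt: string | null;
  reason: string | null;
}

// Run stages in order; stop at the first gate that does not pass.
function runPipeline<S>(stages: Stage<S>[], initial: S): PipelineOutcome<S> {
  let state = initial;
  const completed: string[] = [];
  for (const stage of stages) {
    const next = stage.run(state);
    const verdict = stage.gate(next);
    if (!verdict.pass) {
      // Keep the last accepted state so a retry resumes from a known-good point.
      return { state, completed, blockedAt: stage.id, reason: verdict.reason ?? null };
    }
    state = next;
    completed.push(stage.id);
  }
  return { state, completed, blockedAt: null, reason: null };
}
```

A human gate is then a stage whose `run` is a no-op and whose `gate` passes only once an approval flag is set, which keeps the expensive generation stages unreachable until someone signs off.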
The moat — what generic "text-to-video" wrappers don't have: a Continuity Bible + 8-axis Vision QA with automatic provider-swap on identity failure, a human cost/rights/safety hard stop, multi-aspect simultaneous delivery, and now scored routing + a deterministic text lane.
src/
engine.ts planProduction() — the deterministic core
providers.ts provider profiles + 7-dimension scorer (routeVideoProvider)
typography.ts curated font library + genre pairings → TypographyKit
directing.ts camera movement → second-by-second motion beats
continuity-chain.ts shot-to-shot carry/exit memory + film grammar rules
audio-dialogue.ts structured dialogue + native-audio direction + SRT/VTT
title-spec.ts FilmProduction → OberonTitleSpec (burn-in input)
oberon-titles.ts deterministic HTML builders for cards/overlays
render/
titlecards.ts HTML → PNG → ffmpeg overlay/concat compositor
rasterizers.ts playwright / electron headless rasterizers (injectable)
exporters.ts shot-list CSV, prompt pack, bible MD, EDL, SRT/VTT, routing matrix
cli.ts the `oberon` command
agent/ portable agent contract (AGENT.md, routing-card.json, memory.md)
examples/ a ready-to-run brief
Video models behind the router (all BYOK — keys live in the host's vault, never in this repo; only key names are referenced):
| model | strengths | typical use | ~cost / 8s |
|---|---|---|---|
| Google Veo 3.1 | native synced audio, first/last frame, 4K, identity | dialogue/hero shots, precise cuts | $3.20 |
| Seedance 2.0 | top fidelity, 12-asset multi-ref, native audio | premium/multi-ref takes | $1.40 |
| Runway Gen-4.5 | best creative tooling (references, Aleph v2v) | general/reference-driven takes | $1.00 |
| Luma Ray 2 | fast, cheap, strong camera motion + physics | previs, high-movement drafts | $0.64 |
Image/keyframe models: Nano Banana Pro, Imagen 4, gpt-image-1.5, Firefly. Numbers are 2026
research estimates and are easy to update in src/providers.ts.
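For the human cost gate, the table's per-8-second figures can be turned into a rough budget before any generation runs. This sketch assumes cost scales linearly with clip duration and multiplies by the number of takes (the pipeline mentions 2–5 per shot); both the linear-scaling assumption and the names `CENTS_PER_8S` and `estimateCents` are mine, not Oberon's. Amounts are kept in integer cents to avoid floating-point drift.

```typescript
// Cents per 8-second clip, copied from the table above (estimates).
const CENTS_PER_8S: Record<string, number> = {
  "Google Veo 3.1": 320,
  "Seedance 2.0": 140,
  "Runway Gen-4.5": 100,
  "Luma Ray 2": 64,
};

interface PlannedShot {
  provider: string;
  durationSec: number;
  takes: number;
}

// Assumes cost is proportional to duration; real provider pricing may be tiered.
function estimateCents(shots: PlannedShot[]): number {
  let total = 0;
  for (const shot of shots) {
    const rate = CENTS_PER_8S[shot.provider];
    if (rate === undefined) throw new Error(`no cost estimate for ${shot.provider}`);
    total += Math.round((rate * shot.durationSec * shot.takes) / 8);
  }
  return total;
}
```

For example, one 8-second Veo 3.1 shot at three takes plus one 4-second Luma Ray 2 shot at two takes comes to 960 + 64 = 1024 cents, or about $10.24.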
- Local-first, vendor-neutral, agent-drivable. Generation backends sit behind an adapter boundary; a sunset model is a one-line swap.
- Deterministic by default. Planning, routing, typography and the title lane are pure functions — reproducible, auditable, testable without a single API call.
- BYOK and credential-free. No keys live in this repo; the host runtime supplies them at run time by name.
- The clean master is sacred. Text is composited as an additive layer, never baked into a generated frame.
- Programmatic timeline verbs (trim / split / reorder) with a QA → edit feedback loop.
- Local vendor-neutral voice engine for dialogue / narration.
- Lint-before-render checks on the title lane (track overlap / safe-area / missing-glyph).
- Bundled web fonts so title cards render the exact kit fonts offline (today: system fallbacks).
Apache-2.0. Part of the Agentlas agent ecosystem. The planning engine carries no credentials and no private data — generation backends are supplied by the host runtime's secret vault at run time.