Oberon — an AI film operating system

One line in, a whole production plan out: shot list, scored model routing, continuity bible, second-by-second camera choreography, synced dialogue, genre typography, and a deterministic title/caption render lane. Pure, deterministic, BYOK (bring your own key).

Oberon doesn't stop at "prompt → clip." It runs the part a real film crew runs: it breaks a brief into sequences → scenes → beats → shots, locks continuity, picks the right video model per shot with a transparent score, and composites titles/subtitles with code instead of begging a diffusion model to spell. The human supplies direction and the cost/rights gate; the agents do the decomposition, routing, and assembly.

The planning engine is pure, dependency-free TypeScript and deterministic — the same brief always yields the same plan, no API key required. Generation backends (Veo, Seedance, Runway, Luma, …) sit behind an adapter boundary so a sunset model is a one-line swap.

brief.json ──► oberon plan ──► production.json  (16 shots, routing decisions, prompts,
                                                 continuity bible, typography, subtitles)
                    │
                    ├─ oberon route   → why each shot got the model it got (7-dim scores)
                    ├─ oberon export  → shot-list CSV, prompt pack, EDL, SRT/VTT, …
                    └─ oberon titles  → burn title cards + lower-thirds + subtitles into a cut

Quickstart

git clone https://github.com/agentlas-ai/oberon.git
cd oberon
npm install            # builds via the prepare script (zero runtime deps for planning)

# 1) one-line brief → full production plan (no API key, fully deterministic)
node bin/oberon.js plan examples/brief.commercial.json -o production.json

# 2) see WHY each shot got the model it got
node bin/oberon.js route examples/brief.commercial.json

# 3) export usable artifacts for any external video tool
node bin/oberon.js export production.json --all -o out/

# 4) (optional) deterministically burn titles/subtitles into a finished cut
#    needs ffmpeg on PATH + a headless browser:
npm i -D playwright && npx playwright install chromium
node bin/oberon.js titles my_cut.mp4 production.json -o out/

Try a built-in preset instead of writing a brief:

node bin/oberon.js presets
node bin/oberon.js plan "MIDNIGHT BLOOM" -o production.json

plan, route, and export need no API keys and no network. Only titles (ffmpeg + headless Chromium) and actual video generation touch the outside world.

What you get from one brief

oberon plan produces a FilmProduction: a hierarchy of sequences/scenes/beats/shots where every shot carries:

Camera — size / angle / movement / lens, plus a motionBeats[] second-by-second choreography (entry → develop → cut handle) with speed ramps.
A generation prompt re-synthesised from continuity + choreography + audio direction — the one string that flows to both the keyframe image and the video model.
A routing decision — the chosen provider, runner-up, margin, and the full 7-dimension score breakdown (see below).
Continuity — a global bible (locked character/wardrobe/prop traits + do-not-change list) and a sequential carry chain (each shot inherits the prior shot's exit state; 180°, eyeline, 30° and match-on-action rules applied).
Dialogue & audio — structured lines (speaker, emotion, delivery) with native-audio lip-sync direction, plus an ambience/SFX/music bed.
Typography & subtitles — a genre/mood-matched font kit plus SRT/VTT cues that are burned in during post (never baked into the generated frame).
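The sequential carry chain amounts to a fold over the shot list: each shot's entry state is the previous shot's exit state. The sketch below is a minimal illustration under stated assumptions. The names (`ExitState`, `ShotDraft`, `buildCarryChain`) are hypothetical, it tracks only three traits, and it only flags 180° axis crossings and wardrobe changes instead of applying the full grammar rules (eyeline, 30°, match-on-action) that `src/continuity-chain.ts` handles.

```typescript
// Hypothetical types for illustration; Oberon's real ones live in src/continuity-chain.ts.
type Side = "left" | "right";

interface ExitState {
  wardrobe: string;        // locked by the continuity bible
  heldProp: string | null; // prop in hand at the cut
  subjectScreenSide: Side; // screen side of the subject at the cut
}

interface ShotDraft {
  id: string;
  crossesAxis: boolean;         // camera crosses the 180° line during this shot
  changes?: Partial<ExitState>; // state changes introduced by this shot's action
}

interface ChainedShot {
  id: string;
  entry: ExitState; // inherited from the previous shot's exit
  exit: ExitState;
  warnings: string[];
}

function buildCarryChain(initial: ExitState, drafts: ShotDraft[]): ChainedShot[] {
  const chain: ChainedShot[] = [];
  let carry = initial;
  for (const draft of drafts) {
    const warnings: string[] = [];
    if (draft.crossesAxis) {
      warnings.push(`${draft.id}: crosses the 180° line, screen direction flips`);
    }
    const newWardrobe = draft.changes?.wardrobe;
    if (newWardrobe !== undefined && newWardrobe !== carry.wardrobe) {
      warnings.push(`${draft.id}: wardrobe change violates the bible lock`);
    }
    // Exit state = inherited state with this shot's own changes applied on top.
    const exit: ExitState = { ...carry, ...draft.changes };
    chain.push({ id: draft.id, entry: carry, exit, warnings });
    carry = exit; // the next shot starts from this shot's exit state
  }
  return chain;
}
```

Flagging rather than silently fixing keeps the chain auditable; a human or agent decides whether a flagged break is intentional.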

The three headline upgrades

1. Scored provider routing (not prose heuristics)

Many pipelines route with a first-match if-ladder ("has dialogue? → Veo"). Oberon instead scores every candidate model on 7 weighted dimensions, picks the highest total, and keeps the full breakdown as a receipt:

dimension	what it measures
`task_fit`	how well the model matches this shot's hard requirements (dialogue, precise keyframes, motion)
`quality`	absolute fidelity / realism
`control`	precise control surface (first/last frame, references, editing tooling)
`reliability`	output stability + tooling maturity
`cost`	cheaper scores higher (zeroed in `premium`)
`latency`	faster scores higher
`continuity`	identity / prop consistency across shots

A balanced profile (default) favours task-fit, reliability and cost so work doesn't collapse onto a single max-quality model; a premium profile zeroes cost and lets quality/continuity dominate. Hero shots (dialogue lip-sync, precise keyframe close-ups) shift the cost weight into task-fit so the right specialist wins even when it's pricier.
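The profile mechanics can be illustrated with a plain weighted sum. The weights below are made up for this example (the real profiles live in `src/providers.ts`), and `score`, `heroShift`, `BALANCED` and `PREMIUM` are hypothetical names. The sketch shows only the shape of the idea: each dimension is normalised to 0..1, the weighted sum is rescaled to 0..100, the premium profile zeroes cost, and the hero adjustment moves the cost weight into task_fit.

```typescript
type Dim = "task_fit" | "quality" | "control" | "reliability" | "cost" | "latency" | "continuity";
type Scores = Record<Dim, number>;  // each dimension normalised to 0..1
type Weights = Record<Dim, number>;

// Illustrative weights only (each profile sums to 1); not Oberon's actual values.
const BALANCED: Weights = {
  task_fit: 0.3, quality: 0.1, control: 0.1, reliability: 0.2,
  cost: 0.2, latency: 0.05, continuity: 0.05,
};
const PREMIUM: Weights = {
  task_fit: 0.3, quality: 0.3, control: 0.1, reliability: 0.1,
  cost: 0, latency: 0, continuity: 0.2, // cost zeroed, quality/continuity dominate
};

const DIMS = Object.keys(BALANCED) as Dim[];

// Hero shots: move whatever weight cost had into task_fit so the specialist can win.
function heroShift(w: Weights): Weights {
  return { ...w, task_fit: w.task_fit + w.cost, cost: 0 };
}

// Weighted sum rescaled to 0..100 so totals stay comparable across profiles.
function score(s: Scores, w: Weights): number {
  const totalWeight = DIMS.reduce((acc, d) => acc + w[d], 0);
  const raw = DIMS.reduce((acc, d) => acc + w[d] * s[d], 0);
  return (100 * raw) / totalWeight;
}
```

With these illustrative numbers a cheap generalist can edge out a pricey specialist under the balanced profile, while the hero shift (or the premium profile) reverses the order.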

$ oberon route examples/brief.commercial.json

● sc01_bt01_sh001  →  Seedance 2.0 (80.2)   ⚠ close call
    ▶ Seedance 2.0   80.2 (task-fit 0.84 · control 0.82 · quality 1.00)
      Luma Ray 2     79.9 (task-fit 0.80 · cost 1.00 · reliability 0.85)
      Google Veo 3.1 69.6 (task-fit 0.82 · reliability 0.88 · control 0.90)
    ⚠ winner/runner-up margin 0.3 — Luma Ray 2 is a viable alternative

Every decision is auditable, and close calls (<4 pt margin) are flagged so a human or a downstream agent can override.
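Close-call flagging only needs the ranked totals. A minimal sketch, assuming totals on the same 0..100 scale as the CLI output; `Candidate`, `Decision` and `decide` are hypothetical names that mirror the fields shown above (winner, runner-up, margin), not Oberon's actual types.

```typescript
// Hypothetical shapes mirroring the route output; not Oberon's exact types.
interface Candidate {
  provider: string;
  total: number; // weighted score on a 0..100 scale
}

interface Decision {
  winner: Candidate;
  runnerUp: Candidate | null;
  margin: number; // winner minus runner-up, in points
  closeCall: boolean;
}

const CLOSE_CALL_MARGIN = 4; // points, matching the "<4 pt" threshold described above

function decide(candidates: Candidate[]): Decision {
  if (candidates.length === 0) throw new Error("no candidate providers");
  // Best first; ties broken by provider name so the ranking stays deterministic.
  const ranked = [...candidates].sort((a, b) =>
    b.total !== a.total
      ? b.total - a.total
      : a.provider < b.provider ? -1 : a.provider > b.provider ? 1 : 0,
  );
  const winner = ranked[0];
  const runnerUp = ranked.length > 1 ? ranked[1] : null;
  // Round to one decimal (as the CLI prints) to hide floating-point noise.
  const margin = runnerUp ? Math.round((winner.total - runnerUp.total) * 10) / 10 : Infinity;
  return { winner, runnerUp, margin, closeCall: margin < CLOSE_CALL_MARGIN };
}
```

Feeding it the three totals from the example above yields Seedance 2.0 as winner, Luma Ray 2 as runner-up, a 0.3-point margin, and a close-call flag.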

2. Deterministic title / caption render lane

Generative video models can't reliably render text — so Oberon doesn't ask them to. Titles, lower-thirds and subtitles are composited by code:

TypographyKit → HTML → headless Chromium PNG → ffmpeg overlay / concat → *_titled.mp4

It's deterministic (same input, same output), and crucially it uses only ffmpeg core filters (overlay, concat, color, fade), not drawtext or subtitles. Many ffmpeg builds, Homebrew's included, ship without those two filters because libfreetype/libass are not compiled in. As a result the text lane works with any ffmpeg build on any platform, and Korean/CJK text and web fonts render correctly because Chromium does the typesetting. The clean master.mp4 stays text-free; the burned-in version ships as an additional *_titled.mp4.
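The overlay step can be sketched as a function that builds an ffmpeg argument list. `Overlay` and `buildOverlayArgs` are hypothetical (not Oberon's API in `src/render/titlecards.ts`). The sketch uses only the core `overlay` filter with its timeline `enable` option, relies on overlay's default behaviour of repeating the last frame of a single-image input, and leaves out the concat and fade steps.

```typescript
// Hypothetical shapes for illustration; Oberon's compositor may differ.
interface Overlay {
  png: string;   // pre-rendered card (e.g. from headless Chromium)
  start: number; // seconds
  end: number;   // seconds
  x?: number;    // pixel offset, defaults to 0
  y?: number;
}

function buildOverlayArgs(input: string, overlays: Overlay[], output: string): string[] {
  const args = ["-y", "-i", input];
  for (const o of overlays) args.push("-i", o.png); // inputs 1..n are the PNG cards

  if (overlays.length === 0) return [...args, "-c", "copy", output];

  // Chain one overlay per card: [0:v] + [1:v] -> [v1], [v1] + [2:v] -> [v2], ... -> [vout]
  const steps: string[] = [];
  let prev = "[0:v]";
  overlays.forEach((o, i) => {
    const out = i === overlays.length - 1 ? "[vout]" : `[v${i + 1}]`;
    // Quotes protect the commas inside between() from the filtergraph parser.
    steps.push(
      `${prev}[${i + 1}:v]overlay=x=${o.x ?? 0}:y=${o.y ?? 0}:enable='between(t,${o.start},${o.end})'${out}`,
    );
    prev = out;
  });

  return [
    ...args,
    "-filter_complex", steps.join(";"),
    "-map", "[vout]",
    "-map", "0:a?", // keep the source audio if there is any
    "-c:a", "copy",
    output,
  ];
}
```

Passing the array to Node's `child_process.spawn("ffmpeg", args)` (no shell) hands the single quotes straight to ffmpeg's filtergraph parser, so no extra shell escaping is needed.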

The rasterizer is injectable: use the bundled playwrightRasterizer() or electronRasterizer(), or supply your own (Puppeteer, etc.) that matches the same one-function RasterizeFn shape.
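Injection keeps the compositor independent of any particular browser. In the sketch below the `RasterizeFn` signature (HTML plus pixel size in, PNG bytes out) is an assumption, since the README only says it is a single function; `renderCards` and `fakeRasterizer` are hypothetical names.

```typescript
// Assumed shape: the real RasterizeFn in src/render/rasterizers.ts may take other arguments.
type RasterizeFn = (html: string, size: { width: number; height: number }) => Promise<Uint8Array>;

interface CardJob {
  id: string;
  html: string; // output of a deterministic HTML builder
}

// The compositor only depends on RasterizeFn, so Playwright, Electron, Puppeteer
// or a test double can be swapped in without touching this code.
async function renderCards(
  jobs: CardJob[],
  rasterize: RasterizeFn,
  size = { width: 1920, height: 1080 },
): Promise<Map<string, Uint8Array>> {
  const out = new Map<string, Uint8Array>();
  for (const job of jobs) {
    out.set(job.id, await rasterize(job.html, size)); // sequential: one page at a time
  }
  return out;
}

// A fake rasterizer for unit tests: returns the HTML bytes instead of a PNG, no browser needed.
const fakeRasterizer: RasterizeFn = async (html) => new TextEncoder().encode(html);
```

A Playwright-backed function of the same shape would launch Chromium with `chromium.launch()`, open a page via `browser.newPage({ viewport: size })`, call `page.setContent(html)` and return `page.screenshot({ type: "png", omitBackground: true })`.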

3. Terminal-first

The whole engine is a CLI. Author a brief by hand or with an agent, then run plan / route / export / titles from a shell; no GUI is required. The manifest is the contract and the agent writes the content: pipe a brief through oberon plan, let a coding assistant fill in the shot prompts, and the rest of the pipeline is deterministic.

Architecture

A gated P0–P10 pipeline; each stage is an agent with a quality gate before the next unlocks (see agent/AGENT.md for the full contract and per-agent system prompts):

P0  Creative Brief      genre/tone/refs → visual DNA
P1  Script & Beats      runtime → sequences → scenes → beats
P2  Shot Planner (DP)   coverage patterns + second-by-second camera choreography
P3  Continuity Bible    global locks + sequential shot-to-shot memory chain
P4  Keyframe Director    first/last frame locks (identity before motion)
──  [HUMAN GATE]  Cost / Rights / Safety   hard stop before expensive generation
P5  Provider Router     7-dimension scored routing + decision log     ← upgrade #1
P6  Generation Worker   submit / poll / retry; 2–5 takes per shot
P7  Vision QA           8-axis scoring + auto-retry (provider swap on identity fail)
P8  Editor / Timeline   best-take selection → EDL, J/L-cuts, match cuts
P9  Audio               dialogue/ambience/SFX/music + SRT/VTT cues
P10 Delivery            multi-aspect masters + deterministic title burn-in   ← upgrade #2
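The gating rule (a stage must pass its quality gate before the next unlocks, with a hard human stop before paid generation) can be modelled as a small runner. `Stage`, `Outcome` and `runPipeline` are hypothetical names for this sketch; agent/AGENT.md remains the authoritative contract, and the real pipeline also retries and swaps providers, which this sketch does not.

```typescript
// Hypothetical stage contract; see agent/AGENT.md for the real one.
interface Stage<S> {
  id: string;                        // e.g. "P2"
  run: (state: S) => S;              // pure transform of the production state
  gate: (state: S) => string | null; // null = pass, otherwise the failure reason
  requiresHumanApproval?: boolean;   // e.g. the cost/rights/safety stop before P5
}

type Outcome<S> =
  | { status: "done"; state: S }
  | { status: "blocked"; at: string; reason: string; state: S };

function runPipeline<S>(stages: Stage<S>[], initial: S, approved: ReadonlySet<string>): Outcome<S> {
  let state = initial;
  for (const stage of stages) {
    // Hard stop: the stage does not even run until a human has approved it.
    if (stage.requiresHumanApproval && !approved.has(stage.id)) {
      return { status: "blocked", at: stage.id, reason: "awaiting human approval", state };
    }
    const next = stage.run(state);
    const failure = stage.gate(next);
    if (failure !== null) {
      // Gate failed: later stages stay locked; return the last state that passed.
      return { status: "blocked", at: stage.id, reason: failure, state };
    }
    state = next;
  }
  return { status: "done", state };
}
```

Returning the last passing state (rather than throwing) lets a caller resume from the blocked stage once the gate is satisfied or approval is granted.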

The moat — what generic "text-to-video" wrappers don't have: a Continuity Bible + 8-axis Vision QA with automatic provider-swap on identity failure, a human cost/rights/safety hard stop, multi-aspect simultaneous delivery, and now scored routing + a deterministic text lane.

Repo layout

src/
  engine.ts          planProduction() — the deterministic core
  providers.ts       provider profiles + 7-dimension scorer (routeVideoProvider)
  typography.ts      curated font library + genre pairings → TypographyKit
  directing.ts       camera movement → second-by-second motion beats
  continuity-chain.ts shot-to-shot carry/exit memory + film grammar rules
  audio-dialogue.ts  structured dialogue + native-audio direction + SRT/VTT
  title-spec.ts      FilmProduction → OberonTitleSpec (burn-in input)
  oberon-titles.ts   deterministic HTML builders for cards/overlays
  render/
    titlecards.ts    HTML → PNG → ffmpeg overlay/concat compositor
    rasterizers.ts   playwright / electron headless rasterizers (injectable)
  exporters.ts       shot-list CSV, prompt pack, bible MD, EDL, SRT/VTT, routing matrix
  cli.ts             the `oberon` command
agent/               portable agent contract (AGENT.md, routing-card.json, memory.md)
examples/            a ready-to-run brief
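The SRT/VTT exporters write standard subtitle files. A minimal cue formatter, assuming cue times in seconds and using hypothetical names (`Cue`, `toSrt`, `toVtt`) rather than the actual functions in `src/exporters.ts`, shows the two format differences: SRT numbers its cues and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period.

```typescript
// Hypothetical cue shape; times in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// HH:MM:SS<sep>mmm, rounded to the nearest millisecond.
function timestamp(sec: number, sep: "," | "."): string {
  const ms = Math.round(sec * 1000);
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}${sep}${pad(ms % 1000, 3)}`;
}

// SRT: numbered cues, comma separator, blank line between cues.
function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) => `${i + 1}\n${timestamp(c.start, ",")} --> ${timestamp(c.end, ",")}\n${c.text}\n`)
    .join("\n");
}

// WebVTT: header line, period separator, cue identifiers are optional and omitted here.
function toVtt(cues: Cue[]): string {
  return (
    "WEBVTT\n\n" +
    cues.map((c) => `${timestamp(c.start, ".")} --> ${timestamp(c.end, ".")}\n${c.text}\n`).join("\n")
  );
}
```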

Providers

Video models behind the router (all BYOK — keys live in the host's vault, never in this repo; only key names are referenced):

model	strengths	typical use	~cost per 8 s clip
Google Veo 3.1	native synced audio, first/last frame, 4K, identity	dialogue/hero shots, precise cuts	$3.20
Seedance 2.0	top fidelity, 12-asset multi-ref, native audio	premium/multi-ref takes	$1.40
Runway Gen-4.5	best creative tooling (references, Aleph v2v)	general/reference-driven takes	$1.00
Luma Ray 2	fast, cheap, strong camera motion + physics	previs, high-movement drafts	$0.64

Image/keyframe models: Nano Banana Pro, Imagen 4, gpt-image-1.5, Firefly. Numbers are 2026 research estimates and are easy to update in src/providers.ts.
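Before the human cost gate, the per-clip prices above can be turned into a rough estimate for a whole plan. The sketch assumes linear per-second pricing and a fixed number of takes per shot; both are simplifications (providers may bill by clip length, and Oberon generates 2–5 takes per shot), and `estimateCost` is a hypothetical helper, not part of Oberon.

```typescript
// Approximate USD price per 8-second clip, copied from the table above (estimates).
const PRICE_PER_8S: Record<string, number> = {
  "Google Veo 3.1": 3.2,
  "Seedance 2.0": 1.4,
  "Runway Gen-4.5": 1.0,
  "Luma Ray 2": 0.64,
};

interface PlannedShot {
  provider: string;
  seconds: number;
}

// Assumes linear per-second pricing; real providers often bill fixed clip lengths.
function estimateCost(shots: PlannedShot[], takesPerShot: number): number {
  let total = 0;
  for (const shot of shots) {
    const price = PRICE_PER_8S[shot.provider];
    if (price === undefined) throw new Error(`no price for ${shot.provider}`);
    total += (price / 8) * shot.seconds * takesPerShot;
  }
  return Math.round(total * 100) / 100; // round to cents
}
```

For instance, a hypothetical 16-shot plan of 8-second shots routed entirely to Veo 3.1 at 3 takes each comes to 16 × 3 × $3.20 = $153.60, which is the kind of figure the cost/rights gate is there to catch.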

Design principles

Local-first, vendor-neutral, agent-drivable. Generation backends sit behind an adapter boundary; a sunset model is a one-line swap.
Deterministic by default. Planning, routing, typography and the title lane are pure functions — reproducible, auditable, testable without a single API call.
BYOK and credential-free. No keys live in this repo; the host runtime supplies them at run time by name.
The clean master is sacred. Text is composited as an additive layer, never baked into a generated frame.
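The adapter boundary and name-only key handling can be sketched together. `VideoAdapter`, `AdapterRegistry`, `KeyResolver` and the idea of a route id are hypothetical, chosen for this illustration. Routing refers to a stable route id, so retiring a model means registering a different adapter under that id, and the code only ever holds the key's name while the host resolves the secret at run time.

```typescript
// Hypothetical adapter contract; Oberon's real boundary may use different names.
interface GenerateRequest {
  prompt: string;
  seconds: number;
  firstFrame?: string; // optional keyframe reference
}

interface GenerationJob {
  jobId: string;
  provider: string;
}

interface VideoAdapter {
  readonly id: string;
  readonly keyName: string; // name of the secret in the host vault, never the secret itself
  submit(req: GenerateRequest, apiKey: string): Promise<GenerationJob>;
}

// Supplied by the host runtime: maps a key name to the secret, or undefined if absent.
type KeyResolver = (keyName: string) => string | undefined;

class AdapterRegistry {
  private adapters = new Map<string, VideoAdapter>();

  // Swapping a sunset model is one call: register a new adapter under the same route id.
  register(routeId: string, adapter: VideoAdapter): void {
    this.adapters.set(routeId, adapter);
  }

  async submit(routeId: string, req: GenerateRequest, resolveKey: KeyResolver): Promise<GenerationJob> {
    const adapter = this.adapters.get(routeId);
    if (!adapter) throw new Error(`no adapter registered for ${routeId}`);
    const key = resolveKey(adapter.keyName);
    if (!key) throw new Error(`host vault has no key named ${adapter.keyName}`);
    return adapter.submit(req, key);
  }
}
```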

Roadmap

Programmatic timeline verbs (trim / split / reorder) with a QA → edit feedback loop.
Local vendor-neutral voice engine for dialogue / narration.
Lint-before-render checks on the title lane (track overlap / safe-area / missing-glyph).
Bundled web fonts so title cards render the exact kit fonts offline (today: system fallbacks).

License

Apache-2.0. Part of the Agentlas agent ecosystem. The planning engine carries no credentials and no private data — generation backends are supplied by the host runtime's secret vault at run time.
