
agentlas-ai/oberon


Oberon — an AI film operating system

One line in. A whole production plan out — shot list, scored model routing, continuity bible, second-by-second camera choreography, synced dialogue, genre typography, and a deterministic title/caption render lane. Pure, deterministic, BYOK.

Oberon doesn't stop at "prompt → clip." It runs the part a real film crew runs: it breaks a brief into sequences → scenes → beats → shots, locks continuity, picks the right video model per shot with a transparent score, and composites titles/subtitles with code instead of begging a diffusion model to spell. The human supplies direction and the cost/rights gate; the agents do the decomposition, routing, and assembly.

The planning engine is pure, dependency-free TypeScript and deterministic — the same brief always yields the same plan, no API key required. Generation backends (Veo, Seedance, Runway, Luma, …) sit behind an adapter boundary so a sunset model is a one-line swap.
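Here is a minimal TypeScript sketch of what an adapter boundary like this can look like. The interface names (VideoAdapter, GenerationRequest), the registry keys and the stub adapter are hypothetical and are not taken from the Oberon source. The only point is that the router refers to a stable key, and a single registry entry decides which backend serves it.

```typescript
// Hypothetical request shape handed to any video backend.
interface GenerationRequest {
  shotId: string;
  prompt: string;
  durationSec: number;
}

// Hypothetical adapter contract: the pipeline never imports a vendor SDK directly.
interface VideoAdapter {
  readonly id: string; // stable key the router refers to
  submit(req: GenerationRequest): Promise<string>; // resolves to a job id
}

// Stub used here in place of a real vendor client.
function stubAdapter(id: string): VideoAdapter {
  return {
    id,
    async submit(req) {
      return `${id}:${req.shotId}`;
    },
  };
}

// The only place that knows which concrete adapter backs a router key.
// Retiring a model means changing one entry here.
const registry: Record<string, VideoAdapter> = {
  "veo-3.1": stubAdapter("veo-3.1"),
  "luma-ray-2": stubAdapter("luma-ray-2"),
};

function adapterFor(key: string): VideoAdapter {
  const adapter = registry[key];
  if (!adapter) throw new Error(`no adapter registered for "${key}"`);
  return adapter;
}
```

Swapping a sunset model then means pointing one registry key at a different adapter. Callers of adapterFor() do not change.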

brief.json ──► oberon plan ──► production.json  (16 shots, routing decisions, prompts,
                                                 continuity bible, typography, subtitles)
                    │
                    ├─ oberon route   → why each shot got the model it got (7-dim scores)
                    ├─ oberon export  → shot-list CSV, prompt pack, EDL, SRT/VTT, …
                    └─ oberon titles  → burn title cards + lower-thirds + subtitles into a cut

Quickstart

git clone https://github.com/agentlas-ai/oberon.git
cd oberon
npm install            # builds via the prepare script (zero runtime deps for planning)

# 1) one-line brief → full production plan (no API key, fully deterministic)
node bin/oberon.js plan examples/brief.commercial.json -o production.json

# 2) see WHY each shot got the model it got
node bin/oberon.js route examples/brief.commercial.json

# 3) export usable artifacts for any external video tool
node bin/oberon.js export production.json --all -o out/

# 4) (optional) deterministically burn titles/subtitles into a finished cut
#    needs ffmpeg on PATH + a headless browser:
npm i -D playwright && npx playwright install chromium
node bin/oberon.js titles my_cut.mp4 production.json -o out/

Try a built-in preset instead of writing a brief:

node bin/oberon.js presets
node bin/oberon.js plan "MIDNIGHT BLOOM" -o production.json

plan, route, and export need no API keys and no network. Only titles (ffmpeg + headless Chromium) and actual video generation touch the outside world.


What you get from one brief

oberon plan produces a FilmProduction: a hierarchy of sequences/scenes/beats/shots where every shot carries:

  • Camera — size / angle / movement / lens, plus a motionBeats[] second-by-second choreography (entry → develop → cut handle) with speed ramps.
  • A generation prompt re-synthesised from continuity + choreography + audio direction — the one string that flows to both the keyframe image and the video model.
  • A routing decision — the chosen provider, runner-up, margin, and the full 7-dimension score breakdown (see below).
  • Continuity — a global bible (locked character/wardrobe/prop traits + do-not-change list) and a sequential carry chain (each shot inherits the prior shot's exit state; 180°, eyeline, 30° and match-on-action rules applied).
  • Dialogue & audio — structured lines (speaker, emotion, delivery) with native-audio lip-sync direction, plus an ambience/SFX/music bed.
  • Typography & subtitles — a genre/mood-matched font kit, plus SRT/VTT subtitle cues that are burned in during post (never baked into the generated frame).
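The sequential carry chain can be pictured as a fold over the shots in order. The sketch below assumes a flat string-to-string trait map and uses hypothetical names (buildCarryChain, ShotPlan). It shows only the inheritance and lock-enforcement idea and does not model the 180° or 30° rules.

```typescript
// Hypothetical trait map, e.g. { wardrobe: "red coat", prop: "umbrella closed" }.
type Traits = Record<string, string>;

interface ShotPlan {
  id: string;
  changes: Traits; // what this shot deliberately changes (e.g. a prop picked up)
}

interface ShotContinuity {
  id: string;
  entry: Traits; // state inherited from the previous shot's exit
  exit: Traits; // state handed to the next shot
}

// Each shot enters with the prior shot's exit state and applies its own changes.
// Changing a trait locked in the global bible is rejected.
function buildCarryChain(locked: Traits, shots: ShotPlan[]): ShotContinuity[] {
  const chain: ShotContinuity[] = [];
  let carry: Traits = { ...locked };
  for (const shot of shots) {
    for (const key of Object.keys(shot.changes)) {
      if (key in locked) {
        throw new Error(`${shot.id} tries to change locked trait "${key}"`);
      }
    }
    const entry = { ...carry };
    const exit = { ...carry, ...shot.changes };
    chain.push({ id: shot.id, entry, exit });
    carry = exit;
  }
  return chain;
}
```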

The three headline upgrades

1. Scored provider routing (not prose heuristics)

Most pipelines route with a first-match if ladder ("has dialogue? → Veo"). Oberon scores every candidate model on 7 weighted dimensions and picks the best, then keeps the receipt:

dimension     what it measures
task_fit      how well the model matches this shot's hard requirements (dialogue, precise keyframes, motion)
quality       absolute fidelity / realism
control       precise control surface (first/last frame, references, editing tooling)
reliability   output stability + tooling maturity
cost          cheaper scores higher (weight zeroed in the premium profile)
latency       faster scores higher
continuity    identity / prop consistency across shots

A balanced profile (default) favours task-fit, reliability and cost so work doesn't collapse onto a single max-quality model; a premium profile zeroes cost and lets quality/continuity dominate. Hero shots (dialogue lip-sync, precise keyframe close-ups) shift the cost weight into task-fit so the right specialist wins even when it's pricier.
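Here is a minimal sketch of weighted scoring with two profiles and the hero-shot shift. The weight values are invented for illustration and are not the numbers in src/providers.ts. Each dimension is assumed to be pre-normalised to 0..1, and ties are broken alphabetically so the ranking stays deterministic.

```typescript
type Dim = "task_fit" | "quality" | "control" | "reliability" | "cost" | "latency" | "continuity";
type Scores = Record<Dim, number>; // each dimension pre-normalised to 0..1
type Weights = Record<Dim, number>;

// Illustrative weights only (each profile sums to 1); not Oberon's real values.
const PROFILES: Record<"balanced" | "premium", Weights> = {
  // balanced: task-fit, reliability and cost carry the most weight
  balanced: { task_fit: 0.25, quality: 0.1, control: 0.1, reliability: 0.2, cost: 0.2, latency: 0.05, continuity: 0.1 },
  // premium: cost (and, in this sketch, latency) zeroed; quality and continuity dominate
  premium: { task_fit: 0.2, quality: 0.3, control: 0.1, reliability: 0.1, cost: 0, latency: 0, continuity: 0.3 },
};

// Hero shots move the cost weight into task-fit so a pricier specialist can win.
function weightsFor(profile: keyof typeof PROFILES, hero: boolean): Weights {
  const w: Weights = { ...PROFILES[profile] }; // copy so the profile is never mutated
  if (hero) {
    w.task_fit += w.cost;
    w.cost = 0;
  }
  return w;
}

// Weighted mean on a 0..100 scale.
function weightedScore(s: Scores, w: Weights): number {
  let total = 0;
  let weightSum = 0;
  for (const d of Object.keys(w) as Dim[]) {
    total += s[d] * w[d];
    weightSum += w[d];
  }
  return weightSum === 0 ? 0 : (100 * total) / weightSum;
}

// Highest score first; ties broken by model name so the output is deterministic.
function rank(candidates: Record<string, Scores>, w: Weights): { model: string; score: number }[] {
  return Object.keys(candidates)
    .map((model) => ({ model, score: weightedScore(candidates[model], w) }))
    .sort((a, b) => b.score - a.score || a.model.localeCompare(b.model));
}
```

For example, with every other dimension at 0.5, a cheap generalist (task-fit 0.5, cost 1.0) beats a pricier specialist (task-fit 1.0, cost 0.2) under these balanced weights, 60 to 56.5. Once the hero shift moves the cost weight into task-fit, it loses 50 to 72.5.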

$ oberon route examples/brief.commercial.json

● sc01_bt01_sh001  →  Seedance 2.0 (80.2)   ⚠ close call
    ▶ Seedance 2.0   80.2 (task-fit 0.84 · control 0.82 · quality 1.00)
      Luma Ray 2     79.9 (task-fit 0.80 · cost 1.00 · reliability 0.85)
      Google Veo 3.1 69.6 (task-fit 0.82 · reliability 0.88 · control 0.90)
    ⚠ winner/runner-up margin 0.3 — Luma Ray 2 is a viable alternative

Every decision is auditable, and close calls (<4 pt margin) are flagged so a human or a downstream agent can override.
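The close-call flag reduces to a margin check on the top two entries of a score-sorted list. The sketch below assumes candidates arrive already sorted in descending order, on the same 0 to 100 scale shown above. decide() and RoutingDecision are illustrative names.

```typescript
interface Ranked {
  model: string;
  score: number;
}

interface RoutingDecision {
  winner: string;
  runnerUp?: string;
  margin: number; // winner minus runner-up, in points
  closeCall: boolean; // true when the margin is under the threshold
}

// Threshold from the text above: margins under 4 points are flagged.
const CLOSE_CALL_POINTS = 4;

// Assumes `ranked` is sorted by score, highest first.
function decide(ranked: Ranked[]): RoutingDecision {
  if (ranked.length === 0) throw new Error("no candidates");
  const [first, second] = ranked;
  if (!second) return { winner: first.model, margin: Infinity, closeCall: false };
  // Round to one decimal so floating-point noise doesn't leak into the log.
  const margin = Math.round((first.score - second.score) * 10) / 10;
  return {
    winner: first.model,
    runnerUp: second.model,
    margin,
    closeCall: margin < CLOSE_CALL_POINTS,
  };
}
```

Given the three scores from the example above (80.2, 79.9, 69.6), this returns a margin of 0.3 and closeCall: true.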

2. Deterministic title / caption render lane

Generative video models can't reliably render text — so Oberon doesn't ask them to. Titles, lower-thirds and subtitles are composited by code:

TypographyKit → HTML → headless Chromium PNG → ffmpeg overlay / concat → *_titled.mp4

It's deterministic (same input, same output). Crucially, it uses only ffmpeg core filters (overlay, concat, color, fade) rather than drawtext/subtitles, which many ffmpeg builds, Homebrew's included, ship without because libfreetype/libass are not compiled in. The text lane therefore works with any ffmpeg build on any platform, and Korean/CJK text and web fonts render correctly because Chromium does the typesetting. The clean master.mp4 stays text-free; the burned-in version ships alongside it as an additional *_titled.mp4.
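To make the core-filters-only approach concrete, here is a sketch that builds an ffmpeg argument list that overlays pre-rendered transparent PNGs on a cut. It uses looped image inputs, `format`, `fade` on the alpha channel and a time-gated `overlay`. The OverlayCue shape, the 0.3 s default fade and full-frame PNGs positioned at 0:0 are assumptions. This is not the compositor in src/render/titlecards.ts, and it leaves out the concat step used for full-screen title cards.

```typescript
// One pre-rendered transparent PNG (e.g. a lower-third) and its on-screen window.
interface OverlayCue {
  png: string; // path to a full-frame RGBA PNG from the rasterizer (assumed)
  start: number; // seconds
  end: number; // seconds
  fade?: number; // fade-in/out length in seconds; the 0.3 default is an assumption
}

// Builds ffmpeg arguments that composite each PNG over the cut using only
// core filters: format + fade (on the alpha channel) + time-gated overlay.
function buildOverlayArgs(input: string, cues: OverlayCue[], output: string): string[] {
  if (cues.length === 0) return ["-y", "-i", input, "-c", "copy", output];

  const args = ["-y", "-i", input];
  for (const cue of cues) {
    // -loop 1 turns the still into a video stream; -t bounds it at the cue end.
    args.push("-loop", "1", "-t", String(cue.end), "-i", cue.png);
  }

  const graph: string[] = [];
  let base = "[0:v]";
  cues.forEach((cue, i) => {
    const d = cue.fade ?? 0.3;
    const faded = `[f${i}]`;
    const out = i === cues.length - 1 ? "[vout]" : `[v${i}]`;
    // Input i+1 is the i-th PNG: fade its alpha in at start and out before end.
    graph.push(
      `[${i + 1}:v]format=rgba,` +
        `fade=t=in:st=${cue.start}:d=${d}:alpha=1,` +
        `fade=t=out:st=${cue.end - d}:d=${d}:alpha=1${faded}`,
    );
    // Chain overlays so each one draws on top of the previous result.
    graph.push(`${base}${faded}overlay=0:0:enable='between(t,${cue.start},${cue.end})'${out}`);
    base = out;
  });

  return [
    ...args,
    "-filter_complex", graph.join(";"),
    "-map", "[vout]",
    "-map", "0:a?", // keep the cut's audio track if it has one
    "-c:a", "copy",
    output,
  ];
}
```

Pass the resulting array straight to a process spawner with no shell. That way the single quotes around the enable expression reach ffmpeg's filtergraph parser intact.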

The rasterizer is injectable: use the bundled playwrightRasterizer() or electronRasterizer(), or supply your own (Puppeteer, etc.) with the same one-function RasterizeFn shape.
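Injection here just means the compositor depends on a function type rather than on a browser library. The sketch below assumes RasterizeFn takes HTML plus a frame size and resolves to PNG bytes. The real signature in src/render/rasterizers.ts may differ, and renderCards and fakeRasterizer are hypothetical.

```typescript
// Assumed shape: HTML in, PNG bytes out.
type RasterizeFn = (html: string, size: { width: number; height: number }) => Promise<Uint8Array>;

interface TitleCard {
  id: string;
  html: string;
}

// The compositor only calls the injected function, so tests can pass a fake and
// production can pass a Playwright- or Electron-backed implementation.
// A Playwright-backed RasterizeFn would look roughly like:
//   const browser = await chromium.launch();
//   const page = await browser.newPage({ viewport: size });
//   await page.setContent(html);
//   const png = await page.screenshot({ type: "png", omitBackground: true });
async function renderCards(
  cards: TitleCard[],
  rasterize: RasterizeFn,
  size = { width: 1920, height: 1080 },
): Promise<Map<string, Uint8Array>> {
  const out = new Map<string, Uint8Array>();
  for (const card of cards) {
    // Sequential on purpose: one card at a time keeps output order stable.
    out.set(card.id, await rasterize(card.html, size));
  }
  return out;
}

// Fake rasterizer for tests: encodes the HTML length instead of real pixels.
const fakeRasterizer: RasterizeFn = async (html) => new Uint8Array([html.length % 256]);
```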

3. Terminal-first

The whole engine is a CLI. Author a brief by hand or with an agent, then run plan / route / export / titles from a shell; no GUI is required. The manifest is the contract and the agent writes the content: pipe a brief through oberon plan, let a coding assistant fill the shot prompts, and the rest of the pipeline stays deterministic.


Architecture

A gated P0–P10 pipeline; each stage is an agent with a quality gate before the next unlocks (see agent/AGENT.md for the full contract and per-agent system prompts):

P0  Creative Brief      genre/tone/refs → visual DNA
P1  Script & Beats      runtime → sequences → scenes → beats
P2  Shot Planner (DP)   coverage patterns + second-by-second camera choreography
P3  Continuity Bible    global locks + sequential shot-to-shot memory chain
P4  Keyframe Director   first/last frame locks (identity before motion)
──  [HUMAN GATE]  Cost / Rights / Safety   hard stop before expensive generation
P5  Provider Router     7-dimension scored routing + decision log     ← upgrade #1
P6  Generation Worker   submit / poll / retry; 2–5 takes per shot
P7  Vision QA           8-axis scoring + auto-retry (provider swap on identity fail)
P8  Editor / Timeline   best-take selection → EDL, J/L-cuts, match cuts
P9  Audio               dialogue/ambience/SFX/music + SRT/VTT cues
P10 Delivery            multi-aspect masters + deterministic title burn-in   ← upgrade #2
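The gating idea can be sketched as a loop that runs each stage and stops at the first failing gate. The human cost/rights/safety stop is modelled as one more gate. Stage, runGated and the PlanState fields are hypothetical; AGENT.md defines the actual per-stage contract.

```typescript
// Hypothetical stage contract: transform shared state, then gate the next stage.
interface Stage<S> {
  id: string;
  run(state: S): S;
  gate(state: S): { pass: boolean; reason?: string };
}

type GateResult<S> =
  | { status: "done"; state: S }
  | { status: "blocked"; at: string; reason: string; state: S };

// Runs stages in order and stops at the first gate that does not pass.
function runGated<S>(stages: Stage<S>[], initial: S): GateResult<S> {
  let state = initial;
  for (const stage of stages) {
    state = stage.run(state);
    const verdict = stage.gate(state);
    if (!verdict.pass) {
      return { status: "blocked", at: stage.id, reason: verdict.reason ?? "gate failed", state };
    }
  }
  return { status: "done", state };
}

// Example state and stages (illustrative only).
interface PlanState {
  log: string[];
  estimatedCostUsd: number;
  approved: boolean;
}

const step = (id: string): Stage<PlanState> => ({
  id,
  run: (s) => ({ ...s, log: [...s.log, id] }),
  gate: () => ({ pass: true }),
});

// Hard stop before expensive generation: nothing after this runs unapproved.
const humanGate: Stage<PlanState> = {
  id: "HUMAN_GATE",
  run: (s) => s,
  gate: (s) =>
    s.approved
      ? { pass: true }
      : { pass: false, reason: `awaiting approval for ~$${s.estimatedCostUsd.toFixed(2)}` },
};

const pipeline = [step("P4"), humanGate, step("P5"), step("P6")];
```

With approved: false, runGated() stops at HUMAN_GATE and P5/P6 never run. Flipping the flag lets the same pipeline finish.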

The moat — what generic "text-to-video" wrappers don't have: a Continuity Bible + 8-axis Vision QA with automatic provider-swap on identity failure, a human cost/rights/safety hard stop, multi-aspect simultaneous delivery, and now scored routing + a deterministic text lane.

Repo layout

src/
  engine.ts          planProduction() — the deterministic core
  providers.ts       provider profiles + 7-dimension scorer (routeVideoProvider)
  typography.ts      curated font library + genre pairings → TypographyKit
  directing.ts       camera movement → second-by-second motion beats
  continuity-chain.ts shot-to-shot carry/exit memory + film grammar rules
  audio-dialogue.ts  structured dialogue + native-audio direction + SRT/VTT
  title-spec.ts      FilmProduction → OberonTitleSpec (burn-in input)
  oberon-titles.ts   deterministic HTML builders for cards/overlays
  render/
    titlecards.ts    HTML → PNG → ffmpeg overlay/concat compositor
    rasterizers.ts   playwright / electron headless rasterizers (injectable)
  exporters.ts       shot-list CSV, prompt pack, bible MD, EDL, SRT/VTT, routing matrix
  cli.ts             the `oberon` command
agent/               portable agent contract (AGENT.md, routing-card.json, memory.md)
examples/            a ready-to-run brief
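exporters.ts and audio-dialogue.ts both emit SRT/VTT cues. For reference, here is a small sketch of what those two formats require. SRT numbers each cue and uses a comma before the milliseconds; WebVTT starts with a WEBVTT header and uses a period. The Cue shape and the function names are assumptions, not the exporters' API.

```typescript
// Assumed cue shape: times in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Formats seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT).
function stamp(sec: number, sep: "," | "."): string {
  const ms = Math.round(sec * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const pad = (n: number, w = 2) => String(n).padStart(w, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}${sep}${pad(ms % 1000, 3)}`;
}

// SRT: 1-based index, timing line, text, blank line between cues.
function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) => `${i + 1}\n${stamp(c.start, ",")} --> ${stamp(c.end, ",")}\n${c.text}\n`)
    .join("\n");
}

// WebVTT: header, then timing line and text per cue (cue identifiers are optional).
function toVtt(cues: Cue[]): string {
  const body = cues
    .map((c) => `${stamp(c.start, ".")} --> ${stamp(c.end, ".")}\n${c.text}\n`)
    .join("\n");
  return `WEBVTT\n\n${body}`;
}
```

For example, stamp(61.25, ",") yields 00:01:01,250.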

Providers

Video models behind the router (all BYOK — keys live in the host's vault, never in this repo; only key names are referenced):

model            strengths                                            typical use                        ~cost / 8s
Google Veo 3.1   native synced audio, first/last frame, 4K, identity  dialogue/hero shots, precise cuts  $3.20
Seedance 2.0     top fidelity, 12-asset multi-ref, native audio       premium/multi-ref takes            $1.40
Runway Gen-4.5   best creative tooling (references, Aleph v2v)        general/reference-driven takes     $1.00
Luma Ray 2       fast, cheap, strong camera motion + physics          previs, high-movement drafts       $0.64

Image/keyframe models: Nano Banana Pro, Imagen 4, gpt-image-1.5, Firefly. Numbers are 2026 research estimates and are easy to update in src/providers.ts.
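At these rates a rough budget is simple arithmetic: $3.20 per 8 s is $0.40 per second for Veo 3.1, and $0.64 per 8 s is $0.08 per second for Luma Ray 2. The sketch below assumes cost scales linearly with clip duration and multiplies by the number of takes. Real vendor pricing may be per clip or tiered, and the function and field names are hypothetical.

```typescript
// Approximate USD per 8-second clip, copied from the table above.
const COST_PER_8S: Record<string, number> = {
  "Google Veo 3.1": 3.2,
  "Seedance 2.0": 1.4,
  "Runway Gen-4.5": 1.0,
  "Luma Ray 2": 0.64,
};

interface RoutedShot {
  model: string;
  durationSec: number;
  takes: number; // e.g. 2-5 takes per shot
}

// Assumes linear scaling with duration; sums in whole cents to avoid float drift.
function estimateCost(shots: RoutedShot[]): number {
  let cents = 0;
  for (const s of shots) {
    const per8 = COST_PER_8S[s.model];
    if (per8 === undefined) throw new Error(`no price for ${s.model}`);
    cents += Math.round(per8 * 100 * (s.durationSec / 8) * s.takes);
  }
  return cents / 100;
}
```

For example, one 8 s Veo 3.1 shot with 3 takes ($9.60) plus two 4 s Luma Ray 2 shots with 2 takes each ($0.64 each) comes to an estimated $10.88.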


Design principles

  • Local-first, vendor-neutral, agent-drivable. Generation backends sit behind an adapter boundary; a sunset model is a one-line swap.
  • Deterministic by default. Planning, routing, typography and the title lane are pure functions — reproducible, auditable, testable without a single API call.
  • BYOK and credential-free. No keys live in this repo; the host runtime supplies them at run time by name.
  • The clean master is sacred. Text is composited as an additive layer, never baked into a generated frame.
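The BYOK rule means adapters carry key names, not key values. The sketch below shows run-time resolution, assuming the host exposes a lookup function backed by a vault, the environment or anything else. AdapterSpec, SecretLookup and the key names used are hypothetical.

```typescript
// Adapters declare which secret *names* they need; values never appear in code.
interface AdapterSpec {
  id: string;
  keyName: string;
}

// Supplied by the host runtime (vault client, env reader, ...).
type SecretLookup = (name: string) => string | undefined;

interface ResolvedAdapter {
  id: string;
  apiKey: string;
}

// Resolves keys at run time. Fails loudly if any are missing, listing only
// the missing names and never any secret value.
function resolveAdapters(specs: AdapterSpec[], lookup: SecretLookup): ResolvedAdapter[] {
  const missing: string[] = [];
  const resolved: ResolvedAdapter[] = [];
  for (const spec of specs) {
    const key = lookup(spec.keyName);
    if (!key) missing.push(spec.keyName);
    else resolved.push({ id: spec.id, apiKey: key });
  }
  if (missing.length > 0) throw new Error(`missing secrets: ${missing.join(", ")}`);
  return resolved;
}
```

Errors list only the missing key names, so a failed run never prints a secret.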

Roadmap

  • Programmatic timeline verbs (trim / split / reorder) with a QA → edit feedback loop.
  • Local vendor-neutral voice engine for dialogue / narration.
  • Lint-before-render checks on the title lane (track overlap / safe-area / missing-glyph).
  • Bundled web fonts so title cards render the exact kit fonts offline (today: system fallbacks).

License

Apache-2.0. Part of the Agentlas agent ecosystem. The planning engine carries no credentials and no private data — generation backends are supplied by the host runtime's secret vault at run time.
