Duvo

An agentic automation platform. You write what you want done. An AI agent does it. You watch every step. Then we evaluate if it actually worked.

Built with Hono (backend), React + Vite (frontend), Vercel AI SDK, and Anthropic's Claude.

What it does

Chat interface: Send a prompt. Watch the agent think through it.
Step-by-step observability: See every reasoning token, every tool call, every API hit in real time.
Artifacts: The agent can generate files (CSV, JSON, etc.) and you can download them.
Context7 MCP: Optionally enable the agent to look up library docs during execution.
Automatic evaluation: After the agent finishes, a second Claude call checks whether it actually completed the task. If it didn't, the agent retries with corrective context where possible.

Running it locally

Prerequisites: Node.js 20+, npm 10+, and an Anthropic API key.

# Clone the repo
git clone https://github.com/mario/duvo
cd duvo

# Run the setup script (creates .env, installs dependencies)
./scripts/dev.sh

# You'll be asked to add your ANTHROPIC_API_KEY to .env
# Do that, then run the script again

# The script starts both the API (port 3001) and web (port 5173)
# Open http://localhost:5173 in your browser

How it works

1. You send a message ("Fetch the latest AI news and save to CSV").
2. The client generates a unique runId and sends it with the message to POST /api/chat.
3. The backend streams the response using Vercel AI SDK + Claude.
4. At the same time, the client connects to GET /api/runs/:runId/events (Server-Sent Events) so you can watch the timeline.
5. The agent runs up to 10 steps, calling tools as needed.
6. After it finishes, an evaluation LLM scores the result.
7. If the score is low, the agent retries with corrective context.
8. The final evaluation and any artifacts are shown in the UI.
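
A minimal sketch of the first client-side steps, to make the runId handshake concrete. The two endpoints and ports come from this README; the request body shape (`runId`, `messages`) and the `planRun` helper are assumptions for illustration, not the actual client code.

```typescript
// Everything the client needs to start one run and watch it.
interface RunPlan {
  runId: string;
  chatUrl: string;
  chatInit: { method: "POST"; headers: Record<string, string>; body: string };
  eventsUrl: string;
}

// Hypothetical helper: generates the runId once and derives both requests from it.
function planRun(prompt: string, apiBase = "http://localhost:3001"): RunPlan {
  const runId = crypto.randomUUID(); // global in browsers and Node.js 20+
  return {
    runId,
    chatUrl: `${apiBase}/api/chat`,
    chatInit: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Assumed body shape: the runId travels alongside the chat messages.
      body: JSON.stringify({ runId, messages: [{ role: "user", content: prompt }] }),
    },
    eventsUrl: `${apiBase}/api/runs/${encodeURIComponent(runId)}/events`,
  };
}

// In a browser the plan would be used roughly like this:
//   const plan = planRun("Fetch the latest AI news and save to CSV");
//   const events = new EventSource(plan.eventsUrl);            // timeline
//   events.onmessage = (e) => console.log(JSON.parse(e.data));
//   await fetch(plan.chatUrl, plan.chatInit);                  // chat stream
```

Because the client picks the runId, it can open the event stream without waiting for the chat request to return one.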

Architecture

Backend (Hono, Node.js, port 3001):

In-memory RunStore for run state and events
streamText integration for AI agent execution
Tool definitions: fetch_web_page, fetch_hacker_news, save_to_csv
Evaluation council: automatic pass/fail check after each run
SSE endpoint: /api/runs/:runId/events for real-time observability
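
To show how an in-memory store and an SSE endpoint can fit together, here is a rough sketch. The `RunEvent` shape, the method names, and `toSseFrame` are assumptions; the real RunStore may look quite different.

```typescript
// Assumed event shape; the real store may record more fields.
interface RunEvent {
  type: string; // e.g. "reasoning", "tool_call", "evaluation"
  data: unknown;
  at: number; // epoch milliseconds
}

type Listener = (event: RunEvent) => void;

class RunStore {
  private events = new Map<string, RunEvent[]>();
  private listeners = new Map<string, Set<Listener>>();

  // Records an event and pushes it to everyone watching this run.
  append(runId: string, type: string, data: unknown): RunEvent {
    const event: RunEvent = { type, data, at: Date.now() };
    const log = this.events.get(runId) ?? [];
    log.push(event);
    this.events.set(runId, log);
    this.listeners.get(runId)?.forEach((listener) => listener(event));
    return event;
  }

  // Replays what already happened, then delivers new events as they arrive.
  // Returns an unsubscribe function for when the SSE connection closes.
  subscribe(runId: string, listener: Listener): () => void {
    (this.events.get(runId) ?? []).forEach((event) => listener(event));
    const set = this.listeners.get(runId) ?? new Set<Listener>();
    set.add(listener);
    this.listeners.set(runId, set);
    return () => {
      set.delete(listener);
    };
  }
}

// One SSE frame: an "event:" line, a "data:" line, then a blank line.
function toSseFrame(event: RunEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}
```

Replaying past events on subscribe means a client that connects a moment late still sees the full timeline.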

Frontend (React + Vite, port 5173):

useChat hook from Vercel AI SDK for message management
Custom useRunEvents hook: subscribes to SSE, displays timeline
ObservabilityPanel: real-time event stream with thinking blocks, tool calls, evaluation results
No auth, no persistent state (runs live in the API's memory). Just chat and watch.
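
As an illustration of what a hook like useRunEvents has to do, the sketch below folds incoming SSE payloads into timeline state. The event types and field names are assumptions; only the idea (subscribe, parse, append to a timeline) comes from the list above.

```typescript
// Assumed event shapes for the timeline.
type TimelineEvent =
  | { type: "reasoning"; text: string }
  | { type: "tool_call"; tool: string }
  | { type: "evaluation"; score: number; passed: boolean };

type EvaluationEvent = Extract<TimelineEvent, { type: "evaluation" }>;

interface TimelineState {
  entries: TimelineEvent[];
  lastEvaluation: EvaluationEvent | null; // latest verdict; a retry can replace it
}

const initialTimeline: TimelineState = { entries: [], lastEvaluation: null };

// Pure reducer: returns a new state with the event appended.
function timelineReducer(state: TimelineState, event: TimelineEvent): TimelineState {
  return {
    entries: [...state.entries, event],
    lastEvaluation: event.type === "evaluation" ? event : state.lastEvaluation,
  };
}

// Parses the `data` payload of one SSE message; returns null for anything unexpected.
function parseTimelineEvent(raw: string): TimelineEvent | null {
  try {
    const value = JSON.parse(raw);
    const known = ["reasoning", "tool_call", "evaluation"];
    const ok = value !== null && typeof value === "object" && known.includes(value.type);
    return ok ? (value as TimelineEvent) : null;
  } catch {
    return null;
  }
}

// Inside a React hook this could be wired up roughly like so:
//   const [timeline, dispatch] = useReducer(timelineReducer, initialTimeline);
//   source.onmessage = (e) => {
//     const event = parseTimelineEvent(e.data);
//     if (event) dispatch(event);
//   };
```

Keeping the reducer pure makes the timeline logic testable without a browser or a live SSE connection.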

Context7 MCP

The agent can optionally use Context7 to look up library documentation. It's off by default. Enable the checkbox in the UI to let the agent fetch docs during execution. You'll see a purple "Context7" badge in the timeline when it does.

Why optional? External connections should be explicit. You should know when the agent is reaching out to external services.
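
One way to keep that opt-in explicit in code is to build the tool list per run. This is only a sketch: the `enableContext7` flag and `selectTools` helper are hypothetical, and `context7Tools` stands in for whatever tools the Context7 MCP server advertises.

```typescript
// Built-in tool names, as listed under Architecture.
const BUILT_IN_TOOLS = ["fetch_web_page", "fetch_hacker_news", "save_to_csv"] as const;

interface ChatOptions {
  enableContext7?: boolean; // hypothetical flag set by the UI checkbox
}

// Returns the tool names exposed to the agent for one run.
// External tools are added only when the user opted in.
function selectTools(options: ChatOptions, context7Tools: readonly string[]): string[] {
  const tools: string[] = [...BUILT_IN_TOOLS];
  if (options.enableContext7 === true) {
    tools.push(...context7Tools);
  }
  return tools;
}

// Lets the timeline decide whether a tool call should get the Context7 badge.
function isContext7Tool(name: string, context7Tools: readonly string[]): boolean {
  return context7Tools.includes(name);
}
```

With the flag missing or false, the agent never sees the external tools, so it cannot call them by accident.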

Evaluation Council

After every run, a second Claude call evaluates whether the agent completed the task:

Passed: Score 70 or higher. Result is shown to you as-is.
Failed but auto-corrected: Score was low, agent retried, now it passes. You see "Auto-corrected before showing results."
Failed: Score is low and the agent couldn't auto-correct. You see the failure feedback and can refine your prompt.

The score isn't about perfection. It answers one question: did the agent do the job? Most tasks pass on the first attempt. When one doesn't, a retry often fixes it.
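
The three outcomes above can be sketched as a small loop. The 70-point threshold comes from this section; `runAgent`, `evaluate`, the `Evaluation` shape, and the single-retry default are assumptions, not Duvo's actual code.

```typescript
const PASS_THRESHOLD = 70; // from the text: 70 or higher passes

type Outcome = "passed" | "auto_corrected" | "failed";

interface Evaluation {
  score: number; // 0 to 100
  feedback: string; // what the evaluator thinks is missing
}

interface EvaluatedRun {
  outcome: Outcome;
  result: string;
  evaluation: Evaluation;
}

// runAgent and evaluate are hypothetical stand-ins for the agent and evaluator calls.
async function runWithEvaluation(
  task: string,
  runAgent: (task: string, correctiveContext?: string) => Promise<string>,
  evaluate: (task: string, result: string) => Promise<Evaluation>,
  maxRetries = 1, // assumption: the README does not say how many retries happen
): Promise<EvaluatedRun> {
  let result = await runAgent(task);
  let evaluation = await evaluate(task, result);
  if (evaluation.score >= PASS_THRESHOLD) {
    return { outcome: "passed", result, evaluation };
  }

  // Low score: retry, feeding the evaluator's feedback back to the agent.
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    result = await runAgent(task, evaluation.feedback);
    evaluation = await evaluate(task, result);
    if (evaluation.score >= PASS_THRESHOLD) {
      return { outcome: "auto_corrected", result, evaluation };
    }
  }
  return { outcome: "failed", result, evaluation };
}
```

On a "failed" outcome the last evaluation's feedback is still returned, so the UI can show it and you can refine your prompt.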

Development

Running tests:

npm test -w api

Architecture Decisions: See docs/adr/ for the reasoning behind each choice.

Tech stack at a glance

Layer	Tech	Why
Backend	Hono	Lightweight, great streaming, clean TypeScript
Frontend	React + Vite	Fast dev experience, clear separation of concerns
AI	Vercel AI SDK + Anthropic	Streaming, callbacks for observability, Sonnet for quality
Observability	SSE (Server-Sent Events)	Real-time, simpler than multiplexing
Evaluation	Claude (generateText)	Same quality as the agent, structured JSON output

What's not here (yet)

Persistent database (runs are in-memory)
User accounts / auth
Horizontal scaling (single-server only)
Production deployment config

These are things we can add when we have real users and real constraints. For an MVP, simpler is better.

Try it

Suggested first prompt:

Fetch the latest AI news from Hacker News and save the top 10 stories into a CSV file with columns for title, URL, and score.

Enable Context7 if you want the agent to look up library docs mid-execution.

Watch the timeline. See the thinking. See the tools. See the evaluation.

That's Duvo.
