An agentic automation platform. You write what you want done. An AI agent does it. You watch every step. Then we evaluate if it actually worked.
Built with Hono (backend), React + Vite (frontend), Vercel AI SDK, and Anthropic's Claude.
- Chat interface: Send a prompt. Watch the agent think through it.
- Step-by-step observability: See every reasoning token, every tool call, every API hit in real time.
- Artifacts: The agent can generate files (CSV, JSON, etc.) and you can download them.
- Context7 MCP: Optionally enable the agent to look up library docs during execution.
- Automatic evaluation: After the agent finishes, a second Claude call checks whether it actually completed the task. If not, the agent retries with corrective context where possible.
Prerequisites: Node.js 20+, npm 10+, and an Anthropic API key.

    # Clone the repo
    git clone https://github.com/mario/duvo
    cd duvo

    # Run the setup script (creates .env, installs dependencies)
    ./scripts/dev.sh

    # You'll be asked to add your ANTHROPIC_API_KEY to .env
    # Do that, then run the script again
    # The script starts both the API (port 3001) and web (port 5173)
    # Open http://localhost:5173 in your browser

- You send a message ("Fetch the latest AI news and save to CSV")
- The client generates a unique `runId` and sends it to `POST /api/chat`
- The backend streams the response using Vercel AI SDK + Claude
- Simultaneously, you connect to `GET /api/runs/:runId/events` (Server-Sent Events) to watch the timeline
- The agent runs up to 10 steps, calling tools as needed
- After it finishes, an evaluation LLM scores the result
- If the score is low, the agent retries with corrective context
- The final evaluation and any artifacts are shown in the UI
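
The first steps of this flow can be sketched from the client's side. This is a minimal sketch, not the project's actual client code: the `prepareRun` helper, the `API_BASE` constant, and the request body fields (`runId`, `prompt`) are assumptions made for illustration. Only the two routes and the API port come from the description above.

```typescript
import { randomUUID } from "node:crypto";

// Assumed base URL; the port matches the setup notes above.
const API_BASE = "http://localhost:3001";

interface PreparedRun {
  runId: string;
  chatUrl: string;   // POST target for the prompt
  eventsUrl: string; // SSE stream for the same run
  body: string;      // JSON payload; field names are hypothetical
}

// Build everything the client needs to start a run and watch its timeline.
function prepareRun(prompt: string): PreparedRun {
  const runId = randomUUID();
  return {
    runId,
    chatUrl: `${API_BASE}/api/chat`,
    eventsUrl: `${API_BASE}/api/runs/${encodeURIComponent(runId)}/events`,
    body: JSON.stringify({ runId, prompt }),
  };
}

const run = prepareRun("Fetch the latest AI news and save to CSV");
console.log(run.chatUrl, run.eventsUrl);
```

Because the client generates the `runId` itself, it can open the events stream at the same time as it sends the chat request. In the browser, `crypto.randomUUID()` is the equivalent call; the Node import only keeps the sketch runnable on its own.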
Backend (Hono, Node.js, port 3001):
- In-memory `RunStore` for run state and events
- `streamText` integration for AI agent execution
- Tool definitions: `fetch_web_page`, `fetch_hacker_news`, `save_to_csv`
- Evaluation council: automatic pass/fail check after each run
- SSE endpoint: `/api/runs/:runId/events` for real-time observability
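
To make the in-memory store concrete, here is a rough sketch of how a `RunStore` could keep events per run and feed them to SSE subscribers. It is an illustration under assumptions, not the repository's implementation: the `RunEvent` shape, the event type names, and the `append` and `subscribe` methods are all hypothetical.

```typescript
// Hypothetical event shape; the real store may carry more fields.
interface RunEvent {
  type: string; // e.g. "tool_call" or "evaluation" (assumed names)
  payload: unknown;
  timestamp: number;
}

type Listener = (event: RunEvent) => void;

class RunStore {
  private events = new Map<string, RunEvent[]>();
  private listeners = new Map<string, Set<Listener>>();

  // Record an event and push it to every live subscriber of that run.
  append(runId: string, type: string, payload: unknown): RunEvent {
    const event: RunEvent = { type, payload, timestamp: Date.now() };
    const log = this.events.get(runId) ?? [];
    log.push(event);
    this.events.set(runId, log);
    this.listeners.get(runId)?.forEach((listener) => listener(event));
    return event;
  }

  // Replay past events first, then deliver new ones as they arrive.
  // Returns an unsubscribe function to call when the SSE connection closes.
  subscribe(runId: string, listener: Listener): () => void {
    for (const event of this.events.get(runId) ?? []) listener(event);
    const set = this.listeners.get(runId) ?? new Set<Listener>();
    set.add(listener);
    this.listeners.set(runId, set);
    return () => {
      set.delete(listener);
    };
  }
}
```

Replaying stored events on subscribe means a client that connects slightly after the run starts still sees the full timeline. Everything lives in process memory, so a restart drops all runs.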
Frontend (React + Vite, port 5173):
- `useChat` hook from Vercel AI SDK for message management
- Custom `useRunEvents` hook: subscribes to SSE, displays timeline
- ObservabilityPanel: real-time event stream with thinking blocks, tool calls, evaluation results
- No auth, no backend state. Just chat and watch.
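
The timeline travels over Server-Sent Events, so it may help to see the wire format the hook ultimately consumes. The sketch below encodes and parses SSE frames by hand; in the browser, `EventSource` does the parsing for you. The `TimelineEvent` shape and both function names are assumptions for illustration, not the project's actual types.

```typescript
interface TimelineEvent {
  type: string;
  data: unknown; // must be JSON-serializable
}

// Server side: serialize one event as an SSE frame.
// A blank line terminates each frame.
function encodeSse(event: TimelineEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event.data)}\n\n`;
}

// Client side: split a buffer of complete frames back into events.
function parseSse(buffer: string): TimelineEvent[] {
  const events: TimelineEvent[] = [];
  for (const frame of buffer.split("\n\n")) {
    let type = "message"; // SSE default when no "event:" field is present
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) type = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trimStart());
    }
    // Frames without data (including the trailing empty one) are skipped.
    if (dataLines.length > 0) {
      events.push({ type, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return events;
}
```

A hook like `useRunEvents` would typically append each parsed event to React state and let the panel render the list in arrival order.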
The agent can optionally use Context7 to look up library documentation. It's off by default. Enable the checkbox in the UI to let the agent fetch docs during execution. You'll see a purple "Context7" badge in the timeline when it does.
Why optional? External connections should be explicit. You should know when the agent is reaching out to external services.
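
One way to keep the external connection explicit is to gate the tool list on a per-run flag. The following is a sketch under assumptions: the `enableContext7` option and the `selectTools` helper are hypothetical, and `context7_lookup` is a placeholder rather than the Context7 MCP server's real tool name. The three built-in tool names come from the backend description.

```typescript
const BUILT_IN_TOOLS = ["fetch_web_page", "fetch_hacker_news", "save_to_csv"] as const;
const CONTEXT7_TOOLS = ["context7_lookup"] as const; // placeholder name

interface RunOptions {
  enableContext7?: boolean; // mirrors the UI checkbox; off unless set
}

// Return the tool names the agent may call for this run.
function selectTools(options: RunOptions = {}): string[] {
  const tools: string[] = [...BUILT_IN_TOOLS];
  if (options.enableContext7 === true) tools.push(...CONTEXT7_TOOLS);
  return tools;
}

console.log(selectTools());
console.log(selectTools({ enableContext7: true }));
```

With the flag off, the agent never receives the Context7 tools, so it cannot reach the external service even if the prompt asks it to.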
After every run, a second Claude call evaluates whether the agent completed the task:
- Passed: Score 70 or higher. Result is shown to you as-is.
- Failed but auto-corrected: Score was low, agent retried, now it passes. You see "Auto-corrected before showing results."
- Failed: Score is low and the agent couldn't auto-correct. You see the failure feedback and can refine your prompt.
The score does not measure perfection; it measures whether the agent did the job. Most tasks pass on the first attempt. When one doesn't, a retry often fixes it.
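
The three outcomes above can be sketched as a small control loop. This is an illustration, not the project's evaluation code: the `Agent` and `Evaluator` function types, the `Evaluation` shape, and the `maxRetries` default of one are assumptions. Only the pass threshold of 70 and the three outcomes come from the text.

```typescript
const PASS_THRESHOLD = 70; // from the description above

interface Evaluation {
  score: number; // assumed to be on a 0-100 scale
  feedback: string;
}

interface Outcome {
  status: "passed" | "auto_corrected" | "failed";
  output: string;
  evaluation: Evaluation;
}

// Hypothetical signatures: both calls would hit Claude in the real system.
type Agent = (task: string, correctiveContext?: string) => Promise<string>;
type Evaluator = (task: string, output: string) => Promise<Evaluation>;

async function runWithEvaluation(
  task: string,
  agent: Agent,
  evaluate: Evaluator,
  maxRetries = 1,
): Promise<Outcome> {
  let output = await agent(task);
  let evaluation = await evaluate(task, output);
  if (evaluation.score >= PASS_THRESHOLD) {
    return { status: "passed", output, evaluation };
  }

  // Retry, feeding the evaluator's feedback back to the agent.
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    output = await agent(task, evaluation.feedback);
    evaluation = await evaluate(task, output);
    if (evaluation.score >= PASS_THRESHOLD) {
      return { status: "auto_corrected", output, evaluation };
    }
  }
  return { status: "failed", output, evaluation };
}
```

Passing the agent and evaluator in as functions keeps the loop testable with stubs and makes the retry budget an explicit parameter.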
Running tests:

    npm test -w api

Architecture Decisions: See docs/adr/ for the reasoning behind each choice.
| Layer | Tech | Why |
|---|---|---|
| Backend | Hono | Lightweight, great streaming, clean TypeScript |
| Frontend | React + Vite | Fast dev experience, clear separation of concerns |
| AI | Vercel AI SDK + Anthropic | Streaming, callbacks for observability, Sonnet for quality |
| Observability | SSE (Server-Sent Events) | Real-time, simpler than multiplexing |
| Evaluation | Claude (generateText) | Same quality as the agent, structured JSON output |
- Persistent database (runs are in-memory)
- User accounts / auth
- Horizontal scaling (single-server only)
- Production deployment config
These are things we can add when we have real users and real constraints. For an MVP, simpler is better.
Suggested first prompt:
Fetch the latest AI news from Hacker News and save the top 10 stories into a CSV file with columns for title, URL, and score.
Enable Context7 if you want the agent to look up library docs mid-execution.
Watch the timeline. See the thinking. See the tools. See the evaluation.
That's Duvo.