
Duvo

An agentic automation platform. You write what you want done. An AI agent does it. You watch every step. Then we evaluate if it actually worked.

Built with Hono (backend), React + Vite (frontend), Vercel AI SDK, and Anthropic's Claude.

What it does

  • Chat interface: Send a prompt. Watch the agent think through it.
  • Step-by-step observability: See every reasoning token, every tool call, every API hit in real time.
  • Artifacts: The agent can generate files (CSV, JSON, etc.) and you can download them.
  • Context7 MCP: Optionally enable the agent to look up library docs during execution.
  • Automatic evaluation: After the agent finishes, a second Claude call checks whether it actually completed the task. If it did not, the agent retries with corrective context when possible.

Running it locally

Prerequisites: Node.js 20+, npm 10+, and an Anthropic API key.

# Clone the repo
git clone https://github.com/mario/duvo
cd duvo

# Run the setup script (creates .env, installs dependencies)
./scripts/dev.sh

# You'll be asked to add your ANTHROPIC_API_KEY to .env
# Do that, then run the script again

# The script starts both the API (port 3001) and web (port 5173)
# Open http://localhost:5173 in your browser

How it works

  1. You send a message ("Fetch the latest AI news and save to CSV")
  2. The client generates a unique runId and sends it to POST /api/chat
  3. The backend streams the response using Vercel AI SDK + Claude
  4. At the same time, the client subscribes to GET /api/runs/:runId/events (Server-Sent Events) so you can watch the timeline
  5. The agent runs up to 10 steps, calling tools as needed
  6. After it finishes, an evaluation LLM scores the result
  7. If the score is low, the agent retries with corrective context
  8. The final evaluation and any artifacts are shown in the UI
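
The first four steps can be sketched from the client's side. This is an illustration rather than Duvo's actual client code: the routes and port come from this README, but the request body shape (`runId`, `messages`), the `API_BASE` constant, and the helper name `buildRunRequests` are assumptions.

```typescript
// Hypothetical helper: describes the two requests a client makes for one run.
// Only the routes (/api/chat, /api/runs/:runId/events) and port 3001 are from
// the README; the body shape is an assumption for illustration.
const API_BASE = "http://localhost:3001";

interface RunRequests {
  chat: {
    url: string;
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
  eventsUrl: string;
}

function buildRunRequests(runId: string, prompt: string): RunRequests {
  return {
    chat: {
      url: `${API_BASE}/api/chat`,
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // The same runId ties the chat request to the event stream.
      body: JSON.stringify({
        runId,
        messages: [{ role: "user", content: prompt }],
      }),
    },
    eventsUrl: `${API_BASE}/api/runs/${encodeURIComponent(runId)}/events`,
  };
}
```

In a browser, the `runId` could come from `crypto.randomUUID()`, the chat request could be sent with `fetch`, and `eventsUrl` could be passed to `new EventSource(...)`. Generating the id on the client means the event stream can be opened without waiting for the chat response.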

Architecture

Backend (Hono, Node.js, port 3001):

  • In-memory RunStore for run state and events
  • streamText integration for AI agent execution
  • Tool definitions: fetch_web_page, fetch_hacker_news, save_to_csv
  • Evaluation council: automatic pass/fail check after each run
  • SSE endpoint: /api/runs/:runId/events for real-time observability
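
To make the in-memory store concrete, here is a minimal sketch of what a `RunStore` could look like. The class name comes from the list above, but the `RunEvent` shape and the `append` and `subscribe` methods are hypothetical; the real implementation may differ.

```typescript
// Minimal in-memory run store sketch. Event shape and method names are
// hypothetical, chosen only to illustrate the idea.
type RunEvent = { type: string; payload: unknown; at: number };
type Listener = (event: RunEvent) => void;

class RunStore {
  private events = new Map<string, RunEvent[]>();
  private listeners = new Map<string, Set<Listener>>();

  // Record an event and push it to anyone currently subscribed to this run.
  append(runId: string, type: string, payload: unknown): RunEvent {
    const event: RunEvent = { type, payload, at: Date.now() };
    const list = this.events.get(runId) ?? [];
    list.push(event);
    this.events.set(runId, list);
    this.listeners.get(runId)?.forEach((listener) => listener(event));
    return event;
  }

  // Replay past events first so a late subscriber sees the full timeline,
  // then deliver new ones. Returns an unsubscribe function.
  subscribe(runId: string, listener: Listener): () => void {
    (this.events.get(runId) ?? []).forEach((event) => listener(event));
    const set = this.listeners.get(runId) ?? new Set<Listener>();
    set.add(listener);
    this.listeners.set(runId, set);
    return () => {
      set.delete(listener);
    };
  }
}
```

Replaying stored events on subscribe matters because the SSE connection and the chat request start independently, so a subscriber may connect after the first events were recorded.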

Frontend (React + Vite, port 5173):

  • useChat hook from Vercel AI SDK for message management
  • Custom useRunEvents hook: subscribes to SSE, displays timeline
  • ObservabilityPanel: real-time event stream with thinking blocks, tool calls, evaluation results
  • No auth and no persisted state: run data lives only in the backend's in-memory RunStore. Just chat and watch.
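
For readers unfamiliar with Server-Sent Events, the sketch below shows the wire format the timeline relies on. In the browser, `EventSource` does this parsing for you, so `parseSse` exists only to show what travels over the connection. The event names (`tool_call`, `evaluation`) and both function names are assumptions.

```typescript
interface SseMessage {
  event: string;
  data: string;
}

// Serialize one message in the text/event-stream wire format:
// an "event:" line, a "data:" line, then a blank line ending the frame.
function formatSse(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Parse complete frames from a buffer. Simplified: handles only the
// `event` and `data` fields and assumes "\n" line endings.
function parseSse(buffer: string): SseMessage[] {
  const messages: SseMessage[] = [];
  for (const frame of buffer.split("\n\n")) {
    let event = "message"; // default event type when none is given
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        // A single leading space after the colon is not part of the value.
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    if (dataLines.length > 0) {
      messages.push({ event, data: dataLines.join("\n") });
    }
  }
  return messages;
}
```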

Context7 MCP

The agent can optionally use Context7 to look up library documentation. It's off by default. Enable the checkbox in the UI to let the agent fetch docs during execution. You'll see a purple "Context7" badge in the timeline when it does.

Why optional? External connections should be explicit. You should know when the agent is reaching out to external services.

Evaluation Council

After every run, a second Claude call evaluates whether the agent completed the task:

  • Passed: Score 70 or higher. Result is shown to you as-is.
  • Failed but auto-corrected: Score was low, agent retried, now it passes. You see "Auto-corrected before showing results."
  • Failed: Score is low and the agent couldn't auto-correct. You see the failure feedback and can refine your prompt.

The score doesn't measure perfection; it measures whether the agent did the job. For most tasks, the agent gets it right the first time. When it doesn't, a retry often fixes it.
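
The three outcomes above map onto a small control loop. The sketch below is one way to express it, not the actual implementation: the threshold of 70 comes from this README, while the function names, the single retry (`maxRetries = 1`), and the injected `agent` and `evaluate` callbacks (stand-ins for the two Claude calls) are assumptions.

```typescript
type Outcome = "passed" | "auto_corrected" | "failed";

interface Evaluation {
  score: number;
  feedback: string;
}

interface RunResult {
  outcome: Outcome;
  result: string;
  evaluation: Evaluation;
}

// Stand-ins for the agent run and the evaluation call (hypothetical shapes).
type Agent = (prompt: string, correction?: string) => Promise<string>;
type Evaluator = (prompt: string, result: string) => Promise<Evaluation>;

const PASS_THRESHOLD = 70; // from the README: "Score 70 or higher"

async function runWithEvaluation(
  prompt: string,
  agent: Agent,
  evaluate: Evaluator,
  maxRetries = 1, // assumption: the README does not state a retry count
): Promise<RunResult> {
  let result = await agent(prompt);
  let evaluation = await evaluate(prompt, result);
  if (evaluation.score >= PASS_THRESHOLD) {
    return { outcome: "passed", result, evaluation };
  }
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // Feed the evaluator's feedback back to the agent as corrective context.
    result = await agent(prompt, evaluation.feedback);
    evaluation = await evaluate(prompt, result);
    if (evaluation.score >= PASS_THRESHOLD) {
      return { outcome: "auto_corrected", result, evaluation };
    }
  }
  return { outcome: "failed", result, evaluation };
}
```

Passing the agent and evaluator in as functions keeps the loop testable without an API key: a test can supply fakes that return fixed scores.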

Development

Running tests:

npm test -w api

Architecture Decisions: See docs/adr/ for the reasoning behind each choice.

Tech stack at a glance

| Layer | Tech | Why |
| --- | --- | --- |
| Backend | Hono | Lightweight, great streaming, clean TypeScript |
| Frontend | React + Vite | Fast dev experience, clear separation of concerns |
| AI | Vercel AI SDK + Anthropic | Streaming, callbacks for observability, Sonnet for quality |
| Observability | SSE (Server-Sent Events) | Real-time, simpler than multiplexing |
| Evaluation | Claude (generateText) | Same quality as the agent, structured JSON output |

What's not here (yet)

  • Persistent database (runs are in-memory)
  • User accounts / auth
  • Horizontal scaling (single-server only)
  • Production deployment config

These are things we can add when we have real users and real constraints. For an MVP, simpler is better.

Try it

Suggested first prompt:

Fetch the latest AI news from Hacker News and save the top 10 stories into a CSV file with columns for title, URL, and score.
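
The artifact this prompt asks for has a simple shape. As a rough sketch of the CSV serialization involved, assuming a hypothetical `Story` type and `toCsv` helper (the actual `save_to_csv` tool's signature is not documented here):

```typescript
// Hypothetical row shape matching the columns named in the prompt.
interface Story {
  title: string;
  url: string;
  score: number;
}

// Quote a field when it contains a comma, double quote, or line break,
// doubling any embedded quotes (the RFC 4180 quoting convention).
function escapeCsv(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function toCsv(stories: Story[]): string {
  const header = "title,url,score";
  const rows = stories.map((s) =>
    [escapeCsv(s.title), escapeCsv(s.url), String(s.score)].join(","),
  );
  return [header, ...rows].join("\n");
}
```

Quoting matters here because story titles often contain commas or quotation marks, which would otherwise shift columns in the downloaded file.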

Enable Context7 if you want the agent to look up library docs mid-execution.

Watch the timeline. See the thinking. See the tools. See the evaluation.

That's Duvo.
