Local terminal Deep Agent for pharma marketing and sales analysis.
This MVP is intentionally simple:
- single-user local workflow,
- CSV dummy data generated with realistic patterns,
- SQLite backend,
- orchestrator + subagents (Dexavir, Nabinix),
- shared datasets only (no brand-unique datasets in MVP),
- SQL tools with practical error handling.
Supported question types:
- KPI trend and change analysis (TRx, NBRx, calls, impressions, engagement).
- Brand-specific and cross-period questions.
- Schema exploration questions (table/column discovery).
- Call plan metadata questions (for example, the latest cycle by brand).
Project layout:

    pharma-insights-agent/
      AGENTS.md
      agent.py
      subagents/brands.yaml
      tools/sql_tools.py
      skills/
        rx-dataset/SKILL.md
        veeva-call-dataset/SKILL.md
        digital-engagement-dataset/SKILL.md
        call-plan-dataset/SKILL.md
        finance-dataset/SKILL.md
        inventory-dataset/SKILL.md
        sales-analysis/SKILL.md
        veeva-crm-analysis/SKILL.md
        digital-engagement-analysis/SKILL.md
      scripts/
        generate_dummy_data.py
        build_sqlite.py
      data/
        raw/*.csv
        pharma_mvp.db
      eval/questions.md
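The layout implies a two-step data pipeline: CSV files under data/raw/ are loaded into data/pharma_mvp.db by scripts/build_sqlite.py. The sketch below shows one way such a loader could work; it is not the contents of build_sqlite.py. The function name load_csvs_into_sqlite, the one-table-per-file naming, and the all-TEXT column types are assumptions chosen to keep the example short.

```python
import csv
import sqlite3
from pathlib import Path


def _quote(name: str) -> str:
    """Quote an SQLite identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_csvs_into_sqlite(raw_dir: Path, db_path: Path) -> list[str]:
    """Hypothetical loader: one table per CSV file, named after the file stem.

    Every column is declared TEXT, so numeric casting is left to the SQL layer.
    Tables with the same name are dropped and rebuilt on each run.
    """
    db_path.parent.mkdir(parents=True, exist_ok=True)
    loaded: list[str] = []
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # the connection context manager commits on success
            for csv_path in sorted(raw_dir.glob("*.csv")):
                table = _quote(csv_path.stem)
                with csv_path.open(newline="") as f:
                    reader = csv.reader(f)
                    header = next(reader)  # first row holds the column names
                    columns = ", ".join(f"{_quote(col)} TEXT" for col in header)
                    placeholders = ", ".join("?" for _ in header)
                    conn.execute(f"DROP TABLE IF EXISTS {table}")
                    conn.execute(f"CREATE TABLE {table} ({columns})")
                    # Remaining rows are inserted as parameterized values.
                    conn.executemany(
                        f"INSERT INTO {table} VALUES ({placeholders})", reader
                    )
                loaded.append(csv_path.stem)
    finally:
        conn.close()
    return loaded
```

Called as load_csvs_into_sqlite(Path("data/raw"), Path("data/pharma_mvp.db")), it would rebuild one table per CSV. Declaring real column types (dates, integers) would make KPI queries simpler, at the cost of a per-dataset schema.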
- Create environment and install:
cd path/to/pharma_data_agent
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .
- Configure environment:
cp .env.example .env
# add your API key(s)
- Generate dummy CSV data:
python scripts/generate_dummy_data.py
- Build the SQLite DB from the CSV files:
python scripts/build_sqlite.py
- Ask a question:
python agent.py "What is the change in engagement and impressions for Tier 1 HCPs between Q3 and Q2 of 2025 for Dexavir?"
- For full multi-turn conversation testing:
python agent.py --interactive
- Optional terminal visibility levels:
# default
python agent.py --interactive --visibility standard
# final answers only
python agent.py --interactive --visibility quiet
# full tool-call detail (includes full SQL in tool call logs)
python agent.py --interactive --visibility debug

In standard and debug modes, each turn prints:
- [TURN] user query
- [ORCH] planning, delegation, and synthesis
- [SUBAGENT:<name>] start and completion
- [TOOL] and [TOOL:OK|ERR] execution events
- [SKILL] skill load notices (best-effort, emitted when skill files are read)
- [ANSWER] final response
- [AUDIT] per-turn summary (subagents, skills, datasets touched, tools, SQL count, retries, errors, duration)
- [AUDIT:WARN] validation warnings (for example, SQL executed without matching dataset skill loads, or discover_schema overuse without a fallback need)
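The two [AUDIT:WARN] checks can be pictured as set comparisons over what a turn touched. The sketch below is hypothetical and is not taken from agent.py: the TurnAudit record, the table names in TABLE_TO_SKILL, and the rule that discover_schema is justified only after a SQL error or on a table with no dataset skill are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical table-to-skill mapping; real table names live in the dataset skills.
TABLE_TO_SKILL = {
    "rx": "rx-dataset",
    "veeva_calls": "veeva-call-dataset",
    "digital_engagement": "digital-engagement-dataset",
    "call_plan": "call-plan-dataset",
}


@dataclass
class TurnAudit:
    """What one turn touched, as an audit hook might record it (assumed shape)."""
    skills_loaded: set[str] = field(default_factory=set)
    tables_queried: set[str] = field(default_factory=set)
    schema_tool_calls: int = 0
    sql_errors: int = 0


def audit_warnings(turn: TurnAudit) -> list[str]:
    """Return [AUDIT:WARN]-style messages for one turn."""
    warnings: list[str] = []
    # Check 1: a queried table with a known dataset skill needs that skill loaded.
    for table in sorted(turn.tables_queried):
        skill = TABLE_TO_SKILL.get(table)
        if skill is not None and skill not in turn.skills_loaded:
            warnings.append(f"SQL on {table!r} without loading skill {skill!r}")
    # Check 2: schema tools are fallback-only. Assumed fallback need: a SQL error
    # occurred, or a queried table has no dataset skill describing it.
    unmapped = any(t not in TABLE_TO_SKILL for t in turn.tables_queried)
    if turn.schema_tool_calls and not (turn.sql_errors or unmapped):
        warnings.append(
            f"discover_schema called {turn.schema_tool_calls}x without fallback need"
        )
    return warnings
```

Schema-exploration questions legitimately call schema tools, so a production check would also need to know the question type before flagging check 2.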
- The orchestrator does not have direct SQL tools.
- The orchestrator does not load dataset or analysis skills.
- SQL/data work is delegated to subagents via the task tool.
- Brand-specific queries should route to dexavir-analyst or nabinix-analyst.
- Cross-brand analytics should be split across both brand analysts and synthesized by the orchestrator.
- Brand-agnostic analytics, schema/metadata questions, and data-access questions should also be delegated to both brand analysts and synthesized by the orchestrator.
- Dataset skills are the default schema source for analysis queries; schema tools are fallback-only.
- Subagents are responsible for:
  - dataset selection
  - loading matching dataset skills
  - SQL execution
  - validated response assembly
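The routing rules above reduce to one decision: exactly one brand named means one analyst, and anything else (cross-brand, brand-agnostic, schema, metadata, data access) fans out to both. A minimal sketch, assuming keyword matching on brand names stands in for the orchestrator's LLM-based planning; route() and BRAND_ANALYSTS are hypothetical, and only the subagent names come from this README.

```python
import re

# Subagent names from the routing rules; matching brands by keyword is a
# simplifying assumption (the real orchestrator plans with an LLM).
BRAND_ANALYSTS = {
    "dexavir": "dexavir-analyst",
    "nabinix": "nabinix-analyst",
}


def route(query: str) -> list[str]:
    """Return the subagents a query should be delegated to."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    mentioned = [name for brand, name in BRAND_ANALYSTS.items() if brand in words]
    if len(mentioned) == 1:
        # Brand-specific question: a single brand analyst handles it.
        return mentioned
    # Cross-brand or brand-agnostic question: delegate to both analysts and
    # let the orchestrator synthesize their answers.
    return list(BRAND_ANALYSTS.values())
```

The sketch only chooses targets; consistent with the rules above, it never touches SQL itself.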
This app supports LangSmith tracing for Deep Agents.
- Set environment variables in .env:
LANGSMITH_API_KEY=...
LANGSMITH_TRACING=true
LANGSMITH_PROJECT=pharma-insights-mvp
- Run the app in interactive mode for full conversation tracing:
python agent.py --interactive
- In LangSmith, verify:
  - runs appear under project pharma-insights-mvp,
  - one session groups multiple turns,
  - nested spans include tool calls and subagent calls,
  - SQL tool calls are visible end-to-end.
- Prompts, tool inputs, and tool outputs are captured in traces.
- Do not include secrets or sensitive personal data in prompts.
Optional programmatic controls (for example, custom tracing_context wrapping or custom environment variables for tags and metadata) are intentionally not implemented in this MVP.
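Because tracing is driven entirely by environment variables, a pre-flight check can catch a missing setting before a long interactive session. This is a hypothetical helper script, not part of the app; parse_dotenv is a deliberately minimal KEY=VALUE reader (no quoting or export support) standing in for whatever .env loader the app actually uses.

```python
import os
from collections.abc import Mapping
from pathlib import Path

REQUIRED_VARS = ("LANGSMITH_API_KEY", "LANGSMITH_TRACING", "LANGSMITH_PROJECT")


def parse_dotenv(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE reader: skips blanks and # comments, no quote handling."""
    env: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env


def tracing_problems(env: Mapping[str, str]) -> list[str]:
    """List missing or suspicious LangSmith settings."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    tracing = env.get("LANGSMITH_TRACING", "")
    if tracing and tracing.lower() != "true":
        problems.append("LANGSMITH_TRACING should be 'true'")
    return problems


if __name__ == "__main__":
    # Values already exported in the shell take precedence over .env entries.
    dotenv = Path(".env")
    env = parse_dotenv(dotenv.read_text()) if dotenv.exists() else {}
    env.update(os.environ)
    problems = tracing_problems(env)
    for problem in problems:
        print(f"WARN: {problem}")
    if not problems:
        print("LangSmith tracing variables are set.")
```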
- Default DB path is data/pharma_mvp.db.
- Default model is anthropic:claude-sonnet-4-5-20250929 unless PHARMA_AGENT_MODEL is set.
- SQL execution is read-only: only SELECT, WITH, PRAGMA, and EXPLAIN statements are accepted.
- Responses are configured to include both the SQL query text and an evidence summary.
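The read-only rule can be enforced in two layers: an allowlist on the statement's first keyword, plus a SQLite connection opened with mode=ro so the engine itself rejects writes that slip past the allowlist (for example, WITH ... DELETE, which starts with an allowed keyword). This is a sketch of the idea, not the code in tools/sql_tools.py; run_read_only, first_keyword, and resolve_model are hypothetical names, while the default path, default model, and allowed keywords are the ones listed above.

```python
import os
import re
import sqlite3
from pathlib import Path

# Defaults stated in the README; the helpers below are illustrative.
DEFAULT_DB_PATH = Path("data/pharma_mvp.db")
DEFAULT_MODEL = "anthropic:claude-sonnet-4-5-20250929"
ALLOWED_KEYWORDS = {"SELECT", "WITH", "PRAGMA", "EXPLAIN"}

# Leading whitespace, -- line comments, and /* block comments */ are skipped.
_LEADING_NOISE = re.compile(r"^(?:\s|--[^\n]*(?:\n|$)|/\*.*?\*/)*", re.S)


def resolve_model() -> str:
    """Use PHARMA_AGENT_MODEL when set, else the default model."""
    return os.environ.get("PHARMA_AGENT_MODEL", DEFAULT_MODEL)


def first_keyword(sql: str) -> str:
    """Return the first SQL keyword in upper case, or '' if there is none."""
    body = _LEADING_NOISE.sub("", sql, count=1)
    match = re.match(r"[A-Za-z]+", body)
    return match.group(0).upper() if match else ""


def run_read_only(sql: str, db_path: Path = DEFAULT_DB_PATH) -> list[tuple]:
    """Reject non-allowlisted statements, then run on a read-only connection."""
    keyword = first_keyword(sql)
    if keyword not in ALLOWED_KEYWORDS:
        raise ValueError(
            f"Only {sorted(ALLOWED_KEYWORDS)} statements are allowed, got {keyword or 'nothing'!r}"
        )
    # mode=ro makes SQLite refuse writes even if the keyword check is fooled.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

The keyword check gives a clear error message for obvious writes, while the read-only connection is the actual safety boundary.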