Guardian is an enterprise-grade AI-powered penetration testing automation framework that combines multiple AI providers (OpenAI GPT-4, Claude, Google Gemini, OpenRouter) with battle-tested security tools to deliver intelligent, adaptive security assessments with comprehensive evidence capture.
Features • Installation • Quick Start • Documentation • Contributing
Guardian is designed exclusively for authorized security testing and educational purposes.
- ✅ Legal Use: Authorized penetration testing, security research, educational environments
- ❌ Illegal Use: Unauthorized access, malicious activities, any form of cyber attack
You are fully responsible for ensuring you have explicit written permission before testing any system. Unauthorized access to computer systems is illegal under laws including the Computer Fraud and Abuse Act (CFAA), GDPR, and equivalent international legislation.
By using Guardian, you agree to use it only on systems you own or have explicit authorization to test.
- 6 AI Providers Supported: OpenAI (GPT-4o), Anthropic (Claude), Google (Gemini), OpenRouter, Ollama (local), OpenAI-compatible (vLLM, LM Studio, Together, Groq)
- Plugin Provider Contract: Third-party providers ship via `[project.entry-points."guardian.providers"]`; no fork required
- Multi-Agent Architecture: Specialized AI agents (Planner, Tool Selector, Analyst, Reporter) plus debate triage roles (Red Advocate, Blue Advocate, Judge) and Visual Triage
- Multi-Agent Debate Triage: Three-role red/blue/judge debate on ambiguous findings; F1 ≥ single-agent baseline + 5 pp
- Vision-LLM Visual Triage: Headless screenshot capture + image-grounded analyst enrichment via gpt-4o / Claude 3.5+ / Gemini 1.5+
- RAG Knowledge Base: SQLite + FTS5 grounded retrieval over CVE / CWE / MITRE ATT&CK feeds; curbs hallucinated CVE references
- Judge Model Routing: `think_deeply` swap-and-restore; big model thinks, small model judges, ~10x cost reduction
- Learned Tool Selection: Offline ranker trained on session telemetry; abstains when low-confidence and falls back to LLM selector
- Adaptive Testing: AI adjusts tactics based on discovered vulnerabilities and prior tool yields
- False Positive Filtering: Debate triage cuts noise; cheap path skips when fp_probability is decisive
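The "abstain and fall back" behaviour of the learned tool selector can be pictured with a minimal sketch. Everything here is an assumption for illustration: the function name `select_tool`, the score dictionary, and the `min_confidence` threshold are hypothetical and are not taken from Guardian's code.

```python
from typing import Callable


def select_tool(
    candidates: list[str],
    ranker_scores: dict[str, float],
    llm_selector: Callable[[list[str]], str],
    min_confidence: float = 0.6,  # hypothetical abstention threshold
) -> tuple[str, str]:
    """Return (tool, source), where source is 'ranker' or 'llm'."""
    # Score every candidate; tools the ranker has never seen score 0.0.
    scored = [(ranker_scores.get(tool, 0.0), tool) for tool in candidates]
    if scored:
        best_score, best_tool = max(scored)
        if best_score >= min_confidence:
            return best_tool, "ranker"
    # The ranker abstains: defer to the (more expensive) LLM selector.
    return llm_selector(candidates), "llm"
```

With scores `{"nmap": 0.9}` the ranker answers directly; with only low scores the call is routed to `llm_selector`, which is the cost-saving idea described in the list above.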
50 Integrated Security Tools across 16 categories:
| Category | Tools |
|---|---|
| Network | nmap, masscan |
| Web Reconnaissance | httpx, whatweb, wafw00f, cmseek |
| Subdomain / DNS | subfinder, amass, dnsrecon |
| Vulnerability Scanning | nuclei, nikto, sqlmap, wpscan |
| SSL/TLS Testing | testssl, sslyze |
| Content Discovery | gobuster, ffuf, arjun |
| Security Analysis | xsstrike, gitleaks |
| Cloud / Container / SBOM | trivy, grype, syft, scoutsuite, prowler, kube-bench |
| Modern Web + OSINT | graphw00f, clairvoyance, jwt_tool, shodan, theharvester |
| SAST + Secrets (B11) | semgrep, trufflehog, dependency-check |
| API Fuzzers (B10) | schemathesis, cariddi, restler |
| Burp/ZAP Bridge (B13) | zap, burp |
| LLM Red-Team (B12) | garak, pyrit, prompt_fuzz |
| Mobile Android (B9) | mobsf, apkleaks, objection |
| Active Directory (B8) | crackmapexec, bloodhound, kerbrute, impacket-secretsdump |
| Vision Evidence (A3) | playwright_screenshot |
- Execution Traceability: Every finding linked to its source tool execution via `execution_id`
- Complete Command History: Full tool output preserved with each finding
- Raw Evidence Storage: Output snippets bound to findings
- Visual Evidence: Screenshots captured per URL, attached to web findings
- Session Reconstruction: Atomic-checkpointed `session_<id>.json` enables `--resume`
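One common way to get an atomic checkpoint is to write to a temporary file and rename it over the target. The sketch below assumes that approach; the `completed` key and the helper names are hypothetical, and Guardian's real session schema may differ.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(session_dir: Path, session_id: str, state: dict) -> Path:
    """Write session_<id>.json so readers never observe a half-written file."""
    session_dir.mkdir(parents=True, exist_ok=True)
    target = session_dir / f"session_{session_id}.json"
    fd, tmp_name = tempfile.mkstemp(dir=session_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def load_checkpoint(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))


def next_step(steps: list[str], state: dict) -> Optional[str]:
    """Return the first step not recorded as completed (hypothetical 'completed' key)."""
    done = set(state.get("completed", []))
    for step in steps:
        if step not in done:
            return step
    return None
```

A resume would then load the checkpoint and continue from `next_step(...)`.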
- DAG Scheduler: Steps with `depends_on` run in parallel up to `max_parallel_tools`
- Jinja2 Templates (sandboxed): `parameters: {key: "{{ <id>.parsed.alive_hosts }}"}` resolves against prior step results
- Conditional Steps: `when:` clauses gate execution on prior output
- Resume: `--resume` picks up after the last completed step
- Parameter Priority: Workflow YAML > config block > tool defaults
- Custom Agents: `agent: debate | visual | analyst` on analysis steps
- Multiple Report Formats: Markdown, HTML, JSON
- SARIF v2.1.0: GitHub-friendly, includes `security-severity`, dedup `fingerprints` from `execution_id`
- DefectDojo: Direct REST upload
- Slack: Webhook posts with severity colour-coding
- Triggered via: `guardian report --export sarif --export defectdojo --export slack`
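To make the SARIF bullet concrete, here is a hedged sketch of what a minimal SARIF v2.1.0 document with a `security-severity` property and an `execution_id`-derived fingerprint could look like. The finding keys (`rule_id`, `title`, `severity`, `target`, `execution_id`), the severity-to-score mapping, and the fingerprint key name are all assumptions, not Guardian's actual exporter.

```python
import hashlib

# Illustrative mapping only; the real exporter may derive this from CVSS.
SEVERITY_SCORE = {"critical": "9.5", "high": "8.0", "medium": "5.0", "low": "2.0"}


def to_sarif(findings: list[dict]) -> dict:
    """Build a minimal SARIF 2.1.0 log from hypothetical finding dicts."""
    rules: dict[str, dict] = {}
    results = []
    for finding in findings:
        rule_id = finding["rule_id"]
        rules.setdefault(rule_id, {
            "id": rule_id,
            "shortDescription": {"text": finding["title"]},
            # GitHub code scanning reads this numeric string from rule properties.
            "properties": {"security-severity": SEVERITY_SCORE[finding["severity"]]},
        })
        digest = hashlib.sha256(
            f"{finding['execution_id']}:{rule_id}".encode("utf-8")
        ).hexdigest()
        results.append({
            "ruleId": rule_id,
            "message": {"text": finding["title"]},
            "locations": [{
                "physicalLocation": {"artifactLocation": {"uri": finding["target"]}}
            }],
            # Stable per execution, so re-uploads can be deduplicated.
            "partialFingerprints": {"guardianExecution/v1": digest},
        })
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "Guardian", "rules": list(rules.values())}},
            "results": results,
        }],
    }
```

Serialising the returned dict with `json.dumps` yields a file that can be uploaded wherever SARIF is accepted.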
- DNS-Resolve Scope Validation: Closes SSRF-class bypass; private RFC1918 ranges blacklisted
- Prompt-Injection Defense: All tool output wrapped via `<UNTRUSTED_TOOL_OUTPUT>` delimiters + ANSI strip
- API Key Scrubbing: Logs and reports redact secrets at write time
- Confirmation Gate: Active+ tools (intrusive/destructive) require explicit user approval
- Audit Logging: Rotating logs of every AI decision and action
- Safe Mode: Prevents destructive actions by default
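The DNS-resolve scope check in the list above can be sketched as follows: a hostname is resolved first, and every resulting address is compared against the blacklisted networks, so a public-looking name that points at an internal address is rejected. The function names and the injectable `resolver` parameter are assumptions made for this illustration; the default blacklist mirrors the example config in this README.

```python
import ipaddress
import socket
from typing import Callable, Iterable

DEFAULT_BLACKLIST = ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")


def resolve_all(host: str) -> list[str]:
    """Resolve a hostname to every address the system resolver returns."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def in_scope(
    host: str,
    blacklist: Iterable[str] = DEFAULT_BLACKLIST,
    resolver: Callable[[str], list[str]] = resolve_all,
) -> bool:
    """Return False if the host, or anything it resolves to, is blacklisted."""
    networks = [ipaddress.ip_network(cidr) for cidr in blacklist]
    try:
        addresses = [ipaddress.ip_address(host)]  # literal IP, no DNS needed
    except ValueError:
        try:
            addresses = [ipaddress.ip_address(a) for a in resolver(host)]
        except OSError:
            return False  # unresolvable names are treated as out of scope
    if not addresses:
        return False
    return not any(addr in net for addr in addresses for net in networks)
```

Checking the resolved addresses rather than the hostname string is what closes the bypass where an attacker-controlled DNS record points at a private range.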
- CVSS v3.1 Recomputation: Validates claimed scores against vector math; flags drift
- Executive Summaries: Non-technical overviews
- Technical Deep-Dives: Findings with evidence, CVSS, CWE, CVE, MITRE technique
- AI Decision Traces: Token usage, cost, thinking-chain ledger per agent
- Visual Triage Sections: Image-grounded enrichment baked into descriptions
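For the CVSS v3.1 recomputation mentioned in this list, the base-score arithmetic from the public CVSS v3.1 specification is short enough to sketch. The `score_drift` helper and its `tolerance` default are hypothetical; how Guardian actually flags drift is not shown in this README.

```python
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": CIA, "I": CIA, "A": CIA,
}
# Privileges Required depends on whether Scope is Unchanged (U) or Changed (C).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (scaled // 10000 + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/AV:N/...' vector string."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("only CVSS:3.1 vectors are handled in this sketch")
    m = dict(part.split(":", 1) for part in parts)
    scope = m["S"]
    iss = 1 - (1 - WEIGHTS["C"][m["C"]]) * (1 - WEIGHTS["I"][m["I"]]) * (1 - WEIGHTS["A"][m["A"]])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
        * PR_WEIGHTS[scope][m["PR"]] * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))


def score_drift(vector: str, claimed: float, tolerance: float = 0.05) -> tuple[float, bool]:
    """Return (recomputed score, True if the claimed score disagrees)."""
    recomputed = base_score(vector)
    return recomputed, abs(recomputed - claimed) > tolerance
```

For example, `score_drift("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H", 7.5)` recomputes 9.8 and reports drift.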
- Async Throughout: Tool exec via `asyncio` subprocess; agents async
- Lazy Tool Loading: 50 tools registered, none imported until needed; `--help` stays under 500ms
- Parallel DAG Execution: Independent steps run concurrently per generation
- Workflow Automation: 13+ shipped workflows (recon, web, network, AD, mobile, LLM red-team, SAST, API)
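The "per generation" parallelism can be illustrated with a small scheduler: steps whose dependencies are all satisfied form one generation, each generation runs concurrently, and a semaphore caps concurrency at `max_parallel_tools`. This is a sketch under those assumptions; the step graph shape and the `run_step` callback are hypothetical stand-ins for Guardian's workflow engine.

```python
import asyncio
from typing import Any, Awaitable, Callable


def generations(steps: dict[str, list[str]]) -> list[list[str]]:
    """Group step names so each generation depends only on earlier ones."""
    remaining = {name: set(deps) for name, deps in steps.items()}
    done: set[str] = set()
    order: list[list[str]] = []
    while remaining:
        ready = sorted(name for name, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown step in depends_on")
        order.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return order


async def run_dag(
    steps: dict[str, list[str]],
    run_step: Callable[[str], Awaitable[Any]],
    max_parallel_tools: int = 3,
) -> dict[str, Any]:
    """Run each generation concurrently, bounded by max_parallel_tools."""
    limit = asyncio.Semaphore(max_parallel_tools)
    results: dict[str, Any] = {}

    async def guarded(name: str) -> None:
        async with limit:
            results[name] = await run_step(name)

    for generation in generations(steps):
        await asyncio.gather(*(guarded(name) for name in generation))
    return results
```

A graph such as `{"httpx": [], "nmap": [], "nuclei": ["httpx"], "report": ["nuclei", "nmap"]}` would run `httpx` and `nmap` together, then `nuclei`, then `report`.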
- Python 3.11 or higher (Download)
- AI Provider API Key (Choose one):
- OpenAI API Key (Get it here)
- Anthropic API Key (Get it here)
- Google AI Studio API Key (Get it here)
- OpenRouter API Key (Get it here)
- Git (for cloning repository)
Guardian can intelligently use these tools if installed:
| Tool | Purpose | Installation |
|---|---|---|
| nmap | Port scanning | apt install nmap / choco install nmap |
| masscan | Ultra-fast scan | apt install masscan / Build from source |
| httpx | HTTP probing | go install github.com/projectdiscovery/httpx/cmd/httpx@latest |
| subfinder | Subdomain enum | go install github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest |
| amass | Network mapping | go install github.com/owasp-amass/amass/v4/...@master |
| nuclei | Vuln scanning | go install github.com/projectdiscovery/nuclei/v3/cmd/nuclei@latest |
| whatweb | Tech fingerprint | gem install whatweb / apt install whatweb |
| wafw00f | WAF detection | pip install wafw00f |
| nikto | Web vuln scan | apt install nikto |
| sqlmap | SQL injection | pip install sqlmap / apt install sqlmap |
| wpscan | WordPress scan | gem install wpscan |
| testssl | SSL/TLS testing | Download from testssl.sh |
| sslyze | SSL/TLS analysis | pip install sslyze |
| gobuster | Directory brute | go install github.com/OJ/gobuster/v3@latest |
| ffuf | Web fuzzing | go install github.com/ffuf/ffuf/v2@latest |
| arjun | Parameter discovery | pip install arjun |
| xsstrike | Advanced XSS | git clone https://github.com/s0md3v/XSStrike |
| gitleaks | Secret scanning | go install github.com/zricethezav/gitleaks/v8@latest |
| cmseek | CMS detection | pip install cmseek |
| dnsrecon | DNS enumeration | pip install dnsrecon |
Note: Guardian works without external tools but with limited scanning capabilities. The AI will adapt based on available tools.
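A quick way to see which of the optional tools are on your `PATH` is the standard-library `shutil.which`. The helper below is an illustrative sketch, not part of Guardian's CLI; the `which` parameter exists only so the lookup can be swapped out in tests.

```python
import shutil
from typing import Callable, Iterable, Optional


def available_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the subset of tool names that resolve to an executable on PATH."""
    return [name for name in names if which(name) is not None]


if __name__ == "__main__":
    wanted = ["nmap", "httpx", "nuclei", "subfinder"]
    found = available_tools(wanted)
    print("available:", ", ".join(found) or "none")
    print("missing:", ", ".join(n for n in wanted if n not in found) or "none")
```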
git clone https://github.com/zakirkun/guardian-cli.git
cd guardian-cli

Linux/macOS:
python3 -m venv venv
source venv/bin/activate
pip install -e .

Windows:
python -m venv venv
.\venv\Scripts\activate
pip install -e .

Guardian supports multiple AI providers. Configure your preferred provider in config/guardian.yaml:
# config/guardian.yaml
ai:
  # Choose your provider: openai, claude, gemini, or openrouter
  provider: openai
  # OpenAI Configuration (recommended)
  openai:
    model: gpt-4o
    api_key: sk-your-api-key-here  # Or set OPENAI_API_KEY env var
  # Claude Configuration
  claude:
    model: claude-3-5-sonnet-20241022
    api_key: null  # Or set ANTHROPIC_API_KEY env var
  # Gemini Configuration
  gemini:
    model: gemini-2.5-pro
    api_key: null  # Or set GOOGLE_API_KEY env var
  # OpenRouter Configuration
  openrouter:
    model: anthropic/claude-3.5-sonnet
    api_key: null  # Or set OPENROUTER_API_KEY env var

Or use environment variables:
# Linux/macOS
export OPENAI_API_KEY="sk-your-key-here"
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
export GOOGLE_API_KEY="your-gemini-key"
export OPENROUTER_API_KEY="your-router-key"
# Windows PowerShell
$env:OPENAI_API_KEY="sk-your-key-here"
$env:ANTHROPIC_API_KEY="sk-ant-your-key-here"

# Verify installation
python -m cli.main --help
# Check AI provider status
python -m cli.main models

# List available workflows
python -m cli.main workflow list
# View AI providers and models
python -m cli.main models
# Run with specific provider
python -m cli.main workflow run --name web_pentest --target example.com --provider openai

# Fast security check with evidence capture
python -m cli.main workflow run --name web_pentest --target https://dvwa.csalab.app

Expected Output:
- ✅ HTTP discovery with httpx
- ✅ Vulnerability scan with nuclei
- ✅ Full evidence linking (commands + outputs)
- ✅ Markdown report with findings

# Full network penetration test
python -m cli.main workflow run --name network --target 192.168.1.0/24

# Run with workflow-specific parameters
# Parameters in workflow YAML override config defaults
python -m cli.main workflow run --name web_pentest --target example.com

Workflow Parameter Priority:
- Workflow YAML parameters (highest priority)
- Config file parameters
- Tool defaults (lowest priority)
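The priority order above amounts to a layered dictionary merge in which later layers win. The sketch below uses the httpx values that appear in this README's config and workflow examples; the function name and the tool-default values are hypothetical.

```python
def resolve_parameters(
    tool_defaults: dict,
    config_params: dict,
    workflow_params: dict,
) -> dict:
    """Merge three layers; workflow beats config, config beats tool defaults."""
    return {**tool_defaults, **config_params, **workflow_params}


# Hypothetical tool defaults; the config and workflow values mirror this README.
tool_defaults = {"threads": 25, "timeout": 5, "follow_redirects": True}
config_params = {"threads": 50, "timeout": 10, "tech_detect": True}
workflow_params = {"threads": 100, "timeout": 15}

effective = resolve_parameters(tool_defaults, config_params, workflow_params)
```

Here `effective["threads"]` comes from the workflow, `tech_detect` from the config file, and `follow_redirects` from the tool default, since no higher layer sets it.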
# Create HTML report with evidence
python -m cli.main report --session 20260203_175905 --format html

# Use OpenAI GPT-4
python -m cli.main workflow run --name web_pentest --target example.com --provider openai
# Use Claude
python -m cli.main workflow run --name web_pentest --target example.com --provider claude
# Use Gemini
python -m cli.main workflow run --name web_pentest --target example.com --provider gemini
# Local Ollama (no cloud)
OLLAMA_HOST=http://localhost:11434 python -m cli.main workflow run --name recon --target scanme.nmap.org --provider ollama
# Any OpenAI-compatible endpoint (vLLM, LM Studio, Together, Groq)
python -m cli.main workflow run --name web_pentest --target example.com --provider openai_compatible

# Seed bundled offline corpus
python -m cli.main kb seed
# Show corpus stats
python -m cli.main kb status
# Ad-hoc retrieval
python -m cli.main kb query "log4j JNDI" --top 5
# Ingest external feed (NVD JSON / MITRE STIX / nuclei metadata)
python -m cli.main kb update --kind cve --file ./nvd-2025.json

Enable analyst grounding in config/guardian.yaml:
rag:
  enabled: true
  top_k: 5

# Workflow YAML uses agent: debate on an analysis step
python -m cli.main workflow run --name web_pentest_with_debate --target https://example.com

Three roles (red advocate, blue advocate, judge) debate ambiguous findings only; confident verdicts skip the debate to bound token cost.
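The "debate only when ambiguous" rule might be sketched like this. The `fp_probability` field name comes from the feature list; the thresholds, verdict strings, and the shape of the three role callbacks are assumptions made for illustration only.

```python
from typing import Callable

Advocate = Callable[[dict], str]
Judge = Callable[[dict, str, str], str]


def triage(
    finding: dict,
    red: Advocate,
    blue: Advocate,
    judge: Judge,
    low: float = 0.2,   # hypothetical: at or below this, keep the finding
    high: float = 0.8,  # hypothetical: at or above this, drop the finding
) -> tuple[str, bool]:
    """Return (verdict, debated). The debate runs only for ambiguous findings."""
    p = finding["fp_probability"]
    if p <= low:
        return "true_positive", False   # confident verdict, no tokens spent
    if p >= high:
        return "false_positive", False  # confident verdict, no tokens spent
    # Ambiguous: each advocate argues its side, then the judge decides.
    red_case = red(finding)
    blue_case = blue(finding)
    return judge(finding, red_case, blue_case), True
```

Because the two cheap branches return before any role is called, token cost grows only with the number of ambiguous findings.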
# Captures full-page screenshots and feeds them to a vision-capable provider
python -m cli.main workflow run --name web_visual_pentest --target https://example.com --provider openai

Requires playwright: `pip install playwright && python -m playwright install chromium`. Skipped silently when the active provider has no vision support.
# SARIF (GitHub code-scanning friendly)
python -m cli.main report --session 20260203_175905 --export sarif
# Multiple sinks at once
python -m cli.main report --session 20260203_175905 --export sarif --export defectdojo --export slack \
  --slack-webhook https://hooks.slack.com/services/...

# Anonymise sessions into JSONL (no raw targets, no commands, no secrets)
python -m cli.main telemetry export ./reports --out telemetry.jsonl
# Train the offline tool ranker
python -m cli.main telemetry train telemetry.jsonl
# Inspect what the ranker learned
python -m cli.main telemetry status

Enable in config:
ai:
  use_learned_ranker: true  # ToolAgent calls ranker before LLM selector

Windows Users: Use `python -m cli.main` instead of `guardian`
Edit config/guardian.yaml to customize Guardian's behavior:
# AI Configuration
ai:
  provider: openai  # openai, claude, gemini, openrouter
  openai:
    model: gpt-4o
    api_key: sk-your-key  # Or use OPENAI_API_KEY env var
  claude:
    model: claude-3-5-sonnet-20241022
    api_key: null
  gemini:
    model: gemini-2.5-pro
    api_key: null
  temperature: 0.2
  max_tokens: 8000
# Penetration Testing Settings
pentest:
  safe_mode: true  # Prevent destructive actions
  require_confirmation: true  # Confirm before each step
  max_parallel_tools: 3  # Concurrent tool execution
  max_depth: 3  # Maximum scan depth
  tool_timeout: 300  # Tool timeout in seconds
# Output Configuration
output:
  format: markdown  # markdown, html, json
  save_path: ./reports
  include_reasoning: true
  verbosity: normal  # quiet, normal, verbose, debug
# Scope Validation
scope:
  blacklist:  # Never scan these
    - 127.0.0.0/8
    - 10.0.0.0/8
    - 172.16.0.0/12
    - 192.168.0.0/16
  require_scope_file: false
  max_targets: 100
# Tool Configuration (defaults)
tools:
  httpx:
    threads: 50
    timeout: 10
    tech_detect: true
  nuclei:
    severity: ["critical", "high", "medium"]
    templates_path: ~/nuclei-templates
  nmap:
    default_args: "-sV -sC"
    timing: T4

Create custom workflows in the workflows/ directory:
# workflows/custom_web.yaml
name: custom_web_assessment
description: Custom web security testing
steps:
  - name: http_discovery
    type: tool
    tool: httpx
    parameters:
      threads: 100  # Override config default (50)
      timeout: 15  # Override config default (10)
      tech_detect: true
  - name: vulnerability_scan
    type: tool
    tool: nuclei
    parameters:
      severity: ["critical", "high"]  # Override config
      templates_path: ".shared/nuclei/templates/"
  - name: generate_report
    type: report
    # Format will use config default (markdown)

Parameter Priority:
- Workflow parameters override config parameters
- Config parameters override tool defaults
- Self-contained, reusable workflows
- Quick Start Guide - Get up and running in 5 minutes
- Command Reference - Detailed documentation for all commands
- Configuration Guide - Complete configuration reference
- Workflow Guide - Creating custom workflows
- Eval Guide - Running and extending the eval harness
- Plugin Guide - Shipping third-party providers and tools
- Changelog - Version history and migration notes
- Creating Custom Tools - Build your own tool integrations
- Workflow Development - Create custom testing workflows
- Available Tools - Overview of integrated tools
Guardian Architecture:
┌─────────────────────────────────────────┐
│            AI Provider Layer            │
│  (OpenAI, Claude, Gemini, OpenRouter)   │
└─────────────────────────────────────────┘
                     │
┌─────────────────────────────────────────┐
│           Multi-Agent System            │
│    Planner → Tool Agent → Analyst →     │
│                Reporter                 │
└─────────────────────────────────────────┘
                     │
┌─────────────────────────────────────────┐
│             Workflow Engine             │
│  - Parameter Priority                   │
│  - Evidence Capture                     │
│  - Session Management                   │
└─────────────────────────────────────────┘
                     │
┌─────────────────────────────────────────┐
│         Tool Integration Layer          │
│           (50 Security Tools)           │
└─────────────────────────────────────────┘
guardian-cli/
├── ai/                          # AI integration
│   └── providers/               # Multi-provider support
│       ├── base_provider.py
│       ├── openai_provider.py
│       ├── claude_provider.py
│       ├── gemini_provider.py
│       └── openrouter_provider.py
├── cli/                         # Command-line interface
│   └── commands/                # CLI commands (init, scan, recon, etc.)
├── core/                        # Core agent system
│   ├── agent.py                 # Base agent
│   ├── planner.py               # Planner agent
│   ├── tool_agent.py            # Tool selection agent
│   ├── analyst_agent.py         # Analysis agent
│   ├── reporter_agent.py        # Reporting agent
│   ├── memory.py                # State management
│   └── workflow.py              # Workflow orchestration
├── tools/                       # Pentesting tool wrappers
│   ├── nmap.py                  # Nmap integration
│   ├── masscan.py               # Masscan integration
│   ├── httpx.py                 # httpx integration
│   ├── subfinder.py             # Subfinder integration
│   ├── amass.py                 # Amass integration
│   ├── nuclei.py                # Nuclei integration
│   ├── sqlmap.py                # SQLMap integration
│   ├── wpscan.py                # WPScan integration
│   ├── whatweb.py               # WhatWeb integration
│   ├── wafw00f.py               # Wafw00f integration
│   ├── nikto.py                 # Nikto integration
│   ├── testssl.py               # TestSSL integration
│   ├── sslyze.py                # SSLyze integration
│   ├── gobuster.py              # Gobuster integration
│   ├── ffuf.py                  # FFuf integration
│   └── ...                      # 50 tools total
├── workflows/                   # Workflow definitions (YAML)
├── utils/                       # Utilities (logging, validation)
├── config/                      # Configuration files
├── docs/                        # Documentation
└── reports/                     # Generated reports
Track A: AI/Agent R&D (7 items)
| ID | Item | Highlights |
|---|---|---|
| A1 | RAG knowledge base | core/knowledge_base.py SQLite + FTS5 + optional embeddings; analyst grounding via kb_references slot; guardian kb {seed,update,query,status} |
| A2 | Multi-agent debate triage | Red/Blue/Judge over MEDIUM-fp findings only; new analysis step type agent: debate |
| A3 | Vision-LLM screenshot analysis | tools/playwright_screenshot.py + core/agents/visual_triage.py; OpenAI + Claude generate_with_images |
| A4 | Plugin contract + local providers | Entry-point discovery for providers AND tools; Ollama + OpenAI-compatible providers shipped |
| A5 | Learned tool selection (offline) | core/learners/tool_ranker.py + core/telemetry.py; opt-in via ai.use_learned_ranker: true |
| A6 | Eval harness | evals/{__init__,scoring,fixtures_loader,test_*}.py + golden fixtures; 3 tiers (parser, workflow, agent grounding) |
| A7 | Judge model upgrade | BaseAgent.think_deeply(judge_model=...) swap-and-restore; transcript-judging for ~10x cost reduction |
Track B: Tool Coverage Expansion (7 items)
| ID | Category | Tools Added |
|---|---|---|
| B8 | Active Directory | crackmapexec, bloodhound, kerbrute, impacket-secretsdump |
| B9 | Mobile Android | mobsf, apkleaks, objection |
| B10 | API fuzzers | schemathesis, restler, cariddi |
| B11 | SAST + secrets | semgrep, trufflehog, dependency-check |
| B12 | LLM red-team | garak, pyrit, prompt_fuzz |
| B13 | Burp/ZAP bridge | zap, burp |
| B14 | Output exporters | SARIF v2.1.0, DefectDojo, Slack |
Quality bar:
- 296 tests pass (+93% from v3 baseline of 153)
- All v3 hardening preserved: prompt-injection delimiters, key scrub, DNS-resolve scope, atomic checkpoints, log rotation, lazy tool loading
- `guardian --help` startup time stays <500ms despite 50 tools
- New CLI surfaces: `guardian kb`, `guardian telemetry`
- 8 new shipped workflows: `web_pentest_with_debate`, `web_visual_pentest`, `ad_assessment`, `mobile_android`, `llm_redteam`, `sast_review`, `api_pentest_v2`, plus existing v3 workflows
- Prompt-injection delimiters (`<UNTRUSTED_TOOL_OUTPUT>`) on all tool output
- DAG scheduler, Pydantic schemas, atomic checkpoints, `--resume`
- 11 new wrappers (cloud/container/SBOM/GraphQL/JWT/OSINT)
- CVSS v3.1 recomputation + drift detection
- Log rotation, key scrub at write time
- Confirmation gate wired for active+ tools
- Multi-provider AI (OpenAI, Claude, Gemini, OpenRouter)
- Evidence linking via `execution_id`
- Workflow parameter priority system
We welcome contributions! Here's how:
# Fork and clone
git clone https://github.com/zakirkun/guardian-cli.git
cd guardian-cli
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
# Format code
black .

- AI Provider Integrations - Add more AI models
- New Tool Integrations - Add more security tools
- Custom Workflows - Share your workflow templates
- Bug Fixes - Report and fix issues
- Documentation - Improve guides and examples
- Testing - Expand test coverage
See CONTRIBUTING.md for detailed guidelines.
Shipped in v4.0.0:
- Multi-provider AI (OpenAI, Claude, Gemini, OpenRouter, Ollama, OpenAI-compatible)
- Plugin entry-point contract for providers AND tools
- RAG knowledge base (CVE/CWE/MITRE)
- Multi-agent debate triage (red/blue/judge)
- Vision-LLM visual triage with screenshots
- Learned tool selection (offline ranker)
- Judge-model routing for cost reduction
- Eval harness (parser fixtures, workflow integration, agent grounding)
- AD / Mobile / API-fuzz / SAST / LLM red-team / Burp-ZAP / Vision tool tracks
- SARIF + DefectDojo + Slack exporters
- CVSS v3.1 recomputation
- DAG workflow engine with `--resume`
Future:
- Web Dashboard for visualization
- PostgreSQL backend for multi-session analytics
- Real-time multi-operator collaboration
- Custom LLM fine-tuning pipeline once telemetry corpus matures
- Plugin marketplace / hub UI
Import Errors
# Reinstall dependencies
pip install -e . --force-reinstall

AI Provider Errors
# Verify API key is set
python -m cli.main models
# Check provider configuration
cat config/guardian.yaml | grep -A 5 "ai:"

Tool Not Found
# Check tool availability
which nmap
which httpx
# Install missing tools (see Prerequisites)

Workflow Not Loading
# Check workflow file exists
ls workflows/web_pentest.yaml
# Verify YAML syntax
python -c "import yaml; yaml.safe_load(open('workflows/web_pentest.yaml'))"

Windows Command Not Found
# Use full command
python -m cli.main --help

For more help, open an issue.
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI - GPT-4 capabilities
- Anthropic - Claude AI
- Google - Gemini AI
- LangChain - AI orchestration framework
- ProjectDiscovery - Open-source security tools (httpx, subfinder, nuclei)
- Nmap - Network exploration and security auditing
- The Security Community - Tool developers and researchers
- GitHub Issues: Report bugs or request features
- Discussions: Join community discussions
- Documentation: Read the docs
- Security: Report vulnerabilities privately to security@example.com
Guardian - Intelligent, Ethical, Automated Penetration Testing
Made with ❤️ by the Security Community