3 changes: 2 additions & 1 deletion .agents/skills/audit-markers/SKILL.md
@@ -69,7 +69,8 @@ test actually loads, and report whether the gate is correctly set or too loose.

Read these before auditing — they are the authoritative source for marker conventions:

- **Marker guide:** `test/MARKERS_GUIDE.md`
- **Test strategy:** `docs/docs/community/testing-strategy.md` — classification decision rules, per-tier definitions, philosophy
- **Marker guide:** `test/MARKERS_GUIDE.md` — marker tables, common patterns, backend reference
- **Marker registration:** `test/conftest.py` (`pytest_configure`) and `pyproject.toml` (`[tool.pytest.ini_options]`)
- **Resource predicates:** `test/predicates.py` (predicate functions for resource gating)
- **Example marker format:** `docs/examples/conftest.py` (`_extract_markers_from_file`)
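
As a rough illustration of the registration step listed above, a `pytest_configure` hook can declare the four tier markers through pytest's `config.addinivalue_line`. This is a minimal sketch, not the contents of `test/conftest.py`: the `TIER_MARKERS` mapping and its descriptions are assumptions made for illustration.

```python
# Hypothetical sketch of marker registration; the real test/conftest.py may differ.

# Tier names come from the project docs; the descriptions are illustrative assumptions.
TIER_MARKERS = {
    "unit": "self-contained test, no external services",
    "integration": "exercises several components together",
    "e2e": "runs against a real backend",
    "qualitative": "checks the quality of model output",
}


def pytest_configure(config):
    """Register each tier marker so pytest recognises it as a known marker."""
    for name, description in TIER_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

Registering markers this way keeps `pytest --strict-markers` from rejecting them; the same names could instead be listed under `markers` in `[tool.pytest.ini_options]`.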
4 changes: 3 additions & 1 deletion AGENTS.md
@@ -49,7 +49,7 @@ uv run mypy . # Type check
## 3. Test Markers
Tests use a four-tier granularity system (`unit`, `integration`, `e2e`, `qualitative`) plus backend and resource markers. The `unit` marker is auto-applied by conftest — never write it explicitly. The `llm` marker is deprecated; use `e2e` instead.

See **[test/MARKERS_GUIDE.md](test/MARKERS_GUIDE.md)** for the full marker reference (tier definitions, backend markers, resource gates, auto-skip logic, common patterns).
See **[Test Strategy](docs/docs/community/testing-strategy.md)** for classification rules, authoring guide, CI tier map, and local workflow. See **[test/MARKERS_GUIDE.md](test/MARKERS_GUIDE.md)** for the full marker reference (tier definitions, backend markers, resource gates, common patterns).
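
One way a conftest could auto-apply the `unit` marker is a `pytest_collection_modifyitems` hook that tags any test lacking an explicit tier. The sketch below is an assumption about how such logic might look, not the project's actual implementation; `EXPLICIT_TIERS` is a hypothetical name.

```python
# Hypothetical sketch; the project's actual conftest logic may differ.

# Tiers that must be written explicitly; `unit` is the implicit default.
EXPLICIT_TIERS = ("integration", "e2e", "qualitative")


def pytest_collection_modifyitems(config, items):
    """Tag every collected test that carries no explicit tier marker as `unit`."""
    for item in items:
        has_tier = any(
            item.get_closest_marker(tier) is not None for tier in EXPLICIT_TIERS
        )
        if not has_tier:
            item.add_marker("unit")
```

With a hook like this in place, a plain test function needs no decorator to be selected by `pytest -m unit`, which is why the marker is never written by hand.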

**Examples in `docs/examples/`** are opt-in — unlike `test/` files (auto-collected, default `unit`), examples require an explicit `# pytest:` comment to be collected. Files without this comment are silently ignored (they won't appear in skip summaries either). This is because examples have variable dependencies and limited setup:
```python
@@ -115,6 +115,8 @@ Pre-commit runs: ruff, mypy, uv-lock, codespell

## 10. Writing Tests

See **[Test Strategy — Authoring guide](docs/docs/community/testing-strategy.md#authoring-guide)** for the full authoring guide (naming, fixture discipline, mock discipline, assertion style).

- Place tests in `test/` mirroring source structure
- Name files `test_*.py` (required for pydocstyle)
- Use `gh_run` fixture for CI-aware tests (see `test/conftest.py`)
34 changes: 14 additions & 20 deletions CONTRIBUTING.md
@@ -320,32 +320,23 @@ as it can corrupt state.

### Quick Reference

See the [Test Strategy](docs/docs/community/testing-strategy.md)
page for classification rules, authoring guide, CI tier map, coverage, and the
full local workflow reference. Essential commands:

```bash
# Install all dependencies (required for tests)
uv sync --all-extras --all-groups

# Start Ollama (required for most tests)
ollama serve

# Default: qualitative tests, skip slow tests
# Default: includes qualitative tests, skips slow tests
uv run pytest

# Fast tests only (no qualitative, ~2 min)
uv run pytest -m "not qualitative"

# Unit tests only (self-contained, no services)
uv run pytest -m unit

# Run only slow tests (>1 min)
uv run pytest -m slow

# Run specific backend tests
uv run pytest -m "ollama"
uv run pytest -m "openai"

# CI/CD mode (skips qualitative tests)
CICD=1 uv run pytest

# Lint and format
uv run ruff format .
uv run ruff check .
@@ -395,23 +386,26 @@ for m in granite4:micro granite4:micro-h deepseek-r1:8b \

### Test Markers

Tests use a four-tier granularity system (`unit`, `integration`, `e2e`, `qualitative`) plus backend and resource markers. See [test/MARKERS_GUIDE.md](test/MARKERS_GUIDE.md) for the full marker reference, including tier definitions, backend markers, resource gates, and auto-skip logic.
Tests use a four-tier granularity system (`unit`, `integration`, `e2e`, `qualitative`) plus backend and resource markers. The [Test Strategy](docs/docs/community/testing-strategy.md) page covers the classification rules, the authoring guide, and the CI tiers. See [test/MARKERS_GUIDE.md](test/MARKERS_GUIDE.md) for the full marker reference (tier definitions, backend markers, resource gates, auto-skip logic).

### CI/CD Tests

CI runs the following checks on every pull request:
1. **Pre-commit hooks** (`pre-commit run --all-files`) - Ruff, mypy, uv-lock, codespell
2. **Test suite** (`CICD=1 uv run pytest`) - Skips qualitative tests for speed
1. **Pre-commit hooks** (`pre-commit run --all-files`) — ruff, mypy, uv-lock, codespell
2. **Test suite** (`CICD=1 uv run pytest test`): runs on Python 3.11, 3.12, and 3.13 with Ollama running; skips qualitative tests

To replicate CI locally:
```bash
# Run pre-commit checks (same as CI)
# Pre-commit checks (same as CI)
pre-commit run --all-files

# Run tests with CICD flag (same as CI, skips qualitative tests)
CICD=1 uv run pytest
# Tests with CICD flag (skips qualitative, matches CI scope)
CICD=1 uv run pytest test
```
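
The two commands above could also be wrapped in a small script so both steps run in order with the right environment. This is a minimal sketch under stated assumptions: `CI_STEPS`, `step_environment`, and `run_ci_locally` are hypothetical names, and the script only mirrors the commands shown here, not the actual CI workflow definition.

```python
import os
import subprocess

# Each step pairs a command with the extra environment variables it needs.
# Only the pytest step sets CICD=1, matching the commands shown above.
CI_STEPS = [
    (["pre-commit", "run", "--all-files"], {}),
    (["uv", "run", "pytest", "test"], {"CICD": "1"}),
]


def step_environment(extra_env, base_env=None):
    """Return a copy of the base environment with the step's variables added."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(extra_env)
    return env


def run_ci_locally():
    """Run each step in order; check=True raises on the first failing command."""
    for command, extra_env in CI_STEPS:
        subprocess.run(command, check=True, env=step_environment(extra_env))
```

Calling `run_ci_locally()` from the repository root would run the hooks first and the test suite second, stopping with `subprocess.CalledProcessError` if either fails.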

See the [Test Strategy — CI pipeline](docs/docs/community/testing-strategy.md#ci-pipeline)
for the full CI breakdown and planned nightly/pre-release tiers.

### Timing Expectations

- Fast tests (`-m "not qualitative"`): ~2 minutes