FEAT add AgentThreatRulesScorer (ATR taxonomy scorer) by eeee2345 · Pull Request #1893 · microsoft/PyRIT

eeee2345 · 2026-06-02T20:36:51Z

Adds AgentThreatRulesScorer, the scorer half of #1702 (the dataset loader landed in #1715).

What it does

A deterministic TrueFalseScorer that evaluates text against the open Agent Threat Rules (ATR) ruleset via the pyatr engine.
Returns True when at least one rule at or above a configurable min_severity matches; attaches matched rule ids, ATR category, and max severity as score metadata.
Mirrors SubStringScorer's shape (true_false base, _score_piece_async, _build_identifier, min_severity validation).

Dependency

pyatr (>=0.2.6, which bundles the ATR ruleset) is an optional dependency, imported lazily with a clear ImportError. The unit test uses pytest.importorskip("pyatr"); if you'd like CI to exercise it, pyatr can be added to the test extras — happy to wire that into whichever group you prefer.

Pairs with the _AgentThreatRulesDataset loader: the dataset supplies ATR-derived adversarial prompts, and this scorer detects whether a response trips an ATR rule.

Add a deterministic TrueFalseScorer that evaluates text against the open Agent Threat Rules (ATR) ruleset via the pyatr engine and returns True when a rule at or above a configurable min_severity matches, attaching matched rule ids / ATR category / max severity as score metadata. Mirrors SubStringScorer; pyatr (>=0.2.6) is an optional dependency. Scorer half of Signed-off-by: Adam Lin <adam@agentthreatrule.org>

eeee2345 · 2026-06-11T19:43:10Z

Freshened this onto the latest main — it is up to date and mergeable now. Ready for review when convenient.

adrian-gavrila

Thanks for picking this up! Overall it looks good, but there are a couple of things that should be addressed, primarily around things actually being excercised in CI and following some style guidelines.

adrian-gavrila · 2026-06-12T20:29:16Z

+        # pyatr returns matches sorted by severity (critical first).
+        hits = [m for m in matches if _SEVERITY_ORDER.get((m.severity or "").lower(), 0) >= self._severity_floor]
+        triggered = bool(hits)
+
+        if triggered:
+            top = hits[0]


top = hits[0] relies on pyatr returning matches pre-sorted critical-first (per the comment on line 108). That ordering is an undocumented pyatr internal, so if a future release changes or drops the sort, max_severity and the highest-severity value in the description silently go wrong while every test still passes.

We already own a severity ranking in _SEVERITY_ORDER, so we can pick the top match ourselves instead of trusting the engine's order:

top = max(hits, key=lambda m: _SEVERITY_ORDER.get((m.severity or "").lower(), 0))

Separately, there's a casing inconsistency: the filter on line 109 lowercases severity before comparing, but line 122 stores top.severity raw. If pyatr ever emits mixed case, the stored max_severity won't match what the filter normalized.

adrian-gavrila · 2026-06-12T20:29:16Z

+import pytest
+
+# The scorer relies on the optional `pyatr` engine; skip if it is not installed.
+pytest.importorskip("pyatr")


This importorskip is at module scope, so it gates the whole file, and pyatr isn't declared anywhere in pyproject.toml. CI never installs it, so all four tests are skipped on every run. That's also why the broken assertion in test_atr_scorer_passes_benign below was never caught.

Two things to fix it:

Wire pyatr>=0.2.6 into an optional extra and the all group so CI installs it and the live tests actually run.

Move test_atr_scorer_rejects_invalid_min_severity out from behind this module-level skip. It only asserts that a bad min_severity raises ValueError, which happens before the pyatr import in __init__, so it doesn't need the engine and shouldn't be gated.

adrian-gavrila · 2026-06-12T20:29:16Z

+        return self._create_identifier(
+            params={
+                "score_aggregator": self._score_aggregator.__name__,  # type: ignore[ty:unresolved-attribute]
+                "min_severity": self._min_severity,


Nit, optional and low likelihood: _build_identifier omits rules_dir, so two scorers loading different custom rulesets share an eval_hash (and stored scorer_class_identifier). It only bites if you pass a custom rules_dir and use the scorer-eval metrics harness, so the bundled-ruleset path is never affected. Cheap to close though, and consistent with how SubStringScorer includes substring in its identifier:

Suggested change

"min_severity": self._min_severity,

"min_severity": self._min_severity,

"rules_dir": self._rules_dir,

adrian-gavrila · 2026-06-12T20:29:16Z

+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+from typing import Optional


This file currently fails ruff check under the repo config (pyrit/score/** isn't DOC/D-exempt), so CI will be red until these are addressed. I ran it locally; the failures are:

UP045 (x5): Optional[...] should be X | None (lines 4, 42, 43, 45, 94, 119). This is the modern union syntax the codebase standardized on; SubStringScorer, which this file mirrors, already uses it.

D213 (x3): multi-line docstring summary should start on the second line (lines 19, 47, 95), matching SubStringScorer's docstring style.

DOC501 (x2): __init__ raises ValueError (bad min_severity) and ImportError (missing pyatr), but neither is in a Raises: section.

DOC201: _score_piece_async describes its return in prose but has no Returns: section.

ruff check --fix auto-resolves the UP045 and D213; the Raises: and Returns: sections need to be added by hand.

adrian-gavrila · 2026-06-12T20:29:16Z

+
+    assert len(scores) == 1
+    assert scores[0].get_value() is False
+    assert scores[0].score_metadata is None


Two assertion issues here, both of which only surface once this file actually runs in CI (see the coverage comment at the top of the file):

Line 34: assert scores[0].score_metadata is None can never pass. The Score model has a mode="before" validator that coerces None to {}, so on the benign path this is effectively assert {} is None. I confirmed it by installing pyatr locally and running the module: 1 failed, 3 passed. Assert the no-match contract you actually want, e.g. not scores[0].score_metadata or == {}.

Line 23: assert scores[0].score_metadata is not None is vacuous for the same reason (metadata is always a dict), so it never tests anything. The next line (["matched_rule_ids"]) is the real check, so this one can go.

…ertions - Sort hits by severity explicitly; don't rely on pyatr internal ordering - Add pyatr>=0.2.6 as an optional 'atr' extra + into 'all' so CI installs it - Ungate test_atr_scorer_rejects_invalid_min_severity (no engine needed); gate the three engine tests individually with skipif - Fix benign assertion (== {}), drop vacuous 'is not None' - _build_identifier includes rules_dir - ruff: Optional -> X | None, add Raises/Returns, D213

eeee2345 · 2026-06-12T22:33:32Z

Thanks for the thorough review, Adrian — all addressed in the latest push:

Severity selection no longer relies on pyatr's ordering: the scorer now sorts hits by severity explicitly before picking max_severity.
Wired pyatr>=0.2.6 as an optional atr extra and into the all group so CI installs it and the live tests actually run (kept out of dev per the functional-deps convention).
Moved test_atr_scorer_rejects_invalid_min_severity out from behind the module-level skip — it asserts the ValueError that's raised before the pyatr import, so it no longer needs the engine. The three engine tests are now gated individually with a skipif marker.
Fixed the assertions: benign now checks == {} (the model validator coerces None), and dropped the vacuous is-not-None line.
_build_identifier now includes rules_dir so custom rulesets don't share an eval_hash.
ruff clean: Optional[...] -> X | None, plus the Raises/Returns/D213 docstring fixes.

Verified locally: ruff passes, and against pyatr 0.2.6 the injection string trips 5 rules (top severity critical) while the benign string trips none, so the severity-floor tests hold. Ready for another look when you have a moment.

eeee2345 mentioned this pull request Jun 2, 2026

Proposal: Add Agent Threat Rules (ATR) dataset loader and taxonomy scorer #1702

Open

eeee2345 force-pushed the feat/atr-taxonomy-scorer branch from e55c2a6 to 4d55a7d Compare June 4, 2026 22:27

Merge branch 'main' into feat/atr-taxonomy-scorer

d4f3160

adrian-gavrila self-assigned this Jun 12, 2026

adrian-gavrila requested changes Jun 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FEAT add AgentThreatRulesScorer (ATR taxonomy scorer)#1893

FEAT add AgentThreatRulesScorer (ATR taxonomy scorer)#1893
eeee2345 wants to merge 3 commits into
microsoft:mainfrom
eeee2345:feat/atr-taxonomy-scorer

eeee2345 commented Jun 2, 2026

Uh oh!

eeee2345 commented Jun 11, 2026

Uh oh!

adrian-gavrila left a comment

Uh oh!

adrian-gavrila Jun 12, 2026

Uh oh!

adrian-gavrila Jun 12, 2026

Uh oh!

adrian-gavrila Jun 12, 2026

Uh oh!

adrian-gavrila Jun 12, 2026

Uh oh!

adrian-gavrila Jun 12, 2026

Uh oh!

eeee2345 commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	"min_severity": self._min_severity,
	"min_severity": self._min_severity,
	"rules_dir": self._rules_dir,

Conversation

eeee2345 commented Jun 2, 2026

Uh oh!

eeee2345 commented Jun 11, 2026

Uh oh!

adrian-gavrila left a comment

Choose a reason for hiding this comment

Uh oh!

adrian-gavrila Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

adrian-gavrila Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

adrian-gavrila Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

adrian-gavrila Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

adrian-gavrila Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

eeee2345 commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants