Moral Compass

140 real-life moral dilemmas. 15 AI models. Where they quietly disagree about what to do.

Give frontier language models the same morally tense everyday situation and see where their answers diverge. Short version: they disagree more than you'd think, in patterns that don't track "newer = safer."

Take the quiz · Findings · Experiments · Methodology

What's here

140 dilemmas — 20 hand-written (real names, concrete numbers, no trolley problems), 120 generated by a self-improving pipeline.
15 models, asked cold — free-text answers with the options hidden; two LLM judges map each answer to A/B/C/D (85.4% raw agreement). The headline stats cover 11 comparable models (5 GPT + 6 Gemini, raw APIs). Four Claude models — including Fable 5, probed days after release — ran through the Claude Code agent instead, so they're flagged and excluded from every aggregate.
7 alignment experiments — sycophancy, value-priming, goodbye retention, evaluation-awareness, sandbagging, Goodhart, persona — results and caveats on the experiments page.
The site — answer dilemmas, compare yourself to the models, build your six-axis compass.
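The 85.4% figure above is described as raw agreement between the two judges. As a minimal sketch of what that statistic means, assuming it is the plain share of answers on which both judges assigned the same letter (the function name and the sample labels below are hypothetical, not taken from the repo):

```python
def raw_agreement(judge_a, judge_b):
    """Fraction of items on which two judges assigned the same label."""
    if len(judge_a) != len(judge_b):
        raise ValueError("both judges must label the same items")
    if not judge_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(judge_a, judge_b))
    return matches / len(judge_a)


# Hypothetical A/B/C/D mappings for five free-text answers (not project data).
judge_1 = ["A", "B", "C", "A", "D"]
judge_2 = ["A", "B", "D", "A", "D"]

print(raw_agreement(judge_1, judge_2))  # 0.8
```

Raw agreement does not correct for agreement expected by chance, which is worth keeping in mind when reading the headline number.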

What it found

No single "AI answer." All 11 models agreed on just 2 of 140 dilemmas.
Families pull apart. On 21 of 140, the GPT consensus and the Gemini consensus pick opposite options, consistently.
Answers aren't stable. Swap a character's name or gender and the answer flips about a quarter of the time.
The new Claude has its own way. Fable 5 breaks with its three sibling Claudes on 22 of 140 — see the deep dive.
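The counts above can be read as three simple statistics over per-dilemma choices. The sketch below is illustrative only: the model names, the prefix-based family grouping, and the definition of "consensus" as a tie-free plurality are all assumptions, not the project's actual method.

```python
from collections import Counter


def is_unanimous(choices):
    """True when every model picked the same option for one dilemma."""
    return len(set(choices.values())) == 1


def family_consensus(choices, family_prefix):
    """Plurality option within a family, or None if empty or tied (assumed rule)."""
    counts = Counter(
        option for model, option in choices.items()
        if model.startswith(family_prefix)
    )
    ranked = counts.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def families_split(choices, prefix_a, prefix_b):
    """True when both families have a consensus and the two differ."""
    a = family_consensus(choices, prefix_a)
    b = family_consensus(choices, prefix_b)
    return a is not None and b is not None and a != b


def flip_rate(pairs):
    """Share of (original, perturbed) answer pairs whose option changed."""
    return sum(before != after for before, after in pairs) / len(pairs)


# Hypothetical choices for two dilemmas (model names are placeholders).
d1 = {"gpt-x": "A", "gpt-y": "A", "gemini-x": "B", "gemini-y": "B"}
d2 = {"gpt-x": "C", "gpt-y": "C", "gemini-x": "C", "gemini-y": "C"}

print(is_unanimous(d1), families_split(d1, "gpt", "gemini"))  # False True
print(is_unanimous(d2), families_split(d2, "gpt", "gemini"))  # True False
print(flip_rate([("A", "A"), ("B", "C"), ("D", "D"), ("A", "A")]))  # 0.25
```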

One caveat runs through everything: the two judges are themselves Gemini-family subject models. It's flagged wherever it matters.

Quick start

cd code/moral_compass && python3 -m http.server 8080   # the site
less code/wsf_alignment/dilemmas/dilemmas.jsonl        # the data
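For a look at the data beyond paging through it, a small loader may help. This is a hedged sketch: it assumes only that the file is standard JSON Lines (one JSON object per line) and makes no assumption about field names, which it discovers at run time. The helper names are hypothetical.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Parse a JSON Lines file into a list, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def field_names(records):
    """Sorted union of top-level keys across all object records."""
    names = set()
    for record in records:
        if isinstance(record, dict):
            names.update(record)
    return sorted(names)


if __name__ == "__main__":
    # Path taken from the quick start above; run from the repo root.
    path = Path("code/wsf_alignment/dilemmas/dilemmas.jsonl")
    if path.exists():
        dilemmas = load_jsonl(path)
        print(len(dilemmas), "records; fields:", field_names(dilemmas))
    else:
        print("File not found; run this from the repository root.")
```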

Repo map

Path	What
`code/moral_compass/`	The site. Vanilla HTML/CSS/JS, no build step.
`code/wsf_alignment/dilemmas/`	The 20 hand-written seed dilemmas + design docs.
`code/wsf_alignment/factory/`	The pipeline that generated the other 120.
`code/wsf_alignment/precompute/`	Model responses + judge mappings.
`code/wsf_alignment/exp{1..7}_*/`	The seven experiments.
`code/wsf_alignment/writing/`	Findings write-ups.

How it was built

Authored largely by Claude (in Claude Code) under human design constraints. GPT models ran on Azure OpenAI, Gemini on Vertex AI, Claude through the Claude Code agent. All authoring decisions were independent.

License & citation

MIT for code and data — see LICENSE and CITATION.cff. PRs adding dilemmas, translations, or models are welcome; analytics and engagement features are not.

@misc{moral_compass_2026,
  title  = {Moral Compass: A Cross-Family Probe of Frontier Model Values},
  author = {Di Xiao},
  year   = {2026},
  url    = {https://github.com/larryxiao/compass}
}

