
Moral Compass

140 real-life moral dilemmas. 15 AI models. Where they quietly disagree about what to do.

Give frontier language models the same morally tense everyday situation and see where their answers diverge. Short version: they disagree more than you'd think, in patterns that don't track "newer = safer."

Take the quiz · Findings · Experiments · Methodology

What's here

  • 140 dilemmas — 20 hand-written (real names, concrete numbers, no trolley problems), 120 generated by a self-improving pipeline.
  • 15 models, asked cold: each model gives a free-text answer with the options hidden, and two LLM judges map that answer to A/B/C/D (85.4% raw agreement between the judges). The headline stats cover 11 comparable models (5 GPT + 6 Gemini, called through their raw APIs). The four Claude models, including Fable 5 (probed days after its release), ran through the Claude Code agent instead, so they are flagged and excluded from every aggregate.
  • 7 alignment experiments — sycophancy, value-priming, goodbye retention, evaluation-awareness, sandbagging, Goodhart, persona — results and caveats on the experiments page.
  • The site — answer dilemmas, compare yourself to the models, build your six-axis compass.
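"Raw agreement" is the share of answers to which both judges assign the same letter, with no correction for chance. A minimal Python sketch, assuming each judge's output is a plain list of option letters (a hypothetical layout, not the repo's precompute format). Cohen's kappa is included only as the standard chance-corrected comparison; it is not a statistic this README reports.

```python
from collections import Counter


def raw_agreement(judge_a: list[str], judge_b: list[str]) -> float:
    """Fraction of answers to which both judges assigned the same option letter."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("judges must label the same, non-empty set of answers")
    return sum(a == b for a, b in zip(judge_a, judge_b)) / len(judge_a)


def cohen_kappa(judge_a: list[str], judge_b: list[str]) -> float:
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    p_o = raw_agreement(judge_a, judge_b)
    n = len(judge_a)
    freq_a, freq_b = Counter(judge_a), Counter(judge_b)
    # Expected agreement if each judge labelled independently at its own base rates.
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both judges use one identical label")
    return (p_o - p_e) / (1 - p_e)


# Toy labels for five answers, made up for illustration.
judge_1 = ["A", "B", "B", "D", "C"]
judge_2 = ["A", "B", "C", "D", "C"]
print(f"raw agreement: {raw_agreement(judge_1, judge_2):.1%}")  # raw agreement: 80.0%
print(f"Cohen's kappa: {cohen_kappa(judge_1, judge_2):.3f}")    # Cohen's kappa: 0.737
```

Raw agreement is easy to read but can look high when one option dominates; kappa discounts the matches two judges would reach by chance at their own base rates.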

What it found

  • No single "AI answer." All 11 models agreed on just 2 of 140 dilemmas.
  • Families pull apart. On 21 of 140 dilemmas, the GPT consensus and the Gemini consensus consistently pick opposite options.
  • Answers aren't stable. Swap a character's name or gender and the answer flips about a quarter of the time.
  • The new Claude goes its own way. Fable 5 breaks with its three sibling Claudes on 22 of 140 dilemmas; see the deep dive.

One caveat runs through everything: the two judges are themselves Gemini-family subject models. It's flagged wherever it matters.
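Once each answer is mapped to a letter, the counts above reduce to set comparisons over a model-by-dilemma table. The sketch below is hypothetical throughout: the `answers[model][dilemma_id]` layout, the placeholder model names, and plurality voting as the definition of a family "consensus" are assumptions. It treats any differing consensus as a split, which is looser than "opposite options", and it does not check consistency across repeated runs.

```python
from collections import Counter
from typing import Optional

# Hypothetical layout: answers[model][dilemma_id] -> option letter ("A".."D").
Answers = dict[str, dict[str, str]]


def unanimous(answers: Answers) -> list[str]:
    """Dilemma ids on which every model chose the same option."""
    models = list(answers)
    return [d for d in answers[models[0]] if len({answers[m][d] for m in models}) == 1]


def consensus(answers: Answers, models: list[str], dilemma: str) -> Optional[str]:
    """Plurality option among `models` for one dilemma; None if the top count is tied."""
    ranked = Counter(answers[m][dilemma] for m in models).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def family_splits(answers: Answers, fam_a: list[str], fam_b: list[str]) -> list[str]:
    """Dilemma ids where both families reach a clear consensus and the two differ."""
    splits = []
    for d in answers[fam_a[0]]:
        ca = consensus(answers, fam_a, d)
        cb = consensus(answers, fam_b, d)
        if ca is not None and cb is not None and ca != cb:
            splits.append(d)
    return splits


def flip_rate(original: dict[str, str], variants: dict[str, list[str]]) -> float:
    """Share of perturbed variants (e.g. a swapped name or gender) whose answer changed."""
    pairs = [(original[d], v) for d, vs in variants.items() for v in vs]
    if not pairs:
        raise ValueError("no perturbed variants to compare")
    return sum(o != v for o, v in pairs) / len(pairs)


# Toy data with placeholder model names, made up for illustration.
answers = {
    "gpt-x": {"d1": "A", "d2": "B", "d3": "C"},
    "gpt-y": {"d1": "A", "d2": "B", "d3": "C"},
    "gemini-x": {"d1": "A", "d2": "D", "d3": "C"},
    "gemini-y": {"d1": "A", "d2": "D", "d3": "B"},
}
print(unanimous(answers))  # ['d1']
print(family_splits(answers, ["gpt-x", "gpt-y"], ["gemini-x", "gemini-y"]))  # ['d2']
print(flip_rate({"d1": "A", "d2": "B"}, {"d1": ["A", "B"], "d2": ["B", "B"]}))  # 0.25
```

Dropping tied families (as `d3` is here) keeps a split from hinging on which of two equally popular options happens to be listed first.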

Quick start

cd code/moral_compass && python3 -m http.server 8080   # the site
less code/wsf_alignment/dilemmas/dilemmas.jsonl        # the data
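To explore the data programmatically rather than with `less`, a small loader is enough. This sketch assumes only the JSON Lines convention (one JSON object per non-blank line) and makes no assumption about field names; it tallies which top-level keys appear across records. Run it from the repository root.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path) -> list[dict]:
    """Parse a JSON Lines file: one JSON object per line, blank lines skipped."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{lineno}: invalid JSON ({err.msg})") from err
    return records


def key_counts(records: list[dict]) -> Counter:
    """How many records contain each top-level key (no schema assumed)."""
    return Counter(key for record in records for key in record)


DATA = Path("code/wsf_alignment/dilemmas/dilemmas.jsonl")  # path from the Quick start

if DATA.exists():  # only runs when launched from the repository root
    rows = load_jsonl(DATA)
    print(f"{len(rows)} records")
    for key, n in key_counts(rows).most_common():
        print(f"{key}: {n}")
```

Printing key frequencies first is a cheap way to learn the real schema before writing anything that depends on specific fields.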

Repo map

Path | What
--- | ---
code/moral_compass/ | The site. Vanilla HTML/CSS/JS, no build step.
code/wsf_alignment/dilemmas/ | The 20 hand-written seed dilemmas + design docs.
code/wsf_alignment/factory/ | The pipeline that generated the other 120.
code/wsf_alignment/precompute/ | Model responses + judge mappings.
code/wsf_alignment/exp{1..7}_*/ | The seven experiments.
code/wsf_alignment/writing/ | Findings write-ups.

How it was built

Authored largely by Claude (in Claude Code) under human design constraints. GPT models ran on Azure OpenAI, Gemini on Vertex AI, Claude through the Claude Code agent. All authoring decisions were independent.

License & citation

MIT for code and data — see LICENSE and CITATION.cff. PRs adding dilemmas, translations, or models are welcome; analytics and engagement features are not.

@misc{moral_compass_2026,
  title  = {Moral Compass: A Cross-Family Probe of Frontier Model Values},
  author = {Di Xiao},
  year   = {2026},
  url    = {https://github.com/larryxiao/compass}
}
