Consilium

consilium (Latin): counsel, deliberation, plan

Five frontier LLMs debate. Claude judges. Structured multi-model deliberation for decisions that matter, grounded in decades of group-reasoning research.

$ uv tool install consilium

Blind. Debate. Judge.

The pipeline is designed to prevent anchoring, surface genuine disagreement, and synthesise competing perspectives into actionable recommendations.

I · Blind Phase
All models generate independent positions in parallel. No model sees another's answer. Prevents first-speaker anchoring.

II · Deliberation
Models see all blind claims, then debate. A rotating challenger must argue the contrarian position. Consensus triggers early exit.

III · Judgement
Claude Opus synthesises using Analysis of Competing Hypotheses. Evaluates evidence against each position. Eliminates rather than confirms.
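
The first two phases can be sketched in a few lines of Python. This is an illustration of the control flow only, not Consilium's implementation: `ask_model` is a hypothetical stub standing in for a real API call, the rotation order, the `max_rounds` limit, and the prompt wording are assumptions, and "consensus" is assumed to mean that every model holds the same position.

```python
import asyncio
from itertools import cycle

MODELS = ["GPT", "Gemini", "Grok", "DeepSeek", "GLM"]


async def ask_model(model: str, prompt: str) -> str:
    """Hypothetical stub for an API call; returns a canned position."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return "microservices" if model == "Grok" else "monolith"


async def blind_phase(question: str) -> dict[str, str]:
    # Each model receives only the bare question, so no answer can
    # anchor another. gather() runs the calls concurrently.
    answers = await asyncio.gather(*(ask_model(m, question) for m in MODELS))
    return dict(zip(MODELS, answers))


def has_consensus(positions: dict[str, str]) -> bool:
    # Assumed rule: consensus means all positions are identical.
    return len(set(positions.values())) == 1


async def deliberate(question: str, max_rounds: int = 3):
    positions = await blind_phase(question)
    challengers = cycle(MODELS)  # assumed rotation order
    log = []
    for round_no in range(1, max_rounds + 1):
        if has_consensus(positions):
            break  # consensus triggers early exit
        challenger = next(challengers)
        log.append((round_no, challenger))
        # From here on, every model sees all current positions.
        transcript = "\n".join(f"{m}: {p}" for m, p in positions.items())
        prompts = {
            m: f"{question}\n\nPositions so far:\n{transcript}\n"
            + ("Challenge the majority view." if m == challenger else "Respond.")
            for m in MODELS
        }
        answers = await asyncio.gather(*(ask_model(m, prompts[m]) for m in MODELS))
        positions = dict(zip(MODELS, answers))
    return positions, log


positions, log = asyncio.run(deliberate("Should we use microservices or monolith?"))
print(positions)
print(log)
```

Because the stubbed Grok never changes its answer, this run never reaches consensus and uses all three rounds, with a different challenger each time.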

A council in session

consilium — council mode
$ consilium "Should we use microservices or monolith?"
───────────────────────────────────────────────
Blind phase · 5 models · independent positions
  GPT Start monolith. Extract services at pain points.
  Gemini Modular monolith with domain boundaries.
  Grok Microservices — but only if you have the team.
  DeepSeek Monolith. Premature distribution is the root of all evil.
  GLM Event-driven monolith with async boundaries.
───────────────────────────────────────────────
Debate · round 1 · challenger: Grok
  Grok 4/5 said monolith. But none addressed data team
    independence. Shared DB = shared bottleneck.
───────────────────────────────────────────────
Judge · Claude Opus · ACH synthesis
  Decision: Modular monolith with event boundaries.
  Confidence: high (4/5 convergence + challenger
  raised valid data isolation concern).

Seven ways to deliberate

From quick parallel sampling to full adversarial stress-testing. Auto-routes by difficulty, or choose explicitly.

  --council   ~$0.50   Full multi-round deliberation with blind phase, debate, and judge synthesis.
  --quick     ~$0.10   Parallel independent sampling. Fast opinions without debate.
  --oxford    ~$0.40   Binary for/against debate with rebuttals and a verdict.
  --redteam   ~$0.20   Adversarial stress-test of a plan or proposal.
  --socratic  ~$0.30   Probing questions to expose hidden assumptions.
  --discuss   ~$0.30   Hosted roundtable exploration of open-ended topics.
  --solo      ~$0.40   Claude debates itself in multiple assigned roles.

Research, not vibes

Every design decision maps to validated principles from group deliberation research.

Independence before exposure
Surowiecki · Delphi · Tetlock
The blind phase captures each position before herding can set in. Independence before exposure is the single most validated principle in group reasoning.
Structured dissent
Nemeth 2001
Assigned devil's advocates tend to make the majority bolster its existing view rather than genuinely reconsider it. The challenger therefore relies on questions and different priors instead.
Convergence as signal
Tetlock · Good Judgment Project
When independent agents with different priors agree, their evidence compounds rather than merely averaging. The judge therefore extremises convergent conclusions, pushing confidence further from 50% than the plain average would.
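
As a numeric illustration of extremising, the sketch below averages forecasts in log-odds space and then scales the result away from even odds. The page does not say how the judge does this in practice; the function name `extremise`, the exponent `a`, and the sample forecasts are assumptions made for the example.

```python
import math


def logit(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))


def extremise(probabilities: list[float], a: float = 2.0) -> float:
    """Average forecasts in log-odds space, then scale by a.

    a > 1 pushes the aggregate away from 0.5; a == 1 leaves the
    log-odds mean unchanged. The default of 2.0 is illustrative only.
    """
    mean_log_odds = sum(logit(p) for p in probabilities) / len(probabilities)
    return sigmoid(a * mean_log_odds)


# Five hypothetical, independently produced confidence estimates.
forecasts = [0.70, 0.75, 0.80, 0.70, 0.72]
plain_mean = sum(forecasts) / len(forecasts)
pushed = extremise(forecasts)
print(f"plain mean: {plain_mean:.3f}, extremised: {pushed:.3f}")
```

Agreement above 50% is pushed higher and agreement below 50% is pushed lower, while a split that averages to even odds stays at 50%.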
Competing hypotheses
Heuer · CIA ACH Framework
The judge lists competing conclusions, evaluates evidence against each, and eliminates rather than confirms. Counters confirmation bias.
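
The elimination step can be shown with a small ACH-style matrix. This is a sketch of the general technique, not the judge's actual prompt or scoring: the hypotheses, the evidence items, and the "C"/"I" ratings are invented for illustration, loosely following the microservices example above.

```python
def ach_rank(matrix: dict[str, dict[str, str]]) -> list[tuple[str, int]]:
    """Rank hypotheses from least to most inconsistent evidence.

    Each hypothesis maps evidence items to "C" (consistent) or "I"
    (inconsistent). Only "I" ratings are counted: consistent evidence
    often fits several hypotheses at once, so it cannot separate them.
    """
    scores = {
        hypothesis: sum(1 for rating in ratings.values() if rating == "I")
        for hypothesis, ratings in matrix.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Invented ratings for three candidate conclusions.
matrix = {
    "modular monolith": {
        "small team": "C",
        "data team needs independence": "C",
        "avoid premature distribution": "C",
    },
    "plain monolith": {
        "small team": "C",
        "data team needs independence": "I",
        "avoid premature distribution": "C",
    },
    "microservices": {
        "small team": "I",
        "data team needs independence": "C",
        "avoid premature distribution": "I",
    },
}

ranked = ach_rank(matrix)
for hypothesis, inconsistencies in ranked:
    print(f"{hypothesis}: {inconsistencies} inconsistent item(s)")
```

The surviving conclusion is the one with the least evidence against it, not the one with the most evidence for it.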

Five deliberators. One judge.

models
Council
  GPT       gpt-5.2-pro
  Gemini    gemini-3.1-pro-preview
  Grok      grok-4
  DeepSeek  deepseek-r1
  GLM       glm-5
 
Judge
  Claude    claude-opus-4-6

One command.

install
$ uv tool install consilium
$ export OPENROUTER_API_KEY=sk-or-v1-...
$ consilium "Your question here"