Superscalar

Full spec (AI-readable): Superscalar.md →

A rulebook for running multiple AI agents in parallel without losing quality.

Execution-scheduling discipline for parallel sub-agent dispatch.

v0.7.0 | §11 Entry 01-06 (n=8) | §3.1 Hyperbrief interlock | Apache-2.0

What Superscalar is

Superscalar is a rulebook for running multiple AI coding agents in parallel without losing the consistency that pure parallelism gives up.

Superscalar is an execution-scheduling discipline for parallel sub-agent dispatch — bounded parallelism, worktree isolation, in-order retire with a consistency gate. Validated by Entry 06.

The core idea

Central thesis

Pure parallelism gives you speed but not consistency; Superscalar keeps the parallel work and adds the consistency check.

Parallelism is necessary but not sufficient; the retire stage is coordination, not scheduling.

The four rules inside

The four primitives

§2

Capping concurrency

issue_width formula

A formula that caps how many agents run at once, sized to the task.

An analytical formula bounding in-flight parallelism per lane class.
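How a cap on in-flight lanes can be enforced is sketched below. The formula itself is defined in §2 of the spec and is not reproduced here, so `issue_width` is taken as an already-computed input; `run_bounded` and `fake_lane` are hypothetical names used only for illustration.

```python
import asyncio


async def run_bounded(lanes, issue_width):
    """Run lane coroutine factories with at most `issue_width` in flight."""
    gate = asyncio.Semaphore(issue_width)
    in_flight = 0
    peak = 0

    async def guarded(lane):
        nonlocal in_flight, peak
        async with gate:  # blocks while issue_width lanes are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await lane()
            finally:
                in_flight -= 1

    # gather returns results in submission order, not completion order
    results = await asyncio.gather(*(guarded(lane) for lane in lanes))
    return results, peak


async def fake_lane(name):
    # Stand-in for a real sub-agent call (assumption for the sketch).
    await asyncio.sleep(0.01)
    return f"{name}: done"


def run_bounded_demo():
    lanes = [lambda n=n: fake_lane(f"lane-{n}") for n in range(7)]
    return asyncio.run(run_bounded(lanes, issue_width=3))
```

Seven lanes are submitted here, but the semaphore keeps the observed peak at or below three.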

§3

Per-agent workspaces

Worktree isolation

Each agent gets its own copy of the files.

Dedicated git worktree per write-disjoint sub-agent; underpins speculation and rollback.
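One way to provision a worktree per lane is to generate a `git worktree add` command for each one. The sketch below only builds the argument lists and runs nothing; the directory layout (`../worktrees/<lane>`), the `lane/<id>` branch naming, and the function name are assumptions, not part of the spec.

```python
from pathlib import PurePosixPath


def worktree_commands(lane_ids, root="../worktrees", base="main"):
    """Return one `git worktree add` argv per lane (nothing is executed)."""
    commands = []
    for lane_id in lane_ids:
        path = PurePosixPath(root) / lane_id
        branch = f"lane/{lane_id}"  # hypothetical branch naming scheme
        # git worktree add -b <new-branch> <path> <commit-ish>
        commands.append(["git", "worktree", "add", "-b", branch, str(path), base])
    return commands
```

Each list can be passed to `subprocess.run(cmd, check=True)` from inside the repository. Discarding a lane's work then amounts to `git worktree remove <path>`, which is what makes rollback cheap.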

§3 retire

The retire stage — merge and check

In-order retire + consistency gate + completeness critic

Results merged in fixed order, checked for contradictions and gaps.

Deterministic merge + cross-lane contradiction reconciliation + uncovered-region mapping.

§4

Speculative work — cost vs benefit

Opt-in speculation + cost-benefit admission

Start work before certainty — when the cost-benefit check says it's worth it.

Speculative dispatch gated by an explicit cost-benefit admission check.
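The spec's actual admission rule lives in §4 and is not quoted on this page. As an illustration only, the sketch below assumes a simple expected-value test: speculate when the expected benefit of the work being used outweighs the expected cost of it being thrown away. The function name, parameters, and units are all hypothetical.

```python
def admit_speculation(p_used, benefit_if_used, cost_if_discarded):
    """Admit when expected gain exceeds expected waste (assumed rule)."""
    if not 0.0 <= p_used <= 1.0:
        raise ValueError("p_used must be a probability")
    expected_gain = p_used * benefit_if_used
    expected_waste = (1.0 - p_used) * cost_if_discarded
    return expected_gain > expected_waste
```

With a benefit of 10 and a discard cost of 5 (same units, for example minutes), a lane that is 80% likely to be used is admitted, while one that is 20% likely is not.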

The headline experiment — same code, two approaches

Entry 06 — the headline A/B

We ran the same audit two ways — fast/free vs strict-rules — same number of parallel agents, only the rulebook differed.

Real payment-backend audit, parallel both ways. Arm A naïve max-parallel vs Arm B Superscalar discipline. Single variable: orchestration.

+118% more sourced citations (grounding, crossRefs)

−40% fewer speculative statements (speculation)

2.65× longer to finish (wall-clock cost)

+8% more tokens used (token cost)

The measurements

Quantitative readouts

What was measured (Metric)	Way A, simple (Arm A, OFF)	Way B, disciplined (Arm B, ON)	Difference / winner (Δ / Winner)

What only the disciplined approach produced

Arm B-only artifacts — what naïve max-parallel structurally cannot produce

The load-bearing meta-finding

Two inspectors saw both halves of a real contradiction. Without coordination, it goes uncaught. Same observations, totally different outcome.

Both arms found a real password-hash contradiction. Arm A logged both halves separately. Arm B's consistency gate paired them and resolved it. Without retire, a real security defect would have shipped silently.

When to use which approach

The trade-off — when to use which arm

Arm A — naïve max-parallel (Workflows default mode)

Use for a quick first look. Fast, but misses anything that needs connecting two separate observations.

Right for fast first-pass recon: 2.65× faster, −8% tokens. Loses dedup, contradiction resolution, completeness map.

Arm B — Superscalar discipline

Use when correctness matters more than speed. It takes 2.65× longer but catches real contradictions. The slowdown comes from agreeing on a shared model first and merging in order afterwards, not from slower agents.

Right for handover-grade audits. 2.65× cost is phase serialisation, NOT parallel inefficiency. Retire = coordination, not scheduling.

The full picture in one paragraph

Parallel agents produce parallel reports that don't talk to each other. Superscalar adds an editor role: reads all reports, finds contradictions, dedupes, asks what was missed. The added time isn't wasted — it's the cost of making reports actually fit together.

Superscalar looks like parallelism policy but is coordination policy. Parallelism alone doesn't produce cross-lane consistency. Retire stage is coordination, not scheduling. Wall-clock cost is phase serialisation.

Install the Superscalar plugin

Two steps from a fresh Claude Code install — register the EstreGenesis marketplace once, then install Superscalar.

A. Register the marketplace (one time)

Run this once per machine — it points Claude Code at the EstreGenesis marketplace.

/plugin marketplace add SoliEstre/EstreGenesis

B. Install Superscalar

Install the Superscalar plugin from the marketplace.

/plugin install superscalar@estregenesis-plugins

Installing brings 2 skills: /superscalar (the dispatch discipline the agent consults before fan-outs) and /subscaler (the tiered-model-composition toggle — frontier main, execution-tier subagents).

Example prompts

Drop one of these into Claude Code after install to see Superscalar in action.

1. Superscalar: parallel deep-research dispatch

Before drafting something where the source material is wide, kick off a research pass that fans out across multiple axes in parallel — Superscalar handles the dispatch and reorder so the results come back consolidated.

Before I draft this, run a deep research pass through Superscalar Workflows. Axes to cover: [your axes here].

When to use Superscalar — five everyday situations

Five canonical dispatch scenarios

Below are five situations where these rules earn back the extra time they take.

Spectrum from Way-A unconstrained fan-out to Way-B disciplined dispatch — each scenario states regime, shape, and measured trade-off.

From a quick first look to a careful handover check.

Ordered by escalating consistency demand: triage scan → audit → read-only sweep → batched edits → shared reshape. Way-A where contradictions are tolerable, Way-B where ship-gates forbid them.

Quickly looking around unfamiliar code

Triage scan of an unfamiliar codebase

Use this when someone has just handed you an unfamiliar code project.

Use this when handed an unfamiliar codebase and the goal is a fast structural map, not verified findings.

You want about an hour of rough overview.

~1h wall-clock budget; output is a prioritised attention list.

You do not need carefully checked findings yet — just a list of what to look at first.

Cross-lane consistency is out of scope; retire is deliberately omitted because its cost is not earned back at the exploratory tier.

How:

Dispatch shape:

Split into a few main areas.
Send several AI agents in parallel, one per area, with no extra rules (Way A).
Each agent records its own findings.
Combine the notes and sort priorities together.

Partition the codebase into N coarse-grained areas (one per component or risk surface).
Fan out N inspector sub-agents via the Agent tool — no foundation, no coordination, no retire (Way A regime).
Each lane emits findings independently; no shared schema enforced.
Operator merges raw notes into a single triage list and ranks by inspection (not by gate).

A 9-area scan finishes in 8–10 minutes. Fast, but contradictions slip through.

9-lane scan in 8-10 min wall-clock. Cross-lane contradictions leak unfiltered — the priced-in trade-off of the exploratory tier.
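The Way A dispatch shape above can be sketched as a plain fan-out with a raw merge. This is a minimal illustration, not the Agent tool itself: `inspect_area` is a hypothetical placeholder for a real sub-agent call, and a thread pool stands in for the dispatcher.

```python
from concurrent.futures import ThreadPoolExecutor


def inspect_area(area):
    # Placeholder for a real sub-agent call; returns free-form notes.
    return [f"{area}: look at entry points first"]


def triage_scan(areas):
    """Way A: one lane per area, no foundation, no gate, no retire."""
    if not areas:
        return []
    with ThreadPoolExecutor(max_workers=len(areas)) as pool:
        per_lane = list(pool.map(inspect_area, areas))
    # Raw merge only: notes are concatenated, never cross-checked,
    # so contradictory notes from two lanes both survive.
    return [note for notes in per_lane for note in notes]
```

Ranking the merged list is left to the operator, which matches the "by inspection, not by gate" step.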

Checking a payment or login system for security holes

Security audit of a payment or auth backend

Use this when checking payment or login systems, where design-vs-code mismatches become security holes.

Use this when auditing a payment / auth backend where spec-vs-implementation divergence is itself the threat vector.

Quality is non-negotiable here.

Ship-gate tolerates zero hidden contradictions; retire is non-optional before sign-off.

How:

Dispatch shape:

Turn the rules on (Way B).
Have foundation agents agree on the shape of the data and lock it.
Dispatch inspectors in parallel; each ties findings back to specific code lines.
Pair contradicting observations and resolve them explicitly.
Produce a gap map for the next round.

Activate Way-B regime: foundation prelude + parallel issue + in-order retire all on.
Foundation prelude: sub-agents read the spec, agree on schema + invariants, lock the shared model before any inspector starts.
Parallel issue: fan out 5 inspector sub-agents (Entry 06 n=5). Each finding must ground to a file:line citation; ungrounded claims rejected upstream.
Retire — consistency gate: a serial pass pairs cross-lane contradictions ('spec says BCrypt' vs 'impl uses SHA-256') and resolves them explicitly.
Retire — completeness critic: emits a gap map (inspected vs skipped) that seeds the next iteration's dispatch plan.

24 minutes elapsed, +118% grounded citations, −40% guesswork, zero hidden contradictions.

Entry 06 A/B (n=8): 24 min wall-clock. +118% grounded refs, −40% speculative claims, 0 hidden contradictions. Wall-clock delta = prelude + retire cost, not parallelism loss.
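The two retire steps above can be sketched as follows, assuming each finding carries a topic label and a file:line citation. The `Finding` fields, the grouping-by-topic rule, and the file paths in the usage example are illustrative assumptions; the spec's real schema is whatever the foundation prelude locks.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    lane: str
    topic: str      # what the claim is about, e.g. "password-hash"
    claim: str      # what the lane observed
    citation: str   # file:line grounding; empty means ungrounded


def consistency_gate(findings):
    """Return topics on which grounded findings make different claims."""
    by_topic = defaultdict(list)
    for f in findings:
        if not f.citation:
            continue  # ungrounded claims never reach the gate
        by_topic[f.topic].append(f)
    return {t: fs for t, fs in by_topic.items()
            if len({f.claim for f in fs}) > 1}


def gap_map(all_regions, findings):
    """Return the regions that no grounded finding cites, in input order."""
    inspected = {f.citation.split(":")[0] for f in findings if f.citation}
    return [r for r in all_regions if r not in inspected]
```

Given one lane reporting `BCrypt` from a spec file and another reporting `SHA-256` from the implementation, both under the topic `password-hash`, `consistency_gate` returns that topic with both findings paired. That pairing is the step the naïve arm never performs.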

Reading a lot of files at once (no writing)

Read-only sweep above the issue_width cap

Use this when you need more parallel agents than usual — and every agent is only reading, not writing.

Use this when you want to dispatch above the issue_width cap and can statically prove every lane is read-only.

Read-only means no clash to worry about.

Read-only lanes are write-disjoint by construction; the WAW / WAR hazard class the issue_width cap bounds does not exist here.

How:

Dispatch shape:

Check the recommended cap (say, 6 agents).
Confirm the host can handle one more (say, 7).
Dispatch all 7 read-only agents in parallel.

Compute the issue_width cap from §2's formula for the read-only workload class (e.g., w=6).
Verify host headroom: host can sustain w+1 concurrent lanes without thrash (memory, FDs, rate-limit).
Statically prove every lane is read-only (no Edit / Write / side-effecting Bash); dispatch all w+1 lanes under §2's read-only admission exception.

Wider coverage in the same time, with no quality drop.

Wider coverage at the same wall-clock, no quality regression. issue_width cap bounds write-hazard probability; under proven read-only semantics that probability is zero, so the cap is safely relaxed.
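A conservative version of the read-only check can be sketched as an allowlist over the tools each lane is granted. The allowlist contents, the decision to treat all Bash as write-capable, and the function names are assumptions made for this sketch; §2 defines the actual exception.

```python
# Assumed allowlist: any tool not listed (Edit, Write, Bash, ...) is
# treated as write-capable, which is stricter than "side-effecting Bash".
READ_ONLY_TOOLS = frozenset({"Read", "Grep", "Glob"})


def is_read_only(lane_tools):
    return set(lane_tools) <= READ_ONLY_TOOLS


def admitted_width(lanes, issue_width, host_headroom):
    """Return how many lanes may be in flight at once.

    lanes maps a lane name to the tools granted to that lane.
    """
    requested = len(lanes)
    if requested <= issue_width:
        return requested
    if all(is_read_only(tools) for tools in lanes.values()):
        # Read-only exception: exceed the cap, but never the host limit.
        return min(requested, host_headroom)
    return issue_width
```

With a cap of 6 and host headroom of 7, seven read-only lanes are all admitted; granting `Edit` to any one of them drops the admitted width back to 6.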

Several small fixes in different files, shipped together

Batched independent edits with planned-order retire

Use this when you have 4-5 small edits in different files that need to ship together.

Use this when 4-5 independent edits ship together and touch disjoint files.

Parallel for speed, but clean history in planned order.

Goal: parallel issue for throughput + in-order retire so the merge log reads in planned order regardless of completion order.

How:

Dispatch shape:

Write the desired merge order in the work list.
Give each edit its own private working folder (worktree).
Run all edits at once.
Merge in planned order, not completion order.

Record the intended merge order in the work list — the retire-stage ordering invariant, fixed at issue time.
Provision a dedicated git worktree per edit lane (§3). Each lane writes exclusively to its own worktree → write-disjoint at file granularity.
Dispatch all edit sub-agents concurrently via Workflow.parallel or Agent fan-out; lanes need no coordination because write sets are disjoint.
In-order retire: merge completed worktrees back into main in planned order — never in completion order. Reorder buffer holds early-finishers until predecessor retires.

Edits run in parallel, history reads in planned order. Example: 3 edits merged in 5 minutes.

Parallel issue throughput + planned-order merge log + zero file collisions by construction. Reference run: 3 edits merged in planned order in ~5 min wall-clock.
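The reorder-buffer behaviour described above can be sketched as a small function that receives lanes in completion order and reports what gets merged at each step. The names are hypothetical, and the actual merge (for example a git merge of each worktree branch) is left out.

```python
def retire_in_order(planned_order, completions):
    """Replay completions and return (completed_lane, merged_now) steps.

    A lane is merged only once every lane planned before it has merged;
    early finishers wait in the buffer.
    """
    held = set()      # finished lanes waiting for a predecessor
    next_index = 0    # position of the next lane allowed to retire
    steps = []
    for lane in completions:
        held.add(lane)
        merged_now = []
        while next_index < len(planned_order) and planned_order[next_index] in held:
            head = planned_order[next_index]
            held.remove(head)
            merged_now.append(head)
            next_index += 1
        steps.append((lane, merged_now))
    return steps
```

If `edit-3` finishes first it is held; when `edit-1` and then `edit-2` finish, the merges happen as `edit-1`, `edit-2`, `edit-3`, so the history matches the plan regardless of completion order.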

Reshaping a shared piece that many other pieces depend on

Reshaping a shared dependency with many callers

Use this when reshaping a shared piece many other parts depend on.

Use this when reshaping a shared dependency (function signature, type, API contract) that many callers depend on.

Lock the shape first, or the changes collide.

If caller-site lanes start before the lock, they target stale signatures and retire must discard their work. Lock first, fan out second.

How:

Dispatch shape:

One foundation agent reshapes the piece and locks the new shape.
Do not start dependent edits until the lock is in place.
Then send all dependent agents in parallel to update their callers.

Foundation prelude: one foundation sub-agent reshapes the piece, writes the new signature / type / contract to a lockable artefact, and commits the lock.
Admission barrier: no dependent-edit lane admitted until the foundation lock is in place (§3 admission-gate precondition preventing stale-target dispatch).
Post-lock fan-out: dispatch all caller-site sub-agents in parallel against the locked shape, each updating its own caller code. All lanes read the same locked design → cross-lane consistency at the shared boundary is invariant by construction.

No collisions at the shared boundary. Example: 3 agents, 3 files, zero collisions.

Zero collisions at the shared boundary by construction (not by reconciliation). Reference: 3 sub-agents updated 3 separate caller files in parallel post-lock, 0 shared-boundary collisions.
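The lock-first, fan-out-second ordering can be sketched with an event that caller lanes wait on before doing any work. This is an in-process illustration only: the signature string, the lane names, and the use of `asyncio.Event` as the lock artefact are assumptions, whereas the source describes a committed lock artefact.

```python
import asyncio


async def reshape_with_lock(callers):
    lock_committed = asyncio.Event()
    locked = {}
    log = []

    async def foundation():
        await asyncio.sleep(0.01)  # stand-in for the reshape work
        # Hypothetical new signature written to the lock artefact.
        locked["signature"] = "charge(amount: int, currency: str)"
        log.append("foundation: lock committed")
        lock_committed.set()

    async def caller_lane(name):
        await lock_committed.wait()  # admission barrier: no work before the lock
        log.append(f"{name}: updated against {locked['signature']}")

    await asyncio.gather(foundation(), *(caller_lane(c) for c in callers))
    return log


def run_reshape_demo():
    return asyncio.run(reshape_with_lock(["caller-a", "caller-b", "caller-c"]))
```

Every caller lane reads the same locked signature, so none of them can target a stale shape.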

Want to read more?

Read the full spec

The full rulebook file (Superscalar.md) is in the EstreGenesis project on GitHub.

github.com/SoliEstre/EstreGenesis/blob/main/Superscalar.md →