Superscalar

A rulebook for running multiple AI agents in parallel without losing quality.

Execution-scheduling discipline for parallel sub-agent dispatch.

v0.4.2 · §11 Entry 01-06 (n=8) · §3.1 Hyperbrief interlock · Apache-2.0

What Superscalar is

Overview

Superscalar is a rulebook for running multiple AI coding agents in parallel without losing the consistency that pure parallelism gives up.

Superscalar is an execution-scheduling discipline for parallel sub-agent dispatch — bounded parallelism, worktree isolation, in-order retire with a consistency gate. Validated by Entry 06.

Superscalar specifies an execution-scheduling discipline for parallel sub-agent dispatch — bounded parallelism, worktree-isolated reorder buffer, cost-benefit admission, and in-order retire with consistency gate and completeness critic.

The core idea

Central thesis

Pure parallelism gives you speed; Superscalar keeps the parallelism and adds consistency.

Parallelism is necessary but not sufficient; the retire stage is coordination, not scheduling.

Cross-lane consistency arises not from parallelism per se but from the orchestration discipline at the retire stage.

The four rules inside

The four primitives

The four formally specified components

§2

Capping concurrency

issue_width formula

A formula that caps how many agents run at once, sized to the task.

An analytical formula bounding in-flight parallelism per lane class.

A formula that derives, analytically, an upper bound on in-flight parallelism per lane class.
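As a rough illustration of what a cap on in-flight parallelism does at dispatch time, here is a minimal sketch. It does not reproduce the §2 formula, which this page does not state: `issue_width` is simply passed in as a number, and `run_lane`, `dispatch`, and the lane names are hypothetical stand-ins.

```python
import asyncio

async def run_lane(name: str, sem: asyncio.Semaphore, stats: dict) -> str:
    """Run one hypothetical lane while holding an issue slot."""
    async with sem:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # stand-in for the sub-agent's real work
        stats["in_flight"] -= 1
        return f"{name}: done"

async def dispatch(lanes: list[str], issue_width: int) -> tuple[list[str], int]:
    """Dispatch every lane, never letting more than issue_width run at once."""
    sem = asyncio.Semaphore(issue_width)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(run_lane(n, sem, stats) for n in lanes))
    return list(results), stats["peak"]

# Nine lanes, but at most three in flight at any moment (assumed width).
results, peak = asyncio.run(dispatch([f"lane-{i}" for i in range(9)], issue_width=3))
```

The semaphore is only one way to enforce such a bound; the point is that the cap limits concurrency without changing which lanes eventually run.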

§3

Per-agent workspaces

Worktree isolation

Worktree-isolated reorder buffer

Each agent gets its own copy of the files.

Dedicated git worktree per write-disjoint sub-agent; underpins speculation and rollback.

A reorder-buffer model assigning isolated worktrees to write-disjoint speculative sub-agents.
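A small sketch of how one worktree per lane might be provisioned. It only builds the `git worktree add` command line and does not execute it; the `lane/<name>` branch naming, the `../wt-<name>` path layout, and the helper name are assumptions made for illustration, not conventions taken from the spec.

```python
def worktree_add_cmd(repo_root: str, lane: str) -> list[str]:
    """Build the git command that gives one lane its own worktree and branch.

    Branch and directory naming here are hypothetical.
    """
    return [
        "git", "-C", repo_root,      # run git as if started inside the repository
        "worktree", "add",
        "-b", f"lane/{lane}",        # new branch dedicated to this lane
        f"../wt-{lane}",             # sibling directory holding the lane's checkout
    ]

cmd = worktree_add_cmd("/srv/repo", "auth")
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would create the worktree; each lane then writes only inside its own directory.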

§3 retire

The retire stage — merge and check

In-order retire + consistency gate + completeness critic

In-order retire (consistency gate + completeness critic)

Results merged in fixed order, checked for contradictions and gaps.

Deterministic merge + cross-lane contradiction reconciliation + uncovered-region mapping.

The coordination stage at which lane outputs are merged, contradictions reconciled, and uncovered regions enumerated.

§4

Speculative work — cost vs benefit

Opt-in speculation + cost-benefit admission

Opt-in speculation policy

Start work before certainty — when the cost-benefit check says it's worth it.

Speculative dispatch gated by an explicit cost-benefit admission check.

A policy on speculative dispatch admitted by an explicit cost-benefit gate.
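One plausible shape for such an admission gate is an expected-value comparison. The sketch below is an assumption, not the §4 rule: the fields `p_useful`, `saved_minutes`, and `cost_minutes`, and the inequality itself, are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpeculativeLane:
    name: str
    p_useful: float       # assumed probability the speculative result is kept
    saved_minutes: float  # assumed wall-clock saved if the result is kept
    cost_minutes: float   # assumed agent time spent whether or not it is kept

def admit(lane: SpeculativeLane, opted_in: bool) -> bool:
    """Admit a speculative lane only if opted in and expected benefit exceeds cost."""
    if not opted_in:
        return False  # speculation is opt-in: never admitted by default
    return lane.p_useful * lane.saved_minutes > lane.cost_minutes

likely = SpeculativeLane("prefetch-tests", p_useful=0.8, saved_minutes=10, cost_minutes=5)
unlikely = SpeculativeLane("rewrite-docs", p_useful=0.2, saved_minutes=10, cost_minutes=5)
```

Under these made-up numbers, `likely` is admitted when speculation is enabled and `unlikely` is not.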

The headline experiment — same code, two approaches

Entry 06 — the headline A/B

Entry 06 — the headline A/B measurement

We ran the same audit two ways — fast/free vs strict-rules — same number of parallel agents, only the rulebook differed.

Real payment-backend audit, run in parallel both ways: Arm A (naïve max-parallel) vs Arm B (Superscalar discipline). Single variable: orchestration.

A downstream payment-backend dogfood — 9-dimension audit, parallelism held constant, orchestration discipline isolated.

+118%: more sourced citations / grounding (crossRefs) / grounded cross-references
−40%: fewer speculative statements / speculation / speculative claims
2.65×: longer to finish / wall-clock cost
+8%: more tokens used / token cost

The measurements

Quantitative readouts

Quantitative measurements

Table columns: What was measured (Metric) | Way A, simple (Arm A, OFF) | Way B, disciplined (Arm B, ON) | Difference / winner (Δ / Winner)

What only the disciplined approach produced

Arm B-only artifacts — what naïve max-parallel structurally cannot produce

Artifacts unique to Arm B — items naïve max-parallel cannot structurally produce

The load-bearing meta-finding

Two inspectors saw both halves of a real contradiction. Without coordination, it goes uncaught. Same observations, totally different outcome.

Both arms found a real password-hash contradiction. Arm A logged both halves separately. Arm B's consistency gate paired them and resolved it. Without retire, a real security defect would have shipped silently.

Both arms surfaced both halves of a real authentication-hash contradiction. Only Arm B's consistency gate reconciled them.

When to use which approach

The trade-off — when to use which arm

Applicability and trade-offs of each arm

Arm A — naïve max-parallel (Workflows default mode)

Use for a quick first look. Fast, but misses anything that needs connecting two separate observations.

Right for fast first-pass recon: Arm B takes 2.65× as long and uses 8% more tokens. Arm A loses dedup, contradiction resolution, and the completeness map.

Right for fast first-pass reconnaissance.

Arm B — Superscalar discipline

Use when correctness matters more than speed. It takes 2.65× longer, but catches real contradictions. The slowdown comes from agreeing first, not from slow work.

Right for handover-grade audits. 2.65× cost is phase serialisation, NOT parallel inefficiency. Retire = coordination, not scheduling.

Right for handover-grade audits + cross-dimension consistency.

The full picture in one paragraph

Synthesis in a single paragraph

Parallel agents produce parallel reports that don't talk to each other. Superscalar adds an editor role: reads all reports, finds contradictions, dedupes, asks what was missed. The added time isn't wasted — it's the cost of making reports actually fit together.

Superscalar looks like parallelism policy but is coordination policy. Parallelism alone doesn't produce cross-lane consistency. Retire stage is coordination, not scheduling. Wall-clock cost is phase serialisation.

The Superscalar mapping looks like a parallelism policy but is not. The §3 retire stage is a coordination mechanism, not a scheduling mechanism.

When to use Superscalar — five everyday situations

Five canonical dispatch scenarios

Five reference dispatch scenarios

Below are five situations where these rules earn back the extra time they take.

Spectrum from Way-A unconstrained fan-out to Way-B disciplined dispatch — each scenario states regime, shape, and measured trade-off.

Five scenarios delineate the admissible operating envelope from Way-A bypass to Way-B full activation, each specifying regime, topology, and empirically attested trade-off.

From a quick first look to a careful handover check.

Ordered by escalating consistency demand: triage scan → audit → read-only sweep → batched edits → shared reshape. Way-A where contradictions are tolerable, Way-B where ship-gates forbid them.

Monotonically increasing consistency demand. Way-A vs Way-B selection is a function of the downstream ship-gate's admissible contradiction budget.

Quickly looking around unfamiliar code

Triage scan of an unfamiliar codebase

Exploratory triage of an unfamiliar codebase

Use this when someone has just handed you an unfamiliar code project.

Use this when handed an unfamiliar codebase and the goal is a fast structural map, not verified findings.

Applies when the operative objective is a structural map, not verified findings against a normative reference.

You want about an hour of rough overview.

~1h wall-clock budget; output is a prioritised attention list.

~1h wall-clock budget; deliverable is a prioritised attention list, not a graded finding set.

You do not need carefully checked findings yet — just a list of what to look at first.

Cross-lane consistency is out of scope; retire is deliberately omitted because its cost is not earned back at the exploratory tier.

Cross-lane consistency requirement is deliberately suspended; retire omission is the correct operating decision under exploratory amortisation.

How:

  1. Split into a few main areas.
  2. Send several AI agents in parallel, one per area, with no extra rules (Way A).
  3. Each agent records its own findings.
  4. Combine the notes and sort priorities together.

Dispatch shape:

  1. Partition the codebase into N coarse-grained areas (one per component or risk surface).
  2. Fan out N inspector sub-agents via the Agent tool: no foundation, no coordination, no retire (Way A regime).
  3. Each lane emits findings independently; no shared schema enforced.
  4. Operator merges raw notes into a single triage list and ranks by inspection (not by gate).

Dispatch topology:

  1. Codebase partitioned into N coarse-grained regions along natural component / risk-surface boundaries.
  2. N inspector sub-agents dispatched via Agent-tool fan-out under Way-A regime, with foundation, coordination, and retire all deliberately omitted.
  3. Each lane writes to a private scratch surface absent a shared schema; their union constitutes the speculative reorder buffer.
  4. Operator performs manual collation and imposes priority by inspection; consistency gate and completeness critic are not invoked.
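The Way-A steps above can be sketched as a plain fan-out with no shared schema and no retire stage. `inspect_area` is a hypothetical stand-in for an inspector sub-agent, and the area names are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def inspect_area(area: str) -> list[str]:
    """Hypothetical stand-in for one inspector sub-agent; returns free-form notes."""
    return [f"{area}: look at entry points", f"{area}: check error handling"]

def triage(areas: list[str]) -> list[str]:
    """Fan out one lane per area and concatenate the raw notes, with no gate."""
    with ThreadPoolExecutor(max_workers=len(areas)) as pool:
        notes_per_lane = list(pool.map(inspect_area, areas))
    # Raw union of lane outputs; ranking is left to the operator.
    return [note for notes in notes_per_lane for note in notes]

raw_notes = triage(["auth", "billing", "api"])
```

Nothing here compares one lane's notes with another's, which is exactly why contradictions can pass through unnoticed in this regime.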

A 9-area scan finishes in 8–10 minutes. Fast, but contradictions slip through.

9-lane scan in 8-10 min wall-clock. Cross-lane contradictions leak unfiltered — the priced-in trade-off of the exploratory tier.

9-lane scan in 8-10 min wall-clock band. Non-zero unresolved contradiction incidence is the cost basis attributable to retire-stage suspension.

Checking a payment or login system for security holes

Security audit of a payment or auth backend

Spec-grounded security audit of a payment or authentication backend

Use this when checking payment or login systems, where design-vs-code mismatches become security holes.

Use this when auditing a payment / auth backend where spec-vs-implementation divergence is itself the threat vector.

Applies where spec-vs-implementation divergence is itself the threat surface — a vulnerability class undetectable by single-lane inspection of either artefact alone.

Quality is non-negotiable here.

Ship-gate tolerates zero hidden contradictions; retire is non-optional before sign-off.

Ship-gate imposes a contradiction budget of zero; the retire stage with consistency gate and completeness critic is constitutive, not discretionary.

How:

  1. Turn the rules on (Way B).
  2. Have foundation agents agree on the shape of the data and lock it.
  3. Dispatch inspectors in parallel; each ties findings back to specific code lines.
  4. Pair contradicting observations and resolve them explicitly.
  5. Produce a gap map for the next round.

Dispatch shape:

  1. Activate Way-B regime: foundation prelude + parallel issue + in-order retire all on.
  2. Foundation prelude: sub-agents read the spec, agree on schema + invariants, lock the shared model before any inspector starts.
  3. Parallel issue: fan out 5 inspector sub-agents (the inspector count used in Entry 06). Each finding must ground to a file:line citation; ungrounded claims are rejected upstream.
  4. Retire (consistency gate): a serial pass pairs cross-lane contradictions ('spec says BCrypt' vs 'impl uses SHA-256') and resolves them explicitly.
  5. Retire (completeness critic): emits a gap map (inspected vs skipped) that seeds the next iteration's dispatch plan.

Dispatch topology:

  1. Way-B regime engaged in full: foundation prelude, parallel issue, in-order retire all activated without exception.
  2. Foundation prelude: sub-agents establish consensus on schema + invariants and commit the shared model as a lock; no inspector lane admissible until the lock is emplaced (cf. §3.1).
  3. Parallel issue: 5 inspector sub-agents dispatched against the locked design. Every finding constrained to admit a file:line grounding citation; ungrounded propositions inadmissible to the reorder buffer.
  4. Retire (consistency gate): serial reduction over cross-lane finding set, pairing propositionally contradictory observations and committing explicit resolutions.
  5. Retire (completeness critic): emits a gap map characterising inspected vs uninspected regions, constituting the formal seed of the next iteration's dispatch plan.
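The two retire steps above can be sketched roughly as follows. This is a toy model under stated assumptions: the `Finding` fields, the equality-based notion of "contradiction", the file paths, and the planned-subject set are all hypothetical, and a real gate would need semantic comparison rather than string inequality.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Finding:
    lane: str
    subject: str   # what the claim is about
    value: str     # what the lane observed
    citation: str  # file:line grounding (paths below are made up)

def consistency_gate(findings: list[Finding]) -> list[tuple[Finding, Finding]]:
    """Pair findings from different lanes that disagree about the same subject."""
    return [
        (a, b)
        for a, b in combinations(findings, 2)
        if a.subject == b.subject and a.lane != b.lane and a.value != b.value
    ]

def completeness_critic(planned: set[str], findings: list[Finding]) -> dict[str, list[str]]:
    """Return a gap map: which planned subjects were inspected and which were skipped."""
    covered = {f.subject for f in findings}
    return {"inspected": sorted(covered & planned), "skipped": sorted(planned - covered)}

findings = [
    Finding("spec-lane", "password-hash", "BCrypt", "docs/spec.md:41"),
    Finding("impl-lane", "password-hash", "SHA-256", "src/auth.py:88"),
    Finding("impl-lane", "session-ttl", "30m", "src/session.py:12"),
]
contradictions = consistency_gate(findings)
gap_map = completeness_critic({"password-hash", "session-ttl", "rate-limit"}, findings)
```

Both lanes recorded their half of the password-hash mismatch independently; it only becomes a contradiction once a serial pass places the two findings side by side.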

24 minutes elapsed, +118% grounded citations, −40% guesswork, zero hidden contradictions.

Entry 06 A/B (n=8): 24 min wall-clock. +118% grounded refs, −40% speculative claims, 0 hidden contradictions. Wall-clock delta = prelude + retire cost, not parallelism loss.

Entry 06 (controlled A/B, n=8): 24 min wall-clock. +118% grounded refs, −40% speculative claims, 0 hidden contradictions. Wall-clock delta attributable to prelude + retire serialisation, not intra-phase efficiency loss.
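As a quick sanity check on the reported figures, the sketch below derives the Arm A wall-clock implied by the 24 min Arm B run and the 2.65× ratio. It assumes both numbers describe the same Entry 06 run, which this page suggests but does not state outright.

```python
arm_b_minutes = 24.0  # reported Arm B wall-clock
slowdown = 2.65       # reported Arm B / Arm A wall-clock ratio

# Implied Arm A wall-clock if both figures describe the same run (assumption).
implied_arm_a_minutes = arm_b_minutes / slowdown
print(f"implied Arm A wall-clock: {implied_arm_a_minutes:.2f} min")
```

The result falls inside the 8-10 minute band quoted for the 9-lane Way-A scan, so the two sets of numbers are at least mutually consistent.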

Reading a lot of files at once (no writing)

Read-only sweep above the issue_width cap

Read-only sweep with relaxed issue_width admission

Use this when you need more parallel agents than usual — and every agent is only reading, not writing.

Use this when you want to dispatch above the issue_width cap and can statically prove every lane is read-only.

Applies where desired in-flight parallelism exceeds the issue_width bound and every lane is statically provable to operate under read semantics.

Read-only means no clash to worry about.

Read-only lanes are write-disjoint by construction; the WAW / WAR hazard class the issue_width cap bounds does not exist here.

Read-only lanes are write-disjoint by construction; the WAW / WAR hazard classes against which the issue_width bound is the principal protection are absent from the dispatch envelope.

How:

  1. Check the recommended cap (say, 6 agents).
  2. Confirm the host can handle one more (say, 7).
  3. Dispatch all 7 read-only agents in parallel.

Dispatch shape:

  1. Compute the issue_width cap from §2's formula for the read-only workload class (e.g., w=6).
  2. Verify host headroom: host can sustain w+1 concurrent lanes without thrash (memory, FDs, rate-limit).
  3. Statically prove every lane is read-only (no Edit / Write / side-effecting Bash); dispatch all w+1 lanes under §2's read-only admission exception.

Dispatch topology:

  1. Compute nominal admission bound w by §2 issue_width formula instantiated for the read-only class (illustrative: w=6).
  2. Establish host headroom by empirical probe attesting w+1 concurrent lanes sustain without observable thrash (memory, FDs, rate-limit budgets).
  3. Adduce a static read-semantics proof discharging WAW / WAR hazard obligations; dispatch w+1 lanes under §2's read-only exception, the relaxed bound formally justified by hazard-class vacuity.
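The admission logic in the steps above might look like the sketch below. It treats a lane's declared tool list as the "static proof" and conservatively counts any Bash access as side-effecting; both choices, along with the function names and the `host_headroom` parameter, are assumptions for illustration.

```python
# Tools treated as able to write. Bash is included conservatively (assumption).
WRITE_TOOLS = {"Edit", "Write", "Bash"}

def is_read_only(lane_tools: set[str]) -> bool:
    """A lane counts as read-only if it declares none of the write-capable tools."""
    return not (lane_tools & WRITE_TOOLS)

def admitted_width(w: int, lanes: list[set[str]], host_headroom: int) -> int:
    """Return how many lanes may be dispatched at once.

    w is the nominal cap; w + 1 is allowed only when every lane is read-only
    and the host has been checked for at least w + 1 concurrent lanes.
    """
    if all(is_read_only(tools) for tools in lanes) and host_headroom >= w + 1:
        return min(len(lanes), w + 1)
    return min(len(lanes), w)

readers = [{"Read", "Grep"} for _ in range(7)]
mixed = readers[:6] + [{"Read", "Edit"}]
```

With `w=6` and headroom for 7, the all-reader set is admitted at width 7, while a single lane holding `Edit` drops the bound back to 6.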

Wider coverage in the same time, with no quality drop.

Wider coverage at the same wall-clock, no quality regression. issue_width cap bounds write-hazard probability; under proven read-only semantics that probability is zero, so the cap is safely relaxed.

Coverage enlarged at constant wall-clock with no observable regression. issue_width bound exists to bound expected write-hazard incidence; under the static read-only proof that incidence is identically zero, formally establishing admissibility of the relaxed bound.

Several small fixes in different files, shipped together

Batched independent edits with planned-order retire

Worktree-isolated parallel edits with in-order retire

Use this when you have 4-5 small edits in different files that need to ship together.

Use this when 4-5 independent edits ship together and touch disjoint files.

Applies where a finite set of pairwise-independent, write-disjoint edits is to be released as a single bundle.

Parallel for speed, but clean history in planned order.

Goal: parallel issue for throughput + in-order retire so the merge log reads in planned order regardless of completion order.

Operative requirement: parallel issue for throughput conjoined with in-order retire so the merge sequence matches the planned ordering, not the empirical lane-completion ordering.

How:

  1. Write the desired merge order in the work list.
  2. Give each edit its own private working folder (worktree).
  3. Run all edits at once.
  4. Merge in planned order, not completion order.

Dispatch shape:

  1. Record the intended merge order in the work list; this is the retire-stage ordering invariant, fixed at issue time.
  2. Provision a dedicated git worktree per edit lane (§3). Each lane writes exclusively to its own worktree → write-disjoint at file granularity.
  3. Dispatch all edit sub-agents concurrently via Workflow.parallel or Agent fan-out; lanes need no coordination because write sets are disjoint.
  4. In-order retire: merge completed worktrees back into main in planned order, never in completion order. Reorder buffer holds early-finishers until predecessor retires.

Dispatch topology:

  1. Intended merge order recorded at issue time, constituting the retire-stage ordering invariant against which the retire sequence is to be verified.
  2. Dedicated git worktree provisioned per lane in conformity with §3, rendering distinct lanes' write sets pairwise disjoint at file granularity and constituting the per-worktree reorder-buffer slot.
  3. Edit sub-agents dispatched concurrently via Workflow.parallel (equivalently Agent-tool fan-out); no inter-lane coordination requisite by virtue of write-disjointness established at provisioning.
  4. In-order retire: completed worktrees merged in conformity with the planned ordering invariant, not the empirical completion ordering; reorder buffer retains early-completing lanes pending predecessor retirement, preserving the invariant across the bundle.
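The in-order retire step above can be modelled with a small reorder buffer. This sketch only simulates the ordering logic: lane names are hypothetical, and the actual merge of each worktree is left out.

```python
def retire_in_order(planned_order: list[str], completion_order: list[str]) -> list[str]:
    """Return the order in which lanes retire.

    Lanes may complete in any order, but a lane retires only after every
    lane planned before it has retired. Early finishers wait in the buffer.
    """
    buffer: set[str] = set()
    next_idx = 0
    retired: list[str] = []
    for lane in completion_order:
        buffer.add(lane)  # lane finished; hold it until its turn comes
        while next_idx < len(planned_order) and planned_order[next_idx] in buffer:
            buffer.remove(planned_order[next_idx])
            retired.append(planned_order[next_idx])  # the merge would happen here
            next_idx += 1
    return retired

planned = ["edit-a", "edit-b", "edit-c"]
finished = ["edit-c", "edit-a", "edit-b"]  # edit-c finishes first but must wait
retired = retire_in_order(planned, finished)
```

Here `edit-c` completes first yet retires last, so the merge log follows the plan rather than the race between lanes.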

Edits run in parallel, history reads in planned order. Example: 3 edits merged in 5 minutes.

Parallel issue throughput + planned-order merge log + zero file collisions by construction. Reference run: 3 edits merged in planned order in ~5 min wall-clock.

Yield: parallel issue throughput, merge log conforming to the planned ordering invariant, zero file-collision incidence by construction. Representative run: 3 pairwise-independent edits retired in planned order at ~5 min wall-clock.

Reshaping a shared piece that many other pieces depend on

Reshaping a shared dependency with many callers

Reshape of a shared dependency under a foundation-lock prelude

Use this when reshaping a shared piece many other parts depend on.

Use this when reshaping a shared dependency (function signature, type, API contract) that many callers depend on.

Applies to reshape of a shared dependency (signature / type / API contract) on which many caller sites depend, inducing a many-to-one write-conflict topology absent proper ordering.

Lock the shape first, or the changes collide.

If caller-site lanes start before the lock, they target stale signatures and retire must discard their work. Lock first, fan out second.

Absent prior foundation-lock emplacement, parallel caller-site lanes will with non-zero probability target a stale signature; the foundation-lock prelude is therefore a precondition of dependent-edit issue-stage admissibility.

How:

  1. One foundation agent reshapes the piece and locks the new shape.
  2. Do not start dependent edits until the lock is in place.
  3. Then send all dependent agents in parallel to update their callers.

Dispatch shape:

  1. Foundation prelude: one foundation sub-agent reshapes the piece, writes the new signature / type / contract to a lockable artefact, and commits the lock.
  2. Admission barrier: no dependent-edit lane admitted until the foundation lock is in place (§3 admission-gate precondition preventing stale-target dispatch).
  3. Post-lock fan-out: dispatch all caller-site sub-agents in parallel against the locked shape, each updating its own caller code. All lanes read the same locked design → cross-lane consistency at the shared boundary is invariant by construction.

Dispatch topology:

  1. Foundation prelude: single foundation sub-agent effects the reshape, transcribes the new signature / type / contract to a lockable artefact, and commits the lock, the locked artefact constituting the shared ground against which all subsequent caller-site lanes are to be evaluated.
  2. Admission barrier (§3): no dependent-edit lane admissible to issue prior to foundation-lock emplacement; the admission gate thereby discharges the stale-target preclusion obligation.
  3. Post-lock parallel issue: caller-site sub-agents dispatched concurrently against the locked artefact, each updating its dedicated caller code. Since every lane reads the identical locked artefact, cross-lane consistency at the shared boundary is preserved as a structural invariant, not by post hoc reconciliation at retire.
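A minimal sketch of the lock-then-fan-out sequence above. `FoundationLock`, the example signature, the caller file names, and the helper functions are hypothetical; the only point illustrated is that dispatch refuses to start without a committed lock and that every lane reads the same locked value.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FoundationLock:
    signature: str  # the agreed new shape; frozen so no lane can alter it

def update_caller(path: str, lock: FoundationLock) -> str:
    """Hypothetical stand-in for one caller-site sub-agent."""
    return f"{path}: now calls {lock.signature}"

def dispatch_callers(paths: list[str], lock: Optional[FoundationLock]) -> list[str]:
    """Admission barrier: refuse to issue any caller lane before the lock exists."""
    if lock is None:
        raise RuntimeError("admission barrier: foundation lock not committed")
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        return list(pool.map(lambda p: update_caller(p, lock), paths))

lock = FoundationLock(signature="charge(amount_cents, currency)")
updates = dispatch_callers(["checkout.py", "refund.py", "invoice.py"], lock)
```

Because the lock is immutable and passed to every lane, no lane can target a stale signature; consistency at the shared boundary follows from the ordering, not from a later reconciliation pass.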

No collisions at the shared boundary. Example: 3 agents, 3 files, zero collisions.

Zero collisions at the shared boundary by construction (not by reconciliation). Reference: 3 sub-agents updated 3 separate caller files in parallel post-lock, 0 shared-boundary collisions.

Shared-boundary collision incidence zero by construction, not by post hoc reconciliation — attributable to the structural invariant established at foundation-lock prelude. Representative run: 3 caller-site sub-agents effected concurrent updates on 3 distinct caller files post-lock, 0 shared-boundary collisions.

Try it yourself

Install the plugin

Plugin deployment

Add Superscalar to Claude Code as a plugin, and it checks the two key rules before every parallel run.

Setup is two slash commands. No background server, no extra software.

Superscalar ships as a Claude Code plugin wrapping Superscalar.md as a single model-invoked skill (v0.1.2). 0 npm deps. 0 env. Includes the v0.4.1 §3.1 Hyperbrief interlock.

Superscalar ships as a Claude Code plugin wrapping Superscalar.md as a single model-invoked skill (v0.1.2). Zero deps, zero env. Includes v0.4.1 §3.1 Hyperbrief interlock.

# Add the EstreGenesis marketplace:
/plugin marketplace add SoliEstre/EstreGenesis

# Install the superscalar plugin:
/plugin install superscalar@estregenesis-plugins

# No env needed. No npm install needed.
# Optional: install hyperbrief alongside to activate the §3.1 interlock end-to-end.

Want to read more?

Read the full spec

Reference the full specification

The full rulebook file (Superscalar.md) is in the EstreGenesis project on GitHub.

github.com/SoliEstre/EstreGenesis/blob/main/Superscalar.md