Practical Parallelism With Claude Code Part 2: The Ensemble Method

  11 min read

In Part 1, we looked at parallel task assignment: different agents tackling different components. There’s another flavor of parallelism waiting in the wings. The ensemble method runs the same prompt through multiple independent Claude Code instances, leans into the stochastic nature of AI generation, and picks the best result from the bunch.

This isn’t just theoretical; sampling several candidates and keeping the best (often called best-of-n) is a recognized approach in AI research and production systems for improving output quality through controlled variability.

Why the Ensemble Method Works #

Large language models are inherently stochastic. Even with identical prompts, each run produces different outputs due to:

  • Temperature and nucleus sampling, which deliberately pick among several plausible next tokens instead of always the single most likely one
  • Early token choices that lead each run to emphasize different aspects of the problem
  • Generation path divergence, where those small early differences compound into major variations by the end of the output
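
You can see this from the shell. Here is a minimal sketch, assuming your Claude Code version supports the non-interactive -p/--print flag: ask the same question three times and compare the answers.

# Same prompt, three independent runs -- expect three differently worded answers
for i in 1 2 3; do
  echo "=== run $i ==="
  claude -p "In two sentences, what is the biggest risk of hashing passwords with MD5?"
done

Each run is a fresh instance, unanchored by the others’ phrasing; that independence is exactly what the ensemble method exploits.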

What tends to be treated as a liability becomes a secret weapon. The ensemble method not only tolerates variability, it thrives on it.

Each time you run the same prompt through a fresh Claude instance, you’re tossing a new voice into the chorus. With a bit of luck, one of them sings in tune. Think of it like brainstorming with a handful of erratic but sharp interns: most suggestions will be forgettable, one or two might be surprising, and one might actually be usable.

No single review catches everything. One agent may zero in on that SQL injection risk you forgot about. Another might obsess over the caching logic. A third could recommend migrating your entire system to event-driven microservices and then wander off into Kafka metaphors. The point is, they each bring different eyes to the same page.

And there you are in the middle of it all, reviewing the conflicting advice, discarding the nonsense, and plucking out the gold. You get to choose what matters most. Maybe it’s completeness, maybe clarity, maybe sheer audacity. Whatever your criteria, you’re the one making the call, not the model.

The Ensemble Method vs. Parallel Assignment #

| Approach | Use Case | Benefit | Example |
| --- | --- | --- | --- |
| Ensemble | Same complex task | Higher-quality single result | Code review, system design, debugging |
| Parallel Assignment | Different subtasks | Faster completion | API + Frontend + Database components |

Use ensemble when you need the best possible solution. Use parallel when you just need it done.

Practical Example: Multi-Agent Code Review System #

I’ll demonstrate this by building a system where three Claude Code agents perform the same code review on the same codebase, and you compare their findings to select the most comprehensive analysis.

System Architecture #

Our ensemble code review system has these components:

  • Target Codebase: A sample Node.js project with intentional issues (security vulnerabilities, performance problems, code style violations)
  • Three Identical Agents: Each receives the same comprehensive review prompt
  • Review Aggregator: Collects all reviews in a structured format
  • Comparison Dashboard: Web interface to view all three reviews side-by-side and select the winner

Why Code Review Works Well for Ensemble:

  • High variability potential: Each agent might focus on different types of issues
  • Clear quality metrics: Easy to judge which review is more thorough or insightful
  • Practical value: Improves real development workflows
  • Objective comparison: Can count issues found, categorize by severity, measure coverage

The Target Codebase #

A sample Express.js API with intentionally problematic code that demonstrates various review opportunities:

// server.js - Intentionally problematic code for review
const express = require('express');
const fs = require('fs');
const crypto = require('crypto');

const app = express();
app.use(express.json());

// Issues: No rate limiting, no input validation, SQL injection vulnerability
app.post('/users', (req, res) => {
  const { name, email, password } = req.body;
  const hashedPassword = crypto.createHash('md5').update(password).digest('hex');

  const query = `INSERT INTO users (name, email, password) VALUES ('${name}', '${email}', '${hashedPassword}')`;
  // Simulate DB query - obvious SQL injection vulnerability
  console.log('Executing:', query);

  res.json({ success: true, user: { name, email } });
});

// Issues: Path traversal vulnerability, synchronous file operations
app.get('/files/:filename', (req, res) => {
  const filename = req.params.filename;
  const filepath = `./uploads/${filename}`;

  try {
    const data = fs.readFileSync(filepath, 'utf8');
    res.send(data);
  } catch (err) {
    res.status(404).send('File not found');
  }
});

// Issues: No error handling, memory leak potential
const cache = new Map();
app.get('/cache/:key', (req, res) => {
  const key = req.params.key;

  if (!cache.has(key)) {
    // Simulate expensive operation
    const value = Math.random().toString(36).repeat(1000);
    cache.set(key, value);
  }

  res.json({ key, value: cache.get(key) });
});

app.listen(3000, () => {
  console.log('Server running on port 3000');
});

This code contains multiple categories of issues that different agents might prioritize differently:

  • Security: SQL injection, path traversal, weak hashing
  • Performance: Synchronous file operations, unbounded cache growth
  • Code Quality: No error handling, missing validation, console.log statements
  • Architecture: No middleware organization, mixed concerns

Setting Up the Ensemble Workflow #

Unlike Part 1’s approach where agents worked on different components, here all three agents review the same codebase. The setup is simpler but the coordination is different:

# Initialize the code review project
mkdir code-review-ensemble && cd code-review-ensemble
git init
echo "node_modules/" > .gitignore

# Create the target codebase (the problematic server.js above)
# We'll create a complete package.json and additional files too

# Commit it -- branches and worktrees need at least one commit to point at
git add .
git commit -m "Add target codebase for review"

# Set up three identical review branches (git branch rather than checkout,
# so none of them is checked out here and blocks its worktree below)
git branch review-agent-1
git branch review-agent-2
git branch review-agent-3

# Each worktree checks out its own branch from the same commit,
# so all three reviewers start from an identical codebase
git worktree add ../review-1 review-agent-1
git worktree add ../review-2 review-agent-2
git worktree add ../review-3 review-agent-3
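
Before launching anything, it’s worth a quick sanity check that the three worktrees exist and share the same starting commit:

# Three worktrees, one shared starting commit
git worktree list
for i in 1 2 3; do git -C ../review-$i log --oneline -1; done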

Launch the tmux ensemble session:

# Create a tmux session with three reviewer windows
tmux new-session -d -s code-review-ensemble

tmux new-window -t code-review-ensemble:2 -n "reviewer-1"
tmux new-window -t code-review-ensemble:3 -n "reviewer-2"
tmux new-window -t code-review-ensemble:4 -n "reviewer-3"

# Navigate each to its worktree
tmux send-keys -t code-review-ensemble:2 "cd ../review-1" Enter
tmux send-keys -t code-review-ensemble:3 "cd ../review-2" Enter
tmux send-keys -t code-review-ensemble:4 "cd ../review-3" Enter

# Launch Claude Code in each window
tmux send-keys -t code-review-ensemble:2 "claude" Enter
tmux send-keys -t code-review-ensemble:3 "claude" Enter
tmux send-keys -t code-review-ensemble:4 "claude" Enter

# Attach to session
tmux attach-session -t code-review-ensemble

Key Difference from Part 1: All agents start with identical codebases and receive identical prompts. The variation comes from the stochastic nature of AI generation, not different starting conditions.

The Ensemble Review Prompt #

Now give each agent the exact same comprehensive review prompt. The magic happens when identical prompts produce different insights:

Prompt for All Three Agents:

Perform a comprehensive code review of this Node.js Express application. Analyze the codebase for:

  1. Security Vulnerabilities

    • SQL injection risks
    • Path traversal vulnerabilities
    • Authentication/authorization issues
    • Input validation gaps
    • Cryptographic weaknesses
  2. Performance Issues

    • Blocking operations
    • Memory leaks
    • Inefficient algorithms
    • Resource management problems
    • Scaling bottlenecks
  3. Code Quality Problems

    • Error handling deficiencies
    • Code organization issues
    • Missing validations
    • Maintainability concerns
    • Testing gaps
  4. Architecture & Design

    • Separation of concerns
    • Middleware usage
    • API design patterns
    • Configuration management
    • Dependency management

Output Format Requirements:

Create a structured review report with:

  • Executive summary of overall code health
  • Issues categorized by severity (Critical, High, Medium, Low)
  • Specific line numbers and code snippets for each issue
  • Detailed remediation suggestions with code examples
  • Priority ranking for addressing issues
  • Estimated effort for each fix

Save your review as: code-review-report.md

Focus on being thorough and practical. Include both immediate fixes and longer-term architectural improvements.

This single prompt will generate three different reviews because each Claude Code instance will:

  • Prioritize different issue categories (one might focus more on security, another on performance)
  • Catch different specific issues within the same categories
  • Suggest different solutions for the same problems
  • Organize findings differently even with the same format requirements
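
Pasting the prompt into three tmux windows works fine, but you can also script the fan-out. Here is a sketch, assuming the prompt has been saved to a review-prompt.md file in the main project directory and that your Claude Code version supports the -p/--print flag; it captures each report from stdout rather than relying on the agent to write the file itself.

# Run the identical prompt in every worktree, in parallel
prompt="$(cat review-prompt.md)"
for i in 1 2 3; do
  (cd ../review-$i && claude -p "$prompt" > code-review-report.md) &
done
wait  # block until all three reviews finish

The manual tmux route still has one advantage: you can watch each agent work and answer any permission prompts interactively.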

Building the Comparison Dashboard #

After all three agents complete their reviews, you need a way to compare them side-by-side. First, gather the review files and then build the dashboard in the main project directory.

Prompt for Dashboard Agent:

Create a web-based code review comparison dashboard with these requirements:

  1. Layout Structure:

    • Three-column layout showing all reviews simultaneously
    • Expandable sections for each review category (Security, Performance, etc.)
    • Highlight unique issues found by only one reviewer
    • Show common issues found by multiple reviewers
  2. Comparison Features:

    • Issue count summary for each reviewer
    • Severity distribution charts (Critical/High/Medium/Low)
    • Side-by-side code snippets for the same issues
    • “Winner selection” buttons for each category or overall
  3. Data Integration:

    • Parse markdown review files from local directory: review-1.md, review-2.md, review-3.md
    • Extract structured data (issues, severity, line numbers, suggestions)
    • Handle different markdown formatting styles gracefully
    • Use fetch() to load the review files (requires them to be in the same directory as the HTML file)
  4. Selection Interface:

    • Allow users to mark their preferred review for each category
    • Export a “best of” combined review
    • Save selection decisions for future reference
  5. Technical Requirements:

    • Single HTML file with embedded CSS/JavaScript
    • Responsive design for different screen sizes
    • Works with local files when served via a simple HTTP server
    • Print-friendly styles for documentation

Create comparison-dashboard.html in the main project directory so it can be served with python -m http.server.

Running the Ensemble Review #

Once your setup is complete, the workflow becomes:

  1. Execute reviews simultaneously:

    # Switch between tmux windows and run the identical prompt in each
    # Ctrl-b 2 → paste prompt → wait for completion (reviewer-1)
    # Ctrl-b 3 → paste prompt → wait for completion (reviewer-2)
    # Ctrl-b 4 → paste prompt → wait for completion (reviewer-3)
    
  2. Gather the review files:

    # Return to the main project directory (from a review worktree it's a sibling)
    cd ../code-review-ensemble
    
    # Copy review files to main directory for dashboard access
    cp ../review-1/code-review-report.md ./review-1.md
    cp ../review-2/code-review-report.md ./review-2.md
    cp ../review-3/code-review-report.md ./review-3.md
    
  3. Create the comparison dashboard:

    # Run the dashboard creation prompt in main directory
    claude
    # (paste the dashboard prompt into claude)
    
  4. Compare results:

    # Serve the dashboard via HTTP to avoid CORS issues
    python -m http.server 8000
    
    # Open dashboard in browser
    open http://localhost:8000/comparison-dashboard.html
    
    # Or examine files directly if preferred
    diff review-1.md review-2.md
    wc -l review-*.md  # Compare lengths
    
  5. Select the winner: Based on your criteria (thoroughness, practical suggestions, coverage, clarity), choose the best review or combine insights from multiple reviews.
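
Before opening the dashboard, a rough pass over the raw files already tells you something. Here is a small sketch, assuming the reviews follow the severity labels requested in the prompt: count how many lines in each report mention each severity.

# Rough comparison: lines per report mentioning each severity label
for sev in Critical High Medium Low; do
  printf '%-10s' "$sev"
  for i in 1 2 3; do
    printf 'review-%d: %3d   ' "$i" "$(grep -ci "$sev" review-$i.md)"
  done
  echo
done

Counts are only a proxy (a short review that nails the two critical issues can beat a longer one), but they help you decide which report to read first.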

What the Ensemble Method Reveals #

When you run this system, you’ll typically see fascinating variations like:

Agent 1 might produce:

  • 15 security issues, 8 performance issues, 12 code quality issues
  • Deep focus on SQL injection with detailed prepared statement examples
  • Brief mentions of caching problems

Agent 2 might produce:

  • 11 security issues, 14 performance issues, 10 code quality issues
  • Comprehensive analysis of the unbounded cache growth
  • Detailed async/await refactoring suggestions
  • Less emphasis on input validation

Agent 3 might produce:

  • 13 security issues, 6 performance issues, 18 code quality issues
  • Extensive architectural recommendations
  • Testing strategy suggestions not mentioned by others
  • Different approach to error handling patterns

The ensemble advantage: You get coverage across all these areas plus the ability to choose the best insights from each review.

Committing and Selecting Results #

After completing the reviews:

# Commit each agent's work
cd ../review-1 && git add . && git commit -m "Agent 1 comprehensive code review"
cd ../review-2 && git add . && git commit -m "Agent 2 comprehensive code review"
cd ../review-3 && git add . && git commit -m "Agent 3 comprehensive code review"

# Return to main project
cd ../code-review-ensemble

# Create a "winner" branch for the selected review
git checkout -b best-review

# Copy the winning review (or create a combined one)
cp ../review-2/code-review-report.md ./final-review.md

# Or use Claude Code to merge insights from multiple reviews:
claude

Prompt for combining reviews:

I have three code reviews from different agents for the same codebase. Please create a comprehensive “best of” review that:

  1. Combines unique insights from each review
  2. Removes duplicate findings
  3. Prioritizes the most critical issues first
  4. Uses the clearest explanations and best code examples
  5. Maintains consistent formatting and organization

The three review files are: ../review-1/code-review-report.md, ../review-2/code-review-report.md, ../review-3/code-review-report.md
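
If you’d rather not run the merge interactively, the same idea works in print mode. A sketch, assuming the -p/--print flag is available and that your version accepts piped input on stdin, which also avoids the agent needing file-read permissions:

# Feed all three reviews in on stdin and capture the merged result
cat ../review-1/code-review-report.md \
    ../review-2/code-review-report.md \
    ../review-3/code-review-report.md \
  | claude -p "Merge these three code reviews into one 'best of' report: deduplicate findings, keep the clearest explanation and best code example for each issue, and order by severity." \
  > final-review.md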

When to Use the Ensemble Method #

The ensemble approach has clear trade-offs that determine when it’s worth the extra cost and complexity:

Use Ensemble When: #

High-stakes decisions: Critical system design, security reviews, or architecture choices where the cost of mistakes is high.

Complex problem domains: Code review, debugging difficult issues, system design, or any task where multiple valid approaches exist.

Quality over speed: When you need the best possible result and have time/budget for multiple attempts.

Learning and comparison: When you want to understand the range of possible solutions or improve your prompting technique.

Coverage concerns: When a single review might miss important issues due to attention limitations.

Stick with Single Agent When: #

Simple, well-defined tasks: Basic CRUD implementations, straightforward bug fixes, or routine refactoring.

Tight deadlines: When time-to-result matters more than optimal quality.

Cost sensitivity: Each additional agent run costs more in API usage.

Consistent output needed: When you need predictable, standardized results rather than creative variation.

Cost-Benefit Analysis #

For our code review example:

Costs:

  • Roughly 3x the compute/API costs
  • More of your time spent reading and comparing three reports (the agents themselves run in parallel)
  • Additional complexity in comparison and selection
  • More tmux/git overhead

Benefits:

  • Higher chance of catching critical security issues
  • More comprehensive coverage across different problem categories
  • Better solution diversity (multiple ways to fix the same problem)
  • Learning opportunity to see different approaches

Break-even point: When the value of catching one additional critical issue exceeds the cost of running two extra reviews.

For a production system review, this almost always favors the ensemble approach. For routine maintenance tasks, single-agent reviews are usually sufficient.

Key Takeaways #

The ensemble method transforms AI stochasticity from a bug into a feature. By embracing rather than fighting the variability in AI outputs, you can:

  1. Improve quality through multiple perspectives on the same problem
  2. Increase coverage by having different agents focus on different aspects
  3. Generate better solutions by comparing multiple approaches
  4. Learn about AI behavior by observing how the same prompt produces different results

Combined with the parallel task assignment from Part 1, you now have two complementary approaches to AI-assisted development:

  • Ensemble method: Same task, multiple agents, select the best result
  • Parallel assignment: Different tasks, multiple agents, combine all results

Both leverage tmux and git worktrees to manage complexity, but serve different goals in your development workflow.

AI-assisted development is moving past autocomplete. The real win is better decisions, made with help from a panel of tireless, mildly opinionated machines.