Practical Parallelism With Claude Code Part 2: The Ensemble Method

  11 min read

In Part 1, we looked at parallel task assignment: different agents tackling different components. There’s another flavor of parallelism waiting in the wings. The ensemble method runs the same prompt through multiple independent Claude Code instances, leans into the stochastic nature of AI generation, and picks the best result from the bunch.

This isn’t just theoretical; sampling several candidates and keeping the best (often called best-of-n) is a recognized approach in AI research and production systems for improving output quality through controlled variability.

Why the Ensemble Method Works #

Large language models are inherently stochastic. Even with identical prompts, each run produces different outputs due to:

  • Temperature and nucleus sampling, which deliberately pick among several plausible next tokens instead of always the single most likely one
  • Early token choices that lead each run to emphasize different aspects of the problem
  • Generation path divergence, where those small early differences compound into major variations by the end of the output
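
You can see this from the shell. Here is a minimal sketch, assuming your Claude Code version supports the non-interactive -p/--print flag: ask the same question three times and compare the answers.

# Same prompt, three independent runs -- expect three differently worded answers
for i in 1 2 3; do
  echo "=== run $i ==="
  claude -p "In two sentences, what is the biggest risk of hashing passwords with MD5?"
done

Each run is a fresh instance, unanchored by the others’ phrasing; that independence is exactly what the ensemble method exploits.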

What tends to be treated as a liability becomes a secret weapon. The ensemble method not only tolerates variability, it thrives on it.

Each time you run the same prompt through a fresh Claude instance, you’re tossing a new voice into the chorus. With a bit of luck, one of them sings in tune. Think of it like brainstorming with a handful of erratic but sharp interns: most suggestions will be forgettable, one or two might be surprising, and one might actually be usable.

No single review catches everything. One agent may zero in on that SQL injection risk you forgot about. Another might obsess over the caching logic. A third could recommend migrating your entire system to event-driven microservices and then wander off into Kafka metaphors. The point is, they each bring different eyes to the same page.

And there you are in the middle of it all, reviewing the conflicting advice, discarding the nonsense, and plucking out the gold. You get to choose what matters most. Maybe it’s completeness, maybe clarity, maybe sheer audacity. Whatever your criteria, you’re the one making the call, not the model.

The Ensemble Method vs. Parallel Assignment #

| Approach | Use Case | Benefit | Example |
| --- | --- | --- | --- |
| Ensemble | Same complex task | Higher-quality single result | Code review, system design, debugging |
| Parallel Assignment | Different subtasks | Faster completion | API + Frontend + Database components |

Use ensemble when you need the best possible solution. Use parallel when you just need it done.

Practical Example: Multi-Agent Code Review System #

I’ll demonstrate this by building a system where three Claude Code agents perform the same code review on the same codebase, and you compare their findings to select the most comprehensive analysis.

System Architecture #

Our ensemble code review system has these components:

  • Target Codebase: A sample Node.js project with intentional issues (security vulnerabilities, performance problems, code style violations)
  • Three Identical Agents: Each receives the same comprehensive review prompt
  • Review Aggregator: Collects all reviews in a structured format
  • Comparison Dashboard: Web interface to view all three reviews side-by-side and select the winner

Why Code Review Works Well for Ensemble:

  • High variability potential: Each agent might focus on different types of issues
  • Clear quality metrics: Easy to judge which review is more thorough or insightful
  • Practical value: Improves real development workflows
  • Objective comparison: Can count issues found, categorize by severity, measure coverage

The Target Codebase #

A sample Express.js API with intentionally problematic code that demonstrates various review opportunities:

// server.js - Intentionally problematic code for review
const express = require('express');
const fs = require('fs');
const crypto = require('crypto');

const app = express();
app.use(express.json());

// Issues: No rate limiting, no input validation, SQL injection vulnerability
app.post('/users', (req, res) => {
  const { name, email, password } = req.body;
  const hashedPassword = crypto.createHash('md5').update(password).digest('hex');

  const query = `INSERT INTO users (name, email, password) VALUES ('${name}', '${email}', '${hashedPassword}')`;
  // Simulate DB query - obvious SQL injection vulnerability
  console.log('Executing:', query);

  res.json({ success: true, user: { name, email } });
});

// Issues: Path traversal vulnerability, synchronous file operations
app.get('/files/:filename', (req, res) => {
  const filename = req.params.filename;
  const filepath = `./uploads/${filename}`;

  try {
    const data = fs.readFileSync(filepath, 'utf8');
    res.send(data);
  } catch (err) {
    res.status(404).send('File not found');
  }
});

// Issues: No error handling, memory leak potential
const cache = new Map();
app.get('/cache/:key', (req, res) => {
  const key = req.params.key;

  if (!cache.has(key)) {
    // Simulate expensive operation
    const value = Math.random().toString(36).repeat(1000);
    cache.set(key, value);
  }

  res.json({ key, value: cache.get(key) });
});

app.listen(3000, () => {
  console.log('Server running on port 3000');
});

This code contains multiple categories of issues that different agents might prioritize differently:

  • Security: SQL injection, path traversal, weak hashing
  • Performance: Synchronous file operations, unbounded cache growth
  • Code Quality: No error handling, missing validation, console.log statements
  • Architecture: No middleware organization, mixed concerns

Setting Up the Ensemble Workflow #

Unlike Part 1’s approach where agents worked on different components, here all three agents review the same codebase. The setup is simpler but the coordination is different:

# Initialize the code review project
mkdir code-review-ensemble && cd code-review-ensemble
git init
echo "node_modules/" > .gitignore

# Create the target codebase (the problematic server.js above)
# We'll create a complete package.json and additional files too

# Commit it -- branches and worktrees need at least one commit to point at
git add .
git commit -m "Add target codebase for review"

# Set up three identical review branches (git branch rather than checkout,
# so none of them is checked out here and blocks its worktree below)
git branch review-agent-1
git branch review-agent-2
git branch review-agent-3

# Each worktree checks out its own branch from the same commit,
# so all three reviewers start from an identical codebase
git worktree add ../review-1 review-agent-1
git worktree add ../review-2 review-agent-2
git worktree add ../review-3 review-agent-3
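
Before launching anything, it’s worth a quick sanity check that the three worktrees exist and share the same starting commit:

# Three worktrees, one shared starting commit
git worktree list
for i in 1 2 3; do git -C ../review-$i log --oneline -1; done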

Launch the tmux ensemble session:

# Create a tmux session with three reviewer windows
tmux new-session -d -s code-review-ensemble

tmux new-window -t code-review-ensemble:2 -n "reviewer-1"
tmux new-window -t code-review-ensemble:3 -n "reviewer-2"
tmux new-window -t code-review-ensemble:4 -n "reviewer-3"

# Navigate each to its worktree
tmux send-keys -t code-review-ensemble:2 "cd ../review-1" Enter
tmux send-keys -t code-review-ensemble:3 "cd ../review-2" Enter
tmux send-keys -t code-review-ensemble:4 "cd ../review-3" Enter

# Launch Claude Code in each window
tmux send-keys -t code-review-ensemble:2 "claude" Enter
tmux send-keys -t code-review-ensemble:3 "claude" Enter
tmux send-keys -t code-review-ensemble:4 "claude" Enter

# Attach to session
tmux attach-session -t code-review-ensemble

Key Difference from Part 1: All agents start with identical codebases and receive identical prompts. The variation comes from the stochastic nature of AI generation, not different starting conditions.

The Ensemble Review Prompt #

Now give each agent the exact same comprehensive review prompt. The magic happens when identical prompts produce different insights:

Prompt for All Three Agents:

Perform a comprehensive code review of this Node.js Express application. Analyze the codebase for:

  1. Security Vulnerabilities

    • SQL injection risks
    • Path traversal vulnerabilities
    • Authentication/authorization issues
    • Input validation gaps
    • Cryptographic weaknesses
  2. Performance Issues

    • Blocking operations
    • Memory leaks
    • Inefficient algorithms
    • Resource management problems
    • Scaling bottlenecks
  3. Code Quality Problems

    • Error handling deficiencies
    • Code organization issues
    • Missing validations
    • Maintainability concerns
    • Testing gaps
  4. Architecture & Design

    • Separation of concerns
    • Middleware usage
    • API design patterns
    • Configuration management
    • Dependency management

Output Format Requirements:

Create a structured review report with:

  • Executive summary of overall code health
  • Issues categorized by severity (Critical, High, Medium, Low)
  • Specific line numbers and code snippets for each issue
  • Detailed remediation suggestions with code examples
  • Priority ranking for addressing issues
  • Estimated effort for each fix

Save your review as: code-review-report.md

Focus on being thorough and practical. Include both immediate fixes and longer-term architectural improvements.

This single prompt will generate three different reviews because each Claude Code instance will:

  • Prioritize different issue categories (one might focus more on security, another on performance)
  • Catch different specific issues within the same categories
  • Suggest different solutions for the same problems
  • Organize findings differently even with the same format requirements
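
Pasting the prompt into three tmux windows works fine, but you can also script the fan-out. Here is a sketch, assuming the prompt has been saved to a review-prompt.md file in the main project directory and that your Claude Code version supports the -p/--print flag; it captures each report from stdout rather than relying on the agent to write the file itself.

# Run the identical prompt in every worktree, in parallel
prompt="$(cat review-prompt.md)"
for i in 1 2 3; do
  (cd ../review-$i && claude -p "$prompt" > code-review-report.md) &
done
wait  # block until all three reviews finish

The manual tmux route still has one advantage: you can watch each agent work and answer any permission prompts interactively.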

Building the Comparison Dashboard #

After all three agents complete their reviews, you need a way to compare them side-by-side. First, gather the review files and then build the dashboard in the main project directory.

Prompt for Dashboard Agent:

Create a web-based code review comparison dashboard with these requirements:

  1. Layout Structure:

    • Three-column layout showing all reviews simultaneously
    • Expandable sections for each review category (Security, Performance, etc.)
    • Highlight unique issues found by only one reviewer
    • Show common issues found by multiple reviewers
  2. Comparison Features:

    • Issue count summary for each reviewer
    • Severity distribution charts (Critical/High/Medium/Low)
    • Side-by-side code snippets for the same issues
    • “Winner selection” buttons for each category or overall
  3. Data Integration:

    • Parse markdown review files from local directory: review-1.md, review-2.md, review-3.md
    • Extract structured data (issues, severity, line numbers, suggestions)
    • Handle different markdown formatting styles gracefully
    • Use fetch() to load the review files (requires them to be in the same directory as the HTML file)
  4. Selection Interface:

    • Allow users to mark their preferred review for each category
    • Export a “best of” combined review
    • Save selection decisions for future reference
  5. Technical Requirements:

    • Single HTML file with embedded CSS/JavaScript
    • Responsive design for different screen sizes
    • Works with local files when served via a simple HTTP server
    • Print-friendly styles for documentation

Create comparison-dashboard.html in the main project directory so it can be served with python -m http.server.

Running the Ensemble Review #

Once your setup is complete, the workflow becomes:

  1. Execute reviews simultaneously:

    # Switch between tmux windows and run the identical prompt in each
    # Ctrl-b 2 → paste prompt → wait for completion (reviewer-1)
    # Ctrl-b 3 → paste prompt → wait for completion (reviewer-2)
    # Ctrl-b 4 → paste prompt → wait for completion (reviewer-3)
    
  2. Gather the review files:

    # Return to the main project directory (from a review worktree it's a sibling)
    cd ../code-review-ensemble
    
    # Copy review files to main directory for dashboard access
    cp ../review-1/code-review-report.md ./review-1.md
    cp ../review-2/code-review-report.md ./review-2.md
    cp ../review-3/code-review-report.md ./review-3.md
    
  3. Create the comparison dashboard:

    # Run the dashboard creation prompt in main directory
    claude
    # (paste the dashboard prompt into claude)
    
  4. Compare results:

    # Serve the dashboard via HTTP to avoid CORS issues
    python -m http.server 8000
    
    # Open dashboard in browser
    open http://localhost:8000/comparison-dashboard.html
    
    # Or examine files directly if preferred
    diff review-1.md review-2.md
    wc -l review-*.md  # Compare lengths
    
  5. Select the winner: Based on your criteria (thoroughness, practical suggestions, coverage, clarity), choose the best review or combine insights from multiple reviews.
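
Before opening the dashboard, a rough pass over the raw files already tells you something. Here is a small sketch, assuming the reviews follow the severity labels requested in the prompt: count how many lines in each report mention each severity.

# Rough comparison: lines per report mentioning each severity label
for sev in Critical High Medium Low; do
  printf '%-10s' "$sev"
  for i in 1 2 3; do
    printf 'review-%d: %3d   ' "$i" "$(grep -ci "$sev" review-$i.md)"
  done
  echo
done

Counts are only a proxy (a short review that nails the two critical issues can beat a longer one), but they help you decide which report to read first.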

What the Ensemble Method Reveals #

When you run this system, you’ll typically see fascinating variations like:

Agent 1 might produce:

  • 15 security issues, 8 performance issues, 12 code quality issues
  • Deep focus on SQL injection with detailed prepared statement examples
  • Brief mentions of caching problems

Agent 2 might produce:

  • 11 security issues, 14 performance issues, 10 code quality issues
  • Comprehensive analysis of the unbounded cache growth
  • Detailed async/await refactoring suggestions
  • Less emphasis on input validation

Agent 3 might produce:

  • 13 security issues, 6 performance issues, 18 code quality issues
  • Extensive architectural recommendations
  • Testing strategy suggestions not mentioned by others
  • Different approach to error handling patterns

The ensemble advantage: You get coverage across all these areas plus the ability to choose the best insights from each review.

Committing and Selecting Results #

After completing the reviews:

# Commit each agent's work
cd ../review-1 && git add . && git commit -m "Agent 1 comprehensive code review"
cd ../review-2 && git add . && git commit -m "Agent 2 comprehensive code review"
cd ../review-3 && git add . && git commit -m "Agent 3 comprehensive code review"

# Return to main project
cd ../code-review-ensemble

# Create a "winner" branch for the selected review
git checkout -b best-review

# Copy the winning review (or create a combined one)
cp ../review-2/code-review-report.md ./final-review.md

# Or use Claude Code to merge insights from multiple reviews:
claude

Prompt for combining reviews:

I have three code reviews from different agents for the same codebase. Please create a comprehensive “best of” review that:

  1. Combines unique insights from each review
  2. Removes duplicate findings
  3. Prioritizes the most critical issues first
  4. Uses the clearest explanations and best code examples
  5. Maintains consistent formatting and organization

The three review files are: ../review-1/code-review-report.md, ../review-2/code-review-report.md, ../review-3/code-review-report.md
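
If you’d rather not run the merge interactively, the same idea works in print mode. A sketch, assuming the -p/--print flag is available and that your version accepts piped input on stdin, which also avoids the agent needing file-read permissions:

# Feed all three reviews in on stdin and capture the merged result
cat ../review-1/code-review-report.md \
    ../review-2/code-review-report.md \
    ../review-3/code-review-report.md \
  | claude -p "Merge these three code reviews into one 'best of' report: deduplicate findings, keep the clearest explanation and best code example for each issue, and order by severity." \
  > final-review.md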

When to Use the Ensemble Method #

The ensemble approach has clear trade-offs that determine when it’s worth the extra cost and complexity:

Use Ensemble When: #

High-stakes decisions: Critical system design, security reviews, or architecture choices where the cost of mistakes is high.

Complex problem domains: Code review, debugging difficult issues, system design, or any task where multiple valid approaches exist.

Quality over speed: When you need the best possible result and have time/budget for multiple attempts.

Learning and comparison: When you want to understand the range of possible solutions or improve your prompting technique.

Coverage concerns: When a single review might miss important issues due to attention limitations.

Stick with Single Agent When: #

Simple, well-defined tasks: Basic CRUD implementations, straightforward bug fixes, or routine refactoring.

Tight deadlines: When time-to-result matters more than optimal quality.

Cost sensitivity: Each additional agent run costs more in API usage.

Consistent output needed: When you need predictable, standardized results rather than creative variation.

Cost-Benefit Analysis #

For our code review example:

Costs:

  • Roughly 3x the compute/API costs
  • More of your time spent reading and comparing three reports (the agents themselves run in parallel)
  • Additional complexity in comparison and selection
  • More tmux/git overhead

Benefits:

  • Higher chance of catching critical security issues
  • More comprehensive coverage across different problem categories
  • Better solution diversity (multiple ways to fix the same problem)
  • Learning opportunity to see different approaches

Break-even point: When the value of catching one additional critical issue exceeds the cost of running two extra reviews.

For a production system review, this almost always favors the ensemble approach. For routine maintenance tasks, single-agent reviews are usually sufficient.

Key Takeaways #

The ensemble method transforms AI stochasticity from a bug into a feature. By embracing rather than fighting the variability in AI outputs, you can:

  1. Improve quality through multiple perspectives on the same problem
  2. Increase coverage by having different agents focus on different aspects
  3. Generate better solutions by comparing multiple approaches
  4. Learn about AI behavior by observing how the same prompt produces different results

Combined with the parallel task assignment from Part 1, you now have two complementary approaches to AI-assisted development:

  • Ensemble method: Same task, multiple agents, select the best result
  • Parallel assignment: Different tasks, multiple agents, combine all results

Both leverage tmux and git worktrees to manage complexity, but serve different goals in your development workflow.

AI-assisted development is moving past autocomplete. The real win is better decisions, made with help from a panel of tireless, mildly opinionated machines.