Monte Carlo Tree Search — Decision Making with Random Simulation

Why Monte Carlo Tree Search Matters

Monte Carlo Tree Search (MCTS) is a heuristic search algorithm that made headlines when it powered AlphaGo's victory over world champion Lee Sedol in 2016. MCTS combines the randomness of Monte Carlo methods with tree search to make optimal decisions in complex environments.

Why this matters for your career:

  • MCTS is the foundation of modern game AI (AlphaGo, AlphaZero, Chess engines)
  • Used in robotics for motion planning under uncertainty
  • Applicable to resource allocation, scheduling, and optimization problems
  • Understanding MCTS demonstrates advanced algorithmic thinking in interviews

What Is Monte Carlo Tree Search?

MCTS is a best-first search algorithm that builds a search tree incrementally by running random simulations. It does not require a heuristic evaluation function — it discovers good moves by simulating random playouts.

The Four Steps of MCTS

MCTS proceeds in four phases per iteration:

1. SELECTION:  Traverse the tree from root to a leaf node
2. EXPANSION:  Add one or more child nodes to the tree
3. SIMULATION: Run a random playout from the new node
4. BACKPROPAGATION: Update statistics up the tree

UCB1 Formula (Upper Confidence Bound)

During selection, MCTS chooses the child that maximizes:

UCB1 = w_i / n_i + c * sqrt(ln(N) / n_i)

Where:

  • w_i = number of wins after i-th move
  • n_i = number of simulations after i-th move
  • N = total number of simulations for the parent node
  • c = exploration parameter (typically sqrt(2))

This formula balances exploration (trying less-visited moves) and exploitation (choosing moves that have performed well).

Implementation

import math
import random

class MCTSNode:
    def __init__(self, state, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move
        self.children = []
        self.visits = 0
        self.wins = 0
        self.untried_moves = state.get_valid_moves()

    def ucb1(self, c=math.sqrt(2)):
        if self.visits == 0:
            return float('inf')
        exploitation = self.wins / self.visits
        exploration = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploitation + exploration

    def best_child(self, c=math.sqrt(2)):
        return max(self.children, key=lambda child: child.ucb1(c))

    def expand(self):
        move = self.untried_moves.pop()
        next_state = self.state.apply_move(move)
        child = MCTSNode(next_state, parent=self, move=move)
        self.children.append(child)
        return child

    def simulate(self):
        """Random playout from this node's state."""
        state = self.state
        while not state.is_terminal():
            move = random.choice(state.get_valid_moves())
            state = state.apply_move(move)
        return state.get_result()

    def backpropagate(self, result):
        self.visits += 1
        self.wins += result
        if self.parent:
            self.parent.backpropagate(result)


def mcts(root, iterations=1000):
    """Run MCTS for a given number of iterations and return the best move."""
    for _ in range(iterations):
        node = root

        # Selection
        while node.untried_moves == [] and node.children != []:
            node = node.best_child()

        # Expansion
        if node.untried_moves:
            node = node.expand()

        # Simulation
        result = node.simulate()

        # Backpropagation
        node.backpropagate(result)

    # Return best move based on visit count
    best = max(root.children, key=lambda child: child.visits)
    return best.move

Example: Tic-Tac-Toe with MCTS

class TicTacToe:
    def __init__(self):
        self.board = [None] * 9
        self.current_player = 'X'

    def get_valid_moves(self):
        return [i for i, cell in enumerate(self.board) if cell is None]

    def apply_move(self, move):
        new_game = TicTacToe()
        new_game.board = list(self.board)
        new_game.board[move] = self.current_player
        new_game.current_player = 'O' if self.current_player == 'X' else 'X'
        return new_game

    def is_terminal(self):
        return self.get_winner() is not None or None not in self.board

    def get_winner(self):
        lines = [
            [0,1,2], [3,4,5], [6,7,8],  # rows
            [0,3,6], [1,4,7], [2,5,8],  # columns
            [0,4,8], [2,4,6]            # diagonals
        ]
        for line in lines:
            vals = [self.board[i] for i in line]
            if vals[0] == vals[1] == vals[2] and vals[0] is not None:
                return vals[0]
        return None

    def get_result(self):
        winner = self.get_winner()
        if winner == 'X':
            return 1
        elif winner == 'O':
            return -1
        else:
            return 0

# Play: X uses MCTS, O uses random
root = MCTSNode(TicTacToe())
best_move = mcts(root, iterations=5000)
print(f"MCTS recommends move at position: {best_move}")

Advantages Over Traditional Search

| Aspect | Traditional Minimax | MCTS | |--------|-------------------|------| | Needs heuristic evaluation | Yes (hand-crafted) | No (learns from simulation) | | Branching factor limit | Limited (cannot handle Go) | Handles very large branching factors | | Domain knowledge required | Extensive | Minimal (only rules needed) | | Parallelization | Difficult | Trivially parallelizable | | Anytime algorithm | No (needs full search) | Yes (returns best-so-far anytime) | | Memory usage | Grows exponentially with depth | Grows linearly with iterations |

Real-World Applications

| Application | How MCTS Is Used | |-------------|-----------------| | AlphaGo / AlphaZero | MCTS combined with neural networks for Go and Chess | | Video Game AI | Unit positioning, resource management, strategy | | Robotics | Motion planning in uncertain environments | | Scheduling | Job shop scheduling, resource allocation | | Protein Folding | Predicting protein structures | | Transportation | Route planning under uncertain traffic conditions | | Drug Discovery | Molecule optimization and search |

Tuning MCTS Parameters

| Parameter | Effect | Typical Value | |-----------|--------|--------------| | Iterations | Higher = better decisions, slower | 100-100000 depending on time budget | | Exploration constant (c) | Higher = more exploration | sqrt(2) ≈ 1.414 | | Simulation policy | Random vs. light heuristic | Random (simple) or rule-based (better) | | Tree reuse | Keep tree between moves | Keep root = previous best child |

Summary

Monte Carlo Tree Search is a powerful decision-making algorithm that combines random simulation with incremental tree search. It requires no domain-specific heuristics, handles enormous branching factors, and can be stopped anytime for a reasonable answer. Its success in AlphaGo and AlphaZero has made it one of the most important algorithms in modern AI.

Key takeaways:

  • MCTS uses four steps per iteration: Selection, Expansion, Simulation, Backpropagation
  • UCB1 balances exploration vs. exploitation
  • No heuristic evaluation function needed — learns through simulation
  • Trivially parallelizable — each simulation is independent
  • Anytime algorithm — returns best move whenever stopped
  • Used in AlphaGo, video games, robotics, and scheduling

What's Next: Probability Analysis

The next chapter covers probability analysis of randomized algorithms — understanding expected runtime, high-probability bounds, and how to analyze algorithm behavior.

Unlock Full Tutorial

This chapter is paid content. Join the project to unlock over 5000 words of deep analysis, including 10+ god-tier Prompts and real Source Code examples!