Monte Carlo Tree Search — Decision Making with Random Simulation
Why Monte Carlo Tree Search Matters
Monte Carlo Tree Search (MCTS) is a heuristic search algorithm that made headlines when it powered AlphaGo's victory over world champion Lee Sedol in 2016. MCTS combines the randomness of Monte Carlo methods with tree search to make optimal decisions in complex environments.
Why this matters for your career:
- MCTS is the foundation of modern game AI (AlphaGo, AlphaZero, Chess engines)
- Used in robotics for motion planning under uncertainty
- Applicable to resource allocation, scheduling, and optimization problems
- Understanding MCTS demonstrates advanced algorithmic thinking in interviews
What Is Monte Carlo Tree Search?
MCTS is a best-first search algorithm that builds a search tree incrementally by running random simulations. It does not require a heuristic evaluation function — it discovers good moves by simulating random playouts.
The Four Steps of MCTS
MCTS proceeds in four phases per iteration:
1. SELECTION: Traverse the tree from root to a leaf node
2. EXPANSION: Add one or more child nodes to the tree
3. SIMULATION: Run a random playout from the new node
4. BACKPROPAGATION: Update statistics up the tree
UCB1 Formula (Upper Confidence Bound)
During selection, MCTS chooses the child that maximizes:
UCB1 = w_i / n_i + c * sqrt(ln(N) / n_i)
Where:
- w_i = number of wins after i-th move
- n_i = number of simulations after i-th move
- N = total number of simulations for the parent node
- c = exploration parameter (typically sqrt(2))
This formula balances exploration (trying less-visited moves) and exploitation (choosing moves that have performed well).
Implementation
import math
import random
class MCTSNode:
def __init__(self, state, parent=None, move=None):
self.state = state
self.parent = parent
self.move = move
self.children = []
self.visits = 0
self.wins = 0
self.untried_moves = state.get_valid_moves()
def ucb1(self, c=math.sqrt(2)):
if self.visits == 0:
return float('inf')
exploitation = self.wins / self.visits
exploration = c * math.sqrt(math.log(self.parent.visits) / self.visits)
return exploitation + exploration
def best_child(self, c=math.sqrt(2)):
return max(self.children, key=lambda child: child.ucb1(c))
def expand(self):
move = self.untried_moves.pop()
next_state = self.state.apply_move(move)
child = MCTSNode(next_state, parent=self, move=move)
self.children.append(child)
return child
def simulate(self):
"""Random playout from this node's state."""
state = self.state
while not state.is_terminal():
move = random.choice(state.get_valid_moves())
state = state.apply_move(move)
return state.get_result()
def backpropagate(self, result):
self.visits += 1
self.wins += result
if self.parent:
self.parent.backpropagate(result)
def mcts(root, iterations=1000):
"""Run MCTS for a given number of iterations and return the best move."""
for _ in range(iterations):
node = root
# Selection
while node.untried_moves == [] and node.children != []:
node = node.best_child()
# Expansion
if node.untried_moves:
node = node.expand()
# Simulation
result = node.simulate()
# Backpropagation
node.backpropagate(result)
# Return best move based on visit count
best = max(root.children, key=lambda child: child.visits)
return best.move
Example: Tic-Tac-Toe with MCTS
class TicTacToe:
def __init__(self):
self.board = [None] * 9
self.current_player = 'X'
def get_valid_moves(self):
return [i for i, cell in enumerate(self.board) if cell is None]
def apply_move(self, move):
new_game = TicTacToe()
new_game.board = list(self.board)
new_game.board[move] = self.current_player
new_game.current_player = 'O' if self.current_player == 'X' else 'X'
return new_game
def is_terminal(self):
return self.get_winner() is not None or None not in self.board
def get_winner(self):
lines = [
[0,1,2], [3,4,5], [6,7,8], # rows
[0,3,6], [1,4,7], [2,5,8], # columns
[0,4,8], [2,4,6] # diagonals
]
for line in lines:
vals = [self.board[i] for i in line]
if vals[0] == vals[1] == vals[2] and vals[0] is not None:
return vals[0]
return None
def get_result(self):
winner = self.get_winner()
if winner == 'X':
return 1
elif winner == 'O':
return -1
else:
return 0
# Play: X uses MCTS, O uses random
root = MCTSNode(TicTacToe())
best_move = mcts(root, iterations=5000)
print(f"MCTS recommends move at position: {best_move}")
Advantages Over Traditional Search
| Aspect | Traditional Minimax | MCTS | |--------|-------------------|------| | Needs heuristic evaluation | Yes (hand-crafted) | No (learns from simulation) | | Branching factor limit | Limited (cannot handle Go) | Handles very large branching factors | | Domain knowledge required | Extensive | Minimal (only rules needed) | | Parallelization | Difficult | Trivially parallelizable | | Anytime algorithm | No (needs full search) | Yes (returns best-so-far anytime) | | Memory usage | Grows exponentially with depth | Grows linearly with iterations |
Real-World Applications
| Application | How MCTS Is Used | |-------------|-----------------| | AlphaGo / AlphaZero | MCTS combined with neural networks for Go and Chess | | Video Game AI | Unit positioning, resource management, strategy | | Robotics | Motion planning in uncertain environments | | Scheduling | Job shop scheduling, resource allocation | | Protein Folding | Predicting protein structures | | Transportation | Route planning under uncertain traffic conditions | | Drug Discovery | Molecule optimization and search |
Tuning MCTS Parameters
| Parameter | Effect | Typical Value | |-----------|--------|--------------| | Iterations | Higher = better decisions, slower | 100-100000 depending on time budget | | Exploration constant (c) | Higher = more exploration | sqrt(2) ≈ 1.414 | | Simulation policy | Random vs. light heuristic | Random (simple) or rule-based (better) | | Tree reuse | Keep tree between moves | Keep root = previous best child |
Summary
Monte Carlo Tree Search is a powerful decision-making algorithm that combines random simulation with incremental tree search. It requires no domain-specific heuristics, handles enormous branching factors, and can be stopped anytime for a reasonable answer. Its success in AlphaGo and AlphaZero has made it one of the most important algorithms in modern AI.
Key takeaways:
- MCTS uses four steps per iteration: Selection, Expansion, Simulation, Backpropagation
- UCB1 balances exploration vs. exploitation
- No heuristic evaluation function needed — learns through simulation
- Trivially parallelizable — each simulation is independent
- Anytime algorithm — returns best move whenever stopped
- Used in AlphaGo, video games, robotics, and scheduling
What's Next: Probability Analysis
The next chapter covers probability analysis of randomized algorithms — understanding expected runtime, high-probability bounds, and how to analyze algorithm behavior.