4.2 Random Forests: The Wisdom of Multiple Trees

Introduction

A Random Forest is an ensemble of multiple decision trees working together to make a decision. Each tree is trained on a different random sample of the data, and the final prediction is obtained by majority voting. It's like consulting a small panel instead of relying on a single opinion.


Activity

Random Forest Visualizer: Many Trees, One Decision

How to Explore It

  1. Define the Forest: Choose how many trees to train (typically 10-100). More trees generally mean higher accuracy but more computation.
  2. Train the Forest: Each tree trains on a bootstrap sample and a random subset of features. Observe the diversity between trees.
  3. Visualize Predictions: For a new point, see how each tree votes. The class with the most votes is the forest's final prediction.

What to watch for: You'll explore how multiple decision trees work together in a Random Forest. You'll see how randomness in training creates diversity among the trees, and how majority voting improves the final accuracy.

Interactive Demonstration

Random Forest Builder

(Interactive widget: a configuration panel, live metrics, a per-tree vote chart you can click to classify a new point, the forest's decision boundary, and the boundaries of the individual trees.)

Key Concepts

How Does a Random Forest Work?

Construction Process

A Random Forest is built in three main steps:

  1. Bootstrap Sampling: For each tree, take a random sample with replacement from the training set (some cases repeat, others are omitted)

  2. Random Feature Selection: At each node of the tree, only consider a random subset of features for the split (typically $\sqrt{n}$ features out of $n$ total)

  3. Majority Voting: To classify a new case, each tree votes and the class with the most votes is chosen

Mathematical formula: For $T$ trees and $K$ classes, the prediction for a case $x$ is:

$$ \hat{y}(x) = \text{argmax}_{k} \sum_{t=1}^{T} \mathbb{1}[h_t(x) = k] $$

where $h_t(x)$ is the prediction of tree $t$ and $\mathbb{1}$ is the indicator function.
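The three construction steps and the voting formula can be sketched directly, using scikit-learn's `DecisionTreeClassifier` as the base learner. This is a minimal from-scratch illustration, not how library forests are implemented; the dataset, tree count, and random seeds are arbitrary choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

T = 25            # number of trees
n = X.shape[0]
trees = []
for t in range(T):
    # Step 1: bootstrap sample, n draws with replacement
    idx = rng.integers(0, n, size=n)
    # Step 2: random feature selection, sqrt(n_features) per split
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=t)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Step 3: majority vote, i.e. argmax_k sum_t 1[h_t(x) = k]
votes = np.stack([tree.predict(X) for tree in trees])   # shape (T, n)
y_hat = np.array([np.bincount(col).argmax() for col in votes.T])
print("forest training accuracy:", (y_hat == y).mean())
```

The `np.bincount(...).argmax()` line is exactly the argmax-over-vote-counts in the formula above, computed per point.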

Advantages of Random Forest

Strengths

Why use multiple trees?

  • Reduces overfitting: A single tree can memorize training data, but averaging many trees generalizes better
  • Handles noisy data: Individual errors compensate each other
  • Robust to outliers: Majority voting is less sensitive to extreme cases
  • Estimates feature importance: Can measure which features are most useful for classification
  • Works well without much tuning: Fewer critical hyperparameters to optimize

Important Parameters

Forest Configuration

The main parameters of a Random Forest are:

  • Number of trees ($T$): Typically 10-500. More trees improve accuracy but increase computation time
  • Maximum depth: Limits the complexity of each tree
  • Minimum samples per node: Controls when to stop splitting
  • Number of features per split: Generally $\sqrt{n}$ for classification, $n/3$ for regression
  • Bootstrap size: Percentage of data for each tree (usually 100%)
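In scikit-learn, each of these parameters maps to a constructor argument of `RandomForestClassifier`. A sketch with illustrative values on synthetic data (the specific numbers are arbitrary, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100,      # number of trees T
    max_depth=None,        # maximum depth (None = grow until pure)
    min_samples_split=2,   # minimum samples needed to split a node
    max_features="sqrt",   # sqrt(n) features considered per split
    max_samples=None,      # bootstrap size (None = 100% of the data)
    bootstrap=True,
    random_state=0,
)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```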

Comparison: Single Tree vs. Forest

| Aspect | Decision Tree | Random Forest |
| --- | --- | --- |
| Interpretability | High (you can follow each rule) | Medium (it's a set of trees) |
| Accuracy | Good | Excellent |
| Overfitting | Prone | Highly resistant |
| Training time | Fast | Slower (trains multiple trees) |
| Prediction time | Very fast | Moderate (consults multiple trees) |
| Robustness | Sensitive to data changes | Very robust |

When to use Random Forest?

Random Forest is ideal when:

  • You need high accuracy
  • You have enough data (at least hundreds of cases)
  • It's not critical to explain every decision in detail
  • You want a robust model that works well "out of the box"

Use a single tree when:

  • Interpretability is critical (e.g., strict compliance requirements)
  • You have little data
  • You need extremely fast real-time predictions
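A quick way to see the trade-off is to fit both models on the same noisy data. In this sketch, `flip_y=0.1` injects 10% label noise so a single deep tree tends to memorize mislabeled points; the dataset and parameters are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise (flip_y)
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("tree   test accuracy:", tree.score(X_te, y_te))
print("forest test accuracy:", forest.score(X_te, y_te))
```

On held-out data the forest typically scores noticeably higher, reflecting the overfitting row of the table above.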

Out-of-Bag (OOB) Error

Built-in Validation

A unique feature of Random Forest is the Out-of-Bag (OOB) error:

  • Each tree trains on ~63% of the data: a bootstrap sample of size $n$ leaves out any given point with probability $(1 - 1/n)^n \approx e^{-1} \approx 0.37$
  • The remaining ~37% are "out-of-bag" data for that tree
  • Generalization error can be estimated from OOB predictions without needing a separate validation set

This makes Random Forest especially useful when data is limited, as it leverages the entire set for training and validation simultaneously.
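In scikit-learn, passing `oob_score=True` computes this estimate during `fit`. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Each point is scored only by the trees that did NOT see it
# in their bootstrap sample (~37% of the trees, on average)
forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                bootstrap=True, random_state=0)
forest.fit(X, y)
print("OOB accuracy estimate:", forest.oob_score_)
```

`oob_score_` is usually close to what cross-validation would report, at no extra data cost.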

Practical Applications

Random Forests are often used for:

  • Routing and categorization: Assign items to teams/queues based on multiple signals
  • Risk or propensity scoring: Combine many weak signals robustly
  • Feature importance exploration: Identify which inputs are most predictive (with caveats)

They’re a strong default when you need solid performance without extensive tuning.
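Feature importance exploration (with the caveat noted above: impurity-based importances can be biased toward high-cardinality features) is exposed via `feature_importances_` in scikit-learn. A sketch where, by construction, only the first three columns carry signal:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 3 informative features out of 10; shuffle=False keeps them first
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; higher = more useful for splits
imp = forest.feature_importances_
for i in np.argsort(imp)[::-1][:3]:
    print(f"feature {i}: importance {imp[i]:.3f}")
```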

Experiment

Use the interactive demonstration to:

  1. Observe how different numbers of trees affect accuracy
  2. See the diversity between trees in the forest
  3. Compare predictions of individual trees vs. the complete forest
  4. Understand how majority voting smooths decisions

Practical Note

Although Random Forest is very powerful, it consumes more memory and computation time than a single tree. In latency-sensitive systems, you may need to balance accuracy with response speed.