4.2 Random Forests: The Wisdom of Multiple Trees
Introduction
A Random Forest is an ensemble of multiple decision trees working together to make a decision. Each tree is trained on a different random sample of the data, and the final prediction is obtained by majority voting. It's like consulting a small panel instead of relying on a single opinion.
Activity
Random Forest Visualizer: Many Trees, One Decision
How to Explore It
- Define the Forest: Choose how many trees to train (typically 10-100). More trees generally mean higher accuracy but more computation.
- Train the Forest: Each tree trains on a bootstrap sample and a random subset of features. Observe the diversity between trees.
- Visualize Predictions: For a new point, see how each tree votes. The class with most votes is the forest's final prediction.
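The exploration above can be reproduced in code. A minimal sketch with scikit-learn, where the dataset and the specific tree counts are illustrative assumptions rather than the demo's own values:

```python
# Sketch of the visualizer's idea: test accuracy as the number of trees grows.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for n_trees in (1, 10, 50, 100):
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    forest.fit(X_train, y_train)
    scores[n_trees] = forest.score(X_test, y_test)

print(scores)  # accuracy typically rises with more trees, then plateaus
```

The plateau is the usual pattern: beyond a certain forest size, extra trees add computation but little accuracy.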
Interactive Demonstration
[Interactive widget: Random Forest Builder, with configuration and metrics panels, per-tree vote display, the forest's decision boundary, and views of the individual trees]
Key Concepts
How Does a Random Forest Work?
Construction Process
A Random Forest is built in three main steps:
1. Bootstrap Sampling: for each tree, draw a random sample with replacement from the training set (some cases repeat, others are left out)
2. Random Feature Selection: at each node of the tree, consider only a random subset of features for the split (typically $\sqrt{n}$ of the $n$ total features)
3. Majority Voting: to classify a new case, each tree votes and the class with the most votes wins
Mathematical formula: For $T$ trees and $K$ classes, the prediction for a case $x$ is:
$$ \hat{y}(x) = \text{argmax}_{k} \sum_{t=1}^{T} \mathbb{1}[h_t(x) = k] $$

where $h_t(x)$ is the prediction of tree $t$ and $\mathbb{1}$ is the indicator function.
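The three construction steps and the voting formula can be sketched from scratch. This is a minimal illustration, not a production implementation: the base learner is scikit-learn's `DecisionTreeClassifier`, and the dataset and forest size are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=9, random_state=0)
T = 25  # number of trees (illustrative)

trees = []
for _ in range(T):
    # 1. Bootstrap sampling: draw n cases with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # 2. Random feature selection: sqrt(n) features considered at each split
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[idx], y[idx]))

# 3. Majority voting: argmax_k sum_t 1[h_t(x) = k]
votes = np.stack([t.predict(X) for t in trees])          # shape (T, n_samples)
counts = np.apply_along_axis(np.bincount, 0, votes, minlength=2)
y_hat = counts.argmax(axis=0)                            # class with most votes
print("training accuracy:", (y_hat == y).mean())
```

The `bincount`/`argmax` pair is a direct translation of the formula: count the votes each class $k$ receives across the $T$ trees, then pick the class with the largest count.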
Advantages of Random Forest
Strengths
Why use multiple trees?
- Reduces overfitting: A single tree can memorize training data, but averaging many trees generalizes better
- Handles noisy data: Individual errors compensate each other
- Robust to outliers: Majority voting is less sensitive to extreme cases
- Estimates feature importance: Can measure which features are most useful for classification
- Works well without much tuning: Fewer critical hyperparameters to optimize
Important Parameters
Forest Configuration
The main parameters of a Random Forest are:
- Number of trees ($T$): Typically 10-500. More trees improve accuracy but increase computation time
- Maximum depth: Limits the complexity of each tree
- Minimum samples per node: Controls when to stop splitting
- Number of features per split: Generally $\sqrt{n}$ for classification, $n/3$ for regression
- Bootstrap size: Percentage of data for each tree (usually 100%)
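The parameters above map directly onto scikit-learn's `RandomForestClassifier`. A minimal sketch, with an illustrative dataset and the library's default-style values:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=12, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,       # number of trees T
    max_depth=None,         # maximum depth (None = grow each tree fully)
    min_samples_split=2,    # minimum samples needed to split a node
    max_features="sqrt",    # features considered per split (sqrt(n))
    max_samples=None,       # bootstrap size (None = 100% of the data)
    bootstrap=True,
    random_state=0,
).fit(X, y)

print("training accuracy:", forest.score(X, y))
```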
Comparison: Single Tree vs. Forest
| Aspect | Decision Tree | Random Forest |
|---|---|---|
| Interpretability | High (you can follow each rule) | Medium (it's a set of trees) |
| Accuracy | Good | Excellent |
| Overfitting | Prone | Highly resistant |
| Training time | Fast | Slower (trains multiple trees) |
| Prediction time | Very fast | Moderate (consults multiple trees) |
| Robustness | Sensitive to data changes | Very robust |
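The accuracy and overfitting rows of the table can be checked empirically. A quick comparison on a noisy synthetic dataset (the dataset, noise level, and sizes are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise, where a single tree tends to overfit
X, y = make_classification(n_samples=600, n_features=15, n_informative=5,
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("tree test accuracy:  ", tree.score(X_te, y_te))
print("forest test accuracy:", forest.score(X_te, y_te))
```

On noisy data like this, the forest's test accuracy is typically higher than the single tree's, which tends to fit the flipped labels.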
When to use Random Forest?
Random Forest is ideal when:
- You need high accuracy
- You have enough data (at least hundreds of cases)
- It's not critical to explain every decision in detail
- You want a robust model that works well "out of the box"
Use a single tree when:
- Interpretability is critical (e.g., strict compliance requirements)
- You have little data
- You need extremely fast real-time predictions
Out-of-Bag (OOB) Error
Built-in Validation
A unique feature of Random Forest is the Out-of-Bag (OOB) error:
- Each tree trains on a bootstrap sample that contains ~63% of the distinct cases
- The remaining ~37% are "out-of-bag" data for that tree
- Generalization error can be estimated using OOB predictions without needing a separate validation set
This makes Random Forest especially useful when data is limited, as it leverages the entire set for training and validation simultaneously.
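The ~63% figure comes from the bootstrap itself: a sample of size $n$ drawn with replacement includes each case with probability $1 - (1 - 1/n)^n$, which approaches $1 - 1/e \approx 0.632$. Scikit-learn exposes the OOB estimate via `oob_score=True`; the dataset below is an illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Probability that a given case appears in a bootstrap sample of size n:
n = 1000
print(1 - (1 - 1 / n) ** n)  # close to 1 - 1/e, i.e. ~0.632

X, y = make_classification(n_samples=400, random_state=0)
forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                random_state=0).fit(X, y)
print("OOB accuracy estimate:", forest.oob_score_)
```

The `oob_score_` attribute is exactly the built-in validation described above: each case is scored only by the trees that did not see it during training.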
Practical Applications
Random Forests are often used for:
- Routing and categorization: Assign items to teams/queues based on multiple signals
- Risk or propensity scoring: Combine many weak signals robustly
- Feature importance exploration: Identify which inputs are most predictive (with caveats)
They're a strong default when you need solid performance without extensive tuning.
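For the feature-importance use case, scikit-learn exposes `feature_importances_` after fitting. A minimal sketch on a synthetic dataset where, by construction, only the first three features are informative (dataset and sizes are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# shuffle=False keeps the 3 informative features in columns 0-2
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

for i, imp in enumerate(forest.feature_importances_):
    print(f"feature {i}: {imp:.3f}")  # importances sum to 1
```

One caveat from the list above: these impurity-based importances can be biased toward features with many distinct values, so permutation importance is often used as a cross-check.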
Experiment
Use the interactive demonstration to:
- Observe how different numbers of trees affect accuracy
- See the diversity between trees in the forest
- Compare predictions of individual trees vs. the complete forest
- Understand how majority voting smooths decisions
Practical Note
Although Random Forest is very powerful, it consumes more memory and computation time than a single tree. In latency-sensitive systems, you may need to balance accuracy with response speed.