4.2 Random Forests: The Wisdom of Multiple Trees
Introduction
A Random Forest is an ensemble of multiple decision trees working together to make a decision. Each tree is trained on a different random sample of the data, and the final prediction is obtained by majority voting. It's like consulting a small panel instead of relying on a single opinion.
Activity
Random Forest Visualizer: Many Trees, One Decision
How to Explore It
- Define the Forest: Choose how many trees to train (typically 10-100). More trees generally mean higher accuracy but more computation.
- Train the Forest: Each tree trains on a bootstrap sample and a random subset of features. Observe the diversity between trees.
- Visualize Predictions: For a new point, see how each tree votes. The class with most votes is the forest's final prediction.
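The exploration above can be reproduced in code. A minimal sketch with scikit-learn, where the dataset and the specific tree counts are illustrative assumptions rather than the demo's own values:

```python
# Sketch of the visualizer's idea: test accuracy as the number of trees grows.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for n_trees in (1, 10, 50, 100):
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    forest.fit(X_train, y_train)
    scores[n_trees] = forest.score(X_test, y_test)

print(scores)  # accuracy typically rises with more trees, then plateaus
```

The plateau is the usual pattern: beyond a certain forest size, extra trees add computation but little accuracy.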
Interactive Demonstration
[Interactive widget: Random Forest Builder, with configuration and metrics panels, per-tree vote display, the forest's decision boundary, and views of the individual trees]
Key Concepts
How Does a Random Forest Work?
Construction Process
A Random Forest is built in three main steps:
1. Bootstrap Sampling: for each tree, draw a random sample with replacement from the training set (some cases repeat, others are left out)
2. Random Feature Selection: at each node of the tree, consider only a random subset of features for the split (typically $\sqrt{n}$ of the $n$ total features)
3. Majority Voting: to classify a new case, each tree votes and the class with the most votes wins
Mathematical formula: For $T$ trees and $K$ classes, the prediction for a case $x$ is:
$$ \hat{y}(x) = \text{argmax}_{k} \sum_{t=1}^{T} \mathbb{1}[h_t(x) = k] $$

where $h_t(x)$ is the prediction of tree $t$ and $\mathbb{1}$ is the indicator function.
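The three construction steps and the voting formula can be sketched from scratch. This is a minimal illustration, not a production implementation: the base learner is scikit-learn's `DecisionTreeClassifier`, and the dataset and forest size are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=9, random_state=0)
T = 25  # number of trees (illustrative)

trees = []
for _ in range(T):
    # 1. Bootstrap sampling: draw n cases with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # 2. Random feature selection: sqrt(n) features considered at each split
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[idx], y[idx]))

# 3. Majority voting: argmax_k sum_t 1[h_t(x) = k]
votes = np.stack([t.predict(X) for t in trees])          # shape (T, n_samples)
counts = np.apply_along_axis(np.bincount, 0, votes, minlength=2)
y_hat = counts.argmax(axis=0)                            # class with most votes
print("training accuracy:", (y_hat == y).mean())
```

The `bincount`/`argmax` pair is a direct translation of the formula: count the votes each class $k$ receives across the $T$ trees, then pick the class with the largest count.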
Advantages of Random Forest
Strengths
Why use multiple trees?
- Reduces overfitting: A single tree can memorize training data, but averaging many trees generalizes better
- Handles noisy data: Individual errors compensate each other
- Robust to outliers: Majority voting is less sensitive to extreme cases
- Estimates feature importance: Can measure which features are most useful for classification
- Works well without much tuning: Fewer critical hyperparameters to optimize
Important Parameters
Forest Configuration
The main parameters of a Random Forest are:
- Number of trees ($T$): Typically 10-500. More trees improve accuracy but increase computation time
- Maximum depth: Limits the complexity of each tree
- Minimum samples per node: Controls when to stop splitting
- Number of features per split: Generally $\sqrt{n}$ for classification, $n/3$ for regression
- Bootstrap size: Percentage of data for each tree (usually 100%)
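The parameters above map directly onto scikit-learn's `RandomForestClassifier`. A minimal sketch, with an illustrative dataset and the library's default-style values:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=12, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,       # number of trees T
    max_depth=None,         # maximum depth (None = grow each tree fully)
    min_samples_split=2,    # minimum samples needed to split a node
    max_features="sqrt",    # features considered per split (sqrt(n))
    max_samples=None,       # bootstrap size (None = 100% of the data)
    bootstrap=True,
    random_state=0,
).fit(X, y)

print("training accuracy:", forest.score(X, y))
```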
Comparison: Single Tree vs. Forest
| Aspect | Decision Tree | Random Forest |
|---|---|---|
| Interpretability | High (you can follow each rule) | Medium (it's a set of trees) |
| Accuracy | Good | Excellent |
| Overfitting | Prone | Highly resistant |
| Training time | Fast | Slower (trains multiple trees) |
| Prediction time | Very fast | Moderate (consults multiple trees) |
| Robustness | Sensitive to data changes | Very robust |
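The accuracy and overfitting rows of the table can be checked empirically. A quick comparison on a noisy synthetic dataset (the dataset, noise level, and sizes are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise, where a single tree tends to overfit
X, y = make_classification(n_samples=600, n_features=15, n_informative=5,
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("tree test accuracy:  ", tree.score(X_te, y_te))
print("forest test accuracy:", forest.score(X_te, y_te))
```

On noisy data like this, the forest's test accuracy is typically higher than the single tree's, which tends to fit the flipped labels.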
When to use Random Forest?
Random Forest is ideal when:
- You need high accuracy
- You have enough data (at least hundreds of cases)
- It's not critical to explain every decision in detail
- You want a robust model that works well "out of the box"
Use a single tree when:
- Interpretability is critical (e.g., strict compliance requirements)
- You have little data
- You need extremely fast real-time predictions
Out-of-Bag (OOB) Error
Built-in Validation
A unique feature of Random Forest is the Out-of-Bag (OOB) error:
- Each tree trains on a bootstrap sample that contains ~63% of the distinct cases
- The remaining ~37% are "out-of-bag" data for that tree
- Generalization error can be estimated using OOB predictions without needing a separate validation set
This makes Random Forest especially useful when data is limited, as it leverages the entire set for training and validation simultaneously.
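The ~63% figure comes from the bootstrap itself: a sample of size $n$ drawn with replacement includes each case with probability $1 - (1 - 1/n)^n$, which approaches $1 - 1/e \approx 0.632$. Scikit-learn exposes the OOB estimate via `oob_score=True`; the dataset below is an illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Probability that a given case appears in a bootstrap sample of size n:
n = 1000
print(1 - (1 - 1 / n) ** n)  # close to 1 - 1/e, i.e. ~0.632

X, y = make_classification(n_samples=400, random_state=0)
forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                random_state=0).fit(X, y)
print("OOB accuracy estimate:", forest.oob_score_)
```

The `oob_score_` attribute is exactly the built-in validation described above: each case is scored only by the trees that did not see it during training.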
Practical Applications
Random Forests are often used for:
- Routing and categorization: Assign items to teams/queues based on multiple signals
- Risk or propensity scoring: Combine many weak signals robustly
- Feature importance exploration: Identify which inputs are most predictive (with caveats)
They're a strong default when you need solid performance without extensive tuning.
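For the feature-importance use case, scikit-learn exposes `feature_importances_` after fitting. A minimal sketch on a synthetic dataset where, by construction, only the first three features are informative (dataset and sizes are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# shuffle=False keeps the 3 informative features in columns 0-2
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

for i, imp in enumerate(forest.feature_importances_):
    print(f"feature {i}: {imp:.3f}")  # importances sum to 1
```

One caveat from the list above: these impurity-based importances can be biased toward features with many distinct values, so permutation importance is often used as a cross-check.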
Experiment
Use the interactive demonstration to:
- Observe how different numbers of trees affect accuracy
- See the diversity between trees in the forest
- Compare predictions of individual trees vs. the complete forest
- Understand how majority voting smooths decisions
Practical Note
Although Random Forest is very powerful, it consumes more memory and computation time than a single tree. In latency-sensitive systems, you may need to balance accuracy with response speed.