Random Forests Pseudocode
Random Forests is a powerful ensemble learning method that builds multiple decision trees during training and outputs the mode (classification) or mean prediction (regression) of the individual trees. It reduces overfitting and improves generalization compared to single decision trees.
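As a quick, concrete illustration of the idea, here is a minimal sketch using scikit-learn (assumed installed); the dataset and parameters are arbitrary choices for demonstration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset for illustration only
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Single tree vs. an ensemble of 100 trees
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)
print(f"single tree: {tree_acc:.3f}")
print(f"forest:      {forest_acc:.3f}")
```

On most runs the forest generalizes better than the single tree, though the gap depends on the data.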
The Algorithm
Random Forest Training
ALGORITHM: Random Forest Training
═════════════════════════════════════════════════════════════════
INPUT:
    X: Training feature matrix of shape (n_samples, n_features)
    y: Training labels of shape (n_samples,)
    n_trees: Number of trees in the forest (e.g., 100)
    max_features: Number of features to consider at each split
        (default: √n_features for classification, n_features/3 for regression)
    max_depth: Maximum depth of each tree (optional)
    min_samples_split: Minimum samples to split a node
    bootstrap: Whether to use bootstrap sampling (default: True)
OUTPUT:
    forest: List of trained decision trees
PROCEDURE:
    forest ← empty list
    FOR i = 1 TO n_trees:
        1. // Create bootstrap sample (sampling with replacement)
        IF bootstrap:
            indices ← sample_with_replacement(range(n_samples), size=n_samples)
            X_boot ← X[indices]
            y_boot ← y[indices]
        ELSE:
            X_boot ← X
            y_boot ← y
        2. // Train decision tree with random feature selection
        tree ← DecisionTree(
            max_depth = max_depth,
            min_samples_split = min_samples_split,
            max_features = max_features,   // KEY: random feature subset at each split
            splitter = "random_subset"
        )
        tree.fit(X_boot, y_boot)
        3. APPEND tree to forest
    RETURN forest

Modified Tree Building (Random Feature Selection)
ALGORITHM: Random Forest Tree Split
═════════════════════════════════════════════════════════════════
This modifies the standard decision tree split selection:

FUNCTION find_best_split(X, y, max_features):
    1. // Randomly select a subset of features
    all_features ← [0, 1, 2, ..., n_features-1]
    candidate_features ← random_sample(all_features, size=max_features, replace=False)
    2. // Find best split among only the candidate features
    best_gain ← -∞
    best_feature ← None
    best_threshold ← None
    FOR each feature f in candidate_features:   // NOT all features!
        FOR each threshold t in candidate_thresholds(f):
            gain ← compute_information_gain(X, y, f, t)
            IF gain > best_gain:
                best_gain ← gain
                best_feature ← f
                best_threshold ← t
    3. RETURN best_feature, best_threshold

Random Forest Prediction
ALGORITHM: Random Forest Classification Prediction
═════════════════════════════════════════════════════════════════
INPUT:
    x: Sample feature vector
    forest: List of trained decision trees
OUTPUT:
    predicted_class: Final predicted class
    probabilities: Class probability distribution
PROCEDURE:
    1. // Collect predictions from all trees
    votes ← empty dictionary
    FOR each tree in forest:
        prediction ← tree.predict(x)
        votes[prediction] ← votes.get(prediction, 0) + 1
    2. // Majority voting
    predicted_class ← argmax(votes)
    3. // Compute probabilities
    total_votes ← sum(votes.values())
    FOR each class c:
        probabilities[c] ← votes.get(c, 0) / total_votes
    4. RETURN predicted_class, probabilities
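The training loop and majority-vote prediction above can be sketched in Python, leaning on scikit-learn's `DecisionTreeClassifier` (whose `max_features` option performs the per-split random feature subsetting). The function names and dataset are illustrative, not a reference implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, max_features="sqrt", seed=0):
    """Bagging loop: bootstrap sample + randomized tree, n_trees times."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    forest = []
    for _ in range(n_trees):
        # Sample row indices with replacement (the bootstrap sample)
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeClassifier(max_features=max_features,
                                      random_state=int(rng.integers(1 << 31)))
        tree.fit(X[idx], y[idx])
        forest.append(tree)
    return forest

def predict_vote(forest, x):
    """Majority vote plus a vote-fraction probability estimate."""
    votes = {}
    for tree in forest:
        pred = tree.predict(x.reshape(1, -1))[0]
        votes[pred] = votes.get(pred, 0) + 1
    total = sum(votes.values())
    probabilities = {c: v / total for c, v in votes.items()}
    return max(votes, key=votes.get), probabilities

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
forest = fit_forest(X, y)
label, probs = predict_vote(forest, X[0])
```

Note that production implementations typically average per-tree class probabilities rather than counting hard votes, which gives smoother probability estimates.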
ALGORITHM: Random Forest Regression Prediction
═════════════════════════════════════════════════════════════════
INPUT:
    x: Sample feature vector
    forest: List of trained decision trees
OUTPUT:
    predicted_value: Averaged prediction
PROCEDURE:
    predictions ← empty list
    FOR each tree in forest:
        pred ← tree.predict(x)
        APPEND pred to predictions
    predicted_value ← mean(predictions)
    RETURN predicted_value

Out-of-Bag (OOB) Error Estimation
ALGORITHM: Out-of-Bag Error Estimation
═════════════════════════════════════════════════════════════════
// OOB samples are those NOT included in the bootstrap sample for each tree
// They provide a "free" validation estimate
PROCEDURE:
    oob_predictions ← empty dictionary   // sample_idx → list of predictions
    FOR i, tree in enumerate(forest):
        // Find samples NOT in the bootstrap sample for this tree
        oob_indices ← samples not in bootstrap_sample[i]
        FOR idx in oob_indices:
            pred ← tree.predict(X[idx])
            APPEND pred to oob_predictions[idx]   // create the list on first use
    // Aggregate OOB predictions (over samples that were OOB for at least one tree)
    oob_error ← 0
    FOR idx in oob_predictions:
        IF classification:
            final_pred ← majority_vote(oob_predictions[idx])
            IF final_pred ≠ y[idx]:
                oob_error ← oob_error + 1
        ELSE:   // regression: accumulate squared error instead of a miscount
            final_pred ← mean(oob_predictions[idx])
            oob_error ← oob_error + (final_pred - y[idx])²
    oob_error ← oob_error / count(oob_predictions)
    RETURN oob_error

Feature Importance
ALGORITHM: Feature Importance (Mean Decrease in Impurity)
═════════════════════════════════════════════════════════════════
PROCEDURE:
    feature_importances ← zeros(n_features)
    FOR each tree in forest:
        FOR each node in tree:
            IF node is internal (not a leaf):
                // Weighted impurity decrease
                importance ← (node.n_samples / total_samples) ×
                             (node.impurity -
                              (left.n_samples / node.n_samples) × left.impurity -
                              (right.n_samples / node.n_samples) × right.impurity)
                feature_importances[node.feature] += importance
    // Normalize
    feature_importances ← feature_importances / sum(feature_importances)
    RETURN feature_importances
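scikit-learn's `feature_importances_` attribute implements this mean-decrease-in-impurity computation, already normalized to sum to 1. A short sketch on a synthetic dataset (with `shuffle=False`, `make_classification` places the informative features in the first columns, so the MDI mass should concentrate there):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 3 informative features (columns 0-2), 7 pure-noise features
X, y = make_classification(n_samples=400, n_features=10, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
mdi = clf.feature_importances_   # MDI, normalized to sum to 1
print(np.round(mdi, 3))
```

A known caveat: MDI is biased toward high-cardinality features, which is one reason the permutation method below is often preferred for interpretation.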
ALGORITHM: Permutation Feature Importance
═════════════════════════════════════════════════════════════════
// More reliable but computationally expensive
PROCEDURE:
    baseline_score ← evaluate(forest, X_test, y_test)
    FOR j = 1 TO n_features:
        X_permuted ← copy(X_test)
        X_permuted[:, j] ← random_permutation(X_test[:, j])
        permuted_score ← evaluate(forest, X_permuted, y_test)
        feature_importances[j] ← baseline_score - permuted_score
    RETURN feature_importances

Mathematical Foundation
Why Random Forests Work
The key insight is variance reduction through averaging. If the $B$ trees were independent, each with variance $\sigma^2$, the variance of their average would be

$$\text{Var}\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right) = \frac{\sigma^2}{B}$$

In practice the trees are identically distributed but correlated, with pairwise correlation $\rho$, which gives

$$\text{Var}\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right) = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$$

As $B \to \infty$ the second term vanishes, so the correlation term $\rho\sigma^2$ dominates. Random feature selection decorrelates the trees, reducing $\rho$ → lower ensemble variance.
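The correlated-average formula can be checked empirically. The sketch below simulates $B$ predictors with unit variance and pairwise correlation $\rho$ by mixing a shared component with independent noise (a standard construction, not part of the original algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
rho, B, n_trials = 0.3, 50, 200_000

# T_b = sqrt(rho)*Z + sqrt(1-rho)*eps_b  =>  Var(T_b) = 1, Corr(T_a, T_b) = rho
z = rng.standard_normal(n_trials)             # shared component
eps = rng.standard_normal((n_trials, B))      # per-predictor noise
trees = np.sqrt(rho) * z[:, None] + np.sqrt(1 - rho) * eps

ensemble = trees.mean(axis=1)
theory = rho + (1 - rho) / B                  # rho*sigma^2 + (1-rho)/B * sigma^2
print(ensemble.var(), theory)                 # empirical value ≈ 0.314
```

The empirical ensemble variance matches $\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$, and stays near $\rho\sigma^2$ no matter how large $B$ grows, which is why reducing $\rho$ matters.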
Bootstrap Aggregating (Bagging)
$$P(\text{sample NOT selected}) = \left(1 - \frac{1}{n}\right)^n \approx e^{-1} \approx 0.368$$

Each tree therefore sees about 63.2% of the distinct training samples; the remaining ~36.8% form its out-of-bag set.

Complexity Analysis
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Training | $O(B \cdot m' \cdot n \log n)$ | $O(B \cdot n)$ |
| Prediction | $O(B \cdot d)$ | $O(1)$ |

Where $B$ = trees, $n$ = samples, $m'$ = max_features, $d$ = tree depth. Training time assumes roughly balanced trees of depth $O(\log n)$; training space is dominated by stored tree nodes, up to $O(n)$ per tree.
Hyperparameters
| Parameter | Effect | Typical Values |
|---|---|---|
| n_trees | More trees → better, but diminishing returns | 100-500 |
| max_features | Lower → less correlation, more bias | √m (clf), m/3 (reg) |
| max_depth | Deeper → more complex trees | None or 10-30 |
| min_samples_split | Higher → more regularization | 2-10 |
| bootstrap | True enables OOB estimation | True |
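The typical values above map directly onto scikit-learn's estimator; a sketch on an arbitrary synthetic dataset, with OOB scoring enabled:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

clf = RandomForestClassifier(
    n_estimators=200,        # n_trees: 100-500 is typical
    max_features="sqrt",     # √m for classification
    max_depth=None,          # grow trees fully
    min_samples_split=2,
    bootstrap=True,          # required for OOB estimation
    oob_score=True,          # "free" validation estimate from OOB samples
    n_jobs=-1,               # trees are independent → train in parallel
    random_state=0,
).fit(X, y)

print(clf.oob_score_)        # OOB accuracy, no separate validation set needed
```

Note that `oob_score=True` requires `bootstrap=True`, since without bootstrap sampling every tree sees every sample and no OOB set exists.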
Key Insights
- Parallelizable: Trees are independent; train on multiple cores.
- Handles High Dimensions: Works well even with many features.
- Built-in Validation: OOB error provides an estimate without a separate validation set.
- Feature Importance: Provides interpretable feature rankings.
- Robust: Less sensitive to outliers and noise than single trees.
- No Scaling Needed: Invariant to monotonic feature transformations.
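On the feature-importance point: the permutation procedure described earlier is available off the shelf as `sklearn.inspection.permutation_importance`. A sketch on held-out data (dataset and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# shuffle=False puts the 3 informative features in columns 0-2
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Each feature is permuted n_repeats times; importance = mean score drop
result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean)
```

Computing the importance on a held-out set (rather than the training set) avoids crediting features the forest merely memorized.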
References
- Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32. https://link.springer.com/article/10.1023/A:1010933404324
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning, Chapter 15. Springer. https://hastie.su.domains/ElemStatLearn/
- scikit-learn Documentation: Random Forests. https://scikit-learn.org/stable/modules/ensemble.html#forests-of-randomized-trees
- Louppe, G. (2014). Understanding Random Forests. PhD Thesis. https://arxiv.org/abs/1407.7502