ExtraTrees
Extremely Randomized Trees (ExtraTrees) is an enhanced variant of Random Forest that introduces complete randomness in split thresholds, further reducing variance and increasing training speed.
Overview
Proposed by Geurts, Ernst, and Wehenkel (2006), ExtraTrees differs from Random Forest in one critical way:
- Random Forest: Searches over candidate split thresholds for each sampled feature and selects the optimal one
- ExtraTrees: Draws one random threshold per sampled feature and keeps the best of those random candidates; no exhaustive search for optimality
This extreme randomization yields lower variance at the cost of a small increase in bias. In practice, ExtraTrees often trains faster and generalizes better on high-dimensional financial datasets.
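The difference is easiest to see side by side. Below is a minimal sketch using scikit-learn's `ExtraTreesClassifier` and `RandomForestClassifier` on synthetic data; the dataset and hyperparameters are illustrative, not tuned.

```python
# Fit ExtraTrees and Random Forest on the same synthetic classification
# task. The only conceptual difference is how each tree picks its split
# thresholds; the API is identical.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

et = ExtraTreesClassifier(n_estimators=200, n_jobs=-1, random_state=0).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0).fit(X_tr, y_tr)

print(f"ExtraTrees accuracy:   {et.score(X_te, y_te):.3f}")
print(f"RandomForest accuracy: {rf.score(X_te, y_te):.3f}")
```

On a given dataset either model may score higher; the paper's claim is about variance and speed, not a uniform accuracy win.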
Original paper: Extremely randomized trees — Geurts, Ernst & Wehenkel, Machine Learning 63(1), 3–42, 2006
Applications in A-Share Quantitative Strategies
1. High-Dimensional Factor Screening
A-Share factor libraries may contain hundreds of raw factors. ExtraTrees handles these high-dimensional, small-sample settings efficiently, making it well suited to rapid feature-importance evaluation across 3,900+ stocks.
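A factor-screening pass can be sketched as follows. The factor matrix here is synthetic and the top-20 cutoff is a hypothetical choice; in practice `X` would be per-stock factor exposures and `y` next-period returns.

```python
# Rank a wide synthetic "factor" matrix by ExtraTrees feature importance.
# Only factors 0 and 1 actually drive the target here, so they should
# surface at the top of the ranking.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
n_stocks, n_factors = 3000, 200
X = rng.standard_normal((n_stocks, n_factors))            # factor exposures
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.standard_normal(n_stocks)

model = ExtraTreesRegressor(n_estimators=200, min_samples_leaf=10,
                            n_jobs=-1, random_state=0).fit(X, y)
top20 = np.argsort(model.feature_importances_)[::-1][:20]  # top-20 factors
print("leading factors by importance:", top20[:5])
```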
2. Intraday Timing
For minute-level OHLCV feature matrices, ExtraTrees' training speed advantage is particularly valuable when models need frequent retraining throughout the trading day.
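A rolling intraday retrain might look like the sketch below. The window sizes, bar counts, and feature matrix are all made-up placeholders; the point is only that refitting ExtraTrees on each trailing window is cheap enough to repeat many times per day.

```python
# Hypothetical rolling retrain on minute-bar features: refit on the
# trailing window, predict the next block, then slide forward.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
n_bars, n_feats = 1200, 20           # e.g. five days of minute-bar features
X = rng.standard_normal((n_bars, n_feats))
y = X[:, 0] + 0.5 * rng.standard_normal(n_bars)

window, step = 480, 240              # trailing window size, retrain frequency
preds = []
for start in range(0, n_bars - window - step + 1, step):
    tr = slice(start, start + window)              # training window
    te = slice(start + window, start + window + step)  # next block to predict
    model = ExtraTreesRegressor(n_estimators=100, n_jobs=-1,
                                random_state=0).fit(X[tr], y[tr])
    preds.append(model.predict(X[te]))
preds = np.concatenate(preds)
print("out-of-sample predictions:", preds.shape)
```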
3. Ensemble with Random Forest
Averaging ExtraTrees and Random Forest predictions (equal weight) typically produces more stable IC values than either model alone — a common ensemble baseline.
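The equal-weight blend above can be sketched directly, scoring the blended forecast with a rank IC (Spearman correlation). The data is synthetic; in a real pipeline `X` would be factor exposures and `y` next-period returns.

```python
# Equal-weight ensemble baseline: average ExtraTrees and Random Forest
# return forecasts, then compute the rank IC of the blend out of sample.
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((1500, 30))
y = X[:, :5].sum(axis=1) + rng.standard_normal(1500)   # signal + noise

X_tr, X_te, y_tr, y_te = X[:1000], X[1000:], y[:1000], y[1000:]
et = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

pred = 0.5 * et.predict(X_te) + 0.5 * rf.predict(X_te)  # equal-weight blend
ic, _ = spearmanr(pred, y_te)
print(f"ensemble rank IC: {ic:.3f}")
```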
Key Parameters (Finance-Recommended)
| Parameter | Description | Recommended |
|---|---|---|
| `n_estimators` | Number of trees | 100–500 |
| `max_features` | Features considered per split | `"sqrt"` (classification) |
| `max_depth` | Maximum tree depth | `None` (grow fully) |
| `min_samples_leaf` | Minimum samples per leaf | 5–20 |
| `bootstrap` | Bootstrap sampling | `False` (default) |
| `n_jobs` | Parallel workers | `-1` (all cores) |
Key Difference from Random Forest
ExtraTrees defaults to bootstrap=False (uses the full dataset), while Random Forest defaults to bootstrap=True. Keep the default in most financial backtesting scenarios.
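This default can be verified directly from scikit-learn without fitting anything:

```python
# Inspect the bootstrap defaults of the two ensembles.
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

print("ExtraTrees bootstrap default:  ", ExtraTreesClassifier().bootstrap)
print("RandomForest bootstrap default:", RandomForestClassifier().bootstrap)
```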
Strengths & Limitations
Strengths:
- Faster training than Random Forest (no exhaustive threshold search)
- Lower variance — better generalization on noisy financial data
- Highly parallelizable with `n_jobs=-1`
Limitations:
- Random thresholds reduce individual tree accuracy — requires more trees to compensate
- Slightly less robust to outliers than Random Forest
