# AutoML — Automated Machine Learning
AutoML automates model selection, feature engineering, and hyperparameter optimization — enabling rapid construction of high-performance predictive models for A-Share quantitative strategies without manual tuning.
## Overview

AutoML automates the following pipeline stages:
- Feature Preprocessing: Normalization, imputation, encoding
- Model Selection: Searches across candidate algorithms (RF, XGBoost, LightGBM, linear models, etc.)
- Hyperparameter Optimization (HPO): Bayesian optimization, Hyperband, TPE
- Ensemble Learning: Weighted combination of multiple models (Stacking/Voting)
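The four stages can be sketched end to end with plain scikit-learn as a hand-rolled stand-in (synthetic data, fixed candidate set; a real AutoML framework automates this search):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Stage 1: feature preprocessing (imputation + normalization)
preprocess = make_pipeline(SimpleImputer(), StandardScaler())

# Stage 2: model selection across candidate algorithms
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
    "gb": GradientBoostingClassifier(random_state=0),
}
scores = {
    name: cross_val_score(make_pipeline(preprocess, est), X, y,
                          cv=5, scoring="roc_auc").mean()
    for name, est in candidates.items()
}
best = max(scores, key=scores.get)

# Stage 3 (HPO) would refine the winner; Stage 4 combines all candidates
ensemble = VotingClassifier(
    [(n, make_pipeline(preprocess, e)) for n, e in candidates.items()],
    voting="soft",
)
ensemble.fit(X, y)
print(best, round(scores[best], 3))
```

In a real framework, stage 2 searches a far larger space and stage 3 tunes each candidate's hyperparameters under a time budget.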
Major frameworks:
| Framework | Highlights | Official Link |
|---|---|---|
| auto-sklearn | scikit-learn based, Bayesian optimization + ensemble | auto-sklearn |
| FLAML | Microsoft, resource-aware, extremely fast | FLAML |
| Optuna | HPO framework, works with any model | Optuna |
| H2O AutoML | Enterprise-grade, large-scale distributed | H2O.ai |
FLAML paper: FLAML: A Fast and Lightweight AutoML Library — Wang et al., MLSys 2021
## Applications in A-Share Quantitative Strategies

### 1. Fast Factor Baseline Evaluation
Before committing to deep research on a new factor, use AutoML to quickly assess its predictive power as an objective baseline. This avoids selection bias from experience-driven or manual tuning.
### 2. Model Search Replacing Manual Tuning

Use FLAML to search over XGBoost/LightGBM/CatBoost hyperparameter spaces with AUC or IC as the objective:

```python
import flaml

automl = flaml.AutoML()
automl.fit(
    X_train, y_train,
    task="classification",
    metric="roc_auc",
    time_budget=300,  # find the best model within 5 minutes
)
print(automl.best_estimator)
print(automl.best_config)
```

### 3. Ensemble for Stability
auto-sklearn and H2O AutoML natively support ensemble output, automatically weighting models such as LightGBM, RandomForest, and LogisticRegression. The resulting IC is typically more stable than that of any single model.
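A comparable ensemble can be sketched with scikit-learn's `StackingClassifier` (a stand-in for the frameworks' built-in ensembling; `GradientBoostingClassifier` substitutes for LightGBM here, and the data is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=15, random_state=1)

# Base learners mirror a typical AutoML ensemble; a logistic meta-learner
# weights their out-of-fold predictions
stack = StackingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(random_state=1)),  # LightGBM stand-in
        ("rf", RandomForestClassifier(n_estimators=200, random_state=1)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
auc = cross_val_score(stack, X, y, cv=3, scoring="roc_auc").mean()
print(round(auc, 3))
```

Fitting the meta-learner on out-of-fold predictions (the `cv=5` inside the stacker) is what keeps the combination from simply memorizing the base models' training errors.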
### 4. Time-Series Cross-Validation (Critical!)

Financial time-series data must not be split randomly, since random folds leak future information into training. Configure AutoML with a custom CV generator:

```python
from sklearn.model_selection import TimeSeriesSplit

cv = TimeSeriesSplit(n_splits=5)
automl.fit(X, y, eval_method="cv", split_type=cv)
```

## Framework Comparison
| Feature | auto-sklearn | FLAML | Optuna |
|---|---|---|---|
| Model selection | ✅ Auto | ✅ Auto | ❌ Manual |
| HPO strategy | Bayesian + ensemble | Resource-aware | TPE/CMA-ES |
| Speed | Slow (resource-heavy) | Fastest | Medium |
| Ensemble output | ✅ | ✅ | ❌ |
| Time-series CV | Custom needed | Custom needed | Trivial (user-defined objective) |
## Strengths & Limitations

Strengths:
- Dramatically reduces tuning time and lowers the risk of human-induced overfitting
- Systematic search typically finds better configurations than manual tuning
- FLAML can complete a search on a ~1000-stock dataset in under 5 minutes
Limitations:
- Financial time series require careful CV configuration; framework defaults are often inappropriate
- Black-box search makes overfitting harder to diagnose and control
- Ensemble models are slow to serve, making them ill-suited to real-time signal generation
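The CV caveat is easy to verify at the index level: `TimeSeriesSplit` never trains on rows that come after the test fold, while a shuffled `KFold` routinely does (scikit-learn only, toy index array):

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

n = 20
idx = np.arange(n)  # rows in chronological order

# TimeSeriesSplit: every training index precedes every test index
for train, test in TimeSeriesSplit(n_splits=4).split(idx):
    assert train.max() < test.min()

# Shuffled KFold: training folds routinely contain future rows
leaks = sum(train.max() > test.min()
            for train, test in KFold(n_splits=4, shuffle=True,
                                     random_state=0).split(idx))
print(leaks)  # > 0: look-ahead leakage with a random split
```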
