
AutoML — Automated Machine Learning

AutoML automates model selection, feature engineering, and hyperparameter optimization — enabling rapid construction of high-performance predictive models for A-Share quantitative strategies without manual tuning.


Overview

AutoML automates the following pipeline stages:

  1. Feature Preprocessing: Normalization, imputation, encoding
  2. Model Selection: Searches across candidate algorithms (RF, XGBoost, LightGBM, linear models, etc.)
  3. Hyperparameter Optimization (HPO): Bayesian optimization, Hyperband, TPE
  4. Ensemble Learning: Weighted combination of multiple models (Stacking/Voting)
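As a concrete illustration, the four stages map onto a plain scikit-learn pipeline. This is a minimal sketch: the factor and industry column names are hypothetical, and an AutoML framework would search stages 2–4 automatically instead of fixing them by hand.

```python
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

# Stage 1: preprocessing — imputation + normalization for numeric factors,
# one-hot encoding for categoricals (column names are illustrative).
numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                    ("scale", StandardScaler())])
preprocess = ColumnTransformer([
    ("num", numeric, ["momentum", "volatility"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["industry"]),
])

# Stages 2–4 collapsed into one hand-built soft-voting ensemble over two
# candidate models; AutoML replaces this with an automated search.
model = VotingClassifier([
    ("rf", RandomForestClassifier(n_estimators=200)),
    ("lr", LogisticRegression(max_iter=1000)),
], voting="soft")

pipeline = Pipeline([("prep", preprocess), ("model", model)])
```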

Major frameworks:

| Framework | Highlights | Official Link |
|---|---|---|
| auto-sklearn | scikit-learn based, Bayesian optimization + ensemble | auto-sklearn |
| FLAML | Microsoft, resource-aware, extremely fast | FLAML |
| Optuna | HPO framework, works with any model | Optuna |
| H2O AutoML | Enterprise-grade, large-scale distributed | H2O.ai |

FLAML paper: FLAML: A Fast and Lightweight AutoML Library — Wang et al., MLSys 2021


Applications in A-Share Quantitative Strategies

1. Fast Factor Baseline Evaluation

Before committing to deep research on a new factor, use AutoML to quickly assess its predictive power as an objective baseline. This avoids selection bias from experience-driven or manual tuning.
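As a sketch of what such an objective baseline looks like, the hypothetical helper below (scikit-learn and SciPy only) computes two quick numbers for a candidate factor: its rank IC against forward returns, and the out-of-sample AUC of a deliberately plain model on a time-ordered split. Any tuned model should beat this floor.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def factor_baseline(factor, forward_returns):
    """Two quick baselines for a candidate factor:
    (1) rank IC (Spearman correlation) against forward returns,
    (2) out-of-sample AUC of a plain logistic model on a
        time-ordered 70/30 split (no shuffling)."""
    ic, _ = spearmanr(factor, forward_returns)
    cut = int(len(factor) * 0.7)  # time-ordered split point
    X = np.asarray(factor).reshape(-1, 1)
    y = (np.asarray(forward_returns) > 0).astype(int)
    model = LogisticRegression().fit(X[:cut], y[:cut])
    auc = roc_auc_score(y[cut:], model.predict_proba(X[cut:])[:, 1])
    return ic, auc
```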

2. Model Search Replacing Manual Tuning

Use FLAML to search over XGBoost/LightGBM/CatBoost hyperparameter spaces with AUC or IC as the objective:

```python
import flaml

automl = flaml.AutoML()
automl.fit(
    X_train, y_train,
    task="classification",
    metric="roc_auc",
    time_budget=300,  # wall-clock budget in seconds: find the best model within 5 minutes
)
print(automl.best_estimator)  # name of the winning learner, e.g. "lgbm"
print(automl.best_config)     # its tuned hyperparameters
```
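When IC rather than AUC is the objective, FLAML accepts a custom metric function. The sketch below assumes FLAML's documented custom-metric signature (a callable returning a loss to minimize plus a dict of values to log); `ic_metric` is an illustrative name.

```python
import numpy as np
from scipy.stats import spearmanr

def ic_metric(X_val, y_val, estimator, labels, X_train, y_train,
              weight_val=None, weight_train=None, *args):
    """Rank-IC objective in FLAML's custom-metric form: returns
    (loss_to_minimize, dict_of_metrics_to_log)."""
    preds = estimator.predict(X_val)
    ic, _ = spearmanr(preds, y_val)
    return -ic, {"rank_ic": ic}  # FLAML minimizes, so negate the IC

# automl.fit(X_train, y_train, task="regression",
#            metric=ic_metric, time_budget=300)
```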

3. Ensemble for Stability

auto-sklearn and H2O AutoML natively produce ensemble output — automatically weighting models such as LightGBM, RandomForest, and LogisticRegression. The resulting IC is typically more stable than that of any single model.
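The stacking idea these frameworks automate can be sketched with scikit-learn alone — base models' out-of-fold predictions feed a final meta-learner. GradientBoosting stands in for LightGBM here to keep the sketch dependency-free.

```python
from sklearn.ensemble import (StackingClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.linear_model import LogisticRegression

stack = StackingClassifier(
    estimators=[
        ("gbm", GradientBoostingClassifier()),  # stand-in for LightGBM
        ("rf", RandomForestClassifier(n_estimators=100)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,  # out-of-fold predictions avoid leaking training labels into the meta-learner
)
```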

4. Time-Series Cross-Validation (Critical!)

Financial time-series data must not use random splits. Configure AutoML with a custom CV generator:

```python
from sklearn.model_selection import TimeSeriesSplit

cv = TimeSeriesSplit(n_splits=5)
# Pass the splitter directly; FLAML also supports split_type="time"
# for a simple ordered holdout on time-sorted data.
automl.fit(X, y, eval_method="cv", split_type=cv)
```
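To see why this matters: TimeSeriesSplit guarantees every training fold ends before its validation fold begins, so no future information leaks into training — unlike a random split, which mixes future and past rows.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

cv = TimeSeriesSplit(n_splits=3)
for train_idx, val_idx in cv.split(np.arange(12)):
    # Training always ends strictly before validation starts.
    print(train_idx.max(), "<", val_idx.min())
# → prints "2 < 3", "5 < 6", "8 < 9"
```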

Framework Comparison

| Feature | auto-sklearn | FLAML | Optuna |
|---|---|---|---|
| Model selection | ✅ Auto | ✅ Auto | ❌ Manual |
| HPO strategy | Bayesian + ensemble | Resource-aware | TPE/CMA-ES |
| Speed | Slow (resource-heavy) | Fastest | Medium |
| Ensemble output | ✅ Native | ✅ Optional (`ensemble=True`) | ❌ |
| Time-series CV | Custom needed | Custom needed | Native support |

Strengths & Limitations

Strengths:

  • Dramatically reduces tuning time and lowers human-induced overfitting risk
  • Systematic search typically finds better configurations than manual tuning
  • With a modest time budget, FLAML can typically complete a useful search on a cross-sectional dataset of ~1000 stocks within minutes

Limitations:

  • Financial time-series requires careful CV configuration — framework defaults are often inappropriate
  • Black-box search makes it harder to control overfitting direction
  • Ensemble models are slow to serve — not suitable for real-time signal generation
