
XGBoost

XGBoost (eXtreme Gradient Boosting) is a highly efficient, scalable gradient-boosted decision tree (GBDT) framework, widely used in quantitative finance for stock prediction and factor modeling.


Overview

Introduced by Tianqi Chen and Carlos Guestrin (KDD 2016), XGBoost extends GBDT with system-level optimizations: parallel computation, distributed training, and GPU acceleration. Its objective approximates the loss with a second-order Taylor expansion and adds L1/L2 regularization to control model complexity, making it robust to noisy financial data.
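Concretely, the paper's objective at boosting round t scores a candidate tree f_t using the first and second derivatives (g_i, h_i) of the loss at the previous round's prediction, plus a complexity penalty:

```latex
\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n}\Big[\, g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t^2(x_i) \Big] + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \lVert w \rVert^2
```

Here T is the number of leaves and w the vector of leaf weights; λ corresponds to the reg_lambda parameter (an analogous α·‖w‖₁ term corresponds to reg_alpha).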

Original paper: XGBoost: A Scalable Tree Boosting System — Chen & Guestrin, KDD 2016


Applications in A-Share Quantitative Strategies

1. Stock Direction Prediction (Binary Classification)

Use objective='binary:logistic' with technical indicators, financial ratios, and market microstructure features as inputs. The output probability score serves as a long/short ranking signal.

2. Return Prediction (Regression)

Predict future N-day returns using objective='reg:squarederror'. Use feature_importances_ to identify effective Alpha factors.

3. Stock Ranking / Selection (Learning to Rank)

Use objective='rank:ndcg' or 'rank:pairwise' to rank candidate stocks by expected return, selecting the Top-K stocks each period. This leverages the LambdaMART algorithm to directly optimize NDCG ranking metrics.

Official tutorial: XGBoost Learning to Rank


Key Hyperparameters

| Parameter | Description | Recommended |
| --- | --- | --- |
| n_estimators | Number of boosting rounds | 200–500 |
| max_depth | Maximum tree depth | 3–6 |
| learning_rate | Step size (shrinkage) | 0.01–0.1 |
| subsample | Row sampling ratio | 0.7–0.9 |
| colsample_bytree | Feature sampling per tree | 0.6–0.8 |
| reg_alpha | L1 regularization (sparsity) | 0–1 |
| reg_lambda | L2 regularization (weight decay) | 1–10 |
| tree_method | Tree construction algorithm | "hist" |

Strengths & Limitations

Strengths:

  • Built-in regularization resists overfitting on noisy financial data
  • Native missing value handling — no imputation needed for incomplete financial reports
  • early_stopping_rounds prevents over-iteration automatically
  • Distributed training support: Dask, Spark, PySpark, and GPU

Limitations:

  • Many hyperparameters require systematic tuning (e.g., with Optuna)
  • Deep trees tend to overfit short financial time series — keep max_depth low
