LightGBM
LightGBM is a gradient boosting framework developed by Microsoft Research, renowned for its leaf-wise tree growth strategy and histogram-based algorithm that enables high-speed training on large A-Share factor datasets.
Overview
Published at NeurIPS 2017, LightGBM bins continuous features into discrete histograms, dramatically reducing memory usage and computation. Unlike XGBoost's default level-wise growth, LightGBM grows trees leaf-wise (best-first), always splitting the leaf with the highest gain, which typically reaches lower loss with fewer splits.
Original paper: LightGBM: A Highly Efficient Gradient Boosting Decision Tree — Ke et al., NeurIPS 2017
Applications in A-Share Quantitative Strategies
1. Up/Down Classification (Binary Label)
Use objective='binary' and take the predict_proba output as each stock's probability of rising, which can drive daily long signals. Recommended metric: metric='auc'.
2. Return Range Prediction (Quantile Regression)
Use objective='quantile' with alpha=0.9 to predict the upper bound of returns, enabling conservative position sizing suitable for the high-volatility A-Share market.
3. Factor Ranking / Stock Selection (LambdaRank)
Use objective='lambdarank' to rank the stock pool under an NDCG objective; the resulting ranking scores can then be mapped to per-period portfolio weights. LightGBM is especially efficient for large-scale ranking tasks.
Official feature guide: LightGBM Features
Key Parameters (Finance-Recommended)
| Parameter | Description | Recommended |
|---|---|---|
| num_leaves | Number of leaves (core complexity control) | 20–64 |
| learning_rate | Step size | 0.01–0.05 |
| n_estimators | Number of boosting rounds | 300–1000 |
| max_depth | Tree depth limit (-1 = unlimited) | 5–8 |
| feature_fraction | Feature sampling ratio | 0.6–0.8 |
| bagging_fraction | Data sampling ratio | 0.7–0.9 |
| min_child_samples | Minimum samples per leaf | 20–50 |
| lambda_l1 / lambda_l2 | Regularization | 0.1–1.0 |
Strengths & Limitations
Strengths:
- Training is often an order of magnitude faster than level-wise XGBoost on large datasets, making it practical for daily batch backtesting
- Low memory footprint — handles 3900+ stocks × hundreds of factors
- Direct support for categorical_feature (industry/sector codes, no manual encoding needed)
- Built-in NDCG, MAP, AUC, and quantile metrics for finance use cases
Limitations:
- Leaf-wise growth can overfit on small datasets; constrain num_leaves
- Sensitive to hyperparameters; use Optuna or grid search
