CatBoost
CatBoost is a gradient boosted decision tree framework developed by Yandex, distinguished by its native support for categorical features and its Ordered Boosting strategy; no manual one-hot encoding is required.
Overview
The name "CatBoost" comes from "Category" + "Boosting". It is specifically optimized for categorical features using Ordered Target Statistics: each sample's category encoding is computed only from the targets of samples that precede it in a random permutation, which avoids target leakage. Training runs on CPU or GPU, and models can be exported to ONNX/PMML for production deployment.
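The ordered target statistic can be sketched in a few lines of plain Python. This is a simplified toy, not CatBoost's internal implementation: it uses one fixed permutation and a smoothing prior `a`, whereas CatBoost averages over several random permutations.

```python
def ordered_target_stats(categories, targets, prior=0.5, a=1.0):
    """Encode each sample's category using only *earlier* samples,
    so a sample's own target never leaks into its encoding."""
    sums, counts = {}, {}
    encoded = []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        encoded.append((s + a * prior) / (c + a))  # smoothed mean of past targets
        sums[cat] = s + y
        counts[cat] = c + 1
    return encoded

cats = ["bank", "tech", "bank", "tech", "bank"]
ys = [1, 0, 0, 1, 1]
print(ordered_target_stats(cats, ys))  # → [0.5, 0.5, 0.75, 0.25, 0.5]
```

Note that the first occurrence of each category falls back to the prior, and later occurrences blend in only the targets seen so far.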
Official docs: CatBoost About
Research papers: CatBoost Papers
Applications in A-Share Quantitative Strategies
1. Mixed-Feature Factor Modeling
A-Share data naturally contains categorical fields: industry classification, sector codes, CSRC industry categories. CatBoost accepts these directly without LabelEncoder or One-Hot encoding, reducing feature engineering overhead significantly.
2. Financial Report Feature Utilization
Categorical fields like report type (initial/amendment/correction) and audit opinion (unqualified/qualified) can be passed directly as categorical features — CatBoost encodes them automatically to capture financial quality signals.
3. Market Timing Classifier
Use macro-state labels (bull/bear/sideways) as categorical features combined with technical indicators to build a market timing model producing long/short signals.
Key Parameters (Finance-Recommended)
| Parameter | Description | Recommended |
|---|---|---|
| iterations | Number of trees | 300–1000 |
| learning_rate | Step size | 0.01–0.1 |
| depth | Tree depth | 4–8 |
| l2_leaf_reg | L2 regularization | 1–10 |
| cat_features | Categorical feature indices | Per actual columns |
| eval_metric | Evaluation metric | 'AUC' / 'NDCG' |
| task_type | Compute device | 'CPU' / 'GPU' |
| early_stopping_rounds | Early stopping | 50–100 |
Strengths & Limitations
Strengths:
- No preprocessing required for categorical features — pass industry/sector codes directly
- Ordered Boosting prevents target leakage, making the model more robust on financial time series
- Built-in SHAP values and feature importance visualization for factor attribution
- ONNX export for seamless trading system integration
Limitations:
- Slower training than LightGBM
- For purely numerical features, XGBoost/LightGBM often outperform
