CatBoost

CatBoost is a gradient-boosted decision tree framework developed by Yandex. It is distinguished by native support for categorical features and by its Ordered Boosting strategy, so no manual one-hot encoding is required.


Overview

The name "CatBoost" comes from "Category" + "Boosting". It is specifically optimized for categorical features through Ordered Target Statistics: each sample's category is encoded with the target mean of the samples that precede it in a permutation of the data (its "history"), so a sample's own label never leaks into its encoding. Training runs on CPU or GPU, and the model can be exported to ONNX or PMML for production deployment.
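The ordered statistic described above can be illustrated with a minimal sketch. It assumes a single fixed ordering (CatBoost itself draws random permutations) and a smoothed mean that falls back to a prior. The function name and the prior_weight parameter are illustrative and not part of the CatBoost API.

```python
from collections import defaultdict


def ordered_target_stats(categories, targets, prior, prior_weight=1.0):
    """Encode each sample using only the targets of samples that precede it."""
    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # running number of samples per category
    encoded = []
    for cat, y in zip(categories, targets):
        # Smoothed mean over the "history" only; the current label is excluded.
        encoded.append((sums[cat] + prior_weight * prior) / (counts[cat] + prior_weight))
        # Only now does the current sample become part of the history.
        sums[cat] += y
        counts[cat] += 1
    return encoded


# Hypothetical toy data: industry labels and a binary target.
industries = ["bank", "bank", "tech", "bank", "tech"]
labels = [1, 0, 1, 1, 0]
prior = sum(labels) / len(labels)  # global target mean used as the prior
encoded = ordered_target_stats(industries, labels, prior)
```

The first occurrence of each category receives the prior alone, because it has no history yet. A plain (non-ordered) target mean would instead include each row's own label, which is the leakage the ordered variant avoids.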

Official docs: CatBoost About
Research papers: CatBoost Papers


Applications in A-Share Quantitative Strategies

1. Mixed-Feature Factor Modeling

A-Share data naturally contains categorical fields such as industry classifications, sector codes, and CSRC industry categories. CatBoost accepts these directly as string or integer columns, without LabelEncoder or one-hot encoding, which reduces feature engineering overhead.
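A minimal sketch of how such mixed data might be passed to CatBoost follows. The field names (industry, sector_code, pe_ttm, momentum_20d), the sample values, the "missing" placeholder, and both helper functions are hypothetical. The only CatBoost calls used are Pool and CatBoostClassifier, imported inside the training function so the data preparation can be run on its own.

```python
def to_catboost_rows(records, cat_cols, num_cols):
    """Flatten dict records into rows, with categorical columns first.

    Categorical values are converted to strings and missing ones to a
    placeholder, since CatBoost expects categorical values to be
    strings or integers rather than floats or None.
    """
    rows = []
    for rec in records:
        cats = ["missing" if rec.get(c) is None else str(rec[c]) for c in cat_cols]
        nums = [float(rec[c]) for c in num_cols]
        rows.append(cats + nums)
    cat_idx = list(range(len(cat_cols)))      # positions of categorical columns
    columns = list(cat_cols) + list(num_cols)
    return rows, cat_idx, columns


def fit_factor_model(rows, labels, cat_idx, columns):
    """Train a small classifier; hyperparameters are placeholders."""
    from catboost import CatBoostClassifier, Pool  # requires the catboost package

    pool = Pool(data=rows, label=labels, cat_features=cat_idx, feature_names=columns)
    model = CatBoostClassifier(iterations=200, learning_rate=0.05, depth=6, verbose=False)
    model.fit(pool)
    return model


# Hypothetical records: one stock per row.
records = [
    {"industry": "Banks", "sector_code": 801780, "pe_ttm": 5.2, "momentum_20d": 0.03},
    {"industry": None, "sector_code": 801080, "pe_ttm": 41.0, "momentum_20d": -0.01},
]
rows, cat_idx, columns = to_catboost_rows(
    records, cat_cols=["industry", "sector_code"], num_cols=["pe_ttm", "momentum_20d"]
)
```

With a realistically sized dataset and a label list, fit_factor_model(rows, labels, cat_idx, columns) would train on the prepared rows. The two records here only demonstrate the data layout.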

2. Financial Report Feature Utilization

Categorical fields such as report type (initial/amendment/correction) and audit opinion (unqualified/qualified) can be passed directly as categorical features. CatBoost encodes them automatically, so the model can pick up financial quality signals without hand-built encodings.

3. Market Timing Classifier

Use macro-state labels (bull/bear/sideways) as categorical features, combined with technical indicators, to build a market timing model that produces long/short signals.
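One way to turn a timing classifier's output into long/short signals is sketched below. It assumes the model yields a probability of an upward move (for a CatBoostClassifier, the second column of predict_proba). The function name, the 0.55/0.45 thresholds, and the flat "dead band" between them are illustrative assumptions, not something the source prescribes.

```python
def timing_signal(p_up, long_above=0.55, short_below=0.45):
    """Map P(up) to +1 (long), -1 (short), or 0 (stay flat)."""
    if p_up >= long_above:
        return 1
    if p_up <= short_below:
        return -1
    return 0  # probability too close to 0.5 to act on


# Hypothetical predicted probabilities for three trading days.
probs = [0.62, 0.50, 0.31]
signals = [timing_signal(p) for p in probs]
```

The dead band keeps the strategy out of the market when the classifier is close to indifferent, which tends to reduce turnover. Suitable thresholds would have to come from backtesting.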


Key Parameters

Parameter               Description                        Recommended
iterations              Number of trees                    300–1000
learning_rate           Step size                          0.01–0.1
depth                   Tree depth                         4–8
l2_leaf_reg             L2 regularization                  1–10
cat_features            Categorical feature indices        Depends on the dataset's columns
eval_metric             Evaluation metric                  'AUC' / 'NDCG'
task_type               Compute device                     'CPU' / 'GPU'
early_stopping_rounds   Early stopping patience (rounds)   50–100
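As a rough illustration, the parameters listed above could be collected into a dictionary, with early stopping evaluated on the most recent part of the sample. The specific values are one arbitrary pick from the recommended ranges, and chronological_split is a hypothetical helper that assumes the rows are already sorted by date.

```python
# One arbitrary choice within the recommended ranges; tune per dataset.
params = {
    "iterations": 1000,
    "learning_rate": 0.05,
    "depth": 6,
    "l2_leaf_reg": 3,
    "eval_metric": "AUC",
    "task_type": "CPU",
    "early_stopping_rounds": 50,
}


def chronological_split(rows, labels, valid_fraction=0.2):
    """Hold out the most recent rows for validation (rows sorted by date)."""
    n_valid = int(len(rows) * valid_fraction)
    cut = len(rows) - n_valid
    return (rows[:cut], labels[:cut]), (rows[cut:], labels[cut:])


# Toy stand-ins for date-ordered feature rows and labels.
rows = list(range(10))
labels = [i % 2 for i in rows]
(train_x, train_y), (valid_x, valid_y) = chronological_split(rows, labels)
```

With the catboost package installed, the dictionary would be passed as CatBoostClassifier(**params), and the held-out part supplied to fit() through its eval_set argument together with cat_features. Splitting by time rather than at random keeps future observations out of the early-stopping decision.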

Strengths & Limitations

Strengths:

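  • No preprocessing required for categorical features: pass industry/sector codes directly
  • Ordered Boosting reduces target leakage (prediction shift); for time-ordered samples, the has_time option keeps the input order instead of random permutations
  • Built-in SHAP values and feature importance for factor attribution
  • ONNX export for trading system integration; CatBoost's ONNX export is documented as supporting only models trained without categorical features, so check the export documentation before relying on it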
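For factor attribution, per-sample SHAP values are often summarized as a mean absolute contribution per feature. The sketch below assumes the input has the layout returned by CatBoost's get_feature_importance(data=pool, type="ShapValues"): one row per sample, one column per feature, plus a final column holding the expected value. The helper name and the toy numbers are hypothetical.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP|; the last column (expected value) is dropped."""
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        contributions = row[:-1]  # strip the trailing expected-value column
        if len(contributions) != len(feature_names):
            raise ValueError("row width does not match the number of features")
        for j, value in enumerate(contributions):
            totals[j] += abs(value)
    n = len(shap_rows)
    means = [t / n for t in totals]
    return sorted(zip(feature_names, means), key=lambda kv: kv[1], reverse=True)


# Toy SHAP matrix: 2 samples, 2 features, plus the expected-value column.
shap_rows = [
    [0.20, -0.05, 0.10],
    [-0.40, 0.15, 0.10],
]
names = ["industry", "momentum_20d"]
ranking = mean_abs_shap(shap_rows, names)
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling, so the ranking reflects how strongly a factor moves predictions rather than in which direction.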

Limitations:

  • Typically slower to train than LightGBM
  • On purely numerical features, XGBoost/LightGBM often perform as well or better

