Logistic Regression
Logistic Regression is among the most interpretable classification models used in quantitative finance, commonly applied to binary A-Share up/down prediction and multi-class market state identification.
Overview
Logistic Regression maps a linear combination to probability via the sigmoid function:
$$P(y=1|X) = \sigma(Xw + w_0) = \frac{1}{1 + e^{-(Xw+w_0)}}$$
With L2 regularization and labels $y_i \in \{-1, +1\}$, the objective is:
$$\min_{w,b} \frac{1}{2}w^Tw + C \sum_{i=1}^{n} \log\left(e^{-y_i(X_i^Tw+b)} + 1\right)$$
Supports L1 (Lasso), L2 (Ridge), and ElasticNet regularization. Also known as logit regression or maximum-entropy classification (MaxEnt).
Official docs: Logistic Regression — scikit-learn
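As a minimal sketch of the mapping above, the following fits a model on synthetic data and reproduces scikit-learn's predict_proba by hand from the fitted coefficients and the sigmoid (the data and weights here are illustrative, not from the original text):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sigmoid of the linear score Xw + w0, as in the formula above
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                      # 200 samples, 3 synthetic factors
y = (X @ np.array([1.5, -2.0, 0.5]) + 0.3 > 0).astype(int)

clf = LogisticRegression(C=1.0, penalty="l2", solver="lbfgs").fit(X, y)

# Reproduce predict_proba manually: sigmoid of the fitted linear score
z = X @ clf.coef_.ravel() + clf.intercept_[0]
manual = sigmoid(z)
assert np.allclose(manual, clf.predict_proba(X)[:, 1])
```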
Applications in A-Share Quantitative Strategies
1. Binary Up/Down Classification
Use the next-day price direction (+1/-1) as the label, with technical indicators and financial factors as inputs. predict_proba outputs the probability of a rise, which serves as a stock scoring/ranking signal, and the coefficients are directly interpretable as factor weights.
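A hedged sketch of this workflow on synthetic data (the factor names momentum, value, and turnover, and the way the labels are generated, are illustrative assumptions, not from the original text):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 500
# Hypothetical standardized factor panel
factors = pd.DataFrame(rng.normal(size=(n, 3)),
                       columns=["momentum", "value", "turnover"])
# Hypothetical next-day direction labels (1 = up, 0 = down)
direction = (0.8 * factors["momentum"] - 0.5 * factors["turnover"]
             + rng.normal(scale=0.5, size=n) > 0).astype(int)

clf = LogisticRegression(class_weight="balanced").fit(factors, direction)

# Rise probability as a cross-sectional scoring/ranking signal
scores = clf.predict_proba(factors)[:, 1]
ranking = factors.assign(score=scores).sort_values("score", ascending=False)

# Coefficients read directly as factor weights
weights = dict(zip(factors.columns, clf.coef_.ravel()))
```

In a real strategy the rows would be stocks on a given date and the ranking would feed portfolio construction; here everything is synthetic.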
2. L1 Regularization for Automatic Factor Selection
Use penalty='l1' with solver='liblinear' to automatically zero out the coefficients of non-contributing factors among hundreds of candidates, producing a sparse factor portfolio without manual selection.
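A minimal sketch of L1-driven factor selection, assuming a synthetic candidate pool where only a few factors carry signal (the sizes and the choice C=0.1 are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 400, 50                        # 50 candidate factors, few informative
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:3] = [2.0, -1.5, 1.0]         # only the first 3 factors matter
y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(int)

# L1 penalty with liblinear zeroes out non-contributing coefficients
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_lr.fit(X, y)

# Surviving (non-zero) coefficients form the sparse factor set
selected = np.flatnonzero(lasso_lr.coef_.ravel())
print(f"{len(selected)} of {p} factors kept:", selected)
```

Lowering C strengthens the penalty and shrinks more coefficients to exactly zero.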
3. Market State Classification
Train a multi-class logistic regression on market-regime labels (e.g., normal trading, extreme market days, policy-event periods) to identify the current market state for use by position-management modules.
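A sketch of the multi-class case, assuming three hypothetical regimes separable by synthetic return/volatility features (the regime definitions and cluster parameters are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 600
# Hypothetical regime labels: 0 = calm, 1 = trending, 2 = extreme/policy event
labels = rng.integers(0, 3, size=n)
# Synthetic features per regime: (daily index return, realized volatility)
means = np.array([[0.0, 0.5], [1.0, 1.0], [0.0, 3.0]])
X = means[labels] + rng.normal(scale=0.3, size=(n, 2))

regime_clf = LogisticRegression(max_iter=500).fit(X, labels)

# Probability distribution over regimes for today's (hypothetical) observation
today = np.array([[0.1, 2.8]])
probs = regime_clf.predict_proba(today)[0]
```

The per-regime probabilities, not just the hard label, can drive how aggressively a position-management module scales exposure.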
Key Parameters (Finance-Recommended)
| Parameter | Description | Recommended |
|---|---|---|
| C | Inverse regularization strength (lower = stronger) | 0.01–10 |
| penalty | Regularization type | 'l2' (stable) / 'l1' (sparse) |
| solver | Optimization algorithm | 'lbfgs' (L2) / 'liblinear' (L1) |
| max_iter | Maximum optimization iterations | 500–1000 |
| class_weight | Class-imbalance handling | 'balanced' |
| multi_class | Multi-class strategy (deprecated since scikit-learn 1.5) | 'ovr' / 'multinomial' |
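One way to combine the recommended settings above into a model instance; the specific values are illustrative choices within the recommended ranges, not prescriptions:

```python
from sklearn.linear_model import LogisticRegression

# Illustrative configuration drawn from the recommendations above
model = LogisticRegression(
    C=1.0,                     # moderate regularization within the 0.01–10 band
    penalty="l2",              # stable default; switch to 'l1' for sparsity
    solver="lbfgs",            # pairs with L2 (use 'liblinear' for L1)
    max_iter=1000,
    class_weight="balanced",   # compensates for up/down label imbalance
)
```

In practice C is usually tuned by time-series-aware cross-validation rather than fixed.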
Solver Reference
| Solver | L1 | L2 | Large-Scale |
|---|---|---|---|
| liblinear | ✅ | ✅ | Medium |
| lbfgs | ❌ | ✅ | Good |
| saga | ✅ | ✅ | Largest |
Strengths & Limitations
Strengths:
- Coefficients map directly to factor weights, making it one of the most interpretable models available
- L1 regularization produces sparse solutions — natural factor selection
- Extremely fast training — suitable for daily rolling retraining
- Probability outputs enable position sizing control
Limitations:
- Linear decision boundary — cannot capture non-linear factor relationships
- Scale-sensitive — factors must be normalized (Z-score standardization) before fitting
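Because of the scale sensitivity noted above, a common pattern is to fold Z-score standardization into the model via a pipeline; a sketch on synthetic factors with very different scales (the feature meanings are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Factors on wildly different scales (e.g. market cap vs. a valuation ratio)
X = np.column_stack([rng.normal(1e9, 1e8, 300),
                     rng.normal(0.5, 0.1, 300)])
y = (rng.random(300) < 1 / (1 + np.exp(-10 * (X[:, 1] - 0.5)))).astype(int)

# StandardScaler applies Z-score standardization before the fit,
# so the solver sees comparable feature magnitudes
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X, y)
```

Fitting the scaler inside the pipeline also prevents look-ahead leakage when cross-validating, since scaling statistics are recomputed on each training fold.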
