Logistic Regression
Logistic Regression is among the most interpretable classification models used in quantitative finance, commonly applied to binary A-Share up/down prediction and multi-class market state identification.
Overview
Logistic Regression maps a linear combination to probability via the sigmoid function:
$$P(y=1|X) = \sigma(Xw + w_0) = \frac{1}{1 + e^{-(Xw+w_0)}}$$
With L2 regularization and labels $y_i \in \{-1, +1\}$, the objective is:
$$\min_{w,b} \frac{1}{2}w^Tw + C \sum_{i=1}^{n} \log\left(e^{-y_i(X_i^Tw+b)} + 1\right)$$
Supports L1 (Lasso), L2 (Ridge), and ElasticNet regularization. Also known as logit regression or maximum-entropy classification (MaxEnt).
Official docs: Logistic Regression — scikit-learn
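As a minimal sketch of the mapping above, the following fits a model on synthetic data and reproduces scikit-learn's predict_proba by hand from the fitted coefficients and the sigmoid (the data and weights here are illustrative, not from the original text):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sigmoid of the linear score Xw + w0, as in the formula above
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                      # 200 samples, 3 synthetic factors
y = (X @ np.array([1.5, -2.0, 0.5]) + 0.3 > 0).astype(int)

clf = LogisticRegression(C=1.0, penalty="l2", solver="lbfgs").fit(X, y)

# Reproduce predict_proba manually: sigmoid of the fitted linear score
z = X @ clf.coef_.ravel() + clf.intercept_[0]
manual = sigmoid(z)
assert np.allclose(manual, clf.predict_proba(X)[:, 1])
```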
Applications in A-Share Quantitative Strategies
1. Binary Up/Down Classification
Use the next-day price direction (+1/-1) as the label, with technical indicators and financial factors as inputs. predict_proba outputs the probability of a rise, which serves as a stock scoring/ranking signal, and the coefficients are directly interpretable as factor weights.
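A hedged sketch of this workflow on synthetic data (the factor names momentum, value, and turnover, and the way the labels are generated, are illustrative assumptions, not from the original text):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 500
# Hypothetical standardized factor panel
factors = pd.DataFrame(rng.normal(size=(n, 3)),
                       columns=["momentum", "value", "turnover"])
# Hypothetical next-day direction labels (1 = up, 0 = down)
direction = (0.8 * factors["momentum"] - 0.5 * factors["turnover"]
             + rng.normal(scale=0.5, size=n) > 0).astype(int)

clf = LogisticRegression(class_weight="balanced").fit(factors, direction)

# Rise probability as a cross-sectional scoring/ranking signal
scores = clf.predict_proba(factors)[:, 1]
ranking = factors.assign(score=scores).sort_values("score", ascending=False)

# Coefficients read directly as factor weights
weights = dict(zip(factors.columns, clf.coef_.ravel()))
```

In a real strategy the rows would be stocks on a given date and the ranking would feed portfolio construction; here everything is synthetic.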
2. L1 Regularization for Automatic Factor Selection
Use penalty='l1' with solver='liblinear' to automatically zero out the coefficients of non-contributing factors among hundreds of candidates, producing a sparse factor portfolio without manual selection.
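A minimal sketch of L1-driven factor selection, assuming a synthetic candidate pool where only a few factors carry signal (the sizes and the choice C=0.1 are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 400, 50                        # 50 candidate factors, few informative
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:3] = [2.0, -1.5, 1.0]         # only the first 3 factors matter
y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(int)

# L1 penalty with liblinear zeroes out non-contributing coefficients
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_lr.fit(X, y)

# Surviving (non-zero) coefficients form the sparse factor set
selected = np.flatnonzero(lasso_lr.coef_.ravel())
print(f"{len(selected)} of {p} factors kept:", selected)
```

Lowering C strengthens the penalty and shrinks more coefficients to exactly zero.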
3. Market State Classification
Train a multi-class logistic regression on market-regime labels (e.g., normal trading, extreme market days, policy-event periods) to identify the current market state for use by position-management modules.
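A sketch of the multi-class case, assuming three hypothetical regimes separable by synthetic return/volatility features (the regime definitions and cluster parameters are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 600
# Hypothetical regime labels: 0 = calm, 1 = trending, 2 = extreme/policy event
labels = rng.integers(0, 3, size=n)
# Synthetic features per regime: (daily index return, realized volatility)
means = np.array([[0.0, 0.5], [1.0, 1.0], [0.0, 3.0]])
X = means[labels] + rng.normal(scale=0.3, size=(n, 2))

regime_clf = LogisticRegression(max_iter=500).fit(X, labels)

# Probability distribution over regimes for today's (hypothetical) observation
today = np.array([[0.1, 2.8]])
probs = regime_clf.predict_proba(today)[0]
```

The per-regime probabilities, not just the hard label, can drive how aggressively a position-management module scales exposure.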
Key Parameters (Finance-Recommended)
| Parameter | Description | Recommended |
|---|---|---|
| C | Inverse regularization strength (lower = stronger) | 0.01–10 |
| penalty | Regularization type | 'l2' (stable) / 'l1' (sparse) |
| solver | Optimization algorithm | 'lbfgs' (L2) / 'liblinear' (L1) |
| max_iter | Maximum optimization iterations | 500–1000 |
| class_weight | Class-imbalance handling | 'balanced' |
| multi_class | Multi-class strategy (deprecated since scikit-learn 1.5) | 'ovr' / 'multinomial' |
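One way to combine the recommended settings above into a model instance; the specific values are illustrative choices within the recommended ranges, not prescriptions:

```python
from sklearn.linear_model import LogisticRegression

# Illustrative configuration drawn from the recommendations above
model = LogisticRegression(
    C=1.0,                     # moderate regularization within the 0.01–10 band
    penalty="l2",              # stable default; switch to 'l1' for sparsity
    solver="lbfgs",            # pairs with L2 (use 'liblinear' for L1)
    max_iter=1000,
    class_weight="balanced",   # compensates for up/down label imbalance
)
```

In practice C is usually tuned by time-series-aware cross-validation rather than fixed.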
Solver Reference
| Solver | L1 | L2 | Large-Scale |
|---|---|---|---|
| liblinear | ✅ | ✅ | Medium |
| lbfgs | ❌ | ✅ | Good |
| saga | ✅ | ✅ | Largest |
Strengths & Limitations
Strengths:
- Coefficients map directly to factor weights, making it one of the most interpretable models available
- L1 regularization produces sparse solutions — natural factor selection
- Extremely fast training — suitable for daily rolling retraining
- Probability outputs enable position sizing control
Limitations:
- Linear decision boundary — cannot capture non-linear factor relationships
- Scale-sensitive — factors must be normalized (Z-score standardization) before fitting
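Because of the scale sensitivity noted above, a common pattern is to fold Z-score standardization into the model via a pipeline; a sketch on synthetic factors with very different scales (the feature meanings are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Factors on wildly different scales (e.g. market cap vs. a valuation ratio)
X = np.column_stack([rng.normal(1e9, 1e8, 300),
                     rng.normal(0.5, 0.1, 300)])
y = (rng.random(300) < 1 / (1 + np.exp(-10 * (X[:, 1] - 0.5)))).astype(int)

# StandardScaler applies Z-score standardization before the fit,
# so the solver sees comparable feature magnitudes
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X, y)
```

Fitting the scaler inside the pipeline also prevents look-ahead leakage when cross-validating, since scaling statistics are recomputed on each training fold.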
