
LinearSVC

LinearSVC is a linear Support Vector Classifier implemented using the liblinear library. It trains orders of magnitude faster than kernel SVM on large datasets, making it well-suited for high-dimensional A-Share factor classification tasks.


Overview

LinearSVC solves the following optimization problem (L2 regularization + squared hinge loss):

$$\min_{w,b} \frac{1}{2}w^Tw + C \sum_{i=1}^{n} \max(0, 1 - y_i(w^T\phi(x_i) + b))^2$$

Using the liblinear solver, it runs orders of magnitude faster than kernel-based SVM (SVC) on million-sample datasets, and it supports One-vs-Rest (OvR) multiclass classification.

Core library paper: LIBLINEAR: A Library for Large Linear Classification — Fan et al., JMLR 9, 1871–1874, 2008
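The optimization above reduces to an ordinary fit/predict workflow. A minimal sketch, using `make_classification` as a stand-in for a real factor matrix:

```python
# Minimal LinearSVC fit; the synthetic data is a stand-in for real factors.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

# dual=False is preferred when n_samples > n_features
clf = LinearSVC(C=1.0, loss="squared_hinge", dual=False, max_iter=5000)
clf.fit(X, y)

print(clf.coef_.shape)   # one weight vector for the binary problem: (1, 50)
print(clf.score(X, y))   # training accuracy
```

The learned `coef_` is the weight vector w from the objective above, which makes the model directly interpretable as a linear factor combination.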


Applications in A-Share Quantitative Strategies

1. High-Dimensional Factor Classification (Timing / Stock Selection)

When the number of factors greatly exceeds the number of samples, LinearSVC's high-dimensional classification excels. C is the inverse regularization strength (smaller C means stronger regularization), and penalty='l1' enables factor sparsification across the full 3900+ stock universe.
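A sketch of the L1 sparsification idea, with a synthetic matrix standing in for a stock-by-factor panel (the sizes and C value are illustrative assumptions):

```python
# L1-penalized LinearSVC as a factor-selection step (illustrative sizes).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=200,
                           n_informative=10, random_state=0)
X = StandardScaler().fit_transform(X)  # LinearSVC is scale-sensitive

# penalty='l1' requires loss='squared_hinge' and dual=False (see below)
clf = LinearSVC(C=0.05, penalty="l1", loss="squared_hinge",
                dual=False, max_iter=10000)
clf.fit(X, y)

n_selected = np.count_nonzero(clf.coef_)
print(f"{n_selected} of {X.shape[1]} factors retained")
```

The zero coefficients identify factors the model discards, so the nonzero set can feed a downstream stock-selection model.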

2. Multi-Asset State Classification

Classify stocks into return buckets (strong up / weak up / sideways / down) as a multi-class target. LinearSVC uses OvR strategy, training one classifier per class and outputting the highest-confidence prediction.
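The four-bucket setup can be sketched as follows; labels 0–3 stand in for the down / sideways / weak-up / strong-up buckets (an assumption for illustration):

```python
# OvR multiclass: one classifier per return bucket (labels are illustrative).
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=800, n_features=30, n_informative=15,
                           n_classes=4, random_state=0)
clf = LinearSVC(dual=False, max_iter=5000).fit(X, y)

# One-vs-Rest: one weight vector and one decision score per class
print(clf.coef_.shape)                 # (4, 30)
scores = clf.decision_function(X[:1])  # shape (1, 4)
print(clf.classes_[scores.argmax()])   # highest-confidence bucket
```

The prediction is simply the class whose per-class classifier produces the largest decision score, exactly the "highest-confidence prediction" described above.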

3. Text Factor Classification

Research report and announcement TF-IDF vectors can be extremely high-dimensional (thousands of dimensions). LinearSVC is one of the most efficient classifiers on sparse, high-dimensional text features — useful for sentiment classification (positive/negative) as an auxiliary signal.
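A minimal TF-IDF + LinearSVC sentiment sketch; the toy English headlines and binary labels are illustrative stand-ins for real report/announcement text:

```python
# TF-IDF features feed LinearSVC directly; sparse input stays sparse.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["profit rises on strong demand", "record revenue growth",
         "profit warning issued", "regulator fines company"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative (illustrative)

model = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0, dual=True))
model.fit(texts, labels)
pred = model.predict(["strong revenue"])
print(pred)
```

Because liblinear operates on the sparse TF-IDF matrix without densifying it, memory use stays proportional to the number of nonzero terms, which is what makes this practical at thousands of dimensions.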


Key Parameters

| Parameter | Description | Recommended |
| --- | --- | --- |
| C | Inverse regularization strength (smaller = stronger regularization) | 0.001–10 |
| penalty | Regularization type | 'l2' (default) / 'l1' |
| loss | Loss function | 'squared_hinge' (default) |
| max_iter | Maximum iterations | 1000–5000 |
| class_weight | Class weights | 'balanced' |
| multi_class | Multi-class strategy | 'ovr' |
| dual | Dual/primal formulation | False when samples > features |

L1 Parameter Constraint

When penalty='l1', you must also set loss='squared_hinge' and dual=False, otherwise scikit-learn raises a parameter combination error.
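A quick sketch of the constraint: the first call below is the valid combination, while passing dual=True with penalty='l1' makes scikit-learn raise a ValueError at fit time:

```python
# Valid vs. invalid penalty='l1' parameter combinations.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=20, random_state=0)

# Valid: l1 + squared_hinge + primal formulation
LinearSVC(penalty="l1", loss="squared_hinge", dual=False).fit(X, y)

# Invalid: the l1 penalty is not supported in the dual formulation
try:
    LinearSVC(penalty="l1", loss="squared_hinge", dual=True).fit(X, y)
except ValueError as e:
    print("rejected:", e)
```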


Strengths & Limitations

Strengths:

  • Orders of magnitude faster than RBF-SVM on large datasets — suitable for daily full-market retraining
  • penalty='l1' produces sparse weights — natural factor selection
  • High memory efficiency for sparse text/report feature matrices

Limitations:

  • Linear decision boundary — not suitable for non-linear factor relationships
  • No native predict_proba — probability estimates require wrapping the model in CalibratedClassifierCV (Platt scaling), which is less direct than Logistic Regression
  • Scale-sensitive — must standardize features before fitting
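The last two limitations can be handled together in one pipeline: standardize inside the pipeline (so scaling is fit only on training data) and wrap LinearSVC in CalibratedClassifierCV to recover probabilities. A sketch with synthetic data:

```python
# Pipeline: StandardScaler + calibrated LinearSVC for probability output.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=600, n_features=40, random_state=0)

model = make_pipeline(
    StandardScaler(),
    # cv=3: Platt scaling fitted on held-out folds
    CalibratedClassifierCV(LinearSVC(dual=False, max_iter=5000), cv=3),
)
model.fit(X, y)
proba = model.predict_proba(X[:2])  # calibrated class probabilities
print(proba.shape)                  # (2, 2)
```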
