LinearSVC
LinearSVC is a linear Support Vector Classifier implemented using the liblinear library. It trains orders of magnitude faster than kernel SVM on large datasets, making it well-suited for high-dimensional A-Share factor classification tasks.
Overview
LinearSVC solves the following optimization problem (L2 regularization + squared hinge loss):
$$\min_{w,b} \frac{1}{2}w^Tw + C \sum_{i=1}^{n} \max(0, 1 - y_i(w^T\phi(x_i) + b))^2$$
Because the liblinear solver never materializes a kernel matrix, it scales to million-sample datasets where kernel-based SVM (SVC) becomes impractical. Multiclass problems are handled with a One-vs-Rest (OvR) scheme.
Core library paper: LIBLINEAR: A Library for Large Linear Classification — Fan et al., JMLR 9, 1871–1874, 2008
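As a minimal sketch of the basic workflow, the following fits LinearSVC on randomly generated placeholder data (the feature matrix and labels are synthetic stand-ins, not real factor data). Because LinearSVC is scale-sensitive, standardization is done inside a pipeline:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))                 # 1000 samples, 50 "factors"
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic binary label

# Standardize, then fit a linear SVM with squared hinge loss (the default).
clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=5000))
clf.fit(X, y)
score = clf.score(X, y)
```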
Applications in A-Share Quantitative Strategies
1. High-Dimensional Factor Classification (Timing / Stock Selection)
When the number of factors greatly exceeds the number of samples, LinearSVC handles the high-dimensional classification well. C controls regularization strength, and penalty='l1' enables factor sparsification across the full 3900+ stock universe.
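The L1 sparsification can be sketched as follows. The 100 "candidate factors" here are synthetic noise, with only two informative columns, so most weights are driven to exactly zero (note that penalty='l1' requires loss='squared_hinge' and dual=False):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 100))                 # 100 candidate factors
y = (X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=2000) > 0).astype(int)

# L1 penalty is only valid with squared hinge loss and the primal form.
clf = LinearSVC(penalty='l1', loss='squared_hinge', dual=False,
                C=0.05, max_iter=5000)
clf.fit(X, y)

# Surviving factors: columns whose weight was not zeroed out.
selected = np.flatnonzero(np.abs(clf.coef_[0]) > 1e-6)
```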
2. Multi-Asset State Classification
Classify stocks into return buckets (strong up / weak up / sideways / down) as a multi-class target. LinearSVC uses OvR strategy, training one classifier per class and outputting the highest-confidence prediction.
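A sketch of the four-bucket setup, with synthetic labels standing in for the strong-up / weak-up / sideways / down states. Under OvR, decision_function returns one margin per class and predict takes the argmax:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(7)
X = rng.normal(size=(1200, 20))
y = np.digitize(X[:, 0], bins=[-1.0, 0.0, 1.0])   # 4 classes: 0..3

clf = make_pipeline(StandardScaler(),
                    LinearSVC(class_weight='balanced', max_iter=5000))
clf.fit(X, y)

# One binary classifier per class: margins have shape (n_samples, 4).
scores = clf.decision_function(X[:5])
preds = clf.predict(X[:5])
```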
3. Text Factor Classification
Research report and announcement TF-IDF vectors can be extremely high-dimensional (thousands of dimensions). LinearSVC is one of the most efficient classifiers on sparse, high-dimensional text features — useful for sentiment classification (positive/negative) as an auxiliary signal.
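A toy sketch of the text pipeline; the four-line "report" corpus and its labels are placeholders, not real announcements. TfidfVectorizer emits a sparse matrix, which LinearSVC consumes natively without densifying:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

docs = ["earnings beat guidance raised strong outlook",
        "profit warning revenue miss weak demand",
        "record quarter margin expansion buyback announced",
        "downgrade lawsuit impairment dividend cut"]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

# TF-IDF features stay sparse end-to-end through the linear SVM.
clf = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0))
clf.fit(docs, labels)
train_preds = clf.predict(docs)
```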
Key Parameters (Finance-Recommended)
| Parameter | Description | Recommended |
|---|---|---|
| C | Inverse regularization strength | 0.001–10 |
| penalty | Regularization type | 'l2' (default) / 'l1' |
| loss | Loss function | 'squared_hinge' (default) |
| max_iter | Max iterations | 1000–5000 |
| class_weight | Class weights | 'balanced' |
| multi_class | Multi-class strategy | 'ovr' |
| dual | Dual/primal formulation | False when samples > features |
L1 Parameter Constraint
When penalty='l1', you must also set loss='squared_hinge' and dual=False, otherwise scikit-learn raises a parameter combination error.
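A minimal demonstration of the constraint on toy data: the valid combination fits, while forcing dual=True with penalty='l1' raises a ValueError at fit time:

```python
import numpy as np
from sklearn.svm import LinearSVC

X = np.random.default_rng(0).normal(size=(20, 5))
y = (X[:, 0] > 0).astype(int)

# Valid combination: L1 penalty, squared hinge loss, primal formulation.
LinearSVC(penalty='l1', loss='squared_hinge', dual=False).fit(X, y)

# Invalid combination: the dual solver does not support the L1 penalty.
raised = False
try:
    LinearSVC(penalty='l1', loss='squared_hinge', dual=True).fit(X, y)
except ValueError:
    raised = True
```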
Strengths & Limitations
Strengths:
- Orders of magnitude faster than RBF-SVM on large datasets — suitable for daily full-market retraining
- penalty='l1' produces sparse weights, giving natural factor selection
- High memory efficiency on sparse text/report feature matrices
Limitations:
- Linear decision boundary — not suitable for non-linear factor relationships
- No native predict_proba: probability estimates require extra Platt scaling calibration (less direct than Logistic Regression)
- Scale-sensitive: features must be standardized before fitting
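The probability limitation can be worked around by wrapping LinearSVC in CalibratedClassifierCV with sigmoid (Platt) calibration, sketched here on synthetic data:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = (X[:, 0] > 0).astype(int)

# Fit Platt scaling on top of LinearSVC margins via cross-validation.
base = LinearSVC(max_iter=5000)
calibrated = CalibratedClassifierCV(base, method='sigmoid', cv=3)
calibrated.fit(X, y)

# Calibrated probabilities: one column per class, rows sum to 1.
proba = calibrated.predict_proba(X[:3])
```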
