
LinearSVC

LinearSVC is a linear Support Vector Classifier implemented using the liblinear library. It trains orders of magnitude faster than kernel SVM on large datasets, making it well-suited for high-dimensional A-Share factor classification tasks.


Overview

LinearSVC solves the following optimization problem (L2 regularization + squared hinge loss):

$$\min_{w,b} \frac{1}{2}w^Tw + C \sum_{i=1}^{n} \max(0, 1 - y_i(w^T\phi(x_i) + b))^2$$

Using the liblinear solver, it runs orders of magnitude faster than kernel-based SVM (SVC) on million-sample datasets, and it supports One-vs-Rest (OvR) multiclass classification.

Core library paper: LIBLINEAR: A Library for Large Linear Classification — Fan et al., JMLR 9, 1871–1874, 2008
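The optimization above reduces to an ordinary fit/predict workflow. A minimal sketch, using `make_classification` as a stand-in for a real factor matrix:

```python
# Minimal LinearSVC fit; the synthetic data is a stand-in for real factors.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=50, random_state=0)

# dual=False is preferred when n_samples > n_features
clf = LinearSVC(C=1.0, loss="squared_hinge", dual=False, max_iter=5000)
clf.fit(X, y)

print(clf.coef_.shape)   # one weight vector for the binary problem: (1, 50)
print(clf.score(X, y))   # training accuracy
```

The learned `coef_` is the weight vector w from the objective above, which makes the model directly interpretable as a linear factor combination.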


Applications in A-Share Quantitative Strategies

1. High-Dimensional Factor Classification (Timing / Stock Selection)

When the number of factors greatly exceeds the number of samples, LinearSVC's high-dimensional classification excels. C is the inverse regularization strength (smaller C means stronger regularization), and penalty='l1' enables factor sparsification across the full 3900+ stock universe.
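A sketch of the L1 sparsification idea, with a synthetic matrix standing in for a stock-by-factor panel (the sizes and C value are illustrative assumptions):

```python
# L1-penalized LinearSVC as a factor-selection step (illustrative sizes).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=200,
                           n_informative=10, random_state=0)
X = StandardScaler().fit_transform(X)  # LinearSVC is scale-sensitive

# penalty='l1' requires loss='squared_hinge' and dual=False (see below)
clf = LinearSVC(C=0.05, penalty="l1", loss="squared_hinge",
                dual=False, max_iter=10000)
clf.fit(X, y)

n_selected = np.count_nonzero(clf.coef_)
print(f"{n_selected} of {X.shape[1]} factors retained")
```

The zero coefficients identify factors the model discards, so the nonzero set can feed a downstream stock-selection model.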

2. Multi-Asset State Classification

Classify stocks into return buckets (strong up / weak up / sideways / down) as a multi-class target. LinearSVC uses OvR strategy, training one classifier per class and outputting the highest-confidence prediction.
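The four-bucket setup can be sketched as follows; labels 0–3 stand in for the down / sideways / weak-up / strong-up buckets (an assumption for illustration):

```python
# OvR multiclass: one classifier per return bucket (labels are illustrative).
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=800, n_features=30, n_informative=15,
                           n_classes=4, random_state=0)
clf = LinearSVC(dual=False, max_iter=5000).fit(X, y)

# One-vs-Rest: one weight vector and one decision score per class
print(clf.coef_.shape)                 # (4, 30)
scores = clf.decision_function(X[:1])  # shape (1, 4)
print(clf.classes_[scores.argmax()])   # highest-confidence bucket
```

The prediction is simply the class whose per-class classifier produces the largest decision score, exactly the "highest-confidence prediction" described above.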

3. Text Factor Classification

Research report and announcement TF-IDF vectors can be extremely high-dimensional (thousands of dimensions). LinearSVC is one of the most efficient classifiers on sparse, high-dimensional text features — useful for sentiment classification (positive/negative) as an auxiliary signal.
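A minimal TF-IDF + LinearSVC sentiment sketch; the toy English headlines and binary labels are illustrative stand-ins for real report/announcement text:

```python
# TF-IDF features feed LinearSVC directly; sparse input stays sparse.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["profit rises on strong demand", "record revenue growth",
         "profit warning issued", "regulator fines company"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative (illustrative)

model = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0, dual=True))
model.fit(texts, labels)
pred = model.predict(["strong revenue"])
print(pred)
```

Because liblinear operates on the sparse TF-IDF matrix without densifying it, memory use stays proportional to the number of nonzero terms, which is what makes this practical at thousands of dimensions.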


Key Parameters

| Parameter | Description | Recommended |
| --- | --- | --- |
| C | Inverse regularization strength (smaller = stronger regularization) | 0.001–10 |
| penalty | Regularization type | 'l2' (default) / 'l1' |
| loss | Loss function | 'squared_hinge' (default) |
| max_iter | Maximum iterations | 1000–5000 |
| class_weight | Class weights | 'balanced' |
| multi_class | Multi-class strategy | 'ovr' |
| dual | Dual/primal formulation | False when samples > features |

L1 Parameter Constraint

When penalty='l1', you must also set loss='squared_hinge' and dual=False, otherwise scikit-learn raises a parameter combination error.
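A quick sketch of the constraint: the first call below is the valid combination, while passing dual=True with penalty='l1' makes scikit-learn raise a ValueError at fit time:

```python
# Valid vs. invalid penalty='l1' parameter combinations.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=20, random_state=0)

# Valid: l1 + squared_hinge + primal formulation
LinearSVC(penalty="l1", loss="squared_hinge", dual=False).fit(X, y)

# Invalid: the l1 penalty is not supported in the dual formulation
try:
    LinearSVC(penalty="l1", loss="squared_hinge", dual=True).fit(X, y)
except ValueError as e:
    print("rejected:", e)
```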


Strengths & Limitations

Strengths:

  • Orders of magnitude faster than RBF-SVM on large datasets — suitable for daily full-market retraining
  • penalty='l1' produces sparse weights — natural factor selection
  • High memory efficiency for sparse text/report feature matrices

Limitations:

  • Linear decision boundary — not suitable for non-linear factor relationships
  • No native predict_proba — probability estimates require wrapping the model in CalibratedClassifierCV (Platt scaling), which is less direct than Logistic Regression
  • Scale-sensitive — must standardize features before fitting
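The last two limitations can be handled together in one pipeline: standardize inside the pipeline (so scaling is fit only on training data) and wrap LinearSVC in CalibratedClassifierCV to recover probabilities. A sketch with synthetic data:

```python
# Pipeline: StandardScaler + calibrated LinearSVC for probability output.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=600, n_features=40, random_state=0)

model = make_pipeline(
    StandardScaler(),
    # cv=3: Platt scaling fitted on held-out folds
    CalibratedClassifierCV(LinearSVC(dual=False, max_iter=5000), cv=3),
)
model.fit(X, y)
proba = model.predict_proba(X[:2])  # calibrated class probabilities
print(proba.shape)                  # (2, 2)
```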
