Transformer
The Transformer is a deep learning architecture based on the Self-Attention mechanism. It has achieved breakthrough results in financial time series forecasting, sentiment analysis, and cross-asset factor modeling — making it a key component of cutting-edge quantitative strategies.
Overview
Introduced by Vaswani et al. at NeurIPS 2017, the Transformer's core innovation is Multi-Head Self-Attention:
$$\text{Attention}(Q,K,V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
Attention weights capture dependencies between any positions in a sequence without sequential hidden state propagation (unlike LSTM), providing significant advantages on long financial time series.
Original paper: Attention Is All You Need — Vaswani et al., NeurIPS 2017
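The scaled dot-product attention formula above can be sketched directly in PyTorch; this is a minimal illustration of the equation, not the full multi-head implementation:

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # (..., L_q, L_k)
    weights = torch.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ V

# Toy example: a 5-step sequence with d_k = 8
Q = K = V = torch.randn(5, 8)
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # torch.Size([5, 8])
```

The `1/sqrt(d_k)` scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients.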
Finance-Specific Variants
| Model | Highlights | Link |
|---|---|---|
| FinBERT | Financial text sentiment analysis, BERT fine-tuned on financial corpora | ProsusAI/finbert |
| TFT (Temporal Fusion Transformer) | Multi-step time series prediction with static/dynamic features | TFT Paper |
| Informer | Optimized for long-sequence prediction, O(L log L) complexity | Informer Paper |
| PatchTST | Slices time series into patches for efficient local pattern capture | PatchTST Paper |
Applications in A-Share Quantitative Strategies
1. Financial News Sentiment Analysis (FinBERT)
FinBERT is pre-trained on large financial corpora (annual reports, research notes, news). It directly classifies A-Share announcements, exchange inquiry letters, and financial news into sentiment categories (positive/neutral/negative), outputting a sentiment score as an Alpha factor:
```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

tokenizer = BertTokenizer.from_pretrained('ProsusAI/finbert')
model = BertForSequenceClassification.from_pretrained('ProsusAI/finbert')
model.eval()

inputs = tokenizer("Revenue exceeded expectations, net profit up 30% YoY",
                   return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.softmax(outputs.logits, dim=-1)  # label order: [positive, negative, neutral]; verify via model.config.id2label
```

2. Multi-Step Time Series Forecasting (TFT)
TFT supports multivariate input (OHLCV + factors + macro variables), handling both static encoded features (industry, sector) and dynamic time series features simultaneously. It outputs quantile-interval predictions — well-suited for A-Share risk management.
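TFT's quantile-interval output is trained with the quantile (pinball) loss. A generic sketch of that loss, independent of any particular TFT library, with illustrative quantiles of 0.1/0.5/0.9:

```python
import torch

def pinball_loss(y_pred, y_true, quantiles=(0.1, 0.5, 0.9)):
    """Quantile (pinball) loss for quantile forecasters such as TFT.

    y_pred: (batch, n_quantiles), one column per predicted quantile
    y_true: (batch,) realized values
    """
    losses = []
    for i, q in enumerate(quantiles):
        err = y_true - y_pred[:, i]
        # Under-prediction is penalized by q, over-prediction by (1 - q)
        losses.append(torch.max(q * err, (q - 1) * err).mean())
    return torch.stack(losses).sum()

y_true = torch.tensor([1.0, 2.0])
y_pred = torch.tensor([[0.5, 1.0, 1.5],
                       [1.5, 2.0, 2.5]])
loss = pinball_loss(y_pred, y_true)
print(loss.item())  # 0.1
```

Minimizing this loss makes each output column converge to the corresponding conditional quantile, which is what yields the prediction intervals used for risk management.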
3. Cross-Asset Attention
Use a Transformer encoder to apply Cross-Attention across multiple stocks' features at the same timestep, capturing intra-industry co-movement effects (sector leaders driving peers). This forms the basis of graph-attention stock selection models.
4. Alpha Factor Sequence Modeling
Feed the last 60 days of multi-factor cross-sectional data as a sequence into a Transformer. Self-attention discovers how factor predictive power strengthens or decays over time, generating dynamic factor weights.
Core Concepts
| Concept | Description |
|---|---|
| Positional Encoding | Transformers have no inherent position awareness — sine/cosine encodings are added |
| Multi-Head Attention | Multiple Q/K/V groups in parallel, capturing different subspace dependencies |
| Dropout | p=0.1–0.3, prevents overfitting on limited financial data |
| Layer Norm | Per-layer normalization, stabilizes scale differences across financial variables |
| d_model | Hidden dimension, typically 64–256 is sufficient for financial sequences |
Strengths & Limitations
Strengths:
- Self-attention naturally captures long-range price and factor dependencies — outperforms LSTM/GRU
- Pre-trained models (FinBERT) require no training from scratch — low transfer learning cost
- Fully parallel training across sequence positions, so training is faster than RNN-based models
Limitations:
- Large parameter counts — high overfitting risk given limited A-Share history
- Requires a GPU for training, and inference latency is much higher than that of tree-based models
- Low interpretability; factor importance is less intuitive than in XGBoost/LightGBM
