Attention is a smoothed cubic spline

Lai, Zehua; Lim, Lek-Heng; Liu, Yucong

arXiv:2408.09624 (cs)

[Submitted on 19 Aug 2024]

Abstract: We highlight a perhaps important but hitherto unobserved insight: The attention module in a transformer is a smoothed cubic spline. Viewed in this manner, this mysterious but critical component of a transformer becomes a natural development of an old notion deeply entrenched in classical approximation theory. More precisely, we show that with ReLU-activation, attention, masked attention, encoder-decoder attention are all cubic splines. As every component in a transformer is constructed out of compositions of various attention modules (= cubic splines) and feed forward neural networks (= linear splines), all its components -- encoder, decoder, and encoder-decoder blocks; multilayered encoders and decoders; the transformer itself -- are cubic or higher-order splines. If we assume the Pierce-Birkhoff conjecture, then the converse also holds, i.e., every spline is a ReLU-activated encoder. Since a spline is generally just $C^2$, one way to obtain a smoothed $C^\infty$-version is by replacing ReLU with a smooth activation; and if this activation is chosen to be SoftMax, we recover the original transformer as proposed by Vaswani et al. This insight sheds light on the nature of the transformer by casting it entirely in terms of splines, one of the best known and thoroughly understood objects in applied mathematics.
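The claim that ReLU-activated attention is piecewise cubic can be sanity-checked numerically. The sketch below is not taken from the paper: it assumes a single-head, unnormalized form ReLU((X W_q)(X W_k)^T)(X W_v), and the names `relu_attention`, `softmax_attention`, and `score_signs` are hypothetical. Under that assumption the scores are quadratic in the entries of X and the values are linear, so on any region where no score changes sign the output should be a cubic polynomial in X. Two consequences are checked: positive homogeneity of degree 3, and a vanishing fourth finite difference along a short line segment that stays inside one such region.

```python
import numpy as np

def relu_attention(X, W_q, W_k, W_v):
    """Assumed form: softmax replaced by an entrywise ReLU, no normalization."""
    scores = (X @ W_q) @ (X @ W_k).T          # quadratic in the entries of X
    return np.maximum(scores, 0.0) @ (X @ W_v)  # piecewise quadratic times linear

def softmax_attention(X, W_q, W_k, W_v):
    """Standard scaled dot-product attention, for comparison."""
    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(W_q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ (X @ W_v)

def score_signs(X, W_q, W_k):
    """Sign pattern of the scores; it identifies the polynomial piece X lies in."""
    return np.sign((X @ W_q) @ (X @ W_k).T)

rng = np.random.default_rng(0)
n, d = 4, 3  # illustrative sizes: sequence length and embedding dimension
X = rng.standard_normal((n, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

# Check 1: f(tX) = t^3 f(X) for t > 0, since ReLU is positively homogeneous.
t = 2.5
homogeneous = np.allclose(relu_attention(t * X, W_q, W_k, W_v),
                          t**3 * relu_attention(X, W_q, W_k, W_v))
softmax_homogeneous = np.allclose(softmax_attention(t * X, W_q, W_k, W_v),
                                  t**3 * softmax_attention(X, W_q, W_k, W_v))

# Check 2: the fourth finite difference of a cubic vanishes. Sample five
# equally spaced points along a random direction D and confirm they share
# one sign pattern, i.e. they lie in the same polynomial piece.
D = rng.standard_normal((n, d))
h = 1e-3
points = [X + k * h * D for k in range(5)]
base_signs = score_signs(X, W_q, W_k)
same_piece = all(np.array_equal(score_signs(P, W_q, W_k), base_signs)
                 for P in points)
coeffs = [1, -4, 6, -4, 1]
fourth_diff = sum(c * relu_attention(P, W_q, W_k, W_v)
                  for c, P in zip(coeffs, points))

print("degree-3 homogeneous (ReLU):   ", homogeneous)
print("degree-3 homogeneous (softmax):", softmax_homogeneous)
print("points in one piece:", same_piece)
print("max |fourth difference|:", np.abs(fourth_diff).max())
```

These checks are consistent with the piecewise-cubic structure but do not establish it, and they say nothing about the smoothness across pieces; the paper's precise definitions and proofs should be consulted for that.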

Comments:	20 pages, 2 figures
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Numerical Analysis (math.NA)
MSC classes:	26B40, 41A15, 65D07, 68T01, 14P10, 13J30
Cite as:	arXiv:2408.09624 [cs.AI]
	(or arXiv:2408.09624v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2408.09624

Submission history

From: Lek-Heng Lim
[v1] Mon, 19 Aug 2024 00:56:44 UTC (29 KB)
