
Comments (12)

  • austinvhuang
    Some may also find Junji Hashimoto's GPU programming library in Lean (with WebGPU) interesting: https://github.com/Verilean/hesper
    It even includes an example of transformer inference (quantized, 1.5 bit): https://github.com/Verilean/hesper/blob/a688ce9848d6416b2e95...
  • westurner
    How could this lend insight into why the Fast Fourier Transform approximates self-attention?

    > Because self-attention can be replaced with an FFT for a loss in accuracy and a reduction in kWh [1], I suspect that the Quantum Fourier Transform can also be substituted for attention in LLMs.

    [1] "FNet: Mixing Tokens with Fourier Transforms" (2021) https://arxiv.org/abs/2105.03824 ; "Google Replaces BERT Self-Attention with Fourier Transform: 92% Accuracy, 7 Times Faster on GPUs" https://syncedreview.com/2021/05/14/deepmind-podracer-tpu-ba...

    "Why formalize mathematics – more than catching errors" (2025) https://news.ycombinator.com/item?id=45695541

    Can the QFT (Quantum Fourier Transform) and the IQFT (Inverse Quantum Fourier Transform) also be substituted for self-attention in LLMs, and do Lean formalisms provide any insight into how or why? (A sketch of the FNet mixing step is given after this thread.)
  • measurablefunc
    I guess the next step would be adding support for quantized arithmetic.
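
For reference, a minimal sketch of the Fourier mixing step from the FNet paper cited in [1] above: FNet replaces each self-attention sublayer with a parameter-free 2D discrete Fourier transform (along the hidden dimension, then the sequence dimension), keeping only the real part. The sketch below is in JAX and is illustrative only; FNet's surrounding feed-forward and normalization sublayers are omitted.

    import jax.numpy as jnp

    def fnet_mixing(x):
        # x: (seq_len, d_model) token embeddings.
        # FNet (Lee-Thorp et al., 2021) swaps self-attention for a 2D DFT:
        # FFT along the hidden axis, then the sequence axis, real part only.
        # There are no learned parameters, and the cost is O(n log n) in
        # sequence length rather than the O(n^2) of full self-attention.
        return jnp.fft.fft(jnp.fft.fft(x, axis=-1), axis=-2).real

    x = jnp.ones((128, 256))   # 128 tokens, model width 256
    y = fnet_mixing(x)         # same shape; token information is now mixed

This mirrors the accuracy-for-speed trade-off described in [1]: the transform mixes tokens globally, as attention does, but without any content-dependent weighting.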