Joye Personal Blog


Tags: #attention

December 18, 2025

Understanding Attention: From Q, K, V to Multi-Head

A deep dive into attention, the Transformer's core engine: understand Q, K, and V through a database-query analogy, see how multi-head attention works, and clear up the confusion between Softmax and RMSNorm.

13 min read
- llm
- transformer
- minimind
- attention
- multi-head

© 2026 Joye & Site Policy
