Joye Personal Blog

Tags: #minimind

December 19, 2025

FeedForward: The Transformer's Other Half Beyond Attention

A deep dive into the FeedForward network and how RMSNorm, RoPE, Attention, and FeedForward assemble into a complete Transformer Block.

16 min read
- llm
- transformer
- minimind
- feedforward
- swiglu
- architecture
December 18, 2025

Understanding Attention: From Q, K, V to Multi-Head

A deep dive into Attention, the Transformer's core engine: grasp Q, K, V via a database-query analogy, master Multi-Head, and clear up Softmax vs RMSNorm.

13 min read
- llm
- transformer
- minimind
- attention
- multi-head
December 17, 2025

RoPE: From Permutation Invariance to Multi-Frequency

A deep dive into RoPE (Rotary Position Embedding), the standard position encoding for modern LLMs: the math, the engineering, and floating-point precision.

12 min read
- llm
- transformer
- minimind
- rope
- position encoding
December 16, 2025

Why Transformers Need Normalization: Gradients to RMSNorm

A deep dive into why deep neural networks need normalization, and how RMSNorm became standard in modern LLMs.

9 min read