Learn how LLMs work by building one in Zig.
zigllm is a book-shaped Zig codebase. It implements transformer architectures across six progressive layers, starting at raw tensors and ending at sampled text. Every component is documented to teach why it works, not just how.
Zig 0.14+ · MIT · 285+ tests as executable documentation
Six layers, bottom up.
The project is organized like a textbook. Start at Layer 1 — tensors — and build the primitives you need for the next chapter. By Layer 6 you have a generation loop.
- §6 Inference: Generation, sampling, KV caching, streaming.
- §5 Models: LLaMA, GPT-2, Mistral, Falcon, GGUF loading, tokenization.
- §4 Transformers: Multi-head attention, FFNs, full blocks.
- §3 Neural primitives: Activations (SwiGLU, GELU), normalization (RMSNorm), RoPE.
- §2 Linear algebra: SIMD matrix ops, K-quantization, IQ-quantization.
- §1 Foundation: Tensors, memory management, memory mapping.
18 architecture families
LLaMA, Mistral, GPT-2, Falcon, Qwen, Phi, GPT-J, GPT-NeoX, BLOOM, Mamba, BERT, Gemma, StarCoder, MoE, multi-modal, and more — implemented as a study of how each family differs from the others.
Sampling, in full
Greedy, top-k, top-p, temperature, Mirostat, typical, tail-free, contrastive. Grammar-constrained generation: JSON, regex, CFG. The decoding loop is no longer a black box.
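To make the simplest of these strategies concrete, here is a minimal sketch of temperature scaling, greedy selection, and plain probabilistic sampling. The function names (`softmaxWithTemperature`, `argmax`, `sampleFromProbs`) are illustrative assumptions, not zigllm's actual API, and the uniform random number `u` is passed in by the caller so the sketch does not depend on any particular RNG.

```zig
const std = @import("std");

/// Converts logits to probabilities in place with a temperature-scaled softmax.
/// Assumes temperature > 0; values below 1 sharpen the distribution.
/// (Hypothetical helper, not taken from zigllm.)
pub fn softmaxWithTemperature(logits: []f32, temperature: f32) void {
    std.debug.assert(logits.len > 0 and temperature > 0);
    // Subtract the largest logit before exponentiating for numerical stability.
    var max_logit: f32 = logits[0];
    for (logits) |l| max_logit = @max(max_logit, l);
    var sum: f32 = 0;
    for (logits) |*l| {
        l.* = @exp((l.* - max_logit) / temperature);
        sum += l.*;
    }
    for (logits) |*l| l.* /= sum;
}

/// Greedy decoding: the index of the most probable token.
pub fn argmax(probs: []const f32) usize {
    var best: usize = 0;
    for (probs, 0..) |p, i| {
        if (p > probs[best]) best = i;
    }
    return best;
}

/// Inverse-CDF sampling. `u` is a uniform random number in [0, 1)
/// supplied by the caller.
pub fn sampleFromProbs(probs: []const f32, u: f32) usize {
    var cumulative: f32 = 0;
    for (probs, 0..) |p, i| {
        cumulative += p;
        if (u < cumulative) return i;
    }
    // Guard against rounding error leaving the cumulative sum just below 1.
    return probs.len - 1;
}
```

Top-k and top-p would slot in between the softmax and the sampling step: each zeroes out part of the distribution and renormalizes before a token is drawn.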
GGUF, end to end
Memory-mapped GGUF loading compatible with the llama.cpp ecosystem. Read a pre-trained model from disk, decode the tensors yourself, and run a forward pass.
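As a taste of what reading the file yourself involves, the sketch below parses the fixed-size header at the start of a GGUF file from a byte slice, such as a memory-mapped region. It assumes the little-endian layout used by GGUF version 2 and later (4-byte magic, 32-bit version, 64-bit tensor count, 64-bit metadata key/value count). `GgufHeader` and `parseHeader` are hypothetical names, not zigllm's API.

```zig
const std = @import("std");

/// Fixed-size fields at the start of a GGUF file (version 2+ layout assumed).
pub const GgufHeader = struct {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
};

pub const HeaderError = error{ TooShort, BadMagic };

/// Parses the header from the first 24 bytes of `bytes`.
/// Assumes a little-endian file; big-endian GGUF files are not handled here.
pub fn parseHeader(bytes: []const u8) HeaderError!GgufHeader {
    if (bytes.len < 24) return error.TooShort;
    // Every GGUF file begins with the ASCII magic "GGUF".
    if (!std.mem.eql(u8, bytes[0..4], "GGUF")) return error.BadMagic;
    return .{
        .version = std.mem.readInt(u32, bytes[4..8], .little),
        .tensor_count = std.mem.readInt(u64, bytes[8..16], .little),
        .metadata_kv_count = std.mem.readInt(u64, bytes[16..24], .little),
    };
}
```

The metadata key/value pairs and tensor descriptors follow the header; a real loader would walk those next to find each tensor's type, shape, and offset.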
A language that gets out of the way.
Manual memory management makes the cost of every allocation legible. comptime generics let a single tensor type specialize to f32, f16, or quantized formats without runtime dispatch. First-class SIMD intrinsics make matmul kernels readable instead of obscured behind a framework.
The project also serves as evidence: Zig is a viable language for ML/AI workloads, with no runtime, no garbage collector, and no Python on the hot path.
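The comptime and SIMD points can be illustrated together in a few lines. The sketch below is a dot product whose element type is chosen at compile time and whose inner loop uses Zig's built-in `@Vector` type. The function name `dot` and the vector width of 8 lanes are assumptions made for illustration, not values taken from zigllm.

```zig
const std = @import("std");

/// Dot product specialized at compile time for element type `T`.
/// Each distinct `T` produces its own machine code, so there is no
/// runtime dispatch on the element type.
pub fn dot(comptime T: type, a: []const T, b: []const T) T {
    std.debug.assert(a.len == b.len);
    const lanes = 8; // illustrative width, not a tuned value
    const Vec = @Vector(lanes, T);

    var acc: Vec = @splat(@as(T, 0));
    var i: usize = 0;
    // Main loop: multiply and accumulate `lanes` elements per iteration.
    while (i + lanes <= a.len) : (i += lanes) {
        const va: Vec = a[i..][0..lanes].*;
        const vb: Vec = b[i..][0..lanes].*;
        acc += va * vb;
    }
    // Horizontal sum of the vector accumulator.
    var sum: T = @reduce(.Add, acc);
    // Scalar tail for lengths that are not a multiple of `lanes`.
    while (i < a.len) : (i += 1) sum += a[i] * b[i];
    return sum;
}
```

Calling `dot(f32, x, y)` and `dot(f64, x, y)` compiles two separate specializations from the same source. A quantized format would need a dequantization step before the multiply, which this sketch leaves out.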
Recent essays
How a sampling loop becomes 'generation'
Generation is not a single decision. It's a loop, a probability distribution, and a choice about how greedy or how wild to be. zigllm's Layer 6 makes the loop visible — and shows you why temperature, top-k, and top-p exist.
Tensors as the lingua franca
Tensors aren't a 'data structure.' They're a language. Every layer of zigllm — from raw memory to a sampled token — speaks it, and the shape of each tensor tells you what stage of the pipeline you're in.
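A rough sketch of the idea, assuming a two-dimensional case: a tensor is a flat buffer plus a shape and a set of strides, and operations such as transpose only change how the buffer is interpreted. `Tensor2D` and its methods are hypothetical names for illustration, not zigllm's tensor type.

```zig
const std = @import("std");

/// Hypothetical 2-D tensor view: a flat buffer plus shape and strides.
pub fn Tensor2D(comptime T: type) type {
    return struct {
        data: []T,
        shape: [2]usize, // { rows, cols }
        strides: [2]usize, // elements to skip per step along each axis

        const Self = @This();

        /// Wraps a contiguous row-major buffer without copying it.
        pub fn init(data: []T, rows: usize, cols: usize) Self {
            std.debug.assert(data.len == rows * cols);
            return .{
                .data = data,
                .shape = .{ rows, cols },
                .strides = .{ cols, 1 },
            };
        }

        /// Reads the element at row `r`, column `c`.
        pub fn at(self: Self, r: usize, c: usize) T {
            std.debug.assert(r < self.shape[0] and c < self.shape[1]);
            return self.data[r * self.strides[0] + c * self.strides[1]];
        }

        /// Transposes without copying: swap the shape and the strides.
        pub fn transposed(self: Self) Self {
            return .{
                .data = self.data,
                .shape = .{ self.shape[1], self.shape[0] },
                .strides = .{ self.strides[1], self.strides[0] },
            };
        }
    };
}
```

For example, wrapping six floats as a 2x3 tensor and calling `transposed()` yields a 3x2 view over the same memory.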
What you actually learn by writing the matmul yourself
Matrix multiplication is the heart of a transformer. Writing it by hand in Zig — strides, SIMD, K-quantization — is where 'I know how this works' stops being a claim and starts being legible.
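The starting point for that exercise is usually a plain triple loop over row-major buffers, sketched below. It is a naive baseline under stated assumptions (f32 only, contiguous row-major storage, no SIMD or quantization), and the name `matmul` and its parameter order are illustrative rather than zigllm's signature.

```zig
const std = @import("std");

/// Naive row-major matmul: C (m x n) = A (m x k) * B (k x n).
/// All three matrices are flat, contiguous, row-major slices.
pub fn matmul(
    c: []f32,
    a: []const f32,
    b: []const f32,
    m: usize,
    k: usize,
    n: usize,
) void {
    std.debug.assert(a.len == m * k and b.len == k * n and c.len == m * n);
    for (0..m) |i| {
        for (0..n) |j| {
            var acc: f32 = 0;
            for (0..k) |p| {
                // A is read along a row (stride 1);
                // B is read down a column (stride n).
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

The column-wise walk over B is the cache-unfriendly access pattern that loop reordering, tiling, and SIMD kernels are designed to remove.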