cognisoc / zigllm
Preface · An educational project

Learn how LLMs work by building one in Zig.

zigllm is a book-shaped Zig codebase. It implements transformer architectures across six progressive layers, starting at raw tensors and ending at sampled text. Every component is documented to teach why it works, not just how.

Zig 0.14+ · MIT · 285+ tests as executable documentation

Table of contents

Six layers, bottom up.

The project is organized like a textbook. Start at Layer 1 — tensors — and build the primitives you need for the next chapter. By Layer 6 you have a generation loop.

  Layer 6 · Inference
    Generation, sampling, KV caching, streaming.
  Layer 5 · Models
    LLaMA, GPT-2, Mistral, Falcon, GGUF loading, tokenization.
  Layer 4 · Transformers
    Multi-head attention, feed-forward networks (FFNs), full transformer blocks.
  Layer 3 · Neural primitives
    Activations (SwiGLU, GELU), normalization (RMSNorm), rotary position embeddings (RoPE).
  Layer 2 · Linear algebra
    SIMD matrix operations, K-quantization, IQ-quantization.
  Layer 1 · Foundation
    Tensors, memory management, memory mapping.

§ 1

18 architecture families

LLaMA, Mistral, GPT-2, Falcon, Qwen, Phi, GPT-J, GPT-NeoX, BLOOM, Mamba, BERT, Gemma, StarCoder, MoE, multi-modal, and more — implemented as a study of how each family differs from the others.

§ 2

Sampling, in full

Greedy, top-k, top-p, temperature, Mirostat, typical, tail-free, contrastive. Grammar-constrained generation: JSON, regex, context-free grammars (CFG). The decoding loop is no longer a black box.
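Here is a minimal sketch of one decoding step: temperature scaling followed by top-k filtering, written for Zig 0.14's standard library. The name sampleTopK and its signature are hypothetical and are not zigllm's actual API. The sketch assumes temperature > 0 and a caller-provided scratch buffer, so the sampling step itself does not allocate.

```zig
const std = @import("std");

/// Hypothetical helper (not zigllm's API): draw a token id from `logits`
/// using temperature scaling plus top-k filtering.
/// Assumes temperature > 0 and scratch.len >= logits.len.
fn sampleTopK(
    logits: []const f32,
    k: usize,
    temperature: f32,
    rng: std.Random,
    scratch: []usize,
) usize {
    std.debug.assert(k >= 1 and k <= logits.len and scratch.len >= logits.len);

    // Fill the scratch buffer with candidate token ids 0..n-1.
    const idx = scratch[0..logits.len];
    for (idx, 0..) |*p, i| p.* = i;

    // Sort ids by descending logit; the first k survive the top-k filter.
    const Ctx = struct {
        l: []const f32,
        fn greater(ctx: @This(), a: usize, b: usize) bool {
            return ctx.l[a] > ctx.l[b];
        }
    };
    std.mem.sort(usize, idx, Ctx{ .l = logits }, Ctx.greater);
    const top = idx[0..k];

    // Unnormalized softmax mass over the survivors at this temperature.
    // Subtracting the max logit keeps @exp from overflowing.
    const max_logit = logits[top[0]];
    var total: f32 = 0;
    for (top) |t| total += @exp((logits[t] - max_logit) / temperature);

    // Inverse-CDF draw: walk the cumulative mass until it exceeds r.
    const r = rng.float(f32) * total;
    var cum: f32 = 0;
    for (top) |t| {
        cum += @exp((logits[t] - max_logit) / temperature);
        if (r < cum) return t;
    }
    return top[k - 1]; // guard against floating-point rounding at the tail
}
```

With k = 1 the function reduces to greedy decoding. Raising k admits more candidates, and raising the temperature flattens the distribution over them.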

§ 3

GGUF, end to end

Memory-mapped GGUF loading compatible with the llama.cpp ecosystem. Read a pre-trained model from disk, decode the tensors yourself, and run a forward pass.
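Reading a model from disk starts with the fixed 24-byte header at the front of a GGUF file. It contains the magic bytes "GGUF", a little-endian u32 version, and then u64 counts of tensors and metadata key/value pairs. This layout applies to GGUF versions 2 and 3; version 1 used 32-bit counts. The sketch below parses that header from a byte slice, which could come from a memory-mapped file. The names GgufHeader and parseGgufHeader are hypothetical and are not zigllm's loader.

```zig
const std = @import("std");

/// Fixed-size GGUF header (versions 2 and 3). Hypothetical type name.
const GgufHeader = struct {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
};

/// Parse the 24-byte header at the start of a GGUF file.
/// `bytes` may be a view into a memory-mapped file; nothing is copied.
fn parseGgufHeader(bytes: []const u8) !GgufHeader {
    if (bytes.len < 24) return error.TruncatedHeader;
    if (!std.mem.eql(u8, bytes[0..4], "GGUF")) return error.BadMagic;
    const version = std.mem.readInt(u32, bytes[4..8], .little);
    // Version 1 stored the two counts as u32, so this layout does not apply.
    if (version < 2) return error.UnsupportedVersion;
    return .{
        .version = version,
        .tensor_count = std.mem.readInt(u64, bytes[8..16], .little),
        .metadata_kv_count = std.mem.readInt(u64, bytes[16..24], .little),
    };
}
```

In the full format, the metadata key/value pairs come after the header, followed by the tensor descriptors and then the aligned tensor data.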

Why Zig

A language that gets out of the way.

Manual memory management makes the cost of every allocation legible. Zig's comptime generics let a single tensor type specialize to f32, f16, or quantized formats without runtime dispatch. Built-in SIMD vector types make matmul kernels readable instead of hiding them behind a framework.
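The next sketch shows two of these points together. It is an illustration only and is not zigllm's actual kernel. It defines a dot product whose element type and vector width are comptime parameters, so dot(f32, 8, ...) and dot(f16, 16, ...) compile to separate specialized functions. Its inner loop uses Zig's built-in @Vector type. The function name and lane counts are illustrative assumptions, and quantized formats would need a dequantization step that this sketch omits.

```zig
const std = @import("std");

/// Dot product generic over element type `T` and SIMD width `lanes`.
/// Both are comptime parameters, so every (T, lanes) pair is compiled
/// into its own specialized function with no runtime type dispatch.
fn dot(comptime T: type, comptime lanes: usize, a: []const T, b: []const T) T {
    std.debug.assert(a.len == b.len);
    const V = @Vector(lanes, T);
    var acc: V = @splat(0);
    var i: usize = 0;
    // Main loop: multiply-accumulate `lanes` elements at a time.
    while (i + lanes <= a.len) : (i += lanes) {
        const va: V = a[i..][0..lanes].*;
        const vb: V = b[i..][0..lanes].*;
        acc += va * vb;
    }
    // Horizontal sum of the vector accumulator, then a scalar tail loop.
    var sum: T = @reduce(.Add, acc);
    while (i < a.len) : (i += 1) sum += a[i] * b[i];
    return sum;
}
```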

The project also serves as a demonstration that Zig is viable for ML/AI workloads: no garbage collector, no heavyweight runtime, and no Python on the hot path.

Begin reading

Open chapter one.

Clone the repo, run the tests, and follow the doc tree. The fastest path to understanding is the one you walk in code.