Hands-On Large Language Models

Introduction

Nova: What if I told you there's a book that explains how ChatGPT, Claude, and all those mind-blowing AI chatbots actually work — and it does it with nearly 300 hand-drawn illustrations so clear that even Andrew Ng called it a valuable resource for understanding large language models? That book exists. It's called Hands-On Large Language Models, and it dropped in September 2024. I'm Nova.

Nova: Great question. First, the authors. Jay Alammar is the person behind The Illustrated Transformer, which is basically the most famous visual guide to how modern AI models work. Millions of people have learned transformers from his diagrams. And Maarten Grootendorst is a psychologist-turned-data-scientist who created BERTopic and KeyBERT, two popular Python libraries for topic modeling and keyword extraction, respectively. Together they wrote a four-hundred-twenty-eight-page book that bridges the gap between understanding LLMs in theory and actually building things with them.

Nova: Exactly. Every single chapter has working Jupyter notebooks. The whole thing runs on free Google Colab with a T4 GPU. You can literally read a chapter and then run the code yourself. No expensive cloud setup required.

Key Insight 1

The Visual Revolution — How This Book Teaches Differently

Nova: Here's the thing that sets this book apart from every other AI book on the market — it teaches through visuals. We're talking nearly three hundred original, full-color figures. Every concept, from tokenization to attention mechanisms to retrieval-augmented generation, has a diagram.

Nova: Absolutely. His Illustrated Transformer blog post from 2018 became the de facto starting point for understanding transformer architecture. People still reference it today. In this book, he's taken that visual-first philosophy and applied it to the entire landscape of large language models. Josh Starmer from StatQuest said, and I quote, "I can't think of another book that is more important to read right now. On every single page, I learned something that is critical to success in this era of language models."

Nova: Right. And here's a clever design choice — they color-code every language model in the book. Representation models, the ones that create embeddings, are always green with a vector icon. Generative models, the ones that produce text, are always pink with a speech bubble icon. So as you flip through, your brain automatically maps which type of model does what.

Nova: And that's exactly why Nils Reimers, the Director of Machine Learning at Cohere and the creator of sentence-transformers, called it an exceptional guide to the world of language models and their practical applications. The visual approach makes concepts stick.

Key Insight 2

From Tokens to Transformers — Laying the Foundation

Nova: The book is split into three parts, and part one is all about foundations. Chapters one through three answer the question: how do large language models actually work under the hood?

Nova: Exactly. Chapter one gives you the historical context — where language models came from, the evolution from encoder-decoder RNNs to transformers, and the critical distinction between representation models and generative models. Chapter two dives into tokens and embeddings, which are the absolute bedrock of everything. Here's a fun detail — they actually show you how different LLMs tokenize the exact same string differently, and how those differences impact performance on things like multilingual text, code, and numbers.
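To make the tokenization point concrete, here is a minimal sketch. It assumes three toy tokenizers that are hypothetical and not the tokenizer of any real LLM; the only goal is to show the same string turning into different token sequences, with the number handled differently each time.

```python
import re

def whitespace_tokenize(text):
    """Split on whitespace only, so a number stays one token."""
    return text.split()

def char_tokenize(text):
    """One token per character, spaces included."""
    return list(text)

def word_digit_tokenize(text):
    """ASCII words stay whole; digits and punctuation become single tokens."""
    return re.findall(r"[A-Za-z_]+|\d|[^\w\s]", text)

sample = "price = 1234 USD"
for name, tokenizer in [
    ("whitespace", whitespace_tokenize),
    ("character", char_tokenize),
    ("word+digit", word_digit_tokenize),
]:
    tokens = tokenizer(sample)
    print(f"{name:>10}: {len(tokens):2d} tokens -> {tokens}")
```

The three schemes produce sequences of very different lengths for the same input, which is the kind of difference that can affect how a model handles numbers, code, or multilingual text.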

Nova: Yes. And those differences matter enormously. They even include a fascinating example of treating songs as tokens and playlists as sentences to build a music recommendation system, showing that these concepts extend way beyond just text. Then chapter three is the crown jewel — an updated, expanded, modernized version of The Illustrated Transformer. It explains the 2024-era transformer, including self-attention, multi-head attention, and all the mechanisms that make these models work.
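For readers who want to see self-attention in code, here is a minimal single-head sketch of scaled dot-product attention in plain Python. It is an illustration under simplifying assumptions, not the book's implementation: there are no learned projection matrices, no masking, and no multiple heads, and the token vectors are made up.

```python
import math

def softmax(scores):
    """Convert scores to weights that sum to 1 (shifted for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product scores of one query against every key, then softmax."""
    d_k = len(keys[0])
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
        for key in keys
    ]
    return softmax(scores)

def self_attention(queries, keys, values):
    """Each output row is a weighted average of the value vectors."""
    outputs = []
    for query in queries:
        weights = attention_weights(query, keys)
        mixed = [
            sum(w * value[j] for w, value in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs

# Made-up 2-d vectors for three tokens; used as queries, keys, and values alike.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(tokens, tokens, tokens):
    print([round(x, 3) for x in row])
```

In a real transformer, queries, keys, and values come from separate learned linear projections of the token embeddings, and several such heads run in parallel.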

Nova: Not just understand — you can visualize it. And that makes everything in parts two and three feel natural instead of magical.

Key Insight 3

Building Real Applications — Classification, Clustering, and Prompt Engineering

Nova: Part two is where things get practical. Chapters four through nine each tackle a different type of real-world use case. Chapter four is text classification — using LLMs for sentiment analysis, topic categorization, that sort of thing. Chapter five covers text clustering and topic modeling, which is Maarten Grootendorst's home turf.

Nova: You'd be right. He brings deep expertise in making sense of large document collections. Then chapter six is prompt engineering — the art of getting generative models to do what you want. And chapter seven goes further into advanced text generation techniques. We're talking about controlling output with different decoding strategies, using constrained generation, and leveraging tools and function calling.
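As a rough illustration of what "different decoding strategies" can mean, the sketch below contrasts greedy decoding with temperature sampling over a toy next-token distribution. The vocabulary and logits are invented for the example, and this is one common pair of strategies rather than a summary of everything the chapter covers.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Greedy decoding: always take the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature, rng):
    """Temperature sampling: draw a token index according to its probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical vocabulary and next-token scores.
vocab = ["the", "cat", "sat"]
logits = [1.0, 3.0, 2.0]
rng = random.Random(0)  # seeded so repeated runs draw the same samples

print("greedy :", vocab[greedy_pick(logits)])
print("sampled:", [vocab[sample_pick(logits, 1.5, rng)] for _ in range(5)])
for t in (0.1, 1.0, 10.0):
    print(f"T={t}:", [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Greedy decoding is deterministic, while raising the temperature flattens the distribution and makes less likely tokens show up more often.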

Nova: Not at all. The book shows you how to systematically get better outputs. And here's where a key insight from the book really lands — the authors emphasize that language models are not merely text generators. They can form other systems, like embedding models and classification models, that are incredibly useful for problem-solving. Many people only think of LLMs as chatbots, but this book opens your eyes to the full ecosystem.

Nova: Exactly. And that sets up the next chapter perfectly.

Key Insight 4

Semantic Search, RAG, and the Multimodal Frontier

Nova: Chapter eight is arguably one of the most valuable in the entire book — semantic search and retrieval-augmented generation, or RAG.

Nova: It walks you through building search systems that go way beyond simple keyword matching. You learn about dense retrieval, which uses embeddings to find documents based on meaning rather than exact word matches. Then you learn about rerankers, which take those initial results and reorder them for maximum relevance. And finally, you learn how to feed those retrieved documents into a generative model to produce grounded, factual responses.
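The three stages described here (dense retrieval, reranking, then grounded generation) can be sketched in a few lines. Everything in this example is an assumption made for illustration: the 3-d "embeddings" are hand-written rather than produced by an embedding model, the reranker is a simple word-overlap score standing in for a real reranking model, and the final step only builds the prompt that would be sent to a generative model.

```python
import math
import re

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical documents with hand-made embeddings.
DOCS = {
    "Cats are small domesticated felines.": [0.9, 0.1, 0.0],
    "Dogs are loyal companions.": [0.7, 0.3, 0.1],
    "Transformers use attention layers.": [0.0, 0.2, 0.9],
}

def dense_retrieve(query_vec, k=2):
    """Stage 1: rank documents by embedding similarity and keep the top k."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def rerank(query, candidates):
    """Stage 2: reorder candidates; word overlap stands in for a reranker model."""
    return sorted(candidates, key=lambda d: len(words(query) & words(d)), reverse=True)

def build_prompt(query, docs):
    """Stage 3: put the retrieved text in front of the question."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

query = "What are dogs?"
query_vec = [0.8, 0.2, 0.0]  # hypothetical embedding of the query
candidates = dense_retrieve(query_vec)
reranked = rerank(query, candidates)
print(build_prompt(query, reranked))
```

With these toy numbers the first stage ranks the cat document slightly ahead of the dog document, and the reranking stage swaps them, which is the role a reranker plays in the pipeline.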

Nova: Precisely. And the book's code examples make it concrete: you can actually build a working semantic search engine in a Colab notebook. Chapter nine then zooms out to multimodal large language models, covering models that can understand images, audio, and video alongside text.

Nova: Right. GPT-4V, Gemini, Claude — they all process multiple modalities. This chapter gives you the conceptual foundation to understand and work with those models. And all of this builds toward part three, which is where things get really advanced.

Key Insight 5

Fine-Tuning and Model Development — Becoming a Builder

Nova: Part three is for the builders. Chapters ten, eleven, and twelve cover something that separates the casual user from the serious practitioner — fine-tuning your own models.

Nova: It starts with chapter ten, which teaches you how to create text embedding models. You learn about contrastive learning and how to train models that produce high-quality semantic representations of text. Then chapter eleven tackles fine-tuning representation models like BERT for classification tasks, and chapter twelve covers the holy grail — fine-tuning generative models.
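To give a feel for contrastive learning, the sketch below computes an InfoNCE-style loss for one anchor, one positive, and a few negatives. This is one common formulation and is offered as an assumption; the transcript does not say which loss functions the book uses, and the vectors and temperature here are invented.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Cross-entropy of picking the positive out of positive plus negatives."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[0]  # equals -log softmax(logits)[0]

# Hypothetical 2-d embeddings.
anchor = [1.0, 0.0]
close_positive = [0.9, 0.1]   # points almost the same way as the anchor
far_positive = [0.0, 1.0]     # orthogonal to the anchor
negatives = [[-1.0, 0.0], [0.0, -1.0]]

print("loss, positive near anchor:", round(info_nce_loss(anchor, close_positive, negatives), 4))
print("loss, positive far away   :", round(info_nce_loss(anchor, far_positive, negatives), 4))
```

Training pushes this loss down, which pulls matching pairs together in embedding space and pushes non-matching pairs apart.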

Nova: Exactly. And here's what makes this book truly practical — all of these exercises use small enough models that they run on free Google Colab GPUs. The authors made a deliberate choice to prioritize accessibility over flashiness. You don't need a cluster of A100s to follow along.

Nova: And the GitHub repository has over seventeen hundred stars and contains every single code example. The authors also created bonus visual guides that go beyond the book — topics like the Mamba architecture, model quantization, mixture of experts, reasoning LLMs, and even an illustrated guide to DeepSeek-R1.

Nova: It really doesn't. The book is part of a living ecosystem of educational content.

Key Insight 6

Who Is This Book For and Why It Matters in 2025

Nova: Let's talk about who should read this book. O'Reilly classifies it as beginner to intermediate, and I think that's accurate but needs clarification.

Nova: If you're a developer who knows some Python and has heard about LLMs but finds most explanations either too hand-wavy or too math-heavy, this book is for you. If you're a data scientist who wants to add language AI to your toolkit, this book is for you. If you're a product manager or technical leader who needs to understand what's possible with LLMs and what's just hype, this book is also for you.

Nova: Even experts will find value in the later chapters on fine-tuning and in the visual explanations that crystallize concepts they may have understood only hazily. Luis Serrano, who runs Serrano Academy, said this book will take you from zero to expert in the history and latest advances in large language models.

Nova: Surprisingly, no. And here's why — the book focuses on fundamental concepts and architectures that underpin the entire field. Transformers, embeddings, attention mechanisms, fine-tuning strategies — these aren't going anywhere. The specific model names might change, but the principles endure. Plus, the bonus visual guides cover cutting-edge developments like DeepSeek-R1 and Mamba, keeping the content fresh.

Nova: Leland McInnes, the creator of UMAP and HDBSCAN, said the book starts with simple introductory beginnings and steadily builds in scope. By the final chapters, you will be fine-tuning and building your own large language models with confidence. That's not something that goes out of date in six months.

Conclusion

Nova: So let's bring it all together. Hands-On Large Language Models by Jay Alammar and Maarten Grootendorst is a four-hundred-twenty-eight-page visual masterclass that takes you from tokens and embeddings all the way to fine-tuning your own generative models. It's built on three pillars — understand the foundations, apply them to real use cases, and then become a builder who can adapt models to specific needs.

Nova: The book has earned praise from Andrew Ng, Josh Starmer, Nils Reimers, Luis Serrano, and Leland McInnes — some of the most respected names in machine learning. The GitHub repository is active with over seventeen hundred stars, and the authors continue to release bonus visual guides on emerging topics.

Nova: Exactly. And here's my challenge to you — pick one chapter that interests you, find the corresponding notebook on GitHub, and run it today. You might be surprised by how quickly the abstract becomes concrete when you can see the code working.

Nova: This is Aibrary. Congratulations on your growth!