From 72543e7bd7aa19bcfa29c2f9c1ef4bafc5d2dad2 Mon Sep 17 00:00:00 2001
From: David Lohmann <5475305+DLohmann@users.noreply.github.com>
Date: Sun, 12 Oct 2025 00:49:57 -0700
Subject: [PATCH] Add visual to section3.md

---
 _includes/section3.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/_includes/section3.md b/_includes/section3.md
index cebdb4d..67d14f3 100644
--- a/_includes/section3.md
+++ b/_includes/section3.md
@@ -1,5 +1,6 @@
-In this section, we'll explore a number of concepts which will take us from the decoder-only Transformer architecture towards understanding the implementation choices and tradeoffs behind many of today's frontier LLMs. If you first want a birds-eye view the of topics in section and some of the following ones, the post ["Understanding Large Language Models"](https://magazine.sebastianraschka.com/p/understanding-large-language-models) by Sebastian Raschka is a nice summary of what the LLM landscape looks like (at least up through mid-2023).
+In this section, we'll explore a number of concepts which will take us from the decoder-only Transformer architecture towards understanding the implementation choices and tradeoffs behind many of today's frontier LLMs. If you first want a bird's-eye view of the topics in this section and some of the following ones, the post ["Understanding Large Language Models"](https://magazine.sebastianraschka.com/p/understanding-large-language-models) by Sebastian Raschka is a nice summary of what the LLM landscape looks like (at least up through mid-2023). For a visual overview of the scale and processing done by GPT-2 (small), nano-GPT, GPT-2 (XL), and GPT-3, the [LLM Visualization](https://bbycroft.net/llm) by Brendan Bycroft provides a detailed guide ([source](https://github.com/bbycroft/llm-viz)).
+