How artificial intelligence actually works

From a 1950s thought experiment to ChatGPT — the whole story of AI, machine learning and n-grams, told so anyone can get it. Click the timeline, train a tiny brain, and watch a computer predict the next word. No maths degree required.

  • History of AI
  • Machine learning
  • How machines learn
  • Neural networks
  • N-grams & language models

Everything below is interactive — scrub the timeline, drag the sliders, and make a real model finish your sentence.

Start here

So… what is artificial intelligence, really?

Artificial intelligence is just getting computers to do things that normally need human smarts — recognising a face, understanding a sentence, winning a game, suggesting the next word you'll type. For decades we tried to do that by writing down every rule by hand. It barely worked. The breakthrough was letting the computer figure out the rules by itself, from mountains of examples. That idea is machine learning, and it's the engine under almost every "AI" you hear about today.

Old way: hand-written rules

Humans spell out every "if this, then that". Fine for simple things, hopeless for messy reality like language or vision.

New way: learn from examples

Show the machine thousands of examples and let it adjust itself until it gets them right. It discovers the rules nobody could write down.

Today: predict the next thing

Modern AI is mostly a giant prediction machine (guess the next word, the next pixel, the next move), trained on much of the internet.

This guide walks the whole journey, in order: where AI came from, how a machine learns, what a neural network is, and the deceptively simple trick — n-grams — that still explains how ChatGPT-style models "talk". Each stop has something to click.

The epic backstory

70+ years of AI, in one scrubbable timeline

AI isn't new — it's older than the moon landing. It boomed, crashed (twice!), and roared back. Tap any year to see what happened, or press play and ride the whole story from Turing to ChatGPT.

The "AI winters"

Twice, AI over-promised and under-delivered: funding froze and the whole field went quiet (in the mid-1970s and again in the late 1980s). Each winter ended when a new idea (better algorithms, more data, faster chips) thawed it out. Today's boom is the spring after the second winter.

The big shift

Machine learning: stop coding rules, start showing examples

Imagine writing a program to tell cats from dogs by listing rules: "pointy ears, whiskers, …". You'd give up fast. Machine learning flips it: you show the computer thousands of labelled photos and it learns the difference itself, tuning millions of tiny internal numbers until it's usually right. Flip the switch below to feel the difference.
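
To make that flip concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the "ear pointiness" scores, the function names, and the very simple learning rule (put the cut-off halfway between the two class averages). Real systems tune millions of numbers rather than one, but the contrast is the same: in the first function a human picks the number, in the second the data does.

```python
# Toy data, invented for illustration: (ear pointiness score, label).
examples = [(0.9, "cat"), (0.8, "cat"), (0.7, "cat"),
            (0.3, "dog"), (0.2, "dog"), (0.4, "dog")]

def rule_based(pointiness):
    # Old way: a human picks the cut-off by hand.
    return "cat" if pointiness > 0.5 else "dog"

def learn_threshold(data):
    # New way: derive the cut-off from labelled examples.
    # Assumed rule for this sketch: midpoint between the two class averages.
    cats = [x for x, label in data if label == "cat"]
    dogs = [x for x, label in data if label == "dog"]
    return (sum(cats) / len(cats) + sum(dogs) / len(dogs)) / 2

threshold = learn_threshold(examples)

def learned(pointiness):
    # Same shape as the hand-written rule, but the number came from data.
    return "cat" if pointiness > threshold else "dog"

print("learned threshold:", round(threshold, 2))
print("prediction for 0.85:", learned(0.85))
```

Change the examples and the learned threshold moves with them; the hand-written rule stays wherever its author left it.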

The three classic flavours of learning

Almost all machine learning is one of these three. Hover or tap a card.

Supervised

Learn from examples that come with the right answer (the "label"). The classic workhorse.

Like: spam vs not-spam from emails you've already sorted.

Unsupervised

No labels — the machine finds hidden patterns and groups by itself.

Like: a shop clustering customers into "types" nobody named in advance.

Reinforcement

Learn by trial and error, chasing rewards and avoiding penalties.

Like: a program mastering a game by playing it a million times.
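
Of the three, unsupervised learning is probably the hardest to picture, so here is a minimal sketch of the shop example. The spending figures are made up, and the method (a one-dimensional, two-group version of the k-means idea) is just one simple way to cluster, chosen here for brevity. Note that no labels are ever given: the two "types" fall out of the numbers.

```python
def two_means(values, steps=10):
    # Assumes the values are not all identical (otherwise one group is empty).
    # Start the two cluster centres at the smallest and largest value.
    low, high = min(values), max(values)
    for _ in range(steps):
        # Assign each value to whichever centre is nearer.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the average of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return low, high, group_low, group_high

# Invented monthly spend per customer; nobody has labelled anyone.
spend = [12, 15, 11, 14, 95, 102, 99, 13, 98]
low, high, small_spenders, big_spenders = two_means(spend)

print("small spenders:", small_spenders, "centre:", low)
print("big spenders:", big_spenders, "centre:", high)
```

The machine never hears the words "small" or "big"; those names are ours, added after it found the groups.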

The learning loop

How does a machine actually "learn"? It rolls downhill.

Training is a loop: the model makes a guess, we measure how wrong it was (the "loss"), and we nudge its internal numbers to be a little less wrong. Repeat thousands of times. Picture the loss as a valley — the model is a ball trying to roll to the bottom, where it's least wrong. That roll is called gradient descent.

Drag the learning rate (how big each step is) and press Train. Too small and it crawls; too big and it bounces and overshoots; just right and it settles into the valley.

That's the whole secret of training, scaled up: a model can have billions of those internal numbers (called weights), and gradient descent quietly nudges every single one, over and over, until the predictions are good.
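
Here is a minimal sketch of that loop with a single weight. The valley is a made-up loss, (w - 3)², whose lowest point is at w = 3; the function names, the starting point, and the three learning rates are assumptions picked to show the three behaviours described above (crawling, settling, and overshooting).

```python
def loss(w):
    # A made-up valley: the lowest point is at w = 3.
    return (w - 3) ** 2

def slope(w):
    # Derivative of the loss: which way is uphill, and how steep.
    return 2 * (w - 3)

def train(learning_rate, steps=20, w=0.0):
    for _ in range(steps):
        # Nudge the weight a little in the downhill direction.
        w = w - learning_rate * slope(w)
    return w

for lr in (0.01, 0.1, 1.1):
    w = train(lr)
    print(f"learning rate {lr}: w = {w:.3f}, loss = {loss(w):.3f}")
```

After 20 steps the small rate has barely moved, the middle rate is sitting almost at the bottom, and the large rate has bounced further away than where it started.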

The brain analogy

Neural networks: lots of tiny decisions stacked up

A neural network is built from artificial neurons, loosely inspired by brain cells. Each neuron is dead simple: it takes some inputs, multiplies each by a weight (how much that input matters), adds them up, and "fires" if the total clears a threshold. The magic isn't one neuron — it's millions of them, and the weights are exactly what gradient descent learns.

Meet one neuron deciding "should I go outside?". Flip the inputs on/off, drag how much each one matters, and watch the neuron fire — or not.
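
In code, one such neuron is only a few lines. This is a sketch under assumptions: the three inputs (sunny, warm, raining), their weights, and the threshold of 1.0 are invented for illustration, and the weights are set by hand here rather than learned.

```python
def neuron(inputs, weights, threshold=1.0):
    # Weighted sum: each input times how much it matters.
    total = sum(x * w for x, w in zip(inputs, weights))
    # "Fire" only if the total clears the threshold.
    return total >= threshold

# Hypothetical weights: sunshine and warmth push towards going out,
# rain pushes strongly against it.
weights = [0.6, 0.5, -1.0]   # sunny, warm, raining

print(neuron([1, 1, 0], weights))  # sunny and warm, no rain
print(neuron([1, 1, 1], weights))  # sunny and warm, but raining
print(neuron([1, 0, 0], weights))  # sunny only
```

Only the first case fires: sunshine plus warmth adds up to 1.1, which clears the threshold, while rain or a missing input drags the total below it.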

Stack them into layers

Wire the outputs of one row of neurons into the next. Early layers spot simple things (an edge, a curve); later layers combine those into bigger ideas (an eye, a face). Information flows forward, layer by layer.
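
A small sketch shows why stacking matters. The question "is exactly one of these two inputs on?" cannot be answered by any single weigh-and-fire neuron, but two layers handle it easily. The weights and thresholds below are set by hand for illustration (a trained network would find its own), and the function names are made up.

```python
def fires(inputs, weights, threshold):
    # One neuron: weighted sum, then compare with the threshold.
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def tiny_network(a, b):
    # Layer 1: two neurons look at the raw inputs.
    either = fires([a, b], [1, 1], threshold=1)   # at least one input is on
    both = fires([a, b], [1, 1], threshold=2)     # both inputs are on
    # Layer 2: one neuron combines what layer 1 spotted.
    return fires([either, both], [1, -1], threshold=1)  # either, but not both

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", tiny_network(a, b))
```

The second layer never sees the raw inputs, only the simpler ideas ("either", "both") that the first layer extracted, which is the same pattern as edges feeding into eyes feeding into faces.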

"Deep" just means many layers

Deep learning is nothing mystical — it's a neural network with many layers stacked deep. More layers can capture more abstract patterns, which is why deep networks cracked vision and language once we had the data and the chips to train them.

The star of the show

N-grams: how a computer guesses the next word

Before neural language models, there were n-grams, and they're the clearest way to feel how a machine writes. The idea: to predict the next word, just look at the last few words and ask "in real text, what usually came next?". Count it up across a whole book and you can finish sentences. Classic phone-keyboard suggestions work on this same principle.
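
The counting really is that simple. Below is a minimal sketch of a bigram model (one word of context). The three-sentence training text is invented for illustration and is not the story used elsewhere on this page; the names follows and predict are likewise just choices for this example.

```python
from collections import Counter, defaultdict

# Invented training text; the full stop is treated as a word of its own.
text = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
words = text.split()

# For every word, count which words came right after it.
follows = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

def predict(word):
    counts = follows.get(word)
    if not counts:
        return None  # never saw this word followed by anything
    # The most frequent follower wins.
    return counts.most_common(1)[0][0]

print(predict("the"))   # "cat" follows "the" more often than anything else
print(predict("sat"))
print(predict("zebra")) # unseen word: the model has nothing to say
```

Notice the last line: a counting model knows nothing about words it has never seen, which is one of the walls n-grams run into.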

Build a baby language model — live

We trained a real model on the little story below by counting which words follow which. Pick how much context it gets, then click the predicted words (or hit autopilot) to write a sentence. Watch how bigger context = more coherent.

Why bigger n is better… and harder

A unigram ignores context and picks common words — gibberish. A bigram looks 1 word back; a trigram, 2 words back — more context, more coherent. But each extra word of context needs far more text to have seen every combination. That trade-off (coherence vs. data) is the n-gram's whole life.
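
The data hunger can be counted directly. With a vocabulary of V words there are V^(n-1) possible contexts for an n-gram model, so each extra word of context multiplies the possibilities by V. The sketch below compares how many contexts are possible with how many actually appear in a tiny invented text; the text and the function name context_stats are assumptions for illustration.

```python
# Invented training text; the full stop is treated as a word of its own.
text = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
words = text.split()

def context_stats(words, n):
    vocab_size = len(set(words))
    # Every (n-1)-word context that actually occurs before some next word.
    seen = {tuple(words[i:i + n - 1]) for i in range(len(words) - n + 1)}
    # Every (n-1)-word context that could occur in principle.
    possible = vocab_size ** (n - 1)
    return len(seen), possible

for n, name in ((1, "unigram"), (2, "bigram"), (3, "trigram")):
    seen, possible = context_stats(words, n)
    print(f"{name}: saw {seen} of {possible} possible contexts")
```

With ten distinct words, the bigram model has seen every possible context, but the trigram model has seen only a small fraction of its hundred. Scale the vocabulary up to tens of thousands of words and the gap becomes enormous.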

You use this every day

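The word suggestions above your phone keyboard work on the same principle: classic versions were essentially n-gram models trained on what people type, though many newer keyboards use neural networks instead. It's why a pure n-gram model nails "see you later" but fumbles a sentence with a long-range twist: it only remembers the last word or two.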

Quick game: can you beat the model?

Here's a sentence with the last word hidden. Guess it — then see what the bigram model predicted.

The leap

From counting words to ChatGPT

N-grams hit a wall: they only see a couple of words back, and they have no idea what words mean — "king" and "queen" are just different strings to them. Modern language models fixed both problems. First, text is chopped into tokens and turned into numbers a network can chew on.

Type anything — watch it get split into tokens (the chunks a model actually reads). Notice it often splits inside long or rare words. (Simplified for illustration.)

Each chunk becomes a number (an ID). Real models like GPT have vocabularies of roughly 50,000 to 200,000 tokens, and a rule of thumb for English text is ~4 characters ≈ 1 token.
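
Here is a deliberately simplified sketch of the splitting step. It uses a tiny hand-written vocabulary and a greedy "take the longest chunk that matches" rule; both are assumptions for illustration. Real tokenizers learn their vocabulary from data and use more involved algorithms, so treat this as the flavour of the idea rather than how GPT does it.

```python
# Hypothetical mini-vocabulary: chunk -> ID. Real vocabularies are learned
# from data and are vastly larger.
VOCAB = {"un": 0, "believ": 1, "able": 2, "cat": 3, "s": 4, " ": 5}

def tokenize(text):
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for end in range(len(text), i, -1):
            if text[i:end] in VOCAB:
                tokens.append(text[i:end])
                i = end
                break
        else:
            # Nothing matched: keep the single character as its own token.
            tokens.append(text[i])
            i += 1
    return tokens

tokens = tokenize("unbelievable cats")
ids = [VOCAB.get(t, -1) for t in tokens]  # -1 marks an unknown chunk
print(tokens)
print(ids)
```

The long word gets split inside itself into "un", "believ" and "able", and from that point on the model only ever sees the list of IDs.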

The 2017 breakthrough: attention

The Transformer ("Attention Is All You Need", 2017) let a model weigh every earlier word at once and learn which ones matter, however far back. No more "last two words only". That's how a model keeps track of who "she" refers to, even when the name appeared three sentences earlier.
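
At its core, attention is a weighted average: score every earlier word against the current one, turn the scores into weights that sum to 1, and blend. The sketch below follows the scaled dot-product form from the 2017 paper, but all the numbers are hand-picked for illustration. In a real Transformer the query, key and value vectors come from learned weights, and there are many attention "heads" and layers.

```python
import math

def softmax(scores):
    # Turn raw scores into positive weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Score each earlier word by how well its key matches the query.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Blend the values, giving more say to the higher-weighted words.
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

# Invented vectors for three earlier words and the current word "she".
earlier_words = ["Maria", "went", "home"]
keys = [[4.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
query_for_she = [1.0, 0.0]

weights, output = attend(query_for_she, keys, values)
for word, w in zip(earlier_words, weights):
    print(f"{word}: {w:.2f}")
```

With these made-up vectors nearly all of the weight lands on "Maria", and nothing in the calculation cares how far back that word sits.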

Words become meaning

Instead of treating words as strings, the network learns a vector (a list of numbers) for each one, so related words sit near each other. Train this on much of the internet, at enormous scale, and you get a model that can chat, summarise and code.
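
"Near each other" has a concrete meaning: a common way to compare two vectors is cosine similarity, which is close to 1 when they point the same way and smaller when they don't. The three-number vectors below are invented by hand for illustration; real models learn vectors with hundreds or thousands of numbers.

```python
import math

# Hand-made toy vectors, invented for illustration (not from a real model).
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    # Dot product divided by the two lengths.
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

print("king vs queen:", round(cosine(vectors["king"], vectors["queen"]), 2))
print("king vs apple:", round(cosine(vectors["king"], vectors["apple"]), 2))
```

To an n-gram model "king" and "queen" are unrelated strings; to a model working with vectors like these, they are neighbours.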

Want to go one level deeper?

Those word-vectors, semantic search and how AI answers over your own documents are the whole story of our companion guide, “RAG & Vectors”.

Where we are now

AI today: amazing, useful, and not magic

Narrow, not general

Every AI today is narrow — brilliant at one thing (chat, chess, images). A single system as flexible as a human (AGI) doesn't exist yet.

It can be confidently wrong

A model predicts plausible words, not verified truth, so it can "hallucinate" — sound sure while being wrong. Always double-check.

It mirrors its data

A model learns from human text, so it can absorb our biases and mistakes. Good AI needs careful data and human oversight.

A tool for people

It already drafts text, translates, codes, spots tumours and powers your maps. At its best it's a power tool that makes humans faster — not a replacement for thinking.

In one breath

The whole journey, recapped

  1. AI = making computers do things that need human-like smarts.
  2. The big shift: stop hand-writing rules, start learning from examples — that's machine learning.
  3. Learning = guess, measure the error, nudge the numbers, repeat — rolling downhill via gradient descent.
  4. Neural networks stack millions of tiny "weigh-the-inputs-and-fire" neurons; deep = many layers.
  5. N-grams predict the next word by counting what usually follows — simple, and the soul of language models.
  6. Transformers (2017) added attention and meaning, scaling the n-gram idea of next-word prediction into ChatGPT.
  7. Today's AI is powerful but narrow — it predicts, it doesn't truly "understand". Use it, and check it.