Learning Paradigms

ML problems are grouped by what kind of feedback the model learns from. Get this classification right and the rest of the project — what data you need, what algorithm fits, how you evaluate — follows naturally.

Supervised learning

The model learns from labeled examples: inputs paired with the correct output. It learns the mapping input → output and applies it to new inputs.

This is the workhorse of practical ML. Two sub-types:

Classification — predict a category. Spam/not-spam, which of 10 product categories, will-churn/won’t-churn.
Regression — predict a continuous number. House price, delivery time, expected revenue.

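# Supervised: every training row has a known answer.
# Assumes scikit-learn; any estimator with fit/predict follows the same pattern.
from sklearn.linear_model import LinearRegression

X = [[1200, 3], [1800, 4], [950, 2]]   # features: sqft, bedrooms
y = [240000, 360000, 190000]            # labels: sale price
model = LinearRegression()
model.fit(X, y)                         # learn the mapping from features to price
model.predict([[1500, 3]])              # -> estimated price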

The constraint: you need labels, and labels are expensive. Much of applied ML is really a data-labeling project in disguise.
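The regression snippet above predicts a number; the classification case predicts a category instead. Here is a minimal sketch in plain Python, assuming a 1-nearest-neighbor rule as the classifier. The churn data, the feature names, and the predict_1nn helper are all made up for illustration and do not come from any library.

```python
# Classification sketch: labels are categories, not numbers.
# All data and names here are hypothetical, chosen only for illustration.
import math

# features: [monthly_logins, support_tickets]
X_train = [[30, 0], [25, 1], [2, 4], [1, 5]]
y_train = ["stays", "stays", "churns", "churns"]

def predict_1nn(x, X, y):
    """Return the label of the training row closest to x (Euclidean distance)."""
    distances = [math.dist(x, row) for row in X]
    nearest = distances.index(min(distances))
    return y[nearest]

prediction = predict_1nn([3, 3], X_train, y_train)
print(prediction)
```

The shape is the same as in regression: labeled rows go in, and a prediction for an unseen row comes out. Only the type of the label changes.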

Unsupervised learning

The model learns from unlabeled data — just inputs, no answers. It finds structure on its own.

Clustering: group similar items. Customer segmentation is the classic use; points that fit no cluster can also be flagged as anomalies.
Dimensionality reduction: compress many features into a few while keeping the signal. Used for visualization and as a preprocessing step.

You reach for unsupervised learning when you don’t have labels, or when the goal is exploration: “what natural groups exist in this data?” Evaluation is harder — there’s no answer key — so results need human judgment.
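To make "finding structure without labels" concrete, here is a minimal k-means sketch using only the standard library. It assumes two clusters and a naive initialization (the first and last point); the points and the kmeans helper are made up for illustration, and a real project would more likely use a library implementation.

```python
# Clustering sketch: k-means on unlabeled 2-D points, standard library only.
# The data, k=2, and the initialization are hypothetical choices for illustration.
import math

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),    # one visible group
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]    # another visible group

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        centroids = new_centroids
    return assignments, centroids

# Naive initialization for the sketch: start from the first and last point.
assignments, centroids = kmeans(points, [points[0], points[-1]])
print(assignments)
print(centroids)
```

Note that the cluster ids (0 and 1) are arbitrary. Nothing in the output says what each group means, which is why the results still need human judgment.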

Reinforcement learning (RL)

The model — here called an agent — learns by acting in an environment and receiving rewards. No labeled examples; just a score signal that says how good a sequence of decisions was.

RL fits sequential decision problems: game playing, robotics, control. It’s powerful but data-hungry and finicky, so it’s less common in everyday product work, with one large exception: reinforcement learning from human feedback (RLHF), the RL step used to align LLMs.
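The agent, environment, and reward loop described above can be sketched with tabular Q-learning, one common RL algorithm. Everything here is an assumption made for illustration: a 5-cell corridor as the environment, a reward of 1 for reaching the last cell, and the hyperparameter values.

```python
# Reinforcement learning sketch: tabular Q-learning on a 5-cell corridor.
# The environment, reward, and hyperparameters are hypothetical.
import random

N_STATES = 5            # cells 0..4; the agent starts in cell 0
GOAL = N_STATES - 1     # reaching cell 4 ends the episode with reward 1
ACTIONS = ["left", "right"]

def step(state, action):
    """Environment: move one cell, staying inside the corridor."""
    if action == "right":
        next_state = min(state + 1, GOAL)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # q[state][action] estimates the long-run reward of taking action in state.
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state, steps = 0, 0
        while state != GOAL and steps < 200:   # cap episode length
            best = max(q[state].values())
            greedy = [a for a in ACTIONS if q[state][a] == best]
            # Explore with probability epsilon; otherwise act greedily,
            # breaking ties at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = rng.choice(greedy)
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted best value of the next state.
            target = reward + gamma * max(q[next_state].values())
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            steps += 1
    return q

q = train()
# Greedy policy for every non-terminal cell.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)
```

The agent is never told that "right" is correct. It only sees a reward at the end of a successful sequence, and the update rule propagates that signal back to earlier cells. With these settings the learned policy should be to move right in every cell.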

Self-supervised learning

The approach behind much of modern AI, including LLMs. The model learns from unlabeled data, but the data supplies its own labels: the training procedure hides part of each input and asks the model to predict the hidden part.

An LLM is pretrained this way: take ordinary text, hide the next word (in practice, the next token), and predict it. The “label” is just the word that was already there. This is why LLMs can pretrain on internet-scale text with no human labeling required. It gives you the scale of unsupervised learning with the clear training signal of supervised learning.
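The "data supplies its own labels" idea can be shown in a few lines. This sketch turns one raw sentence into (context, next word) training pairs. Whitespace splitting stands in for a real tokenizer, and the sentence and the next_token_pairs helper are made up for illustration.

```python
# Self-supervised sketch: turn raw text into (context, target) training pairs.
# Whitespace splitting is a stand-in for a real tokenizer; the text is hypothetical.
text = "the cat sat on the mat"
tokens = text.split()

def next_token_pairs(tokens):
    """Each prefix of the sequence is an input; the token that follows is its label."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs(tokens)
for context, target in pairs:
    print(" ".join(context), "->", target)
```

A six-word sentence yields five labeled examples with no human annotation. The same procedure applied to a large text corpus yields training pairs at whatever scale the corpus allows.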

Choosing a paradigm

You have…	You want…	Paradigm
Inputs + correct answers	To predict answers for new inputs	Supervised
Inputs only	To discover structure or groups	Unsupervised
An environment + a reward signal	To learn a decision policy	Reinforcement
Lots of raw data, no labels	A general-purpose pretrained model	Self-supervised

Many real systems combine them: an LLM is typically pretrained self-supervised, then fine-tuned on labeled examples (supervised), then aligned with RL.

Key takeaways

Supervised learning maps inputs to known answers and needs labels.
Unsupervised learning finds structure in unlabeled data.
Reinforcement learning optimizes decisions against a reward, and it powers RLHF, the alignment step for LLMs.
Self-supervised learning generates labels from the data itself, which is what made internet-scale pretraining possible.
Classify your problem first; the data, algorithm, and evaluation choices follow.