LLM Engineering

This is the section most readers came for. LLM engineering is the discipline of building reliable software on top of large language models: models you call, not models you train. It demands a different mindset from ordinary programming, because your core dependency is non-deterministic, occasionally wrong, and priced per token.

In this section

How LLMs Work: Next-token prediction, pretraining vs. post-training, and a grounded view of what LLMs can and can't do.

Context & Decoding: Tokens, context windows, embeddings, and the decoding parameters (temperature, top-p) that shape every output.

Adapting LLMs: Prompting vs. RAG vs. fine-tuning, with a decision framework, plus LoRA and how to evaluate LLM output.

What you’ll be able to do

Reason about why an LLM produced a given output, choose decoding settings deliberately, estimate token cost, and decide between prompting, retrieval, and fine-tuning for a real feature.
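As a small taste of the token-cost estimation mentioned above, here is a minimal sketch. It assumes a rough rule of thumb of about four characters per token for English text (real tokenizers vary by model and language) and uses made-up per-million-token prices. The function names `estimate_tokens` and `estimate_cost` are hypothetical helpers, not part of any library.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# This is a heuristic, not a real tokenizer; actual counts depend on the model.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count of `text` from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the estimated cost, in the same currency as the prices given."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


prompt = "Summarize the following support ticket in two sentences."
prompt_tokens = estimate_tokens(prompt)

# Hypothetical prices: 1.00 per million input tokens, 2.00 per million output tokens.
cost = estimate_cost(prompt_tokens, output_tokens=100,
                     input_price_per_million=1.0,
                     output_price_per_million=2.0)
print(f"~{prompt_tokens} input tokens, estimated cost {cost:.6f}")
```

For real budgeting, swap the character heuristic for the tokenizer that matches your model and take prices from your provider's current price list.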

Prerequisites

Deep Learning — particularly the transformer and self-attention.