LLM Engineering
This is the section most readers came for. LLM engineering is the discipline of building reliable software on top of large language models: models you call, not models you train. It demands a different mindset from ordinary programming, because your core dependency is non-deterministic, occasionally wrong, and priced per token.
In this section
How LLMs Work: Next-token prediction, pretraining vs. post-training, and a grounded view of what LLMs can and can't do.
Context & Decoding: Tokens, context windows, embeddings, and the decoding parameters (temperature, top-p) that shape every output.
Adapting LLMs: Prompting vs. RAG vs. fine-tuning (the decision framework), plus LoRA and how to evaluate LLM output.
What you’ll be able to do
Reason about why an LLM produced a given output, choose decoding settings deliberately, estimate token cost, and decide between prompting, retrieval, and fine-tuning for a real feature.
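As a taste of the token-cost estimate mentioned above, here is a minimal sketch. It makes two loud assumptions: the prices are hypothetical placeholders rather than any provider's real rates, and the token count uses a rough rule of thumb of about four characters per token for English text instead of a real tokenizer. The function names `rough_token_count` and `estimate_cost` are illustrative only.

```python
import math

# Hypothetical prices in USD per million tokens (assumption for illustration;
# real prices differ by provider and model).
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00


def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Crude estimate assuming ~4 characters per token (a heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one call, with input and output tokens priced separately."""
    total = input_tokens * PRICE_PER_M_INPUT + output_tokens * PRICE_PER_M_OUTPUT
    return total / 1_000_000


if __name__ == "__main__":
    prompt = "Summarize the following support ticket in two sentences. " * 50
    input_tokens = rough_token_count(prompt)
    cost = estimate_cost(input_tokens, output_tokens=200)
    print(f"{input_tokens} input tokens, roughly ${cost:.6f} per call")
```

For a real estimate, swap the heuristic for the tokenizer that matches your model and multiply the per-call figure by expected call volume.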
Prerequisites
Deep Learning, particularly the transformer and self-attention.