Design Principles
Designing AI systems is mostly about unlearning an assumption from ordinary software: that a given input produces a known, correct, repeatable output. With an LLM, none of that holds. These principles are how you build reliable systems anyway.
1. Design for non-determinism
The same input can yield different outputs. So:
- Never assume an exact response — parse and validate what comes back.
- Don’t put unvalidated model output straight into a database, a query, or a UI.
- Make operations idempotent where you can, so a retry is always safe.
The model is a source of proposals, not a source of truth. Your code decides what to trust.
2. Constrain the task
A model’s reliability falls as its job becomes more open-ended.
Whenever possible, narrow the task. Turn “answer the user” into “pick one of these five intents.” Turn “write the response” into “fill these named fields.” Closed tasks are easier to constrain, validate, and evaluate.
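To illustrate the first conversion, here is a sketch that hands the model a closed list of labels and accepts nothing outside it. The five intents, the prompt wording, and the helper names are assumptions made up for the example.

```python
from enum import Enum
from typing import Optional


class Intent(str, Enum):
    # Hypothetical label set: the closed list the model must choose from.
    BILLING = "billing"
    REFUND = "refund"
    SHIPPING = "shipping"
    ACCOUNT = "account"
    OTHER = "other"


def build_prompt(message: str) -> str:
    """Spell out the allowed labels so the task is a choice, not free text."""
    options = ", ".join(i.value for i in Intent)
    return (
        f"Classify the customer message into exactly one of: {options}.\n"
        f"Reply with the label only.\n\nMessage: {message}"
    )


def parse_intent(raw: str) -> Optional[Intent]:
    """Accept only an exact label (ignoring case and whitespace); reject the rest."""
    try:
        return Intent(raw.strip().lower())
    except ValueError:
        return None
```

A reply such as "I think it's about billing" is rejected rather than guessed at; the caller can re-ask or route it to `OTHER`.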
3. Constrain the output
Free-form text is hostile to downstream code. Demand structured output (JSON matching a schema) so the result is machine-checkable.
```python
schema = {
    "type": "object",
    "properties": {
        "sentiment": {"enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number"},
    },
    "required": ["sentiment", "confidence"],
}
# Use the provider's structured-output / JSON-schema mode, then still validate.
```

A schema turns “hope the model formatted it right” into “reject it if it didn’t.”
4. Plan for failure explicitly
Every LLM call has more failure modes than a normal function. Enumerate them and handle each:
| Failure | Mitigation |
|---|---|
| Timeout / slow response | Aggressive timeout + retry with backoff |
| Rate limited (429) | Backoff, queue, or fail over to another provider |
| Malformed / invalid output | Schema validation, then one re-ask |
| Hallucinated content | Ground with retrieval; verify before use |
| Provider outage | Fallback model or graceful degradation |
| Unsafe / off-topic output | Guardrails on input and output |
A demo handles the happy path. A system handles this table.
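Several rows of the table can share one wrapper. A sketch, assuming each provider is wrapped as a plain function `prompt -> str` that enforces its own timeout and raises a hypothetical `TransientError` on timeouts and 429s; `is_valid` stands in for whatever schema check you use. The delays, attempt counts, and re-ask wording are illustrative, not recommendations.

```python
import random
import time
from typing import Callable, Optional


# Hypothetical: your provider wrapper raises this on timeouts and HTTP 429s.
class TransientError(Exception):
    pass


# A provider is any function that takes a prompt and returns the raw reply text.
Provider = Callable[[str], str]

REASK_SUFFIX = (
    "\n\nYour previous reply did not match the required format. "
    "Reply again, following the format exactly."
)


def call_with_retries(provider: Provider, prompt: str, max_attempts: int = 3,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry transient failures with exponential backoff plus random jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return provider(prompt)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # Out of retries: let the caller fail over.
            # Nominal delays 0.5s, 1s, 2s, ... each scaled by a factor in [0.5, 1.5].
            sleep(base_delay * 2 ** attempt * random.uniform(0.5, 1.5))
    raise AssertionError("unreachable")


def complete_resilient(prompt: str, primary: Provider, fallback: Provider,
                       is_valid: Callable[[str], bool],
                       sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Try each provider in turn; re-ask once per provider if its output is invalid."""
    for provider in (primary, fallback):
        try:
            reply = call_with_retries(provider, prompt, sleep=sleep)
            if is_valid(reply):
                return reply
            reply = call_with_retries(provider, prompt + REASK_SUFFIX, sleep=sleep)
            if is_valid(reply):
                return reply
        except TransientError:
            continue  # This provider is down or throttled; try the next one.
    return None  # Graceful degradation: the caller decides what to show instead.
```

Injecting `sleep` keeps the backoff testable, and returning `None` rather than raising leaves the degraded behavior (a canned reply, a queue, a human handoff) to the caller.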
5. Verify, don’t trust
The model proposes; your system disposes. Before acting on output:
- Validate structure — schema, types, required fields.
- Validate semantics — is the cited document ID real? Is the number in range?
- Cross-check when stakes are high — run the model’s SQL read-only first; confirm a quoted fact against the source.
- Keep humans in the loop for irreversible or high-impact actions.
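The semantic checks and the read-only SQL run above can be sketched as follows. `KNOWN_DOC_IDS`, the `cited_ids` and `discount_pct` fields, and the 0 to 50 range are hypothetical. The read-only part uses SQLite's `PRAGMA query_only`, which makes the connection reject writes for the duration of the call; other databases need their own mechanism, such as a role with read-only privileges.

```python
import sqlite3

# Hypothetical ground truth: the document IDs your retriever actually returned.
KNOWN_DOC_IDS = {"doc-101", "doc-204", "doc-377"}


def check_semantics(answer: dict) -> list[str]:
    """Structure can be valid while content is not; check it against reality."""
    problems = []
    for doc_id in answer.get("cited_ids", []):
        if doc_id not in KNOWN_DOC_IDS:
            problems.append(f"cited unknown document {doc_id!r}")
    discount = answer.get("discount_pct")
    if discount is not None and not 0 <= discount <= 50:  # hypothetical business rule
        problems.append(f"discount {discount} outside 0-50")
    return problems


def run_sql_read_only(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute model-written SQL with writes disabled at the database level."""
    conn.execute("PRAGMA query_only = ON")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.execute("PRAGMA query_only = OFF")  # Restore normal mode for the app.
```

Enforcing read-only at the database, rather than inspecting the SQL text for `DELETE` or `UPDATE`, means a cleverly phrased query cannot slip a write past the check.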
6. Make it observable
You cannot debug what you cannot see, and LLM systems fail quietly, with a slightly worse answer rather than a stack trace. Log, for every call: the full prompt, the raw response, token counts, latency, cost, model version, and which retrieved context was used. Without this, a production complaint is unreproducible.
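One way to make this concrete is a single structured record per call, emitted as a JSON line that a log pipeline can index. This is a sketch: `client_fn` is assumed to return `(text, input_tokens, output_tokens)`, and the per-token prices and any model name you pass in are placeholders, not real rates or products.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Callable

# Hypothetical prices in USD per 1M tokens; substitute your provider's actual rates.
PRICE_PER_M = {"input": 3.00, "output": 15.00}


@dataclass
class CallRecord:
    model: str          # exact model version string, not just a family name
    prompt: str         # the full rendered prompt, after templating
    response: str       # raw text before any parsing
    input_tokens: int
    output_tokens: int
    latency_ms: float
    context_ids: list[str] = field(default_factory=list)  # retrieved chunks used
    call_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def cost_usd(self) -> float:
        return (self.input_tokens * PRICE_PER_M["input"]
                + self.output_tokens * PRICE_PER_M["output"]) / 1_000_000

    def to_json(self) -> str:
        return json.dumps({**asdict(self), "cost_usd": round(self.cost_usd, 6)})


def logged_call(client_fn: Callable[[str], tuple[str, int, int]], model: str,
                prompt: str, context_ids: list[str],
                sink: Callable[[str], None] = print) -> str:
    """Time one model call and emit a single JSON log line describing it."""
    start = time.perf_counter()
    text, tokens_in, tokens_out = client_fn(prompt)
    record = CallRecord(
        model=model, prompt=prompt, response=text,
        input_tokens=tokens_in, output_tokens=tokens_out,
        latency_ms=(time.perf_counter() - start) * 1000,
        context_ids=list(context_ids),
    )
    sink(record.to_json())
    return text
```

With 1,000 input and 500 output tokens at these placeholder prices, the record's cost is (1000 × 3 + 500 × 15) / 1,000,000 = 0.0105 USD.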
7. Guardrails on both sides
A guardrail is a check around the model, independent of it:
- Input guardrails — block prompt injection, off-topic requests, and abuse before spending a model call.
- Output guardrails — scan responses for leaked secrets, unsafe content, or policy violations before they reach the user.
Guardrails are deterministic code or separate classifiers — never “we asked the model nicely not to.” The threats they defend against — prompt injection, data leakage, unsafe output — are covered in depth in AI Safety & Security.
8. Keep the model swappable
The model landscape changes monthly. Isolate provider calls behind your own interface so you can switch models, route by task, or fall back without rewriting the application.
```python
class LLMClient:
    # "Result" is your own provider-neutral response type.
    def complete(self, prompt: str, *, schema: dict | None = None) -> "Result":
        ...

# App code depends on LLMClient, never on a vendor SDK directly.
```

Key takeaways
Build for non-determinism: validate everything, never trust raw output. Constrain both the task and the output format to make the model reliable and checkable. Enumerate failure modes and handle each one. Verify outputs against ground truth before acting. Instrument every call, wrap the model in guardrails, and keep it swappable behind your own abstraction.