Design Principles

Designing AI systems is mostly about unlearning an assumption from ordinary software: that a given input produces a known, correct, repeatable output. With an LLM, none of that holds. These principles are how you build reliable systems anyway.

1. Design for non-determinism

The same input can yield different outputs. So:

Never assume an exact response — parse and validate what comes back.
Don’t put unvalidated model output straight into a database, a query, or a UI.
Make operations idempotent where you can, so that retrying a failed or timed-out call is safe.

The model is a source of proposals, not a source of truth. Your code decides what to trust.
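As a sketch of "parse and validate what comes back," the function below assumes the model was asked for a single JSON object and tolerates the common case where it wraps that object in prose or a code fence. The name `parse_model_json` and the first-brace-to-last-brace heuristic are illustrative assumptions; the heuristic breaks if the surrounding prose itself contains braces, which is one more reason to prefer a provider's structured-output mode.

```python
import json
from typing import Any, Optional


def parse_model_json(raw: str) -> Optional[Any]:
    """Best-effort parse of a model reply that should contain one JSON object.

    Returns None instead of raising, so the caller decides what to do next
    (re-ask, fall back, or surface an error). Never compares exact text.
    """
    # Models sometimes wrap JSON in prose or code fences; treat the span
    # from the first "{" to the last "}" as the candidate object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        return json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return None
```

Returning `None` rather than raising keeps the decision about what to trust in the calling code, where it belongs.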

2. Constrain the task

A model’s reliability falls as its job becomes more open-ended.

Whenever possible, narrow the task. Turn “answer the user” into “pick one of these five intents.” Turn “write the response” into “fill these named fields.” Closed tasks are easier to constrain, validate, and evaluate.
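A minimal sketch of the "pick one of these five intents" version, using a hypothetical intent list for a support bot. Because the answer space is closed, validation shrinks to a membership test, and anything outside the set is rejected rather than guessed at.

```python
from typing import Optional

# Hypothetical intent set; what matters is that it is closed.
INTENTS = ("billing", "technical_issue", "account_access", "cancel", "other")


def build_intent_prompt(user_message: str) -> str:
    """Turn an open-ended request into a pick-one-of-N task."""
    options = ", ".join(INTENTS)
    return (
        "Classify the user's message into exactly one intent.\n"
        f"Allowed intents: {options}\n"
        "Reply with the intent name only.\n\n"
        f"Message: {user_message}"
    )


def parse_intent(reply: str) -> Optional[str]:
    """Accept the reply only if it names exactly one allowed intent."""
    label = reply.strip().strip(".").lower()
    return label if label in INTENTS else None
```

A `None` from `parse_intent` is a clear signal to re-ask or route to a default, which is much easier to handle than an unexpected free-form paragraph.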

3. Constrain the output

Free-form text is hostile to downstream code. Demand structured output — JSON matching a schema — so the result is machine-checkable.

schema = {
    "type": "object",
    "properties": {
        "sentiment":  {"enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number"},
    },
    "required": ["sentiment", "confidence"],
}
# Use the provider's structured-output / JSON-schema mode, then still validate.

A schema turns “hope the model formatted it right” into “reject it if it didn’t.”
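One way to do the "then still validate" step without extra dependencies is a small hand-written check that mirrors the schema above; a library such as `jsonschema` (`jsonschema.validate(instance, schema)`) does the same job generically. The 0 to 1 range for `confidence` is an assumption that the schema above does not encode, and `validate_sentiment` is a hypothetical name.

```python
import math
from typing import Any

ALLOWED_SENTIMENTS = ("positive", "negative", "neutral")


def validate_sentiment(obj: Any) -> list[str]:
    """Return a list of problems; an empty list means the object is accepted.

    Mirrors the schema above, plus an assumed 0..1 range for confidence.
    """
    if not isinstance(obj, dict):
        return ["expected a JSON object"]
    errors = []
    for field in ("sentiment", "confidence"):
        if field not in obj:
            errors.append(f"missing required field: {field}")
    # A tuple (not a set) so unhashable values like lists compare safely.
    if "sentiment" in obj and obj["sentiment"] not in ALLOWED_SENTIMENTS:
        errors.append(f"sentiment not in enum: {obj['sentiment']!r}")
    if "confidence" in obj:
        c = obj["confidence"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(c, bool) or not isinstance(c, (int, float)) or math.isnan(c):
            errors.append("confidence must be a number")
        elif not 0.0 <= c <= 1.0:
            errors.append(f"confidence out of range: {c}")
    return errors
```

Returning the list of problems, rather than a bare pass/fail, gives you concrete text to include in the single re-ask recommended in the failure table of the next section.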

4. Plan for failure explicitly

Every LLM call has more failure modes than a normal function. Enumerate them and handle each:

Failure	Mitigation
Timeout / slow response	Aggressive timeout + retry with backoff
Rate limited (429)	Backoff, queue, or fail over to another provider
Malformed / invalid output	Schema validation, then one re-ask
Hallucinated content	Ground with retrieval; verify before use
Provider outage	Fallback model or graceful degradation
Unsafe / off-topic output	Guardrails on input and output

A demo handles the happy path. A system handles this table.
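Several rows of the table can be combined into one wrapper. This sketch assumes each provider is exposed as a plain `prompt -> text` callable that enforces its own timeout and raises a hypothetical `TransientError` on timeouts, 429s, and outages. It retries with exponential backoff and full jitter, re-asks once on invalid output, and then falls over to the next provider. The `sleep` parameter is injectable only so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, Sequence


class TransientError(Exception):
    """Hypothetical: raised by a provider wrapper on timeout, 429, or outage."""


ModelCall = Callable[[str], str]              # prompt -> raw text
Validator = Callable[[str], Optional[dict]]   # raw text -> parsed object or None


def call_with_retries(
    call: ModelCall,
    prompt: str,
    *,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry transient failures with exponential backoff plus full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return call(prompt)
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            # The upper bound grows 0.5s, 1s, 2s, ...; full jitter picks a random
            # delay below it so many clients don't retry in lockstep.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError("unreachable")


def robust_complete(
    providers: Sequence[ModelCall],
    prompt: str,
    validate: Validator,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Try each provider in order, allowing one re-ask on invalid output.

    Returns None if every provider fails, so the caller can degrade gracefully.
    """
    for call in providers:
        try:
            parsed = validate(call_with_retries(call, prompt, sleep=sleep))
            if parsed is None:
                # One re-ask that restates the format requirement.
                reask = prompt + "\n\nReturn only valid JSON matching the schema."
                parsed = validate(call_with_retries(call, reask, sleep=sleep))
            if parsed is not None:
                return parsed
        except TransientError:
            continue  # Outage or retries exhausted: fail over to the next provider.
    return None
```

Only `TransientError` is retried. Bugs and non-retryable errors, such as a malformed request, propagate immediately instead of being hidden behind retries, and returning `None` when everything fails leaves the choice of graceful degradation to the caller.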

5. Verify, don’t trust

The model proposes; your system disposes. Before acting on output:

Validate structure — schema, types, required fields.
Validate semantics — is the cited document ID real? Is the number in range?
Cross-check when stakes are high — run the model’s SQL read-only first; confirm a quoted fact against the source.
Keep humans in the loop for irreversible or high-impact actions.
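Semantic checks are specific to your domain, so the following is only a shape to copy. It assumes a hypothetical answer type that cites retrieved document IDs and proposes a refund amount: the IDs are checked against what retrieval actually returned, the amount against a range, and anything above an approval threshold is routed to a human.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    """Hypothetical parsed model output for a retrieval-backed answer."""
    text: str
    cited_doc_ids: list[str]
    refund_amount: float  # hypothetical numeric field the model proposes


def semantic_errors(answer: Answer, known_doc_ids: set[str], max_refund: float) -> list[str]:
    """Checks that pass schema validation but can still be wrong."""
    errors = []
    # A well-formed ID can still be invented: check it against what was retrieved.
    for doc_id in answer.cited_doc_ids:
        if doc_id not in known_doc_ids:
            errors.append(f"unknown document cited: {doc_id}")
    if not 0 <= answer.refund_amount <= max_refund:
        errors.append(f"refund out of range: {answer.refund_amount}")
    return errors


def needs_human_approval(answer: Answer, auto_approve_limit: float) -> bool:
    """Irreversible, high-impact actions (here, money leaving) go to a person."""
    return answer.refund_amount > auto_approve_limit
```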

6. Make it observable

You cannot debug what you cannot see, and LLM systems fail quietly — a slightly worse answer, not a stack trace. Log, for every call: the full prompt, the raw response, token counts, latency, cost, model version, and which retrieved context was used. Without this, a production complaint is unreproducible.
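A sketch of one structured log record per call, emitted as JSON through the standard `logging` module. The field names are illustrative, and the per-1k-token prices are parameters you would fill in from your provider's current price list, not constants assumed here.

```python
import json
import logging
import time
from typing import Any

logger = logging.getLogger("llm_calls")


def log_llm_call(
    *,
    model: str,
    prompt: str,
    raw_response: str,
    input_tokens: int,
    output_tokens: int,
    started_at: float,  # value of time.monotonic() taken just before the call
    context_ids: list[str],
    usd_per_1k_in: float,
    usd_per_1k_out: float,
) -> dict[str, Any]:
    """Emit one structured log record per model call and return it."""
    record = {
        "model": model,  # exact version string, not just a family name
        "prompt": prompt,
        "raw_response": raw_response,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": round((time.monotonic() - started_at) * 1000),
        "cost_usd": input_tokens / 1000 * usd_per_1k_in
        + output_tokens / 1000 * usd_per_1k_out,
        "context_ids": context_ids,  # which retrieved chunks the prompt included
    }
    logger.info(json.dumps(record))
    return record
```

One JSON record per call is easy to grep, to load into an analytics store, and to replay when a user reports a bad answer.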

7. Guardrails on both sides

A guardrail is a check around the model, independent of it:

Input guardrails — block prompt injection, off-topic requests, and abuse before spending a model call.
Output guardrails — scan responses for leaked secrets, unsafe content, or policy violations before they reach the user.

Guardrails are deterministic code or separate classifiers — never “we asked the model nicely not to.” The threats they defend against — prompt injection, data leakage, unsafe output — are covered in depth in AI Safety & Security.
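The deterministic half of a guardrail layer can be as simple as the checks below. The patterns and the length limit are illustrative assumptions: a few phrasings of injection attempts, plus two well-known secret formats (AWS access key IDs and PEM private-key headers). Pattern lists like this catch only crude attempts, which is why they sit alongside separate classifiers rather than replacing them.

```python
import re
from typing import Optional

# Illustrative patterns only; not a complete defense.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your|the) system prompt", re.I),
]
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                     # AWS access key ID format
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),   # PEM private key header
]
MAX_INPUT_CHARS = 4000  # hypothetical limit


def check_input(user_text: str) -> Optional[str]:
    """Return a rejection reason, or None if the input may reach the model."""
    if len(user_text) > MAX_INPUT_CHARS:
        return "input too long"
    if any(p.search(user_text) for p in INJECTION_PATTERNS):
        return "possible prompt injection"
    return None


def check_output(model_text: str) -> Optional[str]:
    """Return a rejection reason, or None if the output may reach the user."""
    if any(p.search(model_text) for p in SECRET_PATTERNS):
        return "possible secret in output"
    return None
```

Both checks run without a model call, so a rejected input costs nothing and a rejected output never reaches the user.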

8. Keep the model swappable

The model landscape changes monthly. Isolate provider calls behind your own interface so you can switch models, route by task, or fall back without rewriting the application.

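from typing import Optional, Protocol

class LLMClient(Protocol):
    def complete(self, prompt: str, *, schema: Optional[dict] = None) -> "Result": ...
# "Result" stands for your own provider-neutral response type.
# App code depends on LLMClient, never on a vendor SDK directly.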
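To show what the interface buys you, here is a sketch with a stand-in adapter and a fallback wrapper. `Result`, `EchoClient`, `FallbackClient`, and `summarize` are hypothetical; a real adapter would call a vendor SDK inside `complete`, and which exceptions count as an outage depends on that SDK (`ConnectionError` is assumed here).

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


@dataclass
class Result:
    """Hypothetical provider-neutral result type."""
    text: str
    model: str


class LLMClient(Protocol):
    def complete(self, prompt: str, *, schema: Optional[dict] = None) -> Result: ...


class EchoClient:
    """Stand-in for a vendor adapter; a real one would wrap that vendor's SDK."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name, self.fail = name, fail

    def complete(self, prompt: str, *, schema: Optional[dict] = None) -> Result:
        if self.fail:
            raise ConnectionError(f"{self.name} unavailable")
        return Result(text=f"[{self.name}] {prompt}", model=self.name)


class FallbackClient:
    """Satisfies LLMClient itself, so app code can't tell it from one provider."""

    def __init__(self, clients: Sequence[LLMClient]) -> None:
        self.clients = list(clients)

    def complete(self, prompt: str, *, schema: Optional[dict] = None) -> Result:
        last_error: Optional[Exception] = None
        for client in self.clients:
            try:
                return client.complete(prompt, schema=schema)
            except ConnectionError as err:  # treated as a provider outage
                last_error = err
        raise RuntimeError("all providers failed") from last_error


def summarize(llm: LLMClient, text: str) -> str:
    """Application code: knows only the interface, not any vendor."""
    return llm.complete(f"Summarize: {text}").text
```

Because `FallbackClient` satisfies the same interface, `summarize` works unchanged whether it receives one provider or a chain of them; for example, `summarize(FallbackClient([EchoClient("primary", fail=True), EchoClient("backup")]), "hi")` returns `"[backup] Summarize: hi"`.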

Key takeaways

Build for non-determinism: validate everything, never trust raw output. Constrain both the task and the output format to make the model reliable and checkable. Enumerate failure modes and handle each one. Verify outputs against ground truth before acting. Instrument every call, wrap the model in guardrails, and keep it swappable behind your own abstraction.