I’ve been meaning to start writing for a while. The trigger is finally having something worth saying — a year of building production LLM systems at Keysight, and a growing collection of opinions that don’t fit in a tweet.
What you’ll find here
Mostly notes on the unglamorous half of agentic AI:
- Guided generation — Outlines, JSON schemas, grammar-constrained decoding.
- Validators and self-healing loops — when the model is wrong, what do you do next?
- Local inference — llama.cpp, quantization, the actual cost of a token.
- Observability for agents — drift detection, eval design, when retries cost more than they help.
A bit of computer vision and some math when the mood strikes.
How I’ll write
Short. Specific. Code where it helps, prose where it doesn’t.
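
    from outlines import generate, models
    from pydantic import BaseModel

    class Decision(BaseModel):
        action: str
        confidence: float

    # Outlines 0.x-style calls; what models.llamacpp expects differs between releases.
    model = models.llamacpp("qwen2.5-7b-instruct-q4.gguf")
    # Constrain decoding so the output always parses into a Decision.
    generator = generate.json(model, Decision)
    result = generator("Should I retry? Error: rate_limited (3rd time)")
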
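
Constrained decoding fixes the shape of the output, not whether the values make sense, which is where the validators and self-healing loops come in. A minimal sketch of that loop, standard library only so it runs without a model: `call_model`, `ALLOWED_ACTIONS`, and the wording of the feedback prompt are all hypothetical stand-ins, not anything from Outlines.

```python
import json
from dataclasses import dataclass
from typing import Callable

ALLOWED_ACTIONS = {"retry", "abort"}  # hypothetical action set, for illustration


@dataclass
class Decision:
    action: str
    confidence: float


def parse_decision(raw: str) -> Decision:
    """Validate raw model output; raise ValueError with a message the model can act on."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    action = data.get("action")
    confidence = data.get("confidence")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action must be one of {sorted(ALLOWED_ACTIONS)}")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a number between 0 and 1")
    return Decision(action, float(confidence))


def decide(prompt: str, call_model: Callable[[str], str], max_attempts: int = 3) -> Decision:
    """Ask, validate, and on failure feed the validator's error into the next prompt."""
    current = prompt
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(current)
        try:
            return parse_decision(raw)
        except ValueError as err:
            last_error = err
            # The "healing" step: tell the model exactly what was rejected.
            current = f"{prompt}\n\nYour previous answer was rejected: {err}. Reply with JSON only."
    # Bounded attempts: past this point another retry costs more than it helps.
    raise RuntimeError(f"no valid decision after {max_attempts} attempts: {last_error}")
```

Pass any function that maps a prompt string to a reply string as `call_model`. The hard cap on attempts is the part worth arguing about: every retry is another full round of tokens, so the loop gives up loudly instead of looping quietly.
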
If you read something here and disagree, email me. The shorter the email, the faster the reply.
The unglamorous half of agentic AI is where the production wins live. Everyone wants to demo the agent. Few want to ship its retry policy.
— Antonie