Meta · 1 min read

Welcome — what this blog is about

Why I'm starting this, what to expect, and the kind of writing I want to do here.

I’ve been meaning to start writing for a while. The trigger is finally having something worth saying — a year of building production LLM systems at Keysight, and a growing collection of opinions that don’t fit in a tweet.

What you’ll find here

Mostly notes on the unglamorous half of agentic AI:

  • Guided generation — Outlines, JSON schemas, grammar-constrained decoding.
  • Validators and self-healing loops — when the model is wrong, what do you do next?
  • Local inference — llama.cpp, quantization, the actual cost of a token.
  • Observability for agents — drift detection, eval design, when retries cost more than they help.

A bit of computer vision and some math when the mood strikes.

How I’ll write

Short. Specific. Code where it helps, prose where it doesn’t.

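from outlines import models, generate
from pydantic import BaseModel

# Schema the model's output must conform to.
class Decision(BaseModel):
    action: str
    confidence: float

# Load a local GGUF model through llama.cpp.
# Note: the models.llamacpp signature differs between Outlines releases.
model = models.llamacpp("qwen2.5-7b-instruct-q4.gguf")

# Constrain decoding so the output always parses as a Decision.
generator = generate.json(model, Decision)
result = generator("Should I retry? Error: rate_limited (3rd time)")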
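Schema-constrained output fixes the shape, not the meaning: with `action: str` and `confidence: float`, any string and any number still pass. A minimal sketch of the validator-plus-retry idea follows, using only the standard library. The allowed action set, the `validate` and `decide` helpers, and the feedback wording are all assumptions made up for illustration; `generate` stands in for any function that maps a prompt to raw text.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Decision:
    action: str
    confidence: float

# Hypothetical action set, chosen for this example only.
ALLOWED_ACTIONS = {"retry", "back_off", "give_up"}

def validate(raw: str) -> Decision:
    """Parse raw model output; raise ValueError with a message the model can act on."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    action = data.get("action")
    confidence = data.get("confidence")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action must be one of {sorted(ALLOWED_ACTIONS)}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return Decision(action=action, confidence=float(confidence))

def decide(generate: Callable[[str], str], prompt: str,
           max_attempts: int = 3) -> Optional[Decision]:
    """Re-ask with the validation error appended until the output validates
    or the attempt budget runs out. Returns None on exhaustion."""
    current = prompt
    for _ in range(max_attempts):
        try:
            return validate(generate(current))
        except ValueError as err:
            current = f"{prompt}\n\nYour previous answer was rejected: {err}. Answer again."
    return None

# Usage with a fake generator: the first reply is malformed, the second is valid.
replies = iter(["retry please", '{"action": "back_off", "confidence": 0.8}'])
print(decide(lambda _prompt: next(replies), "Should I retry? Error: rate_limited (3rd time)"))
# prints: Decision(action='back_off', confidence=0.8)
```

The attempt cap is the part that matters in production: every retry costs tokens and latency, so the loop returns `None` and lets the caller pick a fallback instead of asking forever.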

If you read something here and disagree, email me. The shorter the email, the faster the reply.

The unglamorous half of agentic AI is where the production wins live. Everyone wants to demo the agent. Few want to ship its retry policy.

— Antonie