Agentic AI, local models, test harnesses

I build AI systems that have to work on real machines.

Field notes from Chicago. I care about latency, failure modes, and the small boring pieces that turn agent demos into systems people can actually trust.

About Projects Answerproof multiharness Field Notes Stack Now Uses Archive

$ focus

Answerproof

An AI search visibility audit for B2B SaaS teams that need to know where buyers see competitors instead.

Agent systems that run

Durable tasks, routing, retries, and failure handling. Less demo, more system.

Local AI on real hardware

Mac Studio setups, model choice, latency tradeoffs, and when local is the right move.

Harnesses and evaluation

If you cannot replay or explain the failure, you do not have much of a harness.

multiharness

A local-first agent harness that routes between local and cloud models without hiding the tradeoffs.

multiharness case study

The short public page for why routing is the product and the failure path matters.

Field notes

Short notes on routing, failure visibility, and the parts of the system I actually trust.

Replayable benchmarks

A short note on why benchmark trust starts when the harness can replay the same failure.

The boundary layer is the product

A short note on why useful local AI stacks need routing, verification, and visible boundaries.

The workflow survives model swaps

A short note on why the real benchmark is whether the system still works after the model changes.

Stack

The public version of the setup: multiharness, ds4, and the reasoning behind both.

$ featured writing

Remote Agents Need A Run Trail

Jun 08, 2026

An agent can run somewhere else. The trail still has to be easy to inspect.
Answerproof: AI Search Audits Start With Buyer Questions

May 30, 2026

AI search visibility is not a rank tracker with a new label.
What 12 AI Search Pre-Checks Showed

May 30, 2026

I ran 12 small AI-search pre-checks while building Answerproof.
Replayable Benchmarks

May 22, 2026

A benchmark only starts earning trust when the harness can replay the same failure.
Memory Is Not Operating State

May 22, 2026

Agent memory should help recover context. It should not become the authority for what is true.