Agentic AI, local models, test harnesses

I build AI systems that have to work on real machines.

Field notes from Chicago. I care about latency, failure modes, and the small boring pieces that turn agent demos into systems people can actually trust.

Answerproof

An AI search visibility audit for B2B SaaS teams that need to know where buyers see competitors instead.

Agent systems that run

Durable tasks, routing, retries, and failure handling. Less demo, more system.

Local AI on real hardware

Mac Studio setups, model choice, latency tradeoffs, and when local is the right move.

Harnesses and evaluation

If you cannot replay or explain the failure, you do not have much of a harness.

multiharness

A local-first agent harness that routes between local and cloud models without hiding the tradeoffs.

multiharness case study

The short public page for why routing is the product and the failure path matters.

Field notes

Short notes on routing, failure visibility, and the parts of the system I actually trust.

Replayable benchmarks

A short note on why benchmark trust starts when the harness can replay the same failure.

The boundary layer is the product

A short note on why useful local AI stacks need routing, verification, and visible boundaries.

The workflow survives model swaps

A short note on why the real benchmark is whether the system still works after the model changes.

Stack

The public version of the setup: multiharness, ds4, and the reasoning behind both.