Agent Demos Are Too Generous
Most agent demos get a very generous audience.
People forgive bad latency, weird tool use, fragile state, and unclear failure modes because the demo has a good story.
That is fine for a demo. It is a bad way to judge a system.
The standard I care about is simpler:
- can I rerun it?
- can I inspect it?
- can I explain the failure?
- can I trust the same workflow tomorrow?
If the answer is no, the demo is just a pretty temporary state.
This is also why I keep coming back to local setups and harnesses. They are less impressive on first glance and more useful after the third run.
That is the right tradeoff for me.