#enterprise-rag

1 post

An Agent You Wired Yourself Does Not Come with Its Own Evaluation

A method-first read of the three papers for evaluating custom agents (MAST, AgentRx, DRBench): a vocabulary for failure, a way to pinpoint where it broke, a testbed shaped like your data. How a builder borrows research-scale work. Series Part 3.

Jun 12, 2026

All Posts

Jun 12, 2026 · Agents & Architecture