
Three Altitudes for Evaluating a Well-Built Agent: Score, Claim, Substrate
A method-first read of the three papers that evaluate well-packaged agents (TRACE, DeepHalluBench, TRAIL), organized by altitude: one that rewrites the score, one that verifies claims, and one that changes the substrate. Part 2 of the series.
