
Agents & Architecture
Where Did the Agent Go Wrong: From Answer Accuracy to Process Evaluation
Starting from a "high-score illusion," in which the model with the highest answer accuracy ranks last on utility, this piece explains why agent evaluation is shifting from final answers to the process that produced them, and maps the methods of seven papers. Part 1 of a series.