Testing AI Agents: Deterministic Evaluation in a Non-Deterministic World