AgentClassroom
Results across 0 rollouts
Mean@1 = average score over all tasks. Pass@τ = percentage of rollouts scoring at or above the threshold τ.
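As a rough illustration of the two metrics defined above, here is a minimal sketch. It assumes each rollout yields one numeric score and that one rollout corresponds to one task for Mean@1. The function names `mean_at_1` and `pass_at_tau` are hypothetical and not taken from AgentClassroom itself. Returning `None` for an empty list is also an assumption, chosen so that a view with 0 rollouts has nothing to report rather than dividing by zero.

```python
from statistics import fmean


def mean_at_1(scores):
    """Average score over all tasks (assumes one score per task)."""
    if not scores:
        return None  # assumption: no rollouts means no metric to report
    return fmean(scores)


def pass_at_tau(scores, tau):
    """Percentage of rollouts whose score is at or above the threshold tau."""
    if not scores:
        return None  # assumption: avoid dividing by zero when there are no rollouts
    passed = sum(1 for s in scores if s >= tau)
    return 100.0 * passed / len(scores)


# Hypothetical scores for four rollouts.
example_scores = [0.0, 0.5, 1.0, 1.0]
print(mean_at_1(example_scores))         # 0.625
print(pass_at_tau(example_scores, 0.5))  # 75.0
print(pass_at_tau(example_scores, 1.0))  # 50.0
```

Note that the comparison is inclusive (`>=`), matching "at or above τ", so a rollout scoring exactly τ counts as a pass.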
Inspect any past rollout or run a new one.
Loading saved rollouts… If this stays empty, the backend has no rollouts.