Try:
Judge on these criteria only
Tournament bracket — 16 models, single elimination
Top 10 worst answers — all time
Model analytics
0
average crown score (higher = more corporate)
0 ms
average judge latency
$0.00
average spend per run
Crown Leader
none
waiting for runs
Most Reliable
none
waiting for runs
Fastest
none
waiting for runs
Best Value
none
waiting for runs
Default Bob Pick
none
waiting for runs
Cheap Fallback
none
waiting for runs
Premium Option
none
waiting for runs
Cheapest Reliable
none
waiting for runs
Top Spend Driver
none
waiting for runs
Run More
none
waiting for runs
Rotate Out
none
waiting for runs
Retry / Fallback Recovery
Recent runs
none
most affected provider
Latest Judge Parse Failures
Select a run
Pick a recent run to inspect its prompt, providers, timings, and Bob's verdict.