About
I lead RL environment development at Surge AI, the biggest post-training data and evals provider to frontier labs.
I believe community discourse on model evaluation is stuck in the dark ages. When a new model comes out, the discussion is driven by benchmarks (often contrived, broken, and gamed), hype-froth social media vibes, and leaderboards that are a plague on AI.
My goal with this newsletter is to provide something more useful: data-backed findings that give the qualitative “why” behind the quantitative results, colored by my personal experience as a power user of these tools.
