About

I lead RL environment development at Surge AI, the biggest post-training data and evals provider to frontier labs.

I believe community discourse on model evaluation is stuck in the dark ages. When a new model comes out, the discussion is driven by benchmarks (often contrived, broken, and gamed), hype-froth social media vibes, and leaderboards that are a plague on AI.

My goal with this newsletter is to provide something more useful: data-backed findings that give the qualitative “why” behind the quantitative results, colored by my personal experience as a power user of these tools.

Follow me on Twitter, LinkedIn, and GitHub.

Subscribe to Nick Heiner's Substack

Independent benchmarks, essays on the future of work, and dispatches from someone building AI products and testing AI agents every day at Surge AI.
