Areas of expertise

AI Evals
Human Evals · AI Quality · Model Behavior
AI is unpredictable in a way normal software isn't. The same product can be brilliant one moment and confidently wrong the next, and most teams have no reliable way to tell which they're shipping. A model predicts patterns in data; it doesn't actually understand what your users need. That's the gap I close. I build the testing and measurement layer that tells you whether your AI is genuinely good, not just whether the demo went well. Using real users in real situations, I turn "it usually seems fine" into something you can actually measure, trust, and improve before it reaches the people you're building for.