Composo

Evals for AI agents
About

Composo provides an API for deterministic evaluation and quality scoring of AI agents and their outputs.

We've cracked the code on deterministic AI evaluation. While everyone else is using unreliable LLM-as-judge approaches with 30%+ variance, we've built purpose-built generative reward models that deliver 92% accuracy with <1% variance, so teams can finally trust their evaluation metrics. Plus, you write just one sentence to create any custom criterion, with no complex prompt engineering needed. It's a complete paradigm shift from hoping your AI works to knowing it works.

Deep Enterprise AI Experience: The founders combine strategy/product expertise (Dr. Sebastian Fox - ex-McKinsey/QuantumBlack) with technical engineering depth (Luke Markham - ex-Graphcore ML unicorn), both with 5+ years building large-scale AI systems for Fortune 500 companies.

Proven Track Record: They've worked with leaders at Palantir, Tesla, Accenture, UBS, and other major enterprises, giving them real-world experience shipping production AI applications at scale.

Elite Background: Both founders have Oxford University degrees and experience at top-tier firms, bringing a unique blend of academic rigor and practical enterprise deployment expertise.
