Founder & CEO at
METR, building evaluations so we know if we're getting close to very risky AI. Formerly at DeepMind and OpenAI.
Some research highlights:
- Measuring AI ability to complete long tasks - We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has increased exponentially and consistently over the past six years, with a doubling time of around seven months. Extrapolating this trend predicts that, in under a decade, AI agents will be able to independently complete a large fraction of software tasks that currently take humans days or weeks.
- GPT-5 autonomy evaluation report - We evaluate whether GPT-5 poses significant catastrophic risks via AI self-improvement, rogue replication, or sabotage of AI labs. We conclude that this seems unlikely. However, capability trends continue to advance rapidly, and models display increasing evaluation awareness.
- Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity - We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without: AI makes them slower. We view this result as a snapshot of early-2025 AI capabilities in one relevant setting; as these systems continue to evolve rapidly, we plan to keep using this methodology to help estimate AI acceleration from AI R&D automation.
- Resources for Autonomy Evaluations - a task suite, an evaluation protocol, and estimates of the "elicitation gap"
- Evaluating LLM Agents on Realistic Autonomous Tasks
- Evaluating LLMs trained on code (alignment section)
- Obfuscated arguments problem - a problem with recursive-decomposition-based alignment approaches
- "Imitative generalisation" - an explainer for Paul Christiano's "Learning the Prior"
- Risks from AI persuasion - thoughts on the likelihood and consequences of superhuman persuasion arriving before AGI
- Reflection mechanisms as an alignment target - work by my AI Safety Camp mentees surveying Mechanical Turk workers on their attitudes towards different reflection mechanisms
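The extrapolation in the first highlight above can be sketched numerically. The seven-month doubling time is from that work; the one-hour baseline and the "week-long task = 40 hours" threshold below are illustrative assumptions, not figures from the paper.

```python
import math

def months_to_reach(target_hours: float, current_hours: float = 1.0,
                    doubling_months: float = 7.0) -> float:
    """Months until the trend reaches tasks of `target_hours`, assuming
    exponential growth in task length with the given doubling time.

    The baseline of 1 hour is an illustrative assumption."""
    return doubling_months * math.log2(target_hours / current_hours)

# Under these assumed numbers, week-long (~40-hour) tasks arrive in
# roughly three years, comfortably inside the decade in the summary:
print(round(months_to_reach(40.0), 1))   # ~37.2 months
```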
I sometimes post alignment-related thinking here.
Contact me at: beth dot m dot surname at gmail.com
Follow me on Twitter/X.
If you have any feedback for me, I'd love to hear it. You can submit it anonymously (or pseudonymously) here.