ICML 2025 · April 2025

MIB: A Mechanistic Interpretability Benchmark

Authors: Aaron Mueller*, Atticus Geiger*, +18 others (including me), David Bau, Yonatan Belinkov

TL;DR: This paper mainly combines benchmarks that various people made into one nice package. I helped add a couple of InterpBench models.

I’m not a main contributor, so I’d rather just let you read the paper itself. If you have any questions, you should definitely reach out to Aaron Mueller.