ICML 2025 · April 2025
MIB: A Mechanistic Interpretability Benchmark
TL;DR: This paper mainly combines a bunch of benchmarks that various people made into one nice package. I helped add a couple of InterpBench models.
I’m not a main contributor, so I’d rather just let you read the paper itself. You should definitely reach out to Aaron Mueller if you have any questions.