Neel Nanda (at ICLR)

neelnanda.io

community-curated profile

Mechanistic Interpretability research @DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's do it!

Paper May 2, 2023

Progress measures for grokking via mechanistic interpretability

by Neel Nanda (at ICLR)

Recommended by 1 person

1 mention

Paper

Finding Neurons in a Haystack: Case Studies with Sparse Probing

by Wes Gurnee and Neel Nanda (at ICLR)

Recommended by 1 person

1 mention

Paper Feb 6, 2023

A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations

by Bilal Chughtai and Neel Nanda (at ICLR)
