Anthropic

community-curated profile

We're an AI research company that builds reliable, interpretable, and steerable AI systems. Our first product is Claude, an AI assistant for tasks at any scale.

Article Mar 8, 2023

Core Views on AI Safety: When, Why, What, and How

by Anthropic

Tweet Oct 5, 2023

The fact that most individual neurons are uninterpretable presents a serious roadblock to a mechanistic understanding of language models. We demonstrate a method for decomposing groups of neurons into interpretable features with the potential to move

by Anthropic

Tweet May 4, 2023

In this short note, we explore thinking of the traditional idea of "distributed representations" as two distinct phenomena: "composition" and "superposition". We walk through toy examples from Thorpe (1989), discussing them from this lens. twitter.co

by Anthropic
