Greg Morris

AI Is a Black Box. Anthropic Figured Out a Way to Look Inside

Interpreting neural nets by that principle involves a technique called dictionary learning, which allows you to associate a combination of neurons that, when fired in unison, evoke a specific concept, referred to as a feature.

Amazing research going into effectively MRI scanning an LLM ‘brain’