Interpreting neural nets by that principle involves a technique called dictionary learning, which lets you identify combinations of neurons that, when they fire in unison, evoke a specific concept. Each such combination is referred to as a feature.
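To make the idea concrete, here is a minimal sketch of dictionary learning in NumPy. Everything in it is an assumption made for illustration: the "activations" are synthetic rather than taken from a real model, the names (`sparse_codes`, `learn_dictionary`, `l1`) are hypothetical, and the simple alternating scheme (ISTA for the sparse codes, least squares for the dictionary) is just one textbook approach. The research mentioned here may well use a different method, such as sparse autoencoders trained on real model activations.

```python
import numpy as np

def soft_threshold(x, t):
    # Shrink values toward zero; this is what makes the codes sparse.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_codes(X, D, l1=0.1, n_steps=50):
    # ISTA: approximately minimise 0.5*||X - C @ D||^2 + l1*||C||_1 over C.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    C = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_steps):
        grad = (C @ D - X) @ D.T
        C = soft_threshold(C - step * grad, step * l1)
    return C

def learn_dictionary(X, n_features, l1=0.1, n_iters=30, seed=0):
    # Alternate between sparse coding and a least-squares dictionary update.
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(n_features, X.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    for _ in range(n_iters):
        C = sparse_codes(X, D, l1)
        D = np.linalg.lstsq(C, X, rcond=None)[0]
        # Rescale each feature direction to unit length (unused ones stay zero).
        D /= np.maximum(np.linalg.norm(D, axis=1, keepdims=True), 1e-12)
    # Recompute the codes so they match the final dictionary.
    return D, sparse_codes(X, D, l1)

# Synthetic "neuron activations": 200 samples over 16 neurons, each sample
# built from 2 of 8 hidden feature directions (an assumption for the demo).
rng = np.random.default_rng(42)
true_features = rng.normal(size=(8, 16))
true_features /= np.linalg.norm(true_features, axis=1, keepdims=True)
true_codes = np.zeros((200, 8))
for row in true_codes:
    active = rng.choice(8, size=2, replace=False)
    row[active] = rng.uniform(0.5, 1.5, size=2)
X = true_codes @ true_features

D, C = learn_dictionary(X, n_features=8)
print("relative reconstruction error:",
      np.linalg.norm(X - C @ D) / np.linalg.norm(X))
```

Each row of `D` is one learned feature, expressed as a weighted combination of the 16 neurons, and each row of `C` says how strongly each feature is active for that sample. The sparsity penalty `l1` encourages only a few features to be active at a time, which is what makes them easier to read as individual concepts.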
Amazing research going into effectively MRI scanning an LLM ‘brain’