A Collection of Empirical Frames about Language Models

post by Daniel Tan (dtch1997) · 2025-01-02T02:49:05.965Z

Contents

  Representational Frames
  Functional Frames
  Changelog

What's the sum total of everything we know about language models? At the object level, probably way too much for any one person (not named Gwern) to understand. 

However, it might be possible to abstract most of our knowledge into pithily worded frames (i.e. intuitions, ideas, theories) that are much more tractable to grok. And once we have all this information neatly written down in one place, unexpected connections may start to pop up. 

This post contains a collection of frames about models that (i) are empirically justified and (ii) seem to tell us something useful. (They are highly filtered by my experience and taste.) For each frame, I've distilled the key idea down to 1-2 sentences and provided a link to the original source. I've also included open questions for which I am not aware of conclusive evidence. 

I'm hoping that by doing this, I'll make some sort of progress towards "prosaic interpretability" (final name pending). Even if I don't, having an encyclopedia like this seems useful in its own right. 

I'll broadly split these into representational frames and functional frames. Representational frames look 'inside' the model, at its subcomponents, in order to make claims about what the model is doing. Functional frames look 'outside' the model, at its relationships with other entities (e.g. the data distribution or the learning objective) in order to make claims about the model. 
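
To make the distinction concrete, here's a minimal sketch (assuming GPT-2 loaded via Hugging Face transformers; the model, prompt, and layer index are arbitrary placeholders, not examples from the post): a representational analysis inspects the model's internal activations, while a functional analysis only looks at its input-output behaviour.

```python
# Minimal sketch of representational vs. functional frames.
# Assumes GPT-2 via Hugging Face transformers; model, prompt, and layer index are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

# Representational frame: look 'inside' at intermediate activations (the residual stream).
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
hidden_states = outputs.hidden_states  # tuple of (n_layers + 1) tensors, each [batch, seq, d_model]
print("Layer 6 activation norm:", hidden_states[6].norm().item())

# Functional frame: look 'outside' at input-output behaviour on a chosen prompt distribution.
with torch.no_grad():
    completion = model.generate(**inputs, max_new_tokens=3, do_sample=False)
print("Greedy completion:", tokenizer.decode(completion[0], skip_special_tokens=True))
```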

---

This is intended to be a living document; I will update this in the future as I gather more frames. I strongly welcome all suggestions that could expand the list here! 

Representational Frames

---

Functional Frames


---

Changelog

2 Jan: Initial post
