comment by Amalthea (nikolas-kuhn) · 2023-11-14T18:47:01.992Z
> And we have a good idea of what signals we care about.
Seems dubious. Or, understood narrowly, it is an irrelevant tautology: the real question is which signals are important (i.e., what we should care about), and it is unclear whether we know that.
At least it'd be good to give further evidence (sorry if that is elsewhere and I missed it).
Replies from: D0TheMath
↑ comment by Garrett Baker (D0TheMath) · 2023-11-15T01:24:22.016Z
I'm not confident enough to claim the statement is either likely wrong or a tautology, but I also don't know in what sense Nina thinks we have a good idea of what signals we care about, and would ask for more clarification on this point.
comment by Charlie Steiner · 2023-11-14T22:01:22.112Z
Thanks!
I think one lesson of superposition research is that neural nets are themselves the compressed version: the world is really complicated, and NNs that try to model it are incentivized to squeeze in as much as they can.
I guess I also wrote a hot take about this.
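To make the superposition point concrete, here is a minimal sketch (my own illustration, not from the comment or the post; all sizes and names are assumptions): random unit directions in a d-dimensional space are nearly orthogonal, so a network can store many more sparse features than it has dimensions and still read them back with little interference.

```python
import numpy as np

# Minimal sketch of superposition-as-compression: store more sparse
# features than dimensions by giving each feature a random direction,
# then read them back by projection.
rng = np.random.default_rng(0)
n_features, d_model, k_active = 300, 100, 5

# Random unit directions in high dimensions are nearly orthogonal.
W = rng.normal(size=(n_features, d_model))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# A sparse world state: only k_active of the 300 features are "on".
x = np.zeros(n_features)
active = rng.choice(n_features, size=k_active, replace=False)
x[active] = 1.0

h = x @ W        # compress: 300 features squeezed into 100 dimensions
x_hat = h @ W.T  # decode by projecting back onto each feature direction

# The top readback scores usually pick out the active features; the
# cross-terms between near-orthogonal directions show up as small noise.
print(sorted(np.argsort(x_hat)[-k_active:].tolist()) == sorted(active.tolist()))
```

The incentive Charlie describes is exactly this: when features are sparse, the interference cost of packing in extra directions is low, so a capacity-limited net is pushed to over-pack.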
Replies from: firstuser-here
↑ comment by 1stuserhere (firstuser-here) · 2023-11-15T12:42:59.918Z
It's also worth noting that LLMs do not learn directly from the raw input stream but from a compressed version of that data: LLMs are fed tokenized text, and the tokenizer acts as a compressor. This benefits the models by giving them a more information-rich context.
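A quick way to check the "tokenizer as compressor" point, as a sketch assuming the tiktoken package (a real BPE tokenizer library; the example text is mine):

```python
# Assumes the tiktoken package (pip install tiktoken); cl100k_base is the
# BPE encoding used by GPT-4-era OpenAI models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenizers map frequent character sequences to single integer ids."
tokens = enc.encode(text)

# Each token typically stands in for several UTF-8 bytes, so a fixed-size
# context window spans far more raw text than it would byte-by-byte.
print(len(text.encode("utf-8")), "bytes ->", len(tokens), "tokens")
```

On ordinary English text this encoding averages roughly four bytes per token, which is the sense in which the same context window becomes more information-rich than a byte-level one.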
Replies from: fergus-fettes
↑ comment by Fergus Fettes (fergus-fettes) · 2023-11-27T17:10:26.646Z
Would you say that tokenization is part of the architecture?
And, in your wildest moments, would you say that language is also part of the architecture? :) I mean, the latent space is probably mapping either (a) brain states or (b) world states, right? Is everything between latent spaces architecture?