LessWrong 2.0 Reader
Thanks, this is very interesting.
I wonder if this approach is extendable to learning to predict the next word from a corpus of texts...
The first layer might perhaps still be embedding from words to vectors, but what should one do then? What would be a possible minimum viable dataset?
Perhaps, in the spirit of the paper's PoC, one might consider binary sequences of 0s and 1s, with only two words, 0 and 1, and ask what it would take to have a good predictor of the next 0 or 1 given a long sequence of those as context. This might be a good starting point, and then one might consider different instances of that problem (different sets of sequences of 0s and 1s to learn from).
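A minimal sketch of what such a 0/1 proof-of-concept could look like, using nothing but n-gram counting as the predictor; the toy sequence, the context length k, and the function names here are purely illustrative, not anything from the paper:

```python
from collections import Counter, defaultdict
import random

def train_ngram(bits, k=4):
    """Count how often each k-bit context is followed by a 0 or a 1."""
    counts = defaultdict(Counter)
    for i in range(len(bits) - k):
        counts[tuple(bits[i:i + k])][bits[i + k]] += 1
    return counts

def predict_next(counts, context):
    """Most likely next bit for a k-bit context (default to 0 if unseen)."""
    c = counts.get(tuple(context))
    return c.most_common(1)[0][0] if c else 0

# Toy "corpus": a period-6 pattern (000111...) with 5% random bit flips.
random.seed(0)
bits = [(i // 3) % 2 if random.random() > 0.05 else random.randint(0, 1)
        for i in range(10_000)]

k = 4
counts = train_ngram(bits[:8_000], k)
test = bits[8_000:]
hits = sum(predict_next(counts, test[i:i + k]) == test[i + k]
           for i in range(len(test) - k))
print(f"held-out next-bit accuracy: {hits / (len(test) - k):.3f}")
```

An embedding layer plus a small model would be the natural next step up from this counter, but even a baseline like this makes concrete what "a good predictor of the next 0 or 1" has to beat.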
abramdemski on Modern Transformers are AGI, and Human-Level
Yeah, I didn't do a very good job in this respect. I am not intending to talk about a transformer by itself. I am intending to talk about transformers with the sorts of bells and whistles that they are currently being wrapped with. So not just transformers, but also not some totally speculative wrapper.
lblack on How to safely use an optimizer
First thought: The oracle is going to choose to systematically answer or not answer the queries we give it. This represents a causal channel of one bit per query it can use to influence the outside world[1]. Can you conquer the world in one awkwardly delivered kilobyte or less? Maybe.
Maybe we can stop that by scrapping every Oracle that doesn't answer and training a new one with presumably new goals? Or would the newly trained Oracles just cooperate with the former dead ones in one long-term plan to break out, take control, and reward all the dead Oracles created on the way with utility?
Second thought: What kind of optimisation problems can we specify well enough for a formal proof checker to tell whether they've been satisficed? Are they the kind of problems where solving them can save the world?
It feels to me like the answer is 'yes'. A lot of the core research that would allow e.g. for brain augmentation seems like it'd be in that category. But my inner John Wentworth sim is looking kind of sceptical.
It also gets to choose the timing of its answer, but I assume we are not being idiots about that and are setting the output channel to always deliver results after a set time t, no more and no less.
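For a sense of scale on that one-bit-per-query channel, a back-of-the-envelope sketch; the only inputs are the figures in the comment above, and the 1024 timing slots are an invented illustration of why fixing the delivery time matters:

```python
import math

# Assumption from the comment: with delivery time fixed at t, the only
# remaining side channel is answer vs. no answer, i.e. at most 1 bit/query.
bits_per_query = 1
kilobyte_bits = 1024 * 8

print(kilobyte_bits // bits_per_query)   # 8192 queries to leak one kilobyte

# If the oracle could instead also pick its answer time from 1024 discrete
# slots, each query would leak up to 1 + log2(1024) = 11 bits, ~11x faster.
timing_slots = 1024
print(1 + math.log2(timing_slots))       # 11.0
```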
I was ultimately disappointed by it - somewhat like Umineko, there is a severe divergence from reader expectations. Alexander Wales's goal for it, however well he achieved it by his own lights, was not one that is of interest to me as a reader, and it wound up being less than the sum of its parts for me. So I would have enjoyed it better if I had known from the start to read it for its parts (eg. revision mages or 'unicorns' or 'Doris Finch').
saidachmiz on My Interview With Cade Metz on His Reporting About Slate Star Codex
I agree that this investigation was worthwhile and important.
But is it a case of “lying to interview subjects”? That is what we’re talking about [LW(p) · GW(p)], after all. Did Bly even interview anyone, in the course of her investigation?
Undercover investigative journalism has some interesting ethical conundrums of its own, but it’s not clear what it has to do with interviews, or lying to the subjects thereof…
cheops-steller on Inexistence of Rational Disagreement when Information can be Freely Exchanged
I agree. My main point is not that we are rational yet still disagree, but that even as we strive to be rational in the future, we can still disagree due to imperfections in language. Perfect communication doesn't entail complete revelation of brain states: even with perfect communication, humans can still be selective about what to communicate, so self-interest wouldn't be a major problem.
zane on [SP] The Edge of Morality
...why did someone promote this to a Frontpage post?
abramdemski on Modern Transformers are AGI, and Human-Level
And you end up with "well for most of human history, a human with those disabilities would be a net drain on their tribe. Sometimes they were abandoned to die as a consequence."
And it implies something like "can perform robot manipulation and wash dishes", or the "make a cup of coffee in a stranger's house" test. And reliably enough to be paid minimum wage, or at least some money under the table, to do a task like this.
The replace-human-labor test gets quite interesting and complex when we start to time-index it. Specifically, two time-indexes are needed: a 'baseline' time (when humans are doing all the relevant work) and a comparison time (where we check how much of the baseline economy has been automated).
Without looking anything up, I guess we could say that machines have already automated 90% of the economy, if we choose our baseline from somewhere before industrial farming equipment, and our comparison time somewhere after. But this is obviously not AGI.
A human who can do exactly what GPT4 can do is not economically viable in 2024, but might have been economically viable in 2020.
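One way to make the two time-indexes concrete is sketched below; the baseline year, task list, and labor shares are invented purely for illustration:

```python
baseline_year, comparison_year = 1900, 2024

# Hypothetical shares of baseline-year labor, and whether machines can do
# each task by the comparison year. All numbers are made up.
baseline_tasks = {
    "field farming":       (0.40, True),
    "textile production":  (0.15, True),
    "freight and haulage": (0.10, True),
    "clerical work":       (0.10, True),
    "domestic service":    (0.15, False),
    "care and teaching":   (0.10, False),
}

automated = sum(share for share, done in baseline_tasks.values() if done)
print(f"{automated:.0%} of the {baseline_year} economy automated by {comparison_year}")
# The result is entirely a function of which baseline year you pick --
# which is the point: a high percentage here does not by itself imply AGI.
```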
gerald-monroe on Modern Transformers are AGI, and Human-Level
You also have a simple algorithm problem. Humans learn by replacing bad policy with good. Aka, a baby replaces "policy that drops objects picked up" -> "policy that usually results in object retention".
This is because, at a mechanistic level, the baby tries many times to pick up and retain objects, and in a fixed amount of circuitry in their brain, the connections that resulted in a drop get down-weighted and the ones that resulted in retention get reinforced.
This means that over time, as the baby learns, the compute cost for motor manipulation remains constant. Technically O(1), though that's a bit of a confusing way to express it.
With in-context-window learning, you can imagine an LLM + robot recording:
Robotic token string: <string of robotic policy tokens 1> : outcome, drop
Robotic token string: <string of robotic policy tokens 2> : outcome, retain
Robotic token string: <string of robotic policy tokens 2> : outcome, drop
And so on, extending and consuming all of the machine's context window, and every time the machine decides which tokens to use next, it needs O(n log n) compute to consider all the tokens in the window. (It used to be n^2, so this is a huge advance.)
This does not scale. You will not get capable or dangerous AI this way. Obviously you need to compress that linear list of outcomes from different strategies to update the underlying network that generated them so it is more likely to output tokens that result in success.
Same for any other task you want the model to do. In-context learning scales poorly. This also makes it safe...
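A rough sketch of that scaling claim, assuming attention over n context tokens costs about n log n and that each recorded trial appends a fixed number of tokens; both numbers are arbitrary:

```python
import math

TOKENS_PER_TRIAL = 500  # arbitrary: tokens one recorded trial adds to the context

def in_context_cost(trials):
    """Total attention compute if every trial stays in the context window:
    each new decision attends over everything recorded so far (~n log n)."""
    total, n = 0, 0
    for _ in range(trials):
        n += TOKENS_PER_TRIAL
        total += n * math.log2(n)
    return total

def weight_update_cost(trials):
    """Total compute if each trial is instead compressed into the weights:
    roughly constant work per trial, like the baby's fixed circuitry."""
    return trials * TOKENS_PER_TRIAL * math.log2(TOKENS_PER_TRIAL)

for trials in (10, 100, 1_000):
    ratio = in_context_cost(trials) / weight_update_cost(trials)
    print(f"{trials:5d} trials: in-context costs {ratio:6.1f}x the weight-update approach")
```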
abramdemski on Modern Transformers are AGI, and Human-Level
I don't think it is sensible to model humans as "just the equivalent of a sort of huge context window", because this is not a particularly good computational model of how human learning and memory work; but I do think that the technology behind the increasing context size of modern AIs contributes to them having a small but nonzero amount of the thing Steven is pointing at, due to the spontaneous emergence of learning algorithms. [LW · GW]