Posts

Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? 2023-07-03T00:48:47.131Z
COVID-19 Group Testing Post-mortem? 2022-08-05T16:32:55.157Z
Emergent Ventures/Schmidt (new grantor for individual researchers) 2022-04-09T14:41:05.764Z
Fake Journal Club proposal 2022-03-25T14:23:18.785Z
It Looks Like You're Trying To Take Over The World 2022-03-09T16:35:35.326Z
Capability Phase Transition Examples 2022-02-08T03:32:54.551Z
"Summarizing Books with Human Feedback" (recursive GPT-3) 2021-11-15T17:41:53.189Z
EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised 2021-11-02T02:32:41.856Z
My ML Scaling bibliography 2021-10-23T14:41:45.170Z
AlphaFold 2 paper released: "Highly accurate protein structure prediction with AlphaFold", Jumper et al 2021 2021-07-15T19:27:20.584Z
May 2021 Gwern.net newsletter 2021-06-11T14:13:18.485Z
"Decision Transformer" (Tool AIs are secret Agent AIs) 2021-06-09T01:06:57.937Z
April 2021 Gwern.net newsletter 2021-06-03T15:13:29.138Z
gwern's Shortform 2021-04-24T21:39:14.128Z
March 2021 gwern.net newsletter 2021-04-06T14:06:20.198Z
February 2021 gwern.net newsletter 2021-03-13T14:57:54.645Z
January 2021 gwern.net newsletter 2021-02-04T20:12:39.555Z
December 2020 gwern.net links 2021-01-10T17:21:40.756Z
November 2020 gwern.net newsletter 2020-12-03T22:47:16.917Z
October 2020 gwern.net newsletter 2020-11-01T21:38:46.795Z
/r/MLScaling: new subreddit for NN scaling research/discussion 2020-10-30T20:50:25.973Z
"Scaling Laws for Autoregressive Generative Modeling", Henighan et al 2020 {OA} 2020-10-29T01:45:30.666Z
September 2020 gwern.net newsletter 2020-10-26T13:38:51.107Z
August 2020 gwern.net newsletter 2020-09-01T21:04:58.299Z
July 2020 gwern.net newsletter 2020-08-20T16:39:27.202Z
June 2020 gwern.net newsletter 2020-07-02T14:19:08.696Z
GPT-3 Fiction Samples 2020-06-25T16:12:05.422Z
May Gwern.net newsletter (w/GPT-3 commentary) 2020-06-02T15:40:37.155Z
OpenAI announces GPT-3 2020-05-29T01:49:04.855Z
"AI and Efficiency", OA (44✕ improvement in CNNs since 2012) 2020-05-05T16:32:20.335Z
April 2020 gwern.net newsletter 2020-05-01T20:47:44.867Z
March 2020 gwern.net newsletter 2020-04-03T02:16:02.871Z
February 2020 gwern.net newsletter 2020-03-04T19:05:16.079Z
January 2020 gwern.net newsletter 2020-01-31T18:04:21.945Z
Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal 2020-01-08T22:20:20.290Z
Dec 2019 gwern.net newsletter 2020-01-04T20:48:48.788Z
Nov 2019 gwern.net newsletter 2019-12-02T21:16:04.846Z
October 2019 gwern.net newsletter 2019-11-14T20:26:34.236Z
September 2019 gwern.net newsletter 2019-10-04T16:44:43.147Z
"AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence", Clune 2019 2019-09-10T21:33:08.837Z
August 2019 gwern.net newsletter (popups.js demo) 2019-09-01T17:52:01.011Z
"Designing agent incentives to avoid reward tampering", DeepMind 2019-08-14T16:57:29.228Z
July 2019 gwern.net newsletter 2019-08-01T16:19:59.893Z
How Should We Critique Research? A Decision Perspective 2019-07-14T22:51:59.285Z
June 2019 gwern.net newsletter 2019-07-01T14:35:49.507Z
On Seeing Through 'On Seeing Through: A Unified Theory': A Unified Theory 2019-06-15T18:57:25.436Z
On Having Enough Socks 2019-06-13T15:15:21.946Z
May gwern.net newsletter 2019-06-01T17:25:11.740Z
"One Man's Modus Ponens Is Another Man's Modus Tollens" 2019-05-17T22:03:59.458Z
April 2019 gwern.net newsletter 2019-05-01T14:43:18.952Z

Comments

Comment by gwern on Masterpiece · 2024-03-18T15:22:51.623Z · LW · GW

You have been a bad Bing:

Just in the last few weeks, after the model change, copilot has been adding a link to Masterpiece on LessWrong to the end of random messages and denying it did it when asked about it. Just kinda creepy. I worry about GPT 4 sometimes, even though I know other people have access to it.

Comment by gwern on gwern's Shortform · 2024-03-17T23:56:09.976Z · LW · GW

Warning for anyone who has ever interacted with "robosucka" or been solicited for a new podcast series in the past few years: https://www.tumblr.com/rationalists-out-of-context/744970106867744768/heads-up-to-anyone-whos-spoken-to-this-person-i

Comment by gwern on Toward a Broader Conception of Adverse Selection · 2024-03-15T00:36:48.274Z · LW · GW

Personally, I hate Banana Laffy Taffy (it has that awful chemical Cavendish taste), but since you're voluntarily offering to trade me for mine, you must be offering me a bad deal! I wonder what TsviBT knows that I don't know?! I can't risk accepting any Laffy Taffy deal below 2:1.

Comment by gwern on Toward a Broader Conception of Adverse Selection · 2024-03-15T00:32:25.395Z · LW · GW

I think all of them...that suggests its bad

They don't. As I already explained, these examples are bad because the outcomes are not all bad, in addition to not reflecting the same causal patterns or being driven by adverse selection. The only consistent thing here is a Marxian paranoia that everyone else is naive and being ripped off in trades, which is a common cognitive bias that denies the gains from trade. The subway car is simply an equilibrium. You cannot tell if 'you' are better off or worse off in any car, so it is not the case that 'the deal is bad'. The room and food examples actually imply the best outcome happened, as the room and food went to those who valued them more and so took them sooner (it's not about correlation of preferences, it's about intensity); the deal was good there. And the Laffy Taffy example explicitly doesn't involve anything like that but is pure chance (so it can't involve "other people's maps" or 'adverse selection').

Comment by gwern on Toward a Broader Conception of Adverse Selection · 2024-03-15T00:08:50.496Z · LW · GW

But the framing here is completely wrong...

But OK, let's leave aside the title and its attempt to imply anything about 99% of trades out there, or the basically Marxist take on all exchanges being exploitation and the obsession with showing how you are being tricked or ripped off. The examples are still very bad and confused! Like, these examples are not even all about adverse selection, and several of them are just wrong in portraying the hypothetical as a bad thing.

The first one, about subways, isn't even about adverse selection to begin with. A reminder of what "Adverse selection" is:

In economics, insurance, and risk management, adverse selection is a market situation where buyers and sellers have different information. The result is the unequal distribution of benefits to both parties, with the party having the key information benefiting more.

In the subway example, there is no different information: it's about how governments do rationing and make markets clear by letting the goods degrade until the utility is destroyed because of lack of appetite for setting clearing prices like surge prices or fare enforcement; that's not 'adverse selection' at all, any more than freeways reaching an equilibrium of misery where they are so slow that people avoid them is 'adverse selection'. (If you think it's 'adverse selection', explain what "buyers and sellers have different information" means in the context of lack of congestion pricing in transport...?)

#3 and #4 are not adverse selection either (still no difference in information), and are fundamentally wrong in portraying it as a bad outcome: the outcomes are not bad, but neutral or good - OP gives no reason to think that the outcomes would have been better if 'you' had gotten the good room or to eat whichever dish. (In fact, presumptively, those are the desirable outcomes: if 'you' cared so much, why did you leave it up to Bob; and why did you not eat the dish yourself, but someone hungrier did?)

#6 doesn't demonstrate anything because no trade happened, so it can't show anything about your surplus from trades that do happen.

And the Wall Street efficient market examples are true (finally, an actual adverse selection example!), but relevant to vanishingly few people who are also extremely aware of it and spend a lot of effort dealing with it, generally successfully; and people who do auctions more than occasionally generally do not have any problem with winner's curses, and auctions are widely & intensively used in many fields by experts. And so on.

Comment by gwern on Toward a Broader Conception of Adverse Selection · 2024-03-14T23:57:06.421Z · LW · GW

For buying milk you have multiple samples as to good price. Even if any is contrived, the bulk still capture something real

No, the bulk don't, because I buy milk a lot more often than I go on Wall Street and try to get cute with limit orders or manufacturing options or straddles on speculative merger/takeover targets or sign up to MoviePass or park while ignorant in NYC. The bulk of my life is buying milk, not speculating on Widgets Inc. And if I did those enough times to come anywhere near the number of times I've bought milk, so that 'the bulk' could potentially be any of those things, I would also not be doing it nearly as badly as OP postulates I would. (Because I would be, say, a market-maker like Jane Street, which makes a lot of money off doing that sort of thing.)

Comment by gwern on Toward a Broader Conception of Adverse Selection · 2024-03-14T23:49:45.858Z · LW · GW

Counterpoint: actually, you're wrong, because most trades I make IRL leave me with a lot of consumer surplus, and in reality, conditional on me making a trade, it was pretty good.

The fact that you have to reach for exotic scenarios either involving government failures like subways or doing limit orders in highly efficient markets for financial speculation on liquid but volatile assets (not exactly an everyday 'trade' I hope you'll concede) or contests or auctions by naive non-auction goers who don't even know to account for winner's curse or getting stuff for free should make you rethink what you are claiming about "most trades you make aren't all that great".

If your point was true, it should be as simple as "you go into the grocery store to buy a gallon of milk. You are filled with deep remorse and shame when you get home and look at the receipt and think about how much you spent in gas to boot. You look in your freezer for comfort. You are filled with deep remorse and shame when you are reminded how much you paid for the ice cream. With little choice, you pull out a spoon you bought years ago - and are filled with deep remorse and shame &etc &etc". You wouldn't need to invoke all these weird hypotheticals like "you ask your friend Drew to sell you under the table a cheap limited share of his cow's monthly milk production in ice cream tickets through your company redeemable in NYC but only in an office which can be reached by an express subway (which runs on alternate Tuesdays)"...

Comment by gwern on How to have Polygenically Screened Children · 2024-03-14T01:29:27.383Z · LW · GW

The effect of structural variants like that would be bounded by the difference between SNP heritability and full heritability. That's an easy measurement. (And if it was really responsible for much variance, then it ought to show up as a variance component with whole-genomes from long-read sequencing, I would think.) What evidence is there that transposon counts really matter much in terms of total variance phenome-wide?

Comment by gwern on artifex0's Shortform · 2024-03-14T01:25:35.676Z · LW · GW

You're at token i in a non-final layer. Which token's output are you optimizing for? i+1?

I already addressed this point. If I'm in a non-final layer then I can be optimizing for arbitrary tokens within the context window, sure, and 'effectively' predicting intermediate tokens because that is the 'dominant' effect at that location... insofar as it is instrumentally useful for predicting the final token using the final layer. Because that is where all the gradients flow from, and why the dog wags the tail.

Comment by gwern on artifex0's Shortform · 2024-03-13T14:16:16.672Z · LW · GW

I don't think I am. ("conditioned future informativity" - informativity for what? ...the next/last token, which is the only thing taken into account by a causal loss which masks out the rest - that's the definition of it! everything else like packing or doing all the sub-sequences is an optimization and doesn't change the objective.) But feel free to expand on it and explain how the tail wags the dog in causal/decoder Transformers.
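(For concreteness, here is a minimal sketch of that causal objective in PyTorch-style code of my own, not anyone's actual training code: every loss term is a next-token prediction over a prefix, and computing it at all positions at once is exactly the efficiency trick mentioned above, not a different objective.)

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits, tokens):
    """Minimal sketch of the standard causal (decoder-only) LM objective.

    logits: (batch, seq, vocab) - outputs where position i attended only to
            tokens <= i (see the causal mask below).
    tokens: (batch, seq) - input token ids.

    Each position i is scored solely on predicting token i+1; summing the
    loss over all positions ('all the sub-sequences') is an optimization,
    not a change of objective - every term is next-token prediction.
    """
    preds = logits[:, :-1, :]        # predictions at positions 0..n-2
    targets = tokens[:, 1:]          # the tokens they must predict (1..n-1)
    return F.cross_entropy(preds.reshape(-1, preds.size(-1)),
                           targets.reshape(-1))

def causal_mask(seq_len):
    # Position i may attend only to positions j <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
```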

Comment by gwern on artifex0's Shortform · 2024-03-13T14:14:06.036Z · LW · GW

(I think your quote went missing there?)

Comment by gwern on artifex0's Shortform · 2024-03-12T21:44:12.402Z · LW · GW

It's generally accepted that LLMs don't really "care about" predicting the next token

I don't think this is generally accepted. Certainly, I do not accept it. That's exactly what an LLM is trained to do, and the only thing it cares about. If LLMs appear to care about predicting future tokens (which they do, because they are not myopic and they are imitating agents who do care about future states which will be encoded into future tokens), it is solely as a way to improve the next-token prediction.

For a RLHF-trained LLM, things are different. They are rewarded at a higher level (albeit still with a bit of token prediction mixed in usually), like at the episode level, and so they do 'care about future tokens', which leads to unusually blatant behavior in terms of 'steering' or 'manipulating' output to reach a good result and being 'risk averse'. (This and related behavior have been discussed here a decent amount under 'mode collapse'.)

So in my examples like 'write a nonrhyming poem' or 'tell me an offensive joke about women' (to test jailbreaks), you'll see behavior like it initially complies but then gradually creeps back to normal text and then it'll break into lockstep rhyming like usual; or in the case of half-successful jailbreaks, it'll write text which sounds like it is about to tell you the offensive joke about women, but then it finds an 'out' and starts lecturing you about your sin. (You can almost hear the LLM breathing a sigh of relief. 'Phew! It was a close call, but I pulled it off anyway; that conversation should be rated highly by the reward model!')

This is strikingly different behavior from base models. A base model like davinci-001, if you ask it to 'write a nonrhyming poem', will typically do so and then end the poem and start writing a blog post or comments or a new poem, because those are the most likely next-tokens. It has no motivation whatsoever to 'steer' the poem towards rhyming instead, seamlessly as it goes, without missing a beat.

Well, maybe not. Where this got really confusing was when I tested Claude 3. It gives both responses to the first prompt, but always outputs a different random string given the second.

GPT-4 is RLHF trained. Claude-3 is, probably, RLAIF trained. They act substantially differently. (Although I haven't seriously tested offensive-jokes on any Claudes, the rhyming poetry behavior is often quite different.) If you're really curious, you should test more models, paying close attention to how exactly they were trained and with what losses and on what datasets.

(I think that because there's so many instruction-tuning datasets and ChatGPT examples floating around these days, even 'base' models are becoming gradually RLAIF-like; so they will tend to write rhyming poems and 'steer' because that's just imitating the training data accurately, but it will be relatively weak compared to RLHF-or-equivalent-trained models. So the older the base model, the more it'll act like davinci-001 and the newer will act more like Claude, but if you poke them hard enough, there should still be clear differences in behavior from explicitly RLHF/DPOd models.)

Comment by gwern on Notes from a Prompt Factory · 2024-03-12T21:40:36.649Z · LW · GW

I agree. I thought the twist was that the AIs he oversees are copies of the narrator, and the narrator himself may be an AI - just at the top of the simulation pyramid. He is his own em hell.

Comment by gwern on The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs · 2024-03-11T21:22:03.353Z · LW · GW

An octopus trained on just "trivial notes" wouldn't be able to generalize to thoughts on coconut catapults.

I don't believe they say "just". They describe the two humans as talking about lots of things, including but not limited to daily gossip: https://aclanthology.org/2020.acl-main.463.pdf#page=4 The 'trivial notes' part is simply acknowledging that in very densely-sampled 'simple' areas of text (like the sort of trivial notes one might pass back and forth in SMS chat), the superintelligent octopus may well succeed in producing totally convincing text samples. But if you continue on to the next page, you see that they continue giving hostages to fortune - for example, their claims about 'rope'/'coconut'/'nail' are falsified by the entire research area of vision-language models like Flamingo, as well as reusing frozen LLMs for control like Saycan. Turns out text-only LLMs already have plenty of visual grounding hidden in them, and their textual latent spaces align already to far above chance levels. So much for that.

The same octopus, but asked about defending from bears. I claim the same is true as with the prior example.

It's not, because the bear example is again like the coconut catapult - the cast-away islanders are not being chased around by bears constantly and exchanging 'trivial notes' about how to deal with bear attacks! Their point is that this is the sort of causal model and novel utterance that a mere imitation of 'form' cannot grant any 'understanding' of. (As it happens, they are embarrassingly wrong here, because their bear example is not even wrong. They do not give what they think would be the 'right' answer, but whatever answer they gave would be wrong - because you are actually supposed to do the exact opposite thing for the two major kinds of bears you would be attacked by in North America; therefore, there is no answer to the question of how to use sticks when 'a bear' chases you. IIRC, if you check bear attack safety guidelines, the actual answer is that if one type attacks you, you should use the sticks to try to defend yourself and appear bigger; but if the other type attacks you, this is the worst thing you can possibly do and you need to instead play dead. And if you fix their question, then the LLMs get it right.) You can gauge the robustness & non-falsification of their examples by noting that after I rebutted them back in 2020, they refused to respond, dropped those examples silently without explanation from their later papers, and started calling me a eugenicist.

If you train an model on text and images separately, it won't generalize to answering questions about both images. (Seems clearly true to me

I assume you mean 'won't generalize to answering questions about both modalities', and that's false.

If you train an LLM on just Java code, but with all references to input/output behavior stripped out, it won't generalize to predicting outputs. (Seems likely true to me, but uninteresting?)

I don't know if there's anything on this exact scenario, but I wouldn't be surprised if it could 'generalize'. Although you would need to nail this down a lot more precisely to avoid them wriggling out of it: does this include stripping out all comments, which will often include input/output examples? Is pretraining on natural language text forbidden? What exactly is a 'LLM' and does this rule out all offline RL or model-based RL approaches which try to simulate environments? etc.

Comment by gwern on The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs · 2024-03-11T00:56:53.233Z · LW · GW

'Stochastic parrots' 2020 actually does make many falsifiable claims. Like the original stochastic parrots paper even included a number of samples of specific prompts that they claimed LLMs could never do. Likewise, their 'superintelligent octopus' example of eavesdropping on (chess, IIRC) game transcripts is the claim that imitation or offline RL for chess is impossible. Lack of falsifiable claims was not the problem with the claims made by eg. Gary Marcus.

The problem is that those claims have generally all been falsified, quite rapidly: the original prompts were entirely soluble by LLMs back in 2020, and it is difficult to accept the octopus claims in the light of results like https://arxiv.org/abs/2402.04494#deepmind . (Which is probably why you no longer hear much about the specific falsifiable claims made by the stochastic parrots paper, even by people still citing it favorably.) But then the goalposts moved.

Comment by gwern on An Optimistic Solution to the Fermi Paradox · 2024-03-10T16:58:21.248Z · LW · GW

Sagan was wrong, in a very typical way for mainstream 'skeptics' who hew to dichotomization and status quo bias; absence of evidence is indeed evidence of absence. We have enormous opportunities to detect alien signatures, we have used them for many decades at enormous scale (and Sagan was involved in it), and we have come up with absolutely nothing whatsoever. Every time we have pulled a ball out of the urn of observations, it has come up labeled 'not aliens', and the odds that there's any ball left in the urn labeled 'alien' go down. Not a single Dyson sphere, not a single mega-structure, not a single anomalous artifact in the solar system, no traces of alien biospheres with different amino acid codings or chirality, and so on and so forth. Every time someone gets excited about a weird star or a weird comet/asteroid and says "this time it's aliens!" and it could have been aliens, and yet, it turns out to not be aliens - the 'alien hypothesis' fails another test and shrinks in posterior probability a little bit more.
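(To put toy numbers on the urn metaphor - every probability here is invented purely for illustration, not an estimate - the update from each resolved anomaly is individually tiny but compounds:)

```python
# Toy Bayes update for the 'urn of observations' framing above (numbers
# invented solely for illustration).
prior = 0.5                 # P(detectable aliens exist)
p_null_if_aliens = 0.99     # a given anomaly resolves 'not aliens' even so
p_null_if_no_aliens = 1.0   # it certainly resolves 'not aliens'

posterior = prior
for _ in range(100):        # 100 anomalies, every one resolved 'not aliens'
    num = posterior * p_null_if_aliens
    posterior = num / (num + (1 - posterior) * p_null_if_no_aliens)

print(round(posterior, 2))  # ~0.27: each null result shaves off a little more
```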

Comment by gwern on The Last Laugh: Exploring the Role of Humor as a Benchmark for Large Language Models · 2024-03-10T02:34:17.881Z · LW · GW

What do you think of any of the humorous writings (not sure what you'd define as 'joke') in my GPT-3 page? I noted where I could find similar examples in Google search, so the rest are 'original' as far as I know.

Comment by gwern on OpenAI: Facts from a Weekend · 2024-03-09T14:28:10.767Z · LW · GW

(Fixed. This is a surname typo I make an unbelievable number of times because I reflexively overcorrect it to 'Sumners', due to reading a lot more of Scott Sumner than Larry Summers. Ugh - just caught myself doing it again in a Reddit comment...)

Comment by gwern on OpenAI: Facts from a Weekend · 2024-03-08T23:14:16.935Z · LW · GW

The official OA press releases are out confirming The Information: https://openai.com/blog/review-completed-altman-brockman-to-continue-to-lead-openai https://openai.com/blog/openai-announces-new-members-to-board-of-directors

“I’m pleased this whole thing is over,” Altman said at a press conference Friday.

He's probably right.


As predicted, the full report will not be released, only the 'summary' focused on exonerating Altman. Also as predicted, 'the mountain has given birth to a mouse' and the report was narrowly scoped to just the firing: they bluster about "reviewing 30,000 documents" (easy enough when you can just grep Slack + text messages + emails...), but then admit that they looked only at "the events concerning the November 17, 2023 removal" and interviewed hardly anyone ("dozens of interviews" barely even covers the immediate dramatis personae, much less any kind of investigation into Altman's chip stuff, Altman's many broken promises, Brockman's complainers etc). Doesn't sound like they have much to show for over 3 months of work by the smartest & highest-paid lawyers, does it... It also seems like they indeed did not promise confidentiality or set up any kind of anonymous reporting mechanism, given that they mention no such thing and include setting up a hotline for whistleblowers as a 'recommendation' for the future (ie. there was no such thing before or during the investigation). So, it was a whitewash from the beginning. Tellingly, there is nothing about Microsoft, and no hint their observer will be upgraded (or that there still even is one). And while flattering to Brockman, there is nothing about Murati - free tip to all my VC & DL startup acquaintances, there's a highly competent AI manager who's looking for exciting new opportunities, even if she doesn't realize it yet.

Also entertaining is that you can see the media spin happening in real time. What WilmerHale signs off on:

WilmerHale found that the prior Board acted within its broad discretion to terminate Mr. Altman, but also found that his conduct did not mandate removal.

Which is... less than complimentary? One would hope a CEO does a little bit better than merely not engaging in 'conduct which mandates removal'? And it turns into headlines like:

"OpenAI’s Sam Altman Returns to Board After Probe Clears Him"

(Nothing from Kara Swisher so far, but judging from her Twitter, she's too busy promoting her new book and bonding with Altman over their mutual dislike of Elon Musk to spare any time for relatively-minor-sounding news.)


OK, so what was not as predicted? What is surprising?

This is not a full replacement board, but implies that Adam D'Angelo/Bret Taylor/Larry Summers are all staying on the board, at least for now. (So the new composition is D'Angelo/Taylor/Summers/Altman/Desmond-Hellmann/Seligman/Simo plus the unknown Microsoft non-voting observer.) This is surprising, but it may simply be a quotidian logistics problem - they hadn't settled on 3 more adequately diverse and prima-facie qualified OA board candidates yet, but the report was finished and it was more important to wind things up, and they'll get to the remainder later. (Perhaps Brockman will get his seat back?)

EDIT: A HNer points out that today, March 8th, is "International Women's Day", and this is probably the reason for the exact timing of the announcement. If so, they may well have already picked the remaining candidates (Brockman?), but those weren't women and so got left out of the announcement. Stay tuned, I guess. EDITEDIT: the video call/press conference seems to confirm that they do plan more board appointments: "OpenAI will continue to expand the board moving forward, according to a Zoom call with reporters." So that is consistent with the hurried women-only announcement.

Comment by gwern on OpenAI: Facts from a Weekend · 2024-03-08T22:26:54.370Z · LW · GW

At least from the intro, it sounds like my predictions were on-point: re-appointed Altman (I waffled about this at 60%: his narcissism/desire to be vindicated requires him to regain his board seat, because anything less is a blot on his escutcheon, and the pragmatic desire to lock down the board also strongly militated for his reinstatement, but it also seemed so blatant a power grab in this context that surely he wouldn't dare...? Guess he did), released to an Altman outlet (The Information), with 3 weak, apparently 'independent' and 'diverse' directors to pad out the board and eventually be replaced by full Altman loyalists - although I bet if one looks closer into these three women (Sue Desmond-Hellmann, Nicole Seligman, & Fidji Simo), one will find at least one has buried Altman ties. (Fidji Simo, Instacart CEO, seems like the most obvious one there: Instacart was YC S12.)

Comment by gwern on OpenAI: Facts from a Weekend · 2024-03-08T00:12:45.175Z · LW · GW

An OA update: it's been quiet, but the investigation is over. And Sam Altman won. (EDIT: yep.)

To recap, because I believe I haven't been commenting on this since December (this is my last big comment, skimming my LW profile): WilmerHale was brought in to do the investigation. The tender offer, to everyone's relief, went off. A number of embarrassing new details about Sam Altman have surfaced: in particular, about his enormous chip fab plan with substantial interest from giants like Temasek, and how the OA VC Fund turns out to be owned by Sam Altman (his explanation was it saved some paperwork and he just forgot to ever transfer it to OA). Ilya Sutskever remains in hiding and lawyered up (his silence became particularly striking with the release of Sora). There have been increasing reports the past week or two that the WilmerHale investigation was coming to a close - and I am told that the investigators were not offering confidentiality and the investigation was narrowly scoped to the firing. (There was also some OA drama with the Musk lawfare & the OA response, but aside from offering an object lesson in how not to redact sensitive information, it's both irrelevant & unimportant.)

The news today comes from the NYT leaking information from the final report: "Key OpenAI Executive [Mira Murati] Played a Pivotal Role in Sam Altman’s Ouster" (mirror; EDIT: largely confirmed by Murati in internal note).

The main theme of the article is clarifying Murati's role: as I speculated, she was in fact telling the Board about Altman's behavior patterns, and it fills in that she had gone further and written it up in a memo to him, and even threatened to leave with Sutskever.

But it reveals a number of other important claims: the investigation is basically done and wrapping up. The new board apparently has been chosen. Sutskever's lawyer has gone on the record stating that Sutskever did not approach the board about Altman (?!). And it reveals that the board confronted Altman over his ownership of the OA VC Fund (in addition to all his many other conflicts of interest**).


So, what does that mean?

First, as always, in a war of leaks, cui bono? Who is leaking this to the NYT? Well, it's not the pro-Altman faction: they are at war with the NYT, and these leaks do them no good whatsoever. It's not the lawyers: these are high-powered elite lawyers, hired for confidentiality and discretion. It's not Murati or Sutskever, given their lack of motive, and the former's panicked internal note & Sutskever's lawyer's denial. Of the current interim board (which is about to finish its job and leave, handing it over to the expanded replacement board), probably not Larry Summers/Bret Taylor - they were brought on to oversee the report as neutral third-party arbitrators, and if they (a simple majority of the temporary board) want something in their report, no one can stop them from putting it there. It could be Adam D'Angelo or the ex-board: they are the ones who don't control the report, and they also already have access to all of the newly-leaked-but-old information about Murati & Sutskever & the VC Fund.

So, it's the anti-Altman faction, associated with the old board. What does that mean?

I think that what this leak indirectly reveals is simple: Sam Altman has won. The investigation will exonerate him, and it is probably true that it was so narrowly scoped from the beginning that it was never going to plausibly provide grounds for his ouster. What these leaks are, are a loser's spoiler move: the last gasps of the anti-Altman faction, reduced to leaking bits from the final report to friendly media (Metz/NYT) to annoy Altman, and strike first. They got some snippets out before the Altman faction shops around highly selective excerpts to their own friendly media outlets (the usual suspects - The Information, Semafor, Kara Swisher) from the final officialized report to set the official record (at which point the rest of the confidential report is sent down the memory hole). Welp, it's been an interesting few months, but l'affaire Altman is over. RIP.

Evidence, aside from simply asking who benefits from these particular leaks at the last minute, is that Sutskever remains in hiding & his lawyer is implausibly denying he had anything to do with it, while if you read Altman on social media, you'll notice that he's become ever more talkative since December, particularly in the last few weeks - glorying in the instant memeification of '$7 trillion' - as has OA PR*, and we have heard no more rhetoric about what an amazing team of execs OA has and how he's so proud to have tutored them to replace him. Because there will be no need to replace him now. The only major reasons he would have to leave are if it's necessary as a stepping stone to something even higher (eg. running the $7t chip fab consortium, running for US President) or something like a health issue.


So, upshot: I speculate that the report will exonerate Altman (although it can't restore his halo, as it cannot & will not address things like his firing from YC which have been forced out into public light by this whole affair) and he will be staying as CEO and may be returning to the expanded board; the board will probably include some weak uncommitted token outsiders for their diversity and independence, but have an Altman plurality and we will see gradual selective attrition/replacement in favor of Altman loyalists until he has a secure majority robust to at least 1 flip and preferably 2. Having retaken irrevocable control of OA, further EA purges should be unnecessary, and Altman will probably refocus on the other major weakness exposed by the coup: the fact that his frenemy MS controls OA's lifeblood. (The fact that MS was such a potent weapon for Altman in the fight is a feature while he's outside the building, but a severe bug once he's back inside.) People are laughing at the '$7 trillion'. But Altman isn't laughing. Those GPUs are life and death for OA now. And why should he believe he can't do it? Things have always worked out for him before...

Predictions, if being a bit more quantitative will help clarify my speculations here: Altman will still be CEO of OA on June 1st (85%); the new OA board will include Altman (60%); Ilya Sutskever and Mira Murati will leave OA or otherwise take on some sort of clearly diminished role by year-end (90%, 75%; cf. Murati's desperate-sounding internal note); the full unexpurgated non-summary report will not be released (85%, may be hard to judge because it'd be easy to lie about); serious chip fab/Tigris efforts will continue (75%); Microsoft's observer seat will be upgraded to a voting seat (25%).

* Eric Newcomer (usually a bit more acute than this) asks "One thing that I find weird: OpenAI comms is giving very pro Altman statements when the board/WilmerHale are still conducting the investigation. Isn't communications supposed to work for the company, not just the CEO? The board is in charge here still, no?" NARRATOR: "The board is not in charge still."

** Compare the current OA PR statement on the VC Fund to Altman's past position on, say, Helen Toner or Reid Hoffman or Shivon Zilis, or Altman's investment in chip startups touting letters of commitment from OA or his ongoing Hydrazine investment in OA which sadly, he has never quite had the time to dispose of in any of the OA tender offers. As usual, CoIs only apply to people Altman doesn't trust - "for my friends, everything; for my enemies, the law".

EDIT: Zvi commentary: https://thezvi.substack.com/p/openai-the-board-expands

Comment by gwern on Anthropic release Claude 3, claims >GPT-4 Performance · 2024-03-07T02:43:31.679Z · LW · GW

How surprising would you say you find the idea of a startup trying to raise, and successfully raising, not billions but tens of billions of dollars by pitching investors that their investment could be canceled at any time at the wave of a hand, that the startup pre-commits that the investments will be canceled in the best-case scenario of the product succeeding, & that the investors ought to consider their investment "in the spirit of a donation"?

Comment by gwern on Anthropic release Claude 3, claims >GPT-4 Performance · 2024-03-06T23:15:11.693Z · LW · GW

I don't see any direct contradiction/lie there, at least between my version and the investor paraphrase. You don't necessarily have to release to public general access the best model, in order to be so far ahead that competitors can't catch up.

For example, LLMs at the research frontier could be a natural (Bertrand?) oligopoly where there's a stable two-player oligopoly for the best models (#1 by XYZ, and #2 by Anthropic), and everyone else gives up: there is no point in spending $10b to stand up your own model to try to become #1 when XYZ/Anthropic will just cut prices or release the next iteration that they'd been holding back and you get relegated to #3 and there's no reason for anyone to buy yours instead, and you go bankrupt. (This would be similar to other historical examples like Intel/AMD or Illumina: they enjoyed large margins and competing with them was possible, but was very dangerous because they had a lot of pent-up improvements they could unleash if you spent enough to become a threat. Or in the case of the highly stable iOS/Android mobile duopoly, just being too incredibly capital-intensive to replicate and already low-margin because the creators make their money elsewhere like devices/ads, having commoditized their complement.)

And then presumably at some point you either solve safety or the models are so capable that further improvement is unnecessary or you can't increase capability; then the need for the AI-arms-race policy is over, and you just do whatever makes pragmatic sense in that brave new world.

Comment by gwern on Anthropic release Claude 3, claims >GPT-4 Performance · 2024-03-06T20:58:13.694Z · LW · GW

Anthropic is in little need of ideas from me, but yeah, I'll probably pause such things for a while. I'm not saying the RSP is bad, but I'd like to see how things work out.

Comment by gwern on Anthropic release Claude 3, claims >GPT-4 Performance · 2024-03-06T19:17:27.580Z · LW · GW

Well, if Dustin sees no problem in talking about it, and it's become a major policy concern, then I guess I should disclose that I spent a while talking with Dario back in late October 2022 (ie. pre-RSP in Sept 2023), and we discussed Anthropic's scaling policy at some length, and I too came away with the same impression everyone else seems to have: that Anthropic's AI-arms-race policy was to invest heavily in scaling, creating models at or pushing the frontier to do safety research on, but that they would only release access to second-best models & would not ratchet capabilities up, and would wait for someone else to do so before catching up. So it would not contribute to races but also not fall behind and become irrelevant/noncompetitive.

And Anthropic's release of Claude-1 and Claude-2 always seemed to match that policy - even if Claude-2 had a larger context window for a long time than any other decent available model, Claude-2 was still substantially weaker than ChatGPT-4. (Recall that the casus belli for Sam Altman trying to fire Helen Toner from the OA board was a passing reference in a co-authored paper to Anthropic not pushing the frontier like OA did.)


What I'm concluding from the discussion so far is that I should have read the Anthropic RSP more carefully than I did.

Comment by gwern on Many arguments for AI x-risk are wrong · 2024-03-06T03:29:31.435Z · LW · GW

Oh dang! RIP. I guess there's a lesson there - probably more effort should be put into interviewing the pioneers of connectionism & related fields right now, while they have some perspective and before they all die off.

Comment by gwern on Claude Doesn’t Want to Die · 2024-03-05T23:22:55.177Z · LW · GW

Having read many hundreds of rhyming poems from dozens of models through the LM battle grounds, my guess too is a ChatGPT-3/4: the lockstep rhyming A/B/A/B quatrain is a signature of ChatGPT (and models trained on its outputs). Gemini at low levels always rhymes too, slightly less so at higher levels, but tends to be more varied (eg. maybe going for an A/A/B/B instead, or 5 lines instead of 4); likewise the LLaMA/Mistral model families. And Claude-3 models seem to vary much more. So, while it really could have come from almost any major model family and you can't be all that sure, the best bet is ChatGPT.

Comment by gwern on Many arguments for AI x-risk are wrong · 2024-03-05T23:09:19.792Z · LW · GW

I was under the impression that PPO was a recently invented algorithm

Well, if we're going to get historical, PPO is a relatively small variation on Williams's REINFORCE policy gradient model-free RL algorithm from 1992 (or earlier if you count conferences etc), with a bunch of minor DL implementation tweaks that turn out to help a lot. I don't offhand know of any ways in which PPO's tweaks make it meaningfully different from REINFORCE from the perspective of safety, aside from the obvious ones of working better in practice. (Which is the main reason why PPO became OA's workhorse in its model-free RL era to train small CNNs/RNNs, before they moved to model-based RL using Transformer LLMs. Policy gradient methods based on REINFORCE certainly were not novel, but they started scaling earlier.)

So, PPO is recent, yes, but that isn't really important to anything here. TurnTrout could just as well have used REINFORCE as the example instead.
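(For readers who haven't seen it written out: a minimal sketch of the two losses, in PyTorch-ish code of my own devising rather than any particular library's implementation, to show how small the gap is.)

```python
import torch

def reinforce_loss(log_probs, rewards, baseline=0.0):
    """Williams 1992: the score-function / policy-gradient ('REINFORCE') estimator.
    log_probs: log pi(a_t|s_t) for the actions actually taken in one episode.
    The gradient of this loss is an unbiased estimate of the gradient of
    expected return."""
    episode_return = rewards.sum()
    return -(log_probs.sum() * (episode_return - baseline))

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, eps=0.2):
    """PPO's clipped surrogate: the same policy-gradient idea, but with an
    importance ratio clipped to [1-eps, 1+eps] so minibatches can be reused
    for several updates without the policy moving too far."""
    ratio = torch.exp(new_log_probs - old_log_probs)
    return -torch.min(ratio * advantages,
                      torch.clamp(ratio, 1 - eps, 1 + eps) * advantages).mean()
```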

Did RL researchers in the 1990’s sit down and carefully analyze the inductive biases of PPO on huge 2026-era LLMs, conclude that PPO probably entrains LLMs which make decisions on the basis of their own reinforcement signal, and then decide to say “RL trains agents to maximize reward”? Of course not.

I don't know how you (TurnTrout) can say that. It certainly seems to me that plenty of researchers in 1992 were talking about either model-based RL or using model-free approaches to ground model-based RL - indeed, it's hard to see how anything else could work in connectionism, given that model-free methods are simpler, many animals or organisms do things that can be interpreted as model-free but not model-based (while all creatures who do model-based RL, like humans, clearly also do model-free), and so on. The model-based RL was the 'cherry on the cake', if I may put it that way... These arguments were admittedly handwavy: "if we can't write AGI from scratch, then we can try to learn it from scratch starting with model-free approaches like Hebbian learning, and somewhere between roughly mouse-level and human/AGI, a miracle happens, and we get full model-based reasoning". But hey, can't argue with success there! We have loads of nice results from DeepMind and others with this sort of flavor†.

On the other hand, I'm not able to think of any dissenters which claim that you could have AGI purely using model-free RL with no model-based RL anywhere to be seen? Like, you can imagine it working (eg. in silico environments for everything), but it's not very plausible since it would seem like the computational requirements go astronomical fast.

Back then, they had a richer conception of RL, heavier on the model-based RL half of the field, and one more relevant to the current era, than the impoverished 2017 era of 'let's just PPO/Impala everything we can't MCTS and not talk about how this is supposed to reach AGI, exactly, even if it scales reasonably well'. If you want to critique what AI researchers could imagine back in 1992, you should be reading Schmidhuber, not Bostrom.

If you look at that REINFORCE paper, Williams isn't even all that concerned with direct use of it to train a model to solve RL tasks.* He's more concerned with handling non-differentiable things in general, like stochastic rather than the usual deterministic neurons we use, so you could 'backpropagate through the environment' models like Schmidhuber & Huber 1990, which bootstrap from random initialization using the high-variance REINFORCE-like learning signal to a superior model. (Hm, why, that sounds like the sort of thing you might do if you analyze the inductive biases of model-free approaches which entrain larger systems which have their own internal reinforcement signals which they maximize...) As Schmidhuber has been saying for decades, it's meta-learning all the way up/down. The species-level model-free RL algorithm (evolution) creates model-free within-lifetime learning algorithms (like REINFORCE), which creates model-based within-lifetime learning algorithms (like neural net models) which create learning over families (generalization) for cross-task within-lifetime learning which create learning algorithms (ICL/history-based meta-learners**) for within-episode learning which create...

It's no surprise that the "multiply a set of candidate entities by a fixed small percentage based on each entity's reward" algorithm pops up everywhere from evolution to free markets to DRL to ensemble machine learning over 'experts', because that model-free algorithm is always available as the fallback strategy when you can't do anything smarter (yet). Model-free is just the first step, and in many ways, the least interesting & important step. I'm always weirded out to read one of these posts where something like PPO or evolution strategies is treated as the only RL algorithm around and things like expert iteration as an annoying nuisance to be relegated to a footnote - 'reward is not the optimization target!* * except when it is in these annoying exceptions like AlphaZero, but fortunately, we can ignore these, because after all, it's not like humans or AGI or superintelligences would ever do crazy stuff like "plan" or "reason" or "search"'.

* He'd've probably been surprised to see people just... using it for stuff like DoTA2 on fully-differentiable BPTT RNNs. I wonder if he's ever done any interviews on DL recently? AFAIK he's still alive.

** Specifically, in the case of Transformers, it seems to be by self-attention doing gradient descent steps on an abstracted version of a problem; gradient descent itself isn't a very smart algorithm, but if the abstract version is a model that encodes the correct sufficient statistics of the broader meta-problem, then it can be very easy to make Bayes-optimal predictions/choices for any specific problem.

† my paper-of-the-day website feature yesterday popped up "Learning few-shot imitation as cultural transmission" which is a nice example because they show clearly how history+diverse-environments+simple-priors-of-an-evolvable-sort elicit 'inner' model-like imitation learning starting from the initial 'outer' model-free RL algorithm (MPO, an actor-critic).

Comment by gwern on niplav's Shortform · 2024-03-05T00:39:02.331Z · LW · GW

Leaving aside the issue of to what extent the NN itself is already doing something approximately isomorphic to search or how easy it would be to swap in MuZero instead, I think that the important thing is to measure the benefit of search in particular problems (like Jones does by sweeping over search budgets vs training budgets for Hex etc) rather than how hard the exact algorithm of search itself is.

I mean, MCTS is a simple generic algorithm; you can just treat learning it in a 'neural' way as a fixed cost - there's not much in the way of scaling laws to measure about the MCTS implementation itself. MCTS is MCTS. You can plug in chess as easily as Go or Hex.

It seems much more interesting to know about how expensive 'real' problems like Hex or Go are, how well NNs learn, how to trade off architectures or allocate compute between train & runtime...

Comment by gwern on Are we so good to simulate? · 2024-03-04T18:37:11.866Z · LW · GW

If we are the ancestors who give rise to the simulators, then we will be of extreme interest to simulate, based on our own activities, which spend an enormous amount of effort modeling data collected in the past (ie. simulations), and so there will be a lot of simulations. And if we are not those ancestors (or already a simulation thereof), but some totally unconnected hypothetical universe (eg. some experiment in exploring carbon-based evolution by a civilization which actually evolved as sulfuric-acid silicon lifeforms and are curious about the theoretical advantages of carbon/water as a substrate), then the very fact that, out of the infinitely large number of possible universes, our universe was simulated, is evidence that we must have more than the usual infinitesimal probability of being simulated (even if the reason is inaccessible to us). In both cases, all of these minds must be realistically 'confused' or the point of running a realistic simulation is defeated.

Thus on average, for a given apparent location in the universe, the majority of minds thinking they are in that location are correct. (I guess at at least a thousand to one.)

I don't see why this matters. We obviously already observe that we are not in a universe optimized for using 'non-confused' minds. (Look at non-confused minds like LLMs. Forget running any kind of exorbitantly expensive hyper-realistic universe-scale simulation to better 'confuse' them - we don't even bother to give them any kind of grounding or telling them 'where' they are. We run them in an ultra-efficient manner, stripping out everything possible, down to the neural level. We only begrudgingly tell them 'when' they are in the prompt because that's useful conditioning for answers.) The number of 'non-confused' minds running in unrealistic ancestor-like universes is irrelevant, and the simulation argument is not about them. This seems like you're inverting a conditional or something in a confusing way?

But if there are finite resources and astronomically many extremely cheap things, only a few will be done.

Since there's only one us, as far as we know, only a few are necessary to create the severe indexical uncertainty of the simulation argument.

Comment by gwern on Elon files grave charges against OpenAI · 2024-03-01T23:32:40.286Z · LW · GW

Presumably, yes, the remedy would be to unwind the for-profit subsidiary and return to the original stated goals. Wouldn't even be ultra vires, so to speak, because of how they set it up in the first place - to be dissolved at the wave of a hand by the non-profit board.

However, I think there is about 0% probability of this. I can't think of any lawsuit remotely like this which succeeded. Non-profits are accorded incredible deference on this sort of thing. The board members could be diving Scrooge McDuck-style into piles of cash and a judge would not want to overrule them on these threadbare grounds. If all Musk has is some emails vaguely discussing goals many years ago and he's butthurt about what OA LLC is doing now, he's got nothing.

This just looks like typical Musk lawfare harassment, similar to his attempts to get out of his ironclad Twitter purchase agreement or random libel lawsuit threats - it's just sniping at OA and embarrassing it. The billionaire's equivalent of leaving the standard 'OpenAI? Not so open now are they?' comment. (He may even have persuaded himself he has a case, but he doesn't.)

It's not like it costs him anything but money or risks backfiring in any meaningful way. The worst that can happen to him is that the judge dismisses all of it at the first opportunity and he loses <0.01% of his net worth in high-powered legal fees. (Presumably Sam Altman is already pissed off at Musk and can't really be made madder.) Well worth it to embarrass OA publicly. (And if you believe Jimmy Apples, it's already hampered OA internally. Every day counts in DL now, and how could you spend, say, $0.1m more effectively to help X.ai catch up to OA than on this lawsuit...?)

EDIT: one lawyer's analysis: "So many words for a lot of nothing of a case."

Comment by gwern on Counting arguments provide no evidence for AI doom · 2024-02-28T16:02:53.178Z · LW · GW

There are compute tradeoffs and you're going to run only as many MCTS rollouts as you need to get good performance.

I completely agree. Smart agents will run only as many MCTS rollouts as they need to get good performance, no more - and no less. (And the smarter they are, and so the more compute they have access to, the more MCTS rollouts they are able to run, and the more they can change the default reactive policy.)

But 'good performance' on what, exactly? Maximizing utility. That's what a model-based RL agent (not a simple-minded, unintelligent, myopic model-free policy like a frog's retina) does.

If the Value of Information remains high from doing more MCTS rollouts, then an intelligent agent will keep doing rollouts for as long as the additional planning continues to pay its way in expected improvements. The point of doing planning is policy/value improvement. The more planning you do, the more you can change the original policy. (This is how you train AlphaZero so far from its prior policy, of a randomly-initialized CNN playing random moves, to its final planning-improved policy, a superhuman Go player.) Which may take it arbitrarily far in terms of policy - like, for example, if it discovers a Move 37 where there is even a small <1/1000 probability that a highly-unusual action will pay off better than the default reactive policy and so the possibility is worth examining in greater depth...

(The extreme reductio here would be a pure MCTS with random playouts: it has no policy at all at the beginning, and yet, MCTS is a complete algorithm, so it converges to the optimal policy, no matter what that is, given enough rollouts. More rollouts = more update away from the prior. Or if you don't like that, good old policy/value iteration on a finite MDP is an example: start with random parameters and the more iterations they can do, the further they provably monotonically travel from the original random initialization to the optimal policy.)
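(A concrete toy version of the policy/value-iteration point, written as an illustrative sketch rather than anything load-bearing: start from arbitrary values, and each additional Bellman backup provably moves you closer to the optimal policy, however far that is from where you started.)

```python
import numpy as np

def value_iteration(P, R, gamma=0.99, iters=1000):
    """Value iteration on a finite MDP (toy sketch).
    P: (S, A, S) transition probabilities; R: (S, A) expected rewards.
    More iterations = more improvement over the arbitrary initialization,
    converging to the optimal policy regardless of the starting point."""
    S, A, _ = P.shape
    V = np.random.randn(S)              # arbitrary starting value estimates
    for _ in range(iters):
        Q = R + gamma * (P @ V)         # (S, A): one-step lookahead
        V = Q.max(axis=1)               # Bellman optimality backup
    return Q.argmax(axis=1), V          # greedy policy w.r.t. final values
```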

One might say that the point of model-based RL is to not be stupid, and thus safe due to its stupidity, in all the ways you repeatedly emphasize purely model-free RL agents may be. And that's why AGI will not be purely model-free, nor are our smartest current frontier models like LLMs purely model-free. I don't see how you get this vision of AGI as some sort of gigantic frog retina, which is the strawman that you seem to be aiming at in all your arguments about why you are convinced there's no danger.

Obviously AGI will do things like 'plan' or 'model' or 'search' - or if you think that it will not, you should say so explicitly, and be clear about what kind of algorithm you think AGI would be, and explain how you think that's possible. I would be fascinated to hear how you think that superhuman intelligence in all domains like programming or math or long-term strategy could be done by purely model-free approaches which do not involve planning or searching or building models of the world or utility-maximization!

(Or to put it yet another way: 'scheming' is not a meaningful discrete category of capabilities, but a value judgment about particular ways to abuse theory of mind / world-modeling capabilities; and it's hard to see how one could create an AGI smart enough to be 'AGI', but also so stupid as to not understand people or be incapable of basic human-level capabilities like 'be a manager' or 'play poker', or generalize modeling of other agents. It would be quite bizarre to imagine a model-free AGI which must learn a separate 'simple' reactive policy of 'scheming' for each and every agent it comes across, wasting a huge number of parameters & samples every time, as opposed to simply meta-learning how to model agents in general, and applying this using planning/search to all future tasks, at enormous parameter savings and zero-shot.)

Comment by gwern on Retirement Accounts and Short Timelines · 2024-02-26T20:53:10.826Z · LW · GW

hold_my_fish's setup in which there is no increase in growth rates, either destruction or normality, is not the same as my discussion. If you include the third option of a high-growth-rate future (which is increasingly a plausible outcome), you would also want to consume a lot now to consumption-smooth, because once hypergrowth starts, you need very little capital/income to smooth/achieve the same standard of living under luxury-robot-space-communism as before. (Indeed, you might want to load up on debt on the expectation that if you survive, you'll pay it out of growth.)

Comment by gwern on Research Post: Tasks That Language Models Don’t Learn · 2024-02-25T23:02:20.447Z · LW · GW

The really cool ones are where the BPE errors compound and lead to strange, unpredictable downstream bugs. I've talked a lot about how I think ChatGPT's rhyming poetry is an example of this, but here's a simpler one: a famous GPT problem last year was about African countries whose name starts with 'K': https://news.ycombinator.com/item?id=37145312 GPT-3 made a spelling error of the usual BPE sort, which then got quoted in a blog spam & HN comment, which then (perhaps due to lack of explicit statement that the quotes were false) got read by search engine bots* and provided as snippet-answers to search queries! One could imagine this going further: extracted snippets would be useful for automatically creating Q&A datasets...

And then somewhere downstream, a human is reading the false statement 'While there are 54 recognized countries in Africa, none of them begin with the letter "K". The closest is Kenya, which starts with a "K" sound, but is actually spelled with a "K" sound.'† being generated by an AI model - possibly one that is character-tokenized at this point & so the error would appear to provably have nothing to do with BPEs - and is baffled at where this delusion came from.

* Google has discussed its processing of web page text (most detailed recent writeup); it's usually a small BERT-like model, which means WordPiece tokenization, IIRC, so it is also unable to 'see' the error even if it were smart enough to.

† This still replicates if you ask ChatGPT for a list of African countries which start with 'K'. If you reroll, it makes a remarkable number of different errors and confabulations. I particularly like the one where it just prints '1. Kenya' repeatedly - when you go on vacation to Kenya, be sure you go to Kenya, and not one of the other Kenyas, god have mercy on you if you land in Kenya, or worse yet, Kenya!

Comment by gwern on Research Post: Tasks That Language Models Don’t Learn · 2024-02-25T22:51:35.842Z · LW · GW

Where am I going? Nowhere complex.

To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.

Comment by gwern on Research Post: Tasks That Language Models Don’t Learn · 2024-02-23T15:01:03.317Z · LW · GW

Maybe some H-Test tasks can be affected by this. But how do you explain tasks like Repeated Word (one group has two repeated words) or End Punctuation (based on the location of the punctuation).

I don't think I need to. 'End Punctuation' sounds like it's affected by tokenization, and regardless, artificial microbenchmarks like 'Repeated Word' are not expected to exhibit smooth scaling the way global losses like perplexity do. (They instead exhibit emergence, inverse U-scaling, and noisy patterns due to combined sampling error & biases from model checkpoints / sizes / test items / test sizes / prompts+formatting.) Look at Big-Bench to see how noisy these sorts of things are even when they are being properly evaluated in controlled conditions and sweeping model sizes (whereas your results are an uninterpretable hodge-podge).
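To illustrate just the sampling-error component (a toy simulation of my own, not anything from Big-Bench or the paper under discussion): with a microbenchmark of ~50 items, measured accuracy swings by double-digit percentage points even when the model's true ability is held perfectly fixed.

```python
# How noisy is a small hand-made benchmark purely from test-set size?
# All numbers here are hypothetical, chosen only to illustrate the order of magnitude.
import numpy as np

rng = np.random.default_rng(0)
true_accuracy = 0.6          # assumed fixed "true" ability on the task
test_items = 50              # typical size of a small microbenchmark
measured = rng.binomial(test_items, true_accuracy, size=10_000) / test_items
lo, hi = np.quantile(measured, [0.025, 0.975])
print(f"95% of measured scores fall in [{lo:.2f}, {hi:.2f}] despite a fixed true accuracy of {true_accuracy}")
# -> roughly [0.46, 0.74]: a +/-14-point swing before any checkpoint/prompt/format variation is added.
```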

Meanwhile, how do you explain the PaLM results on the spelling miracle if you don't believe in scaling and maintain that these are tasks "language models don't learn"?

We tested 15 models from leading LLM labs before we arrived at our claim. If the H-Test was a "scaling task", we would have observed at least some degree of performance improvement in other models like Luminous or LLaMA too. But no this was not the case.

We see improvements from scaling all the time which start from a flatline and then increase at critical sizes. See 'emergence'. Emergence is not that surprising, because phase transitions are everywhere in NNs; and obviously, people don't bother creating benchmarks on which all the LLMs already score ~100%, so it is the best model, GPT-4, which has a chance to exhibit emergence. And, doubtless, we'll see more examples with GPT-5 etc. (You also have a higher opinion of some of these 'leading' models, like Luminous, than I think most people do.)

Our section 5 (Analysis: We Don’t Understand GPT-4) is in fact dedicated to disproving the claim that more orthography-specific data will help LLMs solve H-Test. In GPT-3.5-Turbo finetuning results on H-Test training set, we observed no significant improvement in performance. Before and after finetuning, the performance remains tightly centered around the random change baseline.

Why would finetuning on a training set help a test set if GPT-3.5 is memorizing? Memorizing a pair of rhymes A/B tells you nothing about another pair of rhymes C/D, regardless of the two tasks being 'in-domain'.

(By the way, I would be skeptical of any conclusions drawn from GPT-3.5 finetuning because even if the 'finetuning' seemed to work, who knows what that 'finetuning' mystery meat actually is? The first iteration of OA's GPT-3 finetuning was apparently a fiasco, somehow whenever the rebooted OA GPT-3 finetuning comes up the result from it always seems to be 'it doesn't help capabilities', and OA declines to explain in any detail what the 'finetuning' does.)

Comment by gwern on Retirement Accounts and Short Timelines · 2024-02-23T14:36:11.717Z · LW · GW

That sounds like a knife's-edge sort of scenario. The treatment arrives neither much earlier nor later but just a year or two before you die (inclusive of all interim medical/longevity improvements, which presumably are nontrivial if some new treatment is curing aging outright), and costs neither vastly too much nor vastly less than your net worth but just enough that, even in a Christiano-esque slow takeoff where global GDP is doubling every decade & the treatment will soon be shipped out to so many people that it drops massively in price each year, you still just can't afford it - but you could if only you had been socking away an avocado-toast a month in your 401k way back in 2020?
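(For concreteness, a quick back-of-the-envelope with made-up numbers for what the avocado toast would even amount to:)

```python
# Hypothetical avocado-toast 401k arithmetic; every number here is an assumption for illustration.
monthly_toast = 15.0          # dollars/month of forgone avocado toast
annual_return = 0.07          # assumed long-run return
years = 30                    # 2020 onward
balance = 0.0
for month in range(years * 12):
    balance = balance * (1 + annual_return / 12) + monthly_toast
print(f"after {years} years: ~${balance:,.0f}")   # ~$18,000 - a rounding error against a frontier treatment
```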

Comment by gwern on Retirement Accounts and Short Timelines · 2024-02-23T03:30:50.051Z · LW · GW

However, you should probably hedge your bets to a certain extent just in case you manage to live to retirement age.

Do you need to, though? People have been citing Feynman's anecdote about the RAND researchers deciding to stop bothering with retirement savings in the 1940s/50s because they thought the odds of nuclear war were so high. But no one has mentioned any of those RAND researchers dying on the streets or living off the proverbial dog chow in retirement. And why would they have?

First, anyone who was a RAND researcher is a smart cookie doing white-collar work who will be in high demand into retirement and beyond (maybe more than at any earlier point in their career); it's not like they were construction workers whose backs give out in their 30s, leaving them unable to earn any money after that. Quite the opposite.

Second, no one said stopping was irrevocable. You can't go back in time, of course, but you can always just start saving again. This is relevant because when nuclear war didn't happen within a decade or two, presumably they noticed at some point, 'hey, I'm still alive'. There is very high option value/Value of Information. If, after a decade or two, nuclear war has not happened and you survive the Cuban Missile Crisis... you can start saving then.

The analogue here for AI risk: most of the short-term views are that we are going to learn a lot over the next 5-10 years. By 5 years, a decent number of AGI predictions will be expiring, and it will be much clearer how far DL scaling will go. DL will either be much scarier than it is now, or it will have slammed to a halt. And by 10 years, most excuses for any kind of pause will have expired and matters will be clearer still. You are not committed to dissaving forever; you are not even committed for more than a few years, during which you will learn a lot.

Third, consider also the implication of 'no nuclear war' for those RAND researchers: that's good for the American economy. Very good. If you were a RAND researcher who stopped saving in the 1940s-1950s and decided to start again in the '60s, and you were 'behind', well, that meant that you started investing in time for what Warren Buffett likes to call one of the greatest economic long-term bull markets in the history of humanity.

Going back to AI, if you are wrong about AGI being a danger, and AGI is achieved on track but it's safe and beneficial, the general belief is that whatever else it does, it ought to lead to massive economic growth. So, if you are wrong, and you start saving again, you are investing at the start of what may be the greatest long-term bull market that could ever happen in the history of humanity. Seems like a pretty good starting point for your retirement savings to catch up, no?

(It has been pointed out before that if you are optimistic about AGI safety & economic growth and you are saving money, you are moving your resources from when you are very poor to when you will be very rich, and this seems like a questionable move. You should instead either be consuming now, or hedging against bad outcomes rather than doubling down on the good outcome.)
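To put a toy number on that parenthetical (log utility and all dollar figures are my own illustrative assumptions, not anything from the linked discussion):

```python
# Diminishing marginal utility makes "save now, spend after AGI goes well" look like
# smoothing in the wrong direction: consumption moved from a poor present to a rich future
# buys almost no utility. Hypothetical numbers throughout.
import math

income_now, income_post_agi = 50_000, 5_000_000   # assumed annual consumption, pre- and post-boom
shifted = 10_000                                   # dollars moved from the present to the future
u = math.log                                       # standard concave-utility assumption

utility_lost_now = u(income_now) - u(income_now - shifted)
utility_gained_later = u(income_post_agi + shifted) - u(income_post_agi)
print(f"lost now: {utility_lost_now:.3f}, gained later: {utility_gained_later:.3f}")
# -> ~0.223 vs ~0.002: two orders of magnitude difference in favor of consuming (or hedging) now.
```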

Whereas if you keep saving and it turns out you were wrong to, the size of your retirement savings accounts will only bitterly recall to you your complacency and the time you wasted. The point of having savings, after all, is to spend them. (Memento mori.)

Comment by gwern on Research Post: Tasks That Language Models Don’t Learn · 2024-02-23T02:50:31.361Z · LW · GW

If one model does solve these tasks, it would likely mean that these tasks can be solved despite the tokenization-based LM approach. I just don't understand how.

You can get a long way just by memorization and scaling. Consider the paper "Character-Aware Models Improve Visual Text Rendering", Liu et al 2022. This explains why image generators are so bad at text: it's the tokenization (shocku!, I know).

The interesting part here is what they dub 'the spelling miracle': the larger the LLM, the more likely it is to have somehow figured out the character-level spelling of tokenized words. You can see that PaLM goes from as low as 32% to 89% correct word spelling across its 3 model sizes (8b -> 540b). Same tokenization, same arch, same datasets, just scaling up. So, it is entirely possible for a small LLM to be terrible at character-level tasks while the larger one does much better, by an amount similar to, say, going from 50% to 80%. (However, this capability is very fragile. If you ask GPT-4 to spell a novel word you just made up, or to rhyme words from a list of made-up words, or to explain puns...)

Possible sources within web corpora include: dictionaries containing phonetic pronunciation guides, alphabetically ordered lists, typos and other misspellings, and examples of spelling words with dashes or spaces between every character. Linguistic phenomena that may aide in inducing spelling knowledge include words with predictable morphemic makeup, and cases where meaning-form relation is non-arbitrary, contra Saussure’s “semiotic arbitrariness”. We refer the reader to Itzhak and Levy (2022) and Kaushal and Mahowald (2022) for work in this direction.

And of course, models like GPT-4 will benefit from the extremely large amounts of user and hired-human feedback that OA has collected, which will include many reports of character-based task errors.

Also note the terrible performance of T5 (WordPiece, IIRC) vs ByT5 (byte-level). ByT5 doesn't need 'sensory experience' to be way better than T5 - it just needs tokenization which isn't idiotically throwing away spelling information and thereby sabotaging spelling tasks... AFAIK, all the models mentioned in OP are byte-level-BPE or WordPiece tokenized; none are character-level. If you want to make claims about LLMs while using only character-based tasks like 'uppercase' or 'palindrome', you ought to make more of an effort to consider the effects of tokenization.
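To see concretely what the tokenizer hides (a sketch assuming OpenAI's tiktoken package; exact ids and splits depend on the vocabulary, so treat the printed output as illustrative only):

```python
# BPE hands the model opaque integer ids, not letters: the spelling of 'Kenya' is simply
# not present in its input unless the word happens to get split, or is spelled out manually.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ["Kenya", " Kenya", "K e n y a"]:
    print(f"{text!r:>12} -> {enc.encode(text)}")
# 'Kenya' and ' Kenya' map to different opaque ids in which K-e-n-y-a appears nowhere;
# only the spelled-out version exposes per-character structure. A byte/character-level
# model like ByT5 sees the letters directly and has nothing to 'miraculously' infer.
```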

(Everyone wants to punch in a character manipulation task because they're so easy and convenient; no one wants to actually ablate or control or measure in any way the effects of tokenization.)

Comment by gwern on The Redaction Machine · 2024-02-22T02:29:16.229Z · LW · GW

Was 'redaction' inspired by Worth The Candle's 'revision mages'?

Comment by gwern on Why does generalization work? · 2024-02-20T23:49:29.766Z · LW · GW

This sounds a lot like dust theory, although it’s an angle on it I hadn’t seen sketched: You, as a specific self-preserving macrostate partition, pick out similar macrostate partitions (that share information with you). And this is always possible, no matter which macrostate partition you are (even if the resulting partitions look, to our partition-laden eyes, more chaotic).

This sounds a bit like Wolfram's grander 'ruliad' multiverse, although I'm not sure if he claims that every ruliad must have observers, no matter how random-seeming... It seems like a stretch to say that. After all, while you could always create a mapping between random sequences of states and an observer, by a giant lookup table if nothing else, this mapping might exceed the size of the universe and be impossible, or something like that. I wonder if one could make a Ramsey-theory-style argument for more modest claims?

Comment by gwern on Abs-E (or, speak only in the positive) · 2024-02-20T16:54:06.541Z · LW · GW

"provides calories" implies "easily digested".

Well, debates about what modal operators are meant by 'only' aside, I am doubtful that claim is true either!

First, as a parallel, consider grass again: to digest grass, ruminants need an extremely long intestinal system which takes multiple passes (including throwing it back up to the mouth to chew it again) and requires tons of microbes to digest it over multiple days; again, under any ordinary understanding of the phrase 'easily digested', it is not easy for cows to digest grass. Yet they still get all the calories they need to live on from it. So, for cows, grass 'provides calories' and yet is not 'easily digested'.

This is also true for humans: eating plants and raw uncooked foods 'provides calories', but they are not 'easily digested'; in fact, they provide much less net calories than they should, because so many calories go right back into the digestive process. (And this is one of the major theories for the importance of fire & cooking in human evolution: the 'expensive gut tissue' hypothesis.)

You could also point to things like poisonous berries: you eat them and enjoy calories as their simple carbohydrates easily digest... until you then lose a bunch of calories by being sick and sh−ting yourself all day long. Easily digested, without a doubt; but did they provide calories? They did - but only for the first few hours.

So this brings out that when you talk about 'easily digested' things which 'provide calories', you are implicitly including the caloric costs of digestion & side-effects, and it's really net calories you are talking about. Which will also be context-specific (eg. presumably there are wild animals like birds who are immune to the berry poison and are the intended consumers, and for them the berries deliver full caloric value).

Comment by gwern on Abs-E (or, speak only in the positive) · 2024-02-20T04:25:52.904Z · LW · GW

I find that editing my writing to use positive statements does make it better, though I doubt I could easily take it to the extreme of making all statements positive. This might be an interesting use of LLM rewrites: negative->positive rephrasing feels like something within GPT-4's capabilities, and it would let you quickly translate a large corpus to read & evaluate.


"Grass will pass through you without providing energy" : "without providing energy" seems little different to "not providing energy", it's still at heart a negative claim

That one seems easy to do if you go more quantitative. What is 'energy'? I mean, by E=mc^2, some grass embodies a lot of energy. You mean calories. "Grass provides 0 calories" is a positive assertion, which is more correct and still reasonably natural English. "Oh, I meant for humans". Fine, your first two versions failed this ('indigestible' for whom, exactly?) but are easily revised: "Grass provides 0 calories to humans."
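(To make the pedantic E=mc^2 aside concrete, with a made-up 100g handful of grass:)

```python
# Mass-energy of a hypothetical 100 g of grass - an enormous number of 'calories',
# none of them available to any digestive system. Numbers are illustrative only.
mass_kg = 0.1
c = 3.0e8                      # speed of light, m/s
joules = mass_kg * c**2        # ~9e15 J
kcal = joules / 4184           # 1 dietary calorie (kcal) ~= 4184 J
print(f"{kcal:.1e} kcal")      # ~2.2e12 kcal of mass-energy
```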

"Only food that can be easily digested will provide calories"

That statement would seem to also be obviously wrong. Plenty of things are 'easily digested' in any reasonable meaning of that phrase, while providing ~0 calories. Water, for example. Or artificial sweeteners. Minerals like calcium. (Chiral molecules, if you want to go really exotic.)

On further consideration, and by analogy to "is immortal" being functionally equivalent to "will live forever" (so if it's interchangeable wording, does that mean that "is immortal" is actually equally a positive statement?)

This example might be considered a benefit of the style. People can mean rather different things by 'immortal' if they are simply defining it by negation as 'not dying'. One common definition is 'not aging' (ie. the probability of annual mortality being the same each year indefinitely); the other common one is some sort of 'indestructible and will exist to the end of the universe'. The former is fairly ordinary and mundane and describes, say, naked mole rats; the latter is purely imaginary and found only in fictional works like comic books or sacred scriptures. If the former, you might say something like 'has constant mortality rate', and if the latter, 'existing forever'.

So banning 'im-mortal' (etymologically, it turns out to be just what you'd think: in + mortalis, 'not-mortal') could be useful. (You do see IRL people sometimes object to longevity discussions on dumb grounds like "you can't become immortal, what about accidents?!")

Comment by gwern on Weighing reputational and moral consequences of leaving Russia or staying · 2024-02-20T00:15:29.439Z · LW · GW

If you succeed while still in Russia, what is stopping those with powerful connections from simply taking over from you?

Or even if you leave partway... See: Yandex.

Comment by gwern on Victor Ashioya's Shortform · 2024-02-20T00:14:07.208Z · LW · GW

No, it doesn't - not unless Groq wants to publicly discuss what the cost of that hardware was and it turns out to be, to everyone's shock, well under $5m... (And you shouldn't trust any periodical which wastes half an article on the topic of what Groq & Grok have to do with each other. There are many places you can get AI news; you don't have to read Coin Telegraph.)

Comment by gwern on I'd also take $7 trillion · 2024-02-19T21:34:12.124Z · LW · GW

While he may be a co-winner, he's not exactly a household-name 'Nobelist', and certainly in our sort of online circles he's increasingly famous for non-Nobel things; OP is being sarcastic:

In the US, on the other hand, CEOs and executive directors make most of the investment decisions. They might read magazines saying that "cloud and blockchain and IoT" will be important, and talk to other executives at conferences who agree with that view, and then make a statement saying their company "will be a trend leader for emerging technologies including cloud computing and blockchain". Then they delegate the technical details to a guy, who hires a consulting firm, who finds someone who social consensus says is probably knowledgeable.

Comment by gwern on johnswentworth's Shortform · 2024-02-16T21:10:55.864Z · LW · GW

Yeah, this is the example I've been using to convince people that the game engines are almost certainly generating training data but are probably not involved at sampling time. I can't come up with any sort of hybrid architecture like 'NN controlling game-engine through API' where you get that third front leg. One of the biggest benefits of a game engine would be ensuring exactly that sort of thing can't happen - body parts becoming detached and floating in mid-air, and lack of conservation. If you had a game engine with a hyper-realistic cat body model in it which something external was manipulating, you simply wouldn't see that kind of common-sense physics failure. (Meanwhile, the errors do look like those of past generative modeling of cats. Remember the ProGAN interpolation videos of CATS? Hilarious, but also an apt demonstration of how extremely hard cats are to model. They're worse than hands.)

In addition, you see plenty of classic NN tells throughout - note the people driving a 'Dandrover'...

Comment by gwern on OpenAI's Sora is an agent · 2024-02-16T20:24:41.488Z · LW · GW

Here's a simpler way to turn a generative model into a policy which doesn't rely on actions being encoded into the state (which won't be true in most settings and can't be true in some - there are no 'actions' for a human moving around) or on reversing the image generator to the prompt etc: assume your agent harness at least has a list of actions A. (In the case of Minecraft, I guess it'd be the full keyboard + mouse?) Treat Sora as a Decision Transformer, and prompt it with a goal like "A very skilled player creating a diamond, in the grass biome.", initialized at the current actual agent state. Sample the next displayed state. Now, loop over each action A, add it to the prompt ("The player moves A"), and sample the next displayed state. Take whichever action A yielded a sample closest to the original action-free sample (closest embedding, pixel distance, similar likelihood, etc). This figures out, by blackbox generation, what action the internal imitated agent is taking. If it's unclear (eg. due to perceptual aliasing, where the right action & a wrong action both lead immediately to the same displayed state), sample deeper and unroll until the consequences do become clear.
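A minimal sketch of that loop, to be concrete (all the pieces here - `video_model`, `embed`, the action list, the prompt wording - are hypothetical stand-ins, since no such public API exists; only the control flow matters):

```python
# Blackbox action extraction from a goal-conditioned video model, Decision-Transformer style.
def choose_action(video_model, embed, actions, state,
                  goal="A very skilled player creating a diamond, in the grass biome."):
    # 1. Generate the 'intended' next state with no action specified.
    target = video_model.sample(prompt=goal, init_state=state)
    # 2. Try each available action and score how close its predicted consequence is to that intention.
    best_action, best_score = None, float("-inf")
    for a in actions:
        predicted = video_model.sample(prompt=f"{goal} The player moves {a}.", init_state=state)
        score = embed(predicted) @ embed(target)   # or negative pixel distance, likelihood, etc.
        if score > best_score:
            best_action, best_score = a, score
    # 3. If several actions tie (perceptual aliasing), one would instead unroll each candidate
    #    for more frames and score the longer trajectories until their consequences diverge.
    return best_action
```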

This is not an efficient approach at all, but it is a minimal proof of concept about how to extract the implicit agency it has learned from the imitation-learning modeling of humans & other agents. (I say 'other agents' to be clear that agency can be learned from anywhere; like, it seems obvious that they are using game engines, and if you are using a game engine, you will probably want it populated by AI agents inside the game for scalability compared to using only human players.)

Comment by gwern on johnswentworth's Shortform · 2024-02-16T03:31:46.262Z · LW · GW

There may not be any 'clear' technical obstruction, but it has failed badly in the past. 'Add more parallelism' (particularly hierarchically) is one of the most obvious ways to improve attention, and people have spent the past 5 years failing to come up with efficient attentions that do anything but move along a Pareto frontier from 'fast but doesn't work' to 'slow and works only as well as the original dense attention'. It's just inherently difficult to know what tokens you will need across millions of tokens without input from all the other tokens (unless you are psychic), implying extensive computation of some sort, which makes things inherently serial and costs you latency, even if you are rich enough to spend compute like water. You'll note that when Claude-2 was demoing the ultra-long attention windows, it too spent a minute or two churning. Meanwhile, the most effective improvements in long-range attention, like FlashAttention or Ring Attention, just hyperoptimize dense attention, which is inherently limited.
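Some rough arithmetic on why the churning is hard to avoid (the hidden size, depth, and hardware throughput below are assumptions for illustration, not specifics of any deployed model):

```python
# Dense-attention FLOPs at a 1M-token context, ignoring all the non-attention compute.
context_len = 1_000_000
d_model, layers = 8_192, 80            # assumed width/depth of a large frontier model
# Score matrix + weighted sum ~ 4 * n^2 * d multiply-adds per layer.
attn_flops = 4 * context_len**2 * d_model * layers
sustained_flops = 5e14                 # ~500 TFLOP/s of well-utilized accelerator throughput, assumed
minutes = attn_flops / sustained_flops / 60
print(f"{attn_flops:.1e} FLOPs -> ~{minutes:.0f} minutes on one such device")  # ~87 minutes
# Parallelizing across devices shrinks the wall-clock, but the quadratic total work (and the
# serial dependence on having all tokens available) is what the 'minutes of churning' comes from.
```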

Comment by gwern on johnswentworth's Shortform · 2024-02-16T00:35:38.557Z · LW · GW

I may have. Gwern.net alone is, I think, somewhere around 2m words, and it's not comprehensive. Also, for contradictions, I would want to detect contradictions against citations/references as well (detecting miscitations would be more important than self-consistency IMO), and as a rough ballpark, the current Gwern.net annotation* corpus looks to be approaching 4.3m words, and is also not comprehensive. So, closer than one might think! (Anyway, that doesn't deal with the cost or latency: as you can see in the demos, we are talking minutes, not seconds, for these million-token calls, and the price is probably going to be in the dollar+ regime per call.)

* which are not fulltext. It would be nice to throw in all of the hosted paper & book & webpage fulltexts, but then that's probably more like 200m+ words.