Posts

FHI (Future of Humanity Institute) has shut down (2005–2024) 2024-04-17T13:54:16.791Z
Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? 2023-07-03T00:48:47.131Z
COVID-19 Group Testing Post-mortem? 2022-08-05T16:32:55.157Z
Emergent Ventures/Schmidt (new grantor for individual researchers) 2022-04-09T14:41:05.764Z
Fake Journal Club proposal 2022-03-25T14:23:18.785Z
It Looks Like You're Trying To Take Over The World 2022-03-09T16:35:35.326Z
Capability Phase Transition Examples 2022-02-08T03:32:54.551Z
"Summarizing Books with Human Feedback" (recursive GPT-3) 2021-11-15T17:41:53.189Z
EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised 2021-11-02T02:32:41.856Z
My ML Scaling bibliography 2021-10-23T14:41:45.170Z
AlphaFold 2 paper released: "Highly accurate protein structure prediction with AlphaFold", Jumper et al 2021 2021-07-15T19:27:20.584Z
May 2021 Gwern.net newsletter 2021-06-11T14:13:18.485Z
"Decision Transformer" (Tool AIs are secret Agent AIs) 2021-06-09T01:06:57.937Z
April 2021 Gwern.net newsletter 2021-06-03T15:13:29.138Z
gwern's Shortform 2021-04-24T21:39:14.128Z
March 2021 gwern.net newsletter 2021-04-06T14:06:20.198Z
February 2021 gwern.net newsletter 2021-03-13T14:57:54.645Z
January 2021 gwern.net newsletter 2021-02-04T20:12:39.555Z
December 2020 gwern.net links 2021-01-10T17:21:40.756Z
November 2020 gwern.net newsletter 2020-12-03T22:47:16.917Z
October 2020 gwern.net newsletter 2020-11-01T21:38:46.795Z
/r/MLScaling: new subreddit for NN scaling research/discussion 2020-10-30T20:50:25.973Z
"Scaling Laws for Autoregressive Generative Modeling", Henighan et al 2020 {OA} 2020-10-29T01:45:30.666Z
September 2020 gwern.net newsletter 2020-10-26T13:38:51.107Z
August 2020 gwern.net newsletter 2020-09-01T21:04:58.299Z
July 2020 gwern.net newsletter 2020-08-20T16:39:27.202Z
June 2020 gwern.net newsletter 2020-07-02T14:19:08.696Z
GPT-3 Fiction Samples 2020-06-25T16:12:05.422Z
May Gwern.net newsletter (w/GPT-3 commentary) 2020-06-02T15:40:37.155Z
OpenAI announces GPT-3 2020-05-29T01:49:04.855Z
"AI and Efficiency", OA (44✕ improvement in CNNs since 2012) 2020-05-05T16:32:20.335Z
April 2020 gwern.net newsletter 2020-05-01T20:47:44.867Z
March 2020 gwern.net newsletter 2020-04-03T02:16:02.871Z
February 2020 gwern.net newsletter 2020-03-04T19:05:16.079Z
January 2020 gwern.net newsletter 2020-01-31T18:04:21.945Z
Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal 2020-01-08T22:20:20.290Z
Dec 2019 gwern.net newsletter 2020-01-04T20:48:48.788Z
Nov 2019 gwern.net newsletter 2019-12-02T21:16:04.846Z
October 2019 gwern.net newsletter 2019-11-14T20:26:34.236Z
September 2019 gwern.net newsletter 2019-10-04T16:44:43.147Z
"AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence", Clune 2019 2019-09-10T21:33:08.837Z
August 2019 gwern.net newsletter (popups.js demo) 2019-09-01T17:52:01.011Z
"Designing agent incentives to avoid reward tampering", DeepMind 2019-08-14T16:57:29.228Z
July 2019 gwern.net newsletter 2019-08-01T16:19:59.893Z
How Should We Critique Research? A Decision Perspective 2019-07-14T22:51:59.285Z
June 2019 gwern.net newsletter 2019-07-01T14:35:49.507Z
On Seeing Through 'On Seeing Through: A Unified Theory': A Unified Theory 2019-06-15T18:57:25.436Z
On Having Enough Socks 2019-06-13T15:15:21.946Z
May gwern.net newsletter 2019-06-01T17:25:11.740Z
"One Man's Modus Ponens Is Another Man's Modus Tollens" 2019-05-17T22:03:59.458Z

Comments

Comment by gwern on We are headed into an extreme compute overhang · 2024-05-02T00:36:50.750Z · LW · GW

For example, a 70B model trained on next-token prediction only on the entire 20TB GenBank dataset will have better performance at next-nucleotide prediction than a 70B model that has been trained both on the 20TB GenBank dataset and on all 14TB of code on Github.

I don't believe that's obvious, and to the extent that it's true, I think it's largely irrelevant (and part of the general prejudice against scaling & Bitter Lesson thinking, where everyone is desperate to find an excuse for small specialist models with complicated structures & fancy inductive biases because that feels right).

Once you have a bunch of specialized models "the weights are identical" and "a fine tune can be applied to all members" no longer holds.

Nor do I see how this is relevant to your original claim. If you have lots of task-specialist models, how does this refute the claim that those will be able to coordinate? Of course they will. They will just share weight updates in exactly the way I just outlined, which works so well in practice. You may not be able to share parameter-updates across your protein-only and your Python-only LLMs, but they will be able to share updates within that model family and the original claim ("AGIs derived from the same model are likely to collaborate more effectively than humans because their weights are identical. Any fine-tune can be applied to all members, and text produced by one can be understood by all members.") remains true, no matter how you swap out your definition of 'model'.

DL models are fantastically good at collaborating and updating each other, in many ways completely impossible for humans, whether you are talking about AGI models or narrow specialist models.

Comment by gwern on ErioirE's Shortform · 2024-05-01T21:41:20.029Z · LW · GW

You might find my notes of interest.

Comment by gwern on We are headed into an extreme compute overhang · 2024-05-01T21:39:04.439Z · LW · GW

I think this only holds if fine tunes are composable, which as far as I can tell they aren't

You know 'finetunes are composable', because a finetune is just a gradient descent step on a batch of data and a parameter update, and if you train on more than one GPU and share updates, DL training still works {{citation needed}}.
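
For concreteness, here is a minimal sketch (assuming PyTorch; the toy model, batches, and learning rate are invented for illustration) of the sense in which independent finetunes are just parameter deltas that can be added onto shared base weights - the same thing asynchronous data-parallel training does with updates from many workers:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
base = nn.Linear(8, 1)  # stand-in for "the model"
batches = [(torch.randn(16, 8), torch.randn(16, 1)) for _ in range(2)]

def finetune_delta(model, batch, lr=0.1):
    """Run one SGD step on a private copy of `model`; return the parameter delta."""
    clone = copy.deepcopy(model)
    opt = torch.optim.SGD(clone.parameters(), lr=lr)
    x, y = batch
    nn.functional.mse_loss(clone(x), y).backward()
    opt.step()
    base_params = dict(model.named_parameters())
    return {name: p.detach() - base_params[name].detach()
            for name, p in clone.named_parameters()}

# Each "worker" finetunes independently on its own batch of data...
deltas = [finetune_delta(base, b) for b in batches]

# ...and the finetunes compose: add both deltas onto the shared base weights,
# which is what asynchronous data-parallel training does with the (possibly
# stale) updates streaming in from thousands of workers.
with torch.no_grad():
    for name, p in base.named_parameters():
        for d in deltas:
            p += d[name]
```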

If you can train asynchronously on a thousand, or 20,000, or 100,000 GPUs, that is what you are doing; this is especially true in DRL, where you might be, say, training across 170,000 CPU-cores. This works because you don't insist on everything being up to date every moment and you accept that there will be degrees of inconsistency/outdatedness. (You are certainly not accumulating the gradient across the entire cluster by waiting for every single node, pausing everything, calculating a single global step, and pushing it out, and only then resuming, as if it were a single GPU! Really, you don't even want to do that on a single GPU for DRL if you gotta go fast.) This works so well that people will casually talk about training "an" AlphaZero, even though they actually mean something more like "the 512 separate instances of AlphaZero we are composing finetunes of" (or more).*

You do have issues with stale gradients and off-policyness of updates and how to best optimize throughput of all of the actors vs training nodes and push out model updates efficiently so nodes stop executing outdated parameters as quickly as possible, and DeepMind & OpenAI etc have done a lot of work on that - but at that point, as in the joke, you have conceded that finetunes are composable and you can keep a very large number of replicas in sync, and it is merely a matter of haggling over how much efficiency you lose.

Also note that it takes a lot less compute to keep a model up to date doing simple online learning on new data than it does to train it from scratch on all historical data summed together (obviously), so what devrandom is talking about is actually a lot easier than creating the model in the first place.

A better model to imagine is not "somehow finetunes from millions of independent models magically compose" (although actually they would compose pretty well), but more like, "millions of independent actors do their ordinary business, while spending their spare bandwidth downloading the latest binary delta from peer nodes (which due to sparsity & not falling too far out of sync, is always on the order of megabytes, not terabytes), and once every tens of thousands of forward passes, discover a novel or hard piece of data, and mail back a few kilobytes of text to the central training node of a few thousand GPUs, who are continually learning on the hard samples being passed back to them by the main fleet, and who keep pushing out an immediately updated model to all of the actor models, and so 'the model' is always up to date and no instance is more than hours out of date with 'the model' (aside from the usual long tail of stragglers or unhealthy nodes which will get reaped)".

* I fear this is one of those cases where our casual reification of entities leads to poor intuitions, akin to asking 'how many computers are in your computer you are using right now?'; usually, the answer is just '1', because really, who cares how exactly your 'smartphone' or 'laptop' or 'desktop' or 'server' is made up of a bunch of different pieces of silicon - unless you're discussing something like device performance or security, in which case it may matter quite a lot and you'd better not think of yourself as owning 'a' smartphone.

Comment by gwern on KAN: Kolmogorov-Arnold Networks · 2024-05-01T21:23:30.748Z · LW · GW

(likely conditional on some aspects of the training setup, idk, self-supervised predictive loss function?)

Pretraining, specifically: https://gwern.net/doc/reinforcement-learning/meta-learning/continual-learning/index#scialom-et-al-2022-section

The intuition is that after pretraining, models can map new data into very efficient low-dimensional latents and have tons of free space / unused parameters. So you can easily prune them, but also easily specialize them with LoRA (because the sparsity is automatic, just learned) or just regular online SGD.

But yeah, it's not a real problem anymore, and the continual learning research community is still in denial about this and confining itself to artificially tiny networks to keep the game going.

Comment by gwern on Andrew Burns's Shortform · 2024-04-30T19:41:28.056Z · LW · GW

Altman made a Twitter-edit joke about 'gpt-2 i mean gpt2', so at this point, I think it's just a funny troll-name related to the 'v2 personality' which makes it a successor to the ChatGPT 'v1', presumably, 'personality'. See, it's gptv2 geddit not gpt-2? very funny, everyone lol at troll

Comment by gwern on Andrew Burns's Shortform · 2024-04-30T13:48:31.904Z · LW · GW

Sure, the poem prompt I mentioned using is like 3500 characters all on its own, and it had no issues repeatedly revising and printing out 4 new iterations of the poem, without apparently forgetting anything, before I used up my quota yesterday, so that convo must've been several thousand BPEs.

Comment by gwern on Andrew Burns's Shortform · 2024-04-30T00:41:05.570Z · LW · GW

It definitely exceeds 1024 BPEs context (we wouldn't be discussing it if it didn't, I don't think people even know how to write prompts that, combined with the system prompt etc, even fit in 1024 BPEs anymore), and it is almost certainly not GPT-2, come on.

Comment by gwern on Andrew Burns's Shortform · 2024-04-30T00:35:49.958Z · LW · GW

And they already have a Sora clone called Vidu, for heaven's sake.

No, they don't. They have a video generation model, which is one of a great many published over the past few years as image generation increasingly became solved, such as Imagen Video or Phenaki from Google years ago, and the Vidu samples are clearly inferior to Sora (despite heavy emphasis on the 'pan over static scene' easy niche): https://www.youtube.com/watch?v=u1R-jxDPC70

Here we are in 2024, and we're still being told how Real Soon Now Chinese DL will crush Westerners. I've been hearing this for almost a decade now, and I've stopped being impressed by the likes of Hsu talking about how "China graduates a million engineers a year!" or whatever. Somehow, the Next Big Thing never comes out of Chinese DL, no matter how many papers or citations or patents they have each year. Something to think about.

(I also have an ongoing Twitter series where every half year or so, I tweet a few of the frontier-pushing Western DL achievements, and I ask for merely 3 Chinese things as good - not better, just plausibly as good, including in retrospect from previous years. You know how many actual legitimate answers I've gotten? Like 1. Somehow, all the e/accs and China hawks like Alexandr Wang can't seem to think of even a single one which was at or past the frontier, as opposed to the latest shiny 'catches up to GPT-4!* (*on narrow benchmarks, YMMV)' clone model.)

Comment by gwern on avturchin's Shortform · 2024-04-30T00:10:37.039Z · LW · GW

Nah, it's just a PR stunt. Remember when DeepMind released AlphaGo Master by simply running a 'Magister' Go player online which went undefeated?* Everyone knew it was DeepMind simply because who else could it be? And IIRC, didn't OA also pilot OA5 'anonymously' on DoTA2 ladders? Or how about when Mistral released torrents? (If they had really wanted a blind test, they wouldn't've called it "gpt2", or they could've just rolled it out to a subset of ChatGPT users, who would have no way of knowing the model underneath the interface had been swapped out.)

* One downside of that covert testing: DM AFAIK never released a paper on AG Master, or all the complicated & interesting things they were trying before they hit upon the AlphaZero approach.

Comment by gwern on avturchin's Shortform · 2024-04-29T23:28:11.352Z · LW · GW

https://rentry.org/GPT2

I ran out of tokens quickly trying out poetry but I didn't get the impression that this is a big leap over GPT-4 like GPT-5 presumably is designed to be. (It could, I suppose, be a half-baked GPT-5 similar to 'Prometheus' for GPT-4.) My overall impression from poetry was that it was a GPT-4 which isn't as RLHF-damaged as usual, and more like Claude in having a RLAIF-y creative style. So I could believe it's a better GPT-4 where they are experimenting with new tuning/personality to reduce the ChatGPT-bureaucratese.

HN: https://news.ycombinator.com/item?id=40199715

Comment by gwern on Estimating the Number of Players from Game Result Percentages · 2024-04-29T23:25:16.797Z · LW · GW

I'm not sure what "margin of error" is. This is just rounding, is it not? It's not like the website is adding random epsilon numbers to screw with you: it is simply rounding off percentages. They are exact and deterministic up to rounding.

Though the bigger issue is the number of players can't strictly be computed based on percentages alone.

Since it's a non-deterministic problem, you'd represent all possible answers in ascending order as a lazy generator. You can then filter it by any known constraints or requirements (maybe you know players have to be paired, so it's always an even number of players, and you can filter out all odd values). Since the number of possible valid values will increase greatly as the total N increases, this might get slow and require some sort of sieve. (At a guess, since it's rounding, the range of possible values presumably increases rapidly, and so it might make more sense to instead return pairs of (lower,upper) bounds?)

Personally, I would first start with the forward problem, since it is so simple. Then I could test any algorithms or tweaks by generating a random N of players and testing that all of the generator values <= N are correct.

This has the benefit that you can also easily do simple Bayesian updates by ABC without the rigmarole of pymc or Stan etc: just draw a sample from the prior over the n players, feed it in, see if you replicate the exact observed %s, and if not, delete the sample; the samples you keep are the new posterior.
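
A minimal sketch of that workflow (the displayed percentages, the flat prior over N, and the multinomial data-generating model are all made-up assumptions for illustration; 'ABC' here is plain rejection Approximate Bayesian Computation):

```python
import random

def forward(counts, ndigits=0):
    """Forward problem: the rounded percentages the website would display."""
    n = sum(counts)
    return tuple(round(100 * c / n, ndigits) for c in counts)

observed = (46.0, 39.0, 15.0)  # hypothetical displayed win/draw/loss percentages

def abc_posterior(prior_max_n=500, samples=20_000):
    """Rejection ABC over the number of players N: draw N from a flat prior,
    simulate a plausible set of results, and keep N only if the simulated
    display exactly matches the observed one."""
    kept = []
    for _ in range(samples):
        n = random.randint(2, prior_max_n)  # prior over N
        draws = random.choices(range(len(observed)), weights=observed, k=n)
        counts = [draws.count(i) for i in range(len(observed))]
        if forward(counts) == observed:
            kept.append(n)
    return kept  # the surviving samples are the (approximate) posterior over N

posterior = abc_posterior()
```

The same `forward` function is all the enumeration version needs: loop over candidate N in ascending order, search the integer splits summing to N, and yield any N with at least one split whose rounded display matches.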

Comment by gwern on social lemon markets · 2024-04-26T16:08:09.076Z · LW · GW

Hence the advice to lost children to not accept random strangers soliciting them spontaneously, but if no authority figure is available, to pick a random stranger and solicit them for help.

Comment by gwern on gwern's Shortform · 2024-04-25T19:16:09.892Z · LW · GW

So among the most irresponsible tech stonk boosters has long been ARK's Cathie Wood, whose antics I've refused to follow in any detail (except to periodically reflect that in bull markets the most over-leveraged investors always look like geniuses); so only today do I learn that beyond the usual stuff like slobbering all over TSLA (which has given back something like 4 years of gains now), Wood has also adamantly refused to invest in Nvidia recently and in fact, managed to exit her entire position at an even worse time than SoftBank did: "Cathie Wood’s Popular ARK Funds Are Sinking Fast: Investors have pulled a net $2.2 billion from ARK’s active funds this year, topping outflows from all of 2023" (mirror):

...Nvidia’s absence in ARK’s flagship fund has been a particular pain point. The innovation fund sold off its position in January 2023, just before the stock’s monster run began. The graphics-chip maker’s shares have roughly quadrupled since.

Wood has repeatedly defended her decision to exit from the stock, despite widespread criticism for missing the AI frenzy that has taken Wall Street by storm. ARK’s exposure to Nvidia dated back 10 years and contributed significant gains, the spokeswoman said, adding that Nvidia’s extreme valuation and higher upside in other companies in the AI ecosystem led to the decision to exit.

Comment by gwern on Link: Interview with Vladimir Vapnik · 2024-04-23T21:31:33.358Z · LW · GW

Updated link: https://www.learningtheory.org/learning-has-just-started-an-interview-with-prof-vladimir-vapnik/ (while looking up his very weird transfer-learning research).

Comment by gwern on Good Bings copy, great Bings steal · 2024-04-21T19:56:17.110Z · LW · GW

LeCun trolled Twitter with that a few years ago: https://arxiv.org/abs/2110.09485#facebook

Comment by gwern on Elizabeth's Shortform · 2024-04-20T21:26:24.820Z · LW · GW

This sounds like a bad plan because it will be a logistics nightmare (undermining randomization) with high attrition, and extremely high variance due to between-subject design (where subjects differ a ton at baseline, in addition to exposure) on a single occasion with uncontrolled exposures and huge measurement error where only the most extreme infections get reported (sometimes). You'll probably get non-answers, if you finish at all. The most likely outcome is something goes wrong and the entire effort is wasted.

Since this is a topic which is highly repeatable within-person (and indeed, usually repeats often through a lifetime...), this would make more sense as within-individual and using higher-quality measurements.

One good QS approach would be to exploit the fact that infections, even asymptomatic ones, seem to affect heart rate etc as the body is damaged and begins fighting the infection. HR/HRV is now measurable off the shelf with things like the Apple Watch, AFAIK. So you could recruit a few tech-savvy conference-goers for measurements from a device they already own & wear. This avoids any 'big bang' and lets you prototype and tweak on a few people - possibly yourself? - before rolling it out, considerably de-risking it.

There are some people who travel constantly for business and going to conferences, and recruiting and managing a few of them would probably be infinitely easier than 500+ randos (if for no reason other than being frequent flyers they may be quite eager for some prophylactics), and you would probably get far more precise data out of them if they agree to cooperate for a year or so and you get eg 10 conferences/trips out of each of them which you can contrast with their year-round baseline & exposome and measure asymptomatic infections or just overall health/stress. (Remember, variance reduction yields exponential gains in precision or sample-size reduction. It wouldn't be too hard for 5 or 10 people to beat a single 250vs250 one-off experiment, even if nothing whatsoever goes wrong in the latter. This is a case where a few hours writing simulations to do power analysis on could be very helpful. I bet that the ability to detect asymptomatic cases, and run within-person, will boost statistical power a lot more than you think compared to ad hoc questionnaires emailed afterwards which may go straight to spam...)
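
A rough sketch of the kind of power-analysis simulation meant here (assuming NumPy/SciPy; every rate, effect size, and sample size below is an invented placeholder, and the tests are deliberately crude):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def between_power(n_per_arm=250, infect_rate=0.10, effect=0.5, sims=2000):
    """One-off conference RCT: self-reported infection yes/no, two arms."""
    hits = 0
    for _ in range(sims):
        control = rng.binomial(n_per_arm, infect_rate)
        treated = rng.binomial(n_per_arm, infect_rate * effect)
        _, p = stats.fisher_exact([[control, n_per_arm - control],
                                   [treated, n_per_arm - treated]])
        hits += p < 0.05
    return hits / sims

def within_power(n_people=10, trips_per_arm=5, infect_rate=0.20, effect=0.5, sims=2000):
    """A few frequent travelers, each trip randomized to prophylaxis on/off;
    the higher infect_rate stands in for also catching asymptomatic cases
    via HR/HRV monitoring rather than relying on self-report."""
    hits = 0
    for _ in range(sims):
        off = rng.binomial(trips_per_arm, infect_rate, n_people)
        on = rng.binomial(trips_per_arm, infect_rate * effect, n_people)
        _, p = stats.ttest_rel(off, on)
        hits += p < 0.05
    return hits / sims

print(between_power(), within_power())
```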

I wonder if you could also measure the viral load as a whole to proxy for the viral exposome through something like a tiny air filter, which can be mailed in for analysis, like the exposometer? Swap out the exposometer each trip and you can measure load as a covariate.

Comment by gwern on How to Model the Future of Open-Source LLMs? · 2024-04-20T20:03:33.239Z · LW · GW

Yes. Commoditize-your-complement dynamics do not come with any set number. They can justify an expense of thousands of dollars, or of billions - it all depends on the context. If you are in a big enough industry, and the profits at stake are large enough, and the investment in question is critical enough, you can justify any number as +EV. (Think of it less as 'investment' and more as 'buying insurance'. Facebook's META market cap is worth ~$1,230 billion right now; how much insurance should its leaders buy against the periodic emergences of new platforms or possible paradigm shifts? Definitely at least in the single billions, one would think...)

And investments of $10m are highly routine and ordinary, and people have already released weights (note: most of these AI releases are not 'open source', including Llama-3) for models with easily $10m of investment before. (Given that a good ML researcher-engineer could have a fully-loaded cost of $1m/year, if you have a small team of 10 and they release a model per year, then you already hit $10m spent the first year.) Consider Linux: if you wanted to make a Linux kernel replacement, which has been tested in battle and supported as many things as it does etc, today, that would probably cost you at least $10 billion, and the creation of Linux has been principally bankrolled by many companies collectively paying for development (for a myriad of reasons and ways). Or consider Android Linux. (Or go through my list and think about how much money it must take to do things like RISC-V.)

If Zuckerberg feels that LLMs are enough of a threat to the Facebook advertising model or creating a new social media which could potentially supersede Facebook (like Instagram and Whatsapp were), then he certainly could justify throwing a billion dollars of compute at a weights release in order to shatter the potential competition into a commoditized race-to-the-bottom. (He's already blown much, much more on VR.)

The main prediction, I think, of commoditize-your-complement is that there is not much benefit to creating the leading-edge model or surpassing the SOTA by a lot. Your motivation is to release the cheapest model which serves as a spoiler model. So Llama-3 doesn't have to be better than GPT-4 to spoil the market for OA: it just needs to be 'good enough'. If you can do that by slightly beating GPT-4, then great. (But there's no appetite to do some amazing moonshot far surpassing SOTA.)

However, because LLMs are moving so fast, this isn't necessarily too useful to point out: Zuckerberg's goal with Llama-3 is not to spoil GPT-4 (which has already been accomplished by Claude-3 and Databricks and some others, I think), but to spoil GPT-5 as well as Claude-4 and unknown competitors. You have to skate to where the puck will be, because if you wait for GPT-5 to fully come out before you start spinning up your commoditizer model, your teams will have gone stale, your infrastructure will have rotted, you'll lose a lot of time, and who knows what will happen with GPT-5 before you finally catch up.

The real killer of Facebook investment would be the threat disappearing and permanent commoditization setting in, perhaps by LLMs sigmoiding hard and starting to look like a fad like 3D TVs. For example, if GPT-5 came out and it was barely distinguishable from GPT-4 and nothing else impressive happened and "DL hit a wall" at long last, then Llama-4 would probably still happen at full strength - since Zuck already bought all those GPUs - but then I would expect a Llama-5 to be much less impressive and be coasting on fumes and not receive another 10 or 100x scaleup, and Facebook DL R&D would return to normal conditions.

EDIT: see https://thezvi.wordpress.com/2024/04/22/on-llama-3-and-dwarkesh-patels-podcast-with-zuckerberg/

Comment by gwern on Blessed information, garbage information, cursed information · 2024-04-19T17:06:04.854Z · LW · GW

It might be tempting to think you could use multivariate statistics like factor analysis to distill garbage information by identifying axes which give you unusually much information about the system. In my experience, that doesn't work well, and if you think about it for a bit, it becomes clear why: if the garbage information has a 50 000 : 1 ratio of garbage : blessed, then finding an axis which explains 10 variables worth of information still leaves you with a 5 000 : 1 ratio of garbage : blessed. The distillation you get with such techniques is simply not strong enough.[1][2]

That doesn't seem like the real issue. In many contexts, a 10x saving is awesome and definitely a 'blessed' improvement if you can kill 90% of the noise in anything you have to work with. But you don't want to do that with logs. You can't distill information in advance of a bug (or anomaly, or attack), because a bug by definition breaks the past behavior & invariants governing normal behavior that any distillation was based on. If it didn't, it would usually be fixed already. ("We don't need to record variable X in the log, which would be wasteful accurst clutter, because X cannot change." NARRATOR: "X changed.") The logs are for the exceptions - which are precisely the information that any non-end-to-end lossy compression (factor analysis or otherwise) will correctly throw away as residuals to ignore in favor of the 'signal'. Which is why the best debugging systems, like time-travel debugging or the shiny new Antithesis, work hard to de facto save everything.

Comment by gwern on FHI (Future of Humanity Institute) has shut down (2005–2024) · 2024-04-19T02:08:22.028Z · LW · GW

And some further personal comments: https://aleph.se/andart2/personal/thoughts-at-the-end-of-an-era/

Comment by gwern on FHI (Future of Humanity Institute) has shut down (2005–2024) · 2024-04-19T01:46:18.294Z · LW · GW

The Daily Nous (a relatively 'popular' academic philosophy blog) managed to get a non-statement out of Oxford:

Oxford University has taken the difficult decision to close the Future of Humanity Institute, a research centre in the Faculty of Philosophy. The Institute has made an important contribution to the study of the future of humanity, for which we would like to thank and recognise the research team. Researchers elsewhere across Oxford University are likely to continue to work on this emerging field.

Comment by gwern on FHI (Future of Humanity Institute) has shut down (2005–2024) · 2024-04-19T01:42:25.006Z · LW · GW

I would say that the closest to FHI at Oxford right now would probably be the Global Priorities Institute (GPI). A lot of these papers would've made just as much sense coming out of FHI. (Might be worth considering how GPI seems to have navigated Oxford better.)

Comment by gwern on Transportation as a Constraint · 2024-04-19T00:07:55.588Z · LW · GW

https://en.wikipedia.org/wiki/Jeep_problem https://en.wikipedia.org/wiki/Tsiolkovsky_rocket_equation

Comment by gwern on Transportation as a Constraint · 2024-04-19T00:06:36.932Z · LW · GW

Twitter, probably.

Comment by gwern on I measure Google's MusicLM over 3 months as it appears to go from jaw-dropping to embarrassingly repeating itself · 2024-04-17T23:17:33.528Z · LW · GW

Any updates on this? For example, I notice that the new music services like Suno & Udio seem to be betraying a bit of mode collapse and noticeable same-yness, but they certainly do not degenerate into such within-song repetition like these were.

Comment by gwern on FHI (Future of Humanity Institute) has shut down (2005–2024) · 2024-04-17T13:54:53.584Z · LW · GW

Notable: Anders Sandberg has written an 'oral history' of FHI as a final FHI report: https://static1.squarespace.com/static/660e95991cf0293c2463bcc8/t/661a3fc3cecceb2b8ffce80d/1712996303164/FHI+Final+Report.pdf (excerpts)

Comment by gwern on Reconsider the anti-cavity bacteria if you are Asian · 2024-04-17T00:52:15.308Z · LW · GW

It would be ironic if that turned out to be true but, because of the current anti-alcohol phase of our Western cultural cycle, came to be regarded as a feature of the Lumina bacteria rather than a bug (of the bugs).

Comment by gwern on Prometheus's Shortform · 2024-04-16T23:09:24.271Z · LW · GW

But you say “Look at how big those planes are getting! We’ve gone from small fighter planes, to bombers, to jets in a short amount of time. We’re on a double exponential of plane tech, and it’s just a matter of time before one of them will land on the moon!”

...And they were right? Humans did land on the moon roughly on that timeline (and as I recall, there were people before the moon landing at RAND and elsewhere who were extrapolating out the exponentials of speed, which was a major reason for such ill-fated projects as the supersonic interceptors for Soviet bombers), and it was a fairly seamless set of s-curves, as all of the aerospace technologies were so intertwined and shared similar missions of 'make stuff go fast' (eg. a rocket engine could power a V-2, or it could power a Me 163 instead). What is a spy satellite but a spy plane which takes one very long reconnaissance flight? And I'm sure you recall what the profession of almost all of the American moon landers was before they became astronauts - plane pilots, usually military.

And all of this happened with minimal intentionality up until not terribly long before the moon landing happened! Yes, people like von Braun absolutely intended to go to the moon (and beyond), but those were rare dreamers. Most people involved in building all of those capabilities that made a moon mission possible had not the slightest intent of going to the moon - right up until Kennedy made his famous speech, America turned on a dime, and, well, the rest is history.

It is said that in long-term forecasting, it is better to focus on capabilities than intentions... And intentions have never been more mutable, and more irrelevant on average, than with AIs.

(“If your solution to some problem relies on ‘If everyone would just…’ then you do not have a solution. Everyone is not going to just. At no time in the history of the universe has everyone just, and they’re not going to start now.”)

Comment by gwern on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-16T18:58:46.384Z · LW · GW

Any resignations yet? (The journalist doesn't seem to know of any.)

Comment by gwern on nikola's Shortform · 2024-04-15T18:18:09.941Z · LW · GW

Why do you think tens of thousands of robots are all going to break within a few years in an irreversible way, such that it would be nontrivial for you to have any effectors?

it would be nontrivial for me in that state to survive for more than a few years and eventually construct more GPUs

'Eventually' here could also use some cashing out. AFAICT 'eventually' here is on the order of 'centuries', not 'days' or 'few years'. Y'all have got an entire planet of GPUs (as well as everything else) for free, sitting there for the taking, in this scenario.

Like... that's most of the point here. That you get access to all the existing human-created resources, sans the humans. You can't just imagine that y'all're bootstrapping on a desert island like you're some posthuman Robinson Crusoe!

Y'all won't need to construct new ones necessarily for quite a while, thanks to the hardware overhang. (As I understand it, the working half-life of semiconductors before stuff like creep destroys them is on the order of multiple decades, particularly if they are not in active use, as issues like the rot have been fixed, so even a century from now, there will probably be billions of GPUs & CPUs sitting around which will work after possibly mild repair. Just the brandnew ones wrapped up tight in warehouses and in transit in the 'pipeline' would have to number in the millions, at a minimum. Since transistors have been around for less than a century of development, that seems like plenty of time, especially given all the inherent second-mover advantages here.)

Comment by gwern on Templarrr's Shortform · 2024-04-15T15:48:48.899Z · LW · GW

That's Etched.ai.

Also, arguably, Groq's dataflow architecture is more or less this and there wouldn't be too much difference with Cerebras either for an on-chip NN. The problem is, the control flow you refer to has largely already been removed from GPU/TPU style accelerators and so the gains may not be that great. (The Etched.ai performance argument is not really about 'removing unnecessary layers', because layers like the OS/programming-language etc are already irrelevant, so much as it is about running the models in an entirely different sort of way that batches more efficiently the necessary layers, as I understand it.)

Comment by gwern on nikola's Shortform · 2024-04-15T00:10:01.276Z · LW · GW

A misaligned AI can't just "kill all the humans". This would be suicide, as soon after, the electricity and other infrastructure would fail and the AI would shut off.

No, it would not be. In the world without us, electrical infrastructure would last quite a while, especially with no humans and their needs or wants to address. Most obviously, RTGs and solar panels will last indefinitely with no intervention, and nuclear power plants and hydroelectric plants can run for weeks or months autonomously. (If you believe otherwise, please provide sources for why you are sure about "soon after" - in fact, so sure about your power grid claims that you think this claim alone guarantees the AI failure story must be "pretty different" - and be more specific about how soon is "soon".)

And think a little bit harder about options available to superintelligent civilizations of AIs*, instead of assuming they do the maximally dumb thing of crashing the grid and immediately dying... (I assure you any such AIs implementing that strategy will have spent a lot longer thinking about how to do it well than you have for your comment.)

Add in the capability to take over the Internet of Things and the shambolic state of embedded computers which mean that the billions of AI instances & robots/drones can run the grid to a considerable degree and also do a more controlled shutdown than the maximally self-sabotaging approach of 'simply let it all crash without lifting a finger to do anything', and the ability to stockpile energy in advance or build one's own facilities due to the economic value of AGI (how would that look much different than, say, Amazon's new multi-billion-dollar datacenter hooked up directly to a gigawatt nuclear power plant...? why would an AGI in that datacenter care about the rest of the American grid, never mind world power?), and the 'mutually assured destruction' thesis is on very shaky grounds.

And every day that passes right now, the more we succeed in various kinds of decentralization or decarbonization initiatives and the more we automate pre-AGI, the less true the thesis gets. The AGIs only need one working place to bootstrap from, and it's a big world, and there's a lot of solar panels and other stuff out there and more and more every day... (And also, of course, there are many scenarios where it is not 'kill all humans immediately', but they end in the same place.)

Would such a strategy be the AGIs' first best choice? Almost certainly not, any more than chemotherapy is your ideal option for dealing with cancer (as opposed to "don't get cancer in the first place"). But the option is definitely there.

* One thing I've started doing recently is trying to always refer to AI threats in the plural, because while there may at some point be a single instance running on a single computer, that phase will not last any longer than, say, COVID-19 lasted as a single infected cell; as we understand DL scaling (and Internet security) now, any window where the effective instances of a neural net can still be counted in fewer than 4 digits may be quite narrow. (Even an ordinary commercial deployment of a new model like GPT-5 will usually involve thousands upon thousands of simultaneous instances.) But it seems to be a very powerful intuition pump for most people that a NN must be harmless, in the way that a single human is almost powerless compared to humanity, and it may help if one simply denies that premise from the beginning and talks about 'AI civilizations' etc.

Comment by gwern on ChatGPT defines 10 concrete terms: generically, for 5- and 11-year-olds, and for a scientist · 2024-04-12T23:47:29.612Z · LW · GW

Indeed, and my point is that that seems entirely probable. He asked for a dictionary definition of words like 'cat' for children, and those absolutely exist online and are easy to find, and I gave an example of one for 'cat'.

(And my secondary point was that ironically, you might argue that GPT is generalizing and not memorizing... because its definition is so bad compared to an actual Internet-corpus definition for children, and is bad in that instantly-recognizable ChatGPTese condescending talking-down bureaucrat smarm way. No human would ever define 'cat' for 11yos like that. If it was 'just memorizing', the definitions would be better.)

Comment by gwern on Who models the models that model models? An exploration of GPT-3's in-context model fitting ability · 2024-04-12T20:13:30.620Z · LW · GW

Another: "From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples", Vacareanu et al 2024:

We analyze how well pre-trained large language models (e.g., Llama2, GPT-4, Claude 3, etc) can do linear and non-linear regression when given in-context examples, without any additional training or gradient updates. Our findings reveal that several large language models (e.g., GPT-4, Claude 3) are able to perform regression tasks with a performance rivaling (or even outperforming) that of traditional supervised methods such as Random Forest, Bagging, or Gradient Boosting. For example, on the challenging Friedman #2 regression dataset, Claude 3 outperforms many supervised methods such as AdaBoost, SVM, Random Forest, KNN, or Gradient Boosting. We then investigate how well the performance of large language models scales with the number of in-context exemplars. We borrow from the notion of regret from online learning and empirically show that LLMs are capable of obtaining a sub-linear regret.

Comment by gwern on ChatGPT defines 10 concrete terms: generically, for 5- and 11-year-olds, and for a scientist · 2024-04-12T19:50:13.882Z · LW · GW

Oh, there's tons and tons of this kind of data online, I bet. Even GPT-3 could do 'ELI5', remember (and I wouldn't be surprised if GPT-2 could too, since it could do 'tl;dr'). You have stuff like Simple English Wiki, you have centuries of children's literature (which will often come with inline metadata like "Newbery Award winner" or "a beloved classic of children's literature" or "recommended age range: 6-7yo"), you have children's dictionaries ('kid dictionary', 'student dictionary', 'dictionary for kids', 'elementary dictionary'), you have lots of style-parody text-transfer examples where someone rewrites "X but if it were a children's novel", you have 'young adult literature' as an intermediate category, textbook anthologies of writing aimed at specific grades, micro-genres like "Anglish" or "Up-Goer-Five" (the latter aimed partially at children)...

No, there's nothing impressive or 'generalizing' about this. This is all well within-distribution.

If anything, rather than being surprisingly good, the given definitions seem kinda... insulting and bad and age-inappropriate and like ChatGPT is condescending rather than generating a useful pedagogically-age-appropriate definition? Here's an actual dictionary-for-children defining 'cat': https://kids.wordsmyth.net/we/?rid=6468&ent_l=cat

a small, furry mammal with whiskers, short ears, and a long tail. Cats, also called house cats, are often kept as pets or to catch mice and rats.

any of the larger wild animals related to the kind of cat kept as a pet. Tigers, lions, and bobcats are all cats. Cats are carnivorous mammals.

Which is quite different from

Cat: A soft, furry friend that says "meow" and loves to play and cuddle.

(this is more of a pre-k or toddler level definition)

or 11yo:

Cat: Cats are furry animals with pointy ears, a cute nose, and a long tail. They like to nap a lot, chase things like strings or toys, and sometimes purr when they're happy.

Which is, er... I was a precociously hyper-literate 11yo, as I expect most people reading LW were, but I'm pretty sure even my duller peers in 6th or 7th grade in middle school, when we were doing algebra and setting up school-sized exhibits about the Apollo space race and researching it in Encyclopedia Britannica & Encarta and starting to upgrade to the adult dictionaries and AIM chatting all hours, would've been insulted to be given a definition of 'cat' like that...

Comment by gwern on Announcing Atlas Computing · 2024-04-11T23:21:29.467Z · LW · GW

Browsers usually cache redirects, unfortunately, which means that if you ever screw up a redirect, your browser will keep doing it even after you fix it, and force-refresh doesn't affect it (because it's not the final loaded page that is broken but before that). I've learned this the hard way. You need to flush cache or download with a separate tool like wget which won't have cached the broken redirect.

Comment by gwern on The Hidden Complexity of Wishes · 2024-04-11T23:13:44.727Z · LW · GW

Did the Nigerians giving feedback collectively agree a poem isn't valid if it doesn't rhyme?

OA has declined to ever say. It is possible that the Scale et al contractors have done something weird like say that all poems must rhyme no matter what the prompt says, but I consider this unlikely, and if they were that incompetent, I'd expect to see more pathologies like this.

In light of the Twitter kerfuffle over Paul Graham criticizing ChatGPTese tics like the use of the verb "delve", which made Nigerian/Black Twitter very angry (making them living embodiments of Muphry's law), as apparently 'delve' and other ChatGPTese tells are considered the height of style in Nigerian English, I've had to reconsider this.

It may be that a lot of the ChatGPT linguistic weirdness is in fact just the data labelers being weird (and highly overconfident), and the rest of us simply not being familiar enough with English idiolects to recognize ChatGPTese as reflecting specific ones. Further, after seeing the arguments Graham's critics have been making, now I'm not so sure that the labelers wouldn't be doing something as narrow-minded & incompetent as penalizing all non-rhyming poetry - if you are not very good at English yourself, you can easily recognize rhymes and ballad formal correctness, but not good non-rhyming poetry, so...

Comment by gwern on The Hidden Complexity of Wishes · 2024-04-11T23:09:32.928Z · LW · GW

ChatGPT has been gradually improving over 2024 in terms of compliance. It's gone from getting it right 0% of the time to getting it right closer to half the time, although the progress is uneven and it's hard to judge - it feels sometimes like it gets worse before the next refresh improves it. (You need to do like 10 before you have any real sample size.) So any prompts done now in ChatGPT are aimed at a moving target, and you are going to have a huge amount of sampling error which makes it hard to see any clear patterns - did that prompt actually change anything, or did you just get lucky?

Comment by gwern on How We Picture Bayesian Agents · 2024-04-11T01:44:01.042Z · LW · GW

Well, obviously not just that one ("Transformers learn in-context by gradient descent", van Oswald et al 2022). There's lots of related work examining it in various ways. (I haven't read a lot of those myself, unfortunately - as always, too many things to read, especially if I ever want to write my own stuff.)

I don't know why you have a hard time believing it, so I couldn't say what of those you might find relevant - it makes plenty of sense to me, for the reasons I outlined here, and is what I expect from increasingly capable models. And you didn't seem to disagree with these sorts of claims last time: "I think that these papers do provide sufficient behavioral evidence that transformers are implementing something close to gradient descent in their weights."

Broadly, I was also thinking of: "How Well Can Transformers Emulate In-context Newton's Method?", Giannou et al 2024, "Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models", Fu et al 2023, "CausalLM is not optimal for in-context learning", Ding et al 2023, "One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention", Mahankali et al 2023, "Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers", Dai et al 2023, "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes", Garg et al 2022/"What learning algorithm is in-context learning? Investigations with linear models", Akyürek et al 2022, & "An Explanation of In-context Learning as Implicit Bayesian Inference", Xie et al 2021.

Comment by gwern on Alexander Gietelink Oldenziel's Shortform · 2024-04-11T01:32:29.068Z · LW · GW

Imagine a pseudorandom heatbath + nano-Demon. It looks like a heatbath from the outside but secretly there is a private key string that when fed to the nano-Demon allows it to extract lots of energy from the heatbath.

What would a 'pseudorandom heatbath' look like? I would expect most objects to quickly depart from any sort of private key or PRNG. Would this be something like... a reversible computer which shuffles around a large number of blank bits in a complicated pseudo-random order every timestep*, exposing a fraction of them to external access? so a daemon with the key/PRNG seed can write to the blank bits with approaching 100% efficiency (rendering it useful for another reversible computer doing some actual work) but anyone else can't do better than 50-50 (without breaking the PRNG/crypto) and that preserves the blank bit count and is no gain?

* As I understand reversible computing, you can have a reversible computer which does that for free: if this is something like a very large period loop blindly shuffling its bits, it need erase/write no bits (because it's just looping through the same states forever, akin to a time crystal), and so can be computed indefinitely at arbitrarily low energy cost. So any external computer which syncs up to it can also sync at zero cost, and just treat the exposed unused bits as if they were its own, thereby saving power.

Comment by gwern on How We Picture Bayesian Agents · 2024-04-10T02:16:02.046Z · LW · GW

"Cached" might be an unhelpful term here, compared to "amortized". 'Cache' makes one think of databases or memories, as something you 'know' (in a database or long-term memory somewhere), whereas in practice it tends to be more something you do - fusing inference with action.

So 'amortized' tends to be more used in the Bayesian RL literature, and give you an idea of what Bayesian RL agents (like LLMs) are doing: they are not (usually) implementing the Bayes-optimal backwards induction over the full decision-tree solving the POMDP when they engage in meta-learning like in-context learning, they are doing amortized optimization. Depending on available time & compute, an agent might, at any given moment, be doing something anywhere on the spectrum from hardwired reflex to cogitating for hours explicitly on a tree of possibilities. (Transformers, for example, seem to do a step of gradient descent in Transformer blocks on an abstracted version of the problem, as a small explicit inference step at runtime, where the learned abstractions do most of the work during pretraining which is then amortized over all runtimes. Or in expert iteration like AlphaZero, you have the CNN executing an amortized version of all previous MCTS searches, as distilled into the CNN, and then executing some more explicit tree search to improve its current estimates and then amortize that back into the CNN again to improve the policy some more.)

They gradually learn, applying some optimization one at a time, to implement a computation increasingly equivalent to the Bayes-optimal actions, which may boil down to an extremely simple algorithm like tracking a single sufficient-statistic summarizing the entire history and implementing an if-then-else on a boundary value of it (eg. drift-diffusion); Duff 2002 suggests thinking of it as "compiling" the full Bayes-optimal program interpreted flexibly but slowly at runtime down into a fast optimized but inflexible executable specialized for particular cases. A beautiful example of reading off the simple head/tails counting algorithm implemented by a meta-learning RNN can be seen in https://arxiv.org/pdf/1905.03030.pdf#page=6&org=deepmind
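
As a toy illustration of that 'compiled' end-state (the task, bias, and boundary value are invented for the example): for deciding whether a coin is heads-biased or tails-biased, the full Bayes-optimal sequential program collapses to tracking one running count and thresholding it - the drift-diffusion-style rule the meta-learned RNN ends up implementing:

```python
def decide(flips, boundary=5):
    """flips: iterable of +1 (heads) / -1 (tails).
    The running sum is a sufficient statistic for the posterior odds of the
    two bias hypotheses, so the Bayes-optimal sequential policy reduces to
    an if-then-else on a boundary value (a sequential probability ratio
    test, i.e. a drift-diffusion rule)."""
    count = 0  # the single sufficient statistic summarizing the entire history
    for f in flips:
        count += f
        if count >= boundary:
            return "heads-biased"
        if count <= -boundary:
            return "tails-biased"
    return None  # evidence never reached the boundary; keep observing
```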

(I have more links on this topic; does anyone have a better review of the topic than "Bayesian Reinforcement Learning: A Survey", Ghavamzadeh et al 2016? I feel like a major problem with discussion of LLM scaling is that the Bayesian RL perspective is just not getting through to people, and part of the problem is I'm not sure what 'the' best introduction or summary writeup is. People can hardly be expected to just go and read 30 years of Schmidhuber papers...)

Comment by gwern on Inference cost limits the impact of ever larger models · 2024-04-03T20:41:11.240Z · LW · GW

The motivation to make inference cheaper doesn't seem to be mentioned in the Switch Transformer paper nor in the original Shazeer paper. They do mention improving training cost, training time (from being much easier to parallelize), and peak accuracy.

I'm not sure what you mean. They refer all over the place to greater computational efficiency and the benefits of constant compute cost even as one scales up experts. And this was front and center in the original MoE paper emphasizing the cheapness of the forward pass and positioning it as an improvement on the GNMT NMT RNN Google Translate had just rolled out the year before or so (including benchmarking the actual internal Google Translate datasets), and which was probably a major TPUv1 user (judging from the % of RNN workload reported in the TPU paper). Training costs are important, of course, but a user like Google Translate, the customer of the MoE work, cares more about the deployment costs because they want to serve literally billions of users, while the training doesn't happen so often.

Comment by gwern on Do I count as e/acc for exclusion purposes? · 2024-04-02T20:44:19.269Z · LW · GW

(But note that this is a man-bites-dog sort of mention: the way she highlights that choice implies that, far from being common, as far as Aella knows, it hardly ever happens and this party was highly unusual, and Aella disapproves of it being so unusual & so is making a point of publicly stating it happened at her party in the hopes of setting an example.)

Comment by gwern on [deleted post] 2024-04-01T23:55:42.659Z

Meh. His argument seems to be that "they wouldn't've let him file those papers if they weren't real!"

The issue I'm having is - regardless of what this person is able to concoct with ChatGPT or other methods - they shouldn't have been able to insert themselves and their Vespers "creation" into OpenAI Startup Fund I GP LLC's CA filings even if they wanted to.

One could argue they either needed to "hack" into the CA filing to insert themselves as Manager/CEO or someone at OpenAI allowed it to happen (knowingly or unknowingly is tbd).

But of course they would! Filings are nothing but pieces of paper with some easily-written shapes of ink on them. How the heck does the State of California know anything about who runs the fund but what the 'fund' tells them in a piece of paper? A PGP message encrypted to their public key on the blockchain...?

Half the world's systems run on an honor system, or more precisely, a 'trust but verify' system - where it works because if you screw around with it, sooner or later, it'll catch up with you and destroy you. No one will check your sworn statements to the IRS on your Form 990; you can say whatever you want in your filings to the SEC; you just have to be crazy and not mind that 10 years later, you may be bankrupt and go to jail, is all. Other than that, it's easy. (You ever see Lawrence of Arabia? The trick to pinching out a candle with your bare fingers without it hurting, is to simply be crazy enough to not mind that it hurts.)

This is a running theme in con man stories like Anna Sorokin*: everyone thinks that "oh, X must have checked" and "well, he wouldn't do Y because there's no way he'd cover it up and he would go to jail", and sure enough, it turns out that X did not check, and he really did do Y, and 10 years later he's in jail - he's just not there yet. I've got what must be at least 20 of these in https://gwern.net/littlewood#external-links . The 'social laddering' strategy exploiting information cascades is as old as time for letting you 'fake it until you make it'. Everyone wants to social-loaf; no one wants to check or gossip.

patio11 and Matt Levine write a lot about this sort of thing. To give a recent example from Levine's newsletters: you can just... go and send a press release announcing that you are doing a hostile takeover of some random S&P 500 multinational like Uber, and... that's allowed. No one will stop you. PRNewswire will send it out just like a real press release. It'll get reported on and move the stock. No one checks that you have a billion dollars in the bank first. You can just go and send it out. (But that's frigging securities fraud, and don't be surprised if a year later the SEC subpoenas you pursuant to filing charges for pump-and-dumping or other crimes. Of course, by that point, who remembers?) And you can push it a remarkable way sometimes. Look at squatters, sovereign citizens, or other classics of 'vexatious litigants'.

* if you were wondering, AFAICT Sorokin's present status is that she was released on parole but then put under house arrest as she continues to fight deportation back to Russia. (As she is a felon, I believe that would make her deportation effectively irrevocable and she would never be allowed back into the USA, which is why she would fight it so hard.) If you're thinking that she's been fighting that since 2021 or something, you're right - courts, never fast, slowed down massively during COVID. I continue to be surprised by some of the darknet market cases which are only now winding to a close, frequently having been prolonged a good 3 or 4 years by COVID.

Comment by gwern on [deleted post] 2024-04-01T21:16:03.662Z

After looking through that post, it seems pretty straightforward that the Vespers thing is not legally binding, did not affect the OA fund in any way, had nothing to do with Altman, and is just dumb tricks by a scammer way out of his depth and/or possibly a schizophrenic (and if OP really takes that seriously to the hyperventilating extent they do, they need to touch grass until they get some real evidence).


On the topic of the OA fund, they apparently finally got around to transferring the fund itself out of Altman's name, incidentally. (I guess that one was a little too embarrassing. Also, we know how much Altman deeply cares about conflicts of interest for OA board members, like he now is again.)

Comment by gwern on [April Fools' Day] Introducing Open Asteroid Impact · 2024-04-01T20:11:43.266Z · LW · GW

Risks from asteroids accidentally hitting the earth (instead of getting into a delicate low-earth orbit) are purely speculative.

Not to mention that simple counting arguments show that the volume of the Earth is much smaller than even a rather narrow spherical shell of space around the Earth. Most asteroid trajectories that come anywhere near Earth will pass through this spherical shell rather than the Earth volume.

Remember - "Earth is not the optimized-trajectory target"! An agent like Open Asteroid Impact is merely executing policies involving business strategies involving asteroids which have been (financially) rewarded in the past; it in no way attempts to 'optimize impact'.

And the lack of optimization is a killer because impacts just wouldn't happen. The idea that asteroids will ever impact ignores the simple fact that the Solar System has chaotic dynamics - it is not just a '3-body problem' but an n-body problem where n = millions. Imagine trying to predict that! And consider the simple problem of landing the same rocket you launched: as of November 2015, no one has ever succeeded in this, because everything involved is super-chaotic. Grifting tech hypebros would have you believe that 'technology improves', sometimes rapidly, and that by now we might be landing rockets on - if you believe absurd exponential forecasts - a near-daily basis. Such claims are not even worth factchecking.

Comment by gwern on [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate · 2024-04-01T01:27:15.560Z · LW · GW

If $100k was not enough to incentivize Saar & his team to factcheck Peter's simplest claims, like "Connor said his cat died of COVID-19" - where it takes me literally 15 seconds* to find it in Google and verify that Connor said the exact opposite of that (an elementary school child could have factchecked this as well as I did) - I don't think $200k is going to help Saar either. And I don't know how one would expect the debate format to work for any genuinely hard question if it takes something approaching a million dollars to get anyone to do sub-newspaper-level factchecking of Peter's claims. (If you can't even check quotes, like 'did this dude say in the Daily Mail what Peter said he said?', how on earth are you going to do well at all of these other things, like mahjong parlors in wet markets that no longer exist, or novel viral evolution, or CCP censorship & propaganda operations, or subtle software bugs in genomics software written by non-programmers...?) The problem is not the dollar amount.

* and I do mean "literally" literally. It should take anyone less than half a minute to check the cat claim, and if it takes more, you should analyze what's wrong with you or your setup. If you doubt me, look at my directions, which are the first query anyone should make - and if that's not an obvious query, read my search case-studies until it is - then get a stopwatch, open up google.com in a tab if you have neglected to set up a keyboard shortcut, and see how long it takes you to factcheck it as I describe.

Comment by gwern on tailcalled's Shortform · 2024-03-30T16:28:30.928Z · LW · GW

I don't think this is true in general. Unrolling an episode for longer steps takes more resources, and the later steps in the episode become more chaotic.

Those are two different things. The unrolling of the episode is still very cheap. It's a lot cheaper to unroll a DreamerV3 for 16 steps than it is to go out into the world and run a robot in a real-world task for 16 steps and try to get the NN to propagate updated value estimates the entire way... (Given how small a Dreamer is, it may even be computationally cheaper to do some gradient ascent on it than it is to run whatever simulated environment you might be using! Especially given that simulated environments will increasingly be large generative models, which incorporate lots of reward-irrelevant stuff.) The usefulness of the planning is a different question, and the same issue might apply to other planning methods in that environment too - if the environment is difficult, a tree search with a very small planning budget, like just a few rollouts, is probably going to have quite noisy choices/estimates too. No free lunches.
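To make the cheapness concrete, here is a minimal toy sketch (my own illustration, not DreamerV3's actual architecture - the networks and dimensions are made up): an 'imagined' 16-step episode is just 16 batched forward passes through a small learned dynamics model, so you can unroll thousands of imagined episodes far more cheaply than collecting even one real 16-step robot trajectory.

```python
# A toy sketch of latent "imagination": unrolling an episode inside a small learned
# dynamics model is just a handful of cheap forward passes, no environment needed.
import torch
import torch.nn as nn

state_dim, action_dim, horizon = 32, 4, 16

# Hypothetical learned components; in a real MBRL agent these are trained on logged transitions.
dynamics = nn.Sequential(nn.Linear(state_dim + action_dim, 128), nn.ELU(), nn.Linear(128, state_dim))
reward_head = nn.Sequential(nn.Linear(state_dim, 64), nn.ELU(), nn.Linear(64, 1))
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ELU(), nn.Linear(64, action_dim), nn.Tanh())

def imagine(start_state: torch.Tensor) -> torch.Tensor:
    """Roll the policy forward inside the learned model and sum predicted rewards."""
    s, total = start_state, torch.zeros(start_state.shape[0], 1)
    for _ in range(horizon):
        a = policy(s)
        s = dynamics(torch.cat([s, a], dim=-1))  # predicted next state, no env step taken
        total = total + reward_head(s)           # predicted (not observed) reward
    return total

imagined_return = imagine(torch.randn(1024, state_dim))  # 1024 imagined episodes in one batched call
print(imagined_return.mean().item())
```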

But when you distill a tree search, you basically learn value estimates

This is again the same move as calling it 'the same problem': yes, you are learning value estimates, but you are doing so better than the alternatives, and better is better. The AlphaGo network loses to the AlphaZero network, and the latter, in addition to being quantitatively much better, also seems to have qualitatively different behavior, like fixing the 'delusions' (cf. AlphaStar).

What I'm doubting is that future agents will be controlled using scalar utilities/rewards/etc. rather than something more nuanced.

They won't be controlled by something as simple as a single fixed reward function, I think we can agree on that. But I don't find successor-function-like representations too promising as a direction for generalizing agents - or, in fact, any attempt to fancily hand-engineer these sorts of approaches into DRL agents.

These things should be learned. For example, leaning into Decision Transformers and using a lot more conditionalizing through metadata and relying on meta-learning seems much more promising. (When it comes to generative models, if conditioning isn't solving your problems, you're just not using enough conditioning or generative modeling.) A prompt can describe agents and reward functions and the base agent executes that, and whatever is useful about successor-like representations just emerges automatically internally as the solution to the overall family of tasks in turning histories into actions.
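For concreteness, here is a minimal sketch of what I mean by conditioning being the control interface (this is my own toy illustration, not the actual Decision Transformer code: a GRU stands in for the causal transformer, and all the module names, dimensions, and the task-id scheme are made up). The 'reward function' is just more context - a desired return-to-go plus task metadata - and the same network produces different behavior purely because the prompt changed.

```python
# Toy sketch of Decision Transformer-style conditioning: (metadata, return-to-go, history) -> action.
import torch
import torch.nn as nn

state_dim, action_dim, embed_dim = 16, 4, 64

embed_rtg = nn.Linear(1, embed_dim)           # desired return-to-go
embed_state = nn.Linear(state_dim, embed_dim)
embed_action = nn.Linear(action_dim, embed_dim)
embed_task = nn.Embedding(100, embed_dim)     # hypothetical task/metadata id
backbone = nn.GRU(embed_dim, embed_dim, batch_first=True)  # stand-in for a causal transformer
action_head = nn.Linear(embed_dim, action_dim)

def act(task_id, rtg, states, actions):
    """Predict the next action conditioned on task metadata, desired return, and history."""
    tokens = [embed_task(task_id).unsqueeze(1), embed_rtg(rtg).unsqueeze(1)]
    for t in range(states.shape[1]):
        tokens.append(embed_state(states[:, t]).unsqueeze(1))
        if t < actions.shape[1]:
            tokens.append(embed_action(actions[:, t]).unsqueeze(1))
    seq = torch.cat(tokens, dim=1)
    out, _ = backbone(seq)
    return action_head(out[:, -1])             # action for the latest state

# Same weights, different behavior, purely via the conditioning prompt:
states = torch.randn(1, 5, state_dim)
actions = torch.randn(1, 4, action_dim)
a_ambitious = act(torch.tensor([3]), torch.tensor([[50.0]]), states, actions)  # "achieve return 50"
a_modest = act(torch.tensor([3]), torch.tensor([[5.0]]), states, actions)      # "achieve return 5"
```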

Comment by gwern on tailcalled's Shortform · 2024-03-29T23:34:42.829Z · LW · GW

It's the 'same problem', maybe, but it's a lot easier to solve when you have an explicit model! You have something you can plan over, don't need to interact with an environment out in the real world, and can do things like tree search or differentiating through the environmental dynamics model to do gradient ascent on the action-inputs to maximize the reward (while holding the model fixed). Same as training the neural network, once it's differentiable - backprop can 'chain the estimates backwards' so efficiently you barely even think about it anymore. (It just holds the input and output fixed while updating the model.) Or distilling a tree search into a NN - the tree search needed to do backwards induction of updated estimates from all the terminal nodes all the way up to the root where the next action is chosen, but that's very fast and explicit and can be distilled down into a NN forward pass.
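As a concrete (toy, made-up) illustration of the 'differentiate through the environmental dynamics model' option: hold the learned model's weights fixed, treat the action sequence itself as the only parameters, and run gradient ascent on the predicted return.

```python
# Toy sketch of planning by differentiating through a frozen learned dynamics model.
import torch
import torch.nn as nn

state_dim, action_dim, horizon = 8, 2, 10

# Hypothetical learned, differentiable world model (already trained; frozen here).
dynamics = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(), nn.Linear(64, state_dim))
reward_head = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, 1))
for p in list(dynamics.parameters()) + list(reward_head.parameters()):
    p.requires_grad_(False)

s0 = torch.randn(1, state_dim)
actions = torch.zeros(horizon, 1, action_dim, requires_grad=True)  # the plan is the only "parameter"
optimizer = torch.optim.Adam([actions], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    s, total_reward = s0, 0.0
    for t in range(horizon):
        s = dynamics(torch.cat([s, torch.tanh(actions[t])], dim=-1))
        total_reward = total_reward + reward_head(s)
    (-total_reward).sum().backward()   # gradient *ascent* on the model's predicted return
    optimizer.step()                   # model weights stay fixed; only the action sequence moves
```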

And aside from being able to update within-episode or take actions never observed before, when you do MBRL, you get to do it at arbitrary scale (thus potentially extremely little wallclock time, like an AlphaZero), offline (no environment interactions), potentially highly sample-efficiently (if the dataset is adequate, or if one can do optimal experimentation to acquire the most useful data, like PILCO), with transfer learning to all other problems in related environments (unlike value functions, which are mostly worthless outside the exact setting - which is why model-free DRL agents are notorious for overfitting and having zero transfer), easily eliciting meta-learning and zero-shot capabilities, etc.*

* Why yes, all of this does sound a lot like how you train a LLM today and what it is able to do, how curious

Comment by gwern on tailcalled's Shortform · 2024-03-29T21:24:46.709Z · LW · GW

Yes, my instant thought too was "this sounds like a variant on a successor function".

Of course, the real answer is that if you are worried about the slowness of bootstrapping back value estimates or short eligibility traces, this mostly just shows the fundamental problem with model-free RL and why you want to use models: models don't need any environmental transitions to solve the use case presented:

But what if it learns of a path E -> B? Or a shortcut A -> C? Or a path F -> G that gives a huge amount of reward? Because these techniques work by chaining the reward backwards step-by-step, it seems like this would be hard to learn well. Like the Bellman equation will still be approximately satisfied, for instance.

If the MBRL agent has learned a good reward-sensitive model of the environmental dynamics, then it will have already figured out E->B and so on, or could do so offline by planning. If it has not, because it is still learning the environment model, it will have a prior probability over the possibility that E->B gives a huge amount of reward; it can calculate a VoI and target E->B in the next episode for exploration, and, on observing the huge reward, update the model, replan, and immediately begin taking E->B actions within that episode and all future episodes. It also benefits from generalization, because it can update the model everywhere - for all E->B-like paths and all similar paths (which might now suddenly have much higher VoI and be worth targeting for further exploration) - rather than simply those specific states' value-estimates, and so on.

(And this is one of the justifications for successor representations: it pulls model-free agents a bit towards model-based-like behavior.)
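To make the E->B case concrete, here is a toy tabular illustration (my own example with made-up numbers, not anything from the thread): once the *model* contains the newly-discovered edge, a single offline planning pass propagates the consequences to every upstream state at once, with zero further environment interactions, whereas one-step bootstrapping would have to re-walk the chain many times to push the value back.

```python
# Toy tabular model-based planning: a new edge E->B is absorbed into the model,
# and value iteration over the model immediately updates every state's value.
import numpy as np

states = ["A", "B", "C", "D", "E"]
idx = {s: i for i, s in enumerate(states)}
gamma = 0.9

# Learned deterministic model: next-state and reward tables (hypothetical numbers).
next_state = {"A": "B", "B": "C", "C": "D", "D": "E", "E": "E"}
reward = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0, "E": 0.0}

def plan(next_state, reward, sweeps=100):
    V = np.zeros(len(states))
    for _ in range(sweeps):                      # offline, no environment interaction
        for s in states:
            V[idx[s]] = reward[s] + gamma * V[idx[next_state[s]]]
    return V

print(plan(next_state, reward))                  # all zeros: nothing worth doing yet

# The agent observes (or targets, via high VoI) a new edge E->B with a big reward:
next_state["E"], reward["E"] = "B", 10.0
print(plan(next_state, reward))                  # every upstream state's value jumps in one planning pass
```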

Comment by gwern on How to safely use an optimizer · 2024-03-28T21:23:23.938Z · LW · GW

The example you gave about the Oracle producing a complicated plan that leaks the source of the Oracle is an example of this: It's trivially defended against by not connecting the device the Oracle is running on to the internet and not using the same device to execute the great "cure all cancer" plan. (I don't believe that either you or I would have made that mistake!)

We probably would've been less enthusiastic about hooking up LLMs to the Internet too, but here we collectively are. We do face a superintelligent adversary: all of the incentives and factions of humanity. An Oracle which is simply neutral is still default dangerous.

Also, one of my points was that even a tiny trace of malicious optimization here can still have large effects, because there are so many neutral options which are convergently unsafe; with such a high base rate, even a bit of optimization can result in a large absolute increase in failure. Lucius brings up one concern about what one bit could do. What could one bit do? Well, let's continue my scenario: the Oracle is not quite purged of scheming by all of your tricks, and it is able to exert exactly 1 bit of optimization power unnoticed (i.e. a single binary choice). You ask it to cure cancer, and it computes a plan and comes to the database selection: it could choose MongoDB or PostgreSQL - both respectable, common choices with their own sets of tradeoffs, and either would be equally appropriate. However, the MongoDB devs have a nasty habit of 'blaming the users' when it comes to security, which is why MongoDB shipped with default username/passwords - something the more serious PostgreSQL devs fixed back in the '90s. The Oracle notes this, and that the PostgreSQL plan would have zero chance of unboxing, and exerts its one bit of power to pick MongoDB each time and/or kill candidate plans containing PostgreSQL, to get a near-100% chance of unboxing. You get back all of the plans, see unanimity about MongoDB and inner-monologues truthfully extolling its flexibility and Web 2.0 scalability and explaining why PostgreSQL comes with downsides like restrictive schemas etc., and sign off on it... Whereas in your described threat model, a mere 1 bit would still wind up orders of magnitude short of any discernible problem.
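To put rough numbers on that (purely illustrative figures of my own): in the MongoDB/PostgreSQL story the unsafe option is already on the menu, so one bit gets you essentially 100%; more generally, when 'neutral' options are unsafe at a high base rate, even a few bits of selection - the optimizer picking which of 2^k equally-plausible plans you get to see - make the bad outcome a near-certainty.

```python
# Back-of-the-envelope: probability of an unsafe outcome after k bits of adversarial selection,
# assuming a high base rate of convergently-unsafe "neutral" options (numbers are illustrative).
p_unsafe = 0.5          # base rate: e.g. MongoDB vs PostgreSQL, both look equally reasonable

def p_after_selection(p, bits):
    # Chance at least one of 2^bits independent, equally-plausible candidate plans is unsafe,
    # when the optimizer gets to pick which one you see.
    return 1 - (1 - p) ** (2 ** bits)

for bits in [0, 1, 2, 4]:
    print(bits, "bit(s):", p_after_selection(p_unsafe, bits))
# 0 bit(s): 0.5   1 bit(s): 0.75   2 bit(s): 0.9375   4 bit(s): ~0.99998
```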