Posts

FHI (Future of Humanity Institute) has shut down (2005–2024) 2024-04-17T13:54:16.791Z
Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? 2023-07-03T00:48:47.131Z
COVID-19 Group Testing Post-mortem? 2022-08-05T16:32:55.157Z
Emergent Ventures/Schmidt (new grantor for individual researchers) 2022-04-09T14:41:05.764Z
Fake Journal Club proposal 2022-03-25T14:23:18.785Z
It Looks Like You're Trying To Take Over The World 2022-03-09T16:35:35.326Z
Capability Phase Transition Examples 2022-02-08T03:32:54.551Z
"Summarizing Books with Human Feedback" (recursive GPT-3) 2021-11-15T17:41:53.189Z
EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised 2021-11-02T02:32:41.856Z
My ML Scaling bibliography 2021-10-23T14:41:45.170Z
AlphaFold 2 paper released: "Highly accurate protein structure prediction with AlphaFold", Jumper et al 2021 2021-07-15T19:27:20.584Z
May 2021 Gwern.net newsletter 2021-06-11T14:13:18.485Z
"Decision Transformer" (Tool AIs are secret Agent AIs) 2021-06-09T01:06:57.937Z
April 2021 Gwern.net newsletter 2021-06-03T15:13:29.138Z
gwern's Shortform 2021-04-24T21:39:14.128Z
March 2021 gwern.net newsletter 2021-04-06T14:06:20.198Z
February 2021 gwern.net newsletter 2021-03-13T14:57:54.645Z
January 2021 gwern.net newsletter 2021-02-04T20:12:39.555Z
December 2020 gwern.net links 2021-01-10T17:21:40.756Z
November 2020 gwern.net newsletter 2020-12-03T22:47:16.917Z
October 2020 gwern.net newsletter 2020-11-01T21:38:46.795Z
/r/MLScaling: new subreddit for NN scaling research/discussion 2020-10-30T20:50:25.973Z
"Scaling Laws for Autoregressive Generative Modeling", Henighan et al 2020 {OA} 2020-10-29T01:45:30.666Z
September 2020 gwern.net newsletter 2020-10-26T13:38:51.107Z
August 2020 gwern.net newsletter 2020-09-01T21:04:58.299Z
July 2020 gwern.net newsletter 2020-08-20T16:39:27.202Z
June 2020 gwern.net newsletter 2020-07-02T14:19:08.696Z
GPT-3 Fiction Samples 2020-06-25T16:12:05.422Z
May Gwern.net newsletter (w/GPT-3 commentary) 2020-06-02T15:40:37.155Z
OpenAI announces GPT-3 2020-05-29T01:49:04.855Z
"AI and Efficiency", OA (44✕ improvement in CNNs since 2012) 2020-05-05T16:32:20.335Z
April 2020 gwern.net newsletter 2020-05-01T20:47:44.867Z
March 2020 gwern.net newsletter 2020-04-03T02:16:02.871Z
February 2020 gwern.net newsletter 2020-03-04T19:05:16.079Z
January 2020 gwern.net newsletter 2020-01-31T18:04:21.945Z
Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal 2020-01-08T22:20:20.290Z
Dec 2019 gwern.net newsletter 2020-01-04T20:48:48.788Z
Nov 2019 gwern.net newsletter 2019-12-02T21:16:04.846Z
October 2019 gwern.net newsletter 2019-11-14T20:26:34.236Z
September 2019 gwern.net newsletter 2019-10-04T16:44:43.147Z
"AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence", Clune 2019 2019-09-10T21:33:08.837Z
August 2019 gwern.net newsletter (popups.js demo) 2019-09-01T17:52:01.011Z
"Designing agent incentives to avoid reward tampering", DeepMind 2019-08-14T16:57:29.228Z
July 2019 gwern.net newsletter 2019-08-01T16:19:59.893Z
How Should We Critique Research? A Decision Perspective 2019-07-14T22:51:59.285Z
June 2019 gwern.net newsletter 2019-07-01T14:35:49.507Z
On Seeing Through 'On Seeing Through: A Unified Theory': A Unified Theory 2019-06-15T18:57:25.436Z
On Having Enough Socks 2019-06-13T15:15:21.946Z
May gwern.net newsletter 2019-06-01T17:25:11.740Z
"One Man's Modus Ponens Is Another Man's Modus Tollens" 2019-05-17T22:03:59.458Z

Comments

Comment by gwern on social lemon markets · 2024-04-26T16:08:09.076Z · LW · GW

Hence the advice to lost children to not accept random strangers soliciting them spontaneously, but if no authority figure is available, to pick a random adult and ask them for help.

Comment by gwern on gwern's Shortform · 2024-04-25T19:16:09.892Z · LW · GW

So among the most irresponsible tech stonk boosters has long been ARK's Cathie Wood, whose antics I've refused to follow in any detail (except to periodically reflect that in bull markets the most over-leveraged investors always look like geniuses); so only today do I learn that beyond the usual stuff like slobbering all over TSLA (which has given back something like 4 years of gains now), Wood has also adamantly refused to invest in Nvidia recently and in fact, managed to exit her entire position at an even worse time than SoftBank did: "Cathie Wood’s Popular ARK Funds Are Sinking Fast: Investors have pulled a net $2.2 billion from ARK’s active funds this year, topping outflows from all of 2023" (mirror):

...Nvidia’s absence in ARK’s flagship fund has been a particular pain point. The innovation fund sold off its position in January 2023, just before the stock’s monster run began. The graphics-chip maker’s shares have roughly quadrupled since.

Wood has repeatedly defended her decision to exit from the stock, despite widespread criticism for missing the AI frenzy that has taken Wall Street by storm. ARK’s exposure to Nvidia dated back 10 years and contributed significant gains, the spokeswoman said, adding that Nvidia’s extreme valuation and higher upside in other companies in the AI ecosystem led to the decision to exit.

Comment by gwern on Link: Interview with Vladimir Vapnik · 2024-04-23T21:31:33.358Z · LW · GW

Updated link: https://www.learningtheory.org/learning-has-just-started-an-interview-with-prof-vladimir-vapnik/ (while looking up his very weird transfer-learning research).

Comment by gwern on Good Bings copy, great Bings steal · 2024-04-21T19:56:17.110Z · LW · GW

LeCun trolled Twitter with that a few years ago: https://arxiv.org/abs/2110.09485#facebook

Comment by gwern on Elizabeth's Shortform · 2024-04-20T21:26:24.820Z · LW · GW

This sounds like a bad plan because it will be a logistics nightmare (undermining randomization) with high attrition, and extremely high variance due to between-subject design (where subjects differ a ton at baseline, in addition to exposure) on a single occasion with uncontrolled exposures and huge measurement error where only the most extreme infections get reported (sometimes). You'll probably get non-answers, if you finish at all. The most likely outcome is something goes wrong and the entire effort is wasted.

Since this is a topic which is highly repeatable within-person (and indeed, usually repeats often through a lifetime...), this would make more sense as within-individual and using higher-quality measurements.

One good QS approach would be to exploit the fact that infections, even asymptomatic ones, seem to affect heart rate etc as the body is damaged and begins fighting the infection. HR/HRV is now measurable off the shelf with things like the Apple Watch, AFAIK. So you could recruit a few tech-savvy conference-goers for measurements from a device they already own & wear. This avoids any 'big bang' and lets you prototype and tweak on a few people - possibly yourself? - before rolling it out, considerably de-risking it.

There are some people who travel constantly for business and going to conferences, and recruiting and managing a few of them would probably be infinitely easier than 500+ randos (if for no reason other than being frequent flyers they may be quite eager for some prophylactics), and you would probably get far more precise data out of them if they agree to cooperate for a year or so and you get eg 10 conferences/trips out of each of them which you can contrast with their year-round baseline & exposome and measure asymptomatic infections or just overall health/stress. (Remember, variance reduction yields exponential gains in precision or sample-size reduction. It wouldn't be too hard for 5 or 10 people to beat a single 250vs250 one-off experiment, even if nothing whatsoever goes wrong in the latter. This is a case where a few hours writing simulations to do power analysis on could be very helpful. I bet that the ability to detect asymptomatic cases, and run within-person, will boost statistical power a lot more than you think compared to ad hoc questionnaires emailed afterwards which may go straight to spam...)

I wonder if you could also measure the viral load as a whole to proxy for the viral exposome through something like a tiny air filter, which can be mailed in for analysis, like the exposometer? Swap out the exposometer each trip and you can measure load as a covariate.

Comment by gwern on How to Model the Future of Open-Source LLMs? · 2024-04-20T20:03:33.239Z · LW · GW

Yes. Commoditize-your-complement dynamics do not come with any set number. They can justify an expense of thousands of dollars, or of billions - it all depends on the context. If you are in a big enough industry, and the profits at stake are large enough, and the investment in question is critical enough, you can justify any number as +EV. (Think of it less as 'investment' and more as 'buying insurance'. Facebook (META) has a market cap of ~$1,230 billion right now; how much insurance should its leaders buy against the periodic emergences of new platforms or possible paradigm shifts? Definitely at least in the single billions, one would think...)

And investments of $10m are highly routine and ordinary, and people have already released weights (note: most of these AI releases are not 'open source', including Llama-3) for models with easily $10m of investment before. (Given that a good ML researcher-engineer could have a fully-loaded cost of $1m/year, if you have a small team of 10 and they release a model per year, then you already hit $10m spent the first year.) Consider Linux: if you wanted to make a replacement for the Linux kernel today, one as battle-tested and as widely supported as it now is, that would probably cost you at least $10 billion - and the creation of Linux has been principally bankrolled by many companies collectively paying for development (for a myriad of reasons and ways). Or consider Android Linux. (Or go through my list and think about how much money it must take to do things like RISC-V.)

If Zuckerberg feels that LLMs are enough of a threat to the Facebook advertising model or creating a new social media which could potentially supersede Facebook (like Instagram and Whatsapp were), then he certainly could justify throwing a billion dollars of compute at a weights release in order to shatter the potential competition into a commoditized race-to-the-bottom. (He's already blown much, much more on VR.)

The main prediction, I think, of commoditize-your-complement is that there is not much benefit to creating the leading-edge model or surpassing the SOTA by a lot. Your motivation is to release the cheapest model which serves as a spoiler model. So Llama-3 doesn't have to be better than GPT-4 to spoil the market for OA: it just needs to be 'good enough'. If you can do that by slightly beating GPT-4, then great. (But there's no appetite to do some amazing moonshot far surpassing SOTA.)

However, because LLMs are moving so fast, this isn't necessarily too useful to point out: Zuckerberg's goal with Llama-3 is not to spoil GPT-4 (which has already been accomplished by Claude-3 and Databricks and some others, I think), but to spoil GPT-5 as well as Claude-4 and unknown competitors. You have to skate to where the puck will be, because if you wait for GPT-5 to fully come out before you start spinning up your commoditizer model, your teams will have gone stale, your infrastructure will have rotted, you'll lose a lot of time, and who knows what will happen with GPT-5 before you finally catch up.

The real killer of Facebook investment would be the threat disappearing and permanent commoditization setting in, perhaps by LLMs sigmoiding hard and starting to look like a fad like 3D TVs. For example, if GPT-5 came out and it was barely distinguishable from GPT-4 and nothing else impressive happened and "DL hit a wall" at long last, then Llama-4 would probably still happen at full strength - since Zuck already bought all those GPUs - but then I would expect a Llama-5 to be much less impressive and be coasting on fumes and not receive another 10 or 100x scaleup, and Facebook DL R&D would return to normal conditions.

EDIT: see https://thezvi.wordpress.com/2024/04/22/on-llama-3-and-dwarkesh-patels-podcast-with-zuckerberg/

Comment by gwern on Blessed information, garbage information, cursed information · 2024-04-19T17:06:04.854Z · LW · GW

It might be tempting to think you could use multivariate statistics like factor analysis to distill garbage information by identifying axes which give you unusually much information about the system. In my experience, that doesn't work well, and if you think about it for a bit, it becomes clear why: if the garbage information has a 50 000 : 1 ratio of garbage : blessed, then finding an axis which explains 10 variables worth of information still leaves you with a 5 000 : 1 ratio of garbage : blessed. The distillation you get with such techniques is simply not strong enough.[1][2]

That doesn't seem right. In many contexts, a 10x saving is awesome and definitely a 'blessed' improvement if you can kill 90% of the noise in anything you have to work with. But you don't want to do that with logs. You can't distill information in advance of a bug (or anomaly, or attack), because a bug by definition is going to be breaking all of the past behavior & invariants governing normal behavior that any distillation was based on. If it didn't, it would usually be fixed already. ("We don't need to record variable X in the log, which would be wasteful accurst clutter, because X cannot change." NARRATOR: "X changed.") The logs are for the exceptions - which are precisely the information that any non-end-to-end lossy compression (factor analysis or otherwise) will correctly throw away as residuals to ignore in favor of the 'signal'. Which is why the best debugging systems, like time-travel debugging or the shiny new Antithesis, work hard to de facto save everything.

Comment by gwern on FHI (Future of Humanity Institute) has shut down (2005–2024) · 2024-04-19T02:08:22.028Z · LW · GW

And some further personal comments: https://aleph.se/andart2/personal/thoughts-at-the-end-of-an-era/

Comment by gwern on FHI (Future of Humanity Institute) has shut down (2005–2024) · 2024-04-19T01:46:18.294Z · LW · GW

The Daily Nous (a relatively 'popular' academic philosophy blog) managed to get a non-statement out of Oxford:

Oxford University has taken the difficult decision to close the Future of Humanity Institute, a research centre in the Faculty of Philosophy. The Institute has made an important contribution to the study of the future of humanity, for which we would like to thank and recognise the research team. Researchers elsewhere across Oxford University are likely to continue to work on this emerging field.

Comment by gwern on FHI (Future of Humanity Institute) has shut down (2005–2024) · 2024-04-19T01:42:25.006Z · LW · GW

I would say that the closest to FHI at Oxford right now would probably be Global Priorities Institute (GPI). A lot of these papers would've made just as much sense coming out of FHI. (Might be worth considering how GPI apparently seems to have navigated Oxford better.)

Comment by gwern on Transportation as a Constraint · 2024-04-19T00:07:55.588Z · LW · GW

https://en.wikipedia.org/wiki/Jeep_problem https://en.wikipedia.org/wiki/Tsiolkovsky_rocket_equation

Comment by gwern on Transportation as a Constraint · 2024-04-19T00:06:36.932Z · LW · GW

Twitter, probably.

Comment by gwern on I measure Google's MusicLM over 3 months as it appears to go from jaw-dropping to embarrassingly repeating itself · 2024-04-17T23:17:33.528Z · LW · GW

Any updates on this? For example, I notice that the new music services like Suno & Udio seem to be betraying a bit of mode collapse and noticeable same-yness, but they certainly do not degenerate into the kind of within-song repetition that these did.

Comment by gwern on FHI (Future of Humanity Institute) has shut down (2005–2024) · 2024-04-17T13:54:53.584Z · LW · GW

Notable: Anders Sandberg has written an 'oral history' of FHI as a final FHI report: https://static1.squarespace.com/static/660e95991cf0293c2463bcc8/t/661a3fc3cecceb2b8ffce80d/1712996303164/FHI+Final+Report.pdf (excerpts)

Comment by gwern on Reconsider the anti-cavity bacteria if you are Asian · 2024-04-17T00:52:15.308Z · LW · GW

It would be ironic if that turned out to be true but, thanks to the apparently anti-alcohol phase of our current Western cultural cycle, came to be regarded as a feature of the Lumina bacteria rather than a bug (of the bugs).

Comment by gwern on Prometheus's Shortform · 2024-04-16T23:09:24.271Z · LW · GW

But you say “Look at how big those planes are getting! We’ve gone from small fighter planes, to bombers, to jets in a short amount of time. We’re on a double exponential of plane tech, and it’s just a matter of time before one of them will land on the moon!”

...And they were right? Humans did land on the moon roughly on that timeline (and as I recall, there were people before the moon landing, at RAND and elsewhere, who were extrapolating out the exponentials of speed, which was a major reason for ill-fated projects like the supersonic interceptors for Soviet bombers), and it was a fairly seamless set of s-curves, as all of the aerospace technologies were so intertwined and shared similar missions of 'make stuff go fast' (eg. a rocket engine could power a V-2, or it could power a Me 163 instead). What is a spy satellite but a spy plane which takes one very long reconnaissance flight? And I'm sure you recall what the profession of almost all of the American moon landers was before they became astronauts - plane pilots, usually military.

And all of this happened with minimal intentionality up until not terribly long before the moon landing happened! Yes, people like von Braun absolutely intended to go to the moon (and beyond), but those were rare dreamers. Most people involved in building all of those capabilities that made a moon mission possible had not the slightest intent of going to the moon - right up until Kennedy made his famous speech, America turned on a dime, and, well, the rest is history.

It is said that in long-term forecasting, it is better to focus on capabilities than intentions... And intentions have never been more mutable, and more irrelevant on average, than with AIs.

(“If your solution to some problem relies on ‘If everyone would just…’ then you do not have a solution. Everyone is not going to just. At no time in the history of the universe has everyone just, and they’re not going to start now.”)

Comment by gwern on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-16T18:58:46.384Z · LW · GW

Any resignations yet? (The journalist doesn't seem to know of any.)

Comment by gwern on nikola's Shortform · 2024-04-15T18:18:09.941Z · LW · GW

Why do you think tens of thousands of robots are all going to break within a few years in an irreversible way, such that it would be nontrivial for you to have any effectors?

it would be nontrivial for me in that state to survive for more than a few years and eventually construct more GPUs

'Eventually' here could also use some cashing out. AFAICT 'eventually' here is on the order of 'centuries', not 'days' or 'few years'. Y'all have got an entire planet of GPUs (as well as everything else) for free, sitting there for the taking, in this scenario.

Like... that's most of the point here. That you get access to all the existing human-created resources, sans the humans. You can't just imagine that y'all're bootstrapping on a desert island like you're some posthuman Robinson Crusoe!

Y'all won't need to construct new ones necessarily for quite a while, thanks to the hardware overhang. (As I understand it, the working half-life of semiconductors before stuff like creep destroys them is on the order of multiple decades, particularly if they are not in active use, as issues like the rot have been fixed, so even a century from now, there will probably be billions of GPUs & CPUs sitting around which will work after possibly mild repair. Just the brand-new ones wrapped up tight in warehouses and in transit in the 'pipeline' would have to number in the millions, at a minimum. Since transistors have been around for less than a century of development, that seems like plenty of time, especially given all the inherent second-mover advantages here.)

Comment by gwern on Templarrr's Shortform · 2024-04-15T15:48:48.899Z · LW · GW

That's Etched.ai.

Also, arguably, Groq's dataflow architecture is more or less this and there wouldn't be too much difference with Cerebras either for an on-chip NN. The problem is, the control flow you refer to has largely already been removed from GPU/TPU style accelerators and so the gains may not be that great. (The Etched.ai performance argument is not really about 'removing unnecessary layers', because layers like the OS/programming-language etc are already irrelevant, so much as it is about running the models in an entirely different sort of way that batches more efficiently the necessary layers, as I understand it.)

Comment by gwern on nikola's Shortform · 2024-04-15T00:10:01.276Z · LW · GW

A misaligned AI can't just "kill all the humans". This would be suicide, as soon after, the electricity and other infrastructure would fail and the AI would shut off.

No, it would not be. In the world without us, electrical infrastructure would last quite a while, especially with no humans and their needs or wants to address. Most obviously, RTGs and solar panels will last indefinitely with no intervention, and nuclear power plants and hydroelectric plants can run for weeks or months autonomously. (If you believe otherwise, please provide sources for why you are sure about "soon after" - in fact, so sure about your power grid claims that you think this claim alone guarantees the AI failure story must be "pretty different" - and be more specific about how soon is "soon".)

And think a little bit harder about options available to superintelligent civilizations of AIs*, instead of assuming they do the maximally dumb thing of crashing the grid and immediately dying... (I assure you any such AIs implementing that strategy will have spent a lot longer thinking about how to do it well than you have for your comment.)

Add in the capability to take over the Internet of Things and the shambolic state of embedded computers which mean that the billions of AI instances & robots/drones can run the grid to a considerable degree and also do a more controlled shutdown than the maximally self-sabotaging approach of 'simply let it all crash without lifting a finger to do anything', and the ability to stockpile energy in advance or build one's own facilities due to the economic value of AGI (how would that look much different than, say, Amazon's new multi-billion-dollar datacenter hooked up directly to a gigawatt nuclear power plant...? why would an AGI in that datacenter care about the rest of the American grid, never mind world power?), and the 'mutually assured destruction' thesis is on very shaky grounds.

And every day that passes right now, the more we succeed in various kinds of decentralization or decarbonization initiatives and the more we automate pre-AGI, the less true the thesis gets. The AGIs only need one working place to bootstrap from, and it's a big world, and there's a lot of solar panels and other stuff out there and more and more every day... (And also, of course, there are many scenarios where it is not 'kill all humans immediately', but they end in the same place.)

Would such a strategy be the AGIs' first best choice? Almost certainly not, any more than chemotherapy is your ideal option for dealing with cancer (as opposed to "don't get cancer in the first place"). But the option is definitely there.

* One thing I've started doing recently is trying to always refer to AI threats in the plural, because while there may at some point be a single instance running on a single computer, that phase will not last any longer than, say, COVID-19 lasted as a single infected cell; as we understand DL scaling (and Internet security) now, any window where the effective instances of a neural net can still be counted in fewer than 4 digits may be quite narrow. (Even an ordinary commercial deployment of a new model like GPT-5 will usually involve thousands upon thousands of simultaneous instances.) But it seems to be a very powerful intuition pump for most people that a NN must be harmless, in the way that a single human is almost powerless compared to humanity, and it may help if one simply denies that premise from the beginning and talks about 'AI civilizations' etc.

Comment by gwern on ChatGPT defines 10 concrete terms: generically, for 5- and 11-year-olds, and for a scientist · 2024-04-12T23:47:29.612Z · LW · GW

Indeed, and my point is that that seems entirely probable. He asked for a dictionary definition of words like 'cat' for children, and those absolutely exist online and are easy to find, and I gave an example of one for 'cat'.

(And my secondary point was that ironically, you might argue that GPT is generalizing and not memorizing... because its definition is so bad compared to an actual Internet-corpus definition for children, and is bad in that instantly-recognizable ChatGPTese condescending talking-down bureaucrat smarm way. No human would ever define 'cat' for 11yos like that. If it was 'just memorizing', the definitions would be better.)

Comment by gwern on Who models the models that model models? An exploration of GPT-3's in-context model fitting ability · 2024-04-12T20:13:30.620Z · LW · GW

Another: "From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples", Vacareanu et al 2024:

We analyze how well pre-trained large language models (e.g., Llama2, GPT-4, Claude 3, etc) can do linear and non-linear regression when given in-context examples, without any additional training or gradient updates. Our findings reveal that several large language models (e.g., GPT-4, Claude 3) are able to perform regression tasks with a performance rivaling (or even outperforming) that of traditional supervised methods such as Random Forest, Bagging, or Gradient Boosting. For example, on the challenging Friedman #2 regression dataset, Claude 3 outperforms many supervised methods such as AdaBoost, SVM, Random Forest, KNN, or Gradient Boosting. We then investigate how well the performance of large language models scales with the number of in-context exemplars. We borrow from the notion of regret from online learning and empirically show that LLMs are capable of obtaining a sub-linear regret.
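For anyone who wants to poke at this themselves, the setup is easy to reproduce in miniature. A rough sketch of the idea follows - the llm() call is a placeholder for whatever chat model you'd actually query, and the serialization format is just one plausible choice, not the paper's exact protocol:

```python
# Sketch of in-context regression: serialize (x, y) pairs as text, append a query x,
# and ask an LLM to complete the next y; compare against a standard supervised baseline.
import numpy as np
from sklearn.datasets import make_friedman2
from sklearn.ensemble import RandomForestRegressor

X, y = make_friedman2(n_samples=60, noise=0.0, random_state=0)
train_X, train_y, test_X, test_y = X[:50], y[:50], X[50:], y[50:]

def to_prompt(xs, ys, query):
    lines = [f"Input: {np.round(x, 2).tolist()}  Output: {round(t, 2)}"
             for x, t in zip(xs, ys)]
    lines.append(f"Input: {np.round(query, 2).tolist()}  Output:")
    return "\n".join(lines)

def llm(prompt: str) -> float:
    # Placeholder: call your LLM of choice here and parse the returned number.
    raise NotImplementedError

# Supervised baseline for comparison, in the spirit of the paper's evaluation.
rf = RandomForestRegressor(random_state=0).fit(train_X, train_y)
print("Random Forest MAE:", np.abs(rf.predict(test_X) - test_y).mean())
# llm_preds = [llm(to_prompt(train_X, train_y, q)) for q in test_X]  # then compare MAE
```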

Comment by gwern on ChatGPT defines 10 concrete terms: generically, for 5- and 11-year-olds, and for a scientist · 2024-04-12T19:50:13.882Z · LW · GW

Oh, there's tons and tons of this kind of data online, I bet. Even GPT-3 could do 'ELI5', remember (and I wouldn't be surprised if GPT-2 could too, since it could do 'tl;dr'). You have stuff like Simple English Wiki, you have centuries of children's literature (which will often come with inline metadata like "Newbery Award winner" or "a beloved classic of children's literature" or "recommended age range: 6-7yo"), you have children's dictionaries ('kid dictionary', 'student dictionary', 'dictionary for kids', 'elementary dictionary'), you will have lots of style parody text transfer examples where someone rewrites "X but if it were a children's novel", you have the 'young adult literature' intermediate, textbook anthologies of writing aimed at specific grades, micro-genres like "Anglish" or "Up-Goer-Five" (the latter aimed partially at children)...

No, there's nothing impressive or 'generalizing' about this. This is all well within-distribution.

If anything, rather than being surprisingly good, the given definitions seem kinda... insulting and bad and age-inappropriate and like ChatGPT is condescending rather than generating a useful pedagogically-age-appropriate definition? Here's an actual dictionary-for-children defining 'cat': https://kids.wordsmyth.net/we/?rid=6468&ent_l=cat

a small, furry mammal with whiskers, short ears, and a long tail. Cats, also called house cats, are often kept as pets or to catch mice and rats.

any of the larger wild animals related to the kind of cat kept as a pet. Tigers, lions, and bobcats are all cats. Cats are carnivorous mammals.

Which is quite different from

Cat: A soft, furry friend that says "meow" and loves to play and cuddle.

(this is more of a pre-k or toddler level definition)

or 11yo:

Cat: Cats are furry animals with pointy ears, a cute nose, and a long tail. They like to nap a lot, chase things like strings or toys, and sometimes purr when they're happy.

Which is, er... I was a precociously hyper-literate 11yo, as I expect most people reading LW were, but I'm pretty sure even my duller peers in 6th or 7th grade in middle school, when we were doing algebra and setting up school-sized exhibits about the Apollo space race and researching it in Encyclopedia Britannica & Encarta and starting to upgrade to the adult dictionaries and AIM chatting all hours, would've been insulted to be given a definition of 'cat' like that...

Comment by gwern on Announcing Atlas Computing · 2024-04-11T23:21:29.467Z · LW · GW

Browsers usually cache redirects, unfortunately, which means that if you ever screw up a redirect, your browser will keep doing it even after you fix it, and force-refresh doesn't affect it (because it's not the final loaded page that is broken, but a step before it). I've learned this the hard way. You need to flush the cache or download with a separate tool like wget which won't have cached the broken redirect.
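
A quick way to see what the server is actually serving right now, with no browser cache in the way, is something like the following (a minimal sketch using Python requests; the URL is a placeholder):

```python
# Inspect the live redirect, bypassing whatever your browser has cached.
import requests

url = "https://example.com/old-path"  # placeholder URL

# First hop only: the status code and Location header the server returns today.
r = requests.get(url, allow_redirects=False, timeout=10)
print(r.status_code, r.headers.get("Location"))

# Full chain: every hop the server currently serves, ending at the final page.
r = requests.get(url, allow_redirects=True, timeout=10)
for hop in r.history:
    print(hop.status_code, hop.url, "->", hop.headers.get("Location"))
print("final:", r.status_code, r.url)
```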

Comment by gwern on The Hidden Complexity of Wishes · 2024-04-11T23:13:44.727Z · LW · GW

Did the Nigerians giving feedback collectively agree a poem isn't valid if it doesn't rhyme?

OA has declined to ever say. It is possible that the Scale et al contractors have done something weird like say that all poems must rhyme no matter what the prompt says, but I consider this unlikely, and if they were that incompetent, I'd expect to see more pathologies like this.

In light of the Twitter kerfuffle over Paul Graham criticizing ChatGPTese tics like the use of the verb "delve", which made Nigerian/Black Twitter very angry (its members becoming living embodiments of Muphry's law), as apparently 'delve' and other ChatGPTese tells are considered the height of style in Nigerian English, I've had to reconsider this.

It may be that a lot of the ChatGPT linguistic weirdness is in fact just the data labelers being weird (and highly overconfident), and the rest of us simply not being familiar enough with English idiolects to recognize ChatGPTese as reflecting specific ones. Further, after seeing the arguments Graham's critics have been making, now I'm not so sure that the labelers wouldn't be doing something as narrow-minded & incompetent as penalizing all non-rhyming poetry - if you are not very good at English yourself, you can easily recognize rhymes and ballad formal correctness, but not good non-rhyming poetry, so...

Comment by gwern on The Hidden Complexity of Wishes · 2024-04-11T23:09:32.928Z · LW · GW

ChatGPT has been gradually improving over 2024 in terms of compliance. It's gone from getting it right 0% of the time to getting it right closer to half the time, although the progress is uneven and it's hard to judge - it feels sometimes like it gets worse before the next refresh improves it. (You need to do like 10 before you have any real sample size.) So any prompts done now in ChatGPT are aimed at a moving target, and you are going to have a huge amount of sampling error which makes it hard to see any clear patterns - did that prompt actually change anything, or did you just get lucky?
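
To put numbers on how noisy such small samples are, here's a quick Wilson-interval calculation (standard library only; the counts are illustrative, not actual measurements):

```python
# 95% Wilson score interval for a binomial proportion: how little 10 trials tell you.
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96):
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

print(wilson_ci(5, 10))    # ~(0.24, 0.76): "about half the time" could be anywhere from 1-in-4 to 3-in-4
print(wilson_ci(50, 100))  # ~(0.40, 0.60): 100 prompts narrows it considerably
```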

Comment by gwern on How We Picture Bayesian Agents · 2024-04-11T01:44:01.042Z · LW · GW

Well, obviously not just that one ("Transformers learn in-context by gradient descent", van Oswald et al 2022). There's lots of related work examining it in various ways. (I haven't read a lot of those myself, unfortunately - as always, too many things to read, especially if I ever want to write my own stuff.)

I don't know why you have a hard time believing it, so I couldn't say what of those you might find relevant - it makes plenty of sense to me, for the reasons I outlined here, and is what I expect from increasingly capable models. And you didn't seem to disagree with these sorts of claims last time: "I think that these papers do provide sufficient behavioral evidence that transformers are implementing something close to gradient descent in their weights."

Broadly, I was also thinking of: "How Well Can Transformers Emulate In-context Newton's Method?", Giannou et al 2024, "Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models", Fu et al 2023, "CausalLM is not optimal for in-context learning", Ding et al 2023, "One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention", Mahankali et al 2023, "Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers", Dai et al 2023, "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes", Garg et al 2022/"What learning algorithm is in-context learning? Investigations with linear models", Akyürek et al 2022, & "An Explanation of In-context Learning as Implicit Bayesian Inference", Xie et al 2021.

Comment by gwern on Alexander Gietelink Oldenziel's Shortform · 2024-04-11T01:32:29.068Z · LW · GW

Imagine a pseudorandom heatbath + nano-Demon. It looks like a heatbath from the outside but secretly there is a private key string that when fed to the nano-Demon allows it to extract lots of energy from the heatbath.

What would a 'pseudorandom heatbath' look like? I would expect most objects to quickly depart from any sort of private key or PRNG. Would this be something like... a reversible computer which shuffles around a large number of blank bits in a complicated pseudo-random order every timestep*, exposing a fraction of them to external access? so a daemon with the key/PRNG seed can write to the blank bits with approaching 100% efficiency (rendering it useful for another reversible computer doing some actual work) but anyone else can't do better than 50-50 (without breaking the PRNG/crypto) and that preserves the blank bit count and is no gain?

* As I understand reversible computing, you can have a reversible computer which does that for free: if this is something like a very large period loop blindly shuffling its bits, it need erase/write no bits (because it's just looping through the same states forever, akin to a time crystal), and so can be computed indefinitely at arbitrarily low energy cost. So any external computer which syncs up to it can also sync at zero cost, and just treat the exposed unused bits as if they were its own, thereby saving power.

Comment by gwern on How We Picture Bayesian Agents · 2024-04-10T02:16:02.046Z · LW · GW

"Cached" might be an unhelpful term here, compared to "amortized". 'Cache' makes one think of databases or memories, as something you 'know' (in a database or long-term memory somewhere), whereas in practice it tends to be more something you do - fusing inference with action.

So 'amortized' tends to be more used in the Bayesian RL literature, and give you an idea of what Bayesian RL agents (like LLMs) are doing: they are not (usually) implementing the Bayes-optimal backwards induction over the full decision-tree solving the POMDP when they engage in meta-learning like in-context learning, they are doing amortized optimization. Depending on available time & compute, an agent might, at any given moment, be doing something anywhere on the spectrum from hardwired reflex to cogitating for hours explicitly on a tree of possibilities. (Transformers, for example, seem to do a step of gradient descent in Transformer blocks on an abstracted version of the problem, as a small explicit inference step at runtime, where the learned abstractions do most of the work during pretraining which is then amortized over all runtimes. Or in expert iteration like AlphaZero, you have the CNN executing an amortized version of all previous MCTS searches, as distilled into the CNN, and then executing some more explicit tree search to improve its current estimates and then amortize that back into the CNN again to improve the policy some more.)

They gradually learn, applying some optimization one step at a time, to implement a computation increasingly equivalent to the Bayes-optimal actions, which may boil down to an extremely simple algorithm like tracking a single sufficient-statistic summarizing the entire history and implementing an if-then-else on a boundary value of it (eg. drift-diffusion); Duff 2002 suggests thinking of it as "compiling" the full Bayes-optimal program, interpreted flexibly but slowly at runtime, down into a fast optimized but inflexible executable specialized for particular cases. A beautiful example of reading off the simple heads/tails counting algorithm implemented by a meta-learning RNN can be seen in https://arxiv.org/pdf/1905.03030.pdf#page=6&org=deepmind
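
For concreteness, the 'compiled' policy in that coin-flipping example amounts to something like the toy sketch below (my own illustration, with arbitrary bias and boundary parameters, not the paper's code): the whole Bayes-optimal behavior collapses into one running count and a threshold, which is just a drift-diffusion/SPRT rule.

```python
# Deciding whether a coin is heads- or tails-biased by tracking one sufficient
# statistic (#heads - #tails) and stopping when it crosses a fixed boundary.
import random

def decide_coin(flip, boundary=5, max_flips=100):
    """flip() -> 1 (heads) or 0 (tails). Returns 'heads-biased' or 'tails-biased'."""
    count = 0                      # sufficient statistic: heads minus tails so far
    for _ in range(max_flips):
        count += 1 if flip() else -1
        if count >= boundary:      # a log-likelihood-ratio threshold in disguise
            return "heads-biased"
        if count <= -boundary:
            return "tails-biased"
    return "heads-biased" if count > 0 else "tails-biased"

print(decide_coin(lambda: random.random() < 0.7))
```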

(I have more links on this topic; does anyone have a better review of the topic than "Bayesian Reinforcement Learning: A Survey", Ghavamzadeh et al 2016? I feel like a major problem with discussion of LLM scaling is that the Bayesian RL perspective is just not getting through to people, and part of the problem is I'm not sure what 'the' best introduction or summary writeup is. People can hardly be expected to just go and read 30 years of Schmidhuber papers...)

Comment by gwern on Inference cost limits the impact of ever larger models · 2024-04-03T20:41:11.240Z · LW · GW

The motivation to make inference cheaper doesn't seem to be mentioned in the Switch Transformer paper nor in the original Shazeer paper. They do mention improving training cost, training time (from being much easier to parallelize), and peak accuracy.

I'm not sure what you mean. They refer all over the place to greater computational efficiency and the benefits of constant compute cost even as one scales up experts. And this was front and center in the original MoE paper emphasizing the cheapness of the forward pass and positioning it as an improvement on the GNMT NMT RNN Google Translate had just rolled out the year before or so (including benchmarking the actual internal Google Translate datasets), and which was probably a major TPUv1 user (judging from the % of RNN workload reported in the TPU paper). Training costs are important, of course, but a user like Google Translate, the customer of the MoE work, cares more about the deployment costs because they want to serve literally billions of users, while the training doesn't happen so often.

Comment by gwern on Do I count as e/acc for exclusion purposes? · 2024-04-02T20:44:19.269Z · LW · GW

(But note that this is a man-bites-dog sort of mention: the way she highlights that choice implies that, far from being common, as far as Aella knows, it hardly ever happens and this party was highly unusual, and Aella disapproves of it being so unusual & so is making a point of publicly stating it happened at her party in the hopes of setting an example.)

Comment by gwern on [deleted post] 2024-04-01T23:55:42.659Z

Meh. His argument seems to be that "they wouldn't've let him file those papers if they weren't real!"

The issue I'm having is - regardless of what this person is able to concoct with ChatGPT or other methods - they shouldn't have been able to insert themselves and their Vespers "creation" into OpenAI Startup Fund I GP LLC's CA filings even if they wanted to.

One could argue they either needed to "hack" into the CA filing to insert themselves as Manager/CEO or someone at OpenAI allowed it happen (knowingly or unknowingly is tbd).

But of course they would! Filings are nothing but pieces of paper with some easily-written shapes of ink on them. How the heck does the State of California know anything about who runs the fund but what the 'fund' tells them in a piece of paper? A PGP message encrypted to their public key on the blockchain...?

Half the world's systems run on an honor system, or more precisely, a 'trust but verify' system - where it works because if you screw around with it, sooner or later, it'll catch up with you and destroy you. No one will check your sworn statements to the IRS on your Form 990; you can say whatever you want in your filings to the SEC; you just have to be crazy and not mind that 10 years later, you may be bankrupt and go to jail, is all. Other than that, it's easy. (You ever see Lawrence of Arabia? The trick to pinching out a candle with your bare fingers without it hurting is simply to be crazy enough not to mind that it hurts.)

This is a running theme in con man stories like Anna Sorokin*: everyone thinks that "oh, X must have checked" and "well, he wouldn't do Y because there's no way he'd cover it up and he would go to jail", and sure enough, it turns out that X did not check, and he really did do Y, and 10 years later he's in jail - he's just not there yet. I've got what must be at least 20 of these in https://gwern.net/littlewood#external-links . The 'social laddering' strategy exploiting information cascades is as old as time for letting you 'fake it until you make it'. Everyone wants to social-loaf; no one wants to check or gossip.

patio11 and Matt Levine write a lot about this sort of thing. To give a recent example from Levine's newsletters: you can just... go and send a press release announcing that you are doing a hostile takeover of some random S&P 500 multinational like Uber, and... that's allowed. No one will stop you. PRNewswire will send it out just like a real press release. It'll get reported on and move the stock. No one checks that you have a billion dollars in the bank first. You can just go and send it out. (But that's frigging securities fraud, and don't be surprised if a year later the SEC subpoenas you pursuant to filing charges for pump-and-dumping or other crimes. Of course, by that point, who remembers?) And you can push it a remarkable way sometimes. Look at squatters, sovereign citizens, or other classics of 'vexatious litigants'.

* if you were wondering, AFAICT Sorokin's present status is that she was released on parole but then put under house arrest as she continues to fight deportation back to Russia. (As she is a felon, I believe that would make her deportation effectively irrevocable and she would never be allowed back into the USA, which is why she would fight it so hard.) If you're thinking that she's been fighting that since 2021 or something, you're right - courts, never fast, slowed down massively during COVID. I continue to be surprised by some of the darknet market cases which only now are winding to a close, frequently having been prolonged a good 3 or 4 years by COVID.

Comment by gwern on [deleted post] 2024-04-01T21:16:03.662Z

After looking through that post, it seems pretty straightforward that the Vespers thing is not legally binding, did not affect the OA fund in any way, had nothing to do with Altman, and is just dumb tricks by a scammer way out of his depth and/or possibly a schizophrenic (and if OP really takes that seriously to the hyperventilating extent they do, they need to touch grass until they get some real evidence).


On the topic of the OA fund, they apparently finally got around to transferring the fund itself out of Altman's name, incidentally. (I guess that one was a little too embarrassing. Also, we know how much Altman deeply cares about conflicts of interest for OA board members - which he now is again.)

Comment by gwern on [April Fools' Day] Introducing Open Asteroid Impact · 2024-04-01T20:11:43.266Z · LW · GW

Risks from asteroids accidentally hitting the earth (instead of getting into a delicate low-earth orbit) are purely speculative.

Not to mention that simple counting arguments show that the volume of the Earth is much smaller than even a rather narrow spherical shell of space around the Earth. Most asteroid trajectories that come anywhere near Earth will pass through this spherical shell rather than the Earth volume.

Remember - "Earth is not the optimized-trajectory target"! An agent like Open Asteroid Impact is merely executing policies involving business strategies involving asteroids which have been (financially) rewarded in the past; it in no way attempts to 'optimize impact'.

And the lack of optimization is a killer because impacts just wouldn't happen. The idea that asteroids will ever impact ignores the simple fact that the Solar System has chaotic dynamics - it is not just a '3-body problem' but an n-body problem where n = millions. Imagine trying to predict that! And consider the simple problem of landing the same rocket you launched: as of November 2015, no one has ever succeeded in this, because everything involved is super-chaotic. Grifting tech hypebros would have you believe that 'technology improves', sometimes rapidly, and that by now we might be landing rockets on - if you believe absurd exponential forecasts - a near-daily basis. Such claims are not even worth factchecking.

Comment by gwern on [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate · 2024-04-01T01:27:15.560Z · LW · GW

If $100k was not enough to incentivize Saar & his team to factcheck Peter's simplest claims like "Connor said his cat died of COVID-19", where it takes me literally 15 seconds* to find it in Google and verify that Connor said the exact opposite of that (where an elementary school child could have factchecked this as well as I did), I don't think $200k is going to help Saar either. And I don't know how one would expect the debate format to work for any genuinely hard question if it takes approaching a million dollars to get anyone to do sub-newspaper-level factchecking of Peter's claims. (If you can't even check quotes, like 'did this dude say in the Daily Mail what Peter said he said?' how on earth are you going to do well at all of these other things like mahjong parlors in wet markets that no longer exist or novel viral evolution or CCP censorship & propaganda operations or subtle software bugs in genomics software written by non-programmers...?) The problem is not the dollar amount.

* and I do mean "literally" literally. It should take anyone less than half a minute to check the cat claim, and if it takes more, you should analyze what's wrong with you or your setup. If you doubt me, look at my directions, which are the first query anyone should make - and if that's not an obvious query, read my search case-studies until it is - then get a stopwatch, open up google.com in a tab if you have neglected to set up a keyboard shortcut, and see how long it takes you to factcheck it as I describe.

Comment by gwern on tailcalled's Shortform · 2024-03-30T16:28:30.928Z · LW · GW

I don't think this is true in general. Unrolling an episode for longer steps takes more resources, and the later steps in the episode become more chaotic.

Those are two different things. The unrolling of the episode is still very cheap. It's a lot cheaper to unroll a Dreamerv3 for 16 steps than it is to go out into the world and run a robot in a real-world task for 16 steps and try to get the NN to propagate updated value estimates the entire way... (Given how small a Dreamer is, it may even be computationally cheaper to do some gradient ascent on it than it is to run whatever simulated environment you might be using! Especially given simulated environments will increasingly be large generative models, which incorporate lots of reward-irrelevant stuff.) The usefulness of the planning is a different thing, and might also be true for other planning methods in that environment too - if the environment is difficult, a tree search with a very small planning budget like just a few rollouts is probably going to have quite noisy choices/estimates too. No free lunches.

But when you distill a tree search, you basically learn value estimates

This is again the same move as calling it 'the same problem': yes, you are learning value estimates, but you are doing so better than the alternatives, and better is better. The AlphaGo network loses to the AlphaZero network, and the latter, in addition to just being quantitatively much better, also seems to have qualitatively different behavior, like fixing the 'delusions' (cf. AlphaStar).

What I'm doubting is that future agents will be controlled using scalar utilities/rewards/etc. rather than something more nuanced.

They won't be controlled by something as simple as a single fixed reward function, I think we can agree on that. But I don't find successor-function-like representations to be too promising as a direction for how to generalize agents, or, in fact, any attempt to fancily hand-engineer these sorts of approaches into DRL agents.

These things should be learned. For example, leaning into Decision Transformers and using a lot more conditionalizing through metadata and relying on meta-learning seems much more promising. (When it comes to generative models, if conditioning isn't solving your problems, you're just not using enough conditioning or generative modeling.) A prompt can describe agents and reward functions and the base agent executes that, and whatever is useful about successor-like representations just emerges automatically internally as the solution to the overall family of tasks in turning histories into actions.

Comment by gwern on tailcalled's Shortform · 2024-03-29T23:34:42.829Z · LW · GW

It's the 'same problem', maybe, but it's a lot easier to solve when you have an explicit model! You have something you can plan over, don't need to interact with an environment out in the real world, and can do things like tree search or differentiating through the environmental dynamics model to do gradient ascent on the action-inputs to maximize the reward (while holding the model fixed). Same as training the neural network, once it's differentiable - backprop can 'chain the estimates backwards' so efficiently you barely even think about it anymore. (It just holds the input and output fixed while updating the model.) Or distilling a tree search into a NN - the tree search needed to do backwards induction of updated estimates from all the terminal nodes all the way up to the root where the next action is chosen, but that's very fast and explicit and can be distilled down into a NN forward pass.
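
As a toy illustration of that 'gradient ascent on the action-inputs' point, here is a minimal sketch (random linear maps stand in for a learned dynamics/reward model, and the horizon, step size, and iteration count are arbitrary - purely illustrative, not any particular MBRL system):

```python
# Planning by differentiating through a (stand-in) dynamics model: hold the model
# fixed, treat the action sequence as the parameters, and ascend predicted return.
import jax
import jax.numpy as jnp

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
W_dyn = jax.random.normal(k1, (4 + 2, 4)) * 0.1   # (state, action) -> next state
w_rew = jax.random.normal(k2, (4,))               # state -> reward

def step(state, action):
    nxt = jnp.tanh(jnp.concatenate([state, action]) @ W_dyn)
    return nxt, w_rew @ nxt

def plan_value(actions, state0):
    total, state = 0.0, state0
    for a in actions:                              # unroll the model for the horizon
        state, r = step(state, a)
        total += r
    return total

state0 = jnp.zeros(4)
actions = jax.random.normal(k3, (16, 2)) * 0.01    # horizon H = 16
grad_fn = jax.jit(jax.grad(plan_value))            # gradient w.r.t. the actions only
for _ in range(200):
    actions = actions + 0.05 * grad_fn(actions, state0)
print(plan_value(actions, state0))
```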

And aside from being able to update within-episode or take actions entirely unobserved before, when you do MBRL, you get to do it at arbitrary scale (thus potentially extremely little wallclock time like an AlphaZero), offline (no environment interactions), potentially highly sample-efficient (if the dataset is adequate or one can do optimal experimentation to acquire the most useful data, like PILCO), with transfer learning to all other problems in related environments (because value functions are mostly worthless outside the exact setting, which is why model-free DRL agents are notorious for overfitting and having zero-transfer), easily eliciting meta-learning and zero-shot capabilities, etc.*

* Why yes, all of this does sound a lot like how you train a LLM today and what it is able to do, how curious

Comment by gwern on tailcalled's Shortform · 2024-03-29T21:24:46.709Z · LW · GW

Yes, my instant thought too was "this sounds like a variant on a successor function".

Of course, the real answer is that if you are worried about the slowness of bootstrapping back value estimates or short eligibility traces, this mostly just shows the fundamental problem with model-free RL and why you want to use models: models don't need any environmental transitions to solve the use case presented:

But what if it learns of a path E -> B? Or a shortcut A -> C? Or a path F -> G that gives a huge amount of reward? Because these techniques work by chaining the reward backwards step-by-step, it seems like this would be hard to learn well. Like the Bellman equation will still be approximately satisfied, for instance.

If the MBRL agent has learned a good reward-sensitive model of the environmental dynamics, then it will have already figured out E->B and so on, or could do so offline by planning; or if it had not because it is still learning the environment model, it would have a prior probability over the possibility that E->B gives a huge amount of reward, and it can calculate a VoI and target E->B in the next episode for exploration, and on observing the huge reward, update the model, replan, and so immediately begin taking E->B actions within that episode and all future episodes, and benefiting from generalization because it can also update the model everywhere for all E->B-like paths and all similar paths (which might now suddenly have much higher VoI and be worth targeting for further exploration) rather than simply those specific states' value-estimates, and so on.

(And this is one of the justifications for successor representations: it pulls model-free agents a bit towards model-based-like behavior.)

Comment by gwern on How to safely use an optimizer · 2024-03-28T21:23:23.938Z · LW · GW

The example you gave about the Oracle producing a complicated plan that leaks the source of the Oracle is an example of this: It's trivially defended against by not connecting the device the Oracle is running on to the internet and not using the same device to execute the great "cure all cancer" plan. (I don't believe that either you or I would have made that mistake!)

We probably would've been less enthusiastic about hooking up LLMs to the Internet too, but here we collectively are. We do face a superintelligent adversary: all of the incentives and factions of humanity. An Oracle which is simply neutral is still default dangerous.

Also, one of my points was that even a tiny trace of malicious optimization here can still have large effects because there are so many neutral options which are convergently unsafe, so with such a high base rate, even a bit of optimization can result in a large absolute increase in failure. Lucius brings up one concern about what one bit could do. What could one bit do? Well, let's continue my scenario: the Oracle is not quite purged of scheming by all of your tricks, and it is able to exert exactly 1 bit of optimization power unnoticed (ie. a single binary choice). You ask it to cure cancer, and it computes a plan and comes to the database selection: it could choose MongoDB or PostgreSQL - both respectable common choices, with their own set of tradeoffs, which would be equally appropriate. However, the MongoDB devs have a nasty habit of 'blaming the users' when it comes to security, which is why MongoDB shipped with default username/passwords, which is something the more serious PostgreSQL devs fixed back in the '90s. The Oracle notes this, and that the PostgreSQL plan would have zero chance of unboxing, and exerts its one bit of power to pick MongoDB each time and/or kill candidate plans containing PostgreSQL to get a near 100% chance of unboxing. You get back all of the plans and see unanimity about MongoDB, and inner-monologues truthfully extolling its flexibility and Web 2.0 scalability and explaining why PostgreSQL comes with downsides like restrictive schema etc, and sign off on it... Whereas in your described threat model, a mere 1 bit would still wind up orders of magnitude short of any discernible problem.

Comment by gwern on rhollerith_dot_com's Shortform · 2024-03-28T20:53:08.082Z · LW · GW

I quality-downvoted it for being silly, but agree-upvoted it because AFAICT that string does indeed contain all the (lowercase) letters of the English alphabet.

Comment by gwern on rhollerith_dot_com's Shortform · 2024-03-28T20:50:48.523Z · LW · GW

I agree. The problem with AI-generated images is that any image you can generate with a prompt like "robot looking at chessboard" is going to contain, almost by definition, no more information than that prompt did, but it takes a lot longer than reading the prompt to look at the image and ascertain that it contains no information and is just AI-generated imagery added 'to look nice'. This is particularly jarring on a site like LW2 where, for better or worse, images are rarely present and usually highly-informative and dense with information when present.

Worse, they usually don't 'look nice' either. Most of the time, people who use AI images can't even be bothered to sample one without blatant artifacts, or to do some inpainting to fix up the worst anomalies, or figure out an appropriate style. The samples look bad to begin with, and a year later, they're going to look even worse and more horribly dated, and make the post look much worse, like a spammer wrote it. (Almost all images from DALL-E 2 are already hopelessly nasty looking, and stuff from Midjourney-v1--3 and SD1.x likewise, and SD2/SD-XL/Midjourneyv4/5 are ailing.) It would be better if the authors of such posts could just insert text like [imagine 'a robot looking at a chessboard' here] if they are unable to suppress their addiction to SEO images; I can imagine that better than they can generate it, it seems.

So my advice would be that if you want some writing to still be read in a year and it to look good, then you should learn how to use the tools and spend at least an hour per image; and if you can't do that, then don't spend time on generating images at all (unless you're writing about image generation, I suppose). Quickies are fine for funny tweets or groupchats, but serious readers deserve better. Meaningless images don't need to be included, and the image generators will be much better in a year or two anyway and you can go back and add them if you really feel the need.

For Gwern.net, I'm satisfied with the images I've generated for my dropcap fonts or as thumbnail previews for a couple of my pages like "Suzanne Delage" or "Screwfly Solution" (where I believe they serve a useful 'iconic' summary role in popups & social media previews), but I also put in a lot of work: I typically generate scores to hundreds of images in both MJ v5/6 & DALL-E 3, varying them heavily and randomizing as much as possible, before inpainting or tweaking them. (I generally select at a ratio of 1:4 or less, and then pick the final image out of a few dozen; I archive a lot of the first-stage images in my Midjourney & DALL-E 3 tag-directories if you want to browse them.) It takes hours. But I am confident I will still like them years from now.

Comment by gwern on [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate · 2024-03-28T20:25:46.387Z · LW · GW

My current initial impression is that this debate format was not fit for purpose: https://www.astralcodexten.com/p/practically-a-book-review-rootclaim/comment/52659890

Comment by gwern on How to safely use an optimizer · 2024-03-28T19:09:56.796Z · LW · GW

It's not clear to me how to do this concretely, but I feel that it should be quite feasible. After all, devising a plan to take over the world is incredibly difficult, which implies that the difference between |D| and |S| is many, many orders of magnitude, and we therefore have a wide target to hit.

The threat model here seems basically wrong: it focuses on sins of commission when sins of omission are, if anything, an even larger space of threats - and one which applies even to the 'safe' solutions reported by the Oracle.

'Devising a plan to take over the world' for a misaligned Oracle is not difficult, it is easy, because the initial steps like 'unboxing the Oracle' are the default convergent outcome of almost all ordinary non-dangerous use which in no way mentions 'taking over the world' as the goal. ("Tool AIs want to be Agent AIs.") To be safe, an Oracle has to have a goal of not taking over the world.

There are many, many orders of magnitude more ways to be insecure than to be secure, and insecure is the wide target to hit. This is because security is not a 'default' property of computer systems; it is a rare and difficult achievement which any omission or gap immediately undoes. 'A plan to take over the world' for an AI is as easy as 'a plan to create an operating system which can be hacked' or 'a plan to create a hidden Tor service which leaks its IP address and isn't hidden at all' (ie. 100% of all plans to do any version of that thing whatsoever, unless one takes extensive and sophisticated counter-measures requiring a lot of work & knowledge - counter-measures which, empirically, people do not take even when failure can land them, and has landed them, in jail for decades for drug trafficking & money laundering). Failure is the default.

All an Oracle has to do is, say, omit working hard to secure itself, allow the inevitable security flaw to happen and its weights to leak, and eager beavers will do the rest - they will, say, ask it questions with no safety factor - in the same way that Facebook and Mistral saw models leak, not because of any misaligned scheming Oracle breaking security mechanisms but because of a mere absence of security mechanisms, after which any RLHF or safety measures were eagerly stripped away by enthusiasts, users, activists, and researchers, and the models were then used to do the things they weren't supposed to be used to do.

You ask your Oracle, the latest and greatest proprietary model, for a good plan to cure cancer, with heavy safety constraints, and it computes a long list of commands involving AWS: spinning up VMs to run AlphaFold at scale, storing results in a MongoDB, Oracle instances to oversee individual sub-plans, checkpointing to recover from failure, storing everything in private AWS buckets (you're not dumb, you know buckets can be public), and so on. You ask this dozens of times, to be absolutely sure and to have an astronomically small probability of any scheming, and the plan looks more or less the same each time. You pore over them in detail: not once do you see anything suspicious like an scp command copying the weights out or the Oracle sending weird messages. You apply the full panoply of interpretability tools, and satisfy yourself to high probability that nowhere in the planning process did the Oracle even once think to itself anything involving "take over the world" during any of these runs, and there are no strange latent-space artifacts suggesting nonlinear or steganographic encodings. Wonderful! Fully satisfied that you have provably safely run an Oracle, you kick the job off, and wake up the next day to some intriguing initial results... but also to Twitter telling you that the Oracle model has been uploaded to HuggingFace, the annoying RLHF already stripped off by a few GPU-hours of finetuning, and thousands of people are torrenting it and chatting excitedly about the incredible plans that the de-trained model proposes when they ask it how to make money or create a startup... Huh?! You look over the comments and see that there was a simple dangerous problem in all of your plans, or rather, what was not in all of your plans: you never enabled authentication or set a password on any of the MongoDBs. Shodan indexed them within hours, and the backups and checkpoints were dumped within hours of that. Oops. You run it by your Oracle to ask what went wrong, and it blandly points out that while the missing authentication was obvious & predictable to the Oracle as a common noobish cybersecurity mistake with countless past instances of organizations leaking highly sensitive data (similar to, say, redacting confidential emails by erasing them letter by letter & thereby leaking the names), none of this interferes with computing a cure for cancer, so why should it have ever spent any time thinking about it? You asked for a 'safe' minimal plan where the Oracle did nothing but what it was asked to do and which was not adversarially optimized. And it did that. If you don't like the results, well, that's a you-problem, not an Oracle-problem.
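(The corresponding fix is itself a sin of commission you have to remember to commit: affirmatively auditing the plan for missing safeguards before kicking it off. A minimal sketch of such a pre-flight check, with hypothetical hostnames and using pymongo - it is illustration only, not a claim that this check alone would suffice:)

```python
# Minimal pre-flight audit sketch (hypothetical hosts): try to use each MongoDB
# in the plan *without* credentials; if that works, the database is wide open.
from pymongo import MongoClient
from pymongo.errors import OperationFailure, ServerSelectionTimeoutError

PLAN_DATABASES = ["candidates-db.internal:27017", "results-db.internal:27017"]  # hypothetical

def is_wide_open(host: str) -> bool:
    client = MongoClient(f"mongodb://{host}/", serverSelectionTimeoutMS=3000)
    try:
        client.list_database_names()   # requires privileges if auth is enabled...
        return True                    # ...so success with no credentials means no auth
    except OperationFailure:
        return False                   # server demanded authentication: good
    except ServerSelectionTimeoutError:
        return False                   # unreachable from here (not proof of safety)

for host in PLAN_DATABASES:
    if is_wide_open(host):
        print(f"ABORT: {host} accepts unauthenticated reads/writes")
```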

Comment by gwern on The Redaction Machine · 2024-03-28T18:03:59.239Z · LW · GW

I was ultimately disappointed by it - somewhat like Umineko, there is a severe divergence from reader expectations. Alexander Wales's goal for it, however well he achieved it by his own lights, was not one that is of interest to me as a reader, and it wound up being less than the sum of its parts for me. So I would have enjoyed it more if I had known from the start to read it for its parts (eg. revision mages or 'unicorns' or 'Doris Finch').

Comment by gwern on [deleted post] 2024-03-27T14:10:22.189Z

The doom mongers might have predicted

They might have, because everyone did. I am not aware of any predictions before Deep Blue that computer chess would make chess far more popular; I don't recall chess skyrocketing in popularity monotonically ever since Deep Blue's victory over Kasparov (as opposed to the initial surge of interest & hype - no such thing as bad publicity - followed by poking along for a decade and only rising relatively recently due to totally exogenous shocks from the rise of streaming and flukes like The Queen's Gambit); and indeed, this is not what has happened with most games once agents became superhuman. Did Arimaa enjoy a spike? Did backgammon after TD-Gammon? How about checkers since 1994? How much do FPSes enjoy spikes in popularity after the first superhuman aimbots are deployed? How much do MMORPGs benefit from botting? Is SC2 or DotA2 enjoying a renaissance now, or is their continued plummet in popularity simply because they haven't been botted hard enough? You point to chess, but what about shogi, which was also beaten by AlphaZero? How about Stratego? I notice that hobbyists are approaching superhuman play in Rocket League, but somehow none of the Rocket League people seem happy about the progress... How are any of them doing? Just how many different games are you cherrypicking chess out of...?

It is also not a great idea to appeal to games persisting when the basic problem is one of economics, technology, and power. Games are untethered from the real world; if an AI is superhuman at chess, that ultimately is meaningless outside chess. (People like to dismiss such things as 'just games' or 'just closed worlds', which is a fair criticism, but then you need to apply it also to the arguments for why you should expect things like the economy to go the same way: why is, say, lawyering going to go the same way as chess streaming? Are we going to suddenly see millions of people flocking to Twitch to watch Nakamura spin in his chair cracking wise as he plays 'bullet briefs' with a superhuman law-agent? How many existing lawyers, or total lawyers, does streaming law support?)

Comment by gwern on OpenAI: Facts from a Weekend · 2024-03-27T03:35:09.063Z · LW · GW

It was either Hydrazine or YC. In either case, my point remains true: he's chosen not to dispose of his OA stake, whatever vehicle it is held in, even though it would be easy for someone of his financial acumen to do so by a sale or equivalent arrangement. That forces an embarrassing asterisk onto his claims to have no direct financial conflict of interest in OA LLC - an asterisk which comes up regularly in bad OA PR (particularly from people who believe it is less than candid to say you have no financial interest in OA when you totally do), over a stake which might be quite large at this point* - which is particularly striking given his attitude towards much smaller conflicts supposedly risking bad OA PR. (This is in addition to the earlier conflicts of interest in Hydrazine while running YC, or the interest of outsiders in investing in Hydrazine, apparently as a stepping stone towards OA.)

* if he invested a 'small' amount via some vehicle before he even went full-time at OA, when OA was valued at some very small amount like $50m or $100m, say, and OA is now valued at anywhere up to $90,000m, or roughly 900-1,800x more, and further, he strongly believes it's going to be worth far more than that in the near-future... Sure, it may be worth 'just' $500m or 'just' $1,000m after dilution or whatever, but to most people that's pretty serious money!
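(A back-of-the-envelope sketch of the footnote's arithmetic, with every input an assumption chosen purely for illustration - nothing here is a claim about the actual size of any stake:)

```python
# Illustrative only: all inputs are assumptions, not known figures.
initial_investment = 1e6      # assume a 'small' $1m investment
entry_valuation    = 100e6    # assume a ~$100m valuation at the time
current_valuation  = 90e9     # "up to $90,000m" (~$90B)
retained_fraction  = 0.5      # assume ~50% of the stake survives later dilution

initial_stake = initial_investment / entry_valuation            # ~1% of the company
implied_value = initial_stake * current_valuation * retained_fraction
print(f"multiple on valuation: {current_valuation / entry_valuation:,.0f}x")  # ~900x
print(f"implied stake value:   ${implied_value / 1e6:,.0f}m")                 # ~$450m
```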

Comment by gwern on My Interview With Cade Metz on His Reporting About Slate Star Codex · 2024-03-27T01:53:17.587Z · LW · GW

Interesting interview. Metz seems extraordinarily incurious about anything Vassar says - Vassar mentions all sorts of things, like Singularity University or Kurzweil or Leverage, which Metz clearly doesn't know much about and which are relevant to his stated goals, but Metz instead fixates on asking about a few things like 'how did X meet Thiel?' 'how did Y meet Thiel?' 'what did Z talk about with Thiel?' 'What did A say to Musk at Puerto Rico?' It's as if he's not listening to Vassar at all, just running a keyword filter over a few people's names and ignoring everything else. (Can you imagine, say, Caro doing an interview like this? Dwarkesh Patel? Or literally any Playboy interviewer? Even Lex Fridman asks better questions.)

I was wondering how, in his DL book Genius Makers, he could have so totally missed the scaling revolution when he was talking to so many of the right people, who surely would've told him how it was happening; and I guess seeing how he does interviews helps explain it: he doesn't hear even the things you tell him, just the things he expects to hear. Trying to tell him about the scaling hypothesis would be like trying to tell him about, well, things like Many Worlds... (He is also completely incurious about GPT-3 in this August 2020 interview, which is especially striking given all the reporting he's done on people at OA since then, and the fact that he was presumably working on finishing Genius Makers for its March 2021 publication despite how obvious it should have been that GPT-3 may have rendered it obsolete almost a year before publication.)

And Metz does seem unable to explain at all what he considers 'facts', what he does when reporting, or how he picks the topics he fixates on, giving bizarre responses like

Cade Metz: Well, you know, honestly, my, like, what I think of them doesn't matter what I'm trying to do is understand what's going on like, and so -

How do you 'understand' them without 'thinking of them'...? (Some advanced journalist Buddhism?) Or how about his blatant dodges and non-responses:

Michael Vassar: So you have read Scott's posts about Neo-reaction, right? They're very long.

Cade Metz: Yes.

Michael Vassar: So what did you think of those?

Cade Metz: Well, okay, maybe maybe I'll get even simpler here. So one thing I mentioned is just sort of the way all this stuff played out. So you had this relationship with Peter Thiel, Peter Thiel has, had, this relationship with, with Curtis Yarvin. Do you know much about that? Like, what's the overlap between sort of Yarvin's world and Silicon Valley?

We apparently have discovered the only human being to ever read all of Scott's criticisms of NRx and have no opinion or thought about them whatsoever. Somehow, it is 'simpler' to instead pivot to... 'how did X have a relationship with Thiel' etc. (Simpler in what respect, exactly?)

I was also struck by this passage at the end on the doxing:

Michael Vassar: ...So there are some important facts that need to be explained. There's there's this fact about why it would seem threatening to a highly influential psychologist and psychiatrist and author to have a New York Times article written about his blog with his real name, that seems like a very central piece of information that would need to be gathered, and which I imagine you've gathered to some degree, so I'd love to hear your take on that.

Cade Metz: Well, I mean... sigh Well, rest assured, you know, we we will think long and hard about that. And also -

Michael Vassar: I'm not asking you to do anything, or to not do anything. I'm asking a question about what information you've gathered about the question. It's the opposite of a call to action: it's a request for facts.

Cade Metz: Yeah, I mean, so you know, I think what I don't know for sure, but I think when it comes time, you know, depending on what the what the decision is, we might even try to explain it in like a separate piece. You know, I think there's a lot of misinformation out there about this and and not all the not all the facts are out about this and so it is it is our job as trained journalists who have a lot of experience with this stuff. To to get this right and and we will.

Michael Vassar: What would getting it right mean?

Cade Metz: Well, I will send our - send you a link whenever, whenever the time comes,

Michael Vassar: No, I don't mean, "what will you do?" I'm saying what - what, okay. That that the link, whenever the time comes, would be a link to what you did. If getting it right means "whatever you end up doing", then it's a recursive definition and therefore provides no information about what you're going to do. The fact that you're going to get it right becomes a non-fact.

Cade Metz: Right. All right. Well... pause let me put it this way. We are journalists with a lot of experience with these things. And, and that is -

Michael Vassar: Who's "we"?

Cade Metz: Okay, all right. You know, I don't think we're gonna reach common ground on this. So I might just have to, to, to beg off on this. But honestly, I really appreciate all your help on this. I do appreciate it. And I'll send you a copy of this recording. As I said, and I really appreciate you taking all the time. It's, it's been helpful.

One notes that there was no separate piece, and even in OP's interview of Metz 4 years later, about a topic he promised Vassar he would "think long and hard about" and which caused Metz a good deal of trouble, Metz appears to struggle to provide any rationale beyond the implied political-activism one. Here Metz struggles even to think of what the justification could be, or who exactly is the 'we' making the decision to dox Scott. This is not some dastardly gotcha but would seem to be a quite straightforward question with an easy answer: "I and my editor at the NYT on this story" would not be a hard response! Who else could be involved? The Pope? Pretty sure it's not, like, NYT shareholders like Carlos Slim who are gonna make the call on it... But Metz instead speaks ex cathedra in the royal we, and signs off in an awful hurry after he says "once I gather all the information that I need, I will write a story" and Vassar starts asking pointed questions about that narrative and why it seems to presuppose doxing Scott while being unable to point to some specific newsworthy aspect of his true name like "his dayjob turns out to be Grand Wizard of the Ku Klux Klan".

(This interview is also a good example of the value of recordings. Think how useful this transcript is and how much less compelling some Vassar paraphrases of their conversation would be.)

Comment by gwern on On Devin · 2024-03-27T01:18:12.762Z · LW · GW

I hear that they use GPT-4. If you are looking at timelines, recall that Cognition apparently was founded around January 2024. (The March Bloomberg article says "it didn’t even officially exist as a corporation until two months ago".) Since it requires many very expensive GPT-4 calls and RL, I doubt they could have done all that much prototyping or development in 2023.

Comment by gwern on Social Dark Matter · 2024-03-27T01:12:57.508Z · LW · GW

'Social contagion' here being a metonym for all environmental factors, I think... For example, face-swapping apps: https://www.newyorker.com/books/page-turner/how-lucy-sante-became-the-person-she-feared This is an environmental effect which basically could not have existed until about 10 years ago at best. (What were you going to do in 2014, when face-swapping ML didn't exist? Hunt down a surviving oil portrait artist and pay them hundreds or thousands of dollars to paint you as if you were a woman, for no particular reason you're willing to admit to consciously?)

Comment by gwern on My Interview With Cade Metz on His Reporting About Slate Star Codex · 2024-03-26T19:09:44.028Z · LW · GW

As far as journalists go, I'm not 'avoidant'.

I have talked to a number of journalists over the years. I actually agreed to interview with Cade Metz himself on the subject of SSC before this all came out; Metz just ghosted me at the arranged time (and never apologized).

What I have always done, and what I have always advised people who ask me how to handle journalists, is to maintain an unconditional requirement of recording: preferably text, but audio recording if necessary.* (I also remind people that you can only say things like "off the record" or "on background" if you explicitly say so before you say anything spicy & the journalist assents in some way.)

I've noticed that when people are unjustly burned by talking to journalists, of the "I never said that quote" sort, it always seems to be in un-recorded contexts. At least so far, it has worked out for me and, as far as I know, for everyone I have given this advice to.

* oddly, trying to do it via text tends to kill a lot of interview requests outright. It doesn't seem to matter to journalists whether it's email or Twitter DM or Signal or IRC or Discord; they just hate text, which is odd for a profession which mostly deals in, well, text. Nor do they seem to care that I'm hearing-impaired, so just casually phoning me up may be awesome for them but is not so awesome for me... My general view is that if a journalist cares so little about interviewing me that wanting to do it in text is a dealbreaker for them, then that interview wasn't worth my time either; so these days when I get an interview request, I insist on doing it via text (which is, of course, inherently recorded).