Posts

Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red 2025-04-21T03:52:34.759Z
Is Gemini now better than Claude at Pokémon? 2025-04-19T23:34:43.298Z
So how well is Claude playing Pokémon? 2025-03-07T05:54:45.357Z
Julian Bradshaw's Shortform 2025-02-11T17:47:54.657Z
Altman blog on post-AGI world 2025-02-09T21:52:30.631Z
Sam Altman's Business Negging 2024-09-30T21:06:59.184Z
Former OpenAI Superalignment Researcher: Superintelligence by 2030 2024-06-05T03:35:19.251Z
An AI risk argument that resonates with NYTimes readers 2023-03-12T23:09:20.458Z
WaPo: "Big Tech was moving cautiously on AI. Then came ChatGPT." 2023-01-27T22:54:50.121Z

Comments

Comment by Julian Bradshaw on Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red · 2025-04-22T19:59:11.645Z · LW · GW

Yeah it is confusing. You'd think there's tons of available data on pixelated game screens. Maybe training on it somehow degrades performance on other images?

Comment by Julian Bradshaw on Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red · 2025-04-22T19:33:19.146Z · LW · GW

I'll let you know. They're working on open-sourcing their scaffold at the moment.

Comment by Julian Bradshaw on Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red · 2025-04-21T16:31:30.575Z · LW · GW

Actually another group released VideoGameBench just a few days ago, which includes Pokémon Red among other games. Just a basic scaffold for Red, but that's fair.

As I wrote in my other post:

Why hasn't anyone run this as a rigorous benchmark? Probably because it takes multiple weeks to run a single attempt, and moreover a lot of progress comes down to effectively "model RNG" - ex. Gemini just recently failed Safari Zone, a difficult challenge, because its inventory happened to be full and it couldn't accept an item it needed. And ex. Claude has taken wildly different amounts of time to exit Mt. Moon across attempts depending on how he happens to wander. To really run the benchmark rigorously, you'd need a sample of at least 10 full playthroughs, which would take perhaps a full year, at which point there'd be new models.

I think VideoGameBench has the right approach, which is to give only a basic scaffold (less than described in this post), and when LLMs can make quick, cheap progress through Pokémon Red using that (not taking weeks and tens of thousands of steps), we'll know real progress has been made.

Comment by Julian Bradshaw on Julian Bradshaw's Shortform · 2025-04-20T19:44:46.420Z · LW · GW

Re: biosignatures detected on K2-18b, there have been a couple of popular takes saying this solves the Fermi Paradox: K2-18b is so big (8.6x Earth mass) that you can't get to orbit from it, and maybe most life-bearing planets are like that.

This is wrong on several counts:

  1. You can still get to orbit there; it's just much harder (only ~1.3g, because of the larger radius; see the quick check after this list) (https://x.com/CheerupR/status/1913991596753797383)
  2. It's much easier for us to detect large planets than small ones (https://exoplanets.nasa.gov/alien-worlds/ways-to-find-a-planet), but we expect small ones to be common too (once detected you can then do atmospheric spectroscopy via JWST to find biosignatures)
  3. Assuming K2-18b does have life actually makes the Fermi paradox worse, because it strongly implies single-celled life is common in the galaxy, removing a potential Great Filter
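
For intuition on point 1, here's a rough back-of-envelope check. The ~8.6 Earth masses is from the discussion above; the ~2.6 Earth radii is the commonly reported estimate for K2-18b, so treat that figure as my assumption:

```python
# Rough surface gravity and escape velocity for K2-18b relative to Earth.
# mass ~8.6 M_earth (from the discussion above); radius ~2.6 R_earth (assumed,
# the commonly reported estimate).
mass_ratio = 8.6
radius_ratio = 2.6

surface_gravity = mass_ratio / radius_ratio**2        # g scales as M / R^2
escape_velocity = (mass_ratio / radius_ratio) ** 0.5  # v_esc scales as sqrt(M / R)

print(f"Surface gravity: ~{surface_gravity:.2f} g")         # ~1.3 g
print(f"Escape velocity: ~{escape_velocity:.2f}x Earth's")  # ~1.8x, i.e. ~20 km/s
```

So chemical rockets get much worse (the rocket equation is unkind to a ~1.8x escape velocity), but orbit is still reachable, which is the linked tweet's point.
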
Comment by Julian Bradshaw on Is Gemini now better than Claude at Pokémon? · 2025-04-20T19:00:42.171Z · LW · GW

I would say "agent harness" is a type of "scaffolding". I used it in this case because it's how Logan Kilpatrick described it in the tweet I linked at the beginning of the post.

Comment by Julian Bradshaw on Is Gemini now better than Claude at Pokémon? · 2025-04-20T16:16:30.184Z · LW · GW

I'm not sure that TAS counts as "AI" since they're usually compiled by humans, but the "PokeBotBad" you linked is interesting; I hadn't heard of that before. It's an Any% Glitchless speedrun bot that ran until ~2017 and managed a solid 1:48:27 time on 2/25/17, better than the human world record until 2/12/18. Still, I'd say this is more a programmed "bot" than an AI in the sense we care about.

Anyway, you're right that the whole reason the Pokémon benchmark exists is because it's interesting to see how well an untrained LLM can do playing it.

Comment by Julian Bradshaw on A Dissent on Honesty · 2025-04-17T21:58:24.557Z · LW · GW

since there's no obvious reason why they'd be biased in a particular direction

No I'm saying there are obvious reasons why we'd be biased towards truthtelling. I mentioned "spread truth about AI risk" earlier, but also more generally one of our main goals is to get our map to match the territory as a collaborative community project. Lying makes that harder.

Besides sabotaging the community's map, lying is dangerous to your own map too. As OP notes, to really lie effectively, you have to believe the lie. Well is it said, "If you once tell a lie, the truth is ever after your enemy."

But to answer your question, it's not wrong to do consequentialist analysis of lying. Again, I'm not Kantian: tell the guy here to randomly murder you whatever lie you need to survive. But I think there are a lot of long-term consequences in less thought-experimenty cases that'd be tough to measure.

Comment by Julian Bradshaw on A Dissent on Honesty · 2025-04-17T05:45:58.174Z · LW · GW

I'm not convinced SBF had conflicting goals, although it's hard to know. But more importantly, I don't agree rationalists "tend not to lie enough". I'm no Kantian, to be clear, but I believe rationalists ought to aspire to a higher standard of truthtelling than the average person, even if there are some downsides to that. 

Comment by Julian Bradshaw on A Dissent on Honesty · 2025-04-15T21:27:39.592Z · LW · GW

Have we forgotten Sam Bankman-Fried already? Let’s not renounce virtues in the name of expected value so lightly. 
 

Rationalism was founded partly to disseminate the truth about AI risk. It is hard to spread the truth when you are a known liar, especially when the truth is already difficult to believe. 

Comment by Julian Bradshaw on birds and mammals independently evolved intelligence · 2025-04-10T18:14:08.975Z · LW · GW

Huh, seems you are correct. They also apparently are heavily cannibalistic, which might be a good impetus for modeling the intentions of other members of your species…

Comment by Julian Bradshaw on birds and mammals independently evolved intelligence · 2025-04-10T05:48:47.236Z · LW · GW

Oh okay. I agree it's possible there's no Great Filter.

Comment by Julian Bradshaw on birds and mammals independently evolved intelligence · 2025-04-10T04:04:00.336Z · LW · GW

Dangit I can't cease to exist, I have stuff to do this weekend.

But more seriously, I don't see the point you're making? I don't have a particular objection to your discussion of anthropic arguments, but also I don't understand how it relates to the "what part of evolution/planetary science/sociology/etc. is the Great Filter" scientific question.

Comment by Julian Bradshaw on birds and mammals independently evolved intelligence · 2025-04-09T20:13:27.476Z · LW · GW

I think if you frame it as:

if most individuals exist inside the part of the light cone of an alien civilization, why aren't we one of them?

Then yes, 1.0 influence and 4.0 influence both count as "part of the light cone", and so for the related anthropic arguments you could choose to group them together.

But re: anthropic arguments,

Not only am I unable to explain why I'm an observer who doesn't see aliens

This is where I think I have a different perspective. Granting that anthropic arguments (here, about which observer you are and the odds of that) cause frustration and we don't want to get into them, I think there is an actual reason why we don't see aliens - maybe they aren't there, maybe they're hiding, maybe it's all a simulation, whatever - and there's no strong reason to assume we can't discover that reason. So, in that non-anthropic sense, in a more scientific inquiry sense, it is possible to explain why I'm an observer who doesn't see aliens. We just don't know how to do that yet. The Great Filter is one possible explanation behind the "they aren't there" answer, and this new information adjusts what we think the filters that would make up the Great Filter might be.

Another way to think about this: suppose we discover that actually science proves life should only arise on 1 in a googol planets. That introduces interesting anthropic considerations about how we ended up as observers on that 1 planet (can't observe if you don't exist, yadda yadda). But what I care about here is instead, what scientific discovery proved the odds should be so low? What exactly is the Great Filter that made us so rare?

Comment by Julian Bradshaw on birds and mammals independently evolved intelligence · 2025-04-09T19:50:34.877Z · LW · GW

I agree it's likely the Great Filter is behind us. And I think you're technically right, most filters are behind us, and many are far in the past, so the "average expected date of the Great Filter" shifts backward. But, quoting my other comment:

Every other possible filter would gain equally, unless you think this implies that maybe we should discount other evolutionary steps more as well. But either way, that’s still bad on net because we lose probability mass on steps behind us.

So even though the "expected date" shifts backward, the odds for "behind us or ahead of us" shift toward "ahead of us".

Let me put it this way: let's say we have 10 possible filters behind us, and 2 ahead of us. We've "lost" one filter behind us due to new information. So, 9 filters behind us gain a little probability mass, 1 filter behind us loses most probability mass, and 2 ahead of us gain a little probability mass. This does increase the odds that the filter is far behind us, since "animal with tool-use intelligence" is a relatively recent filter. But, because "animal with tool-use intelligence" was already behind us and a small amount of that "behind us" probability mass has now shifted to filters ahead of us, the ratio between all past filters and all future filters has adjusted slightly toward future filters.
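
A toy version of that renormalization, with made-up numbers (a uniform prior over 12 hypothetical filters, and the "tool-use intelligence" filter losing 90% of its weight):

```python
# Toy model: 12 candidate Great Filters, 10 behind us and 2 ahead of us,
# starting from a uniform prior. The new evidence (intelligence evolved twice)
# knocks most of the weight off one past filter; renormalize and compare.
n_past, n_future = 10, 2
prior = [1.0] * (n_past + n_future)   # uniform weights over the 12 candidates

posterior = prior.copy()
posterior[n_past - 1] *= 0.1          # "tool-use intelligence" loses 90% of its weight

def past_share(weights):
    return sum(weights[:n_past]) / sum(weights)

print(f"P(Great Filter is behind us): {past_share(prior):.3f} -> {past_share(posterior):.3f}")
# 0.833 -> 0.820, so the "ahead of us" share rises from ~16.7% to ~18.0%
```

The specific factors are invented; the point is just that knocking mass off any past filter nudges the past/future ratio toward the future, even while every remaining filter gains a little.
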

Comment by Julian Bradshaw on birds and mammals independently evolved intelligence · 2025-04-09T19:34:22.570Z · LW · GW

Interesting thought. I think you have a point about coevolution, but I don't think it explains away everything in the birds vs. mammals case. How much are birds really competing with mammals vs. other birds/other animals? Mammals compete with lots of animals, why did only birds get smarter? I tend to think intra-niche/genus competition would generate most of the pressure for higher intelligence, and for whatever reason that competition doesn't seem to lead to huge intelligence gains in most species.

(Re: octopus, cephalopods do have interactions with marine mammals. But also, their intelligence is seemingly different from that of mammals/birds - strong motor intelligence, but they're not really very social or cooperative. Hard to compare, but I'd put them in a lower tier than the top birds/mammals for the parts of intelligence relevant to the Fermi Paradox.)

In terms of the K-T event, I think it could plausibly qualify as a filter, but asteroid impacts of that size are common enough it can't be the Great Filter on its own - it doesn't seem the specific details of the impact (location/timing) are rare enough for that.

Comment by Julian Bradshaw on birds and mammals independently evolved intelligence · 2025-04-09T05:06:31.172Z · LW · GW

Two objections:

  1. Granting that the decision theory that would result from reasoning based on the Fermi Paradox alone is irrational, we'd still want an answer to the question[1] of why we don't see aliens. If we live in a universe with causes, there ought to be some reason, and I'd like to know the answer.
  2. "why aren't we born in a civilization which 'sees' an old alien civilization" is not indistinguishable from "why aren't we born in an old [alien] civilization ourselves?" Especially assuming FTL travel limitations hold, as we generally expect, it would be pretty reasonable to expect to see evidence of interstellar civilizations expanding as we looked at galaxies hundreds of millions or billions of lightyears away—some kind of obviously unnatural behavior, such as infrared radiation from Dyson swarms replacing normal starlight in some sector of a galaxy.[2] There should be many more civilizations we can see than civilizations we can contact. 
  1. ^

    I've seen it argued that the "Fermi Paradox" ought to be called simply the "Fermi Question" instead for reasons like this, and also that Fermi himself seems to have meant it as an honest question, not a paradox. However, it's better known as the Paradox, and Fermi Question unfortunately collides with Fermi estimation.

  2. ^

    It is technically possible that all interstellar civilizations don't do anything visible to us—the Dark Forest theory is one variant of this—but that would contradict the "old civilization would contact and absorb ours" part of your reasoning.

Comment by Julian Bradshaw on birds and mammals independently evolved intelligence · 2025-04-09T04:22:17.749Z · LW · GW

Yes. Every other possible filter would gain equally, unless you think this implies that maybe we should discount other evolutionary steps more as well. But either way, that’s still bad on net because we lose probability mass on steps behind us.

Comment by Julian Bradshaw on birds and mammals independently evolved intelligence · 2025-04-08T20:59:37.922Z · LW · GW

Couple takeaways here. First, quoting the article:

By comparing the bird pallium to lizard and mouse palliums, they also found that the neocortex and DVR were built with similar circuitry — however, the neurons that composed those neural circuits were distinct.

“How we end up with similar circuitry was more flexible than I would have expected,” Zaremba said. “You can build the same circuits from different cell types.”

This is a pretty surprising level of convergence for two separate evolutionary pathways to intelligence. Apparently the neural circuits are so similar that when the original seminal paper on bird brains was written in 1969, it just assumed there had to be a common ancestor, and that thinking felt so logical it held for decades afterward.

Obviously, this implies strong convergent pressures for animal intelligence. It's not obvious to me that artificial intelligence should converge in the same way, not being subject to the same pressures all animals face, but we should maybe expect biological aliens to have intelligence more like ours than we'd previously expected.

Speaking of aliens, that's my second takeaway: if decent-ish intelligence (birds like crows/ravens/parrots, plus mammals) has evolved twice on Earth, that drops the odds that the "evolve a tool-using animal with intelligence" filter is a strong Fermi Paradox filter. Thus, to explain the Fermi Paradox, we should posit increased odds that the Great Filter is in front of us. (However, my prior for the Great Filter being ahead of humanity is pretty low; we're too close to AI and the stars. Keep in mind that even a paperclipper has not been Filtered: a Great Filter prevents any intelligence from escaping Earth.)

Comment by Julian Bradshaw on AI 2027: What Superintelligence Looks Like · 2025-04-03T20:40:51.516Z · LW · GW

Both the slowdown and race models predict that the future of Humanity is mostly in the hands of the United States - the baked-in disadvantage in chips from existing sanctions on China is crippling within short timelines, and no one else is contending.

So, if the CCP takes this model seriously, they should probably blockade Taiwan tomorrow? It's the only fast way to equalize chip access over the next few years. They'd have to weigh the risks against the chance that timelines are long enough for their homegrown chip production to catch up, but there seems to be a compelling argument for a blockade now, especially considering the US has unusually tense relations with its allies at the moment.

China doesn't need to perform a full invasion, just a blockade would be sufficient if you could somehow avoid escalation... though I'm not sure that you could, the US is already taking AI more seriously than China is. (It's noteworthy that Daniel Kokotajlo's 2021 prediction had US chip sanctions happening in 2024, when they really happened in 2022.)

Perhaps more AI Safety effort should be going into figuring out a practical method for international cooperation; I worry we'll face war before we get AIs that can negotiate us out of it as described in the scenarios here.

Comment by Julian Bradshaw on Why do many people who care about AI Safety not clearly endorse PauseAI? · 2025-03-31T20:52:39.293Z · LW · GW

I'm generally pretty receptive to "adjust the Overton window" arguments, which is why I think it's good PauseAI exists, but I do think there's a cost in political capital to saying "I want a Pause, but I am willing to negotiate". It's easy for your opponents to cite your public Pause support and then say, "look, they want to destroy America's main technological advantage over its rivals" or "look, they want to bomb datacenters, they're unserious". (Yes, Pause as typically imagined requires international treaties; the attack lines would probably still work. There was tons of lying in the California SB 1047 fight, and we lost in the end.)

The political position AI safety has mostly taken instead on US regulation is "we just want some basic reporting and transparency" which is much harder to argue against, achievable, and still pretty valuable.

I can't say I know for sure this is the right approach to public policy. There's a reason politics is a dark art, there's a lot of triangulating between "real" and "public" stances, and it's not costless to compromise your dedication to the truth like that. But I think it's part of why there isn't as much support for PauseAI as you might expect. (the other main part being what 1a3orn says, that PauseAI is on the radical end of opinions in AI safety and it's natural there'd be a gap between moderates and them)

Comment by Julian Bradshaw on How I talk to those above me · 2025-03-30T20:23:09.339Z · LW · GW

So I realized Amad’s comment obsession was probably a defense against this dynamic - “I have to say something to my juniors when I see them”.

I think there's a bit of a trap here where, because Amad is known for always making a comment whenever he ends up next to an employee, if he then doesn't make a comment next to someone, it feels like a deliberate insult.

That said, I see the same behavior from US tech leadership pretty broadly, so I think the incentive to say something friendly in the elevator is pretty strong to start (norms of equality, first name basis, etc. in tech), and then once you start doing that you have to always do it to avoid insult.

Comment by Julian Bradshaw on Why do many people who care about AI Safety not clearly endorse PauseAI? · 2025-03-30T20:07:05.869Z · LW · GW

I think the concept of Pausing AI just feels unrealistic at this point.

  1. Previous AI safety pause efforts (GPT-2 release delay, 2023 Open Letter calling for a 6 month pause) have come to be seen as false alarms and overreactions
  2. Both industry and government are now strongly committed to an AI arms race
  3. A lot of the non-AI-Safety opponents of AI want a permanent stop/ban in the fields they care about, not a pause, so it lacks for allies
  4. It's not clear that meaningful technical AI safety work on today's frontier AI models could have been done before they were invented; therefore a lot of technical AI safety researchers believe we still need to push capabilities further before a pause would truly be useful 

PauseAI could gain substantial support if there's a major AI-caused disaster, so it's good that some people are keeping the torch lit for that possibility, but supporting it now means burning political capital for little reason. We'd get enough credit for "being right all along" just by having pointed out the risks ahead of time, and we want to influence regulation/industry now, so we shouldn't make Pause demands that get you thrown out of the room. In an ideal world we'd spend more time understanding current models, though.

Comment by Julian Bradshaw on Tracing the Thoughts of a Large Language Model · 2025-03-27T20:11:14.128Z · LW · GW

Copying over a comment from Chris Olah of Anthropic on Hacker News I thought was good: (along with parent comment)
    
fpgaminer

> This is powerful evidence that even though models are trained to output one word at a time

I find this oversimplification of LLMs to be frequently poisonous to discussions surrounding them. No user facing LLM today is trained on next token prediction.

    
 olah3

Hi! I lead interpretability research at Anthropic. I also used to do a lot of basic ML pedagogy (https://colah.github.io/). I think this post and its children have some important questions about modern deep learning and how it relates to our present research, and wanted to take the opportunity to try and clarify a few things.

When people talk about models "just predicting the next word", this is a popularization of the fact that modern LLMs are "autoregressive" models. This actually has two components: an architectural component (the model generates words one at a time), and a loss component (it maximizes probability).

As the parent says, modern LLMs are finetuned with a different loss function after pretraining. This means that in some strict sense they're no longer autoregressive models – but they do still generate text one word at a time. I think this really is the heart of the "just predicting the next word" critique.

This brings us to a debate which goes back many, many years: what does it mean to predict the next word? Many researchers, including myself, have believed that if you want to predict the next word really well, you need to do a lot more. (And with this paper, we're able to see this mechanistically!)

Here's an example, which we didn't put in the paper: How does Claude answer "What do you call someone who studies the stars?" with "An astronomer"? In order to predict "An" instead of “A”, you need to know that you're going to say something that starts with a vowel next. So you're incentivized to figure out one word ahead, and indeed, Claude realizes it's going to say astronomer and works backwards. This is a kind of very, very small scale planning – but you can see how even just a pure autoregressive model is incentivized to do it.

Comment by Julian Bradshaw on The principle of genomic liberty · 2025-03-20T23:33:57.392Z · LW · GW

Good objection. I think gene editing would be different because it would feel more unfair and insurmountable. That's probably not rational - the effect size would have to be huge for it to be bigger than existing differences in access to education and healthcare, which are not fair or really surmountable in most cases - but something about other people getting to make their kids "superior" off the bat, inherently, is more galling to our sensibilities. Or at least mine, but I think most people feel the same way.

Comment by Julian Bradshaw on The principle of genomic liberty · 2025-03-20T19:20:04.772Z · LW · GW

Yeah, I was referring to international sentiments. We'd want to avoid a "chip export controls" scenario, which I think would be tempting.

Comment by Julian Bradshaw on METR: Measuring AI Ability to Complete Long Tasks · 2025-03-20T04:36:24.234Z · LW · GW

Re: HCAST tasks, most are being kept private since it's a benchmark. If you want to learn more, here's METR's paper on HCAST.

Comment by Julian Bradshaw on The principle of genomic liberty · 2025-03-20T01:07:01.063Z · LW · GW

Thanks for the detailed response!

Re: my meaning, you got it correct here:

Spiritually, genomic liberty is individualistic / localistic; it says that if some individual or group or even state (at a policy level, as a large group of individuals) wants to use germline engineering technology, it is good for them to do so, regardless of whether others are using it. Thus, it justifies unequal access, saying that a world with unequal access is still a good world.

Re: genomic liberty makes narrow claims, yes I agree, but my point is that if implemented it will lead to a world with unequal access for some substantial period of time, and that I expect this to be socially corrosive.

 

Switching to quoting your post and responding to those quotes:

To be honest, mainly I've thought about inequality within single economic and jurisdictional regimes. (I think that objection is more common than the international version.)

Yeah that's the common variant of the concern but I think it's less compelling - rich countries will likely be able to afford subsidizing gene editing for their citizens, and will be strongly incentivized to do so even if it's quite expensive. So my expectation is that the intra-country effects for rich countries won't be as bad as science fiction has generally predicted, but that the international effects will be.

(and my fear is this would play into general nationalizing trends worldwide that increase competition and make nation-states bitter towards each other, when we want international cooperation on AI)

I am however curious to hear examples of technologies that {snip}

My worry is mostly that the tech won't spread "soon enough" to avoid socially corrosive effects, less so that it will never spread. As for a tech that never fully spread but should have benefitted everyone, all that comes to mind is nuclear energy.

So maybe developing the tech here binds it up with "all people should have this".

I think this would happen, but it would be expressed mostly resentfully, not positively.

The ideology should get a separate treatment--genomic liberty but as a positive right--what I've been calling genomic emancipation.

Sounds interesting!

Comment by Julian Bradshaw on The principle of genomic liberty · 2025-03-19T21:39:00.888Z · LW · GW

This is a thoughtful post, and I appreciate it. I don't think I disagree with it from a liberty perspective, and agree there are potential huge benefits for humanity here.

However, my honest first reaction is "this reasoning will be used to justify a world in which citizens of rich countries have substantially superior children to citizens of poor countries (as viewed by both groups)". These days, I'm much more suspicious of policies likely to be socially corrosive: they lead to bad governance at a time when, because of AI risk, we need excellent governance.

I'm sure you've thought about this question, it's the classic objection. Do you have any idea how to avoid or at least mitigate the inequality adopting genomic liberty would cause? Or do you think it wouldn't happen at all? Or do you think that it's simply worth it and natural that any new technology is first adopted by those who can afford it, and that adoption drives down prices and will spread the technology widely soon enough?

Comment by Julian Bradshaw on METR: Measuring AI Ability to Complete Long Tasks · 2025-03-19T19:24:13.713Z · LW · GW

Here's an interesting thread of tweets from one of the paper's authors, Elizabeth Barnes.
Quoting the key sections:

Extrapolating this suggests that within about 5 years we will have generalist AI systems that can autonomously complete ~any software or research engineering task that a human professional could do in a few days, as well as a non-trivial fraction of multi-year projects, with no human assistance or task-specific adaptations required.

However, (...) It’s unclear how to interpret “time needed for humans”, given that this varies wildly between different people, and is highly sensitive to expertise, existing context and experience with similar tasks. For short tasks especially, it makes a big difference whether “time to get set up and familiarized with the problem” is counted as part of the task or not.

(...)

We’ve tried to operationalize the reference human as: a new hire, contractor or consultant; who has no prior knowledge or experience of this particular task/codebase/research question; but has all the relevant background knowledge, and is familiar with any core frameworks / tools / techniques needed.

This hopefully is predictive of agent performance (given that models have likely memorized most of the relevant background information, but won’t have training data on most individual tasks or projects), whilst maintaining an interpretable meaning (it’s hopefully intuitive what a new hire or contractor can do in 10 mins vs 4hrs vs 1 week).

(...)

Some reasons we might be *underestimating* model capabilities include a subtlety around how we calculate human time. In calculating human baseline time, we only use successful baselines. However, a substantial fraction of baseline attempts result in failure. If we use human success rates to estimate the time horizon of our average baseliner, using the same methodology as for models, this comes out to around 1hr - suggesting that current models will soon surpass human performance. (However, we think that baseliner failure rates are artificially high due to our incentive scheme, so this human horizon number is probably significantly too low)

Other reasons include: For tasks that both can complete, models are almost always much cheaper, and much faster in wall-clock time, than humans. This also means that there's a lot of headroom to spend more compute at test time if we have ways to productively use it - e.g. BoK

That bit at the end about "time horizon of our average baseliner" is a little confusing to me, but I understand it to mean "if we used the 50% reliability metric on the humans we had do these tasks, our model would say humans can't reliably perform tasks that take longer than an hour". Which is a pretty interesting point.
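
To make that concrete for myself: as I understand METR's methodology, they fit success probability against log task length and report the length at which the fitted curve crosses 50%. A rough sketch of the idea with invented data (the exact fitting details are METR's, not mine):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented data: human task length in minutes, and whether the agent succeeded.
lengths = np.array([2, 4, 8, 15, 30, 60, 120, 240, 480, 960])
success = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0, 0])

X = np.log2(lengths).reshape(-1, 1)   # regress success on log2(task length)
model = LogisticRegression().fit(X, success)

# The "50% time horizon" is the task length where predicted success crosses 0.5,
# i.e. where coef * x + intercept = 0.
horizon_log2 = -model.intercept_[0] / model.coef_[0, 0]
print(f"50% time horizon: ~{2 ** horizon_log2:.0f} minutes")
```

Barnes's point is that running the same fit on the human baseliners' own successes and failures gives a horizon of only about an hour.
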

Comment by Julian Bradshaw on Preparing for the Intelligence Explosion · 2025-03-17T02:11:18.124Z · LW · GW

More than just this. OP actually documents it pretty well; see here.

Comment by Julian Bradshaw on Preparing for the Intelligence Explosion · 2025-03-13T01:07:23.473Z · LW · GW

Random commentary on bits of the paper I found interesting:

Under Windows of opportunity that close early:

Veil of ignorance

Lastly, some important opportunities are only available while we don’t yet know for sure who has power after the intelligence explosion. In principle at least, the US and China could make a binding agreement that if they “win the race” to superintelligence, they will respect the national sovereignty of the other and share in the benefits. Both parties could agree to bind themselves to such a deal in advance, because a guarantee of controlling 20% of power and resources post-superintelligence is valued more than a 20% chance of controlling 100%. However, once superintelligence has been developed, there will no longer be incentive for the ‘winner’ to share power.

Similarly for power within a country. At the moment, virtually everyone in the US might agree that no tiny group or single person should be able to grab complete control of the government. Early on, society could act unanimously to prevent that from happening. But as it becomes clearer which people might gain massive power from AI, they will do more to maintain and grow that power, and it will be too late for those restrictions.

Strong agree here, this is something governments should move quickly on: "No duh" agreements that put up some legal or societal barriers to malfeasance later.

Next, under Space Governance:

Missions beyond the Solar System. International agreements could require that extrasolar missions should be permitted only with a high degree of international consensus. This issue isn’t a major focus of attention at the moment within space law but, perhaps for that reason, some stipulation to this effect in any new treaty might be regarded as unobjectionable.

Also a good idea. I don't want to spend hundreds of years having to worry about the robot colony five solar systems over...

Finally, under Value lock-in mechanisms:

Human preference-shaping technology. Technological advances could enable us to choose and shape our own or others’ preferences, plus those of future generations. For example, with advances in neuroscience, psychology, or even brain-computer interfaces, a religious adherent could self-modify to make it much harder to change their mind about their religious beliefs (and never self-modify to undo the change). They could modify their children’s beliefs, too.

Gotta ask, was this inspired by To the Stars at all? There's no citation, but that story is currently covering the implications of having the technology to choose/shape "preference-specifications" for yourself and for society.

Comment by Julian Bradshaw on Preparing for the Intelligence Explosion · 2025-03-13T00:52:12.325Z · LW · GW

Okay I got trapped in a Walgreens and read more of this, found something compelling. Emphasis mine:

The best systems today fall short at working out complex problems over longer time horizons, which require some mix of creativity, trial-and-error, and autonomy. But there are signs of rapid improvement: the maximum duration of ML-related tasks that frontier models can generally complete has been doubling roughly every seven months. Naively extrapolating this trend suggests that, within three to six years, AI models will become capable of automating many cognitive tasks which take human experts up to a month.

This is presented without much fanfare but feels like a crux to me. After all, the whole paper is predicated on the idea that AI will be able to effectively replace the work of human researchers. The paragraph has a footnote (44), which reads:

METR, ‘Quantifying the Exponential Growth in AI Ability to Complete Longer Tasks’ (forthcoming). See also Pimpale et al., ‘Forecasting Frontier Language Model Agent Capabilities’.

So the citation is an unreleased paper! That unreleased paper may make a splash, since (assuming this 7-month-doubling trend is not merely 1-2 years old) it strongly implies we really will find good solutions for turning LLMs agentic fairly soon.

(The second paper cited, only a couple weeks old itself, was mentioned presumably for its forecast of RE-Bench performance, key conclusion: "Our forecast suggests that agent performance on RE-Bench may reach a score of 1—equivalent to the expert baseline reported by Wijk et al. (2024)—around December 2026. We have much more uncertainty about this forecast, and our 95% CI reflects this. It has a span of over 8 years, from August 2025 to May 2033." But it's based on just a few data points from a period of about 1 year, so it's not super convincing.)
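
The extrapolation arithmetic is easy to sanity-check. Assuming a current 50% time horizon on the order of an hour (roughly what METR reports for frontier models; the exact starting figure is my assumption) and the 7-month doubling time:

```python
import math

current_horizon_hours = 1     # assumed ~1-hour 50% time horizon for current frontier models
target_horizon_hours = 167    # "up to a month" of full-time expert work
doubling_time_months = 7      # the trend cited in the paper

doublings = math.log2(target_horizon_hours / current_horizon_hours)
years = doublings * doubling_time_months / 12
print(f"{doublings:.1f} doublings -> ~{years:.1f} years")   # ~7.4 doublings -> ~4.3 years
```

That lands comfortably inside the paper's "three to six years" window, with the spread presumably coming from uncertainty in the starting horizon and the doubling time.
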

Comment by Julian Bradshaw on Preparing for the Intelligence Explosion · 2025-03-12T02:59:29.825Z · LW · GW

Meta: I'm kind of weirded out by how apparently everyone is making their own high-effort custom-website-whitepapers? Is this something that's just easier with LLMs now? Did Situational Awareness create a trend? I can't read all this stuff man.

In general there seems to be way more high-effort work coming out since reasoning models got released. Maybe it's just crunchtime.

Comment by Julian Bradshaw on So how well is Claude playing Pokémon? · 2025-03-11T01:15:37.992Z · LW · GW

I meant test-time compute as in the compute expended in the thinking Claude does playing the game. I'm not sure I'm convinced that reasoning models other than R1 took only a few million dollars, but it's plausible. Appreciate the prediction!

Comment by Julian Bradshaw on So how well is Claude playing Pokémon? · 2025-03-09T06:34:43.041Z · LW · GW

Amazingly, Claude managed to escape the blackout strategy somehow. Exited Mt. Moon at ~68 hours.

Comment by Julian Bradshaw on So how well is Claude playing Pokémon? · 2025-03-08T22:36:35.140Z · LW · GW

It does have a lot of the info, but it doesn't always use it well. For example, it knows that Route 4 leads to Cerulean City, and so sometimes thinks there's a way around Mt. Moon that sticks solely to Route 4.

Comment by Julian Bradshaw on So how well is Claude playing Pokémon? · 2025-03-08T22:34:42.808Z · LW · GW

No idea. Be really worried, I guess—I tend a bit towards doomer. There's something to be said for not leaving capabilities overhangs lying around, though. Maybe contact Anthropic?

The thing is, the confidence the top labs have in short-term AGI makes me think there's a reasonable chance they have the solution to this problem already. I made the mistake of thinking they didn't once before - I was pretty skeptical that "more test-time compute" would really unhobble LLMs in a meaningful fashion when Situational Awareness came out and didn't elaborate at all on how that would work. But it turned out that at least OpenAI, and probably Anthropic too, already had the answer at the time.

Comment by Julian Bradshaw on So how well is Claude playing Pokémon? · 2025-03-08T22:27:59.710Z · LW · GW

I think this is a fair criticism, but I think it's also partly balanced out by the fact that Claude is committed to trying to beat the game. The average person who has merely played Red probably did not beat it, yes, but also they weren't committed to beating it. Also, Claude has pretty deep knowledge of Pokémon in its training data, making it a "hardcore gamer" both in terms of knowledge and willingness to keep playing. In that way, the reference class of gamers who put forth enough effort to beat the game is somewhat reasonable.

Comment by Julian Bradshaw on So how well is Claude playing Pokémon? · 2025-03-07T23:57:51.304Z · LW · GW

It's definitely possible to get confused playing Pokémon Red, but as a human, you're much better at getting unstuck. You try new things, have more consistent strategies, and learn better from mistakes. If you tried as long and as consistently as Claude has, even as a 6-year-old, you'd do much better.

I played Pokémon Red as a kid too (still have the cartridge!), it wasn't easy, but I beat it in something like that 26 hour number IIRC. You have a point that howlongtobeat is biased towards gamers, but it's the most objective number I can find, and it feels reasonable to me.

Comment by Julian Bradshaw on So how well is Claude playing Pokémon? · 2025-03-07T20:20:31.227Z · LW · GW

Thanks for the correction! I've added the following footnote:

Actually it turns out this hasn't been done, sorry! A couple RNG attempts were completed, but they involved some human direction/cheating. The point still stands only in the sense that, if Claude took more random/exploratory actions rather than carefully-reasoned shortsighted actions, he'd do better.

Comment by Julian Bradshaw on On the Rationality of Deterring ASI · 2025-03-06T00:17:11.332Z · LW · GW

I think the idea behind MAIM is to make it so neither China nor the US can build superintelligence without at least implicit consent from the other. This is before we get to the possibility of first strikes.

If you suspect an enemy state is about to build a superintelligence which they will then use to destroy you (or that will destroy everyone), you MAIM it. You succeed in MAIMing it because everyone agreed to measures making it really easy to MAIM it. Therefore, for either side to build superintelligence, there must be a general agreement to do so. If there's a general agreement that's trusted by all sides, then it's substantially more likely superintelligence isn't used to perform first strikes (and that it doesn't kill everyone), because who would agree without strong guarantees against that?

(Unfortunately, while Humanity does have experience with control of dual-use nuclear technology, the dual uses of superintelligence are way more tightly intertwined - you can't as easily prove "hey, this is just a civilian nuclear reactor, we're not making weapons-grade stuff here". But an attempt is perhaps worthwhile.)

Comment by Julian Bradshaw on On the Rationality of Deterring ASI · 2025-03-05T20:49:50.313Z · LW · GW

This is creative.

TL;DR: To mitigate race dynamics, China and the US should deliberately leave themselves open to the sabotage ("MAIMing") of their frontier AI systems. This gives both countries an option other than "nuke the enemy"/"rush to build superintelligence first" if superintelligence appears imminent: MAIM the opponent's AI. The deliberately unmitigated risk of being MAIMed also encourages both sides to pursue carefully-planned and communicated AI development, with international observation and cooperation, reducing AINotKillEveryone-ism risks.

The problem with this plan is obvious: with MAD, you know for sure that if you nuke the other guy, you're gonna get nuked in return. You can't hit all the silos, all the nuclear submarines. With MAIM, you can't be so confident: maybe the enemy's cybersecurity has gotten too good, maybe efficiency has improved and they don't need all their datacenters, maybe their light AGI has compromised your missile command.

So the paper argues for at least getting as close as possible to assurance that you'll get MAIMed in return: banning underground datacenters, instituting chip control regimes to block rogue actors, enforcing confidentiality-preserving inspections of frontier AI development.

Definitely worth considering. Appreciate the writeup.

Comment by Julian Bradshaw on A History of the Future, 2025-2040 · 2025-02-17T20:31:21.534Z · LW · GW

After an inter-party power-struggle, the CCP commits to the perpetual existence of at least one billion Han Chinese people with biological reproductive freedom

You know, this isn't such a bad idea - that is, explicit government commitments against discarding their existing, economically-unproductive populace. Easier to ask for today, rather than later.

Hypothetically this is more valuable in autocracies than in democracies, where the 1 person = 1 vote rule keeps political power in the hands of the people, but I think I'd support adding a constitutional amendment in the United States that offered some further guarantee. 

Obviously those in power could perhaps ignore the guarantees later, but in this scenario we're dealing with basically aligned AIs, which may be enforcing laws and constitutions better than your average dictator/president would.

Comment by Julian Bradshaw on My model of what is going on with LLMs · 2025-02-14T00:19:23.507Z · LW · GW

It's unclear exactly what the product GPT-5 will be, but according to OpenAI's Chief Product Officer today it's not merely a router between GPT-4.5/o3.

swyx
appreciate the update!!

in gpt5, are gpt* and o* still separate models under the hood and you are making a model router? or are they going to be unified in some more substantive way?


Kevin Weil
Unified 👍

Comment by Julian Bradshaw on SWE Automation Is Coming: Consider Selling Your Crypto · 2025-02-13T21:22:37.419Z · LW · GW

Here's a fun related hypothetical. Let's say you're a mid-career software engineer making $250k TC right now. In a world with no AI progress you plausibly have $5m+ career earnings still coming. In a world with AGI, maybe <$1m. Would you take a deal where you sell all your future earnings for, say, $2.5m right now?

(me: no, but I might consider selling a portion of future earnings in such a deal as a hedge)

Is there any way to make this kind of trade? Arguably a mortgage is kind of like this, but you have to pay that back unless the government steps in when everyone loses their jobs...
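
For what it's worth, the breakeven on that hypothetical is easy to work out (ignoring discounting, taxes, and risk aversion, and treating each scenario's payout as fixed; all numbers are from the hypothetical above):

```python
# Breakeven probability of the AGI scenario at which selling future earnings
# for a lump sum matches expected earnings. Numbers from the hypothetical above.
earnings_no_agi = 5.0   # $M remaining career earnings if AI progress stalls
earnings_agi = 1.0      # $M remaining career earnings in the AGI world
lump_sum = 2.5          # $M offered today

# lump_sum = p * earnings_agi + (1 - p) * earnings_no_agi, solved for p:
p_breakeven = (earnings_no_agi - lump_sum) / (earnings_no_agi - earnings_agi)
print(f"Selling wins in pure EV terms if P(AGI soon) > {p_breakeven:.0%}")   # ~62%
```

Risk aversion pushes that threshold down, which is part of why a partial hedge looks more attractive to me than selling everything.
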

Comment by Julian Bradshaw on My model of what is going on with LLMs · 2025-02-13T20:58:13.716Z · LW · GW

You're right that there's nuance here. The scaling laws involved mean exponential investment -> linear improvement in capability, so yeah it naturally slows down unless you go crazy on investment... and we are, in fact, going crazy on investment. GPT-3 is pre-ChatGPT, pre-current paradigm, and GPT-4 is nearly so. So ultimately I'm not sure it makes that much sense to compare the GPT1-4 timelines to now. I just wanted to note that we're not off-trend there.

Comment by Julian Bradshaw on My model of what is going on with LLMs · 2025-02-13T09:22:45.250Z · LW · GW

soon when we were racing through GPT-2, GPT-3, to GPT-4. We just aren't in that situation anymore

I don't think this is right.

GPT-1: 11 June 2018
GPT-2: 14 February 2019 (248 days later)
GPT-3: 28 May 2020 (469 days later)
GPT-4: 14 March 2023 (1,020 days later)

Basically, the wait for the next model doubled every time. By that pattern, GPT-5 ought to come around September 20, 2028, but Altman said today it'll be out within months. (And frankly, I think o1 qualifies as a sufficiently-improved successor model, and that released December 5, 2024, or really September 12, 2024, if you count o1-preview; either way, a shorter gap than GPT-3 to GPT-4.)
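
A quick check of that extrapolation, doubling the last gap exactly (this lands a few weeks later than the September date above; the exact day depends on how you fit the trend):

```python
from datetime import date, timedelta

releases = {
    "GPT-1": date(2018, 6, 11),
    "GPT-2": date(2019, 2, 14),
    "GPT-3": date(2020, 5, 28),
    "GPT-4": date(2023, 3, 14),
}

dates = list(releases.values())
gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
print(gaps)   # [248, 469, 1020] -- each gap roughly doubles

# Naive extrapolation: the next gap doubles again.
gpt5_estimate = dates[-1] + timedelta(days=2 * gaps[-1])
print(gpt5_estimate)   # ~October 2028
```
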

Comment by Julian Bradshaw on Julian Bradshaw's Shortform · 2025-02-11T17:47:54.655Z · LW · GW

Still-possible good future: there's a fast takeoff to ASI in one lab, contemporary alignment techniques somehow work, that ASI prevents any later unaligned AI from ruining the world, and the ASI provides life and a path for continued growth to humanity (and to shrimp, if you're an EA).


Copium perhaps, and certainly less likely in our race-to-AGI world, but possible. This is something like the “original”, naive plan for AI pre-rationalism, but it might be worth remembering as a possibility?

Comment by Julian Bradshaw on Altman blog on post-AGI world · 2025-02-10T03:26:47.623Z · LW · GW

The only sane version of this I can imagine is where there's either one aligned ASI, or a coalition of aligned ASIs, and everyone has equal access. Because the AI(s) are aligned they won't design bioweapons for misanthropes and such, and hopefully they also won't make all human effort meaningless by just doing everything for us and seizing the lightcone etc etc.

Comment by Julian Bradshaw on Dario Amodei: On DeepSeek and Export Controls · 2025-01-30T03:17:30.969Z · LW · GW

It's strange that he doesn't mention DeepSeek-R1-Zero anywhere in that blogpost, which is arguably the most important development DeepSeek announced (self-play RL on reasoning models). R1-Zero is what stuck out to me in DeepSeek's papers, and ex. the ARC Prize team behind the ARC-AGI benchmark says

R1-Zero is significantly more important than R1.

Was R1-Zero already obvious to the big labs, or is Amodei deliberately underemphasizing that part?