I primarily use a weird ergonomic keyboard (the Kinesis Advantage 2) with custom key bindings. But my laptop keyboard has normal key bindings, so my "normal keyboard" muscle memory still works.
On Linux Mint with Cinnamon, you can do this in system settings by going to Keyboard -> Layouts -> Options -> Caps Lock behavior. (You can also put that line in a shell script and set the script to run at startup.)
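Here's a minimal sketch of the startup-script route. I'm assuming the line in question is a setxkbmap call and that caps:backspace is the option you want, so adjust as needed; a one-line shell script doing the same thing works just as well.

```python
# Minimal sketch (untested): re-apply the Caps Lock -> Backspace remap at login.
# Assumes setxkbmap is installed and that "caps:backspace" is the XKB option
# you want; substitute whichever option you actually use.
import subprocess

subprocess.run(["setxkbmap", "-option", "caps:backspace"], check=True)
```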
I use a Kinesis Advantage keyboard with the keys rebound to look like this (apologies for my poor graphic design skills):
https://i.imgur.com/Mv9FI7a.png
- Caps Lock is rebound to Backspace and Backspace is rebound to Shift.
- Right Shift is rebound to Ctrl + Alt + Super, which I use as a command prefix for window manager commands.
- "custom macro" uses the keyboard's built-in macro feature to send a sequence of four keypresses (Alt-G Ctrl-`), which I use as a prefix for some Emacs commands.
- By default, the keyboard has two backslash (\) keys. I use the OS keyboard software to rebind the second one to "–" (unshifted) and "—" (shifted), which for me are the most useful characters that aren't on a standard US keyboard.
- There were two different clauses, one about malaria and the other about chickens. "Helping people is really important" clearly applies to the malaria clause, and there's a modified version of the statement ("helping animals is really important") that applies to the chickens clause. I think writing it that way was an acceptable compromise to simplify the language and it's pretty obvious to me what it was supposed to mean.
- "We should help more rather than less, with no bounds/limitations" is not a necessary claim. It's only necessary to claim "we should help more rather than less if we are currently helping at an extremely low level".
MIRI's communications strategy update published in May explained what they were planning on working on. I emailed them a month or so ago and they said they are continuing to work on the things in that blog post. They are the sorts of things that can take longer than a year so I'm not surprised that they haven't released anything substantial in the way of comms this year.
That's only true if a single GPU (or small number of GPUs) is sufficient to build a superintelligence, right? I expect it to take many years to go from "it's possible to build superintelligence with a huge multi-billion-dollar project" to "it's possible to build superintelligence on a few consumer GPUs". (Unless of course someone does build a superintelligence which then figures out how to make GPUs many orders of magnitude cheaper, but at that point it's moot.)
I don't think controlling compute would be qualitatively harder than controlling, say, pseudoephedrine.
(I think it would be harder, but not qualitatively harder—the same sorts of strategies would work.)
Also, I don't feel that this article adequately addressed a downside of SA: that it accelerates an arms race. SA is only favored when alignment is easy with high probability, you're confident that you will win the arms race, you're confident that it's better for you to win than for the other guy[1], and you're talking about a specific kind of alignment where an "aligned" AI doesn't necessarily behave ethically; it just does what its creator intends.
[1] How likely is a US-controlled (or, more accurately, Sam Altman/Dario Amodei/Mark Zuckerberg-controlled) AGI to usher in a global utopia? How likely is a China-controlled AGI to do the same? I think people are too quick to take it for granted that the former probability is larger than the latter.
Cooperative Development (CD) is favored when alignment is easy and timelines are longer. [...]
Strategic Advantage (SA) is more favored when alignment is easy but timelines are short (under 5 years)
I somewhat disagree with this. CD is favored when alignment is easy with extremely high probability. A moratorium is better given even a modest probability that alignment is hard, because the downside to misalignment is so much larger than the downside to a moratorium.[1] The same goes for SA—it's only favored when you are extremely confident about alignment + timelines.
[1] Unless you believe a moratorium has a reasonable probability of permanently preventing friendly AI from being developed.
- I was asking a descriptive question here, not a normative one. Guilt by association, even if weak, is a very commonly used form of argument, and so I would expect it to be used in this case.
I intended my answer to be descriptive. EAs generally avoid making weak arguments (or at least I like to think we do).
I will attempt to answer a few of these.
- Why has EV made many moves in the direction of decentralizing EA, rather than in the direction of centralizing it?
Power within EA is currently highly centralized. It seems very likely that the correct amount of centralization is less than the current amount.
- Why, as an organization aiming to ensure the health of a community that is majority male and includes many people of color, does the CEA Community Health team consist of seven white women, no men, and no people of color?
This sounds like a rhetorical question. The non-rhetorical answer is that women are much more likely than men to join a Community Health team, for approximately the same reason that most companies' HR teams are mostly women; and nobody has bothered to counteract this.
- Has anyone considered possible perverse incentives that the aforementioned CEA Community Health team may experience, in that they may have incentives to exaggerate problems in the community to justify their own existence?
I had never considered that but I don't think it's a strong incentive. It doesn't look like the Community Health team is doing this. If anything, I think they're incentivized to give themselves less work, not more.
- Why do very few EA organizations do large mainstream fundraising campaigns outside the EA community, when the vast majority of outside charities do?
That's not correct. Lots of EA orgs fundraise outside of the EA community.
- Why have so few people, both within EA and within popular discourse more broadly, drawn parallels between the "TESCREAL" conspiracy theory and antisemitic conspiracy theories?
Because guilt-by-association is a very weak form of argument. (And it's not even obvious to me that there are relevant parallels there.) And FWIW I don't respond to the sorts of people who use the word "TESCREAL" because I don't think they're worth taking seriously.
- Why do university EA groups appear, at least upon initial examination, to focus so much on recruiting, to the exclusion of training students and connecting them with interested people?
University groups do do those other things. But they do those things internally so you don't notice. Recruiting is the only thing they do externally, so that's what you notice.
- Why aren't there more organizations within EA that are trying to be extremely hardcore and totalizing, to the level of religious orders, the Navy SEALs, the Manhattan Project, or even a really intense start-up?
Some orgs did that and it generally didn't go well (eg Leverage Research). I think most people believe that totalizing jobs are bad for mental health and create bad epistemics and it's not worth it.
- When EAs talk about the "unilateralist's curse," why don't they qualify those claims with the fact that Arkhipov and Petrov were unilateralists who likely saved the world from nuclear war?
Those are not examples of the unilateralist's curse. I don't want to explain it in this short comment but I would suggest re-reading some materials that explain the unilateralist's curse, e.g. the original paper.
- Why hasn't AI safety as a field made an active effort to build large hubs outside the Bay, rather than the current state of affairs in which outside groups basically just function as recruiting channels to get people to move to the Bay?
Because doing so would be a lot of work, which would take time away from doing other important things. I think people agree that having a second hub would be good, but not good enough to justify the effort.
Thank you for writing about your experiences! I really like reading these posts.
How big an issue do you think the time constraints were? For example, how much better a job could you have done if all the recommenders got twice as much time? And what would it take to set things up so the recommenders could have twice as much time?
Do you think a 3-state dark mode selector is better than a 1-state (where "auto" is the only state)? My website is 1-state, on the assumption that auto will work for almost everyone and it lets me skip the UI clutter of having a lighting toggle that most people won't use.
Also, I don't know if the site has been updated but it looks to me like turntrout.com's two modes aren't dark and light, they're auto and light. When I set Firefox's appearance to dark or auto, turntrout.com's dark mode appears dark, but when I set Firefox to light, turntrout.com appears light. turntrout.com's light mode appears to be light regardless of my Firefox setting.
OP did the work to collect these emails and put them into a post. When people do work for you, you shouldn't punish them by giving them even more work.
I've only read a little bit of Martin Gardner, but he might be the Matt Levine of recreational math.
Many newspapers have a (well-earned) reputation for not technically lying.
Thank you, this information was useful for a project I'm working on.
I don't think I understand what "learn to be visibly weird" means, and how it differs from not following social conventions because you fail to understand them correctly.
I was recently looking into donating to CLTR and I'm curious why you're excited about it. My sense was that little of its work was directly relevant to x-risk (for example, this report on disinformation is essentially useless for preventing x-risk AFAICT), and the relevant work seemed not good or possibly counterproductive. For example, their report on "a pro-innovation approach to regulating AI" seemed bad to me on two counts:
- There is a genuine tradeoff between accelerating AI-driven innovation and decreasing x-risk. So to the extent that this report's recommendations support innovation, they increase x-risk, which makes this report net harmful.
- The report's recommendations are kind of vacuous, e.g. they recommend "reducing inefficiencies", like yes, this is a fully generalizable good thing but it's not actionable.
(So basically I think this report would be net negative if it wasn't vacuous, but because it's vacuous, it's net neutral.)
This is the sense I get as someone who doesn't know anything about policy and is just trying to get the sense of orgs' work by reading their websites.
My perspective is that I'm much more optimistic about policy than about technical research, and I don't really feel qualified to evaluate policy work, and LTFF makes almost no grants on policy. I looked around and I couldn't find any grantmakers who focus on AI policy. And even if they existed, I don't know that I could trust them (like I don't think Open Phil is trustworthy on AI policy and I kind of buy Habryka's arguments that their policy grants are net negative).
I'm in the process of looking through a bunch of AI policy orgs myself. I don't think I can do a great job of evaluating them but I can at least tell that most policy orgs aren't focusing on x-risk so I can scratch them off the list.
if you think the polling error in 2024 remains unpredictable / the underlying distribution is unbiased
Is there a good reason to think that, given that polls have recently under-reported Republican votes?
I don't know how familiar you are with regular expressions, but you could do this with a two-pass regular expression search and replace: (I used Emacs regex format; your preferred editor might use a different format. Notably, in Emacs \[ is a literal bracket but ( is a literal parenthesis, for some reason.)
- replace "^\(https://.*? \)\(\[\[.*?\]\] \)*" with "\1"
- replace "\[\[\(.*?\)\]\]" with "\1"
This first deletes any tags that occur right after a hyperlink at the beginning of a line, then removes the brackets from any remaining tags.
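In case it's easier to run outside Emacs, here's a rough Python translation of the same two passes (the sample lines are made up, so adjust the patterns to your actual format):

```python
import re

text = """https://example.com/a [[ai]] [[policy]] interesting post
Some note mentioning [[alignment]] in passing"""

# Pass 1: delete tags that occur right after a hyperlink at the start of a line.
text = re.sub(r"^(https://.*? )(\[\[.*?\]\] )*", r"\1", text, flags=re.MULTILINE)

# Pass 2: remove the brackets from any remaining tags.
text = re.sub(r"\[\[(.*?)\]\]", r"\1", text)

print(text)
# https://example.com/a interesting post
# Some note mentioning alignment in passing
```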
RE Shapley values, I was persuaded by this comment that they're less useful than counterfactual value in at least some practical situations.
(2) have "truesight", i. e. a literally superhuman ability to suss out the interlocutor's character
Why do you believe this?
If your goal is to influence journalists to write better headlines, then it matters whether the journalist has the ability to take responsibility over headlines.
If your goal is to stop journalists from misrepresenting you, then it doesn't actually matter whether the journalist has the ability to take responsibility, all that matters is whether they do take responsibility.
Often, you write something short that ends up being valuable. That doesn't mean you should despair about your longer and harder work being less valuable. Like if you could spend 40 hours a week writing quick 5-hour posts that are as well-received as the one you wrote, that would be amazing, but I don't think anyone can do that because the circumstances have to line up just right, and you can't count on that happening. So you have to spend most of your time doing harder and predictably-less-impactful work.
(I just left some feedback for the mapping discussion post on the post itself.)
Some feedback:
- IMO this project was a good use of your time ex ante.[1] Unclear if it will end up being actually useful but I think it's good that you made it.
- "A new process for mapping discussions" is kind of a boring title and IMO does not accurately reflect the content. It's mapping beliefs more so than discussions. Titles are hard but my first idea for a title would be "I made a website that shows a graph of what public figures believe about SB 1047"
- I didn't much care about the current content because it's basically saying things I already knew (like, the people pessimistic about SB 1047 are all the usual suspects—Andrew Ng, Yann LeCun, a16z).
- If I cared about AI safety but didn't know anything about SB 1047, this site would have led me to believe that SB 1047 was good because all the AI safety people support it. But I already knew that AI safety people supported SB 1047.
- In general, I don't care that much about what various people believe. It's unlikely that I would change my mind based on seeing a chart like the ones on this site.[2] Perhaps most LW readers are in the same boat. I think this is the sort of thing journalists and maybe public policy people care more about.
- I have changed my mind based on opinion polls before. Specifically, I've changed my mind on scientific issues based on polls of scientists showing that they overwhelmingly support one side (e.g. I used to be anti-nuclear power until I learned that the expert consensus went the other way). The surveys on findingconsensus.ai are much smaller and less representative.
[1] At least that's my gut feeling. I don't know you personally but my impression from seeing you online is that you're very talented and therefore your counterfactual activities would have also been valuable ex ante, so I can't really say that this was the best use of your time. But I don't think it was a bad use.
[2] Especially because almost all the people on the side I disagree with are people I have very little respect for, eg a16z.
This is a good and important point. I don't have a strong opinion on whether you're right, but one counterpoint: AI companies are already well-incentivized to figure out how to control AI, because (as Wei Dai said) controllable AI is more economically useful. It makes more sense for nonprofits / independent researchers to do work that AI companies wouldn't do otherwise.
If Open Phil is unwilling to fund some/most of the best orgs, that makes earning to give look more compelling.
(There are some other big funders in AI safety like Jaan Tallinn, but I think all of them combined still have <10% as much money as Open Phil.)
I should add that I don't want to dissuade people from criticizing me if I'm wrong. I don't always handle criticism well, but it's worth the cost to have accurate beliefs about important subjects. I knew I was gonna be anxious about this post but I accepted the cost because I thought there was a ~25% chance that it would be valuable to post.
A few people (i.e. habryka or previously Benquo or Jessicata) make it their thing to bring up concerns frequently.
My impression is that those people are paying a social cost for how willing they are to bring up perceived concerns, and I have a lot of respect for them because of that.
Thanks for the reply. When I wrote "Many people would have more useful things to say about this than I do", you were one of the people I was thinking of.
AI Impacts wants to think about AI sentience and OP cannot fund orgs that do that kind of work
Related to this, I think GW/OP has always been too unwilling to fund weird causes, but it's generally gotten better over time: originally recommending US charities over global poverty b/c global poverty was too weird, taking years to remove their recommendations for US charities that were ~100x less effective than their global poverty recs, then taking years to start funding animal welfare and x-risk, then still not funding weirder stuff like wild animal welfare and AI sentience. I've criticized them for this in the past but I liked that they were moving in the right direction. Now I get the sense that recently they've gotten worse on AI safety (and weird causes in general).
I've been avoiding LW for the last 3 days because I was anxious that people were gonna be mad at me for this post. I thought there was a pretty good chance I was wrong, and I don't like accusing people/orgs of bad behavior. But I thought I should post it anyway because I believed there was some chance lots of people agreed with me but were too afraid of social repercussions to bring it up (like I almost was).
What are the norms here? Can I just copy/paste this exact text and put it into a top-level post? I got the sense that a top-level post should be more well thought out than this but I don't actually have anything else useful to say. I would be happy to co-author a post if someone else thinks they can flesh it out.
Edit: Didn't realize you were replying to Habryka, not me. That makes more sense.
I get the sense that we can't trust Open Philanthropy to do a good job on AI safety, and this is a big problem. Many people would have more useful things to say about this than I do, but I still feel that I should say something.
My sense comes from:
- Open Phil is reluctant to do anything to stop the companies that are doing very bad things to accelerate the likely extinction of humanity, and is reluctant to fund anyone who's trying to do anything about it.
- People at Open Phil have connections with people at Anthropic, a company that's accelerating AGI and has a track record of (plausibly-deniable) dishonesty. Dustin Moskovitz has money invested in Anthropic, and Open Phil employees might also stand to make money from accelerating AGI. And I agree with Bryan Caplan's recent take that friendships are often a bigger conflict of interest than money, so Open Phil higher-ups being friends with Anthropic higher-ups is troubling.
A lot of people (including me as of ~one year ago) consider Open Phil the gold standard for EA-style analysis. I think Open Phil is actually quite untrustworthy on AI safety (but probably still good on other causes).
I don't know what to do with this information.
As a frequent oatmeal-eater, I have a few miscellaneous comments:
- You mentioned adding fruit paste, fruit syrup, and fruit pulp to oatmeal, but I'm surprised you didn't mention what I consider the best option: whole fruit. I usually use blueberries but sometimes I mix it up with blackberries or sliced bananas.
- I buy one-minute oats. You don't actually need to cook them for a minute; you can just pour boiling water onto them and they'll soften up by the time they're cool enough to eat.
- I wouldn't eat oats for the protein, they have more than rice but still not very much. I mix 80g (1 cup) of oatmeal with 25g of soy protein powder, which brings the protein up from 10g to 30g.
- I don't get the appeal of overnight oats. I have to microwave it anyway to get it to a reasonable temperature, and it tends to stick to the jar which greatly increases cleanup time. (I think the stickiness comes more from the protein powder than the oats.)
Relatedly, I see a lot of people use mediocre AI art when they could just as easily use good stock photos. You can get free, watermarkless stock photos at https://pixabay.com/.
The mnemonic I've heard is "red and yellow, poisonous fellow; red and black, friend of Jack"
I was reading some scientific papers and I encountered what looks like fallacious reasoning, but I'm not quite sure what's wrong with it (if anything). It goes like this:
Alice formulates hypothesis H and publishes an experiment that moderately supports H (p < 0.05 but > 0.01).
Bob does a similar experiment that contradicts H.
People look at the differences in Alice's and Bob's studies and formulate a new hypothesis H': "H is true under certain conditions (as in Alice's experiment), and false under other conditions (as in Bob's experiment)". They look at the two studies and conclude that H' is probably true because it's supported by both studies.
This sounds fishy to me (something like post hoc reasoning) but I'm not quite sure how to explain why and I'm not even sure I'm correct.
Suppose an ideology says you're not allowed to question idea X.
I think there are two different kinds of "not questioning": there's unquestioningly accepting an idea as true, and there's refusing to question and remaining agnostic. The latter position is reasonable in the sense that if you refuse to investigate an issue, you shouldn't have any strong beliefs about it. And I think the load-bearingness is only a major issue if you refuse to question X while also accepting that X is true.
There's an argument for cooperating with any agent in a class of quasi-rational actors, although I don't know how exactly to define that class. Basically, if you predict that the other agent will reason in the same way as you, then you should cooperate.
(This reminds me of Kant's argument for the basis of morality—all rational beings should reason identically, so the true morality must be something that all rational beings can arrive at independently. I don't think his argument quite works, but I believe there's a similar argument for cooperating on the prisoner's dilemma that does work.)
If I want to write to my representative to oppose this amendment, who do I write to? As I understand, the bill passed the Senate but must still pass Assembly. Is the Senate responsible for re-approving amendments, or does that happen in Assembly?
Also, should I write to a representative who's most likely to be on the fence, or am I only allowed to write to the representative of my district?
5 minute super intense cardio, as a replacement for long, low intensity cardio. It is easier to motivate oneself to do 5 minutes of Your-Heart-Might-Explode cardio than two hours of jogging or something. In fact it takes very little motivation, if you trick yourself into doing it right after waking up, when your brain is on autopilot anyway, and unable to resist routine.
Interesting, I had the complete opposite experience. I previously had the idea that exercise should be short and really hard, and I couldn't stick with it. Then I learned that it's better if the majority of your exercise is very easy. Now I go for hour-long walks and I get exercise every day. (Jogging is too hard to qualify as easy exercise.)
What's the deal with mold? Is it ok to eat moldy food if you cut off the moldy bit?
I read some articles that quoted mold researchers who said things like (paraphrasing) "if one of your strawberries gets mold on it, you have to throw away all your strawberries because they might be contaminated."
I don't get the logic of that. If you leave fruit out for long enough, it almost always starts growing visible mold. So any fruit at any given time is pretty likely to already have mold on it, even if it's not visible yet. So by that logic, you should never eat fruit ever.
They also said things like "mold usually isn't bad, but if mold is growing on food, there could also be harmful bacteria like listeria." Ok, but there could be listeria even if there's not visible mold, right? So again, by this logic, you should never eat any fresh food ever.
This question seems hard to resolve without spending a bunch of time researching mold so I'm hoping there's a mold expert on LessWrong. I just want to know if I can eat my strawberries.
I don't understand how not citing a source is considered acceptable practice. It seems antithetical to standard journalistic ethics.
we have found Mr Altman highly forthcoming
He was caught lying about the non-disparagement agreements, but I guess lying to the public is fine as long as you don't lie to the board?
Taylor's and Summers' comments here are pretty disappointing—it seems that they have no issue with, and maybe even endorse, Sam's now-publicly-verified bad behavior.
I was just thinking not 10 minutes ago about how that one LW user who casually brought up Daniel K's equity (I didn't remember your username) had a massive impact and I'm really grateful for them.
There's a plausible chain of events where simeon_c brings up the equity > it comes to more people's attention > OpenAI goes under scrutiny > OpenAI becomes more transparent > OpenAI can no longer maintain its de facto anti-safety policies > either OpenAI changes policy to become much more safety-conscious, or loses power relative to more safety-conscious companies > we don't all die from OpenAI's unsafe AI.
So you may have saved the world.
The target audience for Soylent is much weirder. Although TBF I originally thought the Soylent branding was a bad idea and I was probably wrong.
This also stood out to me as a truly insane quote. He's almost but not quite saying "we have raised awareness that this bad thing can happen by doing the bad thing"
Some ideas:
- Make Sam Altman look stupid on Twitter, which will marginally persuade more employees to quit and more potential investors not to invest (this is my worst idea but also the easiest, and people seem to pretty much have this one covered already)
- Pay into a fund to hire a good lawyer to figure out a strategy to nullify the non-disparagement agreements. Maybe a class-action lawsuit, maybe a lawsuit on behalf of one individual, maybe try to charge Altman with some sort of crime; I'm not sure of the best way to do this, but that's the lawyer's job to figure out.
- Have everyone call their representative in support of SB 1047, or maybe even say you want SB 1047 to have stronger whistleblower protections or something similar.