tom-davidson

Comments

Comment by Tom Davidson on Will compute bottlenecks prevent a software intelligence explosion? · 2025-04-05T08:37:17.119Z · LW · GW

I meant at any point, but was imagining the period around full automation yeah. Why do you ask?

Comment by Tom Davidson on ryan_greenblatt's Shortform · 2025-01-22T21:25:40.263Z · LW · GW

I'll post about my views on different numbers of OOMs soon

Comment by Tom Davidson on ryan_greenblatt's Shortform · 2025-01-22T21:24:00.520Z · LW · GW

Sorry, for my comments on this post I've been referring to "software only singularity?" only as "will the parameter r > 1 when we first fully automate AI R&D", not as a threshold for some number of OOMs. That's what Ryan's analysis seemed to be referring to.

I separately think that even if initially r>1 the software explosion might not go on for that long

Comment by Tom Davidson on ryan_greenblatt's Shortform · 2025-01-22T18:53:15.174Z · LW · GW

Obviously the numbers in the LLM case are much less certain given that I'm guessing based on qualitative improvement and looking at some open source models,

Sorry, I don't follow why they're less certain?

based on some first principles reasoning and my understanding of how returns diminished in the semi-conductor case

I'd be interested to hear more about this. The semiconductor case is hard as we don't know how far we are from limits, but if we use Landauer's limit then I'd guess you're right. There's also uncertainty about how much algorithmic progress we have made and will make.

Comment by Tom Davidson on Human takeover might be worse than AI takeover · 2025-01-13T08:22:01.353Z · LW · GW

Why are they more recoverable? Seems like a human who seized power would seek asi advice on how to cement their power

Comment by Tom Davidson on How will we update about scheming? · 2025-01-12T15:36:18.740Z · LW · GW

Thanks for this!

Compared to you, I more expect evidence of scheming if it exists.

You argue weak schemers might just play nice. But if so, we can use them to do loads of intellectual labour to make fancy behavioral red teaming and interp to catch out the next gen of AI.

More generally, the plan of bootstrapping to increasingly complex behavioral tests and control schemes seems likely to work. It seems like if one model has spent a lot of thinking time designing a scheme then another model would have to be much smarter to zero shot cause a catastrophe without the scheme detecting it. Eg. analogies with humans suggest this.

Comment by Tom Davidson on Human takeover might be worse than AI takeover · 2025-01-12T15:13:00.521Z · LW · GW

I agree the easy vs hard worlds influence the chance of AI taking over.

But are you also claiming it influences the badness of takeover conditional on it happening? (That's the subject of my post)

Comment by Tom Davidson on Human takeover might be worse than AI takeover · 2025-01-11T10:49:34.821Z · LW · GW

So you predict that if Claude was in a situation where it knew that it had complete power over you and could make you say that you liked it, then it would stop being nice? I think it would continue to be nice in any situation of that rough kind, which suggests it's actually nice, not just narcissistically pretending.

Comment by Tom Davidson on Human takeover might be worse than AI takeover · 2025-01-11T10:46:32.185Z · LW · GW

But a human could instruct an aligned ASI to help it take over and do a lot of damage

Comment by Tom Davidson on Human takeover might be worse than AI takeover · 2025-01-11T10:41:49.578Z · LW · GW

That structural difference you point to seems massive. The reputational downsides of bad behavior will be multiplied 100-fold+ for AI as it reflects on millions of instances and the company's reputation.

And it will be much easier to record and monitor ai thinking and actions to catch bad behaviour.

Why is it unlikely that we can detect selfishness? Why can't we bootstrap from human-level?

Comment by Tom Davidson on By default, capital will matter more than ever after AGI · 2024-12-29T18:50:43.457Z · LW · GW

One dynamic initially preventing stasis in influence post-AGI is that different ppl have different discount rates, so those with lower discount rates will slowly gain influence over time

Comment by Tom Davidson on When Is Insurance Worth It? · 2024-12-27T22:42:19.689Z · LW · GW

Yep I'm saying you're wrong about this. If money compounds but you don't have utility=log($) then you shouldn't Kelly bet

Comment by Tom Davidson on When Is Insurance Worth It? · 2024-12-24T16:04:12.885Z · LW · GW

Your formula is only valid if utility = log($).

With that assumption the equation compares your utility with and without insurance. Simple!

If you had some other utility function, like utility = $, then you should make insurance decisions differently.

I think the Kelly betting stuff is a big distraction, and that ppl with utility=$ shouldn't bet like that. I think the result that Kelly betting maximizes long term $ bakes in assumptions about utility functions and is easily misunderstood - someone with utility=$ probably goes bankrupt but might become insanely rich, and is happy not to Kelly bet. (I haven't explained this point properly, but recall reading about this and it's just wrong on its face that someone with utility=$ should follow your formula)
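
To make the utility point concrete, here is a minimal sketch (not the post's formula) that compares expected utility with and without insurance under two utility functions. All numbers and function names are hypothetical, chosen only so that the premium exceeds the expected loss.

```python
import math


def expected_utility(outcomes, utility):
    """Expected utility over (probability, final_wealth) pairs."""
    return sum(p * utility(w) for p, w in outcomes)


def insurance_is_worth_it(wealth, premium, loss, p_loss, utility):
    """True if paying the premium gives higher expected utility than going uninsured."""
    insured = [(1.0, wealth - premium)]
    uninsured = [(p_loss, wealth - loss), (1.0 - p_loss, wealth)]
    return expected_utility(insured, utility) > expected_utility(uninsured, utility)


def linear(w):
    """utility = $"""
    return w


# Hypothetical numbers: expected loss is 0.05 * 9000 = 450, below the 600 premium.
wealth, premium, loss, p_loss = 10_000.0, 600.0, 9_000.0, 0.05

log_buys = insurance_is_worth_it(wealth, premium, loss, p_loss, math.log)
linear_buys = insurance_is_worth_it(wealth, premium, loss, p_loss, linear)
```

With these assumed numbers the utility=log($) agent buys the insurance and the utility=$ agent does not, since the latter only compares the premium against the expected loss.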

Comment by Tom Davidson on What is it to solve the alignment problem? (Notes) · 2024-09-16T14:40:13.207Z · LW · GW

I enjoyed reading this, thanks.

I think your definition of solving alignment here might be too broad?

If we have superintelligent agentic AI that tries to help its user but we end up missing out on the benefits of AI bc of catastrophic coordination failures, or bc of misuse, then I think you're saying we didn't solve alignment bc we didn't elicit the benefits?

You discuss this, but I prefer to separate out control and alignment. Where I wouldn't count us as having solved alignment if we only elicit behavior via intense/exploitative control schemes. So I'd adjust your alignment definition with the extra requirement that we avoided takeover while not doing super-intense control schemes relative to what is acceptable to do to humans today. Which is a higher bar, and separates it from the thing we care about --avoiding takeover and eliciting benefits-- but I think that's a better def

Comment by Tom Davidson on Conflicts between emotional schemas often involve internal coercion · 2024-08-08T21:20:11.595Z · LW · GW

I enjoyed it, and think the ideas are important, but found it hard to follow at points

Some suggestions:

explain more why self criticism allows one part to assert control
give more examples throughout, especially the second half. I think some paragraphs don't have examples and are harder to understand
flesh out examples to make them longer and more detailed

Comment by Tom Davidson on An illustrative model of backfire risks from pausing AI research · 2023-12-02T19:19:30.630Z · LW · GW

I think your model will underestimate the benefits of ramping up spending quickly today.

You model the size of the $ overhang as constant. But in fact it's doubling every couple of years as global spending on producing AI chips grows. (The overhang relates to the fraction of chips used in the largest training run, not the fraction of GWP spent on the largest training run.) That means that ramping up spending quickly (on training runs or software or hardware research) gives that $ overhang less time to grow
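
As a rough illustration of that last point, the sketch below assumes the overhang grows exponentially with a fixed doubling time. The two-year default is only a stand-in for "every couple of years", and the function name is hypothetical.

```python
def overhang_growth_factor(delay_years, doubling_time_years=2.0):
    """Factor by which the $ overhang grows if ramp-up is delayed by delay_years.

    Assumes pure exponential growth with a constant doubling time,
    which is a simplification for illustration only.
    """
    return 2.0 ** (delay_years / doubling_time_years)


# Under the assumed 2-year doubling time, waiting 4 years quadruples the overhang.
factor_after_4_years = overhang_growth_factor(4.0)
```

Under this assumption, every extra doubling time spent waiting doubles the overhang that is available to be spent down suddenly later.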

Comment by Tom Davidson on But why would the AI kill us? · 2023-05-13T05:14:28.282Z · LW · GW

Why are you at 50% ai kills >99% ppl given the points you make in the other direction?

Comment by Tom Davidson on Richard Ngo's Shortform · 2023-01-07T00:12:27.729Z · LW · GW

So far causally upstream of the human evaluator's opinion? Eg an AI counselor optimizing for getting to know you

Comment by Tom Davidson on Richard Ngo's Shortform · 2023-01-05T17:33:18.437Z · LW · GW

I think the "soup of heuristics" stories (where the AI is optimizing something far causally upstream of reward instead of something that is downstream or close enough to be robustly correlated) don't lead to takeover in the same way

Why does it not lead to takeover in the same way?

Comment by Tom Davidson on On the Diplomacy AI · 2022-11-29T15:05:01.878Z · LW · GW

AI understands that the game ends after 1908 and modifies accordingly.

Does it? In the game you link it seems like the bot doesn't act accordingly in the last move phase. Turkey misses a chance to grab Rumania, Germany misses a chance to grab London, and I think France misses something as well.

Comment by Tom Davidson on Towards a Formalisation of Returns on Cognitive Reinvestment (Part 1) · 2022-06-13T03:51:01.118Z · LW · GW

Glad you added these empirical research directions! If I were you I'd prioritize these over the theoretical framework.

Comment by Tom Davidson on What can the principal-agent literature tell us about AI risk? · 2020-02-13T01:39:43.011Z · LW · GW

So either one must claim that AI-related unawareness is of a very different type or scale from ordinary human cases in our world today, or one must implicitly claim that unawareness modeling would in fact be a contribution to the agency literature.

I agree that the Bostrom/Yudkowsky scenario implies AI-related unawareness is of a very different scale from ordinary human cases. From an outside view perspective, this is a strike against the scenario. However, this deviation from past trends does follow fairly naturally (though not necessarily) from the hypothesis of a sudden and massive intelligence gap

Comment by Tom Davidson on What can the principal-agent literature tell us about AI risk? · 2020-02-10T21:56:07.649Z · LW · GW

Re the difference between monopoly rents and agency rents: monopoly rents would be eliminated by competition between firms whereas agency rents would be eliminated by competition between workers. So they're different in that sense.
