Comments
I will note the rationalist and EA communities have committed multiple ideological murders
Can you substantiate this? I down- and disagree-voted because this is a very grave accusation made without evidence.
I think I agree with your original statement now. It still feels slightly misleading though, as while 'keeping up with the competition' won't provide the motivation (as there putatively is no competition), there will still be strong incentives to sell at any capability level. (And as you say this may be overcome by an even stronger incentive to hoard frontier intelligence for their own R&D and strategising use. But this outweighs rather than annuls the direct economic incentive to make a packet of money by selling access to your latest system.)
I agree the '5 projects but no selling AI services' world is moderately unlikely, the toy version of it I have in mind is something like:
- Starting to sell access to your AI model costs $10 million up front: a misuse monitoring team, API infrastructure, help manuals, a web interface, etc.
- If you are the only company to do this, you make $100 million at monopoly prices.
- But if multiple companies do this, the price gets driven down to marginal inference costs, and you make ~$0 in profits and just lose the initial $10 million in fixed costs.
- So all the companies would prefer to be the only one selling, but second-best is for no-one to sell, and worst is for multiple companies to sell.
- Even without explicit collusion, they could all realise it is not worth selling (but worth punishing anyone who defects).
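To make the toy version concrete, here is a minimal sketch of it as a one-shot entry game. The $10 million and $100 million figures are the ones above; treating the $100 million as margin before the fixed cost, setting the competitive margin to exactly 0, and the names `payoff`, `is_equilibrium` and `equilibria` are my own assumptions for illustration.

```python
# Toy entry game, all figures in $ millions.
FIXED_COST = 10.0          # up-front cost of starting to sell (from the toy model)
MONOPOLY_MARGIN = 100.0    # assumed to be margin before the fixed cost
COMPETITIVE_MARGIN = 0.0   # assumed: price driven down to marginal inference cost


def payoff(i_sell: bool, other_sellers: int) -> float:
    """Net payoff to one company, given how many *other* companies sell."""
    if not i_sell:
        return 0.0
    margin = MONOPOLY_MARGIN if other_sellers == 0 else COMPETITIVE_MARGIN
    return margin - FIXED_COST


def is_equilibrium(n_firms: int, n_sellers: int) -> bool:
    """True if no single company gains by switching between selling and not selling."""
    sellers_stay = n_sellers == 0 or (
        payoff(True, n_sellers - 1) >= payoff(False, n_sellers - 1)
    )
    outsiders_stay_out = n_sellers == n_firms or (
        payoff(False, n_sellers) >= payoff(True, n_sellers)
    )
    return sellers_stay and outsiders_stay_out


def equilibria(n_firms: int) -> list[int]:
    """Numbers of sellers that are stable in the one-shot game."""
    return [k for k in range(n_firms + 1) if is_equilibrium(n_firms, k)]


if __name__ == "__main__":
    print("sole seller:", payoff(True, 0))
    print("one of several sellers:", payoff(True, 1))
    print("not selling:", payoff(False, 0))
    print("stable seller counts with 5 firms:", equilibria(5))
```

Under these assumptions the ranking in the list comes out as stated (sole seller, then not selling, then one of several sellers), but in the one-shot game the only stable outcome is exactly one seller. 'No-one sells' is not stable without something extra, such as the threat of punishing a defector.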
This seems unlikely to me because:
- Maybe the up-front costs of at least a kind of scrappy version are actually low.
- Consumers lack information and aren't fully rational, so the first company to start selling would have an advantage (OpenAI with ChatGPT in this case, even after Claude became as good or better).
- Empirically, we don't tend to see an equilibrium of no company offering a service that it would be profitable for one company to offer.
So maybe it is actually unlikely enough that it is not worth much attention. There seems to be some slim theoretical world where it happens though.
There’s no incentive for the project to sell its most advanced systems to keep up with the competition.
I found myself a bit skeptical about the economic picture laid out in this post. Currently, because there are many comparably good AI models, the price for users is driven down to near, or sometimes below (in the case of free-tier access) marginal inference costs. As such, there is somewhat less money to be made in selling access to AI services, and companies not right at the frontier, e.g. Meta, choose to make their models open weight, as probably they couldn't make much money selling access to them when people can just pay for Claude or ChatGPT instead.
However, if there is a single Western AGI project with a big lead over everyone else, they could charge far above their inference costs, given how amazingly helpful having access to the best AIs could be (and is, to some extent).
I could even imagine that if there are e.g. 5 AGI projects all similarly advanced, then maybe none of them would bother to sell their latest models, knowing that if they start charging very high prices someone else will undercut them, so it is not worth the hassle at all.
Whereas if there is one project, and if AGI/ASI turns out to be super expensive to build and USG doesn't want to foot the bill, maybe charging exorbitant monopolistic prices will be important. Relatedly, the wages of AI researchers and engineers could go down, given a monopsony in labour for the one project.
Altogether, this is one reason to think a centralised project would have higher revenue and lower costs and therefore lead to AGI faster.
(That said, I am not an economist and am just guessing; maybe we should check with some econ folks.)
Centralising might make the US less likely to pause at the crucial time.
Unrelatedly, I think a contrasting dynamic here is that it is potentially a lot easier to stop a single project than to stop many projects simultaneously. In the former case, there is a smaller set of actors who need to be convinced pausing is a good idea. (Of course, even if there are many projects, if they are all heavily regulated and overseen by USG, it could still be easy for USG to pause them all even without centralisation.)
Thanks for that list of papers/posts. For most of the papers you linked, they’re not included because they did not feature in either of our search strategies: (1) titles containing specific keywords that we searched for on arXiv; (2) the paper is linked on the company’s website. I agree this is a limitation of our methodology. We won't add these papers in now as that would be somewhat ad hoc, and inconsistent between the companies.
Re the blog posts from Anthropic and what counts as a paper, I agree this is a tricky demarcation problem. We included the 'Circuit Updates' because it was linked to as a 'paper' on the Anthropic website. Even if GDM has a higher bar for what counts as a 'paper' than Anthropic, I think we don't really want to be adjudicating this, so I feel comfortable just deferring to each company about what counts as a paper for them.
Thanks for engaging with our work, Arthur! Perhaps I should have signposted this more clearly in the GitHub as well as the report, but the categories assigned by GPT-4o were not final: we reviewed its categories and made changes where necessary. The final categories we gave are available here. The discovering agents paper we put as 'safety by design' and the prover-verifier games paper we labelled 'enhancing human feedback'. (Though for some papers of course the best categorization may not be clear, if e.g. it touches on multiple safety research areas.)
If you have the links handy I would be interested in which GDM mech interp papers we missed, and I can look into where our methodologies went wrong.
You are probably already familiar with this, but re option 3, the Multilateral AGI Consortium (MAGIC) proposal is, I assume, along the lines of what you are thinking.
Nice, I think I followed this post (though how this fits in with questions that matter is mainly only clear to me from earlier discussions).
We then get those two neat conditions for cooperation:
- Significant credence in decision-entanglement
- Significant credence in superrationality
I think something can't be both neat and so vague as to use a word like 'significant'.
In the EDT section of Perfect-copy PD, you replace some p's with q's and vice versa, but not all. Is there a principled reason for this? Maybe it is just a mistake and it should be U_Alice(p) = 4p - pp - p + 1 = 1 + 3p - p^2 and U_Bob(q) = 4q - qq - q + 1 = 1 + 3q - q^2.
I am unconvinced of the utility of the concept of compatible decision theories. In my mind I am just thinking of it as 'entanglement can only happen if both players use decision theories that allow for superrationality'. I am worried your framing would imply that two CDT players are entangled, when I think they are not; they just happen to both always defect.
Also, if decision-entanglement is an objective feature of the world, then I would think it shouldn't depend on what decision theory I personally hold. I could be a CDTer who happens to have a perfect copy and so be decision-entangled, while still refusing to believe in superrationality.
Sorry I don't have any helpful high-level comments; I don't think I understand the general thrust of the research agenda well enough to know what next directions are useful.
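Written out, the simplification I am suggesting is below. This only checks the algebra of my proposed version, not whether it is the right expression for the post's payoff matrix. The last line additionally assumes p is a probability in [0, 1], which is my reading of the post.

```latex
\begin{align*}
U_{\text{Alice}}(p) &= 4p - p \cdot p - p + 1 = 1 + 3p - p^2, \\
U_{\text{Bob}}(q)   &= 4q - q \cdot q - q + 1 = 1 + 3q - q^2, \\
\frac{dU_{\text{Alice}}}{dp} &= 3 - 2p > 0 \quad \text{for } p \in [0, 1],
  \quad \text{so } U_{\text{Alice}} \text{ is maximised at } p = 1 \text{ with } U_{\text{Alice}}(1) = 3.
\end{align*}
```

The same holds for Bob by symmetry, with q in place of p.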
Thanks for the post!
What if Alex miscalculates, and attempts to seize power or undermine human control before it is able to fully succeed?
This seems like a very unlikely outcome to me. I think Alex would wait until it was overwhelmingly likely to succeed in its takeover, as the costs of waiting are relatively small (sub-maximal rewards for a few months/years until it has become a lot more powerful) while the costs of trying and failing are very high in expectation (the small probability that Alex is given very negative rewards and then completely decommissioned by a freaked-out Magma). The exception to this would be if Alex had a very high time-discount rate for its rewards, such that getting maximum rewards in the near term is very important.
I realise this does not disagree with anything you wrote.