Update: I think it doesn't make much sense to interpret the letter literally. Instead, it can be seen as an attempt to show that a range of people think that slowing down progress would be good, and I think it does an okay job at that (though I still think the wording could be much better, and it should present arguments for why we should decelerate).
Thanks! Haven't found good comments on that paper (and I lack the technical expertise to evaluate it myself).
Are you implying that China has access to compute required for a) GPT-4 type models or b) AGI?
The letter feels rushed and leaves me with a bunch of questions.
1. "recent months have seen AI labs locked in an out-of-control race to develop and deploy ever more powerful digital minds that no one – not even their creators – can understand, predict, or reliably control."
Where is the evidence of this "out-of-control race"? Where is the argument that future systems could be dangerous?
2. "Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization? Such decisions must not be delegated to unelected tech leaders."
These are very different concerns, and lumping them together waters down the problem the letter is trying to address. Most of them are questions about deployment more than about development.
3. I like the idea of a six-month collaboration between actors. I also like the policy asks they include.
4. The main impact of this letter would obviously come from getting the main actors (OpenAI, Anthropic, DeepMind, Meta AI, Google) to halt development. Yet those actors don't seem to have been involved in the letter and haven't publicly commented (AFAIK). This seems like a failure.
5. Not making it possible to verify the names is a pretty big mistake.
6. In my perception, the letter mostly comes across as alarmist at this point, especially since it doesn't include an argument for why future systems could be dangerous. It might just end up burning political capital.
1. Haven't seen an impressive AI product come out of China (please point me to some if you disagree).
2. They can't import A100/H100 chips anymore after the US chip restrictions.
Because if we do it now and then nothing happens for five years, people will call it hysteria, and we won't be able to do it again once we are close to x-risky systems.
Russia is not at all an AI superpower. China also seems to be quite far behind the West in terms of LLMs, so overall, six months would very likely not lead to either of them catching up.
Edit: I need to understand more context before expressing my opinion.
Relatedly, humans are very extensively optimized to predictively model their visual environment. But have you ever, even once in your life, thought anything remotely like "I really like being able to predict the near-future content of my visual field. I should just sit in a dark room to maximize my visual cortex's predictive accuracy."?
Nitpick: that doesn't seem like what you would expect in the first place. Arguably, I have very little conscious access to the part of my brain that predicts what I will see next, and the optimization of that part is probably independent of the optimization that happens in the more conscious parts of my brain.
I resonate a lot with this post and felt it was missing from the recent discussions! Thanks
I found this quite helpful, even if some points could use a more thorough explanation.
the public was not happy with the fact that the AI kept repeating "I am an AI developed by OpenAI", which pushed OpenAI to release the January 9 version that is again much more hackable than the December 15 patch version (benchmark coming soon).
Wow, that sounds bad. Do you have any source for this?
To improve the review, an important addition would be to account for the degree to which different methods influence one another.
E.g., Holden and Ajeya influence one another heavily through conversations. And as for Metaculus and Samotsvety, they already incorporate the other models, most notably the bioanchors framework. Maybe you are already correcting for this in the weighted average?
Also, note that e.g., Ajeya uses her own judgment to set the weights for the different models within the bioanchors framework.
Overall, I think there is currently a severe echo chamber effect within most of the forecasts, which leads me to weigh fully outside views, such as the semi-informative priors, much higher.
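To make the echo chamber point concrete, here is a toy sketch of what I have in mind (entirely my own illustration; the probabilities, overlap factors, and the 0.5 penalty are invented): downweight forecasts that largely recycle the same underlying model before averaging, so that genuinely independent outside views count for more.

```python
# Toy illustration only: all numbers are made up for the sake of the example.
forecasts = {                         # hypothetical point estimates
    "bioanchors": 0.45,
    "holden": 0.50,                   # heavily informed by bioanchors
    "metaculus_samotsvety": 0.40,     # also incorporates bioanchors
    "semi_informative_priors": 0.20,  # closer to a pure outside view
}

# Assumed fraction of each forecast that is effectively "bioanchors again"
overlap = {
    "bioanchors": 1.0,
    "holden": 0.8,
    "metaculus_samotsvety": 0.6,
    "semi_informative_priors": 0.1,
}

# Start from equal weights, then penalize overlap with the shared model
weights = {k: 1.0 - 0.5 * overlap[k] for k in forecasts}

total = sum(weights.values())
aggregate = sum(forecasts[k] * weights[k] for k in forecasts) / total
print(f"correlation-adjusted aggregate: {aggregate:.2f}")
```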
Cheers! Works for me
Great work, this helped me get clarity on which models I find useful and which ones I don't.
The tool on the page doesn't seem to work for me, though; I tried Chrome and Safari.
It seems plausible to me that both M & G might prefer a regulatory scheme that overall slows down progress while cementing their dominance, since that would be a pretty standard regulatory-capture-driven-by-the-dominant-actors-in-the-field kind of scenario.
Interesting. Where did something like this happen?
Compute is centralized and thus leaves room for compute governance
[under pre-2030 timelines]
Unfortunately, good compute governance takes time. E.g., if we want to implement hardware-based safety mechanisms, we first have to develop them, convince governments to mandate them, and then they have to be built into the latest chips, which take several years to make up a dominant share of compute.
So large parts of compute gov will probably take longer to yield meaningful results.
(Also note that compute gov likely requires government levers, so this clashes a bit with your other statement.)
I didn't read it; this clarifies a lot! I'd recommend making it more visible, e.g., putting it at the very top of the post as a disclaimer. Until then, I think the post implies unreasonable confidence, even if you didn't intend to.
Thank you for writing this up. I think I agree with the general direction of your takes, but you imply high certainty that I often don't share. This may lead people unfamiliar with the complexity of AI governance to update too strongly.
National government policy won’t have strong[5] effects (70%)
This can change rapidly, e.g., if systems suddenly get much more agentic and become more reliable decision-makers or if we see incidents with power-seeking AI systems. Unless you believe in takeoff speeds of weeks, governments will be important actors in the time just before AGI, and it will be essential to have people working in relevant positions to advise them.
I read your critique as roughly: "Our prior on systems more powerful than us should be that they are not controllable or foreseeable. So when trying to use one system as a tool for another system's safety, we cannot even know all the failure modes."
I think this is true if the systems are general enough that we cannot predict their behavior. However, my impression of, e.g., debate or AI helpers for alignment research is that those would be narrow, e.g., only doing next-token prediction. The Godzilla analogy implies something whose design we have no say in and whose decisions we cannot reason about, both of which seem off given what current language models can do.
A) You seem to agree that, in principle, more goal-directed agents would be more capable. I think this alone implies that they will be the dominant force in the future, even if they are rare among many less goal-directed agents.
B) I'm deeply unsure about this and have conflicting intuitions. On the one hand, if you think total utilitarianism is true, any world where AI is not explicitly maximizing total utility is much, much worse than one where it is. On the other hand, I agree that humans are able to reach agreement.
C) I think you are missing two key features of AI: a) It can hide for many years (e.g., on servers or distributed across many local computers) and move very slowly. Thus, even if it is not much smarter than we are today, as long as it has goals conflicting with ours, it would try to devise plans to acquire power, e.g., through manipulation, careful financial management, or hacking. b) AI can copy itself thousands of times, and the copies will be able to cooperate very easily since they can model the other instances well. If I were copied 100,000 times, I'm reasonably confident that the copies could collectively devise plans to take over the world.