No worries!
You did say it would be premised on it being either "inevitable or desirable for normal institutions to eventually lose control". In some sense I do think this is "inevitable", but only in the same sense in which past "normal human institutions" lost control.
We now have the internet and widespread democracy, so almost all governmental institutions needed to change how they operate. Future technological change will force similar changes. But I don't put any value on the literal existence of our existing institutions; what I care about is whether our institutions are going to make good governance decisions. I am saying that the development of systems much smarter than current humans will change those institutions, very likely within the next few decades, making most concerns about present institutional challenges obsolete.
Of course something that one might call "institutional challenges" will remain, but I do think there really will be a lot of buck-passing from the perspective of present-day humans. We do really have a crunch time of a few decades on our hands, after which we will no longer have much influence over the outcome.
I don't think I understand. It's not about human institutions losing control "to a small regime". It's just about most coordination problems being things you can solve by being smarter. You can do that in high-integrity ways, probably much higher integrity and with less harmful effects than how we've historically overcome coordination problems. I de-facto don't expect things to go this way, but my opinions here are not at all premised on it being desirable for humanity to lose control?
This IMO doesn't really make any sense. If we get powerful AI, and we can either control it, or ideally align it, then the gameboard for both global coordination and building institutions completely changes (and of course if we fail to control or align it, the gameboard is also flipped, but in a way that removes us completely from the picture).
Does anyone really think that by the time we have systems vastly more competent than humans, we will still face the same coordination problems and institutional difficulties as we have right now?
It does really look like there will be a highly pivotal period of at most a few decades. There is a small chance humanity decides to very drastically slow down AI development for centuries, but that seems pretty unlikely, and also not clearly beneficial. That means it's not a neverending institutional challenge, it's a challenge that lasts a few decades at most, during which humanity will be handing off control to some kind of cognitive successor which is very unlikely to face the same kinds of institutional challenges as we are facing today.
That handoff is not purely a technical problem, but a lot of it will be. At the end of the day, whether your successor AI systems/AI-augmented-civilization/uplifted-humanity/intelligence-enhanced-population will be aligned with our preferences over the future has a lot of highly technical components.
Yes, there will be a lot of social problems, but the size and complexity of the problems are finite, at least from our perspective. It does appear that humanity is at the cusp of unlocking vast intelligence, and after you do that, you really don't care very much about the weird institutional challenges that humanity is currently facing, most of which can clearly be overcome by being smarter and more competent.
I mean, you saw people make fun of it when Eliezer said it, and then my guess is people conservatively assumed that this would generalize to the future. I've had conversations with people where they tried to convince me that Eliezer mentioning kinetic escalation was one of the worst things anyone has ever done for AI policy, and they kept pointing to Twitter threads and conversations where opponents made fun of it as evidence. I think there clearly was something real here, but I also think people really fail to understand the communication dynamics at play.
My sense is a lot of the x-risk oriented AI policy community is very focused on avoiding "gaffes" and has a very short-term and opportunistic relationship with reputation and public relations and all that kind of stuff. My sense is that people in the space don't believe being principled or consistently honest basically ever gets rewarded or recognized, so the right strategy is to try to identify what the Overton window is, only push very conservatively on expanding it, and focus on staying in the good graces of whatever process determines social standing, which is generally assumed to be pretty random and arbitrary.
I think many people in the space, if pushed, would of course acknowledge that kinetic responses are appropriate in many AI scenarios, but they would judge it as an unnecessarily risky gaffe, and that perception of a gaffe creates a pretty effective enforcement regime for people to basically never bring it up, lest they be judged as politically irresponsible.
Promoted to curated: I have various pretty substantial critiques of this work, but I do overall think this is a pretty great effort at crossing the inferential distance from people who think AGI will be a huge deal and potentially dangerous, to the US government and national security apparatus.
The thing that I feel most unhappy about is that the document feels to me like it follows a pattern that Situational Awareness also had, where it kept framing things it wanted to happen as "inevitable", while also arguing that they are a good idea, in a way that felt like it was trying too hard to create some kind of self-fulfilling prophecy.
But overall, I feel like this document speaks with surprising candor and clarity about many things that have been left unsaid in many circumstances. I particularly appreciated its coverage of explicitly including conventional ballistic escalation as part of a sabotage strategy for datacenters. Relevant quotes:
Should these measures falter, some leaders may contemplate kinetic attacks on datacenters, arguing that allowing one actor to risk dominating or destroying the world are graver dangers, though kinetic attacks are likely unnecessary. Finally, under dire circumstances, states may resort to broader hostilities by climbing up existing escalation ladders or threatening non-AI assets. We refer to attacks against rival AI projects as "maiming attacks."
I also particularly appreciated this proposed policy for how to handle AIs capable of recursive self-improvement:
In the near term, geopolitical events may prevent attempts at an intelligence recursion. Looking further ahead, if humanity chooses to attempt an intelligence recursion, it should happen in a controlled environment with extensive preparation and oversight—not under extreme competitive pressure that induces a high risk tolerance.
Research engineers I talk to already report >3x speedups from AI assistants
Huh, I would be extremely surprised by this number. I program most days, in domains where AI assistance is particularly useful (frontend programming with relatively high churn), and I am definitely not anywhere near a 3x total speedup. Maybe a 1.5x, maybe a 2x on good weeks, but definitely not a 3x. A >3x in any domain would be surprising, and my guess is the speedup generalizes less well to research-engineer code (as opposed to churn-heavy frontend development).
I tried it and it looks bad for some reason, I think because the current order of the symbols reflects their position on the number line, and if you invert them it looks worse. I don't feel confident, but I think I prefer the current situation.
Interesting. I am concerned about this effect, but I do really like a lot of quick takes. I wonder whether maybe this suggests a problem with how we present posts.
Here are some initial thoughts:
I do think there are a bunch of good donation opportunities these days, especially in domains where Open Philanthropy withdrew funding recently. Some more thoughts and details here.
At the highest level, I think what the world can use most right now is a mixture of:
- Clear explanations for the core arguments around AI x-risk, both so that people can poke holes in them, and because they will enable many more people who are in positions to do something about AI to do good things
- People willing to publicly, with their real identity, argue that governments and society more broadly should do pretty drastic things to handle the rise of AGI
I think good writing and media production is probably at the core of a lot of this. I particularly think that writing and arguments directed at smart, educated people who do not necessarily have any kind of AI or ML background are more valuable than things directed at AI and ML people. This is mostly because there has been a lot of the latter, because the incentives on engaging in discourse with them are less bad, and because I think there is often a collective temptation to create priesthoods around various kinds of knowledge and then insist on deferring to those priesthoods, which usually causes worse collective decision-making; writing in a more accessible way helps push against that.
I think both of these things can benefit a decent amount from funding. I do think the current funding distribution landscape is pretty hard to navigate. I am on the Long Term Future Fund, which in some sense is trying to address this, but IMO we aren't really doing an amazing job at identifying and vetting opportunities here, so I am not sure whether I would recommend donations to us; but then nobody else is doing a great job either, so I am not sure.
My tentative guess is that the best choice is to spend a few hours trying to identify one or two organizations that seem particularly impactful and at least somewhat funding-constrained, then make a public comment or post asking other people for critical thoughts on those organizations, and then iterate a few times until you find something good. This is a decent amount of work, but I don't think there currently exist good and robust deference chains in this space that would cause you to have a reliably positive impact by just trusting them.
I tentatively think that writing a single essay or reasonably popular tweet under your real identity where you express concern about AI x-risk, as a pretty successful business person, is also quite valuable. I don't think it has to be anything huge, but I do think it's good if it's more than just a paragraph or a retweet. Something that people could refer to if they try to list non-crazy people who think these kinds of concerns are real, and that can meaningfully be weighed as part of the public discussion on these kinds of topics.
I do also think visiting one of the hubs where people who work on this stuff tend to congregate is pretty valuable. You could attend LessOnline or EA Global or something in that space, and talk to people about these topics. I do think there is a risk of ending up unduly influenced by social factors and various herd-mentality dynamics, but there are a lot of smart people around who spend all day thinking about what things are most helpful, and there is lots of useful knowledge to extract.
I whipped up a very quick example in GPT-4.5, which unfortunately 'moderation' somehow forbids me from sharing, but my initial prompt went like this:
(If this is referring to LW moderation that's inaccurate. In general I am in favor of people sharing LLM snippets to discuss their content, as well as for the purpose of background sources in collapsible sections.)
My guess is you know this, but the sidenote implementation appears to be broken. When clicking on the footnote labeled "1", it opens up a footnote labeled "2", and the footnotes also overlap on the right in very broken-looking ways:
Yeah, our policy is to reject anything that looks like it was written or heavily edited with LLMs from new users, and I tend to downvote LLM-written content from approved users, but it is getting harder and harder to detect the difference on a quick skim, so content moderation has been getting harder.
Ah, oops, now I get it. Yes, what I wrote sure didn't make any sense. In my first paragraph I meant to write something like "if no home schoolers are allowed to be as bad as bad or average public schools, the costs of homeschooling increase a lot, constituting effectively a tax on homeschooling", and then in my second paragraph I meant to strengthen it into "the very worst public school". I sure did write the same clarifier in each paragraph, which was very confusing.
Huh, it grammatically reads fine to me. I am assuming the first paragraph reads fine, so I'll clarify just the second.
In my first paragraph I said that enforcing most reasonable interpretations of "a right to an education at least as good as voluntary public school education" would put undue costs on homeschooling. In my second paragraph I then suggested one reading that does not plausibly incur that cost, namely a right to an education at least better than the worst voluntary public school education. However, it appears to me that students already have such a right, as I am sure the worst public school education violates many straightforward human rights and would be prosecutable under current law (just nobody is bothering to do that), suggesting that adding an additional right with such a low threshold wouldn't really make any difference.
Hope that helps!
Something else I'm unsure about, but not necessarily a hill I want to die on given that government resources aren't unlimited, is the question of whether kids should have a right to "something at least similarly good as voluntary public school education."
This seems like it would punish variance a lot, and de-facto therefore be a huge tax on homeschooling. Some public schools are extremely bad; if no home schoolers are allowed to be as bad as the worst public schools, the costs of homeschooling increase a lot, constituting effectively a tax on homeschooling.
Maybe you mean "a right to an education at least as good as the worst public school education", but my guess is the worst public school education is so bad that these would already be covered by almost any reasonable approach to human rights (like, my guess is it already involves continuous ongoing threats of violence, being lied to, frequent physical violence, etc.).
IMO this would be a great top-level post (as would many of the other posts on your Substack, which I just discovered!)
I strong-upvoted and strong-disagree voted, since I also agree the current voting distribution didn't make much sense.
I do think you are doing something in your comment that feels pretty off. For example, you link to aphyer's comment as a "fully general counterargument that clearly prove[s] way too much", but I don't buy that; I think it's a pretty reasonable argument. The prior should be towards liberty, and if the higher-liberty option is also safer, then I don't see any reason to mess with it for now.
Like, it seems fine to improve things, but I do think state involvement in education has been really very terrifying, and I sense throughout your comments a continuous missing mood around how costly marginal regulation can be.
To be clear, I think your comment is fine and doesn't deserve downvoting, and disagree-voting feels like the appropriate dimension.
Yeah, my guess is we should change that UI a bit. IMO it makes sense to make comments a bit less prevalent on wiki and tag pages (because many comments will be more outdated), but the current text is too much about just proposing changes.
Thank you! I'll see whether I can do some of my own thinking on this, as I care a lot about the issue, but do feel like I would have to really dig into it. I appreciate your high-level gloss on the size of the overestimate.
I greatly appreciate this kind of critique, thank you!
My guess is this is too big of an ask, and I am already grateful for your post, but do you have a prediction about how much of the variance would turn out to be causal in the relevant way?
My current best guess is we are going to see some of these technologies used in the animal breeding space relatively soon (within a few years), so early predictions seem helpful for validating models, and might also just help people understand how much you currently think the post overestimates the impact of edits.
Assuming the second refers to "Stuttgart 21"?
Yep!
but I don't think these examples seem well-described as not having precedents / lots of societal and cultural preconditions
I totally think there are lots of cultural preconditions and precedents, I just think they mostly don't look like "small protests for many years that gradually or even suddenly grew into larger ones". My best guess is if you see a protest movement not have substantial growth for many months, it's unlikely to start growing, and it's not that valuable to have started it earlier (and somewhat likely to have a bit of an inoculation effect, though I also don't think that effect is that big).
I don't understand, I don't think there was any ambiguity in what you said. Even not taking things literally, you implied that having big protests without having small protests is at least highly unusual. That also doesn't match my model. I think it's pretty normal. The thing that I think happens before big protests is big media coverage and social media discussion, not many months and years of small protests. I am not sure of this, but that's my current model.
The specific ones I was involved in? Pretty sure they didn't. They were SOPA related and related to what people thought was a corrupt construction of a train station in my hometown. I don't think there was much organizing for either of these before they took off. I knew some of the core organizers, they did not create many small protests before this.
Aw man, sad to hear that, and I am glad you seem to be doing better.
What leapt out to me about your model was that it was very focused on how an observer of the protests would react with a rationalist worldview. You didn't seem to have given much thought to the breadth of social movements and how a diverse public would have experienced them. Like, most people aren't gonna think PauseAI is anti-tech in general and therefore similar to the Unabomber. Rationalists think that way, and few others.
I am confused, did you somehow accidentally forget a negation here? You can argue that Thane is confused, but clearly Thane was arguing from what the public believes, and of course Thane himself doesn't think that PauseAI is similar to the Unabomber based on vague associations, and certainly almost nobody else on this site believes that (some might believe that non-rationalists believe that, but isn't that exactly the kind of thinking you are asking for?).
When I was involved with various forms of internet freedom activism, as well as various protests around government misspending in Germany, I do not remember a run-up of many months of small protests before the big ones. It seemed that people basically directly organized some quite big ones, and then they grew a bit bigger over the course of a month, and then became smaller again. I do not remember anything like the small PauseAI protests on those issues.
(This isn't to say it isn't a good thing in the case of AGI, I am just disputing that "small protests are the only way to get big protests")
I've engaged with Gary 3-4 times in good faith. He responded in very frustrating and IMO bad faith ways every time. I've also seen this 10+ times in other threads.
Promoted to curated: concrete, specific scenarios for how things might go with AI are IMO among the most helpful tools for helping people start forming their own models of how this whole AI thing might go. Being specific is good, grounding things in concrete observable consequences is good. Somewhat sticking your neck out and making public predictions is good.
This is among the best entries I've seen in this genre, and I hope there will be more. Thank you for writing it!
Sorry about that! I am adding a donate link back to the frontpage sometime this week. Here is the link for now: https://www.lesswrong.com/donate
It's true
Seems good!
FWIW, at least in my mind this is in some sense approximately the only and central core of the alignment problem, and so having it left unaddressed feels confusing. It feels a bit like making a post about how to build a nuclear reactor where you happen to not say anything about how to prevent the uranium from going critical, but you did spend a lot of words on how to make the cooling towers, the color of the bikeshed next door, and how to translate the hot steam into energy.
Like, it's fine, and I think it's not crazy to think there are other hard parts, but it felt quite confusing to me.
Someone I trust on this says:
AFAICT what's going on here is just that AISI and CHIPS are getting hit especially hard by the decision to fire probationary staff across USG, since they're new and therefore have lots of probationary staff - it's not an indication (yet) that either office is being targeted to be killed
The central problem of any wiki system is "what edits do you accept to a wiki page?"[1] The lenses system is trying to provide a better answer to that question.
My default experience on e.g. Wikipedia when I am on pages where I am highly familiar with the domain is "man, I could write a much better page". But writing a whole better page is a lot of effort, and the default consequence of rewriting the page is that the editor who wrote the previous page advocates for your edits to be reverted, because they are attached to their version of the page.
With lenses, if you want to suggest large changes to a wiki page, your default action is now "write a new lens". This leaves the work of the previous authors intact, while still giving your new page the potential for substantial readership. Lenses are sorted in order of how many people like them. If you think you can write a better lens, you can make a new lens, and if it's better, it can replace the original lens after it gets traction.
More broadly, wikis suffer a lot from everything feeling like it is written by a committee. Lenses enable more individual authorship, while still trying to have some collective iteration on canonicity and structure of the wiki.
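To make the mechanics concrete, here is a minimal sketch of how one might model this; the names, fields, and sorting rule (`Lens`, `likeCount`, `orderedLenses`) are my own illustration, not the actual LessWrong implementation:

```typescript
// A minimal sketch of a multi-lens wiki page (hypothetical data model).
interface Lens {
  id: string;
  authorId: string;
  title: string;
  body: string;
  likeCount: number; // number of readers who liked this lens
}

interface WikiPage {
  slug: string;
  lenses: Lens[];
}

// Readers see the most-liked lens first; a new lens starts lower down
// and only displaces the original once it accumulates more likes.
function orderedLenses(page: WikiPage): Lens[] {
  return [...page.lenses].sort((a, b) => b.likeCount - a.likeCount);
}

// Proposing a large change becomes "add a lens" rather than
// "overwrite the existing page", leaving previous authors' work intact.
function proposeLens(page: WikiPage, lens: Lens): WikiPage {
  return { ...page, lenses: [...page.lenses, lens] };
}
```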
[1] Well, after you have solved the problem of "does anyone care about this wiki?"
To the extent the tool just gets gamed, you can iterate until you find detection tools that are more robust (or find ways of training against detection tools that don't game them so hard).
How do you iterate? You mostly won't know whether you just trained away your signal, or actually made progress. The inability to iterate is kind of the whole central difficulty of this problem.
(To be clear, I do think there are some methods of iteration, but it's a very tricky kind of iteration where you need constant paranoia about whether you are fooling yourself, and that makes it very different from other kinds of scientific iteration)
Yeah, I've been very glad to have that up. It does lack a quite large fraction of Arbital features (such as UI for picking between multiple lenses, probabilistic claims, and tons of other small UI things which were a lot of work to import), but it's still been a really good resource for linking to.
Ajeya gave 15% to AGI before 2036, with little of that in the first few years after her report; maybe she'd have said 10% between 2025 and 2036.
Just because I was curious, here is the most relevant chart from the report:
This is not a direct probability estimate (since it's about the probability of affordability), but it's probably within a factor of 2. It looks like the estimate by 2030 was 7.72% and the estimate by 2036 was 17.36%.
This thought might be detectable. Now the problem of scaling safety becomes a problem of detecting [...] this kind of conditional, deceptive reasoning.
What do you do when you detect this reasoning? This feels like the part where all plans I ever encounter fail.
Yes, you will probably see early instrumentally convergent thinking. We have already observed a bunch of that. Do you train against it? I think that's unlikely to get rid of it. I think at this point the natural answer is "yes, your systems are scheming against you, so you gotta stop, because when you train against it, you are probably primarily making it a better schemer".
I would be very surprised if you have a 3-month Eliezer that is not doing scheming the first time, and training your signals away is much easier than actually training away the scheming.
What does that mean? It doesn't affect any recent content, and it's one of the most prominent options if you are looking through all historical posts.
I reviewed it. It didn't trigger my "LLM generated content" vibes, though I also don't think it's an amazing essay.
EOD AoE on February 15th, and honestly, I am not going to throw out your application if it comes in on the 16th either.
While artificial intelligence has made impressive strides in specialized domains like coding, art, and medicine, I think its potential to automate high-level strategic thinking has been surprisingly underrated. I argue that developing "AI Intellectuals" - software systems capable of sophisticated strategic analysis and judgment - represents a significant opportunity that's currently being overlooked, both by the EA/rationality communities and by the public.
FWIW, this paragraph reads as LLM-generated to me (and then I stopped reading, because I have a huge prior that content which reads that strongly LLM-edited is almost universally low-quality).
Keep in mind that I'm talking about agent scaffolds here.
Yeah, I have failed to get any value out of agent scaffolds, and I don't think I know anyone else who has so far. If anyone has gotten more value out of them than just the Cursor chat, I would love to see how they do it!
All things like Cursor Composer and Codebuff and other scaffolds have been worse than useless for me (though I haven't tried them again since o3-mini, which maybe made a difference; it's been on my to-do list to give them another try).
And 1 hour on software engineering.
FWIW, this seems like an overestimate to me. Maybe o3 is better than other things, but I definitely can't get equivalents of 1-hour chunks out of language models, unless it happens to be an extremely boilerplate-heavy step. My guess is more like 15 minutes, and for debugging (which in my experience accounts for most software-engineering time), more like 5-10 minutes.
Presumably "Elephant Seal 3"
FWIW, my sense is that it's a bad paper. I expect other people will come out with critiques in the next few days that will expand on that, but I will write something if no one has done it in a week or two. I think the paper notices some interesting weak correlations, but man, it really doesn't feel like the way you would go about answering the central question it is trying to answer, and I keep getting the feeling that it was written to produce something that, on the most shallow read, looks like the most surface-level-similar object, in order to persuade and be socially viral rather than to inform.
Alas, it also looks like our font is lacking some relevant character sets:
What should have been the trigger? When she started wearing black robes? When she started calling herself Ziz? When she started writing up her own homegrown theories of psychology? Weird clothes, weird names, and weird beliefs are part and parcel of the rationalist milieu.
FWIW, I think I had triggers around them being weird/sketchy that would now cause me to exclude them from many community things, so I do think there were concrete triggers, and I did update on that.
I mean, someone must have been running it somehow. If it was being run by some group of people, I feel like saying why they now want a boss would also answer my question.
What happened to the previous CEO?