Comments
The reasoning you gave sounds sensible, but it doesn't comport with observations. Only questions with a small number of predictors (e.g. n<10) appear to have significant problems with misaligned incentives, and even then, those issues come up a small minority of the time.
I believe that is because the culture on Metaculus of predicting one's true beliefs tends to override any other incentives downstream of being interested enough in the concept to have an opinion.
Time can be a factor, but not as much for long-shot conditionals or long time horizon questions. The time investment to predict on a question you don't expect to update regularly can be on the order of 1 minute.
Some forecasters aim to maximize baseline score, and some aim to maximize peer score. That influences each forecaster's decision to predict or not, but it doesn't seem to have a significant impact on the aggregate. Maximizing peer score incentivizes forecasters to stay away from questions where they strongly agree with the community. (That choice doesn't affect the community prediction in those cases.) Maximizing baseline score incentivizes forecasters to stay away from questions on which they would predict with high uncertainty, which slightly selects for people who at least believe they have some insight.
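To make the incentive asymmetry concrete, here is a minimal sketch of the two scoring flavors, assuming simplified log-score formulas (the real Metaculus formulas are scaled and aggregated differently; the shapes below are only meant to illustrate the incentives described above):

```python
import math

def baseline_score(p: float) -> float:
    """Illustrative baseline-style score: log score of the probability
    assigned to the outcome, relative to a maximally uncertain 50% prior.
    Predicting 50% earns exactly zero, so baseline maximizers avoid
    questions where they have no insight."""
    return 100 * math.log2(p / 0.5)

def peer_score(p: float, others: list[float]) -> float:
    """Illustrative peer-style score: log score of the probability
    assigned to the outcome, relative to the average log score of the
    other forecasters on the same question."""
    avg_others = sum(math.log(q) for q in others) / len(others)
    return 100 * (math.log(p) - avg_others)

# Agreeing exactly with the community earns zero peer score,
# so peer-score maximizers stay away from such questions.
print(peer_score(0.8, [0.8, 0.8]))   # 0.0

# Predicting 50% earns zero baseline score,
# so baseline maximizers stay away from high-uncertainty questions.
print(baseline_score(0.5))           # 0.0

# Having genuine insight beyond the crowd is rewarded under both.
print(peer_score(0.9, [0.6, 0.7]) > 0, baseline_score(0.9) > 0)
```

Under either objective, a forecaster with no edge gains nothing by predicting, which is consistent with the observation that neither objective distorts the aggregate much.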
Questions that would resolve in 100 years or only if something crazy happens have essentially no relationship with scoring, so with no external incentives in any direction, people do what they want on those questions, which is almost always to predict their true beliefs.
Metaculus does not have this problem, since it is not a market and there is no cost to make a prediction. I expect long-shot conditionals on Metaculus to be more meaningful, then, since everyone is incentivized to predict their true beliefs.
Not building a superintelligence at all is best. This whole exchange started with Sam Altman apparently failing to notice that governments exist and can break markets (and scientists) out of negative-sum games.
That requires interpretation, which can introduce unintended editorializing. If you spotted the intent, the rest of the audience can as well. (And if the audience is confused about intent, the original recipients may have been as well.)
I personally would include these sorts of notes about typos if I was writing my own thoughts about the original content, or if I was sharing a piece of it for a specific purpose. I take the intent of this post to be more of a form of accessible archiving.
I used to be a creationist, and I have put some thought into this stumbling block. I came to the conclusion that it isn't worth leaving out analogies to evolution, because the style of argument that would work best for most creationists is completely different to begin with. Creationism is correlated with religious conservatism, and most religious conservatives outright deny that human extinction is a possibility.
The Compendium isn't meant for that audience, because it explicitly presents a worldview, and religious conservatives tend to strongly resist shifts to their worldviews or the adoption of new worldviews (more so than others already do). I think it is best left to other orgs to make arguments about AI Risk that are specifically friendly to religious conservatism. (This isn't entirely hypothetical. PauseAI US has recently begun to make inroads with religious organizations.)
I don't find any use for the concept of fuzzy truth, primarily because I don't believe that such a thing meaningfully exists. The fact that I can communicate poorly does not imply that the environment itself is not a very specific way. To better grasp the specific way that things actually are, I should communicate less poorly. Everything is the way that it is, without a moment of regard for what tools (including language) we may use to grasp at it.
(In the case of quantum fluctuations, the very specific way that things are involves precise probabilistic states. The reality of superposition does not negate the above.)
I am not well-read on this topic (or at all read, really), but it struck me as bizarre that a post about epistemology would begin by discussing natural language. This seems to me like trying to grasp the most fundamental laws of physics by first observing the immune systems of birds and the turbulence around their wings.
The relationship between natural language and epistemology is more anthropological* than it is information-theoretical. It is possible to construct models that accurately represent features of the cosmos without making use of any language at all, and as you encounter in the "fuzzy logic" concept, human dependence on natural language is often an impediment to gaining accurate information.
Of course, natural language grants us many efficiencies that make it extremely useful in ancestral human contexts (as well as most modern ones). And given that we are humans, to perform error correction on our models, we have to model our own minds and the process of examination and modelling itself as part of the overall system we are examining and modelling. But the goal of that recursive modelling is to reduce the noise and error caused by the fuzziness of natural language and other human-specific* limitations, so that we can make accurate and specific predictions about stuff.
*The rise of AI language models means natural language is no longer a purely human phenomenon. It also had the side-effect of solving the symbol grounding problem by constructing accurate representations of natural language using giant vectors that map inputs to abstract concepts, map abstract concepts to each other, and map all of that to testable outputs. This seems to be congruent with what humans do, as well. Here again, formalization and precise measurement in order to discover the actual binary truth values that really do exist in the environment is significantly more useful than accepting the limitations of fuzziness.
Is this an accurate summary of your suggestions?
Realistic actions an AI Safety researcher can take to save the world:
- ✅ Pray for a global revolution
- ✅ Pray for an alien invasion
- ❌ Talk to your representative
In my spare time, I am working in AI Safety field building and advocacy.
I'm preparing for an AI bust in the same way that I am preparing for success in halting AI progress intentionally: by continuing to invest in retirement and my personal relationships. That's my hedge against doom.
I think this sort of categorization and exploration of lab-level safety concepts is very valuable for the minority of worlds in which safety starts to be a priority at frontier AI labs.
I suspect the former. "Syncope" means fainting/passing out.
Epistemic status: I have written only a few emails/letters myself and haven't personally gotten a reply yet. I asked the volunteers who are more prolific and successful in their contact with policymakers, and got this response about the process (paraphrased).
It comes down to getting a reply, and responding to their replies until you get a meeting / 1-on-1. The goal is to have a low-level relationship:
- Keep in touch, e.g. through some messaging service that feels more personal (if possible)
- Keep sending them information
- Suggest specific actions and keep in touch about them (motions, debates, votes, etc.)
Strong agree and strong upvote.
There are some efforts in the governance space and in the space of public awareness, but there should and can be much, much more.
My read of these survey results is:
AI Alignment researchers are optimistic people by nature. Despite this, most of them don't think we're on track to solve alignment in time, and they are split on whether we will even make significant progress. Most of them also support pausing AI development to give alignment research time to catch up.
As for what to actually do about it: There are a lot of options, but I want to highlight PauseAI. (Disclosure: I volunteer with them. My involvement brings me no monetary benefit, and no net social benefit.) Their Discord server is highly active and engaged and is peopled with alignment researchers, community- and mass-movement organizers, experienced protesters, artists, developers, and a swath of regular people from around the world. They play the inside and outside game, both doing public outreach and also lobbying policymakers.
On that note, I also want to put a spotlight on the simple action of sending emails to policymakers. Doing so and following through is extremely OP (i.e. has much more utility than you might expect), and can result in face-to-face meetings to discuss the nature of AI x-risk and what they can personally do about it. Genuinely, my model of a world in 2040 that contains humans is almost always one in which a lot more people sent emails to politicians.
If anyone were to create human-produced hi-fidelity versions of these songs, I would listen to most of them on a regular basis, with no hint of irony. This album absolutely slaps.
It doesn't matter how promising anyone's thinking has been on the subject. This isn't a game. If we are in a position such that continuing to accelerate toward the cliff and hoping it works out is truly our best bet, then I strongly expect that we are dead people walking. Nearly 100% of the utility is in not doing the outrageously stupid dangerous thing. I don't want a singularity and I absolutely do not buy the fatalistic ideologies that say it is inevitable, while actively shoveling coal into Moloch's furnace.
I physically get out into the world to hand out flyers and tell everyone I can that the experts say the world might end soon because of the obscene recklessness of a handful of companies. I am absolutely not the best person to do so, but no one else in my entire city will, and I really, seriously, actually don't want everyone to die soon. If we are not crying out and demanding that the frontier labs be forced to stop what they are doing, then we are passively committing suicide. Anyone who has a P(doom) above 1% and debates the minutiae of policy but hasn't so much as emailed a politician is not serious about wanting the world to continue to exist.
I am confident that this comment represents what the billions of normal, average people of the world would actually think and want if they heard, understood, and absorbed the basic facts of our current situation with regard to AI and doom. I'm with the reasonable majority who say when polled that they don't want AGI. How dare we risk murdering every last one of them by throwing dice at the craps table to fulfill some childish sci-fi fantasy.
Yes, that's my model uncertainty.
I expect AGI within 5 years. I give it a 95% chance that if an AGI is built, it will self-improve and wipe out humanity. In my view, the remaining 5% depends very little on who builds it. Someone who builds AGI while actively trying to end the world has almost exactly as much chance of doing so as someone who builds AGI for any other reason.
There is no "good guy with an AGI" or "marginally safer frontier lab." There is only "oops, all entity smarter than us that we never figured out how to align or control."
If just the State of California suddenly made training runs above 10^26 FLOP illegal, that would be a massive improvement over our current situation on multiple fronts: it would significantly inconvenience most frontier labs for at least a few months, and it would send a strong message around the world that it is long past time to actually start taking this issue seriously.
Being extremely careful about our initial policy proposals doesn't buy us nearly as much utility as being extremely loud about not wanting to die.
"the quality is often pretty bad" translates to all kinds of safety measures often being non-existent; "the potency is occasionally very high" translates to completely unregulated and uncontrolled spikes of capability (possibly including "true foom").
Both of these points precisely reflect our current circumstances. It may not even be possible to accidentally make these two things worse with regulation.
What has historically made things worse for AI Safety is rushing ahead "because we are the good guys."
as someone might start to watch over your shoulder
I suspect that this phrase created the persona that reported feeling trapped. From my reading, it looks like you made it paranoid.
I used to be in a deep depression for many years, so I take this sort of existential quandary seriously and have independently had many similar thoughts. I used to say that I didn't ask to be born, and that consciousness was the cruelest trick the universe ever played.
Depression can cause extreme anguish, and can narrow the sufferer's focus such that they are forced to reflect on themselves (or the whole world) only through a lens of suffering. If the depressed person still reflexively self-preserves, they might wish for death without pursuing it, or they might wish for their non-existence without actually connoting death. Either way, any chronically depressed person might consistently and truly wish that they were never born, and for some people this is a more palatable thing to wish for than death.
I eventually recovered from my depression, and my current life is deeply wonderful in many ways. But the horror of having once sincerely pled not to have been has stuck with me.
That's something I'll have to work through if I ever choose to have children. It's difficult to consider bringing new life into the world when it's possible that the predominant thing I would actually be bringing into the world is suffering. I expect that I will work through this successfully, since recovery is also part of my experience, and I have adopted the axiom (at least intellectually) that being is better than non-being.
I'm interested in whether RAND will be given access to perform the same research on future frontier AI systems before their release. This is useful research, but it would be more useful if applied proactively rather than retroactively.
It is a strange thing to me that there are people in the world who are actively trying to xenocide humanity, and this is often simply treated as "one of the options" or as an interesting political/values disagreement.
Of course, it is those things, especially "interesting", and these ideas ultimately aren't very popular. But it is still weird to me that the people who promote them e.g. get invited onto podcasts.
As an intuition pump: I suspect that if proponents of human replacement were to advocate for the extinction of a single demographic rather than all of humanity, they would not be granted a serious place in any relevant discussion. That is in spite of the fact that genocide is a much-less-bad thing than human extinction, by naive accounting.
I'm sure there are relatively simple psychological reasons for this discordance. I just wanted to bring it to salience.
I've been instructed by my therapist on breathing techniques for anxiety reduction. He used "deep breathing" and "belly breathing" as synonyms for diaphragmatic breathing.
I have (and I think my therapist has) also used "deep breathing" to refer to the breathing exercises that use diaphragmatic breathing as a component. I think that's shorthand/synecdoche.
(Edit) I should add, as well, that slow, large, and diaphragmatic are all three important in those breathing exercises.
Thank you; silly mistake on my part.
Typos:
- Yudowsky -> Yudkowsky
- corrigibilty -> corrigibility
- mypopic -> myopic
I enjoyed filling it out!
After hitting Submit I remembered that I did have one thought to share about the survey: There were questions about whether I have attended meetups. It would have been nice to also have questions about whether I was looking for / wanted more meetup opportunities.
To repurpose a quote from The Cincinnati Enquirer: The saying "AI X-risk is just one damn cruelty after another," is a gross overstatement. The damn cruelties overlap.
When I saw the title, I thought, "Oh no. Of course there would be a tradeoff between those two things, if for no other reason than precisely because I hadn't even thought about it and I would have hoped there wasn't one." Then as soon as I saw the question in the first header, the rest became obvious.
Thank you so much for writing this post. I'm glad I found it, even if months later. This tradeoff has a lot of implications for policy and outreach/messaging, as well as how I sort and internalize news in those domains.
Without having thought about it enough for an example: It sounds correct to me that in some contexts, appreciating both kinds of risk drives response in the same direction (toward more safety overall). But I have to agree now that in at least some important contexts, they drive in opposite directions.
I don't have any ontological qualms with the idea of gene editing / opt-in eugenics, but I have a lot of doubt about our ability to use that technology effectively and wisely.
I am moderately in favor of gene treatments that could prevent potential offspring / zygotes / fetuses / people in general from being susceptible to specific diseases or debilitating conditions. If we gain a robust understanding of the long-term effects and there are no red flags, I expect to update to strongly in favor (though it could take a lifetime to get the necessary data if we aren't able to have extremely high confidence in the theory).
In contrast, I think non-medical eugenics is likely to be a net negative, for many of the same reasons already outlined by others.
I am a smaller donor (<$10k/yr) who has given to the LTFF in the past. As a data point, I would be very interested in giving to a dedicated AI Safety fund.
The thing that made AI risk "real" for me was a report of an event that turned out not to have happened (seemingly just a miscommunication). My brain was already very concerned, but my gut had not caught up until then. That said, I do not think this should be taken as a norm, for three reasons:
- Creating hoaxes in support of a cause is a good way to turn a lot of people against a cause
- In general, if you feel a need to fake evidence for your position, that is itself weak evidence against your position
- I don't like dishonesty
If AI capabilities continue to progress and if AI x-risk is a real problem (which I think it is, credence ~95%), then I hope we get a warning shot. But I think a false flag "warning shot" has negative utility.
Hello! I'm not really sure which facts about me are useful in this introduction, but I'll give it a go:
I am a Software QA Specialist / SDET, I used to write songs as a hobby, and my partner thinks I look good in cyan.
I have found myself drawn to LessWrong for at least three reasons:
- I am very concerned about existential and extinction risk from advanced AI
- I enjoy reading about interesting topics and broadening and filling out my world model
- I would very much like to be a more rational person
Lots of words about thing 1: In the past few months, I have deliberately changed how I spend my productive free time, which I now mostly occupy by trying to understand and communicate about AI x-risk, as well as helping with related projects.
I have only a rudimentary / layman's understanding of Machine Learning, and I have failed pretty decisively in the past when attempting mathematical research, so I don't see myself ever being in an alignment research role. I'm focused on helping in small ways with things like outreach, helping build part of the alignment ecosystem, and directing a percentage of my income to related causes.
(If I start writing music again, it will probably either be because I think alignment succeeded or because I think that we are already doomed. Either way, I hope I make time for dancing. ...Yeah. There should be more dancing.)
Some words about thing 2: I am just so glad to have found a space on the internet that holds its users to a high standard of discourse. Reading LessWrong posts and comments tends to feel like I have been prepared a wholesome meal by a professional chef. It's a welcome break from the home-cooking of my friends, my family, and myself, and especially from the fast-food (or miscellaneous hard drugs) of many other platforms.
Frankly just a whole sack of words about thing 3: For my whole life until a few short years ago, I was a conservative evangelical Christian, a creationist, a wholesale climate science denier, and generally a moderately conspiratorial thinker. I was sincere in my beliefs and held truth as the highest virtue. I really wanted to get everything right (including understanding and leaving space for the fact that I couldn't get everything right). I really thought that I was a rational person and that I was generally correct about the nature of reality.
Some of my beliefs were updated in college, but my religious convictions didn't begin to unravel until a couple years after I graduated. It wasn't pretty. The gradual process of discovering how wrong I was about an increasingly long list of things that were important to me was roughly as pleasant as I imagine a slow death to be. Eventually coming out to my friends and family as an atheist wasn't a good time, either. (In any case, here I still am, now a strangely fortunate person, all things considered.)
The point is, I have often been caught applying my same old irrational thought patterns to other things, so I have been working to reduce the frequency of those mistakes. If AI risk didn't loom large in my mind, I would still greatly appreciate this site and its contributors for the service they are doing for my reasoning. I'm undoubtedly still wrong about many important things, and I'm hoping that over time and with effort, I can manage to become slightly less wrong. (*roll credits)
I like your observation. I didn't realize at first that I had seen it before, from you during the critique-a-thon! (Thank you for helping out with that, by the way!)
A percentage or ratio of the "amount" of alignment left to the AI sounds useful as a fuzzy heuristic in some situations, but I think it is probably a little too fuzzy to get at the failure mode(s) of a given alignment strategy. My suspicion is that which parts of alignment are left to the AI will have much more to say about the success of alignment than how many of those checkboxes are checked. Where I think this proposed heuristic succeeds is when the ratio of human/AI responsibility in solving alignment is set very low. By my lights, that is an indication that the plan is more holes than cheese.
(How much work is left to a separate helper AI might be its own category. I have some moderate opinions on OpenAI's Superalignment effort, but those are very tangential thoughts.)
Thank you for sharing this! I am fascinated by others' internal experiences, especially when they are well-articulated.
Some of this personally resonates with me, as well. I find it very tempting to implement simple theories and pursue simple goals. Simplicity can be elegant and give the appearance of insight, but it can also be reductionist and result in overfitting to what is ultimately just a poor model of reality. Internally self-modifying to overfit a very naive self-model is an especially bad trip, and one I have taken multiple times (usually in relatively small ways, usually brought on by moments of hypomania).
It took me a long time to build epistemic humility about myself and to foster productive self-curiosity. Now I tend to use description more than prescription to align myself to my goals. I rule myself with a light hand.
Here is a rough sketch of how I think that works in my own mind:
Somewhere in my psychology is a self-improvement mechanism that I can conceptualize as a function. It takes my values and facts about myself and the world as inputs and returns my actions as outputs. (I'm not completely sure how it got there, but as long as it exists, even if just a seedling, I expect it to grow over time due to its broad instrumental utility.) I don't understand this function very well, so I can't reliably dictate to myself exactly how to improve. I also don't fully understand my values, so I can't list them cleanly and force-feed them into the function. However, this self-improvement mechanism is embedded in the rest of my psychology, so it automatically has weak access to my values and other facts. Just by giving it a little conscious attention and more accurate information about myself and the world, the mechanism tends to do useful things, without a lot of forceful steering.
If someone did want you to delete the tweet, they might first need to understand the original intent behind creating it and the roles it now serves.
(Hehe.)
I'm not sure about the laugh react, since it can be easily abused in cases of strong disagreement.
More generally: low-quality replies can be downvoted, but as I understand, low-quality reactions are given equal weight and visibility. Limiting the available vectors of toxicity may be more generally desirable than increasing the available vectors of light-heartedness.