I just came here to point out that even nuclear weapons were a slow takeoff in terms of their impact on geopolitics and specific wars. The American nuclear attacks on Hiroshima and Nagasaki were useful but not necessarily decisive in ending the war with Japan; some historians argue that the Soviet invasion of Japanese-occupied Manchuria, the firebombing of Japanese cities by massive conventional bomber fleets, and the ongoing starvation of the Japanese population due to an increasingly successful blockade were at least as influential in the Japanese decision to surrender.
After 1945, the American public had no stomach for nuclear attacks on enough 'enemy' civilians to actually cause large countries like the USSR or China to surrender, and nuclear weapons were too expensive and too rare to be used to wipe out large enemy armies -- the 300 nukes America had stockpiled at the start of the Korean War in 1950 would not necessarily have been enough to kill the 3 million dispersed Chinese soldiers who actually fought in Korea, let alone the millions more who would likely have volunteered to retaliate against a nuclear attack.
The Soviet Union had a similarly sized nuclear stockpile and no way to deliver it to the United States or even to the territory of key US allies; the only practical delivery vehicle at the time was the heavy bomber, and the Soviet Union had no heavy bomber force that could realistically penetrate Western air defenses -- hence the Nike anti-aircraft missiles rusting along ridgelines near the California coast and the early warning stations dotting the Canadian wilderness. If you can shoot their bombers down before they reach your cities, then they can't actually win a nuclear war against you.
Nukes didn't become a complete gamechanger until the late 1950s, when the increased yields from hydrogen bombs and the increased range from ICBMs created a truly credible threat of annihilation.
I suspect we're talking about two different things.
If you just naively program a super-intelligent AI to satisfice a goal, then, sure, most of the candidate pathways to satisfice will involve accruing a lot of some type of power, because power is useful for achieving goals. That's a valid point, and it's important to understand that merely switching from optimizers to satisficers won't adequately protect us against overly ambitious AIs.
However, that doesn't mean that it's futile to explicitly penalize most (but not literally all) of the paths that the AI could take to accumulate too much power. Suppose you adequately penalize all of the tactics that would have catastrophic side effects except for, oh, I don't know, cornering the world market on all types of grain, because you forgot to think of that one particular path to power. Would most of the candidate plans that the AI submits for human approval then turn out to involve secretly cornering the grain market?
I don't see why they would. All else being equal, sure, the most powerful tactic available is going to be somewhat more attractive than other, less powerful tactics. But how much more attractive? Would an AI generate 1,000x more plans that involve one specific globally disruptive tactic like cornering the grain market vs. all of the various tactics that are only locally disruptive, like publishing propaganda or hiring away competing researchers or hacking into competing research labs or interrupting the flow of electricity to those labs? 10x more such plans? 2x more such plans? I don't think that's the kind of estimate you can make just by gesturing at basic principles of game theory; you'd need some concrete domain knowledge about the AI's specific planning algorithms.
If the truly dangerous plans are only 10x more common on the initial brainstorming list, then we can probably make sure those plans aren't chosen by filtering for plans that the AI rates as safe and transparent. We can use 3 different AIs on airgapped equipment -- one that only "cares" about evaluating whether a plan is safe (i.e., would the plan cause unacceptably large changes if implemented?), one that only "cares" about evaluating whether a plan is transparent (i.e., can the plan be explained to humans so that the humans will have an accurate understanding of what actions will be taken as part of the plan?), and one that generates plans that satisfice for a substantive goal (in this case, shutting down opposing AI research).
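For concreteness, here's a rough sketch of the three-model arrangement I have in mind; every name in it is a placeholder I made up for illustration, not a claim about how a real system would be engineered:

```python
import random

def choose_plan(planner, safety_judge, transparency_judge, goal, n_candidates=1000):
    """Toy version of the three-AI setup described above: one model brainstorms
    plans that satisfice the goal, and two independent models veto any plan
    they rate as unsafe or opaque."""
    candidates = [planner.satisfice(goal) for _ in range(n_candidates)]
    approved = [
        plan for plan in candidates
        if safety_judge.is_safe(plan) and transparency_judge.is_transparent(plan)
    ]
    # Hand a randomly chosen approved plan to the humans; take no action
    # unless at least one plan clears both filters.
    return random.choice(approved) if approved else None
```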
Of the plans that are approved by all 3 AIs, most of them probably won't involve accumulating excessive amounts of power, because usually if you accumulate too much power, then that is (a) unsafe, and (b) likely to involve lying to 'your' human programmers.
If powerful plans are 10x more common on the initial brainstorming list, then you need filters for safety and/or transparency that are at least 90% effective at catching unacceptably powerful plans in order to get back up to a 50% chance of humanity surviving. If powerful plans are more than 10x more common, then you need a set of filters that's more reliable than 90%. I don't think we have any idea what those numbers are yet, but I do think it's worth trying to reduce how common it is for excessively powerful plans to show up on the initial brainstorming list, and I think we can do that by training AIs to recognize dangerously disruptive plans and to try to avoid those types of plans. It's better to at least try to get AIs to engage with the concept of "this plan is too disruptive" than to throw up our hands and say, "Oh, power is an attractor in game theory space, so there's no possible way to get brilliant AIs that don't seize infinite power."
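To spell out the arithmetic behind that 10x / 90% claim, here's a toy calculation; it assumes, purely for simplicity, that the filters never reject a benign plan:

```python
def survival_chance(ratio, catch_rate):
    """Probability that a randomly chosen surviving plan is benign, when
    dangerous plans start out `ratio` times as common as benign ones and
    the filters catch a fraction `catch_rate` of the dangerous ones."""
    dangerous_remaining = ratio * (1 - catch_rate)
    return 1.0 / (1.0 + dangerous_remaining)

print(survival_chance(ratio=10, catch_rate=0.90))   # 0.50 -- the 90% figure above
print(survival_chance(ratio=10, catch_rate=0.99))   # ~0.91
print(survival_chance(ratio=100, catch_rate=0.90))  # ~0.09 -- 90% is no longer enough
```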
Sure, the metaphor is strained because natural selection doesn't have feelings, so it's never going to feel satisfied, because it's never going to feel anything. For whatever it's worth, I didn't pick that metaphor; Eliezer mentions contraception in his original post.
As I understand it, the point of bringing up contraception is to show that when you move from one level of intelligence to another, much higher level of intelligence, then the more intelligent agent can wind up optimizing for values that would be anathema to the less intelligent agents, even if the less intelligent agents have done everything they can to pass along their values. My objection to this illustration is that I don't think anyone's demonstrated that human goals could plausibly be described as "anathema" to natural selection. Overall, humans are pursuing a set of goals that are relatively well-aligned with natural selection's pseudo-goals.
One of my assumptions is that it's possible to design a "satisficing" engine -- an algorithm that generates candidate proposals for a fixed number of cycles, and then, assuming at least one proposal with estimated utility greater than X has been generated within that amount of time, selects one of the qualifying proposals at random. If there are no qualifying candidates, the AI takes no action.
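Here's a minimal sketch of the loop I'm imagining, with placeholder function names (my own illustration, not anyone's actual design):

```python
import random

def satisfice(generate_proposal, estimate_utility, threshold, budget):
    """Generate proposals for a fixed number of cycles, keep every proposal
    whose estimated utility exceeds the threshold, and return one qualifying
    proposal chosen uniformly at random. Returns None -- i.e., take no
    action -- if nothing qualifies."""
    qualifying = []
    for _ in range(budget):
        proposal = generate_proposal()
        if estimate_utility(proposal) > threshold:
            qualifying.append(proposal)
    return random.choice(qualifying) if qualifying else None
```

The key difference from an optimizer is that the argmax never appears: as long as some of the qualifying proposals are benign, the random draw gives them a real chance of being selected.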
If you have a straightforward optimizer that always returns the action with the highest expected utility, then, yeah, you only have to miss one "cheat" that improves "official" utility at the expense of murdering everyone everywhere and then we all die. But if you have a satisficer, then as long as some of the qualifying plans don't kill everyone, there's a reasonable chance that the AI will pick one of those plans. Even if you forget to explicitly penalize one of the pathways to disaster, there's no special reason why that one pathway would show up in a large majority of the AI's candidate plans.
Sure, I agree! If we miss even one such action, we're screwed. My point is that if people put enough skill and effort into trying to catch all such actions, then there is a significant chance that they'll catch literally all the actions that are (1) world-ending and that (2) the AI actually wants to try.
There's also a significant chance we won't, which is quite bad and very alarming, hence people should work on AI safety.
Right, I'm not claiming that AGI will do anything like straightforwardly maximize human utility. I'm claiming that if we work hard enough at teaching it to avoid disaster, it has a significant chance of avoiding disaster.
The fact that nobody is artificially mass-producing their genes is not a disaster from Darwin's point of view; Darwin is vaguely satisfied that instead of a million humans there are now 7 billion humans. If the population stabilizes at 11 billion, that is also not a Darwinian disaster. If the population spreads across the galaxy, mostly in the form of emulations and AIs, but with even 0.001% of sentient beings maintaining some human DNA as a pet or a bit of nostalgia, that's still way more copies of our DNA than the Neanderthals were ever going to get.
There are probably some really convincing analogies or intuition pumps somewhere that show that values are likely to be obliterated after a jump in intelligence, but I really don't think evolution/contraception is one of those analogies.
I think we're doing a little better than I predicted. Rationalists seem to be somewhat better able than their peers to sift through controversial public health advice, to switch careers (or retire early) when that makes sense, to donate strategically, and to set up physical environments that meet their needs (homes, offices, etc.) even when those environments are a bit unusual. Enough rationalists got into cryptocurrency early enough and heavily enough for that to feel more like successful foresight than a lucky bet. We're doing something at least partly right.
That said, if we really did have a craft of reliably identifying and executing better decisions, and if even a hundred people had been practicing that craft for a decade, I would expect to see a lot more obvious results than the ones I actually see. I don't see a strong correlation between the people who spend the most time and energy engaging with the ideas you see on Less Wrong, and the people who are wealthy, or who are professionally successful, or who have happy families, or who are making great art, or who are doing great things for society (with the possible exception of AI safety, and it's very difficult to measure whether working on AI safety is actually doing any real good).
If anything, I think the correlation might point the other way -- people who are distressed or unsuccessful at life's ordinary occupations are more likely to immerse themselves in rationalist ideas as an alternate source of meaning and status. There is something actually worth learning here, and there are actually good people here; it's not like I would want to warn anybody away. If you're interested in rationality, I think you should learn about it and talk about it and try to practice it. However, I also think some of us are still exaggerating the likely benefits of doing so. Less Wrong isn't objectively the best community; it's just one of many good communities, and it might be well-suited to your needs and quirks in particular.
I mostly agree with the reasoning here; thank you to Eliezer for posting it and explaining it clearly. It's good to have all these reasons here in one place.
The one area I partly disagree with is Section B.1. As I understand it, the main point of B.1 is that we can't guard against all of the problems that will crop up as AI grows more intelligent, because we can't foresee all of those problems, because most of them will be "out-of-distribution," i.e., not the kinds of problems where we have reasonable training data. A superintelligent AI will do strange things that wouldn't have occurred to us, precisely because it's smarter than we are, and some of those things will be dangerous enough to wipe out all human life.
I think this somewhat overstates the problem. If we tell an AI not to invent nanotechnology, not to send anything to protein labs, not to hack into all of the world's computers, not to design weird new quantum particles, not to do 100 of the other most dangerous and weirdest things we can think of, and then ask it to generalize and learn not to do things of that sort and build avoidance of catastrophic danger as a category into its utility function...
And then we test whether the AI is actually doing these things and successfully using something like the human category of "catastrophe" when the AI is only slightly smarter than humans...
And then learn from those tests and honestly look at the failures and improve the AI's catastrophe-avoidance skills based on what we learn...
Then the chances that that AI won't immediately destroy the world seem to me to be much much larger than 0.1%. They're still low, which is bad, but they're not laughably insignificant, either, because if you make an honest, thoughtful, sustained effort to constrain the preferences of your successors, then often you at least partially succeed.
If natural selection had feelings, it might not be maximally happy with the way humans are behaving in the wake of Cro-Magnon optimization...but it probably wouldn't call it a disaster, either. Despite the existence of contraception, there sure are a whole lot more Cro-Magnons than there ever were Neanderthals, and the population is still going up every year.
Similarly, training an AI to act responsibly isn't going to get us a reliably safe AI, but if whoever launches the first super-intelligent AI puts enough effort into that kind of training, then I don't see any reason why we shouldn't expect at least a 50% chance of a million or more survivors. I'm much more worried about large, powerful organizations that "vocally disdain all talk of AGI safety" than I am about the possibility that AGI safety research is inherently futile. It's inherently imperfect in that there's no apparent path to guaranteeing the friendliness of superintelligence...but that's not quite the same thing as saying that we shouldn't expect to be able to increase the probability that superintelligence is at least marginally friendly.
I appreciate how much detail you've used to lay out why you think a lack of human agency is a problem -- compared to our earlier conversations, I now have a better sense of what concrete problem you're trying to solve and why that problem might be important. I can imagine that, e.g., it's quite difficult to tell how well you've fit a curve if the context in which you're supposed to fit that curve is vulnerable to being changed in ways whose goodness or badness is difficult to specify. I look forward to reading the later posts in this sequence so that I can get a sense of exactly what technical problems are arising and how serious they are.
That said, until I see a specific technical problem that seems really threatening, I'm sticking by my opinion that it's OK that human preferences vary with human environments, so long as (a) we have a coherent set of preferences for each individual environment, and (b) we have a coherent set of preferences about which environments we would like to be in. Right, like, in the ancestral environment I prefer to eat apples, in the modern environment I prefer to eat Doritos, and in the transhuman environment I prefer to eat simulated wafers that trigger artificial bliss. That's fine; just make sure to check what environment I'm in before feeding me, and then select the correct food based on my environment. What do you do if you have control over my environment? No big deal, just put me in my preferred environment, which is the transhuman environment.
What happens if my preferred environment depends on the environment I'm currently inhabiting, e.g., modern me wants to migrate to the transhumanist environment, but ancestral me thinks you're scary and just wants you to go away and leave me alone? Well, that's an inconsistency in my preferences -- but it's no more or less problematic than any other inconsistency. If I prefer oranges when I'm holding an apple, but I prefer apples when I'm holding an orange, that's just as annoying as the environment problem. We do need a technique for resolving problems of utility that are sensitive to initial conditions when those initial conditions appear arbitrary, but we need that technique anyway -- it's not some special feature of humans that makes that technique necessary; any beings with any type of varying preferences would need that technique in order to have their utility fully optimized.
It's certainly worth noting that standard solutions to Goodhart's law won't work without modification, because human preferences vary with their environments -- but at the moment such modifications seem extremely feasible to me. I don't understand why your objections are meant to be fatal to the utility of the overall framework of Goodhart's Law, and I hope you'll explain that in the next post.
Hmm. Nobody's ever asked me to try to teach them that before, but here's my advice:
- Think about what dimensions or components success at the task will include. E.g., if you're trying to play a song on the guitar, you might decide that a well-played song will have the correct chords played with the correct fingering and the correct rhythm.
- Think about what steps are involved in each of the components of success, with an eye toward ordering those steps in terms of which steps are easiest to learn and which steps are logical prerequisites for the others. E.g., in order to learn how to play a rhythm, you first need an understanding of rhythmic concepts like beats and meters. Then, once you have a language that you can use to describe a rhythm, you need some concrete examples of rhythms, e.g., a half note followed by two quarter-notes. Then you need to translate that into the physical motions taken on the guitar, e.g., downstrokes and upstrokes with greater or lesser emphasis. Those are two different steps; first you teach the difference between a downstroke and an upstroke, and then you teach the difference between a stressed beat and an unstressed beat. You might change the order of those steps if you are working with a student who's more comfortable with physical techniques than with language, e.g., demonstrate some rhythms first, and then only after that explain what they mean in words. In general, most values will have a vocabulary that lets you describe them, a series of examples that help you understand them, and a set of elements that constitute them; using each new word in the vocabulary and recognizing each type of example and recognizing each element and using each element is a separate step in learning the technique.
- Leave some room at the end for integration, e.g., if you've learned rhythm and fingering and chords, you still need some time to practice using all three of those correctly at once. This may include learning how to make trade-offs among the various components, e.g., if you've got some very tricky fingering in one measure, maybe you simplify the chord to make that easier.
I'm curious about the source of your intuition that we are obligated to make an optimal selection. You mention that the utility difference between two plausibly best meals could be large, which is true, especially when we drop the metaphor and reflect on the utility difference between two plausibly best FAI value schemes. And I suppose that, taken literally, the utilitarian code urges us to maximize utility, so leaving any utility on the table would technically violate utilitarianism.
On a practical level, though, I'm usually not in the habit of nitpicking people who do things for me that are sublimely wonderful yet still marginally short of perfect, and I try not to criticize people who made a decision that was plausibly the best available decision simply because some other decision was also plausibly the best available decision. If neither of us can tell for sure which of two options is the best, and our uncertainty isn't of the kind that seems likely to be resolvable by further research, then my intuition is that the morally correct thing to do is just pick one and enjoy it, especially if there are other worse options that might fall upon us by default if we dither for too long.
I agree with you that a trusted moral authority figure can make it easier for us to pick one of several plausibly best options...but I disagree with you that such a figure is morally necessary; instead, I see them as useful moral support for an action that can be difficult due to a lack of willpower or self-confidence. Ideally, I would just always pick a plausibly best decision by myself; since that's hard and I am a human being who sometimes experiences angst, it's nice when my friends and my mom help me make hard decisions. So the role of the moral authority, in my view, isn't that they justify a hard decision, causing it to become correct where it was not correct prior to their blessing; it's that the moral authority eases the psychological difficulty of making a decision that was hard to accept but that was nevertheless correct even without the authority's blessing.
Thank you for sharing this; there are several useful conceptual tools in here. I like the way you've found crisply different adjectives to describe different kinds of freedom, and I like the way you're thinking about the computational costs of surplus choices.
Building on that last point a bit, I might say that a savvy agent who has already evaluated N choices could try to keep a running estimate of their expected gains from choosing the best option available after considering X more choices and then compare that gain to their cost of computing the optimal choice out of X + N options. Right, like if the utility of an arbitrary choice follows anything like a normal distribution, then as N increases, we expect U(N+X) to have tinier and tinier advantages over U(N), because N choices already cover most of the distribution, so it's unlikely that an even better choice is available within the X additional choices you look at, and even if you do find a better choice, it's probably only slightly better. Yet for most humans, computing the best choice out of N+X options is more costly than computing the best choice for only N options, because you start to lose track of the details of the various options you're considering as you add more and more possibilities to the list, and the list starts to feel boring or overwhelming, so it gets harder to focus. So there's sort of a natural stopping point where the cost of considering X additional options can be confidently predicted to outweigh the expected benefit of considering X additional options, and when you reach that point, you should stop and pick the best choice you've already researched.
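To see the diminishing returns numerically, here's a quick toy simulation; it assumes option utilities are independent draws from a standard normal distribution, which is my simplification rather than part of your original point:

```python
import random
import statistics

def expected_best(n_options, trials=20000):
    """Monte Carlo estimate of the expected utility of the best of
    n_options independent standard-normal draws."""
    return statistics.mean(
        max(random.gauss(0, 1) for _ in range(n_options))
        for _ in range(trials)
    )

for n in (5, 10, 20, 40, 80):
    print(n, round(expected_best(n), 2))
# The expected best climbs quickly at first and then flattens out, so the
# marginal gain from examining X additional options keeps shrinking while
# the cognitive cost of juggling N + X options keeps growing.
```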
I like having access to at least some higher-order freedoms because I enjoy the sensation of planning and working toward a long-term goal, but I don't understand why the order of a freedom is important enough to justify orienting our entire system of ethics around it. Right, like, I can imagine some extremely happy futures where everyone has stable access to dozens of high-quality choices in all areas of their lives, but, sadly, none of those choices exceed order 4, and none of them ever will. I think I'd take that future over our present and be quite grateful for the exchange. On the other hand, I can imagine some extremely dark futures where the order of choices is usually increasing for most people, because, e.g., they're becoming steadily smarter and/or more resilient and they live in a complicated world, but they're trapped in a kind of grindy hellscape where they have to constantly engage in that sort of long-term planning in order to purchase moderately effective relief from their otherwise constant suffering.
So I'd question whether the order of freedoms is (a) one interesting heuristic that is good to look at when considering possible futures, or (b) actually the definition of what it would mean to win. If it's (b), I think you have some more explaining to do.
I agree with this post. I'd add that from what I've seen of medical school (and other high-status vocational programs like law school, business school, etc.), there is still a disproportionate emphasis on talking about the theory of the subject matter vs. building skill at the ultimate task. Is it helpful to memorize the names of thousands of arteries and syndromes and drugs in order to be a doctor? Of course. Is that *more* helpful than doing mock patient interviews and mock chart reviews and live exercises where you try to diagnose a tumor or a fracture or a particular kind of pus? Is it *so* much more helpful that it makes sense to spend 40x more hours on biochemistry than on clinical practice? Because my impression of medical school is that you do go on clinical rounds and do internships and things, but that the practical side of things is mostly a trial-by-fire where you are expected to improvise many of your techniques, often after seeing them demonstrated only once or twice, often with minimal supervision, and usually with little or no coaching or after-the-fact feedback. The point of the internships and residencies seems to be primarily to accomplish low-prestige medical labor, not primarily to help medical students improve their skills.
I'd be curious to hear from anyone who disagrees with me about medical school. I'm not super-confident about this assessment of medical school; I'm much more confident that an analogous critique applies well to law school and business school. Lawyers learn the theory of appellate decision-making, not how to prepare a case for trial or negotiate a settlement or draft a contract. MBAs learn economics and financial theory, not how to motivate or recruit or evaluate their employees.
As far as *why* we don't see more discussion about how to improve technique, I think part of it is just honest ignorance. Most people aren't very self-reflective and don't think very much about whether they're good at their jobs or what it means to be good at their jobs or how they could become better. Even when people do take time to reflect on what makes a good [profession], they may not have the relevant background to draw useful conclusions. Academic authorities often have little or no professional work experience; the median law professor has tried zero lawsuits; the median dean of a business school has never launched a startup; the median medical school lecturer has never worked as a primary care physician in the suburbs.
Some of it may be, as Isnasene points out, a desire to avoid unwanted competition. If people are lazy and want to enjoy high status that they earned a long time ago without putting in further effort, they might not want to encourage comparisons of skill levels.
Finally, as Isusr suggests, some of the taboo probably comes from an effort to preserve a fragile social hierarchy, but I don't think the threat is "awareness of internal contradictions;" I think the threat is simply a common-sense idea of fairness or equity. If authorities or elites are no more objectively skillful than a typical member of their profession, then there is little reason for them to have more power, more money, or easier work. Keeping the conversation firmly fixed on discussion *about* the profession (rather than discussion about *how to do* the profession) helps obscure the fact that the status of elites is unwarranted.
I like the style of your analysis. I think your conclusion is wrong because of wonky details about World War 2. Four years of technical progress at anything important, delivered for free on a silver platter, would have flipped the outcome of the war. Four years of progress in fighter airplanes means you have total air superiority and can use enemy tanks for target practice. Four years of progress in tanks means your tanks are effectively invulnerable against their opponents and slice through enemy divisions with ease. Four years of progress in manufacturing means you outproduce your opponent 2:1 at the front lines and overwhelm them with numbers. Four years of progress in cryptography means you know your opponent's every move and they are blind to your strategy.
Meanwhile, the kiloton bombs were only able to cripple cities "in a single mission" because nobody was watching out for them. Early nukes were so heavy that it's doubtful whether the slow clumsy planes that carried them could have arrived at their targets against determined opposition.
There is an important sense in which fission energy is discontinuously better than chemical energy, but it's not obvious that this translates into a discontinuity in strategic value per year of technological progress.
1) I agree with the very high-level point that there are lots of rationalist group houses with flat / egalitarian structures, and so it might make sense to try one that's more authoritarian to see how that works. Sincere kudos to you for forming a concrete experimental plan and discussing it in public.
2) I don't think I've met you or heard of you before, and my first impression of you from your blog post is that you are very hungry for power. Like, you sound like you would really, really enjoy being the chief of a tribe, bossing people around, having people look up to you as their leader, feeling like an alpha male, etc. The main reason this makes me uncomfortable is that I don't see you owning this desire anywhere in your long post. Like, if you had said, just once, "I think I would enjoy being a leader, and I think you might enjoy being led by me," I would feel calmer. Instead I'm worried that you have convinced yourself that you are grudgingly stepping up as a leader because it's necessary and no one else will. If you're not being fully honest about your motivations for nominating yourself to be an authoritarian leader, what else are you hiding?
3) Your post has a very high ratio of detailed proposals to literature review. I would have liked to see you discuss other group houses in more detail, make reference to articles or books or blog posts about the theory of cohousing and of utopian communities more generally, or otherwise demonstrate that you have done your homework to find out what has worked, what has not worked, and why. None of your proposals sound obviously bad to me, and you've clearly put some thought and care into articulating them, but it's not clear whether your proposals are backed up by research, or whether you're just reasoning from your armchair.
4) Why should anyone follow you on an epic journey to improve their time management skills if you're sleep-deprived and behind schedule on writing a blog post? Don't you need to be more or less in control of your own lifestyle before you can lead others to improve theirs?
And if you think you can explain the concept of "systematically underestimated inferential distances" briefly, in just a few words, I've got some sad news for you...
"I know [evolution] sounds crazy -- it didn't make sense to me at first either. I can explain how it works if you're curious, but it will take me a long time, because it's a complicated idea with lots of moving parts that you probably haven't seen before. Sometimes even simple questions like 'where did the first humans come from?' turn out to have complicated answers."
I am always trying to cultivate a little more sympathy for people who work hard and have good intentions! CFAR staff definitely fit in that basket. If your heart's calling is reducing AI risk, then work on that! Despite my disappointment, I would not urge anyone who's longing to work on reducing AI risk to put that dream aside and teach general-purpose rationality classes.
That said, I honestly believe that there is an anti-synergy between (a) cultivating rationality and (b) teaching AI researchers. I think each of those worthy goals is best pursued separately.
Yeah, that pretty much sums it up: do you think it's more important for rationalists to focus even more heavily on AI research so that their example will sway others to prioritize FAI, or do you think it's more important for rationalists to broaden their network so that rationalists have more examples to learn from?
Shockingly, as a lawyer who's working on homelessness and donating to universal income experiments, I prefer a more general focus. Just as shockingly, the mathematicians and engineers who have been focusing on AI for the last several years prefer a more specialized focus. I don't see a good way for us to resolve our disagreement, because the disagreement is rooted primarily in differences in personal identity.
I think the evidence is undeniable that rationality memes can help young, awkward engineers build a satisfying social life and increase their productivity by 10% to 20%. As an alum of one of CFAR's first minicamps back in 2011, I'd hoped that rationality would amount to much more than that. I was looking forward to seeing rationalist tycoons, rationalist Olympians, rationalist professors, rationalist mayors, rationalist DJs. I assumed that learning how to think clearly and act accordingly would fuel a wave of conspicuous success, which would in turn attract more resources for the project of learning how to think clearly, in a rapidly expanding virtuous cycle.
Instead, five years later, we've got a handful of reasonably happy rationalist families, an annual holiday party, and a couple of research institutes dedicated to pursuing problems that, by definition, will provide no reliable indicia of their success until it is too late. I feel very disappointed.
Well, like I said, AI risk is a very important cause, and working on a specific problem can help focus the mind, so running a series of AI-researcher-specific rationality seminars would offer the benefit of (a) reducing AI risk, (b) improving morale, and (c) encouraging rationality researchers to test their theories using a real-world example. That's why I think it's a good idea for CFAR to run a series of AI-specific seminars.
What is the marginal benefit gained by moving further along the road to specialization, from "roughly half our efforts these days happen to go to running an AI research seminar series" to "our mission is to enlighten AI researchers?" The only marginal benefit I would expect is the potential for an even more rapid reduction in AI risk, caused by being able to run, e.g., 4 seminars a quarter for AI researchers, instead of 2 for AI researchers and 2 for the general public. I would expect any such potential to be seriously outweighed by the costs I describe in my main post (e.g., losing out on rationality techniques that would be invented by people who are interested in other issues), such that the marginal effect of moving from 50% specialization to 100% specialization would be to increase AI risk. That's why I don't want CFAR to specialize in educating AI researchers to the exclusion of all other groups.
I dislike CFAR's new focus, and I will probably stop my modest annual donations as a result.
In my opinion, the most important benefit of cause-neutrality is that it safeguards the integrity of the young and still-evolving methods of rationality. If it is official CFAR policy that reducing AI risk is the most important cause, and CFAR staff do almost all of their work with people who are actively involved with AI risk, and then go and do almost all of their socializing with rationalists (most of whom also place a high value on reducing AI risk), then there will be an enormous temptation to discover, promote, and discuss only those methods of reasoning that support the viewpoint that reducing AI risk is the most important value. This is bad partly because it might stop CFAR from changing its mind in the face of new evidence, but mostly because the methods that CFAR will discover (and share with the world) will be stunted -- students will not receive the best-available cognitive tools; they will only receive the best-available cognitive tools that encourage people to reduce AI risk. You might also lose out on discovering methods of (teaching) rationality that would only be found by people with different sorts of brains -- it might turn out that the sort of people who strongly prioritize friendly AI think in certain similar ways, and if you surround yourself with only those people, then you limit yourself to learning only what those people have to teach, even if you somehow maintain perfect intellectual honesty.
Another problem with focusing exclusively on AI risk is that it is such a Black Swan-type problem that it is extremely difficult to measure progress, which in turn makes it difficult to assess the value or success of any new cognitive tools. If you work on reducing global warming, you can check the global average temperature. More importantly, so can any layperson, and you can all evaluate your success together. If you work on reducing nuclear proliferation for ten years, and you haven't secured or prevented a single nuclear warhead, then you know you're not doing a good job. But how do you know if you're failing to reduce AI risk? Even if you think you have good evidence that you're making progress, how could anyone who's not already a technical expert possibly assess that progress? And if you propose to train all of the best experts in your methods, so that they learn to see you as a source of wisdom, then how many of them will retain the capacity to accuse you of failure?
I would not object to CFAR rolling out a new line of seminars that are specifically intended for people working on AI risk -- it is a very important cause, and there's something to be gained in working on a specific problem, and as you say, CFAR is small enough that it can't do it all. But what I hear you saying is that the mission is now going to focus exclusively on reducing AI risk. I hear you saying that if all of CFAR's top leadership is obsessed with AI risk, then the solution is not to aggressively recruit some leaders who care about other topics, but rather to just be honest about that obsession and redirect the institution's policies accordingly. That sounds bad. I appreciate your transparency, but transparency alone won't be enough to save the CFAR/MIRI community from the consequences of deliberately retreating into a bubble of AI researchers.
Does anyone know what happened to TC Chamberlin's proposal? In other words, shortly after 1897, did he in fact manage to spread better intellectual habits to other people? Why or why not?
Thank you! I see that some people voted you down without explaining why. If you don't like someone's blurb, please either contribute a better one or leave a comment to specifically explain how the blurb could be improved.
Sure!
Again, fair point -- if you are reading this, and you have experience designing websites, and you are willing to donate a couple of hours to build a very basic website, let us know!
Sounds good to me. I'll keep an eye out for public domain images of the Earth exploding. If the starry background takes up enough of the image, then the overall effect will probably still hit the right balance between alarm and calm.
A really fun graphic would be an asteroid bouncing off a shield and not hitting Earth, but that might be too specific.
Great! Pick one and get started, please. If you can't decide which one to do, please do asteroids.
It would go to the best available charity that is working to fight that particular existential risk. For example, the 'donate' button for hostile AI might go to MIRI. The donate button for pandemics might go to the Centers for Disease Control, and the donate button for nuclear holocaust might go to the Global Threat Reduction Initiative. If we can't agree on which agency is best for a particular risk, we can pick one at random from the front-runners.
If you have ideas for which charities are the best for a particular risk, please share them here! That is part of the work that needs to get done.
Hi Dorikka,
Yes, I am also concerned that the banner is too visually complicated -- it's supposed to be a scene of a flooded garage workshop, suggesting both major problems and a potential ability to fix them, but the graphic is not at all iconic. If you have another idea for the banner (or can recommend a particular font that would work better), please chime in.
I am not convinced that www.existential-risk.org is a good casual landing page, because (a) most of the content is in the form of an academic CV, (b) there is no easy-to-read summary telling the reader about existential risks, and (c) there is no donate button.
It's probably "Song of Light," or if you want a more literal translation, "Hymn to Light."
You might be wrestling with a hard trade-off between wanting to do as much good as possible and wanting to fit in well with a respected peer group. Those are both good things to want, and it's not obvious to me that you can maximize both of them at the same time.
I have some thoughts on your concepts of "special snowflake" and "advice that doesn't generalize." I agree that you are not a special snowflake in the sense of being noticeably smarter, more virtuous, more disciplined, whatever than the other nurses on your shift. I'll concede that you and they have -basically- the same character traits, personalities, and so on. But my guess is that the cluster of memes hanging out in your prefrontal cortex is more attuned to strategy than their meme-clusters -- you have a noticeably different set of beliefs and analytical tools. Because strategic meme-clusters are very rare compared to how useful they are, having those meme-clusters makes you "special" in a meaningful way even if in all other respects you are almost identical to your peers. The 1% more-of-the-time that you spend strategizing about how best to accomplish goals can double or triple your effectiveness at many types of tasks, so your small difference in outlook leads to a large difference in what kinds of activities you want to devote your life to. That's OK.
Similarly, I agree with you that it would be bad if all the nurses in your ward quit to enter politics -- someone has to staff the bloody ward, or no amount of political re-jiggering will help. The algorithm that I try to follow when I'm frustrated that the advice I'm giving myself doesn't seem to generalize is to first check and see if -enough- people are doing Y, and then switch from X to Y if and only if fewer-than-enough people are doing Y. As a trivial example, if forty of my friends and I are playing soccer, we will probably all have more fun if one of us agrees to serve as a referee. I can't offer the generally applicable advice "You should stop kicking the ball around and start refereeing." That would be stupid advice; we'd have forty referees and no ball game. But I can say "Hm, what is the optimal number of referees? Probably 2 or 3 people out of the 40 of us. How many people are currently refereeing? Hm, zero. If I switch from playing to refereeing, we will all have more fun. Let me check and see if everyone is making the same leap at the same time and scrambling to put on a striped shirt. No? OK, cool, I'll referee for a while." That last long quote is fully generalizable advice -- I wish literally everyone would follow it, because then we'd wind up with close to an optimal number of referees.
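If it helps, that rule of thumb is simple enough to write down explicitly (a toy sketch with made-up parameter names):

```python
def should_switch(currently_doing_y, optimal_number_doing_y, others_about_to_switch=0):
    """Switch from X to Y only if fewer-than-enough people are doing Y,
    counting anyone else who visibly seems about to make the same leap."""
    return currently_doing_y + others_about_to_switch < optimal_number_doing_y

# Forty of us playing soccer, 2 referees would be ideal, nobody is refereeing yet:
print(should_switch(currently_doing_y=0, optimal_number_doing_y=2))  # True -- go referee
```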
OK, but why is "chair" shorter than "furniture"? Why is "blue" shorter than "color"? Furniture and color don't strike me as words that are so abstract as to rarely see use in everyday conversation.
I'm confused. What makes "chair" the basic category? I mean, obviously more basic categories will have shorter words -- but who decided that "solid object taking up roughly a cubic meter designed to support the weight of a single sitting human" was a basic category?
That's an important warning, and I'm glad you linked me to the post on ethical inhibitions. It's easy to be mistaken about when you're causing harm, and so allowing a buffer in honor of the precautionary principle makes sense. That's part of why I never mention the names of any of my clients in public and never post any information about any specific client on any public forums -- I expect that most of the time, doing so would cause no harm, but it's important to be careful.
Still, I had the sense when I first read your comment six weeks ago that it's not a good ethical maxim to "never provide any information (even in the mathematical/Bayesian sense of "information") to anyone who doesn't have an immediate need to know it."
I think I've finally put my finger on what was bothering me: in order to provide the best possible service to my clients, I need to make use of my social and emotional support structure. If I carried all of the burdens of my work solely on my own shoulders, letting all of my clients' problems bounce around only in my head, I'd go a little crazier than I already am, and I'd provide worse service. My clients would suffer from my peculiar errors of viewpoint. In theory, I can discuss my clients with my boss or with my assistants, but both of those relationships are too charged with competition to serve as an effective emotional safety valve -- I don't really want to rely on my boss for a dose of perspective; I'm too busy signalling to my boss that I'm competent.
I think this is probably generally applicable -- I want my doctors to have a chance to chat about me (without using my real name) in the break room or with their poker buddies, so that they can be as stable and relaxed as possible about giving me the best possible treatment. Same thing with my accountant -- I'm much more concerned that my accountant is going to forget to apply for a legal tax exemption that'll net me thousands of dollars than I am that my accountant is going to leak details about me to his friend who, unbeknownst to the accountant, is friends with the husband of an IRS agent who will then decide to give me an unfriendly audit. Sure, it's important to me that my medical and financial details stay reasonably private, but I'm willing to trade a small amount of privacy for a moderate increase in professional competence.
Do you feel differently? I suspect that some of the people who make bold, confident assertions about how "nobody should ever disclose any private information under any circumstances" are simply signalling their loyalty and discretion, rather than literally describing their preferred policies or honestly describing their intended behavior. Perhaps I'm just falling prey to the Typical Mind fallacy, though.
It seems to me that educated people should know something about the 13-billion-year prehistory of our species and the basic laws governing the physical and living world, including our bodies and brains. They should grasp the timeline of human history from the dawn of agriculture to the present. They should be exposed to the diversity of human cultures, and the major systems of belief and value with which they have made sense of their lives. They should know about the formative events in human history, including the blunders we can hope not to repeat. They should understand the principles behind democratic governance and the rule of law. They should know how to appreciate works of fiction and art as sources of aesthetic pleasure and as impetuses to reflect on the human condition.
On top of this knowledge, a liberal education should make certain habits of rationality second nature. Educated people should be able to express complex ideas in clear writing and speech. They should appreciate that objective knowledge is a precious commodity, and know how to distinguish vetted fact from superstition, rumor, and unexamined conventional wisdom. They should know how to reason logically and statistically, avoiding the fallacies and biases to which the untutored human mind is vulnerable. They should think causally rather than magically, and know what it takes to distinguish causation from correlation and coincidence. They should be acutely aware of human fallibility, most notably their own, and appreciate that people who disagree with them are not stupid or evil. Accordingly, they should appreciate the value of trying to change minds by persuasion rather than intimidation or demagoguery.
Steven Pinker, The New Republic 9/4/14
You're...welcome? For what it's worth, mainstream American legal ethics try to strike a balance between candor and advocacy. It's actually not OK for lawyers to provide unabashed advocacy; lawyers are expected to also pay some regard to epistemic accuracy. We're not just hired mercenaries; we're also officers of the court.
In a world that was full of Bayesian Conspiracies, where people routinely teased out obscure scraps of information in the service of high-stakes, well-concealed plots, I would share your horror at what you describe as "disclosing personal information." Mathematically, you're obviously correct that when I say anything about my client(s) that translates as anything other than a polite shrug, it has the potential to give my clients' enemies valuable information. As a practical matter, though, the people I meet at dinner parties don't know or care about my clients. They can't be bothered to hack into my firm's database, download my list of clients, hire an investigator to put together dossiers on each client, and then cross-reference the dossier with my remarks to revise their probability estimate that a particular client is faking his injury. Even if someone chose to go to all that trouble, nobody would buy the resulting information -- the defense lawyers I negotiate with are mathematically illiterate. Finally, even if someone bought the resulting information, it's not clear what the defense lawyers would do if they could confidently upgrade their estimate of the chance that Bob was faking his injury from 30% up to 60% -- would they tail him with a surveillance crew? They do that anyway. Would they drive a hard bargain in settlement talks? They do that anyway. Civil legal defense tactics aren't especially sensitive to this kind of information.
All of which is to say that I take my duties to my clients very seriously, and I would never amuse myself at a cocktail party in ways that I thought had more than an infinitesimal chance of harming them. If you prefer your advocates to go beyond a principle of 'do no harm' and live by a principle of 'disclose no information', and you are willing to pay for the extra privacy, then more power to you -- but beware of lawyers who smoothly assure you that they would never disclose any client info under any circumstances. It's a promise that's easy to make and hard to verify.
Is that revelation grounds for a lawsuit, a criminal offense or merely grounds for disbarment?
None of the above, really, unless you have so few murder cases that someone could plausibly guess which one you were referring to. I work with about 100 different plaintiffs right now, and my firm usually accepts any client with a halfway decent case who isn't an obvious liar. Under those conditions, it'd be alarming if I told you that 100 out of 100 were telling the truth -- someone's bound to be at least partly faking their injury. I don't think it undermines the justice system to admit as much in the abstract.
If you indiscreetly named a specific client who you thought was guilty, though, that could get you a lawsuit, a criminal offense, and disbarment.
I'm confused about how this works.
Suppose the standard were to use 80% confidence. Would it still be surprising to see 60 of 60 studies agree that A and B were not linked? Suppose the standard were to use 99% confidence. Would it still be surprising to see 60 of 60 studies agree that A and B were not linked?
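To make my confusion concrete, here's the naive calculation I have in mind; it assumes each study independently reports "no link" with probability equal to its nominal confidence level whenever the null is actually true:

```python
# Naive model: when A and B are truly unlinked, each of 60 independent
# studies reports "no link" with probability equal to its confidence level.
for confidence in (0.80, 0.95, 0.99):
    p_all_null = confidence ** 60
    print(f"{confidence:.2f}: P(all 60 studies report no link) = {p_all_null:.6f}")
# 0.80 -> ~0.000002, 0.95 -> ~0.046, 0.99 -> ~0.547
```

If that's the right way to set it up, then the answer seems to depend heavily on which confidence convention the studies were using, which is why I'm asking.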
Also, doesn't the prior plausibility of the connection being tested matter for attempts to detect experimenter bias this way? E.g., for any given convention about confidence intervals, shouldn't we be quicker to infer experimenter bias when a set of studies conclude (1) that there is no link between eating lithium batteries and suffering brain damage vs. when a set of studies conclude (2) that there is no link between eating carrots and suffering brain damage?
Yes, Voldemort could probably teach DaDA without suffering from the curse, and a full-strength Voldemort with a Hogwarts Professorship could probably steal the stone.
I'm not sure either of those explains how Voldemort got back to full-strength in the first place, though. Did Voldemort fake the charred hulk of his body? And Harry forgot that apparent charred bodies aren't perfectly reliable evidence of a dead enemy because his books have maxims like "don't believe your enemy is dead until you see the body?" But then what was Voldemort doing between 1975 and 1990? He was winning the war until he tackled Harry; why would he suddenly decide to stop?
Puzzle:
Who is ultimately in control of the person who calls himself Quirrell?
- Voldemort
If Voldemort is possessing the-person-pretending-to-be-Quirrell using the path Dumbledore & co. are familiar with, or for that matter by drinking unicorn blood, then why isn't Voldy's magic noticeably weaker than before? Quirrell seems like he could at least hold his own against Dumbledore, and possibly defeat him.
If Voldemort took control of the-person-pretending-to-be-Quirrell's body outright using incredibly Dark magic, then why would Quirrell openly suggest that possibility to the DMLE Auror in Taboo Tradeoffs I?
If Voldemort returned to life via the Philosopher's Stone, then how did he get past the 'legendary' and 'fantastic' wards on the forbidden corridor without so much as triggering an alarm?
- David Monroe
If Monroe disappeared on purpose in 1975, and has been having random other international adventures since then, and has only just now decided to teach Battle Magic at Hogwarts (thereby ensuring his demise, per the Dark Lord's curse on the position) because his zombie syndrome is worsening and he is worried about living out the year, then what is his purpose in teaching Battle Magic? Is it just for the fun of it? This seems unlikely; he is very serious about his subject and rarely indulges in jokes or in irrelevant scholastic diversions.
Is it because he expects that teaching the students Battle Magic will help them learn to fight back and resist Dark wizards? Then why did he plan so poorly for his big Yuletide speech about resistance and unity as to allow Harry to seriously disrupt it? Could someone as intelligent as Monroe, whose major goal is to sway political opinion, really only give one big political speech and then, at that speech, fail to prevent one (admittedly precocious) student from giving a moderately persuasive opposing speech? Why not, e.g., cast a silent, wandless Silencio charm on Harry? Or simply inform him that he has 30 words in which to state his backup wish, or else it is forfeit? Or pretend to honor the wish that he would teach Defense against the Dark Arts next year? All of these alternatives (plus others) seem obviously better to me than tolerating such blatant interference with his primary goal.
- Lucius Malfoy
If he had those kinds of powers, he would wield them openly and just take over Britain. Also, it's hard to imagine he wouldn't have been keeping a closer watch on his son, to the point where he would know if his son was involved in a duel and/or sitting around freezing for six to eight hours.
- Slytherin's Monster
It has mysteriously powerful lore from the ancient past, and there's no firm evidence that it was killed or locked back in the Chamber of Secrets after Voldy broke in. In fact, the person who claims that Voldy's last words to the Monster would have been Avada Kedavra is...Quirrell. Not exactly a trustworthy source if Quirrell is the Monster.
OTOH, this would be ludicrously under-foreshadowed -- canon!Monster was a non-sentient beast, and the only HPMOR foreshadowing for the Monster focused on its being very long lived and able to speak Parseltongue. It's not clear how a rationalist would deduce, from available information, that the Monster was responsible -- we have very little data on what the Monster is like, so it's very hard to strongly match the actions we observe to the actions we expect from the Monster.
- Albus Dumbledore
Lots of pieces of weak evidence point here; Dumbledore and Quirrell are two of the highest-powered wizards around, and are two of the weirdest wizards around, and have roughly the same power level, so the hypothesis that says they are both caused by the same phenomenon gets a simplicity bonus. Dumbledore is frequently absent without a good explanation; Quirrell is frequently zombie-ish without a good explanation; Quirrell is zombie-ish more often as Dumbledore starts to get more energetic and activate the Order of the Phoenix; I cannot think of any scenes where both Dumbledore and Quirrell are being very active at exactly the same time. Sometimes Dumbledore expresses skepticism at something Quirrell says, but I cannot think of any examples of them engaging in magical cooperation or confrontation. If they are the same person, then it is convenient that Quirrell made Dumbledore promise not to investigate who Quirrell is.
We know Dumbledore snuck into Harry's room (in his own person) and left messages for Harry warning Harry not to trust Dumbledore; perhaps Dumbledore also turns into Quirrell and warns Harry in Quirrell's body not to trust Dumbledore. It is a little unclear why Dumbledore would want to limit Harry's trust in him, but it could have to do with the idea of heroic responsibility (nihil supernum) or even just standard psychology -- if Quirrell and Dumbledore agree on something, even though Quirrell says not to trust Dumbledore, then Harry is very likely to believe it.
It is hard to imagine Dumbledore murdering Hermione in cold blood, but, as Harry has been musing, you can only say "that doesn't seem like his style" so many times before the style defense becomes extremely questionable. Dumbledore prevented Hermione from receiving a Time-Turner, and he was suspiciously absent during the troll attack (but showed up immediately after it was over, with just enough time in between to have Obliviated Fred and George, who, conveniently, handed the Marauder's Map over to the Headmaster and then forgot all about it).
OTOH, having Hermione attempt to kill Draco and then having the troll kill Hermione on school grounds is terrible for Dumbledore's political agenda -- he winds up losing support from the centrists over the attack on Draco, and losing support from everyone over the incompetent security. The school, where he has been Headmaster for decades and where he must keep the Philosopher's Stone, might even be closed. It's hard to see how deliberately putting his entire power base in grave jeopardy could be part of his plot, nor is it easily explained as feeling plot-appropriate (it doesn't) or as Dumbledore's insanity (a fully general explanation).
Is there more to the Soylent thing than mixing off-the-shelf protein shake powder, olive oil, multivitamin pills, and mineral supplement pills and then eating it?
Isn't there a very wide middle ground between (1) assigning 100% of your mental probability to a single model, like a normal curve, and (2) spreading your mental probability across every conceivable model, à la Solomonoff?
I mean, the whole approach here sounds more philosophical than practical. If you have any kind of constraint on your computing power, and you are trying to identify a model that most fully and simply explains a set of observed data, then the obvious way to use that computing power is to put roughly a quarter of your cycles on testing your preferred model, another quarter on testing mild variations on that model, another quarter on the common distribution curves from the back of your freshman statistics textbook, and the final quarter on brute-force fitting the data as best you can, in case your priors about what kind of model to use turn out to be inaccurate.
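To make that four-way split concrete, here's a rough Python sketch of what I have in mind (my own illustration, not anything from the discussion above; the data and the specific distribution choices are placeholders I made up, and the "brute force" bucket is stood in for by a kernel density estimate):

```python
# Minimal sketch of the four-way compute split: preferred model, mild variations,
# textbook distributions, and a brute-force nonparametric fit. Data is made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=500)  # stand-in for the observed data

def fit_and_score(dist, data):
    """Fit a scipy distribution by maximum likelihood; return total log-likelihood."""
    params = dist.fit(data)
    return np.sum(dist.logpdf(data, *params))

candidates = {
    "preferred model (normal)":     stats.norm,
    "mild variation (skew-normal)": stats.skewnorm,
    "mild variation (Student's t)": stats.t,
    "textbook curve (lognormal)":   stats.lognorm,
    "textbook curve (gamma)":       stats.gamma,
}

scores = {name: fit_and_score(dist, data) for name, dist in candidates.items()}

# "Brute force" bucket: just fit the data nonparametrically with a KDE.
kde = stats.gaussian_kde(data)
scores["brute force (KDE)"] = float(np.sum(np.log(kde(data))))

for name, loglik in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:30s} log-likelihood = {loglik:9.1f}")
```

In a real budgeted setting you'd weight how many fits (or how many optimizer iterations) each bucket gets rather than running them all to convergence, but the point is that nothing here requires either a single fixed peak or a Solomonoff-style enumeration of every hypothesis.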
I can't imagine any human being who is smart enough to run a statistical modeling exercise yet foolish enough to cycle between two peaks forever without ever questioning the assumption of a single peak, nor any human being foolish enough to test every imaginable hypothesis, even including hypotheses that are infinitely more complicated than the data they seek to explain. Why would we program computers (or design algorithms) to be stupider than we are? If you actually want to solve a problem, you try to get the computer to at least model your best cognitive features, if not improve on them. Am I missing something here?
What's the percent chance that I'm doing it wrong?
I once heard a story about the original writer of the Superman Radio Series. He wanted a pay rise; his employers didn't want to give him one. He decided to end the series with Superman trapped at the bottom of a well, tied down with kryptonite and surrounded by a hundred thousand tanks (or something along these lines). It was a cliffhanger. He then made his salary demands. His employers refused and went round every writer in America, but nobody could work out how the original writer was planning to have Superman escape. Eventually the radio guys had to go back to him and meet his wage demands. The first show of the next series began "Having escaped from the well, Superman hurried to..." There's a lesson in there somewhere, but I've no idea what it is.
-http://writebadlywell.blogspot.com/2010/05/write-yourself-into-corner.html
I would argue that the lesson is that when something valuable is at stake, we should focus on the simplest available solutions to the puzzles we face, rather than on ways to demonstrate our intelligence to ourselves or others.
Ironically, this is my most-upvoted comment in several months.
OK, so what other way of getting people to gate-check the troublesome, philosophical, misleading parts of their moral intuitions would have fewer undesirable side effects? I tend to agree with you that it's good when people pause to reflect on consequences -- but then, when they evaluate those consequences, I want them to just consult their gut feelings, as it were. Sooner or later the train of conscious reasoning had better dead-end in an intuitively held preference, or it's spectacularly unlikely to fulfill anyone's intuitively held preferences. (I, of course, intuitively prefer that such preferences be fulfilled.)
How do we prompt that kind of behavior? How can we get people to turn the logical brain on for consequentialism but off for normative ethics?
Given at least moderate quality, upvotes correlate much more tightly with accessibility / scope of audience than with quality of writing. Remember, the article score isn't an average of hundreds of scalar ratings -- it's the sum of thousands of ratings, each in [-1, 0, +1] -- and the default rating from anyone who doesn't see, doesn't care about, or doesn't understand the thrust of a post is 0. If you get a high score, that says more about how many people bothered to process your post than about how many people thought it was the best post ever.
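To see why reach dominates under that scoring rule, here's a toy simulation (all the numbers are invented for illustration, not actual site data): a merely decent post seen by many readers ends up with a higher score than an excellent post seen by few.

```python
# Toy model of "score = sum of votes in {-1, 0, +1}": non-voters contribute 0,
# so audience size tends to dominate writing quality. All parameters invented.
import random

def simulate_score(audience, p_read, p_up, p_down, seed=0):
    """Sum one vote per audience member: +1, -1, or 0 (didn't read / didn't care)."""
    rng = random.Random(seed)
    score = 0
    for _ in range(audience):
        if rng.random() > p_read:      # never saw or never processed the post
            continue
        r = rng.random()
        if r < p_up:
            score += 1
        elif r < p_up + p_down:
            score -= 1
    return score

# An excellent but niche post vs. a merely decent but broadly accessible one.
niche = simulate_score(audience=300,  p_read=0.4, p_up=0.7, p_down=0.05)
broad = simulate_score(audience=5000, p_read=0.5, p_up=0.3, p_down=0.05)
print(f"niche, excellent post: {niche:+d}")
print(f"broad, decent post:    {broad:+d}")
```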
OK, let's say you're right, and people say "awesome" without thinking at all. I imagine Nyan_Sandwich would view that as a feature of the word, rather than as a bug. The point of using "awesome" in moral discourse is precisely to bypass conscious thought (which a quick review of formal philosophy suggests is highly misleading) and access common-sense intuitions.
I think it's fair to be concerned that people are mistaken about what is awesome, in the sense that (a) they can't accurately predict ex ante what states of the world they will wind up approving of, or in the sense that (b) what you think is awesome significantly diverges from what I (and perhaps what a supermajority of people) think is awesome, or in the sense that (c) it shouldn't matter what people approve of, because the 'right' thing to do is something else entirely that doesn't depend on what people approve of.
But merely to point out that saying "awesome" involves no conscious thought is not a very strong objection. Why should we always have to use conscious thought when we make moral judgments?
To say that something's 'consequentialist' doesn't have to mean that it's literally forward-looking about each item under consideration. Like any other ethical theory, consequentialism can look back at an event and determine whether it was good/awesome. If your going white-water rafting was a good/awesome consequence, then your decision to go white-water rafting and the conditions of the universe that let you do so were good/awesome.
Also, this book was a horrible agglomeration of irrelevant and unanalyzed factoids. If you've already read any two Malcolm Gladwell books or Freakonomics, it'd be considerably more educational to skip this book and just read the cards in a Trivial Pursuit box.
The undergrad majors at Yale University typically follow lukeprog's suggestion -- there will be 20 classes on stuff that is thought to constitute cutting-edge, useful "political science" or "history" or "biology," and then 1 or 2 classes per major on "history of political science" or "history of history" or "history of biology." I think that's a good system. It's very important not to confuse a catalog of previous mistakes with a recipe for future progress, but for the same reasons that general history is interesting and worthwhile for the general public to know something about, the history of a given discipline is interesting and worthwhile for students of that discipline to look into.