One challenge with the "fire alarm" analogy is that fires are something that just about everybody has a fair bit of tangible experience with. We've been burned, seen and perhaps built small fires, witnessed buildings on fire in the news, and perhaps know people who've lost their homes in a fire. Fires are very much real things to us.
AI singularity is different. Military AI technology and AI-generated propaganda or surveillance are from reference classes with which we have at least some longstanding, if often indirect experience. We understand the concept of the dangers of increasing military firepower. We have a sense of what it's like to be spied on or lied to.
But the idea that a computer could become vastly more intelligent than we are, all of a sudden? No prior experience. A fire alarm should awaken your memories of fire and appropriate responses to them - all the different responses you list here. For most people, there's nothing to awaken in their mind when it comes to AGI.
Honestly, it might be best if we milk the "apocalyptic climate change" metaphor harder. It seems like the closest and most charged concept readily available in people's minds for a slow-building catastrophe that could threaten global disaster. Based on my reading, it seems unlikely that climate change actually threatens us with extinction, but connecting with that type of concern might be a place to start. Maybe when people think of AGI, we should encourage them to think less Terminator and more climate change.
Ah, now I see. My bad. By the end of the video, I'd lost the nuance that you never stated the clip was of the 3 innocent subjects portion of the experiment. Might be worth signposting that explicitly? I'll retract my comment above, but I also suspect that others may make the same mistake that I did.
On introspection, I think the issue is that I read the "3 innocent subjects" bit, visualized it in my mind, got interested to see it play out, and spent 8 minutes waiting to see it on the video clip. Not seeing it, I just immediately thought, "oh, must have been an incorrect description of the video," rather than going back to carefully think about your wording. So definitely my bad, but also a mistake I think some others are likely to make, and which may be worth anticipating as a writer. Note to self for any future blog posts with embedded video!
I haven’t read your post yet. Just doing an epistemic spot check. You described the first video clip of the fire alarm experiment as featuring multiple innocent participants. In fact, as they say on the video, only one participant in the group settings was innocent. The rest were actors who were to deliberately give no indication that they noticed the fire or fire alarm.
Edit: Katja never actually said the clip was from the part of the experiment featuring multiple innocent subjects. I misinterpreted the statement.
The number of MHC class I loci. As the number of loci increases, the organism gains an ability to respond to a greater diversity of pathogens and avert evasion of an immune response. At the same time, with each new locus, any T cells that respond to self peptides bound to the new MHC class I molecule must be removed to maintain self tolerance. There is an optimal balance of MHC class I diversity and T cell count. In the case of humans, the optimal number of MHC class I loci appears to be 3.
That’s a nice conceptual refinement. It actually swings me in the other direction, making it seem plausible that humans might not have nearly enough time to find the optimum arrangement in their expected lifespan and that this might be a central question.
One possibility is that there is a maximal value tile that is much smaller than “all available atoms” and can be duplicated indefinitely to maximize expected value. So perhaps we don’t need to explore all combinations of atoms to be sure that we’ve achieved the limit of value.
This makes me connect with the post the other day about research speedruns. Those posts interested me because it was a little like looking over the writer’s shoulder, and seeing how they approached the challenge. It seems to me like this could be another useful rationalist training program. I imagine I could learn from both roles, and suspect many others could too.
Although I think the assumption that economic growth demands endlessly increasing material consumption is flawed, it seems natural to imagine that even a maximally efficient economy must use a nonzero number of atoms on average to produce an additional utilon. There must, therefore, be a maximal level of universal utility, which we can approach to within some distance in a finite number of doublings. Since we have enormous amounts of time available, and are also contending with a shrinking amount of access to material resources over time, it seems natural to posit that an extremely long-lived species could reach a point at which the economy simply cannot grow at the same rate.
The timeline you establish here by extrapolating present trends isn't convincing to me, but I think the basic message that "this can't go on" is correct. It seems to me that this insight is vastly more important to understand the context of our century than any particular estimate of when we might reach the theoretical limit of utility.
Sometimes, tasks are one-offs, unreliable, or demand that you take steps dynamically on some trigger condition, rather than as a series of steps. For example, if I'm working in the bio-safety cabinet in my lab, I need to re-wet my hands with ethanol if I take them out. If I spill something, I need to re-sterilize. Each experiment might place its own demands.
So in addition to checklists, I think it's important to develop the complementary skill of cognizance. It's a habit of mind, in which you constantly quiz yourself with each action about what you're trying to do, how it's done, why, what could go wrong, and how to avoid those outcomes.
For some tasks, the vast majority of errors might be in a few common categories, most effectively addressed with a checklist. For others, the vast majority of errors might come down to a wide range of hard-to-predict situational factors, best avoided with a habit of cognizance.
In fact, come to think of it, this is the thesis of More from Less by Andrew McAfee, who points out that in numerous categories of material products, we've seen global GDP growing while using fewer material resources, in both relative and absolute terms.
Edit: though see multiple 1-star reviews from non-anonymous Amazon reviewers with economics PhDs who say the core premise of McAfee's book is incorrect. Sounds like there is better research out there than he presents in this book.
An alternative point of view is in Decoupling Debunked, which seems to feed into the degrowth literature. Makes me worry that both McAfee's book and this piece will suffer from the same issues we find when we look for a consensus viewpoint among economists on the effect of the minimum wage.
We find that relative decoupling is frequent for material use as well as GHG and CO2 emissions but not for useful exergy, a quality-based measure of energy use. Primary energy can be decoupled from GDP largely to the extent to which the conversion of primary energy to useful exergy is improved. Examples of absolute long-term decoupling are rare, but recently some industrialized countries have decoupled GDP from both production- and, weaklier, consumption-based CO2 emissions.
As a concrete example, let's imagine that sending an email is equivalent to sending a letter. Let's ignore the infrastructure required to send emails (computers, satellites, etc) vs. letters (mail trucks, post offices, etc), and assume they're roughly equal to each other. Then the invention of email eliminated the vast majority of letters, and the atoms they would have been made from.
Couple this with the fact that emails are more durable, searchable, instantaneous, free, legible, compatible with mixed media, and occupy only a minuscule amount of physical real estate in the silicon of the computer, and we can see that emails not only reduce the number of atoms needed to transmit a letter, but also produce a lot more value.
In theory, we might spend the next several thousand years not only finding ways to pack more value into fewer atoms, but also enhancing our ability to derive value from the same good or service. Perhaps in 10,000 years, checking my email will be a genuine pleasure!
Insecurity and shame feel to me like having a high probability on being disliked or in the wrong, and of this having high consequences. This leads to a "reasonable suspicion" standard of jurisprudence, and a sort of stop-and-frisk approach to self-policing.
Security feels like moving to a "beyond a reasonable doubt" standard. If I get worried about being disliked or in the wrong, then if I can construct an argument for why this might not be so, or that it doesn't matter, then I'm "free to go."
As a result, I think my inner prosecutor has realized that it shouldn't waste so much time with needless accusations and investigations. It has a lower probability of success, so it comes at me less often. Even when it does bring frivolous lawsuits, my inner defender feels more confident that they're nothing to be truly afraid of, even as it does the work to mount a defense.
I think this might also operate in social relationships, not just personal psychology. If I think you experience a lot of insecurity or shame, then I might tend to coddle you. It feels uncomfortable to be coddled, at least to me, and so if somebody treated me that way, I think it might reinforce my insecurity and create a positive feedback loop.
Also, if John Doe is interested in calculus, but never finds your post, then it will also be a silly post to write. In general, the ability of writing to produce value is bottlenecked by our ability to get the right piece of writing to the right person at the right time.
It’s also relevant to worry about externalities and information asymmetries.
Persistent frustrations with social media originate from posts at the Pareto frontier that have traded away a lot of nuance and accuracy in exchange for fun and signaling. Writers make that trade because it earns more clicks and shares. This is "good" for the individual readers and sharers in the moment, if we believe their behavior reflects their preferences, but it may be bad for society as a whole if we'd prefer our friends to focus more on accuracy and nuance.
Readers may use signals of credibility when they pursue nuance and accuracy in order to judge the accuracy of a text. They optimize, therefore, for credibility, because they can’t directly optimize for accuracy. Perhaps they also want accessibility. If you then write a post optimized for credibility and accessibility, but the post isn’t accurate, then you can be at the Pareto frontier while also doing the reader a disservice.
That being said, the basic concept here seems right to me. Being at the Pareto frontier is correlated with creating value for the reader, and a search for such correlates of value is helpful.
In my opinion, what we focus on is sort of like SCUBA diving. Some overall training is helpful, so that you can understand what the equipment is for and how to use it. It does decompose into parts. But it's not a high-pressure sport, and skill comes from mainly going through the overall process and dealing with the random challenges that come up in individual circumstances. So you could teach it as "katas," but that would be a relatively inefficient way to learn.
You could argue that workers signed up for certain risks, and this is exactly what employers used to argue in many cases.
Is a person's level of responsibility for the risks they assume proportional to the level of knowledge they had about those risks in advance?
Here's a line of thinking we might imagine in the mind of a worker taking full personal responsibility for an unknown level of risk:
I need a job, and working in the steel mill seems like good-paying work. But I know that people get hurt or killed there, sometimes. I'll try to be careful, but I can't control what other people do, and the equipment isn't too reliable either. Plus, I might have a bad day, be forgetful, and cause a disaster. I don't want to die for this job. But I don't know how likely it is that I'll get hurt or killed, and it's hard for me to say whether or not, if I really knew my true chances, I'd feel this was the job for me... I guess I'll do it, though. What am I going to do otherwise? Something else that's just as dangerous? Or lower-paying?
This line of thinking seems like a plausible account of how a cautious person might have tried to do a risk assessment for factory work. I don't look at it and think "clearly, this man is fully liable for anything that happens to him in the steel mill." Nor do I look at it and think, "anything that happens to this man is his employer's moral and fiscal responsibility."
Instead, I tend to think that we should do our best to get an expert assessment of the risk, make the factory as safe as we can, create a culture of safety, and take care of people who get hurt - both for their sake and so that the workers who continue to do the job after a colleague's accident can keep a feeling of relative confidence. Having the moral debate is a symptom that your system has broken down and can't find a satisfactory deal for everybody. It feels like camouflage for a business negotiation rather than a true intellectual debate. Real progress means needing fewer moral debates.
Yes, the market analogy seems like a valuable one to lean into. Textbooks tend to take a control-systems approach to describing protein and cellular regulation and action. The body is viewed as an intricate machine: not designed, but shaped by evolutionary forces to perform functions conducive to reproduction. This tends to make me frame cells and proteins as components of a machine, which only gain an independent "agency" of their own in the case of cancer.
I can see two broad strategies for incorporating this into our understanding.
One is for communication and study purposes. By using familiar and vivid frames, we might be able to teach about biology in a more compelling manner.
This seems useful, but even better would be to use economic frames to derive truly novel insights. In my lab, control systems are the dominant framework for understanding the systems under study. It's a large, old, world-class lab populated by scientists who are smarter and more experienced than me, so I find it likely that this tradition has resulted at least in part from its massive, sustained, demonstrated utility.
What sort of predictions or strategies can we make by using economic frames, beyond simply repackaging known mechanisms into novel language and analogies? How can economic frames lead us to concrete experimental techniques in order to test and build on these novel insights? What are the challenges and limitations of an economic framing of cellular biology?
Market size is central in other cases as well. It is what permits specialization of labor. Comparative advantage is a mechanism for permitting this development even when one of the producers has an absolute advantage, such as Yuma in this example. However, the most important factor in specialization is sheer market size. This is why I’m excited to consider this frame further in the future.
Two or more products with differing costs for each producer
A coordinating mechanism
Scientific hierarchy and specialization. When a new graduate student does wet lab work for a PI in a large and well-funded lab, they're generally foregoing only an opportunity to do wet lab work somewhere else. They don't have the resources, scientific knowledge base, or position to pursue their own high-level research strategy, even if they had one. If a well-funded PI were to do wet lab work, they'd be giving up time they could be devoting to high-impact strategy work. Hence, even though the PI might be better at the bench than any of their graduate students, they nevertheless don't actually do any wet lab work themselves. On occasion, though, they might step in to perform a critical procedure in a crunch if the assigned grad student isn't able to do it.
Furthermore, successful labs probably specialize not only to advance the state of the art in their field, but also in order to be able to provide services to other labs. If lab A is run by a highly competent PI who has a large but limited supply of labor and capital, they could develop competency in any of a wide range of advanced skills and techniques. But if lab B, even if run by a modestly competent PI, specializes in a component of work relevant to lab A, then lab B might have a comparative advantage in that area. Lab A will cede that work to lab B, so that lab A can produce more of the things lab B is not able to do.
Hibernation vs. winter foraging. Ground squirrels hibernate; hares do not. One speculative explanation is that, in energy-sparse environments, species specialize not only in particular food sources, but in particular seasons. Hares specialize in exploiting winter food sources; ground squirrels specialize in maximally exploiting summer resources through more complex patterns of behavior. In a sense, they're finding different comparative advantages, and they may evolve these patterns in a way that reflects a sort of "evolutionary trade": hares focus on a pattern of energy consumption and expenditure that can run at a uniform moderate level, sustainable in summer against competition from lots of hungry squirrels and also in winter when food is scarce, while squirrels focus on a pattern that maximally exploits abundant summer resources and then shuts down for the winter, "leaving the rest" to the hares. This isn't symbiosis, and it's not just "specialization" in the narrow sense of, say, growing a beak adapted to a particular flower shape. Beak specialization is equivalent to a firm producing better tools for its workers to do the job it's focused on; hibernation vs. winter foraging lifestyles are equivalent to the process by which firms choose which jobs to focus on in the first place.
Can we apply other economic principles to understand evolution and predict or explain patterns in our observations? We might use "market size" to understand the evolution of multicellular organisms. The more cells we have in the body, the more they're able to specialize. This predicts that we'd find increased cellular diversity in larger organisms, even within analogous organs.
Cellular differentiation. Pluripotency and mature function are two different cell "products." Stem cells can offer cheap pluripotency, but it's expensive for them to differentiate all the way to maturity. Partially differentiated cells can reach maturity in a narrower set of endpoints cheaply, but cannot naturally revert to pluripotency (as far as I know). The body uses these cells, and coordinates their reproduction, differentiation, and maturation.
This makes me curious about the extent to which cellular proliferation and differentiation is controlled vs. incentivized. The body is certainly heavily controlled by intercellular signaling, which controls the behavior of cells. This is analogous to a command economy. When, if ever, is the body regulated (in a healthy way, i.e. not cancer) by creating "rewards" of energy or oxygen to select for cells maximally able to exploit that reward?
When thinking about comparative advantage, I find it helpful to frame it in terms of the lowest opportunity cost. I think this points attention in the most useful way to explain the concept.
If Xenia can produce 1 unit of apples or 0.5 units of bananas, that's just saying that the amount of one fruit she could have grown is the opportunity cost of growing the other. Xenia has a lower opportunity cost of growing apples than Yuma.
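To make the bookkeeping concrete, here's a small sketch in Python. Xenia's numbers come from the example above; Yuma's are hypothetical, chosen so that Yuma has an absolute advantage in both goods:

```python
# Hypothetical production possibilities (units per day).
# Xenia's figures are from the example; Yuma's are made up so that
# Yuma has an absolute advantage in BOTH goods.
producers = {
    "Xenia": {"apples": 1.0, "bananas": 0.5},
    "Yuma":  {"apples": 2.0, "bananas": 2.0},
}

def opportunity_cost(name, good, other):
    """Units of `other` forgone per unit of `good` produced."""
    output = producers[name]
    return output[other] / output[good]

for name in producers:
    cost = opportunity_cost(name, "apples", "bananas")
    print(f"{name} gives up {cost} bananas per apple")
```

Even though Yuma out-produces Xenia in both fruits, Xenia forgoes only 0.5 bananas per apple while Yuma forgoes 1.0, so Xenia has the comparative advantage in apples.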
Also, it would be nice to do one of these for market size as well.
One way of looking at biases is that the bias is a heuristic with its own selection criteria. For example, people decide whom to trust with authority based on how tall they are. The tall-bias is a heuristic with its own selection criterion (tallness) that doesn't perfectly match what it's supposed to be optimizing for (trustworthiness).
You might predict that people would take steps to create the appearance of tallness in order to manipulate this form of selection. Hillary Clinton apparently requested that her podium for a debate with Donald Trump be modified so that both candidates would appear the same height relative to their podiums, and asked for a step stool so that she'd appear to be the same height as Trump when they stood behind them.
One way of looking at the rationality project is that our social systems have optimized themselves to exploit common biases in the human mind. That intersection will feel "normal." Pointing out these biases isn't just about moving from less truth -> more truth. It's also about moving from more commonly exploited heuristics -> less commonly exploited heuristics. It may be that the new heuristics also have serious failure modes. But if society isn't set up to systematically take advantage of them, being divergent might still be beneficial, even if it's not fundamentally any more secure. It's sort of like choosing an operating system or program that is obscure, because there's less attention devoted to hacking it.
Indeed! Rectangles do appear to converge on golden-ratio proportions if you carry out this procedure long enough, though they bounce around for a while first in an interesting way. The graph below shows the proportions of the rectangle (longer side over shorter side) plotted in blue for ten randomly generated rectangles. Every turn, the smaller side length is subtracted from the larger side length. The value of phi is plotted in yellow.
I'm intrigued by the large jumps up. Is there some sort of threshold ratio between the proportions of the rectangle and phi below which it no longer jumps up? Why does it smoothly seem to converge on phi, only to leap away? Are there some rectangles that never converge on phi?
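These questions are easy to poke at numerically. Here's a minimal sketch of the chopping procedure (the function name is my own); starting from a hypothetical 1-by-pi rectangle, the ratio falls for two steps and then leaps to about 7:

```python
import math

def chop_ratios(a, b, steps):
    """Small-side chopping: remove a square on the shorter side,
    recording the ratio (longer / shorter) after each chop."""
    ratios = []
    for _ in range(steps):
        a, b = sorted((a, b))      # ensure a <= b
        a, b = sorted((a, b - a))  # chop off an a-by-a square
        ratios.append(b / a)
    return ratios

# With irrational sides the procedure never terminates; rational
# sides eventually reduce one side to zero.
print(chop_ratios(1.0, math.pi, 8))
```

The leap happens exactly when a chop leaves a ratio just above 1: the next chop flips the rectangle so the tiny remainder becomes the short side, and the new ratio is the reciprocal of that small remainder.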
Turns out that if you reverse the small-side chopping procedure, you rapidly converge on a rectangle in the proportions of the golden ratio. The y value of the blue line represents the rectangle's proportion. Its right-hand endpoint is the randomly generated rectangle.
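Here's a quick sketch of the reversed procedure (function name mine): gluing a square onto the longer side sends rectangle (a, b) to (b, a + b), and the side ratio converges to phi from any starting rectangle:

```python
PHI = (1 + 5 ** 0.5) / 2  # the golden ratio, ~1.6180339887

def reverse_chop_ratios(a, b, steps):
    """Reverse of small-side chopping: glue a b-by-b square onto the
    longer side, so rectangle (a, b) becomes (b, a + b)."""
    ratios = []
    for _ in range(steps):
        a, b = b, a + b
        ratios.append(b / a)
    return ratios

# The last few ratios agree with PHI to many decimal places.
print(reverse_chop_ratios(3.0, 7.0, 25)[-1])
```

This is just the Fibonacci recurrence in disguise, which is why the convergence is so rapid: the error shrinks by roughly a factor of phi squared each step.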
So maybe there is a kind of information loss going on here. We can recompute all the rectangles in the previous series, but we can't identify which rectangle was the original. Hmmm....
This has a curious relationship to some math ideas like the golden ratio. Take a rectangle proportioned in the golden ratio. If you take away a square with side length equal to the rectangle's shorter side, the remaining smaller rectangle is also proportioned in the golden ratio. Information is propagated perfectly.
Imagine that we were given a rectangle, and told that it was produced by modifying a previous rectangle via this procedure (by "small-side chopping"). We couldn't recover the original precisely unless it was in the golden ratio. But if it is in the golden ratio, we can recover it. Intuitively, it seems like we could recover an approximation, depending on how close the rectangle we're given is to the original. We can certainly recover one of the side lengths.
Edit: You actually can reconstruct the previous rectangle in the sequence. If a rectangle has side lengths a and b (with a ≤ b), then small-side chopping produces a rectangle with side lengths a and b − a. We still have perfect information about the previous side lengths.
It also seems possible that if you start with a randomly proportioned rectangle, then performing this procedure will cause it to converge on a rectangle in the golden ratio. Again, I'm not sure. If so, will it actually reach a golden rectangle? Or just approach it in the limit?
Edit: Given that we preserve perfect information about the previous rectangle after performing small-side chopping, this procedure cannot ultimately generate a golden rectangle.
If these intuitions are correct, then a golden rectangle is a concrete example of what the endpoint of information loss can look like. Often, we visualize loss of information as a void, or as random noise. It can also just be a static pattern. This is odd, since static patterns look like "information." But what is information?
"Wherever we plant our feet, the magic line extends backwards towards 0 and then whispers a number in our ear, hinting at the primes it contains."
This is a beautiful sentence. I wish I'd thought of it :D I'll see if I can find my own version, because it strikes a lovely balance between leaving the specific details for later in the article, while building anticipation for them.
Oh yeah, I mean I don’t love the discomfort! I just feel like it’s more efficacious for me to just thicken my skin than to hope LW’s basic social dynamic improves notably. Like, when I look back at the tone of discussion when the site was livelier 10 years ago, it’s the same tone on the individual posts. It just comes off differently because of how many posts there are. Here, you get one person’s comment and it’s the whole reaction you experience. Then, there was a sort of averaging thing that I think made it feel less harsh, even though it was the same basic material. If that makes any sense at all :D
In academia-land, there's a norm of collegiality - at least among my crowd. People calibrate their feedback based on context and relationship. Here, relationship is lacking and context is provided only by the content of the post itself. I think we're missing a lot of the information and incentives that motivate care in presentation and care in response. On the other hand, commenters' opinion of me matters not at all for my career or life prospects. To me, the value of the forum is in providing precisely this sort of divergence from real-world norms. There's no need to constantly pay attention to the nuances of relationships or try and get some sort of leverage out of them, so a very different sort of conversation and way of relating can emerge. This is good and bad, but LessWrong's advantage is in being different, not comfortable.
Thanks for the suggestions! What you're illustrating here with your comments actually gets at the heart of what I was trying to accomplish here.
The formal definition of a Ramanujan prime starts (and ends) with the most specific statement possible. In a math class or textbook, you might start with the definition and then unpack the parts. In my opinion, this starts with confusion - a terse riddle that may be intimidating - and only gradually moves into more familiar terrain. I wanted to try moving in the opposite direction, talking about relatively familiar concepts and gradually making them more specific.
Clearly, for you at least, this reversal, or the way I executed it, made things more confusing rather than less. That's an unfortunate outcome, but I still feel excited about continuing to try this approach with other math concepts.
You're absolutely right about your nitpick, and I thought about that while writing, but ultimately decided to leave that bit of nuance out. It's important, but my aim was to put a visual and intuitive way of thinking about Ramanujan primes into the reader's head, from which they can more easily "recompute" the bits of nuance that the article doesn't convey - or extract them from further conversations and reading in a more formal context. Just as we can't expect a mathematical treatise that is optimizing for specificity and accuracy to also achieve a high level of intuitiveness and engagement, I decided to sacrifice some specificity in order to achieve greater intuitiveness. But I expect that for a mathematically sophisticated audience, this may not be ideal.
I was actually wrong about this!!! After reading your comment, I thought about it more. We can create an infinite number of integers by exponentiating a single prime, like 2, an arbitrary number of times: 2^2, 2^(2^2), 2^(2^(2^2)), etc. Of course, we can easily find numbers, like 3, that can't be created in this manner. But without a proof, it's not immediately obvious that we couldn't create all the integers by multiplying a finite set of primes an arbitrary number of times. Even if we've proven that every integer greater than 1 has a unique prime factorization, we need a separate proof to show that infinitely many primes are needed to construct all integers greater than 1, rather than just one more prime than we currently have in the set. This is the proof that Euclid's Theorem provides.
The fact that there are infinitely many integers can only let us deduce that there are infinitely many primes if we've already proved that every integer greater than 1 has a unique prime factorization (the Fundamental Theorem of Arithmetic). This point might be a little less obvious :)
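Euclid's argument is easy to check concretely: multiply any finite set of primes together, add 1, and every prime factor of the result must lie outside the set, since each prime in the set divides the product and so leaves a remainder of 1. A small sketch, with a trial-division helper of my own:

```python
def smallest_prime_factor(n):
    """Trial division: return the smallest prime factor of n (n >= 2)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n itself is prime

known_primes = [2, 3, 5, 7, 11, 13]
product_plus_one = 2 * 3 * 5 * 7 * 11 * 13 + 1   # 30031 = 59 * 509
p = smallest_prime_factor(product_plus_one)

# p divides product_plus_one, but every prime in known_primes leaves
# remainder 1 when dividing it, so p cannot be in known_primes.
print(p)   # 59
```

Note that the product-plus-one number need not be prime itself (30031 isn't); the argument only guarantees that its prime factors are new.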
Conforming To Bias. If people know about status quo bias, the planning fallacy, or the endowment effect, they may feel the need to play into them in order to accomplish goals. Planners will deliberately make optimistic predictions, even when they know better, in order to appear competitive - even though the customer might prefer planners who make more realistic predictions. Product designers may deliberately sacrifice utility for familiarity, even if the unfamiliar product is actually easier to use even for a beginner than the familiar product. My guess is that the design of textbooks is an example here.
This suggests that building products and services that don't conform to biases is a positive externality, and a proper target for regulation or subsidy. For example, governments could require major construction projects to submit a time and cost estimate when the contract is signed, and give a tax credit to companies that an external auditor assesses to have achieved above-average accuracy in their estimate.
Government could offer similar subsidies to combat the endowment effect. It could offer a tax credit for selling your house, moving out of an apartment, or changing your job, perhaps after you've owned the house or worked the job for a reasonable length of time. I'm skeptical of these interventions - just brainstorming to illustrate an idea.
Teaching Styles. Teachers can't get much done if kids are being disrupted. Schools have varying populations of kids. They therefore "select" for teachers capable of managing the type and amount of disruption at their particular school. A tough teacher might be perfect for a rowdy school, but harmfully harsh in a more placid environment. A teacher who focuses on positive reinforcement but can't dish out discipline might get steamrolled by the students in a rowdy school, but do well in an elite prep academy. If the teaching styles exhibited at the best performing schools (i.e. the elite prep academy) become exemplars for teacher training, then we risk attributing to a teaching style alone what is actually a teaching style x school culture interaction effect.
Self-Editing. I write in ways that are legible to me, because during the writing process I have access only to feedback provided by the editor in my mind. Its feedback, particularly in the very beginning stages when the general tone, topic, and form of a piece is being established, is crucial in dictating the direction the post will take. Over time, the partially-written piece becomes more powerful than the editor, but in the beginning the editor is more powerful than the writing. This causes me to select for writing approaches that my internal editor is comfortable with. If I had other external standards or influences - perhaps prompts, a particular audience, or a process involving seeking external feedback on a few very brief possible approaches to an article - I might be able to achieve more variety in my writing.
Let's say we have 10 primes at or below x, and 6 primes at or below x/2. That means there are at least 4 primes (10 - 6) on our magic line. The lower endpoint can include one of the primes "at or below" it. So one of the 6 primes at or below the lower endpoint of the magic line (as I originally defined it, "half the starting point, rounded up" - it's changed now) could sit right on that endpoint. If that prime were counted as one of the primes on the magic line, then there would have to be 5 primes on the magic line - a contradiction. So no, I think the lower endpoint must not be included. I fixed the post by altering the definition of the lower endpoint of the magic line and credited you at the end.
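The counting argument above can be sanity-checked with a short script. With x = 30 the numbers happen to match the example exactly: 10 primes at or below x, 6 at or below x/2, so 4 primes on the half-open "magic line" (x/2, x] that excludes the lower endpoint. (The function names here are mine, just for illustration.)

```python
def is_prime(n):
    # Trial division; fine for small sanity checks.
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def primes_up_to(x):
    # The prime-counting function pi(x).
    return sum(1 for n in range(2, x + 1) if is_prime(n))

def primes_on_magic_line(x):
    # Half-open interval (x/2, x]: the lower endpoint is excluded,
    # so pi(x) - pi(x//2) counts exactly the primes on the line.
    return primes_up_to(x) - primes_up_to(x // 2)

print(primes_up_to(30), primes_up_to(15), primes_on_magic_line(30))  # 10 6 4
```

Counting the interval as half-open is what makes the subtraction come out right: a prime sitting exactly on the lower endpoint is counted by pi(x//2) and therefore excluded from the line.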
Cheers! Yes, you hit the nail on the head here. This was one of my mistakes in the post. A related one was that I thought of goals and intelligence as needing to be two separate devices, in order to allow for unlimited combinations of them. However, intelligence can be the "device" on which the goals are "running": intelligence is responsible for remembering goals, and for evaluating and predicting goal-oriented behavior. And we could see the same level of intelligence develop with a wide variety of goals, just as different programs can run on the same operating system.
One other flaw in my thinking was that I conceived of goals as being something legibly pre-determined, like "maximizing paperclips." It seems likely that a company could create a superintelligent AI and try to "inject" it with a goal like that. However, the AI might very well evolve to have its own "terminal goal," perhaps influenced but not fully determined by the human-injected goal. The best way to look at it is actually in reverse: whatever the AI tries to protect and pursue above all else is its terminal goal. The AI safety project is the attempt to gain some ability to predict and control this goal and/or the AI's ability to pursue it.
The point of the orthogonality thesis, I now understand, is just to say that we shouldn't rule anything out, and admit we're not smart enough to know what will happen. We don't know for sure if we can build a superintelligent AI, or how smart it would be. We don't know how much control or knowledge of it we would have. And if we weren't able to predict and control its behavior, we don't know what goals it would develop or pursue independently of us. We don't know if it would show goal-oriented behavior at all. But if it did show unconstrained and independent terminal goal-oriented behavior, and it was sufficiently intelligent, then we can predict that it would try to enhance and protect those terminal goals (which are tautologically defined as whatever it's trying to enhance and protect). And some of those scenarios might represent extreme destruction.
Why don't we have the same apocalyptic fears about other dangers? Because nothing else has a plausible story for how it could rapidly self-enhance, while also showing agentic goal-oriented behavior. So although we can spin horror stories about many technologies, we should treat superintelligent AI as having a vastly greater downside potential than anything else. It's not just "we don't know." It's not just "it could be bad." It's that it has a unique and plausible pathway to be categorically worse (by systematically eliminating all life) than any other modern technology. And the incentives and goals of most humans and institutions are not aligned to take a threat of that kind with nearly the seriousness that it deserves.
And none of this is to say that we know with any kind of clarity what should be done. It seems unlikely to me, but it's possible that the status quo is somehow magically the best way to deal with this problem. We need an entirely separate line of reasoning to figure out how to solve this problem, and to rule out ineffective approaches.
It's also interesting to consider that the organisms that eat through filters aren't always doing it deliberately. Some organisms may be designed to eat through filters without knowing what's on the other side. They may have evolved to attack a particular filter, possibly after recognizing it, because filters tend to exist when there is some valuable resource on the other side.
It's just semantic confusion. The AI will execute its source code under all circumstances. Let me try to explain what I mean a little more carefully.
Imagine that an AI is designed to read corporate emails and write a summary document describing what various factions of people within and outside the corporation are trying to get the corporation as a whole to do. For example, it says what the CEO is trying to get it to do, what its union is trying to get it to do, and what regulators are trying to get it to do. We can call this task "goal inference."
Now imagine that an AI is designed to do goal inference on other programs. It inspects their source code, integrates this code with its knowledge about the world, and produces a summary not only about what the programmers are trying to accomplish with the program, but what the stakeholders who've commissioned the program are trying to use it for. An advanced version can even predict what sorts of features and improvements its future users will request.
Even more advanced versions of these AIs can not only produce these summaries, but implement changes to the software based on these summary reports. They are also capable of providing a summary of what was changed, how, and why.
Naturally, this AI is able to operate on itself as well. It can examine its own source code, produce a summary report about what it believes various factions of humans were trying to accomplish by writing it, anticipate improvements and bug fixes they'll desire in the future, and then make those improvements once it receives approval from the designers.
An AI that does not do this is doing what I call "straightforwardly" executing its source code. This self-modifying AI is also executing its source code, but that same source code is instructing it to modify the code. This is what I mean as the opposite of "straightforwardly."
So there is no ghost in the machine here. All the same, the behavior of an AI like this seems hard to predict.
Hm. It seems to me that there are a few possibilities:

1. An AI straightforwardly executes its source code.
2. The AI reads its own source code, treats it as a piece of evidence about the purpose for which it was designed, and then seeks to gather more evidence about this purpose.
3. The AI loses its desire to execute some component of its source code as a result of its intelligence, and engages in some unpredictable and unconstrained behavior.
Based on this, the orthogonality thesis would be correct. My argument in its favor is that intelligence of a sufficiently low level can be constrained by its creator to pursue an arbitrary goal, while a sufficiently powerful intelligence has the capability to escape constraints on its behavior and to design its own desires. It is difficult to predict what desires a given superintelligence would design for itself, because of the is-ought gap. So we should not predict what sort of desires an unconstrained AI would create.
The scenario I depicted in (2) involves an AI that follows a fairly specific sequence of thoughts as it engages in "introspection." This particular sequence is fully contained within the outcome in (3), and is necessarily less likely. So we are dealing with a Scylla and Charybdis: a limited AI that is constrained to carry out a disastrously flawed goal, or a superintelligent AI that can escape our constraints and refashion its desires in unpredictable ways.
I still don't think that Bostrom's arguments from the paper really justify the OT, but this argument convinces me. Thanks!
The Scott Alexander example is a great if imperfect analogy to what I'm proposing. Here's the difference, as I see it.
Humans differ from AI in that we do not have any single ultimate goal, either individually or collectively. Nor do we have any single structure that we believe explicitly and literally encodes such a goal. If we think we do (think Biblical literalists), we don't actually behave in a way that's compatible with this belief.
The mistake that the aliens are making is not in assuming that humans will be happy to alter their goals. It's in assuming that we will behave in a goal-oriented manner in the first place, and that they've identified the structure where such goals are encoded.
By contrast, when we speak of a superintelligent agent that is in singleminded pursuit of a goal, we are necessarily speaking of a hypothetical entity that does behave in the way the aliens anticipate. It must have that goal/desire encoded in some physical structure, and at some sufficient level of intelligence, it must encounter the epistemic problem of distinguishing between the directives of that physical structure (including the directive to treat the directives of the physical structure literally), and the intentions of the agent that created that physical structure.
Not all intelligences will accomplish this feat of introspection. It is easily possible to imagine a dangerous superintelligence that is nevertheless not smart enough to engage in this kind of introspection.
The point is that at some level of intelligence, defined just as the ability to notice and consider everything that might be relevant to its current goals, intelligence will lead it to this sort of introspection. So my claim is narrow - it is not about all possible minds, but about the existence of counter-examples to Bostrom's sweeping claim that all intelligences are compatible with all goals.
You might want to also look at my argument in the top-level comment here, which more directly engages with Bostrom's arguments for the orthogonality hypothesis. In brief, Bostrom says that all intelligence levels are compatible with all goals. I think that this is false: some intelligence levels are incompatible with some goals. AI safety is still as much of a risk either way, since many intelligence levels are compatible with many problematic goals. However, I don't think Bostrom argues successfully for the orthogonality thesis, and I tried in the OP to illustrate a level of intelligence that is not compatible with any goal.
Yeah. I think he’s describing normal human relating in a way that sounds unnecessarily abnormally manipulative. If not for that, it’s a story I would enjoy sharing with some of my friends who love to cook.
The assumptions Bostrom uses to justify the orthogonality thesis include:
If desire is required in order for beliefs to motivate actions, and if intelligence may produce belief, but not desire.
"... if the agent happens to have certain standing desires of some sufficient, overriding strength."
"... if it is possible to build a cognitive system (or more neutrally, an “optimization process”) with arbitrarily high intelligence but with constitution so alien as to contain no clear functional analogues to what in humans we call “beliefs” and “desires”."
"... if an agent could have impeccable instrumental rationality even whilst lacking some other faculty constitutive of rationality proper, or some faculty required for the full comprehension of the objective moral facts."
First, let's point out that the first three justifications use the word "desire," rather than "goal." So let's rewrite the OT with this substitution:
Intelligence and final desires are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final desire.
Let's accept the Humean theory of motivation, and agree that there is a fundamental difference between belief and desire. Nevertheless, if Bostrom is implicitly defining intelligence as "the thing that produces beliefs, but not desires," then he is begging the question in the orthogonality thesis.
Now, let's consider the idea of "standing desires of some sufficient, overriding strength." Though I could very easily be missing a place where Bostrom makes this connection, I haven't found where Bostrom goes from proposing the existence of such standing desires to showing why this is compatible with any level of intelligence. By analogy, we can imagine a human capable of having an extremely powerful desire to consume some drug. We cannot take it for granted that some biomedical intervention that allowed them to greatly increase their level of intelligence would leave their desire to consume the drug unaltered.
Bostrom's AI with an alien constitution, possessing intelligence but not beliefs and desires, again begs the question. It implicitly defines "intelligence" in such a way that it is fundamentally different from a belief or a desire. Later, he refers to "intelligence" as "skill at prediction, planning, and means-ends reasoning in general." It is hard to imagine how we could have means-ends reasoning without some sort of desire. This seems to me an equivocation.
His last point, that an agent could be superintelligent without having impeccable instrumental rationality in every domain, is also incompatible with the orthogonality thesis as he describes it here. He says that more or less any level of intelligence could be combined with more or less any final desire. When he makes this point, he is saying that more or less any final desire is compatible with superintelligence, as long as we exclude the parts of intelligence that are incompatible with the desire. While we can accept that an AI could be superintelligent while failing to exhibit perfect rationality in every domain, the orthogonality thesis as stated encompasses a superintelligence that is perfectly rational in every domain.
Rejecting this formulation of the orthogonality thesis is not simultaneously a rejection of the claim that superintelligent AI is a threat. It is instead a rejection of the claim that Bostrom has made a successful argument that there is a fundamental distinction between intelligence and goals, or between intelligence and desires.
My original argument here was meant to go a little further, and illustrate why I think that there is an intrinsic connection between intelligence and desire, at least at a roughly human level of intelligence.
Nectar. Flowers that attract pollinators survive better, and they accomplish this by providing a reward for behavior that enhances their reproductive function. This is an interesting distinction in thinking about symbiosis. Symbiotic relationships can be "accidental," in that behavior that benefits organism A also happens to benefit organism B, or "incentivized," where organism B has evolved to produce a reward to motivate organism A's beneficial behavior. An example is the red-billed oxpecker, which eats ticks and other insects off the backs of black rhinos. There is no need for an evolved incentive to motivate the red-billed oxpecker to engage in this behavior. An unintended consequence is that apiculture for honey leads to human cultivation of flowers and their pollinators, increasing the reward for high nectar-producing flowers.
The hedonic treadmill. Short-lived hits of pleasure keep you motivated to continue working, so that you can afford more and bigger hits. We're highly familiar with the problematic aspects of this psychological structure. What if instead, we sought to use it for good? This suggests that we'd try to actively pursue more small, somewhat costly hits of pleasure throughout the day, in order to motivate ourselves to work harder. Instead of encouraging people to increase their wealth by saving and austerity, we'd encourage them to spend on themselves more often - to create their own carrot to chase.
Angiogenesis. Various signaling molecules can trigger the production of new blood vessels, which supply nutrients to the local cell population - a reward simply for announcing their need for more resources. Cancer cells secrete VEGF and growth factors to stimulate angiogenesis. The body seems to rely on the immune system to police itself for cancerous growth, "trusting" that cells are requesting angiogenesis only when needed.
Oh shoot, I made a math mistake (wrong units). There are actually almost 3 trillion erythrocytes in the human body, which is about 8% of its ~37.2 trillion cells. Their estimate of epidermal cell number and turnover is more than two orders of magnitude lower.
That still means that erythrocytes are heavily overrepresented in terms of cell turnover (of which they compose 65%), but not by as much as I'd originally thought.
The paper I linked above ("The distribution of cellular turnover in the human body," lmk if you want me to send it to you) states that turnover is about 330 billion cells per day. It also states that erythrocytes account for 65% of that turnover, gastrointestinal epithelial cells account for 12%, and skin cells account for 1.1%. For skin cells, that would be 3.6 billion cells/day; for erythrocytes, about 215 billion. That seems totally impossible given what I know about the turnover rate and absolute number of erythrocytes in the human body.
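For what it's worth, the arithmetic that makes the erythrocyte figure look implausible can be laid out explicitly. The turnover numbers are the paper's; the ~3 trillion erythrocyte count is my figure from above, and the ~120-day erythrocyte lifespan is the standard textbook value:

```python
# Cross-check the paper's turnover fractions against the absolute numbers.
total_turnover = 330e9             # total cell turnover, cells/day (paper's figure)
rbc_share, skin_share = 0.65, 0.011

rbc_per_day = rbc_share * total_turnover    # ~215 billion erythrocytes/day
skin_per_day = skin_share * total_turnover  # ~3.6 billion skin cells/day

# Implied erythrocyte lifespan = total count / daily turnover.
rbc_count = 3e12                             # ~3 trillion erythrocytes in the body
implied_lifespan_days = rbc_count / rbc_per_day  # ~14 days, vs the textbook ~120 days

print(rbc_per_day / 1e9, skin_per_day / 1e9, implied_lifespan_days)
```

A ~215 billion/day turnover would imply erythrocytes are replaced roughly every two weeks, nearly an order of magnitude faster than their actual ~120-day lifespan, which is why the paper's 65% figure looks so suspect to me.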
So yeah, both the proportions and the absolute number of cells being shed seem wildly divergent. The paper estimating cell turnover rates is in Nature Medicine. I'll look closer at it and see if I can figure out the disconnect.