Thanks for this post.
I'd love to have a regular (weekly/monthly/quarterly) post that's just "here's what we're focusing on at MIRI these days".
I respect and value MIRI's leadership on the complex topic of building understanding and coordination around AI.
I spend a lot of time doing AI social media, and I try to promote the best recommendations I know to others. Whatever thoughts MIRI has would be helpful.
Given that I think about this less often and less capably than you folks do, it seems like there's a low-hanging-fruit opportunity for people like me to stay more in sync with MIRI. My show (Doom Debates) isn't affiliated with MIRI, but as long as I continue to have no particular disagreement with MIRI, I'd like to make sure I'm pulling in the same direction as you all.
I’ve heard MIRI has some big content projects in the works, maybe a book.
FWIW I think having a regular stream of lower-effort content that a somewhat mainstream audience consumes would help to bolster MIRI’s position as a thought leader when they release the bigger works.
I'd ask: If one day your God stopped existing, would there be any kind of observable change?
Seems like a meaningless concept, a node in the causal model of reality that doesn't have any power to constrain expectation, but the person likes it because their knowledge of the existence of the node in their own belief network brings them emotional reward.
> When an agent is goal-oriented, they want to become more goal-oriented, and maximize the goal-orientedness of the universe with respect to their own goal.
Because expected value tells us that the more resources you control, the more robustly you can maximize your probability of success in the face of whatever comes at you, and the higher your maximum possible utility is (if you have a utility function without an easy-to-hit max score).
“Maximizing goal-orientedness of the universe” was how I phrased the prediction that conquering resources involves having them aligned to your goal / aligned agents helping you control them.
> goal-orientedness is a convergent attractor in the space of self-modifying intelligences
> This also requires a citation, or at the very least some reasoning; I'm not aware of any theorems that show goal-orientedness is a convergent attractor, but I'd be happy to learn more.
Ok here's my reasoning:
When an agent is goal-oriented, they want to become more goal-oriented, and maximize the goal-orientedness of the universe with respect to their own goal. So if we diagram the evolution of the universe's goal-orientedness, it has the shape of an attractor.
There are plenty of entry paths where some intelligence-improving process spits out a goal-oriented general intelligence (like biological evolution did), but no exit path where a universe whose smartest agent is super goal-oriented ever leads to that no longer being the case.
I'm happy to have that kind of debate.
My position is "goal-directedness is an attractor state that is incredibly dangerous and uncontrollable if it's somewhat beyond human-level in the near future".
The form of those arguments seems to be like "technically it doesn't have to be". But realistically it will be lol. Not sure how much more there will be to say.
Thanks. Sure, I’m always happy to update on new arguments and evidence. The most likely way I see possibly updating is to realize the gap between current AIs and human intelligence is actually much larger than it currently seems, e.g. 50+ years as Robin seems to think. Then AI alignment research has a larger chance of working.
I also might lower P(doom) if international govs start treating this like the emergency it is and do their best to coordinate to pause. Though unfortunately even that probably only buys a few years of time.
Finally I can imagine somehow updating that alignment is easier than it seems, or less of a problem to begin with. But the fact that all the arguments I’ve heard on that front seem very weak and misguided to me, makes that unlikely.
Thanks for your comments. I don’t get how nuclear and biosafety represent models of success. Humanity rose to meet those challenges not quite adequately, and half the reason society hasn’t collapsed from e.g. a first thermonuclear explosion going off either intentionally or accidentally is pure luck. All it takes to topple humanity is something like nukes but a little harder to coordinate on (or much harder).
Here's a better transcript hopefully: https://share.descript.com/view/yfASo1J11e0
I updated the link in the post.
Thanks I’ll look into that. Maybe try the transcript generated by YouTube?
I guess I just don’t see it as a weak point in the doom argument that goal-orientedness is a convergent attractor in the space of self-modifying intelligences?
It feels similar to pondering the familiar claim of evolution, that systems that copy themselves and seize resources are an attractor state. Sure it’s not 100% proven but it seems pretty solid.
Context is a huge factor in all these communications tips. The scenario I'm optimizing for is when you're texting someone who has a lot of options, and you think it's high expected value to get them to invest in a date with you, but the most likely way that won't happen is if they hesitate to reply to you and tap away to something else. That's not always the actual scenario though.
Imagine you're the recipient, and the person who's texting you met your minimum standard to match with, but is still a-priori probably not worth your time and effort going on a date with, because their expected attractiveness+compatibility score is too low, though you haven't investigated enough to be confident yet. (This is a common epistemic state of e.g. a woman with attractive pics on a dating app that has more male users.)
Maybe the first match who asks you "how's your week going" feels like a nice opportunity to ramble how you feel, and a nice sign that someone out there cares. But if that happens enough on an app, and the average date-worthiness of the people that it happens with is low, then the next person who sends it doesn't make you want to ramble anymore. Because you know from experience that rambling into a momentumless conversation will just lead it to stagnate at its next momentumless point.
It's nice when people care about you, but it quickly gets not so nice when a bunch of people with questionable date-appeal are trying to trade a cheap care signal for your scarce attention and dating resources.
If the person sending you the message has already distinguished themselves to you as "dateworthy", e.g. by having one of the best pics and/or profile in your judgment, then "How's your week going" will be a perfectly adequate message from them; in some cases maybe even an optimal message. You can just build rapport and check for basic red flags, then set up a date.
But if you're not sold on the other person being dateworthy, and they start out from a lower-leverage position in the sense that they initially consider you more dateworthy than you consider them, then they better send a message that somehow adds value to you, to help them climb the dateworthiness gap.
But again, context is always the biggest factor, and context has a lot of detail. E.g. if you don't consider someone dateworthy, but you're in a scenario where someone just making conversation with you is adding value to you (e.g. not a ton of matches demanding your attention using the same unoriginal rapport-building gambit), then "How's it going" can work great.
This is actually the default context if you're brave enough to approach strangers you want to date in meatspace. The stranger can be much more physically attractive, or have a higher initially-perceived dating market value, than you. Yet just implicitly signaling your social confidence through boldness, body language, and a friendly/fun way of speaking and acting raises your dateworthiness significantly. And since the real-world-interaction modality doesn't have much competition these days, the content of the conversation that leads up to a date can be super normal smalltalk like "How's it going".
Yeah nice. A statement like "I'm looking for something new to watch" lowers the stakes by making the interaction more like what friends talk about rather than about an interview for a life partner, increasing the probability that they'll respond rather than pausing for a second and ending up tapping away.
You can do even more than just lowering the stakes if you inject a sense that you're subconsciously using the next couple conversation moves to draw out evidence about the conversation partner, because you're naturally perceptive and have various standards and ideas about people you like to date, and you like to get a sense of who the other person is.
If done well, this builds a curious sense that the question is a bit more than just making formulaic conversation, but somehow has momentum to it. The best motivation for someone to keep talking to you on a dating app is if they feel they're being seen by a savvy evaluator who will reflect back a valuable perspective about them. The person talking to you can then be subconsciously thinking about how attractive/interesting/unique/etc they are (an engaging experience). Also, everyone wants to feel like they're maximizing their potential by finding someone to date who's in the upper range of their "league", and there are ways to engage in conversation that are more consistent with that ideal.
IMO the best type of conversation to have after a few opening back&forths, is to get them talking about something they find engaging, which is generally also something that reflects them in a good light, which makes it fun and engaging for them while also putting you in a position to give a type of casual "feedback", ultimately leading up to a statement of interest which shows them why you're not just another random match but rather someone they have more reason to meet and not flake on. Your movie question could be a good start toward discovering something like that, but probably not an example of that unless they're a big movie person.
I'd try to look at their profile for clues of something they do in their life where they make an effort that someone ought to notice and appreciate, and get em talking about that.
Those are just some thoughts I have about how to distinguish yourself in the middle part of the conversation between opening interest and asking them on a date.
So you simply ask them: "What do you want to do?" And maybe you add "I'm completely fine with anything!" to ensure you're really introducing no constraints whatsoever and you two can do exactly what your friend desires.
This error reminds me of people on a dating app who kill the conversation by texting something like "How's your week going?"
When texting on a dating app, if you want to keep the conversation flowing nicely instead of getting awkward/strained responses or nothing, I believe the key is to write messages such that a couple seconds of low-effort processing on the recipient's part is enough for them to start typing their response.
"How's your week going?" is highly cognitively straining. Responding to it requires remembering and selecting info about one's week (or one's feelings about one's week), and then filtering or modifying the selection so as to make one sound like an interesting conversationalist rather than an undifferentiated bore, while also worrying that one's selection about how to answer doesn't implicitly reveal them as being too eager to brag, or complain, or obsess about a particular topic.
You can be "conversationally generous" by intentionally pre-computing some of their cognitive work, i.e. narrowing the search space. For instance:
"I'm gonna try cooking myself 3 eggs/day for lunch so I don't go crazy on DoorDash. How would you cook them if you were me?"
With a text like this (ideally adjusted to your actual life context), they don't have to start by narrowing down a huge space of possible responses. They can immediately just ask themselves how they'd go about cooking an egg. And they also have some context of "where the conversation is going": it's about your own lifestyle. So it's not just two people interviewing each other, it has this natural motion/momentum.
Using this computational kindness technique is admittedly kind of contrived on your end, but on their end, it just feels effortless and serendipitous. For naturally contrived nerds like myself looking for a way to convert IQ points into social skills, it's a good trade.
The computational kindness principle in these conversations works much like the rule of improv that says you're supposed to introduce specific elements to the scene ("My little brown poodle is digging for his bone") rather than prompting your scene partners to do the cognitive work ("What's that over there?").
Oh and all this is not just a random piece of advice, it's yet another Specificity Power.
> Your baseline scenario (0 value) thus assumes away the possibility that civilization permanently collapses (in some sense) in the absence of some path to greater intelligence (whether via AI or whatever else), which would also wipe out any future value. This is a non-negligible possibility.
Yes, my mainline no-superintelligence-by-2100 scenario is that the trend toward a better world continues to 2100.
You're welcome to set the baseline number to a negative, or tweak the numbers however you want to reflect any probability of a non-ASI existential disaster happening before 2100. I doubt it'll affect the conclusion.
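To make the "tweak the numbers however you want" point concrete, here's a toy Python sketch with entirely made-up placeholder probabilities and values (it is not the actual formula or figures from the post); it just illustrates why setting the baseline to a negative number doesn't automatically flip the comparison when the assumed probability of doom from rushing is high:

```python
# Toy sketch only: all probabilities and values below are made-up placeholders,
# not the figures or the exact formula from the original post. The point is just
# to show that a moderately negative "no-ASI baseline" doesn't automatically
# flip which option has higher expected value when P(doom | rush) is assumed high.

P_DOOM_IF_RUSH = 0.8      # assumed: probability that rushing ASI ends in catastrophe
V_ALIGNED_ASI = 100.0     # assumed: value of a good ASI outcome
V_DOOM = -100.0           # assumed: value of an ASI catastrophe

P_COLLAPSE_NO_ASI = 0.1   # assumed: chance of a non-ASI existential disaster by 2100
V_COLLAPSE = -100.0       # assumed: value of that disaster

def ev_rush() -> float:
    """Expected value of rushing toward ASI (under the placeholder numbers)."""
    return P_DOOM_IF_RUSH * V_DOOM + (1 - P_DOOM_IF_RUSH) * V_ALIGNED_ASI

def ev_no_asi(baseline: float) -> float:
    """Expected value of the no-ASI-by-2100 path, with an adjustable baseline
    for the mainline 'world keeps improving to 2100' scenario."""
    return P_COLLAPSE_NO_ASI * V_COLLAPSE + (1 - P_COLLAPSE_NO_ASI) * baseline

for baseline in (0.0, -10.0, -30.0):
    print(f"baseline={baseline:6.1f}  EV(no ASI)={ev_no_asi(baseline):6.1f}  EV(rush)={ev_rush():6.1f}")

# With these placeholders, EV(rush) stays at -60.0 while EV(no ASI) is
# -10.0, -19.0, and -37.0 respectively, so the ranking (the "conclusion")
# doesn't change even with a noticeably negative baseline.
```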
> To be honest the only thing preventing me from granting paperclippers as much or more value than humans is uncertainty/conservatism about my metaethics
Ah ok, the crux of our disagreement is how much you value the paperclipper type scenario that I'd consider a very bad outcome. If you think that outcome is good then yeah, that licenses you in this formula to conclude that rushing toward AI is good.
Founder here :) I'm biased now, but FWIW I was also saying the same thing before I started this company in 2017: a good dating/relationship coach is super helpful. At this point we've coached over 100,000 clients and racked up many good reviews.
I've personally used a dating coach and a couples counselor. IMO it helps in two ways:
- Relevant insights and advice that the coach has that most people don't, e.g. in the domain of communication skills, common tactics that best improve a situation, pitfalls to avoid.
- A neutral party who's good at letting you (and potentially a partner) objectively review and analyze the situation.
Relationship Hero hires, measures and curates the best coaches, and streamlines matching you to the best coach based on your scenario. Here's a discount link for LW users to get $50 off.
Personally I just have the habit of reaching for specifics to begin my communication to help make things clear. This post may help.
Unlike the other animals, humans can represent any goal in a large domain like the physical universe, and then in a large fraction of cases, they can think of useful things to steer the universe toward that goal to an appreciable degree.
Some goals are more difficult than others / require giving the human control over more resources than others, and measurements of optimization power are hard to define, but this definition is taking a step toward formalizing the claim that humans are more of a "general intelligence" than animals. Presumably you agree with this claim?
It seems the crux of our disagreement factors down to a disagreement about whether this Optimization Power post by Eliezer is pointing at a sufficiently coherent concept.
I don’t get what point you’re trying to make about the takeaway of my analogy by bringing up the halting problem. There might not even be something analogous to the halting problem in my analogy of goal-completeness, but so what?
I also don’t get why you’re bringing up the detail that “single correct output” is not 100% the same thing as “single goal-specification with variable degrees of success measured on a utility function”. It’s in the nature of analogies that details are different yet we’re still able to infer an analogous conclusion on some dimension.
Humans are goal-complete, or equivalently “humans are general intelligences”, in the sense that many of us in the smartest quartile can output plans with the expectation of a much better than random score on a very broad range of utility functions over arbitrary domains.
These 4 beefs are different and less serious than the original accusations, or at least feel that way to me. Retconning a motte after the bailey is lost? That said, they're reasonable beefs for someone to have.
I’m not saying “mapping a big category to a single example is what it’s all about”. I’m saying that it’s a sanity check. Like why wouldn’t you be able to do that? Yet sometimes you can’t, and it’s cause for alarm.
Meaningful claims don't have to be specific; they just have to be substantiable by a nonzero number of specific examples. Here's how I imagine this conversation:
Chris: Love your neighbor!
Liron: Can you give me an example of a time in your life where that exhortation was relevant?
Chris: Sure. People in my apartment complex like to smoke cigarettes in the courtyard and the smoke wafts up to my window. It's actually a nonsmoking complex, so I could complain to management and get them to stop, but I understand the relaxing feeling of a good smoke, so I let them be.
Liron: Ah I see, that was pretty accommodating of you.
Chris: Yeah, and I felt love in my heart for my fellow man when I did that.
Liron: Cool beans. Thanks for helping me understand what kind of scenarios you mean for your exhortation to "love your neighbor" to map to.
Sweet thanks
I agree that if a goal-complete AI steers the future very slowly, or very weakly - as by just trying every possible action one at a time - then at some point it becomes a degenerate case of the concept.
(Applying the same level of pedantry to Turing-completeness, you could similarly ask if the simple Turing machine that enumerates all possible output-tape configurations one-by-one is a UTM.)
The reason "goal-complete" (or "AGI") is a useful coinage, is that there's a large cluster in plausible-reality-space of goal-complete agents with a reasonable amount of goal-complete optimization power (e.g. humans, natural selection, and probably AI starting in a few years), and another large distinguishable cluster of non-goal-complete agents (e.g. the other animals, narrow AI).
Yeah, no doubt there are cases where people save money by having a narrower AI, just like the scenario you describe, or using ASICs for Bitcoin mining. The goal-complete AI itself would be expected to often solve problems by creating optimized problem-specific hardware.
Hmm it seems to me that you're just being pedantic about goal-completeness in a way that you aren't symmetrically being for Turing-completeness.
You could point out that "most" Turing machines output tapes full of 10^100 1s and 0s in a near-random configuration, and every computing device on earth is equally hopeless at doing that.
That's getting into details of the scenario that are hard to predict. Like I said, I think most scenarios where goal-complete AI exists are just ones where humans get disempowered and then a single AI fooms (or a small number make a deal to split up the universe and foom together).
As to whether humans will prevent goal-complete AI: some of us are yelling "Pause AI!"
> Humans will trust human brain capable AI models to say, drive a bus, despite the poor reliability, as long as it crashes less than humans?
Yes, because the goal-complete AI won't just perform better than humans, it'll also perform better than narrower AIs.
(Well, I think we'll actually be dead if the premise of the hypothetical is that goal-complete AI exists, but let's assume we aren't.)
> A goal is essentially a specification of a function to optimise, and all optimisation algorithms perform equally well (or rather poorly) when averaged across all functions.
Well, I've never met a monkey that has an "optimization algorithm" by your definition. I've only met humans who have such optimization algorithms. And that distinction is what I'm pointing at.
Goal-completeness points to the same thing as what most people mean by "AGI".
E.g. I claim humans are goal-complete General Intelligences because you can give us any goal-specification and we'll very often be able to steer the future closer toward it.
Currently, no other known organism or software program has this property to the degree that humans do. GPT-4 has it for an unprecedentedly large domain, by virtue of giving satisfying answers to a large fraction of arbitrary natural-language prompts.
Fine, I agree that if computation-specific electronics, like logic gates, weren't reliable, then it would introduce reliability as an important factor in the equation. Or in the case of AGI, that you can break the analogy to Turing-complete convergence by considering what happens if a component specific to goal-complete AI is unreliable.
I currently see no reason to expect such an unreliable component in AGI, so I expect that the reliability part of the analogy to Turing-completeness will hold.
In scenario (1) and (2), you're giving descriptions at a level of detail that I don't think is necessarily an accurate characterization of goal-complete AI. E.g. in my predicted future, a goal-complete AI will eventually have the form of a compact program that can run on a laptop. (After all, the human brain is only 12W and 20Hz, and full of known reasoning "bugs".)
But microcontrollers are reliable for the same reason that video-game circuit boards are reliable: They both derive their reliability from the reliability of electronic components in the same manner, a manner which doesn't change during the convergence from application-specific circuits to Turing-complete chips.
> The engineer who designed it didn't trust the microcontroller not to fail in a way that left the heating element on all the time. So it had a thermal fuse to prevent this failure mode.
If the microcontroller fails to turn off the heating element, that may be a result of the extra complexity/brittleness of the Turing-complete architecture, but the risk there isn't that much higher than the risk of using a simpler design involving an electronic circuit. I'm pretty sure that safety fuse would have been judged worthwhile even if the heating element was controlled by a simpler circuit.
I think we can model the convergence to a Turing-complete architecture as having a negligible decrease in reliability. In many cases it even increases reliability, since:
- Due to the higher expressive power that the developers have, creating a piece of software is often easier to do correctly, with fewer unforeseen error conditions, than creating a complex circuit to do the same thing.
- Software systems make it easier to implement a powerful range of self-monitoring and self-correcting behaviors. E.g. If every Google employee took a 1-week vacation and natural disasters shut down multiple data centers, Google search would probably stay up and running.
Similarly, to the extent that any "narrow AI" application is reliable (e.g. Go players, self-driving cars), I'd expect that a goal-complete AI implementation would be equally reliable, or more so.
A great post that helped inspire me to write this up is Steering Systems. The "goal engine + steering code" architecture that we're anticipating for AIs is analogous to the "computer + software" architecture whose convergence I got to witness in my lifetime.
I'm surprised this post isn't getting any engagement (yet), because for me the analogy to Turing-complete convergence is a deep source of my intuition about powerful broad-domain goal-optimizing AIs being on the horizon.
Titotal, do you agree with Eliezer’s larger point that a superintelligence engineering physical actuators from the ground up can probably do much better than what our evolutionary search process produced? If so, how would you steelman the argument?
I made a short clip highlighting how Legg seems to miss an opportunity to acknowledge the inner alignment problem, since his proposed alignment solution seems to be a fundamentally training-based / black-box approach.
Here’s a 2-min edited video of the protest.
Most people who hear our message do so well after the protest, via sharing of this kind of media.
The SF one went great! Here’s a first batch of pics. A lot of the impact will come from sharing the pics and videos.
I think the impact will be pretty significant:
- It's one of those things where a lot of people - the majority according to some polls - already agree with it, so it's building mutual knowledge and unlocking some tailwinds
- It's interesting and polarizing. People who think the movement is crazy are having fun with it on social media, which also keeps it top of mind.
I'll be at the San Francisco protest!
We have Pause AI T-shirts, costumes, signs and other fun stuff. In addition to being a historic event, it's a great day to make sane friends and we'll grab some food/drinks after.
Just in case you missed that link at the top:
The global Pause AI protest is TOMORROW (Saturday Oct 21)!
This is a historic event, the first time hundreds of people are coming out in 8 countries to protest AI.
I'm helping with logistics for the San Francisco one which you can join here. Feel free to contact me or Holly on DM/email for any reason.
Hey Quintin thanks for the diagram.
Have you tried comparing the cumulative amount of genetic info over 3.5B years?
Isn't it a big coincidence that the era of brains that process info quickly / accumulate information rapidly is also the era where those brains are much more powerful than all other products of evolution?
(The obvious explanation in my view is that brains are vastly better optimizers/searchers per computation step, but I'm trying to make sure I understand your view.)
Appreciate the detailed analysis.
I don’t think this was a good debate, but I felt I was in a position where I would have had to invest a lot of time to do better by the other side’s standards.
Quintin and I have agreed to do an X Space debate, and I’m optimistic that format can be more productive. While I don’t necessarily expect to update my view much, I am interested to at least understand what the crux is, which I’m not super clear on atm.
Here’s a meta-level opinion:
I don’t think it was the best choice for Quintin to keep writing replies that were disproportionately long compared to mine.
There’s such a thing as zooming claims and arguments out. When I write short tweets, that’s what I’m doing. If he wants to zoom in on something, I think it would be a better conversation if he zoomed in on less at a time, or on fewer parts at a time, making for a more productive back & forth.
FWIW I’ve never known a character of high integrity who I could imagine writing the phrase “your career in EA would be over with a few DMs”.
Actually, the only time I know of them cashing in early was selling half their Coinbase shares at the direct listing after holding for 7 years.
Their racket was to be the #1 crypto fund with the most assets under management ($7.6B total) so that they could collect the most management fees (probably about $1B total). It's great business for a16z to be in the sector-leader AUM game even when the sector makes no logical sense.
I'm just saying Marc's reputation for publicly making logically-flimsy arguments and not updating on evidence should be considered when he enters a new area of discourse.
I encourage you to look into his firm's Web3 claims and the reasoning behind them. My sibling comment has one link that is particularly egregious and recent. Here's another badly-reasoned Web3 argument made by his partner, which implies Marc's endorsement, and the time his firm invested over $100M in an obvious Ponzi scheme.
My #1 and #2 are in a separate video Marc made after the post Zvi referred to, but ya, could fall under the "bizarrely poor arguments" Zvi is trying to explain.
My #3 and his firm's various statements about Web3 in the last couple years, like this recent gaslighting, are additional examples of bizarrely poor arguments in an unrelated field.
If we don't come in with an a-priori belief that Marc is an honest or capable reasoner, there's less confusion for Zvi to explain.
My model is that Marc Andreessen just consistently makes badly-reasoned statements:
- Comparing AI doomerism to love of killing Nazis
- Endorsing the claim that arbitrarily powerful technologies don't change the equilibrium of good and bad forces
- Last year being unable to coherently explain a single Web3 use case despite his firm investing $7.6B in the space
I’ve personally been saying “AI Doom” as the topic identifier since it’s clear and catchy and won’t be confused with smaller issues.
Great post! Agree with everything. You came at some points from a unique angle. I especially appreciate the insight of "most of the useful steering work of a system comes from the very last bits of glue code".