I think you’re lumping “the ultimate goal” and “the primary mode of thinking required to achieve the ultimate goal” together erroneously. (But maybe the hypothetical person you’re devilishly advocating for doesn’t agree about utilitarianism and instrumentality?)
Re also also: the Reverse Streetlight effect will probably come into play. It’ll optimize not just for early deception, but for any kind of deception we can’t detect.
You’re saying that on priors, the humans are manipulative?
What do you mean by “you don’t grapple with the hard problem of consciousness”? (Is this just an abstruse way of saying “no, you’re wrong” to set up the following description of how I’m wrong? In that case, I’m not sure you have a leg to stand on when you say that I use “a lot of words”.) Edit: to be a bit more charitable, maybe it means “my model has elements that my model of your model doesn’t model”.
How can you know I see the same thing that you do? That depends on what you mean by “same”. To me, to talk about whether things are the same, we need to specify what characteristics we care about, or what category system we’re using. I know what it means for two animals to be of the same species, and what it means for two people to have the same parent. But for any two things to be the same, period, doesn’t really mean anything on its own. (You could argue that everything is the same as itself, but that’s a trivial case.)
This might seem like I’m saying that there isn’t any fundamental truth, only many ways of splitting the world up into categories. Not exactly. I don’t think there’s any fundamental truth to categories. There might be fundamental monads, or something like that, but human subjective experiences are definitely not fundamental. (And what truths can even be said of a stateless monad when considered on its own?)
For question 2, I think the human-initiated nature of AI risk could partially explain the small distance between ability and need. If we were completely incapable of working as a civilization, other civilizations might be a threat, but we wouldn’t have any AIs of our own, let alone general AIs.
I can’t tell if you already know this, but “infinite explanatory power” is equivalent to no real explanatory power. If it assigns equal probability to everything then nothing can be evidence in favor of it, and so on.
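To spell that out (my formalization, not anything from the parent comment): if a theory H assigns the same likelihood to each of the N possible observations, then for any observation E and any rival theory H',

```latex
P(E \mid H) = \tfrac{1}{N}
\quad\Longrightarrow\quad
\frac{P(H \mid E)}{P(H' \mid E)}
  = \frac{P(E \mid H)}{P(E \mid H')}\cdot\frac{P(H)}{P(H')}
  = \frac{1/N}{P(E \mid H')}\cdot\frac{P(H)}{P(H')}
```

so no observation ever shifts the odds toward H against a rival that predicted the actual outcome better than chance; "explains everything" cashes out as "predicts nothing".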
I'd assume the opposite, since I don't think physicists (and other thermodynamic scientists like some chemists) make up a majority of LW readers, but it's irrelevant. I can (and did) put both forms side-by-side to allow both physicists and non-physicists to better understand the magnitude of the temperature difference. (And since laymen are more likely to skim over the number and ignore the letter, it's disproportionately more important to include Fahrenheit.)
Edit: wait, delta-K is equivalent to delta-C. In that case, since metric-users (not just physicists) might make up the majority of LW readers, you're probably right about the number of users.
I think a "subjective experience" (edit: in the sense that two people can have the same subjective experience; not a particular instantiation of one) is just a particular (edit: category in a) categorization of possible experiences, defined by grouping together experiences that put the [person] into similar states (under some metric of "similar" that we care about). This recovers the ability to talk about "lies about subjective experiences" within a physicalist worldview.
In this case, we could look at how the AI internally changes in response to various stimuli, and group the stimuli on the basis of similar induced states. If this grouping doesn't match its claims at all, then we can conclude that it is perhaps lying. (See: cleaving reality at its joints.) EDIT: Were you saying that AI cannot have subjective experience? Then I think this points at the crux; see my statements below about how I don't see human subjectivity as fundamentally special.
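Here's a toy sketch of the kind of test I mean (the probe functions are hypothetical stand-ins for whatever access we actually have to the AI's internals, and k-means is just one arbitrary choice of grouping):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def induced_state(model, stimulus):
    """Hypothetical probe: a vector summarizing the model's internal state
    after processing the stimulus (e.g. hidden activations)."""
    return model.hidden_activations(stimulus)  # assumed interface

def claimed_category(model, stimulus):
    """Hypothetical probe: the subjective category the model claims the
    stimulus falls into (e.g. 'painful', 'pleasant', ...)."""
    return model.report_experience(stimulus)   # assumed interface

def experience_consistency(model, stimuli, n_categories):
    """Cluster stimuli by the internal states they induce, then check how
    well that clustering lines up with the model's own reports.
    A score near 0 suggests the reports don't track internal states,
    which is one sign the model may be confabulating or lying."""
    states = np.stack([induced_state(model, s) for s in stimuli])
    clusters = KMeans(n_clusters=n_categories, n_init=10).fit_predict(states)
    reports = [claimed_category(model, s) for s in stimuli]
    return adjusted_rand_score(reports, clusters)
```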
Yes, this means that we can talk about any physical thing having a "subjective experience". This is not a bug. The special thing about animals is that they have significant variance between different "subjective experiences", whereas a rock will react very similarly to any stimuli that don't break or boil it. Humans are different because they have very high meta-subjectivity and the ability to encode their "subjective experiences" into language. However, this still doesn't match up very well to human intuitions: any sort of database or measurement device can be said to have significant "subjective experiences". But my goal isn't to describe human intuitions; it's to describe the same thing that human intuitions describe. Human subjectivity doesn't seem to be fundamentally different from that of any other physical system.
He never said "will land heads", though. He just said "a flipped coin has a chance of landing heads", which is not a timeful statement. EDIT: no longer confident that this is the case
Didn't the post already counter your second paragraph? The subjective interpretation can be a superset of the propensity interpretation.
When you say "all days similar to this one", are you talking about all real days or all possible days? If it's "all possible days", then this seems like summing over the measures of all possible worlds compatible with both your experiences and the hypothesis, and dividing by the sum of the measures of all possible worlds compatible with your experiences. (Under this interpretation, jessicata's response doesn't make much sense; "similar to" means "observationally equivalent for observers with as much information as I have", and doesn't have a free variable.)
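To spell out the formalization I mean (my notation; μ is the measure over possible worlds):

```latex
P(\text{hypothesis} \mid \text{experiences})
  = \frac{\displaystyle\sum_{w \,\models\, \text{experiences} \,\wedge\, \text{hypothesis}} \mu(w)}
         {\displaystyle\sum_{w \,\models\, \text{experiences}} \mu(w)}
```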
I was going to say "bootstraps don't work that way", but since the validation happens on the future end, this might actually work.
Since Eliezer is a temporal reductionist, I think he might not mean "temporally continuous", but rather "logical/causal continuity" or something similar.
Discrete time travel would also violate temporal continuity, by the way.
But where do we get Complexity(human)?
Note: since most global warming statistics are presented to the American layman in degrees Fahrenheit, it is probably useful to convert 0.7 K to 1.26 °F.
One might think eliminativism is metaphysically simpler, but reductionism doesn't really posit more stuff; it just allows synonyms for various combinations of the same stuff.
I don't think Occam's razor is the main justification for eliminativism. Instead, consider the allegory of the wiggin: if a category is not natural, useful, or predictive, then in common English we say that the category "isn't real".
The Transcension hypothesis attempts to answer the Fermi paradox by saying that sufficiently advanced civilizations nearly invariably leave their original universe for one of their own making. By definition, a transcended civilization would have the power to create or manipulate new universes or self-enclosed pockets; this would likely require a very advanced understanding of physics. This understanding would probably be matched in other sciences.
This is my impression from a few minutes of searching. I do not know why you asked the question of “what it is” when a simple search would have been faster. I do not expect that many people here are very knowledgeable about this particular hypothesis, and this is a basic question anyway.
The hypothesis does not seem very likely to me. It claims that transcendence is the inevitable evolutionary result of civilizations, but in nature we observe many niches. Civilizations are less like individuals in a species, and more like species themselves. And since a single civilization can colonize a galaxy, it would only take one civilization to produce a world unlike the one we see today - there would have to be not only no other niches, but no mutants either.
I don’t think Transcension is a term commonly used here. This question would probably be better answered by googling.
I think that people treat IQ as giving more information than it actually does. The main disadvantage is that you will over-adjust for any information you receive.
What does it mean to "revise Algorithm downward"? Observing doesn't seem to indicate much about the current value of . Or is Algorithm shorthand for "the rate of increase of Algorithm"?
Back-of-the-envelope equilibrium estimate: if we increase the energy added to the atmosphere by 1%, then the Stefan-Boltzmann law says that a blackbody would need to be a factor of about 1.01^(1/4) ≈ 1.0025 warmer, or 0.25%, to radiate that much more. At the Earth's temperature of ~288 K, this would be ~0.7 K warmer.
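Spelled out as a linearized blackbody estimate (σ is the Stefan-Boltzmann constant):

```latex
P = \sigma T^4
\;\Rightarrow\;
\frac{\Delta P}{P} \approx 4\,\frac{\Delta T}{T}
\;\Rightarrow\;
\Delta T \approx \frac{T}{4}\cdot\frac{\Delta P}{P}
  = \frac{288\ \mathrm{K}}{4}\times 0.01 \approx 0.7\ \mathrm{K}.
```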
This suggests to me that it will have a smaller impact than global warming. Whatever we use to solve global warming will probably work on this problem as well. It's still something to keep in mind, though.
I agree that #humans has decreasing marginal returns at these scales - I meant linear in the asymptotic sense. (This is important because large numbers of possible future humans depend on humanity surviving today; if the world was going to end in a year then (a) would be better than (b). In other words, the point of recovering is to have lots of utility in the future.)
I don't think most people care about their genes surviving into the far future. (If your reasoning is evolutionary, then read this if you haven't already.) I agree that many people care about the far future, though.
Epistemic status: elaborating on a topic by using math on it; making the implicit explicit
From a collective standpoint, the utility function over #humans looks like this: it starts at 0 when there are 0 humans, slowly rises until it reaches "recolonization potential", then rapidly shoots up, eventually slowing down to roughly linear growth. However, from an individual standpoint, the utility function is just 0 for death, 1 for life. Because of the shape of the collective utility function, you want to "disentangle" deaths, but the individual doesn't have the same incentive.
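Here's a toy function with the shape I mean (the threshold and constants are made up; only the qualitative shape matters):

```python
def collective_utility(n_humans, recolonization_threshold=10_000):
    """Toy collective utility over population size: near 0 below the
    recolonization threshold, a sharp jump once recovery is assured,
    then roughly linear growth in the number of people."""
    if n_humans <= 0:
        return 0.0
    if n_humans < recolonization_threshold:
        # Slow rise: each person matters, but civilization likely won't recover.
        return 0.001 * n_humans
    # Past the threshold: a large bonus for the recoverable future,
    # plus a roughly linear term per additional person alive now.
    return 1_000_000 + 1.0 * n_humans

def individual_utility(alive):
    """From the individual's standpoint: 0 for death, 1 for life."""
    return 1.0 if alive else 0.0
```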
Useful work consumes negentropy. A closed system can only do so much useful work. (However, reversible computations may not require work.)
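A standard way to quantify the computation case (my addition, via Landauer's principle): erasing one bit of information at temperature T dissipates at least

```latex
W_{\min} = k_B T \ln 2
```

of work as heat, which is why logically reversible computation, which never erases bits, can in principle approach zero work per operation.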
What do you mean by infinite IQ? If I take you literally, that's impossible because the test outputs real numbers. But maybe you mean "unbounded optimization power as time goes to infinity" or something similar.
I'm not sure how magically plausible this is, but Dumbledore could have simplified the chicken brain dramatically. (See the recent SSC posts for how the number of neurons of an animal correlates with our sense of its moral worth.) Given that the chicken doesn't need to eat, reproduce, or anything else besides stand and squawk, this seems physically possible. It would be ridiculously difficult without magic, but wizards regularly shrink their brains down to animal size, so apparently magic is an expert neuroscientist. If this was done, the chicken would have almost no moral worth, so it would be permissible to create and torture it.
Another vaguely disconcerting, almost self-aware comment by the bot. It can, in fact, write impressively realistic comments in 10 seconds.
I think “typical X does Y” is shorthand for “many or most Xs do Y”.
That last parenthetical remark is funny when you consider how GPT-2 knows nothing new but just reshuffles the “interesting and surprising amount of writing by smart people”.
Ah. It’s a bot. I suppose the name should have tipped me off. At least I get Being More Confused By Fiction Than Reality points.
How did you write that in less than a minute?
I’m confused. Are you saying that highly-upvoted posts make a name nicer and therefore less useful? If so, can you describe the mechanisms behind this?
Can you personally (under your own power) and confidently prove that a particular tool will only recursively-trust safe-and-reliable tools, where this recursive tree reaches far enough to trust superhuman AI?
On the other hand, you can "follow" the tree for a distance. You can prove a calculator trustworthy and use it in your following proofs, for instance. This might make it more feasible.
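Here's a cartoon of what "following the tree for a distance" could look like (the `prove_safe` check and `dependencies` attribute are hypothetical stand-ins, not any real system):

```python
def trust_tree_verified(tool, prove_safe, max_depth):
    """Cartoon of bootstrapped trust: a tool is trusted only if we can prove
    it safe ourselves AND every tool it relies on is trusted, recursively,
    down to a depth we can still personally follow."""
    if max_depth < 0:
        return False  # the tree reaches farther than we can personally verify
    if not prove_safe(tool):  # stand-in for a human-checkable safety proof
        return False
    return all(
        trust_tree_verified(dep, prove_safe, max_depth - 1)
        for dep in tool.dependencies  # assumed attribute listing sub-tools
    )
```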
I agree that there's a monetary incentive for more people to write clickbait, but the mechanism the post described was "naturally clickbaity people will get more views and thus more power," and that doesn't seem to involve money at all.
Which I suppose could be termed "infinitely confused", but that feels like a mixing of levels. You're not confused about a given probability, you're confused about how probability works.
Or alternatively, it's a clever turn of phrase: "infinitely confused" as in confused about infinities.
I'll try my hand at Tabooing and analyzing the words. Epistemic status: modeling other people's models.
Type A days are for changing from a damaged/low-energy state into a functioning state, while Type B days are for maintaining that functioning state by allowing periodic breaks from stressors/time to satisfy needs/?.
I think Unreal means Recovery as in "recovering from a problematic state into a better one". I'm not sure what's up with Rest - I think we lack a good word for Type B. "Rest" is peaceful/slackful, which is right, but it also seems inactive/passive which doesn't match the intended meaning. If you emphasize the inactivity/passivity of Rest then it fits better with Type A. (I think this partly explains the reversal.)
"the paperclipper, which from first principles decides that it must produce infinitely many paperclips"
I don't think this is an accurate description of the paperclip scenario, unless "first principles" means "hardcoded goals".
"Future GPT-3 will be protected from hyper-rational failures because of the noisy nature of its answers, so it can't stick forever to some wrong policy."
Ignoring how GPT isn't agentic and handwaving an agentic analogue, I don't think this is sound. Wrong policies make up almost all of policyspace; the problem is not that the AI might enter a special state of wrongness, it's that the AI might leave the special state of correctness. And to the extent that GPT is hindered by its randomness, it's unable to carry out long-term plans at all - it's safe only because it's weak.
But isn’t the gauge itself a measurement which doesn’t perfectly correspond to that which it measures? I’m not seeing a distinction here.
Here’s my understanding of your post: “the map is not the territory, and we always act to bring about a change in our map; changes in the territory are an instrumental subgoal or an irrelevant side effect.” I don’t think this is true. Doesn’t that predict that humans would like wireheading, or “happy boxes” (virtual simulations that are more pleasant than reality)?
(You could respond that “we don’t want our map to include a wireheaded self.” I’ll try to find a post I’ve read that argues against this kind of argument.)
Obvious AI connection: goal encapsulation between humans relies on commonalities, such as mental frameworks and terminal goals. These commonalities probably won't hold for AI: unless it's an emulation, it will think very differently from humans, and relying on terminal agreement can't ground terminal agreement in the first place. Therefore, we should expect it to be very hard to encapsulate goals to an AI.
(Tool AI and Agent AI approaches suffer differently from this difficulty. Agents will be hard to terminally align, but once we’ve done that, we can rely on terminal agreement to flesh out plans. Tools can’t use recursive trust like that, so they’ll need to explicitly understand more instrumental goals.)
Thanks for the explanation, and I agree now that the two are too different to infer much.
I’ve seen this done in children’s shows. There’s a song along with subtitles, and an object moves to each written word as it is spoken.
I think its arguments are pretty bad. “If you get hurt, that’s bad. If you get hurt then die, that’s worse. If you die without getting hurt, that’s just as bad. Therefore it’s bad if one of your copies dies.” It equivocates and doesn’t address the actual immortality.
On the “Darwin test”: note that memetic evolution pressure is not always aligned with individual human interests. Religions often encourage their believers to do things that help the religion at the believers’ expense. If the religion is otherwise helpful, then its continued existence may be important, but this isn’t why the religion does that.
But if you spend more time thinking about exercise, that time cost is multiplied greatly. I think this kind of countereffect cancels out every practical argument of this type.
If hunger is a perception, then “we eat not because we’re hungry, but rather because we perceive we’re hungry” makes much less sense. Animals generally don’t have metacognition, yet they eat, so eating doesn’t require perceiving perception. It’s not that meta.
What do you mean by “when we eat we regulate perception”? Are you saying that the drive to eat comes from a desire to decrease hunger, where “decrease” is regulation and “hunger” is a perception?
Begone, spambot. (Is there a “report to moderators” button? I don’t see one on mobile.)
I think this is the idea: people can form habits, and habits have friction - you'll keep doing them even if they're painful (they oppose momentary preferences, as opposed to reflective preferences). But you probably won't adopt a new habit if it's painful. Therefore, to successfully build a habit that changes your actions from momentary to reflective, you should first adopt a habit, then make it painful - don't combine the two steps.
"When content creators get paid for the number of views their videos have, those whose natural way of writing titles is a bit more clickbait-y will tend to get more views, and so over time accumulate more influence and social capital in the YouTube community, which makes it harder for less clickbait-y content producers to compete."
Wouldn't this be the case regardless of whether clickbait is profitable?
Ugh. I was distracted by the issue of "is Deep Blue consequentialist" (which I'm still not sure about; maximizing the future value of a heuristic doesn't seem clearly consequentialist or non-consequentialist to me), and forgot to check my assumption that all consequentialists backchain. Yes, you're entirely right. If I'm not incorrect again, Deep Blue forwardchains, right? It doesn't have a goal state that it works backward from, but instead starts from the current state, simulates moves recursively to a certain depth, and chooses the initial move that leads to the best expected heuristic value at that depth. (Ways I could be wrong: this isn't how Deep Blue works, "chaining" means something more specific, etc. But Google isn't helping on either.)
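Here's the generic picture of forwardchaining I have in mind, as a sketch (this is just depth-limited search with a heuristic at the horizon, not Deep Blue's actual algorithm or code):

```python
def forward_search(state, depth, heuristic, legal_moves, apply_move, maximize=True):
    """Generic depth-limited forward search: simulate moves recursively from
    the current state and score positions at the horizon with a heuristic.
    No goal state is worked backward from; we only chain forward."""
    if depth == 0 or not legal_moves(state):
        return heuristic(state), None
    best_value, best_move = None, None
    for move in legal_moves(state):
        value, _ = forward_search(apply_move(state, move), depth - 1,
                                  heuristic, legal_moves, apply_move,
                                  maximize=not maximize)
        if best_value is None or (value > best_value) == maximize:
            best_value, best_move = value, move
    return best_value, best_move
```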
I'm confused. I already addressed the possibility of modeling the external world. Did you think the paragraph below was about something else, or did it just not convince you? (If the latter, that's entirely fine, but I think it's good to note that you understand my argument without finding it persuasive. Conversational niceties like this help both participants understand each other.)
An AI might model a location that happens to be its environment, including its own self. But if this model is not connected in the right way to its consequentialism, it still won't take over the world. It has to generate actions within its environment to do that, and language models simply don't work that way.
Or to put it another way, it understands how the external world works, but not that it's part of the external world. It doesn't self-model in that way. It might even have a model of itself, but it won't understand that the model is recursive. Its value function doesn't assign a high value to words that its model says will result in its hardware being upgraded, because the model and the goals aren't connected in that way.
T-shirt slogan: "It might understand the world, but it doesn't understand that it understands the world."
You might say "this sort of AI won't be powerful enough to answer complicated technical questions correctly." If so, that's probably our crux. I have a reference class of Deep Blue and AIXI, both of which answer questions at a superhuman level without understanding self-modification, but the former doesn't actually model the world and AIXI doesn't belong in discussions of practical feasibility. So I'll just point at the crux and hope you have something to say about it.
You might say, as Yudkowsky has before, "this design is too vague and you can attribute any property to it that you like; come back when you have a technical description". If so, I'll admit I'm just a novice speculating on things they don't understand well. If you want a technical description then you probably don't want to talk to me; someone at OpenAI would probably be much better at describing how language models work and what their limitations are, but honestly anyone who's done AI work or research would be better at this than me. Or you can wait a decade and then I'll be in the class of "people who've done AI work or research".
Why do you think that non-consequentialists are more limited than humans in this domain? I could see that being the case, but I could also have seen that being the case for chess, and yet Deep Blue won't take over the world even with infinite compute. (Possible counterpoint: chess is far simpler than language.)
"But Deep Blue backchains! That's not an example of a superhuman non-consequentialist in a technical domain." Yes, it's somewhat consequentialist, but in a way that doesn't have to do with the external world at all. The options it generates are all of the form "move [chess piece] to [location]." Similarly, language models only generate options of the form "[next word] comes next in [context]." No [next word] will result in the model attempting to seize more resources and recursively self-improve.
This is why I said "a consequentialist that models itself and its environment". But it goes even further than that. An AI might model a location that happens to be its environment, including its own self. But if this model is not connected in the right way to its consequentialism, it still won't take over the world. It has to generate actions within its environment to do that, and language models simply don't work that way.
Another line of thought: AIXI will drop an anvil on its head - it doesn't understand self-change. FOOM/Computronium is actually even more stringent: it has to be a non-Cartesian consequentialist that models itself in its environment. You have to have solved the Embedded Agency problems. Now, people will certainly want to solve these at some point and build a FOOM-capable AI. It's probably necessary to solve them to build a generally intelligent AI that interacts sensibly with the world on its own. But I don't think you need to solve them to build a language model, even a superintelligent language model.