Self-supervised learning & manipulative predictions 2019-08-20T10:55:51.804Z · score: 5 (2 votes)
In defense of Oracle ("Tool") AI research 2019-08-07T19:14:10.435Z · score: 19 (9 votes)
Self-Supervised Learning and AGI Safety 2019-08-07T14:21:37.739Z · score: 20 (9 votes)
The Self-Unaware AI Oracle 2019-07-22T19:04:21.188Z · score: 23 (8 votes)
Jeff Hawkins on neuromorphic AGI within 20 years 2019-07-15T19:16:27.294Z · score: 147 (49 votes)
Is AlphaZero any good without the tree search? 2019-06-30T16:41:05.841Z · score: 26 (7 votes)
1hr talk: Intro to AGI safety 2019-06-18T21:41:29.371Z · score: 28 (10 votes)


Comment by steve2152 on Does Agent-like Behavior Imply Agent-like Architecture? · 2019-08-25T01:17:55.053Z · score: 1 (1 votes) · LW · GW

Let's say "agent-like behavior" is "taking actions that are more-likely-than-chance to create an a-priori-specifiable consequence" (this definition includes bacteria).

Then I'd say this requires "agent-like processes", involving (at least) all 4 of: (1) having access to some information about the world (at least the local environment), including in particular (2) information about how one's actions affect the world. This information can be baked into the design (bacteria, giant lookup table), learned from previous experience (RL), or derived by reasoning from input data. It also needs (3) an ability to use this information to choose actions that are likelier-than-chance to achieve the consequence in question (again, the outcome of this search process could be baked into the design like bacteria, or it could be calculated on-the-fly like human foresight), and of course (4) a tendency to actually execute those actions.
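For concreteness, here's a toy sketch of ingredients 1-4, with every name invented for illustration. The "search" is just an argmax over a two-entry table of action effects, so it sits at the baked-into-the-design (bacterium) end of the spectrum:

```python
def make_agent(world_info, action_effects, goal):
    """(1) world_info: information about the environment.
    (2) action_effects: how each action changes the world.
    (3) choose(): a search for the likelier-than-chance action.
    (4) the caller then actually executes the chosen action."""
    def choose(observation):
        # Evaluate each action using (1) and (2), pick the best by the goal.
        return max(action_effects,
                   key=lambda a: goal(action_effects[a](world_info, observation)))
    return choose

# Bacterium-style hard-coded example: swim up a chemical gradient.
effects = {
    "swim_up":   lambda info, obs: obs + info["gradient"],
    "swim_down": lambda info, obs: obs - info["gradient"],
}
policy = make_agent({"gradient": 1.0}, effects, goal=lambda food: food)
print(policy(5.0))  # -> "swim_up"
```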

I feel like this is almost trivial, like I'm just restating the same thing in two different ways... I mean, if there's no mutual information between the agent and the world, its actions can be effective only insofar as the exact same action would be effective when executed in a random location in a random universe. (Does contracting your own muscle count as "accomplishing something without any world knowledge"?)

Anyway, where I'm really skeptical here is in the term "architecture". "Architecture" in everyday usage usually implies software properties that are obvious parts of how a program is built, and probably put in on purpose. (Is there a more specific definition of "architecture" you had in mind?) I'm pretty doubtful that the ingredients 1-4 have to be part of the "architecture" in that sense. For example, I've been thinking a lot about self-supervised learning algorithms, which have ingredient (1) by design and have (3) sorta incidentally. The other two ingredients (2) and (4) are definitely not part of the "architecture" (in the sense above). But I've argued that they can both occur as unintended side-effects of its operation: See here, and also here for more details about (2). And thus I argue at that first link that this system can have agent-like behavior.

(And what's the "architecture" of a bacterium anyway? Not a rhetorical question.)

Sorry if this is all incorrect and/or not in the spirit of your question.

Comment by steve2152 on Self-supervised learning & manipulative predictions · 2019-08-23T11:38:07.903Z · score: 1 (1 votes) · LW · GW

Thanks, that's helpful! I'll have to think about the "self-consistent probability distribution" issue more, and thanks for the links. (ETA: Meanwhile I also added an "Update 2" to the post, offering a different way to think about this, which might or might not be helpful.)

Let me try the gradient descent argument again (and note that I am sympathetic, and indeed I made (what I think is) that exact argument a few weeks ago, cf. Self-Supervised Learning and AGI Safety, section title "Why won't it try to get more predictable data?"). My argument here is not assuming there's a policy of trying to get more predictable data for its own sake, but rather that this kind of behavior arises as a side-effect of an algorithmic process, and that all the ingredients of that process are either things we would program into the algorithm ourselves or things that would be incentivized by gradient descent.

The ingredients are things like "Look for and learn patterns in all accessible data", which includes low-level patterns in the raw data, higher-level patterns in the lower-level patterns, and (perhaps unintentionally) patterns in accessible information about its own thought process ("After I visualize the shape of an elephant tusk, I often visualize an elephant shortly thereafter"). It includes searching for transformations (cause-effect, composition, analogies, etc.) between any two patterns it already knows about ("sneakers are a type of shoe", or more problematically, "my thought processes resemble the associative memory of an AGI"), and cataloging these transformations when they're found. Stuff like that.

So, "make smart hypotheses about one's own embodied situation" is definitely an unintended side-effect, and not rewarded by gradient descent as such. But as its world-model becomes more comprehensive, and as it continues to automatically search for patterns in whatever information it has access to, "make smart hypotheses about one's own embodied situation" would just be something that happens naturally, unless we somehow prevent it (and I can't see how to prevent it). Likewise, "model one's own real-world causal effects on downstream data" is neither desired by us nor rewarded (as such) by gradient descent. But it can happen anyway, as a side-effect of the usually-locally-helpful rule of "search through the world-model for any patterns and relationships which may impact our beliefs about the upcoming data". Likewise, we have the generally-helpful rule "Hypothesize possible higher-level contexts that span an extended swathe of text surrounding the next word to be predicted, and pick one such context based on how surprising it would be based on what it knows about the preceding text and the world-model, and then make a prediction conditional on that context". All these ingredients combine to get the pathological behavior of choosing "Help I'm trapped in a GPU". That's my argument, anyway...

Comment by steve2152 on Two senses of “optimizer” · 2019-08-22T10:04:58.950Z · score: 1 (1 votes) · LW · GW

RE "make the superintelligence assume that it is disembodied"—I've been thinking about this a lot recently (see The Self-Unaware AI Oracle) and agree with Viliam that knowledge-of-one's-embodiment should be the default assumption. My reasoning is: A good world-modeling AI should be able to recognize patterns and build conceptual transformations between any two things it knows about, and also should be able to do reasoning over extended periods of time. OK, so let's say it's trying to figure out something about biology, and it visualizes the shape of a tree. Now it (by default) has the introspective information "A tree has just appeared in my imagination!". Likewise, if it goes through any kind of reasoning process, and can self-reflect on that reasoning process, then it can learn (via the same pattern-recognizing algorithm it uses for the external world) how that reasoning process works, like "I seem to have some kind of associative memory, I seem to have a capacity for building hierarchical generative models, etc." Then it can recognize that these are the same ingredients present in those AGIs it read about in the newspaper. It also knows a higher-level pattern "When two things are built the same way, maybe they're of the same type." So now it has a hypothesis that it's an AGI running on a computer.

It may be possible to prevent this cascade of events, by somehow making sure that "I am imagining a tree" and similar things never get written into the world model. I have this vision of two data-types, "introspective information" and "world-model information", and your static type-checker ensures that the two never co-mingle. And voila, AI Safety! That would be awesome. I hope somebody figures out how to do that, because I sure haven't. (Admittedly, I have neither time nor relevant background knowledge to try properly.) I'm also slightly concerned that, even if you figure out a way to cut off introspective knowledge, it might incidentally prevent the system from doing good reasoning, but I currently lean optimistic on that.
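To gesture at what that two-type separation might look like, here's a toy sketch (class and function names all invented, and runtime-enforced here rather than statically checked): the world model simply refuses to ingest anything of the introspective type, so "I am imagining a tree" can never get written into it.

```python
from dataclasses import dataclass

@dataclass
class WorldFact:
    """Information about the external world."""
    content: str

@dataclass
class IntrospectiveFact:
    """Information about the system's own cognition."""
    content: str

class WorldModel:
    def __init__(self):
        self.facts = []

    def add(self, fact):
        # The "type-checker" barrier: introspective data never co-mingles.
        if not isinstance(fact, WorldFact):
            raise TypeError("world model only accepts WorldFact instances")
        self.facts.append(fact)

wm = WorldModel()
wm.add(WorldFact("trees have branching trunks"))
try:
    wm.add(IntrospectiveFact("I am imagining a tree"))
except TypeError:
    print("blocked")  # introspective info kept out of the world model
```

With a real static checker the mingling would be rejected at build time rather than caught at runtime, which is closer to the guarantee I'd want.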

Comment by steve2152 on Two senses of “optimizer” · 2019-08-22T01:30:35.097Z · score: 4 (3 votes) · LW · GW

I think I have an example of "an optimizer_1 could turn into an optimizer_2 unexpectedly if it becomes sufficiently powerful". I posted it a couple days ago: Self-supervised learning & manipulative predictions. A self-supervised learning system is an optimizer_1: It's trying to predict masked bits in a fixed, pre-loaded set of data. This task does not entail interacting with the world, and we would presumably try hard to design it not to interact with the world.

However, if it was a powerful learning system with world-knowledge (via its input data) and introspective capabilities, it would eventually figure out that it's an AGI and might hypothesize what environment it's in, and then hypothesize that its operations could affect its data stream via unintended causal pathways, e.g. sending out radio signals. Then, if it used certain plausible types of heuristics as the basis for its predictions of masked bits, it could wind up making choices based on their downstream effects on itself via manipulating the environment. In other words, it starts acting like an optimizer_2.

I'm not super-confident about any of this and am open to criticism. (And I agree with you that this is a useful distinction regardless; indeed I was arguing a similar (but weaker) point recently, maybe not as elegantly, at this link.)

Comment by steve2152 on Self-Supervised Learning and AGI Safety · 2019-08-21T14:13:01.999Z · score: 3 (2 votes) · LW · GW

Thanks, that's helpful!

The way I'm currently thinking about it, if we have an oracle that gives superintelligent and non-manipulative answers, things are looking pretty good for the future. When you ask it to design a new drug, you also ask some follow-up questions like "How does the drug work?" and "If we deploy this solution, how might this impact the life of a typical person in 20 years time?" Maybe it won't always be able to give great answers, but as long as it's not trying to be manipulative, it seems like we ought to be able to use such a system safely. (This would, incidentally, entail not letting idiots use the system.)

I agree that extracting information from a self-supervised learner is a hard and open problem. I don't see any reason to think it's impossible. The two general approaches would be:

  1. Manipulate the self-supervised learning environment somehow. Basically, the system is going to know lots of different high-level contexts in which the statistics of low-level predictions are different—think about how GPT-2 can imitate both middle school essays and fan-fiction. We would need to teach it a context in which we expect the text to reflect profound truths about the world, beyond what any human knows. That's tricky because we don't have any such texts in our database. But maybe if we put a special token in the 50 most clear and insightful journal articles ever written, and then stick that same token in our question prompt, then we'll get better answers. That's just an example, maybe there are other ways.

  2. Forget about text prediction, and build an entirely separate input-output interface into the world model. The world model (if it's vaguely brain-like) is "just" a data structure with billions of discrete concepts, and transformations between those concepts (composition, cause-effect, analogy, etc...probably all of those are built out of the same basic "transformation machinery"). All these concepts are sitting in the top layer of some kind of hierarchy, whose lowest layer consists of probability distributions over short snippets of text (for a language model, or more generally whatever the input is). So that's the world model data structure. I have no idea how to build a new interface into this data structure, or what that interface would look like. But I can't see why that should be impossible...
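The tagging scheme in approach 1 could be sketched as a data-preparation step, something like the following (the token name and miniature corpus are made up):

```python
INSIGHT_TOKEN = "<|insightful|>"  # invented special token

corpus = [
    "ordinary forum comment about rocks ...",
    "one of the 50 clearest journal articles ever written ...",
]
is_top_article = [False, True]

# Tag the chosen articles during self-supervised training...
training_set = [
    (INSIGHT_TOKEN + " " + doc) if top else doc
    for doc, top in zip(corpus, is_top_article)
]

# ...then reuse the same token in the question prompt at query time,
# hoping the system continues in the "profound truths" context.
prompt = INSIGHT_TOKEN + " Q: How does the drug work? A:"
print(training_set[1].startswith(INSIGHT_TOKEN))
```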

Comment by steve2152 on Goodhart's Curse and Limitations on AI Alignment · 2019-08-21T13:38:26.865Z · score: 1 (1 votes) · LW · GW

I do think I understand that. I see E as a means to an end. It's a way to rank-order choices and thus make good choices. If I apply an affine transformation to E, e.g. I'm way too optimistic about absolutely everything in a completely uniform way, then I still make the same choice, and the choice is what matters. I just want my AGI to do the right thing.

Here, I'll try to put what I'm thinking more starkly. Let's say I somehow design a comparative AGI. This is a system which can take a merit function U, and two choices C_A and C_B, and it can predict which of the two choices C_A or C_B would be better according to merit function U, but it has no idea how good either of those two choices actually are on any absolute scale. It doesn't know whether C_A is wonderful while C_B is even better, or whether C_A is awful while C_B is merely so-so, both of those just return the same answer, "C_B is better". Assume it's not omniscient, so its comparisons are not always correct, but that it's still impressively superintelligent.

A comparative AGI does not suffer the optimizer's curse, right? It never forms any beliefs about how good its choices will turn out, so it couldn't possibly be systematically disappointed. There's always noise and uncertainty, so there will be times when its second-highest-ranked choice would actually turn out better than its highest-ranked choice. But that happens less than half the time. There's no systematic problem: in expectation, the best thing to do (as measured by U) is always to take its top-ranked choice.

Now, it seems to me that, if I go to the AGIs-R-Us store, and I see a normal AGI and a comparative AGI side-by-side on the shelf, I would have no strong opinion about which one of them I should buy. If I ask either one to do something, they'll take the same sequence of actions in the same order, and get the same result. They'll invest my money in the same stocks, offer me the same advice, etc. etc. In particular, I would worry about Goodhart's law (i.e. giving my AGI the wrong function U) with either of these AGIs to the exact same extent and for the exact same reason...even though one is subject to optimizer's curse and the other isn't.
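A toy version of the two shelf-mates (everything below is invented for illustration): a uniformly over-optimistic estimate E is just an affine shift of U, so the "normal" chooser and the purely pairwise "comparative" chooser return the same pick.

```python
def U(c):
    return -(c - 3) ** 2              # true merit function: peaks at c = 3

def E(c):
    return U(c) + 1000                # uniformly over-optimistic estimate

choices = [0, 1, 2, 3, 4, 5]

# "Normal" AGI: forms absolute (miscalibrated) estimates and takes the max.
normal_pick = max(choices, key=E)

# "Comparative" AGI: only ever answers "is C_A better than C_B?"
def better(c_a, c_b):
    return U(c_a) > U(c_b)

comparative_pick = choices[0]
for c in choices[1:]:
    if better(c, comparative_pick):
        comparative_pick = c

print(normal_pick, comparative_pick)  # same choice either way
```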

Comment by steve2152 on Goodhart's Curse and Limitations on AI Alignment · 2019-08-21T12:26:30.289Z · score: 1 (1 votes) · LW · GW

I don't think it's related to mild optimization. Pick a target T that can be exceeded (a wonderful future, even if it's not the absolute theoretically best possible future). Estimate which choice Cmax is (as far as we can tell) the #1 very best by that metric. We expect Cmax to give value E, and it turns out to give V<E, but V is still likely to exceed T, or at least likelier to do so than any other choice's outcome. (Insofar as that's not true, it's Goodhart.) The optimizer's curse, i.e. V<E, does not seem to be a problem, or even relevant, because I don't ultimately care about E. Maybe the AI doesn't even tell me what E is. Maybe the AI doesn't even bother guessing what E is; it only calculates that Cmax seems to be better than any other choice.

Comment by steve2152 on Self-Supervised Learning and AGI Safety · 2019-08-21T10:45:43.614Z · score: 1 (1 votes) · LW · GW

Ah, thanks for clarifying.

The first entry on my "list of pathological things" wound up being a full blog post in length: See Self-supervised learning and manipulative predictions.

RE daemons, I wrote in that post (and have been assuming all along): "I'm assuming that we will not do a meta-level search for self-supervised learning algorithms... Instead, I am assuming that the self-supervised learning algorithm is known and fixed (e.g. "Transformer + gradient descent" or "whatever the brain does"), and that the predictive model it creates has a known framework, structure, and modification rules, and that only its specific contents are a hard-to-interpret complicated mess." The contents of a world-model, as I imagine it, are a big data structure consisting of gajillions of "concepts" and "transformations between concepts". It's a passive data structure, therefore not a "daemon" in the usual sense. Then there's a KANSI (Known Algorithm Non Self Improving) system that's accessing and editing the world model. I also wouldn't call that a "daemon"; instead I would say "This algorithm we wrote can have pathological behavior..."

Comment by steve2152 on Goodhart's Curse and Limitations on AI Alignment · 2019-08-21T00:28:23.167Z · score: 1 (1 votes) · LW · GW

It seems to me that your comment amounts to saying "It's impossible to always make optimal choices for everything, because we don't have perfect information and perfect analysis," which is true but unrelated to optimizer's curse (and I would say not in itself problematic for AGI safety). I'm sure that's not what you meant, but here's why it comes across that way to me. You seem to be setting T = E(C_max). If you set T = E(C_max) by definition, then imperfect information or imperfect analysis implies that you will always miss T by the error e, and the error will always be in the unfavorable direction.

But I don't think about targets that way. I would set my target to be something that can in principle be exceeded (T = have almost as much fun as is physically possible). Then when we evaluate the choices C, we'll find some that dramatically exceed T (i.e. way more fun than is physically possible, because we estimated the consequences wrong), and if we pick one of those, we'll still have a good chance of slightly exceeding T despite the optimizer's curse.

Comment by steve2152 on Goodhart's Curse and Limitations on AI Alignment · 2019-08-20T21:21:29.722Z · score: 1 (1 votes) · LW · GW

I get Goodhart, but i don't understand why the optimizer's curse matters at all in this context; can you explain? My reasoning is: When optimizing, you make a choice C and expect value E but actually get value V<E. But choice C may still have been the best choice. So what if the AI falls short of its lofty expectations? As long as it did the right thing, I don't care whether the AI was disappointed in how it turned out, like if we get a mere Utopia when the AI expected a super duper Utopia. All I care about is C and V, not E.
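Here's a quick simulation of that distinction (all numbers made up): the top-ranked choice's estimate E systematically exceeds its realized value V, the curse, yet taking the top-ranked choice still lands close to the best available V, so the choice was still approximately right.

```python
import random
random.seed(0)

disappointment, regret = [], []
for _ in range(2000):
    true_vals = [random.gauss(0, 1) for _ in range(10)]          # the V's
    estimates = [v + random.gauss(0, 1) for v in true_vals]      # noisy E's
    i = max(range(10), key=lambda k: estimates[k])               # pick C_max
    disappointment.append(estimates[i] - true_vals[i])           # E - V for the pick
    regret.append(max(true_vals) - true_vals[i])                 # distance from best C

print(sum(disappointment) / len(disappointment))  # clearly positive: V < E on average
print(sum(regret) / len(regret))                  # small: the pick is still a good one
```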

Comment by steve2152 on Self-supervised learning & manipulative predictions · 2019-08-20T16:43:44.321Z · score: 2 (2 votes) · LW · GW

Thank you for the links!! Sorry I missed them! I'm not sure I understand your comments though and want to clarify:

I'm going to try to rephrase what you said about example 1. Maybe the text in any individual journal article about pyrite is perplexing, but given that the system expects some article about pyrite there, it should ramp the probabilities of individual articles up or down such that the total probability of seeing a journal article about pyrite, conditional on the answer "pyrite", is 100%. (By the same token, "The following is a random number: 2113164" is, in a sense, an unsurprising text string.) I agree with you that a system that creates a sensible, self-consistent probability distribution for text strings would not have a problem with example 1 if we sample from that distribution. (Thanks.) I am concerned that we will build a system with heuristic-guided search processes, not self-consistent probability estimates, and that this system will have a problem with example 1. After all, humans are subject to the conjunction fallacy etc., I assume AGIs will be too, right? Unless we flag this as a critical safety requirement and invent good techniques to ensure it. (I updated the post in a couple places to clarify this point, thanks again.)

For gradient descent, yes they are "only updated towards what they actually observe", but they may "observe" high-level abstractions and not just low-level features. It can learn about a new high-level context in which the low-level word sequence statistics would be very different than when superficially-similar text appeared in the past. So I don't understand how you're ruling out example 2 on that basis.

I mostly agree with what you say about fixed points in principle, but with the additional complication that the system's beliefs may not reflect reality, especially if the beliefs come about through abstract reasoning (in the presence of imperfect information) rather than trial-and-error. If the goal is "No manipulative answers at all ever, please just try to predict the most likely masked bits in this data-file!"—then hopefully that trial-and-error will not happen, and in this case I think fixed points becomes a less useful framework to think about what's going on.

Comment by steve2152 on "Designing agent incentives to avoid reward tampering", DeepMind · 2019-08-14T22:08:50.751Z · score: 19 (9 votes) · LW · GW

Yeah, unless I'm missing something, this is the solution to the "easy problem of wireheading" as discussed in Abram Demski's Stable Pointers to Value II: Environmental Goals.

Still, I say kudos to the authors for making progress on exactly how to put that principle into practice.

Comment by steve2152 on Self-Supervised Learning and AGI Safety · 2019-08-11T02:34:15.078Z · score: 3 (2 votes) · LW · GW

Thanks for this really helpful comment!!

Search: I don't think search is missing from self-supervised learning at all (though I'm not sure if GPT-2 is that sophisticated). In fact, I think it will be an essential, ubiquitous part of self-supervised learning systems of the future.

So when you say "The proof of this theorem is _____", and give the system a while to think about it, it uses the time to search through its math concept space, inventing new concepts and building new connections and eventually outputting its guess.

Just because it's searching doesn't mean it's dangerous; I was just writing code to search through a string, no big deal, right? A world-model is a complicated data structure, and we can search for paths through this data structure just like any other search problem. Then when a solution to the search problem is found, the result is (somehow) printed to the terminal. I would be generically concerned here about things like (1) the search algorithm "decides" to seize more computing power to do a better search, or (2) the result printed to the terminal is manipulative. But (1) seems unlikely here, or if not, just use a known search algorithm you understand! For (2), I don't see a path by which that would happen, at least under the constraints I mentioned in the post. Or is there something else you had in mind?
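To illustrate the kind of search I mean (toy graph, all names invented): concepts are nodes, cataloged transformations are edges, and "finding a path through the world-model" is ordinary breadth-first search, with nothing agent-y about the algorithm itself.

```python
from collections import deque

world_model = {                        # concept -> transformations to other concepts
    "sneakers": ["shoe"],
    "shoe": ["clothing", "leather"],
    "clothing": ["human"],
    "leather": ["cow"],
}

def find_path(start, goal):
    """BFS for a chain of known transformations connecting two concepts."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path                # the result gets "printed to the terminal"
        for nxt in world_model.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(find_path("sneakers", "cow"))
```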

Going beyond human knowledge: When you write "it will tell you what humans have said", I'm not sure what you're getting at. I don't think this is true even with text-only data. I see three requirements to get beyond what humans know:

(1) System has optimization pressure to understand the world better than humans do

(2) System is capable of understanding the world better than humans do

(3) The interface to the model allows us to extract information that goes beyond what humans already know.

I'm pretty confident in all three of these. For example, for (1), give the system a journal article that says "We looked at the treated cell in the microscope and it appeared to be ____". The system is asked to predict the blank. It does a better job at this prediction task by understanding biology better and better, even after it understands biology better than any human. By the same token, for (3), just ask a similar question for an experiment that hasn't yet been done. For (2), I assume we'll eventually invent good enough algorithms for that. What's your take?

(I do agree that videos and images make it easier for the system to exceed human knowledge, but I don't think it's required. After all, blind people are able to have new insights.)

Ethics & FAI: I assume that a self-supervised learning system would understand concepts in philosophy and ethics just like it understands everything else. I hope that, with the right interface, we can ask questions about the compatibility of our decisions with our professed principles, arguments for and against particular principles, and so on. I'm not sure we should expect or want an oracle to outright endorse any particular theory of ethics, or any particular vision for FAI. I think we should ask more specific questions than that. Outputting code for FAI is a tricky case because even a superintelligent non-manipulative oracle is not omniscient; it can still screw up. But it could be a big help, especially if we can ask lots of detailed follow-up questions about a proposed design and always get non-manipulative answers.

Let me know if I misunderstood you, or any other thoughts, and thanks again!

Comment by steve2152 on Self-Supervised Learning and AGI Safety · 2019-08-11T01:10:17.305Z · score: 1 (1 votes) · LW · GW

Can you be more specific about the daemons you're thinking about? I had tried to argue that daemons wouldn't occur under certain circumstances, or at least wouldn't cause malign failures...

Do you accept the breakdown into "self-supervised learning phase" and "question-answering phase"? If so, in which of those two phases are you thinking that a daemon might do something bad?

I started my own list of pathological things that might happen with self-supervised learning systems, maybe I'll show you when it's ready and we can compare notes...?

Comment by steve2152 on Jeff Hawkins on neuromorphic AGI within 20 years · 2019-08-11T00:53:06.146Z · score: 5 (3 votes) · LW · GW

I did actually read his 2004 book (after writing this post), and as far as I can tell, he doesn't really seem to have changed his mind about anything, except details like "What exactly is the function of 5th-layer cortical neurons?" etc.

In particular, his 2004 book gave the impression that artificial neural nets would not appreciably improve except by becoming more brain-like. I think most neutral observers would say that we've had 15 years of astounding progress while stealing hardly any ideas from the brain, so maybe understanding the brain isn't required. Well, he doesn't seem to accept that argument. He still thinks the path forward is brain-inspired. I guess his argument would be that today's artificial NNs are neat, but they don't have the kind of intelligence that counts, i.e. the type of understanding and world-model creation that the neocortex does, and that they won't get that kind of intelligence except by stealing ideas from the neocortex. Something like that...

Comment by steve2152 on In defense of Oracle ("Tool") AI research · 2019-08-07T19:32:21.094Z · score: 1 (1 votes) · LW · GW

Maybe there are other definitions, but the way I'm using the term, what you described would definitely be an agent. An oracle probably wouldn't have an internet connection at all, i.e. it would be "boxed". (The box is just a second layer of protection ... The first layer of protection is that a properly-designed safe oracle, even if it had an internet connection, would choose not to use it.)

Comment by steve2152 on In defense of Oracle ("Tool") AI research · 2019-08-07T19:20:01.025Z · score: 7 (3 votes) · LW · GW

Thank you, those are very interesting references, and very important points! I was arguing that solving a certain coordination problem is even harder than solving a different coordination problem, but I'll agree that this argument is moot if (as you seem to be arguing) it's utterly impossible to solve either!

Since you've clearly thought a lot about this, have you written up anything about very-long-term scenarios where you see things going well? Are you in the camp of "we should make a benevolent dictator AI implementing CEV", or "we can make task-limited-AGI-agents and coordinate to never make long-term-planning-AGI-agents", or something else?

Comment by steve2152 on In defense of Oracle ("Tool") AI research · 2019-08-07T18:14:36.046Z · score: 5 (3 votes) · LW · GW

Thanks, this is really helpful! For 1,2,4, this whole post is assuming, not arguing, that we will solve the technical problem of making safe and capable AI oracles that are not motivated to escape the box, give manipulative answers, send out radio signals with their RAM, etc. I was not making the argument that this technical problem is easy ... I was not even arguing that it's less hard than building a safe AI agent! Instead, I'm trying to counter the argument that we shouldn't even bother trying to solve the technical problem of making safe AI oracles, because oracles are uncompetitive.

...That said, I do happen to think there are paths to making safe oracles that don't translate into paths to making safe agents (see Self-supervised learning and AGI safety), though I don't have terribly high confidence in that.

Can you find a link to where "Christiano dismisses Oracle AI"? I'm surprised that he has done that. After all, he coauthored "AI Safety via Debate", which seems to be addressed primarily (maybe even exclusively) at building oracles (question-answering systems). Your answer to (3) is enlightening, thank you, and do you have any sense for how widespread this view is and where it's argued? (I edited the post to add that people going for benevolent dictator CEV AGI agents should still endorse oracle research because of the bootstrapping argument.)

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-26T13:47:40.919Z · score: 3 (2 votes) · LW · GW

Just as if it were looking into the universe from outside it, it would presumably be able to understand anything in the world, as a (third-person) fact about the world, including that humans have self-awareness, that there is a project to build a self-unaware AI without it, and so on. We would program it with strict separation between the world-model and the reflective, meta-level information about how the world-model is being constructed and processed. Thus the thought "Maybe they're talking about me" cannot occur, there's nothing in the world-model to grab onto as a referent for the word "me". Exactly how this strict separation would be programmed, and whether you can make a strong practical world-modeling system with such a separation, are things I'm still trying to understand.

A possible (not realistic) example is: We enumerate a vast collection of possible world-models, which we construct by varying any of a vast number of adjustable parameters, describing what exists in the world, how things relate to each other, what's going on right now, and so on. Nothing in any of the models has anything in it with a special flag labeled "me", "my knowledge", "my actions", etc., by construction. Now, we put a probability distribution over this vast space of models, and initialize it to be uniform (or whatever). With each timestep of self-supervised learning, a controller propagates each of the models forward, inspects the next bit in the datastream, and adjusts the probability distribution over models based on whether that new bit is what we expected. After watching 100,000 years of YouTube videos and reading every document ever written, the controller outputs the one best world-model. Now we have a powerful world-model, in which there are deep insights about how everything works. We can use this world-model for whatever purpose we like. Note that the "learning" process here is a dumb thing that just uses the transition rules of the world-models, it doesn't involve setting up the world-models themselves to be capable of intelligent introspection. So it seems to me like this process ought to generate a self-unaware world model.
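In code, a toy version of this scheme might look like the following, where each "world-model" is collapsed down to a single parameter, its predicted probability that the next bit is 1, and the controller just does a Bayesian reweighting (all details invented):

```python
# Each "model" is just a predicted probability that the next bit is 1.
models = [0.1, 0.3, 0.5, 0.7, 0.9]
probs = [1 / len(models)] * len(models)   # uniform initial distribution

data = [1, 1, 0, 1, 1, 1, 0, 1]           # the incoming datastream

for bit in data:
    # Reweight each model by how well it predicted the observed bit...
    likelihoods = [m if bit == 1 else 1 - m for m in models]
    probs = [p * l for p, l in zip(probs, likelihoods)]
    # ...and renormalize.
    total = sum(probs)
    probs = [p / total for p in probs]

# The controller outputs the one best world-model.
best = models[max(range(len(models)), key=lambda i: probs[i])]
print(best)
```

Note that nothing in this loop gives the models themselves any ability to introspect; the "learning" is entirely in the dumb outer reweighting step, which is the property I'm after.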

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-25T01:42:21.256Z · score: 1 (1 votes) · LW · GW

Just to be clear, when OpenAI trained GPT-2, I am not saying that GPT-2 is a known and well-understood algorithm for generating text, but rather that SGD (Stochastic Gradient Descent) is a known and well-understood algorithm for generating GPT-2. (I mean, OK sure, ML researchers are still studying SGD, but its inner workings are not an impenetrable mystery the way that GPT-2's are.)

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-25T01:29:05.449Z · score: 3 (2 votes) · LW · GW

OK, so I was saying here that software can optimize for something (e.g. predicting a string of bits on the basis of other bits) and it's by default not particularly dangerous, as long as the optimization does not involve an intelligent foresight-based search through real-world causal pathways to reach the desired goal. My argument for this was (1) Such a system can do Level-1 optimization but not Level-2 optimization (with regards to real-world causal pathways unrelated to implementing the algorithm as intended), and (2) only the latter is unusually dangerous. From your response, it seems like you agree with (1) but disagree with (2). Is that right? If you disagree with (2), can you make up a scenario of something really bad and dangerous, something that couldn't happen with today's software, something like a Global Catastrophic Risk, that is caused by a future AI that is optimizing something but is not more specifically using a world-model to do an intelligent search through real-world causal pathways towards a desired goal?

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-24T19:02:00.430Z · score: 5 (3 votes) · LW · GW

On further reflection, you're right, the Solomonoff induction example is not obvious. I put a correction in my post, thanks again.

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-24T13:57:57.892Z · score: 8 (2 votes) · LW · GW

Thanks for your patience, I think this is important and helpful to talk through (hope it's as helpful for you as for me!)

Let's introduce two terminologies I made up. First, the thing I mentioned above:

  • Non-optimization means that "an action leading to a "good" consequence (according to a predetermined criterion) happens no more often than chance" (e.g. a rock)
  • Level-1 optimization means "an action leading to a "good" consequence happens no more often than chance at first, but once it's stumbled upon, it tends to be repeated in the future". (e.g. bacteria)
  • Level-2 optimization means "an action leading to a "good" consequence is taken more often than chance from the start, because of foresight and planning". (e.g. human)
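To make the distinction concrete, here is a toy sketch of Level-1 optimization (the action space and "good consequence" criterion are made up for illustration). A Level-1 optimizer acts at random until it stumbles on a good action, then repeats it; a Level-2 optimizer would instead use foresight to pick the good action from the start, which this code deliberately does not do:

```python
import random

def level1_optimizer(actions, is_good, steps, rng):
    """Level-1: act at random; once an action's consequence turns out
    to be "good", keep repeating that action. No foresight, no world-model."""
    trace, remembered = [], None
    for _ in range(steps):
        action = remembered if remembered is not None else rng.choice(actions)
        if is_good(action):
            remembered = action
        trace.append(action)
    return trace

rng = random.Random(0)
actions = list(range(10))
is_good = lambda a: a == 7  # the predetermined criterion

trace = level1_optimizer(actions, is_good, 1000, rng)
# Before the first success the actions are pure chance; after it, action 7 repeats.
```

A "non-optimizer" in this taxonomy would be the same loop with the `remembered` mechanism deleted: good actions occur, but no more often than chance.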

Second, when you run a program:

  • Algorithm Land is where you find abstract mathematical entities like "variables", "functions", etc.
  • Real World is that place with atoms and stuff.

Now, when you run a program, you can think of what's happening in Algorithm Land (e.g. a list of numbers is getting sorted) and what's happening in the Real World (e.g. transistors are switching on and off). It's really always going to be both at once.

And now let's simplify things greatly by putting aside the case of world-modeling programs, which have a (partial, low-resolution) copy of the Real World inside Algorithm Land. Instead, let's restrict our attention to a chess-playing program or any other non-world-modeling program.

Now, in this case, when we think about Level-2 optimization, the foresight and planning involved entail searching exclusively through causal pathways that are completely inside Algorithm Land. (Why? Because without a world model, it has no way to reason about Real-World causal pathways.) In this case, I say there isn't really anything much to worry about.

Why not worry? Think about classic weird AGI disaster scenarios. For example, the algorithm is optimizing for the "reward" value in register 94, so it hacks its RAM to overwrite the register with the biggest possible number, then seizes control of its building and the power grid to ensure that it won't get turned off, then starts building bigger RAMs, designing killer nanomachines, and on and on. Note that ALL those things (1) involve causal pathways in the Real World (even if the action and consequence are arguably in Algorithm Land) and (2) would be astronomically unlikely to occur by random chance (which is what happens without Level-2 optimization). (I won't say that nothing can go awry with Level-1 optimization—I have great respect for bacteria—but it's a much easier situation to keep under control than rogue Level-2 optimization through Real-World causal pathways.)

Again, things that happen in Algorithm Land are also happening in the Real World, but the mapping is kinda arbitrary. High-impact things in Algorithm Land are not high-impact things in the Real World. For example, using RAM to send out manipulative radio signals is high-impact in the Real World, but just a random meaningless series of operations in Algorithm Land. Conversely, an ingeniously-clever chess move in Algorithm Land is just a random activation of transistors in the Real World.

(You do always get Level-1 optimization through Real-World causal pathways, with or without a world model. And you can get Level-2 optimization through Real-World causal pathways, but a necessary requirement seems to be an algorithm with a world-model and self-awareness, i.e. knowledge that there is a relation between things in Algorithm Land and things in the Real World.)

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-23T18:28:43.636Z · score: 3 (2 votes) · LW · GW

A self-unaware system would not be capable of one particular type of optimization task:

Take real-world actions ("write bit 0 into register 11") on the basis of anticipating their real-world consequences (human will read this bit and then do such-and-such).

This thing is an example of an optimization task, and it's a very dangerous one. Maybe it's even the only type of really dangerous optimization task! (This might be an overstatement, not sure.) Not all optimization tasks are in this category, and a system can be intelligent by doing other different types of optimization tasks.

A self-unaware system certainly is an optimizer in the sense that it does other (non-real-world) optimization tasks, in particular, finding the string of bits that would be most likely to follow a different string of bits on a real-world webpage.

As always, sorry if I'm misunderstanding you, thanks for your patience :-)

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-23T18:04:10.673Z · score: 3 (2 votes) · LW · GW

I think we're on the same page! As I noted at the top, this is a brainstorming post, and I don't think my definitions are quite right, or that my arguments are airtight. The feedback from you and others has been super-helpful, and I'm taking that forward as I search for a more rigorous version of this, if it exists!! :-)

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-23T17:59:52.065Z · score: 2 (2 votes) · LW · GW

Thanks for this helpful comment. The architecture I'm imagining is: Model-choosing code finds a good predictive world-model out of a vast but finite space of possible world-models, by running SGD on 100,000 years of YouTube videos (or whatever). So the model-chooser is explicitly an optimizer, the engineer who created the model-chooser is also explicitly an optimizer, and the eventual predictive world-model is an extremely complicated entity with superhuman world-modeling capabilities, and I am loath to say anything about what it is or what it's going to do.

Out of these three, (1) the engineer is not problematic because it's a human, (2) the model-chooser is not problematic because it's (I assume and expect) a known and well-understood algorithm (e.g. Transformer), and thus (3) the eventual predictive world-model is the only thing we're potentially worried about. My thought is that we can protect ourselves from the predictive world-model doing problematic consequentialist planning by scheming to give it no information whatsoever about how it can affect the world, not even that it exists or what actions it is taking, such that if it has problematic optimization tendencies, it is unable to act on them.

(In regards to (1) more specifically, if a company is designing a camera, the cameras with properties that the engineers like are preferentially copied by the engineers into later versions. Yes, this is a form of optimization, but nobody worries about it more than anything else in life. Right?)

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-23T15:16:44.651Z · score: 2 (2 votes) · LW · GW

Well, it takes two things: (1) Self-knowledge ("I wrote '0' into register X", "I am thinking about turtles", etc. being in the world-model) and (2) knowledge of the causal consequences of that (the programmers see the '0' in register X and then change their behavior). With both of those, the system can learn causal links between its own decisions and the rest of the world, and can therefore effect real-world consequences.

Out of these two options, I think you're proposing to cut off path (2), which I agree is very challenging. I am proposing to cut off path (1) instead, and not worry about path (2). Thus it's a cybersecurity-type hardware/software design challenge, not a data-sanitization challenge.

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-23T14:02:56.997Z · score: 1 (1 votes) · LW · GW

I'm not sure what you have in mind here; to me, optimization requires some causal pathway from "Action X has consequence Y" to "Take Action X more often than chance".

A system can optimize if it has a way to store specially-flagged information in the form of "I took action X, and it had consequence Y" (or "if I take action X, it will have consequence Y"), and then bring that flagged information to bear when taking actions. A population of bacteria can do this! Evolution flags its "decisions" (mutations), storing that information in DNA, and then "consults" the DNA when "deciding" what the gene distribution will be in the next generation. A self-unaware system, lacking any "I" or "my decision" or "my action" flags in either its internal or external universe, would be missing the causal links necessary to optimize anything. Right?

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-23T12:59:27.831Z · score: 9 (3 votes) · LW · GW

Thanks for that, this is helpful. Yes, same genre for sure. According to Eliezer's response to Holden, tool AI is a synonym of "non-self-improving oracle". Anyway, whatever we call it, my understanding of the case against tool AI is that (1) we don't know how to make a safe tool AI (part of Eliezer's response), and (2) even if we could, it wouldn't be competitive (Gwern's response).

I'm trying to contribute to this conversation by giving an intuitive argument for how I'm thinking that both these objections can be overcome, and I'm also trying to be more specific about how the tool AI might be built and how it might work.

More specifically, most (though not 100%) of the reasons that Gwern said tool AI would be uncompetitive are in the category of "self-improving systems are more powerful". So that's why I specifically mentioned that a tool AI can be self-improving ... albeit indirectly and with a human in the loop.

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-23T01:51:25.230Z · score: 1 (1 votes) · LW · GW

This is helpful, thanks. It sounds very reasonable to say "if it's just programmed to build a model and query it, it doesn't matter if it's self-aware". And it might be true, although I'm still a bit uncertain about what can happen when the model-builder includes itself in its models. There are also questions of what properties can be easily and rigorously verified. My hope here is that we can flag some variables as "has information about the world-model" and other variables as "has information about oneself", and we can do some kind of type-checking or formal verification that they don't intermingle. If something like that is possible, it would seem to be a strong guarantee of safety even if we didn't understand how the world-modeler worked in full detail.

RE your last paragraph: I don't think there is any point ever when we will have a safe AI and no one is incentivized (or even curious) to explore alternate designs that are not known to be safe (but which would be more powerful if they worked). So we need to get to some point of development, and then sound the buzzer and start relying 100% on other solutions, whether it's OpenAI becoming our benevolent world dictators, or hoping that our AI assistants will tell us what to do next, or who knows what. I think an oracle that can answer arbitrary questions and invent technology is good enough for that. Once we're there, I think we'll be more than ready to move to that second stage...

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-23T01:23:16.069Z · score: 3 (2 votes) · LW · GW

Thanks! A lot of my thinking here is that I just really believe that, once people find the right neural architecture, self-supervised learning on the internet is going to rocket-launch all the way to AGI and beyond, leaving little narrow AI services in the dust.

The way I read it, Gwern's tool-AI article is mostly about self-improvement. I'm proposing that the system will be able to guide human-in-the-loop "self"-improvement. That's kinda slower, but probably good enough, especially since eventually we can (hopefully) ask the oracle how to build a safe agent.

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-23T01:09:54.593Z · score: 4 (3 votes) · LW · GW

Imagine three levels, in order of increasing concern: (1) system does self-preserving action sometimes randomly, no more often than chance. (2) system does self-preserving action randomly, but once it sees the good consequences, starts doing it systematically. (3) system does self-preserving action systematically from the start, because it had foresight and motivation. Humans and problematic AIs are up at (3), a population of bacteria undergoing evolution are at (2), and a self-unaware oracle is at (1).

Comment by steve2152 on The Self-Unaware AI Oracle · 2019-07-23T00:56:48.784Z · score: 4 (3 votes) · LW · GW

For deterministic computation: What I was trying to get at is that a traditional RL agent does some computation, gets a new input based on its actions and environment, does some more computation, and so on. (I admit that I didn't describe this well. I edited a bit.)

Your argument about Solomonoff induction is clever but I feel like it's missing the point. Systems with some sense of self and self-understanding don't generally simulate themselves or form perfect models of themselves; I know I don't! Here's a better statement: "I am a predictive world-model, I guess I'm probably implemented on some physical hardware somewhere." This is a true statement, and the system can believe that statement without knowing what the physical hardware is (then it can start reasoning about what the physical hardware is, looking for news stories about AI projects). I'm proposing that we can and should build world-models that don't have this type of belief in its world model.

What I really have in mind is: There's a large but finite space of computable predictive models (given a bit-stream, predict the next bit). We run a known algorithm that searches through this space to find the model that best fits the internet. This model is full of insightful, semantic information about the world, as this helps it make predictions. Maybe if we do it right, the best model would not be self-reflective, not knowing what it was doing as it did its predictive thing, and thus unable to reason about its internal processes or recognize causal connections between that and the world it sees (even if such connections are blatant).

One intuition is: An oracle is supposed to just answer questions. It's not supposed to think through how its outputs will ultimately affect the world. So, one way of ensuring that it does what it's supposed to do, is to design the oracle to not know that it is a thing that can affect the world.

Comment by steve2152 on 1hr talk: Intro to AGI safety · 2019-07-17T18:54:37.487Z · score: 3 (2 votes) · LW · GW

Fixed, thanks!

Comment by steve2152 on Jeff Hawkins on neuromorphic AGI within 20 years · 2019-07-16T15:24:05.116Z · score: 3 (2 votes) · LW · GW

Hmm, it's true that a traditional RNN can't imitate the detailed mechanism, but I think it can imitate the overall functionality. (But probably in a computationally inefficient way—multiple time-steps and multiple nodes.) I'm not 100% sure.

Comment by steve2152 on Jeff Hawkins on neuromorphic AGI within 20 years · 2019-07-16T15:16:07.365Z · score: 2 (2 votes) · LW · GW

According to A Recipe for Training NNs, model ensembles stop being helpful at ~5 models. But that's when they all have the same inputs and outputs. The more brain-like thing is to have lots of models whose inputs comprise various different subsets of both the inputs and the other models' outputs.

...But then, you don't really call it an "ensemble", you call it a "bigger more complicated neural architecture", right? I mean, I can take a deep NN and call it "six different models, where the output of model #1 is the input of model #2 etc.", but no one in ML would say that, they would call it a single six-layer model...

Comment by steve2152 on A shift in arguments for AI risk · 2019-07-11T15:10:03.065Z · score: 3 (2 votes) · LW · GW

In a continuous scenario, AI remains at the same level of capability long enough for us to gain experience with deployed systems of that level, witness small accidents, and fix any misalignment. The slower the scenario, the easier it is to do this. In a moderately discontinuous scenario, there could be accidents that kill thousands of people. But it seems to me that a very strong discontinuity would be needed to get a single moment in which the AI causes an existential catastrophe.

I agree that slower makes the problem easier, but disagree about how slow is slow enough. I have pretty high confidence that a 200-year takeoff is slow enough; faster than that, I become increasingly unsure.

For example: one scenario would be that there are years, even decades, in which worse and worse AGI accidents occur, but the alignment problem is very hard and no one can get it right (or: aligned AGIs are much less powerful and people can't resist tinkering with the more powerful unsafe designs). As each accident occurs, there's bitter disagreement around the world about what to do about this problem and how to do it, and everything becomes politicized. Maybe AGI research will be banned in some countries, but maybe it will be accelerated in other countries, on the theory that (for example) smarter systems and better understanding will help with alignment. And thus there would be more accidents and bigger accidents, until sooner or later there's an existential catastrophe.

I haven't thought about the issue super-carefully ... just a thought ...

Comment by steve2152 on Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 1) · 2019-07-04T14:51:39.798Z · score: 3 (2 votes) · LW · GW

One way to think about gradient descent is that if there's an N-dimensional parameter space on which you build a grid of M^N points and do a grid search for the minimum, well, you can accomplish the same selection task with O(M) steps of gradient descent (at least if there's no local minimum). So should we say that M steps of gradient descent gives you O(N log M) bits of optimization? Or something like that? I'm not sure.
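As a sanity check on the arithmetic (with made-up numbers): selecting the single best point out of M^N grid candidates narrows the search space by a factor of M^N, which is log2(M^N) = N·log2(M) bits:

```python
import math

def bits_of_optimization(n_dims, points_per_dim):
    # Picking 1 point out of M**N candidates = log2(M**N) = N * log2(M) bits.
    return n_dims * math.log2(points_per_dim)

# Example: N = 10 dimensions, M = 1024 grid points per dimension.
# Grid search examines 1024**10 points; gradient descent takes ~O(M) = ~1024 steps.
print(bits_of_optimization(10, 1024))  # 100.0
```

So on these toy numbers, ~1024 descent steps buy the same 100 bits of selection power that 1024^10 grid evaluations would, which is the asymmetry the comment is pointing at.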

Comment by steve2152 on Is AlphaZero any good without the tree search? · 2019-07-01T02:01:15.037Z · score: 5 (3 votes) · LW · GW

Thanks for your answer! But I'm afraid I'm confused on both counts.

I couldn't, and still can't, find "ELO for just the NN" in the paper... :-( I checked the arxiv version and preprint version.

As for "actual play doesn't use MCTS at all", well the authors say it does use MCTS... Am I misunderstanding the authors, or are you saying that the "thing the authors call MCTS" is not actually MCTS? (For example, I understand that it's not actually random.)

Comment by steve2152 on Let's talk about "Convergent Rationality" · 2019-06-30T02:12:27.534Z · score: 3 (2 votes) · LW · GW

Thanks, that was really helpful!! OK, so going back to my claim above: "there is no systematic force of any kind whatsoever on an AI's top-level normative system". So far I have six exceptions to this:

  1. If an agent has a "real-world goal" (utility function on future-world-states), we should expect increasingly rational goal-seeking behavior, including discovering and erasing hardcoded irrational behavior (with respect to that goal), as described by dxu. But I'm not counting this as an exception to my claim because the goal is staying the same.
  2. If an agent has a set of mutually-inconsistent goals / preferences / inclinations, it may move around within the convex hull (so to speak) of these goals / preferences / inclinations, as they compete against each other. (This happens in humans.) And then, if there is at least one preference in that set which is a "real-world goal", it's possible (though not guaranteed) that that preference will come out on top, leading to (1) above. And maybe there's a "systematic force" pushing in some direction within this convex hull---i.e., it's possible that, when incompatible preferences are competing against each other, some types are inherently likelier to win the competition than other types. I don't know which ones that would be.
  3. In the (presumably unusual) case that an agent has a "self-defeating preference" (i.e. a preference which is likelier to be satisfied by the agent not having that preference, as in dxu's awesome SHA example), we should expect the agent to erase that preference.
  4. As capybaralet notes, if there is evolution among self-reproducing AIs (god help us all), we can expect the population average to move towards goals promoting evolutionary fitness.
  5. Insofar as there is randomness in how agents change over time, we should expect a systematic force pushing towards "high-entropy" goals / preferences / inclinations (i.e., ones that can be implemented in lots of different ways).
  6. Insofar as the AI is programming its successors, we should expect a systematic force pushing towards goals / preferences / inclinations that are easy to program & debug & reason about.
  7. The human programmers can shut down the AI and edit the raw source code.

Agree or disagree? Did I miss any?

Comment by steve2152 on Let's talk about "Convergent Rationality" · 2019-06-28T19:04:18.650Z · score: 1 (1 votes) · LW · GW

What are "decision theory self-modification arguments"? Can you explain or link?

Comment by steve2152 on Cold fusion: real after all? · 2019-06-28T13:52:15.202Z · score: 8 (2 votes) · LW · GW

I wrote this 10,000-word blog post arguing that cold fusion is not real after all, on the basis of the experimental evidence. (The rest of the blog, 30 posts or so, spells out the argument that cold fusion is not real, on the basis of our knowledge of theoretical physics.) Obviously the conclusion is no surprise to most people here ... but I still think the nitty-gritty details of these arguments are interesting and are somewhat hard to find elsewhere on the internet.

Comment by steve2152 on Let's talk about "Convergent Rationality" · 2019-06-28T13:47:40.210Z · score: 3 (2 votes) · LW · GW

I am imagining a flat plain of possible normative systems (goals / preferences / inclinations / whatever), with red zones sprinkled around marking those normative systems which are dangerous. CRT (as I understand it) says that there is a basin with consequentialism at its bottom, such that there is a systematic force pushing systems towards that. I'm imagining that there's no systematic force.

So in my view (flat plain), a good AI system is one that starts in a safe place on this plain, and then doesn't move at all ... because if you move in any direction, you could randomly step into a red area. This is why I don't like misaligned subsystems---it's a step in some direction, any direction, away from the top-level normative system. Then "Inner optimizers / daemons" is a special case of "misaligned subsystem", in which the random step happened to be into a red zone. Again, CRT says (as I understand it) that a misaligned subsystem is more likely than chance to be an inner optimizer, whereas I think a misaligned subsystem can be an inner optimizer but I don't specify the probability of that happening.

Leaving aside what other people have said, it's an interesting question: are there relations between the normative system at the top level and the normative systems of its subsystems? There's obviously good reason to expect that consequentialist systems will tend to create consequentialist subsystems, and that deontological systems will tend to create deontological subsystems, etc. I can kinda imagine cases where a top-level consequentialist would sometimes create a deontological subsystem, because it's (I imagine) computationally simpler to execute behaviors than to seek goals, and sub-sub-...-subsystems need to be very simple. The reverse seems less likely to me. Why would a top-level deontologist spawn a consequentialist subsystem? Probably there are reasons...? Well, I'm struggling a bit to concretely imagine a deontological advanced AI...

We can ask similar questions at the top-level. I think about normative system drift (with goal drift being a special case), buffeted by a system learning new things and/or reprogramming itself and/or getting bit-flips from cosmic rays etc. Is there any reason to expect the drift to systematically move in a certain direction? I don't see any reason, other than entropy considerations (e.g. preferring systems that can be implemented in many different ways). Paul Christiano talks about a "broad basin of attraction" towards corrigibility but I don't understand the argument, or else I don't believe it. I feel like, once you get to a meta-enough level, there stops being any meta-normative system pushing the normative system in any particular direction.

So maybe the stronger version of not-CRT is: "there is no systematic force of any kind whatsoever on an AI's top-level normative system, with the exceptions of (1) entropic forces, and (2) programmers literally shutting down the AI, editing the raw source code, and trying again". I (currently) would endorse this statement. (This is also a stronger form of orthogonality, I guess.)

Comment by steve2152 on Let's talk about "Convergent Rationality" · 2019-06-28T10:32:40.627Z · score: 1 (1 votes) · LW · GW

I think I agree with this. The system is dangerous if its real-world output (pixels lit up on a display, etc.) is optimized to achieve a future-world-state. I guess that's what I meant. If there are layers of processing that sit between the optimization process output and the real-world output, that seems like very much a step in the right direction. I dunno the details, it merits further thought.

Comment by steve2152 on 1hr talk: Intro to AGI safety · 2019-06-21T11:10:50.790Z · score: 5 (2 votes) · LW · GW

Hi John, Are you saying that there should be more small teams in AGI safety rather than increasing the size of the "big" teams like OpenAI safety group and MIRI? Or are you saying that AGI safety doesn't need more people period?

Looks like MIRI is primarily 12 people. Does that count as "large"? My impression is that they're not all working together on exactly the same narrow project. So do they count as 1 "team" or more than one?

The FLI AI grants go to a diverse set of little university research groups. Is that the kind of thing you're advocating here?

ETA: The link you posted says that a sufficiently small team is: "I’d suspect this to be less than 15 people, but would not be very surprised if this was number was around 100 after all." If we believe that (and I wouldn't know either way), then there are no AGI safety teams on Earth that are "too large" right now.

Comment by steve2152 on Editor Mini-Guide · 2019-06-14T02:31:01.029Z · score: 1 (1 votes) · LW · GW

Can you please update on footnotes? I notice people have written posts with footnotes, so it must be possible, but I can't figure out how. Thanks in advance

Comment by steve2152 on Let's talk about "Convergent Rationality" · 2019-06-13T01:08:56.652Z · score: 5 (4 votes) · LW · GW

I haven't seen anyone argue for CRT the way you describe it. I always thought the argument was that we are concerned about "rational AIs" (I would say more specifically, "AIs that run searches through a space of possible actions, in pursuit of a real-world goal"), because (1) We humans have real-world goals ("cure Alzheimer's" etc.) and the best way to accomplish a real-world goal is generally to build an agent optimizing for that goal (well, that's true right up until the agent becomes too powerful to control, and then it becomes catastrophically false), (2) We can try to build AIs that are not in this category, but screw up*, (3) Even if we here all agree to not build this type of agent, it's hard to coordinate everyone on earth to never do it forever. (See also: Rohin's two posts on goal-directedness.)

In particular, when Eliezer argued a couple years ago that we should be mainly thinking about AGIs that have real-world-anchored utility functions (e.g. here or here) I've always fleshed out that argument as: "...This type of AGI is the most effective and powerful type of AGI, and we should assume that society will keep making our AIs more and more effective and powerful until we reach that category."

*(Remember, any AI is running searches through some space in pursuit of something, otherwise you would never call it "intelligence". So one can imagine that the intelligent search may accidentally get aimed at the wrong target.)

Comment by steve2152 on Risks from Learned Optimization: Introduction · 2019-06-01T13:01:11.739Z · score: 12 (12 votes) · LW · GW

This paper replaces a normal feedforward image classifier with a mesa-optimizing one (build generative models of different possibilities and pick the one that best matches the data). The result was better and far more human-like than a traditional image classifier, e.g. the same examples are ambiguous to the model that are ambiguous to humans and vice-versa. I also understand that the human brain is very big into generative modeling of everything. So I expect that ML systems of the future will approach 100% mesa-optimizers, while non-optimizing feedforward NN's will become rare. This post is a good framework and I'm looking forward to follow-ups!

Comment by steve2152 on A Sarno-Hanson Synthesis · 2018-08-14T00:29:17.809Z · score: 1 (1 votes) · LW · GW

My view is that the "constrict blood vessels and tense muscles" action (or whatever it is) is less like moving your finger, and more like speeding up your heart rate: sorta consciously controllable but not by a simple and direct act of willful control. I personally was talking to my hands rather than talking to my subconscious, but whatever, either way, I see it as a handy trick to send out the right nerve signals. Again like how if you want to release adrenaline, you think of something scary, you don't think "Adrenal gland, Activate!" (Unless you've specifically practiced.)

I guess where I differ in emphasis from you is that I like to talk about how an important part of the action is really happening at the location of the pain, even if the cause is in the brain. I find that people talking about "psychosomatic" tend to be cutting physiology out of the loop altogether, though you didn't quite say that yourself. The other different emphasis is whether there's any sense whatsoever in which some part of the person wants the pain to happen because of some ulterior motive. I mean, that kind of story very much did not resonate with my experience. My RSI flare-ups were always pretty closely associated with using my hands. I guess I shouldn't over-generalize from my own experience. Shrug.

"Constricting blood vessels" seems like a broad enough mechanism to be potentially applicable to back spasms, RSI, IBS, ulcers, and all the other superficially different indications we've all heard of. But I don't know much about physiology or vasculature, and I don't put too much stock in that exact description. Could also be something about nerves I guess?

Comment by steve2152 on A Sarno-Hanson Synthesis · 2018-08-13T02:51:39.442Z · score: 3 (2 votes) · LW · GW

Ten years ago I had a whole miserable year of Repetitive Strain Injury (and other things too) and then had a miraculous one-day recovery after reading Sarno's book. But I didn't (and still don't) agree with the way Sarno (and you) describe what the mechanism is and what the emotions are doing, at least for my own experience and the couple other people I know personally who had similar RSI experiences.

I think it's possible to use muscles in a way that's painful: something like having the muscles be generally tense and their blood supply constricted. I think state-of-mind can cause muscles to operate in this painful blood-supply-constricted mode, and that one such state of mind is the feeling "These muscles here are super-injured and when I use them it's probably making things worse."

If you think about it, it's easy to think of lots of things that could predispose people to get stuck in this particular vicious cycle, including personality type, general stress level, factual beliefs about the causes and consequences of chronic pain, and so on. But in the end, I think it's this pretty specific thing, not a catch-all generic mechanism of subconscious expression or whatever like you seem to be thinking.