Book review: Rethinking Consciousness 2020-01-10T20:41:27.352Z · score: 39 (13 votes)
Predictive coding & depression 2020-01-03T02:38:04.530Z · score: 19 (4 votes)
Predictive coding = RL + SL + Bayes + MPC 2019-12-10T11:45:56.181Z · score: 23 (9 votes)
Thoughts on implementing corrigible robust alignment 2019-11-26T14:06:45.907Z · score: 16 (5 votes)
Thoughts on Robin Hanson's AI Impacts interview 2019-11-24T01:40:35.329Z · score: 16 (10 votes)
steve2152's Shortform 2019-10-31T14:14:26.535Z · score: 4 (1 votes)
Human instincts, symbol grounding, and the blank-slate neocortex 2019-10-02T12:06:35.361Z · score: 25 (10 votes)
Self-supervised learning & manipulative predictions 2019-08-20T10:55:51.804Z · score: 17 (6 votes)
In defense of Oracle ("Tool") AI research 2019-08-07T19:14:10.435Z · score: 19 (9 votes)
Self-Supervised Learning and AGI Safety 2019-08-07T14:21:37.739Z · score: 21 (10 votes)
The Self-Unaware AI Oracle 2019-07-22T19:04:21.188Z · score: 23 (8 votes)
Jeff Hawkins on neuromorphic AGI within 20 years 2019-07-15T19:16:27.294Z · score: 155 (56 votes)
Is AlphaZero any good without the tree search? 2019-06-30T16:41:05.841Z · score: 27 (8 votes)
1hr talk: Intro to AGI safety 2019-06-18T21:41:29.371Z · score: 29 (11 votes)


Comment by steve2152 on The two-layer model of human values, and problems with synthesizing preferences · 2020-01-25T03:02:30.128Z · score: 9 (4 votes) · LW · GW

I definitely agree that the player vs character distinction is meaningful, although I would define it a bit differently.

I would identify it with cortical vs subcortical, a.k.a. neocortex vs everything else. (...with the usual footnotes, e.g. the hippocampus counts as "cortical" :-D)

The cortical system basically solves the following problem:

Here is (1) a bunch of sensory & other input data, in the form of spatiotemporal patterns of spikes on input neurons, (2) occasional labels about what's going on right now (e.g. "something good / bad / important is happening"), (3) a bunch of outgoing neurons. Your task is to build a predictive model of the inputs, and use that to choose signals to send into the outgoing neurons, to make more good things happen.
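For concreteness, here's that problem statement in the shape of a toy loop. This is entirely my own illustrative sketch; `ToyWorld` and `ToyModel` are made-up stand-ins, not models of real neural circuits:

```python
import random

class ToyWorld:
    """Made-up environment: emits noisy 'sensory' inputs, plus a good/bad
    label that depends on the previous outgoing signal."""
    def __init__(self):
        self.last_output = 0

    def observe(self):
        inputs = [random.random() for _ in range(3)]  # stand-in for spike patterns
        label = 1 if self.last_output == 1 else 0     # "something good is happening"
        return inputs, label

    def act(self, output):
        self.last_output = output

class ToyModel:
    """Made-up learner: tracks which outgoing signal tends to precede
    good labels, and mostly chooses that one."""
    def __init__(self):
        self.value = {0: 0.0, 1: 0.0}
        self.prev_output = 0

    def update(self, inputs, label):
        v = self.value[self.prev_output]
        self.value[self.prev_output] = v + 0.1 * (label - v)

    def choose_output(self):
        best = 1 if self.value[1] >= self.value[0] else 0
        if random.random() < 0.1:   # occasional exploration
            best = 1 - best
        self.prev_output = best
        return best

def run(steps=500):
    random.seed(0)
    world, model, good = ToyWorld(), ToyModel(), 0
    for _ in range(steps):
        inputs, label = world.observe()   # (1) input data, (2) occasional labels
        model.update(inputs, label)       # build a (crude) predictive model
        world.act(model.choose_output())  # (3) choose signals for outgoing neurons
        good += label
    return good
```

After enough steps the loop mostly picks the output that makes good labels happen, i.e. it lands well above the ~250 good labels you'd expect from choosing outputs at random.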

The result is our understanding of the world, our consciousness, imagination, memory, etc. Anything we do that requires understanding the world is done by the cortical system. This is your "character".

The subcortical system is responsible for everything else your brain does to survive, including providing the "labels" mentioned above (that something good / bad / important / whatever is happening right now).

For example, take the fear-of-spiders instinct. If there is a black scuttling blob in your visual field, there's a subcortical vision system (in the superior colliculus) that pattern-matches that moving blob to a genetically-coded template, and thus activates a "Scary!!" flag. The cortical system sees the flag, sees the spider, and thus learns that spiders are scary, and it can plan intelligent actions to avoid spiders in the future.
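As a cartoon of that two-system division of labor (my own toy illustration; the "template" features are obviously made up):

```python
def subcortical_scary_flag(blob):
    """Cartoon of the superior colliculus: a fixed, genetically-coded
    template match on crude visual features."""
    return blob["dark"] and blob["scuttling"]

class Cortex:
    """Cartoon of the learned system: it sees the flag alongside its own
    rich percept, and learns which concepts the flag co-occurs with."""
    def __init__(self):
        self.learned_scariness = {}

    def perceive(self, concept, blob):
        if subcortical_scary_flag(blob):
            old = self.learned_scariness.get(concept, 0.0)
            self.learned_scariness[concept] = old + 1.0

    def will_plan_avoidance(self, concept):
        return self.learned_scariness.get(concept, 0.0) > 0.0

cortex = Cortex()
cortex.perceive("spider", {"dark": True, "scuttling": True})
cortex.perceive("butterfly", {"dark": False, "scuttling": False})
```

The key point of the cartoon: only the flag is genetically specified; "spider" lives entirely in the learned system.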

I have a lot of thoughts on how to describe these two systems at a computational level, including what the neocortex is doing, and especially how the cortical and subcortical systems exchange information. I am hoping to write lots more posts with more details about the latter, especially about emotions.

even the reward and optimization mechanisms themselves may end up getting at least partially rewritten.

Well, there is such a thing as subcortical learning, particularly for things like fine-tuning motor control programs in the midbrain and cerebellum, but I think most or all of the "interesting" learning happens in the cortical system, not subcortical.

In particular, I'm not really expecting the core emotion-control algorithms to be editable by learning or thinking (if we draw an appropriately tight boundary around them).

More specifically: somewhere in the brain is an algorithm that takes a bunch of inputs and calculates "How guilty / angry / happy / smug / etc. should I feel right now?" The inputs to this algorithm come from various places, including from the body (e.g. pain, hunger, hormone levels), and from the cortex (what emotions am I expecting or imagining or remembering?), and from other emotion circuits (e.g. some emotions inhibit or reinforce each other). The inputs to the emotion calculation can certainly change, but I don't expect that the emotion calculation itself changes over time.

It feels like emotion-control calculations can change, because the cortex can be a really dominant input to those calculations, and the cortex really can change, including by conscious effort. Why is the cortex such a dominant input? Think about it: the emotion-calculation circuits don't know whether I'm likely to eat tomorrow, or whether I'm in debt, or whether Alice stole my cookie, or whether I just got promoted. That information is all in the cortex! The emotion circuits get only tiny glimpses of what's going on in the world, particularly through the cortex predicting & imagining emotions, including in empathetic simulation of others' emotions. If the cortex is predicting fear, well, the amygdala obliges by creating actual fear, and then the cortex sees that and concludes that its prediction was right all along! There's very little "ground truth" that the emotion circuits have to go on. Thus, there's a wide space of self-reinforcing habits of thought. It's a terrible system! Totally under-determined. Thus we get self-destructive habits of thought that linger on for decades.
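To see how under-determined this is, here's a toy dynamical sketch (my own illustration; the sigmoid and all constants are made up): the emotion circuit is a fixed function, but because its dominant input is the cortex's prediction of the emotion, where you settle depends mostly on where you start:

```python
import math

def emotion_circuit(body_signal, cortex_prediction):
    """Fixed calculation: a little ground truth from the body, a lot of
    self-confirmation from the cortex's predicted emotion."""
    drive = 6.0 * (cortex_prediction - 0.5) + 1.0 * (body_signal - 0.5)
    return 1.0 / (1.0 + math.exp(-drive))   # felt emotion in [0, 1]

def settle(body_signal, initial_prediction, steps=50):
    pred = initial_prediction
    for _ in range(steps):
        felt = emotion_circuit(body_signal, pred)
        pred = felt   # cortex sees the actual emotion, predicts more of the same
    return pred

# Same body, same fixed circuit -- two different stable habits of thought:
calm    = settle(body_signal=0.1, initial_prediction=0.0)
fearful = settle(body_signal=0.1, initial_prediction=1.0)
```

Because the feedback gain on the prediction is large relative to the ground-truth input, the loop is bistable: the circuit never changes, but the self-reinforcing habit of thought can sit at either fixed point.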

Anyway, I have this long-term vision of writing down the exact algorithm that each of the emotion-control circuits is implementing. I think AGI programmers might find those algorithms helpful, and so might people trying to pin down "human values". I have a long way to go in that quest :-D

there's also a sense in which the player doesn't have anything that we could call values ...

I basically agree; I would describe it by saying that the subcortical systems are kinda dumb. Sure, the superior colliculus can recognize scuttling spiders, and the emotion circuits can "dislike" pain. But any sophisticated concept like "flourishing", "fairness", "virtue", etc. can only be represented in the form of something like "Neocortex World Model Entity ID #30962758", and these things cannot have any built-in relationship to subcortical circuits.

So the player's "values" are going to be (1) simple things like "less pain is good", and (2) things that don't have an obvious relation to the outside world, like complicated "preferences" over the emotions inside our empathetic simulations of other people.

If a "purely character-level" model of human values is wrong, how do we incorporate the player level?

Is it really "wrong"? It's a normative assumption ... we get to decide what values we want, right? As "I" am a character, I don't particularly care what the player wants :-P

But either way, I'm all for trying to get a better understanding of how I (the character / cortical system) am "built" by the player / subcortical system. :-)

Comment by steve2152 on (A -> B) -> A in Causal DAGs · 2020-01-22T22:16:29.392Z · score: 5 (3 votes) · LW · GW

Why aren't you notationally distinguishing between "actual model" versus "what the agent believes the model to be"? Or are you, and I just missed it?

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-01-20T01:25:47.104Z · score: 3 (2 votes) · LW · GW

Thanks! I just read Luke's report Appendix F on illusionism and it's definitely pointing me in fruitful directions.

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-01-19T14:12:57.123Z · score: 1 (1 votes) · LW · GW

Explaining consciousness as part of the hard problem of consciousness is different to explaining-away consciousness (or explaining reports of consciousness) as part of the meta problem of consciousness.

I commented here why I think that it shouldn't be possible to fully explain reports of consciousness without also fully explaining the hard problem of consciousness in the process of doing so. I take it you disagree (correct?) but do you see where I'm coming from? Can you be more specific about how you think about that?

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-01-17T17:36:42.576Z · score: 3 (2 votes) · LW · GW

Hmm. I do take the view that reports of consciousness are (at least in part) caused by consciousness (whatever that is!). (Does anyone disagree with that?) I think a complete explanation of reports of consciousness necessarily include any upstream cause of those reports. By analogy, I report that I am wearing a watch. If you want a "complete and correct explanation" of that report, you need to bring up the fact that I am in fact wearing a watch, and to describe what a watch is. Any explanation omitting the existence of my actual watch would not match the data. Thus, if reports of consciousness are partly caused by consciousness, then it will not be possible to correctly explain those reports unless, somewhere buried within the explanation of the report of consciousness, there is an explanation of consciousness itself. Do you see where I'm coming from?

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-01-17T15:05:43.185Z · score: 1 (1 votes) · LW · GW

What is specifically ruled out by tests of Bell's inequalities is the conjunction of (local, deterministic). The one thing we know is that the two things you just asserted are not both true. What we don't know is which is false.

I think you're nitpicking here. While we don't know the fundamental laws of the universe with 100% confidence, I would suggest that based on what we do know, they are extremely likely to be local and non-deterministic (as opposed to nonlocal hidden variables). Quantum field theory (QFT) is in that category, and adding general relativity doesn't change anything except in unusual extreme circumstances (e.g. microscopic black holes, or the Big Bang—where the two can't be sensibly combined). String theory doesn't really have a meaningful notion of locality at very small scales (Planck length, Planck time), but at larger scales in normal circumstances it approaches QFT + classical general relativity, which again is local and non-deterministic. (So yes, probably our everyday human interactions have nonlocality at a part-per-googolplex level or whatever, related to quantum fluctuations of the geometry of space itself, but it's hard to imagine that this would matter for anything.)

(By non-deterministic I just mean that the Born rule involves true randomness. In Copenhagen interpretation you say that collapse is a random process. In many-worlds you would say that the laws of physics are deterministic but the quasi-anthropic question "what branch of the wavefunction will I happen to find myself in?" has a truly random answer. Either way is fine; it doesn't matter for this comment.)

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-01-17T10:34:59.340Z · score: 1 (1 votes) · LW · GW


We also need (I would think) for the experience of consciousness to somehow cause your brain to instruct your hands to type "cogito ergo sum". From what you wrote, I'm sorta imagining physical laws plus experience glued to it ... and that physical laws without experience glued to it would still lead to the same nerve firing pattern, right? Or maybe you'll say physical laws without experience is logically impossible? Or what?

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-01-17T02:46:35.673Z · score: 3 (2 votes) · LW · GW

develop your own intuitive understanding of everything

I agree 100%!! That's the goal. And I'm not there yet with consciousness. That's why I used the words "annoying and unsatisfying" to describe my attempts to understand consciousness thus far. :-P

You should not be trusting textbook authors when they say that Theorem X is true

I'm not sure you quite followed what I wrote here.

I am saying that it's possible to understand a math proof well enough to have 100% confidence—on solely one's own authority—that the proof is mathematically correct, but still not understand it well enough to intuitively grok it. This typically happens when you can confirm that each step of the proof, taken on its own, is mathematically correct.

If you haven't lived this experience, maybe imagine that I give you a proof of the Riemann hypothesis in the form of 500 pages of equations kinda like this, with no English-language prose or variable names whatsoever. Then you spend 6 months checking rigorously that every line follows from the previous line (or program a computer to do that for you). OK, you have now verified on solely your own authority that the Riemann hypothesis is true. But if I now ask you why it's true, you can't give any answer better than "It's true because this 500-page argument shows it to be true".

So, that's a bit like where I'm at on consciousness. My "proof" is not 500 pages, it's just 4 steps, but that's still too much for me to hold the whole thing in my head and feel satisfied that I intuitively grok it.

  1. I am strongly disinclined to believe (as I think David Chalmers has suggested) that p-zombies are possible, i.e. that an unconscious system could have exactly the same thoughts and behaviors as a conscious one (even including writing books about the philosophy of consciousness), for reasons described here and elsewhere.

  2. If I believe (1), it seems to follow that I should endorse the claim "if we have a complete explanation of the meta-problem of consciousness, then there is nothing left to explain regarding the hard problem of consciousness". The argument more specifically is: Either the behavior in which a philosopher writes a book about consciousness has some causal relation to the nature of consciousness itself (in which case, solving the meta-problem requires understanding the nature of consciousness), or it doesn't (in which case, unconscious p-zombies should (bizarrely) be equally capable of writing philosophy books about consciousness).

  3. I think that Attention Schema Theory offers a complete and correct answer to every aspect of the meta-problem of consciousness, at least every aspect that I can think of.

  4. ...Therefore, I conclude that there is nothing to consciousness beyond the processes discussed in Attention Schema Theory.

I keep going through these steps and they all seem pretty solid, and so I feel somewhat obligated to accept the conclusion in step 4. But I find that conclusion highly unintuitive, I think for the same reason most people do—sorta like, why should any information processing feel like anything at all?

So, I need to either drag my intuitions into line with 1-4, or else crystallize my intuitions into a specific error in one of the steps 1-4. That's where I'm at right now. I appreciate you and others in this comment thread pointing me to helpful and interesting resources! :-)

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-01-12T14:27:25.788Z · score: 1 (1 votes) · LW · GW

Sorry for being sloppy; you can ignore what I said about "non-physical". I really just meant the more general point that "consciousness doesn't exist (if consciousness is defined as X)" is the same statement as "consciousness does not mean X, but rather Y", and I shouldn't have said "non-physical" at all. You sorta responded to that more general point, but I'm interested in whether you can say more about how exactly you define consciousness such that illusionism amounts to denying consciousness. (As I mentioned, I'm not sure I'll disagree with your definition!)

What would it take for it to be false?

I think that if attention schema theory can explain every thought and feeling I have about consciousness (as in my silly example conversation in the "meta-problem of consciousness" section), then there's nothing left to explain. I don't see any way around that. I would be looking for (1) some observable thought / behavior that AST cannot explain, (2) some reason to think those explanations are wrong, or (3) a good argument that true philosophical zombies are sensible, i.e. that you can have two systems whose every observable thought / behavior is identical but exactly one of them is conscious, or (4) some broader framework of thinking that accepts the AST story as far as it goes, and offers a different way to think about it intuitively and contextualize it.

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-01-12T14:05:22.337Z · score: 3 (2 votes) · LW · GW

Can you suggest a reference which you found helpful for "loading the right intuitions" about consciousness?

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-01-12T02:25:44.157Z · score: 8 (2 votes) · LW · GW

Ha! Maybe!

Or maybe it's like the times I've read poorly-written math textbooks, and there's a complicated proof of Theorem X, and I'm able to check that every step of the proof is correct, but all the steps seem random, and then out of nowhere, the last step says "Therefore, Theorem X is true". OK, well, I guess Theorem X is true then.

...But if I had previously found Theorem X to be unintuitive ("it seems like it shouldn't be true"), I'm now obligated to fix my faulty intuitions and construct new better ones to replace them, and doing so can be extremely challenging. In that sense, reading and verifying the confusing proof of Theorem X is "annoying and deeply unsatisfying".

(The really good math books offer both a rigorous proof of Theorem X and an intuitive way to think about things such that Theorem X is obviously true once those intuitions are internalized. That saves readers the work of searching out those intuitions for themselves from scratch.)

So, I'm not saying that Graziano's argument is poorly-written per se, but having read the book, I find myself more-or-less without any intuitions about consciousness that I can endorse upon reflection, and this is an annoying and unsatisfying situation. Hopefully I'll construct new better intuitions sooner or later. Or—less likely I think—I'll decide that Graziano's argument is baloney after all :-)

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-01-11T14:26:34.880Z · score: 3 (2 votes) · LW · GW

If you say that free will and consciousness are by definition non-physical, then of course naturalist explanations explain them away. But you can also choose to define the terms to encompass what you think is really going on. This is called "compatibilism" for free will, and this is Graziano's position on consciousness. I'm definitely signed up for compatibilism on free will and have been for many years, but I don't yet feel 100% comfortable calling Graziano's ideas "consciousness" (as he does), or if I do call it that, I'm not sure which of my intuitions and associations about "consciousness" are still applicable.

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-01-11T09:38:50.076Z · score: 3 (2 votes) · LW · GW

Yep! I agree with you: Rethinking Consciousness and those two Eliezer posts are coming from a similar place.

(Just to be clear, the phrase "meta-problem of consciousness" comes from David Chalmers, not Graziano. More generally, I don't know exactly which aspects of really anything here are original Graziano inventions, versus Graziano synthesizing ideas from the literature. I'm not familiar with the consciousness literature, and also I listened to the audio book which omits footnotes and references.)

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-01-11T09:23:16.341Z · score: 3 (2 votes) · LW · GW

Fixed it, thanks!

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-01-11T09:15:40.326Z · score: 5 (3 votes) · LW · GW

Yes! I have edited to make that clearer, thanks.

Comment by steve2152 on [Book Review] The Trouble with Physics · 2020-01-05T20:38:15.206Z · score: 1 (1 votes) · LW · GW

Do you think that the True Fundamental Laws of Physics are definitely not string theory (or anything mathematically related to string theory)? If so, why?

Comment by steve2152 on [Book Review] The Trouble with Physics · 2020-01-05T20:32:34.334Z · score: 2 (2 votes) · LW · GW

It's a consensus in the field that a new revolutionary idea is needed

I disagree, I think the consensus in the field is that the fundamental laws of physics are very likely some version of string theory or something closely related to it. It's not a unanimous consensus, but it is probably the majority position among academic theoretical physicists, even among academic theoretical physicists who are not specifically string theorists themselves. (I haven't looked for surveys, I'm just guessing from personal experience.)

Comment by steve2152 on [Book Review] The Trouble with Physics · 2020-01-05T20:22:29.257Z · score: 4 (3 votes) · LW · GW

Look for a system where the predictions of GR contradict, or at least interact with, the predictions of QM.

There are two main examples: Microscopic exploding black holes, and the Big Bang.

Our current fundamental physics theories are probably adequate to explain literally everything that happens in the solar system, which is one of the reasons I don't think fundamental physics is a particularly time-sensitive topic of research. So we can just wait until we have superintelligent AGI, and ask it what the fundamental laws of physics are. :-)

  1. This energy has no gravitational consequences because it's pulling equally in every direction.

Although I appreciate the intuition that "it's pulling equally in every direction", uniform energy density does in fact have an effect in GR. It causes the whole universe to expand or contract. That's exactly what dark energy is.

Comment by steve2152 on Predictive coding & depression · 2020-01-05T19:43:09.161Z · score: 1 (1 votes) · LW · GW

I basically agree with that. See the section "Feelings of sadness, worthlessness, self-hatred, etc."

Comment by steve2152 on Predictive coding & depression · 2020-01-05T09:47:03.727Z · score: 1 (1 votes) · LW · GW

See the Low Self-Confidence section for why I think a depressed person can feel certain they'll fail despite an inability to create hypotheses with strong predictions.

Note that a "feeling / judgment of certainty about something in the everyday sense" is not exactly the same as the "certainty of a particular top-down prediction of a particular hypothesis (generative model) in the predictive coding brain". The former also involves things like the availability heuristic, process of elimination (I'll fail because I can't imagine succeeding), etc.

Does that answer your question? Do you think that makes sense?

Comment by steve2152 on Predictive coding & depression · 2020-01-04T16:08:18.323Z · score: 1 (1 votes) · LW · GW

Thanks for the comment!

In this post and the previous one you linked to, you do a good job explaining why your criterion e is possible / not ruled out by the data. But can you explain more about what makes you think it's true?

Maybe the reason for (e) would be clearer if you replace "hypothesis" with "possible course of action". Then (e) is the thing that makes us more likely to eat when we're hungry, etc.

("Course of action" is just a special case of what I call "hypothesis". "Hypothesis" is synonymous with "One possible set of top-down predictions".)

I don't think I'm departing from "Surfing Uncertainty" etc. in any big way in that previous post, but I felt that the predictive coding folks don't adequately discuss how the specific hypotheses / predictions are actually calculated in the brain. I might have been channeling the Jeff Hawkins 2004 book a bit to fill in some gaps, but it's mainly my take.

I guess I should contextualize something in my previous post: I think anyone who advocates predictive coding is obligated to discuss The Wishful Thinking Problem. It's not something specific to my little (a-e) diagram. So here is The Wishful Thinking Problem, stripped away from the rest of what I wrote:

Wishful thinking problem: If we're hungry, we have a high-level prior that we're going to eat. Well, that prior privileges predictions that we'll go to a restaurant, which is sensible... but that prior also privileges predictions that food will magically appear in our mouths, which is wishful thinking. We don't actually believe the latter. So that's The Wishful Thinking Problem.

The Wishful Thinking Problem is not a big problem!! It has an obvious solution: Our prior that "magic doesn't happen" is stronger than our prior that "we're going to eat". Thus, we don't expect food to magically appear in our mouth after all! Problem solved! That's all I was saying in that part of the previous post. Sorry if I made it sound overly profound or complicated.
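In toy form (the numbers are made up, purely to illustrate the "stronger prior wins" point):

```python
# Each hypothesis (i.e. set of top-down predictions) is scored by the priors
# it satisfies minus the priors it violates; strengths are arbitrary units.
PRIOR_STRENGTH = {
    "we're going to eat":    2.0,   # hunger-driven prior
    "magic doesn't happen": 10.0,   # much stronger background prior
}

def score(predicts_eating, involves_magic):
    s = PRIOR_STRENGTH["we're going to eat"] if predicts_eating else 0.0
    if involves_magic:
        s -= PRIOR_STRENGTH["magic doesn't happen"]
    return s

go_to_restaurant      = score(predicts_eating=True,  involves_magic=False)  #  2.0
food_appears_by_magic = score(predicts_eating=True,  involves_magic=True)   # -8.0
keep_sitting_here     = score(predicts_eating=False, involves_magic=False)  #  0.0
```

The restaurant hypothesis wins: it collects the hunger prior without paying the much larger "magic doesn't happen" penalty.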

I like Friston's attempt to unify these by saying that bad mood is just a claim that you're in an unpredictable environment

I encourage you to think about it more computationally! The amygdala has a circuit that takes data, does some calculation, and decides on that basis whether to emit a feeling of disgust. And it has another circuit that takes data, does some calculation, and decides whether to emit a feeling of sadness. And so on for boredom and fear and jealousy and every other emotion. Each of these circuits is specifically designed by evolution to emit its feeling in the appropriate circumstances.

So pretend that you're Evolution, designing the sadness circuit. What are you trying to calculate? I think the short answer is:

Sadness circuit design goal: Emit a feeling of sadness when: My prospects are grim, and I have no idea how to make things better.

Why is this the goal, as opposed to something else? Because this is the appropriate situation to cry for help and rethink all your life plans.

OK, so if that's the design goal, then how do you actually build a circuit in the amygdala to do that? Keep in mind that this circuit is not allowed to directly make reference to our understanding of the world, because "our understanding of the world" is an inscrutable pattern of neural activity in a massive, convoluted, learned data structure in the cortex, whereas the emotion circuits need to have specific, genetically-predetermined neuron wiring. So what can you do instead? Well, you can design the circuit so that it listens for signals that the cortex is predicting rewarding things (the amygdala does have easy access to this information), and does not emit sadness as long as that signal is occurring regularly. After all, that signal is typically a sign that we are imagining a bright future. This circuit won't perfectly match the design goal, but it's as close as Evolution can get.
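A cartoon of that circuit (my own sketch; the window size and threshold are made-up parameters): the amygdala can't read the cortex's world-model, but it can count how often reward-predicting signals arrive:

```python
from collections import deque

class SadnessCircuit:
    """Fixed, genetically-specified cartoon circuit: emit sadness when the
    cortex has rarely predicted anything rewarding lately."""
    def __init__(self, window=10, threshold=2):
        self.recent = deque(maxlen=window)  # recent reward-prediction signals
        self.threshold = threshold

    def tick(self, cortex_predicts_reward):
        self.recent.append(1 if cortex_predicts_reward else 0)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and sum(self.recent) < self.threshold

circuit = SadnessCircuit()
during_bright_future = [circuit.tick(True) for _ in range(10)]
during_grim_future = [circuit.tick(False) for _ in range(10)]
```

Note that the circuit never references spiders, debt, or promotions; it only sees the cortex's summary signal, which is exactly the "tiny glimpse" problem from above.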

(By contrast, the algorithm "check whether you're in an unpredictable environment" doesn't seem to fit, to me. Reading a confusing book is frustrating, not saddening. Getting locked in jail for life is saddening but offers predictability.)

So anyway, my speculation here is that:

(1) a lot of the input data for the amygdala's various emotion calculation circuits comes from the cortex (duh),

(2) the neural mechanism controlling the strength of predictions also controls the strength of signals from the cortex to the amygdala (I think this is a natural consequence of the predictive coding framework, although to be clear, I'm speculating),

(3) a global reduction of the strength of signals going from the cortex to the amygdala affects pretty much all of the emotion circuits, and it turns out that the result is sadness and other negative feelings (this is my pure speculation, although it seems to fit the sadness algorithm example above). I don't think there's any particularly deep reason that globally weaker signals from the cortex to the amygdala creates sadness rather than happiness. I think it just comes out of details about how the various emotion circuits are implemented, and interact.

(The claim "depression involves global weakening of signals going from cortex to amygdala" seems like it would be pretty easy to test, if I had a psych lab. First try to elicit disgust in a way that bypasses the cortex, like smelling something gross. Then try to elicit disgust in a way that requires data to pass from the cortex to the amygdala, like remembering or imagining something gross. [Seeing something gross can be in either category, I think.] My prediction is that in the case that doesn't involve cortex, you'll get the same disgust reaction for depressed vs control; and in the case that does involve cortex, depressed people will have a weaker disgust reaction, proportional to the severity of the depression.)

the best fits (washed-out visual field and psychomotor retardation) are really marginal symptoms of depression that you only find in a few of the worst cases

I guess that counts against this blog post, but I don't think it quite falsifies it. Instead I can claim that motor control works normally if the cortical control signals are above some threshold. So the signals can get somewhat weaker without creating a noticeable effect, but if they get severely weaker, it starts butting against the threshold and starts to show. (The motor control signals do, after all, get further processed by the cerebellum etc.; they're not literally controlling muscles themselves.) Ditto for washed-out visual field; the appearance of a thing you're staring at is normally a super strong prediction, maybe it can get somewhat weaker without creating a noticeable effect. Whereas maybe the amygdala is more sensitive to relatively small changes in signal levels, for whatever reason. (This paragraph might be special pleading, I'm not sure.)

There are two perspectives. One is "Let's ignore the worst of the worst cases, their brains might be off-kilter in all kinds of ways!" The other is "Let's especially look at the worst of the worst cases, because instead of trying to squint at subtle changes of brain function, the effects will be turned up to 11! They'll be blindingly obvious!"

I'm not sure what direction all of this happens in.

I think it's gotta be a vicious cycle, otherwise it wouldn't persist, right? OK how about this: "Globally weaker predictions cause sadness, and sadness causes globally weaker predictions".

I already talked about the first part. But why might sadness cause globally weaker predictions? Well, one evolutionary goal of sadness is to make us less attached to our current long-term plans, since those plans apparently aren't working out for us! (Remember the sadness circuit design goal I wrote above.) Globally weaker predictions would do that, right? As you weaken the prediction, "active plans" turn into "possible plans", then into "vague musings"...

Anyway, maybe that vicious cycle dynamic is always present to some extent, but other processes push in other directions and keep our emotions stable. ...Until a biochemical insult—or an unusually prolonged bout of "normal" sadness (e.g. from grief)—tips the system out of balance, and we get sucked into that vortex of mutually reinforcing "sadness + globally weak predictions".
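Here's that vicious cycle as a toy two-variable system (all constants are made up; the only point is the bistability):

```python
def step(prediction_strength, sadness):
    """One tick of the proposed loop: sadness rises as reward-predicting
    signals from the cortex weaken, and sadness in turn erodes global
    prediction strength. Constants are arbitrary."""
    reward_signals = prediction_strength            # what the amygdala hears
    new_sadness = min(1.0, max(0.0, 1.0 - reward_signals))
    drift = -0.1 * new_sadness + 0.05 * (1.0 - new_sadness)
    new_strength = min(1.0, max(0.0, prediction_strength + drift))
    return new_strength, new_sadness

def simulate(initial_strength, steps=200):
    strength, sadness = initial_strength, 0.0
    for _ in range(steps):
        strength, sadness = step(strength, sadness)
    return strength, sadness

healthy = simulate(0.9)       # stays at full strength, no sadness
after_insult = simulate(0.4)  # gets sucked into the vortex
```

Starting above the tipping point, the system recovers to full prediction strength; a big enough insult pushes it below the tipping point, and it spirals down into the "sadness + globally weak predictions" fixed point and stays there.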

Comment by steve2152 on Plausible A.I. Takeoff Scenario Short Story · 2020-01-01T20:11:16.828Z · score: 3 (3 votes) · LW · GW

Hmmm. I agree about "independent person"—I don't think a lot of "independent persons" are working on AGI, or that they (collectively) have a high chance of success (with all due respect to the John Carmacks of the world!).

I guess the question is what category you put grad students, postdocs, researchers, and others in small research groups, especially at universities. They're not necessarily "paid a ton of money" (I sure wasn't!), but they do "work on this stuff all day". If you look at the list of institutions submitting NeurIPS 2019 papers, there's a very long tail of people at small research groups, who seem to collectively comprise the vast majority of submissions, as far as I can tell. (I haven't done the numbers. I guess it depends on where we draw the line between "small research groups" and "big"... Also there are a lot of university-industry collaborations, which complicates the calculation.)

(Admittedly, not all papers are equally insightful, and maybe OpenAI & DeepMind's papers are more insightful than average, but I don't think that's a strong enough effect to make them account for "most" AI insights.)

See also: long discussion thread on groundbreaking PhD dissertations through history, ULMFiT, the Wright Brothers, Grigori Perelman, Einstein, etc.

Comment by steve2152 on Plausible A.I. Takeoff Scenario Short Story · 2020-01-01T17:42:42.229Z · score: 6 (5 votes) · LW · GW

If AGI entails a gradual accumulation of lots of little ideas and best practices, then the story is doubly implausible in that (1) the best AGI would probably be at a big institution (as you mention), and (2) the world would already be flooded with slightly-sub-AGIs that have picked low-hanging fruit like Mechanical Turk. (And there wouldn't be a sharp line between "slightly-sub-AGI" and "AGI" anyway.)

But I don't think we should rule out the scenario where AGI entails a small number of new insights, or one insight which is "the last piece of the puzzle", and where a bright and lucky grad student in Wellington could be the one who puts it all together, and where a laptop is sufficient to bootstrap onto better hardware as discussed in the story. In fact I see this "new key insight" story as fairly plausible, based on my belief that human intelligence doesn't entail that many interacting pieces (further discussion here), and some (vague) thinking about what the pieces are and how well the system would work when one of those pieces is removed.

I don't make a strong claim that it will definitely be like the "new key insight" story and not the "gradual accumulation of best practices" story. I just think neither scenario can be ruled out, or at least that's my current thinking.

Comment by steve2152 on human psycholinguists: a critical appraisal · 2020-01-01T09:17:33.253Z · score: 3 (2 votes) · LW · GW

for the most part the field has assumed something like the Transformer model is how the lower levels of speech production worked

Can you be more specific about what you mean by "something like the Transformer model"? Or is there a reference you recommend? I don't think anyone believes that there are literally neurons in the brain wired up into a Transformer, or anything like that, right?

Comment by steve2152 on human psycholinguists: a critical appraisal · 2019-12-31T19:26:30.140Z · score: 5 (4 votes) · LW · GW

I found a recent paper that ran a fun little contest on whether three seq2seq models (LSTMS2S, ConvS2S and Transformer) are "compositional", for various definitions of "compositional": The compositionality of neural networks: integrating symbolism and connectionism. (Answer: Basically yes, especially the Transformer.) That was somewhat helpful, but I still feel like I don't really understand what exactly these models are learning and how (notwithstanding your excellent Transformer blog post), or how their "knowledge" compares with the models built by the more directly brain-inspired wing of ML (example), or for that matter to actual brain algorithms. I need to think about it more. Anyway, thanks for writing this, it's a helpful perspective on these issues.

Comment by steve2152 on Parameter vs Synapse? · 2019-12-29T21:56:30.001Z · score: 1 (1 votes) · LW · GW

See AI Impacts articles on 'Human Level Hardware' if you haven't already. I haven't dug into it myself, but I agree that your question is a good one.

A simpler related question that I don't know the answer to off-hand is: what prevents trillion-parameter NNs? Does the training-data requirement scale with network size? (In which case, I don't expect this to be a problem for long, because I expect we'll find algorithms with human-level data efficiency before we get AGI.) Or is it just the limited memory capacity per GPU, and the hassle / overhead / cost of parallelization? (In which case, again, I expect we'll get dramatically more parallelizable algorithms in the near future.)
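To put rough numbers on the memory question (a back-of-the-envelope sketch with my own illustrative figures, not anything from the AI Impacts articles):

```python
# Rough memory estimate for a trillion-parameter network (illustrative numbers).
params = 1e12          # 10^12 parameters
bytes_per_param = 4    # fp32 weights; training state would multiply this further

weights_gb = params * bytes_per_param / 1e9   # GB needed just for the weights

gpu_memory_gb = 32     # a large 2019-era GPU (e.g. a V100)
gpus_needed = weights_gb / gpu_memory_gb

print(f"{weights_gb:.0f} GB of weights -> ~{gpus_needed:.0f} GPUs just to hold them")
```

Even ignoring optimizer state and activations, the weights alone outstrip a single 2019-era GPU by two orders of magnitude, so parallelization overhead does look like a binding constraint.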

Comment by steve2152 on We need to revisit AI rewriting its source code · 2019-12-28T01:32:23.238Z · score: 5 (3 votes) · LW · GW

I'll be interested if you have any more specific ideas here. I can't think of anything because:

  1. The question of "How can an AGI self-modify into a safe and beneficial AGI?" seems pretty similar to "How can a person program a safe and beneficial AGI?", at least until the system is so superhumanly advanced that it can hopefully figure out the answer itself. So in that sense, everyone is thinking about it all the time.

  2. The challenges of safe self-modification don't seem wildly different than the challenges of safe learning (after all, learning changes the agent too), including things like goal stability, ontological crises, etc. And whereas learning is basically mandatory, deeper self-modification could (probably IMO) be turned off if necessary, again at least until the system is so superhumanly advanced that it can solve the problem itself. So in that sense, at least some people are sorta thinking about it these days.

  3. I dunno, I just can't think of any experiment we could do with today's AI in this domain that would discover or prove something that wasn't already obvious. (...Which of course doesn't mean that such experiments don't exist.)

Comment by steve2152 on Understanding Machine Learning (II) · 2019-12-25T11:00:53.154Z · score: 2 (2 votes) · LW · GW

Thanks, this and the previous post helped me understand a couple things that I had previously found confusing.

Comment by steve2152 on Might humans not be the most intelligent animals? · 2019-12-24T13:27:52.755Z · score: 1 (1 votes) · LW · GW

One could respond by saying that language is a specific human instinct, and if we were all elephants we would be talking about how no other species has anything like our uniquely elephant trunk, etc. etc. (I think I took that example from a Steven Pinker book?) There are certainly cognitive tasks that other animals can do and we can't, or can't do as well, like dragonflies predicting the trajectories of their prey (although we could eventually program a computer to do that, I imagine). Anyway, to the larger point, I actually don't have a strong opinion about the intelligence of "humans without human culture", and don't see how it's particularly relevant to anything. "Humans without human culture" might or might not have language; I know that groups of kids can invent languages from scratch (e.g. Nicaraguan Sign Language) but I'm not sure about a single human.

Comment by steve2152 on Might humans not be the most intelligent animals? · 2019-12-24T02:36:49.409Z · score: 1 (1 votes) · LW · GW

If the reason for our technological dominance is due to our ability to process culture, however, then the case for a discontinuous jump in capabilities is weaker. This is because our AI systems can already process culture somewhat efficiently right now (see GPT-2) and there doesn't seem like a hard separation between "being able to process culture inefficiently" and "able to process culture efficiently" other than the initial jump from not being able to do it at all, which we have already passed.

I think you're giving GPT-2 too much credit here. I mean, on any dimension of intelligence, you can say there's a continuum of capabilities along that scale with no hard separations. The more relevant question is, might there be a situation where all the algorithms are like GPT-2, which can only pick up superficial knowledge, and then someone has an algorithmic insight, and now we can make algorithms that, as they read more and more, develop ever deeper and richer conceptual understandings? And if so, how fast could things move after that insight? I don't think it's obvious.

I do agree that pretty much everything that might make an AGI suddenly powerful and dangerous is in the category of "taking advantage of the products of human culture", for example: coding (recursive self-improvement, writing new modules, interfacing with preexisting software and code), taking in human knowledge (reading and deeply understanding books, videos, wikipedia, etc., a.k.a. "content overhang"), computing hardware (self-reproduction / seizing more computing power, a.k.a. "hardware overhang"), the ability of humans to coordinate and cooperate (social manipulation, earning money, etc.), etc. In all these cases and more, I would agree that one could in principle define a continuum of capabilities from 0 to superhuman with no hard separations, but still think that it's possible for a new algorithm to jump from "2019-like" (which is more than strictly 0) to "really able to take advantage of this tool like humans can or beyond" in one leap.

Sorry if I'm misunderstanding your point.

Comment by steve2152 on Is the term mesa optimizer too narrow? · 2019-12-16T11:06:16.738Z · score: 6 (3 votes) · LW · GW

Hmm, let me try again to explain what I'm trying to get at.

Let's say you're a prison architect, building prisons for three prisoners:

  • Prisoner Alice is 100% maximally obsessed with breaking out of prison. She eats and sleeps and works and socializes precisely to the extent that these are instrumentally useful for breaking out of prison. She thinks hard and creatively about breakout strategies during every free moment.

  • Prisoner Carol does not treat breaking out as an explicit goal, although if she happened upon an open door, she would absolutely walk out. She spends her days reading books from the prison library, socializing, working, etc.

  • Prisoner Bob is somewhere between the two. For 23 hours a day, he has various pursuits like Carol. But for a total of 1 hour a day, he works on the project of breaking out of prison. (For a total of ~30 minutes scattered through the day, he does opportunistic information-gathering—he watches guard behavior, he explores the prison grounds, etc. Then, for the 30 minutes before he goes to sleep each night, he sits quietly in his cell, and thinks hard and creatively about how to break out.)

Here's my question: "Is Bob more like Alice or more like Carol?"

From a psychological perspective, Bob is similar to Carol, while Alice is kinda nuts, even cartoonish. Bob and Carol are doing the same range of activities for the same reasons for 23 hours a day = 96%. They both make friends, they both exercise, they both read books, they both complain about the weather. Alice has a simple utility function, whereas Bob and Carol are actual plausible humans.

From a prison safety and security perspective, Bob and Alice are very similar to each other, and wildly different from Carol. Alice and Bob are Very High Breakout Risk, Carol is Low Breakout Risk. If you build the prison with 5 layers of security, each of which is 90% reliable, Carol will be kept locked up with 99.999% certainty, whereas either Alice or Bob will engineer a situation where all 5 layers break at once. Alice will come up with good breakout plans sooner than Bob, but not exponentially sooner; it's a difference of degree not kind, unlike Bob versus Carol.
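To make the 99.999% figure explicit (a toy calculation of mine, which only holds under an independence assumption):

```python
# Toy model of the 5-layer prison: each layer is 90% reliable.
p_layer_fails = 0.1
n_layers = 5

# Carol: she isn't searching for failures, so treat the layers as independent;
# she gets out only if all 5 happen to fail at once.
p_carol_escapes = p_layer_fails ** n_layers   # ~1e-5
print(f"Carol stays locked up with {1 - p_carol_escapes:.3%} certainty")

# Alice and Bob: an explicit optimizer hunts for the one situation where the
# layers fail together, so the failures become correlated -- the independence
# assumption (and hence the 99.999% figure) simply doesn't apply to them.
```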

The moral I'm trying to convey is that, when we talk about mesa-optimization, the important question is "Can we correctly explain any non-infinitesimal subset of the system's behavior as explicit optimization for a misaligned goal?", not "Can we correctly explain 100% of the system's behavior as explicit optimization for a misaligned goal?"

Comment by steve2152 on Is the term mesa optimizer too narrow? · 2019-12-15T16:51:23.639Z · score: 5 (3 votes) · LW · GW

humans are clearly not mesa optimizers. Most optimization we do is implicit.

Sure, but some of the optimization we do is explicit, like if someone is trying to get out of debt. Are you saying there's an important safety-relevant distinction between "system that sometimes does explicit optimization but also does other stuff" versus "system that does explicit optimization exclusively"? And/or that "mesa-optimizer" only refers to the latter (in the definition you quote)? I was assuming not... Or maybe we should say that the human mind has "subagents", some of which are mesa-optimizers...?

Comment by steve2152 on Computational Model: Causal Diagrams with Symmetry · 2019-12-13T01:58:29.932Z · score: 3 (2 votes) · LW · GW

The mention of circuits in your later article reminded me of a couple arguments I had on wikipedia a few years ago (2012, 2015). I was arguing basically that cause-effect (or at least the kind of cause-effect relationship that we care about and use in everyday reasoning) is part of the map, not territory.

Here's an example I came up with:

Consider a 1kΩ resistor, in two circuits. The first circuit is the resistor attached to a 1V power supply. Here an engineer would say: "The supply creates a 1V drop across the resistor; and that voltage drop causes a 1mA current to flow through the resistor." The second circuit is the resistor attached to a 1mA current source. Here an engineer would say: "The current source pushes a 1mA current through the resistor; and that current causes a 1V drop across the resistor." Well, it's the same resistor ... does a voltage across a resistor cause a current, or does a current through a resistor cause a voltage, or both, or neither? Again, my conclusion was that people think about causality in a way that is not rooted in physics, and indeed if you forced someone to exclusively use physics-based causal models, you would be handicapping them. I haven't thought about it much or delved into the literature or anything, but this still seems correct to me. How do you see things?
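The symmetry can be made concrete: Ohm's law is a single constraint, and which side counts as "cause" depends on what the rest of the circuit pins down (a minimal sketch of mine, using the numbers from the example):

```python
# Ohm's law is one constraint, V = I * R; the "causal direction" comes from
# which quantity the external circuit holds fixed, not from the resistor.
R = 1000.0   # the same 1 kΩ resistor in both circuits

# Circuit 1: a voltage source fixes V, so we read off I as the "effect".
V1 = 1.0          # 1 V supply
I1 = V1 / R       # 0.001 A, i.e. 1 mA

# Circuit 2: a current source fixes I, so we read off V as the "effect".
I2 = 0.001        # 1 mA source
V2 = I2 * R       # 1 V drop

# Identical physics, opposite causal story -- the arrow lives in the model.
```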

Comment by steve2152 on The Credit Assignment Problem · 2019-12-12T17:40:51.248Z · score: 3 (2 votes) · LW · GW

I wrote a post inspired by / sorta responding to this one—see Predictive coding = RL + SL + Bayes + MPC

Comment by steve2152 on Computational Model: Causal Diagrams with Symmetry · 2019-12-09T16:09:43.046Z · score: 3 (2 votes) · LW · GW

I'll highlight that the brain and a hypothetical AI might not use the same primitives - they're on very different hardware, after all

Sure. There are a lot of levels at which algorithms can differ.

  • quicksort.c compiled by clang versus quicksort.c compiled by gcc
  • quicksort optimized to run on a CPU vs quicksort optimized to run on an FPGA
  • quicksort running on a CPU vs mergesort running on an FPGA
  • quicksort vs a different algorithm that doesn't involve sorting the list at all

There are people working on neuromorphic hardware, but I don't put much stock in anything coming of it in terms of AGI (the main thrust of that field is low-power sensors). So I generally think it's very improbable that we would copy brain algorithms at the level of firing patterns and synapses (like the first bullet-point or less). I put much more weight on the possibility of "copying" brain algorithms at like vaguely the second or third bullet-point level. But, of course, it's also entirely possible for an AGI to be radically different from brain algorithms in every way. :-)

Comment by steve2152 on Computational Model: Causal Diagrams with Symmetry · 2019-12-09T15:46:37.395Z · score: 5 (3 votes) · LW · GW

On further reflection, the primitive of "temporal sequences" (more specifically high-order Markov chains) isn't that different from cause-effect. High-order Markov chains are like "if A happens and then B and then C, probably D will happen next". So if A and B and C are a person moving to kick a ball, and D is the ball flying up in the air...well I guess that's at least partway to representing cause-effect...

(High-order Markov chains are more general than cause-effect because they can also represent non-causal things like the lyrics of a song. But in the opposite direction, I'm having trouble thinking of a cause-effect relation that cannot be represented as a high-order Markov chain, at least at some appropriate level of abstraction, and perhaps with some context-dependence of the transitions.)

I have pretty high confidence that high-order Markov chains are one of the low-level primitives of the brain, based on both plausible neural mechanisms and common sense (e.g. it's hard to say the letters of the alphabet in reverse order). I'm less confident about what exactly are the elements of those Markov chains, and what are the other low-level primitives, and what's everything else that's going on. :-)
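To illustrate what I mean by a high-order Markov chain as a primitive (my own toy example, not a model of actual neural machinery):

```python
from collections import defaultdict

# Train an order-3 Markov chain on the alphabet: each 3-letter context
# predicts the set of letters that has followed it.
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
order = 3

transitions = defaultdict(set)
for i in range(len(alphabet) - order):
    transitions[alphabet[i:i + order]].add(alphabet[i + order])

# Forward prediction is trivial: "ABC" has always been followed by "D".
assert transitions["ABC"] == {"D"}

# But the chain never saw reversed contexts, so "DCB" predicts nothing --
# matching how hard it is to recite the alphabet backwards.
assert "DCB" not in transitions
```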

Just thinking out loud :-)

Comment by steve2152 on Computational Model: Causal Diagrams with Symmetry · 2019-12-09T11:23:43.172Z · score: 3 (2 votes) · LW · GW

Sounds exciting, and I wish you luck and look forward to reading whatever you come up with! :-)

Comment by steve2152 on Computational Model: Causal Diagrams with Symmetry · 2019-12-08T02:20:41.217Z · score: 3 (2 votes) · LW · GW

For what it's worth, my current thinking on brain algorithms is that the brain has a couple low-level primitives, like temporal sequences and spatial relations, and these primitives are able to represent (1) cause-effect, (2) hierarchies, (3) composition, (4) analogies, (5) recursion, and on and on, by combining these primitives in different ways and with different contextual "metadata". This is my opinion, it's controversial in the field of cognitive science and I could be wrong. But anyway, that makes me instinctively skeptical of world-modeling theories where everything revolves around cause-effect, and equally skeptical of world-modeling theories where everything revolves around hierarchies, etc. etc. I would be much more excited about world-modeling theories where all those 5 different types of relationships (and others I omitted, and shades of gray in between them) are all equally representable.

(This is just an instinctive response / hot-take. I don't have a particularly strong opinion that the research direction you're describing here is unpromising.)

Comment by steve2152 on What is Abstraction? · 2019-12-08T01:16:39.075Z · score: 5 (3 votes) · LW · GW

An abstraction like "object permanence" would be useful for a very wide variety of goals, maybe even for any real-world goal. An abstraction like "golgi apparatus" is useful for some goals but not others. "Lossless" is not an option in practice: our world is too rich, you can just keep digging deeper into any phenomenon until you run out of time and memory ... I'm sure that a 50,000 page book could theoretically be written about earwax, and it would still leave out details which for some goals would be critical. :-)

Comment by steve2152 on Seeking Power is Instrumentally Convergent in MDPs · 2019-12-07T18:42:04.579Z · score: 1 (1 votes) · LW · GW

I agree 100% with everything you said.

Comment by steve2152 on Seeking Power is Instrumentally Convergent in MDPs · 2019-12-07T02:47:10.351Z · score: 12 (4 votes) · LW · GW

This is great work, nice job!

Maybe a shot in the dark, but there might be some connection with that paper a few years back Causal Entropic Forces (more accessible summary). They define "causal path entropy" as basically the number of different paths you can go down starting from a certain point, which might be related to or the same as what you call "power". And they calculate some examples of what happens if you maximize this (in a few different contexts, all continuous not discrete), and get fun things like (what they generously call) "tool use". I'm not sure that paper really adds anything important conceptually that you don't already know, but just wanted to point that out, and PM me if you want help decoding their physics jargon. :-)

Comment by steve2152 on What I talk about when I talk about AI x-risk: 3 core claims I want machine learning researchers to address. · 2019-12-03T11:24:22.934Z · score: 8 (5 votes) · LW · GW

I think capybaralet meant ≥1%.

I don't think your last paragraph is fair; doing outreach / advocacy, and discussing it, is not particularly related to motivated cognition. You don't know how much time capybaralet has spent trying to figure out whether their views are justified; you're not going to get a whole life story in an 800-word blog post.

There is such a thing as talking to an ideological opponent who has spent no time thinking about a topic and has a dumb opinion that could not survive 5 seconds of careful thought. We should still be good listeners, not be condescending, etc., because that's just the right way to talk to people; but realistically we're probably not going to learn anything new (about this specific topic) from such a person, let alone change our own minds (assuming we've already deeply engaged with both sides of the issue).

On the other hand, when talking to an ideological opponent who has spent a lot of time thinking about an issue, we may indeed learn something or change our mind, and I'm all for being genuinely open-minded and seeking out and thinking hard about such opinions. But I think that's not the main topic of this little blog post.

Comment by steve2152 on A list of good heuristics that the case for AI x-risk fails · 2019-12-02T21:22:14.337Z · score: 4 (3 votes) · LW · GW

Is there a better reference for " a number of experts have voiced concerns about AI x-risk "? I feel like there should be by now...

I hope someone actually answers your question, but FWIW, the Asilomar principles were signed by an impressive list of prominent AI experts. Five of the items are related to AGI and x-risk. The statements aren't really strong enough to declare that those people "voiced concerns about AI x-risk", but it's a data-point for what can be said about AI x-risk while staying firmly in the mainstream.

My experience in casual discussions is that it's enough to just name one example to make the point, and that example is of course Stuart Russell. When talking to non-ML people—who don't know the currently-famous AI people anyway—I may also mention older examples like Alan Turing, Marvin Minsky, or Norbert Wiener.

Thanks for this nice post. :-)

Comment by steve2152 on Neural Annealing: Toward a Neural Theory of Everything (crosspost) · 2019-11-30T03:11:32.200Z · score: 15 (8 votes) · LW · GW

Really? I don't get it at all. When I read cognitive neuroscience I'm looking for a coalescing between (1) Some story about the brain doing biologically-useful computations, in a way that matches (2) The things we know neurons and evolution are capable of, and (3) The things we know about what the human brain does (e.g. from introspection and scientific studies). I'm not seeing this in the CSHW material—particularly the connection to biologically-useful computations. The power distribution among harmonics carries very little information—dozens of bits, not the billions or trillions of bits that are needed for human-level understanding of the world. So what's the relation between CSHW and biologically-useful computations? Again, I don't get it. (Note: This is an initial impression based on cursory reading.)

Comment by steve2152 on Self-Fulfilling Prophecies Aren't Always About Self-Awareness · 2019-11-27T13:22:31.739Z · score: 3 (2 votes) · LW · GW

Ah, OK, I buy that, thanks. What about the idea of building a system that doesn't model itself or its past predictions, and asking it questions that don't entail modeling any other superpredictors? (Like "what's the likeliest way for a person to find the cure for Alzheimers, if we hypothetically lived in a world with no superpredictors or AGIs?")

Comment by steve2152 on Thoughts on Robin Hanson's AI Impacts interview · 2019-11-24T20:10:52.612Z · score: 3 (3 votes) · LW · GW

I'm all for thinking about brain-computer interfaces—what forms might they take, how likely are they, how desirable are they? I would actually lump this into the category of AGI safety research, not just because I draw the category pretty broadly, but because it's best done and likeliest to be done by the same people who are doing other types of thinking about AGI safety. It's possible that Robin has something narrower in mind when he talks about "AI risk", so maybe there's some common ground where we both think that brain-computer interface scenarios deserve more careful analysis? Not sure.

There does seem to be a disconnect, where people like Ray Kurzweil and Elon Musk say that brain-computer interfaces are critical for AGI safety (e.g. WaitButWhy), while most AGI safety researchers (e.g. at MIRI and OpenAI) don't seem to talk about brain-computer interfaces at all. I think the latter group has come to the conclusion that brain-computer interfaces are unhelpful, and not just unlikely, but I haven't seen a good articulation of that argument. It could also just be oversight / specialization. (Bostrom's Superintelligence has just a couple paragraphs about brain-computer interfaces, generally skeptical but noncommittal, if memory serves.) It's on my list of things to think more carefully about and write up, if I ever get around to it, and no one else does it first.

Comment by steve2152 on Robin Hanson on the futurist focus on AI · 2019-11-24T01:43:22.149Z · score: 3 (2 votes) · LW · GW

Thanks a lot for doing this! I had more to say than fit in a comment ... check out my Thoughts on Robin Hanson's AI Impacts interview

Comment by steve2152 on Self-Fulfilling Prophecies Aren't Always About Self-Awareness · 2019-11-20T03:11:55.777Z · score: 1 (1 votes) · LW · GW

This is good stuff!

...if the Predict-O-Matic knows about (or forecasts the development of) anything which can be modeled using the outside view "I'm not sure how this thing works, but its predictions always seem to come true!"

Can you walk through the argument here in more detail? I'm not sure I follow it; sorry if I'm being stupid.

I'll start: there are two identical systems, "Predict-O-Matic A" and "Predict-O-Matic B", sitting side-by-side on a table. For simplicity let's say that A knows everything about B, B knows everything about A, but A is totally oblivious to the existence of A, and B to B. Then what? What's a question you might ask it that would be problematic? Thanks in advance!

Comment by steve2152 on How common is it for one entity to have a 3+ year technological lead on its nearest competitor? · 2019-11-17T16:38:51.473Z · score: 2 (2 votes) · LW · GW

Are you interested in hardware, software, or both? Hardware tends to have slower development cycles. Also hardware is more easily patentable, which complicates your question (if it takes a decade for the patent to expire, does that count as a decade-long technological lead?)

It's entirely possible that the first real AGI algorithm will require (in practice) a custom ASIC to run it, maybe even using non-standard cleanroom fabrication processes or materials. OTOH, it's also entirely possible that it will run on a normal GPU or FPGA. Hard to say, but important to think about...

Comment by steve2152 on The Credit Assignment Problem · 2019-11-14T13:25:14.380Z · score: 3 (3 votes) · LW · GW

I think I agree with everything you wrote. I thought about it more, let me try again:

Maybe we're getting off on the wrong foot by thinking about deep RL. Maybe a better conceptual starting point for human brains is more like The Scientific Method.

We have a swarm of hypotheses (a.k.a. generative models), each of which is a model of the latent structure (including causal structure) of the situation.

How does learning-from-experience work? Hypotheses gain prominence by making correct predictions of both upcoming rewards and upcoming sensory inputs. Also, when there are two competing prominent hypotheses, then specific areas where they make contradictory predictions rise to salience, allowing us to sort out which one is right.

How do priors work? Hypotheses gain prominence by being compatible with other highly-weighted hypotheses that we already have.

How do control-theory-setpoints work? The hypotheses often entail "predictions" about our own actions, and hypotheses gain prominence by predicting that good things will happen to us, we'll get lots of reward, we'll get to where we want to go while expending minimal energy, etc.

Thus, we wind up adopting plans that balance (1) plausibility based on direct experience, (2) plausibility based on prior beliefs, and (3) desirability based on anticipated reward.

Credit assignment is a natural part of the framework because one aspect of the hypotheses is a hypothesized mechanism about what in the world causes reward.

It also seems plausibly compatible with brains and Hebbian learning.
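One way to make the "swarm of hypotheses" balancing act concrete (a toy formalization of my own, with made-up numbers and a hand-rolled scoring rule, not anything with empirical support):

```python
import math

# Each hypothesis/plan gets an unnormalized weight combining:
# (1) fit to direct experience, (2) prior compatibility, (3) anticipated reward.
def weight(p_data_given_h, prior_h, expected_reward, beta=1.0):
    # Bayes-like evidence terms times an exponential reward bonus.
    return p_data_given_h * prior_h * math.exp(beta * expected_reward)

plans = {
    "plan A": weight(p_data_given_h=0.8, prior_h=0.5, expected_reward=1.0),
    "plan B": weight(p_data_given_h=0.9, prior_h=0.4, expected_reward=0.2),
}
total = sum(plans.values())
posterior = {name: w / total for name, w in plans.items()}

# Plan A wins mostly on anticipated reward, despite fitting the data slightly worse.
assert posterior["plan A"] > posterior["plan B"]
```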

I'm not sure if this answers any of your questions ... Just brainstorming :-)