Posts

Reward Is Not Enough 2021-06-16T13:52:33.745Z
Book review: "Feeling Great" by David Burns 2021-06-09T13:17:59.411Z
Supplement to "Big picture of phasic dopamine" 2021-06-08T13:08:00.832Z
Big picture of phasic dopamine 2021-06-08T13:07:43.192Z
Electric heat pumps (Mini-Splits) vs Natural gas boilers 2021-05-30T15:51:53.580Z
Young kids catching COVID: how much to worry? 2021-04-20T18:03:32.499Z
Solving the whole AGI control problem, version 0.0001 2021-04-08T15:14:07.685Z
My AGI Threat Model: Misaligned Model-Based RL Agent 2021-03-25T13:45:11.208Z
Against evolution as an analogy for how humans will create AGI 2021-03-23T12:29:56.540Z
Acetylcholine = Learning rate (aka plasticity) 2021-03-18T13:57:39.865Z
Is RL involved in sensory processing? 2021-03-18T13:57:28.888Z
Comments on "The Singularity is Nowhere Near" 2021-03-16T23:59:30.667Z
Book review: "A Thousand Brains" by Jeff Hawkins 2021-03-04T05:10:44.929Z
Full-time AGI Safety! 2021-03-01T12:42:14.813Z
Late-talking kids and "Einstein syndrome" 2021-02-03T15:16:05.284Z
[U.S. specific] PPP: free money for self-employed & orgs (time-sensitive) 2021-01-09T19:53:09.088Z
Multi-dimensional rewards for AGI interpretability and control 2021-01-04T03:08:41.727Z
Conservatism in neocortex-like AGIs 2020-12-08T16:37:20.780Z
Supervised learning in the brain, part 4: compression / filtering 2020-12-05T17:06:07.778Z
Inner Alignment in Salt-Starved Rats 2020-11-19T02:40:10.232Z
Supervised learning of outputs in the brain 2020-10-26T14:32:54.061Z
"Little glimpses of empathy" as the foundation for social emotions 2020-10-22T11:02:45.036Z
My computational framework for the brain 2020-09-14T14:19:21.974Z
Emotional valence vs RL reward: a video game analogy 2020-09-03T15:28:08.013Z
Three mental images from thinking about AGI debate & corrigibility 2020-08-03T14:29:19.056Z
Can you get AGI from a Transformer? 2020-07-23T15:27:51.712Z
Selling real estate: should you overprice or underprice? 2020-07-20T15:54:09.478Z
Mesa-Optimizers vs “Steered Optimizers” 2020-07-10T16:49:26.917Z
Gary Marcus vs Cortical Uniformity 2020-06-28T18:18:54.650Z
Building brain-inspired AGI is infinitely easier than understanding the brain 2020-06-02T14:13:32.105Z
Help wanted: Improving COVID-19 contact-tracing by estimating respiratory droplets 2020-05-22T14:05:10.479Z
Inner alignment in the brain 2020-04-22T13:14:08.049Z
COVID transmission by talking (& singing) 2020-03-29T18:26:55.839Z
COVID-19 transmission: Are we overemphasizing touching rather than breathing? 2020-03-23T17:40:14.574Z
SARS-CoV-2 pool-testing algorithm puzzle 2020-03-20T13:22:44.121Z
Predictive coding and motor control 2020-02-23T02:04:57.442Z
On unfixably unsafe AGI architectures 2020-02-19T21:16:19.544Z
Book review: Rethinking Consciousness 2020-01-10T20:41:27.352Z
Predictive coding & depression 2020-01-03T02:38:04.530Z
Predictive coding = RL + SL + Bayes + MPC 2019-12-10T11:45:56.181Z
Thoughts on implementing corrigible robust alignment 2019-11-26T14:06:45.907Z
Thoughts on Robin Hanson's AI Impacts interview 2019-11-24T01:40:35.329Z
steve2152's Shortform 2019-10-31T14:14:26.535Z
Human instincts, symbol grounding, and the blank-slate neocortex 2019-10-02T12:06:35.361Z
Self-supervised learning & manipulative predictions 2019-08-20T10:55:51.804Z
In defense of Oracle ("Tool") AI research 2019-08-07T19:14:10.435Z
Self-Supervised Learning and AGI Safety 2019-08-07T14:21:37.739Z
The Self-Unaware AI Oracle 2019-07-22T19:04:21.188Z
Jeff Hawkins on neuromorphic AGI within 20 years 2019-07-15T19:16:27.294Z
Is AlphaZero any good without the tree search? 2019-06-30T16:41:05.841Z

Comments

Comment by Steven Byrnes (steve2152) on Reward Is Not Enough · 2021-06-17T19:52:57.971Z · LW · GW

how does it avoid wireheading

Um, unreliably, at least by default. Like, some humans are hedonists, others aren't.

I think there's a "hardcoded" credit assignment algorithm. When there's a reward prediction error, that algorithm primarily increments the reward-prediction / value associated with whatever stuff in the world model became newly active maybe half a second earlier. And maybe to a lesser extent, it also increments the reward-prediction / value associated with anything else you were thinking about at the time. (I'm not sure of the gory details here.)
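
In vague Python-ish pseudocode, the update rule I have in mind is something like the following; every detail (the timing window, the learning rates, the additivity) is a placeholder rather than a claim:

```python
def credit_assignment(values, recently_active, also_in_mind, rpe,
                      lr_recent=0.5, lr_other=0.05):
    """values: dict mapping world-model concept -> reward-prediction / value.
    recently_active: set of concepts that became newly active ~0.5 s before the RPE.
    also_in_mind: set of other concepts you were thinking about at the time.
    rpe: the (signed) reward prediction error."""
    for concept in recently_active:
        values[concept] += lr_recent * rpe      # primary credit
    for concept in also_in_mind - recently_active:
        values[concept] += lr_other * rpe       # weaker, secondary credit
    return values
```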

Anyway, insofar as "the reward signal itself" is part of the world-model, it's possible that reward-prediction / value will wind up attached to that concept. And then that's a desire to wirehead. But it's not inevitable. Some of the relevant dynamics are:

  • Timing—if credit goes mainly to signals that slightly precede the reward prediction error, then the reward signal itself is not a great fit.
  • Explaining away—once you have a way to accurately predict some set of reward signals, it makes the reward prediction errors go away, so the credit assignment algorithm stops running for those signals. So the first good reward-predicting model gets to stick around by default. Example: we learn early in life that the "eating candy" concept predicts certain reward signals, and then we get older and learn that the "certain neural signals in my brain" concept predicts those same reward signals too. But just learning that fact doesn't automatically translate into "I really want those certain neural signals in my brain". Only the credit assignment algorithm can make a thought appealing, and if the rewards are already being predicted then the credit assignment algorithm is inactive. (This is kinda like the behaviorism concept of blocking.)
  • There may be some kind of bias to assign credit to predictive models that are simple functions of sensory inputs, when such a model exists, other things equal. (I'm thinking here of the relation between amygdala predictions, which I think are restricted to relatively simple functions of sensory input, versus mPFC predictions, which I think can involve more abstract situational knowledge. I'm still kinda confused about how this works though.)
  • There's a difference between hedonism-lite ("I want to feel good, although it's not the only thing I care about") and hedonism-level-10 ("I care about nothing whatsoever except feeling good"). My model would suggest that hedonism-lite is widespread, but hedonism-level-10 is vanishingly rare or nonexistent, because it requires that somehow all value gets removed from absolutely everything in the world-model except that one concept of the reward signal.

For AGIs we would probably want to do other things too, like (somehow) use transparency to find "the reward signal itself" in the world-model and manually fix its reward-prediction / value at zero, or whatever else we can think of. Also, I think the more likely failure mode is "wireheading-lite", where the desire to wirehead is trading off against other things it cares about, and then hopefully conservatism (section 2 here) can help prevent catastrophe.

Comment by Steven Byrnes (steve2152) on Reward Is Not Enough · 2021-06-17T17:48:46.902Z · LW · GW

Thanks!

I had totally forgotten about your subagents post.

this post doesn't cleanly distinguish between reward-maximization and utility-maximization

I've been thinking that they kinda blend together in model-based RL, or at least the kind of (brain-like) model-based RL AGI that I normally think about. See this comment and surrounding discussion. Basically, one way to do model-based RL is to have the agent create a predictive model of the reward and then judge plans based on their tendency to maximize "the reward as currently understood by my predictive model". Then "the reward as currently understood by my predictive model" is basically a utility function. But at the same time, there's a separate subroutine that edits the reward prediction model (≈ utility function) to ever more closely approximate the true reward function (by some learning algorithm, presumably involving reward prediction errors).

In other words: At any given time, the part of the agent that's making plans and taking actions looks like a utility maximizer. But if you lump together that part plus the subroutine that keeps editing the reward prediction model to better approximate the real reward signal, then that whole system is a reward-maximizing RL agent.
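
Here's a runnable toy of that picture. Everything in it (the "states", the learning rate, the 20%-random exploration hack) is my own oversimplification, just to show the two nested processes:

```python
import random

true_reward = {"A": 0.0, "B": 1.0, "C": 0.2}    # the real reward function
reward_model = {s: 0.0 for s in true_reward}     # "reward as currently understood"

def choose_state(model):
    return max(model, key=model.get)             # inner loop: acts like a utility maximizer

def update_model(model, state, lr=0.5):
    rpe = true_reward[state] - model[state]      # reward prediction error
    model[state] += lr * rpe                     # outer loop: edit the "utility function"

for _ in range(50):
    state = choose_state(reward_model) if random.random() < 0.8 \
            else random.choice(list(true_reward))   # a bit of exploration
    update_model(reward_model, state)

print(reward_model)   # approaches true_reward for the states it keeps visiting
```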

Please tell me if that makes any sense or not; I've been planning to write pretty much exactly this comment (but with a diagram) into a short post.

Comment by Steven Byrnes (steve2152) on Reward Is Not Enough · 2021-06-16T18:30:23.429Z · LW · GW

I'm all for doing lots of testing in simulated environments, but the real world is a whole lot bigger and more open and different than any simulation. Goals / motivations developed in a simulated environment might or might not transfer to the real world in the way you, the designer, were expecting.

So, maybe, but for now I would call that "an intriguing research direction" rather than "a solution".

Comment by Steven Byrnes (steve2152) on Reward Is Not Enough · 2021-06-16T17:46:00.073Z · LW · GW

Right, the word "feasibly" is referring to the bullet point that starts "Maybe “Reward is connected to the abstract concept of ‘I want to be able to sing well’?”". Here's a little toy example we can run with: teaching an AGI "don't kill all humans". So there are three approaches to reward design that I can think of, and none of them seem to offer a feasible way to do this (at least, not with currently-known techniques):

  1. The agent learns by experiencing the reward. This doesn't work for "don't kill all humans" because when the reward happens it's too late.
  2. The reward calculator is sophisticated enough to understand what the agent is thinking, and issue rewards proportionate to the probability that the current thoughts and plans will eventually lead to the result-in-question happening. So the AGI thinks "hmm, maybe I'll blow up the sun", and the reward calculator recognizes that merely thinking that thought just now incrementally increased the probability that the AGI will kill all humans, and so it issues a negative reward. This is tricky because the reward calculator needs to have an intelligent understanding of the world, and of the AGI's thoughts. So basically the reward calculator is itself an AGI, and now we need to figure out its rewards. I'm personally quite pessimistic about approaches that involve towers-of-AGIs-supervising-other-AGIs, for reasons in section 3.2 here, although other people would disagree with me on that (partly because they are assuming different AGI development paths and architectures than I am).
  3. Same as above, but instead of a separate reward calculator estimating the probability that a thought or plan will lead to the result-in-question, we allow the AGI itself to do that estimation, by flagging a concept in its world-model called "I will kill all humans", and marking it as "very bad and important" somehow. (The inspiration here is a human who somehow winds up with the strong desire "I want to get out of debt". Having assigned value to that abstract concept, the human can assess for themselves the probabilities that different thoughts will increase or decrease the probability of that thing happening, and sorta issue themselves a reward accordingly.) The tricky part is (A) making sure that the AGI does in fact have that concept in its world-model (I think that's a reasonable assumption, at least after some training), (B) finding that concept in the massive complicated opaque world-model, in order to flag it. So this is the symbol-grounding problem I mentioned in the text. I can imagine solving it if we had really good interpretability techniques (techniques that don't currently exist), or maybe there are other methods, but it's an unsolved problem as of now.
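
Just to illustrate the easy half of approach 3 (this assumes the hard parts, (A) and (B), are already solved; every name and number below is hypothetical):

```python
CATASTROPHE_CONCEPT = "I will kill all humans"   # the flagged world-model concept
CATASTROPHE_WEIGHT = -1e9                        # "very bad and important"

def plan_value(base_value, p_concept_comes_true):
    """base_value: the plan's ordinary estimated value.
    p_concept_comes_true: the AGI's *own* estimate that executing this plan
    makes the flagged concept come true."""
    return base_value + CATASTROPHE_WEIGHT * p_concept_comes_true

# A plan worth 100 "points" with a one-in-a-million estimated chance of the
# flagged concept coming true ends up at 100 - 1000 = -900, i.e. rejected.
print(plan_value(100.0, 1e-6))
```
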
Comment by Steven Byrnes (steve2152) on Looking Deeper at Deconfusion · 2021-06-13T23:28:35.325Z · LW · GW

Is there any good AI alignment research that you don't classify as deconfusion? If so, can you give some examples?

Comment by Steven Byrnes (steve2152) on Comment on the lab leak hypothesis · 2021-06-13T17:04:05.615Z · LW · GW

I'm not remotely qualified to comment on this, but fwiw in the Mojiang Mine Theory (which says it was a lab leak, but did not involve GOF), six miners caught the virus from bats (and/or each other), and then the virus spent four months replicating within the body of one of these poor guys as he lay sick in a hospital (and then of course samples were sent to WIV and put in storage).

This would explain (2) because four months in this guy's body (especially lungs) allows tons of opportunity for the virus to evolve and mutate and recombine in order to adapt to the human body, and maybe it also explains (1) either randomly or via recombination between viral and human DNA (if that makes sense?), again during those four months in this poor guy's body.

Comment by Steven Byrnes (steve2152) on Inner Alignment in Salt-Starved Rats · 2021-06-11T20:42:07.648Z · LW · GW

Thanks! This is very interesting!

there is at least one steak neuron in my own hippocampus, and it can be stimulated by hearing the word, and persistent firing of it will cause episodic memories...to rise up

Oh yeah, I definitely agree that this is an important dynamic. I think there are two cases. In the case of episodic memory I think you're kinda searching for one of a discrete (albeit large) set of items, based on some aspect of the item. So this is a pure autoassociative memory mechanism. The other case is when you're forming a brand new thought. I think of it like, your thoughts are made up of a bunch of little puzzle pieces that can snap together, but only in certain ways (e.g. you can't visualize a "falling stationary rock", but you can visualize a "blanket made of banana peels"). I think you can issue top-down mandates that there should be a thought containing a certain small set of pieces, and then your brain will search for a way to build out a complete thought (or plan) that includes those pieces. Like "wanting to fit the book in the bag" looks like running a search for a self-consistent thought that ends with the book sliding smoothly into the bag. There might be some autoassociative memory involved here too, not sure, although I think it mainly winds up vaguely similar to belief-propagation algorithms in Bayesian PGMs.

Anyway, the hunger case could look like invoking the piece-of-a-thought:

Piece-of-a-thought X: "[BLANK] and then I eat yummy food"

…and then the search algorithm looks for ways to flesh that out into a complete plausible thought.

I guess your model is more like "the brainstem reaches up and activates Piece-of-a-thought X" and my model is more like "the brainstem waits patiently for the cortex to activate Piece-of-a-thought X, and as soon as it does, it says YES GOOD THANKS, HERE'S SOME REWARD". And then very early in infancy the cortex learns (by RL) that when its own interoceptive inputs indicate hunger, then it should activate piece-of-a-thought X.

Maybe you'll say: eating is so basic, this RL mechanism seems wrong. Learning takes time, but infants need to eat, right? But then my response would be: eating is basic and necessary from birth, but doesn't need to involve the cortex. There can be a hardwired brainstem circuit that says "if you see a prey animal, chase it and kill it", and another that says "if you smell a certain smell, bite on it", and another that says "when there's food in your mouth, chew it and swallow it", etc. The cortex is for learning more complicated patterns, I think, and by the time it's capable of doing useful things in general, it can also learn this one simple little pattern, i.e. that hunger signals imply reward-for-thinking-about-eating.
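
As a cartoon of that last claim (hunger signals imply reward-for-thinking-about-eating), here's a tiny tabular-RL toy; obviously not a brain model, and the states, actions, and numbers are all made up:

```python
import random

actions = ["activate_thought_X", "something_else"]
q = {(hungry, a): 0.0 for hungry in (True, False) for a in actions}

def brainstem_reward(hungry, action):
    # "YES GOOD THANKS, HERE'S SOME REWARD" (but only when hungry)
    return 1.0 if (hungry and action == "activate_thought_X") else 0.0

for _ in range(5000):
    hungry = random.random() < 0.5
    if random.random() < 0.1:                                 # explore
        action = random.choice(actions)
    else:                                                      # exploit
        action = max(actions, key=lambda a: q[(hungry, a)])
    q[(hungry, action)] += 0.1 * (brainstem_reward(hungry, action) - q[(hungry, action)])

print(q)   # high value for (hungry, activate_thought_X), ~0 for everything else
```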

insula

FWIW, in the scheme here, one part of insular cortex is an honorary member of the "agranular prefrontal cortex" club—that's based purely on this quote I found in Wise 2017: "Although the traditional anatomical literature often treats the orbitofrontal and insular cortex as distinct entities, a detailed analysis of their architectonics, connections, and topology revealed that the agranular insular areas are integral parts of an “orbital prefrontal network”". So this is a "supervised learning" part (if you believe me), and I agree with you that it may well more specifically involve predictions about "feeling better after consuming something". I also think this is probably the part relevant to your comment "the insula's supervised learning algorithms can be hacked?".

Another part of the insula is what Lisa Feldman Barrett calls "primary interoceptive cortex", i.e. she is suggesting that it learns a vocabulary of patterns that describe incoming interoceptive (body status) signals, analogously to how primary visual cortex learns a vocabulary of patterns that describe incoming visual signals, primary auditory cortex learns a vocabulary of patterns that describe incoming auditory signals, etc.

Those are the two parts of the insula that I know about. There might be other things in the insula too.

caudate

I didn't explicitly mention caudate here but it's half of "dorsal striatum". The other half is putamen—I think they're properly considered as one structure. "Dorsal striatum" is the striatum associated with motor-control cortex and executive-function cortex, more or less. I'm not sure how that breaks down between caudate and putamen. I'm also not sure why caudate was active in that fMRI paper you found.

hippocampus

I think I draw more of a distinction between plans and memories than you, and put hippocampus on the "memory" side. (I'm thinking roughly "hippocampus = navigation (in all mammals) and first-person memories (only in humans)", and "dorsolateral prefrontal cortex is executive function and planning (in humans)".) I'm not sure exactly what the fMRI task was, but maybe it involved invoking memories?

Comment by Steven Byrnes (steve2152) on The Credit Assignment Problem · 2021-06-11T13:53:08.775Z · LW · GW

a system which needs a protected epistemic layer sounds suspiciously like a system that can't tile

I stand as a counterexample: I personally want my epistemic layer to have accurate beliefs—y'know, having read the sequences… :-P

I think of my epistemic system like I think of my pocket calculator: a tool I use to better achieve my goals. The tool doesn't need to share my goals.

The way I think about it is:

  • Early in training, the AGI is too stupid to formulate and execute a plan to hack into its epistemic level.
  • Late in training, we can hopefully get to the place where the AGI's values, like mine, involve a concept of "there is a real world independent of my beliefs", and its preferences involve the state of that world, and therefore "get accurate beliefs" becomes instrumentally useful and endorsed.
  • In between … well … in between, we're navigating treacherous waters …

Second, there's an obstacle to pragmatic/practical considerations entering into epistemics. We need to focus on predicting important things; we need to control the amount of processing power spent; things in that vein. But (on the two-level view) we can't allow instrumental concerns to contaminate epistemics! We risk corruption!

I mean, if the instrumental level has any way whatsoever to influence the epistemic level, it will be able to corrupt it with false beliefs if it's hell-bent on doing so, and if it's sufficiently intelligent and self-aware. But remember we're not protecting against a superintelligent adversary; we're just trying to "navigate the treacherous waters" I mentioned above. So the goal is to allow what instrumental influence we can on the epistemic system, while making it hard and complicated to outright corrupt the epistemic system. I think the things that human brains do for that are:

  1. The instrumental level gets some influence over what to look at, where to go, what to read, who to talk to, etc.
  2. There's a trick (involving acetylcholine) where the instrumental level has some influence over a multiplier on the epistemic level's gradients (a.k.a. learning rate). So the epistemic level always updates towards "more accurate predictions on this frame", but it updates infinitesimally in situations where prediction accuracy is instrumentally useless, and it updates strongly in situations where prediction accuracy is instrumentally important. (There's a minimal sketch of this after the list.)
  3. There's a different mechanism that creates the same end result as #2: namely, the instrumental level has some influence over what memories get replayed more or less often.
  4. For #2 and #3, the instrumental level has some influence but not complete influence. There are other hardcoded algorithms running in parallel and flagging certain things as important, and the instrumental level has no straightforward way to prevent that from happening. 
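
Here's the minimal sketch of #2 promised above. It's a toy, not a claim about neural implementation; "instrumental_importance" is just a stand-in for the acetylcholine-like multiplier:

```python
def epistemic_update(belief, observation, instrumental_importance, base_lr=0.1):
    """belief, observation: scalars for simplicity.
    instrumental_importance: in [0, 1], set by the instrumental level."""
    prediction_error = observation - belief
    return belief + base_lr * instrumental_importance * prediction_error

# The same prediction error moves the belief 10x more when the instrumental
# level flags the situation as important:
print(epistemic_update(0.0, 1.0, 1.0), epistemic_update(0.0, 1.0, 0.1))
```
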
Comment by Steven Byrnes (steve2152) on Inner Alignment in Salt-Starved Rats · 2021-06-10T20:53:24.926Z · LW · GW

Thanks!! I largely agree with what you wrote.

I was focusing on the implementation of a particular aspect of that. Specifically, when you're doing what you call "thing modeling", the "things" you wind up with are entries in a complicated learned world-model—e.g. "thing #6564457" is a certain horrifically complicated statistical regularity in multimodal sensory data, something like "a prediction that thing #289347 is present, and thing #89672, and thing #68972, but probably not thing #903672", or whatever.

Meanwhile I agree with you that there is some brainstem / hypothalamus function (outside the learned world-model) that can evaluate how biologically adaptive it would be to eat food with a certain profile of flavors / smells / etc., given the current readings of various sensors detecting nutrient deficiencies etc. (That component actually seems quite straightforward to me.)

And then my particular focus is how exactly the brain builds an interface into and out of the world-model, which is a prerequisite for learning that this particular statistical regularity (in the learned world-model) corresponds to a particular vector of sweetness, savoriness, etc. (in the brainstem), which the brainstem can analyze and declare likely to satisfy current physiological needs, and therefore worth trying to eat (back in the learned world-model).

If you look closely at what you wrote, I think you'll find a few places where you need to transfer information into and out of the learned world-model. That's fine, but there has to be a way that that works, and that's the part I was especially interested in.

I guess my underlying assumption is that this interfacing isn't a trivial thing—like I don't think you can just casually say "The world-model shall have an item type called 'food' in it" and then there's an obvious way to make that happen. I think the world-model is built from the ground up, as learned patterns, and patterns in the patterns, etc., so you generally need stories for how things are learned. At any rate, that type of algorithm seems plausible to me (from everything I've read and thought about), and I haven't seen any alternative story that makes sense to me so far.

When the vector of various kinds of "hunger levels" changes, a relatively hard coded circuit probably exists that (abstractly (maybe even literally?)) assigns each food a new dot product value in terms of general food goodness, and foods with high values "sort to the top", after which a whole planning engine kicks in, confabulating plans for getting any or all such high value foods and throwing out the implausible plans, until a food with a high value and a plausible plan is left over.

It sounds like you want to start with the hypothalamus & brainstem providing a ranked list of all possible foods, and then the world-model finds one that can be eaten. But I want to go basically in the opposite direction, where the planner (working within the learned world-model) proposes a thought that involves eating ("I could eat that carrot in the fridge"), and the hypothalamus & brainstem evaluate how appealing that plan is ("carrots are sweet, I'm low in sugar right now, 7/10"), and then it sends dopamine to reward the thinking of appealing thoughts (and moreso if they're part of realistic likely-to-succeed plans). Like, if I'm really hungry, I think I'm more likely to decide to eat the first easily-accessible food that pops into my head, rather than working my way through a long list of possible foods that would be hypothetically better ("no I don't have a coconut smoothie, no I don't have fried clams, ..."). Then, through the magic of reinforcement learning, the planner gradually learns to skillfully and quickly come up with appropriate and viable foods.

Comment by Steven Byrnes (steve2152) on Book review: "Feeling Great" by David Burns · 2021-06-10T17:38:25.747Z · LW · GW

I do like the "How To Talk" book and definitely use those techniques on my kids ("Oh, you're very upset, you're sad that we ran out of red peppers..." --me 20 minutes ago) though I haven't successfully started the habit of using it on adults. (Last time I tried I was accused of being condescending, guess I haven't quite gotten it down yet.) "Nonviolent Communication" and other sources hit that theme too.

…But I don't think that's quite it. That would be "positive reframing" without "magic dial". It's not just about acknowledging that the negative thought exists to address certain needs, it's about making sure that those needs continue to be addressed. "Magic dial" is one easy way to do so—if the negative thought addresses a set of needs, then fine, keep thinking the negative thought, and think it often enough to address those needs, and no more often than that. But the other part is, by calling out the needs to awareness, and thinking about how they can be addressed, you might come up with other solutions that don't involve thinking the negative thought.

Comment by Steven Byrnes (steve2152) on Big picture of phasic dopamine · 2021-06-10T14:24:47.129Z · LW · GW

I'm proposing that (1) the hypothalamus has an input slot for "flinch now", (2) VTA has an output signal for "should have flinched", (3) there is a bundle of partially-redundant side-by-side loops (see the "probability distribution" comment) that connect specifically to both (1) and (2), by a genetically-hardcoded mechanism.

I take your comment to be saying: Wouldn't it be hard for the brain to orchestrate such a specific pair of connections across a considerable distance?

Well, I'm very much not an expert on how the brain wires itself up. But I think there's gotta be some way that it can do things like that. I feel like those kinds of feats of wiring are absolutely required for all kinds of reasons. Like, I think motor cortex connects directly to spinal hand-control nerves, but not foot-control nerves. How do the output neurons aim their paths so accurately, such that they don't miss and connect to the foot nerves by mistake? Um, I don't know, but it's clearly possible. "Molecular signaling" or something, I guess?

Alternatively we might imagine some separate mechanism for priming the developing amygdala to start out with a diverse yet sensible array of behavior proposals, and the brainstem could learn what its outputs correspond to and then signal them appropriately.

Hmm, one reasonable (to me) possibility along these lines would be something like: "VTA has 20 dopamine output signals, and they're guided to wind up spread out across the amygdala, but not with surgical precision. Meanwhile the corresponding amygdala loops terminate in an "input zone" of the lateral hypothalamus, but not to any particular spot, instead they float around unsure of exactly what hypothalamus "entry point" to connect to. And there are 20 of these intended "entry points" (collections of neurons for flinching, scowling, etc.). OK, then during embryonic development, the entry-point neurons are firing randomly, and that signal goes around the loop—within the hypothalamus and to VTA, then up to the amygdala, then back down to that floating neuron. Then Hebbian learning—i.e. matching the random code—helps the right loop neuron find its way to the matching hypothalamus entry point."

I'm not sure if that's exactly what you're proposing, but that seems like a perfectly plausible way for the brain to orchestrate these connections during embryonic development. I do have a hunch that this isn't what happens, that the real mechanism is "molecular signaling" instead. But like I said, I'm not an expert, and I certainly wouldn't be shocked to learn that the brain embryonic wiring mechanism involves this kind of thing where it closes a loop by sending a random code around the loop and Hebbian-learning the final connection.
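
To make the "send a random code around the loop" part concrete, here's a purely illustrative toy: 20 loops, random codes, Hebbian matching, with none of the numbers meaning anything:

```python
import numpy as np

rng = np.random.default_rng(0)
n_loops, n_steps = 20, 500

entry_activity = rng.standard_normal((n_steps, n_loops))   # random code fired at each entry point
loop_return = entry_activity                               # each loop carries its own code back

# Hebbian weights between the returning axons and the candidate entry points
weights = sum(np.outer(loop_return[t], entry_activity[t]) for t in range(n_steps))

matches = weights.argmax(axis=1)
print((matches == np.arange(n_loops)).all())   # True: each loop finds its matching entry point
```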

Comment by Steven Byrnes (steve2152) on The reverse Goodhart problem · 2021-06-10T13:44:28.575Z · LW · GW

Let me try to repair Goodhart's law to avoid these problems:

By statistics, we should very generally expect two random variables to be uncorrelated unless there's a "good reason" to expect them to be correlated. Goodhart's law says that if U and V are correlated in some distribution, then (1) if a powerful optimizer tries to maximize U, then it will by default go far out of the distribution, (2) the mere fact that U and V were correlated in the distribution does not in itself constitute a "good reason" to expect them to be correlated far out of the distribution, so by default they won't be; (3) therefore we expect Goodhart's law "by default": you optimize U, thus go out of the distribution, thus break the correlation between U and V, and then V regresses back down to its mean.
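
Here's a cartoon simulation of that argument. The functional forms are completely made up; the point is just that a proxy U can track the true target V over the ordinary range and then come apart when an optimizer pushes far outside it:

```python
import numpy as np

def V(x):          # "human flourishing": rises over the ordinary range, collapses at extremes
    return x - 0.01 * x**2

def U(x):          # "GDP": a proxy that tracks V over the ordinary range
    return x

in_dist = np.linspace(0, 10, 100)                   # the ordinary range
print(np.corrcoef(U(in_dist), V(in_dist))[0, 1])    # very close to 1: correlated in-distribution

x_opt = 1000.0                                       # what a hard U-optimizer picks
print(U(x_opt), V(x_opt))                            # U is enormous; V has collapsed
```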

So then we can start going through examples:

  • GDP vs human flourishing: This example fits all the defaults. There is no "good reason" to expect an extremely-out-of-distribution correlation between "GDP" and "human flourishing"—really the only reason to expect a correlation is the fact that they're correlated in-distribution, and by itself that's not enough to count as a "good reason". And we definitely expect that powerfully maximizing GDP would push it far out-of-distribution. Therefore we expect Goodhart's law—if you maximize GDP hard enough, then human flourishing will stop going up and start going down as it regresses to the mean.
  • GDP vs "twice GDP minus human flourishing": Here there is a very good a priori reason to expect an extremely-out-of-distribution correlation between the two sides—namely the fact that "GDP" is part of both. So the default expectation doesn't apply.
  • GDP vs log(GDP): Here there's an even more obvious, a priori reason to expect a robust correlation across all possible configurations of matter in all possible universes. So the default expectation doesn't apply.
  • "Mass of an object" vs "total number of protons and neutrons in the object": The default expectation that "optimization takes you far out of the distribution" doesn't really apply here, because regularities hold in a much broader "distribution" if the regularity comes from basic laws of physics, rather than from regularities concerning human-sized objects and events. So you can have a quite powerful optimization process trying to maximize an object's mass, yet stay well within the distribution of environments where this particular correlation remains robust. (A powerful enough optimizer could eventually make a black hole, which would indeed break this correlation, and then we get Goodhart's law. Other physics-derived correlations would be truly unbreakable though, like inertial mass vs gravitational mass.)
  • "The utility of the worst-off human" vs "The utility of the average human": Is there a "good reason" to expect these to be correlated extremely-out-of-distribution? Yes! Mathematically, if the former goes to infinity, then the latter has to go to infinity too. So we have a sound a priori reason to at least question the Goodhart's law default. We need a more object-level analysis to decide what would happen.
Comment by Steven Byrnes (steve2152) on Big picture of phasic dopamine · 2021-06-10T01:27:21.370Z · LW · GW

Right, so I'm saying that the "supervised learning loops" get highly specific feedback, e.g. "if you get whacked in the head, then you should have flinched a second or two ago", "if a salty taste is in your mouth, then you should have salivated a second or two ago", "if you just started being scared, then you should have been scared a second or two ago", etc. etc. That's the part that I'm saying trains the amygdala and agranular prefrontal cortex.

Then I'm suggesting that the Success-In-Life thing is a 1D reward signal to guide search in a high-dimensional space of possible thoughts to think, just like RL. In this case, it's not "each plan is one loop", because there's a combinatorial explosion of possible thoughts you can think, and there are not enough loops for that. (It also wouldn't work because for pretty much every thought you think, you've never thought that exact thought before—like you've never put on this particular jacket while humming this particular song and musing about this particular upcoming party...) Instead I think compositionality is involved, such that one plan / thought can involve many simultaneous loops.

Comment by Steven Byrnes (steve2152) on The reverse Goodhart problem · 2021-06-09T23:10:02.743Z · LW · GW

Sorry, why are V and V' equally hard to define? Like, if V is "human flourishing" and U is GDP, then V' is "twice GDP minus human flourishing", which is more complicated than V. I guess you're gonna say "Why not say that V is twice GDP minus human flourishing?"? But my point is: for any particular set U, V, V', you can't claim that V and V' are equally simple, and you can't claim that V and V' are equally correlated with U. Right?

Comment by Steven Byrnes (steve2152) on Big picture of phasic dopamine · 2021-06-09T15:07:06.535Z · LW · GW

That's interesting, thanks!

good/bad/neutral is a thing, but it seems to be defined largely with respect to our expectation of what was going to happen in the situation we were in.

I agree that this is a very important dynamic. But I also feel like, if someone says to me, "I keep a kitten in my basement and torture him every second of every day, but it's no big deal, he must have gotten used to it by now", I mean, I don't think that reasoning is correct, even if I can't quite prove it or put my finger on what's wrong. I guess that's what I was trying to get at with that "evolutionary prior" comment: maybe there's a hardcoded absolute threshold such that you just can't "get used to" being tortured, and set that as your new baseline, and stop actively disliking it? But I don't know, I need to think about it more, there's also a book I want to read on the neuroscience of pleasure and pain, and I've also been meaning to look up what endorphins do to the brain. (And I'm happy to keep chatting here!)

I don't have a full explanation of comparing-to-baseline. At first I was gonna say "it's just the reward-prediction-error thing I described: if you expect candy based on your beliefs at 5:05:38, and then you no longer expect candy based on your beliefs at 5:05:39, then that's a big negative reward prediction error" (because the reward-predictor makes its prediction based on slightly-stale brain status information). But that doesn't explain why maybe we still feel raw about it 3 minutes later. Maybe it's like, you had this active piece-of-a-thought "I'm gonna get candy", but it's contradicted by the other piece-of-a-thought "no I'm not", but that appealing piece-of-a-thought "I'm gonna get candy" keeps popping back up for a while, and then keeps getting crushed by reality, and the net result is a bad feeling. Or something? I dunno.

Oh, I think there's also a thing where the brainstem can force the high-level planner to think about a certain thing; like if you get poked on the shoulder it's kinda impossible to ignore. I think I have an idea of what mechanism is involved here … involving acetylcholine and how specific and confident the top-down predictions are, I'm hoping to write this up soon … That might be relevant too. Like if you're being tortured then you can't think about anything else, because of this mechanism. Then that would be like an objective sense in which you can't get used to a baseline of torture the way you can get used to other things.

Comment by Steven Byrnes (steve2152) on Big picture of phasic dopamine · 2021-06-09T14:42:04.695Z · LW · GW

Thanks!

If you Ctrl-F the post you'll find my little paragraph on how my take differs from Marblestone, Wayne, Kording 2016.

I haven't found "meta-RL" to be a helpful way to frame either the bandit thing or the follow-up paper relating it to the brain, more-or-less for reasons here: the normal RL / POMDP expectation is that actions have to depend on previous observations—think of playing an Atari game—and I guess we can call that "learning", but then we'd have to say that a large fraction of RL papers ever written are actually meta-RL papers. More importantly, I just don't find that thinking in those terms leads me to a better understanding of anything, but whatever, YMMV.

I don't agree with everything in the RL book chapter but it's still interesting, thanks for the link.

Comment by Steven Byrnes (steve2152) on Big picture of phasic dopamine · 2021-06-09T14:15:59.719Z · LW · GW

The least-complicated case (I think) is: I (tentatively) think that the hippocampus is more-or-less a lookup table with a finite number of discrete thoughts / memories / locations / whatever (the type of content is different in different species), and a "proposal" is just "which of the discrete things should be activated right now".

A medium-difficulty case is: I think motor cortex stores a bunch of sequences of motor commands which execute different common action sequences. (I'm a believer in the Graziano theory that primary motor cortex, secondary motor cortex, supplementary motor cortex, etc. etc., are all doing the same kind of thing and should be lumped together.) The exact details of the data structures that the brain uses to store these sequences of motor commands are controversial and I don't want to get into it here…

Then the hardest case is the areas that "think thoughts", spawn new ideas, etc., all the cool stuff that leads to human intelligence. (e.g. dorsolateral prefrontal cortex I think.) Things like "I'm going to go to the store" or "what if I differentiate both sides of the equation?". Those things are clearly not isomorphic to a sequence of motor commands. It's higher-level than that. Again, the exact data structures and algorithms involved in representing and searching for these "thoughts" is a very big and controversial topic that I don't want to get into here…

Comment by Steven Byrnes (steve2152) on Against intelligence · 2021-06-08T19:16:20.753Z · LW · GW

I would agree with "superintelligence is not literally omnipotence" but I think you're making overly strong claims in the opposite direction. My reasons are basically contained in Intelligence Explosion Microeconomics, That Alien Message, and Scott Alexander's Superintelligence FAQ. For example...

power seems to be very unrelated to intelligence

I think "very" is much too strong, and insofar as this is true in the human world, that wouldn't necessarily make it true for an out-of-distribution superintelligence, and I think it very much wouldn't be. For example, all you need is superintelligence and an internet connection to find a bunch of zero-day exploits, hack into whatever you like, use it for your own purposes (and/or make tons of money), etc. All you need is superintelligence and an internet connection to carry on millions of personalized charismatic phone conversations simultaneously with people all around the world, in order to convince them, con them, or whatever. All you need is superintelligence and an internet connection to do literally every remote-work job on earth simultaneously.

Also, there are already robot bodies capable of doing I think the vast majority of physical labor jobs. The only reason they're not doing those jobs today is inadequate algorithms.

The apparatus humans use to “understand” other human is not just a complex probabilistic function based on observing them, but rather it’s an immensely complex simulation which we adjust based on our observations, a simulation that we might never be able to efficiently run on a computer.

I think being charismatic over the internet is easier than you're suggesting ... if people would open up to ELIZA, I think they would open up to an AGI that has studied ELIZA and also has had extensive practice talking to people. Also, I don't think that the algorithms underlying human empathy are as computationally intensive as you do, but that's a more complicated story, maybe not worth getting into here.

That aside, the question remains of whether or not solving all “thinking bottlenecks” would leave us with a process of scientific advancement that is somewhat faster than what we have today (slow road to progress) or exponentially faster (singularity).

I think you're overly focused on "scientific advancement". Existing scientific and technological knowledge plus AGI could bring the entire world up to first-world standards of living, and eliminate the need for any human to ever work again. That's nothing to scoff at!

The vast majority of "good thinkers" (under an IQ/math/language/memory == intelligence paradigm) are funnelled towards intern[et] companies, no extra requirements, not even a diploma, if you have enough "raw intelligence". Under the EMH that would indicate those companies have the most need for them. Yet internet companies are essentially devoid of any practical implications when it comes to reality, they aren't always engaged in "zero-sum" games, but they are still "competitive", in that their ultimate reason is to convince people they want/need more things and that those things are more valuable, they aren't "creating" any tangible things. On the other hand, research universities and companies interested in exploring the real world seem to care much less about intelligence...

I'm not sure you're applying the EMH properly. The EMH would imply that the most intelligent people (if choosing jobs purely based on pay) would go to jobs where they have the highest marginal impact on firm revenue, compared to a marginally less intelligent person. If research universities don't offer salaries as high as Facebook, that doesn't mean that research universities don't "care" about getting intelligent people; it probably means, for example, that the exchange rate between marginal professor intelligence and marginal grant revenue isn't high enough to support Facebook-level salaries, and moreover universities are in a part of the job market where lots of smart people will definitely apply for a professor job even if the salary is much lower than Facebook's. The fact that academia has rampant credentialism and Facebook doesn't is, umm, not related to the EMH, I would suggest. I think it's more related to Eliezer's "Inadequate Equilibria" stuff.

Comment by Steven Byrnes (steve2152) on Dangerous optimisation includes variance minimisation · 2021-06-08T12:19:57.212Z · LW · GW

I agree! I'm 95% sure this is in Superintelligence somewhere, but nice to have a more-easily-linkable version.

Comment by Steven Byrnes (steve2152) on We need a standard set of community advice for how to financially prepare for AGI · 2021-06-07T11:57:13.317Z · LW · GW

If you think of it less like "possibly having a lot of money post-AGI" and more like "possibly owning a share of whatever the AGIs produce post-AGI", then I can imagine scenarios where that's very good and important. It wouldn't matter in the worst scenarios or best scenarios, but it might matter in some in-between scenarios, I guess. Hard to say though ...

Comment by Steven Byrnes (steve2152) on We need a standard set of community advice for how to financially prepare for AGI · 2021-06-07T11:49:20.796Z · LW · GW

I think Vicarious AI is doing more AGI-relevant work than anyone. I pore over all their papers. They're private so this doesn't directly answer your question. But what bugs me is: Their investors include Good Ventures & Elon Musk ... So how do they get away with (AFAICT) doing no safety work whatsoever ...?

Comment by Steven Byrnes (steve2152) on My AGI Threat Model: Misaligned Model-Based RL Agent · 2021-06-04T14:48:13.079Z · LW · GW

it's all a big mess

Yup! This was a state-the-problem-not-solve-it post. (The companion solving-the-problem post is this brain dump, I guess.) In particular, just like prosaic AGI alignment, my starting point is not "Building this kind of AGI is a great idea", but rather "This is a way to build AGI that could really actually work capabilities-wise (especially insofar as I'm correct that the human brain works along these lines), and that people are actively working on (in both ML and neuroscience), and we should assume there's some chance they'll succeed whether we like it or not."

FWIW, I'm now thinking of your "value function" as expected utility in Jeffrey-Bolker terms.

Thanks, that's helpful.

how do we define whether a value function is "aligned" (in an inner sense, so, when compared to an outer objective which is being used for training it)?

One way I think I would frame the problem differently than you here is: I'm happy to talk about outer and inner alignment for pedagogical purposes, but I think it's overly constraining as a framework for solving the problem. For example, (Paul-style) corrigibility is I think an attempt to cut through outer and inner alignment simultaneously, as is interpretability perhaps. And like you say, rewards don't need to be the only type of feedback.

We can also set up the AGI to NOOP when the expected value of some action is <0, rather than having it always take the least bad action. (...And then don't use it in time-sensitive situations! But that's fine for working with humans to build better-aligned AGIs.) So then the goal would be something like "every catastrophic action has expected value <0 as assessed by the AGI (and also, the AGI will not be motivated to self-modify or create successors, at least not in a way that undermines that property) (and also, the AGI is sufficiently capable that it can do alignment research etc., as opposed to it sitting around NOOPing all day)".
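
Concretely, the action-selection rule would be something like this (hypothetical interface; "expected_value" stands in for whatever the AGI's learned value function assesses):

```python
def choose_action(candidate_actions, expected_value):
    best = max(candidate_actions, key=expected_value, default=None)
    if best is None or expected_value(best) < 0:
        return "NOOP"     # don't take the least-bad action just because it's least bad
    return best

# If every candidate looks net-negative, the agent does nothing:
print(choose_action(["lie", "threaten"], {"lie": -5.0, "threaten": -2.0}.get))   # NOOP
print(choose_action(["help out"], {"help out": 3.0}.get))                        # help out
```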

So then this could look like a pretty weirdly misaligned AGI but it has a really effective "may-lead-to-catastrophe (directly or indirectly) predictor circuit" attached. (The circuit asks "Does it pattern-match to murder? Does it pattern-match to deception? Does it pattern-match to 'things that might upset lots of people'? Does it pattern-match to 'things that respectable people don't normally do'?...") And the circuit magically never has any false-negatives. Anyway, in that case the framework of "how well are we approximating the intended value function?" isn't quite the right framing, I think.

I think we need stuff from my 'learning normativity' agenda to dodge these bullets.

Yeah I'm very sympathetic to the spirit of that. I'm a bit stumped on how those ideas could be implemented, but it's certainly in the space of things that I continue to brainstorm about...

Comment by Steven Byrnes (steve2152) on An Intuitive Guide to Garrabrant Induction · 2021-06-04T13:46:55.478Z · LW · GW

Sorry if this is a stupid question but wouldn't "LI with no complexity bound on the traders" be trivial? Like, there's a noncomputable trader (brute force proof search + halting oracle) that can just look at any statement and immediately declare whether it's provably false, provably true, or neither. So wouldn't the prices collapse to their asymptotic value after a single step and then nothing else ever happens?

Comment by Steven Byrnes (steve2152) on The Alignment Forum should have more transparent membership standards · 2021-06-04T13:08:59.530Z · LW · GW

The integration with LessWrong means that anyone can still comment

Speaking of this, if I go to AF without being logged in, there's a box at the bottom that says "New comment. Write here. Select text for formatting options... SUBMIT". But non-members can't write comments, right? Seems kinda misleading... Well, I guess I just don't know: What happens if a non-member (either a LW-but-not-AF member or a neither-AF-nor-LW member) writes a comment in the box and presses submit? (I guess I could do the experiment myself, but I don't want to create a test comment that someone then has to go delete.)

To work around this (apparent?) issue, I was planning to start ending some of my AF posts (those where I expect significant non-AF readership) with a line like:

[Email me] (mailto link) or [comment on the lesswrong crosspost] (LW link)

I think this will work, but only if I obfuscate the LW link using bitly or whatever, because by default LW links are auto-converted to AF links when viewed on AF, right? I haven't tried it yet.

(Sorry to shoehorn my general AF meta questions into this thread.)

Comment by Steven Byrnes (steve2152) on The Homunculus Problem · 2021-06-03T14:42:40.526Z · LW · GW

bottom-up attention (ie attention due to interesting stimulus) can be more or less captured by surprise

Hmm. That's not something I would have said.

I guess I think of two ways that sensory inputs can impact top-level processing.

First, I think sensory inputs impact top-level processing when top-level processing tries to make a prediction that is (directly or indirectly) falsified by the sensory input, and that prediction gets rejected, and top-level processing is forced to think a different thought instead.

  • If top-level processing is "paying close attention to some aspect X of sensory input", then that involves "making very specific predictions about aspect X of sensory input", and therefore the predictions are going to keep getting falsified unless they're almost exactly tracking the moment-to-moment status of X.
  • On the opposite extreme, if top-level processing is "totally zoning out", then that involves "not making any predictions whatsoever about sensory input", and therefore no matter what the sensory input is, top-level processing can carry on doing what it's doing.
  • In between those two extremes, we get the situation where top-level processing is making a pretty generic high-level prediction about sensory input, like "there's confetti on the stage". If the confetti suddenly disappeared altogether, it would falsify the top-level hypothesis, triggering a search for a new model, and being "noticed". But if the detailed configuration of the confetti changes—and it certainly will—it's still compatible with the top-level prediction "there's confetti on the stage" being true, and so top-level processing can carry on doing what it's doing without interruption.

So just to be explicit, I think you can have a lot of low-level surprise without it impacting top-level processing. In the confetti example, down in low-level V1, the cortical columns are constantly being surprised by the detailed way that each piece of confetti jiggles around as it falls, I think, but we don't notice if we're not paying top-down attention.

The second way that I think sensory inputs can impact top-level processing is by a very different route, something like sensory input -> amygdala -> hypothalamus -> top-level processing. (I'm not sure of all the details and I'm leaving some things out; more HERE.) I think this route is kinda an autonomous subsystem, in the sense that top-down processing can't just tell it what to do, and it's not trained on the same reward signal as top-level processing is, and the information can flow in a way that totally bypasses top-level processing. The amygdala is trained (by supervised learning) to activate when detecting things that have immediately preceded feelings of excitement / scared / etc. previously in life, and the hypothalamus is running some hardcoded innate algorithm, I think. (Again, more HERE.) When this route activates, there's a chain of events that results in the forcing of top-level processing to start paying attention to the corresponding sensory input (i.e. start issuing very specific predictions about the corresponding sensory input).

I guess it's possible that there are other mechanisms besides these two, but I can't immediately think of anything that these two mechanisms (or something like them) can't explain.

What if we don't like global workspace theory?

I dunno, I for one like global workspace theory. I called it "top-level processing" in this comment to be inclusive to other possibilities :)

Comment by Steven Byrnes (steve2152) on The Homunculus Problem · 2021-06-03T13:39:12.257Z · LW · GW

How would you query low-level details from a high-level node? Don't the hierarchically high-up nodes represent things which range over longer distances in space/time, eliding low-level details like lines?

My explanation would be: it's not a strict hierarchy, there are plenty of connections from the top to the bottom (or at least near-bottom). "Feedforward and feedback projections between regions typically connect to multiple levels of the hierarchy" "It has been estimated that 40% of all possible region-to-region connections actually exist which is much larger than a pure hierarchy would suggest." (ref) (I've heard it elsewhere too.) Also, we need to do compression (throw out information) to get from raw input to top-level, but I think a lot of that compression is accomplished by only attending to one "object" at a time, rapidly flitting from one to another. I'm not sure how far that gets you, but at least it's part of the story I think, in that it reduces the need to throw out low-level details. Another thing is saccades: maybe you can't make high-level predictions about literally every cortical column in V1, but if you can access a subset of columns, then saccades can fill in the gaps.

Why is a query represented as an overconfident false belief?

I have pretty high confidence that "visual imagination" is accessing the same world-model database and machinery as "parsing a visual scene" (and likewise "imagining a sound" vs "parsing a sound", etc.) I find it hard to imagine any alternative to that. Like it doesn't seem plausible that we have two copies of this giant data structure and machinery and somehow keep them synchronized. And introspectively, it does seem to be true that there's some competition where it's hard to simultaneously imagine a sound while processing incoming sounds etc.—I mean, it's always hard to do two things at once, but this seems especially hard.

So then the question is: how can you imagine seeing something that isn't there, without the imagination being overruled by bottom-up sensory input? I guess there has to be some kind of mechanism that allows this, like a mechanism by which top-level processing can choose to prevent (a subset of) sensory input from having its usual strong influence on (a subset of) the network. I don't know what that mechanism is.

Comment by Steven Byrnes (steve2152) on My AGI Threat Model: Misaligned Model-Based RL Agent · 2021-06-02T03:02:27.134Z · LW · GW

Hi again, I finally got around to reading those links, thanks!

I think what you're saying (and you can correct me) is: observation-utility agents are safer (or at least less dangerous) than reward-maximizers-learning-the-reward, because the former avoid falling prey to what you called "the easy problem of wireheading".

So then the context was:

First you said, If we do rollouts to decide what to do, then the value function is pointless, assuming we have access to the reward function.

Then I replied, We don't have access to the reward function, because we can't perfectly predict what will happen in a complicated world.

Then you said, That's bad because that means we're not in the observation-utility paradigm.

But I don't think that's right, or at least not in the way I was thinking of it. We're using the current value function to decide which rollouts are good vs bad, and therefore to decide which action to take. So my "value function" is kinda playing the role of a utility function (albeit messier), and my "reward function" is kinda playing the role of "an external entity that swoops in from time to time and edits the utility function". Like, if the agent is doing terrible things, then some credit-assignment subroutine goes into the value function, looks at what is currently motivating the agent, and sets that thing to not be motivating in the future.

The closest utility function analogy would be: you're trying to make an agent with a complicated opaque utility function (because it's a complicated world). You can't write the utility function down. So instead you code up an automated utility-function-editing subroutine. The way the subroutine works is that sometimes the agent does something which we recognize as bad / good, and then the subroutine edits the utility function to assign lower / higher utility to "things like that" in the future. After many such edits, maybe we'll get the right utility function, except not really because of all the problems discussed in this post, e.g. the incentive to subvert the utility-function-editing subroutine.

So it's still in the observation-utility paradigm I think, or at least it seems to me that it doesn't have an automatic incentive to wirehead. It could want to wirehead, if the value function winds up seeing wireheading as desirable for any reason, but it doesn't have to. In the human example, some people are hedonists, but others aren't.

Sorry if I'm misunderstanding what you were saying.

Comment by Steven Byrnes (steve2152) on Power dynamics as a blind spot or blurry spot in our collective world-modeling, especially around AI · 2021-06-01T19:46:07.269Z · LW · GW

FWIW my "one guy's opinion" is (1) I'm expecting people to build goal-seeking AGIs, and I think by default their goals will be opaque and unstable and full of unpredictable distortions compared to whatever was intended, and solving this problem is necessary for a good future (details), (2) Figuring out how AGIs will be deployed and what they'll be used for in a complicated competitive human world is also a problem that needs to be solved to get a good future. I don't think either of these problems is close to being solved, or that they're likely to be solved "by default".

(I'm familiar with your argument that companies are incentivized to solve single-single alignment, and therefore it will be solved "by default", but I remain pessimistic, at least in the development scenario I'm thinking about, again see here.)

So I think (1) and (2) are both very important things that people should be working on right now. However, I think I might have some intelligent things to say about (1), whereas I have nothing intelligent to say about (2). So that's the main reason I'm working on (1). :-P I do wish you & others luck—and I've said that before, see e.g. section 10 here.  :-)

Comment by Steven Byrnes (steve2152) on Which animals can suffer? · 2021-06-01T12:37:02.638Z · LW · GW

You left out the category of possible answers "Such-and-such type of computational process corresponds to suffering". Under that category, octopuses and ML algorithms might or might not qualify, depending on how exactly the octopus brain works, what the exact ML algorithm is, and what exactly the "such-and-such" criterion I mentioned is. I definitely put far more weight on this category of answers than on the two you suggested.

Comment by Steven Byrnes (steve2152) on Electric heat pumps (Mini-Splits) vs Natural gas boilers · 2021-06-01T01:59:02.036Z · LW · GW

Hmm, well, I was trying to ballpark a "weighted average outdoor temperature", specifically weighted by how much heat I'm using. Like, if the outdoor temperature is only slightly cooler than what I want inside, I need relatively little heat regardless, so the efficiency of that heat isn't all that important. My reference temperature of 30°F (~0°C) is very far from the lowest temperature we experience; it's close to a 24-hour-average temperature during the coldest three months.

I didn't know about HSPF, thanks for the tip! The rating seems to assume "climate region IV" (based on here, for example), which I guess corresponds to this map, which suggests that where I live (Massachusetts) is somewhat colder than climate region IV. Wiki says an electric resistance heater comes out to 3.41, and this says that 11.8 is about the highest HSPF out there (?), so if I divide them I get a weighted-average COP of 3.5, i.e. my initial ballpark guess was right on. But since my climate is colder than "climate region IV", the true number would be even lower than 3.5. (To be clear, there are a bunch of things in this paragraph that I'm not sure about.)
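
(Spelling out that division as a sanity check: HSPF is rated in BTU of heat per watt-hour of electricity, and electric resistance heat delivers 3.412 BTU/Wh by definition, so the ratio is a season-averaged COP for the rating's reference climate, not necessarily mine.)

```python
# Back-of-envelope version of the arithmetic above.
hspf_best = 11.8               # roughly the top-of-the-line HSPF mentioned above
btu_per_wh_resistance = 3.412  # 1 Wh = 3.412 BTU, i.e. electric resistance heat
print(round(hspf_best / btu_per_wh_resistance, 2))  # -> 3.46, i.e. about 3.5
```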

Thanks for bringing up zoning, but we already have individually-settable radiators and do in fact keep unused rooms cold, so I don't think that's relevant to me personally. I'll add a bullet point for the benefit of other readers.

I just haven't gotten around to thinking about solar. One of these days...

Comment by Steven Byrnes (steve2152) on Electric heat pumps (Mini-Splits) vs Natural gas boilers · 2021-05-31T00:45:45.851Z · LW · GW

Thanks for your comment!

That's an interesting electricity price chart. It seems like I'm paying typical rates for my state, and I don't know why it's high compared to other parts of my country. I wouldn't say that's "a flaw in my calculations", since I'm calculating it for myself and I'm not planning to move, but it definitely sheds light on why mini-splits are more attractive for people in other places.

The "COP" in my chart is specifically "COP for heating the interior of a building when it's 30°F (~0°C) outside". I don't think it's true that COP under those conditions is equal to EER/3.4, because EER is not measured under those conditions. EER seems to be measured assuming a much smaller temperature difference between outside and inside. Heat pump COPs get worse and worse as the outdoor-indoor temperature difference gets larger.

Comment by Steven Byrnes (steve2152) on The Homunculus Problem · 2021-05-28T03:06:30.289Z · LW · GW

Thanks for the thought-provoking post! Let me try...

We have a visual system, and it (like everything in the neocortex) comes with an interface for "querying" it. Like, Dileep George gives the example "I'm hammering a nail into a wall. Is the nail horizontal or vertical?" You answer that question by constructing a visual model and then querying it. Or more simply, if I ask you a question about what you're looking at, you attend to something in the visual field and give an answer.

Dileep writes: "An advantage of generative PGMs is that we can train the model once on all the data and then, at test time, decide which variables should act as evidence and which variables should act as targets, obtaining valid answers without retraining the model. Furthermore, we can also decide at test time that some variables fall in neither of the previous two categories (unobserved variables), and the model will use the rules of probability to marginalize them out." (ref) (I'm quoting that verbatim because I'm not an expert on this stuff and I'm worried I'll say something wrong. :-P )

Anyway, I would say that the word "I" is generally referring to the goings-on in the global workspace circuits in the brain, which we can think of as hierarchically above the visual system. The workspace can query the visual system, basically by sending a suite of top-down constraints into the visual system PGM ("there's definitely a vertical line here!" or whatever), allowing the visual system to do its probabilistic inference, and then branching based on the status of some other visual system variable(s).
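
(Here's a toy, brute-force version of that "decide at query time which variables are evidence and which are targets" idea, with invented variables and numbers purely for illustration; a real system would presumably use message passing rather than enumeration.)

```python
# Toy three-variable generative model, queried by brute-force enumeration.
# All variable names and probabilities are made up for illustration.
from itertools import product

def joint(obj, lighting, pixel_dark):
    # P(obj) * P(lighting) * P(pixel_dark | obj, lighting)
    p_obj = {"checker_light": 0.5, "checker_dark": 0.5}[obj]
    p_light = {"shadow": 0.3, "no_shadow": 0.7}[lighting]
    base = 0.8 if obj == "checker_dark" else 0.2
    if lighting == "shadow":
        base = min(1.0, base + 0.15)   # shadows make the pixel look darker
    p_pix = base if pixel_dark else 1.0 - base
    return p_obj * p_light * p_pix

def query(target, evidence):
    """Posterior over `target` given `evidence`; everything else is marginalized out."""
    domains = {"obj": ["checker_light", "checker_dark"],
               "lighting": ["shadow", "no_shadow"],
               "pixel_dark": [True, False]}
    scores = {v: 0.0 for v in domains[target]}
    for assignment in product(*domains.values()):
        a = dict(zip(domains.keys(), assignment))
        if all(a[k] == v for k, v in evidence.items()):
            scores[a[target]] += joint(a["obj"], a["lighting"], a["pixel_dark"])
    z = sum(scores.values())
    return {v: s / z for v, s in scores.items()}

# Same trained model, two different queries chosen "at test time":
print(query("obj", {"pixel_dark": True, "lighting": "shadow"}))
print(query("pixel_dark", {"obj": "checker_light"}))
```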

So in everyday terms we say "When someone asks me a question that's most easily answered by visualizing something or thinking visually or attending to what I'm looking at, that's what I do!" Whereas in fancypants terms we would describe the same thing as "In certain situations, the global workspace circuits have learned (probably via RL) that it is advantageous to query the visual system in certain ways."

So over the course of our lives, we (=global workspace circuits) learn the operation "figure out whether one thing is darker than another thing" as a specific way to query the visual system. And the checker shadow illusion has the fun property that when we query the visual system this way, it gives the wrong answer. We can still "know" the right answer by inferring it through a path that does not involve querying the visual system. Maybe it goes through abstract knowledge instead. And I guess your #3 ("I can occasionally and briefly get my brain to recognize A and B as the same shade") probably looks like a really convoluted visual system query that involves forcing a bunch of PGM variables in unusual coordinated ways that prevent the shade-corrector from activating, or something like that.

What about fixing the mistake? Well, I think the global workspace has basically no control over how the visual system PGM is wired up internally, not only because the visual system has its own learning algorithm that involves minimizing prediction error, not maximizing reward, but also for the simpler reason that it gets locked down and stops learning at a pretty young age, I think. The global workspace can learn new queries, but it might be that there just isn't any way to query the wired-up adult visual system to return the information you want (raw shade comparison). Or maybe with more practice you can get better at your convoluted #3 query...

Not sure about all this...

Comment by Steven Byrnes (steve2152) on Building brain-inspired AGI is infinitely easier than understanding the brain · 2021-05-23T18:57:16.069Z · LW · GW

Thanks! I guess my feeling is that we have a lot of good implementation-level ideas (and keep getting more), plus a bunch of algorithm ideas, psychology ideas, introspection, evolution, and so on, and we keep piecing all these things together, across all the different levels, into coherent stories. That's the approach that I think will (if continued) lead to AGI.

Like, I am in fact very interested in "methods for fast and approximate Bayesian inference" as being relevant for neuroscience and AGI, but I wasn't really interested in it until I learned a bunch of supporting ideas about what part of the brain is doing that, and how it works at the neuron level, and how and when and why that particular capability evolved in that part of the brain. Maybe that's just me.

I haven't seen compelling (to me) examples of people going successfully from psychology to algorithms without stopping to consider anything whatsoever about how the brain is constructed. Hmm, maybe very early Steve Grossberg stuff? But he talks about the brain constantly now.

One reason it's tricky to make sense of psychology data on its own, I think, is the interplay between (1) learning algorithms, (2) learned content (a.k.a. "trained models"), (3) innate hardwired behaviors (mainly in the brainstem & hypothalamus). What you especially want for AGI is to learn about #1, but experiments on adults are dominated by #2, and experiments on infants are dominated by #3, I think.

Comment by Steven Byrnes (steve2152) on SGD's Bias · 2021-05-19T19:22:02.267Z · LW · GW

That makes sense. Now it's coming back to me: you zoom your microscope into one tiny nm^3 cube of air. In a right-to-left temperature gradient you'll see systematically faster air molecules moving rightward and slower molecules moving leftward, because they're carrying the temperature from their last collision. Whereas in uniform temperature, there's "detailed balance" (just as many molecules going along a path vs going along the time-reversed version of that same path, and with the same speed distribution).

Thinking about the diode-resistor thing more, I suspect it would be a waste-of-time nerd-sniping rabbit hole, especially because of the time-domain aspects (the fluctuations are faster on one side vs the other I think) which don't have any analogue in SGD. Sorry to have brought it up.

Comment by Steven Byrnes (steve2152) on SGD's Bias · 2021-05-19T12:19:41.816Z · LW · GW

I think the "drift from high-noise to low-noise" thing is more subtle than you're making it out to be... Or at least, I remain to be convinced. Like, has anyone else made this claim, or is there experimental evidence? 

In the particle diffusion case, you point out correctly that if there's a gradient in D caused by a temperature gradient, it causes a concentration gradient. But I believe that if there's a gradient in D caused by something other than a temperature gradient, then it doesn't cause a concentration gradient. Like, take a room with a big pile of rocks in it. Intuitively you can say "when air molecules get deep into the pore spaces of the rock pile, they're stuck and can't easily get out, therefore I expect higher air concentration in the pore spaces of the rock pile than the open part of the room", but that would be wrong, the concentration is the same (as required by thermodynamics when the temperature is constant).

Another example, which is fun albeit more abstract: The classical diode-resistor generator circuit (see Gunn, Sokolov - email me if you want the PDFs). If you connect a resistor at nonzero temperature to a diode at absolute zero, the diode rectifies the resistor's Johnson noise and you get a net current in the diode's forward direction. Conversely, if you connect a resistor at absolute zero to a diode at nonzero temperature, you get a net current in the diode's reverse direction! And (obviously), if the temperatures of the diode and resistor are the same, there's no net current. The dynamics look like a 1D drift-diffusion thing, where the voltage is bouncing around by Johnson noise, while getting pulled back towards zero. But, by the Johnson noise equation, the voltage bounces around a lot more when it has one sign than the opposite sign (assuming the diode IV curve has an infinitely sharp bend at V=0), because the total circuit resistance is different. And yet, the voltage does not systematically go to the less-bouncy side; instead it averages to zero. (Note: the frequency spectrum of the fluctuations changes too depending on the sign of the voltage.) I actually did a time-domain simulation of the diode-resistor generator once, a couple jobs ago (I was writing a paper kinda related to it), but was never 100% satisfied that I understood why temperature-related ∇D's can do things that constant-temperature ∇D's can't.

Very speculatively, I wonder if it matters whether you evaluate the gradient at the source of a random step, vs at the destination??
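
(A quick numerical toy model of that speculation; this is mine, not anything from the post. Particles diffuse on [0, 1] with a position-dependent step size and reflecting walls. Evaluating the noise amplitude at each step's source makes particles pile up on the low-noise side, whereas re-evaluating it at the step's approximate destination keeps the density roughly uniform. Parameters are arbitrary and boundary effects are handled only crudely.)

```python
# Toy comparison: noise amplitude evaluated at the source vs. the destination of a step.
import numpy as np

rng = np.random.default_rng(0)
sigma = lambda x: 0.05 * (0.2 + x)   # noise amplitude grows to the right

def reflect(x):
    # reflecting walls at 0 and 1 (steps are small, so one reflection is enough)
    x = np.abs(x)
    return np.where(x > 1.0, 2.0 - x, x)

n, steps = 10_000, 10_000
x_src = rng.uniform(0, 1, n)         # sigma evaluated at the step's source
x_dst = rng.uniform(0, 1, n)         # sigma re-evaluated at the step's destination

for _ in range(steps):
    xi = rng.standard_normal(n)
    x_src = reflect(x_src + sigma(x_src) * xi)

    xi = rng.standard_normal(n)
    trial = reflect(x_dst + sigma(x_dst) * xi)    # provisional endpoint
    x_dst = reflect(x_dst + sigma(trial) * xi)    # redo the step using sigma(endpoint)

print("fraction in left (low-noise) half, sigma at source:     ", (x_src < 0.5).mean())
print("fraction in left (low-noise) half, sigma at destination:", (x_dst < 0.5).mean())
# First number comes out well above 0.5 (net drift toward the low-noise side);
# second number stays near 0.5 (roughly uniform density).
```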

Comment by Steven Byrnes (steve2152) on Formal Inner Alignment, Prospectus · 2021-05-16T01:35:39.595Z · LW · GW

> Wait, you think your prosaic story doesn't involve blind search over a super-broad space of models??

No, not prosaic, that particular comment was referring to the "brain-like AGI" story in my head...

Like, I tend to emphasize the overlap between my brain-like AGI story and prosaic AI. There is plenty of overlap. Like they both involve "neural nets", and (something like) gradient descent, and RL, etc.

By contrast, I haven't written quite as much about the ways that my (current) brain-like AGI story is non-prosaic. And a big one is that I'm thinking that there would be a hardcoded (by humans) inference algorithm that looks like (some more complicated cousin of) PGM belief propagation.

In that case, yes there's a search over a model space, because we need to find the (more complicated cousin of a) PGM world-model. But I don't think that model space affords the same opportunities for mischief that you would get in, say, a 100-layer DNN. Not having thought about it too hard... :-P

Comment by Steven Byrnes (steve2152) on Formal Inner Alignment, Prospectus · 2021-05-15T15:32:19.962Z · LW · GW

That's fair. Other possible approaches are "try to ensure that imagining dangerous adversarial intelligences is aversive to the AGI-in-training ASAP, such that this motivation is installed before the AGI is able to do so", or "interpretability that looks for the AGI imagining dangerous adversarial intelligences".

I guess the fact that people don't tend to get hijacked by imagined adversaries gives me some hope that the first one is feasible: like, maybe there's a big window where one is smart enough to understand that imagining adversarial intelligences can be bad, but not smart enough to do so with such fidelity that it actually is dangerous.

But hard to say what's gonna work, if anything, at least at my current stage of general ignorance about the overall training process.

Comment by Steven Byrnes (steve2152) on Formal Inner Alignment, Prospectus · 2021-05-14T15:09:11.789Z · LW · GW

Hm, I want to classify "defense against adversaries" as a separate category from both "inner alignment" and "outer alignment".

The obvious example is: if an adversarial AGI hacks into my AGI and changes its goals, that's not any kind of alignment problem, it's a defense-against-adversaries problem.

Then I would take that notion and extend it by saying "yes interacting with an adversary presents an attack surface, but also merely imagining an adversary presents an attack surface too". Well, at least in weird hypotheticals. I'm not convinced that this would really be a problem in practice, but I dunno, I haven't thought about it much.

Anyway, I would propose that the procedure for defense against adversaries in general is: (1) shelter an AGI from adversaries early in training, until it's reasonably intelligent and aligned, and then (2) trust the AGI to defend itself. I'm not sure we can do any better than that.

In particular, I imagine an intelligent and self-aware AGI that's aligned in trying to help me would deliberately avoid imagining an adversarial superintelligence that can acausally hijack its goals!

That still leaves the issue of early training, when the AGI is not yet motivated to not imagine adversaries, or not yet able. So I would say: if it does imagine the adversary, and then its goals do get hijacked, then at that point I would say "OK yes now it's misaligned". (Just like if a real adversary is exploiting a normal security hole—I would say the AGI is aligned before the adversary exploits that hole, and misaligned after.) Then what? Well, presumably, we will need to have a procedure that verifies alignment before we release the AGI from its training box. And that procedure would presumably be indifferent to how the AGI came to be misaligned. So I don't think that's really a special problem we need to think about.

Comment by Steven Byrnes (steve2152) on Formal Inner Alignment, Prospectus · 2021-05-13T17:43:10.286Z · LW · GW

My hunch is that we don't disagree about anything. I think you keep trying to convince me of something that I already agree with, and meanwhile I keep trying to make a point which is so trivially obvious that you're misinterpreting me as saying something more interesting than I am.

Comment by Steven Byrnes (steve2152) on Formal Inner Alignment, Prospectus · 2021-05-13T14:46:28.955Z · LW · GW

Like, if we do gradient descent, and the training signal is "get a high score in PacMan", then "mesa-optimize for a high score in PacMan" is incentivized by the training signal, and "mesa-optimize for making paperclips, and therefore try to get a high score in PacMan as an instrumental strategy towards the eventual end of making paperclips" is also incentivized by the training signal.

For example, if at some point in training, the model is OK-but-not-great at figuring out how to execute a deceptive strategy, gradient descent will make it better and better at figuring out how to execute a deceptive strategy.

Here's a nice example. Let's say we do RL, and our model is initialized with random weights. The training signal is "get a high score in PacMan". We start training, and after a while, we look at the partially-trained model with interpretability tools, and we see that it's fabulously effective at calculating digits of π—it calculates them by the billions—and it's doing nothing else, it has no knowledge whatsoever of PacMan, it has no self-awareness about the training situation that it's in, it has no proclivities to gradient-hack or deceive, and it never did anything like that anytime during training. It literally just calculates digits of π. I would sure be awfully surprised to see that! Wouldn't you? If so, then you agree with me that "reasoning about training incentives" is a valid type of reasoning about what to expect from trained ML models. I don't think it's a controversial opinion...

Again, I did not (and don't) claim that this type of reasoning should lead people to believe that mesa-optimizers won't happen, because there do tend to be training incentives for mesa-optimization.

Comment by Steven Byrnes (steve2152) on Formal Inner Alignment, Prospectus · 2021-05-13T03:09:40.189Z · LW · GW

I guess at the end of the day I imagine avoiding this particular problem by building AGIs without using "blind search over a super-broad, probably-even-Turing-complete, space of models" as one of its ingredients. I guess I'm just unusual in thinking that this is a feasible, and even probable, way that people will build AGIs... (Of course I just wind up with a different set of unsolved AGI safety problems instead...)

> The Evolutionary Story

By and large, we expect trained models to do (1) things that are directly incentivized by the training signal (intentionally or not), (2) things that are indirectly incentivized by the training signal (they're instrumentally useful, or they're a side-effect, or they "come along for the ride" for some other reason), and (3) things that are so simple to do that they can happen randomly.

So I guess I can imagine a strategy of saying "mesa-optimization won't happen" in some circumstance because we've somehow ruled out all three of those categories.

This kind of argument does seem like a not-especially-promising path for safety research, in practice. For one thing, it seems hard. Like, we may be wrong about what’s instrumentally useful, or we may overlook part of the space of possible strategies, etc. For another thing, mesa-optimization is at least somewhat incentivized by seemingly almost any training procedure, I would think.

...Hmm, in our recent conversation, I might have said that mesa-optimization is not incentivized in predictive (self-supervised) learning. I forget. But if so, I was confused. I have long believed that mesa-optimization is useful for prediction and still do. Specifically, the directly-incentivized kind of "mesa-optimization in predictive learning" entails, for example, searching over different possible approaches to process the data and generate a prediction, and then taking the most promising approach.

Anyway, what I should have said was that, in certain types of predictive learning, mesa-optimizers that search over active, real-world-manipulating plans are not incentivized—and then that's part of an argument that such mesa-optimizers are improbable. If that argument is correct, then the worst we would expect from a "misaligned mesa-optimizer" is that it will use an inappropriate prediction heuristic in some circumstances, and then we'd wind up with inaccurate predictions. That's a capability problem, not a safety problem.

So anyway, if there's a good argument along those lines, it would not be a safety argument that involves "There will be no mesa-optimizers", but rather "There will be no mesa-optimizers that think outside the box", so to speak. Details and (sketchy) argument in a forthcoming post.

Comment by Steven Byrnes (steve2152) on Mostly questions about Dumb AI Kernels · 2021-05-13T01:58:49.103Z · LW · GW

possibly related literature if you haven't seen it: Comprehensive AI Services

Comment by Steven Byrnes (steve2152) on My AGI Threat Model: Misaligned Model-Based RL Agent · 2021-05-12T20:12:12.022Z · LW · GW

RE online learning, I acknowledge that a lot of reasonable people agree with you on that, and it's hard to know for sure. But I argued my position in Against evolution as an analogy for how humans will create AGI.

Also there: a comment thread about why I'm skeptical that GPT-N would be capable of doing the things we want AGI to do, unless we fine-tune the weights on the fly, in a manner reminiscent of online learning (or amplification).

Comment by Steven Byrnes (steve2152) on My AGI Threat Model: Misaligned Model-Based RL Agent · 2021-05-12T20:04:03.845Z · LW · GW

> So maybe you mean that the ideal value function would be precisely the sum of rewards.

Yes, thanks, that's what I should have said.

> In the rollout architecture you describe, there wouldn't really be any point to maintaining a separate value function, since you can just sum the rewards (assuming you have access to the reward function).

For "access to the reward function", we need to predict what the reward function will do (which may involve hard-to-predict things like "the human will be pleased with what I've done"). I guess your suggestion would be to call the thing-that-predicts-what-the-reward-will-be a "reward function model", and the thing-that-predicts-summed-rewards the "value function", and then to change "the value function may be different from the reward function" to "the value function may be different from the expected sum of rewards". Something like that?

If so, I agree, you're right, I was wrong, I shouldn't be carelessly going back and forth between those things, and I'll change it.
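
(For concreteness, here's the standard convention that change of wording would match, written in symbols; this is just the textbook setup, with discount factor $\gamma$ possibly equal to 1 for the undiscounted "sum of rewards" case: the reward model $\hat{r}_\theta$ predicts the one-step reward, while the value function approximates the expected sum of future rewards under the current policy.)

$$\hat{r}_\theta(s_t, a_t) \approx \mathbb{E}\big[\,r_t \mid s_t, a_t\,\big], \qquad V^\pi(s_t) \approx \mathbb{E}_\pi\Big[\textstyle\sum_{k \ge 0} \gamma^k \, r_{t+k} \;\Big|\; s_t\Big]$$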

Comment by Steven Byrnes (steve2152) on My AGI Threat Model: Misaligned Model-Based RL Agent · 2021-05-12T19:39:40.597Z · LW · GW

> if we're talking about predicting what ML people will do, the sentence "the value function is a function of the latent variables in the world model" makes a lot more sense than the clarification "even abstract concepts are assigned values".

OK sure, that's fair. Point well taken. I was thinking about more brain-like neural nets that parse things into compositional pieces. If I wanted to be more prosaic maybe I would say something like: "She is differentiating both sides of the equation" could have a different value than "She is writing down a bunch of funny symbols", even if both are coming from the exact same camera inputs.

Comment by Steven Byrnes (steve2152) on My AGI Threat Model: Misaligned Model-Based RL Agent · 2021-05-12T17:46:45.322Z · LW · GW

Thanks!!

> The value function might be different from the reward function.

> Surely this isn't relevant! We don't by any means want the value function to equal the reward function. What we want (at least in standard RL) is for the value function to be the solution to the dynamic programming problem set up by the reward function and world model (or, more idealistically, the reward function and the actual world).

Hmm. I guess I have this ambiguous thing where I'm not specifying whether the value function is "valuing" world-states, or actions, or plans, or all of the above, or what. I think there are different ways to set it up, and I was trying not to get bogged down in details (and/or not being very careful!).

Like, here's one extreme: imagine that the "planner" does arbitrarily-long-horizon rollouts of possible action sequences and their consequences in the world, and then the "value function" is looking at that whole future rollout and somehow encoding how good it is, and then you can choose the best rollout. In this case we do want the value function to converge to be (for all intents and purposes) a clone of the reward function.

On the opposite extreme, when you're not doing rollouts at all, and instead the value function is judging particular states or actions, then I guess it should be less like the reward function and more like "expected upcoming reward assuming the current policy", which I think is what you're saying.

Incidentally, I think the brain does both. Like, maybe I'm putting on my shoes because I know that this is the first step of a plan where I'll go to the candy store and buy candy and eat it. I'm motivated to put on my shoes by the image in my head where, a mere 10 minutes from now, I'll be back at home eating yummy candy. In this case, the value function is hopefully approximating the reward function, and specifically approximating what the reward function will do at the moment where I will eat candy. But maybe eventually, after many such trips to the candy store, it becomes an ingrained habit. And then I'm motivated to put on my shoes because my brain has cached the idea that good things are going to happen as a result—i.e., I'm motivated even if I don't explicitly visualize myself eating candy soon.

I guess I spend more time thinking about the former (the value function is evaluating the eventual consequences of a plan) than the latter (the value function is tracking the value of immediate world-states and actions), because the former is the component that presents most of the x-risk. So that's what was in my head when I wrote that.

(It's not either/or; I think there's a continuum between those two poles. Like I can consequentialist-plan to get into a future state that has a high cached value but no immediate reward.)
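
(One way to write that continuum down, with hypothetical names purely for illustration: score a candidate plan by rolling it out a few steps, summing predicted rewards along the way, and then falling back on the cached value of wherever the rollout ends up. The pure-rollout pole is the limit where the cached value is ignored; the pure-habit pole is the limit of a zero-length plan.)

```python
# Sketch of "rollout for a while, then lean on the cached value" plan scoring.
# world_model, reward_model, and value_fn are stand-ins for learned components.
def score_plan(state, plan, world_model, reward_model, value_fn, gamma=0.99):
    total, discount = 0.0, 1.0
    for action in plan:                           # explicit consequentialist lookahead
        state = world_model(state, action)        # predicted next state
        total += discount * reward_model(state)   # predicted reward along the way
        discount *= gamma
    return total + discount * value_fn(state)     # cached "good things happen from here"

# e.g. score_plan(s0, ["put_on_shoes", "walk_to_store", "buy_candy"],
#                 world_model, reward_model, value_fn)
```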

As for prosaic RL systems, they're set up in different ways I guess, and I'm not an expert on the literature. In Human Compatible, if I recall, Stuart Russell said that he thinks the ability to do flexible hierarchical consequentialist planning is something that prosaic AI doesn't have yet, but that future AGI will need. If that's right, then maybe this is an area where I should expect AGI to be different from prosaic AI, and where I shouldn't get overly worried about being insufficiently prosaic. I dunno :-P 

Well anyway, your point is well taken. Maybe I'll change it to "the value function might be misaligned with the reward function", or "incompatible", or something...

Comment by Steven Byrnes (steve2152) on Human priors, features and models, languages, and Solmonoff induction · 2021-05-10T12:53:56.049Z · LW · GW

> Model splintering happens when someone has updated on enough unusual sightings that it is worth their while to change their "language".

I think of human mental model updates as being overwhelmingly "adding more things" rather than "editing existing things". Like you see a funny video of a fish flopping around, and then a few days later you say "hey, look at the cat, she's flopping around just like that fish video". I'm not sure I'm disagreeing with you here, but your language kinda implies rare dramatic changes, I guess like someone changing religion and having an ontological crisis. That's certainly an important case but much less common.

Comment by Steven Byrnes (steve2152) on Migraine hallucinations, phenomenology, and cognition · 2021-05-08T18:58:37.295Z · LW · GW

There's a nice brain-like vision model here, and it even parses optical illusions in the same way people do. As far as I understand it, if there's a sudden change of, um, color, or whatever it is for migraine aura, it has to be (1) an edge of a thing, (2) an edge of an occluding thing, or (3) a change of texture within a single surface (e.g. wallpaper). When you block a head with your hand, your visual system obviously and correctly parses it as (2). But here there's no occluder model that fits all the visual input data—maybe because some of the neurons that would offer evidence of an occluding shape are messed up and not sending those signals. So (2) doesn't fit the data. And there's no single-surface theory that fits all the visual input data either, so (3) gets thrown out too. So eventually the visual system settles on (1) as the best (least bad) parsing of the scene.

I dunno, something like that, I guess.

Comment by Steven Byrnes (steve2152) on Can you get AGI from a Transformer? · 2021-04-30T17:44:02.908Z · LW · GW

I slightly edited that section header to make it clearer what the parenthetical "(matrix multiplications, ReLUs, etc.)" is referring to. Thanks!

I agree that it's hard to make highly-confident categorical statements about all current and future DNN-ish architectures.

I don't think the human planning algorithm is very much like MCTS, although you can learn to do MCTS (just like you can learn to mentally run any other algorithm—people can learn strategies about what thoughts to think, just like they can learn strategies about what actions to execute). I think the built-in capability is that compositional-generative-model-based processing I was talking about in this post.

Like, if I tell you "I have a banana blanket", you have a constraint (namely, I just said that I have a banana blanket) and you spend a couple seconds searching through generative models until you find one that is maximally consistent with both that constraint and also all your prior beliefs about the world. You're probably imagining me with a blanket that has pictures of bananas on it, or less likely with a blanket made of banana peels, or maybe you figure I'm just being silly.

So by the same token, imagine you want to squeeze a book into a mostly-full bag. You have a constraint (the book winds up in the bag), and you spend a couple seconds searching through generative models until you find one that's maximally consistent with both that constraint and also all your prior beliefs and demands about the world. You imagine a plausible way to slide the book in without ripping the bag or squishing the other content, and flesh that out into a very specific action plan, and then you pick the book up and do it.
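
(A cartoon of that "search for the generative model most consistent with the constraint plus your priors" picture, with hypotheses and numbers invented purely for illustration; the real search would of course range over a vastly richer compositional space.)

```python
# Toy MAP-style selection among candidate interpretations of "I have a banana blanket".
hypotheses = {
    "blanket with banana pictures": {"prior": 0.60, "fits_constraint": 0.9},
    "blanket made of banana peels":  {"prior": 0.05, "fits_constraint": 0.9},
    "he's just being silly":         {"prior": 0.30, "fits_constraint": 0.7},
    "misheard him":                  {"prior": 0.05, "fits_constraint": 0.2},
}

def pick_interpretation(hyps):
    # score = prior belief x how well the model satisfies the stated constraint
    return max(hyps, key=lambda h: hyps[h]["prior"] * hyps[h]["fits_constraint"])

print(pick_interpretation(hypotheses))   # -> "blanket with banana pictures"
```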

When we need a multi-step plan, too much to search for in one go, we start needing to also rely on other built-in capabilities like chunking stuff together into single units, analogical reasoning (which is really just a special case of compositional-generative-model-based processing), and RL (as mentioned above, RL plays a role in learning to use metacognitive problem-solving strategies). Maybe other things too.

I don't think causality per se is a built-in feature, but I think it comes out pretty quickly from the innate ability to learn (and chunk) time-sequences, and then incorporate those learned sequences into the compositional-generative-model-based processing framework. Like, "I swing my foot and then kick the ball and then the ball is flying away" is a memorized temporal sequence, but it's also awfully close to a causal belief that "kicking the ball causes it to fly away". (...at least in conjunction with a second memorized temporal sequence where I don't kick the ball and it just stays put.) (See also counterfactuals.)

I'm less confident about any of this than I sound :)

Comment by Steven Byrnes (steve2152) on Can you get AGI from a Transformer? · 2021-04-30T15:55:53.909Z · LW · GW

Oh OK I think I misunderstood you.

So the context was: I think there's an open question about the extent to which the algorithms underlying human intelligence in particular, and/or AGI more generally, can be built from operations similar to matrix multiplication (and a couple other operations). I'm kinda saying "no, it probably can't" while the scaling-is-all-you-need DNN enthusiasts are kinda saying "yes, it probably can".

Then your response is that humans can't multiply matrices in their heads. Correct? But I don't think that's relevant to this question. Like, we don't have low-level access to our own brains. If you ask GPT-3 (through its API) to simulate a self-attention layer, it wouldn't do particularly well, right? So I don't think it's any evidence either way.

"Surpassed" seems strange to me; I'll bet that the first AGI system will have a very GPT-like module, that will be critical to its performance, that will nevertheless not be "the whole story." Like, by analogy to AlphaGo, the interesting thing was the structure they built around the convnets, but I don't think it would have worked nearly as well without the convnets.

I dunno, certainly that's possible, but also sometimes new algorithms outright replace old algorithms. Like GPT-3 doesn't have any LSTM modules in it, let alone HHMM modules, or syntax tree modules, or GOFAI production rule modules. :-P