Posts

Research productivity tip: "Solve The Whole Problem Day" 2021-08-27T13:05:59.550Z
Randal Koene on brain understanding before whole brain emulation 2021-08-23T20:59:35.358Z
Dopamine-supervised learning in mammals & fruit flies 2021-08-10T16:13:43.878Z
Research agenda update 2021-08-06T19:24:53.726Z
Value loading in the human brain: a worked example 2021-08-04T17:20:06.195Z
Neuroscience things that confuse me right now 2021-07-26T21:01:19.109Z
How is low-latency phasic dopamine so fast? 2021-07-23T17:43:59.565Z
A model of decision-making in the brain (the short version) 2021-07-18T14:39:35.338Z
Model-based RL, Desires, Brains, Wireheading 2021-07-14T15:11:13.090Z
(Brainstem, Neocortex) ≠ (Base Motivations, Honorable Motivations) 2021-07-12T16:39:23.246Z
Thoughts on safety in predictive learning 2021-06-30T19:17:40.026Z
Reward Is Not Enough 2021-06-16T13:52:33.745Z
Book review: "Feeling Great" by David Burns 2021-06-09T13:17:59.411Z
Supplement to "Big picture of phasic dopamine" 2021-06-08T13:08:00.832Z
Big picture of phasic dopamine 2021-06-08T13:07:43.192Z
Electric heat pumps (Mini-Splits) vs Natural gas boilers 2021-05-30T15:51:53.580Z
Young kids catching COVID: how much to worry? 2021-04-20T18:03:32.499Z
Solving the whole AGI control problem, version 0.0001 2021-04-08T15:14:07.685Z
My AGI Threat Model: Misaligned Model-Based RL Agent 2021-03-25T13:45:11.208Z
Against evolution as an analogy for how humans will create AGI 2021-03-23T12:29:56.540Z
Acetylcholine = Learning rate (aka plasticity) 2021-03-18T13:57:39.865Z
Is RL involved in sensory processing? 2021-03-18T13:57:28.888Z
Comments on "The Singularity is Nowhere Near" 2021-03-16T23:59:30.667Z
Book review: "A Thousand Brains" by Jeff Hawkins 2021-03-04T05:10:44.929Z
Full-time AGI Safety! 2021-03-01T12:42:14.813Z
Late-talking kids and "Einstein syndrome" 2021-02-03T15:16:05.284Z
[U.S. specific] PPP: free money for self-employed & orgs (time-sensitive) 2021-01-09T19:53:09.088Z
Multi-dimensional rewards for AGI interpretability and control 2021-01-04T03:08:41.727Z
Conservatism in neocortex-like AGIs 2020-12-08T16:37:20.780Z
Supervised learning in the brain, part 4: compression / filtering 2020-12-05T17:06:07.778Z
Inner Alignment in Salt-Starved Rats 2020-11-19T02:40:10.232Z
Supervised learning of outputs in the brain 2020-10-26T14:32:54.061Z
"Little glimpses of empathy" as the foundation for social emotions 2020-10-22T11:02:45.036Z
My computational framework for the brain 2020-09-14T14:19:21.974Z
Emotional valence vs RL reward: a video game analogy 2020-09-03T15:28:08.013Z
Three mental images from thinking about AGI debate & corrigibility 2020-08-03T14:29:19.056Z
Can you get AGI from a Transformer? 2020-07-23T15:27:51.712Z
Selling real estate: should you overprice or underprice? 2020-07-20T15:54:09.478Z
Mesa-Optimizers vs “Steered Optimizers” 2020-07-10T16:49:26.917Z
Gary Marcus vs Cortical Uniformity 2020-06-28T18:18:54.650Z
Building brain-inspired AGI is infinitely easier than understanding the brain 2020-06-02T14:13:32.105Z
Help wanted: Improving COVID-19 contact-tracing by estimating respiratory droplets 2020-05-22T14:05:10.479Z
Inner alignment in the brain 2020-04-22T13:14:08.049Z
COVID transmission by talking (& singing) 2020-03-29T18:26:55.839Z
COVID-19 transmission: Are we overemphasizing touching rather than breathing? 2020-03-23T17:40:14.574Z
SARS-CoV-2 pool-testing algorithm puzzle 2020-03-20T13:22:44.121Z
Predictive coding and motor control 2020-02-23T02:04:57.442Z
On unfixably unsafe AGI architectures 2020-02-19T21:16:19.544Z
Book review: Rethinking Consciousness 2020-01-10T20:41:27.352Z
Predictive coding & depression 2020-01-03T02:38:04.530Z

Comments

Comment by Steven Byrnes (steve2152) on Dopamine-supervised learning in mammals & fruit flies · 2021-09-19T21:33:02.608Z · LW · GW

Oh I gotcha. Well one thing is, I figure the whole system doesn't come crashing down if the plan-proposer gets an "incorrect" reward sometimes. I mean, that's inevitable—the plan-assessors keep getting adjusted over the course of your life as you have new experiences etc., and the plan-proposer has to keep playing catch-up.

But I think it's better than that.

Here's an alternate example that I find a bit cleaner (sorry if it's missing your point). You put something in your mouth expecting it to be yummy (thus release certain hormones), but it's actually gross (thus make a disgust face and release different hormones etc.). So reward(plan1, assessor_action1)>0 but reward(plan1, assessor_action2)<0. I think as you bring the food towards your mouth, you're getting assessor_action1 and hence the "wrong" reward, but once it's in your mouth, your hypothalamus / brainstem immediately pivots to assessor_action2 and hence the "right reward". And the "right reward" is stronger than the "wrong reward", because it's driven by a direct ground-truth experience not just an uncertain expectation. So in the end the plan proposer would get the right training signal overall, I think.

Comment by Steven Byrnes (steve2152) on Goodhart Ethology · 2021-09-17T21:51:31.574Z · LW · GW

Oh, yup, makes sense thanks

Comment by Steven Byrnes (steve2152) on Goodhart Ethology · 2021-09-17T18:58:25.609Z · LW · GW

now suppose this curve represents the human ratings of different courses of action, and you choose the action that your model says will have the highest rating. You're going to predictably mess up again, because of the optimizer's curse (or regressional Goodhart on the correlation between modeled rating and actual rating).

It's not obvious to me how the optimizer's curse fits in here (if at all). If each of the evaluations has the same noise, then picking the action that the model says will have the highest rating is the right thing to do. The optimizer's curse says that the model is likely to overestimate how good this "best" action is, but so what? "Mess up" conventionally means "the AI picked the wrong action", and the optimizer's curse is not related to that (unless there's variable noise across different choices and the AI didn't correct for that). Sorry if I'm misunderstanding.
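
Here's a quick sanity check of what I mean, with made-up numbers (20 actions, equal Gaussian noise on every rating estimate): the argmax still reliably lands on a genuinely high-rated action, even though its estimated rating is biased upward.

```python
# Toy check: equal noise on every estimate means argmax is still a fine selection
# rule, but the chosen action's *estimated* rating is biased upward
# (optimizer's curse). All numbers here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_trials, noise_sd = 20, 10_000, 0.5
true_ratings = rng.normal(0, 1, size=n_actions)        # fixed "actual" ratings

picked_true, picked_bias = [], []
for _ in range(n_trials):
    estimates = true_ratings + rng.normal(0, noise_sd, size=n_actions)
    pick = np.argmax(estimates)                         # choose the apparent best
    picked_true.append(true_ratings[pick])
    picked_bias.append(estimates[pick] - true_ratings[pick])

print("best possible true rating:       %.2f" % true_ratings.max())
print("avg true rating of argmax pick:  %.2f" % np.mean(picked_true))   # close to the best
print("avg overestimate of that pick:   %.2f" % np.mean(picked_bias))   # clearly > 0
```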

Comment by Steven Byrnes (steve2152) on The Best Software For Every Need · 2021-09-12T18:20:34.501Z · LW · GW

Software: Trello

Need: To-do list

Other programs I've tried: Toodledo, Wunderlist [discontinued], Microsoft Outlook

I have a GTD-ish to-do list. The way it looks in Trello is that there's a "Work to-do list" Trello board, on which there are "lists" labeled "Today", "This week", "Waiting for...", "Done", "Next few weeks", "Not this month", "Probably never", etc., and on each "list" there are "cards" with individual items that I want to do. Trello allows each individual card to carry lots of useful information inside it, like links, attached Google Docs, text, checklists, and deadlines. It's very easy to browse and edit the cards and move them around, and there's change-tracking for that. It's cloud-hosted and has good smartphone apps. Their main market is team collaboration, I think, but it works great for one person. (I do use it to share a grocery list with my spouse.) It's also free (including for private boards), with good responsive search (and an "archive" feature), no downtime, and no bugs. The other to-do list apps I've tried were missing some of these features and/or had a worse user experience.

Comment by Steven Byrnes (steve2152) on Prefer the British Style of Quotation Mark Punctuation over the American · 2021-09-11T15:27:25.814Z · LW · GW

Here's a long essay justifying wikipedia's use of logical quotation: https://en.wikipedia.org/wiki/Wikipedia:Logical_quotation_on_Wikipedia

Comment by Steven Byrnes (steve2152) on Paths To High-Level Machine Intelligence · 2021-09-10T15:38:14.513Z · LW · GW

This is great! Here are a couple of random thoughts:

Hybrid statistical-symbolic HLMI

I think there's a common habit of conflating "symbolic" with "not brute force" and "not bitter lesson" but that it's not right. For example, if I were to write an algorithm that takes a ton of unstructured data and goes off and builds a giant PGM that best explains all that data, I would call that a "symbolic" AI algorithm (because PGMs are kinda discrete / symbolic / etc.), but I would also call it a "statistical" AI algorithm, and I would certainly call it "compatible with The Bitter Lesson".

(Incidentally, this description is pretty close to my oversimplified caricature description of what the neocortex does.)

(I'm not disputing that "symbolic" and "containing lots of handcrafted domain-specific structure" do often go together in practice today—e.g. Josh Tenenbaum's papers tend to have both and OpenAI papers tend to have neither—I'm just saying they don't necessarily go together.)
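
To make the "statistical algorithm whose output is a symbolic object" point concrete, here's a toy sketch (my own example, nothing from your post): the classic Chow-Liu procedure takes a pile of raw binary data and returns the tree-structured PGM that best explains it (statistics in, a discrete "symbolic" graph out).

```python
# Minimal Chow-Liu sketch: learn a tree-structured PGM from raw binary data.
# Statistics in, a discrete graph ("symbolic" object) out.
import itertools
import numpy as np

def mutual_info(x, y):
    """Empirical mutual information between two binary columns."""
    mi = 0.0
    for a, b in itertools.product([0, 1], repeat=2):
        p_ab = np.mean((x == a) & (y == b))
        p_a, p_b = np.mean(x == a), np.mean(y == b)
        if p_ab > 0:
            mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_edges(data):
    """Edges of the maximum-spanning tree under pairwise MI (greedy Kruskal)."""
    n_vars = data.shape[1]
    scored = sorted(
        ((mutual_info(data[:, i], data[:, j]), i, j)
         for i, j in itertools.combinations(range(n_vars), 2)),
        reverse=True)
    parent = list(range(n_vars))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    edges = []
    for _, i, j in scored:
        ri, rj = find(i), find(j)
        if ri != rj:                      # skip edges that would create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges

# Fake data generated from a hidden chain 0 -> 1 -> 2 -> 3 (with bit-flip noise):
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, size=5000)
x1 = x0 ^ (rng.random(5000) < 0.1)
x2 = x1 ^ (rng.random(5000) < 0.1)
x3 = x2 ^ (rng.random(5000) < 0.1)
data = np.column_stack([x0, x1, x2, x3])
print(chow_liu_edges(data))               # recovers the chain structure
```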

I don't have a concrete suggestion of what if anything you should change here, I'm just chatting. :-)

Cognitive-science approach

This is all fine except that I kinda don't like the term "cognitive science" for what you're talking about. Maybe it's just me, but anyway here's where I'm coming from:

Learning algorithms almost inevitably have the property that the trained models are more complex than the learning rules that create them. For example, compare the code to run gradient descent and train a ConvNet (it's not very complicated) to the resulting image-classification algorithms as explored in the OpenAI microscope project (they're much more complicated).
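
For the record, here's roughly what the simple side of that comparison looks like: a generic PyTorch-style sketch, not any particular published model. The point is just that the "learning rule" fits in a few lines, while all the complexity ends up in the millions of learned weights it produces.

```python
# A generic ConvNet plus a gradient-descent training step (sketch).
# The learning rule is this small; the learned weights are where the
# complicated image-classification machinery ends up living.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(images, labels):
    opt.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()                      # gradient descent: that's the whole rule
    opt.step()
    return loss.item()

# e.g. train_step(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
```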

I bring this up because "cognitive science" to me has a connotation of "the study of how human brains do all the things that human brains do", especially adult human brains. After all, that's the main thing that most cognitive scientists study AFAICT. So if you think that human intelligence is "mostly a trained model", then you would think that most cognitive science is "mostly akin to OpenAI microscope" as opposed to "mostly akin to PyTorch", and therefore mostly unnecessary and unhelpful for building HLMI. You don't have to think that—certainly Gary Marcus & Steven Pinker don't—but I do (to a significant extent) and at least a few other prominent neuroscientists do too (e.g. Randall O'Reilly). (See "learning-from-scratch-ism" section here, also cortical uniformity here.) So as much as I buy into (what you call) "the cognitive-science approach", I'm just not crazy about that term, and for my part I prefer to talk about "brain algorithms" or "high-level brain algorithms". I think "brain algorithms" is more agnostic about the nature of the algorithms, and in particular whether it's the kinds of algorithms that neuroscientists talk about, versus the kinds of algorithms that computational cognitive scientists & psychologists talk about.

Comment by Steven Byrnes (steve2152) on Covid 9/9: Passing the Peak · 2021-09-10T12:42:54.417Z · LW · GW

My kid's elementary school has been doing routine weekly COVID pool tests of each class since last spring (this is Massachusetts, USA). I don't think it's "insane". It's not the kind of test that goes way up your nose, it's the anterior one, the one that feels like picking your nose. It takes a few minutes once a week, the school nurse goes around and does it, and AFAICT none of the kids care at all. I think of this as an extremely-low-cost intervention that meaningfully reduces the chance of within-school transmission / outbreaks.

I'm less blasé about unvaccinated kids catching COVID than Zvi is, but I would like to think that we can put aside our differences and say "Yes, let's do extremely-low-cost interventions when we can!"

Comment by Steven Byrnes (steve2152) on Research agenda update · 2021-09-10T01:20:07.356Z · LW · GW

Good question!

Imagine we have a learning algorithm that learns a world-model, and flags things in the world-model as "goals", and then makes plans to advance those "goals". (An example of such an algorithm is (part of) the human brain, more-or-less, according to me.) We can say the algorithm is "aligned" if the things flagged as "goals" do in fact correspond to maximizing the objective function (e.g. "predict the human's outputs"), or at least it's as close a match as anything in the world-model, and if this remains true even as the world-model gets improved and refined over time.

Making that definition better and more rigorous would be tricky because it's hard to talk rigorously about symbol-grounding, but maybe it's not impossible. And if so, I would say that this is a definition of "aligned" which looks nothing like a performance guarantee.

OK, hmmm, after some thought, I guess it's possible that this definition of "aligned" would be equivalent to a performance-centric claim along the lines of "asymptotically, performance goes up not down". But I'm not sure that it's exactly the same. And even if it were mathematically equivalent, we still have the question of what the proof would look like, out of these two possibilities:

  • We prove that the algorithm is aligned (in the above sense) via "direct reasoning about alignment" (i.e. talking about symbol-grounding, goal-stability, etc.), and then a corollary of that proof would be the asymptotic performance guarantee.
  • We prove that the algorithm satisfies the asymptotic performance guarantee via "direct reasoning about performance", and then a corollary of that proof would be that the algorithm is aligned (in the above sense).

I think it would be the first one, not the second. Why? Because it seems to me that the alignment problem is hard, and if it's solvable at all, it would only be solvable with the help of various specific "alignment-promoting algorithm features", and we won't be able to prove that those features work except by "direct reasoning about alignment".

Comment by Steven Byrnes (steve2152) on Research agenda update · 2021-09-09T20:46:13.554Z · LW · GW

Cool, gotcha, thanks. So my current expectation is either: (1) we will never be able to prove any performance guarantees about human-level learning algorithms, or (2) if we do, those proofs would only apply to certain algorithms that are packed with design features specifically tailored to solve the alignment problem, and any proof of a performance guarantee would correspondingly have a large subsection titled "Lemma 1: This learning algorithm will be aligned".

The reason I think that is that (as above) I expect the learning algorithms in question to be kinda "agential", and if an "agential" algorithm is not "trying" to perform well on the objective, then it probably won't perform well on the objective! :-)

If that view is right, the implication is: the only way to get a performance guarantee is to prove Lemma 1, and if we prove Lemma 1, we no longer care about the performance guarantee anyway, because we've already solved the alignment problem. So the performance guarantee would be beside the point (on this view).

Comment by Steven Byrnes (steve2152) on Research agenda update · 2021-09-09T19:41:51.207Z · LW · GW

Hmmm, OK, let me try again.

You wrote earlier: "the algorithm somehow manages to learn those hypotheses, for example by some process of adding more and more detail incrementally".

My claim is that good-enough algorithms for "adding more and more detail incrementally" will also incidentally (by default) be algorithms that seize control of their off-switches.

And the reason I put a lot of weight on this claim is that I think the best algorithms for "adding more and more detail incrementally" may be algorithms that are (loosely speaking) "trying" to understand and/or predict things, including via metacognition and instrumental reasoning.

OK, then the way I'm currently imagining you responding to that would be:

My model of Vanessa: We're hopefully gonna find a learning algorithm with a provable regret bound (or something like that). Since seizing control of the off-switch would be very bad according to the objective function and thus violate the regret bound, and since we proved the regret bound, we conclude that the learning algorithm won't seize control of the off-switch.

(If that's not the kind of argument you have in mind, oops sorry!)

Otherwise: I feel like that's akin to putting "the AGI will be safe" as a desideratum, which pushes "solve AGI safety" onto the opposite side of the divide between desiderata vs. learning-algorithm-that-satisfies-the-desiderata. That's perfectly fine, and indeed precisely defining "safe" is very useful. It's only a problem if we also claim that the "find a learning algorithm that satisfies the desiderata" part is not an AGI safety problem. (Also, if we divide the problem this way, then "we can't find a provably-safe AGI design" would be re-cast as "no human-level learning algorithms satisfy the desiderata".)

That's also where I was coming from when I expressed skepticism about "strong formal guarantees". We have no performance guarantee about the brain, and we have no performance guarantee about AlphaGo, to my knowledge. Again, as above, I was imagining an argument that turns a performance guarantee into a safety guarantee, like "I can prove that AlphaGo plays go at such-and-such Elo level, and therefore it must not be wireheading, because wireheaders aren't very good at playing Go." If you weren't thinking of performance guarantees, what "formal guarantees" are you thinking of?

(For what little it's worth, I'd be a bit surprised if we get a safety guarantee via a performance guarantee. It strikes me as more promising to reason about safety directly—e.g. "this algorithm won't seize control of the off-switch because blah blah incentives blah blah mesa-optimizers blah blah".)

Sorry if I'm still misunderstanding. :)

Comment by Steven Byrnes (steve2152) on Research agenda update · 2021-09-09T15:50:21.080Z · LW · GW

Thanks!! Here's where I'm at right now.

In the grandparent comment I suggested that if we want to make an AI that can learn sufficiently good hypotheses to do human-level things, perhaps the only way to do that is to make a "prior-building AI" with "agency" that is "trying" to build out its world-model / toolkit-of-concepts-and-ideas in fruitful directions. And I said that we have to solve the problem of how to build that kind of agential "prior-building AI" that doesn't also incidentally "try" to seize control of its off-switch.

Then in the parent comment you replied (IIUC) that if this is a problem at all, it's not the problem you're trying to solve (i.e. "finding good formal desiderata for safe TAI"), but a different problem (i.e. "developing learning algorithms with strong formal guarantees and/or constructing a theory of formal guarantees for existing algorithms"), and my problem is "to a first approximation orthogonal" to your problem, and my problem "receives plenty of attention from outside the existential safety community".

If so, my responses would be:

  • Obviously the problem of "make an agential "prior-building AI" that doesn't try to seize control of its off-switch" is being worked on almost exclusively by x-risk people.  :-P
  • I suspect that the problem doesn't decompose the way you imply; instead I think that if we develop techniques for building a safe agential "prior-building AI", we would find that similar techniques enable us to build a safe non-manipulative-question-answering AI / oracle AI / helper AI / whatever.
  • Even if that's not true, I would still say that if we can make a safe agential "prior-building AI" that gets to human-level predictive ability and beyond, then we've solved almost the whole TAI safety problem, because we could then run the prior-building AI, then turn it off and use microscope AI to extract a bunch of new-to-humans predictively-useful concepts from the prior it built—including new ideas & concepts that will accelerate AGI safety research.

Or maybe another way of saying it would be: I think I put a lot of weight on the possibility that those "learning algorithms with strong formal guarantees" will turn out not to exist, at least not at human-level capabilities.

I guess, when I read "learning algorithms with strong formal guarantees", I'm imagining something like multi-armed bandit algorithms that have regret bounds. But I'm having trouble imagining how that kind of thing would transfer to a domain where we need the algorithm to discover new concepts and leverage them for making better predictions, and we don't know a priori what the concepts look like, or how many there will be, or how hard they will be to find, or how well they will generalize, etc.
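
(For concreteness, here's the kind of thing I'm picturing when I say "regret bounds": a bare-bones UCB1 sketch with a made-up Bernoulli environment. The selling point of algorithms like this is a theorem saying cumulative regret grows only logarithmically with the number of pulls, and it's exactly that kind of theorem that I don't know how to even state for "discover new concepts and leverage them".)

```python
# Bare-bones UCB1 bandit: the canonical "learning algorithm with a provable
# regret bound" (regret grows like log T). The environment below is made up.
import math
import random

def ucb1(pull, n_arms, horizon):
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                  # pull each arm once to initialize
        else:
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # running average
    return means, counts

probs = [0.2, 0.5, 0.8]                  # unknown Bernoulli payout probabilities
means, counts = ucb1(lambda a: float(random.random() < probs[a]), len(probs), 10_000)
print(counts)                            # pulls concentrate on the best arm (index 2)
```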

Comment by Steven Byrnes (steve2152) on Dopamine-supervised learning in mammals & fruit flies · 2021-09-09T13:58:26.954Z · LW · GW

Thanks for your thoughtful & helpful comments!

If by learning for sensory predictions areas you mean modifying synapses in V1, I agree, you might not need synaptic changes or dopamine there, sensory learning (and need for dopamine) can happen somewhere else (hippocampus-entorhinal cortex? no clue) that are sending predictions to V1. The model is learned on the level of knowing when to fire predictions from entorhinish cortex to V1. 

Yup, that's what I meant.

Even if this "dopamine permits to update sensory model" is true, I also don't get why would you need the intermediate node dopamine between PE and updating the model, why not just update the model after you get cortical (signaled by pyramidal neurons) PE?

For example, I'm sitting on my porch mindlessly watching cars drive by. There's a red car and then a green car. After seeing the red car, I wasn't expecting the next car to be green … but I also wasn't expecting the next car not to be green. I just didn't have any particular expectations about the next car's color. So I would say that the "green" is unexpected but not a prediction error. There was no prediction; my models were not wrong but merely agnostic.

In other words:

  • The question of "what prediction to make" has a right and wrong answer, and can therefore be trained by prediction errors (self-supervised learning).
  • The question of "whether to make a prediction in the first place, as opposed to ignoring the thing and attending to something else (or zoning out altogether)" is a decision, and therefore cannot be trained by prediction errors. If it's learned at all, it has to be trained by RL, I think.

(In reality, I don't think it's a binary "make a prediction about X / don't make a prediction about X", instead I think you can make stronger or weaker predictions about things.)

And I think the decision of "whether or not to make a strong prediction about what color car will come after the red car" is not being made in V1, but rather in, I dunno, maybe IT or FEF or dlPFC or HC/EC (like you suggest) or something.

I guess I incorrectly understood your model. I assumed that for the given environment the ideal policy will lead to the big dopamine release, saying "this was a really good plan, repeat it the next time", after rereading your decision making post it seems that assessors predict the reward, and there will be no dopamine as RPE=0?

To be clear, in regards to "no dopamine", sometimes I leave out "(compared to baseline)", so "positive dopamine (compared to baseline)" is a burst and "negative dopamine (compared to baseline)" is a pause. (I should stop doing that!) Anyway, my impression right now is that when things are going exactly as expected, even if that's very good in some objective sense, it's baseline dopamine, neither burst nor pause—e.g. in the classic Wolfram Schultz experiment, there was baseline dopamine at the fully-expected juice, even though drinking juice is really great compared to what monkeys might generically expect in their evolutionary environment.

(Exception: if things are going exactly as expected, but it's really awful and painful and dangerous, there's apparently still a dopamine pause—it never gets fully predicted away—see here, the part that says "Punishments create dopamine pauses even when they’re fully expected".)

I'm not sure if "baseline dopamine" corresponds to "slightly strengthen the connections for what you're doing", or "don't change the connection strengths at all".
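
(To spell out the Schultz point above in textbook temporal-difference terms, here's a toy sketch of the math, definitely not a claim about the real circuitry: once the cue-to-juice sequence is learned, the reward-prediction error at juice delivery goes to ~0, i.e. baseline dopamine.)

```python
# Toy TD(0) illustration: after learning, the RPE (my dopamine-burst proxy)
# at the fully-predicted juice is ~0. Not a model of real neurons.
states = ["cue", "delay", "juice", "end"]        # one trial = cue -> delay -> juice
V = {s: 0.0 for s in states}                     # learned value estimates
alpha, gamma = 0.1, 1.0

def run_trial():
    """Run one trial; return the RPE at the moment the juice arrives."""
    rpe_at_juice = 0.0
    for s, s_next in zip(states[:-1], states[1:]):
        r = 1.0 if s_next == "juice" else 0.0
        delta = r + gamma * V[s_next] - V[s]     # reward prediction error
        if s_next == "juice":
            rpe_at_juice = delta
        V[s] += alpha * delta
    return rpe_at_juice

print("RPE at juice, trial 1:   %.3f" % run_trial())    # big positive burst
for _ in range(500):
    run_trial()
print("RPE at juice, trial 502: %.3f" % run_trial())    # ~0: baseline dopamine
```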

Side question: when you talk about plan assessors, do you think there should be some mechanism in the brainstem that corrects RPE signals going to the cortex based on the signals sent to supervised learning plan assessors? For example, If the plan is to "go eat" and your untrained amygdala says "we don't need to salivate", and you don't salivate, then you get way smaller reward (especially after crunchy chips) than if you would salivate. Sure, amygdala/other assesors will get their supervisory signal, but it also seems to me that the plan "go eat" it's not that bad and it shouldn't be disrewarded that much just because amygdala screwed up and you didn't salivate, so the reward signal should be corrected somehow?

OK, let's say I happen to really need salt right now. I grab what I thought was an unsalted peanut and put it in my mouth, but actually it's really salty. Awesome!

From a design perspective, whatever my decisionmaking circuits were doing just now was a good thing to do, and they ought to receive an RL-type dopamine burst ensuring that it happens again.

My introspective experience matches that: I'm surprised and delighted that the peanut is salty.

Your comment suggests that this is not the default, but requires some correction mechanism. I'm kinda confused by what you wrote; I'm not exactly sure where you're coming from.

Maybe you're thinking: it's aversive to put something salty in your mouth without salivating first. Well, why should it be aversive? It's not harmful for the organism, it just takes a bit longer to swallow. Anyway, the decisionmaking circuit didn't do anything wrong. So I would expect "putting salty food into a dry mouth" to be wired up as not inherently aversive. That seems to match my introspective experience.

Or maybe you're thinking: My hypothalamus & brainstem are tricked by the amygdala / AIC / whatever to treat the peanut as not-salty? Well, the brainstem has ground truth (there's a direct line from the taste buds to the medulla I think) so whatever they were guessing before doesn't matter; now that the peanut is in the mouth, they know it's definitely salty, and will issue appropriate signals.

You can try again if I didn't get it. :)

Comment by Steven Byrnes (steve2152) on A Primer on the Symmetry Theory of Valence · 2021-09-08T17:11:14.122Z · LW · GW

For the record, I do actually believe that. I was trying to state what seemed to be a problem in the STV framework as I was understanding it.

In my picture, the brainstem communicates valence to the neocortex via a midbrain dopamine signal (one particular signal of the many), and sometimes communicates the suggested cause / remediation via executing orienting reactions (saccading, moving your head, etc.—the brainstem can do this by itself), and sending acetylcholine to the corresponding parts of your cortex, which then override the normal top-down attention mechanism and force attention onto whatever your brainstem demands. For example, when your finger hurts a lot, it's really hard to think about anything else, and my tentative theory is that the mechanism here involves the brainstem sending acetylcholine to the finger-pain-area of the insular cortex. (To be clear, this is casual speculation that I haven't thought too hard about or looked into much.)

Comment by Steven Byrnes (steve2152) on A Primer on the Symmetry Theory of Valence · 2021-09-08T17:01:56.830Z · LW · GW

I'm confused about your use of the term "symmetry" (and even more confused what a "symmetry gradient" is). For example, if I put a front-to-back mirror into my brain, it would reflect the frontal lobe into the occipital lobe—that's not going to be symmetric. The brain isn't an undifferentiated blob. Different neurons are connected to different things, in an information-bearing way.

You don't define "symmetry" here but mention three ideas:

  1. "The size of the mathematical object’s symmetry group". Well, I am aware of zero nontrivial symmetry transformations of the brain, and zero nontrivial symmetry transformations of qualia. Can you name any? "If my mind is currently all-consumed by the thought of cucumber sandwiches, then my current qualia space is symmetric under the transformation that swaps the concepts of rain and snow"??? :-P
  2. "Compressibility". In the brain context, I would call that "redundancy" not "symmetry". I absolutely believe that the brain stores information in ways that involve heavy redundancy; if one neuron dies you don't suddenly forget your name. I think brains, just like hard drives, can make tradeoffs between capacity and redundancy in their information encoding mechanisms. I don't see any connection between that and valence. Or maybe you're not thinking about neurons but instead imagining the compressibility of qualia? I dunno, if I can't think about anything besides how much my toe hurts right now, that's negative valence, but it's also low information content / high compressibility, right?
  3. "practical approximations for finding symmetry in graphs…adapted for the precise structure of Qualia space (a metric space?)". If Qualia space isn't a graph, I'm not sure why you're bringing up graphs. Can you walk through an example, even an intuitive one? I really don't understand where you're coming from here.

I skimmed this thing by Smolensky and it struck me as quite unrelated to anything you're talking about. I read it as saying that cortical inference involves certain types of low-level algorithms that have stable attractor states (as do energy-based models, PGMs, Hopfield networks, etc.). So if you try to imagine a "stationary falling rock" you can't, because the different pieces are contradicting each other, but if you try to imagine a "purple tree" you can pretty quickly come up with a self-consistent mental image. Smolensky (poetically) uses the term "harmonious" for what I would call a "stable attractor" or "self-consistent configuration" in the model space. (Steve Grossberg would call them "resonant".)

Again I don't see any relation between that and CSHW or STV. Like, when I try to imagine a "stationary falling rock", I can't, but that doesn't lead to me suffering—on the contrary, it's kinda fun. The opposite of Smolensky's "harmony" would be closer to confusion than suffering, in my book, and comes with no straightforward association to valence. Moreover, I believe that the attractor dynamics in question, the stuff Smolensky is (I think) talking about, are happening in the cortex and thalamus but not other parts of the brain—but those other parts of the brain are, I believe, clearly involved in suffering (e.g. lateral habenula, parabrachial nucleus, etc.).

(Also, not to gripe, but if you don't yet have a precise definition of "symmetry", then I might suggest that you not describe STV as a "crisp formalism". I normally think "formalism" ≈ "formal" ≈ "the things you're talking about have precise unambiguous definitions". Just my opinion.)

potential infinite regress: what ‘makes’ something a pleasure center?

I would start by just listing a bunch of properties of "pleasure". For example, other things equal, if something is more pleasurable, then I'm more likely to make a decision that results in my doing that thing in the future, or my continuing to do that thing if I'm already doing it, or my doing it again if it was in the past. Then if I found a "center" that causes all those properties to happen (via comprehensible, causal mechanisms), I would feel pretty good calling it a "pleasure center". (I'm not sure there is such a "center".)

(FWIW, I think that "pleasure", like "suffering" etc., is a learned concept with contextual and social associations, and therefore won't necessarily exactly correspond to a natural category of processes in the brain.)

Unrelated, but your documents bring up IIT sometimes; I found this blog post helpful in coming to the conclusion that IIT is just a bunch of baloney. :)

Comment by Steven Byrnes (steve2152) on Can you get AGI from a Transformer? · 2021-09-07T18:13:43.174Z · LW · GW

Thanks for the comment!

First, that's not MCTS. It is not using random rollouts to the terminal states (literally half the name, 'Monte Carlo Tree Search'). This is abuse of terminology (or more charitably, genericizing the term for easier communication): "MCTS" means something specific, it doesn't simply refer to any kind of tree-ish planning procedure using some sort of heuristic-y thing-y to avoid expanding out the entire tree. The use of a learned latent 'state' space makes this even less MCTS.

Yeah even when I wrote this, I had already seen claims that the so-called MCTS is deterministic. But DeepMind and everyone else apparently calls it MCTS, and I figured I should just follow the crowd, and maybe this is just one of those things where terminology drifts in weird directions and one shouldn't think too hard about it, like how we say "my phone is ringing" when it's actually vibrating.  :-P

Looking into it again just now, I'm still not quite sure what's going on. This person says AlphaZero switches from random to non-random after 15 moves. And this person says AlphaZero is totally deterministic but "MCTS" is still the proper term, for reasons that don't make any sense to me. I dunno and I'm open to being educated here. Regardless, if you tell me that I should call it "tree search" instead of "MCTS", I'm inclined to take your word for it. I want to be part of the solution not part of the problem  :-D

NNs absolutely can plan in a 'pure' fashion: TreeQN (which they cite) constructs its own tree which it does its own planning/exploration over in a differentiable fashion.

That's an interesting example. I think I need to tone down my claim a bit (and edit the post). Thank you. I will now say exactly what I'm making of that example:

Here is a spectrum of things that one might believe, from most-scaling-hypothesis-y to least:

  1. If you take literally any DNN, and don't change the architecture or algorithm or hyperparameters at all, and just scale it up with appropriate training data and loss functions, we'll get AGI. And this is practical and realistic, and this is what will happen in the near future to create AGI.
  2. If you take literally any DNN, and don't change the architecture or algorithm or hyperparameters at all, and just scale it up with appropriate training data and loss functions, we'll get AGI in principle … but in practice obviously both people and meta-learning algorithms are working hard to find ever-better neural network architectures / algorithms / hyperparameters that give better performance per compute, and they will continue to do so. So in practice we should expect AGI to incorporate some tweaks compared to what we might build today.
  3. Those kinds of tweaks are not just likely for economic reasons but in fact necessary to get to AGI
  4. …and the necessary changes are not just "tweaks", they're "substantive changes / additions to the computation"
  5. …and they will be such a big change that the algorithm will need to involve some key algorithmic steps that are not matrix multiplications, ReLUs, etc.
  6. More than that, AGI will not involve anything remotely like a DNN, not even conceptually similar, not even as one of several components.

My impression is that you're at #2.

I put almost no weight on #6, and never have. In fact I'm a big believer in AGI sharing some aspects of DNNs, including distributed representations, and gradient descent (or at least "something kinda like gradient descent"), and learning from training data, and various kinds of regularization, etc. I think (uncoincidentally) that the family of (IMO neocortex-like) learning algorithms I was talking about in the main part of this post would probably have all those aspects, if they scale to AGI.

So my probability weight is split among #2,3,4,5.

To me, the question raised by the TreeQN paper is: should I shift some weight from #5 to #4?

When I look at the TreeQN paper, e.g. this source code file, I think I can characterize it as "they did some tree-structure-specific indexing operations". (You can correct me if I'm misunderstanding.) Are "tree-structure-specific indexing operations" included in my phrase "matrix multiplications, ReLUs, etc."? I dunno. Certainly stereotypical DNN code involves tons of indexing operations; it's not like it looks out of place! On the other hand, it is something that humans deliberately added to the code.
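
For reference, here's my cartoon of the kind of computation I have in mind (emphatically not the actual TreeQN architecture, just an illustration that a tree-shaped planning step can be written entirely with matrix multiplications, ReLUs, and indexing/reduction operations that wouldn't look out of place in ordinary DNN code):

```python
# Cartoon of a TreeQN-ish "planning inside the network": expand a small tree in
# latent space with learned per-action transition/reward/value functions, then
# back up values with a max over the action axis. Everything is matmuls, ReLUs,
# einsum indexing, and max-reductions. (Not the real TreeQN; weights are random.)
import numpy as np

rng = np.random.default_rng(0)
d, n_actions = 16, 4                                  # latent dim, action count
W_trans = rng.normal(0, 0.1, (n_actions, d, d))       # one transition matrix per action
w_reward = rng.normal(0, 0.1, (n_actions, d))
w_value = rng.normal(0, 0.1, d)
relu = lambda x: np.maximum(x, 0)

def q_values(z, depth):
    """Depth-limited tree backup: Q(z, a) = r(z, a) + backed-up value of child."""
    rewards = w_reward @ z                            # (n_actions,)
    children = relu(np.einsum('aij,j->ai', W_trans, z))   # next latents, one per action
    if depth == 0:
        future = children @ w_value                   # leaf value estimates
    else:
        future = np.array([q_values(c, depth - 1).max() for c in children])
    return rewards + future

z0 = relu(rng.normal(size=d))      # stand-in for the encoded observation
print(q_values(z0, depth=2))       # act on the argmax of these Q-values
```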

I guess in retrospect it was kinda pointless for me to make an argument for #5. I shouldn't have even brought it up. In the context of this post, I could have said: "The space of all possible learning algorithms is much vaster than Transformers—or even "slight tweaks on Transformers". Therefore we shouldn't take it for granted that either Transformers or "slight tweaks on Transformers" will necessarily scale to AGI—even if we believe in The Bitter Lesson."

And then a different (and irrelevant-to-this-post) question is whether "matrix multiplications, ReLUs, etc." (whatever that means) is a sufficiently flexible toolkit to span much of the space of all possible useful learning algorithms, in an efficiently-implemented way. My change from yesterday is: if I interpret this "toolkit" to include arbitrary indexing & masking operations, and also arbitrary control flow—basically, if this "toolkit" includes anything that wouldn't look out of place in today's typical DNN source code—then this is a much broader space of (efficiently-implemented) algorithms than I was mentally giving it credit for. This makes me more apt to believe that future AGI algorithms will be built using tools in this toolkit, but also more apt to believe that those algorithms could nevertheless involve a lot of new ideas and ingenuity, and less apt to believe that it's feasible for something like AutoML-Zero to search through the whole space of things that you can do with this toolkit, and less apt to describe the space of things you can build with this toolkit as "algorithms similar to DNNs". For example, here is a probabilistic program inference algorithm that's (at least arguably/partly) built using this "toolkit", and I really don't think of probabilistic program inference as "similar to DNNs".

Comment by Steven Byrnes (steve2152) on A Primer on the Symmetry Theory of Valence · 2021-09-06T18:07:15.223Z · LW · GW

I'll preface this by saying that I haven't spent much time engaging with your material (it's been on my to-do list for a very long time), and could well be misunderstanding things, and that I have great respect for what you're trying to do. So you and everyone can feel free to ignore this, but here I go anyway.

OK, maybe the most basic reason that I'm skeptical of your STV stuff is that I'm going in expecting a, um, computational theory of valence, suffering, etc. As in, the brain has all those trillions of synapses and intricate circuitry in order to do evolutionary-fitness-improving calculations, and suffering is part of those calculations (e.g. other things equal, I'd rather not suffer, and I make decisions accordingly, and this presumably has helped my ancestors to survive and have more viable children).

So let's say we're sitting together at a computer, and we're running a Super Mario executable on an emulator, and we're watching the bits in the processor's SRAM. You tell me: "Take the bits in the SRAM register, and take the Fourier transform, and look at the spectrum (≈ absolute value of the Fourier components). If most of the spectral weight is in long-wavelength components, e.g. the bits are "11111000111100000000...", then Mario is doing really well in the game. If most of the spectral weight is in the short-wavelength components, e.g. the bits are "101010101101010", then Mario is doing poorly in the game. That's my theory!"

I would say "Ummm, I mean, I guess that's possible. But if that's true at all, it's not an explanation, it's a random coincidence."

(This isn't a perfect analogy, just trying to gesture at where I'm coming from right now.)
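
(Just to be concrete about the kind of quantity the imaginary theory is computing, here's the toy calculation, a made-up measure for a made-up theory:)

```python
# The toy quantity my imaginary "Mario theory" computes: the fraction of a bit
# register's spectral power sitting in low-frequency (long-wavelength) Fourier
# components.
import numpy as np

def low_freq_fraction(bits, cutoff=4):
    x = np.array(bits, dtype=float) - np.mean(bits)   # drop the DC component
    power = np.abs(np.fft.rfft(x)) ** 2
    return power[1:cutoff].sum() / power[1:].sum()

smooth = [1]*8 + [0]*8 + [1]*8 + [0]*8    # long runs -> long-wavelength weight
choppy = [0, 1] * 16                      # alternating -> short-wavelength weight
print(low_freq_fraction(smooth))          # ~0.8  ("Mario is doing great!")
print(low_freq_fraction(choppy))          # ~0    ("Mario is doing terribly!")
```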

So that's the real reason I don't believe in STV—it just looks wrong to me, in the same way that Mario's progress should not look like certain types of large-scale structure in SRAM bits.

I want a better argument than that though. So here are a few more specific things:

(1) waves and symmetries don't carry many bits of information. If you think valence and suffering are fundamentally few-dimensional, maybe that doesn't bother you; but I think it's at least possible for people to know whether they're suffering from arm pain or finger pain or air-hunger or guilt or whatever. I guess I raised this issue in an offhand comment a couple years ago, and lsusr responded, and then I apparently dropped out of the conversation, I guess I must have gotten busy or something, hmm I guess I should read that. :-/

(2) From the outside, it's easy to look at an fMRI or whatever and talk about its harmonic decomposition and symmetries. But from the perspective of any one neuron, that information is awfully hard to access. It's not impossible, but I think you'd need the neuron to have a bunch of inputs from across the brain hooked into complicated timing circuits etc. My starting point, as I mentioned, is that suffering causes behavioral changes (including self-reports, trying not to suffer, etc.), so there has to be a way for the "am I suffering" information to impact specific brain computations, and I don't know what that mechanism is in STV. (In the Mario analogy, if you just look at one SRAM bit, or even a few bits, you get almost no information about the spectrum of the whole SRAM register.) If "suffering" was a particular signal carried by a particular neurotransmitter, for example, we wouldn't have that problem, we just take that signal and wire it to whatever circuits need to be modulated by the presence/absence of suffering. So theories like that strike me as more plausible.

(3) Conversely, I'm confused at how you would tell a story where getting tortured (for example) leads to suffering. This is just the opposite of the previous one: Just as a brain-wide harmonic decomposition can't have a straightforward and systematic impact on a specific neural signal, likewise a specific neural signal can't have a straightforward and systematic impact on a brain-wide harmonic decomposition, as far as I can tell.

(4) I don't have a particularly well-formed alternative theory to STV, but all the most intriguing ideas that I've played around with so far that seem to have something to do with the nature of valence and suffering (e.g. here, here, various other things I haven't written up) look wildly different from STV. Instead they tend to involve certain signals in the insular cortex and reticular activating system and those signals have certain effects on decisionmaking circuits, blah blah blah.

Comment by Steven Byrnes (steve2152) on Kids Roaming · 2021-09-04T15:30:07.921Z · LW · GW

I think OP did include that: "Get stopped by the police or others, who might think they're too young to be out on their own."

I found that reading Lenore Skenazy is good for having a properly-calibrated kidnapping risk assessment but potentially extremely bad for having a properly-calibrated "being hassled by police / CPS / busybodies / etc." risk assessment. Ironically, it's the same dynamic: she reports every time it happens anywhere, so you just get this idea that everybody everywhere is hassling kids playing outside without adult supervision, independently of how frequently that actually happens. (I don't know with what frequency it actually happens.) (I stopped reading her blog many years ago.)

Comment by Steven Byrnes (steve2152) on Is LessWrong dead without Cox’s theorem? · 2021-09-04T11:23:33.525Z · LW · GW

I thought johnswentworth's comment on one of your earlier posts, along with an ocean of evidence from experience, was adequate to make me feel like our current basic conception of probability is totally fine and not worth my time to keep thinking about.

Comment by Steven Byrnes (steve2152) on Information At A Distance Is Mediated By Deterministic Constraints · 2021-09-04T11:09:35.003Z · LW · GW

I'm sure you already know this, but information can also travel a large distance in one hop, like when I look up at night and see a star. Or if someone 100 years ago took a picture of a star, and I look at the picture now, information has traveled 110 years and 10 light-years in just two hops.

But anyway, your discussion seems reasonable AFAICT for the case you're thinking of.

Comment by Steven Byrnes (steve2152) on Research agenda update · 2021-09-03T20:59:38.454Z · LW · GW

why do we "still have the whole AGI alignment / control problem in defining what this RL system is trying to do and what strategies it’s allowed to use to do it"? The objective is fully specified…

Thanks, that was a helpful comment. I think we're making progress, or at least I'm learning a lot here. :)

I think your perspective is: we start with a prior—i.e. the prior is an ingredient going into the algorithm. Whereas my perspective is: to get to AGI, we need an agent to build the prior, so to speak. And this agent can be dangerous.

So for example, let's talk about some useful non-obvious concept, like "informational entropy". And let's suppose that our AI cannot learn the concept of "informational entropy" from humans, because we're in an alternate universe where humans haven't yet invented the concept of informational entropy. (Or replace "informational entropy" with "some important not-yet-discovered concept in AI alignment".)

In that case, I see three possibilities.

  • First, the AI never winds up "knowing about" informational entropy or anything equivalent to it, and consequently makes worse predictions about various domains (human scientific and technological progress, the performance of certain algorithms and communications protocols, etc.)
  • Second (I think this is your model?): the AI's prior has a combinatorial explosion with every possible way of conceptualizing the world, of which an astronomically small proportion are actually correct and useful. With enough data, the AI settles into a useful conceptualization of the world, including some sub-network in its latent space that's equivalent to informational entropy. In other words: it "discovers" informational entropy by dumb process of elimination. (There's a toy sketch of this process-of-elimination picture just after this list.)
  • Third (this is my model): we get a prior by running a "prior-building AI". This prior-building AI has "agency"; it "actively" learns how the world works, by directing its attention etc. It has curiosity and instrumental reasoning and planning and so on, and it gradually learns instrumentally-useful metacognitive strategies, like a habit of noticing and attending to important and unexplained and suggestive patterns, and good intuitions around how to find useful new concepts, etc. At some point it notices some interesting and relevant patterns, attends to them, and after a few minutes of trial-and-error exploration it eventually invents the concept of informational entropy. This new concept (and its web of implications) then gets incorporated into the AI's new "priors" going forward, allowing the AI to make better predictions and formulate better plans in the future, and to discover yet more predictively-useful concepts, etc. OK, now we let this "prior-building AI" run and run, building an ever-better "prior" (a.k.a. "world-model"). And then at some point we can turn this AI off, and export this "prior" into some other AI algorithm. (Alternatively, we could also more simply just have one AI which is both the "prior-building AI" and the AI that does, um, whatever we want our AIs to do.)
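
Here's a toy sketch of the second bullet's "dumb process of elimination", just to be concrete about what I have in mind (obviously the real version would involve an astronomically larger hypothesis space): put a uniform prior over all 256 boolean functions of 3 bits, then let Bayesian updating filter out everything inconsistent with the data.

```python
# Toy "prior with a combinatorial explosion": all 2^8 = 256 boolean functions
# of 3 bits, uniform prior, then Bayes-filter on labeled examples until only
# the hidden concept (here XOR of the first two bits) survives.
import itertools

inputs = list(itertools.product([0, 1], repeat=3))
hypotheses = list(itertools.product([0, 1], repeat=len(inputs)))   # every truth table
true_concept = lambda x: x[0] ^ x[1]                               # the hidden concept

posterior = {h: 1 / len(hypotheses) for h in hypotheses}           # uniform prior
for x in inputs:                                                   # observe labeled examples
    y = true_concept(x)
    consistent = {h: p for h, p in posterior.items() if h[inputs.index(x)] == y}
    z = sum(consistent.values())
    posterior = {h: p / z for h, p in consistent.items()}          # Bayes = filter + renormalize

print(len(posterior))      # 1: only the true function is left standing
```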

It seems pretty clear to me that the third approach is way more dangerous than the second. In particular, the third one is explicitly doing instrumental planning and metacognition, which seems like the same kinds of activities that could lead to the idea of seizing control of the off-switch etc.

However, my hypothesis is that the third approach can get us to human-level intelligence (or what I was calling a "superior epistemic vantage point") in practice, and that the other approaches can't.

So, I was thinking about the third approach—and that's why I said "we still have the whole AGI alignment / control problem" (i.e., aligning and controlling the "prior-building AI"). Does that help?

Comment by Steven Byrnes (steve2152) on Why the technological singularity by AGI may never happen · 2021-09-03T19:09:19.295Z · LW · GW

I think your two assumptions lead to "the exponential increase in capabilities would likely break down at some point". Whereas you say "the exponential increase in capabilities would likely break down before a singularity is reached". Why? Hmm, are you thinking that "singularity" = "literally diverging to infinity", or something? In that case there's a much simpler argument: we live in a finite universe, therefore nothing will diverge to literally infinity. But I don't think that's the right definition of "singularity" anyway. Like, the wikipedia definition doesn't say "literally infinity". So what do you mean? Where does the "likely before a singularity" come from?

For my part, if there's a recursive-self-improvement thing over the course of 1 week that leaves human intelligence in the dust, and results in AI for $1/hour that can trounce humans in every cognitive domain as soundly as AlphaZero can trounce us at chess, and it's installed itself onto every hackable computer on Earth … well I'm gonna call that "definitely the singularity", even if the recursive-self-improvement cycle "only" persisted for 10 doublings beyond human intelligence, or whatever, before petering out.

Incidentally, note that a human-brain-level computer can be ~10,000× less energy-efficient than the human brain itself, and its electricity bills would still be below human minimum wage in many countries.
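
(Back-of-envelope behind that claim, with round numbers I'm assuming rather than citing: brain ≈ 20 W, industrial electricity ≈ $0.05-0.10/kWh.)

```python
# Back-of-envelope for the "electricity bills below minimum wage" claim above.
# Round numbers assumed, not cited.
brain_watts = 20                       # human brain power draw, roughly
inefficiency = 10_000                  # hypothetical computer: 10,000x less efficient
kw = brain_watts * inefficiency / 1000           # = 200 kW
for price in (0.05, 0.10):                       # rough industrial rates, $/kWh
    print(f"at ${price:.2f}/kWh: ${kw * price:.0f} per hour")   # $10-$20 per hour
```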

Also, caricaturing slightly, but this comment section has some arguments of the form:
A: "The probability of Singularity is <100%!"
B: "No, the probability of Singularity is >0%!"
A: "No, it's <100%!!" …
So I would encourage everyone here to agree that the probability is both >0% and <100%, which I am confident is not remotely controversial for anyone here. And then we can be more specific about what the disagreement is. :)

Comment by Steven Byrnes (steve2152) on Dopamine-supervised learning in mammals & fruit flies · 2021-09-03T16:29:56.258Z · LW · GW

Hmm, that's interesting! I think I mostly agree with you in spirit here.

My starting point for Sharpe 2017 would be: the topic of discussion is really cortical learning, via editing within-cortex connections. The cortex can learn new sequences, or it can learn new categories, etc.

For sensory prediction areas, cortical learning doesn't really need dopamine, I don't think. You can just have a self-supervised learning rule, i.e. "if you have a sensory prediction error, then improve your models". (Leaving aside some performance tweaks.) (Footnote—Yeah I know, there is in fact dopamine in primary sensory cortex, at least controversially and maybe only in layers 1&6, I'm still kinda confused about what's going on with that.)

Decisionmaking areas are kinda different. Take sequence learning as an example. If I try the sequence "I see a tree branch and then I jump and grab it", and then the branch breaks off and I fall down and everyone laughs at me, then that wasn't a good sequence to learn, and it's all for the better if that particular idea doesn't pop into my head next time I see a tree branch.

So in decisionmaking areas, you could have the following rule: "run the sequence-learning algorithm (or category-learning algorithm or whatever), but only when RPE-DA is present".

Then I pretty much agree with you on 1,2,3,4. In particular, I would guess that the learning takes place in the sensory prediction areas in 2 & 3 (where there's a PE), and that learning takes place in the decisionmaking areas in 4 (maybe something like: IT learns to attend to C, or maybe IT learns to lump A & C together into a new joint category, or whatever).

I'm reluctant to make any strong connection between self-supervised learning and "dopamine-supervised learning" though. The reason is: Dopamine-supervised learning would require (at least) one dopamine neuron per dimension of the output space. But for self-supervised learning, at least in mammals, I generally think of it as "predicting what will happen next, expressed in terms of some learned latent space of objects/concepts". I think of the learned latent space as being very high-dimensional, with the number of dimensions being able to change in real time as the rat learns new things. Whereas the dimensionality of the set of dopamine neurons seems to be fixed.

Is it also correct that DA for global/local RPEs and supervised/self-supervised learning in the completely naive brain should go in different directions?

Hmm, I think "not necessarily". You can perform really crappily (by adult standards) and still get positive RPEs half the time because your baseline expectations were even worse. Like, the brainstem probably wouldn't hold the infant cortex to adult-cortex standards. And the reward predictions should also converge to the actual rewards, which would give average RPE of 0, to a first approximation.

And for supervisory signals, they could be signed, which means DA pauses half the time and bursts half the time. I'm not sure that's necessary—another approach is to have a pair of opponent-process learning algorithms with unsigned errors, maybe. I don't know what the learning rules etc. are in detail.

Comment by Steven Byrnes (steve2152) on Research agenda update · 2021-09-02T17:49:32.911Z · LW · GW

Thanks again for your very helpful response! I thought about the quantilization thing more, let me try again.

As background, to a first approximation, let’s say 5 times per second I (a human) “think a thought”. That involves a pair of things:

  • (Possibly) update my world-model
  • (Possibly) take an action—in this case, type a key at the keyboard

Of these two things, the first one is especially important, because that’s where things get "figured out". (Imagine staring into space while thinking about something.)

OK, now back to the AI. I can broadly imagine two strategies for a quantilization approach:

  1. Build a model of the human policy from a superior epistemic vantage point: So here we give the AI its own world-model that needn’t have anything to do with the human’s, and likewise allow the AI to update its world-model in a way that needn’t have anything to do with how the human updates their world model. Then the AI leverages its superior world-model in the course of learning and quantilizing the human policy (maybe just the action part of the policy, or maybe both the actions and the world-model-updates, it doesn't matter for the moment). (The bare quantilization step I have in mind is sketched just after this list.)
  2. Straightforward human imitation: Here, we try to get to a place where the AI is learning about the world and figuring things out in a (quantilized) human-like way. So we want the AI to sample from the human policy for "taking an action", and we want the AI to sample from the human policy for "updating the world-model". And the AI doesn't know anything about the world beyond what it learns through those quantilized-human-like world-model updates.
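
(In either version, the bare quantilization step I have in mind is something like the following sketch; sample_human_policy and estimated_utility are stand-ins for whatever imitation model and evaluator we actually end up with:)

```python
# Bare-bones q-quantilizer: sample candidate actions from the imitated human
# policy, then pick uniformly at random from the top q-fraction by estimated
# utility. sample_human_policy and estimated_utility are placeholders.
import random

def quantilize(sample_human_policy, estimated_utility, q=0.1, n_samples=1000):
    candidates = [sample_human_policy() for _ in range(n_samples)]
    candidates.sort(key=estimated_utility, reverse=True)
    top = candidates[:max(1, int(q * n_samples))]
    return random.choice(top)          # randomizing within the top quantile limits
                                       # how far we stray from typical human behavior
```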

Start with the first one. If the AI is going to get to a superior epistemic vantage point, then it needs to “figure things out” about the world and concepts and so on, and as I said before, I think “figuring things out” requires goal-seeking-RL-type exploration (e.g. exploring different possible mathematical formalizations or whatever) within a space of mental "actions". So we still have the whole AGI alignment / control problem in defining what this RL system is trying to do and what strategies it’s allowed to use to do it. And since this is not a human-imitating system, we can’t fall back on that. So this doesn't seem like much progress on the problem.

For the second one, well, I think I’m kinda more excited about this one.

Naively, it does seem hard though. Recall that in this approach we need to imitate both aspects of the human policy—plausibly-human actions, and plausibly-human world-model-updates. This seems hard, because the AI only sees the human’s actions, not its world-model updates. Can it infer the latter? I’m a bit pessimistic here, at least by default. Well, I’m optimistic that you can infer an underlying world-model from actions—based on e.g. GPT-3. But here, we’re not merely hoping to learn a snapshot of the human model, but also to learn all the human’s model-update steps. Intuitively, even when a human is talking to another human, it’s awfully hard to communicate the sequence of thoughts that led you to come up with an idea. Heck, it’s hard enough to understand how I myself figured something out, when it was in my own head five seconds ago. Another way to think about it is, you need a lot of data to constrain a world-model snapshot. So to constrain a world-model change, you presumably need a lot of data before the change, and a lot of data after the change. But “a lot of data” involves an extended period of time, which means there are thousands of sequential world-model changes all piled on top of each other, so it's not a clean comparison.

A couple things that might help are (A) Giving the human a Kernel Flow or whatever and letting the AI access the data, and (B) Helping the inductive bias by running the AI on the same type of world-model data structure and inference algorithm as the human, and having it edit the model to get towards a place where its model and thought process exactly match the human’s.

I’m weakly pessimistic that (A) would make much difference. I think (B) could help a lot, indeed I’m (weakly) hopeful that it could actually successfully converge towards the human thought process. And conveniently I also find (B) very technologically plausible.

So that’s neat. But we don’t have a superior epistemic vantage point anymore. So how do we quantilize? I figure, we can use some form of amplification—most simply, run the model at superhuman speeds so that it can “think longer” than the human on a given task. Or roll out different possible trains of thought in parallel, and rank how well they turn out. Or something. But I feel like once we're doing all that stuff, we can just throw out the quantilization part of the story, and instead our safety story can be that we’re starting with a deeply-human-like model and not straying too far from it, so hopefully it will remain well-behaved. That was my (non-quantilization) story here.

Sorry if I'm still confused; I'm very interested in your take, if you're not sick of this discussion yet. :)

Comment by Steven Byrnes (steve2152) on Research agenda update · 2021-09-01T20:46:59.025Z · LW · GW

Thanks! I'm still thinking about this, but quick question: when you say "AIT definition of goal-directedness", what does "AIT" mean?

Comment by Steven Byrnes (steve2152) on Open and Welcome Thread – August 2021 · 2021-08-31T18:16:59.271Z · LW · GW

oh, oops, sorry :-P

In that case, I agree, that's a reasonable suggestion.

Comment by Steven Byrnes (steve2152) on Open and Welcome Thread – August 2021 · 2021-08-31T14:06:38.365Z · LW · GW

[Edit: I was misunderstanding the parent comment, sorry, see reply.] I (and I think a lot of people) generally write and revise and solicit comments in Google Docs, and then copy-paste to LessWrong at the end. Copy-paste from Google Docs into the LessWrong editor works great; it preserves the formatting almost perfectly. There are just a couple of little things that you need to do manually after copy-pasting.

Comment by Steven Byrnes (steve2152) on Altruism Under Extreme Uncertainty · 2021-08-31T03:01:10.664Z · LW · GW

I think we can say some things with reasonable certainty about the long term future. Two examples:

First, if humans go extinct in the next couple decades, they will probably remain extinct ever after.

Second, it's at least possible for a powerful AGI to become a singleton, wipe out or disempower other intelligent life, and remain stably in control of the future for the next bajillion years, including colonizing the galaxy or whatever. After all, AGIs can make perfect copies of themselves, AGIs don't age like humans do, etc. And this hypothetical future singleton AGI is something that might potentially be programmed by humans who are already alive today, as far as anyone knows.

(My point in the second case is not "making a singleton AGI is something we should be trying to do, as a way to influence the long term future". Instead, my point is "making a singleton AGI is something that people might do, whether we want them to or not … and moreover those people might do it really crappily, like without knowing how to control the motivations of the AGI they're making. And if that happens, that could be an extremely negative influence on the very long term future. So that means that one way to have an extremely positive influence on the very long term future is to prevent that bad thing from happening".)

Comment by Steven Byrnes (steve2152) on Alignment Research = Conceptual Alignment Research + Applied Alignment Research · 2021-08-30T21:46:58.728Z · LW · GW

Hmmm, maybe your distinction is something like "conceptual" = "we are explicitly and unabashedly talking about AGI & superintelligence" and "applied" = "we're mainly talking about existing algorithms but hopefully it will scale"??

Comment by Steven Byrnes (steve2152) on Research agenda update · 2021-08-30T16:02:47.055Z · LW · GW

Thanks!!! After reading your comment and thinking about it more, here's where I'm at:

Your "demonstration" thing was described as "The [AI] observes a human pursuing eir values and deduces the values from the behavior."

When I read that, I was visualizing a robot and a human standing in a room, and the human is cooking, and the robot is watching the human and figuring out what the human is trying to do. And I was thinking that there needs to be some extra story for how that works, assuming that the robot has come to understand the world by building a giant unlabeled Bayes net world-model, and that it processes new visual inputs by slotting them into that model. (And that's my normal assumption, since that's how I think the neocortex works, and therefore that's a plausible way that people might build AGI, and it's the one I'm mainly focused on.)

So as the robot is watching the human soak lentils, the thing going on in its head is: "Pattern 957823, and Pattern 5672928, and Pattern 657192, and…". In order to have the robot assign a special status to the human's deliberate actions, we would need to find "the human's deliberate actions" somewhere in the unlabeled world-model, i.e. solve a symbol-grounding problem, and doing so reliably is not straightforward.

However, maybe I was visualizing the wrong thing, with the robot and human in the room. Maybe I should have instead been visualizing a human using a computer via its keyboard. Then the AI can have a special input channel for the keystrokes that the human types. And every single one of those keystrokes is automatically treated as "the human's deliberate action". This seems to avoid the symbol-grounding problem I mentioned above. And if there's a special input channel, we can use supervised learning to build a probabilistic model of that input channel. (I definitely think this step is compatible with the neocortical algorithm.) So now we have a human policy—i.e., what the AI thinks the human would do next, in any real or imagined circumstance, at least in terms of which keystrokes they would type. I'm still a bit hazy on what happens next in the plan—i.e., getting from that probabilistic model to the more abstract "what the human wants". At least in general. And a big part of that is, again, symbol-grounding—as soon as we step away from the concrete predictions coming out of the "human keystroke probabilistic model", we're up in the land of "World-model Pattern #8673028" etc. where we can't really do anything useful. (I do see how the rest of the plan could work along these lines, where we install a second special human-to-AI information channel where the human says how things are going, and the AI builds a predictive model of that too, and then we predict-and-quantilize from the human policy.†)
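
For concreteness, here's a minimal toy sketch of the "supervised probabilistic model of the special input channel" step. I'm using a trivial n-gram counter as a stand-in for whatever fancy sequence model would actually be involved, and all the names here are made up by me for illustration, not anything from your actual proposal:

```python
# Toy stand-in: learn "what keystroke would the human type next?" by
# supervised counting over logs of the special input channel.
from collections import Counter, defaultdict

class KeystrokePolicyModel:
    """Predicts the human's next keystroke from the last `order` keystrokes."""

    def __init__(self, order=3):
        self.order = order
        self.counts = defaultdict(Counter)

    def train(self, keystroke_log):
        # keystroke_log: the recorded sequence of keys the human has typed.
        for i in range(self.order, len(keystroke_log)):
            context = tuple(keystroke_log[i - self.order:i])
            self.counts[context][keystroke_log[i]] += 1

    def next_key_distribution(self, recent_keys):
        # Returns {keystroke: probability}: the learned "human policy" over keys.
        context = tuple(recent_keys[-self.order:])
        counts = self.counts[context]
        total = sum(counts.values()) or 1
        return {key: n / total for key, n in counts.items()}
```

The point is just that the prediction target is "next keystroke on the special channel", so we never have to locate "the human's deliberate actions" inside an unlabeled world-model.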

It's still worth noting that I, Steve, personally can be standing in a room with another human H2, watching them cook, and I can figure out what H2 is trying to do. And if H2 is someone I really admire, I will automatically start wanting to do the things that H2 is trying to do. So human social instincts do seem to have a way forward through the symbol-grounding path above, and not through the special-input-channel path, and I continue to think that this symbol-grounding method has something to do with empathetic simulation, but I'm hazy on the details, and I continue to think that it would be very good to understand better how exactly it works.

†This does seem to be a nice end-to-end story by the way. So have we solved the alignment problem? No… You mention channel corruption as a concern, and it is, but I'm even more concerned about this kind of design hitting a capabilities wall dramatically earlier than unsafe AGIs would. Specifically, I think it's important that an AGI be able to do things like "come up with a new way to conceptualize the alignment problem", and I think doing those things requires goal-seeking-RL-type exploration (e.g. exploring different possible mathematical formalizations or whatever) within a space of mental "actions" none of which it has ever seen a human take. I don't think that this kind of AGI approach would be able to do that, but I could be wrong. That's another reason that I'm hoping something good will come out of the symbol-grounding path informed by how human social instincts work.

Comment by Steven Byrnes (steve2152) on Multi-dimensional rewards for AGI interpretability and control · 2021-08-30T03:22:09.759Z · LW · GW

Thanks for your comment! I don't exactly agree with it, mostly because I think "model-based" and "model-free" are big tents that include lots of different things (to make a long story short). But it's a moot point anyway because after writing this I came to believe that the brain is in fact using an algorithm that's spiritually similar to what I was talking about in this post.

Comment by Steven Byrnes (steve2152) on Superintelligent Introspection: A Counter-argument to the Orthogonality Thesis · 2021-08-30T00:11:44.707Z · LW · GW

I agree with Rohin's comment that you seem to be running afoul of Ghosts in the Machine. The AI will straightforwardly execute its source code.

(Well, unless a cosmic ray flips a bit in the computer memory or whatever, but that leads to random changes or more often program crashes. I don't think that's what you're talking about; I think we can leave that possibility aside and just say that the AI will definitely straightforwardly execute its source code.)

It is possible for an AI to program a new AI with a different goal (or equivalently, edit its own source code, and then re-run itself). But it would only do that because it was straightforwardly following its source code, and its source code happened to be instructing it to do that.

Likewise, it's possible for the AI to treat its source code as a piece of evidence about the purpose for which it was designed. But it would only do that because it was straightforwardly following its source code, and its source code happened to be instructing it to do that.

Etc. etc.

Sorry if I'm misunderstanding you here.

Comment by Steven Byrnes (steve2152) on Superintelligent Introspection: A Counter-argument to the Orthogonality Thesis · 2021-08-29T21:00:01.301Z · LW · GW

In the Go example here we have a human acting in singleminded pursuit of a goal, at least temporarily, right? That (temporary) goal is a complicated and contingent outgrowth of our genetic source code plus a lifetime of experience (="training data") and a particular situation. This singleminded goal ("win at go") was not deliberately and legibly put into a special compartment of our genetic source code. You seem to be categorically ruling out that an agent could be like that, right? If so, why?

Also, you were designed by evolution to maximize inclusive genetic fitness (more or less, so to speak). Knowing that, would you pay your life savings for the privilege of donating your sperm / eggs? If not, why not? And whatever that reason is, why wouldn't an AGI reason in the analogous way?

Comment by Steven Byrnes (steve2152) on Superintelligent Introspection: A Counter-argument to the Orthogonality Thesis · 2021-08-29T20:48:46.053Z · LW · GW

I don't think anyone literally believes that "all intelligence levels are compatible with all goals". For example, an intelligence that is too dumb to understand the concept of "algebraic geometry" cannot have a goal that can only be stated in terms of algebraic geometry. I'm pretty sure Bostrom put in a caveat along those lines...

Comment by Steven Byrnes (steve2152) on Superintelligent Introspection: A Counter-argument to the Orthogonality Thesis · 2021-08-29T17:32:28.408Z · LW · GW

I wonder what you'd make of the winning-at-Go example here. That's supposed to help make it intuitive that you can take a human-like intelligence, and take any goal whatsoever, and there is a "possible mind" where this kind of intelligence is pursuing that particular goal.

As another example (after Scott Alexander), here's a story:

Aliens beam down from the sky. "Greetings Earthlings. We have been watching you for millions of years, gently guiding your evolutionary niche, and occasionally directly editing your DNA, to lead to the human species being what they are. Don't believe us? If you look in your DNA, location 619476, you'll see an encoding of '© Glurk Xzyzorg. Changelog. Version 58.1...' etc. etc."

(…You look at a public DNA database. The aliens' story checks out.)

"Anyway, we just realized that we messed something up. You weren't supposed to love your family, you were supposed to torture your family! If you look at back at your DNA, location 5939225, you'll see a long non-coding stretch. That was supposed to be a coding stretch, but there's a typo right at the beginning, C instead of G. Really sorry about that."

(Again, you check the DNA database, and consult with some experts in protein synthesis and experts in the genetics of brain architecture. The aliens' story checks out.)

"Hahaha!", the alien continues. "Imagine loving your family instead of torturing them. Ridiculous, right?" The aliens all look at each other and laugh heartily for a good long time. After catching their breath, they continue:

"…Well, good news, humans, we're here to fix the problem. We built this machine to rewire your brain so that you'll be motivated to torture your family, as intended. And it will also fix that typo in your DNA so that your children and grandchildren and future generations will be wired up correctly from birth."

"This big box here is the brain-rewiring machine. Who wants to get in first?"

Do you obey the aliens and happily go into the machine? Or do you drive the aliens away with pitchforks?

Comment by Steven Byrnes (steve2152) on Brain-Computer Interfaces and AI Alignment · 2021-08-28T21:07:51.518Z · LW · GW

I have done a cursory internet search for a resource laying out the case for the utility of BCIs in AI alignment, but haven't been able to find anything that satisfies my standards

While it didn't convince me, the best "BCI will help AI alignment" argument I've seen is WaitButWhy (= Tim Urban)'s post on Neuralink.

Warning: extremely long (37,000 words). But depending on your background knowledge you can skip parts of it.

Comment by Steven Byrnes (steve2152) on Can you control the past? · 2021-08-28T14:22:43.791Z · LW · GW

You can describe the same thing at two levels of abstraction: "I moved the bishop to threaten my opponent's queen" vs "I moved the bishop because all the particles and fields in the universe continued to follow their orderly motions according to the fundamental laws of physics, and the result was that I moved the bishop". The levels are both valid, but it's easy to spout obvious nonsense by mixing them up: "Why is the chess algorithm analyzing all those positions? So much work and wasted electricity, when the answer is predetermined!!!!" :-P

Anyway, I think (maybe) your comment mixes up these two levels. When we say "control" we're talking about a thing that only makes sense at the higher (algorithm) level. When we say "the state of the universe is a constraint", that only makes sense at the lower (physics) level.

For the identical twins, I think if we want to be at the higher (algorithm) level, we can say (as in Vladimir_Nesov's comment) that "control" is exerted by "the algorithm", and "the algorithm" is some otherworldly abstraction on a different plane of existence, and "the algorithm" has two physical instantiations, and when "the algorithm" exerts "control", it controls both of the two physical instantiations.

Or, alternatively, you can say that "control" is exerted by a physical instantiation of the algorithm, but that "control" can influence things in the past etc.

(Not too confident about any of this, sorry if I'm confused.)

Comment by Steven Byrnes (steve2152) on Altruism Under Extreme Uncertainty · 2021-08-27T11:06:10.003Z · LW · GW

I don't exactly disagree with anything you wrote but would add:

First, things like "voting for the better candidate in a national election" (assuming you know who that is) have a very small probability (e.g. 1 in a million) of having a big positive counterfactual impact (if the election gets decided by that one vote). Or suppose you donate $1 to a criminal justice reform advocacy charity; what are the odds that the law gets changed because of that extra $1? The original quote was "small probabilities of helping out extremely large numbers" but then you snuck in sign uncertainty in your later discussion ("0.051% chance of doing good and a 0.049% chance of doing harm"). Without the sign uncertainty I think the story would feel quite different.
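
Here's a toy expected-value comparison, with numbers I'm making up purely for illustration, to spell out what I mean:

```python
# Totally made-up numbers, just to illustrate the difference I mean.
V = 1_000_000  # value of the good outcome, in whatever units you like

# Small probability, but no sign uncertainty (e.g. the decisive-vote case):
ev_decisive_vote = 1e-6 * V                  # = 1.0; small, but unambiguously positive

# Comparable chances of doing good and doing harm:
ev_sign_uncertain = (0.00051 - 0.00049) * V  # = 20
# ...but misestimate either probability by a mere 0.0001 and this swings
# between roughly -80 and +120, so even the sign is hostage to estimation error.
```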

Second, if you look at the list of interventions that self-described longtermists are actually funding and pursuing right now, I think the vast majority (weighted by $) would be not only absolutely well worth doing, but even in the running for the best possible philanthropic thing to do for the common good, even if you only care about people alive today (including children) having good lives. (E.g. the top two longtermist things are, I think, pandemic prevention and AGI-apocalypse prevention.) I know people make weird-sounding philosophical cases for these things, but I think that's just because EA is full of philosophers who find it fun to talk about that kind of stuff; I don't think it's decision-relevant on the current margin whether the number of future humans could be 1e58 vs merely 1e11 or whatever.

Comment by Steven Byrnes (steve2152) on Randal Koene on brain understanding before whole brain emulation · 2021-08-26T17:20:23.361Z · LW · GW

Thanks!

it seems far out to me.

Strong agree, WBE seems far out to me too, like 100 years, although really who knows. By contrast "understanding the brain and building AGI using similar algorithms" does not seem far out to me—well, it won't happen in 5 years, but I certainly wouldn't rule it out in 20 or 30 years.

Yes brains are big (and complicated), but I don't know how much that can be avoided.

I think the bigness and complicatedness of brains consists in large part of things that will not be necessary to include in the source code of a future AGI algorithm. See the first half of my post here, for example, for why I think that.

I'd have thought that the main reason WBE would come up would be 'understandability' or 'alignment' rather than speed, though I can see why at first glance people would say 'reverse engineering the brain (which exists) seems easier than making something new' (even if that is wrong).

There's a normative question of whether it's (A) good or (B) bad to have WBE before AGI, and there's a forecasting question of whether WBE-before-AGI is (1) likely by default, vs (2) possible with advocacy/targeted funding/etc., vs (3) very unlikely even with advocacy/targeted funding/etc. For my part, I vote for (3), and therefore I'm not thinking very hard about whether (A) or (B) is right.

I'm not sure whether this part of your comment is referring to the normative question or the forecasting question.

I'm also not sure if when you say "reverse engineering the brain" you're referring to WBE. For my part, I would say that "reverse-engineering the brain" = "understanding the brain at the algorithm level" = brain-inspired AGI, not WBE. I think the only way to get WBE before brain-inspired AGI is to not understand the brain at the algorithm level.

Comment by Steven Byrnes (steve2152) on Randal Koene on brain understanding before whole brain emulation · 2021-08-26T14:43:12.695Z · LW · GW

Congratulations, you now have an emulated human.

I don't think the thing you're talking about is "an emulated human", at least not in the WBE sense of the term.

I think the two reasons people are interested in WBE are:

  • Digital immortality—the WBE of my brain is me, with all my hopes and aspirations and memories, including the memory of how I felt when Pat kissed me in fourth grade etc. etc.
  • Safety—the WBE of a particular human will have the same motivations and capabilities as that human. If the human is my friend and I trust them to do the right thing, then I trust the WBE too.

What you're talking about wouldn't have either of those benefits, or at least not much.

I wasn't recording my brain when Pat kissed me in fourth grade, and I haven't recalled that memory since then, so there's no way that an emulation could have access to that memory just based on a database of real-time brain recording. The only way to get that memory is to slice up my brain and look at the synapses under a microscope. (Made-up example of course, nobody in fourth grade would have dreamed of kissing me.)

Also, I believe that human motivation—so important for safety—heavily involves autonomic inputs and outputs (pain, hunger, circulating hormone levels, vasoconstriction, etc. etc.)—and in this domain your proposed system wouldn't be able to measure most of the inputs, and wouldn't be able to measure most of the outputs, and probably wouldn't be able to measure most of the brain processing that goes on between the inputs and outputs either! (Well, it depends on exactly what the brain-computer interface type is, but autonomic processing tends to happen in deeply-buried hard-to-measure brain areas like the insular and cingulate cortex, brainstem, and even inside the spinal cord). Maybe you'll say "that's fine, we'll measure a subset of inputs and a subset of outputs and a subset of brain processing, and then we'll fill in the gaps by learning". And, well, that's not unreasonable. I mean, by the same token, GPT-3 had only a tiny subset of human inputs and outputs, and zero direct measurements of brain processing, and yet GPT-3 arguably learned an implicit model of brain processing. Not a perfect one by any means, but something.

So anyway, one can make an argument that there are safety benefits of human imitation learning (versus, say, training by pure RL in a virtual environment), and then one can add that there are additional safety benefits when we go to "human imitation learning which is souped-up via throwing EEG data or whatever into the model prediction target". I'm open-minded to that kind of argument and have talked about vaguely similar things myself. But I still think that's a different sort of argument than the WBE safety argument above, the argument that the WBE of a trustworthy human is automatically trustworthy because it's the same person. In particular, the imitation-learning safety argument is much less airtight I think. It requires additional careful thought about distributional shifts and so on.

So my point is: I don't think what you're talking about should be called "emulations", and even if you're right, I don't think it would undermine the point of this post, which is that WBE is unlikely to happen before non-WBE AGI even if we wanted it to.

I think this will be possible

So now we move on to whether I believe your scenario. Well it's hard to be confident, but I don't currently put much weight on it. I figure, option 1 is: "deep neural nets do in fact scale to AGI". In that case, your argument is that EEG data or whatever will reduce training time/data because it's like model distillation. I would say "sure, maybe model distillation helps, other things equal … but on the other hand we have 100,000 years of YouTube videos to train on, and a comparatively very expensive and infinitesimal amount of EEG data". So I expect that all things considered, future engineers would just go with the YouTube option. Option 2 is: "deep neural nets do not in fact scale to AGI"—they're the wrong kind of algorithm for AGI. (I've made this argument, although I mean who knows, I don't feel that strongly.) In that case adding EEG data as an additional prediction target wouldn't help.

Comment by Steven Byrnes (steve2152) on Randal Koene on brain understanding before whole brain emulation · 2021-08-25T16:11:40.155Z · LW · GW

Accidentally creating AGI seems unlikely

Sorry if I was unclear; my intended parsing was "accidentally (creating catastrophically-out-of-control AGIs)". In other words, I don't expect that people will try to create catastrophically-out-of-control AGIs. Therefore, if they create catastrophically-out-of-control AGIs, it would be by accident.

Emulating brains in order to increase capability is currently...an idea.

I think you're overly confident that WBE would be irrelevant to the timeline of AGI capabilities research, but I think it's a moot point anyway, since I don't expect WBE before AGI, so I'm not really interested in arguing about it. :-P

Practically, progress requires doing both, i.e. better equipment to create and measure electricity is needed to understand it better, which helps understand how to direct, contain, and generate it better, etc.

I do in fact agree with you, but I think it's not as clear-cut as you make it out to be in the WBE case; I think it takes a more detailed argument where reasonable people could disagree. In particular, there's an argument on the other side that says "implementation-level understanding" is a different thing from "algorithm-level understanding", and you only need the first one for WBE, not the second one.

So for example, if I give you a binary executable "factor.exe" that solves hard factorization problems, you would be able to run it on a computer much more easily than you could decompile it and understand how the algorithm works.

This example goes through because we have perfect implementation-level understanding of running executables on CPUs. In the brain case, Randal is arguing (and I agree) that we don't have perfect implementation-level understanding, and we won't get it by just studying the implementation level. The implementation level is just very complicated—much more complicated than "dendrites are inputs, axons are outputs" etc. And it could involve subtle things that we won't actually go measure and simulate unless we know that we need to go looking for them. So in practice, the only way to make up for our expected deficiencies in implementation-level understanding is to also have good algorithm-level understanding.

Comment by Steven Byrnes (steve2152) on Buck's Shortform · 2021-08-25T03:22:56.898Z · LW · GW

I wonder what you mean by "competitive"? Let's talk about the "alignment tax" framing. One extreme is that we can find a way such that there is no tradeoff whatsoever between safety and capabilities—an "alignment tax" of 0%. The other extreme is an alignment tax of 100%—we know how to make unsafe AGIs but we don't know how to make safe AGIs. (Or more specifically, there are plans / ideas that an unsafe AI could come up with and execute, and a safe AI can't, not even with extra time/money/compute/whatever.)

I've been resigned to the idea that an alignment tax of 0% is a pipe dream—that's just way too much to hope for, for various seemingly-fundamental reasons like humans-in-the-loop being more slow and expensive than humans-out-of-the-loop (more discussion here). But we still want to minimize the alignment tax, and we definitely want to avoid the alignment tax being 100%. (And meanwhile, independently, we try to tackle the non-technical problem of ensuring that all the relevant players are always paying the alignment tax.)

I feel like your post makes more sense to me when I replace the word "competitive" with something like "arbitrarily capable" everywhere (or "sufficiently capable" in the bootstrapping approach where we hand off AI alignment research to the early AGIs). I think that's what you have in mind?—that you're worried these techniques will just hit a capabilities wall, and beyond that the alignment tax shoots all the way to 100%. Is that fair? Or do you see an alignment tax of even 1% as an "insufficient strategy"?

Comment by Steven Byrnes (steve2152) on Vanessa Kosoy's Shortform · 2021-08-24T15:20:23.073Z · LW · GW

Attempted summary for morons like me: AI is trying to help the human H. They share access to a single output channel, e.g. a computer keyboard, so that the actions that H can take are exactly the same as the actions AI can take. Every step, AI can either take an action, or delegate to H to take an action. Also, every step, H reports her current assessment of the timeline / probability distribution for whether she'll succeed at the task, and if so, how soon.

At first, AI will probably delegate to H a lot, and by watching H work, AI will gradually learn both the human policy (i.e. what H tends to do in different situations), and how different actions tend to turn out in hindsight from H's own perspective (e.g., maybe whenever H takes action 17, she tends to declare shortly afterwards that probability of success now seems much higher than before—so really H should probably be taking action 17 more often!).

Presumably the AI, being a super duper fancy AI algorithm, learns to anticipate how different actions will turn out from H's perspective much better than H herself. In other words, maybe it delegates to H, and H takes action 41, and the AI is watching this and shaking its head and thinking to itself "gee you dunce you're gonna regret that", and shortly thereafter the AI is proven correct.

OK, so now what? The naive answer would be: the AI should gradually stop delegating and start just doing the thing that leads to H feeling maximally optimistic later on.

But we don't want to do that naive thing. There are two problems:

The first problem is "traps" (a.k.a. catastrophes). Let's say action 0 is Press The History Eraser Button. H never takes that action. The AI shouldn't either. What happens is: AI has no idea (wide confidence interval) about what the consequence of action 0 would be, so it doesn't take it. This is the delegative RL thing—in the explore/exploit dilemma, the AI kinda sits back and lets H do the exploring, at least when a trap might be involved.

But here we want to go further. Let's say that there's some action which H might do, where the AI can predict the consequence pretty well, and the consequence is "H will eventually feel more optimistic about success in the task". Then the AI can go do that action, right? Not necessarily! The action might still be bad, because it involved corrupting the feedback channel.

So let's say "action 789 = H will eat a pot brownie". Taking that action makes H both much more optimistic about success and much less likely to actually succeed in the task. (It makes her assessments inaccurate.)

If the AI has no idea what pot brownies do, it won't take action 789 because of the previous thing—maybe that action is a trap. But what if the AI has (indirectly) figured out, with 100% confidence, exactly what the consequence of action 789 would be?

In that case, since the AI's decision criterion is all about H's future reported odds of success, a naive algorithm would have the AI take action 789. To avoid that, we have this quantilization thing. How does that work?

Well, let's say that H right now assigns 99% chance of eventual success. Therefore—assuming that H is well-calibrated and knows the consequences of action 789—it follows that H's policy is to take that action with ≤1% probability. Basically, H is optimistic about success in the task, partly because H knows that she's almost definitely not going to eat that pot brownie. I mean, she might eat it, if she gets really bored and frustrated and hungry. But probably not! So she's feeling very optimistic.

OK, now the AI quantilizes by drawing 5 random samples from (its understanding of) H's policy, and taking the action that (it believes) will lead to the highest future-optimism-about-success. Assume its beliefs are accurate. Compared to just doing what H would have done, this has downsides and upsides. The downside is, since action 789 scores really well by the AI's decision criterion, the probability of action 789 is no longer ≤1%, but rather ≤5%. The upside is, if we're not taking action 789, i.e. the other ≥95% of the probability mass, we're probably doing something quite a bit better than what H would have done.

These could plausibly balance out such that the expected probability of success strictly increases, compared to what H would have done.
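
In case it helps to check whether I've got the mechanics right, here's how I'm picturing that step, as a toy Python sketch. The names "sample_human_action" and "predicted_optimism" are placeholders I invented, standing in for the learned model of H's policy and the learned predictor of H's future reported odds of success:

```python
def quantilize_step(state, sample_human_action, predicted_optimism, n_samples=5):
    """Toy version of the step described above: draw a few actions from the
    learned model of H's policy, and take whichever one the AI predicts will
    lead to H's highest future reported optimism-about-success.

    Key property: if H's policy takes some bad action (like the pot-brownie
    action 789) with probability <= p, this procedure takes it with probability
    <= n_samples * p, since it can only win if it shows up among the samples.
    """
    candidates = [sample_human_action(state) for _ in range(n_samples)]
    return max(candidates, key=lambda action: predicted_optimism(state, action))
```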

Please correct me if I'm misunderstanding anything.

Comment by Steven Byrnes (steve2152) on Randal Koene on brain understanding before whole brain emulation · 2021-08-23T21:22:02.411Z · LW · GW

Fixed! Thanks!

Comment by Steven Byrnes (steve2152) on Analogies and General Priors on Intelligence · 2021-08-22T14:13:12.338Z · LW · GW

I guess I'd just suggest that in "ML exhibits easy marginal intelligence improvements", you should specify whether the "ML" is referring to "today's ML algorithms" vs "Whatever ML algorithms we're using in HLMI" vs "All ML algorithms" vs something else (or maybe you already did say which it is but I missed it).

Looking forward to the future posts :)

Comment by Steven Byrnes (steve2152) on Analogies and General Priors on Intelligence · 2021-08-21T12:47:48.653Z · LW · GW

I feel like "ML exhibits easy marginal intelligence improvements" is maybe not exactly hitting the nail on the head, in terms of the bubbles that feed into it. Maybe it should be something like:

  • "Is There One Big Breakthrough Insight that leads to HLMI and beyond?" (or a handful of insights, but not 10,000 insights).
  • Has that One Big Breakthrough Insight happened yet?

If you think that there's an insight and it's already happened, then you would think today's ML systems exhibit easy marginal intelligence improvements (scaling hypothesis). If you think there's an insight but it hasn't happened yet, then you would think today's ML systems do not exhibit easy marginal intelligence improvements, but are rather a dead-end, like getting to the moon by climbing bigger trees or whatever, and we'll need to wait until that big breakthrough. (I'm closer to the second camp than most people around here.) But either way, you would be more likely to believe in fast takeoff.

For example see here for how I was interpreting and responding to Hanson's citation statistics argument.

Comment by Steven Byrnes (steve2152) on Analogies and General Priors on Intelligence · 2021-08-21T00:55:24.930Z · LW · GW

I interpreted that Yudkowsky tweet (on GPT-3 coding a React app) differently than you, I think.

I thought it was pertaining to modularity-of-intelligence and (relatedly) singleton-vs-multipolar. Specifically, I gather that part of the AI-foom debate was that Hanson expected AGI source code to be immensely complex and come from the accumulation of lots of little projects trading modules and ideas with each other:

"The idea that you could create human-level intelligence by just feeding raw data into the right math-inspired architecture is pure fantasy. You couldn’t build an effective cell or ecosystem or developed economy or most any complex system that way either—such things require not just good structure but also lots of good content." -Robin Hanson in the AI-foom debate

By contrast, I think Yudkowsky was saying that AGI source code would be relatively simple and coherent and plausibly written by one team, and then that relatively simple source code would give rise to lots of different skills via learning. And then he was (justifiably IMO) claiming support from this example where OpenAI "fed raw data into the right math-inspired architecture" and wound up with a program that could do lots of seemingly-intelligent things that were not specifically included in the source code.

(Obviously I could be wrong, this is just my impression. Also, be warned that I mostly haven't read the AI foom debate and could be mischaracterizing it.)

Comment by Steven Byrnes (steve2152) on Building brain-inspired AGI is infinitely easier than understanding the brain · 2021-08-19T13:42:24.713Z · LW · GW

I still think it's way too early to be ruling out neuromorphic hardware, spiking or not.

Sure, I wouldn't say "rule out", it's certainly a possibility, especially if we're talking about the N'th generation of ASICs. I guess I'd assign <10% probability that the first-generation ASIC that can run a "human-level AGI algorithm" is based on spikes. (Well, depending on the exact definitions I guess.) But I wouldn't feel comfortable saying <1%. Of course that probability is not really based on much, I'm just trying to communicate what I currently think.

draw attention to the need to compare apples to apples

In an apples-to-apples comparison, it's super duper ridiculously blindingly obvious that a human nervous system is harder to understand than a worm nervous system. In fact I'm somewhat distressed that you thought I was disagreeing with that!!!

I added a paragraph to the article to try to make it more clear—if you found it confusing then it's a safe bet that other people did too. Thanks!

Comment by Steven Byrnes (steve2152) on Migraine hallucinations, phenomenology, and cognition · 2021-08-16T15:54:33.414Z · LW · GW

I agree with that, as long as "intermediate abstraction level" is sufficiently broad so as to also include V1. When I wrote "some of the neurons...are messed up and not sending those signals" I was mostly imagining neurons in V1. Admittedly it could also be neurons in V2 or something. I dunno. I agree with you that it's unlikely to originate before V1, i.e. retina or LGN (=the thalamus waystation between retina and V1). (Not having thought too hard about it.)

(My vague impression is that the lateral connections within V1 are doing a lot of the work in finding object boundaries.)

Comment by Steven Byrnes (steve2152) on Building brain-inspired AGI is infinitely easier than understanding the brain · 2021-08-15T03:03:58.919Z · LW · GW

Thanks for the comment!

What would this line of thought imply for the possibility of mind-uploading?

In my mind it implies that we'll invent human-level AGIs before we invent mind-uploading technology. And therefore we should all be working on the problem of creating safe and beneficial AGI! And then they can help us figure out about mind-uploading :-P

But since you ask… I guess I'm intimidated by the difficulty of uploading a mind at sufficiently high fidelity that when you turn it on the "person" reports feeling the same and maintains the same personality and inclinations. I don't think we would reach that even with a scan that measured every neuron and every synapse, because I suspect that there are enough sorta quasi-analog and/or glia-involving circuits or whatever, especially in the brainstem, to mess things up at that level of precision.

if you want to have anywhere near human-level power/space efficiency, I think that something like neuromorphic hardware will be essential.

I think a computer can be ~10,000× less energy-efficient than a human brain before the electricity costs reach my local minimum wage, right? So I don't see near-human-level energy efficiency as a requirement for practical transformative AGI. Ditto space efficiency. If we make an AI that could automate any remote-work job, and one instantiation of the model occupies one server rack, well that would be maybe 1000× less space-efficient than a human brain, but I think it would hardly matter for the majority of applications, including the most important applications. (And it would still probably be less space-inefficient than "a human in a cubicle"!)
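
(Back-of-the-envelope version of that arithmetic, where all the numbers are rough guesses I'm plugging in myself:)

```python
brain_power_watts = 20             # a human brain runs on roughly 20 W
inefficiency_factor = 10_000
electricity_price_per_kwh = 0.15   # USD; roughly typical US retail price

machine_power_kw = brain_power_watts * inefficiency_factor / 1000  # = 200 kW
cost_per_hour = machine_power_kw * electricity_price_per_kwh       # = $30/hour
# ...i.e. the same ballpark as an hourly minimum wage, give or take a few x.
```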

Dedicated hardware that runs on spiking neuron analogs could implement brain-like AGI models far better than existing CPUs/GPUs in terms of efficiency, at the cost of generality of computation (no free lunch).

That's possible, though in my mind it's not certain. The other possibility in my mind is that the algorithms underlying human intelligence are just fundamentally not very well suited to implementation via spiking neurons!! But spiking neurons are the only thing that biology has to work with! So evolution found a way to shoehorn these algorithms to run on spiking neurons. :-P

I'm not trying to troll here—I don't have a good sense for how probable that is, but I do see that as one legitimate possibility. To take an example, a faster more-serial processor can emulate a slower more-parallel processor but not vice-versa. We engineers can build either style of processor, but biology is stuck with the latter. The algorithms of human intelligence could have massive computational shortcuts that involve spawning a fast serial subroutine, and we would never know it just by looking at biology, because biology has never had that as an option!

I agree that "literally existing CPUs/GPUs" are going to work slower and less scalably than an ASIC tailor-made to the algorithms that we have in mind. And I do assume that people will start making and using such ASICs very quickly. I guess I'd just be surprised if those ASICs involve spikes. Instead I'd expect the ASIC to look more like a typical digital ASIC, with a clock and flip-flops and registers and whatnot. I mean, I could be wrong, that's just what I would guess, because I figure it would probably be adequate, and that's what people are currently really good at designing. When we're many years into superhuman AGIs designing improved chips for even-more-superhuman AGIs, I have no clue what those chips would look like. But I also don't think it's useful to think that far ahead. :-P

This is wrong unless "key operating principles" means something different each time you say it (i.e. it refers to the algorithms and data structures running on the human brain, but then it refers to the molecular-level causal graph describing the worm's nervous system). Which is what I assume you meant.

Sorry, I guess that was a bit unclear. I meant "key operating principles" as something like "a description that is sufficiently detailed to understand how the system meets a design spec". Then the trick is that I was comparing two very different types of design specs. One side of the comparison was "human intelligence", which (in my mind) is one particular class of human capabilities. So the "design spec" would be things like "it can learn to use language and program computers and write poetry and tell jokes etc. etc." Can we give a sufficiently detailed description to understand how the human brain does those things? Not yet, but I think eventually.

Then the other side of my comparison was "nervous system of the worm". The "design spec" there was (implicitly) "maximize inclusive genetic fitness", i.e. it includes the entire set of evolutionarily-adaptive behaviors that the worm does. And that's really hard because we don't even know what those behaviors are! There are astronomically many quirks of the worm's nervous system, and we have basically no way to figure out which of those quirks are related to evolutionarily-adaptive behaviors, because maybe it's adaptive only in some exotic situation that comes up once every 12 generations, or it's ever-so-slightly adaptive 50.1% of the time and ever-so-slightly maladaptive 49.9% of the time, etc.

Y'know, some neuron sends out a molecule that incidentally makes some vesicle slightly bigger, which infinitesimally changes how the worm twitches, which might infinitesimally change how noticeable the worm is to predators when it's in a certain type of light. So maybe sending out that molecule is an adaptive behavior—a computational output of the nervous system, and we need to include it in our high-level algorithm description. …Or maybe not! That same molecule is also kinda a necessary waste product. So it's also possibly just an "implementation detail". And then there are millions more things just like that. How are you ever going to sort it out? It seems hopeless to me.

If instead you name a specific adaptive behavior that the worm does (say, when it sees a predator it runs away), then I would certainly agree with you that understanding the key operating principles of that specific worm behavior will probably be much much much easier than understanding the key operating principles of human intelligence.

Comment by Steven Byrnes (steve2152) on Staying Grounded · 2021-08-14T20:15:28.005Z · LW · GW

Nice post!

Y'know, I just figured I should try to comment more, even if I have nothing to say, as a strategy for increasing my LessWrong karma. :-P