[U.S. specific] PPP: free money for self-employed & orgs (time-sensitive) 2021-01-09T19:53:09.088Z
Multi-dimensional rewards for AGI interpretability and control 2021-01-04T03:08:41.727Z
Conservatism in neocortex-like AGIs 2020-12-08T16:37:20.780Z
Supervised learning in the brain, part 4: compression / filtering 2020-12-05T17:06:07.778Z
Inner Alignment in Salt-Starved Rats 2020-11-19T02:40:10.232Z
Supervised learning of outputs in the brain 2020-10-26T14:32:54.061Z
"Little glimpses of empathy" as the foundation for social emotions 2020-10-22T11:02:45.036Z
My computational framework for the brain 2020-09-14T14:19:21.974Z
Emotional valence vs RL reward: a video game analogy 2020-09-03T15:28:08.013Z
Three mental images from thinking about AGI debate & corrigibility 2020-08-03T14:29:19.056Z
Can you get AGI from a Transformer? 2020-07-23T15:27:51.712Z
Selling real estate: should you overprice or underprice? 2020-07-20T15:54:09.478Z
Mesa-Optimizers vs “Steered Optimizers” 2020-07-10T16:49:26.917Z
Gary Marcus vs Cortical Uniformity 2020-06-28T18:18:54.650Z
Building brain-inspired AGI is infinitely easier than understanding the brain 2020-06-02T14:13:32.105Z
Help wanted: Improving COVID-19 contact-tracing by estimating respiratory droplets 2020-05-22T14:05:10.479Z
Inner alignment in the brain 2020-04-22T13:14:08.049Z
COVID transmission by talking (& singing) 2020-03-29T18:26:55.839Z
COVID-19 transmission: Are we overemphasizing touching rather than breathing? 2020-03-23T17:40:14.574Z
SARS-CoV-2 pool-testing algorithm puzzle 2020-03-20T13:22:44.121Z
Predictive coding and motor control 2020-02-23T02:04:57.442Z
On unfixably unsafe AGI architectures 2020-02-19T21:16:19.544Z
Book review: Rethinking Consciousness 2020-01-10T20:41:27.352Z
Predictive coding & depression 2020-01-03T02:38:04.530Z
Predictive coding = RL + SL + Bayes + MPC 2019-12-10T11:45:56.181Z
Thoughts on implementing corrigible robust alignment 2019-11-26T14:06:45.907Z
Thoughts on Robin Hanson's AI Impacts interview 2019-11-24T01:40:35.329Z
steve2152's Shortform 2019-10-31T14:14:26.535Z
Human instincts, symbol grounding, and the blank-slate neocortex 2019-10-02T12:06:35.361Z
Self-supervised learning & manipulative predictions 2019-08-20T10:55:51.804Z
In defense of Oracle ("Tool") AI research 2019-08-07T19:14:10.435Z
Self-Supervised Learning and AGI Safety 2019-08-07T14:21:37.739Z
The Self-Unaware AI Oracle 2019-07-22T19:04:21.188Z
Jeff Hawkins on neuromorphic AGI within 20 years 2019-07-15T19:16:27.294Z
Is AlphaZero any good without the tree search? 2019-06-30T16:41:05.841Z
1hr talk: Intro to AGI safety 2019-06-18T21:41:29.371Z


Comment by steve2152 on TurnTrout's shortform feed · 2021-01-23T20:33:45.220Z · LW · GW

Happy to have (maybe) helped! :-)

Comment by steve2152 on Poll: Which variables are most strategically relevant? · 2021-01-23T20:32:59.082Z · LW · GW

How dependent is the AGI on idiosyncratic hardware? While any algorithm can run on any hardware, in practice every algorithm will run faster and more energy-efficiently on hardware designed specifically for that algorithm. But there's a continuum from "runs perfectly fine on widely-available hardware, with maybe 10% speedup on a custom ASIC" to "runs a trillion times faster on a very specific type of room-sized quantum computer that only one company on earth has figured out how to make".

If your AGI algorithm requires a weird new chip / processor technology to run at a reasonable cost, it makes it less far-fetched (although still pretty far-fetched I think) to hope that governments or other groups could control who is running the AGI algorithm—at least for a couple years until that chip / processor technology is reinvented / stolen / reverse-engineered—even when everyone knows that this AGI algorithm exists and how the algorithm works.

Comment by steve2152 on TurnTrout's shortform feed · 2021-01-23T20:03:32.860Z · LW · GW

Me too!

Comment by steve2152 on My computational framework for the brain · 2021-01-23T14:09:46.049Z · LW · GW

UPDATE: Stephen Grossberg is great! I just listened to his interview on the "Brain Inspired" podcast. His ideas seem to fit right in with the ideas I had already been latching onto. I'll be reading more. Thanks again for the tip!

Comment by steve2152 on Poll: Which variables are most strategically relevant? · 2021-01-22T18:21:22.673Z · LW · GW

public sympathy vs dehumanization? ... Like, people could perceive AI algorithms as they do now (just algorithms), or they could perceive (some) AI algorithms as deserving of rights and sympathies like they and their human friends are. Or other possibilities, I suppose. I think it would depend strongly on the nature of the algorithm, as well as on superficial things like whether there are widely-available AI algorithms with cute faces and charismatic, human-like personalities, and whether the public even knows that the algorithm exists, as well as random things like how the issue gets politicized and whatnot. A related issue is whether the algorithms are actually conscious, capable of suffering, etc., which would presumably feed into public perceptions, as well as (presumably) mattering in its own right.

Comment by steve2152 on Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain · 2021-01-21T17:38:11.713Z · LW · GW

I've been working on a draft blog post kinda related to that. If you're interested, I can DM you a link; it could use a second pair of eyes.

Comment by steve2152 on Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain · 2021-01-21T14:31:54.230Z · LW · GW

humans have a tiny horizon length

What do you mean by horizon length here?

Comment by steve2152 on Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain · 2021-01-19T12:14:47.220Z · LW · GW

"automated search"?

Comment by steve2152 on Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain · 2021-01-18T16:30:15.856Z · LW · GW

Moreover, we probably won’t figure out how to make AIs that are as data-efficient as humans for a long time--decades at least.

I know you weren't endorsing this claim as definitely true, but FYI my take is that other families of learning algorithms besides deep neural networks are in fact as data-efficient as humans, particularly those related to probabilistic programming and analysis-by-synthesis, see examples here.

Comment by steve2152 on Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain · 2021-01-18T16:14:12.637Z · LW · GW

Related to one aspect of this: my post Building brain-inspired AGI is infinitely easier than understanding the brain

Comment by steve2152 on Eli's shortform feed · 2021-01-15T12:33:32.610Z · LW · GW

I haven't seen such a document but I'd be interested to read it too. I made an argument to that effect here:

(Well, a related argument anyway. WBE is about scanning and simulating the brain rather than understanding it, but I would make a similar argument using "hard-to-scan" and/or "hard-to-simulate" things the brain does, rather than "hard-to-understand" things the brain does, which is what I was nominally blogging about. There's a lot of overlap between those anyway; the examples I put in mostly work for both.)

Comment by steve2152 on TurnTrout's shortform feed · 2021-01-14T22:18:42.202Z · LW · GW

Oh, hmm, I guess that's fair. Now that you mention it, I do recall hearing a talk where someone used "Occam's razor" to talk about the Solomonoff prior. Actually he called it "Bayes Occam's razor" I think. He was talking about a probabilistic programming algorithm.

That's (1) not physics, and (2) includes (as a special case) penalizing conjunctions, so maybe related to what you said. Or sorry if I'm still not getting what you meant.

Comment by steve2152 on TurnTrout's shortform feed · 2021-01-14T22:05:06.440Z · LW · GW

I agree with the principle but I'm not sure I'd call it "Occam's razor". Occam's razor is a bit sketchy, it's not really a guarantee of anything, it's not a mathematical law, it's like a rule of thumb or something. Here you have a much more solid argument: multiplying many probabilities into a conjunction makes the result smaller and smaller. That's a mathematical law, rock-solid. So I'd go with that...
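A minimal sketch of the conjunction point, assuming the individual claims are independent and using made-up probabilities (the 0.9 figure and the function name are illustrative only). Without independence the joint probability is still never larger than the smallest individual probability.

```python
from math import prod

def conjunction_probability(step_probabilities):
    """Probability that every claim holds, assuming the claims are independent."""
    return prod(step_probabilities)

# Hypothetical numbers: a story built from n claims, each 90% likely on its own.
for n in (1, 5, 10, 20):
    p = conjunction_probability([0.9] * n)
    print(f"{n:2d} claims at 0.9 each: joint probability {p:.3f}")
```

Each extra claim multiplies in a factor of at most 1, so the joint probability can only stay the same or shrink.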

Comment by steve2152 on Can you get AGI from a Transformer? · 2021-01-14T16:10:48.438Z · LW · GW

However, I do hope to make some justifiable case below for transformers being able to scale in the limit to an AGI-like model (i.e. which was an emphatic “no” from you) 

I don't feel emphatic about it. Well, I have a model in my head and within that model a transformer can't scale to AGI, and I was describing that model here, but (1) I'm uncertain that that model is the right way to think about things, (2) even if it is, I don't have high confidence that I'm properly situating transformers within that model†, (3) even if I am, there is a whole universe of ways to take a Transformer architecture and tweak / augment it—like hook it up to a random access memory or tree search or any other data structure or algorithm, or give it more recurrency, or who knows what else—and I haven't thought through all those possibilities and would not be shocked if somewhere in that space was a way to fill in what I see as the gaps.

† The paper relating Hopfield networks to transformers came out shortly after I posted this, and seems relevant to evaluating my idea that transformer networks are imitating some aspects of probabilistic programming / PGM inference, but I'm not sure, I haven't really digested it.

the transformer is being “guided” to do implicit meta-learning ... I argue that such zero-shot learning on an unseen task T requires online-learning on that task, which is being described in the given context.

I'm confused about how you're using the terms "online learning" and "meta-learning" here.

I generally understand "online learning" in the sense of this, where you're editing the model weights during deployment by doing gradient descent steps for each new piece of labeled data you get. If you're generating text with GPT-3, then there's no labeled data to update on, and the weights are fixed, so it's not online learning by definition. I guess you have something else in mind; can you explain?
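To make that sense of "online learning" concrete, here is a minimal sketch: a toy linear model whose weights get one gradient-descent step per labeled example as the data arrives. The model, the data stream, and the learning rate are all hypothetical and have nothing to do with how GPT-3 is actually trained or run.

```python
def online_sgd_step(weights, features, label, learning_rate=0.1):
    """One gradient-descent step on squared error for a linear model."""
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - label
    # Gradient of 0.5 * error**2 with respect to each weight is error * x.
    return [w - learning_rate * error * x for w, x in zip(weights, features)]

# Hypothetical labeled stream generated by the rule: label = 2*x0 - 1*x1.
stream = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)] * 50

weights = [0.0, 0.0]
for features, label in stream:
    # Online learning: the weights change after every labeled example.
    weights = online_sgd_step(weights, features, label)

print(weights)  # ends up close to [2.0, -1.0]
```

A deployed model with frozen weights never runs the update line, which is the distinction being drawn above.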

I generally understand "meta-learning" to mean that there's an inner loop that has a learning algorithm, and then there's an outer loop with a learning algorithm that impacts the inner loop. I guess you could say that the 96 transformer layers involved in each word-inference is the inner loop. Is it really a learning algorithm though? It doesn't look like a learning algorithm. I mean, it certainly figures things out over the course of those 96 processing steps, and maybe it "figures something out" in step 20 and still "knows" that thing in step 84. So OK fine, I guess you can claim that there's "learning" happening within a single GPT-3 word-inference task, even if it's not the kind of learning that people normally talk about. And then it would be fair to say that the gradient descent training of GPT-3 is meta-learning. Is that what you have in mind? If so, can't you equally well say that a 96-layer ConvNet training involves "meta-learning"? Sorry if I'm misunderstanding.

Ordinary, fully connected (as well as convolutional, most recurrent) neural nets don’t generate "dynamic" weights that are then applied to any activations.

I was confused for a while because "weight" already has a definition—"weights" are the things that you find by gradient descent (a.k.a. "weights & biases" a.k.a. "parameters"). I guess what you're saying is that I was putting a lot of emphasis on the idea that the algorithm is something like:

Calculate (28th entry of the input) × 8.21 + (12th entry of the input) × 2.47 + ... , then do a nonlinear operation, then do lots more calculations like that, etc. etc.

whereas in a transformer there are also things in the calculation that look like "(14th entry of the input) × (92nd entry of the input)", i.e. multiplying functions of the input by other functions of the input. Is that what you're saying?

If so, no I was using the term "matrix multiplications and ReLUs" in a more general way that didn't exclude the possibility of having functions-of-the-input be part of each of two matrices that then get multiplied together. I hadn't thought about that as being a distinguishing feature of transformers until now. I suppose that does seem important for widening the variety of calculations that are possible, but I'm not quite following your argument that this is related to meta-learning. Maybe this is related to the previous section, because you're (re)defining "weights" as sorta "the entries of the matrices that you're multiplying by", and also thinking of "weights" as "when weights are modified, that's learning", and therefore transformers are "learning" within a single word-inference task in a way that ConvNets aren't. Is that right? If so, I dunno, it seems like the argument is leaning too much on mixing up those two definitions of "weights", and you need some extra argument that "the entries of the matrices that you're multiplying by" have that special status such that if they're functions of the inputs then it's "learning" and if they aren't then it isn't. Again, sorry if I'm misunderstanding.
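A toy sketch of the two kinds of calculation contrasted above, under heavy simplifying assumptions: the "attention-style" function below skips the learned projections, scaling, and softmax that a real transformer uses, and both function names are hypothetical. The only point is that one multiplies input entries by fixed numbers while the other multiplies input-derived numbers by each other.

```python
def static_weight_layer(x, weights):
    """Fixed (learned) weights times input entries, then a ReLU."""
    pre_activation = sum(w * xi for w, xi in zip(weights, x))
    return max(0.0, pre_activation)

def input_dependent_mixing(queries, keys, values):
    """Unnormalized attention-style mixing: the mixing coefficients come from the input."""
    outputs = []
    for q in queries:
        scores = [q * k for k in keys]  # (function of input) x (function of input)
        outputs.append(sum(s * v for s, v in zip(scores, values)))
    return outputs

x = [1.0, 2.0]
doubled = [2.0 * xi for xi in x]

# Doubling the input doubles the static layer's (positive) output...
print(static_weight_layer(x, [8.21, 2.47]), static_weight_layer(doubled, [8.21, 2.47]))
# ...but multiplies the input-dependent mixing output by 8, since it is cubic in the input.
print(input_dependent_mixing(x, x, x), input_dependent_mixing(doubled, doubled, doubled))
```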

Comment by steve2152 on Genetic Engineering: How would you change your DNA? · 2021-01-14T13:22:40.524Z · LW · GW

Lots of genes have their effect during early development; just because some trait is partly genetic doesn't mean you can affect it by altering the adult genome. I think it would be hard to say what you can and can't do by altering an adult genome (or at least I don't know).

Comment by steve2152 on ESRogs's Shortform · 2021-01-14T01:12:29.558Z · LW · GW

Seems reasonable.

Comment by steve2152 on ESRogs's Shortform · 2021-01-13T19:06:05.406Z · LW · GW

Yeah, I get some aesthetic satisfaction from math results being formally verified to be correct. But we could just wait until the AGIs can do it for us... :-P

Yeah, it would be cool and practically important if you could write an English-language specification for a function, then the AI turns it into a complete human-readable formal input-output specification, and then the AI also writes code that provably meets that specification.

I don't have a good sense for how plausible that is—I've never been part of a formally-verified software creation project. Just guessing, but the second part (specification -> code) seems like the kind of problem that AIs will solve in the next decade. Whereas the first part (creating a complete formal specification) seems like it would be the kind of thing where maybe the AI proposes something but then the human needs to go back and edit it, because you can't get every detail right unless you understand the whole system that this function is going to be part of. I dunno though, just guessing.

Comment by steve2152 on ESRogs's Shortform · 2021-01-13T16:56:53.193Z · LW · GW

Sorry for the stupid question, and I liked the talk and agree it's a really neat project, but why is it so important? Do you mean important for math, or important for humanity / the future / whatever?

Comment by steve2152 on My computational framework for the brain · 2021-01-12T11:04:23.500Z · LW · GW

Thanks for the gwern link!

Regarding terminal goals the only compelling one I have come across is coherent extrapolated volition as outlined in Superintelligence. But how to even program this into code is of course problematic and I haven't followed the literature closely since for rebuttals or better ideas.

I think the most popular alternatives to CEV are "Do what I, the programmer, want you to do", argued most prominently by Paul Christiano (cf. "Approval-directed agents"), variations on that (Stuart Russell's book talks about showing a person different ways that their future could unfold and have them pick their favorite), task-limited AGI ("just do this one specific thing without causing general mayhem") (I believe Eliezer was advocating for solving this problem before trying to make a CEV maximizer), and lots of ideas for systems that don't look like agents with goals (e.g. CAIS). A lot of these "kick the can down the road" and don't try to answer big questions about the future, on the theory that future people with AGI helpers will be in a better position to figure out subsequent steps forward.

evolution did not optimize for interpretability!

Sure, and neither did Yann LeCun. I don't know whether a DNN would be more or less interpretable than a neocortex with the same information content. I think we desperately need a clearer vision for what "interpretability tools" would look like in both cases, such that they would scale all the way to AGI. I (currently) see no way around having interpretability be a big part of the solution.

I don't think almost any human has an internally consistent set of morals

Strong agree. I do think we have a suite of social instincts which are largely common between people and hard-coded by evolution. But the instincts don't add up to an internally consistent framework of morality.

even if we could re-simulate them through an evolutionary like process we would get the good with the bad.

I'm generally not assuming that we will run search processes that parallel what evolution did. I mean, maybe, I just don't think it's that likely, and it's not the scenario I'm trying to think through. I think people are very good at figuring out algorithms based on their desired input-output relations, and then coding them up, whereas evolution-like searches over learning algorithms are ridiculously computationally expensive and have little precedent. (E.g. we invented ConvNets, we didn't discover ConvNets by an evolutionary search.) Evolution has put learning algorithms into the neocortex and cerebellum and amygdala, and I think humans will figure out what these learning algorithms are and directly write code implementing them. Evolution has put non-learning algorithms into the brainstem, and I suspect that the social instincts are in this category, and I suspect that if we make AGI with (some) human-like social instincts, it would be by people writing code that implements a subset of those algorithms or something similar. I think the algorithms are not understood right now, and may well not be by the time we get AGI, and I think that's a bad thing, closing off an option.

Comment by steve2152 on Open & Welcome Thread - January 2021 · 2021-01-08T01:42:44.426Z · LW · GW

Dileep George's "AGI comics" are pretty funny! He's only made ~10 of them ever; most are in this series of tweets / comics poking fun at both sides of the Gary Marcus - Yoshua Bengio debate ... see especially this funny take on what is the definition of deep learning, and one related to AGI timelines. :-)

Comment by steve2152 on Multi-dimensional rewards for AGI interpretability and control · 2021-01-07T01:38:56.402Z · LW · GW

Yeah, sure. I should have said: there's no plausibly useful & practical scheme that is fundamentally different from scalars, for this, at least that I can think of. :-)

Comment by steve2152 on Grokking illusionism · 2021-01-06T11:34:04.104Z · LW · GW

Interesting post, thanks!!

Frankish is openly hazy about exactly what it is to represent a property as phenomenal, and about the specific mechanisms via which such representations give rise to the problematic intuitions in question — this, he thinks, is a matter for further investigation.

I think Graziano's recent book picks up where Frankish left off. See my blog post:

I feel like I now have in my head a more-or-less complete account of the algorithmic chain of events in the brain that leads to a person declaring that they are conscious, and then writing essays about phenomenal consciousness. Didn't help! I find consciousness as weird and unintuitive as ever. But your post is as helpful as anything else I've read. I'll have to keep thinking about it :-D

Phenomenal properties, it turns out, are a flaw in the map

Map vs territory is a helpful framing I think. When we perceive a rock, we are open to the possibility that our perceptions are not reflective of the territory, for example maybe we're hallucinating. When we "perceive" that we are conscious, we don't intuitively have the same open-mindedness; we feel like it has to be in the territory. So yeah, how do we know we're conscious, if not by some kind of perception, and if it's some kind of perception, why can't it be inaccurate as a description of the territory, just like all other perceptions can? (I'm not sure if this framing is deep or if I'm just playing tricks with the term "perception".) Then the question would be: must ethics always be about territories, or can it be about maps sometimes? Hmm, I dunno.

Comment by steve2152 on Multi-dimensional rewards for AGI interpretability and control · 2021-01-05T01:28:02.560Z · LW · GW

Awesome, thanks so much!!!

Comment by steve2152 on Book review: Rethinking Consciousness · 2021-01-02T23:19:05.815Z · LW · GW

I don't really understand what you're getting at, and I suspect it would take more than one sentence for me to get it. If there's an article or other piece of writing that you'd suggest I read, please let me know. :-)

Comment by steve2152 on Book review: Rethinking Consciousness · 2021-01-01T13:56:05.642Z · LW · GW

I guess outside, see my comment about the watch for what I was trying to get at there.

Comment by steve2152 on Covid 12/31: Meet the New Year · 2020-12-31T21:14:16.817Z · LW · GW

I'm trying to think through how "increase in R" interacts with population heterogeneity (a frequent theme of this newsletter, but not mentioned in this particular one!)

This is me thinking out loud. What follows is a few paragraphs of math, which readers can skip; my conclusions are in the last two paragraphs.

Imagine a histogram of everyone in a city. The horizontal axis is "how many microcovids of exposure does this person get per week?" but normalized to (i.e. divided by) the prevailing COVID rate in the city. The vertical axis is "fraction of the population" (again, this is a histogram). Heterogeneity means that the histogram is spread out—some people are almost completely isolated (a peak around 0) while others are out partying, way far to the right.

Actually, let's omit all the immune people from our histogram. So OK, now there are a lot fewer people way out on the right, because most of those people have caught COVID already. There are still some people way out on the right, either because they've been lucky, or because they recently jumped rightward, having run out of ability or willingness to isolate as much as before.

Call that histogram H.

Now multiply H by "a line through the origin with slope 1" (y=x). That gives us a function F(x) = x · H(x). While H had a big peak of hardcore isolators on the left side (near 0 microcovids / week), F mostly doesn't have that peak, it got squashed down by the other factor. Conversely, out on the right, the curve for F is pulled way up, relative to H.

The area under F is proportional to R, the reproduction number.

Now what?

Herd immunity would look like reducing the area under H, which in turn reduces the area under F. Remember, immune people don't get included in H. Vaccines would push down all parts of H, whereas community spread would disproportionately push down the right part of H, which has a bigger impact on F per person. 

Lockdowns would look like squeezing H leftwards, which then reduces the area under F.
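Here is a rough numerical sketch of the vaccines-versus-community-spread comparison, using a made-up three-bin histogram. The bin values, the function names, and the assumption that infections remove people exactly in proportion to x · H(x) are all illustrative, not estimates of anything real.

```python
def area_under_f(histogram):
    """Sum of x * H(x) over bins; a stand-in for the area under F, proportional to R."""
    return sum(exposure * fraction for exposure, fraction in histogram)

def remove_uniformly(histogram, removed_fraction):
    """Vaccine-like removal: every bin shrinks by the same proportion."""
    total = sum(fraction for _, fraction in histogram)
    keep = 1 - removed_fraction / total
    return [(x, h * keep) for x, h in histogram]

def remove_by_exposure(histogram, removed_fraction):
    """Infection-like removal: people leave each bin in proportion to x * H(x)."""
    area = area_under_f(histogram)
    return [(x, h - removed_fraction * (x * h) / area) for x, h in histogram]

# Hypothetical H: (relative exposure, fraction of the population that is non-immune).
H = [(0.1, 0.50), (1.0, 0.40), (5.0, 0.10)]

vaccinated = remove_uniformly(H, 0.10)   # 10% of the population vaccinated
infected = remove_by_exposure(H, 0.10)   # 10% of the population infected instead

print(area_under_f(H), area_under_f(vaccinated), area_under_f(infected))
```

With these toy numbers, removing the same number of people shrinks the area under F more when the removal tracks exposure, which is the "bigger impact on F per person" point.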

Now, let's say it's November 2020 in some USA city, and R just jumped up from 0.9 to 1.1. What's the story? There are different possibilities. One story could be: everyone uniformly increased their microcovids by 20%. We grabbed the H curve and stretched it to the right, like a rubber band. Another story could be: 1% of the population just said "f*** it" and increased their microcovids by 2,000%. A new peak appears in H, way off to the right, pulling weight from elsewhere. The consequences of these two stories are different. In the first story, we need maybe 5-15% of the population to get infected (or 20% to get vaccinated) to get back to R=0.9. In the second story, just that 1% will quickly get infected and immune, and then we're rapidly back at R=0.9. 
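A toy version of the two stories, assuming a city of 1,000 non-immune people who all start with identical exposure. That homogeneity is a simplification made for illustration, so the numbers below are not meant to reproduce the percentages guessed at above.

```python
TOTAL_PEOPLE = 1000  # hypothetical city size

def relative_r(non_immune_exposures):
    """Total exposure of non-immune people per capita; a stand-in proportional to R."""
    return sum(non_immune_exposures) / TOTAL_PEOPLE

baseline = [1.0] * TOTAL_PEOPLE

# Story one: everyone increases their exposure by 20%.
story_one = [1.2 * x for x in baseline]

# Story two: 1% of people increase their exposure by 2,000% (21x the original).
story_two = [21.0] * 10 + [1.0] * 990

# In story two, once those 10 people are infected and immune they drop out of H.
story_two_after = [1.0] * 990

# In story one, count how many (identical) people must become immune
# before the stand-in for R is back at or below the baseline.
immune_needed = 0
while relative_r(story_one[immune_needed:]) > relative_r(baseline):
    immune_needed += 1

print(relative_r(story_one), relative_r(story_two))
print(relative_r(story_two_after), immune_needed)
```

Both stories raise the stand-in for R by the same 20%, but undoing that takes 10 immune people in the second story versus 167 in the first.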

Now we learn that, under current conditions, the new strain is, let's say, 1.65× as infectious as the old strain. So we know that the area under F increased by 1.65×. The simplest story would be: every person has 1.65× as many microcovids as before (again, I mean "microcovids normalized to the prevailing infection rate"). That's not quite true, but I suspect it's close enough. So the new strain has stretched H horizontally, pulling it rightward by a factor of 1.65 (and squished it vertically to keep the same area under it). As usual, we have community spread pulling weight disproportionately out of the right side of H, and vaccinations pulling weight more uniformly out of H. How does it play out? I dunno, it's kinda hard to say.

So where are we at? There's a subset of people in the USA that are still almost completely isolated, including (in my narrow experience) a subset of New York Times liberals and Bernie progressives, a subset of people with serious comorbidities who can work from home or are retired, etc. They're still way out on the left, not contributing any appreciable weight to F (i.e., they're not contributing to the community spread = R). If their microcovids go up by 1.65×, well, they will still not really be contributing any weight to F. Then there is a big tail of other non-immune people with progressively more exposure, who constitute the bulk of community spread. If their microcovids go up by 1.65×, well, a lot of those people will catch COVID-19 who otherwise wouldn't, before the area under F shrinks below 1 and cases start going down.

So if you compare March-April 2021 (in cities where the new strain has caught fire) to November 2020, say, we have 1.65× more exposure holding behavior fixed, but on the other hand, we're talking about the behavior of the 50-70th percentile least isolated people (say), instead of the 70-90th percentile, because the latter already caught it in November-January. I don't have a good sense for how those balance out, but it doesn't seem so crazy to me to propose that a late-spring peak would be comparable to the peak we're in now, as opposed to much larger. Maybe each peak infects 20% of the population, or something? I don't really know, I just made that number up. But if we wind up with 50% infected at the end, well, I think I have a decent shot at not being one of those 50%, and it would make sense to keep trying...

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-12-31T18:50:01.491Z · LW · GW

Thanks! I'm sympathetic to everything you wrote, and I don't have a great response. I'd have to think about it more. :-D

Comment by steve2152 on Book review: Rethinking Consciousness · 2020-12-31T15:56:02.339Z · LW · GW

It's also true that a brain like [mine] could say this if it weren't true

This is the p-zombie thing, but I think there's a simpler way to think about it. You wrote down "I know that there's qualia because I experience it". There was some chain of causation that led to you writing down that statement. Here's a claim:

Claim: Your experience of qualia played no role whatsoever in the chain of causation in your brain that led to you writing down the statement "I know that there's qualia because I experience it".

This is a pretty weird claim, right? I mean, you remember writing down the statement. Would you agree with that claim? No way, right?

Well, if we reject that claim, then we're kinda stuck saying that if there are qualia, they are somewhere to be found within that chain of causation. And if there's nothing to be found in the chain of causation that looks like qualia, then either there are no qualia, or else qualia are not what they look like.

(Unless the claim isn't that there's no qualia, in which case I don't understand illusionism.)

I can't say I understand it very well either, and see also Luke's report Appendix F and Joe's blog post. From where I'm at right now, there's a set of phenomena that people describe using words like "consciousness" and "qualia", and nothing we say will make those phenomena magically disappear. However, it's possible that those phenomena are not what they appear to be.

We all perceive that we have qualia. You can think of statements like "I perceive X" as living on a continuum, like a horizontal line. On the left extreme of the line, we can perceive things because those things are out there in the world and our senses are accurately and objectively conveying them to us. On the right extreme of the line, we can perceive things because of quirks of our perceptual systems.

So those motion illusions are closer to the right end, and "I see a rock" is closer to the left end. But as Graziano points out, there is nothing that's all the way at the left end—even "I see a rock" is very much a construction built by our brain that has some imperfect correlation with the configuration of atoms in the world. (I'm talking about when you're actually looking at a rock, in the everyday sense. If "I see a rock" when I'm hallucinating on LSD, then that's way over on the right end.)

"I have qualia" is describing a perception. Where is it on that line? I say it's over towards the right end. That's not necessarily the same as saying "no such thing as qualia". You could also say "qualia is part of our perception of the world". And so what if it is? Our perception of the world is pretty important, and I'm allowed to care about it...

If consciousness isn't real, why doesn't that just immediately imply nihilism?

There's a funny thing about nihilism: It's not decision-relevant. Imagine being a nihilist, deciding whether to spend your free time trying to bring about an awesome post-AGI utopia, vs sitting on the couch and watching TV. Well, if you're a nihilist, then the awesome post-AGI utopia doesn't matter. But watching TV doesn't matter either. Watching TV entails less exertion of effort. But that doesn't matter either. Watching TV is more fun (well, for some people). But having fun doesn't matter either. There's no reason to throw yourself at a difficult project. There's no reason not to throw yourself at a difficult project. Isn't it funny?

I don't have a grand ethical theory, I'm not ready to sit in judgment of anyone else, I'm just deciding what to do for my own account. There's a reason I ended the post with "Dentin's prayer of the altruistic nihilist"; that's how I feel, at least sometimes. I choose to care about information-processing systems that are (or "perceive themselves to be"?) conscious in a way that's analogous to how humans do that, with details still uncertain. I want them to be (or "to perceive themselves to be"?) happy and have awesome futures. So here I am :-D

Comment by steve2152 on Book Review: On Intelligence by Jeff Hawkins (and Sandra Blakeslee) · 2020-12-31T14:50:59.115Z · LW · GW

(Now I'm trying to look at the wall of my room and to decide whether I actually do see pixels or 'line segments', which is an exercise that really puts a knot into my head.)

Sorry if I'm misunderstanding what you're getting at but...

I don't think there's any point in which there are signals in your brain that correspond directly to something like pixels in a camera. Even in the retina, there's supposedly predictive coding data compression going on (I haven't looked into that in detail). By the time the signals are going to the neocortex, they've been split into three data streams carrying different types of distilled data: magnocellular, parvocellular, and koniocellular (actually several types of konio I think), if memory serves. There's a theory I like about the information-processing roles of magno and parvo; nobody seems to have any idea what the konio information is doing and neither do I. :-P

But does it matter whether the signals are superficially the same or not? If you do a lossless transformation from pixels into edges (for example), who cares, the information is still there, right?

So then the question is, what information is in (say) V1 but is not represented in V2 or higher layers, and do we have conscious access to that information? V1 has so many cortical columns processing so much data, intuitively there has to be compression going on.

I haven't really thought much about how information compression in the neocortex works per se. Dileep George & Jeff Hawkins say here that there's something like compressed sensing happening, and Randall O'Reilly says here that there's error-driven learning (something like gradient descent) making sure that the top-down predictions are close enough to the input. Close on what metric though? Probably not pixel-to-pixel differences ... probably more like "close in whatever compressed-sensing representation space is created by the V1 columns"...?

Maybe a big part of the data compression is: we only attend to one object at a time, and everything else is lumped together into "background". Like, you might think you're paying close attention to both your hand and your pen, but actually you're flipping back and forth, or else lumping the two together into a composite object! (I'm speculating.) Then the product space of every possible object in every possible arrangement in your field of view is broken into a dramatically smaller disjunctive space of possibilities, consisting of any one possible object in any one possible position. Now that you've thrown out 99.999999% of the information by only attending to one object at a time, there's plenty of room for the GNW to have lots of detail about the object's position, color, texture, motion, etc. 
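To put rough numbers on the "product space vs. disjunctive space" point, here is a toy count. The object and position counts are made up purely for illustration, and treating the field of view as a grid of independent slots is my simplifying assumption, not a claim about how the cortex represents scenes.

```python
import math

# Hypothetical numbers, chosen only for illustration.
n_objects = 1000     # distinct object types you could recognize
n_positions = 100    # coarse positions in the field of view

# "Product space": every position independently holds any object (or nothing).
product_space = (n_objects + 1) ** n_positions

# "Disjunctive space": exactly one attended object in exactly one position.
disjunctive_space = n_objects * n_positions

# Bits needed to pick out one possibility in each space.
bits_product = n_positions * math.log2(n_objects + 1)
bits_disjunctive = math.log2(disjunctive_space)

print(f"product space:     ~{bits_product:.0f} bits")
print(f"disjunctive space: ~{bits_disjunctive:.1f} bits")
```

With these made-up numbers the full scene needs on the order of a thousand bits while "one object in one place" needs fewer than twenty, which is the sense in which attending to one object at a time leaves room for extra detail about that object.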

Not sure how helpful any of this is :-P

Comment by steve2152 on Book Review: On Intelligence by Jeff Hawkins (and Sandra Blakeslee) · 2020-12-31T13:51:25.059Z · LW · GW

For the hard problem of consciousness, the steps in my mind are

1. GNW -->
2. Solution to the meta-problem of consciousness -->
3. Feeling forced to accept illusionism -->
4. Enthusiastically believing in illusionism.

I wrote the post Book Review: Rethinking Consciousness about my journey from step 1 --> step 2 --> step 3. And that's where I'm still at. I haven't gotten to step 4, I would need to think about it more. :-P

Comment by steve2152 on Book Review: On Intelligence by Jeff Hawkins (and Sandra Blakeslee) · 2020-12-30T15:46:07.596Z · LW · GW

my vision appears to me as a continuous field of color and light, not as a highly-compressed and invariant representation of objects.

One thing is: I have an artist friend who said that when he teaches drawing classes, he sometimes has people try to focus on and draw the "negative space" instead of the objects—like, "draw the blob of wall that is not blocked by the chair". The reason is: most people find it hard to visualize the 3D world as "contours and surfaces as they appear from our viewpoint", we remember the chair as a 3D chair, not as a 2D projection of a chair, except with conscious effort. The "blob of wall not blocked by the chair" is not associated with a preconception of a 3D object so we have an easier time remembering what it actually looks like from our perspective.

Another thing is: When I look at a scene, I have a piece of knowledge "that's a chair" or "this is my room" which is not associated in any simple way with the contours and surfaces I'm looking at—I can't give it (x,y) coordinates—it's just sorta a thing in my mind, in a separate, parallel idea-space. Likewise "this thing is moving" or "this just changed" feels to me like a separate piece of information, and I just know it, it doesn't have an (x,y) coordinate in my field of view. Like those motion illusions that were going around twitter recently.

Our conscious awareness consists of the patterns in the Global Neuronal Workspace. I would assume that these patterns involve not only predictions about the object-recognition stuff going on in IT but also predictions about a sampling of lower-level visual patterns in V2 or maybe even V1. So then we would get conscious access to something closer to the original pattern of incoming light. Maybe.

I dunno, just thoughts off the top of my head.

Comment by steve2152 on Book Review: On Intelligence by Jeff Hawkins (and Sandra Blakeslee) · 2020-12-30T15:01:01.417Z · LW · GW

Thanks for writing this nice review!

I agree about part 1. I don't think there's a meta-level / outside-view argument that AGI has to come from brain-like algorithms—or at least it's not in that book. My inside-view argument is here and I certainly don't put 100% confidence in it.

Interestingly, while airplanes are different from birds, I heard (I think from Dileep George) that the Wright Brothers were actually inspired by soaring birds, which gave them confidence that flapping wings were unnecessary for flight.

Jeff is a skilled writer

Well, the book was coauthored by a professional science writer if I recall... :-P

(If anyone spots mistakes in [part 2], please point them out.)

It's been a while, but the one that springs to mind right now is Jeff's claim (I think it's in this book, or else he's only said it more recently) that all parts of the neocortex issue motor commands. My impression was that only the frontal lobe does. For example, I think Jeff believes that the projections from V1 to the superior colliculus are issuing motor commands to move the eyes. But I thought the frontal eye field was the thing moving the eyes. I'm not sure what those projections are for, but I don't think motor commands is the only possible hypothesis. I haven't really looked into the literature, to be clear.

Relatedly, both Jeff and Steve say that about ten times as many connections are flowing down the hierarchy (except that Steve's model doesn't include a strict hierarchy) than up.

I might have gotten it from Jeff. Hmm, actually I think I've heard it from multiple sources. "Not a strict hierarchy" is definitely something that I partly got from Jeff—not from this book, I don't think, but from later papers like this.

Comment by steve2152 on Book Review: On Intelligence by Jeff Hawkins (and Sandra Blakeslee) · 2020-12-30T14:44:05.381Z · LW · GW

It is also quite interesting how motor control can be seen as a form of predictive algorithm (though frustratingly this is left at the hand-waving level and I found it surprisingly hard to convert this insight into code!). 

I'd be interested if you think my post Predictive Coding and Motor Control is helpful for filling in that gap.

Comment by steve2152 on AXRP Episode 1 - Adversarial Policies with Adam Gleave · 2020-12-30T02:22:40.477Z · LW · GW

I enjoyed listening to the podcast!

I know if someone was able to give me retrograde amnesia and just keep on playing chess against me again and again, they'd probably eventually be able to find some move where I do something, they don't just win but I do something incredibly stupid, reliably.

Makes me think of the boss battles in videogame tool-assisted speedruns. And a silly sci-fi action movie. :-)

Comment by steve2152 on Why Boston? · 2020-12-29T02:42:54.629Z · LW · GW

if you've lived there your whole life you are underrating how nice it is to have no mosquitos, and also probably underestimating how mosquito-free the West Coast is.

I spent 4½ years in my 20s in Berkeley CA and pretty much the rest of my life in Boston, and it never occurred to me that Boston had more mosquitos than Berkeley, until I read this post a couple months ago. I mean, yeah it's true, it's just that the thought hadn't crossed my mind. That's how little the mosquitos impact my life. :-P Everyone's different!

Comment by steve2152 on Defusing AGI Danger · 2020-12-25T16:33:39.399Z · LW · GW

That's fair, I should properly write out the brain-like AGI danger scenario(s) that have been in my head, one of these days. :-)

Comment by steve2152 on Covid 12/24: We’re F***ed, It’s Over · 2020-12-25T12:16:30.009Z · LW · GW

I dunno, it's not so clear to me that we should expect "fourth wave in the US" as opposed to "fourth wave in one or two cities of the US". Think of how it took 12 months to get from Wuhan to where we are today ... I don't think that's all isolation fatigue, I think it's partly that region-to-region spread takes a while. Trevor Bedford offered the mental picture: Wuhan was a spark in November, and 10 weeks later it was a big enough fire to throw off sparks that started fires in Italy, NYC, etc., then 10 weeks later those fires were big enough to throw off sparks to other cities etc., and many rural areas were finally getting their first outbreaks in the most recent couple months. Maybe something like, "R0>1 only because of a small number of unusually infectious people (either due to quirks of biology or behavior / situation), so it's likelier than you would think for a contagion to never start in a particular place, or to fizzle out immediately by chance". Then maybe it takes until late summer for the new strain to really get going in most places, by which time you have vaccines and weather / sunlight mitigating it.
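Here is a minimal sketch of the "fizzle out by chance" intuition, using a textbook branching process. Everything specific in it is my own assumption for illustration: R0 = 2, and a deliberately extreme toy in which 10% of infected people infect 20 others and everyone else infects nobody. It compares that with an "everyone is about equally infectious" (Poisson) model at the same R0. The chance that a single introduced case dies out is the smallest fixed point of the offspring distribution's generating function, found here by simple iteration.

```python
import math

def extinction_probability(pgf, iterations=1000):
    """Smallest fixed point of q = pgf(q), found by iterating from q = 0."""
    q = 0.0
    for _ in range(iterations):
        q = pgf(q)
    return q

R0 = 2.0  # assumed average number of secondary infections per case

# Homogeneous toy model: each case infects Poisson(R0) others.
def poisson_pgf(s):
    return math.exp(R0 * (s - 1.0))

# Superspreader toy model (hypothetical numbers): 10% of cases infect 20
# people each and 90% infect nobody, so the mean is still 0.1 * 20 = 2.
p_super, n_infected = 0.1, 20

def superspreader_pgf(s):
    return (1.0 - p_super) + p_super * s ** n_infected

q_poisson = extinction_probability(poisson_pgf)
q_super = extinction_probability(superspreader_pgf)

print(f"P(single spark fizzles), homogeneous:   {q_poisson:.2f}")
print(f"P(single spark fizzles), superspreader: {q_super:.2f}")
```

In this toy setup the homogeneous model has a spark dying out about a fifth of the time, while the superspreader model has it dying out about nine times in ten, even though both have the same R0. Real transmission is presumably somewhere in between, so treat this as an illustration of the direction of the effect, not an estimate.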

...But maybe NYC and Boston are first in line again. Strong travel connections to England...? :-/

I dunno, I could see it going either way, behaviors (especially travel levels) are different now than last year, as are immunity levels.

Low confidence, I haven't thought too hard about it.

Comment by steve2152 on The Best Visualizations on Every Subject · 2020-12-21T23:02:36.308Z · LW · GW

Not sure if this is what you're looking for, but I made a bunch of math and physics diagrams and animations back in my wikipedia days: :-)

Comment by steve2152 on Gauging the conscious experience of LessWrong · 2020-12-20T21:02:26.667Z · LW · GW

I like to say: "I can't taste good food from bad food". All coffee tastes like coffee, all wine tastes like wine, etc. Always been that way. Makes it stressful to cook for other people, but very easy to cook for myself. :-) I think my smell and taste receptors are all normal, I'm just not wired to pay very close attention to them. :-P

Like niplav, I can't form a mental image of most (or all?) people's faces, not even (or maybe "especially not") people I see every day. Like, if you ask me the eye color of my own wife, when I'm not looking at her ... OK well, I do know the answer, but only as an abstract, declarative fact. I can't just visualize her face and figure it out. I'm much better at visualizing cartoon characters' faces, teddy bears, other objects, etc. :-P

Comment by steve2152 on Hierarchical planning: context agents · 2020-12-19T14:59:49.670Z · LW · GW

I think maybe a more powerful framework than discrete contexts is that there's a giant soup of models, and the models have arrows pointing at other models, and multiple models can be active simultaneously, and the models can span different time scales. So you can have a "I am in the store" model, and it's active for the whole time you're shopping, and meanwhile there are faster models like "I am looking for noodles", and slower models like "go shopping then take the bus home". And anything can point to anything else. So then if you have a group of models that mainly point to each other, and less to other stuff, it's a bit of an island in the graph, and you can call it a "context". Like everything I know about chess strategy is mostly isolated from the rest of my universe of knowledge and ideas, so I could say I have a "chess strategy context". But that's an emergent property, not part of the data structure.
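A rough sketch of what "contexts as islands in the graph" could look like in code. The model names, link strengths, and the threshold are all hypothetical, and "drop the weak links and take connected components" is just the simplest island-finder I could think of, not a claim about what the brain does. The point is that the stored data is only models and arrows; the "contexts" are computed afterwards.

```python
from collections import defaultdict

# Hypothetical models and link strengths, invented for illustration.
# Keys are (source model, target model); values are how strongly one points at the other.
links = {
    ("go shopping then take the bus home", "I am in the store"): 0.8,
    ("I am in the store", "I am looking for noodles"): 0.9,
    ("chess openings", "chess endgames"): 0.9,
    ("chess endgames", "knight fork"): 0.7,
    ("knight fork", "I am looking for noodles"): 0.1,  # weak cross-link
}

def islands(links, threshold=0.5):
    """Group models joined by links at or above threshold (direction ignored)."""
    neighbors = defaultdict(set)
    for (src, dst), strength in links.items():
        neighbors[src], neighbors[dst]  # make sure every model is a node
        if strength >= threshold:
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    seen, groups = set(), []
    for start in list(neighbors):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk over the strong links
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups

contexts = islands(links)
for group in contexts:
    print(sorted(group))
```

Here the chess models come out as one island and the shopping models as another, but nothing in `links` says "chess context" anywhere; change the threshold or add a few strong cross-links and the islands merge.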

My impression is that the Goodhart's law thing at the end is a bit like saying "Don't think creatively"... Thinking creatively is making new connections where they don't immediately pop into your head. Is that reasonable? Sorry if I'm misunderstanding. :)

Comment by steve2152 on Homogeneity vs. heterogeneity in AI takeoff scenarios · 2020-12-16T11:25:33.230Z · LW · GW


It's unlikely for there to exist both aligned and misaligned AI systems at the same time—either all of the different AIs will be aligned to approximately the same degree or they will all be misaligned to approximately the same degree.

Is there an argument that it's impossible to fine-tune an aligned system into a misaligned one? Or just that everyone fine-tuning these systems will be smart and careful and read the manual etc. so that they do it right? Or something else?

I'd very much like to see more discussion of the extent to which different people expect homogenous vs. heterogenous takeoff scenarios

Thinking about it right now, I'd say "homogeneous learning algorithms, heterogeneous trained models" (in a multipolar type scenario at least). I guess my intuitions are (1) No matter how expensive "training from scratch" is, it's bound to happen a second time if people see that it worked the first time. (2) I'm more inclined to think that fine-tuning can make it into "more-or-less a different model", rather than necessarily "more-or-less the same model". I dunno.

Comment by steve2152 on My computational framework for the brain · 2020-12-14T19:28:11.894Z · LW · GW

Ooh, good questions!

Without having thought about this much it seems to me like the control/alignment problem depends upon the terminal goals we provide the AGI rather than the substrate and algorithms it is running to obtain AGI level intelligence.

I agree that if an AGI has terminal goals, they should be terminal goals that yield good results for the future. (And don't ask me what those are!) So that is indeed one aspect of the control/alignment problem. However, I'm not aware of any fleshed-out plan for AGI where you get to just write down its terminal goals, or indeed where the AGI necessarily even has terminal goals in the first place. (For example, what language would the terminal goals be written in?)

Instead, the first step would be to figure out how to build the system such that it winds up having the goals that you want it to have. That part is called the "inner alignment problem", it is still an open problem, and I would argue that it's a different open problem for each possible AGI algorithm architecture—since different algorithms can acquire / develop goals via different processes. (See here for excessive detail on that.)

Can you provide some examples of what discoveries would indicate that this is an AGI route that is very dangerous or safe?


  • One possible path to a (plausibly) good AGI future is to try to build AGIs that have a suite of moral intuitions similar to the one humans have. We want this to work really well, even in weird out-of-distribution situations (the future will be weird!), so we should ideally try to make this happen by using similar underlying algorithms as humans. Then they'll have the same inductive biases etc. This especially would involve human social instincts. I don't really understand how human social instincts work (I do have vague ideas I'm excited to further explore! But I don't actually know.) It may turn out to be the case that the algorithms supporting those instincts rely in a deep and inextricable way on having a human body and interacting with humans in a human world. That would be evidence that this potential path is not going to work. Likewise, maybe we'll find out that you fundamentally can't have, say, a sympathy instinct without also having, say, jealousy and self-interest. Again, that would be evidence that this potential path is not as promising as it initially sounded. Conversely, we could figure out how the instincts work, see no barriers to putting them into an AGI, see how to set them up to be non-subvertable (such that the AGI cannot dehumanize / turn off sympathy), etc. Then (after a lot more work) we could form an argument that this is a good path forward, and start advocating for everyone to go down this development path rather than other potential paths to AGI.
  • If we find a specific, nice, verifiable plan to keep neocortex-like AGIs permanently under human control (and not themselves suffering, if you think that AGI suffering is possible & important), then that's evidence that neocortex-like AGIs are a good route. Such a plan would almost certainly be algorithm-specific. See my random example here—in this example, the idea of an AGI acting conservatively sounds nice and all, but the only way to really assess the idea (and its strengths and weaknesses and edge-cases) is to make assumptions about what the AGI algorithm will be like. (I don't pretend that that particular blog post is all that great, but I bet that with 600 more blog posts like that, maybe we would start getting somewhere...) If we can't find such a plan, can we find a proof that no such plan exists? Either way would be very valuable. This is parallel to what Paul Christiano and colleagues at OpenAI & Ought have been trying to do, but their assumption is that AGI algorithms will be vaguely similar to today's deep neural networks, rather than my assumption that AGI algorithms will be vaguely similar to a brain's. (To be clear, I have nothing against assuming that AGI algorithms will be vaguely similar to today's DNNs. That is definitely one possibility. I think both these research efforts should be happening in parallel.)

I could go on. But anyway, does that help? If I said something confusing or jargon-y, just ask, I don't know your background. :-)

Comment by steve2152 on Inner Alignment in Salt-Starved Rats · 2020-12-13T23:04:26.610Z · LW · GW

Thanks for your comment! I think that you're implicitly relying on a different flavor of "inner alignment" than the one I have in mind.

(And confusingly, the brain can be described using either version of "inner alignment"! With different resulting mental pictures in the two cases!!)

See my post Mesa-Optimizers vs "Steered Optimizers" for details on those two flavors of inner alignment.

I'll summarize here for convenience.

I think you're imagining that the AGI programmer will set up SGD (or equivalent) and the thing SGD does is analogous to evolution acting on the entire brain. In that case I would agree with your perspective.

I'm imagining something different:

The first background step is: I argue (for example here) that one part of the brain (I'm calling it the "neocortex subsystem") is effectively implementing a relatively simple, quasi-general-purpose learning-and-acting algorithm. This algorithm is capable of foresighted goal seeking, predictive-world-model-building, inventing new concepts, etc. etc. I don't think this algorithm looks much like a deep neural net trained by SGD; I think it looks like a learning algorithm that no human has invented yet, one which is more closely related to learning probabilistic graphical models than to deep neural nets. So that's one part of the brain, comprising maybe 80% of the weight of a human brain. Then there are other parts of the brain (brainstem, amygdala, etc.) that are not part of this subsystem. Instead, one thing they do is run calculations and interact with the "neocortex subsystem" in a way that tends to "steer" that "neocortex subsystem" towards behaving in ways that are evolutionarily adaptive. I think there are many different "steering" brain circuits, and they are designed to steer the neocortex subsystem towards seeking a variety of goals using a variety of mechanisms.

So that's the background picture in my head—and if you don't buy into that, nothing else I say will make sense.

Then the second step is: I'm imagining that the AGI programmers will build a learning-and-acting algorithm that resembles the "neocortex subsystem"'s learning-and-acting algorithm—not by some blind search over algorithm space, but by directly programming it, just as people have directly programmed AlphaGo and many other learning algorithms in the past. (They will do this either by studying how the neocortex works, or by reinventing the same ideas.) Once those programmers succeed, then OK, now these programmers will have in their hands a powerful quasi-general-purpose learning-and-acting (and foresighted goal-seeking, concept-inventing, etc.) algorithm. And then the programmer will be in a position analogous to the position of the genes wiring up those other brain modules (brainstem, etc.): the programmers will be writing code that tries to get this neocortex-like algorithm to do the things they want it to do. Let's say the code they write is a "steering subsystem".

The simplest possible "steering subsystem" is ridiculously simple and obvious: just a reward calculator that sends rewards for the exact thing that the programmer wants it to do. (Note: the "neocortex subsystem" algorithm has an input for reward signals, and these signals are involved in creating and modifying its internal goals.) And if the programmer unthinkingly builds that kind of simple "steering subsystem", it would kinda work, but not reliably, for the usual reasons like ambiguity in extrapolating out-of-distribution, the neocortex-like algorithm sabotaging the steering subsystem, etc. But, we can hope that there are more complicated possible "steering subsystems" that would work better.

So then this article is part of a research program of trying to understand the space of possibilities for the "steering subsystem", and figuring out which of them (if any!!) would work well enough to keep arbitrarily powerful AGIs (of this basic architecture) doing what we want them to do.

Finally, if you can load that whole perspective into your head, I think from that perspective it's appropriate to say "the genome is “trying” to install that goal in the rat’s brain", just as you can say that a particular gene is "trying" to do a certain step of assembling a white blood cell or whatever. (The "trying" is metaphorical but sometimes helpful.) I suppose I should have said "rat's neocortex subsystem" instead of "rat's brain". Sorry about that.

Does that help? Sorry if I'm misunderstanding you. :-)

Comment by steve2152 on Being the (Pareto) Best in the World · 2020-12-13T15:06:42.840Z · LW · GW

I found that this post gave me a helpful outside-view lens through which to think about the capabilities of people in general and myself in particular. For example, I once did a project at work that was well outside my comfort zone, and it stressed me out, but it helped my confidence to think that the project wouldn't have been in anyone else's comfort zone either. (And I had that thought because of this post).

Comment by steve2152 on Book Summary: Consciousness and the Brain · 2020-12-13T13:48:56.576Z · LW · GW

My impression is that GNW is widely accepted to be a leading contender for explaining consciousness, an important problem. This is a nice intro, and having read both this post and the book in question I can confirm that it covers the important ground fairly. I wound up coming around to a different take on consciousness, see my Book Review: Rethinking Consciousness, but while that book didn't talk much about GNW, I found that familiarity with GNW helped me reframe those ideas and understand them better, and indeed my explanation of that theory puts GNW (which I first heard about through this post) front and center. I should add that I find GNW helpful for thinking about thinking in general, not just consciousness per se.

Also, having read both the book and the post, I probably could have just read the post and skipped the book, and wouldn't have missed much.

Comment by steve2152 on System 2 as working-memory augmented System 1 reasoning · 2020-12-13T13:40:05.265Z · LW · GW

I found this post super useful for clearing out old bad ideas about how thought works and building a better model in its place, which is now thoroughly part of my worldview. I talk about this post and link to it constantly. As one concrete example, in my Can You Get AGI From a Transformer, there's a spot where I link to this post. It doesn't take up a lot of space in the post, but it's carrying a lot of weight behind the scenes.

Comment by steve2152 on Jeff Hawkins on neuromorphic AGI within 20 years · 2020-12-13T12:19:54.187Z · LW · GW


I guess I should add: an example I'm slightly more familiar with is anomaly detection in time-series data. Numenta developed the "HTM" brain-inspired anomaly detection algorithm (actually Dileep George did all the work back when he worked at Numenta, I've heard). Then I think they licensed it into a system for industrial anomaly detection ("the machine sounds different now, something may be wrong"), but it was a modular system, so you could switch out the core algorithm, and it turned out that HTM wasn't doing better than the other options. This is a vague recollection, I could be wrong in any or all details. Numenta also made an anomaly detection benchmark related to this, but I just googled it and found this criticism. I dunno.

Comment by steve2152 on Jeff Hawkins on neuromorphic AGI within 20 years · 2020-12-13T10:56:41.202Z · LW · GW

I decline. Thanks though.

Comment by steve2152 on Jeff Hawkins on neuromorphic AGI within 20 years · 2020-12-13T10:51:27.880Z · LW · GW

I strongly suspect that the cloned hidden Markov model is going to do worse in any benchmark where there's a big randomly-ordered set of training / testing data, which I think is typical for ML benchmarks. I think its strength is online learning and adapting in a time-varying environment (which of course brains need to do), e.g. using this variant. Even if you find such a benchmark, I still wouldn't be surprised if it lost to DNNs. Actually I would be surprised if you found any benchmark where it won.

I take (some) brain-like algorithms seriously for reasons that are not "these algorithms are proving themselves super useful today". Vicarious's robots might change that, but that's not guaranteed. Instead there's a different story which is "we know that reverse-engineered high-level brain algorithms, if sufficiently understood, can do everything humans do, including inventing new technology etc. So finding a piece of that puzzle can be important because we expect the assembled puzzle to be important, not because the piece by itself is super useful."

Comment by steve2152 on My computational framework for the brain · 2020-12-12T22:10:10.053Z · LW · GW

That's a shame. Seems like an important piece.

Well, we don't have AGI right now, there must be some missing ingredients... :-)

direct my models to the topics that I want outputs on

Well you can invoke a high-confidence model: "a detailed solution to math problem X involving ingredients A,B,C". Then the inference algorithm will shuffle through ideas in the brain trying to build a self-consistent model that involves this shell of a thought but fills in the gaps with other pieces that fit. So that would feel like trying to figure something out.

I think that's more like inference than learning, but of course you can memorize whatever useful new composite models come up during this process.