Posts

HCH Speculation Post #2A 2021-03-17T13:26:46.203Z
Hierarchical planning: context agents 2020-12-19T11:24:09.064Z
Modeling humans: what's the point? 2020-11-10T01:30:31.627Z
What to do with imitation humans, other than asking them what the right thing to do is? 2020-09-27T21:51:36.650Z
Charlie Steiner's Shortform 2020-08-04T06:28:11.553Z
Constraints from naturalized ethics. 2020-07-25T14:54:51.783Z
Meta-preferences are weird 2020-07-16T23:03:40.226Z
Down with Solomonoff Induction, up with the Presumptuous Philosopher 2020-06-12T09:44:29.114Z
The Presumptuous Philosopher, self-locating information, and Solomonoff induction 2020-05-31T16:35:48.837Z
Life as metaphor for everything else. 2020-04-05T07:21:11.303Z
Meta-preferences two ways: generator vs. patch 2020-04-01T00:51:49.086Z
Gricean communication and meta-preferences 2020-02-10T05:05:30.079Z
Impossible moral problems and moral authority 2019-11-18T09:28:28.766Z
What's the dream for giving natural language commands to AI? 2019-10-08T13:42:38.928Z
The AI is the model 2019-10-04T08:11:49.429Z
Can we make peace with moral indeterminacy? 2019-10-03T12:56:44.192Z
The Artificial Intentional Stance 2019-07-27T07:00:47.710Z
Some Comments on Stuart Armstrong's "Research Agenda v0.9" 2019-07-08T19:03:37.038Z
Training human models is an unsolved problem 2019-05-10T07:17:26.916Z
Value learning for moral essentialists 2019-05-06T09:05:45.727Z
Humans aren't agents - what then for value learning? 2019-03-15T22:01:38.839Z
How to get value learning and reference wrong 2019-02-26T20:22:43.155Z
Philosophy as low-energy approximation 2019-02-05T19:34:18.617Z
Can few-shot learning teach AI right from wrong? 2018-07-20T07:45:01.827Z
Boltzmann Brains and Within-model vs. Between-models Probability 2018-07-14T09:52:41.107Z
Is this what FAI outreach success looks like? 2018-03-09T13:12:10.667Z
Book Review: Consciousness Explained 2018-03-06T03:32:58.835Z
A useful level distinction 2018-02-24T06:39:47.558Z
Explanations: Ignorance vs. Confusion 2018-01-16T10:44:18.345Z
Empirical philosophy and inversions 2017-12-29T12:12:57.678Z
Dan Dennett on Stances 2017-12-27T08:15:53.124Z
Philosophy of Numbers (part 2) 2017-12-19T13:57:19.155Z
Philosophy of Numbers (part 1) 2017-12-02T18:20:30.297Z
Limited agents need approximate induction 2015-04-24T21:22:26.000Z

Comments

Comment by Charlie Steiner on MIRIx Part I: Insufficient Values · 2021-06-17T04:22:02.788Z · LW · GW

This is a totally fine thing to post :) I agree with most of the things, and agree about their importance.

I think the Goodhart's law side to CEV is more subtle. To be pithy, it's like it doesn't have a problem with Goodhart's law yet because it's not specific enough to even get noticed by Goodhart. If CEV hypothetically considers doing something bad, we can just reassure ourselves that surely that's not what our ideal advisor would have wanted. It's only once we pick a specific method of implementation that we have to confront in mechanistic detail what we could previously hide under the abstraction of anthropomorphic agency.

Comment by Charlie Steiner on Internal Memo from Bleggs Universal · 2021-06-17T00:59:34.958Z · LW · GW

Weirdly, I got a very specific picture of semiconductor manufacturing as I read this.

Comment by Charlie Steiner on Three Paths to Existential Risk from AI · 2021-06-17T00:49:57.287Z · LW · GW

Description? Also, none of your scenarios seem to involve a big intelligence or multitasking advantage - it's certainly harder to imagine humans getting outwitted in many different ways in rapid sequence, culminating in an extremely efficient gain of power for the AI, but it actually seems more realistic to me for a fast takeoff (the other option being something like Paul's "gradual loss of control" slow takeoff).

Comment by Charlie Steiner on Can someone help me understand the arrow of time? · 2021-06-16T03:13:18.931Z · LW · GW

The point of the stairs being a spiral is that they obey some relation to each other (like how the laws of physics are a relationship between past and future). The analogue of "time passing" is stair 15 spiraling to stair 16. But the thing is, I'm not committing to agreeing with Alice by saying that. According to Bob, the stairs are already a spiral. Stair 15 already spirals up to stair 16 just by virtue of the stairs being a spiral, which is no illusion.

Comment by Charlie Steiner on Can someone help me understand the arrow of time? · 2021-06-16T03:01:03.070Z · LW · GW

Can we be weirder than the reality we are embedded in?

Sure! We just can't be weirder than reality plus the information required to locate ourselves within reality :P

Anyhow, try what without the moving cursor? Making the stair spiral?

Bob: The stairs are already a spiral, silly.

Comment by Charlie Steiner on Can someone help me understand the arrow of time? · 2021-06-15T18:01:47.886Z · LW · GW

Suppose Alice and Bob are building a three-story house, with a big spiral staircase running from floor 1 to floor 3.

Alice: Isn't it weird how we perceive this staircase as spiraling even though in reality, it's just sitting there?

Bob: What do you mean "perceive?" This is a spiral staircase. It spirals.

Alice: Not the shape of the whole staircase, I mean it seems like the stair goes in a spiral.

Bob: "The" stair?

Alice: Yeah, the stair. Right now I'm standing here on stair 15, but if I moved up to stair 16, the stair would seem to have gone in a spiral. Weird, right?

Bob: I still don't get this "perceive" thing. Of course when you go from stair 15 to stair 16 the stairs look like a spiral. That's because the stairs are a spiral.

Alice: Not the stairs, the stair! The stair that I'm on when I go from 15 to 16. It seems like it goes in a spiral. But there's actually no spiraling, and no stair. In reality it's just a bunch of stairs arranged in a helical shape. Sigh. It's so weird.

Moral of the story: Reality isn't weird. We're weird.

Comment by Charlie Steiner on Oh No My AI (Filk) · 2021-06-15T17:39:39.261Z · LW · GW

No worries!

Comment by Charlie Steiner on Self-study ideas for micro-projects in "abstract" subjects? · 2021-06-14T15:57:14.127Z · LW · GW

Go for it - though it's about selection effects as much as it's advice.

Comment by Charlie Steiner on What other problems would a successful AI safety algorithm solve? · 2021-06-14T05:39:57.463Z · LW · GW

The best technical solution might just be "use the FAI to find the solution." Friendly AI is already, at its core, just a formal method for evaluating which actions are good for humans.

It's plausible we could use AI alignment research to "align" corporations, but only in a weakened sense where there's some process that returns good answers in everyday contexts. But for "real" alignment where the corporation somehow does what's best for humans with high generality... well, that means using some process to evaluate actions, so this just reduces to the case of using FAI.

Comment by Charlie Steiner on Self-study ideas for micro-projects in "abstract" subjects? · 2021-06-14T04:26:34.665Z · LW · GW

Practicing the saxophone only has a payoff in saxophone music. Similarly, abstract exercises will often only have a payoff in nice abstractions. For example, solving the wave equation of the hydrogen atom is a classic, but it cannot gain you anything directly because it is merely knowledge, and that same knowledge can be found in a textbook or wiki faster than it can be derived. You're just gonna have to be the sort of person for whom solving the wave equation of the hydrogen atom is a juicy project.

Comment by Charlie Steiner on Comment on the lab leak hypothesis · 2021-06-12T08:40:51.236Z · LW · GW

Do you have a cite for previous work reporting or using this sequence (something like cct cgg cgg gca) for a cleavage site in viruses? I only ended up finding and looking through one bit of prior gain of function research that's the sort of genetic engineering you're hypothesizing ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3168280/ ), but it used a totally different sequence. Better yet would be someone from pre-covid-19 times talking about how they made their code include "cggcgg" as a marker.

Comment by Charlie Steiner on Oh No My AI (Filk) · 2021-06-11T22:34:16.984Z · LW · GW

To scan, the structure should be something like:

(2 syllable pickup) (3-4 syllable measure) (2-3 syllable measure) (2-3 syllable measure)

(optional 1-syllable pickup) (4 syllable measure [the last 2 being AI])

(1-2 syllable pickup) (3-4 syllable measure) (2-3 syllable beat) (2-3 syllable beat)

(optional 1-syllable pickup) (4 syllable measure [the last 2 being AI])

(1-2 syllable pickup) (2-4 syllable measure) (2-3 syllable beat) (2-3 syllable beat)

(optional 1-syllable pickup) (4 syllable measure [the last 2 being AI])

(optional 1 syllable pickup) (4 syllable measure [the last 2 being AI])

(1 syllable pickup) (4 syllable measure [the last 2 being AI])

(1 syllable pickup) (4 syllable measure [the last 2 being AI])

 

E.g.

(I was) (gonna) (go to work but) (then I got high)


(I) (just got a) (new promotion but) (I got high)


[no pickup] (Now I'm) (selling dope and) (I know why)

 

[no pickup] ('Cause I got high)


(Be)(cause I got high)


(Be)(cause I got high)

Comment by Charlie Steiner on What is the most effective way to donate to AGI XRisk mitigation? · 2021-06-11T19:12:47.982Z · LW · GW

Right on time, turns out there's more grants - but now I'm not sure if these are academic-style or not (I guess we might see the recipients later). https://futureoflife.org/fli-announces-grants-program-for-existential-risk-reduction/?fbclid=IwAR3_pMQ0tDd_EOg_RShlLY8i71nGFliu0YH8kzbc7fClACEgxIo2uK6gPW8&cn-reloaded=1

Comment by Charlie Steiner on Oh No My AI (Filk) · 2021-06-11T17:04:37.638Z · LW · GW

I was gonna help humanity

with my AI

So I put a definition of brains

in my AI

Now it satisfies the preferences

of the fruit fly (oh yeah)

oh no my AI

oh no my AI

oh no my AI

Comment by Charlie Steiner on ML is now automating parts of chip R&D. How big a deal is this? · 2021-06-11T00:27:04.357Z · LW · GW

One additional thing I'd be interested in is AI-assisted solution of the differential equations behind better masks for EUV lithography. It seems naively like another factor of 2-ish in feature size is just sitting out there waiting to be seized, though maybe I'm misunderstanding what I've heard about switching back to old-style masks with EUV.

Comment by Charlie Steiner on Covid 6/10: Somebody Else’s Problem · 2021-06-11T00:22:02.247Z · LW · GW

Someone (MondSemmel to be precise) posted this last week. I think it's very cool, but also see last-week-me's further thoughts here: https://www.lesswrong.com/posts/92aXvTXxReBQZk2gx/?commentId=Dg6EBu3Cjd6DcBxjD

Comment by Charlie Steiner on Covid 6/10: Somebody Else’s Problem · 2021-06-10T14:09:22.647Z · LW · GW

Did your lab leak section have anything at all on biology as opposed to politics? I'm concerned that a lot of total non-experts seem to be promoting the genetic engineering hypothesis because it's a good morality story, or it gives them someone to blame, or even just because it's something exciting to talk about. Arguments like "wikipedia has broken links!" aren't merely unconvincing to me; they actively raise my guard about the selection process that brought them to my attention. And though engaging with the biology isn't a surefire way to avoid repeating bs, at least it helps.

Comment by Charlie Steiner on Big picture of phasic dopamine · 2021-06-10T04:49:18.137Z · LW · GW

How does the section of the amygdala that a particular dopamine neuron connects to even get trained to do the right thing in the first place? It seems like there should be enough randomness in the connections that there's really only this one neuron linking a brainstem's particular output to this specific spot in the amygdala - it doesn't have a whole bundle of different signals available to send to this exact spot.

SL in the brain seems tricky because not only does the brainstem have to reinforce behaviors in appropriate contexts, it might have to train certain outputs to correspond to certain behaviors in the first place, all with only one wire to each location! Maybe you could do this with a single signal that means both "imitate the current behavior" and also "learn to do your behavior in this context"? Alternatively we might imagine some separate mechanism for priming the developing amygdala to start out with a diverse yet sensible array of behavior proposals, and the brainstem could learn what its outputs correspond to and then signal them appropriately.

Comment by Charlie Steiner on Big picture of phasic dopamine · 2021-06-09T23:16:26.406Z · LW · GW

One thing that strikes me as odd about this model is that it doesn't have the blessing of dimensionality - each plan is one loop, and evaluating feedback to a winning plan just involves feedback to one loop. When the feedback is a general reward we can simplify this by just rewarding recent winning plans, but in some places it seems like you do imply highly specific feedback, for which you need N feedback channels to give feedback on ~N possible plans. The "blessing of dimensionality" kicks in when you can use more diverse combinations of a smaller number of feedback channels to encode more specific feedback.
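
To put a rough number on the saving I have in mind (a back-of-the-envelope toy calculation, treating each channel as simply on or off): K channels give $2^K$ distinguishable combinations, so

$$2^K \ge N \quad\Longleftrightarrow\quad K \ge \log_2 N,$$

i.e. about $\log_2 N$ channels could in principle single out any of N plans, instead of needing N dedicated channels.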

Maybe what seems to be specific feedback is actually a smaller number of general types? Like rather than specific feedback to snake-fleeing plans or whatever, a broad signal (like how Success-In-Life Reward is a general signal rewarding whatever just got planned) could be sent out that means "whatever the amygdala just did to make the snake go away, good job" (or something). Note that I have no idea what I'm talking about.

Comment by Charlie Steiner on Should we vaccinate against PGBD5 which codes for a transposase? · 2021-06-08T20:08:37.816Z · LW · GW

Aren't most parts of the body expressing transposons? Seems like not the right thing to alert our immune system to.

Not a biologist, but I assumed we'd need to alter lots of cells - either to produce something that interferes with transposons like RNAi, or to break them with direct gene editing.

Comment by Charlie Steiner on Open and Welcome Thread – June 2021 · 2021-06-07T23:33:01.712Z · LW · GW

Good points. I was imagining some successful slow takeoff scenario where there's a period of post-scarcity with basically human control of the future (reminds me of the Greg Egan story Border Guards.). But late into a slow takeoff, or full-on post-singleton, the full transhumanist package will be realizable.

I'm not so sure that learning to love numbers at the expense of my current hobbies is all that great an idea. Sure, my future self would like it, but right now I don't love numbers that much. I think a successful future would need some anti-wireheading guardrails that would make it difficult to learn to love math in a way that really eclipsed all your previous interests.

Comment by Charlie Steiner on Some AI Governance Research Ideas · 2021-06-07T18:01:44.229Z · LW · GW

Another thing I'm really interested in is the ordinary work of lobbying... I remember someone had a question here a while back about applying for internships with US congresspeople, and I remember thinking "this is the sort of question that would benefit from some lobbyists who had a professional understanding of how to put convincing people and ideas near the levers of government."

More EA-involved people might know: are there EA lobbyists who make it their business to know how to nudge government to make things easier for effective charities?

Comment by Charlie Steiner on Open and Welcome Thread – June 2021 · 2021-06-07T17:41:59.025Z · LW · GW

I like to think that depictions of good life after AGI are just called slice of life stories. Just find a story about three friends baking a cake and add "also, most of the production of ingredients was handled by robots." Any story that doesn't hinge on someone being poor or in danger is valid post-scarcity. This eliminates a huge fraction of all stories we tell, but a much smaller fraction of the stories you'd actually like to have happen to you.

I'm not sure of any slice of life stories that actually do have the "also, robots" conceit, though. Maybe Questionable Content?

Comment by Charlie Steiner on Thoughts on the Alignment Implications of Scaling Language Models · 2021-06-03T13:43:32.304Z · LW · GW

Great post! I very much hope we can do some clever things with value learning that let us get around needing AbD to do the things that currently seem to need it.

The fundamental example of this is probably optimizability - is your language model so safe that you can query it as part of an optimization process (e.g. making decisions about what actions are good), without just ending up in the equivalent of DeepDream's pictures of Maximum Dog?

Comment by Charlie Steiner on What is the most effective way to donate to AGI XRisk mitigation? · 2021-05-31T18:21:57.231Z · LW · GW

I actually haven't heard anything out of them in the last few years either. My knowledge of grantmaking organizations is limited - I think similar organizations like Berkeley Existential Risk Initiative, or the Long-Term Future Fund, tend to be less about academic grantmaking and more about funding individuals and existing organizations (not that this isn't also valuable).

Comment by Charlie Steiner on The Homunculus Problem · 2021-05-31T02:30:12.103Z · LW · GW

Right. Rather than having a particular definition of meaning, I'm more thinking about the social aspects of explanation. If someone could say "There are two ways of talking about this same part of the world, and both ways use the same word, but these two ways of using the word actually mean different things" and not get laughed out of the room, then that means something interesting is going on if I try to answer a question posed in one way of talking by making recourse to the other.

Comment by Charlie Steiner on The Homunculus Problem · 2021-05-30T20:11:38.019Z · LW · GW

Good points!

The specific case here is why-questions about bits of a model of the world (because I'm making the move to say it's important that certain why-questions about mental stuff aren't just raw data, they are asked about pieces of a model of mental phenomena). For example, suppose I think that the sky is literally a big sphere around the world, and it has the property of being blue in the day and starry in the night. If I wonder why the sky is blue, this pretty obviously isn't going to be a logical consequence of some other part of the model. If I had a more modern model of the sky, its blueness might be a logical consequence of other things, but I wouldn't mean quite the same thing by "sky."

So my claim about different semantics isn't that you can't have any different models with overlapping semantics, it's specifically about going from a model where some datum (e.g. looking up and seeing blue) is a trivial consequence to one where it's a nontrivial consequence. I'm sure it's not totally impossible for the meanings to be absolutely identical before and after, but I think it's somewhere between exponentially unlikely and measure zero.

Comment by Charlie Steiner on What is the most effective way to donate to AGI XRisk mitigation? · 2021-05-30T16:08:42.256Z · LW · GW

I agree that there are multiple types of basic research we might want to see, and maybe not all of them are getting done. I therefore actually put a somewhat decent effect size on traditional academic grants from places like FLI, even though most of its grants aren't useful, because it seems like a way to actually get engineers to work on problems we haven't thought of yet. This is the grant-disbursing process as an active ingredient, not just as filler. I am skeptical if this effect size is bigger on the margin than just increasing CHAI's funding, but presumably we want some amount of diversification.

Comment by Charlie Steiner on Covid 5/27: The Final Countdown · 2021-05-28T07:19:18.253Z · LW · GW

https://en.m.wikipedia.org/wiki/Zoonosis#Lists_of_diseases

Comment by Charlie Steiner on The Homunculus Problem · 2021-05-27T23:09:46.177Z · LW · GW

Fair enough, my only defense is that I thought you'd find it funny.

A more serious answer to the homunculus problem as stated is simply levels of description - one of our ways of talking about and modeling humans in terms of experience (particularly inaccurate experience) tends to reserve a spot for an inner experiencer, and our way of talking about humans in terms of calcium channels tends not to. Neither model is strictly necessary for navigating the world, it's just that by the very question "Why does any explanation of subjective experience invoke an experiencer" you have baked the answer into the question by asking it in a way of talking that saves a spot for the experiencer. If we used a world-model without homunculi, the phrase "subjective experience" would mean something other than what it does in that question.

There is no way out of this from within - as Sean Carroll likes to point out, "why" questions have answers (are valid questions) within a causal model of the world, but aren't valid about the entire causal model itself. If we want to answer questions about why a way of talking about and modeling the world is the way it is, we can only do that within a broader model of the world that contains the first (edit: not actually true always, but seems true in this case), and the very necessity of linking the smaller model to a broader one means the words don't mean quite the same things in the broader way of talking about the world. Nor can we answer "why is the world just like my model says it is?" without either tautology or recourse to a bigger model.

We might as well use the Hard Problem Of Consciousness here. Start with a model of the world that has consciousness explicitly in it. Then ask why the world is like it is in the model. If the explanation stays inside the original model, it is a tautology, and if it uses a different model, it's not answering the original question because all the terms mean different things. The neuroscientists' hard problem of consciousness, as you call it, is in the second camp - it says "skip that original question. We'll be more satisfied by answering a different question."

This homunculus problem seems to be a similar sort of thing, one layer out. The provocative version has an unsatisfying, non-explaining answer, but we might be satisfied by asking related yet different questions like "why is talking about the world in terms of experience and experiencer a good idea for human-like creatures?"

Comment by Charlie Steiner on The Homunculus Problem · 2021-05-27T21:11:35.477Z · LW · GW

Easy :P Just build a language module for the Bayesian model that, when asked about inner experience, starts using words to describe the postprocessed sense data it uses to reason about the world.

Of course this is a joke, hahaha, because humans have Real Experience that is totally different from just piping processed sense data out to be described in response to a verbal cue.

Comment by Charlie Steiner on Covid 5/27: The Final Countdown · 2021-05-27T17:22:04.499Z · LW · GW

Very interesting page! I think the most Bayesian flaw is not conditioning on earlier evidence in some places. The equivalent of trying to evaluate P(virus with properties A, B, and C) as P(A) x P(B) x P(C), rather than the correct method P(A) x P(B given A) x P(C given A and B).
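
Spelled out (this is just the standard chain rule, nothing specific to this case):

$$P(A, B, C) = P(A)\,P(B \mid A)\,P(C \mid A, B),$$

which only collapses to $P(A)\,P(B)\,P(C)$ when A, B, and C are independent.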

I'm specifically thinking of how the genetics of cov-2 are not independent of its adaptedness to humans. Once you condition on the unlikely genetics, it shouldn't be as unlikely that you see transmission in humans, and vice versa.

Substantively, I'm not sure whether coincidences are being cherrypicked here - there are lots of cleavage sites in viruses, and lots of other viruses to compare chunks of genes to. How many similar coincidences should we expect just due to chance? Basically if you update on the coincidences you see but never correct for the number of possible coincidences you could have seen, you'll overrate how much evidence coincidences give.

I also feel like they put a lot of thought into evidence capable of testing a normal zoonotic origin ( as is proper - https://www.lesswrong.com/posts/rmAbiEKQDpDnZzcRf/positive-bias-look-into-the-dark ), but didn't put forth similar effort into what goes against the genetic engineering hypothesis. How likely is it that a research lab finds a new bat coronavirus and then before publishing anything about it, decides that it's the perfect testbed for dramatic gain of function research? This likelihood could be evaluated by checking for prior examples of ambitious research using a microbe before anything about it had been published. This sort of thing is missing.

Comment by Charlie Steiner on Problems facing a correspondence theory of knowledge · 2021-05-24T18:05:42.768Z · LW · GW

I think grappling with this problem is important because it leads you directly to understanding that what you are talking about is part of your agent-like model of systems, and how this model should be applied depends both on the broader context and your own perspective.

Comment by Charlie Steiner on What's your probability that the concept of probability makes sense? · 2021-05-22T23:14:54.222Z · LW · GW

18 nines maybe?

Comment by Charlie Steiner on The Variational Characterization of KL-Divergence, Error Catastrophes, and Generalization · 2021-05-22T22:02:50.517Z · LW · GW

I'm still confused about the part where you use the Hoeffding inequality - how is the lambda in that step and the lambda in the loss function "the same lambda"?

Comment by Charlie Steiner on What will 2040 probably look like assuming no singularity? · 2021-05-19T05:14:21.106Z · LW · GW

Speaking of transmission costs, I think the 2040 future there is carbon nanotube power lines.

Comment by Charlie Steiner on Knowledge Neurons in Pretrained Transformers · 2021-05-19T05:11:33.460Z · LW · GW

On one hand, maybe? Maybe training using a differential representation and SGD was the only missing ingredient.

But I think I'll believe it when I see large neural models distilled into relatively tiny symbolic models with no big loss of function. If that's hard, it means that partial activations and small differences in coefficients are doing important work.

Comment by Charlie Steiner on Saving Time · 2021-05-19T02:14:51.193Z · LW · GW

What is then stopping us from swapping the two copies of the coarser node?

Isn't it precisely that they're playing different roles in an abstracted model of reality? Though alternatively, you can just throw more logical nodes at the problem and create a common logical cause for both.

Also, would you say what you have in mind is built out of augmenting a collection of causal graphs with logical nodes, or do you have something incompatible in mind?

Comment by Charlie Steiner on Suppose $1 billion is given to AI Safety. How should it be spent? · 2021-05-16T10:18:26.850Z · LW · GW

The biggest problem is taste. A hypothetical billion-dollar donor needs to somehow turn their grant into money spent on useful things for current and future people actually working on AI safety, rather than on relatively useless projects marketed by people who are skilled at collecting grant money. This is more or less the problem Open Philanthropy has been trying to solve, and they're doing an okay job, so if I was a non-expert billionaire I would try to do something meta like OpenPhil.

But if I personally had a billion dollars to spend, and had to spend it with a 10 year horizon...

Things to plausibly do:

  • Grants for current entities. Giving them more than they currently need is just a sneaky way of spreading around the allocation process. Might be better to just give them a small amount (~2M/yr, i.e. 2%), but partner with them through my meta-level grantmaking organization (see below).

  • Money to move adjacent workers or experts not currently working on AI alignment into full time work. Might also be related to the next item:

  • Found my own object-level AI alignment organization. Budget depends on how big it is. Probably can't scale past 50 people or 5M/yr very well with what I currently think is the state of people worth hiring.

  • Securing computing resources. Currently unimportant (except secondarily for reputation and/or readiness), might be very important very suddenly, sometime in the future. Spend ~0.4M/yr on preparing but set aside 100M-200M for compute?

  • Found or partner with a meta-level organization to search for opportunities I'm not aware of now or don't have the expertise to carry out, and to do meta-level work as method of promoting AI safety (e.g. search for opportunities for promoting AI alignment work in China, lobbying other organizations such as Google by providing research on how they can contribute to AI safety.) (3M/yr on org, setting aside ~30M/yr for opportunities)

  • Found a meta-level organization (may be part of another organization) focused on influencing the education of students interested in AI. Maybe try to get textbooks written that have a LW perspective, partner with professors to develop courses on alignment-related topics, and also make some grants to places we think are doing it right and for students excelling at those places. (Say 2M/yr on background, 4M/yr on grants)

This is only 6 things. Could I spend 160M (16M per year for 10 years) on each of these things? Looking at the estimates, maybe not. This indicates that to spend money at the billion-dollar level, we might have to spend only part (my estimate says 60%) on things that have good marginal returns from where we're currently standing, and the rest might have to go into some large hierarchical nonprofit that tries to turn ordinary philosophers, mathematicians, and software engineers into AI alignment workers by paying them or otherwise making it a glamorous problem for the best and brightest to work on. But I'm worried that bad ideas could become popular in this kind of ecosystem. Some iron-fisted control over the purse strings may be necessary.
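
(Rough check on that 60%, using the per-year figures above and the midpoint of the compute reserve - note the "adjacent workers" item has no figure attached: ~2M grants + ~5M object-level org + ~0.4M compute prep + ~33M meta org and opportunities + ~6M education ≈ 46M/yr, or ~460M over ten years; adding the 100M-200M compute reserve gives roughly 560M-660M of the billion.)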

I'm not sure if this is "talent-limited" or not. To some extent yes, it is. But we're also limited by the scale of social trust, and by what one might call the "surface area" of the problem, which determines how fast the returns diminish when just adding more people, even if they were clones of known people.

Comment by Charlie Steiner on Agency in Conway’s Game of Life · 2021-05-13T03:52:17.035Z · LW · GW

The truly arbitrary version seems provably impossible. For example, what if you're trying to make a smiley face, but some other part of the world contains an agent just like you except they're trying to make a frowny face - you obviously both can't succeed. Instead you need some special environment with low entropy, just like humans do in real life.

Comment by Charlie Steiner on Challenge: know everything that the best go bot knows about go · 2021-05-12T20:56:51.997Z · LW · GW

To make my own point that may be distinct from ACP's: the point isn't that neural networks don't know anything. The point is that the level of description I'm operating on when I say that phrase is so imprecise that it doesn't allow you to make exact demands like knowing "everything the NN does" or "exactly what the NN does," for any system other than a copy of that same neural network.

If I make the verbal chain of reasoning "the NN can know things, I can know things, therefore I can know what the NN knows," this chain of reasoning actually fails. Even though I'm using the same English word "know" both times, the logical consequences of the word are different each time I use it. If I want to make progress here, I'll need to taboo the word "know."

Comment by Charlie Steiner on Thoughts on Iterated Distillation and Amplification · 2021-05-12T01:16:57.587Z · LW · GW

That's what we need for LW: BibTeX support :P

I'm curious if I'm an outlier here - have you really never tried to relate some joke or funny story and then cracked up before you could finish it? I can't tickle myself, but I can easily make myself laugh.

Anyhow, I think this is to some extent a low-dimensional analogy for a high-dimensional world. When the world is complicated, trying to do something new can result in finding connections to something you already know about, and studying something familiar can uncover the surprising. This is for exactly the same reason that a 1D patch on a string is close to fewer neighbors than a 3D section of space, which in turn has fewer neighbors than a drug molecule has in the space of possible chemicals. In high dimensional problems, both connections and surprises are so common as to be unavoidable. But if we're just walking around on the 2D surface of the earth, we'll probably run into connections and surprises at about the rate we'd expect from our stories.

Comment by Charlie Steiner on What is the best chemistry textbook? · 2021-05-12T00:48:16.340Z · LW · GW

I do think organic chemistry texts assume you start out with notions of molarity, chemical equations, conservation of energy, change of phase, the kinetic theory of temperature, a basic grasp of the periodic table, and as you go deeper into them they might start to expect you to know some quantum mechanics from elsewhere. So if I was teaching a child from scratch, we would certainly need to cover basic chemistry material before opening an organic chemistry textbook. But if you've taken a high school chemistry course and want to jump straight in to college-level orgo, you'll almost certainly be fine (speaking from experience).

Comment by Charlie Steiner on MIRI location optimization (and related topics) discussion · 2021-05-09T02:53:11.871Z · LW · GW

As a midwesterner: Columbus OH, various Chicago suburbs, and various Detroit suburbs should be on your list. Plausibly also Kalamazoo MI and Bloomington-Normal IL.

On the smaller population side, another town similar to Champaign-Urbana IL is Ithaca NY.

Comment by Charlie Steiner on Anthropics: different probabilities, different questions · 2021-05-06T17:53:05.170Z · LW · GW

Someday your review of this will mention subjective probability questions in addition to the frequency questions :P The "I have a probabilistic model of the world and I want to compute anthropic probabilities using this model" kind of thing.

A related interesting approach to anthropics is Solomonoff induction. You treat the entire process generating your subjective experience as a Turing machine and ask about what it's likely to do next. In some sense this drops the whole notion of "world" out entirely, and just generally breaks the mold.

Comment by Charlie Steiner on 25 Min Talk on MetaEthical.AI with Questions from Stuart Armstrong · 2021-05-01T21:21:54.391Z · LW · GW

No, I'm definitely being more descriptivist than causal-ist here. The point I want to get at is on a different axis.

Suppose you were Laplace's demon, and had perfect knowledge of a human's brain (it's not strictly necessary to pretend determinism, but it sure makes the argument simpler). You would have no need to track the human's "wants" or "beliefs," you would just predict based on the laws of physics. Not only could you do a better job than some human psychologist on human-scale tasks (like predicting in advance which button the human will press), you would be making information-dense predictions about the microphysical state of the human's brain that would just be totally beyond a model of humans coarse-grained to the level of psychology rather than physics.

So when you say "External reference is earned by reasoning in such a way that attributing content like 'the cause of this and that sensory state ...' is a better explanation", I totally agree, but I want to emphasize: better explanation for whom? If we somehow built Laplace's demon, what I'd want to tell it is something like "model me according to my own standards for intentionality."

Comment by Charlie Steiner on 25 Min Talk on MetaEthical.AI with Questions from Stuart Armstrong · 2021-04-30T18:59:30.068Z · LW · GW

I don't think we disagree too much, but what does "play the right functional role" mean, since my desires are not merely about what brain-state I want to have, but are about the real world? If I have a simple thermostat where a simple bimetallic spring opens or closes a switch, I can't talk about the real-world approximate goals of the thermostat until I know whether the switch goes to the heater or to the air conditioner. And if I had two such thermostats, I would need the connections to the external world to figure out if they were consistent or inconsistent.

In short, the important functional role that my desires play does not just take place intra-cranially; they function in interaction with my environment. If you were a new superintelligence, and the first thing you found was a wireheaded human, you might conclude that humans value having pleasurable brain states. If the first thing you found were humans in their ancestral environment, you might conclude that they value nutritious foods or producing healthy babies. The brains are basically the same, but the outside world they're hooked up to is different.

So from the premises of functionalism, we get a sort of holism.

Comment by Charlie Steiner on Notes on Robert McIntyre’s Brain Preservation Talk at the Long Now Foundation · 2021-04-30T03:59:55.786Z · LW · GW

Thanks for the pointer to this interesting talk.

I do wish he'd backed up his optimism about exponential growth with a bit more inside view.

Comment by Charlie Steiner on Low-stakes alignment · 2021-04-30T00:44:56.305Z · LW · GW

I feel like we can approximately split the full alignment problem into two parts: low stakes and handling catastrophes.

Insert joke about how I can split physics research into two parts: low stakes and handling catastrophes.

I'm a little curious about whether assuming fixed low stakes accidentally favors training regimes that have the real-world drawback of raising the stakes.

But overall I think this is a really interesting way of reframing the "what do we do if we succeed?" question. There is one way it might be misleading, which is that I think we're left with much more of the problem of generalizing beyond the training domain than it first appears: even though the AI gets to equilibrate to new domains safely and therefore never has to take big leaps of generalization, the training signal itself has to do all the work of generalization that the trained model gets to avoid!

Comment by Charlie Steiner on 25 Min Talk on MetaEthical.AI with Questions from Stuart Armstrong · 2021-04-29T22:18:31.950Z · LW · GW

Very interesting! More interesting to me than the last time I looked through your proposal, both because of some small changes I think you've made but primarily because I'm a lot more amenable to this "genre" than I was.

I'd like to encourage a shift in perspective from having to read preferences from the brain, to being able to infer human preferences from all sorts of human-related data. This is related to another shift from trying to use preferences to predict human behavior in perfect detail, to being content to merely predict "human-scale" facts about humans using an agential model.

These two shifts are related by the conceptual change from thinking about the human preferences as "in the human," thus being inextricably linked to understanding humans on a microscopic level, to thinking about human preferences as "in our model of the human" - as being components that need to be understood as elements of an intentional-stance story we're telling about the world.

This of course isn't to say that brains have no mutual information with values. But rather than having two separate steps in your plan like "first, figure out human values" and "later, fit those human values into the AI's model of the world," I wonder if you've explored how it could work for the AI to try to figure out human values while simultaneously locating them within a way (or ways) of modeling the world.