Posts

Life as metaphor for everything else. 2020-04-05T07:21:11.303Z · score: 37 (11 votes)
Meta-preferences two ways: generator vs. patch 2020-04-01T00:51:49.086Z · score: 19 (6 votes)
Gricean communication and meta-preferences 2020-02-10T05:05:30.079Z · score: 14 (5 votes)
Impossible moral problems and moral authority 2019-11-18T09:28:28.766Z · score: 15 (11 votes)
What's the dream for giving natural language commands to AI? 2019-10-08T13:42:38.928Z · score: 9 (3 votes)
The AI is the model 2019-10-04T08:11:49.429Z · score: 12 (10 votes)
Can we make peace with moral indeterminacy? 2019-10-03T12:56:44.192Z · score: 17 (5 votes)
The Artificial Intentional Stance 2019-07-27T07:00:47.710Z · score: 14 (5 votes)
Some Comments on Stuart Armstrong's "Research Agenda v0.9" 2019-07-08T19:03:37.038Z · score: 22 (7 votes)
Training human models is an unsolved problem 2019-05-10T07:17:26.916Z · score: 16 (6 votes)
Value learning for moral essentialists 2019-05-06T09:05:45.727Z · score: 13 (5 votes)
Humans aren't agents - what then for value learning? 2019-03-15T22:01:38.839Z · score: 20 (6 votes)
How to get value learning and reference wrong 2019-02-26T20:22:43.155Z · score: 40 (10 votes)
Philosophy as low-energy approximation 2019-02-05T19:34:18.617Z · score: 40 (21 votes)
Can few-shot learning teach AI right from wrong? 2018-07-20T07:45:01.827Z · score: 16 (5 votes)
Boltzmann Brains and Within-model vs. Between-models Probability 2018-07-14T09:52:41.107Z · score: 19 (7 votes)
Is this what FAI outreach success looks like? 2018-03-09T13:12:10.667Z · score: 53 (13 votes)
Book Review: Consciousness Explained 2018-03-06T03:32:58.835Z · score: 101 (27 votes)
A useful level distinction 2018-02-24T06:39:47.558Z · score: 26 (6 votes)
Explanations: Ignorance vs. Confusion 2018-01-16T10:44:18.345Z · score: 18 (9 votes)
Empirical philosophy and inversions 2017-12-29T12:12:57.678Z · score: 8 (3 votes)
Dan Dennett on Stances 2017-12-27T08:15:53.124Z · score: 8 (4 votes)
Philosophy of Numbers (part 2) 2017-12-19T13:57:19.155Z · score: 11 (5 votes)
Philosophy of Numbers (part 1) 2017-12-02T18:20:30.297Z · score: 25 (9 votes)
Limited agents need approximate induction 2015-04-24T21:22:26.000Z · score: 1 (1 votes)

Comments

Comment by charlie-steiner on Seeking opinions on the pros and cons of various telepresence tools · 2020-04-06T05:36:30.603Z · score: 2 (1 votes) · LW · GW

My brother the IT professional is recommending BlueJeans over Zoom, but I'm not going to try it out until next weekend, at which point I'll report back.

Comment by charlie-steiner on Life as metaphor for everything else. · 2020-04-05T15:59:12.405Z · score: 5 (3 votes) · LW · GW

Yes, I think our position relative to consciousness is similar to Pasteur's relative to life.

Nonetheless, here are two exercises that I think are entirely possible today:

One: Try to find an edge case of consciousness - someone or something that's a bit like the analogue of a virus, where one might imagine appeals either way, based on different facts of the matter. How precisely can you describe what patterns are there/missing? (Hint in rot13 if you're having trouble thinking of an edge case: Crbcyr gnyx nobhg "pbeeryngrf bs pbafpvbhfarff" - jung unf fbzr ohg abg bguref?)

Two: Introspect on yourself, and try to identify some ability of yours that is one of the "powers of your consciousness" (the analogues in life being big things like reproduction, or little things like doing fermentation), but that could in principle be absent. For example, I feel like my ability to subconsciously and fluently interpret my visual field as scenes and objects really contributes to my sense of consciousness.

Comment by charlie-steiner on Implications of the Doomsday Argument for x-risk reduction · 2020-04-03T17:40:44.086Z · score: 8 (5 votes) · LW · GW

To believe that you're a one in a million case (e.g. in the first or last millionth of all humans), you need 20 bits of information (because 2^20 is about 1000000).

So on the one hand, 20 bits can be hard to get if the topic is hard to get reliable information about. But we regularly get more than 20 bits of information about all sorts of questions (reading this comment has probably given you more than 20 bits of information). So how hard this should "feel" depends heavily on how well we can translate our observational data into information about the future of humanity.
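A minimal sketch of that arithmetic (my own illustration, not part of the argument itself): the number of bits needed to single out a case with prior probability p is log2(1/p).

```python
import math

# Bits of evidence needed to single out an event with prior probability `fraction`.
def bits_needed(fraction):
    return math.log2(1.0 / fraction)

print(bits_needed(1e-6))  # ~19.93, i.e. roughly 20 bits for a one-in-a-million case
print(bits_needed(0.5))   # 1 bit for a coin-flip-sized update
```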

Extra note: In the case that there are an infinite number of humans, this uniform prior actually breaks down (or else naively you'd think you have a 0.0% chance of being anyone at all), so there can be a finite contribution from the possibility that there are infinite people.

Comment by charlie-steiner on Implications of the Doomsday Argument for x-risk reduction · 2020-04-03T04:45:05.142Z · score: 7 (4 votes) · LW · GW

People are bad at interpreting the Doomsday Argument, because people are bad at treating evidence as Bayesian evidence rather than as a direct statement of the correct belief.

The Doomsday Argument is evidence that we should update on. But it is not a direct statement of the correct belief.

A parable:

On a parallel earth, humanity is on the decline. Some disaster has struck, and the once-billions of proud humanity have been reduced to a few scattered thousands. Now the last exiles of civilization hide in sealed habitats that they no longer have the supply chains to repair, and they know that soon enough the end will come for them too. But on the other hand, the philosophers among them remark, at least there's the Doomsday Argument, which says that on average we should expect to be in the middle of humanity. So if the DA is right, the current crisis is merely a bottleneck in the middle of humanity's time, and everything will probably work itself out any day now. The last philosopher dies after breathing in contaminated air, with the last words "No! The position I occupy is... very unlikely!"

Moral:

Your eyes and ears also provide you evidence about the expected span of humanity.

Comment by charlie-steiner on FLI Podcast: The Precipice: Existential Risk and the Future of Humanity with Toby Ord · 2020-04-01T06:31:55.706Z · score: 2 (1 votes) · LW · GW

Fun podcast. The analogy to human planning horizons was a very thought-provoking one. Though obviously, there are forces that explain the way things are; competition between different interests is a strong selection pressure for short-termism.

Comment by charlie-steiner on [AN #92]: Learning good representations with contrastive predictive coding · 2020-03-25T20:28:46.404Z · score: 4 (2 votes) · LW · GW

Is SIDLE not also a perfectly fine word? I don't know how this went through peer review.

Anyhow, good newsletter this week, thanks :)

Comment by charlie-steiner on Deconfusing Human Values Research Agenda v1 · 2020-03-25T18:35:11.983Z · score: 4 (2 votes) · LW · GW

I almost agree, but still ended up disagreeing with a lot of your bullet points. Since reading your list was useful, I figured it would be worthwhile to just make a parallel list. ✓ for agreement, × for disagreement (• for neutral).

Problem overview

✓ I think we're confused about what we really mean when we talk about human values.

× But our real problem is on the meta-level: we want to understand value learning so that we can build an AI that learns human values even without starting with a precise model waiting to be filled in.

_ × We can trust AI to discover that structure for us even though we couldn't verify the result, because the point isn't getting the right answer, it's having a trustworthy process.

_ × We can't just write down the correct structure any more than we can just write down the correct content. We're trying to translate a vague human concept into precise instructions for an AI.

✓ Agree with extensional definition of values, and relevance to decision-making.

• Research on the content of human values may be useful information about what humans consider to be human values. I think research on the structure of human values is in much the same boat - information, not the final say.

✓ Agree about Stuart's work being where you'd go to write down a precise set of preferences based on human preferences, and that the problems you mention are problems.

Solution overview

✓ Agree with assumptions.

• I think the basic model leaves out the fact that we're changing levels of description.

_ × Merely causing events (in the physical level of description) is not sufficient to say we're acting (in the agent level of description). We need some notion of "could have done something else," which is an abstraction about agents, not something fundamentally physical.

_ × Similar quibbles apply to the other parts - there is no physically special decision process, we can only find one by changing our level of description of the world to one where we posit such a structure.

_ × The point: Everything in the basic model is a statistical regularity we can observe over the behavior of a physical system. You need a somewhat more nuanced way to place preferences and meta-preferences.

_ • The simple patch is to just say that there's some level of description where the decision-generation process lives, and preferences live at a higher level of abstraction than that. Therefore preferences are emergent phenomena from the level of description the decision-generation process is on.

_ _ × But I think if one applies this patch, then it's a big mistake to use loaded words like "values" to describe the inputs (all inputs?) to the decision-generation process, which are, after all, at a level of description below the level where we can talk about preferences. I think this conflicts with the extensional definition from earlier.

× If we recognize that we're talking about different levels of description, then preferences are not either causally after or causally before decisions-on-the-basic-model-level-of-abstraction. They're regular patterns that we can use to model decisions at a slightly higher level of abstraction.

_ • How to describe self-aware agents at a low level of abstraction then? Well, time to put on our GEB hats. The low level of abstraction just has to include a computation of the model we would use on the higher level of abstraction.

✓ Despite all these disagreements, I think you've made a pretty good case that the human brain plausibly computes a single currency (valence) that it uses to rate both most decisions and most predictions.

_ × But I still don't agree that this makes valence human values. I mean values in the sense of "the cluster we sometimes also point at with words like value, preference, affinity, taste, aesthetic, intention, and axiology." So I don't think we're left with a neuroscience problem, I still think what we want the AI to learn is on that higher level of abstraction where preferences live.

Comment by charlie-steiner on Tagging (Click Gear Icon to filter Coronavirus content) · 2020-03-23T23:49:46.254Z · score: 4 (2 votes) · LW · GW

Yay (!)

Comment by charlie-steiner on Quadratic models and (un)falsified data · 2020-03-14T07:55:23.473Z · score: 2 (1 votes) · LW · GW

It's just a measure of how close the data is to the line - like the "inside view" uncertainty that the model has about the data. In fact, that's more precisely what it is if this is the chi squared statistic (or square root thereof) that you minimized to fit the model. And it's in nice convenient units that you can compare to other things.

It's not quite right, because it uses an implicit prior about noise and models that doesn't match your actual state of information. But it's something that someone who's currently reporting R^2 to us can do in 30 seconds in Excel.

Comment by charlie-steiner on Adaptive Immune System Aging · 2020-03-13T06:49:24.500Z · score: 6 (4 votes) · LW · GW

This was not what I expected to learn today :) Alas, poor gonads, I hardly knew ye.

Comment by charlie-steiner on Puzzles for Physicalists · 2020-03-12T23:16:13.315Z · score: 5 (2 votes) · LW · GW

Well, I was skimming through Word and Object when I "became enlightened," but it may have mostly been a catalyst. Still recommended though?

I don't think l was very clear about what problem I was solving, and I don't think you managed to read my mind, so let me try again.

The problem I was interested in was: how does reference work? How can I point at or verbally indicate some thingie, and actually be indicating the thingie in question? And could I program that into an AI?

In your post, you connect this to indexicals, which I've interpreted as a question like "how does reference work? How can I point at or verbally indicate some thingie, and actually be indicating that thingie, in a way that you could explain to a microscopic physics simulation?"

One of the key parts of the solution is that words don't have inherent "aboutness" attached to them. Reference doesn't make any sense if you just focus on the speaker and try to define the aboutness in their statements. It needs to be interpreted as communication, which uses some notion of a functional audience you're constructing a message for.

So that question of "How do I verbally indicate the thing and really indicate it?" has to be left unanswered to the extent that we have false beliefs about our ability to "really indicate" things. Instead, I advocate breaking it down into questions about how you model other people and choose communicative acts.

So I am absolutely not saying we should replace "is x true?" with "is x a communicatively useful act?". The closest thing I'm saying would be that we can cash out "what is the referent of sentence x?" into "what is the modeled audience getting pointed at by the act of saying sentence x?".

I'm not sure how you're interpreting physicalism here. But if we single out the notion that there should be some kind of "physics shorthand" for human concepts and references - like H2O is for water, or like the toy model of reference as passing numerical coordinates - then yeah, there is no physics shorthand. Where there is something like it, it is humans that have done the work to accommodate physics, not vice versa.

Comment by charlie-steiner on Puzzles for Physicalists · 2020-03-12T13:43:53.506Z · score: 7 (3 votes) · LW · GW

Yeah, I spent a lot of last year struggling with the reference thing. In the end I decided that reference was not fundamental even within the human-centered picture, and that reference was just a special case of communication (in the sense of Quine, Grice, et al.: I do a communicative act because I model you as modeling why I do it.)

Figuring this out made me a bit upset with academic philosophy, because I'd been looking through the recent literature fruitlessly before I found Quine basically solving the problem 50 years before. This is the opposite of the problem I usually pin on philosophy, that it's too backward-looking. In this case, it's more like the people talking about reference within the last 20 years are all self-selected for not caring about Quine much at all.

Whether or not you find this useful may depend on a certain mental maneuver of taking something you were asking a question about, and breaking it into pieces rather than answering the question. In this case, "How are the semantics of a sentence determined?" is a question, but rather than answering it I'm advocating getting rid of this high-level-of-abstraction word "semantics" by working in a more concrete level of description where there are humans with models of each other. And of course I've framed this in a very palatable way, but I think whether this maneuver feels good or not is a big dividing line - if you have the unshakeable feeling that I have missed something vital by not answering the original question, then you fall on the other side of the line - though perhaps one can still be lured over with practical applications.

Comment by charlie-steiner on [Article review] Artificial Intelligence, Values, and Alignment · 2020-03-12T13:17:55.473Z · score: 4 (2 votes) · LW · GW

It sure seems like if he really grokked the philosophical and technical challenge of getting a GAI agent to be net beneficial, he would write a different paper. That first challenge sort of overshadows the task of dividing up the post-singularity pie.

But I'm not sure whether the overshadowing is merely by being bigger (in which case this paper is still doing useful work), or if we should expect that solutions to the pie-dividing problems (e.g. weighing egalitarianism vs. utilitarianism) will necessarily fall out of the process that lets the AI learn how to behave well.

Comment by charlie-steiner on Zoom In: An Introduction to Circuits · 2020-03-11T09:18:25.506Z · score: 4 (2 votes) · LW · GW

I'll probably post a child comment after I actually read the article, but I want to note before I do that I think the power of ResNets is evidence against these claims. Having super-deep networks with residual connections promotes a picture that looks much more like a continuous "massaging" of the data than a human-friendly decision tree.

Comment by charlie-steiner on Quadratic models and (un)falsified data · 2020-03-09T06:00:38.906Z · score: 4 (2 votes) · LW · GW

Thanks!

Picking a descriptive statistic for these sorts of problems is pretty tricky. But I think we can do better than R^2, even without going all Bayesian-parameter-estimation.

What I mostly care about is just the standard deviation (in Excel, STDEV.S()) of the difference between the data and the model. Then I want to know how this compares to other scales in the data (like the average number of new cases per day).
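A minimal sketch of that recipe, with made-up case counts and model predictions standing in for real data (numpy's ddof=1 standard deviation matches Excel's STDEV.S):

```python
import numpy as np

# Hypothetical daily new-case counts and the fitted model's predictions for the same days.
data = np.array([12, 15, 22, 31, 45, 60, 83, 110, 140, 185], dtype=float)
model = np.array([11, 16, 23, 32, 44, 59, 80, 107, 141, 182], dtype=float)

residuals = data - model
scatter = np.std(residuals, ddof=1)   # sample standard deviation, like STDEV.S

print(f"residual std: {scatter:.1f}")
print(f"average daily new cases: {data.mean():.1f}")
print(f"scatter as a fraction of that scale: {scatter / data.mean():.1%}")
```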

Comment by charlie-steiner on Attainable Utility Preservation: Scaling to Superhuman · 2020-02-27T18:40:58.697Z · score: 4 (2 votes) · LW · GW

Right. Some intuition is necessary. But a lot of these choices are ad hoc, by which I mean they aren't strongly constrained by the result you want from them.

For example, you have a linear penalty governed by this parameter lambda, but in principle it could have been any old function - the only strong constraint is that you want it to monotonically increase from a finite number to infinity. Now, maybe this is fine, or maybe not. But I basically don't have much trust for meditation in this sort of case, and would rather see explicit constraints that rule out more of the available space.

Comment by charlie-steiner on Attainable Utility Preservation: Scaling to Superhuman · 2020-02-27T18:36:13.454Z · score: 4 (2 votes) · LW · GW

Jinx :P

Comment by charlie-steiner on Attainable Utility Preservation: Scaling to Superhuman · 2020-02-27T09:59:33.186Z · score: 7 (4 votes) · LW · GW

My very general concern is that strategies that maximize might be very... let's say creative, and your claims are mostly relying on intuitive arguments for why those strategies won't be bad for humans.

I don't really buy the claim that if you've been able to patch each specific problem, we'll soon reach a version with no problems - the exact same inductive argument you mention suggests that there will just be a series of problems, and patches, and then more problems with the patched version. Again, I worry that patches are based a lot on intuition.

For example, in the latest version, because you're essentially dividing out by the long-term reward of taking the best action now, if the best action now is really really good, then it becomes cheap to take moderately good actions that still increase future reward - which means the agent is incentivized to concentrate the power of actions into specific timesteps. For example, an agent might be able to set things up so that it sacrifices some of its ability to achieve total future reward in order to make it cheap to take an action that increases its future reward. This might look like sacrificing the ability to colonize distant galaxies in order to gain total control over the Milky Way.
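A toy numeric illustration of that worry (made-up numbers and a deliberately simplified penalty, not the actual AUP formula): if the penalty for an action scales inversely with the value of the best currently available action, then engineering a situation where the best available action is enormous makes the same impact look nearly free.

```python
# Simplified stand-in penalty: impact divided by the value of the best available action.
def scaled_penalty(impact, best_action_value):
    return impact / best_action_value

impact = 10.0

# Ordinary timestep: the best available action is modest, so the penalty bites.
print(scaled_penalty(impact, best_action_value=20.0))    # 0.5

# Timestep the agent has engineered so that an extremely good action is available:
# the same impact now costs almost nothing, rewarding concentration of power here.
print(scaled_penalty(impact, best_action_value=1e6))     # 1e-05
```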

Comment by charlie-steiner on How much delay do you generally have between having a good new idea and sharing that idea publicly online? · 2020-02-24T06:04:57.796Z · score: 3 (2 votes) · LW · GW

For interesting stuff, two weeks to two months. Usually this is warranted, because ideas are cheap but filtering and thinking are hard. The ideal faster time mostly just means that ideally I'd be spending more hours per week on ideas, not that I'd be spending less time per idea.

Comment by charlie-steiner on Curiosity Killed the Cat and the Asymptotically Optimal Agent · 2020-02-23T07:30:14.137Z · score: 2 (1 votes) · LW · GW

After a bit more thought, I've learned that it's hard to avoid ending back up with EU maximization - it basically happens as soon as you require that strategies be good not just on the true environment, but on some distribution of environments that reflect what we think we're designing an agent for (or the agent's initial state of knowledge about states of the world). And since this is such an effective tool at penalizing the "just pick the absolute best answer" strategy, it's hard for me to avoid circling back to it.

Here's one possible option, though: look for strategies that are too simple to encode the one best answer in the first place. If the absolute best policy has K-complexity of 10^3 (achievable in the real world by strategies being complicated, or in the multi-armed bandit case by just having 2^1000 possible actions) and your agent is only allowed to start with 10^2 symbols, this might make things interesting.
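A quick counting check of those rough numbers (my own arithmetic, with the bits-per-symbol figure as an arbitrary assumption):

```python
import math

num_actions = 2 ** 1000                          # multi-armed bandit with 2^1000 arms
bits_to_name_best_arm = math.log2(num_actions)   # 1000 bits, matching a K-complexity ~10^3

# A policy limited to ~10^2 symbols, at an assumed 8 bits per symbol, has only ~800 bits,
# so it cannot hard-code an arbitrary "just pull the best arm" answer.
policy_budget_bits = 100 * 8
print(bits_to_name_best_arm, policy_budget_bits, policy_budget_bits < bits_to_name_best_arm)
```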

Comment by charlie-steiner on Northwest Passage Update · 2020-02-23T07:18:21.348Z · score: 4 (2 votes) · LW · GW

I like it! But you know, Northwest Passage is already written as a retrospective.

Three centuries thereafter, I take passage overland
In the footsteps of brave Kelso, where his "sea of flowers" began
Watching cities rise before me, then behind me sink again
This tardiest explorer, driving hard across the plain.
And through the night, behind the wheel, the mileage clicking west
I think upon Mackenzie, David Thompson and the rest
Who cracked the mountain ramparts and did show a path for me
To race the roaring Fraser to the sea.

Because the singer is modern, the chorus "Ah, for just one time / I would take the Northwest Passage" is about wishing to identify a lonely life with the grandeur of the past. A verse about the loss of the historical arctic would tie right back into this without needing to change the chorus a jot.

Comment by charlie-steiner on Curiosity Killed the Cat and the Asymptotically Optimal Agent · 2020-02-20T22:59:13.418Z · score: 2 (1 votes) · LW · GW

Maybe optimality relative to the best performer out of some class of algorithms that doesn't include "just pick the absolute best answer?" You basically prove that in environments with traps, anything that would, absent traps, be guaranteed to find the absolute best answer will instead get trapped. So those aren't actually very good performers.

I just can't come up with anything too clever, though, because the obvious classes of algorithms, like "polynomial time," include the ability to just pick the absolute best answer by luck.

Comment by charlie-steiner on Goal-directed = Model-based RL? · 2020-02-20T22:19:07.500Z · score: 3 (2 votes) · LW · GW

The former (that is, model-based RL -> agent). The latter (smart agent -> model-based RL), I think, would be founded on a bit of a level error. At bottom, there are only atoms and the void. Whether something is "really" an agent is a question of how well we can describe this collection of atoms in terms of an agent-shaped model. This is different from the question of what abstractions humans used in the process of programming the AI; like Rohin says, parts of the agent might be thought of as implicit in the programming, rather than explicit.

Sorry, I don't know if I can direct you to any explicit sources. If you check out papers like Concrete Problems in AI Safety or others in that genre, though, you'll see model-based RL used as a simplifying set of assumptions that imply agency.

Comment by charlie-steiner on Curiosity Killed the Cat and the Asymptotically Optimal Agent · 2020-02-20T20:41:32.286Z · score: 2 (1 votes) · LW · GW

It seems like the upshot is that even weak optimality is too strong, since it has to try everything once. How does one make even weaker guarantees of good behavior that are useful in proving things, without just defaulting to expected utility maximization?

Comment by charlie-steiner on Goal-directed = Model-based RL? · 2020-02-20T20:22:16.232Z · score: 4 (3 votes) · LW · GW

Yup, I'm pretty sure people are aware of this :) See also the model of an agent as something with preferences, beliefs, available actions, and a search+decision algorithm that makes it take actions it believes will help its preferences.

But future AI research will require some serious generalizations that are left un-generalized in current methods. A simple gridworld problem might treat the entire grid as a known POMDP and do search over possible series of actions. Obviously the real world isn't a known POMDP, so suppose that we just call it an unknown POMDP and try to learn it through observation - now all of a sudden, you can't hand-specify a cost function in terms of the world model anymore, so that needs to be re-evaluated as well.
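For concreteness, a minimal sketch of the first setup - a tiny grid treated as fully known, with a hand-specified reward and brute-force search over action sequences. All names and numbers here are made up for illustration; a real POMDP version would add partial observability on top.

```python
from itertools import product

# A tiny known, deterministic gridworld: 3x3 grid, start at (0, 0), goal at (2, 2).
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
START, GOAL, SIZE, HORIZON = (0, 0), (2, 2), 3, 4

def step(state, action):
    dx, dy = ACTIONS[action]
    x, y = state[0] + dx, state[1] + dy
    # Bumping into a wall leaves the state unchanged.
    return (x, y) if 0 <= x < SIZE and 0 <= y < SIZE else state

def rollout_reward(plan):
    # Reward is hand-specified directly on the known world model.
    state, total = START, 0.0
    for action in plan:
        state = step(state, action)
        total += 1.0 if state == GOAL else -0.1
    return total

# Exhaustive search over every sequence of actions up to the horizon.
best_plan = max(product(ACTIONS, repeat=HORIZON), key=rollout_reward)
print(best_plan, rollout_reward(best_plan))
```

The comment's point is that once the world model has to be learned rather than given, the "reward hand-specified on the known world model" step no longer has anything to attach to.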

Comment by charlie-steiner on How do you survive in the humanities? · 2020-02-20T19:24:13.831Z · score: 9 (12 votes) · LW · GW

Obviously I have much less information about your situation than you, but it seems to me that you're not in the right here, and you should be less adversarial.

E.g.:

Yesterday, a teacher was explaining the composition of a literary essay, and she claimed that an essay writer isn't required to provide justification for their claims. I asked, "Then why should I believe anything the essay says?" and she replied, "You're free to decide whether you believe it or not," and I was just too exhausted from last week to explain that's not how beliefs should work.

But an essay writer isn't required to provide justification for their claims. For example, your first sentence is a claim that you have started studying creative writing full time. Have you justified this to me (either in some unattainable absolute sense, or even just beyond reasonable doubt)? No. Should you? Also no.

When you run into an absurd claim like "An essay writer isn't required to provide justifications for their claims," you should think seriously about how it might be true. I think you're only going to be satisfied by understanding communication on a more detailed level than your professors do, but you should do that, not just reject what they say.

Back to the object level: Why should I believe you when you claim that you've started studying creative writing? I do believe you, of course - but practically speaking, why do I do that? Try to figure out an answer that generalizes by being based on the practicalities of how humans communicate and infer things about the world. And then apply that answer back to what sort of evidence an essay writer needs to provide to their audience to do their job well.

I also think you're trying to use arguments in ways that won't work. Robert Nozick makes some clever comments about arguments in the best part of his book Philosophical Explanations (the introduction), something like: The goal of most philosophers seems to be to find arguments so compelling that, if a person were to disagree with the conclusion after reading the argument, their head would explode.

Longer quote:

The terminology of philosophical art is coercive: arguments are powerful and best when they are knockdown, arguments force you to a conclusion, if you believe the premises you have to or must believe the conclusion, some arguments do not carry much punch, and so forth. A philosophical argument is an attempt to get someone to believe something, whether he wants to believe it or not. A successful philosophical argument, a strong argument, forces someone to a belief.
Though philosophy is carried on as a coercive activity, the penalty philosophers wield is, after all, rather weak. If the other person is willing to bear the label of "irrational" or "having the worse arguments," he can skip away happily maintaining his previous belief. He will be trailed, of course, by the philosopher furiously hurling philosophical imprecations: "What do you mean, you're willing to be irrational? You shouldn't be irrational because..." And although the philosopher is embarrassed by his inability to complete this sentence in a noncircular fashion - he can only produce reasons for accepting reasons - still, he is unwilling to let his adversary go.
Wouldn't it be better if philosophical arguments left the person no possible answer at all, reducing him to impotent silence? Even then, he might sit there silently, smiling, Buddhalike. Perhaps philosophers need arguments so powerful they set up reverberations in the brain: if the person refuses to accept the conclusion, he dies. How's that for a powerful argument. Yet, as with other physical threats ("your money or your life"), he can choose defiance. A "perfect" philosophical argument would leave no choice.

But the point of that chapter is that such arguments don't exist. They're an oversimplification of how arguments work. A fiction. Much like an essay, an argument with a professor is an exercise in communication, not in structuring a coercive argument.

So if nothing else, I think taking a more learning-oriented approach towards your professors might make you more likely to be able to convince them of things :)

Comment by charlie-steiner on Reference Post: Trivial Decision Problem · 2020-02-18T08:02:32.325Z · score: 4 (2 votes) · LW · GW

Reflective modification flow: Suppose we have an EDT agent that can take an action to modify its decision theory. It will try to choose based on the average outcome conditioned on taking the different decision. In some circumstances, EDT agents are doing well so it will expect to do well by not changing; in other circumstances, maybe it expects to do better conditional on self-modifying to use the Counterfactual Perspective more.

Evolutionary flow: If you put a mixture of EDT and FDT agents in an evolutionary competition where they're playing some iterated game and high scorers get to reproduce, what does the population look like at large times, for different games and starting populations?
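A minimal sketch of that second setup, with a completely made-up payoff matrix standing in for whatever EDT and FDT agents would actually earn in the chosen iterated game (which is the hard part the sketch assumes away):

```python
import numpy as np

TYPES = ["EDT", "FDT"]
# Hypothetical average payoff per generation: rows = focal type, columns = opponent type.
PAYOFF = np.array([[2.0, 1.0],
                   [3.0, 2.5]])

def evolve(population, generations=200):
    # Discrete replicator dynamics: each type reproduces in proportion to its average payoff.
    for _ in range(generations):
        fitness = PAYOFF @ population       # expected payoff of each type against the mix
        population = population * fitness
        population = population / population.sum()
    return population

print(dict(zip(TYPES, evolve(np.array([0.9, 0.1])))))  # long-run population shares
```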

Comment by charlie-steiner on On characterizing heavy-tailedness · 2020-02-17T09:58:04.558Z · score: 3 (2 votes) · LW · GW

It seems like if I'm trying to talk about a real-world case with finite support, I'll say something like "it's not actually a power law - but it's well described by one over the relevant range of values." Meaning that I have some notion of "relevant" which is probably derived from action-relevance, or relevance to my observations, or maybe computational complexity.

If I can't say that, then the other main option is that I care more and more as the power law gets more extreme, and then as the possibilities reach their physical limit I care most of all. But cases like this are so idiosyncratic that maybe there's no point in trying to develop a unified language for them.

Comment by charlie-steiner on A 'Practice of Rationality' Sequence? · 2020-02-17T07:46:00.928Z · score: 12 (3 votes) · LW · GW

I usefully demonstrated rationality superpowers yesterday by bringing a power strip to a group project with limited power outlets.

Now, you could try to grind this ability by playing improv games with the situations around you, looking for affordances, needs, and solutions. But this is only a sub-skill, and I think most of my utility comes from things that are more like mindset technology.

A personal analogy: If I want to learn the notes of a tune on the flute, it works fine to just play it repeatedly - highly grindable. If I want to make that tune sound better, this is harder to grind but still doable; it involves more skillful listening to others, listening to yourself, talking, trial and error. If I want to improve the skills I use to make tunes sound better, I can make lots of tunes sound better, but less of my skill is coming from grinding now, because that accumulation is slower than other methods of learning. And if I want to improve my ability to learn the skills used in making tunes sound better...

Well, first off, that's a rationality-adjacent skill, innit? But second, grinding that is so slow, and so stochastic, that it's hard to distinguish from just living my life, but just happening to try to learn things, and accepting that I might learn one-time things that obviate a lot of grinding.

So maybe the real grinding was bringing the power strip all along.

Comment by charlie-steiner on The Catastrophic Convergence Conjecture · 2020-02-17T05:08:57.704Z · score: 2 (1 votes) · LW · GW

How much are you thinking about stability under optimization? Most objective catastrophes are also human catastrophes. But if a powerful agent is trying to achieve some goal while avoiding objective catastrophes, it seems like it's still incentivized to dethrone humans - to cause basically the most human-catastrophic thing that's not objective-catastrophic.

Comment by charlie-steiner on Reference Post: Trivial Decision Problem · 2020-02-17T01:43:58.407Z · score: 2 (1 votes) · LW · GW

I'm definitely satisfied with this kind of content.

The names suggest you're classifying decision procedures by what kind of thoughts they have in special cases. But "sneakily," the point is that this is relevant because these are the kinds of thoughts they have all the time.

I think the next place to go is to put this in the context of methods of choosing decision theories - the big ones being reflective modification and evolutionary/population-level change. Pretty generally it seems like the trivial perspective is unstable under these, but there are some circumstances where it's not.

Comment by charlie-steiner on What can the principal-agent literature tell us about AI risk? · 2020-02-10T06:53:59.709Z · score: 11 (6 votes) · LW · GW

Thank you for putting all the time and thoughtfulness into this post, even if the conclusion is "nope, doesn't pan out." I'm grateful that it's out here.

Comment by charlie-steiner on Did AI pioneers not worry much about AI risks? · 2020-02-10T01:10:03.435Z · score: 19 (6 votes) · LW · GW

I think it's mostly (3). Not because AI safety is an outlier, but because of how much work people had to do to come to grips with Moravec's paradox.

If you take someone clever and throw them at the problem of GAI, the first thing they'll think of is something doing logical reasoning, able to follow natural language commands. Their intuition will be based on giving orders to a human. It takes a lot of work to supplant that intuition with something more mechanistic.

Like, it seems obvious to us now that building something that takes natural language commands and actually does what we mean is a very hard problem. But this is exactly a Moravec's paradox situation, because knowing what people mean is mostly effortless and unconscious to us.

Comment by charlie-steiner on The BODY alignment game. Take a break, have a quick play. · 2020-02-08T07:42:30.890Z · score: 3 (2 votes) · LW · GW

Hey, thanks for this congenial reply to my fairly rude comment :)

So, I bring up the military thing because of a roommate of mine, but if I google "military posture tips," I get this page, which basically says that if you're hunched forward, you need to stretch the muscles causing that force, and exercise the muscles that naturally oppose it. In short, get a stronger upper and lower back! They also give specific recommendations (albeit mostly geared towards body-weight exercises easy for a home reader to do).

Comment by charlie-steiner on Synthesizing amplification and debate · 2020-02-06T10:40:46.172Z · score: 4 (2 votes) · LW · GW

I really love the level of detail in this sketch!

I'm mentally substituting for some question more like "should this debate continue?", because I think the setup you describe keeps going until is satisfied with an answer, which might be never for weak . It's also not obvious to me that this reward system you describe actually teaches agents to debate between odd and even steps. If there's a right answer that the judge might be convinced of, I think will be trained to give it no matter the step parity, because when that happens it gets rewarded.

Really, it feels like the state of the debate is more like the state of a RNN, and you're going to end up training something that can make use of that state to do a good job ending debates and making the human response be similar to the model response.

Comment by charlie-steiner on Potential Research Topic: Vingean Reflection, Value Alignment and Aspiration · 2020-02-06T09:26:10.534Z · score: 5 (3 votes) · LW · GW

Thanks! This is an interesting recommendation.

I was definitely struck by the resemblance between her notion of "normative dependence" and the ideas behind the CIRL framework. And I think that the fix to the AI reasoning about something more intelligent is more or less the same thing humans do, which is we abstract away the planning and replace it with some "power" to do something. Like if I imagine playing Magnus Carlsen in chess, I don't simulate a chess game at all, I compare an imaginary chess-winning power I attribute to us in my abstracted mental representation.

But as for the philosophical problems she mentions in the interview, I felt like they fell into pretty standard orthodox philosophical failure modes. For the sake of clarity, I guess I should say I mean the obsession with history, and the default assumption that questions have one right answer - so things I think are boondoggles get addressed just because they're historical, and there's too much worry about what humans "really" are like as opposed to consideration of models of humans.

Comment by charlie-steiner on Writeup: Progress on AI Safety via Debate · 2020-02-06T06:01:31.950Z · score: 7 (3 votes) · LW · GW

You have an entire copy of the post in the commenting guidelines, fyi :)

What's often going on in unresolvable debates among humans is that there is a vague definition baked into the question, such that there is no "really" right answer (or too many right answers).

E.g. "Are viruses alive?"

To the extent that we've dealt with the question of whether viruses are alive, it's been by understanding the complications and letting go of the need for the categorical thinking that generated the question in the first place. Allowing this as an option seems like it brings back down the complexity class of things you can resolve debates on (though if you count "it's a tie" as a resolution, you might retain the ability to ask questions in PSPACE but just have lots of uninformative ties and only update your own worldview when it's super easy).

For questions of value, though, this approach might not even always work, because the question might be "is it right to take action A or action B," and even if you step back from the category "right" because it's too vague, you still have to choose between action A or B. But you still have the original issue that the question has too few / too many right answers. Any thoughts on ways to make debate do work on this sort of tricky problem?

Comment by charlie-steiner on The BODY alignment game. Take a break, have a quick play. · 2020-02-05T05:15:06.325Z · score: 3 (2 votes) · LW · GW

Weird question, why bother mentally mapping two different ends of the same bone (sternum)? In fact, why all this trivia about bone knobs in the first place? If I want good posture, I'd be better off learning the lessons of the military, and if I want to relax, why bone knobs?

Comment by charlie-steiner on Money isn't real. When you donate money to a charity, how does it actually help? · 2020-02-03T04:48:55.476Z · score: 3 (3 votes) · LW · GW

It's fine, people are only 1 layer of unreality removed from money, so they can interact via gravity, which "leaks" into the 4th dimension (explaining why it's so much weaker than the electromagnetic force).

Comment by charlie-steiner on Towards deconfusing values · 2020-02-03T02:09:50.159Z · score: 4 (2 votes) · LW · GW

When you say the human decision procedure causes human values, what I hear is that the human decision procedure (and its surrounding way of describing the world) is more ontologically basic than human values (and their surrounding way of describing the world).

Our decision procedure is "the reason for our values" in the same way that the motion of electric charge in your computer is the reason it plays videogames (even though "the electric charge is moving" and "it's playing a game" might be describing the same physical event). The arrow between them isn't the most typical causal arrow between two peers in a singular way of describing the world, it's an arrow of reduction/emergence, between things at different levels of abstraction.

Comment by charlie-steiner on Instrumental Occam? · 2020-02-01T00:30:38.626Z · score: 7 (2 votes) · LW · GW

Whenever you're picking a program to do a task, you run into similar constraints as when you're trying to pick a program to make predictions. There are an infinite number of programs, so you can't just put a uniform prior on which one is best; you need some way of ranking them. Simplicity is a ranking. Therefore, ranking them by simplicity works, up to a constant. The induction problem really does seem similar - the only difference is that in prediction, the cost function is simple, but in action, the cost function can be complicated.

I don't have a copy of Li and Vitanyi on me at the moment, but I wonder if they have anything on how well SI does if you have a cost function that's a computable function of prediction errors.
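A minimal sketch of what "rank programs for a task by simplicity" could look like in practice, with made-up candidates and an arbitrary exchange rate between description length and task performance:

```python
# Hypothetical candidate programs: (name, description length in bits, task performance).
candidates = [
    ("giant_lookup_table", 5000, 0.99),
    ("small_heuristic",     120, 0.95),
    ("tiny_rule",            40, 0.80),
]

def ranked_by_occam(programs, bits_per_unit_performance=50.0):
    # Performance minus a penalty linear in description length - loosely the same move
    # as weighting programs by a 2^-length prior, applied to acting instead of predicting.
    def value(entry):
        _, length_bits, performance = entry
        return performance - length_bits / bits_per_unit_performance
    return sorted(programs, key=value, reverse=True)

for name, length_bits, performance in ranked_by_occam(candidates):
    print(name, length_bits, performance)
```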

Comment by charlie-steiner on Appendix: how a subagent could get powerful · 2020-01-29T20:34:22.865Z · score: 4 (2 votes) · LW · GW

Basically this is because the agent treats itself specially (imagining intervening on its own goals) but can treat the subagent as a known quantity (which can be chosen to appropriately respond to imagined interventions on the agent's goals)?

Comment by charlie-steiner on Using vector fields to visualise preferences and make them consistent · 2020-01-29T08:27:36.004Z · score: 4 (2 votes) · LW · GW

Sure. But same as DanielV's point about eliminating circularity by just asking for the complete preference ordering, we are limited by what humans can think about.

Humans have to think in terms of high-level descriptions of approximately constant size, no matter the spatiotemporal scale. We literally cannot elicit preferences over universe-histories, much as we'd like to.

What we can do, maybe, is elicit some opinions on these "high-level descriptions of approximately constant size," at many different spatiotemporal scales: ranging from general opinions on how the universe should go to what could improve the decor of your room today. Stitching these together into a utility function over universe histories is pretty tricky, but I think there might be some illuminating simplifications we could think about.

Comment by charlie-steiner on The two-layer model of human values, and problems with synthesizing preferences · 2020-01-29T08:14:42.378Z · score: 7 (3 votes) · LW · GW

My take is that we (the characters) have some wireheadable goals (e.g. curing a headache), but we also have plenty of goals best understood externally.

But the "player" is a less clearly goal-oriented process, and we can project different sorts of goals onto it, ranging from "it wants to make the feedback signal from the cortical neurons predict the output of some simple pattern detector" to "it wants us to avoid spiders" to "it wants us to be reproductively fit."

Comment by charlie-steiner on The two-layer model of human values, and problems with synthesizing preferences · 2020-01-29T08:05:10.924Z · score: 2 (1 votes) · LW · GW

You might have already mentioned this elsewhere, but do you have any reading recommendations for computation and the brain?

Comment by charlie-steiner on Using vector fields to visualise preferences and make them consistent · 2020-01-29T02:18:21.042Z · score: 2 (1 votes) · LW · GW

Here's the fundamental, concrete problem: if you figure out "the best state of the universe," do you just make the universe be in that state for ever and ever? No, that sounds really, really boring.

Which is to say, it's possible for a scheme like this to not even include good results in its hypothesis space.

Also, yeah, figure out how Helmholtz decomposition works. It's useful.
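For reference, a sketch of the decomposition being pointed at: a sufficiently smooth, decaying vector field splits into a curl-free gradient part and a divergence-free rotational part. The reading that the gradient part behaves like a utility function while the rotational part captures circular preferences is my gloss, not a claim from the post.

```latex
% Helmholtz decomposition of a sufficiently smooth, decaying vector field F:
\[
  \mathbf{F} \;=\; -\nabla \varphi \;+\; \nabla \times \mathbf{A},
  \qquad \nabla \times (\nabla \varphi) = \mathbf{0},
  \qquad \nabla \cdot (\nabla \times \mathbf{A}) = 0.
\]
```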

Comment by charlie-steiner on Using vector fields to visualise preferences and make them consistent · 2020-01-29T01:53:24.890Z · score: 5 (3 votes) · LW · GW

One problem with getting peoples' entire ranking at once is that we're cognitively incapable of ranking all states of the universe, so some approximation has to be used.

Your point about the elicitation method is interesting. In some sense, the problem is utterly inescapable, because "what do you think about A?" is literally a different elicitation than "what do you think about B?", prompting the listener to think of different rating criteria.

Comment by charlie-steiner on The two-layer model of human values, and problems with synthesizing preferences · 2020-01-28T00:48:56.755Z · score: 3 (2 votes) · LW · GW

Speaking as a character, I too think the player can just go jump in a lake.

My response to this post is to think about something else instead, so if you'll excuse me getting on a hobby horse...

I agree that when we look at someone making bizarre rationalizations, "their values" are not represented consciously, and we have to jump to a different level to find human values. But I think that conscious->unconscious is the wrong level jump to make.

Instead, the jump I've been thinking about recently is to our own model of their behavior. In this case, our explanation of their behavior relies on the unconscious mind, but in other cases, I predict that we'll identify values with conscious desires when that is a more parsimonious explanation of behavior. An AI learning human values would then not merely be modeling humans, but modeling humans' models of humans. But I think it might be okay if it makes those models out of completely alien concepts (at least outside of deliberately self-referential special cases - there might be an analogy here to the recursive modeling of Gricean communication).

Comment by charlie-steiner on On hiding the source of knowledge · 2020-01-27T03:15:34.102Z · score: 6 (3 votes) · LW · GW

I need to work things out carefully not to obey norms of communication (I have some posts on reasoning inside causal networks that I think literally only Ilya S. put in the effort to decode - weird flex, I know), but to protect me from myself.

But maybe I'm thinking of a different context than you, here. If I was writing about swing dancing or easy physics, I'd probably be a lot happier to trust my gut. But for new research, or in other cases where there's optimization pressure working against human intuition, I think it's better to check, and not much worth reading low-bandwidth text from someone who doesn't check.

Comment by charlie-steiner on On the ontological development of consciousness · 2020-01-27T03:03:49.183Z · score: 2 (1 votes) · LW · GW

This is a really nice way of explaining the "camera-like point of view." Obv. "consciousness" has a bunch of extra grab-bag components that we associate with the word.