Posts

Impossible moral problems and moral authority 2019-11-18T09:28:28.766Z · score: 13 (10 votes)
What's the dream for giving natural language commands to AI? 2019-10-08T13:42:38.928Z · score: 9 (3 votes)
The AI is the model 2019-10-04T08:11:49.429Z · score: 12 (10 votes)
Can we make peace with moral indeterminacy? 2019-10-03T12:56:44.192Z · score: 17 (5 votes)
The Artificial Intentional Stance 2019-07-27T07:00:47.710Z · score: 14 (5 votes)
Some Comments on Stuart Armstrong's "Research Agenda v0.9" 2019-07-08T19:03:37.038Z · score: 22 (7 votes)
Training human models is an unsolved problem 2019-05-10T07:17:26.916Z · score: 16 (6 votes)
Value learning for moral essentialists 2019-05-06T09:05:45.727Z · score: 13 (5 votes)
Humans aren't agents - what then for value learning? 2019-03-15T22:01:38.839Z · score: 20 (6 votes)
How to get value learning and reference wrong 2019-02-26T20:22:43.155Z · score: 40 (10 votes)
Philosophy as low-energy approximation 2019-02-05T19:34:18.617Z · score: 40 (21 votes)
Can few-shot learning teach AI right from wrong? 2018-07-20T07:45:01.827Z · score: 16 (5 votes)
Boltzmann Brains and Within-model vs. Between-models Probability 2018-07-14T09:52:41.107Z · score: 19 (7 votes)
Is this what FAI outreach success looks like? 2018-03-09T13:12:10.667Z · score: 53 (13 votes)
Book Review: Consciousness Explained 2018-03-06T03:32:58.835Z · score: 101 (27 votes)
A useful level distinction 2018-02-24T06:39:47.558Z · score: 26 (6 votes)
Explanations: Ignorance vs. Confusion 2018-01-16T10:44:18.345Z · score: 18 (9 votes)
Empirical philosophy and inversions 2017-12-29T12:12:57.678Z · score: 8 (3 votes)
Dan Dennett on Stances 2017-12-27T08:15:53.124Z · score: 8 (4 votes)
Philosophy of Numbers (part 2) 2017-12-19T13:57:19.155Z · score: 11 (5 votes)
Philosophy of Numbers (part 1) 2017-12-02T18:20:30.297Z · score: 25 (9 votes)
Limited agents need approximate induction 2015-04-24T21:22:26.000Z · score: 1 (1 votes)

Comments

Comment by charlie-steiner on The Goodhart Game · 2019-11-20T18:35:44.828Z · score: 2 (1 votes) · LW · GW

Pretty sure you understood it :) But yeah, not only would I like to be able to compare two things, I'd like to be able to find the optimum values of some continuous variables. Though I suppose it doesn't matter as much if you're trying to check / evaluate ideas that you arrived at by more abstract reasoning.

Comment by charlie-steiner on Cybernetic dreams: Beer's pond brain · 2019-11-20T05:42:23.451Z · score: 2 (1 votes) · LW · GW

I'm also looking forward to upcoming posts, but all these examples so far sound to me like a modernist's substitute for sympathetic magic :P

Comment by charlie-steiner on Drawing on Walls · 2019-11-20T05:30:25.464Z · score: 4 (2 votes) · LW · GW

Sounds like a sales pitch for whiteboard wallpaper :)

Comment by charlie-steiner on The Goodhart Game · 2019-11-20T02:32:20.722Z · score: 2 (1 votes) · LW · GW

The impractical part about training for good behavior is that it's a nested loop - every training example on how to find good maxima requires training a model that in turn needs its own training examples. So it's destined to be behind the state of the art, probably using state of the art models to generate the copious required training data.

The question, I suppose, is whether this is still good enough to learn useful general lessons. And after thinking about it, I think the answer is that yes, it should be, especially for feed-forward architectures that look like modern machine learning, where you don't expect qualitative changes in capability as you scale computational resources.
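
To make the nested-loop structure concrete, here's a minimal sketch (toy objectives and made-up names, nothing from the post): every "outer" example of finding good maxima wraps a full "inner" training run with its own training data.

```python
import random

def true_score(action):
    """Toy 'true' objective: optimum at action = 3."""
    return -abs(action - 3)

def proxy_score(action):
    """Toy proxy: the true objective plus noise."""
    return true_score(action) + random.gauss(0, 1.0)

def train_inner_model(examples):
    """Stand-in for fitting a model: average observed proxy score per action."""
    totals, counts = {}, {}
    for action, score in examples:
        totals[action] = totals.get(action, 0.0) + score
        counts[action] = counts.get(action, 0) + 1
    return {a: totals[a] / counts[a] for a in totals}

# Outer loop: each example of "how to find good maxima" requires an
# entire inner training run that needs its own training examples.
outer_dataset = []
for _ in range(100):
    inner_examples = [(a, proxy_score(a)) for a in range(10) for _ in range(5)]
    model = train_inner_model(inner_examples)   # inner training run
    chosen = max(model, key=model.get)          # maximize the learned proxy
    outer_dataset.append((inner_examples, chosen, true_score(chosen)))

print(len(outer_dataset), "outer examples, each wrapping a full inner training run")
```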

Comment by charlie-steiner on Impossible moral problems and moral authority · 2019-11-19T04:46:44.618Z · score: 3 (2 votes) · LW · GW

Yes, I hope that my framing of the problem supports this sort of conclusion :P

An alternate framing where it still seems important would be "moral uncertainty": when we don't know what to do, it's because we are lacking some facts, maybe even key facts. So I'm sort of sneakily arguing against that frame.

Comment by charlie-steiner on The Power to Draw Better · 2019-11-18T10:40:45.868Z · score: 4 (2 votes) · LW · GW

Any sequence that involves recommending people work through Drawing on the Right Side of the Brain is a sequence I should read :P

Comment by charlie-steiner on Can indifference methods redeem person-affecting views? · 2019-11-17T22:24:51.679Z · score: 2 (1 votes) · LW · GW

You mean, why I expect a person-affecting utility function to be different if evaluated today v. tomorrow?

Well, suppose that today I consider the action of creating a person, and am indifferent to creating them. Since this is true for all sorts of people, I am indifferent to creating them one way vs. another (e.g. happy vs sad). If they are to be created inside my guest bedroom, this means I am indifferent between certain ways the atoms in my guest bedroom could be arranged. Then if this person gets created tonight and is around tomorrow, I'm no longer indifferent between the arrangement that is them sad and the arrangement that is them happy.

Yes, you could always reverse-engineer a utility function over world-histories that encompasses both of these. But this doesn't necessarily solve the problems that come to mind when I say "change in utility functions" - for example, I might take bets about the future that appear lose/lose when I have to pay them off, or take actions that modify my own capabilities in ways I later regret.

I dunno - were you thinking of some specific application of indifference that could sidestep some of these problems?

Comment by charlie-steiner on Can indifference methods redeem person-affecting views? · 2019-11-12T18:03:45.734Z · score: 2 (1 votes) · LW · GW

Hilary Greaves sounds like a really interesting person :)

So, you could use these methods to construct a utility function corresponding to the person-affecting viewpoint from your current world, but this wouldn't protect this utility function from critique. She brings up the Pareto principle, where this person-affecting utility function would be indifferent to some things that were strict improvements, which seems undesirable.

I think the more fundamental problem there is intransitivity. You might be able to define a utility function that captures the person-affecting view to you, but a copy of you one day later (or one world over) would say "hang on, I didn't agree to that." They'd make their own utility function with priorities on different people. And so you end up fighting with yourself, until one of you can self-modify to actually give up the person-affecting view, and just keep this utility function created by their past self.

A more reflective self might try to do something clever like bargaining between all selves they expect to plausibly be (and who will follow the same reasoning), and taking actions that benefit those selves, confident that their other selves will keep their end of the bargain.

My general feeling about population ethics, though, is that it's aesthetics. This was a really important realization for me, and I think most people who think about population ethics don't think about the problem the right way. People don't inherently have utility, utility isn't a fluid stored in the gall bladder, it's something evaluated by a decision-maker when they think about possible ways for the world to be. This means it's okay to have a preferred standard of living for future people, to have nonlinear terms on population and "selfish" utility, etc.

Comment by charlie-steiner on An optimal stopping paradox · 2019-11-12T16:08:37.156Z · score: 1 (2 votes) · LW · GW

If the growth is exponential, I still don't think there's a paradox - sure, you're incentivized to wait forever, but I'm already incentivized to wait forever with my real life investments. The only thing that stops me from real life investing my money forever is that sometimes I have things (not included in the toy problem) that I really want to buy with that money.

Comment by charlie-steiner on What are human values? - Thoughts and challenges · 2019-11-11T17:42:27.545Z · score: 3 (2 votes) · LW · GW

So, the dictionary definition (SEP) would be something like "objectively good/parsimonious/effective ways of carving up reality."

There's also the implication that when we use kinds in reasoning, things of the same kind should share most or all important properties for the task at hand. There's also sort of the implication that humans naively think of the world as made out of natural kinds on an ontologically basic level.

I'm saying that even if people don't believe in disembodied souls, when they ask "what do I want?" they think they're getting an answer back that is objectively a good/parsimonious/effective way of talking. That there is some thing, not necessarily a soul but at least a pattern, that is being accessed by different ways of asking "what do I want?", which can't give us inconsistent answers because it's all one thing.

Comment by charlie-steiner on Neural nets as a model for how humans make and understand visual art · 2019-11-11T15:43:18.386Z · score: 2 (1 votes) · LW · GW

Thanks for the reply :)

Sure, you can get the AI to draw polka-dots by targeting a feature that likes polka dots, or a Mondrian by targeting some features that like certain geometries and colors, but now you're not using style transfer at all - the image is the style. Moreover, it would be pretty hard to use this to get a Kandinsky, because the AI that makes style-paintings has no standard by which it would choose things to draw that could be objects but aren't. You'd need a third and separate scheme to make Kandinskys, and then I'd just bring up another artist not covered yet.

If you're not trying to capture all human visual art in one model, then this is no biggie. So now you're probably going "this is fine, why is he going on about this." So I'll stop.

Do you have examples in mind when you mention "human experience" and "embodiment" and "limited agents"?

For "human experience," yeah, I just means something like communicative/evocative content that relies on a theory of mind to use for communication. Maybe you could train an AI on patriotic paintings and then it could produce patriotic paintings, but I think only by working on theory of mind would an AI think to produce a patriotic painting without having seen one before. I'm also reminded of Karpathy's example of Obama with his foot on the scale.

For embodiment, this means art that blurs the line between visual and physical. I was thinking of how some things aren't art if they're normal sized, but if you make them really big, then they're art. Since all human art is physical art, this line can be avoided mostly but not completely.

For "limited," I imagined something like Dennett's example of the people on the bridge. The artist only has to paint little blobs, because they know how humans will interpret them. Compared to the example above of using understanding of humans to choose content, this example uses an understanding of humans to choose style.

Yet even with zero prior training on visual art they can make pretty impressive images by human lights. I think this was surprising to most people both in and outside deep learning. I'm curious whether this was surprising to you.

It was impressive, but I remember the old 2015 post that Chris Olah co-authored. First off, if you look at the pictures, they're less pretty than the pictures that came later. And I remember one key sentence: "By itself, that doesn’t work very well, but it does if we impose a prior constraint that the image should have similar statistics to natural images, such as neighboring pixels needing to be correlated." My impression is that DeepDream et al. have been trained to make visual art - by hyperparameter tuning (grad student descent).

Comment by charlie-steiner on Neural nets as a model for how humans make and understand visual art · 2019-11-10T17:43:58.728Z · score: 2 (1 votes) · LW · GW

I like this exposition, but I'm still skeptical about the idea.

Since "art" is a human concept, it's naturally a grab bag of lots of different meanings. It's plausible that for some meanings of "art," humans do something similar to searching through a space of parameters for something that strongly activates some target concept within the constraints of a style. But there's also a lot about art that's not like that.

Like art that's non-representational, or otherwise denies the separation between form and content. Or art that's heavily linguistic, or social, or relies on some sort of thinking on the part of the audience. Art that's very different for the performer and the audience, so that it doesn't make sense to talk about a search process optimizing for the audience's experience, or otherwise doesn't have a search process as a particularly simple explanation. Art that's so rooted in emotion or human experience that we wouldn't consider an account of it complete without talking about the human experience. Art that only makes sense when considering humans as embodied, limited agents.

So if I consider the statement "the DeepDream algorithm is doing art," there is a sense in which this is reasonable. But I don't think that extends to calling what DeepDream does a model for what humans do when we think about or create art. We do something not merely more complicated in the details, but more complicated in its macro-structure, and hooked into many of the complications of human psychology.

Comment by charlie-steiner on The Credit Assignment Problem · 2019-11-09T18:42:44.867Z · score: 2 (1 votes) · LW · GW

Dropout is like the converse of this - you use dropout to assess the non-outdropped elements. This promotes resiliency to perturbations in the model - whereas if you evaluate things by how bad it is to break them, you could promote fragile, interreliant collections of elements over resilient elements.

I think the root of the issue is that this Shapley value doesn't distinguish between something being bad to break, and something being good to have more of. If you removed all my blood I would die, but that doesn't mean that I would currently benefit from additional blood.

Anyhow, the joke was that as soon as you add a continuous parameter, you get gradient descent back again.

Comment by charlie-steiner on Open & Welcome Thread - November 2019 · 2019-11-09T14:57:56.912Z · score: 2 (1 votes) · LW · GW

0.3 mg melatonin an hour before I want to be asleep works; my only trouble is actually planning in advance.

Comment by charlie-steiner on The Credit Assignment Problem · 2019-11-09T12:39:55.024Z · score: 4 (2 votes) · LW · GW
You look at the world, and you say: "how can I maximize utility?" You look at your beliefs, and you say: "how can I maximize accuracy?" That's not a consequentialist agent; that's two different consequentialist agents!

Not... really? "how can I maximize accuracy?" is a very liberal agentification of a process that might be more drily thought of as asking "what is accurate?" Your standard sequence predictor isn't searching through epistemic pseudo-actions to find which ones best maximize its expected accuracy, it's just following a pre-made plan of epistemic action that happens to increase accuracy.

Though this does lead to the thought: if you want to put things on equal footing, does this mean you want to describe a reasoner that searches through epistemic steps/rules like an agent searching through actions/plans?

This is more or less how humans already conceive of difficult abstract reasoning. We don't solve integrals by gradient descent, we imagine doing some sort of tree search where the edges are different abstract manipulations of the integral. But for everyday reasoning, like navigating 3D space, we just use our specialized feed-forward hardware.

Comment by charlie-steiner on The Credit Assignment Problem · 2019-11-09T08:56:03.226Z · score: 2 (1 votes) · LW · GW

Removing things entirely seems extreme. How about having a continuous "contribution parameter," where running the algorithm without an element would correspond to turning this parameter down to zero, but you could also set the parameter to 0.5 if you wanted that element to have 50% of the influence it has right now? Then you can send rewards to elements if increasing their contribution parameter would improve the decision.

:P
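
To spell out the joke a bit, here's a toy sketch (made-up numbers and names, nothing from the post): the "reward for increasing an element's contribution parameter" ends up being a finite-difference estimate of a gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
element_outputs = rng.normal(size=5)      # what each element contributes
contributions = np.full(5, 0.5)           # every element starts at 50% influence

def decision_quality(c):
    decision = np.dot(c, element_outputs)  # decision = contribution-weighted sum
    return -(decision - 1.0) ** 2          # quality = closeness to a target of 1.0

eps = 1e-4
for i in range(5):
    bumped = contributions.copy()
    bumped[i] += eps
    reward = (decision_quality(bumped) - decision_quality(contributions)) / eps
    print(f"element {i}: reward for contributing more = {reward:+.3f}")
# ...which is just a gradient estimate, hence the ":P" above.
```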

Comment by charlie-steiner on What AI safety problems need solving for safe AI research assistants? · 2019-11-06T09:31:41.000Z · score: 2 (1 votes) · LW · GW

It seems like the main problem is making sure nobody's getting systematically misled. To help humans make the right updates, the AI has to communicate not only accurate results, but well-calibrated uncertainties. It also has to interact with humans in a way that doesn't send the wrong signals (more a problem to do with humans than to do with AI).

This is very much on the near-term side of the near/long term AI safety work dichotomy. We don't need the AI to understand deception as a category, and why it's bad, so that it can make plans that don't involve deceiving us. We just need its training / search process (which we expect to more or less understand) to suppress incentives for deception to an acceptable range, on a limited domain of everyday problems.

(I'm probably a bigger believer in the significance of this dichotomy than most. I think looking at an AI's behavior and then tinkering with the training procedure to eliminate undesired behavior in the training domain is a perfectly good approach to handling near-term misalignment like overconfident advisor-chatbots, but eventually we want to switch over to a more scalable approach that will use few of the same tools.)

Comment by charlie-steiner on Normative reductionism · 2019-11-06T08:36:03.116Z · score: 2 (1 votes) · LW · GW

I would prefer someone not completely lie to me about the world, even if they're confident I won't ever find out.

Comment by charlie-steiner on The Simulation Epiphany Problem · 2019-11-06T05:14:32.306Z · score: 3 (2 votes) · LW · GW

Right, this is a sort of incentive for deception. The deception is working fine at getting the objective; we want to ultimately solve this problem by changing the objective function so that it properly captures our dislike of deception (or of having to get up and carry a robot, or whatever), not by changing the search process to try to get it to not consider deceptive hypotheses.

Comment by charlie-steiner on What are human values? - Thoughts and challenges · 2019-11-03T13:35:25.419Z · score: 4 (2 votes) · LW · GW

I think this hits the nail on the head. When we do an internal query for "what do I want?", we get some unprincipled mixture of these things (depending heavily on context, priming, etc), and our instinctual reaction is to paper over this variation and adamantly insist that what we get from internal queries must be drawn from a natural kind.

Comment by charlie-steiner on Is requires ought · 2019-10-31T08:00:50.957Z · score: 2 (1 votes) · LW · GW

I guess I'm just still not sure what you expect the oughts to be doing.

Is the sort of behavior you're thinking of like "I ought not to be inconsistent" being one of your "oughts," and leading to various epistemological actions to avoid inconsistency? This seems to me plausible, but it also seems to be almost entirely packed into how we usually define "rational" or "rich internal justificatory structure" or "sufficiently smart."

One could easily construct a competent system that did not represent its own consistency, or represented it but took certain actions that systematically failed to avoid inconsistency. To which you would say "well, that's not sufficiently reflective." What we'd want, for this to be a good move, is for "reflective" (or "smart," "rich structure," "rational," etc) to be a simple thing that predicts the "oughts" neatly. But the "oughts" you describe seem to be running on a model of world-modeling / optimization that is more complicated than strictly necessary for an optimizer, and adding slightly more complication with each ought (though not as much as is required to specify each one separately).

I think one of the reasons people are poking holes or bringing up non-"ought"-compliant agents is that we expect humans to sometimes be non-compliant too. This goes back to my question of whether every agent has some oughts, or whether every (sufficiently smart/rational/etc) agent would be impacted by every ought. If you give me a big list of oughts, I'll give you a big list of ways humans violate them.

I thought at first that your post was about there being some beliefs with unusual properties, labeled "oughts," that everyone has to have some of. But now I think you're claiming that there is some big bundle of oughts that everyone (who is sufficiently X/Y/Z) has all of, and my response is that I'm totally unconvinced that X/Y/Z is in fact a neutral way of ranking systems we want to talk about with the language of epistemology.

Comment by charlie-steiner on How do you assess the quality / reliability of a scientific study? · 2019-10-30T09:03:36.807Z · score: 9 (4 votes) · LW · GW

Here's an answer for condensed matter physics:

Step 1: Read the title, journal name, author list, and affiliations.

By reading papers in a field, talking to people in the field, and generally keeping track of the field as a social enterprise, you should be able to place papers in a context even before reading them. People absolutely have reputations, and that should inform your priors. You should also have an understanding of what the typical research methods are to answer a certain question - check either the title or the abstract to make sure that the methods used match the problem.

Actually, you know what?

Step 0: Spend years reading papers and keeping track of people to develop an understanding of trust and reputation as various results either pan out or don't. Read a few textbooks to understand the physical basis of the commonly-used experimental and theoretical techniques, then check that understanding by reading more papers and keeping track of what kind of data quality is the standard in the field, how techniques are best applied, and which techniques and methods of analysis provide the most reliable results.

For example, by combining steps 0 and 1, you can understand that certain experimental techniques might be more difficult and easier to fool yourself with, but might be the best method available for answering some specific question. If you see a paper applying this technique to this sort of question, this actually should increase your confidence in the paper relative to the base rate for this technique, because it shows that the authors are exercising good judgment. Next...

Step 2: Read the abstract and look at the figures.

This is good for understanding the paper too, not just evaluating trustworthiness. Look for data quality (remember that you learned how to judge the data quality of the most common techniques in step 0) and whether they've presented it in a way that clearly backs up the core claims of the abstract, or presents the information you're trying to learn from the paper. Data that is merely suggestive of the authors' claims is actually a red flag, because remember, everyone just presents the nicest figure they can. Responsible scientists reduce their claims when the evidence is weak.

Step 3: Read the paper.

If you have specific parts you know you care about, you can usually just read those in detail and skim the rest. But if you really care about assessing this particular paper, check the procedures and compare them to your knowledge of how this sort of work should go. If there are specific parts that you want to check yourself, and you can do so, do so. This is also useful so you can...

Step 4: Compare it to similar papers.

You should have background knowledge, but it's also useful to keep similar papers (both in terms of what methods they used, and what problem they studied) directly on hand if you want to check something. If you know a paper that did a similar thing, use that to check their methods. Find some papers on the same problem and cross-check how they present the details of the problem and the plausibility of various answers, to get a feel for the consensus. Speaking of consensus, if there are two similar papers from way in the past that you found via Google Scholar and one of them has 10x the citations of the other, take that into account. When you notice confusing statements, you can check those similar papers to see how they handled it. But once you're really getting into the details, you'll have to...

Step 5: Follow up citations for things you don't understand or want to check.

If someone is using a confusing method or explanation, there should be a nearby citation. If not, that's a red flag. Find the citation and check whether it supports the claim in the original paper (recursing if necessary). Accept that this will require lots of work and thinking, but hey, at least this feeds back into step 0 so you don't have to do it as much next time.

Step 6: Ask a friend.

There are smart people out there. Hopefully you know some, so that if something seems surprising and difficult to understand, you can ask them what they think about it.

Comment by charlie-steiner on Learning from other people's experiences/mistakes · 2019-10-30T07:18:05.247Z · score: 7 (5 votes) · LW · GW

One thing I have learned I don't do enough is to just ask them. Learn about working out by asking muscly acquaintances. Learn about job applications by asking someone who works in the field you want to work in. Etc. Find a place that's suitable for talking and ask them to talk to you. People are happy to tell you all sorts of stuff they think you should know, and it's a way larger number of bits per minute than trying to infer what their lives are like from afar, or neutrally observing them like they're the subject of a nature documentary.

And then a key step 2 for getting advice on anything you have preconceptions about is to actually consider that they can be right and you can be wrong. This isn't about following orders, this is asking various people for their advice in order to gain information, and then not throwing that information in the dumpster by only listening to people you already agree with.

Naturally this is largely advice to my past self, whose biases you (Dear reader) might not share.

Comment by charlie-steiner on Is requires ought · 2019-10-29T23:29:07.611Z · score: 2 (1 votes) · LW · GW

Fair enough. But that "compelling" wasn't so much about compelled agreement, and more about compelled action ("intrinsically motivating", as they say). It's impressive if all rational agents agree that murder is bad, but it doesn't have the same oomph if this has no effect on their actions re: murder.

Comment by charlie-steiner on Is requires ought · 2019-10-29T23:06:58.979Z · score: 2 (1 votes) · LW · GW

Have you set up your definitions in such a way that a system can use language to coordinate with allies even in highly abstract situations, but you would rule it out as "actually making claims" depending on whether you felt it was persuadable by the right arguments? In this case, you are right by definition.

Re:visual cortex, the most important point is that knowledge of my visual cortex, "ought"-type or not, is not necessary. People believed things just fine 200 years ago. Second, I don't like the language that my visual cortex "passes information to me." It is a part of me. There is no little homunculus in my head getting telegraph signals from the cortices, it's just a bunch of brain in there.

Comment by charlie-steiner on Is requires ought · 2019-10-28T19:58:22.234Z · score: 3 (2 votes) · LW · GW

If by "ought" claims you mean things we assign truth values that aren't derivable from is-statements, then I agree that humans require such beliefs to function. Maybe we could describe choice of a universal Turing machine as such a belief for a Solomonoff inductor.

If by "ought" statements you mean the universally compelling truths of moral realism, then no, it seems straightforward to produce counterexample thinkers that would not be be compelled. As far as I can tell, the things you're talking about don't even set a specific course of action for the thing believing them, they have no necessary function beyond the epistemic.

I think there's some dangerous reasoning here around the idea of "why." If I believe that a plate is on the table, I don't need to know anything at all about my visual cortex to believe that. The explanation is not a part of the belief, nor is it inseparably attached, nor is it necessary for having the belief, it's a human thing that we call an explanation in light of fulfilling a human desire for a story about what is being explained.

Comment by charlie-steiner on Human-AI Collaboration · 2019-10-26T09:50:46.178Z · score: 6 (3 votes) · LW · GW
... But when you update on the evidence that you see, you are learning about humans? I'm not sure why you say this is "not learn[ing] about humans at all".

Maybe I should retract this to "not learning about humans at train time," but on the other hand, maybe not. The point here is probably worth explaining, and then some rationalist taboo is probably in order.

What's that quote (via Richard Rorty summarizing Donald Davidson)? "If you believe that beavers live in deserts, are pure white in color, and weigh 300 pounds when adult, then you do not have any beliefs, true or false, about beavers." There is a certain sort of syntactic aboutness that we sometimes care about, not merely that our model captures the function of something, but that we can access the right concept via some specific signal.

When you train the AI on datasets of human behavior, the sense in which it's "learning about humans" isn't merely related to its function in a specific environment at test time, it's learning a model that is forced to capture human behavior in a wide variety of contexts, and it's learning this model in such a way that you the programmer can access it later to make use of it for planning, and be confident that you're not like the person trying to use the label "beaver" to communicate with someone who thinks beavers live in deserts.

When the purely-adaptive AI "learns about humans" during test time, it has fewer of those nice properties. It is not forced to make a broad model of humans, and in fact it doesn't need to distinguish humans from complicated human-related parts of the environment. If you think humans come with wifi and cellphone reception, and can use their wheels to travel at speeds upwards of 150 kph, I'm suspicious about your opinion on how to satisfy human values.

Also, I disagree with "small" in that quote, but that's probably not central.
Also, our AI systems are not going to be powerful enough to use a simplicity prior.

Fair enough (though it can be approximated surprisingly well, and many effective learning algorithms aspire to similar bounds on error relative to best in class). So do you think this means that pre-training a human model will in general be a superior solution to having the AI adapt to its environment? Or just that it will be important in enough specific cases (e.g. certain combinations of availability of human data, ease of simulation of the problem, and ease of extracting a reward signal from the environment) that the "engineers on the ground" will sit up and take notice?

Comment by charlie-steiner on Human-AI Collaboration · 2019-10-23T18:52:14.703Z · score: 2 (1 votes) · LW · GW

What do you think about the possibility that, in practice, a really good strategy might be to not learn about humans at all, but just to learn to adapt to whatever player is out there (if you're powerful enough to use a simplicity prior, you only make a small finite number of mistakes relative to the best hypothesis in your hypothesis space)? I think it might exacerbate the issues CIRL has with distinguishing humans from the environment.

Comment by charlie-steiner on Deliberation as a method to find the "actual preferences" of humans · 2019-10-23T07:43:42.827Z · score: 2 (1 votes) · LW · GW

Thanks for the post!

You definitely highlight that there's a continuum here, from "most deliberation-like" being actual humans sitting around thinking, to "least deliberation-like" being highly-abstracted machine learning schemes that don't look much at all like a human sitting around thinking, and in fact extend this continuum past "doing meta-philosophy" and towards the realm of "doing meta-ethics."

However, I'd argue against the notion (just arguing in general, not sure if this was your implication) that this continuum is about a tradeoff between quality and practicality - that "more like humans sitting around thinking is better, if we can do it." I think deliberation has some bad parts - dumb things humans will predictably do, subagent-alignment problems humans will inevitably cause, novel stimuli humans will predictably be bad at evaluating. FAI schemes that move away from pure deliberation might not just do so for reasons of practicality, they might be doing it because they think it actually serves their purpose best.

Comment by charlie-steiner on What are some unpopular (non-normative) opinions that you hold? · 2019-10-23T06:49:16.468Z · score: 22 (9 votes) · LW · GW

Humans are incredibly insecure systems, so compelling arguments can be made for almost any position that anyone takes seriously. Political, identity, and commercialized issues are where you'll find the most pre-existing examples, simply because that's where people have incentives (psychological or tangible) to make arguments whether or not a position is true.

I guess you're asking for examples that we (presumed intellectuals) find most compelling, but note that there's a serious selection effect going on here, because now you're not selecting merely contrarian ideas, you're selecting contrarian ideas and arguments that are pre-filtered for appeal to the sort of person you're interested in researching. You'll get a very different set of ideas and arguments here than if you ask alternative medicine practitioners what arguments they find compelling. And if you use these different sets of arguments in a study, I predict you'll find they convince quite different sets of people.

To give a really on the nose example, consider the contrarian position "I have the power to make a rubber band colder than the surrounding room just by pulling on it." There are two different convincing arguments for this, which might convince very different groups.

One argument is that this is actually a fact of thermodynamics, because rubber bands actually become more ordered when you stretch them (like straightening out a tangled string) and more disordered when allowed to relax, and this actually causes a change in entropy, which causes a change in temperature, and so they become colder when you pull on them.

This is a fairly convincing argument, especially in our society where we might be disposed to believe a sciencey argument just on its tone and vocabulary.

Another argument is that I know this because late one night I was playing around with a rubber band, and I noticed that if I focused really hard on the temperature of the rubber band, it became colder when I pulled on it to stretch it out. I think this is just one of those weird facts about humans - like have you ever grabbed something hot, like a hot pan, and thought for sure that you should have a burn, but weren't burned at all? There are some unexplained phenomena that would make a lot more sense if humans sometimes had a special connection with the heat of their surroundings, and I promise you that I've noticed that I can do this with rubber bands, and I'm sure if you concentrate, you can too.

Comment by charlie-steiner on What are some unpopular (non-normative) opinions that you hold? · 2019-10-23T05:57:14.249Z · score: 2 (3 votes) · LW · GW

Indeed. Have a compensatory upvote.

Comment by charlie-steiner on All I know is Goodhart · 2019-10-22T03:52:42.955Z · score: 2 (1 votes) · LW · GW

As far as I can tell we're not actually dividing the space of W's by a plane, we're dividing the space of E(W|π)'s by a plane. We don't know for certain that U-V is negative, we merely think so in expectation. This leads to the Bayesian correction for the Optimizer's curse, which lets us do better when presented with lots of options with different uncertainties, but if the uncertainty is fixed it won't let us pick a strategy that does better than the one that maximizes the proxy.
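
For concreteness, here's a toy sketch of that Bayesian correction (my own made-up numbers, nothing from the post): shrink each noisy estimate toward the prior mean in proportion to its uncertainty before taking the argmax.

```python
import numpy as np

rng = np.random.default_rng(1)
prior_mean, prior_var = 0.0, 1.0
true_values = rng.normal(prior_mean, np.sqrt(prior_var), size=5)
noise_vars = np.array([0.1, 0.1, 0.1, 4.0, 4.0])        # unequal uncertainties
estimates = true_values + rng.normal(0.0, np.sqrt(noise_vars))

# Posterior mean with a normal prior and normal measurement noise:
shrinkage = prior_var / (prior_var + noise_vars)
posterior_means = prior_mean + shrinkage * (estimates - prior_mean)

print("naive pick:    ", int(np.argmax(estimates)))
print("corrected pick:", int(np.argmax(posterior_means)))
# With equal noise_vars the shrinkage is uniform, so the argmax never changes -
# the correction only helps when uncertainties differ across options.
```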

Comment by charlie-steiner on We tend to forget complicated things · 2019-10-22T02:59:49.709Z · score: 4 (2 votes) · LW · GW

Some of the benefit to me is "knowing how" rather than "knowing what". Maybe the hardest thing I did more than 10 years ago that's unrelated to what I do now was write a MIDI encoder/decoder. I couldn't write one right now - in fact, I've completely forgotten all the important details of the MIDI specifications. But I could write one in a couple days, way easier than the first time, because I know more or less what I did the first time, and I tautologically got lots of practice in the sub-skills that were used the most, even if I don't remember the details.

So it really does depend on what I want to use the knowledge for.

Comment by charlie-steiner on [AN #69] Stuart Russell's new book on why we need to replace the standard model of AI · 2019-10-21T21:21:26.997Z · score: 2 (1 votes) · LW · GW

I'm just a little leery of calling things "wrong" when it makes the same predictions about observations as being "right." I don't want people to think that we can avoid "wrong ontologies" by starting with some reasonable-sounding universal prior and then updating on lots of observational data. Or that something "wrong" will be doing something systematically stupid, probably due to some mistake or limitation that of course the reader would never program into their AI.

Comment by charlie-steiner on [AN #69] Stuart Russell's new book on why we need to replace the standard model of AI · 2019-10-20T00:26:08.284Z · score: 5 (2 votes) · LW · GW
As with the previous paper, this argument is only really a problem when the agent's belief about the reward function is wrong: if it is correct, then at the point where there is no more information to gain, the agent should already know that humans don't like to be killed, do like to be happy, etc.

There's also the scenario where the AI models the world in a way that has as good or better predictive power than our intentional stance model, but this weird model assigns undesirable values to the AI's co-player in the CIRL game. We can't rely on the agent "already knowing that humans don't like to be killed," because the AI doesn't have to be using the level of abstraction on which "human" or "killed" are natural categories.

Comment by charlie-steiner on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-19T05:07:41.378Z · score: 2 (1 votes) · LW · GW

The difficulty is that we want to take human values and put them into an AI that doesn't do prediction error minimization in the human sense, but instead does superhumanly competent search and planning. But if you have a specific scheme in mind that could outperform humans without leaving anything out, I'd be super interested.

Comment by charlie-steiner on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-18T02:29:15.977Z · score: 4 (2 votes) · LW · GW

If you define what humans want in terms of states of the brain, and you don't want the AI to just intervene directly on peoples' brains, there's a lot of extra work that has to happen, which I think will inevitably "de-purify" the values by making them dependent on context and on human behavior. Here's what I think this might look like:

You have some model ("minimize prediction error") that identifies what's good, and try to fit the brain's actual physiology to this model, in order to identify what's going on physically when humans' values are satisfied. But of course what humans want isn't a certain brain state, humans want things to happen in the world. So your AI needs to learn what changes in the world are the "normal human context" in which it can apply this rating based on brain states. But heroin still exists, so if we don't want it to learn that everyone wants heroin, this understanding of the external world has to start getting really value-laden, and maybe value-laden in a way based on human behaviors and not just human brain states.

One further thing to think about: this doesn't engage with meta-ethics. Our meta-ethical desires are about things like what our desires should be, what are good procedural rules of decision-making (simple decision-making procedures often fail to care about continuity of human identity), and how to handle population ethics. The learn-from-behavior side puts these on equal footing with our desire for e.g. eating tasty food, because they're all learned from human behavior. But if you ground our desire for tasty food in brain structure, this at the very least puts opinions on stuff like tasty food and opinions on stuff like theory of identity on very different footings, and might even cause some incompatibilities. Not sure.

Overall I think reading this post increased how good of an idea I think it is to try to ground human liking in terms of a model of brain physiology, but I think this information has to be treated quite indirectly. We can't just give the AI preferences over human brain states, it needs to figure out what these brain states are referring to in the outside world, which is a tricky act of translation / communication in the sense of Quine and Grice.

Comment by charlie-steiner on Gradient hacking · 2019-10-16T23:18:36.635Z · score: 2 (1 votes) · LW · GW

If the inner optimizer only affects the world by passing predictions to the outer model, the most obvious trick is to assign artificially negative outcomes to states you want to avoid (e.g. which the inner optimizer can predict would update the outer model in bad ways), which then never get checked by virtue of being too scary to try. What are the other obvious hacks?

I guess if you sufficiently control the predictions, you can just throw off the pretense and send the outer model the prediction that giving you a channel to the outside would be a great idea.

Comment by charlie-steiner on Always Discard Fascist Policies · 2019-10-16T20:44:30.367Z · score: 6 (4 votes) · LW · GW

Reading this makes me realize what a great game Werewolf One Night is, because the secret information encourages you to lie even when you're in the majority, which is much more interesting than just one group telling the truth and the other group lying.

Comment by charlie-steiner on What's the dream for giving natural language commands to AI? · 2019-10-13T23:52:13.700Z · score: 4 (2 votes) · LW · GW
What's the input-output function in the two cases?

Good question :) We need the AI to have a persistent internal representation of the world so that it's not limited to preferences directly over sensory inputs. Many possible functions would work, and in various places (like comparison to CIRL), I've mentioned that it would be really useful to have some properties of a hierarchical probabilistic model, but as an aid to imagination I mostly just thought of a big ol' RNN.

We want the world model to share associations between words and observations, but we don't want it to share dynamics (one text-state following another is a very different process from one world-state following another). It might be sufficient for the encoding/decoding functions from observations to be RNNs, and the encoding/decoding functions from text just to be non-recurrent neural networks on patches of text.

That is, if we call the text $x_t$, the observations (at time $t$) $o_t$, and the internal state $s_t$, we'd have the encoding function $s_t = f(o_t, s_{t-1})$, decoding something like $o_{t+1} \approx g(s_t)$, and also $s_t = h(x_t)$ and $x_t = k(s_t)$. And then you could compose these functions to get things like $k(f(o_t, s_{t-1}))$. Does this answer your question, and do you think it brings new problems to light? I'm more interested in general problems or patterns than in problems specific to RNNs (like initialization of the state), because I'm sort of assuming that this is just a placeholder for future technology that would have a shot at learning a model of the entire world.
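
If it helps to see the shape of this, here's a minimal sketch of that setup (the layer types, sizes, and names are my own illustrative choices, and the real thing wouldn't be this simple): a recurrent encoder/decoder for observations, a non-recurrent one for text, sharing one internal state so that associations, but not dynamics, are shared.

```python
import torch
import torch.nn as nn

STATE, OBS, TEXT = 64, 32, 128

class SharedWorldModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.obs_encoder = nn.GRUCell(OBS, STATE)   # s_t = f(o_t, s_{t-1}), recurrent
        self.obs_decoder = nn.Linear(STATE, OBS)    # o_{t+1} ~ g(s_t)
        self.text_encoder = nn.Linear(TEXT, STATE)  # s = h(x), no recurrence
        self.text_decoder = nn.Linear(STATE, TEXT)  # x = k(s)

    def describe(self, obs, prev_state):
        """Compose observation-encoding with text-decoding: k(f(o_t, s_{t-1}))."""
        state = self.obs_encoder(obs, prev_state)
        return self.text_decoder(state), state

model = SharedWorldModel()
obs = torch.randn(1, OBS)
state = torch.zeros(1, STATE)
text_logits, state = model.describe(obs, state)
print(text_logits.shape)  # torch.Size([1, 128])
```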

For example, I would say that a brain has one world model that is interlinked with speech and vision and action, etc. Right?

Right. I sort of flip-flop on this, also calling it "one simultaneous model" plenty. If there are multiple "models" in here, it's because different tasks use different subsets of its parts, and if we do training on multiple tasks, those subsets get trained together. But of course the point is that the subsets overlap.

Comment by charlie-steiner on What's going on with "provability"? · 2019-10-13T20:22:43.880Z · score: 2 (1 votes) · LW · GW

Let me mention my favorite intuition pump against the axiom of choice - the prisoners with infinite hats. For any finite number of prisoners, if they can't communicate they can't even do better than chance, let alone saving all but a tiny fraction. But as soon as there are infinitely many, there's some strange ritual they can do that lets them save all but an infinitely small fraction. This is unreasonable.

The issue is that once you have infinite prisoners you can construct these janky non-measurable sets that aren't subject to the laws of probability theory. There's an argument to be made that these are a bigger problem than the axiom of choice - the axiom of choice is just what lets you take the existence of these janky, non-constructive sets and declare that they give you a recipe for saving prisoners.

Comment by charlie-steiner on Thoughts on "Human-Compatible" · 2019-10-10T16:28:11.862Z · score: 4 (2 votes) · LW · GW

The class of non-agent AIs (not choosing actions based on the predicted resulting utility) seems very broad. We could choose actions alphabetically, or use an expert system representing the outside view, or use a biased/inaccurate model when predicting consequences, or include preferences about which actions are good or bad in themselves.

I don't think there's any general failure mode (there are certainly specific ones), but if we condition on this AI being selected by humans, maybe we select something that's doing enough optimization that it will take a highly-optimizing action like rewriting itself to be an agent.

Comment by charlie-steiner on Characterizing Real-World Agents as a Research Meta-Strategy · 2019-10-09T20:34:40.377Z · score: 2 (1 votes) · LW · GW

Somehow I missed that second post of yours. I'll try out the subscribe function :)

Do you also get the feeling that you can sort of see where this is going in advance?

When asking what computations a system instantiates, it seems you're asking what models (or what fits to an instantiated function) perform surprisingly well, given the amount of information used.

To talk about humans wanting things, you need to locate their "wants." In the simple case this means knowing in advance which model, or which class of models, you are using. I think there are interesting predictions we can make about taking a known class of models and asking "does one of these do a surprisingly good job at predicting a system in this part of the world including humans?"

The answer is going to be yes, several times over - humans, and human-containing parts of the environment, are pretty predictable systems, at multiple different levels of abstraction. This is true even if you assume there's some "right" model of humans and you get to start with it, because this model would also be surprisingly effective at predicting e.g. the human+phone system, or humans at slightly lower or higher levels of abstraction. So now you have a problem of underdetermination. What to do? The simple answer is to pick whatever had the highest surprising power, but I think that's not only simple but also wrong.

Anyhow, since you mention you're not into hand-coding models of humans where we know where the "wants" are stored, I'd be interested in your thoughts on that step too, since just looking for all computations that humans instantiate is going to return a whole lot of answers.

Comment by charlie-steiner on Embedded Agency via Abstraction · 2019-10-09T08:36:48.886Z · score: 2 (1 votes) · LW · GW

Obviously if I have to forget the sum, I just want to know die1 - die2? The only problem is that the signed difference looks like a uniform distribution with width dependent on the sum - the signed difference can range from 11 possibilities (-5 to 5) down to 1 (0).

So what I think you do is you put all the differences onto the same scale by constructing a "unitless difference," which will actually be defined as a uniform distribution.

Rather than having the difference be a single number in a chunk of the number line that changes in size, you construct a big set of ordered points of fixed size equal to the least common multiple of the number of possible differences for all sums. If you think of a difference not as a number, but as a uniform distribution on the set of possible differences, then you can just "scale up" this distribution from its set of variable size into the big set of constant size, and sample from this distribution to forget the sum but remember the most information about the difference.

EDIT: I shouldn't do math while tired.

Comment by charlie-steiner on Characterizing Real-World Agents as a Research Meta-Strategy · 2019-10-08T18:30:51.575Z · score: 6 (3 votes) · LW · GW

I know this is becoming my schtick, but have you considered the intentional stance? Specifically, the idea that there is no "the" wants and ontology of E. coli, but that we are ascribing wants and world-modeling to it as a convenient way of thinking about a complicated world, and that different specific models might have advantages and disadvantages with no clear winner.

Because this seems like it has direct predictions about where the meta-strategy can go, and what it's based on.

But all this said, I don't think it's hopeless. But it will require abstraction. There is a tradeoff between predictive accuracy of a model of a physical system, and it including anything worth being called a "value," and so you must allow agential models of complicated systems to only be able to predict a small amount of information about the system, and maybe even be poor predictors of that.

Consider how your modeling me as an agent gives you some notion of my abstract wants, but gives you only the slimmest help in predicting this text that I'm writing. Evaluated purely as a predictive model, it's remarkably bad! It's also based at least as much in nebulous "common sense" as it is in actually observing my behavior.

So if you're aiming for eventually tinkering with hand-coded agential models of humans, one necessary ingredient is going to be tolerance for abstraction and suboptimal predictive power. And another ingredient is going to be this "common sense," though maybe you can substitute for that with hand-coding - it might not be impossible, given how simplified our intuitive agential models of humans are.

Comment by charlie-steiner on Human instincts, symbol grounding, and the blank-slate neocortex · 2019-10-05T07:24:14.566Z · score: 4 (3 votes) · LW · GW

This is a really cool post, thanks!

Comment by charlie-steiner on New Petrov Game Brainstorm · 2019-10-05T03:00:58.200Z · score: 2 (1 votes) · LW · GW

Well, if players still have identifying pseudonyms, you could share that you're being attacked by a certain pseudonym to help you update on whether attacks are real or false, and you could try to coordinate to not attack each other if you know your own pseudonym. But even without pseudonyms, you could share timing information, which might be important.

Comment by charlie-steiner on What empirical work has been done that bears on the 'freebit picture' of free will? · 2019-10-05T02:09:32.292Z · score: -1 (5 votes) · LW · GW

The earliest correct answer I know of to the question of "how do we have free will?" comes from St. Augustine, except instead of free will vs. determinism it was free will vs. divine omniscience. God knowing the future, Augustine says, doesn't invalidate our free will, because the cause of the choice still lies within our power, and that's what matters.

So yeah, sorry, I guess you weren't interested in talking about whether this makes any sense in relation to free will, but it does seem relevant when something is about 1500 years out of date.

Though for human amplification of quantum noise, check out the work on perception of single photons.

Comment by Charlie Steiner on [deleted post] 2019-10-04T23:44:34.735Z

Welcome to LessWrong :) If you have not read the Sequences, particularly A Human's Guide to Words, I think you might find it really interesting, and it has some bearing on this question.

Comment by charlie-steiner on New Petrov Game Brainstorm · 2019-10-04T23:38:53.633Z · score: 2 (1 votes) · LW · GW

It looks like you get a big advantage from sharing information with another player. In the absence of this, maybe the best strategy is to respond to all alerts with some probability that you think leads to good dynamics if adopted as a general strategy.