Comment by charlie-steiner on Some Comments on Stuart Armstrong's "Research Agenda v0.9" · 2019-07-16T19:12:44.242Z · score: 2 (1 votes) · LW · GW
"I don't trust humans to be a trusted source when it comes to what an AI should do with the future lightcone."
First, let's acknowledge that this is a new objection you are raising which we haven't discussed yet, eh? I'm tempted to say "moving the goalposts", but I want to hear your best objections wherever they come from; I just want you to acknowledge that this is in fact a new objection :)

Sure :) I've said similar things elsewhere, but I suppose one must sometimes talk to people who haven't read one's every word :P

We're being pretty vague in describing the human-AI interaction here, but I agree that one reason why the AI shouldn't just do what it would predict humans would tell it to do (or, if below some threshold of certainty, ask a human) is that humans are not immune to distributional shift.

There are also systematic factors, like preserving your self-image, that sometimes make humans say really dumb things about far-off situations because of more immediate concerns.

Lastly, figuring out what the AI should do with its resources is really hard, and figuring out which to call "better" between two complicated choices can be hard too, and humans will sometimes do badly at it. Worst case, the humans appear to answer hard questions with certainty, or conversely the questions the AI is most uncertain about slowly devolve into giving humans hard questions and treating their answers as strong information.

I think the AI should actively take this stuff into account rather than trying to stay in some context where it can unshakeably trust humans. And by "take this into account," I'm pretty sure that means model the human and treat preferences as objects in the model.

Skipping over the intervening stuff I agree with, here's that Eliezer quote:

Eliezer Yudkowsky wrote: "If the subject is Paul Christiano, or Carl Shulman, I for one am willing to say these humans are reasonably aligned; and I'm pretty much okay with somebody giving them the keys to the universe in expectation that the keys will later be handed back."
Do you agree or disagree with Eliezer? (In other words, do you think a high-fidelity upload of a benevolent person will result in a good outcome?)
If you disagree, it seems that we have no hope at success whatsoever. If no human can be trusted to act, and AGI is going to arise through our actions, then we can't be trusted to build it right. So we might as well just give up now.

I think Upload Paul Christiano would just go on to work on the alignment problem, which might be useful but is definitely passing the buck.

Though I'm not sure. Maybe Upload Paul Christiano would be capable of taking over the world and handling existential threats before swiftly solving the alignment problem. Then it doesn't really matter if it's passing the buck or not.

But my original thought wasn't about uploads (though that's definitely a reasonable way to interpret my sentence), it was about copying human decision-making behavior in the same sense that an image classifier copies human image-classifying behavior.

Though maybe you went in the right direction anyhow, and if all you had was supervised learning the right thing to do is to try to copy the decision-making of a single person (not an upload, a sideload). What was that Greg Egan book - Zendegi?

so far, it hasn't really proven useful to develop methods to generalize specifically in the case where we are learning human preferences. We haven't really needed to develop special methods to solve this specific type of problem. (Correct me if I'm wrong.)

There are some cases where the AI specifically has a model of the human, and I'd call those "special methods." Not just IRL, the entire problem of imitation learning often uses specific methods to model humans, like "value iteration networks." This is the sort of development I'm thinking of that helps AI do a better job at generalizing human values - I'm not sure if you meant things at a lower level, like using a different gradient descent optimization algorithm.

Comment by charlie-steiner on Some Comments on Stuart Armstrong's "Research Agenda v0.9" · 2019-07-15T01:05:31.307Z · score: 2 (1 votes) · LW · GW

Ah, but I don't trust humans to be a trusted source when it comes to what an AI should do with the future lightcone. I expect you'd run into something like Scott talks about in The Tails Coming Apart As Metaphor For Life, where humans are making unprincipled and contradictory statements, with not at all enough time spent thinking about the problem.

As Ian Goodfellow puts it, machine learning people have already been working on alignment for decades. If alignment is "learning and respecting human preferences", object recognition is "human preferences about how to categorize images", and sentiment analysis is "human preferences about how to categorize sentences"

I somewhat agree, but you could equally well call them "learning human behavior at categorizing images," "learning human behavior at categorizing sentences," etc. I don't think that's enough. If we build an AI that does exactly what a human would do in that situation (or what action they would choose as correct when assembling a training set), I would consider that a failure.

So this is two separate problems: one, I think humans can't reliably tell an AI what they value through a text channel, even with prompting, and two, I think that mimicking human behavior, even human behavior on moral questions, is insufficient to deal with the possibilities of the future.

I've never heard anyone in machine learning divide the field into cases where we're trying to generalize about human values and cases where we aren't. It seems like the same set of algorithms, tricks, etc. work either way.

It also sounds silly to say that one can divide the field into cases where you're doing model-based reinforcement learning, and cases where you aren't. The point isn't the division, it's that model-based reinforcement learning is solving a specific type of problem.

Let me take another go at the distinction: Suppose you have a big training set of human answers to moral questions. There are several different things you could mean by "generalize well" in this case, which correspond to solving different problems.

The first kind of "generalize well" is where the task is to predict moral answers drawn from the same distribution as the training set. This is what most of the field is doing right now for Ian Goodfellow's examples of categorizing images or categorizing sentences. The better we get at generalizing in this sense, the more reproducing the training set corresponds to reproducing the test set.

Another sort of "generalize well" might be inferring a larger "real world" distribution even when the training set is limited. For example, if you're given labeled data mapping handwritten numbers 0-20 to binary outputs, can you give the correct binary output for 21? How about 33? In our moral questions example, this would be like predicting answers to moral questions spawned by novel situations not seen in training. The better we get at generalizing in this sense, the more reproducing the training set corresponds to reproducing examples later drawn from the real world.

Let's stop here for a moment and point out that if we want generalization in the second sense, algorithmic advances in the first sense might be useful, but they aren't sufficient. For the classifier to output the binary for 33, it probably has to be deliberately designed to learn flexible representations, and probably get fed some additional information (e.g. by transfer learning). When the training distribution and the "real world" distribution are different, you're solving a different problem than when they're the same.
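
To make the 0-20 example concrete, here is a minimal Python sketch. The function names and the nearest-neighbour stand-in for "a learner that only knows the training distribution" are illustrative assumptions, not anything taken from a real system. The memorizing learner is perfect in-distribution and has no principled answer for 21 or 33, while the learner that gets them right does so only because positional notation was built in by its designer.

```python
# Training set: the numbers 0..20 paired with their binary strings.
train = {n: format(n, "b") for n in range(21)}

def nearest_neighbour_predict(n):
    """Memorizing learner (hypothetical): label of the closest training input."""
    closest = min(train, key=lambda seen: abs(seen - n))
    return train[closest]

def structured_predict(n):
    """Learner with positional notation built in by its designer (hypothetical)."""
    bits = ""
    while True:
        bits = str(n % 2) + bits
        n //= 2
        if n == 0:
            return bits

in_distribution = list(range(21))
out_of_distribution = [21, 33]

for name, predict in [("nearest neighbour", nearest_neighbour_predict),
                      ("structured", structured_predict)]:
    in_acc = sum(predict(n) == format(n, "b") for n in in_distribution) / len(in_distribution)
    out_acc = sum(predict(n) == format(n, "b") for n in out_of_distribution) / len(out_of_distribution)
    print(f"{name}: in-distribution {in_acc:.2f}, out-of-distribution {out_acc:.2f}")
```

Getting better at the first kind of generalization would not change the nearest-neighbour learner's answer for 21; what helps is extra structure or extra information.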

A third sort of "generalize well" is to learn superhumanly skilled answers even if the training data is flawed or limited. Think of an agent that learns to play Atari games at a superhuman level, from human demonstrations. This generalization task often involves filling in a complex model of the human "expert," along with learning about the environment - for current examples, the model of the human is usually hand-written. The better we get at generalizing in this way, the more the AI's answers will be like "what we meant" (either by some metric we kept hidden from the AI, or in some vague intuitive sense) even if they diverge from what humans would answer.

(I'm sure there are more tasks that fall under the umbrella of "generalization," but you'll have to suggest them yourself :) )

So while I'd say that value learning involves generalization, I think that generalization can mean a lot of different tasks - a rising tide of type 1 generalization (which is the mathematically simple kind) won't lift all boats.

Comment by charlie-steiner on Some Comments on Stuart Armstrong's "Research Agenda v0.9" · 2019-07-14T00:07:18.517Z · score: 2 (1 votes) · LW · GW

Yes, I agree that generalization is important. But I think it's a bit too reductive to think of generalization ability as purely a function of the algorithm.

For example, an image-recognition algorithm trained with dropout generalizes better, because dropout acts like an extra goal telling the training process to search for category boundaries that are smooth in a certain sense. And the reason we expect that to work is because we know that the category boundaries we're looking for are in fact usually smooth in that sense.

So it's not like dropout is a magic algorithm that violates a no-free-lunch theorem and extracts generalization power from nowhere. The power that it has comes from our knowledge about the world that we have encoded into it.

(And there is a no free lunch theorem here. How to generalize beyond the training data is not uniquely encoded in the training data, every bit of information in the generalization process has to come from your model and training procedure.)
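
A toy illustration of that point, as a sketch with made-up data (the three training points and both hypotheses are assumptions chosen for illustration): two functions that agree on every training example but disagree as soon as you step outside it. Nothing in the training set prefers one over the other, so that preference has to come from the model class or the training procedure.

```python
# Made-up training data: inputs 0, 1, 2 with targets equal to the input.
training_data = [(0, 0), (1, 1), (2, 2)]

def hypothesis_linear(x):
    return x

def hypothesis_wiggly(x):
    # Adds a term that vanishes at every training input (x = 0, 1, 2).
    return x + x * (x - 1) * (x - 2)

# Both hypotheses fit the training set perfectly.
for x, y in training_data:
    assert hypothesis_linear(x) == y
    assert hypothesis_wiggly(x) == y

# They come apart on an input outside the training set.
novel_input = 3
print(hypothesis_linear(novel_input), hypothesis_wiggly(novel_input))
```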

For value learning, we want the AI to have a very specific sort of generalization skill when it comes to humans. It has to not only predict human actions, it has to make a very particular sort of generalization ("human values"), and single out part of that generalization to make plans with. The information to pick out one particular generalization rather than another has to come from humans doing hard, complicated work, even if it gets encoded into the algorithm.

Comment by charlie-steiner on Please give your links speaking names! · 2019-07-12T05:54:06.506Z · score: 7 (3 votes) · LW · GW

I'm guilty, I'll try to do better :)

Comment by charlie-steiner on Some Comments on Stuart Armstrong's "Research Agenda v0.9" · 2019-07-12T05:33:35.150Z · score: 2 (1 votes) · LW · GW
I don't understand why you're so confident. It doesn't seem to me that my values are divorced from biology (I want my body to stay healthy) or population statistics (I want a large population of people living happy lives).

When I say your preference is "more abstract than biology," I'm not saying you're not allowed to care about your body, I'm saying something about what kind of language you're speaking when you talk about the world. When you say you want to stay healthy, you use a fairly high-level abstraction ("healthy"), you don't specify which cell organelles should be doing what, or even the general state of all your organs.

This choice of level of abstraction matters for generalization. At our current level of technology, an abstract "healthy" and an organ-level description might have the same outcomes, but at higher levels of technology, maybe someone who preferred to be healthy would be fine becoming a cyborg, while someone who wanted to preserve some lower-level description of their body would be against it.

"Once it starts encoding the world differently than we do, it won't have the generalization properties we want - we'd be caught cheating, as it were."
Are you sure?

I think the right post to link here is this one by Kaj Sotala. I'm not totally sure - there may be some way to "cheat" in practice - but my default view is definitely that if the AI carves the world up along different boundaries than we do, it won't generalize in the same way we would, given the same patterns.

Nice find on the Bostrom quote btw.

I think your claim proves too much. Different human brains have different encodings, and yet we are still able to learn the values of other humans (for example, when visiting a foreign country) reasonably well when we make an honest effort.

I would bite this bullet, and say that when humans are doing generalization of values into novel situations (like trolley problems, or utopian visions), they can end up at very different places even if they agree on all of the everyday cases.

If you succeed at learning the values of a foreigner, so well that you can generalize those values to new domains, I'd suspect that the simplest way for you to do it involves learning about what concepts they're using well enough to do the right steps in reasoning. If you just saw a snippet of their behavior and couldn't talk to them about their values, you'd probably do a lot worse - and I think that's the position many current value learning schemes place AI in.

Comment by charlie-steiner on Some Comments on Stuart Armstrong's "Research Agenda v0.9" · 2019-07-12T05:07:44.006Z · score: 2 (1 votes) · LW · GW

I'd definitely be interested in your thoughts about preferences when you get them into a shareable shape.

In some sense, what humans "really" have is just atoms moving around, all talk of mental states and so on is some level of convenient approximation. So when you say you want to talk about a different sort of approximation from Stuart, the first thing I'm curious about is "how can you make your way of talking about humans convenient for getting an AI to behave well?"

Comment by charlie-steiner on IRL in General Environments · 2019-07-12T04:06:24.205Z · score: 2 (1 votes) · LW · GW

A good starting point. I'm reminded of an old Kaj Sotala post (which then later provided inspiration for me writing a sort of similar post) about trying to ensure that the AI has human-like concepts. If the AI's concepts are inhuman, then it will generalize in an inhuman way, so that something like teaching a policy through demonstrations might not work.

But of course having human-like concepts is tricky and beyond the scope of vanilla IRL.

Comment by charlie-steiner on Learning biases and rewards simultaneously · 2019-07-08T22:18:23.344Z · score: 3 (2 votes) · LW · GW

I like this example of "works in practice but not in theory." Would you associate "ambitious value learning vs. adequate value learning" with "works in theory vs. doesn't work in theory but works in practice"?

One way that "almost rational" is much closer to optimal than "almost anti-anti-rational" is ye olde dot product, but a more accurate description of this case would involve dividing up the model space into basins of attraction. Different training procedures will divide up the space in different ways - this is actually sort of the reverse of a Monte Carlo simulation where one of the properties you might look for is ergodicity (eventually visiting all points in the space).

Some Comments on Stuart Armstrong's "Research Agenda v0.9"

2019-07-08T19:03:37.038Z · score: 22 (7 votes)
Comment by charlie-steiner on Modeling AGI Safety Frameworks with Causal Influence Diagrams · 2019-06-30T17:35:02.741Z · score: 2 (1 votes) · LW · GW

All good points.

The paper you linked was interesting - the graphical model is part of an AI design that actually models other agents using that graph. That might be useful if you're coding a simple game-playing agent, but I think you'd agree that you're using CIDs in a more communicative / metaphorical way?

Comment by charlie-steiner on GreaterWrong Arbital Viewer · 2019-06-28T08:36:32.257Z · score: 6 (3 votes) · LW · GW

Neat! I finally found the Solomonoff induction AI dialogue that someone made me curious about in the previous thread.

Comment by charlie-steiner on Modeling AGI Safety Frameworks with Causal Influence Diagrams · 2019-06-24T06:06:41.494Z · score: 11 (7 votes) · LW · GW

The reason I don't personally find these kinds of representation super useful is because each of those boxes is a quite complicated function, and what's in the boxes usually involves many more bits worth of information about an AI system than how the boxes are connected. And sometimes one makes different choices in how to chop an AI's operation up into causally linked boxes, which can lead to an apples-and-oranges problem when comparing diagrams (for example, the diagrams you use for CIRL and IDI are very different choppings-up of the algorithms).

I actually have a draft sitting around of how one might represent value learning schemes with a hierarchical diagram of information flow. Eventually I decided that the idea made lots of sense for a few paradigm cases and was more trouble than it was worth for everything else. When you need to carefully refer to the text description to understand a diagram, that's a sign that maybe you should use the text description.

This isn't to say I think one should never see anything like this. Different ways of presenting the same information (like diagrams) can help drive home a particularly important point. But I am skeptical that there's a one-size-fits-all solution, and instead think that diagram usage should be tailored to the particular point it's intended to make.

Comment by charlie-steiner on Is your uncertainty resolvable? · 2019-06-21T14:27:36.853Z · score: 5 (3 votes) · LW · GW

I think I must be the odd one out here in terms of comfort using probabilities close to 1 and 0. Because 90% and 99% are not "near certainty" to me.

How sure are you that the English guy who you've been told helped invent calculus and did stuff with gravity and optics was called "Isaac Newton"? We're talking about probabilities like 99.99999% here. (Conditioning on no dumb gotchas from human communication, e.g. me using a unicode character from a different language and claiming it's no longer the same, which has suddenly become much more salient to you and me both. An "internal" probability, if you will.)

Maybe it would help to think of this as about 20 bits of information past 50%? Every bit of information you can specify about something means you are assigning a more extreme probability distribution about that thing. The probability of the answer being "Isaac Newton" has a very tiny prior for any given question, and only rises to 50% after lots of bits of information. And if you could get to 50%, it's not strange that you could have quite a few more bits left over, before eventually running into the limits set by the reliability of your own brain.
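
As a rough check on the arithmetic, here is a small sketch that assumes "bits past 50%" means log-odds measured in bits (that reading, and the function name, are assumptions for illustration). Under that reading 90% is only about 3 bits and 99% about 7, while 99.99999% comes out at roughly 23, the same ballpark as the figure above.

```python
import math

def bits_past_even(p):
    """Log-odds of p in bits: 0 at p = 0.5, growing as p approaches 1."""
    return math.log2(p / (1 - p))

for p in [0.5, 0.9, 0.99, 0.9999999]:
    print(f"p = {p}: {bits_past_even(p):.1f} bits past 50%")
```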

So when you say some plans require near certainty, I'm not sure if you mean what I mean but chose smaller probabilities, or if you mean some somewhat different point about social norms about when numbers are big/small enough that we are allowed to stop/start worrying about them. Or maybe you mean a third thing about legibility and communicability that is correlated with probability but not identical?

Comment by charlie-steiner on Is "physical nondeterminism" a meaningful concept? · 2019-06-19T21:14:38.880Z · score: 4 (2 votes) · LW · GW
"At the most basic level of description, things—the quantum fields or branes or whatever—just do what they do. They don’t do it nondeterministically, but they also don’t do it deterministically"
How do you know? If that claim isn't based on a model, what is it based on?

I'm happy to reply that the message of my comment as a whole applies to this part - my claim about "what things do at a basic level of description" is a meta-model claim about what you can say about things at different levels of description.

It's human nature to interpret this as a claim that things have some essence of "just doing what they do" that is a competitor to the essences of determinism and nondeterminism, but there is no such essence for the same reasons I'm already talking about in the comment. Maybe I could have worded it more carefully to prevent this reading, but I figure that would sacrifice more clarity than it gained.

The point is not about some "basic nature of things," the point is about some "basic level of description." We might imagine someone saying "I know there are some deterministic models of atoms and some nondeterministic models, but are the atoms really deterministic or not?" Where this "really" seems to mean some atheoretic direct understanding of the nature of atoms. My point, in short, is that atheoretic understanding is fruitless ("It's just one damn thing after another") and the instinct that says it's desirable is misleading.

"much better explanation of wetness would be in terms of surface tension and intermolecular forces and so on"
Why? Because they are real properties?

Because they're part of a detailed model of the world that helps tell a "functional and causal story" about the phenomenon. If I was going to badmouth one set of essences just to prop up another, I would have said so :P

You can explain away some properties in terms of others, but there is going to be some residue

My point is that this residue is never going to be the "Real Properties," they're just going to be the same theory-laden properties as always.

What makes a theory of everything a theory of everything is not that it provides a final answer for which properties are the real properties that atoms have in some atheoretic direct way. It's that it provides a useful framework in which we can understand all (literally all) sorts of stuff.

Comment by charlie-steiner on Research Agenda v0.9: Synthesising a human's preferences into a utility function · 2019-06-19T20:47:38.946Z · score: 4 (2 votes) · LW · GW

Now that I think about it, it's a pretty big PR problem if I have to start every explanation of my value learning scheme with "humans don't have actual preferences so the AI is just going to try to learn something adequate." Maybe I should figure out a system of jargon such that I can say, in jargon, that the AI is learning peoples' actual preferences, and it will correspond to what laypeople actually want from value learning.

I'm not sure whether such jargon would make actual technical thinking harder, though.

Comment by charlie-steiner on Research Agenda v0.9: Synthesising a human's preferences into a utility function · 2019-06-18T11:01:25.935Z · score: 9 (5 votes) · LW · GW

This is really similar to some stuff I've been thinking about, so I'll be writing up a longer comment with more compare/contrast later.

But one thing really stood out to me - I think one can go farther in grappling with and taking advantage of "where lives." doesn't live inside the human, it lives in the AI's model of the human. Humans aren't idealized agents, they're clusters of atoms, which means they don't have preferences except after the sort of coarse-graining procedure you describe, and this coarse-graining procedure lives with a particular model of the human - it's not inherent in the atoms.

This means that once you've specified a value learning procedure and human model, there is no residual "actual preferences" the AI can check itself against. The challenge was never to access our "actual preferences," it was always to make a best effort to model humans as they want to be modeled. This is deeply counterintuitive ("What do you mean, the AI isn't going to learn what humans' actual preferences are?!"), but also liberating and motivating.

Comment by charlie-steiner on Is "physical nondeterminism" a meaningful concept? · 2019-06-16T21:31:16.055Z · score: 6 (3 votes) · LW · GW

Interesting question! My answer is basically a long warning about essentialism: this question might seem like it's stepping down from the realm of human models to the realm of actual things, to ask about the essence of those things. But I think better answers are going to come from stepping up from the realm of models to the realm of meta-models, to ask about the properties of models.

At the most basic level of description, things - the quantum fields or branes or whatever - just do what they do. They don't do it nondeterministically, but they also don't do it deterministically! Without recourse to human models, all us humans can say is that the things just do what they do - models are the things that make talk about categories possible in the first place.

Any answer of the sort "no, things can always be rendered into a deterministic form by treating 'random' results as fixed constants" or "yes, there are perfectly valid classes of models that include nondeterminism" is going to be an answer about models, within some meta-level framework. And that's fine!

This can seem unsatisfying because it goes against our essentialist instinct - that the properties in our models should reflect the real properties that things have. If water is considered wet, it's because water has the basic property or essence of wetness (so the instinct goes).

Note that this doesn't explain any of the mechanics or physics of wetness. If you could look inside someone's head as they were performing this essentialist maneuver, they would start with a model ("water is wet"), then they would notice their model ("I model water as wet"), then they would justify themselves to themselves, in a sort of reassuring pat on the back ("I model water as wet because water is really, specially wet").

I think that this line of self-reassuring reasoning is flawed, and a much better explanation of wetness would be in terms of surface tension and intermolecular forces and so on - illuminating the functional and causal story behind our model of the world, rather than believing you've explained wetness in terms of "real wetness". Also see the story about Bleggs and Rubes.

Long story short, any good explanation for why we should or shouldn't have nondeterminism in a model is either going to be about how to choose good models, or it's going to be a causal and functional story that doesn't preserve nondeterminism (or determinism) as an essence.

I think there's an interesting question about physics in whether or not (and how) we should include nondeterminism as an option in fundamental theories. But first I just wanted to warn that the question "models aside, are things really nondeterministic" is not going to have an interesting answer.

Comment by charlie-steiner on What kind of thing is logic in an ontological sense? · 2019-06-13T01:17:08.530Z · score: 6 (3 votes) · LW · GW

Basically, we have a mental model of logic the same way we have a mental model of geography. It's useful to say that logical facts have referents for the same internal reason it's useful to say that geographical facts have referents. But if you looked at a human from outside, the causal story behind logical facts vs. geographical facts would be different.

Comment by charlie-steiner on Logic, Buddhism, and the Dialetheia · 2019-06-10T20:38:53.913Z · score: 10 (5 votes) · LW · GW

As a practicing quantum mechanic, I'd warn you against the claim that dialetheism is used in quantum computers. Qubits are sometimes described as taking "both states at the same time," but that's not precisely what's going on, and people who actually work on quantum computers use a more precise understanding that doesn't involve interpreting intermediate qubits as truth values.

There are also two people I wanted to see in your post: Russell and Gödel - mathematicians rather than philosophers. Russell's type theory was one of the main attempts to eliminate paradoxes in mathematical logic. Gödel showed how that doesn't quite work, but also showed how things become a lot clearer if you consider provability as well as truth value.

Comment by charlie-steiner on Learning magic · 2019-06-10T16:21:07.831Z · score: 19 (6 votes) · LW · GW
On Wednesdays at the Princeton Graduate College, various people would come in to give talks. The speakers were often interesting, and in the discussions after the talks we used to have a lot of fun. For instance, one guy in our school was very strongly anti-Catholic, so he passed out questions in advance for people to ask a religious speaker, and we gave the speaker a hard time.
Another time somebody gave a talk about poetry. He talked about the structure of the poem and the emotions that come with it; he divided everything up into certain kinds of classes. In the discussion that came afterwards, he said, “Isn’t that the same as in mathematics, Dr. Eisenhart?”
Dr. Eisenhart was the dean of the graduate school and a great professor of mathematics. He was also very clever. He said, “I’d like to know what Dick Feynman thinks about it in reference to theoretical physics.” He was always putting me on in this kind of situation.
I got up and said, “Yes, it’s very closely related. In theoretical physics, the analog of the word is the mathematical formula, the analog of the structure of the poem is the interrelationship of the theoretical bling-bling with the so-and-so”—and I went through the whole thing, making a perfect analogy. The speaker’s eyes were beaming with happiness.
Then I said, “It seems to me that no matter what you say about poetry, I could find a way of making up an analog with any subject, just as I did for theoretical physics. I don’t consider such analogs meaningful.”
In the great big dining hall with stained-glass windows, where we always ate, in our steadily deteriorating academic gowns, Dean Eisenhart would begin each dinner by saying grace in Latin. After dinner he would often get up and make some announcements. One night Dr. Eisenhart got up and said, “Two weeks from now, a professor of psychology is coming to give a talk about hypnosis. Now, this professor thought it would be much better if we had a real demonstration of hypnosis instead of just talking about it. Therefore he would like some people to volunteer to be hypnotized.”
I get all excited: There’s no question but that I’ve got to find out about hypnosis. This is going to be terrific!
Dean Eisenhart went on to say that it would be good if three or four people would volunteer so that the hypnotist could try them out first to see which ones would be able to be hypnotized, so he’d like to urge very much that we apply for this. (He’s wasting all this time, for God’s sake!)
Eisenhart was down at one end of the hall, and I was way down at the other end, in the back. There were hundreds of guys there. I knew that everybody was going to want to do this, and I was terrified that he wouldn’t see me because I was so far back. I just had to get in on this demonstration!
Finally Eisenhart said, “And so I would like to ask if there are going to be any volunteers …”
I raised my hand and shot out of my seat, screaming as loud as I could, to make sure that he would hear me: “MEEEEEEEEEEE!”
He heard me all right, because there wasn’t another soul. My voice reverberated throughout the hall—it was very embarrassing. Eisenhart’s immediate reaction was, “Yes, of course, I knew you would volunteer, Mr. Feynman, but I was wondering if there would be anybody else.”
Finally a few other guys volunteered, and a week before the demonstration the man came to practice on us, to see if any of us would be good for hypnosis. I knew about the phenomenon, but I didn’t know what it was like to be hypnotized.
He started to work on me and soon I got into a position where he said, “You can’t open your eyes.”
I said to myself, “I bet I could open my eyes, but I don’t want to disturb the situation: Let’s see how much further it goes.” It was an interesting situation: You’re only slightly fogged out, and although you’ve lost a little bit, you’re pretty sure you could open your eyes. But of course, you’re not opening your eyes, so in a sense you can’t do it.
He went through a lot of stuff and decided that I was pretty good.
When the real demonstration came he had us walk on stage, and he hypnotized us in front of the whole Princeton Graduate College. This time the effect was stronger; I guess I had learned how to become hypnotized. The hypnotist made various demonstrations, having me do things that I couldn’t normally do, and at the end he said that after I came out of hypnosis, instead of returning to my seat directly, which was the natural way to go, I would walk all the way around the room and go to my seat from the back.
All through the demonstration I was vaguely aware of what was going on, and cooperating with the things the hypnotist said, but this time I decided, “Damn it, enough is enough! I’m gonna go straight to my seat.”
When it was time to get up and go off the stage, I started to walk straight to my seat. But then an annoying feeling came over me: I felt so uncomfortable that I couldn’t continue. I walked all the way around the hall.
I was hypnotized in another situation some time later by a woman. While I was hypnotized she said, “I’m going to light a match, blow it out, and immediately touch the back of your hand with it. You will feel no pain.”
I thought, “Baloney!” She took a match, lit it, blew it out, and touched it to the back of my hand. It felt slightly warm. My eyes were closed throughout all of this, but I was thinking, “That’s easy. She lit one match, but touched a different match to my hand. There’s nothin’ to that; it’s a fake!”
When I came out of the hypnosis and looked at the back of my hand, I got the biggest surprise: There was a burn on the back of my hand. Soon a blister grew, and it never hurt at all, even when it broke.
So I found hypnosis to be a very interesting experience. All the time you’re saying to yourself, “I could do that, but I won’t”—which is just another way of saying that you can’t.
  • Surely You're Joking, Mr. Feynman!
Comment by charlie-steiner on References that treat human values as units of selection? · 2019-06-10T16:04:18.234Z · score: 3 (2 votes) · LW · GW
You can’t talk sensibly about what values are right, or what we ‘should’ build into intelligent agents.

I agree that in our usual use of the word, it doesn't make sense to talk about what (terminal) values are right.

But you agree that (within a certain level of abstraction and implied context) you can talk as if you should take certain actions? Like "you should try this dessert" is a sensible English sentence. So what about actions that impact intelligent agents?

Like, suppose there was a pill you could take that would make you want to kill your family. Should you take it? No, probably not. But now we've just expressed a preference about the values of an intelligent agent (yourself).

Modifying yourself to want bad things is wrong in the same sense that the bad things are wrong in the first place: they are wrong with respect to your current values, which are a thing we model you as having within a certain level of abstraction.

Comment by charlie-steiner on Major Update on Cost Disease · 2019-06-07T12:37:57.967Z · score: 4 (2 votes) · LW · GW

We have to separate colleges from public K-12 education. Colleges are the place where you hear about increasing numbers of non-teaching staff. K-12 actually has fewer administrators per student than 20 years ago (in most places).

Comment by charlie-steiner on How is Solomonoff induction calculated in practice? · 2019-06-05T11:53:27.708Z · score: 9 (3 votes) · LW · GW

Minimum message length fitting uses an approximation of K-complexity and gets used sometimes when people want to fit weird curves in a sort of principled way. But "real" Solomonoff induction is about feeding literally all of your sensory data into the algorithm to get predictions for the future, not just fitting curves.

So I guess I'd say that it's possible to approximate K-complexity and use that in your prior for curve fitting, and people sometimes do that. But that's not necessarily going to be your best estimate, because your best estimate is going to take into account all of the data you've already seen, which becomes impossible very quickly (even if you just want a controlled approximation).
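To make "approximate K-complexity and use that in your prior" a bit more concrete, here is a minimal sketch. It is not minimum message length fitting; it swaps in a much cruder stand-in, the length of a zlib-compressed encoding, as an upper bound on description length. The helper names and the two example sequences are hypothetical and only there for illustration.

```python
import random
import zlib


def description_length_bits(data: bytes) -> int:
    """Crude upper bound on K-complexity: bits in the zlib-compressed form."""
    return 8 * len(zlib.compress(data, 9))


def log2_prior(data: bytes) -> int:
    """Log2 of an unnormalized Solomonoff-style prior weight, 2 ** -length."""
    return -description_length_bits(data)


# A highly regular sequence versus a pseudo-random one of the same size.
regular = b"ab" * 500
rng = random.Random(0)  # fixed seed so the example is repeatable
noisy = bytes(rng.randrange(256) for _ in range(1000))

# The regular sequence gets a shorter description, hence a larger prior weight.
print(description_length_bits(regular), description_length_bits(noisy))
print(log2_prior(regular) > log2_prior(noisy))
```

A compressor only ever gives an upper bound, and it says nothing about conditioning on everything you have already observed, which is the part that becomes intractable.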

Comment by charlie-steiner on How is Solomonoff induction calculated in practice? · 2019-06-05T11:44:54.768Z · score: 3 (2 votes) · LW · GW

Sure. The OP might more accurately have asked "How is the Solomonoff prior calculated?"

Comment by charlie-steiner on Selection vs Control · 2019-06-03T06:29:56.753Z · score: 4 (2 votes) · LW · GW

Yeah I think this is definitely a "stance" thing.

Take the use of natural selection and humans as examples of optimization and mesa-optimization - the entire concept of "natural selection" is a human-convenient way of describing a pattern in the universe. It's approximately an optimizer, but in order to get rid of that "approximately" you have to reintroduce epicycles until your model is as complicated as a model of the world again. Humans aren't optimizers either, that's just a human-convenient way of describing humans.

More abstractly, the entire process of recognizing a mesa-optimizer - something that models the world and makes plans - is an act of stance-taking. Or Quinean radical translation or whatever. If a cat-recognizing neural net learns an attention mechanism that models the world of cats and makes plans, it's not going to come with little labels on the neurons saying "these are my input-output interfaces, this is my model of the world, this is my planning algorithm." It's going to be some inscrutable little bit of linear algebra with suspiciously competent behavior.

Not only could this competent behavior be explained either by optimization or some variety of "rote behavior," but the neurons don't care about these boundaries and can occupy a continuum of possibilities between any two central examples. And worst of all, the same neurons might have multiple different useful ways of thinking about them, some of which are in terms of elements like "goals" and "search," and others are in terms of the elements of rote behavior.

In light of this, the problem of mesa-optimizers is not "when will this bright line be crossed?" but "when will this simple model of the AI's behavior be predictably useful?" Even though I think the first instinct is the opposite.

Comment by charlie-steiner on Uncertainty versus fuzziness versus extrapolation desiderata · 2019-06-01T09:21:11.282Z · score: 5 (3 votes) · LW · GW

Nice post. I suspect you'll still have to keep emphasizing that fuzziness can't play the role of uncertainty in a human-modeling scheme (like CIRL), and is instead a way of resolving human behavior into a utility function framework. Assuming I read you correctly.

I think that there are some unspoken commitments that the framework of fuzziness makes for how to handle extrapolating irrational human behavior. If you represent fuzziness as a weighting over utility functions that gets aggregated linearly (i.e. into another utility function), this is useful for the AI making decisions but can't be the same thing that you're using to model human behavior, because humans are going to take actions that shouldn't be modeled as utility maximization.

To bridge this gap from human behavior to utility function, what I'm interpreting you as implying is that you should represent human behavior in terms of a patchwork of utility functions. In the post you talk about frequencies in a simulation, where small perturbations might lead a human to care about the total or about the average. Rather than the AI creating a context-dependent model of the human, we've somehow taught it (this part might be non-obvious) that these small perturbations don't matter, and should be "fuzzed over" to get a utility function that's a weighted combination of the ones exhibited by the human.

But we could also imagine unrolling this as a frequency over time, where an irrational human sometimes takes the action that's best for the total and other times takes the action that's best for the average. Should a fuzzy-values AI represent this as the human acting according to different utility functions at different times, and then fuzzing over those utility functions to decide what is best?
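As a rough illustration of the linear aggregation discussed above, here is a sketch in which a weighting over utility functions is collapsed into a single utility function that an AI could maximize. The weights, the two candidate worlds, and all function names are made up for the example; nothing here is taken from the original post.

```python
from typing import Callable, Dict, List

World = List[float]  # welfare level of each person in a candidate world
Utility = Callable[[World], float]


def total_utility(world: World) -> float:
    """Total utilitarianism: sum of everyone's welfare."""
    return sum(world)


def average_utility(world: World) -> float:
    """Average utilitarianism: mean welfare."""
    return sum(world) / len(world)


def fuzz(weights: Dict[Utility, float]) -> Utility:
    """Linearly aggregate weighted utility functions into one utility function."""
    norm = sum(weights.values())

    def combined(world: World) -> float:
        return sum(w * u(world) for u, w in weights.items()) / norm

    return combined


# Hypothetical frequencies: the human acted on "total" 70% of the time
# and on "average" 30% of the time.
fuzzy = fuzz({total_utility: 0.7, average_utility: 0.3})

small_happy = [10.0, 10.0]  # total 20, average 10
large_ok = [4.0] * 6        # total 24, average 4

best = max([small_happy, large_ok], key=fuzzy)
print(best)
```

Note that `fuzzy` is itself just another utility function, which is why it can guide the AI's decisions but cannot double as a model of a human who switches between the two.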

Comment by charlie-steiner on A shift in arguments for AI risk · 2019-05-29T09:35:58.254Z · score: 7 (3 votes) · LW · GW

An alternate framing could be about changing group boundaries rather than changing demographics in an isolated group.

There were surely people in 2010 who thought that the main risk from AI was it being used by bad people. The difference might not be that these people have popped into existence or only recently started talking - it's that they're inside the fence more than before.

And of course, reality is always complicated. One of the concerns in the "early LW" genre is value stability and self-trust under self-modification, which has nothing to do with sudden growth. And one of the "recent" genre concerns is arms races, which are predicated on people expecting sudden capability growth to give them a first mover advantage.

Comment by charlie-steiner on Why the empirical results of the Traveller’s Dilemma deviate strongly away from the Nash Equilibrium and seems to be close to the social optimum? · 2019-05-25T04:41:23.097Z · score: 2 (1 votes) · LW · GW

I would guess that people don't actually compute the Nash equilibrium or expect other people to.

Instead, they use the same heuristic reasoning methods that they evolved to learn, and which have served them well in social situations their entire life, and expect other people to do the same.

I think we should expect these heuristics to be close to rational (not for the utilities of humans, but for the fitness of genes) in the ancestral environment. But there's no particular reason to think they're going to be rational by any standard in games chosen specifically because the Nash equilibrium is counterintuitive to humans.
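For reference, here is a small sketch of the reasoning that people apparently do not carry out. It assumes the usual textbook version of the Traveller's Dilemma (claims from 2 to 100, bonus and penalty of 2), which is not spelled out in this thread, and walks the chain of best responses down from the highest claim.

```python
LOW, HIGH, BONUS = 2, 100, 2  # assumed textbook parameters, not from the thread


def payoff(mine: int, theirs: int) -> int:
    """Payoff to the player claiming `mine` against a claim of `theirs`."""
    if mine == theirs:
        return mine
    low = min(mine, theirs)
    # The lower claimant is rewarded, the higher claimant is penalized.
    return low + BONUS if mine < theirs else low - BONUS


def best_response(theirs: int) -> int:
    """The claim that maximizes my payoff against a fixed opposing claim."""
    return max(range(LOW, HIGH + 1), key=lambda mine: payoff(mine, theirs))


def iterate_best_response(start: int) -> list:
    """Repeatedly best-respond until the claim stops changing."""
    path = [start]
    while True:
        nxt = best_response(path[-1])
        if nxt == path[-1]:
            return path
        path.append(nxt)


path = iterate_best_response(HIGH)
print(path[0], path[-1], len(path))
```

Each step undercuts the previous claim by one, so getting to the equilibrium takes dozens of rounds of "I think that you think that I think", which is not how a social heuristic would approach the game.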

Comment by charlie-steiner on Free will as an appearance to others · 2019-05-23T22:20:50.385Z · score: 4 (2 votes) · LW · GW

If I may toot my own horn:

I'll admit I'm not totally sure what Said Achmiz means by his comparison, though :)

Comment by charlie-steiner on Does the Higgs-boson exist? · 2019-05-23T08:48:36.167Z · score: 5 (3 votes) · LW · GW

Sure, but he also says "Since my expectations sometimes conflict with my subsequent experiences, I need different names for the thingies that determine my experimental predictions and the thingy that determines my experimental results. I call the former thingies 'beliefs', and the latter thingy 'reality'."

This key allows you to substitute in to his previous paragraph, to obtain statements in terms of predictions and experimental results that would be Sabine-approved.

If we think of the philosophical camps as realism, instrumentalism / pragmatism, and skepticism, the state of play seems to be less

"I am a realist, you are a skeptic, let's argue,"

and more

"I'm the true pragmatist!" "No, I'm the true pragmatist!"

I have now skimmed the previous thread, where you also quoted what I just quoted, but said Eliezer was just assuming that there was some thingie out there being reality. The alternative being, presumably, that our observations are not determined by anything that acts like an object with properties, and are instead brute facts.

But the first sentence ("since my expectations sometimes conflict...") is precisely about how he's not assuming an external reality, but instead advancing it as a hypothesis in order to explain observations. Maybe he's not doing it the way you'd like - and maybe I as a biased reader will interpret that statement as a metaphor for something I expect, whereas you'd do the same but get a different result.

Comment by charlie-steiner on Free will as an appearance to others · 2019-05-23T07:55:44.460Z · score: 4 (2 votes) · LW · GW

This also works on yourself. If your best model of yourself is as an agent making choices based on their beliefs, then you will seem to have free will to yourself.

Comment by charlie-steiner on Does the Higgs-boson exist? · 2019-05-23T05:33:32.933Z · score: 8 (5 votes) · LW · GW

Hm - would the prevailing realist crowd disagree with this way of using the word?

For example, it sure seems like Eliezer takes more or less the same position here. I take a slightly more unusual position on mathematical truth here.

I still think that things exist, and are true, and are real, and so on. The difference between me and me ten years ago is just that I have an explanation for why I use those words, rather than just repeating them more slowly and louder.

Comment by charlie-steiner on TAISU - Technical AI Safety Unconference · 2019-05-22T23:24:43.216Z · score: 10 (4 votes) · LW · GW

Sounds interesting, and I'd say there's about a 20% chance I'd like to go, but I mostly care about the network effects. Is there some kind of kickstart page where people can pledge to go if the right number of other people pledge to go?

Comment by charlie-steiner on What is your personal experience with "having a meaningful life"? · 2019-05-22T21:16:12.069Z · score: 4 (2 votes) · LW · GW

My life was definitely changed by reading Luke's How to be Happy.

But that's not a direct answer to the question. Here's my take: If you try to find the True Meaning of life and then fulfil it, you will find nothing. It frames the problem in terms of a universal essence that never existed. But if you try to do what's right and good, you can do it.

Comment by charlie-steiner on The concept of evidence as humanity currently uses it is a bit of a crutch. · 2019-05-21T22:16:14.915Z · score: 4 (3 votes) · LW · GW

This seems pretty likely. An AI that does internal reasoning will find it useful to have its own opinions on why it thinks things, which need bear about as much relationship to their internal microscopic function as human opinions about thinking do to human neurons.

Comment by charlie-steiner on [AN #56] Should ML researchers stop running experiments before making hypotheses? · 2019-05-21T21:06:11.235Z · score: 3 (2 votes) · LW · GW

Thanks, this is actually really useful feedback. As the author, I "see" the differences and the ways in which I'm responding to other people, but it also makes sense to me why you'd say they're very similar. The only time I explicitly contrast what I'm saying in that post with anything else is... contrasting with my own earlier view.

From my perspective where I already know what I'm thinking, I'm building up from the basics to frame problems in what I think is a useful way that immediately suggests some of the other things I've been thinking. From your perspective, if there's something novel there, I clearly need to turn up the contrast knob.

Comment by charlie-steiner on [AN #56] Should ML researchers stop running experiments before making hypotheses? · 2019-05-21T05:06:42.013Z · score: 4 (2 votes) · LW · GW

Yay, I'm in the thing!

I have little idea if people have found my recent posts interesting or useful, or how they'd like them to be improved. I have a bunch of wilder speculation that piles up in unpublished drafts, and once I see an idea getting used or restated in multiple drafts, that's what I actually post.

Comment by charlie-steiner on Training human models is an unsolved problem · 2019-05-18T22:37:08.265Z · score: 2 (1 votes) · LW · GW

> 2. I'm not saying seatbelt-free driving is always rational, but on what grounds is it irrational?

Individual actions are not a priori irrational, because we're always talking about a conflict between at least two things. Furthermore, you can always describe humans as perfect rational agents - just use the microphysical description of all their atoms to predict their next action, and say that they assign that action high utility. (This is basically Rohin's point)

Ways of choosing actions are what we think can be irrational (there are problems with this but I'd rather ignore them), but these ways of choosing actions are only associated with humans within some particular way of describing humans (the intentional stance). Like, if you're describing humans as collections of atoms, your description will never label anything as a value conflict or an inconsistency. You have to describe humans in terms of values and choices and so on before you can say that an action "conflicts with their values and is therefore irrational" or whatever.

Long story short, when I say that driving without a seatbelt is usually dumb because people don't want to die, there is no further or more universal sense I know of in which anything is irrational. I do not mean that you cannot assign values to humans in which driving without a seatbelt is always right - in fact, the problem is that I'm worried that a poor AI design might do just such a thing! But in the values that I actually do assign to humans, driving without a seatbelt is usually dumb.
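The claim that any behavior can be described as perfectly rational is easy to sketch in code. In the toy below, a lookup of the observed behavior stands in for the microphysical prediction, and the utility function is simply defined so that whatever the person does scores highest. The function names, the situation label, and the driver are all hypothetical.

```python
def rationalizing_utility(observed_policy):
    """Build a utility function under which the observed behavior is optimal."""

    def utility(situation, action):
        # Assign high utility to exactly the action that was actually taken.
        return 1.0 if action == observed_policy(situation) else 0.0

    return utility


def observed_policy(situation):
    # Hypothetical behavior: this driver never buckles up.
    return "no_seatbelt"


utility = rationalizing_utility(observed_policy)
actions = ["seatbelt", "no_seatbelt"]
chosen = max(actions, key=lambda action: utility("short_drive", action))
print(chosen)
```

This construction never labels anything as a mistake, which is exactly why it is useless as an assignment of values, and why a poor AI design that lands on something like it would be a problem.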

Comment by charlie-steiner on Which scientific discovery was most ahead of its time? · 2019-05-17T07:23:15.243Z · score: 2 (1 votes) · LW · GW

There are plenty of accidental discoveries that we might imagine happening much later - but I don't feel like this should be enough, because it's not that they were surprisingly early, they were just drawn out of a very broad probability distribution.

I'm more satisfied with discoveries that not only could have happened later, but happened when they did for sensible local reasons. Example: Onnes' discovery of superconductivity. Not just because superconductivity was discovered very rapidly (3 years) after the necessary liquefaction of helium, when it conceivably could have taken a lot longer to properly measure the resistance of mercury or lead at low temperatures. But because Onnes' lab in Leiden was the first place to ever make liquid helium to cool superconductors with, and it took 15 years for anyone else in the world (in this case, Toronto) to start liquefying helium!

In short, to my mind being ahead of your time is the opposite of multiple discovery - we push back the luck one step by asking not for a lucky break, but for a sensible and straightforward discovery that could only have happened in a very unusual place.

Comment by charlie-steiner on Programming Languages For AI · 2019-05-12T00:52:53.554Z · score: 2 (1 votes) · LW · GW

Disclaimer: I don't know anything about designing programming languages.

I don't think this programming language can neatly precede the AI design, like your example of chesslang. In fact, it might be interesting to look at history to see which tended to come first - support for features in programming languages, or applications that implemented those features in a more roundabout way.

Like the proof reasoning support, for example, might or might not be in any particular AI design.

Another feature is support for reasoning about probability distributions, which shows up in probabilistic programming languages. Maybe your AI is a big neural net and doesn't need to use your language to do its probabilistic reasoning.

Or maybe it's some trained model family that's not quite neural nets - in which case it's probably still in python, using some packages designed to make it easy to write these models.

Basically, I think there are a lot of possible ways to end up with AI, and if you try to shove all of them into one programming language, it'll end up bloated - your choice of what to do well has to be guided by what the AI design will need. Maybe just making good python packages is the answer.

Comment by charlie-steiner on Open Thread May 2019 · 2019-05-12T00:11:10.932Z · score: 2 (1 votes) · LW · GW

Well, you are free to use the word "color" in such a way that there is no fact of the matter about whether Neptune has a color if you wish. But I think that this directs us into a definitional argument that I don't feel like having - in fact, almost a perfect analogy to "If a tree falls in a forest and there's no one to hear it, does it make a sound?"

So yeah, it's been fun :)

Comment by charlie-steiner on Open Thread May 2019 · 2019-05-11T23:26:47.551Z · score: 6 (3 votes) · LW · GW
What does philosophical sophistication matter? Are the quoted facts false? Is “A blue-violet laser blasts the atom, which then absorbs and re-emits enough light particles to be photographed with conventional equipment” a lie?

The sentence you quote is true. And "The sun blasts the cover of the book, which then absorbs and re-emits photons to be photographed with conventional equipment" is an equally valid description of taking a picture of a book. I'm honestly confused as to what you're expecting me to find inconsistent here. Are you implying that since you can describe scattering of light using scientific talk like "photons" and "re-emission," you can't describe it using unscientific talk like "the atom was purple?"

Ah, or maybe you're expecting me to know that you mean something like "What if the atom is only purple because it's being 'blasted' with purple photons, making this photo useless in terms of understanding its color?" That's a reasonable thing to think if you've never had reason to study atomic spectra, but it turns out the direction of causation is just the opposite - scientists chose a purple laser because that's one of the few colors the atom would absorb and re-emit.

I think what this shows is that my perception of what's "everyday" has definitely been skewed by a physics degree, and things that just seem like the way everything works to me might seem like unusual and rare sciencey phenomena to everyone else.

Like, you ask me to refute this fact or that fact, but I already quoted precisely the statements I thought deserved highlighting from the articles, and none of them were facts about how light or atoms work. That should tell you that I have zero intention of overthrowing our basic understanding of optics. But since we might have different understandings of the same sentences, I will go through the specific ones you highlight and try to read your mind.

the purple speck at the center of this photo is not the true size of the strontium atom itself

This is due to the limitations of the camera - even if the light came from a single point, the camera wouldn't be able to focus it back down to a single pixel on the detector. This is due to a combination of imperfections in the lenses plus physics reasons that make all pictures a little blurry. The same thing happens in the human eye - even if we were looking at a point source of purple light, it would look to our eye like a little round dot of purple light.

Mind reading: Maybe you are implying that if the camera can't resolve the true size of the atom, it's not "really" a picture of the atom. This is sort of true - if the atom was blue on the left and red on the right (let's ignore for a moment the physical impossibility), it would still look purple in this photo despite never emitting purple photons. Of course, we do know that it emits purple photons. But I think the thing that is actually mistaken about worrying about whether this is "really" the atom is that it's forgetting the simplicity of just looking at things to tell what color they are.

Even when I argue that things invisible to the naked eye, like the planet Neptune, can have color, it's not necessarily because of complicated sciencey reasoning, but more some heuristic arguments about processes that preserve color. You can look at Neptune through a telescope and it has color then - so since telescopes preserve color, Neptune is blue. Or if I took a red teapot and shot it into space somewhere between the orbits of Mars and Jupiter and we never saw it again, it would still be red, because in color-logic, moving something somewhere else is a process that preserves color.

In short, I didn't bother to actually do the math on whether you could see an atom with the naked eye until today because I think of time lapse photography or using lenses to look at something as processes that preserve "intuitive color."

When bathed in a specific wavelength of blue light, strontium creates a glow hundreds of times wider than the radius of the atom itself (which is about a quarter of a nanometer, or 2.5x10 to the −7 meters, Nadlinger said).

I'm not sure what the reporter is getting at here. I think they are comparing the wavelength of the light to the electron radius of the atom, and the wavelength of the light is what they mean by "creates a glow." It's definitely inaccurate. But I'm not sure I could do better to explain what's going on to an audience that doesn't know quantum mechanics. Maybe a water analogy? The atom emitting a photon is like a tiny stone tossed into a pond. The size of the ripples emitted by the tiny stone is a lot bigger than the size of the stone itself.

Mind reading: Maybe you read this as something like "We're not actually seeing the atom, we're just seeing 'a glow created by the atom' that's a lot bigger than the atom, therefore this isn't a picture of the atom." That's false. This goes back to why I was making fun of the reporter who said "We're not seeing the atom, we're just seeing the light emitted by the atom." Seeing the light emitted by something is what seeing is. You could just as well say "I'm not actually seeing my hand, I'm just seeing the light emitted by my hand," and you'd be just as wrong. Learning about photons should not destroy your ability to see things, and if it does you're doing philosophy very wrong.

This glow would be barely perceptible with the naked eye but becomes apparent with a little camera manipulation.

Yup, this is that "long exposure to the atom + short flash of the surroundings" technique I talked about in the previous comment. Maybe you see the word "manipulation" and assume that everything is meaningless and this "doesn't count"? I think this question stops being useful when you understand what was actually done to get the photo.

“The apparent size you see in the picture is what we’d call optical aberration,” Nadlinger said.

See above about the limitations of cameras.

Comment by charlie-steiner on Open Thread May 2019 · 2019-05-11T21:13:04.383Z · score: 2 (1 votes) · LW · GW
So, technically, you’re seeing light emitted from an atom and not the atom itself.

Hm. Not sure you want to pick this reporter as your example of philosophical sophistication. Next.

Without the long exposure effect, the atom wouldn't be visible to the naked eye.

I think this is an important caveat in the context of talking about how the photograph was taken - a long exposure (on the order of 10 seconds) was used to capture the atom, and then a flash was used to illuminate the surroundings. Without this technique, the atom would not be visible in an image that also included the surroundings. What it is not, however, is a pronouncement on the inherent invisibility of atoms. But the next quote is.

So, seeing a single atom with the naked eye is impossible.

Congrats, you found a reporter saying the thing. However, this random reporter's opinion isn't necessarily true. To be honest, I'm not really sure if an atom really is visible to the naked eye (at least not as a continuous object, since humans can detect even single photons but only stochastically), or if you'd need binoculars to see it. Let's do some math. From here, let's suppose our trapped atom emits 20 million photons per second, spread across all angles. The faintest visible stars are about magnitude +6. How many photons per second is that?

Well, the sun shines down with about 1000 W/m^2 and it's magnitude -27. Since magnitude is a logarithmic scale, this means a magnitude +6 star shines about 6 × 10^-11 W/m^2 on us.

For the 400nm light used in the stackexchange comment, 20 million photons per second is just about 10^-11 Watts. This means that the surface of the sphere around the atom is only allowed to be 1/6th of a m^2, which means a radius of 12 cm (about 5"). So if you were to put your eye 12 cm away from the atom, it would be as bright as the faintest stars humans can see in the sky. Farther than that and I'm happy to agree you couldn't see it with your naked eye.
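For anyone who wants to check the arithmetic, here is the same estimate as a short script. The inputs are the round numbers used above (1000 W/m^2 and magnitude -27 for the sun, magnitude +6 for the faintest visible star, 20 million photons per second at 400 nm); the variable names are just for illustration, and the only additions are the standard values of Planck's constant and the speed of light.

```python
import math

H = 6.62607015e-34  # Planck constant, J*s
C = 2.99792458e8    # speed of light, m/s

SUN_FLUX = 1000.0     # W/m^2, round figure used above
SUN_MAG = -27.0       # apparent magnitude of the sun, rounded
FAINT_MAG = 6.0       # faintest naked-eye stars
PHOTONS_PER_S = 20e6  # assumed emission rate of the trapped atom
WAVELENGTH = 400e-9   # m

# Each magnitude step is a factor of 10 ** 0.4 in flux.
faint_flux = SUN_FLUX * 10 ** (-0.4 * (FAINT_MAG - SUN_MAG))

# Power radiated by the atom: photon rate times energy per photon.
atom_power = PHOTONS_PER_S * H * C / WAVELENGTH

# Sphere on which the atom's light is spread as thinly as the faint star's.
area = atom_power / faint_flux
radius = math.sqrt(area / (4 * math.pi))

print(f"faint-star flux: {faint_flux:.2e} W/m^2")
print(f"atom power:      {atom_power:.2e} W")
print(f"distance:        {100 * radius:.1f} cm")
```

Without the rounding at each step, the distance comes out slightly under the 12 cm quoted above, which does not change the conclusion.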

Comment by charlie-steiner on Open Thread May 2019 · 2019-05-11T20:21:35.408Z · score: 2 (1 votes) · LW · GW

Trapped atoms are always illuminated by a laser that picks out one single wavelength emitted by the atom. This isn't necessarily the same color you'd see if these atoms were scattering sunlight - in addition to the color used in the lab, you might see a few other wavelengths as well, along with a generic bluish color due to Rayleigh scattering. But since each atom emits / absorbs different wavelengths, each will look different both under sunlight and when trapped in the lab.

Here's an example of trapped atoms emitting green light - figure 3 is a photo taken through an optical microscope:

Comment by charlie-steiner on Open Thread May 2019 · 2019-05-11T19:20:15.224Z · score: 0 (2 votes) · LW · GW

Turns out, yes.

Comment by charlie-steiner on Open Thread May 2019 · 2019-05-11T19:09:10.293Z · score: 2 (1 votes) · LW · GW

Suppose you are handed a book. How do you figure out what color the cover of the book is?

Suppose you were handed a single atom. Could you judge it the same way that you did the book?

Comment by charlie-steiner on Disincentives for participating on LW/AF · 2019-05-10T21:39:05.331Z · score: 28 (10 votes) · LW · GW

As someone basically thinking alone (cue George Thorogood), I definitely would value more comments / discussion. But if someone has access to research retreats where they're talking face to face as much as they want, I'm not surprised that they don't post much.

Talking is a lot easier than writing, and more immediately rewarding. It can be an activity among friends. It's more high-bandwidth to have a discussion face to face than it is over the internet. You can assume a lot more about your audience which saves a ton of effort. When talking, you are more allowed to bullshit and guess and handwave and collaboratively think with the other person, and still be interesting, whereas when writing your audience usually expects you to be confident in what you've written. Writing is hard, reading is hard, understanding what people have written is harder than understanding what people have said and if you ask for clarification that might get misunderstood in turn. This all applies to comments almost as much as to posts, particularly on technical subjects.

The two advantages writing has for me are that I can communicate in writing with people who I couldn't talk to, and that when you write something out you get a good long chance to make sure it's not stupid. When talking it's very easy to be convincing, including to yourself, even when you're confused. That's a lot harder in writing.

To encourage more discussion in writing one could try to change the format to reduce these barriers as much as possible - trying to foster one-to-one or small group threads rather than one-to-many, fostering/enabling knowledge about other posters, creating a context that allows for more guesswork and collaborative thinking. Maybe one underutilized tool on current LW is the question thread. Question threads are great excuses to let people bullshit on a topic and then engage them in small group threads.

Training human models is an unsolved problem

2019-05-10T07:17:26.916Z · score: 16 (6 votes)
Comment by charlie-steiner on Open Thread May 2019 · 2019-05-07T20:57:49.844Z · score: 2 (1 votes) · LW · GW

If you cover a brown chair with blue paint, it becomes a blue chair. There is no answer to the question "What color is a chair?", because how a chair scatters light depends on context.

Chairs are non-fundamentally colored, so the only question you can even try to answer is "What color is this chair?"

Y'all are trying to rely on a dichotomy between "Fundamental particles are fundamentally colored" and "Fundamental particles have no color." That is a false dichotomy. The color of an electron depends on context - congrats, you have shown that it is not fundamentally colored, we agree.

Comment by charlie-steiner on Open Thread May 2019 · 2019-05-07T01:43:51.849Z · score: -6 (5 votes) · LW · GW

I think this is becoming much too abstract. "Can an atom be a color?" was not supposed to be one of those troll questions like "Is a hotdog a sandwich?"

If you want to know whether an atom can have color, you can just look at one. Here. That's an atom. As you can see, it is purple. If you wish to claim that this atom is not purple, because color depends on context, and if you encased a blade of grass in tungsten and threw it into the sun, would it still be green?, please jump in a lake whose color depends on context.

If you wish to claim that this atom is not purple because the objects invoked to explain the color of materials must themselves be colorless, please forward your mail to:

Robert Nozick

242 Emerson Hall

Harvard University


Comment by charlie-steiner on Value learning for moral essentialists · 2019-05-06T22:02:09.653Z · score: 5 (3 votes) · LW · GW

Yeah, I'm not 100% sure my caricature of a person actually exists or is worth addressing. They're mostly modeled on Robert Nozick, who is dead and cannot be reached for comment on value learning. But I had most of these thoughts and the post was really easy to write, so I decided to post it. Oh well :)

The person I am hypothetically thinking about is not very systematic - on average, they would admit that they don't know where morality comes from. But they feel like they learn about morality by interacting in some mysterious way with an external moral reality, and that an AI is going to be missing something important - maybe even be unable to do good - if it doesn't do that too. (So 90% overlap with your description of strong moral essentialism.)

I think these people plausibly should be for value learning, but are going to be dissatisfied with it and feel like it sends the wrong philosophical message.

Comment by charlie-steiner on Open Thread May 2019 · 2019-05-06T21:29:25.026Z · score: 2 (1 votes) · LW · GW
Consider the possibility that you are not being as clear as you think you are.

Fair enough.

are atoms colorless? No! Very no! They are colored in exactly the mundane way that everything is, which is rather the entire point.

Maybe it would help if I said that atoms are non-fundamentally colored, in exactly the same way that a chair, or grass, or the sky is non-fundamentally colored. The everyday meaning of "the grass is green" or "the sun is yellowish-white" extends nicely down to the atomic scale. Even a free electron (being a thing that scatters light) has a color.

When you make a reductionist explanation of color, it's the explanation that lacks color, not the atoms.

Value learning for moral essentialists

2019-05-06T09:05:45.727Z · score: 13 (5 votes)

Humans aren't agents - what then for value learning?

2019-03-15T22:01:38.839Z · score: 20 (6 votes)

How to get value learning and reference wrong

2019-02-26T20:22:43.155Z · score: 40 (10 votes)

Philosophy as low-energy approximation

2019-02-05T19:34:18.617Z · score: 38 (20 votes)

Can few-shot learning teach AI right from wrong?

2018-07-20T07:45:01.827Z · score: 16 (5 votes)

Boltzmann Brains and Within-model vs. Between-models Probability

2018-07-14T09:52:41.107Z · score: 19 (7 votes)

Is this what FAI outreach success looks like?

2018-03-09T13:12:10.667Z · score: 53 (13 votes)

Book Review: Consciousness Explained

2018-03-06T03:32:58.835Z · score: 101 (27 votes)

A useful level distinction

2018-02-24T06:39:47.558Z · score: 26 (6 votes)

Explanations: Ignorance vs. Confusion

2018-01-16T10:44:18.345Z · score: 18 (9 votes)

Empirical philosophy and inversions

2017-12-29T12:12:57.678Z · score: 8 (3 votes)

Dan Dennett on Stances

2017-12-27T08:15:53.124Z · score: 8 (4 votes)

Philosophy of Numbers (part 2)

2017-12-19T13:57:19.155Z · score: 11 (5 votes)

Philosophy of Numbers (part 1)

2017-12-02T18:20:30.297Z · score: 25 (9 votes)

Limited agents need approximate induction

2015-04-24T21:22:26.000Z · score: 1 (1 votes)