Comment by charlie-steiner on Humans aren't agents - what then for value learning? · 2019-03-19T23:33:00.158Z · score: 3 (2 votes) · LW · GW

Sure. It describes how humans aren't robust to distributional shift.

Comment by charlie-steiner on Humans aren't agents - what then for value learning? · 2019-03-18T16:56:21.431Z · score: 2 (1 votes) · LW · GW

I hope so! IRL and CIRL are really nice frameworks for learning from general behavior, and as far as I can tell, learning from verbal behavior requires a simultaneous model of verbal and general behavior, with some extra parts that I don't understand yet.

Comment by charlie-steiner on Humans aren't agents - what then for value learning? · 2019-03-17T21:55:17.297Z · score: 2 (1 votes) · LW · GW

I mostly agree, though you can only really tell me we have the right answer once we can program it into a computer :) Human introspection is good at producing verbal behavior, but is less good at giving you a utility function on states of the universe. Part of the problem is that it's not like we have "a part of ourselves that does introspection" like it's some kind of orb inside our skulls - breaking human cognition into parts like that is yet another abstraction that has some free parameters to it.

Comment by charlie-steiner on Humans aren't agents - what then for value learning? · 2019-03-16T19:06:11.806Z · score: 2 (1 votes) · LW · GW

Does it seem clear to you that if you model a human as a somewhat complicated thermostat (perhaps making decisions according to some kind of flowchart) then you aren't going to predict that a human would write a post about humans being somewhat complicated thermostats?

Is my flowchart model complicated enough to emulate an RNN? Then I'm not sure.

Or one might imagine a model that has psychological parts, but distributes the function fulfilled by "wants" in an agent model among several different pieces, which might conflict or reinforce each other depending on context. This model could reproduce human verbal behavior about "wanting" with no actual component in the model that formalizes wanting.

If this kind of model works well, it's a counterexample (less compute-intensive than a microphysical model) to the idea I think you're gesturing towards, which is that the data really privileges models in which there's an agent-like formalization of wanting.

Comment by charlie-steiner on Humans aren't agents - what then for value learning? · 2019-03-16T18:01:46.344Z · score: 2 (1 votes) · LW · GW

Person A isn't getting it quite right :P Humans want things, in the usual sense that "humans want things" indicates a useful class of models I use to predict humans. But they don't Really Want things, the sort of essential Wanting that requires a unique, privileged function from a physical state of the human to the things Wanted.

So here's the dialogue, with A's views more of a stand-in for my own:

A: Humans aren't agents, by which I mean that humans don't Really Want things. It would be bad to make an AI that assumes they do.

B: What do you mean by "bad"?

A: I mean that there wouldn't be such a privileged Want for the AI to find in humans - humans want things, but can be modeled as wanting different things depending on the environment and level of detail of the model.

B: No, I mean how could you cash out "bad" if not in terms of what you Really Want?

A: Just in terms of what I regularly, contingently want - how I'm modeling myself right now.

B: But isn't that a privileged model that the AI could figure out and then use to locate your wants? And since these wants are so naturally privileged, wouldn't that make them what you Really Want?

A: The AI could do something like that, but I don't like to think of that as finding out what I Really Want. The result isn't going to be truly unique because I use multiple models of myself, and they're all vague and fallible. And maybe more importantly, programming an AI to understand me "on my own terms" faces a lot of difficult challenges that don't make sense if you think the goal is just to translate what I Really Want into the AI's internal ontology.

B: Like what?

A: You remember the Bay Area train analogy at the end of The Tails Coming Apart as Metaphor for Life? When the train lines diverge, thinking of the problem as "figure out what train we Really Wanted" doesn't help, and might divert people from the possible solutions, which are going to be contingent and sometimes messy.

B: But eventually you actually do follow one of the train lines, or program it into the AI, which uniquely specifies that as what you Really Want! Problem solved.

A: "Whatever I do is what I wanted to do" doesn't help you make choices, though.

Comment by charlie-steiner on Humans aren't agents - what then for value learning? · 2019-03-16T02:09:31.888Z · score: 5 (3 votes) · LW · GW

Could you elaborate on what you mean by "if your model of humans is generative enough to generate itself, then it will assign agency to at least some humans?" I think the obvious extreme is a detailed microscopic model that reproduces human behavior without using the intentional stance - is this a model that doesn't generate itself, or is this a model that assigns agency to some humans?

It seems to me that you're relying on the verb "generate" here to involve some sort of human intentionality, maybe? But the argument of this post is that our intentionality is inexact and doesn't suffice.

Suppose you are building an AI and want something from it. Then you are an agent with respect to that thing, since you want it.

There's wanting, and then there's Wanting. The AI's model of me isn't going to regenerate my Real Wanting, which requires the Essence of True Desire. It's only going to regenerate the fact that I can be modeled as wanting the thing. But I can be modeled as wanting lots of things, is the entire point.

Comment by charlie-steiner on A theory of human values · 2019-03-15T22:55:59.178Z · score: 2 (1 votes) · LW · GW

This has prompted me to get off my butt and start publishing the more useful bits of what I've been thinking about. Long story short, I disagree with you while still almost entirely agreeing with you.

This isn't really the full explanation of why I think the AI can't just be given a human model and told to fill it in, though. For starters, there's also the issue about whether the human model should "live" in the AI's native ontology, or whether it should live in its own separate, "fictional" ontology.

I've become more convinced of the latter - that if you tell the AI to figure out "human values" in a model that's interacting with whatever its best-predicting ontology is, it will come up with values that include things as strange as "Charlie wants to emit CO2" (though not necessarily in the same direction). Instead, its model of my values might need to be described in a special ontology in which human-level concepts are simple but the AI's overall predictions are worse, in order for a predictive human model to actually contain what I'd consider to be my values.

Comment by charlie-steiner on A cognitive intervention for wrist pain · 2019-03-15T22:28:50.898Z · score: 4 (2 votes) · LW · GW

Sure. And my comment is more aimed at the audience than at Richard - I don't know him, and I agree that reducing stress can help, and can help more the more you're stressed. Maybe some parts of his story seem like they could also fit with a story of injury and healing (did you know that wrists feeling strange, swollen or painful at night or after other long periods of stillness can be because of reduced flow of lymph fluid through inflamed wrists?), but they could also fit with his story of stress. I think this is one of those posts that has novelty precisely because the common view is actually right most of the time, and my past self probably needed to take the common view into account more.

Humans aren't agents - what then for value learning?

2019-03-15T22:01:38.839Z · score: 20 (6 votes)
Comment by charlie-steiner on A cognitive intervention for wrist pain · 2019-03-15T20:15:46.610Z · score: 6 (6 votes) · LW · GW

You say " there would be an epidemic of wrist pain at typing-heavy workplaces" as if there isn't a ton of wrist pain at typing-heavy workplaces. And, like, funny how stress is making your wrists hurt rather than your toes or elbows, right?

I think, as one grows old, one gets a better sense that the human body just breaks down sometimes, and doesn't repair itself perfectly. Those horribly injured soldiers you bring up probably had aches and pains sometimes for the rest of their lives that they never really talked about, because other people wouldn't understand. My mom has pain in her left foot sometimes from where she broke it 40 years ago. And eventually, our bodies will just accumulate injuries more and more until we die.

If you have pain that you think is due to wrist inflammation, check out the literature and take action to the degree you can. The mind can control pain quite well, and the human body is tough, but if you do manage to injure yourself you'll regret it.

Comment by charlie-steiner on Open Thread February 2019 · 2019-03-05T18:08:55.086Z · score: 4 (2 votes) · LW · GW

Definitely depends on the field. For experimental papers in the field I'm already in, it only takes like half an hour, and then following up on the references for things I need to know the context for takes an additional 0.5-2 hours. For theory papers 1-4 hours is more typical.

Comment by charlie-steiner on Syntax vs semantics: alarm better example than thermostat · 2019-03-04T20:32:09.354Z · score: 2 (1 votes) · LW · GW

Sure. "If it's smart, it won't make simple mistakes." But I'm also interested in the question of whether, given the first few in this sequence of approximate agents, one could do a good job at predicting the next one.

It seems like you could - like there is a simple rule governing these systems ("check whether there's a human in the greenhouse") that might involve difficult interaction with the world in practice but is much more straightforward when considered from the omniscient third-person view of imagination. And given that this rule is (arguendo) simple within a fairly natural (though not by any means unique) model of the world, and that it helps predict the sequence, one might be able to guess that this rule was likely just from looking at the sequence of systems.

(This also relies on the distinction between just trying to find likely or good-enough answers, and the AI doing search to find weird corner cases. The inferred next step in the sequence might be expected to give similar likely answers, with no similar guarantee for corner-case answers.)

Comment by charlie-steiner on To understand, study edge cases · 2019-03-03T14:00:29.777Z · score: 5 (3 votes) · LW · GW

Is this contra https://www.lesswrong.com/posts/aNaP8eCiKW7wZxpFt/philosophy-as-low-energy-approximation ?

To repeat my example from there: to understand superconductivity, it doesn't help much to smash superconductors into their components, even though smashing things into components helps a lot for understanding atoms. A non-philosophical example from your list where people went to the "extremist" view for a little too long might be mental health before the rise of positive psychology.

Comment by charlie-steiner on So You Want to Colonize The Universe Part 4: Velocity Changes and Energy · 2019-02-27T23:28:04.012Z · score: 2 (1 votes) · LW · GW

Minor nitpick: diamond is only metastable, especially at high temperatures. It will slowly turn to graphite. After sufficient space travel, all diamond parts will be graphite parts.

Comment by charlie-steiner on So You Want to Colonize the Universe Part 2: Deep Time Engineering · 2019-02-27T19:11:45.015Z · score: 5 (3 votes) · LW · GW

There are exceptions. The sea walls in the Netherlands are sized for 10,000-year flood numbers, and I got a pleasant chill up my back when I read that, because there's something really nice about seeing a civilization build for thousands of years in the future.

Or at least people taking expected value mildly seriously and not just copying the engineering standards acceptable for roads.

Comment by charlie-steiner on So You Want to Colonize The Universe · 2019-02-27T18:59:43.916Z · score: 3 (2 votes) · LW · GW

How about deliberately launching probes that you could catch up with by expending more resources, but are still adequate to reach our local group of galaxies? That sounds like a pretty interesting idea. Like "I'm pretty sure moral progress has more or less converged, but in case I really want to change my mind about the future of far-off galaxies, I'll leave myself the option to spend lots of resources to send a faster probe."

If we imagine that the ancient Greeks had the capability to launch a Von Neumann probe to a receding galaxy, I'd rather they do it (and end up with a galaxy full of ancient Greek morality) than they let it pass over the cosmic horizon.

Comment by charlie-steiner on Is LessWrong a "classic style intellectual world"? · 2019-02-27T18:54:16.772Z · score: 4 (3 votes) · LW · GW

Different people write with different goals. Writing is useful for forcing me to think, and to the extent I want attention I want it from a fairly small group. On the other hand, high readership and karma naturally accrue to people who write the sorts of things that get high reader counts.

I have literally zero problems with this natural scale of karma-numbers as long as it's not actively interfering with my goals on the site. Maybe if I was a reader who wanted to use karma-number to sort posts, I would be inconvenienced by the stratification by topic.

Comment by charlie-steiner on How to get value learning and reference wrong · 2019-02-27T15:32:33.855Z · score: 2 (1 votes) · LW · GW

Definitely a parallel sort of move! I would have already said that I was pretty rationality anti-realist, but I seem to have become even more so.

I think if I had to describe how I've changed my mind briefly, it would be something like this: before, I thought that to learn an AI stand-in for human preferences, you should look at the effect on the real world. Now, I take much more seriously the idea that human preferences "live" in a model that is itself a useful fiction.

How to get value learning and reference wrong

2019-02-26T20:22:43.155Z · score: 35 (10 votes)
Comment by charlie-steiner on The Case for a Bigger Audience · 2019-02-23T20:07:05.097Z · score: 5 (3 votes) · LW · GW

I think people are just writing about less accessible things on average. I wouldn't want to have more comments just by not talking about abstruse topics, at the moment. I love you all, but I also love AI safety :P

I briefly looked at the comments within a year on some old LW posts, and the numbers seem to match from then too - only ~25 comments on the more rarefied meta-ethics posts, well below average.

Comment by charlie-steiner on Can an AI Have Feelings? or that satisfying crunch when you throw Alexa against a wall · 2019-02-23T19:27:53.956Z · score: 5 (3 votes) · LW · GW

I'm reminded of Dennett's passage on the color red. Paraphrased:

To judge that you have seen red is to have a complicated visual discrimination process send a simple message to the rest of your brain. There is no movie theater inside the brain that receives this message and then obediently projects a red image on some inner movie screen, just so that your inner homunculus can see it and judge it all over again. Your brain only needs to make the judgment once!

Similarly, if I think I'm conscious, or feel an emotion, it's not like my brain notices and then passes this message "further inside," to the "real me" (the homunculus). My brain only has to go into the "angry state" once - it doesn't have to then send an angry message to the homunculus so that I can really feel anger.

Comment by charlie-steiner on Thoughts on Human Models · 2019-02-22T11:57:19.389Z · score: 2 (1 votes) · LW · GW

I actually think this is pretty wrong (posts forthcoming, but see here for the starting point). You make a separation between the modeled human values and the real human values, but "real human values" are a theoretical abstraction, not a basic part of the world. In other words, real human values were always a subset of modeled human values.

In the example of designing a transit system, there is an unusually straightforward division between things that actually make the transit system good (by concise human-free metrics like reliability or travel time), and things that make human evaluators wrongly think it's good. But there's not such a concise human-free way to write down general human values.

The pitfall of optimization here happens when the AI is searching for an output that has a specific effect on humans. Since you can't remove the fact that a model of humans is involved, avoiding the pitfall means the AI has to evaluate its output in some way other than modeling the human's reaction to it.

Comment by charlie-steiner on Probability space has 2 metrics · 2019-02-10T20:42:49.752Z · score: 3 (2 votes) · LW · GW

Awesome idea! I think there might be something here, but I think the difference between "no chance" and "0.01% chance" is more of a discrete change from not tracking something to tracking it. We might also expect neglect of "one in a million" vs "one in a trillion" in both updates and decision-making, which in the decision-making case causes a mistake opposite to the one this model predicts.
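For concreteness, here's a quick numerical sketch of that "one in a million" vs "one in a trillion" comparison, assuming the two metrics in question are the linear one and the log-odds one (the snippet is just my own illustration): the linear metric says the two probabilities are basically identical, while the log-odds metric says they're an enormous update apart - and it's that second gap the neglect papers over.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

a, b = 1e-6, 1e-12  # "one in a million" vs "one in a trillion"

print(abs(a - b))                # linear distance: ~1e-6, looks negligible
print(abs(logit(a) - logit(b)))  # log-odds distance: ~13.8, an enormous update
```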

Comment by charlie-steiner on Philosophy as low-energy approximation · 2019-02-10T03:14:54.835Z · score: 2 (1 votes) · LW · GW

About 95%. Because philosophy is easy* and full of obvious confusions.

(* After all, anyone can do it well enough that they can't see their own mistakes. And with a little more effort, you can't even see your mistakes when they're pointed out to you. That's, like, the definition of easy, right?)

95% isn't all that high a confidence, if we put aside "how dare you rate yourself so highly?" type arguments for a bit. I wouldn't trust a parachute that had a 95% chance of opening. Most of the remaining 5% is not dualism being true or us needing a new kind of science, it's just me having misunderstood something important.

Comment by charlie-steiner on Philosophy as low-energy approximation · 2019-02-08T19:53:40.625Z · score: 4 (2 votes) · LW · GW

Anyhow, I agree that we have long since been rehashing standard arguments here :P

Comment by charlie-steiner on Philosophy as low-energy approximation · 2019-02-07T22:41:12.593Z · score: 7 (3 votes) · LW · GW

Seeing red is more than a role or disposition. That is what you have left out.

Suppose epiphenomenalism is true. We would still need two separate explanations - one explanation of your epiphenomenal activity in terms of made-up epiphenomenology, and a different explanation for how your physical body thinks it's really seeing red and types up these arguments on LessWrong, despite having no access to your epiphenomena.

The mere existence of that second explanation makes it wrong to have absolute confidence in your own epiphenomenal access. After all, we've just described approximate agents that think they have epiphenomenal access, and type and make facial expressions and release hormones as if they do, without needing any epiphenomena at all.

We can imagine the approximate agent made out of atoms, and imagine just what sort of mistake it's making when it says "no, really, I see red in a special nonphysical way that you have yet to explain" even when it doesn't have access to the epiphenomena. And then we can endeavor not to make that mistake.

If I, the person typing these words, can Really See Redness in a way that is independent or additional to a causal explanation of my thoughts and actions, my only honest course of action is to admit that I don't know about it.

Comment by charlie-steiner on Philosophy as low-energy approximation · 2019-02-06T19:19:07.482Z · score: 7 (3 votes) · LW · GW

I'm supposing that we're conceptualizing people using a model that has internal states. "Agency" of humans is shorthand for "conforms to some complicated psychological model."

I agree that I do see red. That is to say, the collection of atoms that is my body enters a state that plays the same role in the real world as "seeing red" plays in the folk-psychological model of me. If seeing red makes the psychological model more likely to remember camping as a child, exposure to a red stimulus makes the atoms more likely to go into a state that corresponds to remembering camping.

"No, no," you say. "That's not what seeing red is - you're still disagreeing with me. I don't mean that my atoms are merely in a correspondence with some state in an approximate model that I use to think about humans, I mean that I am actually in some difficult to describe state that actually has parts like the parts of that model."

"Yes," I say "- you're definitely in a state that corresponds to the model."

"Arrgh, no! I mean when I see red, I really see it!"

"When I see red, I really see it too."

...

It might at this point be good for me to reiterate my claim from the post, that rather than taking things in our notional world and asking "what is the true essence of this thing?", it's more philosophically productive to ask "what approximate model of the world has this thing as a basic object?"

Comment by charlie-steiner on Philosophy as low-energy approximation · 2019-02-06T18:31:45.357Z · score: 4 (2 votes) · LW · GW

Then the thought experiment is a useful negative result telling us we need something more comprehensive.

Paradigms also outline which negative results are merely noise :P I know it's not nice to pick on people, but look at the negative utilitarians. They're perfectly nice people, they just kept looking for The Answer until they found something they could see no refutation of, and look where that got them.

I'm not absolutely against thought experiments, but I think that high-energy philosophy as a research methodology is deeply flawed.

Comment by charlie-steiner on Philosophy as low-energy approximation · 2019-02-06T14:04:38.576Z · score: 8 (2 votes) · LW · GW

Suppose that we show how certain physical processes play the role of qualia within an abstract model of human behavior. "This pattern of neural activities means we should think of this person as seeing the color red," for instance.

David Chalmers might then say that we have merely solved an "easy problem," and that what's missing is whether we can predict that this person - this actual first-person point of view - is actually seeing red.

This is close to what I parody as "Human physical bodies are only approximate agents, so how does this generate the real Platonic agent I know I am inside?"

When I think of myself as an abstract agent in the abstract state of "seeing red," this is not proof that I am actually an abstract Platonic Agent in the abstract state of seeing red. The person in the parody has been misled by their model of themselves - they model themselves as a real Platonic agent, and so they believe that's what they have to be.

Once we have described the behavior of the approximate agents that are humans, we don't need to go on to describe the state of the actual agents hiding inside the humans.

Comment by charlie-steiner on Philosophy as low-energy approximation · 2019-02-06T13:40:12.134Z · score: 2 (1 votes) · LW · GW

Also replying to:

I am not clear how you are defining HEphil: do you mean (1) that any quest for the ontologically basic is HEphil, or (2) treating mental properties as physical is the only thing that is HEphil ?

Neither of those things is quite what I meant - sorry if I was unclear. The quest for the ontologically basic is what I call "thinking you're like a particle physicist" (not inherently bad, but I make the claim that when done to mental objects it's pretty reliably bad). This is distinct from "high energy philosophy," which I'm trying to use in a similar way to Scott.

High Energy Philosophy is the idea that extreme thought experiments help illuminate what we "really think" about things - that our ordinary low-energy thoughts are too cluttered and dull, but that we can draw out our intuitions with the right thought experiment.

I argue that this is a dangerous line of thought because it's assuming that there exists some "what we really think" that we are uncovering. But what if we're thinking using an approximation that doesn't extend to all possible situations? Then asking what we really think about extreme situations is a wrong question.

[Even worse is when people ignore the fact that the concept is a human invention at all, and try to understand "the true nature of belief" (not just what we think about belief) by conceptual analysis.]

So, now, back to the question of "the correct ethical theory." What, one might ask, is the correct ethical theory that captures what we really value in all possible physical situations (i.e. "extends to high energy")?

Well, one can ask that, but maybe it doesn't have an answer. Maybe, in fact, there is no such object as "what we really value in all possible physical situations" - it might be convenient to pretend there is in order to predict humans using a simple model, but we shouldn't try to push that model too far.

(EDIT: Thanks for asking me these questions / pressing me on these points, by the way.)

Comment by charlie-steiner on Greatest Lower Bound for AGI · 2019-02-05T23:24:41.192Z · score: 2 (3 votes) · LW · GW

2016.

Quantum immortality!

(jk)

Comment by charlie-steiner on AI Safety Prerequisites Course: Revamp and New Lessons · 2019-02-05T23:17:06.617Z · score: 2 (1 votes) · LW · GW

Seems interesting, thanks!

I definitely think machine learning topics are useful. Given that there's so much stuff out there and you can only cover a small fraction of it, maybe recent machine learning topics are a point of comparative advantage, even. The best textbook on set theory is probably pretty good already.

Another service that could take advantage of pre-existing textbooks is short summaries, designed to give people just enough of a taste to make an informed decision about reading said good textbook. Probably easier than developing a course on algorithmic information theory, or circuit complexity, or whatever.

Comment by charlie-steiner on What we talk about when we talk about life satisfaction · 2019-02-05T22:58:52.438Z · score: 2 (1 votes) · LW · GW

I think the way to make sense of this (and of surveys that ask this question) might be tautological: "It's 0-10 on whatever opaque process I use to answer this question."

This makes the absolute number nearly meaningless, though given human habits you can probably figure out approximate emotional valences of 0, 1-3, 4-5, 6-9, and 10. But depending on how stable the average person's opaque mapping of emotional state to number is, it might still yield really interesting cross-time and cross-population comparisons.

Philosophy as low-energy approximation

2019-02-05T19:34:18.617Z · score: 38 (20 votes)
Comment by charlie-steiner on How does Gradient Descent Interact with Goodhart? · 2019-02-02T06:51:47.333Z · score: 9 (4 votes) · LW · GW

In the rocket example, procedures A and B can both be optimized either by random sampling or by local search. A is optimizing some hand-coded rocket specifications, while B is optimizing a complicated human approval model.

The problem with A is that it relies on human hand-coding. If we put in the wrong specifications, and the output is extremely optimized, there are two possible cases: we recognize that this rocket wouldn't work and we don't approve it, or we think that it looks good but are probably wrong, and the rocket doesn't work.

On the upside, if we successfully hand-coded in how a rocket should be, it will output working rockets.

The problem with B is that it's simply the wrong thing to optimize if you want a working rocket. And because it's modeling the environment and trying to find an output that makes the environment-model do something specific, you'll get bad agent-like behavior.

Let's go back to take a closer look at case A. Suppose you have the wrong rocket specifications, but they're "pretty close" in some sense. Maybe the most spec-friendly rocket doesn't function, but the top 0.01% of designs by the program are mostly in the top 1% of rockets ranked by your approval.

The programmed goal is proxy #1. Then you look at some of the sampled (either randomly or through local search) top 0.01% designs for something you think will fly. Your approval is proxy #2. Your goal is the rocket working well.

What you're really hoping for in designing this system is that even if proxy #1 and proxy #2 are both misaligned, their overlap or product is more aligned - more likely to produce an actual working rocket - than either alone.

This makes sense, especially under the model of proxies as "true value + noise," but to the extent that model is violated maybe this doesn't work out.
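As a sanity check on that intuition, here's a toy simulation I'd sketch under exactly that "true value + independent noise" assumption (the Gaussian noise and the cutoffs are made up purely for illustration):

```python
import random

random.seed(0)
N = 100_000

# Each design has a true quality, plus two proxy scores that are
# true quality corrupted by independent noise.
designs = []
for _ in range(N):
    true_value = random.gauss(0, 1)
    proxy1 = true_value + random.gauss(0, 1)  # hand-coded spec score
    proxy2 = true_value + random.gauss(0, 1)  # human-approval score
    designs.append((true_value, proxy1, proxy2))

def mean_true(subset):
    return sum(d[0] for d in subset) / len(subset)

top_by_1 = sorted(designs, key=lambda d: d[1], reverse=True)[:1000]  # top 1% by proxy #1
top_by_2 = sorted(designs, key=lambda d: d[2], reverse=True)[:1000]  # top 1% by proxy #2
combined = sorted(top_by_1, key=lambda d: d[2], reverse=True)[:100]  # proxy #1's picks, filtered by proxy #2

print(mean_true(top_by_1), mean_true(top_by_2), mean_true(combined))
# The combined filter scores higher on true value than either proxy alone,
# because the two noise terms are independent and partially cancel.
```

To the extent the noise terms are instead correlated (both proxies fooled by the same shiny-looking design), the benefit shrinks - that's the "model is violated" case.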

This is another way of seeing what's wrong with case B. Case B just purely optimizes proxy #2, when the whole point of having human approval is to try to combine human approval with some different proxy to get better results.

As for local search vs. random sampling, this is a question about the landscape of your optimized proxy, and how this compares to the true value - neither way is going to be better literally 100% of the time.

If we imagine local optimization like water flowing downhill in the U.S., given a random starting point, the water is much more likely to end up at the mouth of the Mississippi river than it is to end up in Death Valley, even though Death Valley is below sea level. The Mississippi just has a broad network of similar states that lead into it via local optimization, whereas Death Valley is a "surprising" optimum. Under random sampling, you're equally likely to find equal areas of the mouth of the Mississippi or Death Valley.
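If I wanted to make that picture concrete, I'd sketch something like the toy landscape below, where all the specific functions and numbers are invented for illustration: a broad, gentle bowl plus a narrow but deeper pit, with crude gradient descent standing in for water flowing downhill.

```python
import math, random

random.seed(0)

# Toy 2-D elevation map: a broad, gentle bowl (the Mississippi watershed)
# plus a narrow but deeper pit (Death Valley, "below sea level").
def elevation(x, y):
    broad = ((x - 2) ** 2 + y ** 2) / 20
    pit = -3 * math.exp(-((x + 3) ** 2 + y ** 2) / 0.5)
    return broad + pit

def descend(x, y, steps=3000, lr=0.02, eps=1e-4):
    """Numerical gradient descent - water flowing downhill from (x, y)."""
    for _ in range(steps):
        gx = (elevation(x + eps, y) - elevation(x - eps, y)) / (2 * eps)
        gy = (elevation(x, y + eps) - elevation(x, y - eps)) / (2 * eps)
        x, y = x - lr * gx, y - lr * gy
    return x, y

starts = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(300)]
ends = [descend(x, y) for x, y in starts]
in_pit = sum(math.hypot(x + 3, y) < 1 for x, y in ends)
print(f"runs ending in the deep narrow pit: {in_pit}/300")
# The large majority of runs settle in the broad bowl even though the pit is the
# global minimum - basin size, not depth, determines where local search ends up.
# Uniform random sampling, by contrast, lands in each region in proportion to its area.
```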

Applying this to rockets, I would actually expect local search to produce much safer results in case B. Working rockets probably have broad basins of similar almost-working rockets that feed into them in configuration-space, whereas the rocket that spells out a message to the experimenter is quite a bit more fragile to perturbations.

(Even if rockets are so complicated and finicky that we expect almost-working rockets to be rarer than convincing messages to the experimenter, we still might think that the gradient landscape makes gradient descent relatively better.)

In case A, I would expect much less difference between locally optimizing proxy #1 and sampling until it was satisfied. The difference for human approval came because we specifically didn't want to find the unstable, surprising maxima of human approval. And maybe the same is true of our hand-coded rocket specifications, but I would expect this to be less important.

Comment by charlie-steiner on [Link] Did AlphaStar just click faster? · 2019-01-29T00:08:15.856Z · score: 4 (3 votes) · LW · GW

It was using micro effectively, but the crazy 1200+ APM fight was pretty unusual. If you look at most of its fights (e.g. https://youtu.be/H3MCb4W7-kM?t=2854 - the APM number appears intermittently at the center-bottom), with 6-10 units it's using about the same APM as the human. The micro advantage for 98% of the game isn't that it's clicking faster; its clicks are just better.

There were a bunch of mistakes in the first matches shown, but when they trained for twice as long it seemed like those mistakes mostly went away, and its macro play seemed within the range of skilled humans (if you're willing to suspect that overbuilding probes might be good).

Comment by charlie-steiner on [Link] Did AlphaStar just click faster? · 2019-01-28T21:55:40.273Z · score: 14 (7 votes) · LW · GW

Well, it definitely may have had an advantage that embodied humans can't have. "Does perfect stalker micro really count as intelligence?", we wail. But you have to remember that previous StarCraft bots playing with completely unrestricted APM weren't even close to competitive level. I think that the evidence is pretty strong that AlphaStar (at least the version without attention that just perceived the whole map) could beat humans under whatever symmetric APM cap you want.

Comment by charlie-steiner on Alignment Newsletter #42 · 2019-01-24T02:01:17.018Z · score: 8 (3 votes) · LW · GW

I want to like the AI alignment podcast, but I feel like Lucas is over-emphasizing thinking new thoughts and asking challenging questions. When this goes wrong (it felt like ~30% of this episode, but maybe this is a misrecollection), it ends up taking too much effort to understand for not enough payoff.

There are sometimes some big words and abstruse concepts dragged in, just to talk about some off-the-cuff question and then soon jump to another. But I don't think that's what I want as a member of the audience. I'd prefer a gentler approach that used smaller words and broke down the abstruse concepts more, and generated interest by delving into interesting parts of "obvious" questions, rather than jumping to new questions.

In short, I think I'd prefer a more curiosity-based interviewing style - I like it most when he's asking the guests what they think, and why they think that, and what they think is important. I don't know if you (dear reader) have checked out Sean Carroll's podcast, but his style is sort of an extreme of this.

Comment by charlie-steiner on Why not tool AI? · 2019-01-20T08:01:52.760Z · score: 7 (4 votes) · LW · GW

Any time you have a search process (and, let's be real, most of the things we think of as "smart" are search problems), you are setting a target but not specifying how to get there. I think the important sense of the word "agent" in this context is that it's a process that searches for an output based on the modeled consequences of that output.

For example, if you want to colonize the upper atmosphere of Venus, one approach is to make an AI that evaluates outputs (e.g. text outputs of persuasive arguments and technical proposals) based on some combined metric of how much Venus gets colonized and how much it costs. Because it evaluates outputs based on their consequences, it's going to act like an agent that wants to pursue its utility function at the expense of everything else.

Call the above output "the plan" - you can make a "tool AI" that still outputs the plan without being an agent!

Just make it so that the plan is merely part of the output - the rest is composed according to some subprogram that humans have designed for elucidating the reasons the AI chose that output (call this the "explanation"). The AI predicts the results as if its output was only the plan, but what humans see is both the plan and the explanation, so it's no longer fulfilling the criterion for agency above.
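As a minimal sketch of what I mean (every function name and the toy world model here are hypothetical stand-ins, not a real proposal), the "tool" version below contains exactly the same consequence-driven search as the agent version - it just appends the explanation afterwards:

```python
# Hypothetical stand-ins for the pieces described above.

def predict_consequences(world_model, plan):
    # Modeled outcome of acting on the plan alone (the explanation isn't modeled).
    return world_model[plan]

def score(outcome):
    return outcome  # stand-in for "how much Venus gets colonized, minus cost"

def explain(plan, world_model):
    # Human-designed reason-elucidator, not optimized by the search.
    return f"Chose {plan!r}; its modeled outcome scores {score(predict_consequences(world_model, plan))}."

def agent_ai(plans, world_model):
    # Agent: pick the output whose modeled consequences score best.
    return max(plans, key=lambda p: score(predict_consequences(world_model, p)))

def tool_ai(plans, world_model):
    plan = agent_ai(plans, world_model)       # the same search process
    return plan, explain(plan, world_model)   # plus non-search-optimized output

world_model = {"modest proposal": 1.0, "extreme proposal": 100.0}
print(agent_ai(list(world_model), world_model))
print(tool_ai(list(world_model), world_model))
# Both pick "extreme proposal": adding the explainer changes what the humans see,
# but not which plan the underlying search lands on.
```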

In this example, the plan is a bad idea in both cases - the thing you programmed the AI to search for is probably something that's bad for humanity when taken to an extreme. It's just that in the "tool AI" case, you've added some extra non-search-optimized output that you hope undoes some of the work of the search process.

Making your search process into a tool by adding the reason-elucidator hopefully made it less disastrously bad, but it didn't actually get you a good plan. The problems that you need to solve to get a superhumanly good plan are in fact the same problems you'd need to solve to make the agent safe.

(Sidenote: This can be worked around by giving your tool AI a simplified model of the world and then relying on humans to un-simplify the resulting plan, much like Google Maps makes a plan in an extremely simplified model of the world and then you follow something that sort of looks like that plan. This workaround fails when the task of un-simplifying the plan becomes superhumanly difficult, i.e. right around when things get really interesting, which is why imagining a Google-Maps-like list of safe abstract instructions might be building a false intuition.)

In short, to actually find out the superintelligently awesome plan to solve a problem, you have to have a search process that's looking for the plan you want. Since this sounds a lot like an agent, and an unfriendly agent is one of the cases we're most concerned about, it's easy and common to frame this in terms of an agent.

Comment by charlie-steiner on The reward engineering problem · 2019-01-17T04:11:07.161Z · score: 1 (1 votes) · LW · GW

I'm not 100% sold on explaining actions as a solution here. It seems like the basic sorts of "attack" (exploiting human biases or limitations, sending an unintended message to the supervisor, sneaking a message to a 3rd party that will help control the reward signal) still work fine - so long as the search process includes the explainer as part of the environment. And if it doesn't, we run into the usual issue with such schemes: the AI predictably gets its predictions wrong, and so you need some guarantee that you can keep this AI and its descendants in this unnatural state.

Comment by charlie-steiner on The Mirror Chamber: A short story exploring the anthropic measure function and why it can matter · 2019-01-13T01:28:42.989Z · score: 4 (3 votes) · LW · GW

I'll probably end up mostly agreeing with Integrated Information theories

Ah... x.x Maybe check out Scott Aaronson's blog posts on the topic (here and here)? I'm definitely more of the Dennettian "consciousness is a convenient name for a particular sort of process built out of lots of parts with mental functions" school.

Anyhow, the reason I focused on drawing boundaries to separate my brain into separate physical systems is mostly historical - I got the idea from the Ebborians (further rambling here. Oh, right - I'm Manfred). I just don't find mere mass all that convincing as a reason to think that some physical system's surroundings are what I'm more likely to see next.

Intuitively it's something like a symmetry of my information - if I can't tell anything about my own brain mass just by thinking, then I shouldn't assign my probabilities as if I have information about my brain mass. If there are two copies of me, one on Monday with a big brain and one on Tuesday with a small brain, I don't see much difference in sensibleness between "it should be Monday because big brains are more likely" and "I should have a small brain because Tuesday is an inherently more likely day." It just doesn't compute as a valid argument for me without some intermediate steps that look like the Ebborians argument.

Comment by charlie-steiner on The Mirror Chamber: A short story exploring the anthropic measure function and why it can matter · 2019-01-12T09:42:24.537Z · score: 1 (1 votes) · LW · GW

It's about trying to figure out what's implied about your brain by knowing that you exist.

It's also about trying to draw some kind of boundary with "unknown environment to interact with and reason about" on one side and "physical system that is thinking and feeling" on the other side. (Well, only sort of.)

Treating a merely larger brain as more anthropically important is equivalent to saying that you can draw this boundary inside the brain (e.g. dividing big neurons down the middle), so that part of the brain is the "reasoner" and the rest of the brain, along with the outside, is the environment to be reasoned about.

This boundary can be drawn, but I think it doesn't match my self-knowledge as well as drawing the boundary based on my conception of my inputs and outputs.

My inputs are sight, hearing, proprioception, etc. My outputs are motor control, hormone secretion, etc. The world is the stuff that affects my inputs and is affected by my outputs, and I am the thing doing the thinking in between.

If I tried to define "I" as the left half of all the neurons in my head, suddenly I would be deeply causally connected to this thing (the right halves of the neurons) I have defined as not-me. These causal connections are like a huge new input and output channel for this defined-self - a way for me to be influenced by not-me, and influence it in turn. But I don't notice this or include it in my reasoning - Paper and Scissors in the story are so ignorant about it that they can't even tell which of them has it!

So I claim that I (and they) are really thinking of themselves as the system that doesn't have such an interface, and just has the usual suite of senses. This more or less pins down the thing doing my thinking as the usual lump of non-divided neurons, regardless of its size.

Comment by charlie-steiner on The Mirror Chamber: A short story exploring the anthropic measure function and why it can matter · 2019-01-12T00:13:13.942Z · score: 3 (3 votes) · LW · GW

Very beautiful! Though see here, particularly footnote 1. I think there are pretty good reasons to think that our ability to locate ourselves as persons (and therefore our ability to have selfish preferences) doesn't depend on brain size or even redundancy, so long as the redundant parts are causally yoked together.

Comment by charlie-steiner on AlphaGo Zero and capability amplification · 2019-01-10T08:15:51.995Z · score: 1 (1 votes) · LW · GW

This is true when getting training data, but I think it's a difference between A (or HCH) and AlphaGo Zero when doing simulation / amplification. Someone wins a simulated game of Go even if both players are making bad moves (or even random moves), which gives you a signal that A doesn't have access to.

Comment by charlie-steiner on AlphaGo Zero and capability amplification · 2019-01-10T00:21:06.852Z · score: 1 (1 votes) · LW · GW

Oh, I've just realized that the "tree" was always intended to be something like task decomposition. Sorry about that - that makes the analogy a lot tighter.

Comment by charlie-steiner on Reframing Superintelligence: Comprehensive AI Services as General Intelligence · 2019-01-09T03:46:41.535Z · score: 1 (1 votes) · LW · GW

Thanks for the summary! I agree that this is missing some extra consideration for programs that are planning / searching at test time. We normally think of Google Maps as non-agenty, "tool-like," "task-directed," etc, but it's performing a search for the best route from A to B, and capable of planning to overcome obstacles - as long as those obstacles are within the ontology of its map of ways from A to B.

A thermostat is dumber than Google Maps, but its data is more closely connected to the real world (local temperature rather than general map), and its output is too (directly controlling a heater rather than displaying directions). If we made a "Google Thermostat Maps" website that let you input your thermostat's state, and showed you a heater control value, it would perform the same computations as your thermostat but lose its apparent agency. The condition for us treating the thermostat like an agent isn't just what computation it's doing, it's that its input, search (such as it is), and output ontologies match and extend into the real world well enough that even very simple computation can produce behavior suitable for the intentional stance.

Comment by charlie-steiner on AlphaGo Zero and capability amplification · 2019-01-09T03:25:36.946Z · score: 3 (2 votes) · LW · GW

MCTS works as amplification because you can evaluate future board positions to get a convergent estimate of how well you're doing - and then eventually someone actually wins the game, which keeps p from departing reality entirely. Importantly, the single thing you're learning can play the role of the environment, too, by picking the opponents' moves.

In trying to train A to predict human actions given access to A, you're almost doing something similar. You have a prediction that's also supposed to be a prediction of the environment (the human), so you can use it for both sides of a tree search. But A isn't actually searching through an interesting tree - it's searching for cycles of length 1 in its own model of the environment, with no particular guarantee that any cycles of length 1 exist or are a good idea. "Tree search" in this context (I think) means spraying out a bunch of outputs and hoping at least one falls into a fixed point upon iteration.

EDIT: Big oops, I didn't actually understand what was being talked about here.

Comment by charlie-steiner on Coherence arguments do not imply goal-directed behavior · 2019-01-03T23:35:07.330Z · score: 3 (2 votes) · LW · GW

Sorry, this was a good response to my confused take - I promised myself I'd write a response but only ended up doing it now :)

I think the root of my disagreeing-feeling is that when I talk about things like "it cares" or "it values," I'm in a context where the intentional stance is actually doing useful work - thinking of some system as an agent with wants, plans, goals, etc. is in some cases a useful simplification that helps me better predict the world. This is especially true when I'm just using the words informally - I can talk about the constantly-twitching agent wanting to constantly twitch, when using the words deliberately, but I wouldn't use this language intuitively, because it doesn't help me predict anything the physical stance wouldn't. It might even mislead me, or dilute the usefulness of intentional stance language. This conflict with intuition is a lot of what's driving my reaction to this argument.

The other half of the issue is that I'm used to thinking of intentional-stance features as having cognitive functions. For example, if I "believe" something, this means that I have some actual physical pattern inside me that performs the function of a world-model, and something like plans, actions, or observations that I check against that world-model. The physical system that constantly twitches can indeed be modeled by an agent with a utility function over world-histories, but that agent is in some sense an incorporeal soul - the physical system itself doesn't have the cognitive functions associated with intentional-stance attributes (like "caring about coherence").

Comment by charlie-steiner on Perspective Reasoning and the Sleeping Beauty Problem · 2019-01-03T23:06:19.527Z · score: 1 (1 votes) · LW · GW

Sorry for the slow reply. I eventually looked though the pdf, but really just wanted to argue one more time against this idea that "what day is it?" is not a valid statement. Today is Thursday. This is a fact that has very strong predictive powers. I can test it by checking the internet or asking someone for the date. It is part of the external world, just as much as "did I go dancing on New Year's Eve 2018?" or "when I flip this coin, will it land Heads?"

It seems to me like you're treating Sleeping Beauty as being some sort of communal consciousness encompassing all of her copies, and there's no fact of the matter about what "today" is for this communal consciousness. But that's not actually what's happening - each instance of Sleeping Beauty is a complete person, and gets to have beliefs about the external world just as good as any other person's.

You might also think of the problem in terms of Turing machines that get fed a tape of observations, trying to predict the next observation. By "Today is Thursday" I mean that the state of my Turing machine is such that I predict I'll see "Thursday" when I click on the time at the bottom-left of my computer screen (there is, after all, no real thursday-ness that I could identify by doing a fundamental physics experiment). The copies of Sleeping Beauty can be thought of as identical Turing machines that have been fed identical observations so far, but can't tell for certain what they will observe when they look at a calendar.

Comment by charlie-steiner on Card Balance and Artifact · 2018-12-28T16:25:47.389Z · score: 2 (2 votes) · LW · GW

What a well-balanced card pool on the card level means is that players are confronted with many more meaningful choices about what gameplan they want to be executing, which makes the draft and deckbuilding experience a lot richer. If you're a constructed player who just plays the meta deck, it doesn't matter so much, I agree.

Comment by charlie-steiner on Card Collection and Ownership · 2018-12-27T17:55:19.143Z · score: 2 (2 votes) · LW · GW

I did end up getting Artifact. In the patch notes they said something that made me very hopeful, which was something like "players want a good game above all else." Balancing cards makes the game itself better, therefore you should do it - that is, if you're trying to appeal to players like me who care less about treating the cards as physical objects or investments. I agree that balancing cards is going to reduce incentive to trade, and will happily bet that cosmetic monetization is coming.

Comment by charlie-steiner on Kindergarten in NYC: Much More than You Wanted to Know · 2018-12-26T05:19:20.719Z · score: 1 (1 votes) · LW · GW

I went to an elementary school in Michigan that was 60% free and reduced lunch, no art class (music starting in 3rd grade), half an hour of recess per day, rooms that felt cramped when I went back to visit them as a teenager. I went to a middle school that had slit-like windows, indicating that it had been built (or, for some parts, renovated) in the 1970s. Some of these things may not matter to your child, as they more or less didn't matter for me.

What made my public school district work for me was that I could find ~4 friends in it, that they made liberal use of tracking and partial-day advanced programs to help meet my needs, and that they fed me and kept me from dying of exposure while I read books. More or less full stop. My point is that things that seem important (particularly facilities) might not be, and that looking at my schools in terms of averages would have been misleading when I ended up in a rather specialized pocket of it.

Oh, and (more unsolicited opinion) going to social justice school probably won't hurt your kid. How much propaganda do you remember from 2nd grade?

Comment by charlie-steiner on The Pavlov Strategy · 2018-12-25T00:24:42.011Z · score: 5 (4 votes) · LW · GW

Huh, interesting paper. That's 1993 - is there a more modern version with more stochastic parameters explored? Seems like an easy paper if not.

I'm also reminded of how computer scientists often end up doing simulations rather than basic math. This seems like a complicated system of equations, but maybe you could work out its properties with a couple of hours and basic nonlinear dynamics knowledge.

Can few-shot learning teach AI right from wrong?

2018-07-20T07:45:01.827Z · score: 16 (5 votes)

Boltzmann Brains and Within-model vs. Between-models Probability

2018-07-14T09:52:41.107Z · score: 19 (7 votes)

Is this what FAI outreach success looks like?

2018-03-09T13:12:10.667Z · score: 53 (13 votes)

Book Review: Consciousness Explained

2018-03-06T03:32:58.835Z · score: 101 (27 votes)

A useful level distinction

2018-02-24T06:39:47.558Z · score: 26 (6 votes)

Explanations: Ignorance vs. Confusion

2018-01-16T10:44:18.345Z · score: 18 (9 votes)

Empirical philosophy and inversions

2017-12-29T12:12:57.678Z · score: 8 (3 votes)

Dan Dennett on Stances

2017-12-27T08:15:53.124Z · score: 8 (4 votes)

Philosophy of Numbers (part 2)

2017-12-19T13:57:19.155Z · score: 11 (5 votes)

Philosophy of Numbers (part 1)

2017-12-02T18:20:30.297Z · score: 25 (9 votes)

Limited agents need approximate induction

2015-04-24T21:22:26.000Z · score: 1 (1 votes)