Posts

Impossible moral problems and moral authority 2019-11-18T09:28:28.766Z · score: 15 (11 votes)
What's the dream for giving natural language commands to AI? 2019-10-08T13:42:38.928Z · score: 9 (3 votes)
The AI is the model 2019-10-04T08:11:49.429Z · score: 12 (10 votes)
Can we make peace with moral indeterminacy? 2019-10-03T12:56:44.192Z · score: 17 (5 votes)
The Artificial Intentional Stance 2019-07-27T07:00:47.710Z · score: 14 (5 votes)
Some Comments on Stuart Armstrong's "Research Agenda v0.9" 2019-07-08T19:03:37.038Z · score: 22 (7 votes)
Training human models is an unsolved problem 2019-05-10T07:17:26.916Z · score: 16 (6 votes)
Value learning for moral essentialists 2019-05-06T09:05:45.727Z · score: 13 (5 votes)
Humans aren't agents - what then for value learning? 2019-03-15T22:01:38.839Z · score: 20 (6 votes)
How to get value learning and reference wrong 2019-02-26T20:22:43.155Z · score: 40 (10 votes)
Philosophy as low-energy approximation 2019-02-05T19:34:18.617Z · score: 40 (21 votes)
Can few-shot learning teach AI right from wrong? 2018-07-20T07:45:01.827Z · score: 16 (5 votes)
Boltzmann Brains and Within-model vs. Between-models Probability 2018-07-14T09:52:41.107Z · score: 19 (7 votes)
Is this what FAI outreach success looks like? 2018-03-09T13:12:10.667Z · score: 53 (13 votes)
Book Review: Consciousness Explained 2018-03-06T03:32:58.835Z · score: 101 (27 votes)
A useful level distinction 2018-02-24T06:39:47.558Z · score: 26 (6 votes)
Explanations: Ignorance vs. Confusion 2018-01-16T10:44:18.345Z · score: 18 (9 votes)
Empirical philosophy and inversions 2017-12-29T12:12:57.678Z · score: 8 (3 votes)
Dan Dennett on Stances 2017-12-27T08:15:53.124Z · score: 8 (4 votes)
Philosophy of Numbers (part 2) 2017-12-19T13:57:19.155Z · score: 11 (5 votes)
Philosophy of Numbers (part 1) 2017-12-02T18:20:30.297Z · score: 25 (9 votes)
Limited agents need approximate induction 2015-04-24T21:22:26.000Z · score: 1 (1 votes)

Comments

Comment by charlie-steiner on Is backwards causation necessarily absurd? · 2020-01-15T14:41:02.554Z · score: 2 (1 votes) · LW · GW

I think we can go a bit farther in predicting that backwards causation will be a useful concept in some very specific cases, which will break down far above the scale of the normal second law.

We "see" backwards causation when we know the outcome but not how the system will get there. What does this behavior sound like a hallmark of? Optimization processes! We can predict in advance that backwards causation will be a useful idea to talk about the behavior of some optimization processes, but that it will stop contributing useful information when we want to zoom in past the "intentional stance" level of description.

Comment by charlie-steiner on Ascetic aesthetic · 2020-01-15T14:15:21.871Z · score: 2 (1 votes) · LW · GW

I thought "aesthetics come from facts" was going to go off into evolutionary psychology. Health being good for our genes is a fact that explains why (without explaining away) health is aesthetically better than sickness (for most people), etc.

Comment by charlie-steiner on Artificial Intelligence and Life Sciences (Why Big Data is not enough to capture biological systems?) · 2020-01-15T14:07:39.146Z · score: 1 (2 votes) · LW · GW

As with many film franchises, the first Jurassic Park movie is actually titled "Jurassic Park."

Comment by charlie-steiner on Open & Welcome Thread - January 2020 · 2020-01-13T06:44:54.963Z · score: 3 (2 votes) · LW · GW

I bet you'd like Jim Guthrie.

https://jimguthrie.bandcamp.com/album/takes-time

I'm basically thinking of half the tracks on this album. "Taking My Time," "Difference a Day Makes," "Before and After," "The Rest is Yet To Come," "Don't Be Torn," and "Like a Lake."

An unexplainable thing
I'll have to change to stay the same
Just like this bottle of wine
It's gonna take time no doubt

It's not hard
Letting go
But it's hard
Even so

And you say
‘Come here and sit down
Don't try to own it all’

And you said ‘The rest is yet to come’
I said ‘Don't you mean the best?’
You said ‘We're making a huge mess’
Won't lay down. Won't confess
All burnt out and won't succumb
Ah but the rest has yet to come

Comment by charlie-steiner on Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal · 2020-01-10T19:14:25.059Z · score: 2 (1 votes) · LW · GW

Note for anyone who (like me) wanted to know what the Kesselman Estimative Words are:

Almost certain: 86-99%

Highly likely: 71-85%

Likely: 56-70%

Chances a little better [or less] than even: 46-55%

Unlikely: 31-45%

Highly unlikely: 16-30%

Remote: 1-15%

Comment by charlie-steiner on What's the dream for giving natural language commands to AI? · 2020-01-06T00:43:26.715Z · score: 2 (1 votes) · LW · GW

Ah, I see what you mean. Yes, this is a serious problem, but (I think) this scheme does have forces that act against it - which makes more sense if you imagine what supervised vs unsupervised learning does to our encoder/decoder. (As opposed to lumping everything together into a semi-supervised training process.)

Supervised learning is the root of the problem, because the most accurate way to predict the supervised text from the world state is to realize that it's the output of a specific physical process (the keyboard). If we only had supervised learning, we'd have to make the training optimum different from the most accurate prediction, by adding a regularization term and then crossing our fingers that we'd correctly set the arbitrary parameters in it.

But the other thing going on in the scheme is that the AI is trying to compress text and sensory experience to the same representation using unsupervised learning. This is going to help to the extent that language shares important patterns with the world.

For example, if the AI hacks its text channel so that it's just a buffer full of "Human values are highly satisfied," this might (in the limit of lots of data and compute) make supervised learning happy. But unsupervised learning just cares about the patterns it discovered that language and the world share.

(Though now that I think about it, in the limit of infinite compute, unsupervised learning also discovers the relationship between the text and the physical channel. But it still also cares about the usual correspondence between description and reality, and seems like it should accurately make a level distinction between reality and the text, so I need to think about whether this matters)

To the unsupervised learning, hacking the text channel looks (to the extent that you can do translation by compressing to a shared representation) like the sort of thing that might be described by sentences like "The AI is just sitting there" or "A swarm of nanomachines has been released to protect the text channel," not "Human values are highly satisfied."

So why consider supervised text/history pairs at all? Well, I guess just because supervised learning is way more efficient at picking out something that's at least sort of like the correspondence that we mean. Not just as a practical benefit - there might be multiple optima that unsupervised learning could end up in, and I think we want something close-ish to the supervised case.
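
For concreteness, here's a minimal sketch of the kind of two-encoder setup I'm imagining (PyTorch-ish; all module sizes, names, and loss weights are made up, not a real proposal):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT = 128  # size of the shared abstract representation (arbitrary)

class SharedRepModel(nn.Module):
    def __init__(self, text_dim=300, world_dim=1024):
        super().__init__()
        # Two encoder/decoder pairs stitched together at the latent bottleneck.
        self.text_enc = nn.Sequential(nn.Linear(text_dim, 256), nn.ReLU(), nn.Linear(256, LATENT))
        self.text_dec = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(), nn.Linear(256, text_dim))
        self.world_enc = nn.Sequential(nn.Linear(world_dim, 256), nn.ReLU(), nn.Linear(256, LATENT))
        self.world_dec = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(), nn.Linear(256, world_dim))

    def loss(self, text, world, paired):
        z_text, z_world = self.text_enc(text), self.world_enc(world)
        # Unsupervised part: each modality reconstructs itself through the bottleneck,
        # so the latent space is shaped by the patterns language and the world share.
        recon = F.mse_loss(self.text_dec(z_text), text) + F.mse_loss(self.world_dec(z_world), world)
        # Supervised part: annotated text/history pairs pull the two encodings together.
        align = F.mse_loss(z_text, z_world) if paired else torch.tensor(0.0)
        return recon + 0.1 * align  # 0.1 is an arbitrary weighting
```

The point of the sketch is just that the supervised pairs only have to pin down which latent directions correspond to which, while most of the structure of the representation comes from the unsupervised reconstruction tasks.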

Comment by charlie-steiner on Characterising utopia · 2020-01-02T23:21:36.366Z · score: 4 (2 votes) · LW · GW

It occurs to me that the pleasure/displeasure balance differs radically for the senses.

Obviously for our sense of pain, displeasure is favored / has more of the range, though it's also possible to have "a good ache" or other uses of pain as part of a pleasurable experience.

But for taste, I'm pretty sure I've never tasted anything even a quarter as bad as beef Wellington is good. This includes raw bugs, etc. Taste just seems to be more about signalling good things than bad.

Smell and touch seem to be slightly positive-biased as well, while temperature sense seems biased negative. Sight and hearing seem positive-biased in normal circumstances but have lots of negative range in damaging/painful conditions that sometimes crop up.

Comment by charlie-steiner on Does GPT-2 Understand Anything? · 2020-01-02T22:37:27.402Z · score: 4 (3 votes) · LW · GW

Nice analysis!

The only part I'm skeptical about is the qualia argument. If it's supposed to be ineffable, why be sure it doesn't have it? If it's effable after all, then we can be more specific: for example, we might want words to be associated with abstract representations of sensory experience, which can be used for things like imagination, composition with other concepts, or top-down control.

Comment by charlie-steiner on What will quantum computers be used for? · 2020-01-02T21:51:16.572Z · score: 3 (2 votes) · LW · GW

Quantum computers aren't going to be owned by individuals for a long time, because of the requirement of cryogenic cooling. Instead, you'll rent time on a quantum computer, or use a service that itself uses a quantum computer.

Anyhow, what might be the equivalent of looking at ENIAC and predicting Flappy Bird? Audio processing software that uses fast Fourier transforms, maybe? Least-action optimization algorithms that power the AI that recommends what videos you should watch?

It's hard; quantum computation is just more specialized (in our high-temperature environment) than classical computation.

Comment by charlie-steiner on What is a Rationality Meetup? · 2020-01-02T20:15:22.590Z · score: 7 (4 votes) · LW · GW

It's a meetup for people interested in psychology, human biases, science, AI, that sort of thing.

My-parents-approved.

Comment by charlie-steiner on What's the dream for giving natural language commands to AI? · 2020-01-02T19:46:42.050Z · score: 2 (1 votes) · LW · GW

Sorta the right ballpark. Lack of specificity is definitely my fault - I have more sympathy now for those academics who have a dozen publications that are restatements of the same thing.

I'm a bit more specific in my reply to steve2152 above. I'm thinking about this scheme as a couple of encoder-decoders stitched together at the point of maximal compression, which can do several different encoding/decoding tasks and therefore can be (and for practical purposes should be) trained on several different kinds of data.

For example, it can encode sensory information into an abstract representation, and then decode it back, so you can train that task. It can encode descriptive sentences into the same representation, and then decode them back, so you can train that task. This should reduce the amount of actual annotated text-sensorium pairs you need.

As for what to tell it to pattern-match for as a good state, I was thinking with a little subtlety, but not much. "You did what we wanted" is too bare bones; it will try to change what we want. But I think we might get it to do metaethics for us by talking about "human values" in the abstract, or maybe "human values as of 2020." And I don't think it can do much harm to further specify things like enjoyment, interesting lives, friendship, love, learning, sensory experience, etc etc.

This "wish" picks out a vector in the abstract representation space for the AI to treat as the axis of goodness. And the entire dream is that this abstract space encodes enough of common sense that small perturbations of the vector won't screw up the future. Which, now that I say it like that, sounds like the sort of thing that should imply some statistical properties we could test for.

Comment by charlie-steiner on [AN #78] Formalizing power and instrumental convergence, and the end-of-year AI safety charity comparison · 2019-12-26T04:20:35.079Z · score: 4 (2 votes) · LW · GW

Well, how many degrees of freedom are in the state space?

Comment by charlie-steiner on 2019 AI Alignment Literature Review and Charity Comparison · 2019-12-21T11:30:26.791Z · score: 9 (5 votes) · LW · GW

An excellent exterior scoop.

If I had to point out one more research avenue from the past year that I find interesting, it would be the application of the predictive processing model of cognition to AI safety. One post from Jan Kulveit (FHI), one post from G Gordon Worley (PAISRI, which appears to be a one man organization at the moment).

I'm also only like 85% sure that I'm not among those referred to as "just learn human values with an RNN." So on that 15% chance, I would like to stress that although it's definitely something I'm thinking about, I'm just trying to nail down the details so that it's specific enough to poke holes in. Honest!

Comment by charlie-steiner on Free Speech and Triskaidekaphobic Calculators: A Reply to Hubinger on the Relevance of Public Online Discussion to Existential Risk · 2019-12-21T02:12:50.248Z · score: 11 (9 votes) · LW · GW

AI value alignment is a hard problem, definitely. But it has one big advantage over politics, if we're shopping for problems: it hasn't been quite so optimized by memetic evolution for ability to take over a conversation.

I think talking about gender issues on LW, that time everyone mostly talked about gender issues for a while, was good (not as solving anything, but as a political act on the object level). But also, saying things like "we should be able to solve politics" is how you get struck by memetic Zeus' lightning. SSC has a subreddit and another subreddit for a reason, and that reason isn't because rationalists are so good at solving politics that they need two whole subreddits to do it in.

Comment by charlie-steiner on We run the Center for Applied Rationality, AMA · 2019-12-19T22:17:13.809Z · score: 35 (13 votes) · LW · GW

How much interesting stuff do you think there is in your curriculum that hasn't percolated into the community? What's stopping said percolation?

Comment by charlie-steiner on Neural networks as non-leaky mathematical abstraction · 2019-12-19T22:05:51.287Z · score: 3 (2 votes) · LW · GW

Interesting perspective, thanks for crossposting!

Comment by charlie-steiner on Is Causality in the Map or the Territory? · 2019-12-19T21:16:37.945Z · score: 2 (1 votes) · LW · GW

I think another related qualitative intuition is constructive vs. nonconstructive. "Just turn the knob" is simple and obvious enough to you to be regarded as constructive, not leaving any parts unspecified for a planner to compute. "Just set the voltage to 10V" seems nonconstructive - like it would require further abstract thought to make a plan to make the voltage be 10V. But as we've learned, turning knobs is a fairly tricky robotics task, requiring plenty of thought - just thought that's unconscious in humans.

Comment by charlie-steiner on "You can't possibly succeed without [My Pet Issue]" · 2019-12-19T05:46:08.623Z · score: 19 (10 votes) · LW · GW

You know, I think my favorite thing about internet rationalists is when they notice a bias and go "I wonder if I can notice this in myself to avoid being wrong" rather than "How can I use this to win arguments about current hot topics."

Comment by charlie-steiner on Is Causality in the Map or the Territory? · 2019-12-18T13:57:19.667Z · score: 4 (2 votes) · LW · GW

I'm not totally sure I'm objecting to anything. For something that thinks about and interacts with the world more or less like a human, I agree that turning a knob is probably an objectively better affordance than e.g. selecting the location of each atom individually.

You could even phrase this as an objective fact: "for agents in some class that includes humans, there are certain guidelines for constructing causal models that, if obeyed, lead to them being better predictors than if not." This would be a fact about the territory. And it would tell you that if you were like a human, and wanted to predict the effect of your actions, there would be some rules your map would follow.

And then if your map did follow those rules, that would be a fact about your map.


Comment by charlie-steiner on Is Causality in the Map or the Territory? · 2019-12-18T02:36:03.654Z · score: 5 (3 votes) · LW · GW

The knob on the current supply is part of the territory, but the fact that "being able to adjust the knob" counts as an affordance while "setting the knob so that it outputs a desired voltage" doesn't (or at least is a less central example) is part of our map.

The other thing this reminds me of is the reductionist point (Sean Carroll video for laypeople here) that the laws of physics seem to be simplest when thought of not in terms of causes, but in terms of differential equations that enforce a pattern that holds between past and future.

Comment by charlie-steiner on When would an agent do something different as a result of believing the many worlds theory? · 2019-12-16T07:55:07.848Z · score: -5 (3 votes) · LW · GW

(I think this is a good chance for you to think of an answer yourself.)

Comment by charlie-steiner on When would an agent do something different as a result of believing the many worlds theory? · 2019-12-15T09:20:22.409Z · score: 3 (3 votes) · LW · GW

If they are put into an interferometer, someone who thinks the wavefunction has collapsed would think, while in the middle, that they have a 50/50 chance of coming out of each arm, while an Everettian will make choices as if they might deterministically come out of one arm (depending on the construction of the interferometer).

The difficulty of putting humans into interferometers is more or less why this doesn't matter much. Though of course "pragmatism" shouldn't stop us from applying Occam's razor.

Comment by charlie-steiner on Values, Valence, and Alignment · 2019-12-14T06:02:00.166Z · score: 4 (2 votes) · LW · GW

This was definitely an interesting and persuasive presentation of the idea. I think this goes to the same place as learning from behavior in the end, though.

For behavior: In the ancestral environment, we behaved like we wanted nourishing food and reproduction. In the modern environment we behave like we want tasty food and sex. Given a button that pumps heroin into our brain, we might behave like we want heroin pumped into our brains.

For valence, the set of preferences that optimizing valence cashes out to depends on the environment. We, in the modern environment, don't want to be drugged to maximize some neural signal. But if we were raised on super-heroin, we'd probably just want super-heroin. Even assuming this single-neurological-signal hypothesis, we aren't valence-optimizers; we are the learned behavior of a system whose training procedure relies on the valence signal.

Ex hypothesi, we're going to have learned preferences that won't optimize valence, but might still be understandable in terms of a preference maturation process that is "trying" to optimize valence but ran into distributional shift or adversarial optimization or something. These preferences (like refusing the heroin) are still fully valid human preferences, and you're going to need to look at human behavior to figure out what they are (barring big godlike a priori reasoning), which entails basically similar philosophical problems as getting all values from behavior without this framework.

Comment by charlie-steiner on Examples of Causal Abstraction · 2019-12-14T03:40:31.222Z · score: 4 (2 votes) · LW · GW

A tangent:

It sounds like there's some close ties to logical inductors here, both in terms of the flavor of the problem, and some difficulties I expect in translating theory into practice.

A logical inductor is kinda like an approximation. But it's more accurate to call it lots and lots of approximations - it tries to keep track of every single approximation within some large class, which is essential to the proof that it only does finitely worse than any approximation within that class.

A hierarchical model doesn't naturally fall out of such a mixture, it seems. If you pose a general problem, you might just get a general solution. You could try to encourage specialized solutions by somehow ensuring that the problem has several different scales of interest, and sharply limit storage space so that the approximation can't afford special cases that are too similar. But even then I think there's a high probability that the best solution (according to something that is as theoretically convenient as logical inductors) would be alien - something humans wouldn't pick out as the laws of physics in a million tries.

Comment by charlie-steiner on Full toy model for preference learning · 2019-12-09T22:56:11.418Z · score: 2 (1 votes) · LW · GW

This is really handy. I didn't have much to say, but revisited this recently and figured I'd write down the thoughts I did think.

My general feeling about human models is that they need precisely one more level of indirection than this. Too many levels of indirection, and you get something that correctly predicts the world, but doesn't contain something you can point to as the desires. Too few, and you end up trying to fit human examples with a model that doesn't do a good job of fitting human behavior.

For example, if you build your model on responses to survey questions, then what about systematic human difficulties in responding to surveys (e.g. difficulty using a consistent scale across several orders of magnitude of value) that the humans themselves are unaware of? I'd like to use a model of humans that learns about this sort of thing from non-survey-question data.

Comment by charlie-steiner on Counterfactuals: Smoking Lesion vs. Newcomb's · 2019-12-09T18:06:59.080Z · score: 2 (1 votes) · LW · GW

Sure. I have this sort of instinctive mental pushback because I think of counterfactuals primarily as useful tools for a planning agent, but I'm assuming that you don't mean to deny this, and are just applying different emphasis.

Comment by charlie-steiner on Counterfactuals: Smoking Lesion vs. Newcomb's · 2019-12-09T07:12:16.290Z · score: 2 (1 votes) · LW · GW

Oh, I more or less agree :P

If there was one criticism I'd like to repeat, it's that framing the smoking lesion problem in terms of clean decisions between counterfactuals is already missing something from the pre-mathematical description of the problem. The problem is interesting because we as humans sometimes have to worry that we're running on "corrupted hardware" - it seems to me that mathematization of this idea requires us to somehow mutilate the decision theories we're allowed to consider.

To look at this from another angle: I'm agreeing that the counterfactuals are "socio-linguistic conventions" - and I want to go even further and place the entire problem within a context that allows it to have lots of unique quirks depending on the ideas it's expressing, rather than having only the straightforward standardized interpretation. I see this as a feature, not a bug, and think that we can afford to be "greedy" in trying to hang on to the semantics of the problem statement rather than "lazy" in trying to come up with an efficient model.

Comment by charlie-steiner on Counterfactuals: Smoking Lesion vs. Newcomb's · 2019-12-08T22:32:34.220Z · score: 6 (3 votes) · LW · GW

I think the smoking lesion problem is one of those intuition pumps that you have to be *very* careful with mathematizing and comparing with other things. Let me just quote myself from the past:

In the Smoking Lesion problem, and in similar cases where you consider an agent to have "urges" or "dispositions" etc., it's important to note that these are pre-mathematical descriptions of something we'd like our decision theory to consider, and that to try to directly apply them to a mathematical theory is to commit a sort of type error.
Specifically, a decision-making procedure that "has a disposition to smoke" is not FDT. It is some other decision theory that has the capability to operate in uncertainty about its own dispositions.
I think it's totally reasonable to say that we want to research decision theories that are capable of this, because this epistemic state of not being quite sure of your own mind is something humans have to deal with all the time. But one cannot start with a mathematically specified decision theory like proof-based UDT or causal-graph-based CDT and then ask "what it would do if it had the smoking lesion." It's a question that seems intuitively reasonable but, when made precise, is nonsense.
I think what this feels like to philosophers is giving the verbal concepts primacy over the math. (With positive associations to "concepts" and negative associations to "math" implied). But what it leads to in practice is people saying "but what about the tickle defense?" or "but what about different formulations of CDT" as if they were talking about different facets of unified concepts (the things that are supposed to have primacy), when these facets have totally distinct mathematizations.
At some point, if you know that a tree falling in the forest makes the air vibrate but doesn't lead to auditory experiences, it's time to stop worrying about whether it makes a sound.
So obviously I (and LW orthodoxy) are on the pro-math side, and I think most philosophers are on the pro-concepts side (I'd say "pro-essences," but that's a bit too on the nose). But, importantly, if we agree that this descriptive difference exists, then we can at least work to bridge it by being clear about whether we're using the math perspective or the concept perspective. Then we can keep different mathematizations strictly separate when using the math perspective, but work to amalgamate them when talking about concepts.

Comment by charlie-steiner on Reading list: Starting links and books on studying ontology and causality · 2019-12-06T20:41:40.208Z · score: 3 (2 votes) · LW · GW

I would skip everything by Pearl except Causality: Models, Reasoning, and Inference itself. That book is the essential bit.

Best ontology book I've ever read was Li and Vitanyi's classic textbook on algorithmic information theory. My impression is that this is an area where traditional philosophers do a particularly bad job.

Comment by charlie-steiner on What I talk about when I talk about AI x-risk: 3 core claims I want machine learning researchers to address. · 2019-12-05T20:27:30.419Z · score: 2 (1 votes) · LW · GW

Well, you mentioned that a lot of people were getting off the train at point 1. My comment can be thought of as giving a much more thoroughly inside-view look at point 1, and deriving other stuff as incidental consequences.

I'm mentally working with an analogy to teaching people a new contra dance (if you don't know what contra dancing is, I'm just talking about some sequence of dance moves). The teacher often has an abstract view of expression and flow that the students lack, and there's a temptation for the teacher to try to share that view with the students. But the students don't want the abstractions; what they want is concrete steps to follow, and good dancers will dance the dance just fine without ever hearing about the teacher's abstract view. Before dancing they regard the abstractions as difficult to understand and distracting from the concrete instructions; they'll be much more equipped to understand and appreciate them *after* dancing the dance.

Comment by charlie-steiner on What I talk about when I talk about AI x-risk: 3 core claims I want machine learning researchers to address. · 2019-12-03T17:43:58.289Z · score: 5 (3 votes) · LW · GW

Huh, I wonder what you think of a different way of splitting it up. Something like:

  • It's a scientific possibility to have AI that's on average better than humanity at the class of tasks "choose actions that achieve a goal in the real world." Let's label this by some superlative jargon like "superintelligent AI." Such a technology would be hugely impactful.

  • It would be really bad if a superintelligent AI was choosing actions to achieve some goal, but this goal wasn't beneficial to humans. There are several open problems that this means we need to solve before safely turning on any such AI.

  • We know enough that we can do useful work on (most of) these open problems right now. Arguing for this also implies that superintelligent AI is close enough (if not in years, then in "number of paradigm shifts") that this work needs to start getting done.

  • We would expect a priori that work on these open problems of beneficial goal design should be under-prioritized (public goods problem, low immediate profit, not obvious you need it before you really need it). And indeed that seems to be the case (insert NIPS survey here), though there's work going on at nonprofits that have different incentives. So consider thinking about this area if you're looking for things to research.

Comment by charlie-steiner on [1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | Arxiv · 2019-11-21T08:03:16.334Z · score: 4 (4 votes) · LW · GW

Welp, we're doomed (/s), as soon as someone figures out how to get 100 million tries at taking over the world so we can crush the world-taking-over problem with stochastic gradient descent.

Comment by charlie-steiner on A fun calibration game: "0-hit Google phrases" · 2019-11-21T07:53:56.972Z · score: 3 (2 votes) · LW · GW

Here's some: "antipacek", "progressive killer fat", "hut refusal guideline", "south-stream resignation", "pamplem", "conscience bw", "fog log dog bog", "layer iron trolley", "prevent publication frequency", "sconspiracyn".

Comment by charlie-steiner on The Goodhart Game · 2019-11-20T18:35:44.828Z · score: 2 (1 votes) · LW · GW

Pretty sure you understood it :) But yeah, not only would I like to be able to compare two things, I'd like to be able to find the optimum values of some continuous variables. Though I suppose it doesn't matter as much if you're trying to check / evaluate ideas that you arrived at by more abstract reasoning.

Comment by charlie-steiner on Cybernetic dreams: Beer's pond brain · 2019-11-20T05:42:23.451Z · score: 2 (1 votes) · LW · GW

I'm also looking forward to upcoming posts, but all these examples so far sound to me like a modernist's substitute for sympathetic magic :P

Comment by charlie-steiner on Drawing on Walls · 2019-11-20T05:30:25.464Z · score: 4 (2 votes) · LW · GW

Sounds like a sales pitch for whiteboard wallpaper :)

Comment by charlie-steiner on The Goodhart Game · 2019-11-20T02:32:20.722Z · score: 2 (1 votes) · LW · GW

The impractical part about training for good behavior is that it's a nested loop - every training example on how to find good maxima requires training a model that in turn needs its own training examples. So it's destined to be behind the state of the art, probably using state of the art models to generate the copious required training data.

The question, I suppose, is whether this is still good enough to learn useful general lessons. And after thinking about it, I think the answer is that yes, it should be, especially for feed-forward architectures that look like modern machine learning, where you don't expect qualitative changes in capability as you scale computational resources.
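
(To illustrate the cost structure I mean, here's a toy numpy sketch where every outer-loop data point about "how badly does optimizing this proxy Goodhart?" requires a complete inner training run - all objectives and numbers made up:)

```python
import numpy as np

rng = np.random.default_rng(0)

def train_inner(proxy_weights, steps=200, lr=0.1):
    # Inner loop: a full "training run" that hill-climbs a state
    # to maximize a linear proxy objective.
    x = rng.normal(size=proxy_weights.shape)
    for _ in range(steps):
        x += lr * proxy_weights  # gradient of the linear proxy w.r.t. x
    return x

def true_value(x):
    # The "true" objective the proxy stands in for (it saturates, unlike the proxy).
    return np.tanh(x).sum()

# Outer loop: each single training example about finding good maxima
# costs one entire inner training run.
outer_dataset = []
for _ in range(1000):
    proxy = rng.normal(size=5)
    optimum = train_inner(proxy)
    outer_dataset.append((proxy, true_value(optimum)))
```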

Comment by charlie-steiner on Impossible moral problems and moral authority · 2019-11-19T04:46:44.618Z · score: 3 (2 votes) · LW · GW

Yes, I hope that my framing of the problem supports this sort of conclusion :P

An alternate framing where it still seems important would be "moral uncertainty," where when we don't know what to do, it's because we are lacking some facts, maybe even key facts. So I'm sort of sneakily arguing against that frame.

Comment by charlie-steiner on The Power to Draw Better · 2019-11-18T10:40:45.868Z · score: 4 (2 votes) · LW · GW

Any sequence that involves recommending people work through Drawing on the Right Side of the Brain is a sequence I should read :P

Comment by charlie-steiner on Can indifference methods redeem person-affecting views? · 2019-11-17T22:24:51.679Z · score: 2 (1 votes) · LW · GW

You mean, why I expect a person-affecting utility function to be different if evaluated today v. tomorrow?

Well, suppose that today I consider the action of creating a person, and am indifferent to creating them. Since this is true for all sorts of people, I am indifferent to creating them one way vs. another (e.g. happy vs sad). If they are to be created inside my guest bedroom, this means I am indifferent between certain ways the atoms in my guest bedroom could be arranged. Then if this person gets created tonight and is around tomorrow, I'm no longer indifferent between the arrangement that is them sad and the arrangement that is them happy.

Yes, you could always reverse-engineer a utility function over world-histories that encompasses both of these. But this doesn't necessarily solve the problems that come to mind when I say "change in utility functions" - for example, I might take bets about the future that appear lose/lose when I have to pay them off, or take actions that modify my own capabilities in ways I later regret.

I dunno - were you thinking of some specific application of indifference that could sidestep some of these problems?

Comment by charlie-steiner on Can indifference methods redeem person-affecting views? · 2019-11-12T18:03:45.734Z · score: 2 (1 votes) · LW · GW

Hilary Greaves sounds like a really interesting person :)

So, you could use these methods to construct a utility function corresponding to the person-affecting viewpoint from your current world, but this wouldn't protect this utility function from critique. She brings up the Pareto principle, where this person-affecting utility function would be indifferent to some things that were strict improvements, which seems undesirable.

I think the more fundamental problem there is intransitivity. You might be able to define a utility function that captures the person-affecting view to you, but a copy of you one day later (or one world over) would say "hang on, I didn't agree to that." They'd make their own utility function with priorities on different people. And so you end up fighting with yourself, until one of you can self-modify to actually give up the person-affecting view, and just keep this utility function created by their past self.

A more reflective self might try to do something clever like bargaining between all selves they expect to plausibly be (and who will follow the same reasoning), and taking actions that benefit those selves, confident that their other selves will keep their end of the bargain.

My general feeling about population ethics, though, is that it's aesthetics. This was a really important realization for me, and I think most people who think about population ethics don't think about the problem the right way. People don't inherently have utility, utility isn't a fluid stored in the gall bladder, it's something evaluated by a decision-maker when they think about possible ways for the world to be. This means it's okay to have a preferred standard of living for future people, to have nonlinear terms on population and "selfish" utility, etc.

Comment by charlie-steiner on An optimal stopping paradox · 2019-11-12T16:08:37.156Z · score: 2 (3 votes) · LW · GW

If the growth is exponential, I still don't think there's a paradox - sure, you're incentivized to wait forever, but I'm already incentivized to wait forever with my real-life investments. The only thing that stops me from investing my money forever in real life is that sometimes I have things (not included in the toy problem) that I really want to buy with that money.

Comment by charlie-steiner on What are human values? - Thoughts and challenges · 2019-11-11T17:42:27.545Z · score: 3 (2 votes) · LW · GW

So, the dictionary definition (SEP) would be something like "objectively good/parsimonious/effective ways of carving up reality."

There's also the implication that when we use kinds in reasoning, things of the same kind should share most or all important properties for the task at hand. There's also sort of the implication that humans naively think of the world as made out of natural kinds on an ontologically basic level.

I'm saying that even if people don't believe in disembodied souls, when they ask "what do I want?" they think they're getting an answer back that is objectively a good/parsimonious/effective way of talking. That there is some thing, not necessarily a soul but at least a pattern, that is being accessed by different ways of asking "what do I want?", which can't give us inconsistent answers because it's all one thing.

Comment by charlie-steiner on Neural nets as a model for how humans make and understand visual art · 2019-11-11T15:43:18.386Z · score: 2 (1 votes) · LW · GW

Thanks for the reply :)

Sure, you can get the AI to draw polka-dots by targeting a feature that likes polka dots, or a Mondrian by targeting some features that like certain geometries and colors, but now you're not using style transfer at all - the image is the style. Moreover, it would be pretty hard to use this to get a Kandinsky, because the AI that makes style-paintings has no standard by which it would choose things to draw that could be objects but aren't. You'd need a third and separate scheme to make Kandinskys, and then I'd just bring up another artist not covered yet.

If you're not trying to capture all human visual art in one model, then this is no biggie. So now you're probably going "this is fine, why is he going on about this." So I'll stop.

Do you have examples in mind when you mention "human experience" and "embodiment" and "limited agents"

For "human experience," yeah, I just means something like communicative/evocative content that relies on a theory of mind to use for communication. Maybe you could train an AI on patriotic paintings and then it could produce patriotic paintings, but I think only by working on theory of mind would an AI think to produce a patriotic painting without having seen one before. I'm also reminded of Karpathy's example of Obama with his foot on the scale.

For embodiment, this means art that blurs the line between visual and physical. I was thinking of how some things aren't art if they're normal sized, but if you make them really big, then they're art. Since all human art is physical art, this line can be avoided mostly but not completely.

For "limited," I imagined something like Dennett's example of the people on the bridge. The artist only has to paint little blobs, because they know how humans will interpret them. Compared to the example above of using understanding of humans to choose content, this example uses an understanding of humans to choose style.

Yet even with zero prior training on visual art they can make pretty impressive images by human lights. I think this was surprising to most people both in and outside deep learning. I'm curious whether this was surprising to you.

It was impressive, but I remember the old 2015 post that Chris Olah co-authored. First off, if you look at the pictures, they're less pretty than the pictures that came later. And I remember one key sentence: "By itself, that doesn’t work very well, but it does if we impose a prior constraint that the image should have similar statistics to natural images, such as neighboring pixels needing to be correlated." My impression is that DeepDream et al. have been trained to make visual art - by hyperparameter tuning (grad student descent).

Comment by charlie-steiner on Neural nets as a model for how humans make and understand visual art · 2019-11-10T17:43:58.728Z · score: 2 (1 votes) · LW · GW

I like this exposition, but I'm still skeptical about the idea.

Since "art" is a human concept, it's naturally a grab bag of lots of different meanings. It's plausible that for some meanings of "art," humans do something similar to searching through a space of parameters for something that strongly activates some target concept within the constraints of a style. But there's also a lot about art that's not like that.

Like art that's non-representational, or otherwise denies the separation between form and content. Or art that's heavily linguistic, or social, or relies on some sort of thinking on the part of the audience. Art that's very different for the performer and the audience, so that it doesn't make sense to talk about a search process optimizing for the audience's experience, or otherwise doesn't have a search process as a particularly simple explanation. Art that's so rooted in emotion or human experience that we wouldn't consider an account of it complete without talking about the human experience. Art that only makes sense when considering humans as embodied, limited agents.

So if I consider the statement "the DeepDream algorithm is doing art," there is a sense in which this is reasonable. But I don't think that extends to calling what DeepDream does a model for what humans do when we think about or create art. We do something not merely more complicated in the details, but more complicated in its macro-structure, and hooked into many of the complications of human psychology.

Comment by charlie-steiner on The Credit Assignment Problem · 2019-11-09T18:42:44.867Z · score: 2 (1 votes) · LW · GW

Dropout is like the converse of this - you use dropout to assess the non-outdropped elements. This promotes resiliency to perturbations in the model - whereas if you evaluate things by how bad it is to break them, you could promote fragile, interreliant collections of elements over resilient elements.

I think the root of the issue is that this Shapley value doesn't distinguish between something being bad to break, and something being good to have more of. If you removed all my blood I would die, but that doesn't mean that I would currently benefit from additional blood.

Anyhow, the joke was that as soon as you add a continuous parameter, you get gradient descent back again.

Comment by charlie-steiner on Open & Welcome Thread - November 2019 · 2019-11-09T14:57:56.912Z · score: 2 (1 votes) · LW · GW

0.3 mg melatonin an hour before I want to be asleep works, my only trouble is actually planning in advance.

Comment by charlie-steiner on The Credit Assignment Problem · 2019-11-09T12:39:55.024Z · score: 4 (2 votes) · LW · GW

You look at the world, and you say: "how can I maximize utility?" You look at your beliefs, and you say: "how can I maximize accuracy?" That's not a consequentialist agent; that's two different consequentialist agents!

Not... really? "how can I maximize accuracy?" is a very liberal agentification of a process that might be more drily thought of as asking "what is accurate?" Your standard sequence predictor isn't searching through epistemic pseudo-actions to find which ones best maximize its expected accuracy, it's just following a pre-made plan of epistemic action that happens to increase accuracy.

Though this does lead to the thought: if you want to put things on equal footing, does this mean you want to describe a reasoner that searches through epistemic steps/rules like an agent searching through actions/plans?

This is more or less how humans already conceive of difficult abstract reasoning. We don't solve integrals by gradient descent, we imagine doing some sort of tree search where the edges are different abstract manipulations of the integral. But for everyday reasoning, like navigating 3D space, we just use our specialized feed-forward hardware.

Comment by charlie-steiner on The Credit Assignment Problem · 2019-11-09T08:56:03.226Z · score: 2 (1 votes) · LW · GW

Removing things entirely seems extreme. How about having a continuous "contribution parameter," where running the algorithm without an element would correspond to turning this parameter down to zero, but you could also set the parameter to 0.5 if you wanted that element to have 50% of the influence it has right now. Then you can send rewards to elements if increasing their contribution parameter would improve the decision.

:P
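
(Taking the joke literally, a toy numpy version of the "contribution parameter" idea - everything here is made up - which does indeed collapse straight back into gradient ascent:)

```python
import numpy as np

rng = np.random.default_rng(0)

def decision_quality(contributions, element_outputs):
    # Toy stand-in for "how good was the decision," with each element's
    # output scaled by its contribution parameter.
    return -np.square(contributions @ element_outputs - 1.0)

contributions = np.full(4, 0.5)      # every element starts at 50% influence
element_outputs = rng.normal(size=4)

# Reward each element by how much nudging its contribution parameter up
# would improve the decision -- i.e., a finite-difference gradient.
eps = 1e-3
rewards = np.array([
    (decision_quality(contributions + eps * np.eye(4)[i], element_outputs)
     - decision_quality(contributions, element_outputs)) / eps
    for i in range(4)
])
contributions += 0.1 * rewards       # ...which is just a gradient step
```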

Comment by charlie-steiner on What AI safety problems need solving for safe AI research assistants? · 2019-11-06T09:31:41.000Z · score: 2 (1 votes) · LW · GW

It seems like the main problem is making sure nobody's getting systematically misled. To help humans make the right updates, the AI has to communicate not only accurate results, but well-calibrated uncertainties. It also has to interact with humans in a way that doesn't send the wrong signals (more a problem to do with humans than to do with AI).

This is very much on the near-term side of the near/long term AI safety work dichotomy. We don't need the AI to understand deception as a category, and why it's bad, so that it can make plans that don't involve deceiving us. We just need its training / search process (which we expect to more or less understand) to suppress incentives for deception to an acceptable range, on a limited domain of everyday problems.

(I'm probably a bigger believer in the significance of this dichotomy than most. I think looking at an AI's behavior and then tinkering with the training procedure to eliminate undesired behavior in the training domain is a perfectly good approach to handling near-term misalignment like overconfident advisor-chatbots, but eventually we want to switch over to a more scalable approach that will use few of the same tools.)