What to do with imitation humans, other than asking them what the right thing to do is? 2020-09-27T21:51:36.650Z · score: 10 (3 votes)
Charlie Steiner's Shortform 2020-08-04T06:28:11.553Z · score: 6 (1 votes)
Constraints from naturalized ethics. 2020-07-25T14:54:51.783Z · score: 21 (5 votes)
Meta-preferences are weird 2020-07-16T23:03:40.226Z · score: 18 (3 votes)
Down with Solomonoff Induction, up with the Presumptuous Philosopher 2020-06-12T09:44:29.114Z · score: 13 (5 votes)
The Presumptuous Philosopher, self-locating information, and Solomonoff induction 2020-05-31T16:35:48.837Z · score: 40 (13 votes)
Life as metaphor for everything else. 2020-04-05T07:21:11.303Z · score: 37 (11 votes)
Meta-preferences two ways: generator vs. patch 2020-04-01T00:51:49.086Z · score: 19 (6 votes)
Gricean communication and meta-preferences 2020-02-10T05:05:30.079Z · score: 14 (5 votes)
Impossible moral problems and moral authority 2019-11-18T09:28:28.766Z · score: 15 (11 votes)
What's the dream for giving natural language commands to AI? 2019-10-08T13:42:38.928Z · score: 9 (3 votes)
The AI is the model 2019-10-04T08:11:49.429Z · score: 12 (10 votes)
Can we make peace with moral indeterminacy? 2019-10-03T12:56:44.192Z · score: 17 (5 votes)
The Artificial Intentional Stance 2019-07-27T07:00:47.710Z · score: 14 (5 votes)
Some Comments on Stuart Armstrong's "Research Agenda v0.9" 2019-07-08T19:03:37.038Z · score: 22 (7 votes)
Training human models is an unsolved problem 2019-05-10T07:17:26.916Z · score: 16 (6 votes)
Value learning for moral essentialists 2019-05-06T09:05:45.727Z · score: 13 (5 votes)
Humans aren't agents - what then for value learning? 2019-03-15T22:01:38.839Z · score: 20 (6 votes)
How to get value learning and reference wrong 2019-02-26T20:22:43.155Z · score: 40 (10 votes)
Philosophy as low-energy approximation 2019-02-05T19:34:18.617Z · score: 40 (21 votes)
Can few-shot learning teach AI right from wrong? 2018-07-20T07:45:01.827Z · score: 16 (5 votes)
Boltzmann Brains and Within-model vs. Between-models Probability 2018-07-14T09:52:41.107Z · score: 19 (7 votes)
Is this what FAI outreach success looks like? 2018-03-09T13:12:10.667Z · score: 53 (13 votes)
Book Review: Consciousness Explained 2018-03-06T03:32:58.835Z · score: 101 (27 votes)
A useful level distinction 2018-02-24T06:39:47.558Z · score: 26 (6 votes)
Explanations: Ignorance vs. Confusion 2018-01-16T10:44:18.345Z · score: 18 (9 votes)
Empirical philosophy and inversions 2017-12-29T12:12:57.678Z · score: 8 (3 votes)
Dan Dennett on Stances 2017-12-27T08:15:53.124Z · score: 8 (4 votes)
Philosophy of Numbers (part 2) 2017-12-19T13:57:19.155Z · score: 11 (5 votes)
Philosophy of Numbers (part 1) 2017-12-02T18:20:30.297Z · score: 26 (10 votes)
Limited agents need approximate induction 2015-04-24T21:22:26.000Z · score: 1 (1 votes)


Comment by charlie-steiner on What to do with imitation humans, other than asking them what the right thing to do is? · 2020-09-30T23:54:46.883Z · score: 2 (1 votes) · LW · GW

I feel like my state is significantly more complicated than that. I smoothly accumulate short-term memory and package some of it away into long-term memory, which even more slowly gets packaged away into longer-term memory. GPT-3's window size would run out the first time I tried to do a literature search and read a few papers, because it doesn't form memories so easily.

The way actual GPT-3 (or really anything with limited state but lots of training data, I think) gets around this sort of thing is by already having read those papers during training, plus lots of examples of people reacting to papers, and then using context to infer that it should output words that come from someone at a later stage of paper-reading.

Do you foresee a different, more human-like model of humans becoming practical to train?

Comment by charlie-steiner on What to do with imitation humans, other than asking them what the right thing to do is? · 2020-09-30T12:24:18.929Z · score: 3 (2 votes) · LW · GW

In retrospect, I was totally unclear that I wasn't necessarily talking about something that has a complicated internal state, such that it can behave like one human over long time scales. I was thinking more about the "minimum human-imitating unit" necessary to get things like IDA off the ground.

In fact this post was originally titled "What to do with a GAN of a human?"

Comment by charlie-steiner on David Deutsch on Universal Explainers and AI · 2020-09-26T20:39:20.387Z · score: 2 (1 votes) · LW · GW

That's a good point. It's still not clear to me that he's talking about precisely the same thing in both quotes. The point also remains that if you're not associating "understanding" with a class as broad as Turing-completeness, then you can construct things that humans can't understand, e.g. by hiding them in complex patterns, or by using human blind spots.

Comment by charlie-steiner on David Deutsch on Universal Explainers and AI · 2020-09-25T22:18:37.492Z · score: 2 (1 votes) · LW · GW

it could be there are aspects of reality that are beyond the capacity of our brains.’ But that cannot be so. For if the ‘capacity’ in question is mere computational speed and amount of memory, then we can understand the aspects in question with the help of computers

I'm disagreeing with the notion, equivalent to taking Turing completeness as understanding-universality, that the human capacity for understanding is the capacity for universal computation.

Comment by charlie-steiner on David Deutsch on Universal Explainers and AI · 2020-09-24T08:33:53.307Z · score: 2 (1 votes) · LW · GW

Turing completeness misses some important qualitative properties of what it means for people to understand something. When I understand something I don't merely compute it, I form opinions about it, I fit it into a schema for thinking about the world, I have a representation of it in some latent space that allows it to be transformed in appropriate ways, etc.

I could, given a notebook of infinite size, infinite time, and lots of drugs, probably compute the Ackermann function A(5,5). But this has little to do with my ability to understand the result in the sense of being able to tell a story about the result to myself. In fact, there are things I can understand without actually computing, so long as I can form opinions about it, fit it into a picture of the world, represent it in a way that allows for transformations, etc.
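The Ackermann function itself is a three-line recursion, which is part of the point: the definition is trivially computable, yet even A(4, 2) has tens of thousands of digits, and computing a value is nothing like understanding it. A minimal sketch:

```python
import sys

sys.setrecursionlimit(100_000)  # the naive recursion gets deep quickly

def ackermann(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

print(ackermann(2, 3))  # 9
print(ackermann(3, 3))  # 61 -- already near the limit of comfortable computation
```

Values like A(5, 5) are astronomically beyond this naive recursion (or any physical computation), which is exactly the gap between "could in principle compute" and "understand."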

Comment by charlie-steiner on Open & Welcome Thread - September 2020 · 2020-09-16T02:02:42.178Z · score: 4 (2 votes) · LW · GW

Over, and over... the pheromones... the overwhelming harmony...

Comment by charlie-steiner on Comparing Utilities · 2020-09-15T06:09:34.982Z · score: 4 (2 votes) · LW · GW

One thing I'd also ask about is: what about ecology / iterated games? I'm not at all sure whether there are relevant iterated games here, so I'm curious what you think.

How about an ecology where there are both people and communities - the communities have different aggregation rules, and the people can join different communities. There's some set of options that are chosen by the communities, but it's the people who actually care about what option gets chosen and choose how to move between communities based on what happens with the options - the communities just choose their aggregation rule to get lots of people to join them.

How can we set up this game so that interesting behavior emerges? Well, people shouldn't just seek out the community that most closely matches their own preferences, because then everyone would fracture into communities of size 1. Instead, there must be some benefit to being in a community. I have two ideas about this: one is that the people could care to some extent about what happens in all communities, so they will join a community if they think they can shift its preferences on the important things while conceding the unimportant things. Another is that there could be some crude advantage to being in a community that looks like a scaling term (monotonically increasing with community size) on how effective they are at satisfying their people's preferences.
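As a toy sketch of that second idea (everything here is an invented stand-in: one-dimensional preferences, mean-of-members as each community's aggregation rule, and a logarithmic size bonus as the "crude advantage"):

```python
import math
import random

random.seed(0)

# A person's utility is closeness of the community's chosen option to their
# own preference, plus a bonus that grows with community size.
SIZE_BONUS = 2.0

def utility(pref, members):
    option = sum(members) / len(members)  # aggregation rule: take the mean
    return -(pref - option) ** 2 + SIZE_BONUS * math.log(len(members))

people = [random.uniform(0, 10) for _ in range(40)]
communities = [[], [], []]
for i, p in enumerate(people):
    communities[i % 3].append(p)

# Best-response dynamics: each person moves to whichever community would give
# them the highest utility (counting themselves as a member after moving).
for _ in range(50):
    moved = False
    for p in people:
        current = next(c for c in communities if p in c)
        best = max(communities,
                   key=lambda c: utility(p, c + [p] if p not in c else c))
        if best is not current:
            current.remove(p)
            best.append(p)
            moved = True
    if not moved:
        break
```

With a large enough SIZE_BONUS everyone should merge; with it near zero, people fracture toward communities of like-minded members, which is the tension the comment describes.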

Comment by charlie-steiner on What happens if you drink acetone? · 2020-09-15T05:31:33.584Z · score: 6 (3 votes) · LW · GW

I'm curious about the comparison to drinking isopropyl alcohol (rubbing alcohol) instead, which is gradually metabolized into acetone (the actual psychoactive ingredient) inside the body. If you drink the same amount then gradual seems safer, but I'm not sure if it actually has a bigger difference between active dose and LD50 (or active dose and severe gastrointestinal inflammation).

Comment by charlie-steiner on Egan's Theorem? · 2020-09-14T01:20:27.069Z · score: 5 (3 votes) · LW · GW

Right, it's a little tricky to specify exactly what this "relationship" is. Is the notion that you should be able to compress the approximate model, given an oracle for the code of the best one (i.e., that they share pieces)? Because most Turing machines don't compress well, and so it's easy to find counterexamples (the most straightforward class is where the approximate model is already extremely simple).

Anyhow, like I said, hard to capture the spirit of the problem. But when I *do* try to formalize the problem, it tends to not have the property, which is definitely driving my intuition.

Comment by charlie-steiner on Egan's Theorem? · 2020-09-13T21:29:38.029Z · score: 3 (2 votes) · LW · GW

If by "account for that" you mean not be in direct conflict with earlier sense data, then sure. All tautologies about the data will continue to be true. Suppose some data can be predicted by classical mechanics with 75% accuracy. This is a tautology given the data itself, and no future theory will somehow make classical mechanics stop giving 75% accurate predictions for that past data.

Maybe that's all you meant?

I'd sort of interpreted you as asking questions about properties of the theory. E.g. "this data is really well explained by the classical mechanics of point particles, therefore any future theory should have a particularly simple relationship to the point particle ontology." It seems like there shouldn't be a guaranteed relationship that's much simpler than reconstructing the data and recomputing the inferred point particles.

I spent a little while trying to phrase this in terms of Turing machines but I don't think I quite managed to capture the spirit.

Comment by charlie-steiner on Egan's Theorem? · 2020-09-13T18:50:46.703Z · score: 5 (4 votes) · LW · GW

The answer to the question you actually asked is no, there is no ironclad guarantee of properties continuing, nor any guarantee that there will be a simple mapping between theories. With some effort you can construct some perverse Turing machines with bad behavior.

But the answer to the more general question is yes: simple properties can be expected (in a probabilistic sense) to generalize even if the model is incomplete. This is basically Minimum Message Length prediction, which you can put on the theoretical footing of the Solomonoff prior (it's somewhere in Li and Vitanyi - chapter 5?).
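A two-part-code toy example of the MML idea (the Bernoulli models, bit costs, and data strings here are all invented for illustration): a more complex model is only preferred once it pays for its own description length, which is why simple regularities are the ones expected to persist.

```python
import math

def nll_bits(data, p):
    # Bits needed to encode the data under a Bernoulli(p) model.
    ones = data.count("1")
    zeros = len(data) - ones
    return -(ones * math.log2(p) + zeros * math.log2(1 - p))

def description_length(data, p, model_bits):
    # Two-part code: bits to state the model, plus bits to encode the data.
    return model_bits + nll_bits(data, p)

short = "1110111011111011"  # 13 ones out of 16
long = short * 2            # same pattern, twice the data

for data in (short, long):
    dl_fair = description_length(data, 0.5, 1)    # cheap-to-state model
    dl_biased = description_length(data, 0.8, 8)  # costlier, better-fitting
    winner = "fair" if dl_fair < dl_biased else "biased"
    print(len(data), round(dl_fair, 1), round(dl_biased, 1), winner)
```

On the short string the fair coin wins (17 vs ~19.2 bits); doubling the data flips it (33 vs ~30.3 bits), because the biased model's fixed description cost is amortized over more data.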

Comment by charlie-steiner on Sunday September 6, 12pm (PT) — Casual hanging out with the LessWrong community · 2020-09-06T20:14:02.736Z · score: 2 (1 votes) · LW · GW

Looks like nobody showed up - must be because Gather.town is actually sufficiently stable for use now.

Comment by charlie-steiner on Li and Vitanyi's bad scholarship · 2020-09-06T00:14:25.229Z · score: 6 (4 votes) · LW · GW

Well, yes, it's not a perfect summary. I have no idea why they'd say Popper was working on Bayesianism - unless maybe "the problem" in that clause was the problem of induction, and something got lost in an edit.

But sometimes nitpicks aren't that important. Like, for example, it's spelled Vitanyi. But this isn't really a crushing refutation of your post (though it is a very convenient illustration). You shouldn't sweat this too much, because their textbook really is worth reading if you want to learn algorithmic information theory.

Comment by charlie-steiner on Sunday September 6, 12pm (PT) — Casual hanging out with the LessWrong community · 2020-09-04T08:35:23.775Z · score: 2 (1 votes) · LW · GW

Actually, is it okay if I'm in charge of the Zoom call? I would like to set up one with different rooms and cohostify people, so it's not everyone locked in together.

Comment by charlie-steiner on Introduction To The Infra-Bayesianism Sequence · 2020-09-01T15:21:39.963Z · score: 2 (1 votes) · LW · GW

Could you defend worst-case reasoning a little more? Worst cases can be arbitrarily different from the average case - so maybe having worst-case guarantees is reassuring, but actually choosing policies by explicit reference to the worst case seems suspicious. (In the human context, the worst case might be that I have a stroke in the next few seconds and die. But I'm not in the business of picking policies by how they do in that case.)

You might say "we don't have an average case," but if there are possible hypotheses outside your considered space you don't have the worst case either - the problem of estimating a property of a non-realizable hypothesis space is simplified, but not gone.
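A toy decision problem (hypotheses, priors, and payoffs all invented for illustration) showing how maximin and expected-utility reasoning come apart once one extreme hypothesis is in the set:

```python
priors = {"normal": 0.989, "rainy": 0.010, "stroke": 0.001}
utilities = {
    "stay in bed": {"normal": 0, "rainy": 0, "stroke": 0},
    "go outside":  {"normal": 10, "rainy": 5, "stroke": -100},
}

def worst_case(policy):
    return min(utilities[policy].values())

def expected(policy):
    return sum(priors[h] * u for h, u in utilities[policy].items())

maximin_pick = max(utilities, key=worst_case)   # 'stay in bed'
average_pick = max(utilities, key=expected)     # 'go outside'
```

The maximin choice is entirely driven by the one-in-a-thousand "stroke" hypothesis, however small its prior, which is the suspicious feature the comment is pointing at.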

Anyhow, still looking forward to working my way through this series :)

Comment by charlie-steiner on What is the interpretation of the do() operator? · 2020-08-26T23:15:41.311Z · score: 5 (3 votes) · LW · GW

Well, first off, Pearl would remind you that reduction doesn't have to mean probability distributions. If Markov models are simple explanations of our observations, then what's the problem with using them?

The surface-level answer to your question would be to talk about how to interconvert between causal graphs and probabilities, thereby identifying any function on causal graphs (like setting the value of a node without updating its parents) with an operator on probability distributions (given the graphical model). Note that in common syntax, "conditioning" on do()-ing something means applying the operator to the probability distribution. But you can google this or find it in Pearl's book Causality.
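As a concrete sketch of that interconversion (the numbers are invented): in a graph Z → X, Z → Y, X → Y, ordinary conditioning keeps Z correlated with X, while do(X = x) cuts the Z → X edge, giving the truncated factorization P(y | do(x)) = Σ_z P(z) P(y | x, z).

```python
# P(Z=1), P(X=1 | z), and P(Y=1 | x, z) for the graph Z -> X, Z -> Y, X -> Y
p_z1 = 0.5
p_x1_given_z = {0: 0.1, 1: 0.9}
p_y1_given_xz = {(0, 0): 0.2, (1, 0): 0.5, (0, 1): 0.6, (1, 1): 0.9}

def p_z(z):
    return p_z1 if z == 1 else 1 - p_z1

def p_x_given_z(x, z):
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]

# Ordinary conditioning: P(Y=1 | X=1), with Z inferred via Bayes.
p_x1 = sum(p_z(z) * p_x_given_z(1, z) for z in (0, 1))
p_y1_given_x1 = sum(
    (p_z(z) * p_x_given_z(1, z) / p_x1) * p_y1_given_xz[(1, z)] for z in (0, 1)
)

# Intervention: P(Y=1 | do(X=1)) -- Z keeps its unconditional marginal.
p_y1_do_x1 = sum(p_z(z) * p_y1_given_xz[(1, z)] for z in (0, 1))

print(round(p_y1_given_x1, 3), round(p_y1_do_x1, 3))  # 0.86 vs 0.7
```

The gap between the two numbers is exactly the confounding through Z that do() removes.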

I'd just like you to think more about what you want from an "explanation." What is it you want to know that would make things feel explained?

Comment by charlie-steiner on nostalgebraist: Recursive Goodhart's Law · 2020-08-26T16:17:07.731Z · score: 1 (2 votes) · LW · GW

Yup. Humans have a sort of useful insanity, where they can expect things to be bad not based on explicitly evaluating the consequences, but on a model or heuristic about what to expect from different strategies. And then we somehow only apply this reasoning selectively, where it seems appropriate according to even more heuristics.

Comment by charlie-steiner on Mathematical Inconsistency in Solomonoff Induction? · 2020-08-26T06:34:11.223Z · score: 3 (2 votes) · LW · GW

I'd rather frame this as good news. The good news is that if you want to learn about Solomonoff induction, the entire first half-and-a-bit of the book is a really excellent resource. It's like if someone directed you to a mountain of pennies. Yes, you aren't going to be able to take this mountain of pennies home anytime soon, and that might feel awkward, but it's not like you'd be materially better off if the mountain was smaller.

If you just want the one-sentence answer, it's as above - "X or Y" is not a Turing machine. If you want to be able to look the whole edifice over on your own, though, it really will take 200+ pages of work (it took me about 3 months of reading on the train) - starting with prefix-free codes and Kolmogorov complexity, and moving on to sequence prediction and basic Solomonoff induction and the proofs of its nice properties. Then you can get more applied stuff like thinking about how to encode what you actually want to ask in terms of Solomonoff induction, minimum message length prediction and other bounds that hold even if you're not a hypercomputer, and the universal prior and the proofs that it retains the nice properties of basic Solomonoff induction.
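A toy, finite stand-in for the construction (real Solomonoff induction sums over all programs and is uncomputable; the "programs" and their bit-lengths here are invented): each hypothesis gets prior weight 2^-length, and only hypotheses that reproduce the data exactly survive conditioning.

```python
# Toy stand-in for Solomonoff induction over three made-up "programs".
hypotheses = [
    ("repeat '01'", 4, lambda n: "01"[n % 2]),
    ("all '0'", 3, lambda n: "0"),
    ("'0101' then '1'", 9, lambda n: "0101"[n] if n < 4 else "1"),
]

def posterior(data):
    weights = {}
    for name, length, program in hypotheses:
        output = "".join(program(i) for i in range(len(data)))
        if output == data:  # the program must reproduce the data exactly
            weights[name] = 2.0 ** -length  # prior weight 2^-(encoding length)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

print(posterior("0101"))
```

On the data "0101", "all '0'" is eliminated outright (disagreement, not disjunction), and the short "repeat '01'" program carries 32/33 of the posterior against its longer rival.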

Comment by charlie-steiner on Mathematical Inconsistency in Solomonoff Induction? · 2020-08-25T19:36:42.746Z · score: 4 (2 votes) · LW · GW

Details can be found in the excellent textbook by Li and Vitanyi.

In this context, "hypothesis" means a computer program that predicts your past experience and then goes on to make a specific prediction about the future.

"X or Y" is not such a computer program - it's a logical abstraction about computer programs.

Now, one might take two programs that have the same output, and then construct another program that is sorta like "X or Y": it runs both X and Y and then reports only one of their outputs by some pseudo-random process. In that case it might be important to know that you can construct Solomonoff induction using only the shortest program that produces each unique prediction.

Comment by charlie-steiner on Sunday August 23rd, 12pm (PDT) – Double Crux with Buck Shlegeris and Oliver Habryka on Slow vs. Fast AI Takeoff · 2020-08-23T22:02:25.485Z · score: 4 (2 votes) · LW · GW

All of these "video chat but in 2d space" websites have had serious problems for me. My preference would just be Zoom breakout rooms with thematic names, honestly. Not sure what the average experience has been.

Comment by charlie-steiner on Thoughts on the Feasibility of Prosaic AGI Alignment? · 2020-08-22T02:04:46.804Z · score: 3 (2 votes) · LW · GW

I think it's absolutely feasible, but my idea of what a solution looks like is probably a minority view (if I had to guess, held by maybe ~30%?)

All you have to do is understand what it is you mean by the AI fulfilling human values, in a way that can be implemented in the architecture and training procedure of a prosaic AI. Easy peasy, lemon squeezy.

The rest of the feasibility camp is mostly dominated by Paulians right now, who want to solve the problem without having to understand that complicated human-values thing. Typically by trusting in humans and giving them big awesome planning powers, or using their oversight and feedback to choose good things.

Comment by charlie-steiner on What's a Decomposable Alignment Topic? · 2020-08-22T01:29:51.557Z · score: 3 (2 votes) · LW · GW

Here's a problem for you, which I'm not sure fits the requirements, but might: How do you learn whether an AI has been trained to use Gricean communication (e.g. "I interpret your words by modeling you as saying them because you model me as interpreting them, and so on until further recursion isn't fruitful") without being able to read its source code and check its functioning against some specification of recursive agential modeling?

Comment by charlie-steiner on What am I missing? (quantum physics) · 2020-08-22T01:16:35.249Z · score: 2 (1 votes) · LW · GW

Also, if you want more "surprising" aspects of entanglement, I think superdense coding is a nice example. Basically, sharing an entangled qubit does let you send information, but only after you also send one more qubit in an ordinary way. This is very not possible with hidden variables.
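A plain-Python sketch of superdense coding (real amplitudes only, hand-rolled 4-dimensional state vectors; a real simulation would use a quantum library): Alice applies one of four gates to her half of a shared Bell pair, sends that single qubit, and Bob's joint Bell-basis measurement recovers two classical bits.

```python
import math

s = 1 / math.sqrt(2)

# Gates Alice can apply to her qubit (the first of the entangled pair).
I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply_to_alice(G, psi):
    # State vector indexed as psi[2*a + b]; a = Alice's qubit, b = Bob's.
    out = [0.0] * 4
    for a in range(2):
        for b in range(2):
            for ap in range(2):
                out[2 * a + b] += G[a][ap] * psi[2 * ap + b]
    return out

bell = [s, 0, 0, s]  # shared beforehand: (|00> + |11>)/sqrt(2)

encode = {(0, 0): I, (0, 1): X, (1, 0): Z, (1, 1): matmul(Z, X)}

# The four orthogonal Bell states Bob's joint measurement distinguishes,
# labeled by the two bits they decode to.
bell_basis = {
    (0, 0): [s, 0, 0, s],
    (0, 1): [0, s, s, 0],
    (1, 0): [s, 0, 0, -s],
    (1, 1): [0, s, -s, 0],
}

def send(bits):
    psi = apply_to_alice(encode[bits], bell)  # Alice encodes 2 bits locally
    # Alice physically sends her one qubit; Bob measures both in the Bell basis.
    probs = {k: sum(v[i] * psi[i] for i in range(4)) ** 2
             for k, v in bell_basis.items()}
    return max(probs, key=probs.get)

print(send((1, 0)))  # (1, 0)
```

The key step a hidden-variable story can't reproduce: all four messages differ only in what was done to Alice's qubit, yet Bob's two-qubit measurement separates them perfectly.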

Comment by charlie-steiner on What am I missing? (quantum physics) · 2020-08-22T01:05:08.161Z · score: 5 (3 votes) · LW · GW

I'm not surprised that you're not surprised :D EPR's paper introducing the superluminal entanglement thought experiment was published in 1935, and they basically said what you did - that clearly quantum mechanics was incomplete, and there was some way that the spins had decided which was which beforehand.

Bell's theorem, which uses a significantly more complicated situation to demonstrate why that's not possible, was published in 1964. So it took an entire field about 30 years to see why entanglement should be surprising!
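A quick way to see the numerical content of Bell's theorem (using the standard singlet correlation E(a, b) = -cos(a - b) and one conventional choice of analyzer angles): any local hidden-variable account must keep the CHSH combination at or below 2, while quantum mechanics reaches 2√2.

```python
import math

# Quantum correlation for spin measurements on a singlet pair with
# analyzers at angles a and b.
def E(a, b):
    return -math.cos(a - b)

# A standard choice of angles for the CHSH test.
a, a2 = 0.0, math.pi / 2
b, b2 = math.pi / 4, 3 * math.pi / 4

S = E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)
print(abs(S))  # ~2.828 = 2*sqrt(2), above the local-hidden-variable bound of 2
```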

Comment by charlie-steiner on Search versus design · 2020-08-19T00:12:15.941Z · score: 2 (1 votes) · LW · GW

Well, any process that picks actions ends up equivalent to some criterion, even if only "actions likely to be picked by this process." The deal with agents and agent-like things is that they pick actions based on their modeled consequences. Basically anything that picks actions in a different way (or, more technically, a way that's complicated to explain in terms of planning) is an other-izer to some degree. Though maybe this is drift from the original usage, which wanted nice properties like reflective stability etc.

The example of the day is language models. GPT doesn't pick its next sentence by modeling the world and predicting the consequences. Bam, other-izer. Neither design nor search.

Anyhow, back on topic, I agree that "helpfulness to humans" is a very complicated thing. But maybe there's some simpler notion of "helpful to the AI" that results in design-like other-izing that loses some of the helpfulness-to-humans properties, but retains some of the things that make design seem safer than search even if you never looked at the "stories."

Comment by charlie-steiner on A way to beat superrational/EDT agents? · 2020-08-18T08:30:49.003Z · score: 2 (1 votes) · LW · GW

Yeah I'm still not sure how to think about this sort of thing short of going full UDT and saying something like "well, imagine this whole situation was a game - what would be the globally winning strategy?"

Comment by charlie-steiner on Search versus design · 2020-08-18T07:15:58.819Z · score: 4 (2 votes) · LW · GW

There's also the "Plato's man" type of problem where undesirable things fall under the advanced definition of interpretability. For example, ordinary neural nets are "interpretable," because they are merely made out of interpretable components (simple matrices with a non-linearity) glued together.

Comment by charlie-steiner on Search versus design · 2020-08-18T00:00:26.247Z · score: 2 (1 votes) · LW · GW

Design is also closely related to the other-izer problem because if you think of "designing" strategies or actions, this can have different Goodhart's law implications than searching for them - if you break down the problem according to "common sense" rather than according to best achieving the objective, at least.

Comment by charlie-steiner on A way to beat superrational/EDT agents? · 2020-08-17T19:22:46.052Z · score: 4 (-5 votes) · LW · GW

See also Psy-Kosh's non-anthropic problem.

Comment by charlie-steiner on Many-worlds versus discrete knowledge · 2020-08-14T18:11:53.477Z · score: 2 (1 votes) · LW · GW

If the microphysical theory is like quantum mechanics (Bohm-ish mechanics very much included), this is basically the Schrödinger's cat argument. It would be absurd if there was not some function from the microphysical state of the world to the truth of the macrophysical fact of whether the cat in the box is alive or dead. Therefore, there is some such function, and if quantum mechanics doesn't support it then quantum mechanics is incomplete.

Schrödinger was wrong about the cat thing, as far as we can tell. His knowledge of discrete macrophysical states of cats had an explanation, but didn't directly reflect reality.

There are absurd quantum states that don't allow for a function from the microphysical state of the world to whether I observe a photon as having spin left or spin right. If I believe otherwise, my beliefs deserve an explanation, but that doesn't mean they directly reflect reality.

Comment by charlie-steiner on Alignment By Default · 2020-08-14T17:12:37.071Z · score: 6 (3 votes) · LW · GW

This is the sort of thing I've been thinking about since "What's the dream for giving natural language commands to AI?" (which bears obvious similarities to this post). The main problems I noted there apply similarly here:

  • Prediction in the supervised task might not care about the full latent space used for the unsupervised tasks, losing information.
  • Little to no protection from Goodhart's law. Things that are extremely good proxies for human values still might not be safe to optimize.
  • Doesn't care about metaethics, just maximizes some fixed thing. Which wouldn't be a problem if it was meta-ethically great to start with, but it probably incorporates plenty of human foibles in order to accurately predict us.

The killer is really that second one. If you run this supervised learning process, and it gives you a bunch of rankings of things in terms of their human values score, this isn't a safe AI even if it's on average doing a great job, because the thing that gets the absolute best score is probably an exploit of the specific pattern-recognition algorithm used to do the ranking. In short, we still need to solve the other-izer problem.

Actually, your trees example does give some ideas. Could you look inside a GAN trained on normal human behavior and identify what parts of it were the "act morally" or "be smart" parts, and turn them up? Choosing actions is, after all, a generative problem, not a classification or regression problem.

Comment by charlie-steiner on Many-worlds versus discrete knowledge · 2020-08-14T16:49:49.432Z · score: 2 (1 votes) · LW · GW

I model macrophysical observations as discrete too. But I also model tables and chairs as discrete, without needing to impose any requirements that they not be made of non-discrete stuff. A microphysical explanation of discrete observations doesn't need to be made up of discrete parts.

Comment by charlie-steiner on Many-worlds versus discrete knowledge · 2020-08-14T00:15:16.984Z · score: 3 (2 votes) · LW · GW

We model some discrete facts as being known. But Bohm interpretation or Everett, there's still just a bunch of microphysical stuff moving around. To expect the microphysical stuff to be different so that our model of how things are known works seems backwards to me.

Comment by charlie-steiner on Strong implication of preference uncertainty · 2020-08-13T00:55:15.411Z · score: 4 (2 votes) · LW · GW

The most basic examples are comparisons between derived preferences that assume the human is always rational (i.e. every action they take, no matter how mistaken it may appear, is in the service of some complicated plan for how the universe's history should go - my friend getting drunk and knocking over his friend's dresser was all planned and totally in accordance with his preferences), and derived preferences that assume the human is irrational in some way (e.g. maybe they would prefer not to drink so much coffee, but can't wake up without it, and so the action that best fulfills their preferences is to help them drink less coffee).

But more intuitive examples might involve comparison between two different sorts of human irrationality.

For example, in the case of coffee, the AI is supposed to learn that the human has some pattern of thoughts and inclinations that mean it actually doesn't want coffee, and its actions of drinking coffee are due to some sort of limitation or mistake.

But consider a different mistake: not doing heroin. After all, upon trying heroin, the human would be happy and would exhibit behavior consistent with wanting heroin. So we might imagine an AI that infers that humans want heroin, and that their current actions of not trying heroin are due to some sort of mistake.

Both theories can be prediction-identical - the two different sets of "real preferences" just need to be filtered through two different models of human irrationality. Depending on what you classify as "irrational," this degree of freedom translates into a change in what you consider "the real preferences."
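A minimal sketch of that prediction-identity (the rewards, the "aversion bias," and the planner here are all invented stand-ins): two different (reward, irrationality-model) pairs that make exactly the same behavioral prediction.

```python
# Observed behavior to explain: the human never takes heroin.
actions = ["abstain", "take_heroin"]

def predict(reward, irrationality):
    # A toy "planner": pick the action maximizing reward plus bias terms.
    effective = {a: reward[a] + irrationality.get(a, 0) for a in actions}
    return max(effective, key=effective.get)

# Theory 1: the human is rational and genuinely prefers abstaining.
theory1 = ({"abstain": 1, "take_heroin": 0}, {})

# Theory 2: the human "really wants" heroin, but an aversion bias
# (fear, social pressure) blocks the action.
theory2 = ({"abstain": 0, "take_heroin": 2}, {"take_heroin": -5})

print(predict(*theory1), predict(*theory2))  # abstain abstain
```

Behavioral data alone can't separate the two; the degree of freedom lives entirely in what you label "irrationality," which is the point of the comment.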

Comment by charlie-steiner on Can we expect more value from AI alignment than from an ASI with the goal of running alternate trajectories of our universe? · 2020-08-09T17:07:52.341Z · score: 3 (2 votes) · LW · GW
This doesn't solve the problem of aligning the ASI with the goal described above. This only replaces "aligning AGI with humans values" by "aligning AGI to run instances of our universe". Yet, this seems to ease the problem by having a simpler objective: "predict the next step of the sentient part of the universe in a loop". (Finally, I don't know how, but we may use the fact that physics laws are constant and unchangeable, to my knowledge.)

Yeah, this would be half of my central concern. It just doesn't seem much easier to specify ideas like "run a simulation, but only worry about getting the sentient parts right" than it does to specify "fulfill human values." And then once we've got that far, I do think it's significantly more valuable to go for human-aligned AI than to roll the dice again.

Comment by charlie-steiner on Open & Welcome Thread - July 2020 · 2020-08-05T05:06:35.394Z · score: 4 (2 votes) · LW · GW

Anyone want to help proofread a post that's more or less a continuation of my last couple posts?

Comment by charlie-steiner on What is filling the hole left by religion? · 2020-08-05T05:05:33.630Z · score: 2 (1 votes) · LW · GW


Comment by charlie-steiner on Situating LessWrong in contemporary philosophy: An interview with Jon Livengood · 2020-08-04T23:06:28.525Z · score: 2 (1 votes) · LW · GW

Out of curiosity I looked up what you were teaching in the spring - the problem of induction, right? (I'll be surprised and impressed if you managed to foist a reading from Li and Vitanyi on your students :P ) I'm definitely curious about what you think of the progress in probability, and what morals one could draw from it.

I'd actually checked because I thought it would be the philosophy of psychology. That seems like one of those areas where there were, in hindsight, obvious past mistakes, and it's not clear how much of the progress has been empirical versus things that could have been figured out using the empirical knowledge of the time.

Comment by charlie-steiner on Solving Key Alignment Problems Group · 2020-08-04T06:50:22.866Z · score: 4 (3 votes) · LW · GW

Interesting, but there are multiple schemes worth thinking about that I have a hard time fitting into the same top-down design. There are disjunctions like "either the AI should take good actions or it should find arguments that cause humans to take good actions" that prompt different ideas on different sides of the disjunction. On the other hand, maybe you plan to manage and catalogue all the disjunctions rather than splitting different branches into isolated discussions? Potentially doable, but also potentially leads to pandemonium.

Comment by charlie-steiner on Charlie Steiner's Shortform · 2020-08-04T06:28:12.915Z · score: 4 (2 votes) · LW · GW

It seems like there's room for the theory of logical-inductor-like agents with limited computational resources, and I'm not sure if this has already been figured out. The entire trick seems to be that when you try to build a logical inductor agent, it's got some estimation process for math problems like "what does my model predict will happen?" and it's got some search process to find good actions, and you don't want the search process to be more powerful than the estimator because then it will find edge cases. In fact, you want them to be linked somehow, so that the search process is never in the position of taking advantage of the estimator's mistakes - if you, a human, are making some plan and notice a blind spot in your predictions, you don't "take advantage" of yourself, you do further estimating as part of the search process.

The hard part is formalizing this handwavy argument, and figuring out what other strong conditions need to be met to get nice guarantees like bounded regret.

Comment by charlie-steiner on Meaning is Quasi-Idempotent · 2020-07-29T14:57:11.056Z · score: 4 (2 votes) · LW · GW

Well, because, as per your example, you can't ask for the meaning of "meaning" that way. You've got to do something else, like have the usage of "meaning" demonstrated to you and pick it up inductively.

Comment by charlie-steiner on Meaning is Quasi-Idempotent · 2020-07-29T08:34:41.585Z · score: 2 (1 votes) · LW · GW

Or more directly, you've demonstrated that humans can't have learned their vocabulary by asking well-formed variations on "What does X mean?"

Comment by charlie-steiner on SDM's Shortform · 2020-07-23T15:57:54.795Z · score: 5 (3 votes) · LW · GW

I think the mountain analogy really is the center of the rationality anti-realist argument.

It's very intuitive to think of us perceiving facts about e.g. epistemology as if gazing upon a mountain. There is a clean separation between us, the gazer, and that external mountain, which we perceive in a way that we can politely pretend is more or less direct. We receive rapid, rich data about it, through a sensory channel whose principles of operation we well understand and trust, and that data tends to cohere well with everything else, except when sometimes it doesn't, but let's not worry about that. Etc.

The rationality anti-realist position is that perceiving facts about epistemology is very little like looking at a mountain. I'm reminded of a Dennett quote about the quality of personal experience:

Just about every author who has written about consciousness has made what we might call the first-person-plural presumption: Whatever mysteries consciousness may hold, we (you, gentle reader, and I) may speak comfortably together about our mutual acquaintances, the things we both find in our streams of consciousness. And with a few obstreperous exceptions, readers have always gone along with the conspiracy.
This would be fine if it weren’t for the embarrassing fact that controversy and contradiction bedevil the claims made under these conditions of polite mutual agreement. We are fooling ourselves about something. Perhaps we are fooling ourselves about the extent to which we are all basically alike. Perhaps when people first encounter the different schools of thought on phenomenology, they join the school that sounds right to them, and each school of phenomenological description is basically right about its own members’ sorts of inner life, and then just innocently overgeneralizes, making unsupported claims about how it is with everyone.

So, the mountain disanalogy: sometimes there are things we have opinions about, and yet there is no clean separation between us and the thing. We don't perceive it in a way that we can agree is trusted or privileged. We receive vague, sparse data about it, and the subject is plagued by disagreement, self-doubt, and claims that other people are doing it all wrong.

This isn't to say that we should give up entirely, but it means that we might have to shift our expectations of what sort of explanation or justification we are "entitled" to. Everyone would absolutely love it if they could objectively dunk on all those other people who disagree with them, but it's probably going to turn out that a thorough explanation will sound more like "here's how things got the way they are" rather than "here's why you're right and everyone else is wrong."

Comment by charlie-steiner on What Would I Do? Self-prediction in Simple Algorithms · 2020-07-20T09:44:31.127Z · score: 6 (3 votes) · LW · GW

From a logic perspective you'd think any epsilon>0 would be enough to rule out the "conditioning on a falsehood" problem. But I second your question, because Scott makes it sound like there's some sampling process going on that might actually need to do the thing. Which is weird - I thought the sampling part of logical inductors was about sampling polynomial-time proofs, which don't seem like they should depend much on epsilon.

Comment by charlie-steiner on Sunday July 19, 1pm (PDT) — talks by Raemon, ricraz, mr-hire, Jameson Quinn · 2020-07-19T18:44:45.958Z · score: 2 (1 votes) · LW · GW

See you in an hour :)

Comment by charlie-steiner on Why is pseudo-alignment "worse" than other ways ML can fail to generalize? · 2020-07-19T07:38:08.948Z · score: 8 (2 votes) · LW · GW

Although on the other hand, decade+ old arguments about the instrumental utility of good behavior while dependent on humans have more or less the same format. Seeing good behavior is better evidence of intelligence (capabilities generalizing) than it is of benevolence (goals 'generalizing').

The big difference is that the olde-style argument would be about actual agents being evaluated by humans, while the mesa-optimizers argument is about potential configurations of a reinforcement learner being evaluated by a reward function.

Comment by charlie-steiner on Why is pseudo-alignment "worse" than other ways ML can fail to generalize? · 2020-07-19T03:14:51.044Z · score: 2 (1 votes) · LW · GW

It's an important failure mode if you think you're going to want to do sm

Comment by charlie-steiner on Why is pseudo-alignment "worse" than other ways ML can fail to generalize? · 2020-07-19T03:10:13.609Z · score: 11 (6 votes) · LW · GW


Comment by charlie-steiner on Classification of AI alignment research: deconfusion, "good enough" non-superintelligent AI alignment, superintelligent AI alignment · 2020-07-17T05:59:59.828Z · score: 2 (1 votes) · LW · GW
To clarify, when I said "performs well", I did not mean "learns human values well", nor did I have any sort of scoring rule in mind. I meant that the algorithm learns patterns which are actually present in the world - much like earlier when I talked about "the human-labelling-algorithm 'working correctly'".

Ah well. I'll probably argue with you more about this elsewhere, then :)

Comment by charlie-steiner on What should be the topic of my LW mini-talk this Sunday (July 18th)? · 2020-07-16T23:47:01.089Z · score: 4 (2 votes) · LW · GW

Yeah, I dunno, just pick something you think is mildly interesting and can be fun with no context. 5 minutes is only 5 slides' worth of time, or 3 slides and one 2-minute-long digression.

Introductory material like why we want a difference between voting systems for a single winner vs. for a parliament might be interesting, but just some interesting factoid or digression that takes up 5 minutes is also good.