Comment by charlie-steiner on The reward engineering problem · 2019-01-17T04:11:07.161Z · score: 1 (1 votes) · LW · GW

I'm not 100% sold on explaining actions as a solution here. It seems like the basic sorts of "attack" (exploiting human biases or limitations, sending an unintended message to the supervisor, sneaking a message to a 3rd party that will help control the reward signal) still work fine - so long as the search process includes the explainer as part of the environment. And if it doesn't, we run into the usual issue with such schemes: the AI predictably gets its predictions wrong, and so you need some guarantee that you can keep this AI and its descendants in this unnatural state.

Comment by charlie-steiner on The Mirror Chamber: A short story exploring the anthropic measure function and why it can matter · 2019-01-13T01:28:42.989Z · score: 4 (3 votes) · LW · GW
"I'll probably end up mostly agreeing with Integrated Information theories"

Ah... x.x Maybe check out Scott Aaronson's blog posts on the topic (here and here)? I'm definitely more of the Dennettian "consciousness is a convenient name for a particular sort of process built out of lots of parts with mental functions" school.

Anyhow, the reason I focused on drawing boundaries to separate my brain into separate physical systems is mostly historical - I got the idea from the Ebborians (further rambling here. Oh, right - I'm Manfred). I just don't find mere mass all that convincing as a reason to think that some physical system's surroundings are what I'm more likely to see next.

Intuitively it's something like a symmetry of my information - if I can't tell anything about my own brain mass just by thinking, then I shouldn't assign my probabilities as if I have information about my brain mass. If there are two copies of me, one on Monday with a big brain and one on Tuesday with a small brain, I don't see much difference in sensibleness between "it should be Monday because big brains are more likely" and "I should have a small brain because Tuesday is an inherently more likely day." It just doesn't compute as a valid argument for me without some intermediate steps that look like the Ebborians argument.

Comment by charlie-steiner on The Mirror Chamber: A short story exploring the anthropic measure function and why it can matter · 2019-01-12T09:42:24.537Z · score: 1 (1 votes) · LW · GW

It's about trying to figure out what's implied about your brain by knowing that you exist.

It's also about trying to draw some kind of boundary with "unknown environment to interact with and reason about" on one side and "physical system that is thinking and feeling" on the other side. (Well, only sort of.)

Treating a merely larger brain as more anthropically important is equivalent to saying that you can draw this boundary inside the brain (e.g. dividing big neurons down the middle), so that part of the brain is the "reasoner" and the rest of the brain, along with the outside, is the environment to be reasoned about.

This boundary can be drawn, but I think it doesn't match my self-knowledge as well as drawing the boundary based on my conception of my inputs and outputs.

My inputs are sight, hearing, proprioception, etc. My outputs are motor control, hormone secretion, etc. The world is the stuff that affects my inputs and is affected by my outputs, and I am the thing doing the thinking in between.

If I tried to define "I" as the left half of all the neurons in my head, suddenly I would be deeply causally connected to this thing (the right halves of the neurons) I have defined as not-me. These causal connections are like a huge new input and output channel for this defined-self - a way for me to be influenced by not-me, and influence it in turn. But I don't notice this or include it in my reasoning - Paper and Scissors in the story are so ignorant about it that they can't even tell which of them has it!

So I claim that I (and they) are really thinking of themselves as the system that doesn't have such an interface, and just has the usual suite of senses. This more or less pins down the thing doing my thinking as the usual lump of non-divided neurons, regardless of its size.

Comment by charlie-steiner on The Mirror Chamber: A short story exploring the anthropic measure function and why it can matter · 2019-01-12T00:13:13.942Z · score: 2 (2 votes) · LW · GW

Very beautiful! Though see here, particularly footnote 1. I think there are pretty good reasons to think that our ability to locate ourselves as persons (and therefore our ability to have selfish preferences) doesn't depend on brain size or even redundancy, so long as the redundant parts are causally yoked together.

Comment by charlie-steiner on AlphaGo Zero and capability amplification · 2019-01-10T08:15:51.995Z · score: 1 (1 votes) · LW · GW

This is true when getting training data, but I think it's a difference between A (or HCH) and AlphaGo Zero when doing simulation / amplification. Someone wins a simulated game of Go even if both players are making bad moves (or even random moves), which gives you a signal that A doesn't have access to.

Comment by charlie-steiner on AlphaGo Zero and capability amplification · 2019-01-10T00:21:06.852Z · score: 1 (1 votes) · LW · GW

Oh, I've just realized that the "tree" was always intended to be something like task decomposition. Sorry about that - that makes the analogy a lot tighter.

Comment by charlie-steiner on Reframing Superintelligence: Comprehensive AI Services as General Intelligence · 2019-01-09T03:46:41.535Z · score: 1 (1 votes) · LW · GW

Thanks for the summary! I agree that this is missing some extra consideration for programs that are planning / searching at test time. We normally think of Google Maps as non-agenty, "tool-like," "task-directed," etc, but it's performing a search for the best route from A to B, and capable of planning to overcome obstacles - as long as those obstacles are within the ontology of its map of ways from A to B.

A thermostat is dumber than Google Maps, but its data is more closely connected to the real world (local temperature rather than general map), and its output is too (directly controlling a heater rather than displaying directions). If we made a "Google Thermostat Maps" website that let you input your thermostat's state, and showed you a heater control value, it would perform the same computations as your thermostat but lose its apparent agency. The condition for us treating the thermostat like an agent isn't just what computation it's doing; it's that its input, search (such as it is), and output ontologies match and extend into the real world well enough that even very simple computation can produce behavior suitable for the intentional stance.

Comment by charlie-steiner on AlphaGo Zero and capability amplification · 2019-01-09T03:25:36.946Z · score: 3 (2 votes) · LW · GW

MCTS works as amplification because you can evaluate future board positions to get a convergent estimate of how well you're doing - and then eventually someone actually wins the game, which keeps p from departing reality entirely. Importantly, the single thing you're learning can play the role of the environment, too, by picking the opponents' moves.

In trying to train A to predict human actions given access to A, you're almost doing something similar. You have a prediction that's also supposed to be a prediction of the environment (the human), so you can use it for both sides of a tree search. But A isn't actually searching through an interesting tree - it's searching for cycles of length 1 in its own model of the environment, with no particular guarantee that any cycles of length 1 exist or are a good idea. "Tree search" in this context (I think) means spraying out a bunch of outputs and hoping at least one falls into a fixed point upon iteration.

EDIT: Big oops, I didn't actually understand what was being talked about here.

Comment by charlie-steiner on Coherence arguments do not imply goal-directed behavior · 2019-01-03T23:35:07.330Z · score: 3 (2 votes) · LW · GW

Sorry, this was a good response to my confused take - I promised myself I'd write a response but only ended up doing it now :)

I think the root of my disagreeing-feeling is that when I talk about things like "it cares" or "it values," I'm in a context where the intentional stance is actually doing useful work - thinking of some system as an agent with wants, plans, goals, etc. is in some cases a useful simplification that helps me better predict the world. This is especially true when I'm just using the words informally - I can talk about the constantly-twitching agent wanting to constantly twitch, when using the words deliberately, but I wouldn't use this language intuitively, because it doesn't help me predict anything the physical stance wouldn't. It might even mislead me, or dilute the usefulness of intentional stance language. This conflict with intuition is a lot of what's driving my reaction to this argument.

The other half of the issue is that I'm used to thinking of intentional-stance features as having cognitive functions. For example, if I "believe" something, this means that I have some actual physical pattern inside me that performs the function of a world-model, and something like plans, actions, or observations that I check against that world-model. The physical system that constantly twitches can indeed be modeled by an agent with a utility function over world-histories, but that agent is in some sense an incorporeal soul - the physical system itself doesn't have the cognitive functions associated with intentional-stance attributes (like "caring about coherence").

Comment by charlie-steiner on Perspective Reasoning and the Sleeping Beauty Problem · 2019-01-03T23:06:19.527Z · score: 1 (1 votes) · LW · GW

Sorry for the slow reply. I eventually looked through the pdf, but really just wanted to argue one more time against this idea that "what day is it?" is not a valid statement. Today is Thursday. This is a fact that has very strong predictive powers. I can test it by checking the internet or asking someone for the date. It is part of the external world, just as much as "did I go dancing on New Year's Eve 2018?" or "when I flip this coin, will it land Heads?"

It seems to me like you're treating Sleeping Beauty as being some sort of communal consciousness encompassing all of her copies, and there's no fact of the matter about what "today" is for this communal consciousness. But that's not actually what's happening - each instance of Sleeping Beauty is a complete person, and gets to have beliefs about the external world just as good as any other person's.

You might also think of the problem in terms of Turing machines that get fed a tape of observations, trying to predict the next observation. By "Today is Thursday" I mean that the state of my Turing machine is such that I predict I'll see "Thursday" when I click on the time at the bottom-left of my computer screen (there is, after all, no real thursday-ness that I could identify by doing a fundamental physics experiment). The copies of Sleeping Beauty can be thought of as identical Turing machines that have been fed identical observations so far, but can't tell for certain what they will observe when they look at a calendar.

Comment by charlie-steiner on Card Balance and Artifact · 2018-12-28T16:25:47.389Z · score: 2 (2 votes) · LW · GW

What a well-balanced card pool on the card level means is that players are confronted with many more meaningful choices about what gameplan they want to be executing, which makes the draft and deckbuilding experience a lot richer. If you're a constructed player who just plays the meta deck, it doesn't matter so much, I agree.

Comment by charlie-steiner on Card Collection and Ownership · 2018-12-27T17:55:19.143Z · score: 2 (2 votes) · LW · GW

I did end up getting Artifact. In the patch notes they said something that made me very hopeful, which was something like "players want a good game above all else." Balancing cards makes the game itself better, therefore you should do it - that is, if you're trying to appeal to players like me who care less about treating the cards as physical objects or investments. I agree that balancing cards is going to reduce incentive to trade, and will happily bet that cosmetic monetization is coming.

Comment by charlie-steiner on Kindergarten in NYC: Much More than You Wanted to Know · 2018-12-26T05:19:20.719Z · score: 1 (1 votes) · LW · GW

I went to an elementary school in Michigan that was 60% free and reduced lunch, no art class (music starting in 3rd grade), half an hour of recess per day, rooms that felt cramped when I went back to visit them as a teenager. I went to a middle school that had slit-like windows, indicating that it had been built (or, for some parts, renovated) in the 1970s. Some of these things may not matter to your child, as they more or less didn't matter for me.

What made my public school district work for me was that I could find ~4 friends in it, that they made liberal use of tracking and partial-day advanced programs to help meet my needs, and that they fed me and kept me from dying of exposure while I read books. More or less full stop. My point is that things that seem important (particularly facilities) might not be, and that looking at my schools in terms of averages would have been misleading when I ended up in a rather specialized pocket of it.

Oh, and (more unsolicited opinion) going to social justice school probably won't hurt your kid. How much propaganda do you remember from 2nd grade?

Comment by charlie-steiner on The Pavlov Strategy · 2018-12-25T00:24:42.011Z · score: 3 (2 votes) · LW · GW

Huh, interesting paper. That's 1993 - is there a more modern version with more stochastic parameters explored? Seems like an easy paper if not.

I'm also reminded of how computer scientists often end up doing simulations rather than basic math. This seems like a complicated system of equations, but maybe you could work out its properties with a couple of hours and basic nonlinear dynamics knowledge.

Comment by charlie-steiner on Anthropic paradoxes transposed into Anthropic Decision Theory · 2018-12-23T15:06:02.003Z · score: 3 (2 votes) · LW · GW

Weighting rewards according to population is what ADT Adam and Eve are doing - they take identical actions to SSA Adam and Eve, but can have different reasons. SSA Adam and Eve are trying to value their future reward proportional to how likely they are to receive it. Like, if these people actually existed and you could talk to them about their decision-making process, I imagine that ADT Adam and Eve would say different things than SSA Adam and Eve.

Comment by charlie-steiner on Anthropic paradoxes transposed into Anthropic Decision Theory · 2018-12-23T02:13:02.871Z · score: 1 (1 votes) · LW · GW

The Adam and Eve example really helped me understand the correspondence between "ADT average utilitarians" and "CDT average utilitarians". Thanks!

It's also kind of funny that one of the inputs is "assume a 50% chance of pregnancy from having sex" - it seems like an odd input to allow in anthropic decision-making, though it can be cashed out in terms of reasoning using a model of the world with certain parameters that look like Markov transition probabilities.

And of course, one shouldn't forget that, by their own standards, SSA Adam and Eve are making a mistake. (This becomes more obvious if we replace probabilities with frequencies - if we change this "50% chance of pregnancy" into two actual copies of them, one of which will get pregnant, but keep their decisions fixed, we can deterministically money-pump them.) It's all well and good to reverse-engineer their decisions into a different decision-making format, but we shouldn't use a framework that can't imagine people making mistakes.

Comment by charlie-steiner on The E-Coli Test for AI Alignment · 2018-12-16T09:59:47.187Z · score: 3 (2 votes) · LW · GW

I think there are some important advantages that humans have over e. coli, as subjects of value learning. We have internal bits that correspond to much more abstract ways of reasoning about the world and making plans. We can give the AI labeled data or hardcoded priors. We can answer natural language questions. We have theory of mind about ourselves. The states we drive the world into within our domain of normal operation are more microscopically different, increasing the relative power of abstract models of our behavior over reductionist ones.

Comment by charlie-steiner on Open and Welcome Thread December 2018 · 2018-12-15T06:36:01.091Z · score: 7 (5 votes) · LW · GW

P-zombies are indeed all about epiphenomenalism. Go check out David Chalmers' exposition for the standard usage. I think the problem with epiphenomenalism is that it's treating ignorance as a positive license to introduce its epiphenomenal essence.

We know that the brain in your body does all sorts of computational work, and does things that function like memory, and planning, and perception, and being affected by emotions. We might even use a little poetic language and say that there is "someone home" in your body - that it's convenient and natural to treat this body as a person with mental attributes. But it is the unsolved Hard Problem of Consciousness, as some would say, to prove that the person home in your body is you. We could have an extra consciousness-essence attached to these bodies, they say. You can't prove we don't!

When it comes to denying qualia, I think Dennett would bring up the anecdote about magic from Lee Siegel:

"I'm writing a book on magic”, I explain, and I'm asked, “Real magic?” By real magic people mean miracles, thaumaturgical acts, and supernatural powers. “No”, I answer: “Conjuring tricks, not real magic”. Real magic, in other words, refers to the magic that is not real, while the magic that is real, that can actually be done, is not real magic."

Dennett thinks people's expectations are that "real qualia" are the things that live in the space of epiphenomenal essences and can't possibly be the equivalent of a conjuring trick.

Comment by charlie-steiner on How rapidly are GPUs improving in price performance? · 2018-12-14T09:41:34.961Z · score: 3 (2 votes) · LW · GW

Interesting, thanks. This "unweighted" (on a log scale) graph looks a lot more like what I'd expect to be a good fit for a single-exponential model.

Of course, if you don't like how an exponential curve fits the data, you can always change models - in this case, probably to a curve with 1 more free parameter (indicating a degree of slowdown of the exponential growth) or 2 more free parameters (to have 2 different exponentials stitched together at a specific point in time).
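
For concreteness, here's roughly what those alternatives might look like as model functions you could hand to a fitter like scipy's curve_fit. The specific parameterizations (a decaying growth rate, a breakpoint time) are just my own picks for illustration:

```python
# Toy model classes for the fits described above (my own parameterizations).
import numpy as np

def single_exponential(t, a, r):
    # plain exponential growth: 2 free parameters
    return a * np.exp(r * t)

def slowing_exponential(t, a, r, s):
    # +1 parameter s: the instantaneous growth rate decays like r * exp(-s * t)
    return a * np.exp((r / s) * (1.0 - np.exp(-s * t)))

def stitched_exponentials(t, a, r1, r2, t_break):
    # +2 parameters: a second rate r2 that takes over at time t_break
    early = a * np.exp(r1 * np.minimum(t, t_break))
    late = np.exp(r2 * np.maximum(t - t_break, 0.0))
    return early * late
```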

Comment by charlie-steiner on Who's welcome to our LessWrong meetups? · 2018-12-11T23:56:57.468Z · score: 3 (3 votes) · LW · GW

Not sure what your meetup content is, or how you feel the real criteria for someone fitting in are. Are you going to talk about science, or technology, or philosophy, or are you going to do some kind of exercise or group activity, or are you just going to hang out?

For meetups I've run in the past, I think the most important criterion of fit is that someone should enjoy training their cognitive skills (which was usually the meat of the meetups), and enjoyment of LW subculture ("did you see X?" being a good way to have a fun conversation / hang out) was an important secondary quality.

Comment by charlie-steiner on Why we need a *theory* of human values · 2018-12-05T23:10:16.209Z · score: 1 (1 votes) · LW · GW

I strongly agree, but I think the format of the thing we get, and how to apply it, are still going to require more thought.

Human values as they exist inside humans are going to exist natively as several different, perhaps conflicting, ways of judging human internal ways of representing the world. So first you have to make a model of a human, and figure out how you're going to locate intentional-stance elements like "representation of the world." Then you run into ontological crises from moving the human's models and judgments into some common, more accurate model (that an AI might use). Get the wrong answer in one of these ontological crises, and the modeled utility function may assign high value to something we would regard as deceptive, or as wireheading the human (such reactions might give some hints towards how we want to resolve such ontological crises).

Once we're comparing human judgments on a level playing field, we can still run into problems of conflicts, problems of circularity, and other weird meta-level conflicts (like not valuing some of our values) that I'm not sure how to address in a principled way. But suppose we compress these judgments into one utility function within the larger model. Are we then done? I'm not sure.

Comment by charlie-steiner on Coherence arguments do not imply goal-directed behavior · 2018-12-03T17:45:28.340Z · score: 3 (2 votes) · LW · GW

I'm not sure that the agent that constantly twitches is going to be motivated by coherence theorems anyways. Is the class of agents that care about coherence identical to the class of potentially dangerous goal-directed/explicit-utility-maximizing/insert-euphemism-here agents?

Comment by charlie-steiner on Formal Open Problem in Decision Theory · 2018-12-01T22:26:05.727Z · score: 4 (2 votes) · LW · GW

When thinking about agents, the first motivation might not quite work out. Small changes in observation might introduce discontinuous changes in policy - e.g. in the Matching Pennies game. Suppose there are agents (functions) in X that output a fixed value p, no matter their input. If you can continuously vary p by moving in X, then Matching Pennies play will be discontinuous at p = 1/2. So right away you've committed to some unusual behavior for the agents in X by asking for continuity - they can't play perfect Matching Pennies at the very least.
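
To spell out the discontinuity (in notation I'm making up here): say an agent's play is its probability p of Heads, and consider the player who wants to match. Their best response to a constant-p opponent is

\[
  \mathrm{BR}(p) =
  \begin{cases}
    1 & \text{if } p > 1/2, \\
    0 & \text{if } p < 1/2,
  \end{cases}
\]

which has no continuous extension at p = 1/2. So if moving continuously through the space of agents continuously varies p, a best-responding agent can't be a continuous function of its opponent.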

Comment by charlie-steiner on How rapidly are GPUs improving in price performance? · 2018-11-25T22:18:56.589Z · score: 30 (11 votes) · LW · GW

Because the noise usually grows as the signal does. Consider Moore's law for transistors per chip. Back when that number was about 10^4, the standard deviation was also small - say 10^3. Now that density is 10^8, no chips are going to be within a thousand transistors of each other; the standard deviation is much bigger (~10^7).

This means that if you're trying to fit the curve, being off by 10^5 is a small mistake when predicting current transistor #, but a huge mistake when predicting past transistor #. It's not rare or implausible now to find a chip with 10^5 more transistors, but back in the '70s that difference would have been a huge error, impossible under an accurate model of reality.

A basic fitting function, like least squares, doesn't take this into account. It will trade off transistors now vs. transistors in the past as if the mistakes were of exactly equal importance. To do better you have to use something like a chi squared method, where you explicitly weight the points differently based on their variance. Or fit on a log scale using the simple method, which effectively assumes that the noise is proportional to the signal.
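
Here's a toy illustration of the difference (synthetic data, all numbers made up), comparing an unweighted least-squares fit of the exponential against a straight-line fit to the logged data:

```python
# Compare an unweighted exponential fit to a straight-line fit of log(data),
# when the noise is proportional to the signal (toy Moore's-law-style data).
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
years = np.arange(1971, 2019)
true_rate = 0.35                                          # made-up growth rate per year
signal = 1e4 * np.exp(true_rate * (years - years[0]))
data = signal * rng.lognormal(0.0, 0.3, size=years.size)  # noise grows with the signal

def exp_model(t, a, r):
    return a * np.exp(r * (t - years[0]))

# Unweighted fit in linear space: dominated by the most recent (largest) points.
(a_lin, r_lin), _ = curve_fit(exp_model, years, data, p0=(1e4, 0.3))

# Straight line through log(data): implicitly assumes noise proportional to signal.
slope, intercept = np.polyfit(years - years[0], np.log(data), 1)

print(f"unweighted fit rate: {r_lin:.3f}")
print(f"log-space fit rate:  {slope:.3f}   (true rate {true_rate})")
```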

Comment by charlie-steiner on How rapidly are GPUs improving in price performance? · 2018-11-25T20:46:09.413Z · score: 6 (4 votes) · LW · GW

When trying to fit an exponential curve, don't weight all the points equally. Or if you're using excel and just want the easy way, take the log of your values and then fit a straight line to the logs.

Comment by charlie-steiner on Perspective Reasoning and the Sleeping Beauty Problem · 2018-11-25T12:19:27.990Z · score: 4 (3 votes) · LW · GW

Ah, it started so well. And then the numbered list started, and you didn't use any of the things from before the list at all! You assumed some new things (1, 2 and 3) that contained your entire conclusion.

Let me try to redirect you just a little.

Suppose we flip a coin and hide it under a cup without looking at it. We should bet as if the coin has P(Heads)=0.5, because when we are ignorant we can't do better than assigning a probability, even though the reality is fixed. In fact, the same argument applies before flipping the coin if we ignore quantum effects - the universe is already arranged such that the coin will land heads or tails, but because we don't know which, we assign a probability.

Now suppose that you get to look at the coin, while I don't. Now you should assign P(Heads)=1 if it is heads, and P(Heads)=0 if it is tails, but I should still assign P(Heads)=0.5. Different people can assign different probabilities, and that's okay.

The Sleeping Beauty problem has two perspectives - Sleeping Beauty's view, and the experimenter's view (or god's view). In these two views, you face different constraints. To Sleeping Beauty, she is special and she knows that certain logical relationships hold between the allowed day and the state of the coin. To the experimenter, the coin and the day are independent variables, and no instance of Sleeping Beauty is special.

(note: if you think the day being Monday is an "invalid" observable, just suppose that there is a calendar outside the room and Sleeping Beauty is predicting what she will see when she checks the calendar, much like how we predicted what we would see when we looked at the flipped coin.)

Everyone thinks that assigning probabilities from the experimenter's view is easy, but they disagree about Sleeping Beauty's view.

Here's a trick that tells you about what betting odds Sleeping Beauty should assign, using only the easy experimenter's view! Just suppose that the experimenter is betting money against Sleeping Beauty - every time Sleeping Beauty wakes up she makes this bet. Every dollar won by Sleeping Beauty is lost by the experimenter. What is a fair price for Sleeping Beauty to pay, in exchange for the experimenter paying her $1.00 if the day is Monday?

We don't need to use Sleeping Beauty's view to answer this question. We just use the fact that the experimenter's view is easy, and the bet is fair if the experimenter doesn't gain or lose any money on average, from the experimenter's view. With probability 0.5 (for the experimenter) Sleeping Beauty only wakes up on Monday, and with probability 0.5 (for the experimenter) she wakes up on both Monday and Tuesday and makes the bet both times. So with probability 0.5 the experimenter pays a dollar and gets the fair price, and with probability 0.5 the experimenter pays a dollar and gets twice the fair price.

In other words, 3 times the fair price = 2 dollars. The fair price for a bet that pays Sleeping Beauty on Monday is $2/3.
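
If you want to check that numerically, here's a quick Monte Carlo of the betting setup, done entirely from the experimenter's view (my own toy code, obviously not part of the problem statement):

```python
# Each awakening, Beauty pays `price`; the experimenter pays her $1.00 if that
# awakening is on Monday. The experimenter breaks even on average at price = 2/3.
import random

def experimenter_profit(price, n_experiments=100_000):
    total = 0.0
    for _ in range(n_experiments):
        heads = random.random() < 0.5
        awakenings = ["Monday"] if heads else ["Monday", "Tuesday"]
        for day in awakenings:
            total += price          # Beauty pays the price at every awakening
            if day == "Monday":
                total -= 1.0        # experimenter pays out on Monday awakenings
    return total / n_experiments

for price in (0.5, 2 / 3, 0.75):
    print(f"price {price:.3f}: experimenter's average profit {experimenter_profit(price):+.3f}")
```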

Comment by charlie-steiner on Review: Artifact · 2018-11-25T11:13:41.858Z · score: 1 (1 votes) · LW · GW

Looks pretty interesting. I'm not super sold on this being a "nice" business model, since playing constructed at a competitive level still seems like a multi-hundred-dollar buy-in that's only going to increase with further expansions. But I like drafting anyhow, so sure.

I'm also a little concerned about some of the big power differences in heroes, and certain instances of early-game RNG - a little is necessary but I think things can get unfun when there's a high enough variance and clear enough options that you can tell that you've probably already won or lost, but have to play it out anyhow.

Still, I'll probably get it - I'm more or less done with Slay the Spire (if you like card-based combat, puzzly roguelikes, good balance, and high difficulty, I definitely recommend that game, but at this point I've beaten A20 with all the dudes, and don't feel like going for high winrate), and the gameplay videos seem interesting.

Anyone can PM me if they want to talk Artifact, I guess?

Comment by charlie-steiner on Values Weren't Complex, Once. · 2018-11-25T10:28:50.113Z · score: 3 (3 votes) · LW · GW

Have you read the Blue-Minimizing Robot? Early Homo sapiens was in the simple environment where it seemed like they were "minimizing blue," i.e. maximizing genetic fitness. Now, you might say, it seems like our behavior indicates preferences for happiness, meaning, validation, etc, but really that's just an epiphenomenon no more meaningful than our previous apparent preference for genetic fitness.

However, there is an important difference between us and the blue-minimizing robot, which is that we have a much better model of the world, and within that model of the world we do a much better job than the robot at making plans. What kind of plans? The thing that motivates our plans is, from a purely functional perspective, our preferences. And this thing isn't all that different in modern humans versus hunter-gatherers. We know, we've talked to them. There have been some alterations due to biology and culture, but not as much as there could have been. Hunter-gatherers still like happiness, meaning, validation, etc.

What seems to have happened is that evolution stumbled upon a set of instincts that produced human planning, and that in the ancestral environment this correlated well with genetic fitness, but in the modern environment this diverges even though the planning process itself hasn't changed all that much. There are certain futuristic scenarios that could seriously disrupt the picture of human values I've given, but I don't think it's the default, particularly if there aren't any optimization processes much stronger than humans running around.

Comment by charlie-steiner on On MIRI's new research directions · 2018-11-25T07:04:21.225Z · score: 6 (4 votes) · LW · GW

Hm. I wonder what an "alternative" to neural nets and gradient descent would look like. Neural nets are really just there as a highly expressive model class that gradient descent works on.

One big difficulty is that if your model is going to classify pictures of cats (or go boards, etc.), it's going to be pretty darn complicated, and I'm sceptical that any choice of model class is going to prevent that. But maybe one could try to "hide" this complexity in a recursive structure. Neural nets already do this, but convnets especially mix up spatial hierarchy with logical hierarchy, and nns in general aren't as nicely packaged into human-thought-sized pieces as maybe they could be - consider resnets, which work well precisely because they abandon the pretense of each neuron being some specific human-scale logical unit.

So maybe you could go the opposite direction and make that pretense a reality with some kind of model class that tries to enforce "human-thought-sized" reused units with relatively sparse inter-unit connections? Could still train with SGD, or treat hypotheses as decision trees and take advantage of that literature.
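
Here's very roughly the kind of thing I'm imagining, as a toy sketch (the architecture and the sparsity penalty are just my own illustration, nothing principled):

```python
# Toy "human-thought-sized units" model: one small reused unit applied at many
# slots, with an L1 penalty on the off-diagonal routing weights between slots
# to push SGD toward sparse inter-unit connections.
import torch
import torch.nn as nn

class SparseUnitNet(nn.Module):
    def __init__(self, n_units=16, unit_dim=32, in_dim=784, out_dim=10):
        super().__init__()
        self.n_units, self.unit_dim = n_units, unit_dim
        self.encode = nn.Linear(in_dim, n_units * unit_dim)
        self.unit = nn.Sequential(nn.Linear(unit_dim, unit_dim), nn.ReLU())  # reused unit
        self.routing = nn.Parameter(torch.eye(n_units) + 0.01 * torch.randn(n_units, n_units))
        self.decode = nn.Linear(n_units * unit_dim, out_dim)

    def forward(self, x):
        h = self.encode(x).view(-1, self.n_units, self.unit_dim)
        for _ in range(3):                                   # a few rounds of compute + routing
            h = self.unit(h)
            h = torch.einsum("ij,bjd->bid", self.routing, h)
        return self.decode(h.reshape(-1, self.n_units * self.unit_dim))

    def sparsity_penalty(self):
        off_diag = self.routing - torch.diag(torch.diag(self.routing))
        return off_diag.abs().sum()                          # L1 on inter-unit connections

# training loss would be something like: task_loss + lambda_sparse * model.sparsity_penalty()
```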

But suppose we got such a model class working, and trained it to recognize cats. Would it actually be human-comprehensible? Probably not! I guess I'm just not really clear on what "designed for transparency and alignability" is supposed to cash out to at this stage of the game.

Comment by charlie-steiner on Believing others' priors · 2018-11-25T06:32:46.907Z · score: 2 (2 votes) · LW · GW

I think Sean Carroll does a pretty good job, e.g. in Free Will Is As Real As Baseball.

Comment by charlie-steiner on Acknowledging Human Preference Types to Support Value Learning · 2018-11-25T02:05:29.651Z · score: 1 (1 votes) · LW · GW

Interesting! I'm still concerned that, since you need to aggregate these things in the end anyhow (because everything is commensurable in the metric of affecting decisions), the aggregation function is going to be allowed to be very complicated and dependent on factors that don't respect the separation of this trichotomy.

But it does make me consider how one might try to import this into value learning. I don't think it would work to take these categories as given and then try to learn meta-preferences to sew them together, but most (particularly more direct) value learning schemes have to start with some "seed" of examples. If we draw that seed only from "approving," does that mean that the trained AI isn't going to value wanting or liking enough? Or would everything probably be fine, because we wouldn't approve of bad stuff?

Comment by charlie-steiner on Topological Fixed Point Exercises · 2018-11-21T07:13:29.471Z · score: 11 (3 votes) · LW · GW

#8 actually comes up in physics:

in the field of nonlinear dynamics (pretty picture, actual wikipedia). The fact that continuous changes in functions can lead to surprising changes in fixed points (specifically stable attractors) is pretty darn important to understanding e.g. phase transitions!

Comment by charlie-steiner on Topological Fixed Point Exercises · 2018-11-21T05:00:33.152Z · score: 11 (3 votes) · LW · GW

Does this work for #7? (and question) (Spoilers for #6):

I did #6 using 2D Sperner's lemma and closedness. Imagine the destination points are colored [as in #5, which was a nice hint] by where they are relative to their source points - split the possible difference vectors into a colored circle as in #5 [pick the center to be a fourth color so you can notice if you ever sample a fixed point directly, but if fixed points are rare this shouldn't matter], and take samples to make it look like 2D Sperner's lemma, in which there must be at least one interior tri-colored patch. Define a limit of zooming in that moves you towards the tri-colored patch, apply closedness to say the center (fixed) point is included, much like how we were encouraged to do #2 with 1D Sperner's lemma.
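
To write out the coloring I mean (made-up notation): for a map f of the triangle to itself, color each sampled point x by which third of the circle of directions the displacement f(x) - x points into,

\[
  c(x) =
  \begin{cases}
    \text{red}   & \text{if } \arg\big(f(x) - x\big) \in [0, 2\pi/3), \\
    \text{green} & \text{if } \arg\big(f(x) - x\big) \in [2\pi/3, 4\pi/3), \\
    \text{blue}  & \text{if } \arg\big(f(x) - x\big) \in [4\pi/3, 2\pi),
  \end{cases}
\]

with the fourth color reserved for f(x) = x. Sperner's lemma then gives a tri-colored patch at every sampling scale, and the zooming limit plus closedness pins down the fixed point.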

To do #7, it seems like you just need to show that there's a continuous bijection that preserves whether a point is interior or on the edge, from any convex compact subset of R^2 to any other. And there is indeed a recipe to do this - it's like you imagine sweeping a line across the two shapes, at rates such that they finish in equal time. Apply a 1D transformation (affine will do) at each point in time to make the two cross sections match up and there you are. This uses the property of convexity, even though it seems like you should be able to strengthen this theorem to work for simply connected compact subsets (if not - why not?).

EDIT: (It turns out that I think you can construct pathological shapes with uncountable numbers of edges for which a simple linear sweep fails no matter the angle, because you're not allowed to sweep over an edge of one shape while sweeping over a vertex of the other. But if we allow the angle to vary slightly with parametric 'time', I don't think there's any possible counterexample, because you can always find a way to start and end at a vertex.)

Then once you've mapped your subset to a triangle, you use #6. But.

This doesn't use the hint! And the hints have been so good and educational everywhere I've used them. So what am I missing about the hint?

Comment by charlie-steiner on Topological Fixed Point Exercises · 2018-11-21T03:39:05.873Z · score: 3 (2 votes) · LW · GW

As a physicist, this is my favorite one for obvious reasons :)

Comment by charlie-steiner on Topological Fixed Point Exercises · 2018-11-19T21:45:14.915Z · score: 11 (3 votes) · LW · GW

Yeah, I did the same thing :)

Putting it right after #2 was highly suggestive - I wonder if this means there's some very different route I would have thought of instead, absent the framing.

Comment by charlie-steiner on Mandatory Obsessions · 2018-11-19T17:50:35.991Z · score: 2 (2 votes) · LW · GW

Shrug I dunno man, that seems hard :) I just tend to evaluate community norms by how well they've worked elsewhere, and gut feeling. But neither of these is any sort of diamond-hard proof.

Your question at the end is pretty general, and I would say that most chakra-theorists would not want to join this community, so in a sense we're already mostly avoiding chakra-theorists - and there are other groups who are completely unrepresented. But I think the mechanism is relatively indirect, and that's good.

Comment by charlie-steiner on Mandatory Obsessions · 2018-11-19T08:42:35.546Z · score: 3 (2 votes) · LW · GW

Consider something like protecting the free speech of people you strongly disagree with. It can be an empirical fact (according to one's model of reality) that if just those people were censored, the discussion would in fact improve. But such pointlike censorship is usually not an option that you actually have available to you - you are going to have unavoidable impacts on community norms and other peoples' behavior. And so most people around here protect something like a principle of freedom of speech.

If costs are unavoidable, then, isn't that just the normal state of things? You're thinking of "harm" as relative to some counterfactual state of non-harm - but there are many counterfactual states an online discussion group could be in that would be very good, and I don't worry too much about how we're being "harmed" by not being in those states, except when I think I see a way to get there from here.

In short, I don't think I associate the same kind of negative emotion with these kinds of tradeoffs that you do. They're just a fairly ordinary part of following a strategy that gets good results.

Comment by charlie-steiner on Mandatory Obsessions · 2018-11-19T06:35:33.070Z · score: 3 (2 votes) · LW · GW

I like to make the distinction between thinking the chakra-theorists are valuable members of the community, and thinking that it's important to have community norms that include the chakra-theorists.

It's a lot like the distinction between morality and law. The chakra theorists are probably wrong and in fact it probably harms the community that they're here. But it's not a good way to run a community to kick them out, so we shouldn't, and in fact we should be as welcoming to them as we think we should be to similar groups that might have similar prima facie silliness.

Comment by charlie-steiner on Model Mis-specification and Inverse Reinforcement Learning · 2018-11-18T23:21:28.469Z · score: 1 (1 votes) · LW · GW

So, to sum up (?):

We want the AI to take the "right" action. In the IRL framework, we think of getting there by a series of ~4 steps - (observations of human behavior) -> (inferred human decision in model) -> (inferred human values) -> (right action).

Going from step 1 to 2 is hard, and ditto with 2 to 3, and we'll probably learn new reasons why 3 to 4 is hard when we try to do it more realistically. You mostly use model mis-specification to illustrate this - because very different models of step 2 can predict similar step 1, the inference is hard in a certain way. Because very different models of step 3 can predict similar step 2, that inference is also hard.

Comment by charlie-steiner on History of LessWrong: Some Data Graphics · 2018-11-16T16:24:59.762Z · score: 2 (2 votes) · LW · GW

Maybe? I think the user habits are pretty different on the site now compared to then. But I agree that more comments would be better :)

Comment by charlie-steiner on Prediction-Augmented Evaluation Systems · 2018-11-14T06:25:34.578Z · score: 6 (2 votes) · LW · GW

This reminds me of boosted decision trees. In fact, boosting translates very well from aggregating decision trees to aggregating human judgment.
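
To gesture at what I mean, here's a toy AdaBoost-style aggregator where the "weak learners" are human judges rather than decision trees (my own sketch, not anything from the post):

```python
# AdaBoost-flavored aggregation of judges: each round, pick the judge who does
# best on the re-weighted cases, then up-weight the cases that judge got wrong.
import numpy as np

def boost_judges(judgments, labels, n_rounds=10):
    """judgments: (n_judges, n_cases) array of +/-1 calls; labels: (n_cases,) of +/-1."""
    n_judges, n_cases = judgments.shape
    weights = np.full(n_cases, 1.0 / n_cases)
    ensemble = []                                   # list of (judge_index, vote_weight)
    for _ in range(n_rounds):
        errors = np.array([(weights * (judgments[j] != labels)).sum() for j in range(n_judges)])
        j = int(errors.argmin())
        eps = float(np.clip(errors[j], 1e-12, 1 - 1e-12))
        alpha = 0.5 * np.log((1 - eps) / eps)       # how much this judge's vote counts
        ensemble.append((j, alpha))
        weights *= np.exp(-alpha * labels * judgments[j])    # up-weight mistakes
        weights /= weights.sum()
    return ensemble

def ensemble_call(ensemble, judgments):
    # weighted majority vote of the boosted judges
    return np.sign(sum(alpha * judgments[j] for j, alpha in ensemble))
```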

Comment by charlie-steiner on Open AI: Can we rule out near-term AGI? · 2018-11-11T01:04:26.342Z · score: 3 (2 votes) · LW · GW

I dunno, I'm not super convinced. Even the most apparently messy and real-world example, the robotic hand, isn't much conceptual progress over e.g. Andrew Ng's helicopters from 2004. He just doesn't address the impression that current approaches don't seem to work in situations with poor data availability, compute or no.

Comment by charlie-steiner on Humans can be assigned any values whatsoever… · 2018-11-07T02:46:10.674Z · score: 1 (1 votes) · LW · GW

Suppose we start our AI off with the intentional stance, where we have a high-level description of these human objects as agents with desires and plans, beliefs and biases and abilities and limitations.

What I'm thinking when I say we need to "bridge the gap" is that I think if we knew what we were doing, we could stipulate that some set of human button-presses is more aligned with some complicated object "hDesires" than not, and the robot should care about hDesires, where hDesires is the part of the intentional stance description of the physical human that plays the functional role of desires.

Comment by charlie-steiner on Humans can be assigned any values whatsoever… · 2018-11-06T20:11:13.194Z · score: 1 (1 votes) · LW · GW

I think that right now we don't know how to bridge the gap between the thing that presses the buttons on the computer, and a fuzzy specification of a human as a macroscopic physical object. And so if you are defining "human" as the thing that presses the buttons, and you can take actions that fully control which buttons get pressed, it makes sense that there's not necessarily a definition of what this "human" wants.

If we actually start bridging the gap, though, I think it makes lots of sense for the AI to start building up a model of the human-as-physical-object which also takes into account button presses, and in that case I'm not too pessimistic about regularization.

Comment by charlie-steiner on Beliefs at different timescales · 2018-11-06T02:43:40.474Z · score: 1 (1 votes) · LW · GW

The issue being, of course, that when we think of predicting the outcome of the chess game based on Elo score, we're not making any sort of prediction about the very next move (a feat possible only through logical microscience). A similar thing happens with the gas, where the Boltzmann distribution is not a distribution over histories. I don't think this is a coincidence.

Comment by charlie-steiner on Some cruxes on impactful alternatives to AI policy work · 2018-10-12T06:10:27.171Z · score: 9 (5 votes) · LW · GW

Worth noting that lobbying isn't just bribery - it's also about being able to connect lawmakers to experts (or, if you're less ethical, "experts"). Yes, you need to do the "real work" of having policy positions and reading proposed legislation etc. But you also need to invest money and effort into communication, networking, and generally becoming a Schelling point - experts need to know who you are so they can become part of your talent pool, and lawmakers need to know that you have expertise on some set of topics. This is probably the best excuse for all those fancy dinners we associate with lobbying firms - it's not bribery, it's advertising. Meanwhile the lobbying firm needs to know who is receptive to them, and try to work with those people.

Comment by charlie-steiner on We can all be high status · 2018-10-12T04:02:53.346Z · score: 1 (1 votes) · LW · GW

Compare this to research or most jobs. People work in groups. People have goals and work towards that goal. How does this happen? Usually it's because the group leader gets paid to do what they do, and they create a stable small community for people to work in over the mid to long term. Most people don't need to be at the center of some huge status ponzi scheme, because they just work with the same group for years on end and that's fine.

Comment by charlie-steiner on A compendium of conundrums · 2018-10-09T05:16:25.905Z · score: 2 (2 votes) · LW · GW

Can we just ban all puzzlers that require the axiom of choice?

Comment by charlie-steiner on Computerphile discusses MIRI's "Logical Induction" paper · 2018-10-06T00:46:21.928Z · score: 9 (5 votes) · LW · GW

Nice! Really seems to grok the perspective that this is about finding a good collection of desiderata, and then showing that they aren't actually mutually exclusive.

Comment by charlie-steiner on Epistemic Spot Check: The Dorito Effect (Mark Schatzker) · 2018-10-06T00:39:55.344Z · score: 2 (2 votes) · LW · GW

I get a sense-of-craving-that-could-be-satisfied-without-taste only for protein and a diffuse "vegetables," in addition to general hunger signals (though plausibly some of my sugar-craving is triggered by desire for vitamins in fruit, I can't disentangle it from desire for the taste). Not sure how detailed other peoples' experiences are.

Can few-shot learning teach AI right from wrong? · 2018-07-20T07:45:01.827Z · score: 16 (5 votes)

Boltzmann Brains and Within-model vs. Between-models Probability · 2018-07-14T09:52:41.107Z · score: 19 (7 votes)

Is this what FAI outreach success looks like? · 2018-03-09T13:12:10.667Z · score: 53 (13 votes)

Book Review: Consciousness Explained · 2018-03-06T03:32:58.835Z · score: 101 (27 votes)

A useful level distinction · 2018-02-24T06:39:47.558Z · score: 26 (6 votes)

Explanations: Ignorance vs. Confusion · 2018-01-16T10:44:18.345Z · score: 18 (9 votes)

Empirical philosophy and inversions · 2017-12-29T12:12:57.678Z · score: 7 (2 votes)

Dan Dennett on Stances · 2017-12-27T08:15:53.124Z · score: 7 (3 votes)

Philosophy of Numbers (part 2) · 2017-12-19T13:57:19.155Z · score: 10 (4 votes)

Philosophy of Numbers (part 1) · 2017-12-02T18:20:30.297Z · score: 24 (8 votes)