Posts

Down with Solomonoff Induction, up with the Presumptuous Philosopher 2020-06-12T09:44:29.114Z · score: 13 (5 votes)
The Presumptuous Philosopher, self-locating information, and Solomonoff induction 2020-05-31T16:35:48.837Z · score: 40 (13 votes)
Life as metaphor for everything else. 2020-04-05T07:21:11.303Z · score: 37 (11 votes)
Meta-preferences two ways: generator vs. patch 2020-04-01T00:51:49.086Z · score: 19 (6 votes)
Gricean communication and meta-preferences 2020-02-10T05:05:30.079Z · score: 14 (5 votes)
Impossible moral problems and moral authority 2019-11-18T09:28:28.766Z · score: 15 (11 votes)
What's the dream for giving natural language commands to AI? 2019-10-08T13:42:38.928Z · score: 9 (3 votes)
The AI is the model 2019-10-04T08:11:49.429Z · score: 12 (10 votes)
Can we make peace with moral indeterminacy? 2019-10-03T12:56:44.192Z · score: 17 (5 votes)
The Artificial Intentional Stance 2019-07-27T07:00:47.710Z · score: 14 (5 votes)
Some Comments on Stuart Armstrong's "Research Agenda v0.9" 2019-07-08T19:03:37.038Z · score: 22 (7 votes)
Training human models is an unsolved problem 2019-05-10T07:17:26.916Z · score: 16 (6 votes)
Value learning for moral essentialists 2019-05-06T09:05:45.727Z · score: 13 (5 votes)
Humans aren't agents - what then for value learning? 2019-03-15T22:01:38.839Z · score: 20 (6 votes)
How to get value learning and reference wrong 2019-02-26T20:22:43.155Z · score: 40 (10 votes)
Philosophy as low-energy approximation 2019-02-05T19:34:18.617Z · score: 40 (21 votes)
Can few-shot learning teach AI right from wrong? 2018-07-20T07:45:01.827Z · score: 16 (5 votes)
Boltzmann Brains and Within-model vs. Between-models Probability 2018-07-14T09:52:41.107Z · score: 19 (7 votes)
Is this what FAI outreach success looks like? 2018-03-09T13:12:10.667Z · score: 53 (13 votes)
Book Review: Consciousness Explained 2018-03-06T03:32:58.835Z · score: 101 (27 votes)
A useful level distinction 2018-02-24T06:39:47.558Z · score: 26 (6 votes)
Explanations: Ignorance vs. Confusion 2018-01-16T10:44:18.345Z · score: 18 (9 votes)
Empirical philosophy and inversions 2017-12-29T12:12:57.678Z · score: 8 (3 votes)
Dan Dennett on Stances 2017-12-27T08:15:53.124Z · score: 8 (4 votes)
Philosophy of Numbers (part 2) 2017-12-19T13:57:19.155Z · score: 11 (5 votes)
Philosophy of Numbers (part 1) 2017-12-02T18:20:30.297Z · score: 26 (10 votes)
Limited agents need approximate induction 2015-04-24T21:22:26.000Z · score: 1 (1 votes)

Comments

Comment by charlie-steiner on Situating LessWrong in contemporary philosophy: An interview with Jon Livengood · 2020-07-01T11:00:59.307Z · score: 21 (8 votes) · LW · GW

I've actually been recreationally looking at some undergraduate philosophy courses recently. And it still shocks me just how backwards-looking it all is. Basically nothing is taught as itself - it's only taught as a history of itself.

There are two main skills that I think are necessary to practice philosophy (at least the sort that I have practical use for): the ability to suspect that your model of things is wrong even as you try your best, and the ability to sometimes notice mistakes after you make them and go back to try again.

Presumably this is what grad school is for, in one's philosophy education, because I haven't seen deliberate practice of either in the lectures and books I've skimmed. The presence of this sort of thing is one of the factors that makes LW stand out to me. If one were situating it not within philosophy but within philosophy education, it would be a pretty nuts outlier.

Comment by charlie-steiner on Sunday Jun 28 – More Online Talks by Curated Authors · 2020-06-27T09:16:41.737Z · score: 3 (2 votes) · LW · GW

If for some reason you really need a speaker, I'm happy to volunteer.

Comment by charlie-steiner on GPT-3 Fiction Samples · 2020-06-26T12:46:26.283Z · score: 2 (1 votes) · LW · GW

There's a typo in your Andrew Mayne link, but thanks for linking it - that's wild!

Comment by charlie-steiner on An overview of 11 proposals for building safe advanced AI · 2020-06-19T01:25:48.822Z · score: 2 (1 votes) · LW · GW

Basically because I think that amplification/recursion, at least as I currently understand what's meant by it, is more trouble than it's worth. It's going to produce things that have high fitness according to the selection process applied, which in the limit are going to be bad.

On the other hand, you might see this as me claiming that "narrow reward modeling" includes a lot of important unsolved problems. HCH is well-specified enough that you can talk about doing it with current technology. But fulfilling the verbal description of narrow value learning requires some advances in modeling the real world (unless you literally treat the world as a POMDP and humans as Boltzmann-rational agents, in which case we're back down to bad computational properties and also bad safety properties), which gives me the wiggle room to be hopeful.

Comment by charlie-steiner on Intuitive Lagrangian Mechanics · 2020-06-15T07:54:46.096Z · score: 2 (1 votes) · LW · GW

Well, it makes sense for the effective field theory form of GR, for light at least.

The key to remembering how to derive the Euler-Lagrange equation (for me) is to remember that the variation in L vanishes at the boundary. This is what's going to let you do an integration by parts and throw away the constant term. Actually, once you have an intuitive grasp of what's going on, it's kind of fun to derive generalized EL equations for Lagrangians with more complicated stuff in them.
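
For concreteness, here's that derivation compressed into a few lines (standard textbook material, one coordinate q(t), fixed endpoints so the variation vanishes at the boundary):

```latex
\begin{align*}
\delta S &= \delta \int_{t_1}^{t_2} L(q,\dot q,t)\,dt
          = \int_{t_1}^{t_2} \Big( \frac{\partial L}{\partial q}\,\delta q
            + \frac{\partial L}{\partial \dot q}\,\delta \dot q \Big)\,dt \\
% integrate the second term by parts; the boundary piece
% [(\partial L/\partial\dot q)\,\delta q]_{t_1}^{t_2} vanishes because
% \delta q = 0 at the endpoints
         &= \int_{t_1}^{t_2} \Big( \frac{\partial L}{\partial q}
            - \frac{d}{dt}\frac{\partial L}{\partial \dot q} \Big)\,\delta q\,dt .
\end{align*}
% Demanding \delta S = 0 for every allowed variation \delta q gives the
% Euler-Lagrange equation:
\begin{equation*}
\frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = 0 .
\end{equation*}
```

The generalized versions I mentioned come from the same move: each extra derivative appearing in the Lagrangian just costs you another integration by parts (with an extra sign flip each time).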

Comment by charlie-steiner on Open & Welcome Thread - June 2020 · 2020-06-13T06:06:43.762Z · score: 2 (1 votes) · LW · GW

I don't know much about the national picture, but I know that locally there's been little to no detectable rioting, some amount of looting by opportunistic criminals / idiot teenagers (say, 1 per 700 protesters), and less cop/protester violence than expected, though still some, which could look like rioting if you squint.

Comment by charlie-steiner on Down with Solomonoff Induction, up with the Presumptuous Philosopher · 2020-06-13T01:42:46.360Z · score: 2 (1 votes) · LW · GW

Right, but suppose that everybody knows beforehand that Omega is going to preserve copy number 0 only (or that it's otherwise a consequence of things you already know).

This "pays in advance" for the complexity of the number of the survivor. After t_1, it's not like they've been exposed to anything they would be surprised about if they were never copied.

Ah, wait, is that the resolution? If there's a known designated survivor, are they required to be simple to specify even before t_1, when they're "mixed in" with the rest?

Comment by charlie-steiner on Down with Solomonoff Induction, up with the Presumptuous Philosopher · 2020-06-12T20:20:18.749Z · score: 2 (1 votes) · LW · GW

Hm. I agree that this seems reasonable. But how do you square this with what happens if you locate yourself in a physical hypothesis by some property of yourself? Then it seems straightforward that when there are two things that match the property, they need a bit to distinguish the two results. And the converse of this is that when there's only one thing that matches, it doesn't need a bit to distinguish possible results.

I think it's very possible that I'm sneaking in an anthropic assumption that breaks the property of Bayesian updating. For example, if you do get shot, then Solomonoff induction is going to expect the continuation to look like incomprehensible noise that corresponds to the bridging law that worked really well so far. But if you make an anthropic assumption and ask for the continuation that is still a person, you'll get something like "waking up in the hospital after miraculously surviving" that has experienced a "mysterious" drop in probability relative to just before getting shot.

Comment by charlie-steiner on Down with Solomonoff Induction, up with the Presumptuous Philosopher · 2020-06-12T18:53:02.439Z · score: 2 (1 votes) · LW · GW

Yeah, this implicitly depends on how you're being pointed to. I'm assuming that you're being pointed to as "person who has had history [my history]". In which case you don't need to worry about doing extra work to distinguish yourself from shot-you once your histories (ballistically) diverge. (EDIT: I think. This is weird.)

If you're pointed to with coordinates, it's more likely that copies are treated as not adding any complexity. But this is disanalogous to the Presumptuous Philosopher problem, so I'm avoiding that case.

Comment by charlie-steiner on Inaccessible information · 2020-06-09T00:04:02.802Z · score: 3 (2 votes) · LW · GW

I think this is the key issue. There's a difference between inaccessible-as-in-expensive, and inaccessible-as-in-there-is-no-unique-answer.

If it's really impossible to tell what Alice is thinking, the safe bet is not that Alice has some Platonic property of What She's Really Thinking that we just can't access. The safe bet is that the abstract model of the world we have where there's some unique answer to what Alice is thinking doesn't match reality well here. What we want isn't an AI that accesses Alice's Platonic properties; we want an AI that figures out what's "relevant, helpful, useful, etc."

Comment by charlie-steiner on Sparsity and interpretability? · 2020-06-03T01:40:58.538Z · score: 5 (3 votes) · LW · GW

I feel like this is trying to apply a neural network where the problem specification says "please train a decision tree." Even when you are fine with part of the NN not being sparse, it seems like you're just using the gradient descent training as an elaborate img2vec method.

Maybe the idea is that you think a decision tree is too restrictive, and you want to allow more weightings and nonlinearities? Still, it seems like if you can specify from the top down what operations are "interpretable," this will give you some tree-like structure that can be trained in a specialized way.
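
For contrast, the baseline I have in mind is something like this (a throwaway scikit-learn sketch on toy data, nothing to do with the post's actual setup):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# "Interpretable by construction": a shallow tree whose entire decision
# procedure prints out as a handful of human-readable threshold rules.
data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)
print(export_text(tree, feature_names=list(data.feature_names)))
```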

Comment by charlie-steiner on Human instincts, symbol grounding, and the blank-slate neocortex · 2020-06-02T22:55:53.236Z · score: 6 (3 votes) · LW · GW

This was just on my front page for me, for some reason. So, it occurs to me that the example of the evolved FPGA is precisely the nightmare scenario for the CCA hypothesis.

If neurons behave according to simple rules during growth and development, and there are only smooth modulations of chemical signals during development, then nevertheless you might get regions of the cortex that look very similar, but whose cells are exploiting the hardly-noticeable FPGA-style quirks of physics in different ways. You'd have to detect the difference by luckily choosing the right sort of computational property to measure.

Comment by charlie-steiner on Pessimism over AGI/ASI causing psychological distress? · 2020-06-02T19:56:58.031Z · score: 11 (6 votes) · LW · GW

Nobody is going to take time off from their utopia to spend resources torturing me. Now, killing me on the way to world domination is more plausible, but if someone solves all the technical problems required to actually use AI for world domination, the chances are still well in favor of them being some generally nice, cosmopolitan person in a lab somewhere.

No, unfortunately, it's far more likely that I will be killed by pure mistake, rather than malice.

You seem to go out of your way to make your thought experiments be about foreigners targeting you. Have you considered that maybe your concerns about AI here are an expression of an underlying anxiety about bad foreigners?

Comment by charlie-steiner on The Presumptuous Philosopher, self-locating information, and Solomonoff induction · 2020-06-02T14:25:44.943Z · score: 2 (1 votes) · LW · GW

Actually I had forgotten about that :)

Comment by charlie-steiner on The Presumptuous Philosopher, self-locating information, and Solomonoff induction · 2020-06-01T23:07:10.455Z · score: 2 (1 votes) · LW · GW

I am usually opposed on principle to calling something "SSA" as a description of limiting behavior rather than inside-view reasoning, but I know what you mean and yes I agree :P

I am still surprised that everyone is just taking Solomonoff induction at face value here and not arguing for anthropics. I might need to write a follow-up post to defend the Presumptuous Philosopher, because I think there's a real case that Solomonoff induction actually is missing something. I bet I can make it do perverse things in decision problems that involve being copied.

Comment by charlie-steiner on The Presumptuous Philosopher, self-locating information, and Solomonoff induction · 2020-06-01T22:35:57.792Z · score: 2 (1 votes) · LW · GW

Good catch - I'm missing some extra factors of 2 (on average).

And gosh, I expected more people defending the anthropics side of the dilemma here.

Comment by charlie-steiner on The Presumptuous Philosopher, self-locating information, and Solomonoff induction · 2020-06-01T22:34:30.940Z · score: 2 (1 votes) · LW · GW

Yeah, the log(n) is only the absolute minimum. If you're specifying yourself mostly by location, then distinguishing among n different locations takes at least log(n) bits on average (but in practice more), for example.

But I think it's plausible that the details can be elided when comparing two very similar theories - if the details of the bridging laws are basically the same and we only care about the difference in complexity, that difference might be about log(n).
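
For what it's worth, here's the counting-argument sketch behind that log(n) floor (just Kraft's inequality plus the usual source-coding bound, nothing specific to Solomonoff induction):

```latex
% Let K_i be the number of bits the bridging law spends to pick out
% location i among the n candidates, using any prefix-free encoding.
% Kraft's inequality, combined with the standard source-coding lower
% bound for a uniform weighting over locations, gives
\[
\sum_{i=1}^{n} 2^{-K_i} \;\le\; 1
\qquad\Longrightarrow\qquad
\frac{1}{n}\sum_{i=1}^{n} K_i \;\ge\; \log_2 n ,
\]
% so you can get close to \log_2 n bits per location on average,
% but you can't beat it.
```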

Comment by charlie-steiner on The Presumptuous Philosopher, self-locating information, and Solomonoff induction · 2020-06-01T21:33:31.167Z · score: 4 (2 votes) · LW · GW

I'm not really sure what you're arguing for. Yes, I've elided some details of the derivation of average-case complexity of bridging laws (which has gotten me into a few factors of two worth of trouble, as Donald Hobson points out), but it really does boil down to the sort of calculation I sketch in the paragraphs directly after the part you quote. Rather than just saying "ah, here's where it goes wrong" by quoting the non-numerical exposition, could you explain what conclusions you're led to instead?

Comment by charlie-steiner on The Presumptuous Philosopher, self-locating information, and Solomonoff induction · 2020-05-31T23:39:37.125Z · score: 6 (3 votes) · LW · GW

What's the minimum number of bits required to specify "and my camera is here," in such a way that it allows your bridging-law camera to be in up to N different places?

In practice I agree that programs won't be able to reach that minimum. But maybe they'll be able to reach it relative to other programs that are also trying to set up the same sorts of bridging laws.

Comment by charlie-steiner on An overview of 11 proposals for building safe advanced AI · 2020-05-31T17:34:26.889Z · score: 17 (6 votes) · LW · GW

I noticed myself mentally grading the entries by some extra criteria. The main ones being something like "taking-over-the-world competitiveness" (TOTWC, or TOW for short) and "would I actually trust this farther than I could throw it, once it's trying to operate in novel domains?" (WIATTFTICTIOITTOIND, or WIT for short).

A raw statement of my feelings:

  1. Reinforcement learning + transparency tool: High TOW, Very Low WIT.
  2. Imitative amplification + intermittent oversight: Medium TOW, Low WIT.
  3. Imitative amplification + relaxed adversarial training: Medium TOW, Medium-low WIT.
  4. Approval-based amplification + relaxed adversarial training: Medium TOW, Low WIT.
  5. Microscope AI: Very Low TOW, High WIT.
  6. STEM AI: Low TOW, Medium WIT.
  7. Narrow reward modeling + transparency tools: High TOW, Medium WIT.
  8. Recursive reward modeling + relaxed adversarial training: High TOW, Low WIT.
  9. AI safety via debate with transparency tools: Medium-Low TOW, Low WIT.
  10. Amplification with auxiliary RL objective + relaxed adversarial training: Medium TOW, Medium-low WIT.
  11. Amplification alongside RL + relaxed adversarial training: Medium-low TOW, Medium WIT.

Comment by charlie-steiner on What is a decision theory as a mathematical object? · 2020-05-27T11:03:22.401Z · score: 2 (1 votes) · LW · GW

I think the most fundamental thing might be taking in a sequence of bits (or a distribution over sequences if you think it's important to be analog) and outputting bits (or, again, distributions) that happen to control actions.

All this talk about taking causal models as an input is merely a useful abstraction of what happens when we do sequence prediction in our causal universe, and it might always be possible to find some plausible excuse to violate this abstraction.
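
In code, the bare-bones version of that object would be something like the following (illustrative names only, not any particular formalism):

```python
from typing import Dict, Sequence

Bits = Sequence[int]        # everything observed so far, as a bit sequence
NextBit = Dict[int, float]  # distribution over the next output bit

def decision_theory(observations: Bits) -> NextBit:
    """The stripped-down object: any map from bit-histories to
    (distributions over) output bits that happen to control actions."""
    # Placeholder behavior: ignore the history, flip a fair coin.
    return {0: 0.5, 1: 0.5}

print(decision_theory([1, 0, 1, 1]))  # {0: 0.5, 1: 0.5}
```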

Comment by charlie-steiner on Pointing to a Flower · 2020-05-19T21:43:52.447Z · score: 4 (2 votes) · LW · GW

So I'm betting, before really thinking about it, that I can find something as microphysically absurd as "the north side of the flower." How about "the mainland," where humans draw the boundary using a weird ontology that makes no sense from a non-human-centric point of view? Or parts based on analogy or human-centric function, like being able to talk about "the seat" of a chair that is just one piece of plastic.

On the Type 2 error side, there are also lots of local minima of "information passing through the boundary" that humans wouldn't recognize. Like "the flower except for cell #13749788206." Often, the boundary a human draws is a fuzzy fiction that only needs to get filled in as one looks more closely - maybe we want to include that cell if it goes on to replicate, but are fine with excluding it if it will die soon. But humans don't think about this as a black box with Laplace's Demon inside; they think about it as using future information to fill in this fuzzy boundary when we look at it more closely.

Comment by charlie-steiner on The Mechanistic and Normative Structure of Agency · 2020-05-19T09:09:01.245Z · score: 6 (3 votes) · LW · GW

I might give this a read, but based on the abstract I am concerned that "has a perspective" is going to be one of those properties that's so obvious that its presence can be left to human judgment, but that nonetheless contains all the complexity of the theory.

EDIT: Looks like my concerns were more or less unfounded. It's not what I would call the standard usage of the term, and I don't buy the conceptual-analysis-style justifications for why this makes sense as the definition of agent, but what gets presented is a pretty useful definition, at a fairly standard "things represented in models of the world" level of abstraction.

Comment by charlie-steiner on Pointing to a Flower · 2020-05-19T09:04:16.191Z · score: 4 (2 votes) · LW · GW

I think a wave would be a good test in a lot of ways, but by being such a clean example it might miss some possible pitfalls. The big one is, I think, the underdetermination of pointing at a flower - a flower petal is also an object even by human standards, so how could the program know you're not pointing at a flower petal? Even more perversely, humans can refer to things like "this cubic meter of air."

In some sense, I'm of the opinion that solving this means solving the frame problem - part of what makes a flower a flower isn't merely its material properties, but what sorts of actions humans can take in the environment, what humans care about, how our language and culture shape how we chunk up the world into labels, and what sort of objects we typically communicate about by pointing versus other signals.

Comment by charlie-steiner on Studies On Slack · 2020-05-16T07:15:29.102Z · score: 2 (1 votes) · LW · GW

Well, you need some selection process. But for a karma-less community you can still have selection on members, or social encouragement/discouragement. I guess this also requires that the volume of comments isn't so high that ain't nobody got time for that.

Comment by charlie-steiner on Could We Give an AI a Solution? · 2020-05-16T07:11:51.370Z · score: 2 (1 votes) · LW · GW

Sure. The issue is that the concepts referenced in this kind of idea are already at a very high level of abstraction - deceptively so, maybe.

For example, consider the notion of "humans in control of their own worlds." This is not a concept that's easy to describe in terms of quarks, or in terms of pixels! It's really complicated and abstract! In order to be able to detect the concept "humans are in control of a virtual world instantiated in this hardware," the AI needs to have a very sophisticated model of the world that has learned a lot of human-like concepts, and can optimize for those concepts without "breaking" them by finding edge cases.

Once you've solved these problems, it seems to me that you might as well go the last step and make an AI that you can safely tell "do the right thing." The problems you need to solve are very similar.

But because the problems are very similar, I'm not going to discourage you if this is easier to think about for you - if you can start with this dream and really drill down to the things we need to work on today, it's going to be important stuff.

Comment by charlie-steiner on How should AIs update a prior over human preferences? · 2020-05-16T05:00:28.667Z · score: 4 (2 votes) · LW · GW

I think Rohin's point is that the model of

"if I give the humans heroin, they'll ask for more heroin; my Boltzmann-rationality estimator module confirms that this means they like heroin, so I can efficiently satisfy their preferences by giving humans heroin".

is more IRL than CIRL. It doesn't necessarily assume that the human knows their own utility function and is trying to play a cooperative strategy with the AI that maximizes that same utility function. If I knew that what would really maximize utility is having that second hit of heroin, I'd try to indicate it to the AI I was cooperating with.

Problems with IRL look like "we modeled the human as an agent based on representative observations, and now we're going to try to maximize the modeled values, and that's bad." Problems with CIRL look like "we're trying to play this cooperative game with the human that involves modeling it as an agent playing the same game, and now we're going to try to take actions that have really high EV in the game, and that's bad."
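
To make that concrete, here's a toy version of the Boltzmann-rationality move (all numbers and names invented for illustration; this is the pattern, not anybody's actual system):

```python
import numpy as np

actions = ["ask_for_heroin", "decline"]
candidate_rewards = {
    "likes_heroin":    np.array([1.0, 0.0]),  # reward for each action
    "dislikes_heroin": np.array([0.0, 1.0]),
}
beta = 5.0     # assumed "rationality" of the human
observed = 0   # the addicted human asks for more

# P(observed action | reward) under a Boltzmann-rational policy,
# then a uniform-prior posterior over the candidate rewards.
likelihood = {
    name: float(np.exp(beta * r[observed]) / np.exp(beta * r).sum())
    for name, r in candidate_rewards.items()
}
total = sum(likelihood.values())
posterior = {name: p / total for name, p in likelihood.items()}

print(posterior)  # overwhelmingly "likes_heroin", so the "efficient" plan
                  # is more heroin -- the failure mode described above
```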

Comment by charlie-steiner on How to avoid Schelling points? · 2020-05-14T23:56:54.773Z · score: 16 (10 votes) · LW · GW

Since you and the other player are cooperating, rather than thinking of the "Schelling point number," think of the Schelling point strategy that you expect each other to implement to try to win.

If I'm avoiding my ex, I might go to a bar that I like more than she does, while she goes to a bar that she likes more than I do. In the case of the list of numbers, I might pick a number that I think is more significant to me than to the other player.

This assumes that the two players have information that distinguishes them. But not only is that how it is in real life, it's also easy to show that it's necessary for any kind of nontrivial answer: if the two players are identical copies of the same physical system, and they don't have access to any source of randomness like an internet connection or a Geiger counter, then they're going to give the same answer.

Comment by charlie-steiner on Legends of Runeterra: Early Review · 2020-05-14T22:52:03.823Z · score: 2 (1 votes) · LW · GW

I have a soft spot for Heimerdinger midrange. Usually played as Heimer/Vi with Ionia as the second color these days, because of the decent matchup against both burn and shadow isles control.

Example List: CECACAQEBAAQEAQJAMAQEAQMHECACBA3E42DQAYBAECBAAQBAISTCAYCAIAQGCQBAEAQEJQ

(Import by copying to clipboard, then going to your Collection->Decks page and finding the "Import Deck" button.)

Comment by charlie-steiner on Utility need not be bounded · 2020-05-14T21:50:13.178Z · score: 2 (1 votes) · LW · GW

Neat! Sent you a PM with my email.

Comment by charlie-steiner on Studies On Slack · 2020-05-14T06:10:03.544Z · score: 12 (8 votes) · LW · GW

This reminds me of the machine learning point that when you do gradient descent in really high dimensions, local minima are less common than you'd think, because to be trapped in a local minimum, every dimension has to be bad.

Instead of gradient descent getting trapped at local minima, it's more likely to get pseudo-trapped at "saddle points" where it's at a local minimum along some dimensions but a local maximum along others, and due to the small slope of the curve it has trouble learning which is which.
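
A toy numerical illustration of that point, under the (very crude) assumption that the Hessian at a critical point looks like a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_trials = 20, 10_000  # dimensionality, number of random "critical points"

local_minima = 0
for _ in range(n_trials):
    a = rng.normal(size=(d, d))
    hessian = (a + a.T) / 2  # random symmetric "Hessian"
    # A true local minimum needs *every* eigenvalue positive;
    # one negative eigenvalue is enough to make it a saddle.
    if (np.linalg.eigvalsh(hessian) > 0).all():
        local_minima += 1

print(f"local minima: {local_minima}/{n_trials}; everything else is a saddle")
```

In this toy model you essentially never see a true minimum at d = 20; each extra dimension is another chance for some direction to curve downward.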

Comment by charlie-steiner on Legends of Runeterra: Early Review · 2020-05-14T00:06:08.241Z · score: 2 (1 votes) · LW · GW

P.S. Sheesh, reading that RPS review was weird. Like congrats, man, you found a very linear netdeck and dunked on some scrubs, and yet somehow you're still almost certainly at low ranks, given the budget deck your opponent is playing. That's funny. Since the games don't have any decisions you could have made differently, surely it can only be RNG that you're still stuck in Silver, right? (Intense sarcasm)

I think a lot of what makes it hard to see the skills used in playing LoR well is that it's not obvious when you screwed up. One source of skill expression, common in card games, is knowing what's probably in your opponent's deck and hand, and playing around them. There are cases where you want to make a different play to play around a card that you will only see several turns later - I think this sort of thing happens more often in LoR than in MtG due to the importance of combat keywords, and the extra knowledge you have about your opponent's deck. In reinforcement learning terms, noticing these cases is a hard credit assignment problem.

The spell-mana-storing system is a particularly unique offender here. If there was no spell mana, then every turn would be disconnected from the previous one, resource-wise. The addition of spell mana actually makes planning ahead much more complicated, in a way that's not at first obvious. If you want to pull off a big combo on turn 6 that spends your spell mana, you might have to not play a card, several turns earlier, to bank the mana. A new player might never even notice the possible line, or at most go, on turn 6, "aw, I don't quite have enough mana to do this cool thing. Oh well." And even if they notice that they may have screwed up, we're back to the credit assignment problem, where it's hard to learn what to do differently without thinking it through.

This is not to say that LoR is super-long-term-strategic, especially not at the start of a new expansion when there's a lot of aggro running around preying on people trying new things. But if you don't think you're making any decisions, then you're probably not noticing some decisions.

Comment by charlie-steiner on Legends of Runeterra: Early Review · 2020-05-13T23:22:26.985Z · score: 4 (2 votes) · LW · GW

With respect to the interaction of the economy and the reward system, I think you have it backwards. The rewards aren't there as a currency, they're there because players prefer that they be there. People want rewards and a feeling of progression. The reward system had to be there, and the fact that it's related to something in the store is merely a contingent fact about the design of the reward system.

During the new player experience (<20 hours played, maybe?), you could get the impression that there's a Hearthstone business model - not pay-to-win as in Magic, but pay-to-have-fun: you can construct a few winning decks for free, but you feel limited in options, and the option of paying money for cards is available to you. But at some point around that 20-hour mark, arriving quicker if you did spend the money, you realize that the system has actually given you enough to do just about whatever you want - that the sensation of spending a limited resource on new decks was an illusion long before you noticed.

Then it becomes obvious that the business model is to sell cosmetics, not cards (I'll happily bet about revenue from cards vs. cosmetics if you think it's the other way).

Anyhow, my review: The game is what you'd get if a very competent committee actually learned the lessons from Hearthstone and MtG when designing their mass-appeal digital card game. It balances defender's/attacker's advantage well by finding an intermediate point between MtG and Hearthstone, to take the most obvious example. The action and stack systems make it obvious that this was intended to play on mobile right from the start, while also retaining much of the interesting player interactivity from more complicated games, and it's all done very competently. Oh, and the balancing is what people wanted from digital card games all along.

But is it fun? Well, depends on what you like. I think it's extremely similar to MtG in terms of decision-making complexity across similar types of decks, but LoR is more populated (both in percent of the meta and in number of decks in the meta) by simpler decks. So my expectation for Zvi is that if he manages to learn the cards (and the UI for learning the cards) before throwing any electronics across the room, he'll very shortly have the resources to craft some Karma combo deck, or some janky cask deck, or a Heimer midrange variant, or some other deck that tickles his fancy, and then he'll have a lot more fun actually playing the games.

Per my point about resource abundance above, those resources will be there before it's obvious to your feelings. Don't hoard if it means you're having less fun.

Recommendations for the starting card pool: There are plenty of strong and cheap decks, and TBH, spiders is kind of bad right now because it gets a lot of incidental hate. And yet, the tools for the "Ctrl+F Spiders" deck get used elsewhere, maybe illustrating that deckbuilding isn't quite as simple as Zvi makes it out to be. Burn aggro, elusives/Zed aggro, and base shadow isles control are all cheap, strong decks for a beginner. No budget deck is particularly complicated (burn aggro may actually have the highest skill ceiling of those I listed, being pretty hard to play optimally), so if you want to play the interesting stuff ASAP, I'd prioritize crafting Karma, Heimer, or Twisted Fate, all of which can fit into complicated decks either by themselves or with other champs.

Comment by charlie-steiner on Corrigibility as outside view · 2020-05-09T22:02:32.142Z · score: 4 (2 votes) · LW · GW

Sure. Humans have a sort of pessimism about their own abilities that's fairly self-contradictory.

"My reasoning process might not be right", interpreted as you do in the post, includes a standard of rightness that one could figure out. It seems like you could just... do the best thing, especially if you're a self-modifying AI. Even if you have unresolvable uncertainty about what is right, you can just average over that uncertainty and take the highest-expected-rightness action.

Humans seem to remain pessimistic despite this by evaluating rightness using inconsistent heuristics, and not having enough processing power to cause too much trouble by smashing those heuristics together. I'm not convinced this is something we want to put into an AI. I guess I'm also more of an optimist about the chances to just do value learning well enough.

Comment by charlie-steiner on Corrigibility as outside view · 2020-05-09T07:52:06.069Z · score: 6 (3 votes) · LW · GW

Here's the part that's tricky:

Analogously, we might have a value-learning agent take the outside view. If it's about to disable the off-switch, it might realize that this is a terrible idea most of the time. That is, when you simulate your algorithm trying to learn the values of a wide range of different agents, you usually wrongly believe you should disable the off-switch.

Suppose we have an AI that extracts human preferences by modeling them as agents with a utility function over physical states of the universe (not world-histories). This is bad because then it will just try to put the world in a good state and keep it static, which isn't what humans want.

The question is, will the OutsideView method tell it its mistake? Probably not - because the obvious way you generate the ground truth for your outside-view simulations is to sample different allowed parameters of the model you have of humans. And so the simulated humans will all have preferences over states of the universe.

In short, if your algorithm is something like RL based on a reward signal, and your OutsideView method models humans as agents, then it can help you spot problems. But if your algorithm is modeling humans and learning their preferences, then the OutsideView can't help, because it generates humans from your model of them. So this can't be a source of a value learning agent's pessimism about its own righteousness.

Comment by charlie-steiner on [AN #98]: Understanding neural net training by seeing which gradients were helpful · 2020-05-07T05:29:15.120Z · score: 2 (1 votes) · LW · GW

Does anyone have a review of Jane Ku's "Metaethica.AI"? Nate and Jessica get acknowledgements - maybe you have a gloss? I'm having a little trouble figuring out what's going on. From giving it an hour or so, it seems like it's using functional isomorphism to declare what pre-found 'brains' in a pre-found model of the world are optimizing, and then sort of vaguely constructing a utility function over external referents found by more functional isomorphism (Ramsey-Lewis method).

Am I right that it doesn't talk about how to get the models it uses? That it uses functional isomorphism relatively directly, with few nods (I saw something about mean squared error in the pseudocode, but couldn't really decipher it) to how humans might have models that aren't functionally isomorphic to the real world, and to how the most-isomorphic thing out there might not be what humans want to refer to?

Comment by charlie-steiner on A game designed to beat AI? · 2020-05-06T07:17:38.084Z · score: 5 (3 votes) · LW · GW

Manual dexterity. I'm pretty sure I can whoop any AI at Jenga, for the next 3 years or so. And the more fiddly, the bigger my expected advantage - Men at Work is an example of an even more challenging game, with many more possible game states.

Comment by charlie-steiner on A game designed to beat AI? · 2020-05-06T06:59:06.390Z · score: 2 (1 votes) · LW · GW

I know several children who would play this game happily. As for re-use, many games have decks of problems or questions. Cranium or Trivial Pursuit, for examples - both use the same "roll, move, answer a question" kind of format that loosely wraps a progression/scoring mechanism around the trivia questions.

Comment by charlie-steiner on How uniform is the neocortex? · 2020-05-05T19:22:35.387Z · score: 2 (1 votes) · LW · GW

It's connecting this sort of "good models get themselves expressed" layer of abstraction to neurons that's the hard part :) I think future breakthroughs in training RNNs will be a big aid to imagination.

Right now when I pattern-match what you say onto ANN architectures, I can imagine something like making an RNN from a scale-free network and trying to tune less-connected nodes around different weightings of more-connected nodes. But I expect that in the future, I'll have much better building blocks for imagining.

Comment by charlie-steiner on How uniform is the neocortex? · 2020-05-04T19:57:35.912Z · score: 3 (2 votes) · LW · GW

I'm saying the abstraction of (e.g.) CNNs as doing their forward pass all in one timestep does not apply to the brain. So I think we agree and I just wasn't too clear.

For CNNs we don't worry about top-down control intervening in the middle of a forward pass, and to the extent that engineers might increase chip efficiency by having different operations be done simultaneously, we usually want to ensure that they can't interfere with each other, maintaining the layer of abstraction. But the human visual cortex probably violates these assumptions not just out of necessity, but gains advantages.

Comment by charlie-steiner on How uniform is the neocortex? · 2020-05-04T10:12:24.611Z · score: 2 (1 votes) · LW · GW

Hierarchical predictive coding is interesting, but I have some misgivings that it does a good job explaining what we see of brain function, because brains seem to have really dramatic attention mechanisms.

By "attention" I don't mean to imply much similarity to attention mechanisms in current machine learning. I partly mean that not all our cortex is going at full blast all the time - instead, activity is modulated dynamically, and this interacts in a very finely tuned way with the short-term stored state of high-level representations. It seems like there are adaptations in the real-time dynamics of the brain that are finely selected to do interesting and complicated things that I don't understand well, rather than them trying to faithfully implement an algorithm that we think of as happening in one step.

Not super sure about all this, though.

Comment by charlie-steiner on Open & Welcome Thread—May 2020 · 2020-05-04T06:16:26.241Z · score: 12 (7 votes) · LW · GW

Yup, low. Although a high-entropy punchline probably wouldn't be funny either, for different reasons.

Comment by charlie-steiner on Meditation: the screen-and-watcher model of the human mind, and how to use it · 2020-05-03T20:50:02.191Z · score: 2 (1 votes) · LW · GW

My imaginary naturalist discipline should avoid this usual thing, though, because if you want to imagine the function of your own brain in materialist detail, you can't be imagining "you" as something that can be uninvolved with your thoughts and feelings, or as something separate from the rest of your mind that watches what's going on. Instead, you have to be primed to imagine "you" as something emergent from the thoughts and feelings - if any watching is done, it's thoughts and feelings watching themselves. A play where the audience is the actors.

Comment by charlie-steiner on Meditation: the screen-and-watcher model of the human mind, and how to use it · 2020-05-03T19:30:19.824Z · score: 2 (1 votes) · LW · GW

Definitely sounds similar, thanks for the details. Not sure if I'll bother to delve into buddhism here, so I'll just ask - do you know if there's an even more direct analogue where you do the "motion" of anatta, but without first identifying your self with the watching self? So you'd notice how the patterns and abilities that make up what appears to be the watching self are already parts of the thought-generating self, without de-identifying yourself with the thought-generating self.

I don't think doing this would change your cognitive habits very much (compared to exercising your brain to change which thoughts get generated), so maybe it's not a thing.

Comment by charlie-steiner on Meditation: the screen-and-watcher model of the human mind, and how to use it · 2020-05-03T00:04:41.893Z · score: 5 (3 votes) · LW · GW

It strikes me that if one wants to become a naturalist about the function of the brain, maybe there should be a discipline of "anti-meditation," where you practice holding on to the notion that the screen-watching-self is a fiction, and expanding your sense of self to include the operations of the brain that make up what you are despite being no more individually conscious than a cell is a human.

Comment by charlie-steiner on Stanford Encyclopedia of Philosophy on AI ethics and superintelligence · 2020-05-02T23:52:12.716Z · score: 11 (6 votes) · LW · GW

Honestly, maybe he should have included a reference to Garfinkel 2017.

Comment by charlie-steiner on Stanford Encyclopedia of Philosophy on AI ethics and superintelligence · 2020-05-02T23:50:27.291Z · score: 2 (1 votes) · LW · GW

Good. But too much use of "quotes."

Comment by charlie-steiner on Motivating Abstraction-First Decision Theory · 2020-04-30T22:59:18.724Z · score: 2 (1 votes) · LW · GW

Good points. But I think that you can get a little logical uncertainty even with just a little bit of the necessary property.

That property being throwing away more information than logically necessary. Like modeling humans using an agent model you know is contradicted by some low-level information.

(From a Shannon perspective calling this throwing away information is weird, since the agent model might produce a sharper probability distribution than the optimal model. But it makes sense to me from a Solomonoff perspective, where you imagine the true sequence as "model + diff," where diff is something like an imaginary program that fills in for the model and corrects its mistakes. Models that throw away more information will have a longer diff.)

I guess ultimately, what I mean is that there are some benefits of logical uncertainty like for counterfactual reasoning and planning, where using an abstract model automatically gets those benefits. If you never knew the contradictory low-level information in the first place, like your examples, then we just call this "statistical-mechanics-style things." If you knew the low-level information but threw it away, you could call it logical uncertainty. But it's still the same model with the same benefits.

Comment by charlie-steiner on Motivating Abstraction-First Decision Theory · 2020-04-30T13:03:21.910Z · score: 6 (3 votes) · LW · GW

I think you can't avoid dragging in logical uncertainty - though maybe in a bit of a backwards way to what you meant.

A quote from an ancient LW post:

It may be possible to relate this back to logical uncertainty - where by "this" I mean the general thesis of predicting the future by building models that are allowed to be imperfect, not the specific example in part III. Soares and Fallenstein use the example of a complex Rube Goldberg machine that deposits a ball into one of several chutes. Given the design of the machine and the laws of physics, suppose that one can in principle predict the output of this machine, but that the problem is much too hard for our computer to do. So rather than having a deterministic method that outputs the right answer, a "logical uncertainty method" in this problem is one that, with a reasonable amount of resources spent, takes in the description of the machine and the laws of physics, and gives a probability distribution over the machine's outputs.
Meanwhile, suppose that we take [a predictor that uses abstractions], then ask it to predict the machine. We'd like it to make predictions via some appropriately simplified folk model of physics. If this model gives a probability distribution over outcomes - like in the simple case of "if you flip this coin in this exact way, it has a 50% shot at landing heads" - doesn't that make it a logical uncertainty method?

Comment by charlie-steiner on If I were a well-intentioned AI... I: Image classifier · 2020-04-27T20:22:06.317Z · score: 3 (2 votes) · LW · GW

I don't think this is necessarily a critique - after all, it's inevitable that AI-you is going to inherit some anthropomorphic powers. The trick is figuring out what they are and seeing if it seems like a profitable research avenue to try and replicate them :)

In this case, I think this is an already-known problem, because detecting out-of-distribution images in a way that matches human requirements requires the AI's distribution to be similar to the human distribution (and conversely, mismatches in distribution allow for adversarial examples). But maybe there's something different in part 4 where I think there's some kind of "break down actions in obvious ways" power that might not be as well-analyzed elsewhere (though it's probably related to self-supervised learning of hierarchical planning problems).