## Posts

Time Travel, AI and Transparent Newcomb 2019-08-22T22:04:55.908Z · score: 9 (5 votes)
Embedded Naive Bayes 2019-08-22T21:40:05.972Z · score: 10 (3 votes)
Computational Model: Causal Diagrams with Symmetry 2019-08-22T17:54:11.274Z · score: 38 (13 votes)
Markets are Universal for Logical Induction 2019-08-22T06:44:56.532Z · score: 45 (17 votes)
Why Subagents? 2019-08-01T22:17:26.415Z · score: 80 (25 votes)
Compilers/PLs book recommendation? 2019-07-28T15:49:17.570Z · score: 10 (4 votes)
Results of LW Technical Background Survey 2019-07-26T17:33:01.999Z · score: 43 (15 votes)
Cross-Validation vs Bayesian Model Comparison 2019-07-21T18:14:34.207Z · score: 21 (7 votes)
Bayesian Model Testing Comparisons 2019-07-20T16:40:50.879Z · score: 13 (3 votes)
From Laplace to BIC 2019-07-19T16:52:58.087Z · score: 13 (3 votes)
Laplace Approximation 2019-07-18T15:23:28.140Z · score: 27 (8 votes)
Wolf's Dice II: What Asymmetry? 2019-07-17T15:22:55.674Z · score: 30 (8 votes)
Wolf's Dice 2019-07-16T19:50:03.106Z · score: 33 (12 votes)
Very Short Introduction to Bayesian Model Comparison 2019-07-16T19:48:40.400Z · score: 22 (6 votes)
How much background technical knowledge do LW readers have? 2019-07-11T17:38:37.839Z · score: 31 (10 votes)
Embedded Agency: Not Just an AI Problem 2019-06-27T00:35:31.857Z · score: 12 (7 votes)
Being the (Pareto) Best in the World 2019-06-24T18:36:45.929Z · score: 162 (75 votes)
ISO: Automated P-Hacking Detection 2019-06-16T21:15:52.837Z · score: 6 (1 votes)
Real-World Coordination Problems are Usually Information Problems 2019-06-13T18:21:55.586Z · score: 29 (12 votes)
The Fundamental Theorem of Asset Pricing: Missing Link of the Dutch Book Arguments 2019-06-01T20:34:06.924Z · score: 43 (13 votes)
When Observation Beats Experiment 2019-05-31T22:58:57.986Z · score: 15 (6 votes)
Constraints & Slackness Reasoning Exercises 2019-05-21T22:53:11.048Z · score: 37 (13 votes)
The Simple Solow Model of Software Engineering 2019-04-08T23:06:41.327Z · score: 26 (10 votes)
Declarative Mathematics 2019-03-21T19:05:08.688Z · score: 60 (25 votes)
Constructing Goodhart 2019-02-03T21:59:53.785Z · score: 31 (12 votes)
From Personal to Prison Gangs: Enforcing Prosocial Behavior 2019-01-24T18:07:33.262Z · score: 81 (28 votes)
The E-Coli Test for AI Alignment 2018-12-16T08:10:50.502Z · score: 58 (22 votes)
Competitive Markets as Distributed Backprop 2018-11-10T16:47:37.622Z · score: 44 (16 votes)
Two Kinds of Technology Change 2018-10-11T04:54:50.121Z · score: 61 (22 votes)
Don't Get Distracted by the Boilerplate 2018-07-26T02:15:46.951Z · score: 44 (22 votes)
ISO: Name of Problem 2018-07-24T17:15:06.676Z · score: 32 (13 votes)
Letting Go III: Unilateral or GTFO 2018-07-10T06:26:34.411Z · score: 22 (7 votes)
Letting Go II: Understanding is Key 2018-07-03T04:08:44.638Z · score: 12 (3 votes)
The Power of Letting Go Part I: Examples 2018-06-29T01:19:03.474Z · score: 38 (15 votes)
Problem Solving with Mazes and Crayon 2018-06-19T06:15:13.081Z · score: 124 (57 votes)
Fun With DAGs 2018-05-13T19:35:49.014Z · score: 38 (15 votes)
The Epsilon Fallacy 2018-03-17T00:08:01.203Z · score: 70 (19 votes)
The Cause of Time 2013-10-05T02:56:46.150Z · score: 0 (19 votes)
Recent MIRI workshop results? 2013-07-16T01:25:02.704Z · score: 2 (7 votes)

Comment by johnswentworth on Vague Thoughts and Questions about Agent Structures · 2019-08-23T17:28:08.402Z · score: 2 (1 votes)

You should check out "Why Subagents?". That post starts with the usual argument that acyclic preferences imply the existence of a utility function, then shows that if we relax some of the assumptions, we actually get committees of utility-maximizers. Markets are my go-to example: they satisfy exactly the same "inexploitability" notions used by utility-existence proofs, but a market doesn't have a utility function in general, because it has internal degrees of freedom which result in path-dependent aggregate preferences.
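A minimal sketch of the committee idea, under assumptions that are mine rather than the post's: two hypothetical subagents with simple utilities over a two-good bundle, and a committee that accepts a trade only if no member loses and at least one member strictly gains. Every accepted trade strictly helps someone and hurts no one, so accepted trades can never cycle back to where they started; even so, where the committee ends up depends on the order in which offers arrive.

```python
from typing import Callable, Dict, List

Bundle = Dict[str, float]
Utility = Callable[[Bundle], float]


def committee_accepts(members: List[Utility], current: Bundle, offer: Bundle) -> bool:
    """Accept a trade only if no member loses and at least one member strictly gains."""
    gains = [u(offer) - u(current) for u in members]
    return all(g >= 0 for g in gains) and any(g > 0 for g in gains)


def apple_lover(bundle: Bundle) -> float:
    return bundle["apples"]  # hypothetical subagent that only values apples


def banana_lover(bundle: Bundle) -> float:
    return bundle["bananas"]  # hypothetical subagent that only values bananas


committee = [apple_lover, banana_lover]
start = {"apples": 1.0, "bananas": 1.0}
more_apples = {"apples": 3.0, "bananas": 1.0}
more_bananas = {"apples": 1.0, "bananas": 3.0}

# From the start, either offer is accepted...
print(committee_accepts(committee, start, more_apples))   # True
print(committee_accepts(committee, start, more_bananas))  # True
# ...but once it holds one, the committee refuses to switch to the other,
# so the final holdings depend on which offer arrived first.
print(committee_accepts(committee, more_apples, more_bananas))  # False
print(committee_accepts(committee, more_bananas, more_apples))  # False
```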

Comment by johnswentworth on Time Travel, AI and Transparent Newcomb · 2019-08-23T16:52:09.157Z · score: 2 (1 votes)

The real question here is what mechanics + GR "says" about paradoxes; there's nothing special about the Gödel metric other than that it's a specific example of a system containing closed time-like loops.

The answer is that mechanics + GR cannot represent a system containing a paradox, at all. We just have a bunch of particles and/or fields moving around a space with a metric. The local laws of mechanics + GR constrain their behavior. A "paradox" would, for instance, assert that there is a particle at (x, t) with velocity v, but also not a particle at (x, t) with velocity v - the underlying theory can't even represent that.

We don't know how to integrate QFT with GR, but conceptually a similar problem should arise: we just have some quantum fields with complex amplitudes at each point in spacetime. A paradox would assign two different amplitudes to the field at the same point. Again, our physical models can't even represent that: the whole point of a field is that it assigns an amplitude at each point in spacetime.

We could maybe imagine some sort of multivalued state of the universe, but at that point our "time machine" isn't actually doing time travel at all - it's just moving around in a somewhat-larger multiverse.

Comment by johnswentworth on Computational Model: Causal Diagrams with Symmetry · 2019-08-23T16:33:10.454Z · score: 2 (1 votes)

The non-termination point is a bit subtle. If you look at the example in the OP, the causal diagram itself is infinite, but as long as we pass in a positive integer the output value won't actually depend on the whole circuit. One of the conditional nodes will recognize the base case, and ignore whatever's going on in the rest of the circuit below it. So the program can terminate even if the DAG is infinite. (Equivalent conceptualization: imagine running a lazy evaluator on the infinite DAG.)
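A minimal sketch of the lazy-evaluator view, using a factorial-style recursion as a stand-in example (the names here are hypothetical, not taken from the OP): each call to `factorial_node` stands for one copy of the repeated subDAG, and the copy below it is only constructed when the conditional node actually needs its value.

```python
from typing import Callable

Thunk = Callable[[], int]


def factorial_node(n: Thunk) -> Thunk:
    """One copy of the repeated subDAG; the copy below it is only built if forced."""
    def value() -> int:
        k = n()
        if k == 0:      # the conditional node recognizes the base case...
            return 1    # ...and never touches the rest of the infinite circuit
        child = factorial_node(lambda: k - 1)  # next copy of the subDAG, built lazily
        return k * child()
    return value


print(factorial_node(lambda: 5)())  # 120
```

Passing a negative integer never reaches the base case, so the evaluation keeps unrolling copies of the subDAG (in Python it eventually raises `RecursionError`).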

That said, DAGs with symmetry certainly can represent computations which do not halt. This is a feature, not a bug: there are in fact computations which do not halt, and we can observe their physical behavior mid-computation. If we want to think about e.g. what my processor is doing when running an infinite loop, then we need a computational model which can represent that sort of thing, rather than just talking about input/output behavior.

Comment by johnswentworth on The "Commitment Races" problem · 2019-08-23T05:47:43.639Z · score: 2 (1 votes)

Blindfold + scanner does not necessarily lose to blindfold. The blindfold does not prevent swerving, it just prevents gaining information - the blindfold-only agent acts solely on its priors. Adding a scanner gives the agent more data to work with, potentially allowing the agent to avoid crashes. Forgoing the scanner doesn't actually help unless the other player knows I've forgone the scanner, which brings us back to communication - though the "communication" at this point may be in logical time, via simulation.

In the acausal context, communication matters even more, because either player can unilaterally destroy the communication channel: they can simply choose to not simulate the other player. The game will never happen at all unless both agents expect (based on priors) to gain from the trade.

Comment by johnswentworth on The "Commitment Races" problem · 2019-08-23T02:08:12.664Z · score: 3 (2 votes)

One big factor this whole piece ignores is communication channels: a commitment is completely useless unless you can credibly communicate it to your opponent/partner. In particular, this means that there isn't a reason to self-modify to something UDT-ish unless you expect other agents to observe that self-modification. On the other hand, other agents can simply commit to not observing whether you've committed in the first place - effectively destroying the communication channel from their end.

In a game of chicken, for instance, I can counter the remove-the-steering-wheel strategy by wearing a blindfold. If both of us wear a blindfold, then neither of us has any reason to remove the steering wheel. In principle, I could build an even stronger strategy by wearing a blindfold and using a beeping laser scanner to tell whether my opponent has swerved - if both players do this, then we're back to the original game of chicken, but without any reason for either player to remove their steering wheel.

Comment by johnswentworth on Time Travel, AI and Transparent Newcomb · 2019-08-22T23:56:49.549Z · score: 2 (1 votes)

Are you familiar with the Gödel metric? Time travel may well be impossible, but at least within the context of general relativity it is plenty well-defined to reason about.

Comment by johnswentworth on Computational Model: Causal Diagrams with Symmetry · 2019-08-22T21:48:57.072Z · score: 5 (3 votes)

I'll give an answer via analogy: the digit sequence 123123123123123... is symmetric: the sequence directly contains a copy of itself. The sequence 12345678910111213141516... is not symmetric: it will not repeat and does not contain a copy of itself, although there is a pattern and it could be generated programmatically.

Similarly with DAG symmetries. If we're producing these DAGs by expanding out programs, then the only infinite patterns we'll see are symmetries - subDAGs which repeat, corresponding to function blocks. We don't need to worry about DAGs with strange patterns which could be generated programmatically but can't just be built by repeating subDAGs.

Comment by johnswentworth on Power Buys You Distance From The Crime · 2019-08-15T21:01:55.042Z · score: 4 (2 votes)

By "enemy" I meant the hypothetical terrorist in the "some terrorist group likes to eat babies" example.

I'm very confused about what you're perceiving here, so I think some very severe miscommunication has occurred. Did you accidentally respond to a different comment than you thought?

Comment by johnswentworth on Power Buys You Distance From The Crime · 2019-08-13T20:40:19.035Z · score: 13 (4 votes)

I probably won't get to that soon, but I'll put it on the list.

I also want to say that I'm sorry for kicking off this giant tangential thread on your post. I know this sort of thing can be a disincentive to write in the future, so I want to explicitly say that you're a good writer, this was a piece worth reading, and I would like to read more of your posts in the future.

Comment by johnswentworth on Power Buys You Distance From The Crime · 2019-08-12T15:23:24.667Z · score: 9 (5 votes)

This is a really good point and a great distinction to make.

As an example, suppose I hear a claim that some terrorist group likes to eat babies. Such a claim may very well be true. On the other hand, it's the sort of claim which I would expect to hear even in cases where it isn't true. In general, I expect claims of the form "<enemy> is/wants/does <evil thing>", regardless of whether those claims have any basis.

Now, clearly looking into the claim is an all-around solid solution, but it's also an expensive solution - it takes time and effort. So, a reasonable question to ask is: should the burden of proof be on writer or critic? One could imagine a community norm where that sort of statement needs to come with a citation, or a community norm where it's the commenters' job to prove it wrong. I don't think either of those standards is a good idea, because both of them require the expensive work to be done. There's a correct Bayesian update whether or not the work of finding a citation is done, and community norms should work reasonably well whether or not the work is done.
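To make the "correct Bayesian update" concrete, here is a small sketch with made-up numbers (every probability below is an illustrative assumption, not an estimate of anything): if we'd expect to hear the claim nearly as often when it is false as when it is true, the likelihood ratio is close to 1 and hearing the claim barely moves the prior.

```python
def posterior(prior: float, p_claim_if_true: float, p_claim_if_false: float) -> float:
    """P(claim is true | we heard the claim), by Bayes' rule."""
    joint_true = prior * p_claim_if_true
    joint_false = (1 - prior) * p_claim_if_false
    return joint_true / (joint_true + joint_false)


# Illustrative: "<enemy> does <evil thing>" gets said fairly often even when false.
print(posterior(prior=0.05, p_claim_if_true=0.9, p_claim_if_false=0.6))   # about 0.073
# Compare a claim we'd rarely hear unless it were true.
print(posterior(prior=0.05, p_claim_if_true=0.9, p_claim_if_false=0.05))  # about 0.486
```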

A norm which makes more sense to me: there's nothing wrong with writers occasionally dropping conflict-theory-esque claims. But readers should be suspicious of such claims a priori, and just as it's reasonable for authors to make the claim without citation, it's reasonable for readers to question the claim on a priori grounds. It makes sense to say "I haven't specifically looked into whether <enemy> wants <evil thing>, but that sounds suspicious a priori."

Comment by johnswentworth on Power Buys You Distance From The Crime · 2019-08-12T15:09:50.061Z · score: 4 (2 votes)

I generally endorse this line of reasoning.

Comment by johnswentworth on Why Gradients Vanish and Explode · 2019-08-09T16:48:24.739Z · score: 6 (4 votes)

If you're looking to improve your matrix calculus skills, I specifically recommend practicing tensor index notation and the Einstein summation convention. It will make neural networks much more pleasant, especially recurrent nets. (This may have been obvious already, but it's sometimes tough to tell what's useful when learning a subject.)
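As a small illustration of index notation in practice, assuming NumPy (whose `np.einsum` takes an Einstein-summation subscript string), one step of a simple recurrent net, h'_bi = tanh(W_ij h_bj + U_ik x_bk), can be written almost verbatim. The shapes and random weights are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, inputs, batch = 4, 3, 2
W = rng.normal(size=(hidden, hidden))  # recurrent weights W_ij
U = rng.normal(size=(hidden, inputs))  # input weights U_ik
h = rng.normal(size=(batch, hidden))   # hidden state h_bj for each batch element b
x = rng.normal(size=(batch, inputs))   # input x_bk

# Repeated indices (j, k) are summed over; free indices (b, i) label the output.
h_next = np.tanh(np.einsum("ij,bj->bi", W, h) + np.einsum("ik,bk->bi", U, x))

# Same computation with ordinary matrix products, as a cross-check.
assert np.allclose(h_next, np.tanh(h @ W.T + x @ U.T))
```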

Comment by johnswentworth on AI Alignment Open Thread August 2019 · 2019-08-06T22:45:03.592Z · score: 16 (4 votes)

It does sound like our disagreement is the same thing outlined in Realism about Rationality (although I disagree with almost all of the "realism about rationality" examples in that post - e.g. I don't think AGI will necessarily be an "agent", I don't think Turing machines or Kolmogorov complexity are useful foundations for epistemology, I'm not bothered by moral intuitions containing contradictions, etc).

I would also describe my "no proofs => doomed" view, not as the proofs being causally important, but as the proofs being evidence of understanding. If we don't have the proofs, it's highly unlikely that we understand the system well enough to usefully predict whether it is safe - but the proofs themselves play a relatively minor role.

I do not know of any engineering discipline which places most of the confidence in safety on comprehensive, expensive testing. Every single engineering discipline I have ever studied starts from an understanding of the system under design and the principles which govern its function, and then designs a system which is expected to be safe based on that understanding. As long as those underlying principles are understood, the most likely errors are either simple mistakes (e.g. a metric/standard units mixup) or missing some fundamental phenomenon (e.g. the aerodynamics of a bridge). Those are the sort of problems which testing is good at catching. Testing is a double-check that we haven't missed something critical; it is not the primary basis for thinking the system is safe.

A simple example, in contrast to AI: every engineering discipline I know of uses "safety factors" - i.e. make a beam twice as strong as it needs to be, give a wire twice the current capacity it needs, etc. A safety factor of 2 is typical in a wide variety of engineering fields. In AI, we cannot use safety factors because we do not even know what number we could double to make the AI more safe. Today, given any particular aspect of an AI system, we do not know whether adjusting any particular parameter will make the AI more or less reliable/risky.
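For the engineering side of the contrast, the safety-factor rule is plain arithmetic once we know which quantity governs failure. A sketch with made-up numbers (the load values are illustrative, not from any design code):

```python
def required_capacity(expected_max_load: float, safety_factor: float = 2.0) -> float:
    """Design capacity = safety factor x the largest load we expect the part to see."""
    return safety_factor * expected_max_load


def is_adequate(capacity: float, expected_max_load: float, safety_factor: float = 2.0) -> bool:
    return capacity >= required_capacity(expected_max_load, safety_factor)


# Illustrative: a wire expected to carry at most 10 A should be rated for at least 20 A.
print(required_capacity(10.0))  # 20.0
print(is_adequate(15.0, 10.0))  # False
```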

Comment by johnswentworth on AI Alignment Open Thread August 2019 · 2019-08-06T18:45:30.722Z · score: 8 (2 votes)

Three possibly-relevant points here.

First, when I say "proof-level guarantees will be easy", I mean "team of experts can predictably and reliably do it in a year or two", not "hacker can do it over the weekend".

Second, suppose we want to prove that a sorting algorithm always returns sorted output. We don't do that by explicitly quantifying over all possible inputs. Rather, we do that using some insights into what it means for something to be sorted - e.g. expressing it in terms of a relatively small set of pairwise comparisons. Indeed, the insights needed for the proof are often exactly the same insights needed to design the algorithm. Once you've got the insights and the sorting algorithm in hand, the proof isn't actually that much extra work, although it will still take some experts chewing on it a bit to make sure it's correct.

That's the sort of thing I expect to happen for friendly AI: we are missing some fundamental insights into what it means to be "aligned". Once those are figured out, I don't expect proofs to be much harder than algorithms. Coming back to the "see whether the AI runs a check for whether it can deceive humans" example, the proof wouldn't involve writing the checker and then quantifying over all possible inputs. Rather, it would involve writing the AI in such a way that it always passes the check, by construction - just like we write sorting algorithms so that they will always pass an is_sorted() check by construction.
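A minimal sketch of the sorting analogy (function names are illustrative): `is_sorted` expresses sortedness as a small set of adjacent pairwise comparisons, and insertion sort is written so that the prefix it has built always passes that check. The loop invariant is the same insight a proof would use; the `assert` inside the loop is only a double-check, not the reason we believe the output is sorted.

```python
from typing import List


def is_sorted(xs: List[int]) -> bool:
    """Sortedness as adjacent pairwise comparisons: xs[i] <= xs[i + 1] for every i."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def insertion_sort(xs: List[int]) -> List[int]:
    out: List[int] = []
    for x in xs:
        # Invariant: `out` is sorted before and after each insertion.
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)  # everything before index i is <= x, everything after is > x
        assert is_sorted(out)  # holds by construction
    return out


print(insertion_sort([3, 1, 2, 1]))  # [1, 1, 2, 3]
```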

Third, continuing from the previous point: the question is not how hard it is to prove compared to test. The question is how hard it is to build a provably-correct algorithm, compared to an algorithm which happens to be correct even though we don't have a proof.

Comment by johnswentworth on AI Alignment Open Thread August 2019 · 2019-08-05T17:27:11.100Z · score: 10 (3 votes)

I mentioned that I expect proof-level guarantees will be easy once the conceptual problems are worked out. Strong interpretability is part of that: if we know how to "see whether the AI runs a check for whether it can deceive humans", then I expect systems which provably don't do that won't be much extra work. So we might disagree less on that front than it first seemed.

The question of whether to model the AI as an open-ended optimizer is one I figured would come up. I don't think we need to think of it as truly open-ended in order to use any of the above arguments, especially the wish-granting analogy. The relevant point is that limited optimization implies limited wish-granting ability. In order to grant more "difficult" wishes, the AI needs to steer the universe into a smaller chunk of state-space - in other words, it needs to perform stronger optimization. So AIs with limited optimization capability will be safer to exactly the extent that they are unable to grant unsafe wishes - i.e. the chunks of state-space which they can access just don't contain really bad outcomes.
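One common way to quantify "steering the universe into a smaller chunk of state-space" (an assumption here, borrowed from Yudkowsky's framing of optimization power rather than stated in the comment) is in bits: -log2 of the fraction of states at least as good as the one achieved. A toy sketch over a finite, hypothetical state space, where a harder wish means a smaller target region and therefore more bits:

```python
import math
from typing import Callable, Sequence


def optimization_bits(states: Sequence[int], score: Callable[[int], float], achieved: int) -> float:
    """-log2 of the fraction of states scoring at least as well as `achieved`."""
    target = score(achieved)
    at_least_as_good = sum(1 for s in states if score(s) >= target)
    return -math.log2(at_least_as_good / len(states))


def score(state: int) -> float:
    return float(state)  # toy preference ordering: bigger is better


states = range(1024)  # toy state space of 1024 outcomes
print(optimization_bits(states, score, 1023))  # 10.0: only the single best state qualifies
print(optimization_bits(states, score, 512))   # 1.0: anywhere in the top half qualifies
```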

Comment by johnswentworth on AI Alignment Open Thread August 2019 · 2019-08-05T16:55:19.110Z · score: 5 (3 votes)

Still unsafe, in both cases.

The second case is simpler. Think about it in analogy to a wish-granting genie/demon: if we have some intuitive argument that our wish-contract is safe and a few human-designed tests, do we really expect it to have no loopholes exploitable by the genie/demon? I certainly wouldn't bet on it. The problem here is that the AI is smarter than we are, and can find loopholes we will not think of.

The first case is more subtle, because most of the complexity is hidden under a human-intuitive abstraction layer. If we had an unaligned genie/demon and said "I wish for you to passively study me for a year, learn what would make me most happy, and then give me that", then that might be a safe wish - assuming the genie/demon already has an appropriate understanding of what "happy" means, including things like long-term satisfaction etc. But an AI will presumably not start with such an understanding out of the gate. Abstractly, the AI can learn its optimization target, but in order to do that it needs a learning target - the thing it's trying to learn. And that learning target is itself what needs to be aligned. If we want the AI to learn what makes humans "happy", in a safe way, then whatever it's using as a proxy for "happiness" needs to be a safe optimization target.

On a side note, Yudkowsky's "The Hidden Complexity of Wishes" is in many ways a better explanation of what I'm getting at. The one thing it doesn't explain is how "more powerful" in the sense of "ability to grant more difficult wishes" translates into a more powerful optimizer. But that's a pretty easy jump to make: wishes require satisficing, so we use the usual approach of a two-valued utility function.
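A sketch of the "wishes require satisficing" point, using a hypothetical wish modeled on the burning-building example from "The Hidden Complexity of Wishes": the wish becomes a utility that is 1 on outcomes satisfying it and 0 elsewhere, so every satisfying outcome ties with every other, including ones the wisher never thought about.

```python
from typing import Callable, Dict, List

Outcome = Dict[str, bool]
Wish = Callable[[Outcome], bool]


def wish_utility(wish: Wish) -> Callable[[Outcome], int]:
    """Two-valued utility: 1 if the outcome satisfies the wish, 0 otherwise."""
    return lambda outcome: 1 if wish(outcome) else 0


def grant(wish: Wish, reachable: List[Outcome]) -> List[Outcome]:
    """Every reachable outcome that maximizes the two-valued utility."""
    u = wish_utility(wish)
    best = max(u(o) for o in reachable)
    return [o for o in reachable if u(o) == best]


def get_mother_out(outcome: Outcome) -> bool:
    return outcome["mother_outside"]  # hypothetical wish predicate


reachable = [
    {"mother_outside": True, "mother_unharmed": True},
    {"mother_outside": True, "mother_unharmed": False},
    {"mother_outside": False, "mother_unharmed": True},
]
# Both "outside" outcomes score 1, including the one where she is harmed:
# the utility is silent about anything the wish didn't mention.
print(grant(get_mother_out, reachable))
```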

Comment by johnswentworth on AI Alignment Open Thread August 2019 · 2019-08-05T02:20:50.444Z · score: 6 (3 votes)

I believe the empirical claim. As I see it, the main issue is Goodhart: an AGI is probably going to be optimizing something, and open-ended optimization tends to go badly. The main purpose of proof-level guarantees is to make damn sure that the optimization target is safe. (You might imagine something other than a utility-maximizer, but at the end of the day it's either going to perform open-ended optimization of something, or be not very powerful.)

The best analogy here is something like an unaligned wish-granting genie/demon. You want to be really careful about wording that wish, and make sure it doesn't have any loopholes.

I think the difficulty of getting those proof-level guarantees is more conceptual than technical: the problem is that we don't have good ways to rigorously express many of the core ideas, e.g. the idea that physical systems made of atoms can "want" things. Once the core problems of embedded agency are resolved, I expect the relevant guarantees will not be difficult.

Comment by johnswentworth on Power Buys You Distance From The Crime · 2019-08-04T16:11:40.487Z · score: 13 (7 votes)

I get what you're saying about theories vs theorists. I agree that there are plenty of people who hold conflict theories about some things but not others, and that there are multiple reasons for holding a conflict theory.

None of this changes the original point: explaining a problem by someone being evil is still a mind-killer. Treating one's own arguments as soldiers is still a mind-killer. Holding a conflict theory about any particular situation is still a mind-killer, at least to the extent that we're talking about conflict theory in the form of "bad thing happens because of this bad person" as opposed to "this person's incentives are misaligned". We can explain other people's positions by saying they're using a conflict theory, and that has some predictive power, but we should still expect those people to usually be mind-killed by default - even if their arguments happen to be correct.

As you say, explaining Calhoun and Buchanan's use of public choice theory as entirely a rationalisation for their political goals is a conflict theory. Saying that people bring up public choice theory not due to differing economic understanding but due to different political goals is a conflict theory. And I expect people using either of those explanations to be mind-killed by default, even if the particular interpretation were correct.

Even after all this discussion of theories vs theorists, "conflict theory = predictably wrong" still seems like a solid heuristic.

Comment by johnswentworth on Power Buys You Distance From The Crime · 2019-08-04T15:46:55.224Z · score: 0 (2 votes)

This explanation loses predictive power compared to the explanation I gave above. In particular, if we think of conflict theory as "bad things happen because of bad people", then it makes sense why conflict theorists would think public choice theory makes black people worse off, rather than better off. In your explanation, we need that as an additional assumption.

Comment by johnswentworth on Power Buys You Distance From The Crime · 2019-08-04T06:54:14.252Z · score: 1 (4 votes)

Let's imagine for a minute that we didn't know any of the background, and just think about what we might have predicted ahead of time.

Frame 1: conflict theory is characterized by the idea that problems mostly come from people following their own self-interest. Not knowing anything else, what do we expect conflict theorists to think about public choice theory - a theory whose central premise is modeling public servants as following their own self-interests/incentives? Like, the third sentence of the Wikipedia article is "it is the subset of positive political theory that studies self-interested agents (voters, politicians, bureaucrats) and their interactions".

If conflict theory is about problems stemming from people following their self-interest, public choice theory ought to be right up the conflict theorist's alley. This whole "meta-level conflict" thing sounds like a rather contrived post-hoc explanation; a priori there doesn't seem to be much reason for all this meta stuff. And conflict theorists in practice seem to be awfully selective about when to go meta, in a way that we wouldn't predict just based on "problems mostly stem from people following their self-interest".

On the other hand...

Frame 2: conflict theory is characterized by the idea that bad things mostly happen because of bad people, and the solution is to punish them. In this frame, what would we expect conflict theorists to think of public choice theory?

Well, we'd expect them to dismiss it as obviously wrong - it doesn't denounce any bad people - and therefore also probably an attempt by bad people to steer things the way they want.

If conflict theory is characterized by "bad things happen because of bad people", then an article about how racism secretly underlies public choice theory is exactly the sort of thing we'd predict.

Comment by johnswentworth on Power Buys You Distance From The Crime · 2019-08-04T02:13:05.759Z · score: 5 (6 votes)

After reading this and the comments you linked, I think people mean several different things by conflict/mistake theory.

I mostly think of conflict theory as a worldview characterized by (a) assuming that bad things mostly happen because of bad people, and (b) assuming that the solution is mostly to punish them and/or move power away from them. I think of mistake theory as a worldview characterized by assuming that people do not intend to be evil (although they can still have bad incentives). I see mechanism design as the prototypical mistake theory approach: if people are misaligned, then restructure the system to align their incentives. It's a technical problem, and getting angry at people is usually unhelpful.

In the comment thread you linked, Scott characterizes conflict theory as "the main driver of disagreement is self-interest rather than honest mistakes". That view matches up more with the example you give: the mistake theorist assumes that people have "good" intent, and if you just explain that their actions are harmful, then they'll stop. Under this interpretation, mechanism design is conflict-theory-flavored; it's thinking of people as self-interested and then trying to align them anyway.

(I think part of the confusion is that some people are coming in with the assumption that acting in self-interest is automatically bad, and others are coming in with more of an economic/game theory mindset. Like, from an economic viewpoint, there's no reason why "the main driver of disagreement is self-interest" would lead to arguing that public choice theory is racist, which was one of Scott's original examples.)

So I guess one good question to think about is: how do we categorize mechanism design? Is it conflict, is it mistake, is it something else? Different answers correspond to different interpretations of what "conflict" and "mistake" theory mean. I'm pretty sure my interpretation is a much better fit to the examples and explanations in Scott's original post on the topic, and it seems like a natural categorization to me. On the other hand, it also seems like there's another natural category of naive-mistake-theorists who just assume honest mistakes, as in your Bob-Charlie example, and apparently some people are using the terms to capture that category.

Personally, my view is that mechanism design is more-or-less-always the right way to think about these kinds of problems. Sometimes that will lead to the conclusion that someone is making an honest mistake, sometimes it will lead to the conclusion that punishment is an efficient strategy, and often it will lead to other conclusions.

Comment by johnswentworth on Power Buys You Distance From The Crime · 2019-08-03T18:14:48.918Z · score: 6 (3 votes)

> Mechanism design is, to a large extent, a conflict theory

I would say that mechanism design is how mistake theorists respond to situations where conflict theory is relevant - i.e., where there really is a "bad guy". Mechanism design is not about "what consequences should happen to different agents", it's about designing a system to achieve a goal using unaligned agents - "consequences" are just one tool in the tool box, and mechanism design (and mistake theory) is perfectly happy to use other tools as well.

> the main thesis is that power allows people to avoid committing direct crime while having less-powerful people commit those crimes instead ... This is a denotative statement that can be evaluated independent of "who should we be angry at".

There's certainly a denotative idea in the OP which could potentially be useful. On the other hand, saying "the post has a few sentences about moral blame" seems like a serious understatement of the extent to which the OP is about who to be angry at.

> in some cases "who we should be angry at" if that's the best available implementation

The OP didn't talk about any other possible implementations, which is part of why it smells like conflict theory. Framing it through principal-agent problems would at least have immediately suggested others.

Comment by johnswentworth on Power Buys You Distance From The Crime · 2019-08-03T17:33:58.954Z · score: 13 (12 votes)

Something about this piece felt off to me, like I couldn't see anything specifically wrong with it but still had a strong instinctive prior that lots of things were wrong.

After thinking about it for a bit, I think my main heuristic is: this whole piece sounds like it's built on a conflict-theory worldview. The whole question of the essay is basically "who should we be angry at?" Based on that, I'd expect that many or most of the individual examples are probably inaccurately understood or poorly analyzed. Lark's comment about the Wells Fargo case confirms that instinct for one of the examples.

Then I started thinking about the "conflict theory = predictably wrong" heuristic. We say "politics is the mindkiller", but I don't think that's quite right - people have plenty of intelligent discussions about policy, even when those discussions inherently involve politics. "Tribalism is the mindkiller" is another obvious formulation, but I'd also propose "conflict theory is the mindkiller". Models like "arguments are soldiers" or "our enemies are evil" are the core of Yudkowsky's original argument for viewing politics as a mind-killer. But these sorts of models are essentially synonymous with conflict theory; if we could somehow have a tribalistic or political discussion without those conflict-theoretic elements, I'd expect it wouldn't be so mindkiller-ish.

Looping back to the main topic of the OP: what would be a more mistake-theoretic way to view the same examples? One theme that jumps out to me is principal-agent problems: when something is outsourced, it's hard to align incentives. That topic has a whole literature in game theory, and I imagine more useful insight could be had by thinking about how it applies to the examples above, rather than thinking about "moral culpability" - a.k.a. who to be angry at.

Comment by johnswentworth on Why Subagents? · 2019-08-03T17:07:09.882Z · score: 2 (1 votes)

Those are consistent path-dependent preferences, so they can be modeled by a committee of subagents by the method outlined in the post. It would require something like N × 2^N states for N toppings, I think: one for each current topping times each possible set of toppings tried already. Off the top of my head, I'm not sure how many dimensions it would require, but you can probably figure it out by trying a few small examples.

That said, the right way to model those particular preferences is to introduce uncertainty and Bayesian reasoning. The "hidden state" in this case is clearly information the agent has learned about each topping.

This raises another interesting question: can we just model all path-dependent preferences by introducing uncertainty? What subset can be modeled this way? Nonexistence of a representative agent for markets suggests that we can't always just use uncertainty, at least without changing our interpretations of "system" or "preference" or "state" somewhat. On the other hand, in some specific cases it is possible to interpret the wealth distribution in a market as a probability distribution in a mixture model - log utilities let us do this, for instance. So I'd guess that there are some clever criteria that would let us tell whether a committee/market with given utilities can be interpreted as a single Bayesian utility maximizer.
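To illustrate the log-utility case, here is a minimal sketch, assuming a complete market with one Arrow security per outcome and agents who spend their wealth across securities in proportion to their beliefs (the optimal policy for log utility). The names `market_round`, `wealth`, and `beliefs` are hypothetical. Under these assumptions, prices equal the wealth-weighted average belief, and each agent's wealth share updates exactly like a Bayesian posterior weight in a mixture model.

```python
from typing import Dict, List, Tuple


def market_round(
    wealth: List[float], beliefs: List[Dict[str, float]], outcome: str
) -> Tuple[Dict[str, float], List[float]]:
    """One round of betting among log-utility agents, then settlement on `outcome`."""
    total = sum(wealth)
    # Price of each outcome's security (parimutuel-style): wealth-weighted average belief.
    prices = {
        o: sum(w * b[o] for w, b in zip(wealth, beliefs)) / total for o in beliefs[0]
    }
    # Each agent spends w * b[o] on security o, buying w * b[o] / prices[o] units;
    # each unit of the realized outcome's security pays 1.
    new_wealth = [w * b[outcome] / prices[outcome] for w, b in zip(wealth, beliefs)]
    return prices, new_wealth


# Two hypothetical agents with equal wealth and different beliefs about a coin.
wealth = [1.0, 1.0]
beliefs = [{"H": 0.9, "T": 0.1}, {"H": 0.5, "T": 0.5}]
prices, new_wealth = market_round(wealth, beliefs, "H")

# Bayesian update of mixture weights: prior 50/50, likelihoods 0.9 vs 0.5.
bayes_weight = 0.5 * 0.9 / (0.5 * 0.9 + 0.5 * 0.5)
print(prices["H"], new_wealth[0] / sum(new_wealth), bayes_weight)
```

Total wealth is conserved, and the first agent's wealth share after the round matches the posterior weight on its "hypothesis".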

Comment by johnswentworth on Very different, very adequate outcomes · 2019-08-02T23:31:12.008Z · score: 9 (4 votes) · LW · GW

One potential problem: if the two utilities have different asymptotic behavior, then one of them can dominate decision-making. For instance, suppose we're using 0-1 normalization, but one of the two utilities has a big spike or tail somewhere. Then it's going to have near-zero slope everywhere else.

More concrete example: on the hedonism axis, humans have more capacity for severe pain than extreme pleasure. So that end of the axis has a big downward spike, and the hedonism-utility would be near-flat at the not-severe-pain end (at least for any of the normalizations you suggest, other than max-mean, which has the same problem with the other end of the axis). But if the preferences-utility lacks a big spike like that, then we're liable to end up with constant low-grade hedonic unhappiness.

That's still a lot better than plenty of other possible outcomes - preference-utility still looks good, and we're not in constant severe pain. But it still seems not very good.
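A minimal numeric sketch of the spike problem, with made-up utilities over six outcomes: after 0-1 normalization, a single severe-pain outcome squashes every other hedonic value into a narrow band near 1, so the sum of the two normalized utilities is decided almost entirely by the preference-utility.

```python
from typing import List


def normalize01(u: List[float]) -> List[float]:
    """Rescale a utility vector so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(u), max(u)
    return [(v - lo) / (hi - lo) for v in u]


# Hypothetical outcomes: 0 is severe pain; 1-5 run from mild unhappiness to mild happiness.
hedonic = [-100.0, -2.0, -1.0, 0.0, 1.0, 2.0]
# Hypothetical preference-utility with no spike, which happens to favor outcome 1.
preference = [0.0, 6.0, 5.0, 4.0, 3.0, 2.0]

h = normalize01(hedonic)
p = normalize01(preference)
combined = [a + b for a, b in zip(h, p)]
best = max(range(len(combined)), key=combined.__getitem__)

# The spike squeezes outcomes 1-5 into a sliver of the hedonic scale (about 0.04 wide),
# so the preference-utility effectively decides: best is outcome 1, mild unhappiness.
print(best, max(h[1:]) - min(h[1:]))
```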

Comment by johnswentworth on Very different, very adequate outcomes · 2019-08-02T23:15:06.214Z · score: 4 (2 votes) · LW · GW

> It doesn't have to be particularly good, just give non-crazy results.

The intertheoretic utility post makes a lot more sense in that light; I had mostly dismissed it as a hack job when I first saw it. But if this is the sort of thing you're trying to do, it seems more useful. Thanks for clarifying.

Comment by johnswentworth on Very different, very adequate outcomes · 2019-08-02T21:50:23.487Z · score: 3 (2 votes) · LW · GW

How do you imagine standardizing the utility functions? E.g., if we multiply by 2, then it does just as good a job representing our happiness, but gets twice as much weight.
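A tiny hypothetical example of why the scale matters: `u1` and `2 * u1` rank the options identically on their own, but when each is added to a second utility `u2`, they pick different options.

```python
from typing import List

u1 = [0.0, 2.0, 4.0]  # hypothetical utility over three options
u2 = [4.0, 3.0, 0.0]  # hypothetical competing utility


def best_option(scale: float) -> int:
    """Index of the option maximizing scale * u1 + u2."""
    totals: List[float] = [scale * a + b for a, b in zip(u1, u2)]
    return max(range(len(totals)), key=totals.__getitem__)


print(best_option(1.0), best_option(2.0))
```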

Comment by johnswentworth on Why Subagents? · 2019-08-02T21:17:35.472Z · score: 4 (2 votes) · LW · GW

Ah, that makes more sense. There are several answers; the main one is that the internal/external division is not arbitrary.

First: at least for coherence-type theorems, they need to work for any choice of system which satisfies the basic type signature (i.e. the environment offers choices, the system "decides" between them, for some notion of decision). The theorem has to hold regardless of where we draw the box. On the other hand, you could argue that some theorems are more useful than others and therefore we should draw our boxes to use those theorems - even if it means fewer "systems" qualify. But then we can get into trouble if there's a specific system we want to talk about which doesn't qualify - e.g. a human.

Second: in this context, when we talk about "internal" variables, that's not an arbitrary modelling choice - the "external" vs "internal" terminology is hiding a functionally-important difference. Specifically, the "external" variables are anything which the system chooses between, anything we could offer in a trade. It's not about the limits of observation, it's about the limits of trade or tradeoffs or choices. The distribution of wealth within a market is "internal" not because we can't observe it (to a large extent we can), but because it's not something that the market itself is capable of choosing, even in principle.

Now, it may be that there are other things in the external world which the market can't make a choice about as a practical matter, like the radius of the moon. But if it somehow became possible to change the radius of the moon, then there's no inherent reason why the market can't make a choice on that - as opposed to the internal wealth distribution, where any choice would completely break the market mechanism itself.

That leads into a third answer: think of the "internal" variables as gears, pieces causally involved in making the decision. The system as a whole can have preferences over the entire state of the external world, but if it has preferences about the gears which are used to make decisions... well, then we're going to end up in a self-referential mess. Which is not to say that it wouldn't be useful to think about such self-referential messes; it would be an interesting embedded agency problem.

Comment by johnswentworth on Programming Languages For AI · 2019-08-02T20:44:51.740Z · score: 2 (1 votes) · LW · GW

I've thought a fair bit about PLs for AI, mostly when I get pissed off by how bad current languages are for certain AI-related things.

My biggest complaint is: most languages make it a huge pain to write code which reasons about code written in the same language. For example: try writing a Python function which takes in another Python function, and tries to estimate its asymptotic runtime by analyzing the code. It doesn't need to always work, since the problem is undecidable in general, but it would be nice if it could handle one specific thing, like maybe estimating runtime of a dynamic programming algorithm. Problem is, even if you have a nice algorithm for calculating asymptotic runtime of DP, it will be a huge pain to implement it - at best you'll be working with an abstract syntax tree of the Python function.
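To make the complaint concrete, here is a minimal sketch of the crude end of this using Python's standard `ast` module: it guesses a polynomial degree for the runtime by finding the deepest nesting of `for`/`while` loops. The helper name `max_loop_depth` is hypothetical; it takes source text rather than a function object (for a function defined in a file, `inspect.getsource` can supply that text), and it ignores comprehensions, recursion, and loop bounds entirely, which is exactly the kind of semantic information the syntax tree doesn't hand you.

```python
import ast


def max_loop_depth(source: str) -> int:
    """Deepest nesting of for/while loops in `source` (comprehensions are not counted)."""

    def depth(node: ast.AST, current: int) -> int:
        best = current
        for child in ast.iter_child_nodes(node):
            # Entering a loop statement adds one level of nesting.
            inner = current + 1 if isinstance(child, (ast.For, ast.While)) else current
            best = max(best, depth(child, inner))
        return best

    return depth(ast.parse(source), 0)


# A hypothetical DP function, given as source text.
LCS_SOURCE = """
def lcs_length(a, b):
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    return table[len(a)][len(b)]
"""

d = max_loop_depth(LCS_SOURCE)
print(f"crude estimate: O(n^{d})")
```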

LISP makes this a bit more pleasant, since the code itself is already a data structure - the abstract syntax tree is the code. But abstract syntax trees still just aren't that great a representation for reasoning about code. We use trees, and write tree-shaped code, because trees can be represented neatly in linear text files (by nesting lots of parens () or similar delimiters {}). But the semantics of code, or the execution of code, is usually not tree-shaped. What we'd really like is a better data structure for representing programs, something which is inherently closer to the semantics rather than the syntax. Then, with that data structure in hand, we could work backwards to find a good language to represent it.

I also have some opinions about what that data structure should be, but at this point I think posing the question is more useful than arguing about a solution. If you're thinking about proofs and tactics, then I'd recommend thinking about a representation of tactics which makes it elegant and easy for them to operate on other tactics.

Comment by johnswentworth on Why Subagents? · 2019-08-02T18:53:08.080Z · score: 2 (1 votes) · LW · GW

I recommend thinking about the market example. The difficulty for markets is not that the preferences are "conditioned on the environment"; exactly the opposite. The problem is that the preferences are conditional on internal state; they can't be captured only by looking at the external environment.

For examples like pepperoni vs mushroom pizza, where we're just thinking about partial preferences directly, it's reasonable to say that the problem is partial specification. Presumably the system does something when it has to choose between pepperoni and mushroom - see Donald Hobson's comment for more on that. But path dependence is a different beast. Once we start thinking about internal state and path dependence, partial preferences are no longer just due to partial specification - they're due to the system having internal variables which it doesn't "want" to change.

Comment by johnswentworth on Why Subagents? · 2019-08-02T15:59:45.701Z · score: 4 (2 votes) · LW · GW

Good question. Let's talk about analogous choices for a market, since that's a more realistic system, and then we can bring it back to pizza.

In a market, partial preferences result from hidden state. There is never a missing preference between externally-visible states (i.e. the market's aggregate portfolio). However, the market could have a choice between two hidden states: given one aggregate trade, the market could implement it two different ways, resulting in different wealth distributions. For instance, if I offer the market $5000 for 5 shares of AAPL, then those 5 shares can come from any combination of the internal agents holding AAPL, and the $5000 can be distributed among them in many different ways. This means the market's behavior is underspecified: there are multiple possible solutions for its behavior. Economists call the set of possible solutions the "contract curve". Usually, additional mechanics are added to narrow down the possible behavior - most notably the Law of One Price, the strongest form of which gives locally-unique solutions for the market's behavior. For real markets, the Law of One Price is an approximation, and the exact outcome will depend on market microstructure: market making, day trading, and so forth.
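A minimal sketch of the underspecification, with hypothetical agents and holdings: even if we fix a single price of $1000 per share (the $5000-for-5-shares offer), the external trade alone doesn't determine which internal agents sell, so several different wealth distributions are consistent with the same externally-visible behavior. This deliberately ignores the agents' own preferences, which are what the Law of One Price and market microstructure would use to pick among these in a real market.

```python
from itertools import product
from typing import Dict, List


def ways_to_fill(holdings: Dict[str, int], shares: int) -> List[Dict[str, int]]:
    """Every way the internal agents could jointly supply `shares` shares."""
    agents = list(holdings)
    fills = []
    for amounts in product(*(range(holdings[a] + 1) for a in agents)):
        if sum(amounts) == shares:
            fills.append(dict(zip(agents, amounts)))
    return fills


holdings = {"alice": 3, "bob": 4}  # hypothetical internal AAPL positions
price_per_share = 5000 / 5  # the external offer: $5000 for 5 shares

for fill in ways_to_fill(holdings, 5):
    cash = {agent: n * price_per_share for agent, n in fill.items()}
    print(fill, cash)
```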

Now let's translate this back to the original question about pizza.

Short answer: the preferences don't specify which choice the system takes when offered mushroom or pepperoni. It depends on internal structure of the system, which the preferences abstract away. And that's fine - as the market example shows, there are real-world examples where that abstraction is still useful. Additionally, for real-world cases of interest, the underspecified choices will usually be between hidden states, so the underspecified behavior will itself be "hidden" - it will only be externally-visible via path-dependence of later preferences.

Comment by johnswentworth on Gathering thoughts on Distillation · 2019-07-31T20:50:37.881Z · score: 8 (4 votes) · LW · GW

Feels to me like this is a subcase of the more general problem that it's hard to find related content, especially more recent related content. It's easy to include a link in a new post pointing back at whatever's being distilled, but someone who stumbles on the old post/thread has no way to know that there's a summary, less technical explanation, more technical explanation, better central metaphor, etc. available in some later post.

For instance, a sidebar of "posts which link to this one" would make distillation-type posts more visible and useful, while also solving other problems.

It seems to me like working on visibility of related content in general would be higher-impact than working on specific use-cases, at least right now. If there were already a related-visibility system in place, then it would make more sense to add special use-cases like distillation on top of it.
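A "posts which link to this one" sidebar mostly amounts to inverting the link graph. A minimal sketch, assuming we already have each post's outgoing links as a list of post IDs (the IDs and the `build_backlinks` name are hypothetical):

```python
from collections import defaultdict
from typing import Dict, List


def build_backlinks(outgoing: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each post ID to the IDs of posts that link to it."""
    backlinks: Dict[str, List[str]] = defaultdict(list)
    for source, targets in outgoing.items():
        for target in dict.fromkeys(targets):  # drop duplicate links, keep order
            backlinks[target].append(source)
    return dict(backlinks)


# Hypothetical posts: a distillation and a summary both link back to the original.
outgoing = {
    "original-thread": [],
    "distillation": ["original-thread"],
    "summary": ["original-thread", "original-thread", "distillation"],
}
print(build_backlinks(outgoing)["original-thread"])
```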

Comment by johnswentworth on Results of LW Technical Background Survey · 2019-07-26T22:27:17.564Z · score: 2 (1 votes) · LW · GW

It's taking the median across two different axes independently, then sticking the results together. In principle, if we measure x and y values in a population, there may not actually be anybody in the population with median x value and median y value. Point is, the concept of "median" doesn't neatly generalize to multiple dimensions.
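A hypothetical three-point population shows the problem: the coordinate-wise median is a point nobody in the population actually occupies.

```python
from statistics import median

# Hypothetical population of (x, y) measurements.
points = [(0, 2), (1, 0), (2, 1)]
med = (median(x for x, _ in points), median(y for _, y in points))
print(med, med in points)
```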

So I sneakily swept all that under the rug and fudged it by saying "average".

Comment by johnswentworth on From Laplace to BIC · 2019-07-25T01:45:39.803Z · score: 3 (2 votes) · LW · GW

That is exactly correct.

Comment by johnswentworth on Cross-Validation vs Bayesian Model Comparison · 2019-07-22T05:44:35.561Z · score: 2 (1 votes) · LW · GW

They do converge to the same distribution. But they make different predictions about a physical die: the unbiased model predicts uniform outcomes because the die is physically symmetric, whereas the general biased model doesn't say anything about the geometry of a physical die. So if I see uniform outcomes from a physical die, then that's Bayesian evidence that the die is physically symmetric.

See Wolf's Dice II for more examples along these lines.

Comment by johnswentworth on Laplace Approximation · 2019-07-21T21:26:32.346Z · score: 4 (2 votes) · LW · GW

Correct. In general, $\frac{dP[\theta]}{d\theta}$ is the probability density of $\theta$, so if it's uniform on a unit volume then $dP[\theta] = d\theta$.

The main advantage of this notation is that it's parameterization-independent. For example: in a coin-flipping example, we could have a uniform prior over the frequency of heads $f$, so $dP[f] = df$. But then, we could re-write that frequency in terms of the odds $o = \frac{f}{1-f}$, so we'd get $f = \frac{o}{1+o}$ and

$$dP[o] = dP[f] = df = d\left(\frac{o}{1+o}\right) = \frac{1}{(1+o)^2}\,do.$$

So the probability density $\frac{dP[f]}{df} = 1$ is equivalent to the density $\frac{dP[o]}{do} = \frac{1}{(1+o)^2}$. (That first step, $dP[o] = dP[f]$, is because these two variables contain exactly the same information in two different forms - that's the parameterization independence. After that, it's math: substitute and differentiate.)

(Notice that the uniform prior on $f$ is not uniform over $o$. This is one of the main reasons why "use a uniform prior" is not a good general-purpose rule for choosing priors: it depends on what parameters we choose. Cartesian and polar coordinates give different "uniform" priors.)

The moral of the story is that, when dealing with continuous probability densities, the fundamental "thing" is not the density function $\frac{dP[\theta]}{d\theta}$ but the density times the differential $d\theta$, which we call $dP[\theta]$. This is important mainly when changing coordinates: if we have some coordinate change $\theta = g(\phi)$, then $dP[\theta] = dP[\phi]$, but in general $\frac{dP[\theta]}{d\theta} \neq \frac{dP[\phi]}{d\phi}$.

If anybody wants an exercise with this: try transforming $\theta$ to a different coordinate system. Apply Laplace's approximation in both systems, and confirm that they yield the same answer. (This should mainly involve applying the chain rule twice to the Hessian; if you get stuck, remember that $\theta^*$ is a maximum point and consider what that implies.)
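For anyone who wants to check the exercise numerically before doing it by hand, here is a minimal one-dimensional sketch using the coin example, with heads frequency $f$, odds $o = f/(1-f)$, and hypothetical data (3 heads in 10 flips). It assumes the form of Laplace's approximation that expands only the log-likelihood around its maximum $\theta^*$ and multiplies by the prior density there, $P[\text{data}] \approx P[\text{data}|\theta^*]\,\frac{dP[\theta]}{d\theta}\big|_{\theta^*}\sqrt{2\pi/(-H)}$, with $H$ the second derivative of the log-likelihood at $\theta^*$. The second derivative is estimated by finite differences rather than worked out with the chain rule, and all names are hypothetical.

```python
import math

# Hypothetical data: K heads in N flips; uniform prior on the heads frequency f.
K, N = 3, 10


def log_lik_f(f: float) -> float:
    return K * math.log(f) + (N - K) * math.log(1 - f)


def log_lik_o(o: float) -> float:
    # The same likelihood, rewritten in terms of the odds via f = o / (1 + o).
    return log_lik_f(o / (1 + o))


def second_derivative(g, x: float, h: float = 1e-4) -> float:
    """Central finite-difference estimate of g''(x)."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / (h * h)


def laplace(log_lik, prior_density, x_max: float) -> float:
    """1-D Laplace approximation of the evidence: L(x*) * p(x*) * sqrt(2*pi / -H)."""
    hessian = second_derivative(log_lik, x_max)
    return math.exp(log_lik(x_max)) * prior_density(x_max) * math.sqrt(2 * math.pi / -hessian)


f_star = K / N  # likelihood maximum in f
o_star = K / (N - K)  # the same point expressed as odds

evidence_f = laplace(log_lik_f, lambda f: 1.0, f_star)
evidence_o = laplace(log_lik_o, lambda o: 1.0 / (1 + o) ** 2, o_star)

# The two agree up to finite-difference error: the Jacobian factor in the prior
# density cancels the Jacobian factor picked up by the Hessian at the maximum.
print(evidence_f, evidence_o)
```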

Comment by johnswentworth on Very Short Introduction to Bayesian Model Comparison · 2019-07-21T02:37:28.426Z · score: 2 (1 votes) · LW · GW

At least the way I think about it, the main role of Bayesian model testing is to compare gears-level models. A prior belief like "this phenomenon is going to be quite complex" doesn't have any gears in it, so it doesn't really make sense to think about in this context at all. I could sort-of replace "it's complex" with a "totally ignorant" uniform-prior model (the trivial case of a gears-level model with no gears), but I'm not sure that captures quite the same thing.

Anyway, I recommend reading the second post on Wolf's Dice. That should give a better intuition for why we're privileging the unbiased coin hypothesis here. The prior is not arbitrary - I chose it because I actually do believe that most coins are (approximately) unbiased. The prior is where the (hypothesized) gears are: in this case, the hypothesis that most coins are approximately unbiased is a gear.

Comment by johnswentworth on The Costs of Reliability · 2019-07-20T17:00:29.497Z · score: 30 (10 votes) · LW · GW

Complementary thought: the ability to accept tradeoffs against reliability increases with slack. This is one possible way to "monetize" slack, i.e. turn it into value. Conversely, if slack is held to absorb shocks from unreliability trade-offs, then asking "how valuable is this slack?" amounts to asking "how much excess value do we get from this trade-off?"

Continuing along those lines: (unreliability + slack) is a strategy with increasing returns to scale. Suppose 1 unreliability tradeoff requires keeping 1 unit of slack in reserve, in case of failure. Then N independent tradeoffs, which each require a similar form of slack in reserve, will require less than N units of reserve slack, because it's highly unlikely for everything to fail at once. (It should require around sqrt(N) units of slack, assuming independent failures each with similar slack requirements.)

That suggests that individual people should either:

• specialize in having lots of slack and using lots of unreliable opportunities (so they can accept N unreliability trade-offs with only sqrt(N) units of slack), or
• specialize in having little slack and making everything in their life highly reliable (because a relatively large unit of slack would need to be set aside for just one unreliability trade-off).

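A rough sketch of the increasing-returns claim, under assumptions not in the original: each tradeoff independently fails with probability p = 0.1, each failure consumes one unit of slack, and we want enough reserve to cover all failures 99% of the time. One tradeoff then needs a full unit, while the reserve per tradeoff falls toward p as N grows, and the part of the reserve above the expected N·p failures grows roughly like sqrt(N). The function name is hypothetical.

```python
import math


def reserve_needed(n: int, p: float, confidence: float = 0.99) -> int:
    """Smallest reserve r with P[#failures <= r] >= confidence, failures ~ Binomial(n, p)."""
    cdf = 0.0
    for k in range(n + 1):
        # Binomial pmf computed in log space so large n doesn't overflow or underflow.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1 - p))
        cdf += math.exp(log_pmf)
        if cdf >= confidence:
            return k
    return n


for n in (1, 10, 100, 1000, 10000):
    r = reserve_needed(n, p=0.1)
    # Columns: N, total reserve, reserve per tradeoff, margin above N*p in units of sqrt(N).
    print(n, r, r / n, (r - 0.1 * n) / math.sqrt(n))
```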
Comment by johnswentworth on Wolf's Dice · 2019-07-17T22:03:22.007Z · score: 4 (2 votes) · LW · GW

The third answer was meant to be used in conjunction with the second; that's what the scare quotes around "unbiased" were meant to convey, along with the phrase "frequencies very close to uniform". Sorry if that was insufficiently clear.

Also, if we're questioning (i.e. testing) the assumption, then we still need the assumption around as a hypothesis against which to test. That's exactly how it's used in the post.

Comment by johnswentworth on Wolf's Dice · 2019-07-17T15:38:24.912Z · score: 4 (2 votes) · LW · GW

First answer: this question is a perfect lead-in to the next post, in which we try to figure out which physical asymmetries the die had. Definitely read that.

Second answer: In physics, to talk about the force applied by a baseball bat on a baseball, we use a delta function. We don't actually think that the force is infinite and applied over an infinitesimal time span, but that's a great approximation for simple calculations. Same in probability: we do actually think that most dice & coins are very close to unbiased. Even if we think there's some small spread, the delta function distribution (i.e. a delta function right at the unbiased probability) is a great approximation for an "unbiased" real-world die or coin. That's what the unbiased model is.

Third answer: "Given the dataset, construct the (multidimensional) probability distribution of biases" translates to "calculate $P[p \mid \text{data}]$". That is absolutely a valid question to ask. Our models then enter into the prior for $p$ - each model implies a different prior distribution, so to get the overall prior for $p$, we'd combine them: $P[p] = P[p \mid \text{unbiased}]\,P[\text{unbiased}] + P[p \mid \text{biased}]\,P[\text{biased}]$. In English: we think the world has some "unbiased" dice, which have outcome frequencies very close to uniform, and some "biased" dice, which could have any frequencies at all. Thus our prior for $p$ looks like a delta function plus some flatter distribution - a mixture of "unbiased" and "biased" dice.
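A minimal sketch of that mixture for a six-sided die, assuming a 50/50 prior between the two models and a uniform Dirichlet(1, ..., 1) prior over face probabilities for the "biased" model (both are illustrative assumptions, and the function names and counts are hypothetical). The posterior for $p$ is again a mixture: weight `posterior_unbiased(counts)` on the delta function at uniform, and the rest on a Dirichlet with parameters `1 + counts`.

```python
import math
from typing import List


def log_evidence_unbiased(counts: List[int]) -> float:
    """log P[data | unbiased]: every face has probability exactly 1/K (delta-function prior)."""
    n, k = sum(counts), len(counts)
    return -n * math.log(k)


def log_evidence_biased(counts: List[int]) -> float:
    """log P[data | biased] under a uniform Dirichlet(1, ..., 1) prior on face probabilities."""
    n, k = sum(counts), len(counts)
    return math.lgamma(k) - math.lgamma(n + k) + sum(math.lgamma(c + 1) for c in counts)


def posterior_unbiased(counts: List[int], prior_unbiased: float = 0.5) -> float:
    """Posterior weight on the delta-function (unbiased) component of the mixture."""
    a = math.log(prior_unbiased) + log_evidence_unbiased(counts)
    b = math.log(1 - prior_unbiased) + log_evidence_biased(counts)
    m = max(a, b)  # subtract the max before exponentiating, for numerical stability
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))


print(posterior_unbiased([100] * 6))  # near-uniform counts favor the unbiased model
print(posterior_unbiased([200, 80, 80, 80, 80, 80]))  # skewed counts favor the biased model
```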

Comment by johnswentworth on Very Short Introduction to Bayesian Model Comparison · 2019-07-16T22:40:43.584Z · score: 2 (1 votes) · LW · GW

Correct. Thus "at least 10x" on the prior would mean we're at least indifferent, and possibly still in favor of the unbiased model.

Comment by johnswentworth on How much background technical knowledge do LW readers have? · 2019-07-11T22:45:10.820Z · score: 9 (5 votes) · LW · GW

I have no intention of making any such recommendation.

Comment by johnswentworth on How much background technical knowledge do LW readers have? · 2019-07-11T22:40:01.411Z · score: 4 (2 votes) · LW · GW

Mostly I didn't want to over-complicate the survey with too many buckets. I probably should have elaborated a bit more on "equivalent level of knowledge" - I'd say a minor is 70% of the way to a major in most technical fields, since there's decreasing marginal returns on courses in the same field.

Comment by johnswentworth on How much background technical knowledge do LW readers have? · 2019-07-11T22:34:26.758Z · score: 5 (3 votes) · LW · GW

I thought about that briefly, but I didn't come up with a good way around it without making the survey a lot longer (i.e. by asking about specific topics). I've also studied a lot on my own, and I agree it's hard to gauge one's skill in comparison to more traditional tracks.

Comment by johnswentworth on How much background technical knowledge do LW readers have? · 2019-07-11T22:31:56.837Z · score: 4 (4 votes) · LW · GW

I'd actually view CS majors as the odd one out, in this case. Most engineers need to cover most of that stuff, and certainly all mathematicians and physicists cover it. Chemists can get away with less and some biologists with a lot less, although at that point we're treading the border of "STEM" fields.

I went to a STEM-only school, and all of the listed courses were core requirements for all students (except PDEs). My understanding is that that's pretty standard for technical schools, or engineering schools within larger universities.

Comment by johnswentworth on How much background technical knowledge do LW readers have? · 2019-07-11T22:25:36.838Z · score: 10 (5 votes) · LW · GW

Think of a CS degree with a systems/development/engineering focus, as opposed to an algorithms/complexity/computability/theory focus.

It's mainly there for equivalent experience - i.e. someone who's coded a lot more extensively than just a course and playing around, but doesn't work as a programmer. Think of someone who could maybe get hired as an entry-level developer, but probably specializes in something else.

Comment by johnswentworth on How to handle large numbers of questions? · 2019-07-04T19:17:40.824Z · score: 9 (4 votes) · LW · GW

As a concrete example to think about: suppose there were a highly visible "related" sidebar on every content page. Related questions, related posts, what have you. (Not necessarily advocating this, it's just something concrete to think about.)

This wouldn't be mutually exclusive with the main home page, or any other centralized discovery page. It could become more important over time, as people click on things out of curiosity. People wouldn't need to learn a new navigation habit all at once; it could just happen organically. Indeed, it would probably be pretty minor at first, until the recommendation algorithm got sorted out.

Any method of making links in posts/questions more actionable would serve a similar role - e.g. the preview Wikipedia shows when you hover over a link.

Comment by johnswentworth on How to handle large numbers of questions? · 2019-07-04T18:36:40.492Z · score: 4 (2 votes) · LW · GW

Feels like the right answer looks less like a newsfeed as the main visibility mechanism, and more like TV Tropes. I don't know how to translate that into actual design, but I'd start by thinking about ways to encourage moving around a graph locally rather than accessing everything from the home page.

That would probably be a good thing to think about in general, too, since a more graphical model would better leverage LW's massive pool of evergreen content - newsfeeds aren't so good for that.

Comment by johnswentworth on The Right Way of Formulating a Problem? · 2019-07-02T17:45:36.830Z · score: 4 (2 votes) · LW · GW

Here's a programming example which I expect non-programmers will understand. Everyday programming involves a lot of taking data from one place in one format, and moving it to another place in another format. A company I worked for had to do even more of this than usual, and also wanted to track all those data flows and transformations. So I sat down and had a long think about how to make it easier to transform data from one format to another.

Turns out, this sort of problem can be expressed very neatly as high-school algebra with json-like data structures. For instance, you have some data like [{'name':'john',...},{'name':'joe',...},...] and you want to extract a list of all the names. As an algebra problem, that means finding a list of solutions to [{'name': X}] = data. (Of course there are simpler ways of doing it for this simple example, but for more complicated examples with tens or even hundreds of variables, the algebra picture scales up much better.)
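A minimal sketch of the "algebra" view, under assumed semantics: `Var` marks an unknown, a dict pattern matches any dict containing at least those keys (the `...` in the example), a one-element list pattern matches each element of a list independently, and a repeated variable must take the same value everywhere. `Var` and `solve` are hypothetical names, and this is not necessarily how the system described above worked.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Optional, Tuple

Bindings = Dict[str, Any]


@dataclass(frozen=True)
class Var:
    name: str


def solve(pattern: Any, data: Any, bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    """Yield every variable assignment that makes `pattern` equal `data`."""
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, Var):
        if pattern.name in bindings:
            if bindings[pattern.name] == data:  # repeated variable must agree
                yield bindings
        else:
            yield {**bindings, pattern.name: data}
    elif isinstance(pattern, dict):
        if isinstance(data, dict):
            # Keys not mentioned in the pattern are ignored.
            yield from _solve_dict(list(pattern.items()), data, bindings)
    elif isinstance(pattern, list):
        # Assumed semantics: a one-element list pattern matches each element separately.
        if isinstance(data, list) and len(pattern) == 1:
            for item in data:
                yield from solve(pattern[0], item, bindings)
    elif pattern == data:
        yield bindings


def _solve_dict(
    items: List[Tuple[str, Any]], data: Dict[str, Any], bindings: Bindings
) -> Iterator[Bindings]:
    if not items:
        yield bindings
        return
    (key, sub), rest = items[0], items[1:]
    if key in data:
        for b in solve(sub, data[key], bindings):
            yield from _solve_dict(rest, data, b)


data = [{"name": "john", "age": 30}, {"name": "joe", "age": 25}]
X = Var("X")
print([sol["X"] for sol in solve([{"name": X}], data)])
```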

Problem formulation is even more important in data analysis and/or machine learning problems. At one company I worked for, our product boiled down to recommendation. We had very fat tails of specific user tastes and corresponding items, so clustering-based approaches (i.e. find similar users, recommend things similar to things they like) gave pretty mediocre recommendations - too many users/items just weren't that similar to any major clusters, and we didn't have enough data to map out tiny clusters. Formulating the problem as pure content-based recommendation - i.e. recommending things for one user without using any information whatsoever about what "similar" users were interested in - turned out to work far better.
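A minimal sketch of pure content-based recommendation, assuming each item is described by a set of tags (the items, tags, and function names are hypothetical): the user's profile is built only from that user's own liked items, and candidates are ranked by cosine similarity to the profile, with no information about other users at all.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(liked: List[str], items: Dict[str, Set[str]], k: int = 3) -> List[str]:
    """Rank unseen items by similarity to this one user's own liked items."""
    profile: Counter = Counter()
    for item in liked:
        profile.update(items[item])
    candidates = [i for i in items if i not in liked]
    candidates.sort(key=lambda i: cosine(profile, Counter(items[i])), reverse=True)
    return candidates[:k]


items = {
    "a": {"jazz", "piano"},
    "b": {"jazz", "sax"},
    "c": {"metal", "guitar"},
    "d": {"piano", "classical"},
}
print(recommend(["a"], items, k=2))
```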

Anyway, that's enough from my life. Some historical examples:

... etc. Practically any topic in applied math began with somebody finding a neat new formulation.

Comment by johnswentworth on The Competence Myth · 2019-06-30T23:02:56.650Z · score: 2 (1 votes) · LW · GW

This seems related to that experiment where iterative testing-and-tweaking results in high performance, without the people involved actually understanding how the system works at all.