Posts

Characterizing Real-World Agents as a Research Meta-Strategy 2019-10-08T15:32:27.896Z · score: 24 (8 votes)
What funding sources exist for technical AI safety research? 2019-10-01T15:30:08.149Z · score: 21 (8 votes)
Gears vs Behavior 2019-09-19T06:50:42.379Z · score: 48 (14 votes)
Theory of Ideal Agents, or of Existing Agents? 2019-09-13T17:38:27.187Z · score: 16 (8 votes)
How to Throw Away Information 2019-09-05T21:10:06.609Z · score: 20 (7 votes)
Probability as Minimal Map 2019-09-01T19:19:56.696Z · score: 40 (12 votes)
The Missing Math of Map-Making 2019-08-28T21:18:25.298Z · score: 33 (16 votes)
Don't Pull a Broken Chain 2019-08-28T01:21:37.622Z · score: 27 (13 votes)
Cartographic Processes 2019-08-27T20:02:45.263Z · score: 23 (8 votes)
Embedded Agency via Abstraction 2019-08-26T23:03:49.989Z · score: 33 (12 votes)
Time Travel, AI and Transparent Newcomb 2019-08-22T22:04:55.908Z · score: 12 (7 votes)
Embedded Naive Bayes 2019-08-22T21:40:05.972Z · score: 13 (5 votes)
Computational Model: Causal Diagrams with Symmetry 2019-08-22T17:54:11.274Z · score: 40 (15 votes)
Markets are Universal for Logical Induction 2019-08-22T06:44:56.532Z · score: 64 (26 votes)
Why Subagents? 2019-08-01T22:17:26.415Z · score: 102 (34 votes)
Compilers/PLs book recommendation? 2019-07-28T15:49:17.570Z · score: 10 (4 votes)
Results of LW Technical Background Survey 2019-07-26T17:33:01.999Z · score: 43 (15 votes)
Cross-Validation vs Bayesian Model Comparison 2019-07-21T18:14:34.207Z · score: 21 (7 votes)
Bayesian Model Testing Comparisons 2019-07-20T16:40:50.879Z · score: 13 (3 votes)
From Laplace to BIC 2019-07-19T16:52:58.087Z · score: 13 (3 votes)
Laplace Approximation 2019-07-18T15:23:28.140Z · score: 27 (8 votes)
Wolf's Dice II: What Asymmetry? 2019-07-17T15:22:55.674Z · score: 30 (8 votes)
Wolf's Dice 2019-07-16T19:50:03.106Z · score: 33 (12 votes)
Very Short Introduction to Bayesian Model Comparison 2019-07-16T19:48:40.400Z · score: 23 (7 votes)
How much background technical knowledge do LW readers have? 2019-07-11T17:38:37.839Z · score: 31 (10 votes)
Embedded Agency: Not Just an AI Problem 2019-06-27T00:35:31.857Z · score: 12 (7 votes)
Being the (Pareto) Best in the World 2019-06-24T18:36:45.929Z · score: 171 (80 votes)
ISO: Automated P-Hacking Detection 2019-06-16T21:15:52.837Z · score: 6 (1 votes)
Real-World Coordination Problems are Usually Information Problems 2019-06-13T18:21:55.586Z · score: 29 (12 votes)
The Fundamental Theorem of Asset Pricing: Missing Link of the Dutch Book Arguments 2019-06-01T20:34:06.924Z · score: 43 (13 votes)
When Observation Beats Experiment 2019-05-31T22:58:57.986Z · score: 15 (6 votes)
Constraints & Slackness Reasoning Exercises 2019-05-21T22:53:11.048Z · score: 44 (15 votes)
The Simple Solow Model of Software Engineering 2019-04-08T23:06:41.327Z · score: 26 (10 votes)
Declarative Mathematics 2019-03-21T19:05:08.688Z · score: 60 (25 votes)
Constructing Goodhart 2019-02-03T21:59:53.785Z · score: 31 (12 votes)
From Personal to Prison Gangs: Enforcing Prosocial Behavior 2019-01-24T18:07:33.262Z · score: 81 (28 votes)
The E-Coli Test for AI Alignment 2018-12-16T08:10:50.502Z · score: 58 (22 votes)
Competitive Markets as Distributed Backprop 2018-11-10T16:47:37.622Z · score: 44 (16 votes)
Two Kinds of Technology Change 2018-10-11T04:54:50.121Z · score: 61 (22 votes)
The Valley of Bad Theory 2018-10-06T03:06:03.532Z · score: 63 (29 votes)
Don't Get Distracted by the Boilerplate 2018-07-26T02:15:46.951Z · score: 44 (22 votes)
ISO: Name of Problem 2018-07-24T17:15:06.676Z · score: 32 (13 votes)
Letting Go III: Unilateral or GTFO 2018-07-10T06:26:34.411Z · score: 22 (7 votes)
Letting Go II: Understanding is Key 2018-07-03T04:08:44.638Z · score: 12 (3 votes)
The Power of Letting Go Part I: Examples 2018-06-29T01:19:03.474Z · score: 38 (15 votes)
Problem Solving with Mazes and Crayon 2018-06-19T06:15:13.081Z · score: 128 (60 votes)
Fun With DAGs 2018-05-13T19:35:49.014Z · score: 38 (15 votes)
The Epsilon Fallacy 2018-03-17T00:08:01.203Z · score: 81 (24 votes)
The Cause of Time 2013-10-05T02:56:46.150Z · score: 0 (19 votes)
Recent MIRI workshop results? 2013-07-16T01:25:02.704Z · score: 2 (7 votes)

Comments

Comment by johnswentworth on Prediction Markets Don't Reveal The Territory · 2019-10-13T16:21:10.985Z · score: 7 (3 votes) · LW · GW

This is basically the same problem as Gears vs Behavior, specialized to the context of prediction markets. To a large extent, we can use prediction markets to pull out insights into system gears using tricks similar to those discussed in that piece. In particular, causal models are easily adapted to prediction markets: just use conditional bets, which only activate when certain conditions are satisfied. Robin Hanson talks about these fairly often; they're central to a lot of his ideas about prediction-market-driven decision-making systems (see e.g. here).
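
A minimal sketch of how a conditional ("called-off") bet pulls out a conditional probability - the numbers are made up, and the payoff rule here (refund the stake if the condition fails) is just one simple way such contracts are structured:

```python
# Toy conditional bet: pays $1 if outcome X occurs, but only activates if
# condition C holds; if C fails, the stake is refunded. Under this rule,
# the price at which you're indifferent is your P(X | C), so market prices
# estimate the conditional probability.

def expected_profit(p_joint_xc, p_c, price):
    """Expected profit per $1 contract, given your own beliefs."""
    p_x_given_c = p_joint_xc / p_c  # your P(X | C)
    # If C holds (prob p_c): win (1 - price) when X occurs, lose price otherwise.
    # If C fails (prob 1 - p_c): stake refunded, profit 0.
    return p_c * (p_x_given_c * (1 - price) - (1 - p_x_given_c) * price)

# Made-up beliefs: P(C) = 0.4, P(X and C) = 0.3, so P(X | C) = 0.75.
# A market price of 0.6 then looks like a profitable buy:
print(expected_profit(p_joint_xc=0.3, p_c=0.4, price=0.6))  # 0.06 > 0
```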

Comment by johnswentworth on Gears vs Behavior · 2019-10-11T03:04:05.648Z · score: 2 (1 votes) · LW · GW

Nice links. I actually stopped following deep learning for a few years, and very recently started paying attention again as the new generation of probabilistic programming languages came along (I'm particularly impressed with pyro). Those tools are a major step forward for learning causal structure.

I'd also recommend this recent paper by Friston (the predictive processing guy). I might write up a review of it soonish; it's a really nice piece of math/algorithm for learning causal structure, again using the same ML tools.

Comment by johnswentworth on Characterizing Real-World Agents as a Research Meta-Strategy · 2019-10-09T21:56:39.006Z · score: 2 (1 votes) · LW · GW

I think it will turn out that, with the right notion of abstraction, the underdetermination is much less severe than it looks at first. In particular, I don't think abstraction is entirely described by a Pareto curve of information thrown out vs predictive power. There are structural criteria, and those dramatically cut down the possibility space.

Consider the Navier-Stokes equations for fluid flow as an abstraction of (classical) molecular dynamics. There are other abstractions which keep around slightly more or slightly less information, and make slightly better or slightly worse predictions. But Navier-Stokes is special among these abstractions: it has what we might call a "closure" property. The quantities which Navier-Stokes predicts in one fluid cell (average density & momentum) can be fully predicted from the corresponding quantities in neighboring cells plus generic properties of the fluid (under certain assumptions/approximations). By contrast, imagine if we tried to also compute the skew or heteroskedasticity or other statistics of particle speeds in each cell. These would have bizarre interactions with higher moments, and might not be (approximately) deterministically predictable at all without introducing even more information in each cell. Going the other direction, imagine we throw out info about density & momentum in some of the cells. Then that throws off everything else, and suddenly our whole fluid model needs to track multiple possible flows.
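
To make "the quantities in one fluid cell" concrete, here's a rough sketch (purely illustrative - 1D, random made-up particles) of the per-cell summary that a Navier-Stokes-style abstraction keeps around, i.e. mass and momentum per cell, while deliberately discarding higher moments of the particle velocities:

```python
import numpy as np

def cell_summaries(positions, velocities, masses, n_cells, box_length):
    """Per-cell statistics the abstraction keeps: total mass (density) and
    total momentum in each cell. Skew and other higher moments of the
    particle velocities are thrown away."""
    cells = np.clip((positions / box_length * n_cells).astype(int), 0, n_cells - 1)
    density = np.zeros(n_cells)
    momentum = np.zeros(n_cells)
    np.add.at(density, cells, masses)
    np.add.at(momentum, cells, masses * velocities)
    return density, momentum

# Illustrative random particles in a 1D box:
rng = np.random.default_rng(0)
pos, vel, m = rng.uniform(0, 1.0, 1000), rng.normal(0, 1.0, 1000), np.ones(1000)
rho, p = cell_summaries(pos, vel, m, n_cells=10, box_length=1.0)
```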

So there are "natural" levels of abstraction where we keep around exactly the quantities relevant to prediction of the other quantities. Part of what I'm working on is characterizing these abstractions: for any given ground-level system, how can we determine which such abstractions exist? Also, is this the right formulation of a "natural" abstraction, or is there a more/less general criteria which better captures our intuitions?

All this leads into modelling humans. I expect that there is such a natural level of abstraction which corresponds to our usual notion of "human", and specifically humans as agents. I also expect that this natural abstraction is an agenty model, with "wants" built into it. I do not think that there are a large number of "nearby" natural abstractions.


Comment by johnswentworth on Why Subagents? · 2019-10-09T21:26:53.885Z · score: 2 (1 votes) · LW · GW

Wouldn't the Hahn embedding theorem result in a ranking of the subagents themselves, rather than requiring unanimous agreement? Whichever subagent corresponds to the "largest infinities" (in the sense of ordinals) makes its choice, the choice of the next agent only matters if that first subagent is indifferent, and so on down the line.

Anyway, I find the general idea here interesting. Assuming a group structure seems unrealistic as a starting point, but there's a bunch of theorems of the form "any abelian operation with properties X, Y, Z is equivalent to real/vector addition", so it might not be an issue.

Comment by johnswentworth on Reflections on Premium Poker Tools: Part 2 - Deciding to call it quits · 2019-10-09T05:16:20.470Z · score: 11 (5 votes) · LW · GW

One thing you might look into is selling what you have - there are a few marketplaces for tech projects that haven't really taken off (e.g. I think Flippa is the biggest?). If your product really is worth a decent chunk of money, this would at least give you a market price on the business as a whole.

Comment by johnswentworth on AI Alignment Open Thread October 2019 · 2019-10-09T02:47:26.131Z · score: 4 (2 votes) · LW · GW

Oh no, not you too. It was bad enough with just Bena.

Comment by johnswentworth on Characterizing Real-World Agents as a Research Meta-Strategy · 2019-10-08T19:39:23.926Z · score: 6 (3 votes) · LW · GW

I was actually going to leave a comment on this topic on your last post (which btw I liked, I wish more people discussed the issues in it), but it didn't seem quite close enough to the topic of that post. So here it is.

Specifically, the idea that there is no "the" wants and ontology of e. coli

This, I think, is the key. My (as-yet-incomplete) main answer is in "Embedded Naive Bayes": there is a completely unambiguous sense in which some systems implement certain probabilistic world-models and other systems do not. Furthermore, the notion is stable under approximation: systems which approximately satisfy the relevant functional equations use these approximate world-models. The upshot is that it is possible (at least sometimes) to objectively, unambiguously say that a system models the world using a particular ontology.

But it will require abstraction

Yup. Thus "Embedded Agency via Abstraction" - this has been my plurality research focus for the past month or so. Thinking about abstract models of actual physical systems, I think it's pretty clear that there are "natural" abstractions independent of any observer, and I'm well on the way to formalizing this usefully.

Of course any sort of abstraction involves throwing away some predictive power, and that's fine - indeed that's basically the point of abstraction. We throw away information and only keep what's needed to predict something of interest. Navier-Stokes is one example I think about: we throw away the details of microscopic motion, and just keep around averaged statistics in each little chunk of space. Navier-Stokes is a "natural" level of abstraction: it's minimally self-contained, with all the info needed to make predictions about the bulk statistics in each little chunk of space, but no additional info beyond that.

Anyway, I'll probably be writing much more about this in the next month or so.

So if you're aiming for eventually tinkering with hand-coded agential models of humans, one necessary ingredient is going to be tolerance for abstraction and suboptimal predictive power.

Hand-coded models of humans are definitely not something I aim for, but I do think that abstraction is a necessary element of useful models of humans regardless of whether they're hand-coded. An agenty model of humans is necessary in order to talk about humans wanting things, which is the whole point of alignment - and "humans" "wanting" things only makes sense at a certain level of abstraction.

Comment by johnswentworth on AI Alignment Open Thread October 2019 · 2019-10-08T18:26:08.050Z · score: 6 (3 votes) · LW · GW

I think this would be an extremely useful exercise for multiple independent reasons:

  • it's directly attempting to teach skills which I do not currently know any reproducible way to teach/learn
  • it involves looking at how breakthroughs happened historically, which is an independently useful meta-strategy
  • it directly involves investigating the intuitions behind foundational ideas relevant to the theory of agency, and could easily expose alternative views/interpretations which are more useful (in some contexts) than the usual presentations

Comment by johnswentworth on Computational Model: Causal Diagrams with Symmetry · 2019-10-08T14:59:47.166Z · score: 3 (2 votes) · LW · GW

The "n==0?" node is intended to be a ternary operator; its output is n*f(n-1) in the case where n is not 0 (and when n is 0, its output is hardcoded to 1).

Comment by johnswentworth on What are your strategies for avoiding micro-mistakes? · 2019-10-04T19:57:48.894Z · score: 43 (13 votes) · LW · GW

A quote from Wheeler:

Never make a calculation until you know the answer. Make an estimate before every calculation, try a simple physical argument (symmetry! invariance! conservation!) before every derivation, guess the answer to every paradox and puzzle.

When you get into more difficult math problems, outside the context of a classroom, it's very easy to push symbols around ad nauseam without making any forward progress. The counter to this is to figure out the intuitive answer before starting to push symbols around.

When you follow this strategy, the process of writing a proof or solving a problem mostly consists of repeatedly asking "what does my intuition say here, and how do I translate that into the language of math?" This also gives built-in error checks along the way - if you look at the math, and it doesn't match what your intuition says, then something has gone wrong. Either there's a mistake in the math, a mistake in your intuition, or (most common) a piece was missed in the translation.

Comment by johnswentworth on What are we assuming about utility functions? · 2019-10-04T15:49:46.400Z · score: 4 (2 votes) · LW · GW

Let me repeat back your argument as I understand it.

If we have a Bayesian utility maximizing agent, that's just a probabilistic inference layer with a VNM utility maximizer sitting on top of it. So our would-be arbitrageur comes along with a source of "objective" randomness, like a quantum random number generator. The arbitrageur wants to interact with the VNM layer, so it needs to design bets to which the inference layer assigns some specific probability. It does that by using the "objective" randomness source in the bet design: just incorporate that randomness in such a way that the inference layer assigns the probabilities the arbitrageur wants.

This seems correct insofar as it applies. It is a useful perspective, and not one I had thought much about before this, so thanks for bringing it in.

The main issue I still don't see resolved by this argument is the architecture question. The coherence theorems only say that an agent must act as if they perform Bayesian inference and then choose the option with highest expected value based on those probabilities. In the agent's actual internal architecture, there need not be separate modules for inference and decision-making (a Kalman filter is one example). If we can't neatly separate the two pieces somehow, then we don't have a good way to construct lotteries with specified probabilities, so we don't have a way to treat the agent as a VNM-type agent.

This directly follows from the original main issue: VNM utility theory is built on the idea that probabilities live in the environment, not in the agent. If there's a neat separation between the agent's inference and decision modules, then we can redefine the inference module to be part of the environment, but that neat separation need not always exist.

EDIT: Also, I should point out explicitly that VNM alone doesn't tell us why we ever expect probabilities to be relevant to anything in the first place. If we already have a Bayesian expected utility maximizer with separate inference and decision modules, then we can model that as an inference layer with VNM on top, but then we don't have a theorem telling us why inference layers should magically appear in the world.

Why do we expect (approximate) expected utility maximizers to show up in the real world? That's the main question coherence theorems answer, and VNM cannot answer that question unless all of the probabilities involved are ontologically fundamental.

Comment by johnswentworth on What are we assuming about utility functions? · 2019-10-03T14:59:50.407Z · score: 6 (4 votes) · LW · GW

I would argue that independence of irrelevant alternatives is not a real coherence criterion. It looks like one at first glance: if it's violated, then you get an Allais Paradox-type situation where someone pays to throw a switch and then pays to throw it back. The problem is, the "arbitrage" of throwing the switch back and forth hinges on the assumption that the stated probabilities are objectively correct. It's entirely possible for someone to come along who believes that throwing the switch changes the probabilities in a way that makes it a good deal. Then there's no real arbitrage, it just comes down to whose probabilities better match the outcomes.

My intuition for this not being real arbitrage comes from finance. In finance, we'd call it "statistical arbitrage": it only works if the probabilities are correct. The major lesson of the collapse of Long Term Capital Management in the 90's is that statistical arbitrage is definitely not real arbitrage. The whole point of true arbitrage is that it does not depend on your statistical model being correct.

This directly leads to the difference between VNM and Bayesian expected utility maximization. In VNM, agents have preferences over lotteries: the probabilities of each outcome are inputs to the preference function. In Bayesian expected utility maximization, the only inputs to the preference function are the choices available to the agent - figuring out the probabilities of each outcome under each choice is the agent's job.

(I do agree that we can set up situations where objectively correct probabilities are a reasonable model, e.g. in a casino, but the point of coherence theorems is to be pretty generally applicable. A theorem only relevant to casinos isn't all that interesting.)

Comment by johnswentworth on What are we assuming about utility functions? · 2019-10-02T21:43:49.005Z · score: 2 (1 votes) · LW · GW

Can you give an example?

Comment by johnswentworth on What are we assuming about utility functions? · 2019-10-02T18:09:55.055Z · score: 18 (8 votes) · LW · GW
In particular, the coherence arguments and other pressures that move agents toward VNM seem to roughly scale with capabilities.

One nit I keep picking whenever it comes up: VNM is not really a coherence theorem. The VNM utility theorem operates from four axioms, and only two of those four are relevant to coherence. The main problem is that the axioms relevant to coherence (completeness and transitivity/acyclicity) do not say anything at all about probability and the role that it plays - the "expected" part of "expected utility" does not arise from a coherence/exploitability/Pareto optimality condition in the VNM formulation of utility.

The actual coherence theorems which underpin Bayesian expected utility maximization are things like Dutch book theorems, Wald's complete class theorem, the fundamental theorem of asset pricing, and probably others.

Why does this nitpick matter? Three reasons:

  • In my experience, most people who object to the use of utilities have only encountered VNM, and correctly point out problems with VNM which do not apply to the real coherence theorems.
  • VNM utility stipulates from the start that agents have preferences over "lotteries" with known, objective probabilities of each outcome. The Bayesian coherence theorems do not assume probabilities from the start; they derive probabilities from the coherence criteria, and those probabilities are specific to the agent.
  • Because VNM is not really a coherence theorem, I do not expect agent-like systems in the wild to be pushed toward VNM expected utility maximization. I expect them to be pushed toward Bayesian expected utility maximization.
Comment by johnswentworth on What funding sources exist for technical AI safety research? · 2019-10-01T17:15:59.812Z · score: 2 (1 votes) · LW · GW

That is not the main focus of the question, but you're welcome to leave an answer with suggestions in that space. It is "funding", in some sense.

Comment by johnswentworth on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-26T20:55:59.097Z · score: 7 (7 votes) · LW · GW

I dunno, one life seems like a pretty expensive trade for the homepage staying up for a day. I bet a potential buyer could shop around and obtain launch codes for half a life.

Not saying I'd personally give up my launch code at the very reasonable cost of $836. But someone could probably be found. Especially if the buyer somehow found a way to frame someone else for the launch.

(Of course, now that this comment is sitting around in plain view of everyone, the launch codes would have to come from someone other than me, even accounting for the framing.)

Comment by johnswentworth on Free Money at PredictIt? · 2019-09-26T16:41:04.484Z · score: 2 (1 votes) · LW · GW

I'd been checking the numbers on No-side arbitrage for PredictIt's Democratic nominee and president markets every couple of weeks, but I didn't realize that PredictIt frees up your capital. How do the details on that work? Is it documented somewhere?

Comment by johnswentworth on Rationality and Levels of Intervention · 2019-09-26T16:09:44.609Z · score: 3 (2 votes) · LW · GW

The dynamics in a small group are qualitatively different from whole communities. To a large extent, that's exactly why community control is hard/interesting. Again, Personal to Prison Gangs is a good example.

Comment by johnswentworth on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-26T16:06:18.104Z · score: 6 (3 votes) · LW · GW

That is a very shiny button.

Comment by johnswentworth on Rationality and Levels of Intervention · 2019-09-26T05:45:23.326Z · score: 5 (3 votes) · LW · GW
Do you think that human-control is conserved in some sense, i.e. some humans are controlling community practices, even if you're not?

I think of people today trying to control community practices as a lot like premodern physicians. On rare occasions they accidentally stumble on something that works sometimes, and maybe even get it to work more than once. But the understanding to consistently predict which interventions will have which effects just isn't there, and the vast majority of interventions either have zero effect or negative effects. It's all humors and leeches.

Someday, we will be better at this. Personal to Prison Gangs is the best example I have - it's a step closer to the understanding required to go from "I want to implement change X in this community" to "I personally can reliably make that change happen by doing Y". But we are not yet anywhere near that point.

Meanwhile, in the absence of actually understanding the effects of our actions on community dynamics, the best we can do is try stuff and see what happens. Given that most changes are zero or negative (due to generalized efficient markets), this only works when we have a platform for rapidly testing many changes and quantifying their impacts on the community - video game communities are a good example. In that case, there are clearly people who can modify the community by modifying the interface - assuming they bother to do so. (This does not currently apply to e.g. the LessWrong team, since last I heard they didn't have the tracking and A/B testing framework necessary to find out which changes have which effects. To the extent that they're trying to control community dynamics, they're still on humors and leeches.)

Comment by johnswentworth on Rationality and Levels of Intervention · 2019-09-26T01:56:13.064Z · score: 3 (2 votes) · LW · GW
I was not making a claim about how much babble is necessary – just noting if it were necessary we'd want a good way to handle that fact.

Ah yeah, makes sense on a second read.

The thing I'm thinking of is... not necessarily more intentional, but a specific type of brainstorming.

Now I'm curious, but not yet sure what you mean. Could you give an example or two?

Comment by johnswentworth on Rationality and Levels of Intervention · 2019-09-25T23:25:06.893Z · score: 6 (5 votes) · LW · GW

I actually disagree that lots of babble is necessary. One of the original motivations for Mazes and Crayon was to show, in an algorithmic context, what some less babble-based strategies might look like.

My own intuition on the matter comes largely from hard math problems. Outside of intro classes, if you sit down to write a proof without a pre-existing intuitive understanding of why it works, you'll math-babble without getting any closer to a proof. I've spent weeks at a time babbling math, many times, with nothing to show for it. It reliably does not work on hard problems.

Something like babbling is still necessary to build intuitions, of course, but even there it's less like random branching and more like A* search.

Comment by johnswentworth on Rationality and Levels of Intervention · 2019-09-25T20:41:07.112Z · score: 3 (2 votes) · LW · GW
It is obvious that one can make similar levels and ask a similar question about rationality and the pursuit of the truth. What should we be trying to optimize in order to optimize the intellectual performance of a community?

This presupposes that optimizing the intellectual performance of a community is the goal in the first place. Individuals have a great deal more control over their own thoughts/behavior than over community norms; there is little point attempting to optimize something over which one exercises minimal control.

Comment by johnswentworth on Taxing investment income is complicated · 2019-09-23T19:59:05.262Z · score: 5 (3 votes) · LW · GW

Cochrane mainly talks about this in the context of the equity premium. His main answer is "we don't know why there's an equity premium, we've tried the obvious risk-aversion models and they don't make sense."

The key issue is not just "most investors are much more risk averse than log utility", but how much more risk averse exactly. Cochrane tries to back out the curvature of the utility function (measured as the relative risk aversion $\gamma = -c \, u''(c)/u'(c)$, where $c$ is consumption) based on observed market parameters, and he shows that $\gamma$ needs to be around 50. For a sense of scale, log utility would imply $\gamma = 1$, and $\gamma$ in the range of 1 to 5 is typical in theoretical models - that's the sort of risk aversion you'd expect to see e.g. in a casino. $\gamma \approx 50$ would imply some bizarre things - for example, assuming real consumption growth of around 1% annually with 1% std dev, the risk-free rate should be around 40%. (Cochrane has a bunch more discussion of weird things implied by very high risk aversion, and looks at some variations of the basic model as well. I don't know it well enough to expound on the details.)
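
As a rough check on that last number, here's a minimal sketch using the standard log-normal consumption-based expression for the risk-free rate, r_f ≈ δ + γg − γ²σ²/2; the δ value is an assumed conventional time-preference rate, not a number from Cochrane:

```python
# Back-of-envelope: implied risk-free rate under the standard log-normal
# consumption-based model, r_f ≈ delta + gamma * g - 0.5 * gamma**2 * sigma**2.
delta = 0.01   # subjective discount rate (assumed ~1%)
gamma = 50     # relative risk aversion backed out from market data
g     = 0.01   # mean real consumption growth
sigma = 0.01   # std dev of real consumption growth

r_f = delta + gamma * g - 0.5 * gamma**2 * sigma**2
print(r_f)  # ≈ 0.385, i.e. roughly the 40% figure above
```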

Personally, I suspect that the "true" answer to the problem is some combination of:

  • Despite using the words "log utility", most of these are actually second-order expansion models which don't account for the tail behavior or details of "bankruptcy" (i.e. margin calls).
  • Most of these models ignore the Volcker rule and functionally-similar reserve requirements on banks - factors which we would expect to dramatically lower the rates on bonds and other low-reserve-requirement assets relative to stocks.

... but I haven't gotten around to building and solving models for these yet; my interest is more on the market microstructure end of things.



Comment by johnswentworth on Taxing investment income is complicated · 2019-09-22T18:26:29.693Z · score: 3 (2 votes) · LW · GW
taxing excess returns seems like it’s almost a free lunch: it reduces an investor’s losses as well as their gains, so they can just lever up their investments to offset the effect of taxes.

Another factor which pays for the lunch is the increase in demand and decrease in supply of risk-free capital. Demand increases in order to fund the excess margin needed for all that leverage. On the supply side, people should keep a somewhat smaller chunk of their funds in risk-free assets, as they leverage up the risky side of their portfolios. The overall effect should be an increase in risk-free capital costs, i.e. the real risk-free interest rate.

I'd have to do the math, but my guess is that the change in real risk-free rate would (to first order) match the gains from the tax, and pay for the lunch. That said, I love this idea of (properly structured) capital gains taxes as a substitute for a sovereign wealth fund.

Also, my current understanding is that risk compensation is definitely not the large majority of investment returns. The last chapter of Cochrane's Asset Pricing text has a great discussion of the topic. The main conclusion is that explaining returns via risk exposure requires unrealistically high levels of risk aversion - like, one or two orders of magnitude above the risk aversion levels implied by other activities.

Comment by johnswentworth on Meetups: Climbing uphill, flowing downhill, and the Uncanny Summit · 2019-09-22T00:33:53.632Z · score: 6 (3 votes) · LW · GW

I wonder how many people are primarily interested in doing effortful things at meetup-ish events, rather than hanging out. I, for one, would actually go to meetups if they weren't just social clubs, but I have near-zero interest in a social club.

I'm sure people have divergent interests in terms of what effortful things they'd like to do, but on the other hand I'd personally be interested in a pretty wide range of possibilities if there were other interested people and it actually involved doing something substantive. Anyone else?

Comment by johnswentworth on Feature Wish List for LessWrong · 2019-09-19T20:18:25.296Z · score: 2 (1 votes) · LW · GW

2-3 related posts would be plenty; the top 2-3 in a list are all that people usually click on anyway.

Why the aversion to sidebars? I don't disagree, bottom or ToC or elsewhere is fine, just curious.

Comment by johnswentworth on Feature Wish List for LessWrong · 2019-09-19T18:19:33.103Z · score: 4 (2 votes) · LW · GW

BTW I love the link preview functionality, I think that one is huge for content discovery.

Comment by johnswentworth on Feature Wish List for LessWrong · 2019-09-19T18:14:50.836Z · score: 22 (6 votes) · LW · GW

I think the highest payoff-per-unit-effort changes to LW as it currently exists are in content discovery.

We have this massive pile of mostly-evergreen content, yet we have very few ways for a user to find specific things in that pile which interest them. We have the sequences, and we have some recommendations at the top of the homepage. But neither of these shows things which are more likely than average to be of interest to this specific user. With such a huge volume of content, showing users the things most relevant to their interests is crucial.

So, two main criteria for changes to be high-value:

  • Content discovery needs to be on every post, not just the homepage
  • Content discovery needs to be relevant, i.e. not just showing everyone the same things all the time

Now, I'm not saying we need a fancy engine for user-level recommendations; "related posts" would be much easier and probably more effective than an off-the-shelf recommendation engine. The ideal starting point would be a sidebar on every post containing (some subset of):

  • Posts/comments which link to this post
  • Other posts by this author
  • Author-suggested related posts
  • "Users who upvoted/commented on this post also upvoted/commented on..."
  • Commenter-suggested related posts (would potentially help for forward-linking to newer versions of old ideas)
  • "Similar" posts, as judged by some automated natural language model and/or hyperlink-based model
  • Posts with similar tags (would require tag functionality)

... etc. Just getting a sidebar with one or two of these would give a good platform for extending functionality later on, and would dramatically improve visibility into our mountain of content.

Comment by johnswentworth on Request for stories of when quantitative reasoning was practically useful for you. · 2019-09-18T18:37:07.471Z · score: 17 (3 votes) · LW · GW

Back of the envelope statistical significance calculations: "Yes, sales per lead have gone up since the change went out, but we've only had 150 sales come through since then so we'd expect a difference of about 12 just based on random noise, and indeed the difference is about 10 sales."
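
(A minimal sketch of that estimate, treating the sale count as roughly Poisson so that the noise scales like the square root of the count:)

```python
import math

sales = 150
noise = math.sqrt(sales)            # ≈ 12: typical fluctuation from randomness alone
observed_difference = 10
print(observed_difference / noise)  # ≈ 0.8 standard deviations: not significant
```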

Noticing unknown unknowns: "We'd expect hourly signups to vary by about 50 people based on independent random decisions, but we're actually seeing variation of about 250 people, without any clear time-of-day pattern. There must be some underlying factor which makes a bunch who come in around the same time all more/less likely to sign up." (In this case, the underlying factor was that our server wasn't able to handle the load, so when it got behind lots of people had lots of lag at the same time.)

Noticing corner cases: "If two people are sharing one account, what do we do when they both edit at the same time? What if someone updates X, but we've already done a bunch of logic using the original X value, and now we need them to go back and input some other information?"

Critical path reasoning: "We usually end up waiting around on the appraisal - that's what takes longest. And we're not ordering the appraisal until we have form X. But we don't actually need to wait that long - we can parallelize, and order the appraisal while we're still waiting on form X. We could also parallelize fetching the insurance forms, but that won't matter much - they're usually pretty fast anyway."

Sparsity: "This loop is running over every possible pair of words. That's something like 100M word pairs. But the data only contains 100k sentences with ~10 word pairs in each of them, so at most only about 1M word pairs actually occur. If we loop over sentences and only look at pairs which actually occur, that should be at least a 100X speedup."

Little-o reasoning: "These changes just aren't that big, so the relationship should be roughly linear."

Queueing theory: "If we're trying to keep all of our people busy all of the time, then our wait times are going to grow longer and longer. If we want short wait times, then we need to have idle capacity."
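
The standard single-server (M/M/1) result makes this concrete - a sketch with made-up arrival and service rates:

```python
def mm1_time_in_system(arrival_rate, service_rate):
    """Average time in an M/M/1 queue; blows up as utilization approaches 1."""
    assert arrival_rate < service_rate, "unstable at utilization >= 1"
    return 1.0 / (service_rate - arrival_rate)

for utilization in (0.5, 0.9, 0.99):
    print(utilization, mm1_time_in_system(arrival_rate=utilization, service_rate=1.0))
# 2, 10, 100: pushing utilization toward 100% makes waits grow without bound.
```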

I'm excluding economic-style reasoning, because so much has already been written about applications of economics in everyday life. But if you want me to add some econ examples, let me know.

Comment by johnswentworth on [Site Feature] Link Previews · 2019-09-18T18:19:51.008Z · score: 3 (2 votes) · LW · GW

Awesome job, I love this.

Comment by johnswentworth on Why Subagents? · 2019-09-18T00:26:51.321Z · score: 5 (3 votes) · LW · GW

I definitely agree that most of the work is being done by the structure in which the subagents interact (i.e. committee requiring unanimous agreement) rather than the subagents themselves. That said, I wouldn't get too hung up on "committee requiring unanimous agreement" specifically - there are structures which behave like unanimous committees but don't look like a unanimous committee on the surface, e.g. markets. In a market, everyone has a veto, but each agent only cares about their own basket of goods - they don't care if somebody else's basket changes.

In the context of humans, one way to interpret this post is that it predicts that subagents in a human usually have veto power over decisions directly touching on the thing they care about. This sounds like a pretty good model of, for example, humans asked about trade-offs between sacred values.


Comment by johnswentworth on Effective Altruism and Everyday Decisions · 2019-09-16T21:55:14.708Z · score: 16 (9 votes) · LW · GW

I wrote a piece last year about the general version of this problem, which I called the epsilon fallacy: optimizing choices with small impact on the objective, without thinking about opportunity costs or trade-offs.

Comment by johnswentworth on Why Subagents? · 2019-09-16T18:01:06.894Z · score: 4 (2 votes) · LW · GW

If we have a cycle detector which prevents cycling, then we don't have true cycles. Indeed, that would be an example of a system with internal state: the externally-visible state looks like it cycles, but the full state never does - the state of the cycle detector changes.

So this post, as applied to cycle-detectors, says: any system which detects cycles and prevents further cycling can be represented by a committee of utility-maximizing agents.

Comment by johnswentworth on The Power to Solve Climate Change · 2019-09-12T21:03:38.071Z · score: 17 (10 votes) · LW · GW

Why are the UNFCCC and the Paris Agreement under "definite" solutions? That seems like a textbook case where specificity was completely lacking - there wasn't any actual plan for how any particular signatory would actually hit their targets, or any concrete plan for what would happen if they didn't. It was all indefinite: they all agreed to somehow reduce emissions, without any definite vision.

On the other hand, personal behavior change like washing laundry in cold water is plenty specific and definite - the strategy says exactly what to do and has a simple cause-and-effect story from actions to results. The story may be wrong, but that doesn't make it any less definite - just because a model doesn't match reality does not mean that it's lacking specificity. (On a side note, I'm not entirely convinced by your economic argument, although I'm unimpressed with personal behavior change strategies for other reasons. But again, all of that is orthogonal to specificity.) Same with offsetting: there's an argument to be made that something is wrong with the strategy, but the problem is orthogonal to specificity/definiteness.

More generally, I feel like this sequence has been losing sight of the target (although your writing skills in themselves are great). The central point of the sequence, as I see it, is using specificity to notice when our models don't have any gears in them - that's a useful skill which applies in a huge variety of areas, and you're doing a decent job providing lots of examples. But specificity, like gears-y-ness, is not the only criterion for having a correct model. It's independent of whether the specific examples are realistic, or of whether the model matches reality. The central message of specificity would be more effectively conveyed if the examples clearly separated it from correctness. "Specificity" is not just another synonym for "yay".

Comment by johnswentworth on Wolf's Dice · 2019-09-11T16:14:13.596Z · score: 4 (2 votes) · LW · GW

It's not really important, all that matters is that we're consistent in which one we use. We have to always include the symmetry factor, or never include it.

In this case, I went with counts because our data does, in fact, consist of counts. Because we're assuming each die roll is independent, we'd get the same answer if we just made up a string of outcomes with the same counts, and used that as the data instead.

Comment by johnswentworth on Hackable Rewards as a Safety Valve? · 2019-09-11T00:08:14.275Z · score: 2 (1 votes) · LW · GW

I'm on-board with that distinction, and I was also thinking of reward-maximizers (despite my loose language).

Part of the confusion may be different notions of "wireheading": seizing an external reward channel, vs actual self-modification. If you're picturing the former, then I agree that the agent won't hack its expectation operator. It's the latter I'm concerned with: under what circumstances would the agent self-modify, change its reward function, but leave the expectation operator untouched?

Example: blue-maximizing robot. The robot might modify its own code so that get_reward(), rather than reading input from its camera and counting blue pixels, instead just returns a large number. The robot would do this because it doesn't model itself as embedded in the environment, and it notices a large correlation between values computed by a program running in the environment (i.e. itself) and its rewards. But in this case, the modified function always returns the same large number - the robot no longer has any reason to worry about the rest of the world.

Comment by johnswentworth on Hackable Rewards as a Safety Valve? · 2019-09-10T23:39:03.160Z · score: 4 (2 votes) · LW · GW

The root issue is that Reward ≠ Utility. A utility function does not take in a policy, it takes in a state of the world - an expected utility maximizer chooses its policy based on what state(s) of the world it expects that policy to induce. Its objective looks like $\max_\pi E[u(X) \mid \pi]$, where $X$ is the state of the world, and the policy/action matters only insofar as it changes the distribution of $X$. The utility is internal to the agent. $u$, as a function of the world state, is perfectly known to the utility maximizer - the only uncertainty is in the world state $X$, and the only thing which the agent tries to control is the world-state $X$. That's why it's reflectively stable: the utility function is "inside" the agent, not part of the "environment", and the agent has no way to even consider changing it.

A reward function, on the other hand, just takes in a policy directly - an expected reward maximizer's objective looks like $\max_\pi E[r(\pi)]$. Unlike a utility, the reward is "external" to the agent, and the reward function is unknown to the agent - the agent does not necessarily know what reward it will receive given some state of the world. The reward "function", i.e. the function mapping a state of the world to a reward, is itself just another part of the environment, and the agent can and will consider changing it.

Example: the blue-maximizing robot.

A utility-maximizing blue-bot would model the world, look for all the blue things in its world-model, and maximize that number. This robot doesn't actually have any reason to stick a blue screen in front of its camera, unless its world-model lacks object permanence. To make a utility-maximizing blue-bot which does sit in front of a blue screen would actually be more complicated: we'd need a model of the bot's own camera, and a utility function over the blue pixels detected by that camera. (Or we'd need a world-model which didn't include anything outside the camera's view.)

On the other hand, a reward-maximizing blue-bot doesn't necessarily even have a notion of "state of the world". If its reward is the number of blue pixels in the camera view, that's what it maximizes - and if it can change the function mapping external world to camera pixels, in order to make more pixels blue, then it will. So it happily sits in front of a blue screen. Furthermore, a reward maximizer usually needs to learn the reward function, since it isn't built-in. That leads to the sort of problem I mentioned above, where the agent doesn't realize it's embedded in the environment and "accidentally" self-modifies. That wouldn't be a problem for a true utility maximizer with a decent world-model - the utility maximizer would recognize that modifying this chunk of the environment won't actually cause higher utility, it's just a correlation.
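
A toy sketch of the contrast (purely illustrative; the world model, action set, and reward channel here are stand-ins, not anyone's actual architecture):

```python
# Utility maximizer: utility is a fixed internal function of the *predicted
# world state*; only the distribution over states depends on the action.
def choose_action_utility(actions, world_model, utility):
    # world_model(a) -> list of (state, probability) pairs
    return max(actions, key=lambda a: sum(p * utility(x) for x, p in world_model(a)))

# Reward maximizer: reward is an external function of the action/policy itself;
# that function lives in the environment, so "tamper with the reward channel"
# can show up as just another high-scoring action.
def choose_action_reward(actions, reward_channel):
    return max(actions, key=reward_channel)
```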

Comment by johnswentworth on Hackable Rewards as a Safety Valve? · 2019-09-10T22:31:54.030Z · score: 2 (1 votes) · LW · GW

So concretely, we have a blue-maximizing robot: it uses its current world-model to forecast the reward from holding a blue screen in front of its camera, and finds that it's probably high-reward. Now it tries to minimize the probability that someone takes the screen away. That's the sort of scenario you're talking about, yes?

I agree that Wei Dai's argument applies just fine to this sort of situation.

Thing is, this kind of wireheading - simply seizing the reward channel - doesn't actually involve any self-modification. The AI is still "working" just fine, or at least as well as it was working before. The problem here isn't really wireheading at all, it's that someone programmed a really dumb utility function.

True wireheading would be if the AI modifies its utility function - i.e. the blue-maximizing robot changes its code (or hacks its hardware) to count red as also being blue. For instance, maybe the AI does not model itself as embedded in the environment, but learns that it gets a really strong reward signal when there's a big number at a certain point in a program executed in the environment - which happens to be its own program execution. So, it modifies this program in the environment to just return a big number for expected_utility, thereby "accidentally" self-modifying.

What I'm not seeing is, in situations where an AI would actually modify itself, when and why would it go for the utility function but not the expectation operator? Maybe people are just imagining "wireheading" in the form of seizing an external reward channel?

Comment by johnswentworth on Hackable Rewards as a Safety Valve? · 2019-09-10T21:22:20.558Z · score: 7 (4 votes) · LW · GW

Could you give an example for the latter, which wouldn't also apply to hacking the expectation operator? The argument sounds plausible, but I'm not yet seeing what qualitative difference between the expectation and utility operators would make a wireheading AI modify one but not the other.

Comment by johnswentworth on Hackable Rewards as a Safety Valve? · 2019-09-10T19:08:46.443Z · score: 8 (3 votes) · LW · GW

It would be much easier for the AI to hack its own expectation operator, so that it predicts a 100% chance of continued survival, rather than taking over the universe. If you're gonna wirehead, why stop early?

I do agree that the builder would probably just try another design. Ideally, they keep adding hacks to make wireheading harder until the AI kills the builder and wireheads itself - hopefully without killing everyone else in the process.

Comment by johnswentworth on Why Subagents? · 2019-09-10T17:39:15.552Z · score: 6 (3 votes) · LW · GW

Not sure if you've ever taken a class on electricity & magnetism, but one of the central notions is the conservative vector field - electric fields being the standard example. You take an electron, and drag it around the electric field. Sometimes you'll have to push it against the field, sometimes the field will push it along for you. You add up all the energy spent pushing (or energy extracted when the field pushes it for you), and find an interesting result: the energy spent moving the electron from point A to point B is completely independent of the path taken. Any two paths from A to B will require exactly the same energy expenditure.

That's a pretty serious constraint on the field - the vast majority of possible vector fields are not conservative.

It's also exactly the same constraint as a utility function: a vector field is conservative if-and-only-if it is acyclic, in the sense of having zero circulation around any closed curve. Indeed, this means that conservative vector fields can be viewed as utility functions: the field itself is the gradient of a "utility function" (called the potential field), and it accepts any local "trade" which increases utility - i.e. moving an electron up the gradient of the utility function. Conversely, if we have preferences represented by local preferences in a (finite-dimensional) vector space, then we can summarize those preferences with a utility function if-and-only-if the field is conservative.
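
In symbols (standard vector-calculus facts, stated only to make the analogy explicit; sign conventions aside):

```latex
\oint_C \vec{E}\cdot d\vec{\ell} = 0 \ \text{ for every closed curve } C
\quad\Longleftrightarrow\quad
\vec{E} = \nabla U \ \text{ for some scalar field } U .
% Analogy: local preferences form an acyclic ("conservative") field exactly
% when they are the gradient of a single utility (potential) function U.
```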

My point is: acyclicity is a major constraint on a system's behavior. It is definitely not the case that "everything can be represented as having a utility function".

Now, there is a separate piece to your concern: when people talk about subagent theories of mind, they think that the brain is actually implemented using subagents, not merely behaving in a manner equivalent to having subagents. It's a variant of the behavior vs architecture question. In this case, we can partially answer the question: subagent architectures have a relative advantage over most non-subagent architectures in that the subagent architectures won't throw away resources via cyclic preferences, whereas most of the non-subagent architectures will. The only non-subagent architectures which don't throw away resources are those whose behavior just so happens to be equivalent to subagents.

If a system with a subagent architecture is evolving, then it will mostly be exploring different configurations of subagents - so any configuration it explores will at least not throw away resources. On the other hand, with a non-subagent architecture, we'd expect that there's some surface in configuration space which happens to implement agent-like behavior, and any changes which move off that surface will throw away at least some resources - and any single-nucleotide change is likely to move off the surface. In other words, a subagent architecture is more likely to have a nice evolutionary path from wherever it starts to the maximum-fitness design, whereas a non-subagent architecture may not have such a smooth path. As an evolutionary analogue to the behavior vs architecture question, I'd conjecture: subagent-like behavior generally won't evolve without subagent-like architecture, because it's so much easier to explore efficient designs within a subagent architecture.

Comment by johnswentworth on Are minimal circuits deceptive? · 2019-09-07T20:03:14.097Z · score: 11 (7 votes) · LW · GW

Interesting proof. A couple off-the-cuff thoughts...

First, this is proving something much more general than just deceptiveness. We can scribble out the three or four English sentences with the word "deceptive" in them, and just say "C is a predicate" - the proof doesn't actually use any properties of C. So this is a fairly general result about minimal circuits on MDPs - and I'm wondering what other predicates could be plugged into it to yield interesting results.

Second, the proof doesn't actually use the assumption that the minimal circuit on the original set of tasks is performing search. All that matters is that there is some MDP on which the circuit is deceptive, and that the circuit is minimal subject to an average performance bound on the full set of tasks.

Third, we can state the overall conclusion as roughly "if there exists a set of MDPs for which the minimal circuit achieving some average performance on the set is deceptive on at least one of the MDPs, then there exists a single MDP whose minimal circuit is deceptive." On the other hand, the reverse implication holds trivially: if there exists a single MDP whose minimal circuit is deceptive, then there exists a set of MDPs for which the minimal circuit achieving some average performance on the set is deceptive on at least one of the MDPs. Proof: take the set to contain just the one MDP whose minimal circuit is deceptive. So we don't just have a one-way implication here; we actually have equivalence between two open problems. Obvious next question: what other minimal-circuit problems are equivalent to these two?

Comment by johnswentworth on How to Throw Away Information · 2019-09-06T20:20:03.444Z · score: 4 (2 votes) · LW · GW

Nice argument - I'm convinced that the bound isn't always achievable. See my response to cousin_it's comment for some related questions.

Comment by johnswentworth on How to Throw Away Information · 2019-09-06T20:18:43.924Z · score: 2 (1 votes) · LW · GW

Nice example - I'm convinced that the bound isn't always achievable. So the next questions are:

  • Is there a nice way to figure out how much information we can keep, in any particular problem, when Y is not observed?
  • Is there some standard way of constructing S which achieves optimal performance in general when Y is not observed?

My guess is that optimal performance would be achieved by throwing away all information contained in X about the distribution P[Y|X]. We always know that distribution just from observing X, and throwing away all info about that distribution should throw out all info about Y, and the minimal map interpretation suggests that that's the least information we can throw out.

Comment by johnswentworth on Embedded Agency via Abstraction · 2019-09-06T01:43:15.040Z · score: 13 (3 votes) · LW · GW

Great explanation, thanks. This really helped clear up what you're imagining.

I'll make a counter-claim against the core point:

... at that high level of abstraction, I am claiming that you should imagine an agent more as a flow through a fluid.

I think you make a strong case both that this will capture most (and possibly all) agenty behavior we care about, and that we need to think about agency this way long term. However, I don't think this points toward the right problems to tackle first.

Here's roughly the two notions of agency, as I'm currently imagining them:

  • "one-shot" agency: system takes in some data, chews on it, then outputs some actions directed at achieving a goal
  • "dynamic" agency: system takes in data and outputs decisions repeatedly, over time, gradually improving some notion of performance

I agree that we need a theory for the second version, for all of the reasons you listed - most notably robust delegation. I even agree that robust delegation is a central part of the problem - again, the considerations you list are solid examples, and you've largely convinced me on the importance of these issues. But consider two paths to build a theory of dynamic agency:

  • First understand one-shot agency, then think about dynamic agency in terms of processes which produce (a sequence of) effective one-shot agents
  • Tackle dynamic agency directly

My main claim is that the first path will be far easier, to the point that I do not expect anyone to make significant useful progress on understanding dynamic agency without first understanding one-shot agency.

Example: consider a cat. If we want to understand the whole cause-and-effect process which led to a cat's agenty behavior, then we need to think a lot about evolution. On the other hand, presumably people recognized that cats have agenty behavior long before anybody knew anything about evolution. People recognized that cats have goal-seeking behavior, people figured out (some of) what cats want, people gained some idea of what cats can and cannot learn... all long before understanding the process which produced the cat.

More abstractly: I generally agree that agenty behavior (e.g. a cat) seems unlikely to show up without some learning process to produce it (e.g. evolution). But it still seems possible to talk about agenty things without understanding - or even knowing anything about - the process which produced the agenty things. Indeed, it seems easier to talk about agenty things than to talk about the processes which produce them. This includes agenty things with pretty limited learning capabilities, for which the improving-over-time perspective doesn't work very well - cats can learn a bit, but they're finite and have pretty limited capacity.

Furthermore, one-shot (or at least finite) agency seems like it better describes the sort of things I mostly care about when I think about "agents" - e.g. cats. I want to be able to talk about cats as agents, in and of themselves, despite the cats not living indefinitely or converging to any sort of "optimal" behavior over long time spans or anything like that. I care about evolution mainly insofar as it lends insights into cats and other organisms - i.e., I care about long-term learning processes mainly insofar as it lends insights into finite agents. Or, in the language of subsystem alignment, I care about the outer optimization process mainly insofar as it lends insight into the mesa-optimizers (which are likely to be more one-shot-y, or at least finite). So it feels like we need a theory of one-shot agency just to define the sorts of things we want our theory of dynamic agency to talk about, especially from a mesa-optimizers perspective.

Conversely, if we already had a theory of what effective one-shot agents look like, then it would be a lot easier to ask "what sort of processes produce these kinds of systems"?

Comment by johnswentworth on Probability as Minimal Map · 2019-09-03T18:33:44.055Z · score: 2 (1 votes) · LW · GW

Throwing away information to deal with self-reference is a natural strategy in general; the ideas in this post specifically suggest that we use probabilities to represent the information which remains.

Comment by johnswentworth on Probability as Minimal Map · 2019-09-02T15:33:52.661Z · score: 2 (1 votes) · LW · GW

That is highly relevant, and is basically where I was planning to go with my next post. In particular, see the dice problem in this comment - sometimes throwing away information requires randomizing our probability distribution. I suspect that this idea can be used to rederive Nash equilibria in a somewhat-more-embedded-looking scenario, e.g. our opponent making its decision by running a copy of us to see what we do.

Thanks for pointing out the relevance of your work, I'll probably use some of it.

Comment by johnswentworth on AI Alignment Writing Day Roundup #1 · 2019-08-30T02:15:02.810Z · score: 12 (3 votes) · LW · GW

Planning of vengeance continues apace, either way.

Comment by johnswentworth on AI Alignment Writing Day Roundup #1 · 2019-08-30T02:06:14.069Z · score: 2 (1 votes) · LW · GW

You know it's "John S. Wentworth", not "Swentworth", right?