Posts

Better priors as a safety problem 2020-07-05T21:20:02.851Z · score: 20 (4 votes)
Learning the prior 2020-07-05T21:00:01.192Z · score: 36 (7 votes)
Inaccessible information 2020-06-03T05:10:02.844Z · score: 77 (26 votes)
Writeup: Progress on AI Safety via Debate 2020-02-05T21:04:05.303Z · score: 81 (25 votes)
Hedonic asymmetries 2020-01-26T02:10:01.323Z · score: 84 (30 votes)
Moral public goods 2020-01-26T00:10:01.803Z · score: 125 (43 votes)
Of arguments and wagers 2020-01-10T22:20:02.213Z · score: 58 (19 votes)
Prediction markets for internet points? 2019-10-27T19:30:00.898Z · score: 40 (18 votes)
AI alignment landscape 2019-10-13T02:10:01.135Z · score: 43 (16 votes)
Taxing investment income is complicated 2019-09-22T01:30:01.242Z · score: 34 (13 votes)
The strategy-stealing assumption 2019-09-16T15:23:25.339Z · score: 68 (18 votes)
Reframing the evolutionary benefit of sex 2019-09-14T17:00:01.184Z · score: 67 (23 votes)
Ought: why it matters and ways to help 2019-07-25T18:00:27.918Z · score: 88 (36 votes)
Aligning a toy model of optimization 2019-06-28T20:23:51.337Z · score: 52 (17 votes)
What failure looks like 2019-03-17T20:18:59.800Z · score: 222 (97 votes)
Security amplification 2019-02-06T17:28:19.995Z · score: 20 (4 votes)
Reliability amplification 2019-01-31T21:12:18.591Z · score: 22 (6 votes)
Techniques for optimizing worst-case performance 2019-01-28T21:29:53.164Z · score: 24 (7 votes)
Thoughts on reward engineering 2019-01-24T20:15:05.251Z · score: 31 (9 votes)
Learning with catastrophes 2019-01-23T03:01:26.397Z · score: 28 (9 votes)
Capability amplification 2019-01-20T07:03:27.879Z · score: 24 (7 votes)
The reward engineering problem 2019-01-16T18:47:24.075Z · score: 24 (5 votes)
Towards formalizing universality 2019-01-13T20:39:21.726Z · score: 29 (6 votes)
Directions and desiderata for AI alignment 2019-01-13T07:47:13.581Z · score: 30 (7 votes)
Ambitious vs. narrow value learning 2019-01-12T06:18:21.747Z · score: 21 (7 votes)
AlphaGo Zero and capability amplification 2019-01-09T00:40:13.391Z · score: 30 (13 votes)
Supervising strong learners by amplifying weak experts 2019-01-06T07:00:58.680Z · score: 28 (7 votes)
Benign model-free RL 2018-12-02T04:10:45.205Z · score: 13 (4 votes)
Corrigibility 2018-11-27T21:50:10.517Z · score: 40 (10 votes)
Humans Consulting HCH 2018-11-25T23:18:55.247Z · score: 20 (4 votes)
Approval-directed bootstrapping 2018-11-25T23:18:47.542Z · score: 19 (4 votes)
Approval-directed agents 2018-11-22T21:15:28.956Z · score: 29 (5 votes)
Prosaic AI alignment 2018-11-20T13:56:39.773Z · score: 38 (11 votes)
An unaligned benchmark 2018-11-17T15:51:03.448Z · score: 28 (7 votes)
Clarifying "AI Alignment" 2018-11-15T14:41:57.599Z · score: 61 (18 votes)
The Steering Problem 2018-11-13T17:14:56.557Z · score: 39 (11 votes)
Preface to the sequence on iterated amplification 2018-11-10T13:24:13.200Z · score: 40 (15 votes)
The easy goal inference problem is still hard 2018-11-03T14:41:55.464Z · score: 42 (13 votes)
Could we send a message to the distant future? 2018-06-09T04:27:00.544Z · score: 40 (14 votes)
When is unaligned AI morally valuable? 2018-05-25T01:57:55.579Z · score: 102 (32 votes)
Open question: are minimal circuits daemon-free? 2018-05-05T22:40:20.509Z · score: 122 (39 votes)
Weird question: could we see distant aliens? 2018-04-20T06:40:18.022Z · score: 85 (25 votes)
Implicit extortion 2018-04-13T16:33:21.503Z · score: 74 (22 votes)
Prize for probable problems 2018-03-08T16:58:11.536Z · score: 138 (38 votes)
Argument, intuition, and recursion 2018-03-05T01:37:36.120Z · score: 103 (31 votes)
Funding for AI alignment research 2018-03-03T21:52:50.715Z · score: 108 (29 votes)
Funding for independent AI alignment research 2018-03-03T21:44:44.000Z · score: 5 (1 votes)
The abruptness of nuclear weapons 2018-02-25T17:40:35.656Z · score: 105 (37 votes)
Arguments about fast takeoff 2018-02-25T04:53:36.083Z · score: 116 (38 votes)
Funding opportunity for AI alignment research 2017-08-27T05:23:46.000Z · score: 1 (1 votes)

Comments

Comment by paulfchristiano on Learning the prior · 2020-07-06T03:30:40.809Z · score: 4 (2 votes) · LW · GW
I'm not totally sure what actually distinguishes f and Z, especially once you start jointly optimizing them. If f incorporates background knowledge about the world, it can do better at prediction tasks. Normally we imagine f having many more parameters than Z, and so being more likely to squirrel away extra facts, but if Z is large then we might imagine it containing computationally interesting artifacts like patterns that are designed to train a trainable f on background knowledge in a way that doesn't look much like human-written text.

f is just predicting P(y|x, Z); it's not trying to model D. So you don't gain anything by putting facts about the data distribution in f---you have to put them in Z so that they change P(y|x,Z).

Now, maybe you can try to ensure that Z is at least somewhat textlike via making sure it's not too easy for a discriminator to tell from human text, or requiring it to play some functional role in a pure text generator, or whatever.

The only thing Z does is get handed to the human for computing P(y|x,Z).
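
For concreteness, here is a minimal sketch of how I read that division of labor (my notation, not necessarily the post's exact formulation): Z is selected to look good under the human prior and to make the human's conditional predictions fit the data, while f is only ever trained to imitate that conditional, so facts about D can only help by being represented in Z.

```latex
% Hedged sketch; P_H denotes (possibly amplified) human judgments.
Z^* \approx \arg\max_{Z} \Big[ \log P_H(Z) + \sum_{(x,y) \in D} \log P_H(y \mid x, Z) \Big],
\qquad
f_\theta(y \mid x, Z) \approx P_H(y \mid x, Z).
```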

Comment by paulfchristiano on Learning the prior · 2020-07-06T03:27:22.524Z · score: 4 (2 votes) · LW · GW

The difference is that you can draw as many samples as you want from D* and they are all iid. Neural nets are fine in that regime.

Comment by paulfchristiano on AI Unsafety via Non-Zero-Sum Debate · 2020-07-06T00:10:04.628Z · score: 4 (2 votes) · LW · GW

It seems even worse than any of that. If your AI wanted anything at all it might debate well in order to survive. So if you are banking on it single-mindedly wanting to win the debate then you were already in deep trouble.

Comment by paulfchristiano on Second Wave Covid Deaths? · 2020-07-04T00:31:59.721Z · score: 12 (3 votes) · LW · GW
I don't understand why the second wave can't be explained by the increase in testing. Before, only people who were sick were allowed to be tested; those people correlate more with hospital visits, which correlate more with deaths, so confirmed cases more closely followed the death graph.

US positive test rate is up from 4.4% to 7.4%: https://coronavirus.jhu.edu/testing/individual-states

It used to be the case that 4.4% of people you tested had COVID-19.

Now you test more people, who look less risky on average, and find that 7.4% of people you test have COVID-19. The people you would have tested in the old days are the riskiest subgroup, so more than 7.4% of them have COVID-19.

So it sure seems like the infection rate went up by at least 7.4/4.4 ≈ 1.68, i.e. roughly +70%.
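
Spelled out as an inequality, using only the two positivity figures above:

```latex
\frac{\text{positivity among the old (riskier) subgroup now}}{\text{positivity among that subgroup before}}
\;>\; \frac{7.4\%}{4.4\%} \;\approx\; 1.68,
```

i.e. infections within the group that would have been tested under the old criteria rose by at least ~70%.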

Comment by paulfchristiano on High Stock Prices Make Sense Right Now · 2020-07-04T00:18:48.828Z · score: 4 (2 votes) · LW · GW

My impression is that most individual investors and pension funds put a significant part of their portfolio into bonds.

Comment by paulfchristiano on High Stock Prices Make Sense Right Now · 2020-07-04T00:09:02.642Z · score: 9 (5 votes) · LW · GW

I'd love to get evidence on that and it seems important.

Your position doesn't sound right to me. You don't need many people changing their allocations moderately to totally swamp a 1% change in inflows.

My guess would be that more than 10% of investors, weighted by total equity holdings, adjust their exposure deliberately, but I'd love to know the real numbers.

Comment by paulfchristiano on The "AI Debate" Debate · 2020-07-03T23:57:16.832Z · score: 2 (1 votes) · LW · GW
Do you think something like IDA is the only plausible approach to alignment? If so, I hadn't realized that, and I'd be curious to hear more arguments (or just intuitions). The aligned overseer you describe is supposed to make treachery impossible by recognizing it, so it seems your concern is equivalent to the concern: "any agent (we make) that learns to act will be treacherous if treachery is possible." Are all learning agents fundamentally out to get you? I suppose that's a live possibility to me, but it seems to me there is a possibility we could design an agent that is not inclined to treachery, even if the treachery wouldn't be recognized.

No, but what are the approaches to avoiding deceptive alignment that don't go through competitiveness?

I guess the obvious one is "don't use ML," and I agree that doesn't require competitiveness.

Edit: even so, having two internal components that are competitive with each other (e.g. overseer and overseee) does not require competitiveness with other projects.

No, but now we are starting to play the game of throttling the overseee (to avoid it overpowering the overseer) and it's not clear how this is going to work and be stable. It currently seems like the only appealing approach to getting stability there is to ensure the overseer is competitive.

Comment by paulfchristiano on The "AI Debate" Debate · 2020-07-03T23:55:34.977Z · score: 2 (1 votes) · LW · GW
This argument seems to prove too much. Are you saying that if society has learned how to do artificial induction at a superhuman level, then by the time we give a safe planner that induction subroutine, someone will have already given that induction routine to an unsafe planner? If so, what hope is there as prediction algorithms relentlessly improve? In my view, the whole point of AGI Safety research is to try to come up with ways to use powerful-enough-to-kill-you artificial induction in a way that it doesn't kill you (and helps you achieve your other goals). But it seems you're saying that there is a certain level of ingenuity such that malicious agents will probably act with that level of ingenuity before benign agents do.

I'm saying that if you can't protect yourself from an AI in your lab, under conditions that you carefully control, you probably couldn't protect yourself from AI systems out there in the world.

The hope is that you can protect yourself from an AI in your lab.

Comment by paulfchristiano on The "AI Debate" Debate · 2020-07-03T23:53:10.540Z · score: 4 (2 votes) · LW · GW
So competitiveness still matters somewhat, but here's a potential disagreement we might have: I think we will probably have at least a few months, and maybe more than a year, where the top one or two teams have AGI (powerful enough to kill everyone if let loose), and nobody else has anything more valuable than an Amazon Mechanical Turk worker.

Definitely a disagreement. I think that before anyone has an AGI that could beat humans in a fistfight, tons of people will have systems much, much more valuable than a Mechanical Turk worker.

Comment by paulfchristiano on The "AI Debate" Debate · 2020-07-03T23:50:59.784Z · score: 2 (1 votes) · LW · GW
The way I map these concepts, this feels like an elision to me. I understand what you're saying, but I would like to have a term for "this AI isn't trying to kill me", and I think "safe" is a good one. That's the relevant sense of "safe" when I say "if it's safe, we can try it out and tinker". So maybe we can recruit another word to describe an AI that is both safe itself and able to protect us from other agents.

I mean that we don't have any process that looks like debate that could produce an agent that wasn't trying to kill you without being competitive, because debate relies on using aligned agents to guide the training process (and if they aren't competitive then the agent-being-trained will, at least in the limit, converge to an equilibrium where it kills you).

Comment by paulfchristiano on High Stock Prices Make Sense Right Now · 2020-07-03T22:14:32.302Z · score: 18 (6 votes) · LW · GW

The main reason I'm personally confused is that 2 months ago I thought there was real uncertainty about whether we'd be able to keep the pandemic under control. Over the last 2 months that uncertainty has gradually been resolved in the negative, without much positive news about people's willingness to throw in the towel rather than continuing to panic and do lockdowns, and yet over that period SPY has continued moving up.

I'm making no attempt at all to estimate prices based on fundamentals and I'm honestly not even sure how that exercise is supposed to work. Interest rates are very low and volatility isn't that high so it seems like you would have extremely high equity prices if e.g. most investors were rational with plausible utility functions. But equity prices are never nearly as high as that kind of analysis would suggest.

Comment by paulfchristiano on High Stock Prices Make Sense Right Now · 2020-07-03T22:05:25.056Z · score: 30 (14 votes) · LW · GW

I think people's annual income is on average <20% of their net worth (roughly $20T of income vs $100T of net worth), maybe more like 15%.

So 2 months of +20% savings amounts to <1% increase in total savings, right?

If that's right, this doesn't seem very important relative to small changes in people's average allocation between equities/debt/currency, which fluctuates by tens of percent during the normal business cycle.

Normally I expect higher savings rates to represent concern about having money in the future, which will be accompanied by a move to safer assets. And of course volatility is currently way up, so rational investors probably couldn't afford to invest nearly as much in stocks unless they were being compensated with significantly higher returns (which should involve prices only returning to normal levels as volatility falls).
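
Rough arithmetic behind the "<1%" above, using the ~$20T income / ~$100T net worth figures (which are themselves approximate):

```latex
2\text{ months of income} \approx \$20\text{T} \times \tfrac{2}{12} \approx \$3.3\text{T},
\qquad
20\% \times \$3.3\text{T} \approx \$0.7\text{T} \;\approx\; 0.7\% \text{ of } \$100\text{T}.
```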

Comment by paulfchristiano on The "AI Debate" Debate · 2020-07-03T00:24:25.726Z · score: 5 (3 votes) · LW · GW
So what if AI Debate survives this concern? That is, suppose we can reliably find a horizon-length for which running AI Debate is not existentially dangerous. One worry I've heard raised is that human judges will be unable to effectively judge arguments way above their level. My reaction to this is that I don't know, but it's not an existential failure mode, so we could try it out and tinker with evaluation protocols until it works, or until we give up. If we can run AI Debate without incurring an existential risk, I don't see why it's important to resolve questions like this in advance.

There are two reasons to worry about this:

  • The purpose of research now is to understand the landscape of plausible alignment approaches, and from that perspective viability is as important as safety.
  • I think it is unlikely for a scheme like debate to be safe without being approximately competitive---the goal is to get honest answers which are competitive with a potential malicious agent, and then use those answers to ensure that malicious agent can't cause trouble and that the overall system can be stable to malicious perturbations. If your honest answers aren't competitive, then you can't do that and your situation isn't qualitatively different from a human trying to directly supervise a much smarter AI.

In practice I doubt the second consideration matters---if your AI could easily kill you in order to win a debate, probably someone else's AI has already killed you to take your money (and long before that your society totally fell apart). That is, safety separate from competitiveness mostly matters in scenarios where you have very large leads / very rapid takeoffs.

Even if you were the only AI project on earth, I think competitiveness is the main thing responsible for internal regulation and stability. For example, it seems to me you need competitiveness for any of the plausible approaches for avoiding deceptive alignment (since they require having an aligned overseer who can understand what a treacherous agent is doing). More generally, trying to maintain a totally sanitized internal environment seems a lot harder than trying to maintain a competitive internal environment where misaligned agents won't be at a competitive advantage.

Comment by paulfchristiano on Karma fluctuations? · 2020-07-02T16:03:28.423Z · score: 4 (2 votes) · LW · GW
(including and indeed especially content that I mostly agree with)

In retrospect this was too self-flattering. Plenty of the stuff I don't want to see expresses ideas that I agree with, but the majority expresses ideas I disagree with.

Comment by paulfchristiano on Second Wave Covid Deaths? · 2020-07-02T01:53:57.161Z · score: 27 (8 votes) · LW · GW

(Disclaimer: I don't know what I'm talking about, pointers to real literature would be more useful than this, every sentence deserves to be aggressively hedged/caveated, etc.)

Increasing test capacity: I've seen some people suggest that the second wave is just an artifact of increased testing in these states. If that were the case, then there would be no rise in covid cases to be explained. But then I would expect the fraction of tests that returned positive to be decreasing, and we aren't seeing that. This one seems like wishful thinking to me.

I don't think the increase in testing capacity fully explains the "second wave," but I think it does totally change the quantitative picture.

Intuitively I expect that (rate of change in positive test %) is better than (rate of change in confirmed cases) as a way of approximating (rate of change in actual cases). It also doesn't seem great, especially over multiple weeks, but I'll use it here until someone convinces me this is dumb.

Johns Hopkins aggregates testing numbers here. Picking CA as a second-wave state, it hit its minimum positive test rate of 0.040 on May 24. That rate rose by 20% by June 21, to 0.048 (and has kept going up).

If there were a 7-day lag, we'd expect to see a 20% increase in deaths from May 31 to June 28. Eyeballing the Google deaths data, things look basically flat. So I guess that means a drop of ~20% in fatality rate over that month.

Trying again, let's take Georgia. Minimum of 0.058 on June 10, up 50% to 0.091 by June 21. Google seems to have deaths roughly constant or maybe decreasing from June 17 to June 28, which is a ballpark ~30% drop in fatality rate to offset the ~50% increase in infections.

One problem with these numbers is that I think the test numbers are for day the test occurred, but the death numbers are for the day they are reported. Would probably be better to use numbers for the day the death actually occurred, though I think that probably requires going at least a few days further back in time (which is going to make it harder to interpret cases like Georgia that hit the minimum only 3 weeks ago).
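
As a quick computational check of the CA/GA eyeballing above (figures as quoted; the comment's ballpark percentages use looser rounding, and none of this addresses the reporting-date issue just mentioned):

```python
# Rough check of the positivity-rate arithmetic above, using the figures as quoted.

def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

def implied_fatality_drop(rise_pct):
    """If deaths stay flat while infections rise by r%, the implied fatality rate falls by r/(1+r)."""
    r = rise_pct / 100
    return r / (1 + r) * 100

ca_rise = pct_change(0.040, 0.048)  # CA positivity, May 24 -> June 21: ~+20%
ga_rise = pct_change(0.058, 0.091)  # GA positivity, June 10 -> June 21: ~+57% ("up 50%" ballpark)

print(f"CA: +{ca_rise:.0f}% infections, ~{implied_fatality_drop(ca_rise):.0f}% implied fatality-rate drop")
print(f"GA: +{ga_rise:.0f}% infections, ~{implied_fatality_drop(ga_rise):.0f}% implied fatality-rate drop")
```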

Delayed initial testing: When things were first taking off in first wave states, our testing capacity was way behind where it needed to be. Perhaps this heavily suppressed the initial "confirmed" numbers for the first wave, and so we should expect to see second wave deaths rise in the next few weeks?

It seems like the average time lag between showing symptoms and dying from COVID is something like 18 days (here, data from China but if anything I expect longer lags here). So if we were testing people earlier it seems like we could easily have more like a 2 week lag than a 1 week lag. That could mostly explain Georgia and California.

Overall I can't really tell what's going on; my sense is that your story in the post is basically right (and demographic changes sound likely), but that the mystery to be explained is *much* less than a 5x change in fatality rate. I feel like the constant death rate in the face of exploding cases is suspicious, but my best guess is that it's a coincidence: death rates will end up rising and IFR will end up modestly lower than in the initial wave.

I would love to see a version of the analysis in the OP controlling for big increases in testing, and getting a more careful handle on lags between testing and death. Hopefully someone has already done that and it's just a matter of someone here finding the cite.

Comment by paulfchristiano on Relating HCH and Logical Induction · 2020-06-17T02:02:06.797Z · score: 2 (1 votes) · LW · GW
Think of it this way. We want to use a BRO to answer questions. We know it's very powerful, but at first, we don't have a clue as to how to answer questions with it. So we implement a Bayesian mixture-of-experts, which we call the "market". Each "trader" is a question-answering strategy: a way to use the BRO to answer questions. We give each possible strategy for using the BRO some weight. However, our "market" is itself a BRO computation. So, each trader has access to the market itself (in addition to many other computations which the BRO can access for them).

But a BRO only has oracle access to machines using smaller BROs, right? So a trader can't access the market?

(I don't think very much directly about the tree-size-limited version of HCH, normally I think of bounded versions like HCH(P) = "Humans consulting P's predictions of HCH(P)".)
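
For readers who haven't seen it, a toy sketch of the bounded version mentioned in that parenthetical (the `human` and `predictor` objects and their methods are hypothetical stand-ins, not an actual API):

```python
# Toy sketch of HCH(P): a human answers a question while consulting P's
# *predictions* of HCH(P) on subquestions, so there is no unbounded recursion.

def hch_p(question, human, predictor):
    # Each subquestion the human poses is answered by the predictor's guess
    # at what HCH(P) would say, rather than by an actual recursive call.
    ask = lambda subq: predictor.predict_hch_answer(subq)
    return human.answer(question, ask=ask)
```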

Comment by paulfchristiano on Relating HCH and Logical Induction · 2020-06-17T01:59:13.079Z · score: 4 (2 votes) · LW · GW

There are two salient ways to get better predictions: deliberation and trial+error. HCH is about deliberation, and logical inductors are about trial and error. The benefit of trial and error is that it works eventually. The cost is that it doesn't optimize what you want (unless what you want is the logical induction criterion) and that it will generally get taken over by consequentialists who can exercise malicious influence a constant number of times before the asymptotics assert themselves. The benefit of deliberation is that its preferences are potentially specified indirectly by the original deliberator (rather than externally by the criterion for trial and error) and that if the original deliberator is strong enough they may suppress internal selection pressures, while the cost is that who knows if it works.

Comment by paulfchristiano on Karma fluctuations? · 2020-06-11T00:47:47.181Z · score: 15 (8 votes) · LW · GW
Is downvoting really used here for posts that are not spam or trolling?

Yes.

But I guess I’m surprised if people actually behave that way?

What makes this surprising?

some posts are controversial enough to receive active downvotes vs passive ignoring.

The point is to downvote content that you want to see less of, not content that you disagree with. If by "controversial" you mean "that some people don't want to see it," then I can't speak for others but I can say that personally the whole internet is full of content that I don't want to see (including and indeed especially content that I mostly agree with).

Comment by paulfchristiano on Inaccessible information · 2020-06-06T16:19:38.106Z · score: 2 (1 votes) · LW · GW

I agree that if you had a handle on accessing average optimal value then you'd be making headway.

I don't think it covers everything, since e.g. safety / integrity of deliberation / etc. are also important, and because instrumental values aren't quite clean enough (e.g. even if AI safety was super easy these agents would only work on the version that was useful for optimizing values from the mixture used).

But my bigger Q is how to make headway on accessing average optimal value, and whether we're able to make the problem easier by focusing on average optimal value.

Comment by paulfchristiano on Reply to Paul Christiano's “Inaccessible Information” · 2020-06-05T18:22:05.960Z · score: 34 (13 votes) · LW · GW

I thought this was a great summary, thanks!

Yes it’s true that much of MIRI’s research is about finding a solution to the design problem for intelligent systems that does not rest on a blind search for policies that satisfy some evaluation procedure. But it seems strange to describe this approach as “hope you can find some other way to produce powerful AI”, as though we know of no other approach to engineering sophisticated systems other than search.

I agree that the success of design in other domains is a great sign and reason for hope. But for now such approaches are being badly outperformed by search (in AI).

Maybe it's unfair to say "find some other way to produce powerful AI" because we already know the way: just design it yourself. But I think "design" is basically just another word for "find some way to do it," and we don't yet have any history of competitive designs to imitate or extrapolate from.

Personally, the main reason I'm optimistic about design in the future is that the designers may themselves be AI systems. That may help close the current gap between design and search, since both could then benefit from large amounts of computing power. (And it's plausible that we are currently bottlenecked on a meta-design problem of figuring out how to build automated designers.) That said, it's completely unclear whether that will actually beat search.

I consider my job to be preparing for the worst w.r.t. search, since that currently seems like a better place to invest resources (and I think it's reasonably likely that dangerous search will be involved even if our AI ecosystem mostly revolves around design). I do think that I'd fall back to pushing on design if this ended up looking hopeless enough. If that happens, I'm hoping that by that time we'll have some much harder evidence that search is a lost cause, so that we can get other people to also jump ship from search to design.

Comment by paulfchristiano on Inaccessible information · 2020-06-03T23:36:34.060Z · score: 8 (4 votes) · LW · GW
To help check my understanding, your previously described proposal to access this "inaccessible" information involves building corrigible AI via iterated amplification, then using that AI to capture "flexible influence over the future", right? Have you become more pessimistic about this proposal, or are you just explaining some existing doubts? Can you explain in more detail why you think it may fail?
(I'll try to guess.) Is it that corrigibility is about short-term preferences-on-reflection and short-term preferences-on-reflection may themselves be inaccessible information?

I think that's right. The difficulty is that short-term preferences-on-reflection depend on "how good is this situation actually?" and that judgment is inaccessible.

This post doesn't reflect me becoming more pessimistic about iterated amplification or alignment overall. This post is part of the effort to pin down the hard cases for iterated amplification, which I suspect will also be hard cases for other alignment strategies (for the kinds of reasons discussed in this post).

This seems similar to what I wrote in an earlier thread: "What if the user fails to realize that a certain kind of resource is valuable?

Yeah, I think that's similar. I'm including this as part of the alignment problem---if unaligned AIs realize that a certain kind of resource is valuable but aligned AIs don't realize that, or can't integrate it with knowledge about what the users want (well enough to do strategy stealing) then we've failed to build competitive aligned AI.

(By “resources” we’re talking about things that include more than just physical resources, like control of strategic locations, useful technologies that might require long lead times to develop, reputations, etc., right?)"

Yes.

At the time I thought you proposed to solve this problem by using the user's "preferences-on-reflection", which presumably would correctly value all resources/costs. So again is it just that "preferences-on-reflection" may itself be inaccessible?

Yes.

Besides the above, can you give some more examples of (what you think may be) "inaccessible knowledge that is never produced by amplification"?

If we are using iterated amplification to try to train a system that answers the question "What action will put me in the best position to flourish over the long term?" then in some sense the only inaccessible information that matters is "To what extent will this action put me in a good position to flourish?" That information is potentially inaccessible because it depends on the kind of inaccessible information described in this post---what technologies are valuable? what's the political situation? am I being manipulated? is my physical environment being manipulated?---and so forth. That information in turn is potentially inaccessible because it may depend on internal features of models that are only validated by trial and error, for which we can't elicit the correct answer either by directly checking it or by transfer from other accessible features of the model.

(I might be misunderstanding your question.)

(I guess an overall feedback is that in most of the post you discuss inaccessible information without talking about amplification, and then quickly talk about amplification in the last section, but it's not easy to see how the two ideas relate without more explanations and examples.)

By default I don't expect to give enough explanations or examples :) My next step in this direction will be thinking through possible approaches for eliciting inaccessible information, which I may write about but which I don't expect to be significantly more useful than this. I'm not that motivated to invest a ton of time in writing about these issues clearly because I think it's fairly likely that my understanding will change substantially with more thinking, and I think this isn't a natural kind of "checkpoint" to try to explain clearly. Like most posts on my blog, you should probably regard this primarily as a record of Paul's thinking. (Though it would be great if it could be useful as explanation as a side effect, and I'm willing to put in some time to try to make it useful as explanation, just not the amount of time that I expect would be required.)

(My next steps on exposition will be trying to better explain more fundamental aspects of my view.)

Comment by paulfchristiano on Inaccessible information · 2020-06-03T23:25:23.242Z · score: 6 (3 votes) · LW · GW

I don't mean to say that "What's the weight of Neptune?" is accessible if a model transfers to saying "The weight of Neptune is 100kg." I mean that "What's the weight of Neptune?" is accessible if a model transfers to correctly reporting the weight of Neptune (or rather if it transfers in such a way that its answers give real evidence about the weight of Neptune, or rather that the evidence is accessible in that case, or... you can see why it's hard to be formal).

If we wanted to be more formal but less correct, we could talk about accessibility of functions from possible worlds. Then a function f* is accessible when you can check a claimed value of f* (using oracles for other accessible functions), or when you can find some encoding R of functions and some value r* such that the simplest function mapping R(f) -> f(real world) for all accessible functions also maps r* -> f*(real world).
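
In symbols (my notation, and still informal about what counts as "simplest"), writing w for the real world: f* is accessible if either

```latex
% Hedged restatement of the informal definition above.
\text{(i)}\quad \text{a claimed value } v \text{ can be checked, with oracle access to accessible functions, to satisfy } v = f^*(w); \quad\text{or}
\text{(ii)}\quad \exists\, R,\, r^* :\;
\big(\, g := \text{the simplest function with } g(R(f)) = f(w) \text{ for all accessible } f \,\big)
\;\Rightarrow\; g(r^*) = f^*(w).
```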

Comment by paulfchristiano on Solar system colonisation might not be driven by economics · 2020-04-24T20:03:27.770Z · score: 5 (3 votes) · LW · GW

I think we have about 10 more doublings of energy consumption before we're using most incident solar energy. We're currently doubling energy use every few decades, so that could sustain a few centuries of growth at the current rate. (Like many folks on LW, I expect growth to accelerate enough that we start running up against those limits within this century though.)
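
The arithmetic behind "a few centuries," taking "every few decades" as roughly 30 years (my own round number):

```latex
2^{10} \approx 1000, \qquad 10 \text{ doublings} \times \sim\!30 \text{ years/doubling} \approx 300 \text{ years}.
```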

Comment by paulfchristiano on Solar system colonisation might not be driven by economics · 2020-04-23T03:27:04.851Z · score: 4 (3 votes) · LW · GW

What timescale are you talking about? I guess it's asking "are we going to colonize the solar system before growing a lot here on earth?" I agree that seems pretty unlikely though I'm not sure this is the best argument.

My default expectation would be that humans would be motivated to move to space when we've expanded enough that doing things on earth is getting expensive---we are running out of space, sunlight, material, or whatever else. You don't have to extrapolate growth that far before you start having quite severe crunches, so if growth continues (even at the current rate) then it won't be that long before we are colonizing the solar system.

(Even if people did expand into space before we needed the resources, it wouldn't matter much since they'd be easily overtaken by later colonists.)

Comment by paulfchristiano on Seemingly Popular Covid-19 Model is Obvious Nonsense · 2020-04-13T01:37:49.922Z · score: 26 (8 votes) · LW · GW

From a quick skim of the paper it looks like they effectively assume that implementing any 3 of those social distancing measures at the same time that Wuhan implemented their lockdown would lead to the same number of total deaths (with some adjustments).

This is less aggressive than assuming no new deaths after lockdown, but does seem quite optimistic given that the lockdown in Wuhan seems (much) more severe than school closures + travel restrictions + non-essential business closures. And this part of the model seems to be assumed rather than fit to data.

Comment by paulfchristiano on Three Kinds of Competitiveness · 2020-04-01T01:19:07.370Z · score: 5 (3 votes) · LW · GW

I think our current best implementation of IDA would neither be competitive nor scalably aligned :)

Comment by paulfchristiano on Three Kinds of Competitiveness · 2020-03-31T16:49:23.928Z · score: 4 (2 votes) · LW · GW

In most cases you can continuously trade off performance and cost; for that reason I usually think of them as a single metric of "competitive with X% overhead." I agree there are cases where they come apart, but I think there are pretty few examples. (Even for nuclear weapons you could ask "how much more expensive is it to run a similarly-destructive bombing campaign with conventional explosives.")

I think this works best if you consider a sequence of increments each worth +10%, rather than say accumulating 70 of those increments, because "spend 1000x more" is normally not available and so we don't have a useful handle on what a technology looks like when scaled up 1000x (and that scaleup would usually involve a bunch of changes that are hard to anticipate).

That is, if we have a sequence of technologies A0, A1, A2, ..., AN, each of which is 10% cheaper than the one before, then we may say that AN is better than A0 by N 10% steps (rather than trying to directly evaluate how many orders of magnitude you'd have to spend on A0 to compete with AN, because the process "spend a thousand times more on A0 in a not-stupid way" is actually kind of hard to imagine).
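
For concreteness, the increments compound multiplicatively, which is why a long chain of small steps corresponds to a large absolute factor that is hard to reason about directly:

```latex
\underbrace{1.1 \times 1.1 \times \cdots \times 1.1}_{N\ \text{steps}} \;=\; 1.1^{N},
\qquad 1.1^{70} \approx 790 \;\;(\text{roughly the "1000x" scale mentioned above}).
```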

Comment by paulfchristiano on Three Kinds of Competitiveness · 2020-03-31T16:43:32.877Z · score: 5 (3 votes) · LW · GW

IDA is really aiming to be cost-competitive and performance-competitive, say to within overhead of 10%. That may or may not be possible, but it's the goal.

If the compute required to build and run your reward function is small relative to the compute required to train your model, then it seems like overhead is small. If you can do semi-supervised RL and only require a reward function evaluation on a minority of trajectories (e.g. because most of the work is learning about how to manipulate the environment), then you can be OK as long as the cost of running the reward function isn't too much higher.

Whether that's possible is a big open question. Whether it's date competitive depends on how fast you figure out how to do it.
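
A back-of-envelope version of the overhead reasoning above (my notation, ignoring lots of real-world detail):

```latex
\text{overhead} \;\approx\; \frac{f \cdot c_{\text{reward}}}{c_{\text{train}}},
```

where f is the fraction of trajectories that need a reward-function evaluation, c_reward is the cost of one evaluation, and c_train is the training compute per trajectory; semi-supervised RL is a way of driving f down.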

Comment by paulfchristiano on What are the most plausible "AI Safety warning shot" scenarios? · 2020-03-27T16:00:02.205Z · score: 20 (12 votes) · LW · GW

I think "makes 50% of currently-skeptical people change their minds" is a high bar for a warning shot. On that definition e.g. COVID-19 will probably not be a warning shot for existential risk from pandemics. I do think it is plausible that AI warning shots won't be much better than pandemic warning shots. (On your definition it seems likely that there won't ever again be a warning shot for any existential risk.)

For a more normal bar, I expect plenty of AI systems to fail at large scales in ways that seem like "malice," and then to cover up the fact that they've failed. AI employees will embezzle funds, AI assistants will threaten and manipulate their users, AI soldiers will desert. Events like this will make it clear to most people that there is a serious problem, which plenty of people will be working on in order to make AI useful. The base rate will remain low but there will be periodic high-profile blow-ups.

I don't expect this kind of total unity of AI motivations you are imagining, where all of them want to take over the world (so that the only case where you see something frightening is a failed bid to take over the world). That seems pretty unlikely to me, though it's conceivable (maybe 10-20%?) and may be an important risk scenario. I think it's much more likely that we stamp out all of the other failures gradually, and are left with only the patient+treacherous failures, and in that case whether it's a warning shot or not depends entirely on how much people are willing to generalize.

I do think the situation in the AI community will be radically different after observing these kinds of warning shots, even if we don't observe an AI literally taking over a country.

There is a very narrow range of AI capability between "too stupid to do significant damage of the sort that would scare people" and "too smart to fail at takeover if it tried."

Why do you think this is true? Do you think it's true of humans? I think it's plausible if you require "take over a country" but not if you require e.g. "kill plenty of people" or "scare people who hear about it a lot."

(This is all focused on intent alignment warning shots. I expect there will also be other scary consequences of AI that get people's attention, but the argument in your post seemed to be just about intent alignment failures.)

Comment by paulfchristiano on March Coronavirus Open Thread · 2020-03-12T02:36:09.213Z · score: 22 (11 votes) · LW · GW

Disclaimer: I don't know if this is right, I'm reasoning entirely from first principles.

If there is dispersion in R0, then there would likely be some places where the virus survives even if you take draconian measures. If you later relax those draconian measures, it will begin spreading in the larger population again at the same rate as before.

In particular, if the number of cases is currently decreasing overall most places, then soon most of the cases will be in regions or communities where containment was less successful and so the number of cases will stop decreasing.

If it's infeasible to literally stamp it out everywhere (which I've heard), then you basically want to either delay long enough to have a vaccine or have people get sick at the largest rate that the health care system can handle.

Comment by paulfchristiano on Writeup: Progress on AI Safety via Debate · 2020-02-20T02:37:43.449Z · score: 6 (3 votes) · LW · GW

The intuitive idea is to share activations as well as weights, i.e. to have two heads (or more realistically one head consulted twice) on top of the same model. There is a fair amount of uncertainty about this kind of "detail" but I think for now it's smaller than the fundamental uncertainty about whether anything in this vague direction will work.

Comment by paulfchristiano on On the falsifiability of hypercomputation, part 2: finite input streams · 2020-02-17T20:27:51.761Z · score: 6 (3 votes) · LW · GW

It's an interesting coincidence that arbitration is the strongest thing we can falsify, and also apparently the strongest thing that can consistently apply to itself (if we allow probabilistic arbitration). Maybe not a coincidence?

Comment by paulfchristiano on On the falsifiability of hypercomputation, part 2: finite input streams · 2020-02-17T20:27:35.185Z · score: 8 (4 votes) · LW · GW

It's not obvious to me that "consistent with PA" is the right standard for falsification though. It seems like simplicity considerations might lead you to adopt a stronger theory, and that this might allow for some weaker probabilistic version of falsification for things beyond arbitration. After all, how did we get induction anyway?

(Do we need induction, or could we think of falsification as being relative to some weaker theory?)

(Maybe this is just advocating for epistemic norms other than falsification though. It seems like the above move would be analogous to saying: the hypothesis that X is a halting oracle is really simple and explains the data, so we'll go with it even though it's not falsifiable.)

Comment by paulfchristiano on Open & Welcome Thread - February 2020 · 2020-02-05T17:35:38.882Z · score: 9 (4 votes) · LW · GW

tl;dr: seems like you need some story for what values a group highly regards / rewards. If those are just the values that serve the group, this doesn't sound very distinct from "groups try to enforce norms which benefit the group, e.g. public goods provision" + "those norms are partially successful, though people additionally misrepresent the extent to which they e.g. contribute to public goods."

Similarly, larger countries do not have higher ODA as the public goods model predicts

Calling this the "public goods model" still seems backwards. "Larger countries have higher ODA" is a prediction of "the point of ODA is to satisfy the donor's consequentialist altruistic preferences."

The "public goods model" is an attempt to model the kind of moral norms / rhetoric / pressures / etc. that seem non-consequentialist. It suggests that such norms function in part to coordinate the provision of public goods, rather than as a direct expression of individual altruistic preferences. (Individual altruistic preferences will sometimes be why something is a public good.)

This system probably evolved to "solve" local problems like local public goods and fairness within the local community, but has been co-opted by larger-scale moral memeplexes.

I agree that there are likely to be failures of this system (viewed teleologically as a mechanism for public goods provision or conflict resolution) and that "moral norms are reliably oriented towards provide public goods" is less good than "moral norms are vaguely oriented towards providing public goods." Overall the situation seems similar to a teleological view of humans.

For example if global anti-poverty suddenly becomes much more cost effective, one doesn't vote or donate to spend more on global poverty, because the budget allocated to that faction hasn't changed.

I agree with this, but it seems orthogonal to the "public goods model," this is just about how people or groups aggregate across different values. I think it's pretty obvious in the case of imperfectly-coordinated groups (who can't make commitments to have their resource shares change as beliefs about relative efficacy change), and I think it also seems right in the case of imperfectly-internally-coordinated people.

(We have preference alteration because preference falsification is cognitively costly, and we have preference falsification because preference alteration is costly in terms of physical resources.)

Relevant links: if we can't lie to others, we will lie to ourselves, the monkey and the machine.

E.g., people overcompensate for private deviations from moral norms by putting lots of effort into public signaling including punishing norm violators and non-punishers, causing even more preference alteration and falsification by others.

I don't immediately see why this would be "compensation," it seems like public signaling of virtue would always be a good idea regardless of your private behavior. Indeed, it probably becomes a better idea as your private behavior is more virtuous (in economics you'd only call the behavior "signaling" to the extent that this is true).

As a general point, I think calling this "signaling" is kind of misleading. For example, when I follow the law, in part I'm "signaling" that I'm law-abiding, but to a significant extent I'm also just responding to incentives to follow the law which are imposed because other people want me to follow the law. That kind of thing is not normally called signaling. I think many of the places you are currently saying "virtue signaling" have significant non-signaling components.

Comment by paulfchristiano on Moral public goods · 2020-02-02T08:51:09.114Z · score: 2 (1 votes) · LW · GW
That reminds me that another prediction your model makes is that larger countries should spend more on ODA (which BTW excludes military aid), but this is false

The consideration in this post would help explain why smaller countries spend more than you would expect on a naive view (where ODA just satisfies the impartial preferences of the voting population in a simple consequentialist way). It seems like there is some confusion here, but I still don't feel like it's very important.

I think there was an (additional?) earlier miscommunication or error regarding the "factions within someone's brain":

  • When talking about the weight of altruistic preferences, I (like you) am generally more into models like "X% of my resources are controlled by an altruistic faction" rather than "I have X exchange rate between my welfare and the welfare of others." (For a given individual at a given time we can move between these freely, so it doesn't matter for any of the discussion in the OP.)
  • When I say that "resources controlled by altruistic factions" doesn't explain everything, I mean that you still need to have some additional hypothesis like "donations are like contributions to public goods." I don't think those two hypotheses are substitutes, and you probably need both (or some other alternative to "donations are like contributions to public goods," like some fleshed out version of "nothing is altruistic after all" which seems to be your preference but which I'm withholding judgment on until it's fleshed out.)
  • In the OP, I agree that "and especially their compromises between altruistic and selfish ends" was either wrong or unclear. I really meant the kind of tension that I described in the immediately following bullet point, where people appear to make very different tradeoffs between altruistic and selfish values in different contexts.
Comment by paulfchristiano on High-precision claims may be refuted without being replaced with other high-precision claims · 2020-01-31T02:24:34.911Z · score: 38 (13 votes) · LW · GW

It seems like there is a real phenomenon in computers and proofs (and some other brittle systems), where they are predicated on long sequences of precise relationships and so quickly break down as the relationships become slightly less true. But this situation seems rare in most domains.

If there's a single exception to conservation of energy, then a high percentage of modern physics theories completely break. The single exception may be sufficient to, for example, create perpetual motion machines. Physics, then, makes a very high-precision claim that energy is conserved, and a refuter of this claim need not supply an alternative physics.

I don't know what "break" means; these theories still give good predictions in everyday cases, and it would be a silly reason to throw them out unless weird cases became common enough. You'd end up with something like "well we think these theories work in the places we are using them, and will keep doing so until we get a theory that works better in practice" rather than "this is a candidate for the laws governing nature." But that's just what most people have already done with nearly everything they call a "theory."

Physics is a weird example because it's one of the only domains where we could hope to have a theory in the precise sense you are talking about. But even e.g. the standard model isn't such a theory! Maybe in practice "theories" are restricted to mathematics and computer science? (Not coincidentally, these are domains where the word "theory" isn't traditionally used.)

In particular, theories are also responsible for a negligible fraction of high-precision knowledge. My claim that there's an apple because I'm looking at an apple is fairly high-precision. Most people get there without having anything like an exceptionless "theory" explaining the relationship between the appearance of an apple and the actual presence of an apple. You could try and build up some exceptionless theories that can yield these kinds of judgments, but it will take you quite some time.

I'm personally happy never using the word "theory," not knowing what it means. But my broader concern is that there are a bunch of ways that people (including you) arrive at truth, that in the context of those mechanisms it's very frequently correct to say things like "well it's the best we have" of an explicit model that makes predictions, and that there are relatively few cases of "well it's the best we have" where the kind of reasoning in this post would move you from "incorrectly accept" to "correctly reject." (I don't know if you have an example in mind.)

(ETA: maybe by "theory" you mean something just like "energy is conserved"? But in these cases the alternative is obvious, namely "energy is often conserved," and it doesn't seem like that's a move anyone would question after having exhibited a counterexample. E.g. most people don't question "people often choose the option they prefer" as an improvement over "people always choose the option they prefer." Likewise, I think most people would accept "there isn't an apple on the table" as a reasonable alternative to "there is an apple on the table," though they might reasonably ask for a different explanation for their observations.)

Comment by paulfchristiano on Moral public goods · 2020-01-31T01:51:05.098Z · score: 2 (1 votes) · LW · GW
Looking at https://data.oecd.org/oda/net-oda.htm it appears that foreign aid as %GNI for DAC countries has actually gone down since 1960, and I don't see any correlation with any of the (non-enforced) agreements signed about the 0.7% target. It just looks like countries do ODA for reasons completely unrelated to the target/agreements.

Do you think the story is different for the climate change agreements? I guess the temporal trend is different, but I think the actual causal story from agreements to outcomes is equally unclear (I don't think the agreements have much causal role) and enforcement seems similarly non-existent.

Information goods are a form of public goods, and they are provided (in large part) because governments enforce copyrights.

Copyright enforcement seems more like trade. I will require my citizens to pay you for your information, if you require your citizens to pay me for my information.

You can analogize copyright enforcement to a public good if you want, but the actual dynamics of provision and cost-benefit analyses seem quite different. For example, signing up to a bilateral copyright agreement is a good deal between peer states (if copyright agreements ever are)---you've protected your citizens copyright to the same extent you've lost the ability to infringe on others. The same is not true of a public good, where bilateral agreement is almost the same as unilateral action.

At any rate, I actually don't think almost anything from the OP hinges on this disagreement (though it seems like an instructive difference in background views about the international order). We are just debating whether the lack of international agreements on foreign aid implies that people don't much care about the humanitarian impacts of aid, with me claiming that international coordination is generally weak with rare exceptions and so it's not much evidence.

There is plenty of other evidence though. E.g. when considering the US you don't really need to invoke international agreements. The US represents >20% of gross world product, so US unilateral action is nearly as good as international action. US government aid is mostly military aid which has no real pretension of humanitarian motivation, and I assume US private transfers to developing countries are mostly remittances. So I certainly agree that people in the US don't care to spend to very much on aid.

Comment by paulfchristiano on Moral public goods · 2020-01-30T20:44:52.206Z · score: 2 (1 votes) · LW · GW

I don't know much about this, but oecd.org describes a 0.7% target for ODA and claims that

the 0.7% target served as a reference for 2005 political commitments to increase ODA from the EU, the G8 Gleneagles Summit and the UN World Summit

and

DAC members generally accepted the 0.7% target for ODA, at least as a long-term objective, with some notable exceptions: Switzerland – not a member of the United Nations until 2002 – did not adopt the target, and the United States stated that it did not subscribe to specific targets or timetables, although it supported the more general aims of the Resolution.

At face value it seems like 0.7%/year is considerably larger than the investments in any of the other efforts at international coordination you mention (and uptake seems comparable).

(The Montreal Protocol seems like a weird case in that the gains are so large---I've been told that the gains were large enough for the US that unilateral participation was basically justifiable. Copyright agreements don't seem like public goods provision. I don't think countries are meeting their Paris agreement obligations any better than they are meeting their 0.7%/year ODA targets, and enforcement seems just as non-existent.)

Comment by paulfchristiano on Moral public goods · 2020-01-30T20:15:55.003Z · score: 2 (1 votes) · LW · GW
Assuming I now have a correct understanding, I can restate my objection as, if anti-poverty is a public good, why hasn't it followed the trend of other public goods, and shifted from informal private provision to formal government or internationally-coordinated provision?

Most redistribution is provided formally by governments and it may be the single most common topic of political debate. I'm not even sure this is evidence one way or the other though---why would you expect people not to signal virtue by advocating for policies? (Isn't that a key part of your story?)

Relatedly, how does "we don't want the government to enforce X so that we can signal our virtue by doing X" even work? Advocating for "make everyone do X" signals the same kind of virtue as doing X, advocating against seems to send the opposite signal, and surely the signaling considerations are just as dominant for advocacy as for the object-level decision? I think I often can't really engage with the virtue signaling account because I don't understand it at the level of precision that would be needed to actually make a prediction about anything.

Domestically, are you asking: "why do people donate so much more to charity than to other public goods"? I don't think any of the competing theories really say much about that until we get way more specific about them and what makes a situation good for signaling virtue vs. what makes public goods easy to coordinate about in various ways vs. etc. (and also get way more into the quantitative data about other apparent public goods which are supported by donations).

(Overall this doesn't seem like a particularly useful line of discussion to me so I'm likely to drop it. Most useful for me would probably be a description of the virtue signaling account that makes sense to me.)

Comment by paulfchristiano on Moral public goods · 2020-01-30T16:54:23.004Z · score: 2 (1 votes) · LW · GW

I don't think Scott is talking about the bay area in that quote, is he?

(ETA: also if his estimate is per year then I think it's similar to the report you quoted, which estimates $700M/year to provide shelter to all of the homeless at a cost of ~$25k/person/year, so that seems like another plausible source of discrepancy.)

Comment by paulfchristiano on Moral public goods · 2020-01-30T16:41:33.572Z · score: 2 (1 votes) · LW · GW

For the nobles the ratio is only 1000 (= the total number of nobles). In e.g. the modern US the multiples are much higher since the tax base is much larger. That is, there is a gap of > a million between the levels of altruism at which you would prefer higher taxes vs. actually give away some money.

Comment by paulfchristiano on Moral public goods · 2020-01-29T04:03:25.865Z · score: 5 (2 votes) · LW · GW
What does it not explain, that your model explains?

If you literally just mean to compare to "people have a fixed fraction of their budget they spend on altruistic things":

  • Rhetoric about doing your part, feelings of guilt. In general the structural similarities between discourse and norms around alms on the one hand and public goods on the other.
  • If the bucket served by US donations is "caring about US needy" then I think you have to explain people's apparent relative ambivalence between political advocacy for more redistribution and direct donations.
  • I think that local giving makes more sense as part of a story about collective provision of public goods, though I haven't thought it through much and this may just be responding to rhetoric and so double-counting the first observation.

I haven't thought about it that much, but my basic sense is that you are going to have to invoke a virtue signaling explanation for lots of behaviors, and that's going to start to look more similar to norms for providing public goods. E.g. is your view that normal public goods (like funding a local park) are provided because of virtue signaling? If so, then it's not as clear there is much gap between our views and maybe this is more an argument about some accounts of "virtue signaling."

Comment by paulfchristiano on Moral public goods · 2020-01-29T03:49:40.136Z · score: 2 (1 votes) · LW · GW
I don't think such a comparison would make sense, since different public goods have different room for funding. For example the World Bank has a bigger budget than the WHO, but development/anti-poverty has a lot more room for funding (or less diminishing returns) than preventing global pandemics.

Do you have some example of a public good that you are using to calibrate your expectations about international spending on typical public goods?

I don't think it's enough to say: people do a tiny amount of X but they don't coordinate explicitly. You should also provide some evidence about the overall ability to coordinate.

(That said, I also agree that most of what's going on, for explaining the difference between real aid budgets and what a utilitarian would spend, is that people don't care very much.)

Comment by paulfchristiano on Moral public goods · 2020-01-28T18:12:35.278Z · score: 4 (3 votes) · LW · GW
I feel like you've come up with an example where people are just barely charitable enough that they support redistribution, but not charitable enough that they would ever give a gift themselves. This is a counterexample to Friedman's claim, but it's not obvious that it's real.

For consequentialists, the gap between "charitable enough to give" and "charitable enough to support redistribution" seems to be more than a million-fold; if so, I don't think it warrants that "just barely" modifier.

Comment by paulfchristiano on Hedonic asymmetries · 2020-01-28T18:10:47.249Z · score: 3 (2 votes) · LW · GW

I think this part of the reversed argument is wrong:

The agent will randomly seek behaviours that get rewarded, but as long as these behaviours are reasonably rare (and are not that bad) then that’s not too costly

Even if the behaviors are very rare and have a "normal" reward, the agent will still seek them out and so miss out on actually good states.

Comment by paulfchristiano on Moral public goods · 2020-01-27T16:35:51.029Z · score: 2 (1 votes) · LW · GW
Why do we see so little of that for global poverty?

I'm not convinced this is the case. Do you have some comparisons of international spending on different public goods, or lobbying for such spending?

(I agree that there is more international coordination on arms control, but don't think that this is analogous.)

Comment by paulfchristiano on Hedonic asymmetries · 2020-01-27T16:34:14.453Z · score: 3 (2 votes) · LW · GW
For example, if the world is symmetric in the appropriate sense in terms of what actions get you rewarded or penalized, and you maximize expected utility instead of satisficing in some way, then the argument is wrong. I'm sure there is good literature on how to model evolution as a player, and the modeling of the environment shouldn't be difficult.

I would think it would hold even in that case; why is it clearly wrong?

Comment by paulfchristiano on Moral public goods · 2020-01-27T07:10:04.942Z · score: 5 (3 votes) · LW · GW
Where does most of the income of rich people come from, then?

I think it's mostly wages.

Can you point me to some relevant resource?

Might be misreading, but see table III here (h/t Howie Lempel for the source). Looks like even the top 0.01% is still <50% capital income.

[Edit: though in the past capital shares were higher; in 1929 it gets up to 50% of income for the top 0.1%.]

There are various ways this data isn't exactly what you want, but I still think it's very unlikely that it's more than half capital income.

Comment by paulfchristiano on Moral public goods · 2020-01-26T17:56:31.274Z · score: 5 (3 votes) · LW · GW
'Redistribution' (ie. theft) is an exercise in pointlessness.

Using coercive force to fund public goods is also 'theft', but still it can end up with near-unanimous support. So I don't think that this is a good argument in and of itself.

As long as there is scarcity there will be haves and have nots, and wealth will accumulate as a natural function of time and successful strategies. You can reset the game board as often as you like but you can never ensure a permanent and even stalemate. Even assuming you could destroy the entire point of competing, well then you've destroyed everything you get from that too.

This post isn't really about leveling the playing field. (Even in the stupid example with nobles, the nobles still end up 1000x richer than the peasants.)

Comment by paulfchristiano on Moral public goods · 2020-01-26T17:48:04.224Z · score: 4 (2 votes) · LW · GW
I don't understand the new model that you're proposing here. If people want to see a world free from extreme poverty and view that as a public good provision problem, shouldn't they advocate for or work towards international coordination on achieving that? (Given international coordination on other issues, this clearly isn't an impossible goal.) Why are they unilaterally chipping in small amounts towards reducing poverty in a piecemeal fashion?

This seems to be how people relate to local public goods.

I've been modeling this as people having moral uncertainty which are modeled as factions within someone's brain, where the altruist faction has control over some (typically small) fraction of their budget.

I think that's a better model than fixed weights for different values, but I don't think it explains everything.