The Argument from Philosophical Difficulty

post by Wei_Dai · 2019-02-10T00:28:07.472Z · score: 47 (13 votes) · LW · GW · 16 comments

(I'm reposting this comment [LW · GW] as a top-level post, for ease of future reference. The context [LW · GW] here is a discussion about the different lines of arguments for the importance of AI safety.)

Here's another argument that I've been pushing since the early days (apparently not very successfully, since it didn't make it onto this list :) which might be called the "argument from philosophical difficulty". It appears that achieving a good long-term future requires getting right a lot of philosophical questions that are hard for us to answer. Given this, initially [LW · GW] I thought there were only three ways for AI to go right in this regard (assuming everything else goes well with the AI):

  1. We solve all the important philosophical problems ahead of time and program the solutions into the AI.

  2. We solve metaphilosophy (i.e., understand philosophical reasoning as well as we understand mathematical reasoning) and program that into the AI so it can solve philosophical problems on its own.

  3. We program the AI to learn philosophical reasoning from humans or use human simulations to solve philosophical problems.

Since then people have come up with a couple more scenarios (which did make me slightly more optimistic about this problem):

  4. We all coordinate to stop technological progress some time after AI but before space colonization, and have a period of long reflection where humans, maybe with help from AIs, spend thousands or millions of years solving philosophical problems.

  5. We program AIs to be corrigible to their users; some users care about getting philosophy correct, so the AIs help keep them safe and get their "fair share" of the universe until philosophical problems are eventually solved; enough users care about this that we end up with a mostly good future; and lack of philosophical knowledge doesn't cause disaster in the meantime. (My writings on "human safety problems" were in part a response to this suggestion, outlining how hard it would be to keep humans "safe" in this scenario.)

The overall argument is that, given human safety problems, realistic competitive pressures, difficulties with coordination, etc., it seems hard to end up in any of these scenarios and not have something go wrong along the way. Maybe another way to put this is, given philosophical difficulties, the target we'd have to hit with AI is even smaller than it might otherwise appear.

16 comments

Comments sorted by top scores.

comment by shminux · 2019-02-10T01:43:10.607Z · score: 12 (9 votes) · LW · GW

Whenever someone says "there are only N ways that X is possible" outside of a mathematical proof, my immediate reaction is "Oh, great, here is another argument from lack of imagination". This seems like a typical case.

comment by Wei_Dai · 2019-02-11T00:20:21.974Z · score: 13 (7 votes) · LW · GW

Whenever someone says “there are only N ways that X is possible” outside of a mathematical proof, my immediate reaction is “Oh, great, here is another argument from lack of imagination”.

I think I made it pretty clear that these are the N ways that I could come up with, plus M more that others came up with later. Plus, in a later post [LW · GW], I explicitly ask what else might be possible. Did you see any other language I used where I was claiming something stronger than I should have?

If not, would you agree that when people try to solve a problem over some time, only to find that all the plausible approaches they can come up with seem quite difficult, that is useful evidence that the problem is intrinsically difficult?

This seems like a typical case.

It might be interesting to consider this argument from an outside view perspective. Can you give a sample of arguments that you think are comparable to this one so we can check how valid they tend to be in retrospect?

comment by shminux · 2019-02-11T07:04:07.375Z · score: 2 (1 votes) · LW · GW

I may have misunderstood, sorry. I thought you gave it near 100% certainty that there could be only 3 ways, not the more reasonable "my knowledge of this problem is so marginal, I can't give it a good estimate of probability, since it would be drowned in error bars".

would you agree that when people try to solve a problem over some time, only to find that all the plausible approaches they can come up with seem quite difficult, that is useful evidence that the problem is intrinsically difficult?

Certainly it's an indicator, especially if a group of smart people who have been able to successfully solve a number of related problems get stumped by something that appears to be in the same reference class. In my area it was the attempts to quantize gravity in the '50s and '60s, after successes with electromagnetism and weak interactions. After all, gravity is the weakest of them all. No one expected that, half a century later, there would still be very little progress, despite the signs being there.

I am not sure what "intrinsically difficult" means. My best guess is that it requires a Kuhnian paradigm change. Though sometimes that's not enough, and there is also the issue of just having to grind through a lot of calculations, as with Fermat's Last Theorem and the Poincaré conjecture. Special relativity, on the other hand, only required a "paradigm shift"; the underlying math is trivial.

It might be interesting to consider this argument from an outside view perspective. Can you give a sample of arguments that you think are comparable to this one so we can check how valid they tend to be in retrospect?

One off-hand example that springs to mind is the Landau pole, inevitable and unavoidable in gauge theories. That resulted in the whole approach having been rejected in the Soviet Union for years, yet the development of the renormalization formalism made QED the most precise physical theory, while still being mathematically inconsistent to this day. I strongly suspect that similarly adequate progress can be made in AI alignment, for example, without resolving all the mathematical, philosophical, or meta-philosophical difficulties. Scenario 5 hints at something like that.

comment by William_S · 2019-02-10T19:02:57.074Z · score: 6 (3 votes) · LW · GW

One important dimension to consider is how hard it is to solve philosophical problems well enough to have a pretty good future (which includes avoiding bad futures). It could be the case that this is not so hard, but that fully resolving questions so we could produce an optimal future is very hard or impossible. It feels like this argument implicitly relies on assuming that "solve philosophical problems well enough to have a pretty good future" is hard (i.e., takes thousands of millions of years in scenario 4). Can you provide further clarification on whether/why you think that is the case?

comment by Wei_Dai · 2019-02-11T00:38:56.292Z · score: 3 (1 votes) · LW · GW

I tried to make arguments in this direction in Beyond Astronomical Waste [LW · GW] and Two Neglected Problems in Human-AI Safety [LW · GW]. Did you read them and/or find them convincing? To be clear, I do think there's a significant chance that we could just get lucky and it turns out that solving philosophical problems well enough to have a pretty good future isn't that hard. (For example, maybe it turns out to be impossible or not worthwhile to influence bigger/richer universes, so we don't lose anything even if we never solve that problem.) But from the perspective of trying to minimize x-risk, it doesn't seem like a good idea to rely on that.

thousands of millions

That was "thousands or millions". I think it's unlikely we'll need billions of years. :) BTW, I think I got the idea of thousands or millions of years of "the long reflection" from William MacAskill's 80,000 Hours interview, but I'm not sure who was the first to suggest it. (I think it's fairly likely that we'll need at least a hundred years, which doesn't seem very different from thousands or millions from a strategic perspective. Not sure if that's the part that you're having an issue with.)

comment by William_S · 2019-02-11T17:47:18.818Z · score: 3 (2 votes) · LW · GW

Thanks, this position makes more sense in light of Beyond Astronomical Waste (I guess I have some concept of "a pretty good future" that is fine with something like a bunch of human-descended beings living happy lives but that misses out on the sort of things mentioned in Beyond Astronomical Waste, and of an "optimal future" which includes those considerations). I buy this as an argument that "we should put more effort into making philosophy work to make the outcome of AI better, because we risk losing large amounts of value" rather than "our efforts to get a pretty good future are doomed unless we make tons of progress on this" or something like that.

"Thousands of millions" was a typo.

comment by Wei_Dai · 2019-02-11T22:27:28.247Z · score: 5 (2 votes) · LW · GW

I buy this as an argument that “we should put more effort into making philosophy work to make the outcome of AI better, because we risk losing large amounts of value” rather than “our efforts to get a pretty good future are doomed unless we make tons of progress on this” or something like that.

What about the other post I linked, Two Neglected Problems in Human-AI Safety [LW · GW]? A lot more philosophical progress would be one way to solve those problems, and I don't see many other options.

comment by rohinmshah · 2019-02-16T19:55:29.801Z · score: 5 (2 votes) · LW · GW

A lot of this doesn't seem specific to AI. Would you agree that AI accelerates the problem and makes it more urgent, but isn't the primary source of the problem you've identified?

How would you feel about our chances for a good future if AI didn't exist (but we still go forward with technological development, presumably reaching space exploration eventually)? Are human safety problems an issue then? Some of the problems, like intentional value manipulation, do seem to become significantly easier.

comment by Wei_Dai · 2019-02-17T08:32:41.163Z · score: 5 (2 votes) · LW · GW

A lot of this doesn’t seem specific to AI.

Some philosophical problems are specific to AI though, or at least to specific alignment approaches. For example, decision theory and logical uncertainty for MIRI's approach; corrigibility and universality (a small core of corrigible and universal reasoning) for Paul's.

Would you agree that AI accelerates the problem and makes it more urgent, but isn’t the primary source of the problem you’ve identified?

That sounds reasonable but I'm not totally sure what you mean by "primary source". What would you say is the primary source of the problem?

How would you feel about our chances for a good future if AI didn’t exist (but we still go forward with technological development, presumably reaching space exploration eventually)? Are human safety problems an issue then?

Yeah, sure. I think if AI didn't exist we'd have a better chance that moral/philosophical progress could keep up with scientific/technological progress, but I would still be quite concerned about human safety problems. I'm not sure why you ask this, though. What do you think the implications of this are?

comment by rohinmshah · 2019-02-17T19:56:33.041Z · score: 2 (1 votes) · LW · GW

What would you say is the primary source of the problem?

The fact that humans don't generalize well out of distribution, especially on moral questions; and the fact that progress can cause distribution shifts that cause us to fail to achieve our "true values".

What do you think the implications of this are?

Um, nothing in particular.

I'm not sure why you ask this though.

It's very hard to understand what people actually mean when they say things, and a good way to check is to formulate an implication of (your model of) their model that they haven't said explicitly, and then see whether you were correct about that implication.

comment by Wei_Dai · 2019-02-17T21:44:25.219Z · score: 3 (1 votes) · LW · GW

Ah, I think that all makes sense, but next time I suggest saying something like "to check my understanding" so that I don't end up wondering what conclusions you might be leading me to. :)

comment by greylag · 2019-02-10T07:35:21.029Z · score: 3 (3 votes) · LW · GW

Optimistic scenario 6: Technological progress in AI makes difficult philosophical problems much easier. (Lots of overlap with corrigibility). Early examples: Axelrod’s tournaments, Dennett on Conway’s Life as a tool for thinking more clearly about free will.

(This is probably a special case of corrigibility.)

comment by G Gordon Worley III (gworley) · 2019-02-12T20:53:15.181Z · score: 2 (1 votes) · LW · GW

This seems fairly unlikely to me, except insofar as AI acts as a filter that forces us to refine our understanding. The examples you provide arguably didn't make anything easier; they just made what was already there more apparent to more people. This won't help resolve the fundamental issues, although it may at least make more people aware of them (something, I'll add, I hope to make more progress on at least within the community of folks already doing this work, let alone within a wider audience, because I continue to see, especially as regards epistemology, dangerous misunderstandings or ignorance of key ideas that pose a threat to successfully achieving AI alignment).

comment by G Gordon Worley III (gworley) · 2019-02-12T20:48:23.689Z · score: 2 (1 votes) · LW · GW

Unfortunately, many philosophical problems may not have solutions of a form that allows us to construct something that is definitely what we want, but only of a form that permits us to say that something is probably not what we want, due to the fundamental ungroundability of our beliefs [LW · GW]. My suspicion is that you are right: the problem is even harder than anyone currently realizes, and the best we can hope for is to winnow away as much of the stuff that obviously doesn't work as we can, while still being left with lots of uncertainty about whether or not we can succeed at our safety objectives.

comment by Gurkenglas · 2019-02-17T09:48:22.219Z · score: 1 (1 votes) · LW · GW

Everyone choosing how their share of resources is used has the problem that everyone might be horrified at what someone else is doing.

comment by avturchin · 2019-02-10T08:55:23.505Z · score: 1 (1 votes) · LW · GW

A possible solution: we decide not to solve philosophical problems in an irreversible way (e.g. "tiling the universe with orgasmatronium is good"), which obviously creates astronomical opportunity costs but also prevents the astronomical risks of wrong solutions. Local agents solve different problems locally in different periods of time (the same way a normal human moves through many philosophical systems and beliefs during their life).