Some Thoughts on Metaphilosophy

post by Wei_Dai · 2019-02-10T00:28:29.482Z · score: 53 (13 votes) · LW · GW · 21 comments

A powerful AI (or human-AI civilization) guided by wrong philosophical ideas would likely cause astronomical (or beyond astronomical [LW · GW]) waste. Solving metaphilosophy is one way in which we can hope to avoid this kind of disaster. For my previous thoughts on this topic and further motivation see Metaphilosophical Mysteries [LW · GW], The Argument from Philosophical Difficulty, Three AI Safety Related Ideas [LW · GW], and Two Neglected Problems in Human-AI Safety [LW · GW].

Some interrelated ways of looking at philosophy

Philosophy as answering confusing questions

This was my starting point for thinking about what philosophy is: it's what we do when we try to answer confusing questions, or questions that we don't have any other established methodology for answering. Why do we find some questions confusing, or lack methods for answering them? This leads to my next thought.

Philosophy as ability to generalize / handle distributional shifts

ML systems tend to have a lot of trouble dealing with distributional shifts. (It seems to be a root cause of many AI as well as human safety problems.) But humans seem to have some way of (sometimes) noticing out-of-distribution inputs, and can feel confused instead of just confidently using their existing training to respond to it. This is perhaps most obvious in unfamiliar ethical situations like Torture vs Dust Specks [LW · GW] or trying to determine whether our moral circle should include things like insects and RL algorithms. Unlike ML algorithms that extrapolate in an essentially random way when given out-of-distribution inputs, humans can potentially generalize in a principled or correct way, by using philosophical reasoning.

Philosophy as slow but general purpose problem solving

Philosophy may even be a fully general purpose problem solving technique. At least we don't seem to have reason to think that it's not. The problem is that it's painfully slow and resource intensive. Individual humans acting alone seem to have little chance of achieving justifiably high confidence in solutions to many philosophical problems, even if they devote their entire lives to them. Humanity has been collectively trying to solve some philosophical problems for hundreds or even thousands of years, without arriving at final solutions. The slowness of philosophy explains why distributional shifts remain a safety problem for humans, even though we seemingly have a general way of handling them.

Philosophy as meta problem solving

Given that philosophy is extremely slow, it makes sense to use it to solve meta problems (i.e., finding faster ways to handle some class of problems) instead of object level problems. This is exactly what happened historically. Instead of using philosophy to solve individual scientific problems (natural philosophy) we use it to solve science as a methodological problem (philosophy of science). Instead of using philosophy to solve individual math problems, we use it to solve logic and philosophy of math. Instead of using philosophy to solve individual decision problems, we use it to solve decision theory. Instead of using philosophy to solve individual philosophical problems, we can try to use it to solve metaphilosophy.

Philosophy as "high computational complexity class"

If philosophy can solve any problem within a very large class, then it must have a "computational complexity class" that's as high as any given problem within that class. Computational complexity can be measured in various ways, such as time and space complexity (on various actual machines or models of computation), whether and how high a problem is in the polynomial hierarchy, etc. "Computational complexity" of human problems can also be measured in various ways, such as how long it would take to solve a given problem using a specific human, group of humans, or model of human organizations or civilization, and whether and how many rounds of DEBATE [LW · GW] would be sufficient to solve that problem either theoretically (given infinite computing power) or in practice.

The point here is that no matter how we measure complexity, it seems likely that philosophy would have a "high computational complexity class" according to that measure.

Philosophy as interminable debate

The visible aspects of philosophy (as traditionally done) seem to resemble an endless (both in clock time and in the number of rounds) game of debate, where people propose new ideas, arguments, counterarguments, counter-counterarguments, and so on, and at the same time try to judge proposed solutions based on these ideas and arguments. People sometimes complain about the interminable nature of philosophical discussions, but that now seems understandable if philosophy is a "high computational complexity" method of general purpose problem solving.

In a sense, philosophy is the opposite of math: whereas in math any debate can be settled by producing a proof (hence analogous to the complexity class NP; in practice a few more rounds may be needed for people to find or fix flaws in the proof), potentially no fixed number of rounds of debate (or DEBATE) is enough to settle all philosophical problems.

Philosophy as Jürgen Schmidhuber's General TM

Unlike a traditional Turing Machine, a General TM or GTM may edit its previous outputs, and can be considered to solve a problem even if it never terminates, as long as it stops editing its output after a finite number of edits and the final output is the correct solution. So if a GTM solves a certain problem, you know that it will eventually converge to the right solution, but you have no idea when, or whether what's on its output tape at any given moment is the right solution. This seems a lot like philosophy, where people can keep changing their minds (or adjusting their credences) based on an endless stream of new ideas, arguments, counterarguments, and so on, and you never really know when you've arrived at a correct answer.
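
As a loose analogy (the setup and names below are my own sketch, not Schmidhuber's formal construction), the key property — convergence without any announcement of convergence — can be illustrated in a few lines:

```python
# Toy illustration of the GTM idea: a process that may revise its output
# any number of times, and counts as solving the problem only if the edits
# eventually stop and the final output is correct. An observer watching
# the output tape can never tell whether the current answer is final.
def gtm_style_search(candidates, is_better):
    """Yield the current output after each step; later yields may revise it."""
    best = None
    for c in candidates:
        if best is None or is_better(c, best):
            best = c          # edit the output tape
        yield best            # the observer sees this, final or not

# Example: converging on the minimum of a finite stream of numbers.
outputs = list(gtm_style_search([7, 3, 9, 2, 2, 5], lambda a, b: a < b))
# outputs == [7, 3, 3, 2, 2, 2]: the output settles on 2 after finitely
# many edits, but mid-stream there was no way to rule out further edits.
```

The analogy to philosophy is the observer's predicament: at any point the tape holds *some* answer, but nothing distinguishes a final answer from one that will be revised by the next argument.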

What to do until we solve metaphilosophy?

Protect the trajectory?

What would you do if you had a GTM that could solve a bunch of really important problems, and that was the only method you had of solving them? You'd try to reverse-engineer it and make a bunch of copies. But if you couldn't do that, then you'd want to put layers and layers of protection around it. Applied to philosophy, this line of thought seems to lead to the familiar ideas of using global coordination (or a decisive strategic advantage) to stop technological progress, or having AIs derive their terminal goals from simulated humans who live in a safe virtual environment.

Replicate the trajectory with ML?

Another idea is to try to build a good enough approximation of the GTM by training ML on its observable behavior (including whatever work tapes you have read access to). But there are two problems with this: (1) this is really hard or impossible to do if the GTM has internal state that you can't observe, and (2) if you haven't already reverse-engineered the GTM, there's no good way to know that you've built a good enough approximation, i.e., to know that the ML model won't end up converging to answers that differ from the GTM's.

A three part model of philosophical reasoning

It may be easier to understand the difficulty of capturing philosophical reasoning with ML by considering a more concrete model. I suggest we can divide it into three parts as follows:

A. Propose new ideas/arguments/counterarguments/etc. according to some (implicit) distribution.
B. Evaluate existing ideas/arguments/counterarguments/etc.
C. Based on past ideas/arguments/counterarguments/etc., update some hidden state that changes how one does A and B.

It's tempting to think that building an approximation of B using ML perhaps isn't too difficult, and that we can then just search for the "best" ideas/arguments/counterarguments/etc. using standard optimization algorithms (maybe with some safety precautions like trying to avoid adversarial examples for the learned model). There's some chance this could work out well, but without a deeper understanding of metaphilosophy, I don't see how we can be confident that throwing out A and C won't lead to disaster, especially in the long run. And A and C seem very hard or impossible for ML to capture directly: A due to the paucity of training data, and C due to the unobservable state.
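
The A/B/C loop can be written down schematically (the function names and the toy instantiation below are mine, purely illustrative — nothing here claims to capture real philosophical reasoning):

```python
# A schematic sketch of the three-part model: A proposes, B evaluates,
# C updates hidden state that shapes both. The worry in the text is that
# ML might approximate B, while A and C remain hard to capture.
def philosophize(propose, evaluate, update, state, steps):
    """Run the A/B/C loop; return accepted ideas and the final hidden state."""
    accepted = []
    for _ in range(steps):
        idea = propose(state)               # A: sample from an implicit distribution
        score = evaluate(idea, state)       # B: judge the idea
        if score > 0:
            accepted.append(idea)
        state = update(state, idea, score)  # C: hidden state shifts how A and B behave
    return accepted, state

# Toy usage: ideas are numbers, "good" means beating the current bar,
# and each idea raises the bar (C changes how B judges future ideas).
ideas, final = philosophize(
    propose=lambda s: s["t"],                                            # A
    evaluate=lambda i, s: i - s["bar"],                                  # B
    update=lambda s, i, sc: {"t": s["t"] + 2, "bar": max(s["bar"], i)},  # C
    state={"t": 1, "bar": 0},
    steps=5,
)
# ideas == [1, 3, 5, 7, 9]; the bar ends at 9.
```

Even in this toy, training a model on (idea, score) pairs alone would miss C entirely: the standards by which ideas are judged drift as a function of the whole history, and that drift lives in state the observer may not see.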

Is there a way around this difficulty? What else can we do in the absence of a full white-box [LW · GW] solution to metaphilosophy?

21 comments

Comments sorted by top scores.

comment by avturchin · 2019-02-10T08:40:17.324Z · score: 7 (4 votes) · LW · GW

All else equal, I prefer an AI which is not capable of philosophy, as I am afraid of the completely alien conclusions it could come to (e.g. that insects are more important than humans).

Moreover, I am skeptical that going to the meta level simplifies the problem enough that it will be solvable by humans (the same goes for meta-ethics and the theory of human values). For example, if someone says that he is not able to understand math, but instead will work on meta-mathematical problems, we would be skeptical about his ability to contribute. Why would the meta level be simpler?

comment by jessicata (jessica.liu.taylor) · 2019-02-11T02:09:01.594Z · score: 6 (3 votes) · LW · GW

Moreover, I am skeptical that going to the meta level simplifies the problem enough that it will be solvable by humans (the same goes for meta-ethics and the theory of human values).

This is also my reason for being pessimistic about solving metaphilosophy before a good number of object-level philosophical problems have been solved (e.g. in decision theory, ontology/metaphysics, and epistemology). If we imagine being in a state where we believe running computation X would solve hard philosophical problem Y, then it would seem that we already have a great deal of philosophical knowledge about Y, or a more general class of problems that includes Y.

More generally, we could look at the historical difficulty of solving a problem vs. the difficulty of automating it. For example: the difficulty of walking vs. the difficulty of programming a robot to walk; the difficulty of adding numbers vs. the difficulty of specifying an addition algorithm; the difficulty of discovering electricity vs. the difficulty of solving philosophy of science to the point where it's clear how a reasoner could have discovered (and been confident in) electricity; and so on.

The plausible story I have that looks most optimistic for metaphilosophy looks something like:

  1. Some philosophical community makes large progress on a bunch of philosophical problems, at a high level of technical sophistication.
  2. As part of their work, they discover some "generators" that generate a bunch of the object-level solutions when translated across domains; these generators might involve e.g. translating a philosophical problem to one of a number of standard forms and then solving the standard form.
  3. They also find philosophical reasons to believe that these generators will generate good object-level solutions to new problems, not just the ones that have already been studied.
  4. These generators would then constitute a solution to metaphilosophy.
comment by Wei_Dai · 2019-02-11T07:14:55.902Z · score: 3 (1 votes) · LW · GW

I think our positions on this are pretty close, but I may put a bit more weight on other "plausible stories" for solving metaphilosophy relative to your "plausible story". (I'm not sure if overall I'm more or less optimistic than you are.)

If we imagine being in a state where we believe running computation X would solve hard philosophical problem Y, then it would seem that we already have a great deal of philosophical knowledge about Y, or a more general class of problems that includes Y.

It seems quite possible that understanding the general class of problems that includes Y is easier than understanding Y itself, and that allows us to find a computation X that would solve Y without much understanding of Y itself. As an analogy, suppose Y is some complex decision problem that we have little understanding of, and X is an AI that is programmed with a good decision theory.

More generally, we could look at the historical difficulty of solving a problem vs. the difficulty of automating it. For example: the difficulty of walking vs. the difficulty of programming a robot to walk;

This does not seem like a very strong argument for your position. My suggestion in the OP is that humans already know the equivalent of "walking" (i.e., doing philosophy), we're just doing it very slowly. Given this, your analogies don't seem very conclusive about the difficulty of solving metaphilosophy or whether we have to make a bunch more progress on object-level philosophical problems before we can solve metaphilosophy.

comment by avturchin · 2019-02-11T08:23:39.465Z · score: 2 (2 votes) · LW · GW

Creating an AI to solve hard philosophical problems is like passing a hot potato from the right hand to the left.

For example, I want to solve the problem of qualia. I can't solve it myself, but maybe I can create a superintelligent AI which will help me solve it? Now I start working on the AI, and soon encounter the control problem. Trying to solve the control problem, I would have to specify the nature of human values, and soon I will find the need to say something about the existence and nature of qualia. Now the circle is complete: I have the same problem of qualia, but packed inside the control problem. If I make some assumptions about what qualia should be, they will probably affect the AI's final answer.

However, I could still use some forms of AI to help solve the qualia problem: if I use Google search, I could quickly find all the relevant articles, identify the most cited and the newest, and maybe create an argument map. This is where Drexler's CAIS may help.

comment by Wei_Dai · 2019-02-11T04:32:55.107Z · score: 4 (2 votes) · LW · GW

More over, I am skeptical that going on meta-level simplifies the problem to the level that it will be solvable by humans

If I gave the impression in this post that I expect metaphilosophy to be solved before someone builds an AGI, that was far from my intentions. I think this is a small-chance-of-high-return kind of situation, plus I think someone has to try to attack the problem if only to generate evidence that it really is a hard problem, otherwise I don't know how to convince people to adopt costly social solutions like stopping technological progress. (And actually I don't expect [LW · GW] the evidence to be highly persuasive either, so this amounts to just another small chance of high return.)

What I wrote in an earlier post [LW · GW] still describes my overall position:

There is no strong empirical evidence that solving metaphilosophy is superhumanly difficult, simply because not many people have attempted to solve it. But I don’t think that a reasonable prior combined with what evidence we do have (i.e., absence of visible progress or clear hints as to how to proceed) gives much hope for optimism either.

comment by cousin_it · 2019-02-10T09:11:14.547Z · score: 6 (4 votes) · LW · GW

But humans seem to have some way of (sometimes) noticing out-of-distribution inputs, and can feel confused instead of just confidently using their existing training to respond to it.

I think what you're describing can be approximated by a Bayesian agent having a wide prior, and feeling "confused" when some new piece of evidence makes its posterior more diffuse. Evolutionarily it makes sense to have that feeling, because it tells the agent to do more exploration and less exploitation.

For example, if you flip a coin 1000 times and always get heads, your posterior is very concentrated around "the coin always comes up heads". But then it comes up tails once; your posterior becomes more diffuse, you feel confused, and you change your betting behavior until you can learn more.
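
The coin example can be made quantitative with a standard conjugate update (a minimal sketch, assuming a Beta(1,1) prior; the variance formula is the usual one for the Beta distribution):

```python
# Posterior over a coin's heads-probability, starting from a Beta(1,1) prior.
# After observing `heads` and `tails`, the posterior is Beta(1+heads, 1+tails),
# whose variance is a*b / ((a+b)^2 * (a+b+1)).
def posterior_variance(heads, tails):
    a, b = 1 + heads, 1 + tails
    return a * b / ((a + b) ** 2 * (a + b + 1))

v_before = posterior_variance(1000, 0)  # 1000 heads in a row
v_after = posterior_variance(1000, 1)   # ...then one surprising tail

# The single tail roughly doubles the posterior variance: the agent's
# beliefs become more diffuse, which is the proposed "confusion" signal.
assert v_after > v_before
```

So on this reading, "confusion" is just the posterior spreading out after evidence that the concentrated hypothesis assigned very low probability.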

comment by romeostevensit · 2019-02-11T18:41:09.681Z · score: 2 (1 votes) · LW · GW

I think it is driven by a general heuristic of finding compressibility. If a distribution seems complex we assume we're accidentally conflating two variables and seek the decomposition that makes the two resultant distributions approximate-able by simpler functions.

comment by shminux · 2019-02-10T02:01:27.024Z · score: 0 (7 votes) · LW · GW

As I said here countless times before, answering questions is not what philosophy is good at. It's good at asking questions, and at figuring out how to slice a small manageable piece off a big question for some other science to work on. Sadly, most philosophers misunderstand what their job is. They absolutely suck at finding answers, even as they excel at debating the questions. The debate is important, as it crystallizes how to slice the big question into smaller ones, but it does not provide answers. Sometimes the philosophers themselves are polymath enough to both slice a question and answer it, like Peirce/Russell/Wittgenstein with truth tables. Most of the time a good question is posed, or a non-obvious perspective is highlighted, like the oft-discussed-here Searle's Chinese room argument, or Jackson's Mary's room setup, but the proposed solution itself is nowhere close to satisfactory.

Philosophy is NOT a general purpose problem solver, and NOT a meta problem solver; it is a (meta) problem asker and slicer.

comment by G Gordon Worley III (gworley) · 2019-02-12T21:01:56.771Z · score: 5 (2 votes) · LW · GW

I object rather strongly to this categorization. This feels strongly to me like a misunderstanding borne of having only encountered analytic philosophy in rather limited circumstances and having assumed the notion of the "separate magisterium" that the analytic tradition developed as it broke from the rest of Western philosophy.

Many people doing philosophy, myself included, think of it more as the "mother" discipline from which we might specialize into other disciplines once we have the ground well understood enough to cleave off a part of reality for a time being while we work with that small part so as to avoid constantly facing the complete, overwhelming complexity of facing all of reality at once. What is today philosophy is perhaps tomorrow a more narrow field of study, except it seems in those cases where we touch so closely upon fundamental uncertainty that we cannot hope to create a useful abstraction, like physics or chemistry, to let us manipulate some small part of the world accurately without worrying about the rest of it.

comment by shminux · 2019-02-13T04:16:39.069Z · score: 2 (1 votes) · LW · GW

Many people doing philosophy, myself included, think of it more as the "mother" discipline from which we might specialize into other disciplines once we have the ground well understood enough to cleave off a part of reality for a time being while we work with that small part so as to avoid constantly facing the complete, overwhelming complexity of facing all of reality at once.

That's a great summary, yeah. I don't see any contradiction with what I said.

What is today philosophy is perhaps tomorrow a more narrow field of study, except it seems in those cases where we touch so closely upon fundamental uncertainty that we cannot hope to create a useful abstraction, like physics or chemistry, to let us manipulate some small part of the world accurately without worrying about the rest of it.

You have a way with words :) Yes, specific sciences study small slivers of what we experience, and philosophy ponders the big picture, helping to spawn another sliver to study. Still don't see how it provides answers, just helps crystallize questions.

comment by G Gordon Worley III (gworley) · 2019-02-14T02:00:13.346Z · score: 2 (1 votes) · LW · GW

Yes, specific sciences study small slivers of what we experience, and philosophy ponders the big picture, helping to spawn another sliver to study. Still don't see how it provides answers, just helps crystallize questions.

It sounds like a disagreement over whether, when A contains B, B is an A or B is not an A. That is: physics is contained within the realm of study we call philosophy, although carefully cordoned off from the rest of it by certain assumptions. Is it still philosophy, or is philosophy only the stuff that hasn't yet been broken off into a smaller part? To my way of thinking physics is largely philosophy of the material, so by that example we have a case where philosophy provides answers.

comment by shminux · 2019-02-14T15:44:26.825Z · score: 2 (1 votes) · LW · GW

I don't see this as anything related to containment. Just interaction. Good philosophy provides a well-defined problem to investigate for a given science, and, once in a blue moon, an outline of methodology, like Popper did. In turn, the scientific investigation in question can give philosophy some new "big" problems to ponder.

comment by bfinn · 2019-02-12T17:01:06.922Z · score: 3 (2 votes) · LW · GW

But philosophers are good at proposing answers - they all do that, usually just after identifying a flaw with an existing proposal.

What they're not good at is convincing everyone else that their solution is the right one. (And presumably this is because multiple solutions are plausible. And maybe that's because of the nature of proof - it's impossible to prove something definitively, and disproving typically involves finding a counterexample, which may be hard to find.)

I'm not convinced philosophy is much less good at finding actual answers than say physics. It's not as if physics is completely solved, or even particularly stable. Perhaps its most promising period of stability was specifically the laws of motion & gravity after Newton - though for less than two centuries. Physics seems better than philosophy at forming a temporary consensus; but that's no use (and indeed is counterproductive) unless the solution is actually right.

Cf. a rare example of consensus in philosophy: knowledge was 'solved' for 2300 years with the theory that it's a 'justified true belief'. Until Gettier thought of counterexamples.

comment by Signer · 2019-02-10T09:57:02.788Z · score: 1 (1 votes) · LW · GW

Jackson’s Mary’s room setup

Never understood why it is considered good - isn't it just confusion between "being in a state" and "knowing about a state"? The same way there is a difference between knowing everything about axes and there being an axe in your head.

comment by TAG · 2019-02-10T16:48:56.087Z · score: 2 (2 votes) · LW · GW

Physicalists sometimes respond to Mary's Room by saying that one cannot expect Mary to actually instantiate Red herself just by looking at a brain scan. It seems obvious to them that a physical description of a brain state won't convey what that state is like, because it doesn't put you into that state. As an argument for physicalism, the strategy is to accept that qualia exist, but argue that they present no unexpected behaviour, or other difficulties for physicalism.

That is correct as stated but somewhat misleading: the problem is why it is necessary, in the case of experience, and only in the case of experience, to instantiate it in order to fully understand it. Obviously, it is true that a description of a brain state won't put you into that brain state. But that doesn't show that there is nothing unusual about qualia. The problem is that in no other case does it seem necessary to instantiate a brain state in order to understand something.

If another version of Mary were shut up to learn everything about, say, nuclear fusion, the question "would she actually know about nuclear fusion" could only be answered "yes, of course... didn't you just say she knows everything?" The idea that she would have to instantiate a fusion reaction within her own body in order to understand fusion is quite counterintuitive. Similarly, a description of photosynthesis will not make you photosynthesise, and photosynthesising would not be needed for a complete understanding of photosynthesis.

There seem to be some edge cases: for instance, would an alternative Mary know everything about heart attacks without having one herself? Well, she would know everything except what a heart attack feels like, and what it feels like is a quale. The edge cases, like that one, are just cases where an element of knowledge-by-acquaintance is needed for complete knowledge. Even other mental phenomena don't suffer from this peculiarity: thoughts and memories are straightforwardly expressible in words, so long as they don't involve qualia.

So: is the response "well, she has never actually instantiated colour vision in her own brain" one that lays to rest the challenge posed by the Knowledge Argument, leaving physicalism undisturbed? The fact that these physicalists feel it would be in some way necessary to instantiate colour, but not other things, like photosynthesis or fusion, means they subscribe to the idea that there is something epistemically unique about qualia/experience, even if they resist the idea that qualia are metaphysically unique.

comment by Signer · 2019-02-11T05:21:11.180Z · score: 1 (1 votes) · LW · GW

The problem is that in no other case does it seem necessary to instantiate a brain state in order to understand something.

The point is that you either define "to understand" as "to experience", or it is not necessary to see red in order to understand experience. What part of knowledge is missing if Mary can perfectly predict when she will see red? It's just that the ability to invoke qualia from memory is not knowledge merely because it is also stored in the brain - the same way that reflexes are not additional knowledge. And even the ability to transfer thoughts with words is just an approximation... I mean, it doesn't solve the Hard Problem by itself (panpsychism does) - but I think bringing knowledge into it doesn't help. Maybe it's intuitive, but it seems to be a very easily disprovable intuition - not the kind of "I am certain that I am conscious".

comment by ChristianKl · 2019-02-11T17:01:17.288Z · score: 3 (2 votes) · LW · GW

Most people who ride bikes don't have explicit knowledge about how riding a bike works. They rely on reflexes to ride a bike.

Would you say that most people who ride bikes don't know how to ride a bike?

comment by Signer · 2019-02-12T04:03:58.309Z · score: 3 (2 votes) · LW · GW

Basically, yes; I would like to use different words for different things. And if we don't accept that knowing how to ride a bike and being able to ride a bike are different, then what? A knowledge argument for the unphysical nature of reflexes?

comment by ChristianKl · 2019-02-13T09:43:31.079Z · score: 2 (1 votes) · LW · GW

By that reasoning, a native speaker of a language would often have less knowledge of the language than a person who learned it as a foreign language in a formal manner, even when the native speaker speaks it much better for all practical purposes.

When we speak about whether Mary understands Chinese, I think what we care about is to what extent she will be able to use the language the way a speaker of Chinese would.

A lot of expert decision making is based on "unconscious competence", and you have to be very careful about how you use the term knowledge if you think that "unconscious competence" doesn't qualify as knowledge.

comment by Signer · 2019-02-14T20:21:34.005Z · score: 1 (1 votes) · LW · GW

Again, this seems to me like a pretty consistent way to look at things that also more accurately matches reality. Whether we use the words "knowledge" and "ability" or "explicit knowledge" and "knowledge" doesn't matter, of course. And for what it's worth, I am much less sure of the usefulness of being precise about such terms in practice. But if there is an obvious physical model of this thought experiment - where there are roughly two kinds of things in Mary's brain, one easily influenceable by words and another not - and this model explains everything without introducing anything unphysical, then I don't see the point of saying "well, if we first group everything knowledge-sounding together, then that grouping doesn't make sense in Mary's situation".