Are language models close to superhuman level in philosophy?
post by Roman Leventov · 2022-08-19T04:43:07.504Z
This is a question post.
An alternative way to ask the question in the title of this post: what capacities do philosophers use that language models lack?
Reading OpenAI’s “Self-critiquing models for assisting human evaluators” and Google’s “Language Model Cascades” has led me to the idea that most philosophical methods could be programmed by combining language models in a certain cascade, fine-tuning them on the existing philosophical literature, and applying the right kind of reinforcement pressures to the system. And I seriously wonder whether using the current SoTA language models (e.g., PaLM) in this way would already produce superhuman-level philosophical writing, i.e., writing that human philosophers would find harder to critique than any existing human-written philosophy on a given topic.
Good human philosophers should know a lot of existing arguments on various questions. Language models obviously have an edge over humans here: they can easily “read” all the philosophical writing that exists in the world.
Philosophers use creative arguments, examples, and thought experiments to test and compare their theories. Perhaps AI philosophers can search for such arguments or examples by random sampling, then self-criticise an argument or self-assess the quality of an example for a particular purpose, and then modify or refine it. Even with the current SoTA models, this search will perhaps be closer to brute force than to the intuitive inquiry of a human philosopher, and could take a long time, since the models probably can’t yet understand the structure of philosophical theories well (unless I underestimate them!).
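To make this loop concrete, here is a minimal Python sketch of the sample → self-critique → refine search described above, in the spirit of the cascade idea from those papers. Everything in it is an illustrative assumption on my part, not a description of any existing system: the `generate` stub stands in for whatever LM you plug in, and the prompt wording, scoring, and loop budget are placeholders.

```python
# A minimal sketch of the sample -> self-critique -> refine loop described above.
# `generate` stands in for any text-completion call (PaLM, GPT, etc.); the prompts,
# selection scheme, and loop budget are illustrative assumptions, not a tested recipe.

import random


def generate(prompt: str) -> str:
    """Placeholder for a call to a language model; wire this up to a real API."""
    raise NotImplementedError("Plug in an actual LM before running.")


def propose_argument(topic: str) -> str:
    # Random-sampling step: ask the model for a fresh argument or thought experiment.
    return generate(f"Propose a novel argument or thought experiment about {topic}.")


def critique(argument: str) -> str:
    # Self-criticism step: the same (or another) model attacks the draft.
    return generate(f"List the strongest objections to the following argument:\n{argument}")


def refine(argument: str, objections: str) -> str:
    # Refinement step: revise the draft in light of its own critique.
    return generate(
        "Rewrite the argument below so that it survives these objections.\n"
        f"Argument:\n{argument}\nObjections:\n{objections}"
    )


def search_for_argument(topic: str, rounds: int = 5, candidates: int = 3) -> str:
    # Brute-force-ish search: sample several candidates, keep one (a real system
    # would score them rather than pick at random), then iterate critique/refine.
    best = random.choice([propose_argument(topic) for _ in range(candidates)])
    for _ in range(rounds):
        best = refine(best, critique(best))
    return best
```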
The only philosophical theories and arguments that still seem firmly out of reach of language models are those grounded in scientific theories: e.g., some theories in the philosophy of language or the philosophy of mind and consciousness that build on particular theories in neuroscience, psychology, or anthropology, or theories of consciousness, probability, or agency that build on particular theories in physics. However, philosophical writing of this kind should probably be treated as interpretation of the respective scientific theories rather than as philosophical theory in its own right. In other words, these topics are the areas where the sciences chip away at philosophy, as has gradually happened with all existing scientific knowledge, which in the past belonged to the realm of philosophy. Interpretations of scientific theories are neither purely scientific nor purely philosophical knowledge, but something in between.
Also, most approaches to ethics rely at least in part on people’s moral intuitions, which are themselves at least partially non-linguistic and can’t be derived from language alone, and are thus inaccessible to language models. This raises many interesting questions: is there such a thing as AI-generated ethics? If so, should it be treated as a branch of philosophy separate from “classical” (human-generated) ethics? Could AIs engage in “classical” ethics even in principle, at least until we have whole-brain simulations (which might still not be enough, because humans have gut feelings about things)? Should we ban AIs from engaging in “their” ethics, because this seems like a sure path to misaligned thoughts? If so, how would we do it? That last question is equivalent to the question of how to align AGI, though.
Answers
answer by Dave Orr
The answer is no, and it's not close. PaLM is great, but it's not human-level at long text, much less tightly reasoned long text like philosophy.
Source: am googler, work with LLMs. I can write a much better philosophy paper than any existing LLM and I only took undergrad level philosophy.
In general, I would say that the production of very fluent text creates the illusion of reasoning, and while probing LLMs does turn up evidence of reasoning at some level, the illusion is still much stronger than the reality. Maybe transformers will get there, but we're still very far from superhuman cognition right now.
comment by Roman Leventov · 2022-08-19T16:44:33.208Z
The question is perhaps not so much about LMs as about the nature of philosophy. Is it really much (anything?) beyond language modelling as done by humans (perhaps quite advanced language modelling, beyond the abilities of current LMs, though I have little doubt that in a few years, with some incremental algorithmic improvements and fine-tuning on the right kinds of text, LMs will clear this bar), plus some creativity (covered in the post), plus intuitive moral reasoning?
Regarding reasoning, I also disagree, but I don't want to explain why (I'd rather not throw capabilities ideas around publicly).