GPT-3, belief, and consistency

post by skybrian · 2020-08-16T23:12:10.659Z · LW · GW · 7 comments


I've seen a few people debating what GPT-3 understands, and how this compares to human understanding. I think there's an easier and more fruitful question to consider: what does it believe?

It seems like it doesn't believe anything, or alternatively, it believes everything. Asking the question is a category error, like asking what a library believes, or what the Internet believes. But let's go with that metaphor for a bit, because it seems interesting to think about.

For a library, contradictions don't matter. A library can contain two books by different authors saying opposite things, and that's okay since they are just being stored. Maybe it's better to think of GPT-3 as a large, interestingly-organized memory than as an agent? But like human memory, it's lossy, and can mix up stuff from different sources, sometimes in creative ways.

How does GPT-3 resolve inconsistency? If the Internet is very consistent about something, like the words to Jabberwocky, then GPT-3 will be consistent as well. If there were two different versions of Jabberwocky that started the same, diverged at a certain point, and were equally popular in the corpus, then it would probably choose between them randomly at the point of divergence, assuming randomization (a nonzero sampling temperature) is turned on at all.
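As a rough illustration of that last point, here is a minimal sketch (in Python, not anything resembling GPT-3's actual implementation) of how sampling with a temperature turns two equally well-supported continuations into a coin flip, while a temperature of zero always picks whichever one happens to score highest. The continuation strings and scores are made up for the example.

```python
import math
import random

def sample_continuation(scores, temperature=1.0):
    """Pick a continuation from a dict of {text: score}, softmax-style."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring option.
        return max(scores, key=scores.get)
    # Softmax with temperature, then sample proportionally.
    weights = {text: math.exp(score / temperature) for text, score in scores.items()}
    threshold = random.random() * sum(weights.values())
    for text, weight in weights.items():
        threshold -= weight
        if threshold <= 0:
            return text
    return text  # fallback for floating-point edge cases

# Two hypothetical, roughly equally popular continuations after a shared prefix.
scores = {
    "Did gyre and gimble in the wabe;": 3.01,
    "Did gyre and gimble in the wabe:": 3.00,
}

print(sample_continuation(scores, temperature=1.0))  # roughly 50/50 between the two
print(sample_continuation(scores, temperature=0.0))  # always the highest-scoring one
```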

Sometimes, GPT-3 can choose between beliefs based on style. Suppose that grade-school science material is written in one style and flat-earth rants are written in a different style. It wouldn't be surprising if GPT-3 appeared to have different beliefs about the shape of the earth based on which style of text it's completing. Or, if it can recognize an author's style, it might seem to have different beliefs based on which author it's pretending to be.

If GPT-3 can play chess, it's due to online consensus about how to play chess. If we had two different chess-like games using similar notation then it might get them confused, unless the context could be used to distinguish them.

If base-10 and base-8 arithmetic were equally common in the corpus then I don't think it could do arithmetic very well either, though again, maybe it can distinguish them from context. But if it doesn't know the context, it would just guess randomly.
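To make the ambiguity concrete: the same digit string names different quantities in different bases, so a prompt like "15 + 7 =" has no single right completion without knowing the base. A small, self-contained illustration (the specific prompt is just an example):

```python
# The same digit string means different numbers in different bases.
for base in (10, 8):
    a, b = int("15", base), int("7", base)
    total = a + b
    # Render the sum back in the same base it was asked in.
    rendered = format(total, "o") if base == 8 else str(total)
    print(f"base {base}: 15 + 7 = {rendered}")

# Output:
# base 10: 15 + 7 = 22
# base 8: 15 + 7 = 24
```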

Of course, contradictions are everywhere. We compartmentalize. None of us are logic robots that halt when we find a contradiction. However, contradictions often bother us and we try to iron them out. Wikipedia contributors try to resolve inconsistencies through research or debate, or by giving up, saying there's no consensus, and documenting the controversy.

If you consider a search engine and Wikipedia together as an algorithm for answering questions, you wouldn't expect it to resolve inconsistency by returning one version of an article 40% of the time and the other 60% of the time, or by serving up different versions of the same article based on stylistic differences in how you ask the question. You might have to resolve inconsistency yourself, but with static documents that have distinct titles and URLs, it's easier to see what you have.

GPT-3's ways of resolving inconsistency happen to work pretty well for some kinds of art and entertainment, but they're not what we expect of a factual reference, or even of a consistent fictional world.

This suggests some possible areas of research. What are smarter ways to resolve inconsistency, and how can we get machine learning systems to use them? Is there some way to use machine learning to notice inconsistency in Wikipedia?
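As one very rough sketch of what the second question could look like in practice: run sentence pairs from related articles through an off-the-shelf natural language inference (NLI) model and flag pairs it scores as contradictions. The model choice (roberta-large-mnli) and the example sentences below are illustrative assumptions, not a tested recipe, and Wikipedia-scale use would need much more care about which sentence pairs are worth comparing.

```python
# Sketch: flag possible contradictions between sentences using an
# off-the-shelf NLI model. The model name is an illustrative choice.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def contradiction_score(premise: str, hypothesis: str) -> float:
    """Probability (per the NLI model) that `hypothesis` contradicts `premise`."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    # Look up which output index the model uses for its "contradiction" label.
    label_to_index = {label.lower(): i for i, label in model.config.id2label.items()}
    return probs[label_to_index["contradiction"]].item()

# Hypothetical sentences from two articles describing the same event.
score = contradiction_score(
    "The treaty was signed in 1848.",
    "The treaty was signed in 1852.",
)
print(f"contradiction probability: {score:.2f}")
```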

In the meantime, I would guess that for factual use, we will need to resolve inconsistencies ourselves and feed our machines a relatively consistent corpus. Feeding Wikipedia articles to the machine means that the most glaring inconsistencies have been ironed out in advance, which is why GPT-3 can sometimes answer factual questions correctly.

But if your interest is in fiction or in making interesting forgeries, maybe you don't care about this?

7 comments


comment by gbear605 · 2020-08-17T00:04:40.450Z · LW(p) · GW(p)

This issue seems to be relevant for humans too:

If base-10 and base-8 arithmetic were equally common in the corpus then I don't think it could do arithmetic very well either, though again, maybe it can distinguish them from context. But if it doesn't know the context, it would just guess randomly.

If we were in this world, then humans would be in the same spot. If there's no context to distinguish between the two types of arithmetic, they'd have to choose randomly or rely on some outside knowledge (which could theoretically be learned from the internet). Similarly, if we had two variants of chess that are the same until the endgame, humans would have to decide in advance which version they're playing.

Humans certainly aren't perfectly repeatable either: if you ask a person a question, the way they respond will probably differ from how they'd respond to the same question the next day.

Despite that, we have a lot more knowledge about the way the world is structured than even GPT-3 does, so none of these are issues for us.

Replies from: skybrian
comment by skybrian · 2020-08-17T00:52:50.162Z · LW(p) · GW(p)

It's not quite the same, because if you're confused and you notice you're confused, you can ask. "Is this in American or European date format?" For GPT-3 to do the same, you might need to give it some specific examples of resolving ambiguity this way, and it might only do so when imitating certain styles.

It doesn't seem as good as a more built-in preference for noticing and wanting to resolve inconsistency? Choosing based on context is built in using attention, and choosing randomly is built in as part of the text generator.

It's also worth noting that the GPT-3 world is the corpus, and a web corpus is an inconsistent place.

Replies from: shminux
comment by Shmi (shminux) · 2020-08-17T02:32:49.305Z · LW(p) · GW(p)
It's not quite the same, because if you're confused and you notice you're confused, you can ask.

You can if you do, but most people never notice, and those who notice some confusion are still blissfully ignorant of the rest of their self-contradicting beliefs. And by most people I mean you, me and everyone else. In fact, if someone pointed out a contradictory belief in something we hold dear, we would vehemently deny the contradiction and rationalize it to no end. And yet we consider ourselves to believe something. If anything, GPT-3's beliefs are more belief-like than those of humans.

Replies from: skybrian
comment by skybrian · 2020-08-17T03:21:58.289Z · LW(p) · GW(p)

Yes, sometimes we don't notice. We miss a lot. But there are also ordinary clarifications like "did I hear you correctly" and "what did you mean by that?" Noticing that you didn't understand something isn't rare. If we didn't notice when something seems absurd, jokes wouldn't work.

comment by ChristianKl · 2020-08-18T10:52:35.338Z · LW(p) · GW(p)

Or, if it can recognize an author's style, it might seem to have different beliefs based on which author it's pretending to be.

This seems to be a testable empirical claim. I don't think we should have strong views about whether or not GPT-3 acts this way without actually testing it.

Replies from: gwern
comment by gwern · 2020-08-18T17:29:19.449Z · LW(p) · GW(p)

What would be a test? Pulling up a dialogue with Thomas Jefferson and asking his views on the growth of the American federal government?

comment by Gordon Seidoh Worley (gworley) · 2020-08-18T01:34:57.079Z · LW(p) · GW(p)

The other side of this is to ask: what do humans believe? As in, what are the mechanisms going on that we then categorize as constructing beliefs?

At a certain level I think if we took a fresh look at beliefs, we'd see humans and GPT are doing similar things, albeit with different optimization pressures. But on another level, as you point out by addressing the question of resolving inconsistency, GPT seems to lack the sort of self-referential quality that humans have, except insofar as GPT, say, is fed articles about GPT.