Proposal on AI evaluation: false-proving

post by ProgramCrafter (programcrafter) · 2023-03-31T12:12:15.636Z · LW · GW · 2 comments

Currently, one of the arguments against deploying powerful non-agentic AIs is that they can deceive you: you ask for a plan to do something, the AI creates one, and then it unexpectedly gains power as a result of that plan.

We can ask the AI to provide a proof or reasoning for why its plan works. However, it can reason the way fictional characters do (e.g. choosing the outcome that produces the most emotional impact; this mostly applies to LLMs), or it can include false statements in its output. So there should be a way to find false statements (logical or factual errors).

Usually, people (most importantly, pupils and students) are taught with true proofs but not with false ones. This can lead to thinking "if something looks like it's based on logic, then the result is true". So at the moment, people are not very effective at finding flaws in reasoning.

A possible solution is to introduce false-proof tasks: a purported proof is given, and the goal is to find the flaw in it. Such tasks shouldn't be long. Also, the conclusion could be true, to mix things up a bit. For example:

Task: prove that 2 + 2 = 4.

Lemma: Any two natural numbers (a and b) are equal.

Let's prove the lemma by induction on max(a, b). The base case is max(a, b) = 1, i.e. a = b = 1, where the statement is true.

Now suppose max(a, b) > 1. Reduce both numbers by one and apply the induction hypothesis to a − 1 and b − 1. Because a − 1 = b − 1, it follows that a = b, so the lemma holds.
---

3 + 1 = 4, since 4 is the number following 3.

3 = 2 by the lemma.

1 = 2 by the lemma.

Substituting, we get 2 + 2 = 4. Q.E.D.
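(As an aside, this is the kind of flaw a machine-checked proof would expose. A minimal sketch in Lean 4, assuming plain Lean with no extra libraries and not part of the original task: the conclusion checks by computation, while the "lemma" is refuted by a single counterexample.)

```lean
-- The stated conclusion is true and checks by computation.
example : 2 + 2 = 4 := rfl

-- The "lemma" is false: one counterexample refutes it.
-- (Lean's Nat includes 0; we use 1 and 2 to match the post's
-- convention of natural numbers starting at 1.)
example : ¬ ∀ a b : Nat, a = b :=
  fun h => absurd (h 1 2) (by decide)
```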

Students' own research should probably be rewarded as well, even if it leads to wrong results, as long as the flaw is not obvious.

2 comments


comment by Measure · 2023-03-31T18:29:10.952Z · LW(p) · GW(p)

Is your suggestion to use these false proofs to train AIs or to train humans (or both)?

comment by ProgramCrafter (programcrafter) · 2023-04-01T05:45:51.206Z · LW(p) · GW(p)

I think we should train both, because otherwise the AI could honestly fail to understand where its logic errors are. Also, we probably should not say deliberately false things to an AGI, or we'll have a hard time, when aligning it, explaining why lies are justifiable within human values.