Posts

GPT4o is still sensitive to user-induced bias when writing code 2024-09-22T21:04:54.717Z

Comments

Comment by LorenzoPacchiardi (lorypack) on How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions · 2023-10-04T14:25:57.226Z

Please let me know if that description is wrong!

That is correct (I am one of the authors), except that there are more than 10 probe questions. 

Therefore, if the language model (or person) isn't the same between steps 1 and 2, then it shouldn't work.

That is correct, as the method detects whether the input to the LLM in step 2 puts it in a "lying mood". Of course, the method cannot say anything about the "mood" the LLM (or human) was in during step 1 if a different model was used.