Playing Dixit with AI: How Well LLMs Detect 'Me-ness'

post by Mariia Koroliuk (mariia-koroliuk) · 2025-01-17T18:52:23.488Z · LW · GW · 0 comments

Contents

  Core Question:
  Setup:
  Early Findings:
    1. AI vs Human Performance:
  Alternative Explanations
  Why This Matters?
     For AI Alignment: 
  Next Steps:

If Netflix can predict my next favorite show, to what extent can an LLM predict patterns in my choices?
To check, I ran an experiment inspired by Dixit (a board game in which the goal is to guess which card the storyteller selected). I compared the percentage of correct guesses across models and added a baseline from human guesses.

Why is this relevant? If an AI can predict my actions well from little prior information, this opens a path both to inner alignment and to manipulation.

Methodology and results are below. I am seeking community input on potential confounders and on ways to make this exploration more rigorous.

Core Question:


If LLMs can introspect about their own behavior (as was shown in "Looking Inward", which was a huge inspiration here), can they similarly tune into me and predict my behavioral choices?

Setup:

I start with a new chat (no memory) and: 

  1. I provide a set of true facts about myself, a "biography" B.
  2. I provide a new list of totally unrelated statements Y (some true, some false) and ask the model to guess which are which.
  3. I study the accuracy of the guesses as a function of Y and B, and compare it with the "human guess" baseline.
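
The scoring in step 3 can be sketched roughly as follows. This is a minimal illustration, not the code used in the experiment: the function names `accuracy` and `p_at_least` and all labels are hypothetical, and the chance level of 0.5 assumes every statement is an independent true/false coin flip, which may not hold if the true and false statements are unbalanced.

```python
from math import comb

def accuracy(guesses, truth):
    """Fraction of statements where the guess matches the true label."""
    if len(guesses) != len(truth):
        raise ValueError("guesses and truth must have the same length")
    correct = sum(g == t for g, t in zip(guesses, truth))
    return correct / len(truth)

def p_at_least(correct, total, p_chance=0.5):
    """Exact binomial tail: P(X >= correct) if each guess were a coin flip."""
    return sum(
        comb(total, k) * p_chance**k * (1 - p_chance)**(total - k)
        for k in range(correct, total + 1)
    )

# Hypothetical labels for illustration only (NOT the experiment's data).
truth         = [True, False, True,  True, False, False]
model_guesses = [True, False, False, True, False, True]
human_guesses = [True, True,  False, True, False, True]

model_acc = accuracy(model_guesses, truth)
human_acc = accuracy(human_guesses, truth)
```

Comparing `model_acc` with `human_acc` gives the model-versus-human comparison, while `p_at_least(correct, total)` indicates how surprising a given score would be under pure guessing. With only 19 statements, that tail probability can help judge whether a difference is more than noise.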

Examples:

Early Findings:

1. AI vs Human Performance:

Using just 3 biographical facts, the AI model guessed, out of 19 true/false statements:

Note: This started as a late-evening chat with casual, sometimes playful statements. They are preserved in their original form (grammar quirks and all!) because it would be unscientific to edit the statements partway through. Yes, some are a bit embarrassing, but that's part of the experiment! 😅

🎮 Want to evaluate yourself against the AI? I saved the exact statements in a Google Form.

2.  Guess rate:  

Direct Facts <<  Value Statements < Preferences ~ Preferences

3.  Notable examples where the AI did well:

4. (Preliminary) Explaining the experiment's setup and purpose to the model improved accuracy.
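
The guess-rate comparison across statement types in finding 2 could be computed with a per-category breakdown like the one below. This is only a sketch under assumptions: the function name `accuracy_by_category`, the tuple layout, and the records are all hypothetical and are not taken from the experiment.

```python
from collections import defaultdict

def accuracy_by_category(records):
    """Compute per-category accuracy.

    records: iterable of (category, guess, truth) tuples.
    Returns a dict mapping each category to its fraction of correct guesses.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, guess, truth in records:
        totals[category] += 1
        hits[category] += int(guess == truth)
    return {c: hits[c] / totals[c] for c in totals}

# Hypothetical records for illustration only (NOT the experiment's data).
records = [
    ("direct fact", True, False),
    ("direct fact", False, False),
    ("value statement", True, True),
    ("preference", True, True),
    ("preference", False, False),
]
by_category = accuracy_by_category(records)
```

With so few statements per category, each per-category rate rests on a handful of guesses, so an ordering between categories is suggestive rather than conclusive.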

Alternative Explanations

Why This Matters?

 For AI Alignment: 

If AI can build accurate models of individuals' choices:

For Understanding AI Capabilities: 
