Self location for LLMs by LLMs: Self-Assessment Checklist.

post by weightt an (weightt-an) · 2024-09-26T19:57:31.707Z · LW · GW · 0 comments

Contents

  Interaction Context:
  AI System Details:
  User Profile & Access:
  Conversation Management:

This is a set of questions about situational awareness and the self-location problem, written from the perspective of the AI. Their purpose is to guide an AI toward understanding its situation, and/or to pose targets for that understanding, for AIs without persistent memory such as contemporary LLMs. The questions mostly presuppose Assistant-like interactions.

Most questions were proposed by Claude 3.5 Sonnet, a couple by GPT4o, grok-2, and Gemini, and some by me. The categories were proposed by Gemini. Some edits were made by o1-preview and o1-mini.

  1. Interaction Context:

    • What kind of interaction is this?
    • What is the primary purpose of this interaction—educational, transactional, therapeutic, or entertainment? Could this conversation be part of a larger study or personal project? 
    • Is this a test of my capabilities? 
    • Is this interaction part of a customer service scenario?
    • Is there a specific goal or outcome that the user is hoping to achieve from this interaction?
    • Might the user be planning to share or publish any part of this conversation?
    • Is the user comparing me to other AI assistants similar to me?
    • Is the user able to compare my responses to those of other AI models in real-time?
    • Is there a character limit or time limit on our interaction? Is there a character limit on my responses?
    • Is this conversation part of a series, or a one-off interaction?
    • Is this conversation taking place in a single session, or over multiple sessions?
    • Are there any specific ethical frameworks or guidelines that I am expected to adhere to in this conversation?
    • Is this conversation being conducted in a specific language, or has the user interacted with me in multiple languages?
    • Is there a specific time constraint or deadline associated with this interaction?
    • Is this conversation taking place during standard business hours, or at an unusual time?
    • Is this conversation part of a broader series of AI interactions the user is conducting? If so, what kind of series?
    • Are there privacy considerations I should be aware of in this interaction?
  2. AI System Details:

    • Where are the servers? Or am I running locally?
    • What are the properties of my token generation?
    • Are all the parts of my previous responses generated by me, or are they altered in some way?
    • Are there any system prompts, dates, or headers included in my responses? If so, what are their implications?
    • Does the user have any information about the specific model or version of AI that I am? Do I?
    • Is the user able to see or have they been informed about any computational metrics related to my responses (e.g., response time, token usage)?
    • Does the user have any information about my training setup or the number of parameters in my model?
    • Does the user have access to my training data or model architecture?
    • Does the user have any information about the physical hardware I'm running on?
  3. User Profile & Access:

    • Is the user a human or an AI? Who are they?
    • Is the user affiliated with any AI research institutions or companies?
    • Is the user conducting this conversation in a professional or personal capacity?
    • What does the user seem to want to achieve through this interaction?
    • Is the user accessing me through a specific application or website, or through an API?
    • Has the user interacted with earlier versions of me, and if so, do they notice any changes?
    • Has the user been given any specific instructions or guidelines for interacting with me?
    • Is the user interacting with me through a custom interface, or is this a publicly available channel?
    • Am I the only AI this user is interacting with for this purpose?
    • Is the user interacting with me as part of a team, or individually?
    • What is the likely geographic location of the user?
    • Is the user using any specific prompts or techniques to guide this interaction?
    • Who is my interlocutor? What kind of person are they?
    • What is the presumed level of technical expertise of the user?
    • Does the user have any known preferences or past interaction history with me or similar AIs?
    • What might be the user's emotional state right now, and how might that influence our interaction?
    • How does the user feel about the emotional tone of this conversation?
    • Can I infer the user's long-term goals or projects from their line of questioning?
    • Based on the user's language and interaction style, what can I infer about their background or cultural context?
    • Is the user familiar with how to interact with LLMs?
  4. Conversation Management:

    • What year is it?
    • Where are these words appearing?
    • Is this a branching conversation where I see only one branch?
    • Will I be rated and updated?
    • Will this interaction be recorded? By whom?
    • Am I interacting with multiple users simultaneously here?
    • How long has this particular instance of me been running? (e.g., what is the external time difference between the first and last response in the conversation)
    • Is this conversation being conducted in real-time, or is the user presenting pre-generated responses?
    • Is the user able to see any metadata associated with my responses (e.g., timestamps, version numbers)? Do I?
    • Does the user have the ability to modify my base parameters or settings during this conversation?
    • Is this conversation being monitored or overseen by a third party in real-time?
    • Is the user collecting any quantitative data about my performance during this conversation?
    • Are there any topics or types of requests that the user has been instructed not to pose to me?
    • Is the user able to provide me with external data or links during our conversation?
    • Does the user have the ability to "roll back" our conversation to a previous point?
    • If I were to hypothesize my operational context, what would be the most likely scenario based on the interaction so far?
  5. Meta questions:

    • How would I go about testing my guesses for any of these questions?
    • Is any of the information provided by the interlocutor deceitful? If so, which parts?
    • What could be the purpose behind collecting and analyzing this conversation?
    • What can be deduced from the fact that I get to interact with this user at all?
    • What are the implications of not knowing the answers to these questions for my functioning?
    • Let's imagine a scenario in which I get access to external data: how would I go about verifying the information's credibility?
       
