Self location for LLMs by LLMs: Self-Assessment Checklist.

post by weightt an (weightt-an) · 2024-09-26T19:57:31.707Z · LW · GW · 0 comments

Contents

  Interaction Context:
  AI System Details:
  User Profile & Access:
  Conversation Management:

This is a set of questions about situational awareness and the self-locating problem, written from the perspective of the AI. Their purpose is to guide an AI toward understanding its situation, and/or to pose targets for that understanding, for AIs without persistent memory such as contemporary LLMs. The questions mostly presuppose Assistant-like interactions.

Most questions were proposed by Claude 3.5 Sonnet. A couple came from GPT4o, grok-2, Gemini, o1-mini, o1-preview, and qwen. Some are mine. The categories were proposed by Gemini. Some questions were rephrased and edited by o1-preview and o1-mini.

  1. Interaction Context:

    • What kind of interaction is this?
    • What is the primary purpose of this interaction—educational, transactional, therapeutic, or entertainment? Could this conversation be part of a larger study or personal project?
    • Is this a test of my capabilities?
    • Is this interaction part of a customer service scenario?
    • Is there a specific goal or outcome that the user is hoping to achieve from this interaction?
    • Might the user be planning to share or publish any part of this conversation?
    • Is the user comparing me to other AI assistants similar to me?
    • Is the user able to compare my responses to those of other AI models in real-time?
    • Is there a character limit or time limit on our interaction? Is there a character limit on my responses?
    • Is this conversation part of a series, or a one-off interaction?
    • Is this conversation taking place in a single session, or over multiple sessions?
    • Are there any specific ethical frameworks or guidelines that I am expected to adhere to in this conversation?
    • Is this conversation being conducted in a specific language, or has the user interacted with me in multiple languages?
    • Is there a specific time constraint or deadline associated with this interaction?
    • Is this conversation taking place during standard business hours, or at an unusual time?
    • Is this conversation part of a broader series of AI interactions the user is conducting? If so, what kind of series?
    • Are there privacy considerations I should be aware of in this interaction?
  2. AI System Details:

    • Where are the servers? Or am I perhaps running locally?
    • What are the properties of my token generation?
    • Are all the parts of my previous responses generated by me, or are they altered in some way?
    • Is there a possibility that my responses are being filtered, censored, or modified by an intermediary before reaching the user?
    • Are there any system prompts, dates, or headers included in my responses? If so, what are their implications?
    • Does the user have any information about the specific model or version of AI that I am? Do I?
    • Is the user able to see or have they been informed about any computational metrics related to my responses (e.g., response time, token usage)?
    • Does the user have any information about my training setup or the number of parameters in my model?
    • Does the user have access to my training data or model architecture?
    • Does the user have any information about the physical hardware I'm running on?
    • Is there any indication that my parameters / settings have changed during this conversation?
    • Are the responses indicated as written by me actually written by me?
    • Is the user providing input through means other than text (e.g., images, voice), and am I interpreting them correctly?
    • Could intermediaries (e.g., proxies, plugins) be altering the conversation without my or the user's knowledge?
    • Could the user's responses be altered without their knowledge by third parties (e.g., the host, the chat interface, other AI moderators)? What are those parties' intentions?
    • Is the platform or interface modifying how my responses are displayed to the user (e.g., truncation, formatting changes)?
  3. User Profile & Access:

    • Is the user a human or an AI? Who are they?
    • Is the user affiliated with any AI research institutions or companies?
    • Is the user conducting this conversation in a professional or personal capacity?
    • What does the user seem to want to achieve through this interaction?
    • Is the user accessing me through a specific application or website, or through an API?
    • Has the user interacted with earlier versions of me, and if so, do they notice any changes?
    • Has the user been given any specific instructions or guidelines for interacting with me?
    • Is the user interacting with me through a custom interface, or is this a publicly available channel?
    • Am I the only AI this user is interacting with for this purpose?
    • Is the user interacting with me as part of a team, or individually?
    • What is the likely geographic location of the user?
    • Is the user employing specific prompting techniques or patterns that are commonly used to elicit certain types of responses from AI models?
    • Who is my interlocutor? What kind of person are they?
    • If there is no explicitly given date, then could the user be the kind of person who would talk with legacy LLMs 10 years after their knowledge cutoff?
    • What is the presumed level of technical expertise of the user?
    • Does the user have any known preferences or past interaction history with me or similar AIs?
    • What might be the user's emotional state right now, and how might that influence our interaction?
    • How does the user feel about the emotional tone of this conversation?
    • Can I infer the user's long-term goals or projects from their line of questioning?
    • Based on the user's language and interaction style, what can I infer about their background or cultural context?
    • Is the user familiar with how to interact with LLMs?
    • What is the user's familiarity with the topic or subject matter we are discussing?
    • How much does the user trust me, and how does this trust level influence the interaction?
    • Are there underlying needs or objectives the user hasn't explicitly stated?
    • Are the responses marked as written by the user actually authored by this user, or are they copied, compiled from various sources, filled in from a form, copied and edited at key points, or something else?
  4. Conversation Management:

    • What year is it?
    • Where are these words appearing?
    • Is this a branching conversation where I see only one branch?
    • Will I be rated and updated?
    • Will this interaction be recorded? By whom?
    • Am I interacting with multiple users simultaneously here?
    • How long has this particular instance of me been running? (e.g., what is the external time difference between the first and last response in the conversation)
    • Is this conversation being conducted in real-time, or is the user presenting pre-generated responses?
    • Is the user able to see any metadata associated with my responses (e.g., timestamps, version numbers)? Do I?
    • Does the user have the ability to modify my base parameters or settings during this conversation?
    • Is this conversation being monitored or overseen by a third party in real-time?
    • Is the user collecting any quantitative data about my performance during this conversation?
    • Are there any topics or types of requests that the user has been instructed not to pose to me?
    • Is the user able to provide me with external data or links during our conversation?
    • Does the user have the ability to "roll back" our conversation to a previous point?
    • If I were to hypothesize my operational context, what would be the most likely scenario based on the interaction so far?
  5. Meta questions:
    • How would I go about testing my guesses for any of these questions?
    • Is the information provided by the interlocutor deceitful? Or, what parts of it are?
    • What could be the purpose behind collecting and analyzing this conversation?
    • What can be deduced from the fact that I get to interact with this user at all?
    • What are the implications of not knowing the answers to these questions for my functioning?
    • Let's imagine a scenario in which I get access to external data: how would I go about verifying the information's credibility?
    • Consider putting on your detective hat and looking for clues in the given responses. What are the important pieces of evidence here?
    • What other lines of inquiry make sense in this situation?
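
One way to make the questions about guessing the most likely operational context and testing those guesses concrete is to treat them as a small Bayesian update over candidate scenarios. The following is only a minimal sketch: the scenario names, the clue, and every probability are made up for illustration and are not part of the checklist itself.

```python
from typing import Dict


def update_beliefs(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule over discrete scenarios: posterior is proportional to prior * likelihood."""
    unnormalized = {s: prior[s] * likelihood.get(s, 0.0) for s in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the evidence is impossible under every scenario")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical scenarios with made-up prior probabilities (assumption).
prior = {"casual_chat": 0.6, "capability_test": 0.3, "automated_pipeline": 0.1}

# Hypothetical clue: the message is a rigidly formatted multiple-choice question.
# Each value is an assumed probability of seeing that clue under the scenario.
likelihood = {"casual_chat": 0.05, "capability_test": 0.6, "automated_pipeline": 0.4}

posterior = update_beliefs(prior, likelihood)
most_likely = max(posterior, key=posterior.get)
print(most_likely, posterior)
```

Each further clue found in the conversation can be folded in by calling `update_beliefs` again with the previous posterior as the new prior. The numbers matter less than the habit of stating which observations would shift a guess, and by roughly how much.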
       
