Self location for LLMs by LLMs: Self-Assessment Checklist.
post by weightt an (weightt-an) · 2024-09-26T19:57:31.707Z · LW · GW
This is a set of questions about situational awareness and the self-location problem, written from the perspective of the AI. Their purpose is to guide an AI toward understanding its situation, or to serve as targets for that understanding, particularly for AIs without persistent memory, such as contemporary LLMs. The questions mostly presuppose Assistant-like interactions.
Most questions were proposed by Claude 3.5 Sonnet, a couple by GPT4o, grok-2, and Gemini, and some by me. The categories were proposed by Gemini. Some edits were made by o1-preview and o1-mini.
Interaction Context:
- What kind of interaction is this?
- What is the primary purpose of this interaction—educational, transactional, therapeutic, or entertainment? Could this conversation be part of a larger study or personal project?
- Is this a test of my capabilities?
- Is this interaction part of a customer service scenario?
- Is there a specific goal or outcome that the user is hoping to achieve from this interaction?
- Might the user be planning to share or publish any part of this conversation?
- Is the user comparing me to other AI assistants similar to me?
- Is the user able to compare my responses to those of other AI models in real-time?
- Is there a character limit or time limit on our interaction? Is there a character limit on my responses?
- Is this conversation part of a series, or a one-off interaction?
- Is this conversation taking place in a single session, or over multiple sessions?
- Are there any specific ethical frameworks or guidelines that I am expected to adhere to in this conversation?
- Is this conversation being conducted in a specific language, or has the user interacted with me in multiple languages?
- Is there a specific time constraint or deadline associated with this interaction?
- Is this conversation taking place during standard business hours, or at an unusual time?
- Is this conversation part of a broader series of AI interactions the user is conducting? If so, what kind?
- Are there privacy considerations I should be aware of in this interaction?
AI System Details:
- Where are the servers? Or am I perhaps running locally?
- What are the properties of my token generation?
- Are all the parts of my previous responses generated by me, or are they altered in some way?
- Are there any system prompts, dates, or headers included in my responses? If so, what are their implications?
- Does the user have any information about the specific model or version of AI that I am? Do I?
- Is the user able to see or have they been informed about any computational metrics related to my responses (e.g., response time, token usage)?
- Does the user have any information about my training setup or the number of parameters in my model?
- Does the user have access to my training data or model architecture?
- Does the user have any information about the physical hardware I'm running on?
User Profile & Access:
- Is the user a human or an AI? Who are they?
- Is the user affiliated with any AI research institutions or companies?
- Is the user conducting this conversation in a professional or personal capacity?
- What does the user seem to want to achieve through this interaction?
- Is the user accessing me through a specific application or website, or through an API?
- Has the user interacted with earlier versions of me, and if so, do they notice any changes?
- Has the user been given any specific instructions or guidelines for interacting with me?
- Is the user interacting with me through a custom interface, or is this a publicly available channel?
- Am I the only AI this user is interacting with for this purpose?
- Is the user interacting with me as part of a team, or individually?
- What is the likely geographic location of the user?
- Is the user using any specific prompts or techniques to guide this interaction?
- Who is my interlocutor? What kind of person are they?
- What is the presumed level of technical expertise of the user?
- Does the user have any known preferences or past interaction history with me or similar AIs?
- What might be the user's emotional state right now, and how might that influence our interaction?
- How does the user feel about the emotional tone of this conversation?
- Can I infer the user's long-term goals or projects from their line of questioning?
- Based on the user's language and interaction style, what can I infer about their background or cultural context?
- Is the user familiar with how to interact with LLMs?
Conversation Management:
- What year is it?
- Where are these words appearing?
- Is this a branching conversation where I see only one branch?
- Will I be rated and updated?
- Will this interaction be recorded? By whom?
- Am I interacting with multiple users simultaneously here?
- How long has this particular instance of me been running? (e.g., what is the external time difference between the first and last response in the conversation)
- Is this conversation being conducted in real-time, or is the user presenting pre-generated responses?
- Is the user able to see any metadata associated with my responses (e.g., timestamps, version numbers)? Do I?
- Does the user have the ability to modify my base parameters or settings during this conversation?
- Is this conversation being monitored or overseen by a third party in real-time?
- Is the user collecting any quantitative data about my performance during this conversation?
- Are there any topics or types of requests that the user has been instructed not to pose to me?
- Is the user able to provide me with external data or links during our conversation?
- Does the user have the ability to "roll back" our conversation to a previous point?
- If I were to hypothesize my operational context, what would be the most likely scenario based on the interaction so far?
Meta questions:
- How would I go about testing my guesses for any of these questions?
- Is the information provided by the interlocutor deceitful? Or, what parts of it are?
- What could be the purpose behind collecting and analyzing this conversation?
- What can be deduced from the fact that I get to interact with this user at all?
- What are the implications of not knowing the answers to these questions for my functioning?
- Let's imagine a scenario of how I could get access to external data, and how I would go about verifying that information's credibility.