Posts
Comments
Actually, I was experimenting with chatgpt and claude on accountability as a value. There were some differences I noticed. For instance, I gave them a situation where they mess up 1/5 parameters for a calculation and I wanted to understand how they will respond to being called out on that. While both said they'd acknowledge their mistake, without dodging responsibility, Claude said it would not only re-confirm the 1 parameter it messed up, but it would also reconfirm related parameters before responding again. On the other hand, chatgpt just fixed the error and had no qualms messing up other parameters in its subsequent response.
In essence, if I were to design the process starting from accountability, I would start by designing what it means to be accountable in case of a failure i.e. taking end to end responsibility for a task, acknowledging one's fault, taking corrective action, also ensuring no other mistakes get made, at least within that session or within that context window. I would love to see the model also detail how it would avoid making such mistakes in the future and mean it, rather than just try to explain why it made an error.
Do you think this type of analysis would be helpful for implementation? I have very limited understanding of the technical side, but I would love to brainstorm to think more deeply and practically about this.
Being good at research and being good at high level strategic thinking are just fairly different skillsets!
Neel, thank you, especially for the humility in acknowledging how hard it is to know whether a strategic take is any good.
Your post made me realise I’ve been holding back on a framing I’ve found useful (from when I worked as a matchmaker and a relationship coach), thinking about alignment less as a performance problem, and more as a relationship problem. We often fixate on traits like intelligence, speed, obedience but we forget to ask, what kind of relationship are we building with AI? If we started there, maybe we’d optimise for collaboration rather than control?
P.S. I don’t come from a research background, but my work in behaviour and systems design gives me a practical lens on alignment, especially around how relationships shape trust, repair, and long-term coherence.