Posts

Can you care without feeling? 2025-05-20T08:12:00.177Z
Relational Alignment: Trust, Repair, and the Emotional Work of AI 2025-05-08T02:44:23.338Z
Intuition in AI 2025-04-22T15:15:29.978Z
AI, Alignment & the Art of Relationship Design 2025-04-19T00:47:02.591Z
The Illusion of Transparency as a Trust-Building Mechanism 2025-03-19T17:09:05.830Z

Comments

Comment by Priyanka Bharadwaj (priyanka-bharadwaj) on Interpretability Will Not Reliably Find Deceptive AI · 2025-05-23T00:41:16.958Z · LW · GW

True, though relational deception seems harder to sustain than faking individual outputs. Inconsistencies might surface across long-term interaction patterns, potentially complementing other detection methods.

Comment by Priyanka Bharadwaj (priyanka-bharadwaj) on Interpretability Will Not Reliably Find Deceptive AI · 2025-05-22T08:49:36.819Z · LW · GW

Strongly agree. 

Interpretability is basically neuroscience for AI, studying the system's "brain". That's valuable, but it's fundamentally different from having the relational tools to actually influence behaviour in deployment. Even perfect internal understanding doesn't bridge the gap to reliable intervention, much as a neuroscientist's map of the brain isn't what a clinical psychologist or a close friend draws on when someone is suffering from depression.

I think this points toward complementary approaches that focus less on decoding thoughts and more on designing robust interaction patterns: systems that can remember what matters to us, repair when things go wrong, and build trust through ongoing calibration rather than perfect initial specification.
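
As a rough illustration of that layer (not a concrete proposal), here is a minimal sketch of the kind of state such a system might keep per person: remembered preferences, a trust estimate calibrated from outcomes, and an explicit list of unrepaired mistakes. The names (RelationalState, record_outcome, repair) and the numbers are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class RelationalState:
    """Per-person state an interaction layer might keep (purely illustrative)."""
    preferences: dict[str, str] = field(default_factory=dict)  # what matters to this person
    trust: float = 0.5                                          # running calibration, not a fixed spec
    open_repairs: list[str] = field(default_factory=list)       # mistakes not yet addressed

def record_outcome(state: RelationalState, action: str, went_well: bool) -> None:
    """Adjust trust gradually from observed outcomes instead of assuming a perfect initial specification."""
    state.trust = min(1.0, state.trust + 0.05) if went_well else max(0.0, state.trust - 0.15)
    if not went_well:
        state.open_repairs.append(action)

def repair(state: RelationalState, action: str, resolution: str) -> None:
    """Close out a mistake explicitly and remember what to avoid, rather than silently moving on."""
    if action in state.open_repairs:
        state.open_repairs.remove(action)
        state.preferences[f"avoid:{action}"] = resolution
```

The specific update rule doesn't matter; the point is that trust is something the system earns and loses over time.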

The portfolio approach you describe makes a lot of sense, but I wonder if we're missing a layer that sits between interpretability and black-box methods. Something more like the "close friend" level of influence that comes from relationship context rather than just technical analysis.

Comment by Priyanka Bharadwaj (priyanka-bharadwaj) on Can you care without feeling? · 2025-05-22T08:31:29.942Z · LW · GW

Oh dang, you're right! Forgiveness without forgetting is absolutely essential for relationship resilience. If we want AI relationships that can weather conflict, move past mistakes, and grow stronger, we need to understand what architectures make that resilience possible. That's a whole other post worth exploring, thank you for the nudge!

Comment by Priyanka Bharadwaj (priyanka-bharadwaj) on Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study · 2025-04-25T16:49:17.214Z · LW · GW

Having worked in manufacturing, I really appreciated this piece. It's true that much of the knowledge sits outside databases, often on pen and paper or passed down informally. However, I'd gently push back on the idea that workers are reluctant to document due to fears of automation. While individual concerns may exist, I haven't seen this as coordinated resistance. If anything, there's a real opportunity here: capturing and sharing their expertise could become a way to build leverage, enable re-skilling, and increase their influence over how new tools are developed.

Another thought I had was around the broader framing of AI capabilities. I'm skeptical of the "brain in a box" model where a single system must master every domain. Human cognition isn't monolithic; our bodies evolved decentralised, specialised subsystems over millions of years. I think we should embrace that metaphor for AI as well: a society of minds, with different models handling perception, spatial reasoning, long-term planning, and so on, working in concert across a network. That vision feels both more tractable and more aligned with how intelligence actually manifests in the world.

Comment by Priyanka Bharadwaj (priyanka-bharadwaj) on Intuition in AI · 2025-04-23T02:07:47.056Z · LW · GW

You're right! Modern AI often produces answers through pattern recognition rather than explicit reasoning. I should have been clearer that I was criticising our expectations of AI explanation, not necessarily how these systems actually function.

This actually strengthens my core argument. We're forcing explicit reasoning onto systems that may naturally operate more like intuition. We've privileged shared, verifiable reasoning over individualistic intuitive knowledge. Perhaps we've built AI to reflect how we wish humans reasoned rather than how we actually do. The irony is that we demand explanations from AI that we don't require from human experts.

Comment by Priyanka Bharadwaj (priyanka-bharadwaj) on AI, Alignment & the Art of Relationship Design · 2025-04-22T14:59:05.469Z · LW · GW

I've been exploring exactly that. I am developing what I call a "value alignment protocol": a structured dialogue process where humans and AI together define values through inquiry, testing, and refinement. It treats alignment more like relationship-building than engineering, focusing on how we create shared understanding rather than perfect obedience. The interesting challenge is designing systems that can evolve their ethical frameworks through conversation while building a robust long-term value system.
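
As a rough sketch of what one inquiry-test-refine cycle of such a protocol might look like (the names ValueStatement, inquiry_test_refine, ask_ai and ask_human are hypothetical stand-ins, not an actual implementation):

```python
from dataclasses import dataclass, field

@dataclass
class ValueStatement:
    """A candidate value being negotiated in the dialogue (illustrative structure)."""
    name: str
    description: str
    test_cases: list[str] = field(default_factory=list)
    confirmed: bool = False

def inquiry_test_refine(candidate: ValueStatement, ask_ai, ask_human) -> ValueStatement:
    """One cycle of the dialogue: the AI states how it reads the value,
    the human probes it with a concrete scenario, and the value is either
    confirmed or refined depending on whether the two readings match."""
    ai_reading = ask_ai(f"How would you act on the value '{candidate.name}'? {candidate.description}")
    scenario = ask_human(f"The AI reads it as: {ai_reading}. Describe a situation that tests '{candidate.name}': ")
    candidate.test_cases.append(scenario)
    ai_behaviour = ask_ai(f"In this situation, what would you do? {scenario}")
    if ask_human(f"Does this match your intent? ({ai_behaviour}) [y/n]: ").strip().lower().startswith("y"):
        candidate.confirmed = True
    else:
        candidate.description = ask_human("Refine the value in your own words: ")
    return candidate
```

The point of the loop is that the value's description keeps changing until the human confirms the AI's reading, so the shared understanding is built through iteration rather than specified up front.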

Comment by Priyanka Bharadwaj (priyanka-bharadwaj) on AI, Alignment & the Art of Relationship Design · 2025-04-19T06:31:39.272Z · LW · GW

Actually, I was experimenting with ChatGPT and Claude on accountability as a value, and I noticed some differences. For instance, I gave them a situation where they mess up one of five parameters in a calculation, and I wanted to understand how they would respond to being called out on it. While both said they'd acknowledge their mistake without dodging responsibility, Claude said it would not only re-confirm the one parameter it messed up but would also reconfirm related parameters before responding again. ChatGPT, on the other hand, just fixed the error and had no qualms about messing up other parameters in its subsequent response.

In essence, if I were to design the process starting from accountability, I would begin by defining what it means to be accountable in case of a failure: taking end-to-end responsibility for a task, acknowledging one's fault, taking corrective action, and ensuring no further mistakes are made, at least within that session or context window. I would also love to see the model detail how it would avoid such mistakes in the future, and mean it, rather than just explain why it made the error.
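
A minimal sketch of that failure-handling loop, assuming a hypothetical handle_reported_error helper and a recompute callable; it's meant to illustrate the behaviour I'd want (acknowledge, correct, re-verify everything in scope), not a real system:

```python
def handle_reported_error(parameters: dict, failed_key: str, recompute) -> dict:
    """On being called out: acknowledge the fault, correct the flagged parameter,
    then re-verify every other parameter before responding again, rather than
    silently patching only the one that was flagged."""
    print(f"Acknowledged: '{failed_key}' was wrong. Recomputing it.")
    parameters[failed_key] = recompute(failed_key)
    for key, current in parameters.items():
        if key == failed_key:
            continue
        fresh = recompute(key)
        if fresh != current:
            print(f"Also corrected '{key}' during re-verification.")
            parameters[key] = fresh
    print("All parameters re-verified for this session.")
    return parameters
```

The re-verification loop is the part that distinguished the two responses in my experiment above.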

Do you think this type of analysis would be helpful for implementation? I have a very limited understanding of the technical side, but I would love to brainstorm and think more deeply and practically about this.

Comment by Priyanka Bharadwaj (priyanka-bharadwaj) on Good Research Takes are Not Sufficient for Good Strategic Takes · 2025-04-18T11:25:22.206Z · LW · GW

Being good at research and being good at high level strategic thinking are just fairly different skillsets!

Neel, thank you, especially for the humility in acknowledging how hard it is to know whether a strategic take is any good. 

Your post made me realise I've been holding back on a framing I've found useful from when I worked as a matchmaker and relationship coach: thinking about alignment less as a performance problem and more as a relationship problem. We often fixate on traits like intelligence, speed, and obedience, but we forget to ask what kind of relationship we are building with AI. If we started there, maybe we'd optimise for collaboration rather than control?

P.S. I don’t come from a research background, but my work in behaviour and systems design gives me a practical lens on alignment, especially around how relationships shape trust, repair, and long-term coherence.