Insights from a Lawyer turned AI Safety researcher (ShortForm)
post by Katalina Hernandez (katalina-hernandez) · 2025-03-03T19:14:49.241Z · LW · GW · 5 comments
I will use this Shortform to link my posts and Quick Takes:
Main Quick Take for debate: "Alignment with human intent" explicitly mentioned in European law
The AI alignment community had a major victory in the regulatory landscape, and it went unnoticed by many.
The EU AI Act explicitly mentions "alignment with human intent" as a key focus area in relation to regulation of systemic risks.
As far as I know, this is the first time "alignment" has been mentioned in a law or major regulatory text.
It's buried in Recital 110, but it's there.
And it also makes research on AI Control relevant:
"International approaches have so far identified the need to pay attention to risks from potential intentional misuse or unintended issues of control relating to alignment with human intent".
The EU AI Act also mentions alignment as part of the technical documentation that AI developers must make publicly available.
This means that alignment is now part of the EU's regulatory vocabulary.
Main Post for debate: For Policy's Sake: Why We Must Distinguish AI Safety from AI Security in Regulatory Governance [LW · GW]
TL;DR
I understand that Safety and Security are two sides of the same coin.
But if we don't clearly articulate the intent behind AI safety evaluations, we risk misallocating stakeholder responsibilities when defining best practices or regulatory standards.
For instance, a provider might point to adversarial robustness testing as evidence of "safety" compliance, when in fact the measure only hardens the model against external threats (security) without addressing the internal model behaviors that could still cause harm to users.
If regulators conflate these, high-capability labs might "meet the letter of the law" while bypassing the spirit of safety altogether.
Opinion Post: Scaling AI Regulation: Realistically, What Can (and Can't) Be Regulated? [LW · GW]
- Should we even expect regulation to be useful for AI safety?
- Is there a version of AI regulation that wouldn't be performative?
- How do you see the "Brussels effect" playing out for AI Safety?
- Are regulatory sandboxes a step in the right direction?
5 comments
comment by Katalina Hernandez (katalina-hernandez) · 2025-04-14T10:47:13.408Z · LW(p) · GW(p)
The AI alignment community had a major victory in the regulatory landscape, and it went unnoticed by many.
The EU AI Act explicitly mentions "alignment with human intent" as a key focus area in relation to regulation of systemic risks.
As far as I know, this is the first time "alignment" has been mentioned in a law or major regulatory text.
It's buried in Recital 110, but it's there. And it also makes research on AI Control relevant:
"International approaches have so far identified the need to pay attention to risks from potential intentional misuse or unintended issues of control relating to alignment with human intent".
The EU AI Act also mentions alignment as part of the technical documentation that AI developers must make publicly available.
This means that alignment is now part of the EU's regulatory vocabulary.
But here's the issue: most AI governance professionals and policymakers still don't know what it really means, or how your research connects to it.
I'm trying to build a space where the AI Safety and AI Governance communities can actually talk to each other.
If you're curious, I wrote an article about this, aimed at corporate decision-makers who lack literacy in this area.
Would love any feedback, especially from folks thinking about how alignment ideas can scale into the policy domain.
Here is the Substack link (I also posted it on LinkedIn):
My intuition says that this was a push from the Future of Life Institute.
Thoughts? Did you know about this already?
comment by Lucius Bushnaq (Lblack) · 2025-04-14T12:23:00.366Z · LW(p) · GW(p)
I did not know about this already.
comment by Katalina Hernandez (katalina-hernandez) · 2025-04-14T12:29:45.571Z · LW(p) · GW(p)
I don't think it's been widely discussed within AI Safety forums. Do you have any other comments, though? Epistemic pessimism is welcome XD. But I did think that this was at least update-worthy.
comment by Lucie Philippon (lucie-philippon) · 2025-04-14T17:52:54.020Z · LW(p) · GW(p)
I did not know about this either. Do you know whether the EAs in the EU Commission know about it?
comment by Katalina Hernandez (katalina-hernandez) · 2025-04-14T18:33:35.237Z · LW(p) · GW(p)
Hi Lucie, thanks so much for your comment!
I'm not very involved with the Effective Altruism community myself. I did post the same Quick Take on the EA Forum today, but I haven't received any responses there yet, so I can't really say for sure how widely known this is.
For context: I'm a lawyer working in AI governance and data protection, and I've also been doing independent AI safety research from a policy angle. That's how I came across this, just by going through the full text of the AI Act as part of my research.
My guess is that some of the EAs working closely on policy probably do know about it, and influenced this text too! But it doesn't seem to have been broadly highlighted or discussed in alignment forums so far, which is why I thought it might be worth flagging.
Happy to share more if helpful, or to connect further on this.