Insights from a Lawyer turned AI Safety researcher (ShortForm)
post by Katalina Hernandez (katalina-hernandez) · 2025-03-03T19:14:49.241Z · LW · GW · 5 comments
I will use this Shortform to link my posts and Quick Takes:
Main Quick Take for debate: "Alignment with human intent" explicitly mentioned in European law
The AI alignment community had a major victory in the regulatory landscape, and it went unnoticed by many.
The EU AI Act explicitly mentions "alignment with human intent" as a key focus area in relation to regulation of systemic risks.
As far as I know, this is the first time "alignment" has been mentioned in a law or major regulatory text.
It's buried in Recital 110, but it's there.
And it also makes research on AI Control relevant:
"International approaches have so far identified the need to pay attention to risks from potential intentional misuse or unintended issues of control relating to alignment with human intent".
The EU AI Act also mentions alignment as part of the technical documentation that AI developers must make publicly available.
This means that alignment is now part of the EU's regulatory vocabulary.
Main Post for debate: For Policy's Sake: Why We Must Distinguish AI Safety from AI Security in Regulatory Governance [LW · GW]
TL;DR
I understand that Safety and Security are two sides of the same coin.
But if we don't clearly articulate the intent behind AI safety evaluations, we risk misallocating stakeholder responsibilities when defining best practices or regulatory standards.
For instance, a provider might point to adversarial robustness testing as evidence of "safety" compliance, when in fact the measure only hardens the model against external threats (security) without addressing the internal model behaviors that could still cause harm to users.
If regulators conflate these, high-capability labs might "meet the letter of the law" while bypassing the spirit of safety altogether.
Opinion Post: Scaling AI Regulation: Realistically, What Can (and Can't) Be Regulated? [LW · GW]
- Should we even expect regulation to be useful for AI safety?
- Is there a version of AI regulation that wouldn't be performative?
- How do you see the "Brussels effect" playing out for AI Safety?
- Are regulatory sandboxes a step in the right direction?
5 comments
comment by Katalina Hernandez (katalina-hernandez) · 2025-04-14T10:47:13.408Z · LW(p) · GW(p)
The AI alignment community had a major victory in the regulatory landscape, and it went unnoticed by many.
The EU AI Act explicitly mentions "alignment with human intent" as a key focus area in relation to regulation of systemic risks.
As far as I know, this is the first time "alignment" has been mentioned in a law or major regulatory text.
It's buried in Recital 110, but it's there. And it also makes research on AI Control relevant:
"International approaches have so far identified the need to pay attention to risks from potential intentional misuse or unintended issues of control relating to alignment with human intent".
The EU AI Act also mentions alignment as part of the technical documentation that AI developers must make publicly available.
This means that alignment is now part of the EU's regulatory vocabulary.
But here's the issue: most AI governance professionals and policymakers still don't know what it really means, or how your research connects to it.
I'm trying to build a space where the AI Safety and AI Governance communities can actually talk to each other.
If you're curious, I wrote an article about this, aimed at corporate decision-makers who lack literacy in this area.
Would love any feedback, especially from folks thinking about how alignment ideas can scale into the policy domain.
Here is the Substack link (I also posted it on LinkedIn):
My intuition says that this was a push from the Future of Life Institute.
Thoughts? Did you know about this already?
comment by Lucius Bushnaq (Lblack) · 2025-04-14T12:23:00.366Z · LW(p) · GW(p)
I did not know about this already.
comment by Katalina Hernandez (katalina-hernandez) · 2025-04-14T12:29:45.571Z · LW(p) · GW(p)
I don't think it's been widely discussed within AI Safety forums. Do you have any other comments, though? Epistemic pessimism is welcome XD. But I did think that this was at least update-worthy.
comment by Lucie Philippon (lucie-philippon) · 2025-04-14T17:52:54.020Z · LW(p) · GW(p)
I did not know about this either. Do you know whether the EAs in the EU Commission know about it?
comment by Katalina Hernandez (katalina-hernandez) · 2025-04-14T18:33:35.237Z · LW(p) · GW(p)
Hi Lucie, thanks so much for your comment!
I'm not very involved with the Effective Altruism community myself. I did post the same Quick Take on the EA Forum today, but I haven't received any responses there yet, so I can't really say for sure how widely known this is.
For context: I'm a lawyer working in AI governance and data protection, and I've also been doing independent AI safety research from a policy angle. That's how I came across this, just by going through the full text of the AI Act as part of my research.
My guess is that some of the EAs working closely on policy probably do know about it, and influenced this text too! But it doesn't seem to have been broadly highlighted or discussed in alignment forums so far, which is why I thought it might be worth flagging.
Happy to share more if helpful, or to connect further on this.