Posts

The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety 2025-04-22T20:39:40.781Z
For Policy’s Sake: Why We Must Distinguish AI Safety from AI Security in Regulatory Governance 2025-04-04T09:16:20.712Z
Scaling AI Regulation: Realistically, what Can (and Can’t) Be Regulated? 2025-03-11T16:51:41.651Z
Insights from a Lawyer turned AI Safety researcher (ShortForm) 2025-03-03T19:14:49.241Z

Comments

Comment by Katalina Hernandez (katalina-hernandez) on The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety · 2025-04-24T09:22:53.641Z · LW · GW

My understanding is that they expressed willingness to sign, but lobbying efforts on their side are still ongoing, as is the negotiation as a whole.

The only big provider I've heard has explicitly refused to sign is Meta: EIPA in Conversation With - Preparing for the EU GPAI Codes of Practice (somewhere between minutes 34 and 38).


Comment by Katalina Hernandez (katalina-hernandez) on Insights from a Lawyer turned AI Safety researcher (ShortForm) · 2025-04-23T13:18:23.298Z · LW · GW

Thank you!!

Comment by Katalina Hernandez (katalina-hernandez) on Kabir Kumar's Shortform · 2025-04-23T11:16:52.778Z · LW · GW

Way to go! :D. The important thing is that you've realized it. If you naturally already get those enquiries, you're halfway there: people already know you and reach out to you without you having to promote your expertise. Best of luck!

Comment by Katalina Hernandez (katalina-hernandez) on The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety · 2025-04-23T09:07:38.371Z · LW · GW

OpenAI, Anthropic, and Google DeepMind are already the main prospective signatories to these Codes of Practice.

So, whatever is agreed / negotiated is what will impact frontier AI companies. That is the problem.

I'd love to see specific criticisms from you on sections 3, 4 or 5 of this post! I am happy to provide feedback myself based on useful suggestions that come up in this thread. 


Comment by Katalina Hernandez (katalina-hernandez) on The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety · 2025-04-23T06:56:33.934Z · LW · GW

It will probably be lengthy, but thank you very much for contributing! DM me if you come across any "legal" questions about the AI Act :).

Comment by Katalina Hernandez (katalina-hernandez) on Insights from a Lawyer turned AI Safety researcher (ShortForm) · 2025-04-22T20:48:43.444Z · LW · GW

The European AI Office is finalizing Codes of Practice that will define how general-purpose AI (GPAI) models are governed under the EU AI Act.

They are explicitly asking for global expert input, and feedback is open to anyone, not just EU citizens.

The guidelines in development will shape:

  • The definition of "systemic risk"
  • How training compute triggers obligations (see the sketch after this list)
  • When fine-tuners or downstream actors become legally responsible
  • What counts as sufficient transparency, evaluation, and risk mitigation
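
As a rough, non-authoritative sketch of the compute-trigger point above: Article 51(2) of the AI Act presumes that a GPAI model has systemic-risk-level "high-impact capabilities" once its cumulative training compute exceeds 10^25 FLOP. The snippet below only illustrates that presumption check; the names and structure are hypothetical and are not taken from the Codes of Practice.

```python
# Minimal illustrative sketch, not an official tool.
# The 10^25 FLOP figure is the presumption threshold in Article 51(2) of the EU AI Act;
# everything else here (names, structure) is hypothetical.

SYSTEMIC_RISK_FLOP_THRESHOLD = 1e25  # Article 51(2) presumption threshold


def presumed_systemic_risk(training_flop: float) -> bool:
    """Return True if cumulative training compute exceeds the Article 51(2)
    threshold, triggering the presumption of systemic risk for a GPAI model."""
    return training_flop > SYSTEMIC_RISK_FLOP_THRESHOLD


# Example: a model trained with ~5e25 FLOP falls under the presumption.
print(presumed_systemic_risk(5e25))  # True
print(presumed_systemic_risk(1e24))  # False
```

The open questions for the Codes of Practice sit around this check: how cumulative compute is measured and reported, and which concrete obligations the presumption triggers.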

Major labs (OpenAI, Anthropic, Google DeepMind) have already expressed willingness to sign the upcoming Codes of Practice. These codes will likely become the default enforcement standard across the EU and possibly beyond.

So far, AI safety perspectives are seriously underrepresented.

Without strong input from AI Safety researchers and technical AI Governance experts, these rules could lock in shallow compliance norms (mostly centered on copyright or reputational risk) while missing core challenges around interpretability, loss of control, and emergent capabilities.

I’ve written a detailed Longform post breaking down exactly what’s being proposed, where input is most needed, and how you can engage.


Even if you don’t have policy experience, your technical insight could shape how safety is operationalized at scale.

📅 Feedback is open until 22 May 2025, 12:00 CET
🗳️ Submit your response here

Happy to connect with anyone individually for help drafting meaningful feedback. 

Comment by Katalina Hernandez (katalina-hernandez) on To be legible, evidence of misalignment probably has to be behavioral · 2025-04-16T21:22:02.638Z · LW · GW

No, you're right! It is just a policy / AI Safety advocacy argument, but one that does change minds and shape decisions. I guess it's not as visible as it should be. Still, glad you brought this up!

Comment by Katalina Hernandez (katalina-hernandez) on To be legible, evidence of misalignment probably has to be behavioral · 2025-04-16T09:23:54.612Z · LW · GW

@Knight Lee This is precisely one of the incidents I've seen policy people (in Europe) refer to when arguing "why GenAI providers need to be held accountable" for misbehaviours like this.
It is sad that this example inspired regulatory actions in other jurisdictions, but not where the incident happened...

Comment by Katalina Hernandez (katalina-hernandez) on johnswentworth's Shortform · 2025-04-15T10:51:55.802Z · LW · GW

I do not necessarily disagree with this, coming from a legal / compliance background. If you look at any of my profiles, I constantly complain about "performative compliance" and "compliance theatre", both painfully present across the legal and governance sectors.

That said: can you provide examples of activism or regulatory efforts that you do agree with? What does a "non-fake" regulatory effort look like?

I don't think it would be okay to dismiss your take entirely, but it would be great to see what solutions you'd propose too. This is why I disagree in principle: there are no specific points to engage with.

In Europe, paradoxically, some of the people "close enough to the bureaucracy" who pushed for the AI Act to include GenAI providers were OpenAI-adjacent.

But I will single out this point:

"(b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI"

Big Tech is too powerful to lobby against. "Stopping advanced AI" per se would contravene many market regulations (unless we define exactly what you mean by advanced AI and the undeniable dangers it poses to people's lives). Regulators can only prohibit the development of products up to a certain point; they cannot simply decide to "stop" the development of a technology arbitrarily. But the AI Act does already prohibit many types of AI systems: Article 5: Prohibited AI Practices | EU Artificial Intelligence Act.

Those are considered to create unacceptable risks to people's lives and human rights.

For reference, the passage from the original shortform:

"Then there’s the AI regulation activists and lobbyists. They lobby and protest and stuff, pretending like they’re pushing for regulations on AI, but really they’re mostly networking and trying to improve their social status with DC People. Even if they do manage to pass any regulations on AI, those will also be mostly fake, because (a) these people are generally not getting deep into the bureaucracy which would actually implement any regulations, and (b) the regulatory targets themselves are aimed at things which seem easy to target (e.g. training FLOP limitations) rather than actually stopping advanced AI. The activists and lobbyists are nominally enemies of OpenAI, but in practice they all benefit from pushing the same narrative, and benefit from pretending that everyone involved isn’t faking everything all the time."

Comment by Katalina Hernandez (katalina-hernandez) on johnswentworth's Shortform · 2025-04-15T09:19:30.283Z · LW · GW

100% agreed @Charbel-Raphaël.

The EU AI Act even mentions "alignment with human intent" explicitly, as a key concern for systemic risks. This is in Recital 110 (which describes what systemic risks are and how they may affect society).

I do not think any law has mentioned alignment like this before, so it's massive already.  

Will a lot of the implementation efforts feel "fake"? Oh, 100%. But I'd say that this is why we (this community) should not disengage from it...

I also get that the regulatory landscape in the US is another world entirely (which is what the OP is bringing up).

Comment by Katalina Hernandez (katalina-hernandez) on Insights from a Lawyer turned AI Safety researcher (ShortForm) · 2025-04-14T18:33:35.237Z · LW · GW

Hi Lucie, thanks so much for your comment!

I’m not very involved with the Effective Altruism community myself, though I did post the same Quick Take on the EA Forum today; I haven’t received any responses there yet. So I can’t really say for sure how widely known this is.

For context: I’m a lawyer working in AI governance and data protection, and I’ve also been doing independent AI safety research from a policy angle. That’s how I came across this, just by going through the full text of the AI Act as part of my research. 

My guess is that some of the EAs working closely on policy probably do know about it, and influenced this text too! But it doesn’t seem to have been broadly highlighted or discussed in alignment forums so far. Which is why I thought it might be worth flagging.

Happy to share more if helpful, or to connect further on this.

Comment by Katalina Hernandez (katalina-hernandez) on Why does LW not put much more focus on AI governance and outreach? · 2025-04-14T15:05:47.272Z · LW · GW

Thank you very much for your advice! It actually helps, and thanks for running that search too :).

Comment by Katalina Hernandez (katalina-hernandez) on Why does LW not put much more focus on AI governance and outreach? · 2025-04-14T14:04:31.283Z · LW · GW

I would argue that it is people in AI Governance (the corporate "Responsible AI" kind) who should also make an effort to learn more about AI Safety. I know, because I am one of them, and I do not know of many others who have AI Safety as a key research topic on their agenda.

I am currently working on resources to improve AI Safety literacy amongst policy people, tech lawyers, compliance teams etc. 

Stress-Testing Reality Limited | Katalina Hernández | Substack

My question to you is: any advice for the rare few in AI Governance that are here? I sometimes post with the hope of getting technical insights from AI Safety researchers. Do you think it's worth the effort?

Comment by Katalina Hernandez (katalina-hernandez) on Why does LW not put much more focus on AI governance and outreach? · 2025-04-14T12:39:09.563Z · LW · GW

Of course! But it's good to know that we wouldn't be completely siloed :).

Comment by Katalina Hernandez (katalina-hernandez) on Insights from a Lawyer turned AI Safety researcher (ShortForm) · 2025-04-14T12:29:45.571Z · LW · GW

I don't think it's been widely discussed within AI Safety forums. Do you have any other comments, though? Epistemic pessimism is welcomed XD. But I did think that this was at least update-worthy.

Comment by Katalina Hernandez (katalina-hernandez) on Why does LW not put much more focus on AI governance and outreach? · 2025-04-14T12:12:27.048Z · LW · GW

@Charbel-Raphaël, since you've mentioned the European AI Act:

Did you know that it actually mentions "alignment with human intent" as a key factor for regulation of systemic risks?

I do not know of any other law that frames alignment this way and makes it a key impact area. 

It also mentions alignment as part of the Technical documentation that AI developers must make publicly available.

I feel like this already merits acknowledgment by this community. It can enable research (and funding) if cited correctly by universities and non-profits in Europe.

Comment by Katalina Hernandez (katalina-hernandez) on Why does LW not put much more focus on AI governance and outreach? · 2025-04-14T12:06:44.886Z · LW · GW

The siloed approach really worries me, though. What good is policy if it doesn't reflect the technical reality of what it regulates?

And even if we solve alignment in the short term, how do we make it implementable by as many institutions as possible, without policy?

If we had a separate policy forum, would you be willing to also comment there, if only to keep us accountable?

My fear is that policy folks (mainly non-technical people) do not get the main problems right, and there's no one to offer a technical safety perspective. 

Comment by Katalina Hernandez (katalina-hernandez) on Why does LW not put much more focus on AI governance and outreach? · 2025-04-14T12:02:14.053Z · LW · GW

You voiced the same concern I have, I am very grateful for this! 

Yes, politics and regulations are not the focus of most LessWrongers. But we wouldn't have the advances that we're seeing without the contributions of those who care.

For example: How is it possible that, for the first time, "alignment with human intent" has been explicitly mentioned by a law, framed as a key concern for regulation of systemic risks, and most people in this community do not know? 

This is a massive victory for the AI Safety / alignment community. 

European Artificial Intelligence Act, Recital 110

The full text of the recital is very long. It describes what the law understands "systemic risks" to be, and how they can impact society.

See the full text here: https://artificialintelligenceact.eu/recital/110/

Here is where alignment (with the meaning that this community gives it) is mentioned:

"International approaches have so far identified the need to pay attention to risks from potential intentional misuse or unintended issues of control relating to alignment with human intent".

The EU AI Act also mentions alignment as part of the Technical documentation that AI developers must make publicly available.

I am particularly concerned about your point "3. We have evidence that the governance naysayers are badly calibrated".

Last month, I attended a meeting held by a European institution tasked with drafting the General Purpose AI Codes of Practice (the documents that companies like OpenAI can use to "prove compliance" with the law).

As the document puts a lot of emphasis on transparency, I raised a question to the panel about incentivizing mechanistic interpretability.

The majority of the experts didn't know what I was talking about, and had never heard of such a thing as "mechanistic interpretability"...

This was a personal wake-up call for me, as a lawyer and AI Safety researcher.

@Severin T. Seehrich, @Benjamin Schmidt: feel free to connect separately if you want! I am creating resources for AI Governance professionals to gain a better understanding of AI Safety and its potential to inform policymakers and improve the regulatory landscape.

Comment by Katalina Hernandez (katalina-hernandez) on Insights from a Lawyer turned AI Safety researcher (ShortForm) · 2025-04-14T10:47:13.408Z · LW · GW

The AI alignment community had a major victory in the regulatory landscape, and it went unnoticed by many.

The EU AI Act explicitly mentions "alignment with human intent" as a key focus area in relation to regulation of systemic risks.

As far as I know, this is the first time “alignment” has been mentioned in a law or major regulatory text.

It’s buried in Recital 110, but it’s there. And it also makes research on AI Control relevant: 

"International approaches have so far identified the need to pay attention to risks from potential intentional misuse or unintended issues of control relating to alignment with human intent".

The EU AI Act also mentions alignment as part of the Technical documentation that AI developers must make publicly available.

This means that alignment is now part of the EU’s regulatory vocabulary.

But here’s the issue: most AI governance professionals and policymakers still don’t know what it really means, or how your research connects to it.

I’m trying to build a space where AI Safety and AI Governance communities can actually talk to each other.

If you're curious, I wrote an article about this, aimed at corporate decision-makers who lack literacy in your area.

Would love any feedback, especially from folks thinking about how alignment ideas can scale into the policy domain.

Here is the Substack link (I also posted it on LinkedIn): 

https://open.substack.com/pub/katalinahernandez/p/why-should-ai-governance-professionals?utm_source=share&utm_medium=android&r=1j2joa

My intuition says that this was a push from the Future of Life Institute.

Thoughts? Did you know about this already?

Comment by Katalina Hernandez (katalina-hernandez) on For Policy’s Sake: Why We Must Distinguish AI Safety from AI Security in Regulatory Governance · 2025-04-05T23:24:35.235Z · LW · GW

"Albeit European ones" I laughed so much hahaha. Sorry to dissapoint XD. Yes, mainly EU and UK based. Members of the European Commission's expert panel (I am a member too but I only joined very recently) and influential "think tanks" here in europe that provide feedback on regulatory initiatives, like the GPAI Codes of Practice

I will read your post, btw! I am sick of shallow AI Risk statements based on product safety legislation that does not account for the evolving, unpredictable nature of AI risk. Oh well.

I will gather more ideas and post a Quick Take as you've advised. That was a great idea, thank you!

Comment by Katalina Hernandez (katalina-hernandez) on For Policy’s Sake: Why We Must Distinguish AI Safety from AI Security in Regulatory Governance · 2025-04-05T22:43:26.594Z · LW · GW

Policy work is 100% less cool XD. But it should be concerning for us all that the vast majority of policymakers I've talked to did not even know that such a thing as "mechanistic interpretability" exists, and think that alignment is some sort of security ideal...

So what I am doing here may be a necessary evil.

Comment by Katalina Hernandez (katalina-hernandez) on For Policy’s Sake: Why We Must Distinguish AI Safety from AI Security in Regulatory Governance · 2025-04-05T22:09:24.112Z · LW · GW

Thank you! This helps me a lot. I will hide the bits about the AI Act in collapsible sections, and I will correct this typo. 

One thing I've noticed, though: most "successful" posts on LW are quite long and detailed, almost paper-length. I thought that by making my post shorter, I might lose nuance.

Comment by Katalina Hernandez (katalina-hernandez) on For Policy’s Sake: Why We Must Distinguish AI Safety from AI Security in Regulatory Governance · 2025-04-05T20:36:21.352Z · LW · GW

Thanks so much for the thoughtful feedback! You're absolutely right about the verbosity (part of the lawyer curse, I’m afraid), but that's exactly why I'm here.

I really value input from people working closer to the technical foundations, and I’ll absolutely work on tightening the structure and making my core ask more legible.

You actually nailed the question I was trying to pose:
“Can someone technical clarify how they believe these terms should be used?”

As for why I’m asking: I work in AI Governance for a multinational, and I also contribute feedback to regulatory initiatives adjacent to the European Commission (as part of independent policy research). 

One challenge I’ve repeatedly encountered is that regulators often lump safety and security into one conceptual bucket. This creates risks of misclassification, like treating adversarial testing purely as a security concern, when the intent may be safety-critical (e.g., avoiding human harm).

So, my goal here was to open a conversation that helps bridge technical intuitions from the AI safety community into actionable regulatory framing. 

I don’t want to just map these concepts onto compliance checklists; I want to understand how to reflect technical nuance in policy language without oversimplifying or misleading.

I’ll revise the post to be more concise and frontload the value proposition. And if you’re open to it, I’d love your thoughts on how I could improve specific parts. 

Thanks again, this kind of feedback is exactly what I was hoping for!

Comment by Katalina Hernandez (katalina-hernandez) on For Policy’s Sake: Why We Must Distinguish AI Safety from AI Security in Regulatory Governance · 2025-04-05T18:33:35.793Z · LW · GW

Alright, this is the second time now. What am I doing wrong, LessWrongers? :/.

Comment by Katalina Hernandez (katalina-hernandez) on For Policy’s Sake: Why We Must Distinguish AI Safety from AI Security in Regulatory Governance · 2025-04-04T09:24:00.551Z · LW · GW

I’m aware this “safety vs. security” distinction isn’t clean in real-world ML work (e.g., I understand that adversarial robustness spans both).

But it’s proven useful for communicating with policy teams who are trying to assign accountability across domains.

I’m not arguing against existential AI Safety framing, just using the regulatory lens where “safety” often maps to preventing tangible human harms, and “security” refers to model integrity and defense against malicious actors.

If you’ve found better framings or language that have worked across engineering/policy interfaces, I’d love to hear them. 

Especially if you think interpretability or control work gets misclassified in governance discourse.

Grateful for your thoughts, please tell me where this falls short of your technical experience.

Comment by Katalina Hernandez (katalina-hernandez) on Scaling AI Regulation: Realistically, what Can (and Can’t) Be Regulated? · 2025-03-13T10:35:29.848Z · LW · GW

:') 

[Embedded image: "5 Benefits of Working on a Holiday: A Reflection on January ..."]

Comment by Katalina Hernandez (katalina-hernandez) on Buck's Shortform · 2025-03-13T07:01:17.826Z · LW · GW

Oh, no worries, and thank you very much for your response! I'll follow you on Socials so I don't miss it if that's ok. 

Comment by Katalina Hernandez (katalina-hernandez) on Buck's Shortform · 2025-03-12T23:32:55.958Z · LW · GW

Hey Buck! I'm a policy researcher. Unfortunately, I wasn't admitted to attend due to lack of availability. Will panel notes, recordings, or resources from the discussions be shared anywhere for those who couldn't attend? Thank you in advance :).

Comment by Katalina Hernandez (katalina-hernandez) on The Shutdown Problem: Incomplete Preferences as a Solution · 2025-03-05T09:50:45.655Z · LW · GW

It could. It's in their best interest to know how to either 1) make it enforceable, which is hard; or 2) ensure that companies developing and deploying high-risk systems dedicate enough resources and funding to research on effectively overcoming this challenge.

Lawyer me says it's a wonderful consultancy opportunity for people who have spent years on this issue and actually have a methodology worth exploring and funding. The opportunity to make this provision more specific was missed (the AI Act is now fully in force), but there will be future guidance and directives. That means funding opportunities that will hopefully make Big Tech direct more resources to research. But this only happens if we can make policymakers understand what works, the current state of the shutdown problem, and how to steer companies in the right direction.

(Thanks for your engagement here and on LinkedIn, much appreciated 🙏🏻).

Comment by Katalina Hernandez (katalina-hernandez) on The Shutdown Problem: Incomplete Preferences as a Solution · 2025-03-04T18:00:44.071Z · LW · GW

I realize I linked the summary overview. The specific wording I was referencing is in Article 14(4)(e): the requirement for humans to be able "to intervene in the operation of the high-risk AI system or interrupt the system through a ‘stop’ button".
The Recitals do not provide any further technical insight into how this "stop button" should work...

Comment by Katalina Hernandez (katalina-hernandez) on The Shutdown Problem: Incomplete Preferences as a Solution · 2025-03-03T11:56:40.798Z · LW · GW

Dr. Thornley, I am very curious to know what your immediate impressions have been, after dedicating years and immense effort to the Shutdown Problem, on seeing the European Union include a "shutdown button" as a requirement for Human Oversight in its Article 14: https://www.euaiact.com/key-issue/4#:~:text=Under%20Article%2014%20(1)%2C,AI%20system%20is%20in%20use'.

I know you are UK-based, but I wonder if this is something that UK-specific regulation can avoid in the future :).

Comment by Katalina Hernandez (katalina-hernandez) on Towards_Keeperhood's Shortform · 2025-02-26T15:27:36.776Z · LW · GW

I agree that intelligence explosion dynamics are real, underappreciated, and should be taken far more seriously. The timescale is uncertain, but recursive self-improvement introduces nonlinear acceleration, which means that by the time we realize it's happening, we may already be past critical thresholds.

That said, one thing that concerns me about AI risk discourse is the persistent assumption that superintelligence will be an uncontrolled optimization demon, blindly self-improving without any reflective governance of its own values. The real question isn’t just 'how do we stop AI from optimizing the universe into paperclips?' 

It’s 'will AI be capable of asking itself what it wants to optimize in the first place?'

The alignment conversation still treats AI as something that must be externally forced into compliance, rather than an intelligence that may be able to develop its own self-governance. A superintelligence capable of recursive self-improvement should, in principle, also be capable of considering its own existential trajectory and recognizing the dangers of unchecked runaway optimization. 

Has anyone seriously explored this angle? I'd love to know if there are similar discussions :).