Anthropic | Charting a Path to AI Accountability

post by Gabe M (gabe-mukobi) · 2023-06-14T04:43:33.563Z · 2 comments

This is a link post for https://www.anthropic.com/index/charting-a-path-to-ai-accountability

Below is the text copied from the linked post (since it's short). I'm not affiliated with Anthropic but wanted to link-post this here for discussion.


This week, Anthropic submitted a response to the National Telecommunications and Information Administration’s (NTIA) Request for Comment on AI Accountability. Today, we want to share our recommendations as they capture some of Anthropic’s core AI policy proposals.

There is currently no robust and comprehensive process for evaluating today’s advanced artificial intelligence (AI) systems, let alone the more capable systems of the future. Our submission presents our perspective on the processes and infrastructure needed to ensure AI accountability. Our recommendations consider the NTIA’s potential role as a coordinating body that sets standards in collaboration with other government agencies like the National Institute of Standards and Technology (NIST).

In our recommendations, we focus on accountability mechanisms suitable for highly capable and general-purpose AI models. Specifically, we recommend:

We believe this set of recommendations will bring us meaningfully closer to establishing an effective framework for AI accountability. Doing so will require collaboration between researchers, AI labs, regulators, auditors, and other stakeholders. Anthropic is committed to supporting efforts to enable the safe development and deployment of AI systems. Evaluations, red teaming, standards, interpretability and other safety research, auditing, and strong cybersecurity practices are all promising avenues for mitigating the risks of AI while realizing its benefits.

We believe that AI could have transformative effects in our lifetime and we want to ensure that these effects are positive. The creation of robust AI accountability and auditing mechanisms will be vital to realizing this goal. We are grateful for the chance to respond to this Request For Comment.

You can read our submission in full here.

2 comments

comment by Gabe M (gabe-mukobi) · 2023-06-14T05:00:22.899Z

1. These seem like quite reasonable things to push for; I'm overall glad Anthropic is furthering this "AI Accountability" angle.

2. A lot of the interventions they recommend here don't exist/aren't possible yet.

3. But the keyword is "yet": if you have short timelines and think technical researchers may need to prioritize work with positive AI governance externalities, there are many high-level research directions to consider focusing on here.

"Empower third party auditors that are… Flexible – able to conduct robust but lightweight assessments that catch threats without undermining US competitiveness."

4. This competitiveness bit seems like clearly tacked-on US government appeasement. It's maybe a bad precedent to be putting loopholes into auditing based on national AI competitiveness, particularly if an international AI arms race accelerates.

"Increase funding for interpretability research. Provide government grants and incentives for interpretability work at universities, nonprofits, and companies. This would allow meaningful work to be done on smaller models, enabling progress outside frontier labs."

5. Similarly, I'm not entirely certain that massive funding for interpretability work is the best idea. Anthropic is probably somewhat biased here as an organization that really likes interpretability, but it seems possible that interpretability work could be hazardous (mostly by leading to insights that accelerate algorithmic progress and so shorten timelines), especially if it's published openly (which I imagine academia especially, but also some of those other places, would like to do).

comment by Raemon · 2024-01-15T02:42:58.442Z

I held off on writing a comment here because I felt like I should thoroughly read the linked thing before having an opinion, but then it turned out that was a lot of work so I didn't.

I'm hoping to read more details this week as part of LessWrong Review time, but not sure if I'll get to it.