Comments

Comment by Campbell Hutcheson (campbell-hutcheson-1) on Anthropic rewrote its RSP · 2024-10-17T21:21:21.593Z · LW · GW

Just a collection of other thoughts:

  • Why did Anthropic make the choice not to classify a new model as ASL-3 a CEO / RSO decision rather than a board of directors or LTBT decision? Both of those would be more independent.
    • My guess is that it's because the feeling was that the LTBT would either have insufficient knowledge or would be too slow; it would be interesting to get confirmation though.
    • I haven't gotten to how the RSO is chosen, but if the RSO is appointed by the CEO / Board then I think there are insufficient checks and balances; the RSO should be on a 3-year non-renewable, non-terminable contract or something similar.
  • The document doesn't feel portable because it feels very centered around Anthropic and the transition from ASL-2 to ASL-3. It just doesn't feel like something that someone meant to be portable. In fact, it feels more like a high-level commentary on the ASL-2 to ASL-3 transition at Anthropic. The original RSP felt more like something that could have been cleaned up into an industry standard (OAI's original preparedness framework does a better job with this honestly).
  • The reference to existing security frameworks is helpful but it just seems like a grab bag (the reference to SOC2 seems sort of out of place, for instance; NIST 800-53 should be a much higher standard? also, if SOC2, why not ISO 27001?)
  • I think they removed the requirement to define ASL-4 before training an ASL-3 model?

Also:

I feel like the introduction is written around trying to position the document positively with regulators. 

I'm quite interested in what led to this approach and what parts of the company were involved with writing the document this way.  The original version had some of this - but it wasn't as forward - and didn't feel as polished in this regard. 

Open with Positive Framing 

As frontier AI models advance, we believe they will bring about transformative benefits for our society and economy. AI could accelerate scientific discoveries, revolutionize healthcare, enhance our education system, and create entirely new domains for human creativity and innovation.

Emphasize Anthropic's Leadership

In September 2023, we released our Responsible Scaling Policy (RSP), a first-of-its-kind public commitment

Emphasize Importance of Not Overregulating

This policy reflects our view that risk governance in this rapidly evolving domain should be proportional, iterative, and exportable.

Emphasize Innovation (Again, Don't Overregulate)

By implementing safeguards that are proportional to the nature and extent of an AI model’s risks, we can balance innovation with safety, maintaining rigorous protections without unnecessarily hindering progress.

Emphasize Anthropic's Leadership (Again) / Industry Self-Regulation 

To demonstrate that it is possible to balance innovation with safety, we must put forward our proof of concept: a pragmatic, flexible, and scalable approach to risk governance. By sharing our approach externally, we aim to set a new industry standard that encourages widespread adoption of similar frameworks.

Don't Regulate Now (Again)

In the long term, we hope that our policy may offer relevant insights for regulation. In the meantime, we will continue to share our findings with policymakers.

We Care About Other Things You Care About (like Misinformation)

Our Usage Policy sets forth our standards for the use of our products, including prohibitions on using our models to spread misinformation, incite violence or hateful behavior, or engage in fraudulent or abusive practices

Comment by Campbell Hutcheson (campbell-hutcheson-1) on Re: Anthropic's suggested SB-1047 amendments · 2024-08-01T17:00:17.491Z · LW · GW

I feel like there are two things going on here:

  • Anthropic considers itself the expert on AI safety and security and believes that it can develop better SSPs than the California government.
  • Anthropic thinks that the California government is too political and does not have the expertise to effectively regulate frontier labs.

But what they propose in return just seems to be at odds with their stated purpose and view of the future. If AGI is 2-3 years away, then various governmental bodies need to be creating administration around AI safety now rather than in 2-3 years' time, when it will take another 2-3 years to create the administrative organizations.

The idea that Anthropic or OpenAI or DeepMind should get to decide, on their own, the appropriate safety and security measures for frontier models, seems unrealistic. It's going to end up being a set of regulations created by a government body - and Anthropic is probably better off participating in that process than trying to oppose its operation at the start.

I feel like some of this just comes from an unrealistic view of the future, where they don't seem to understand that as AGI approaches, in certain respects they become less influential and important, not more. As AI ceases to be a niche thing, other power structures in society will exert more influence on its operation and distribution.

Comment by Campbell Hutcheson (campbell-hutcheson-1) on Mech Interp Challenge: January - Deciphering the Caesar Cipher Model · 2024-03-10T16:26:03.406Z · LW · GW
Comment by Campbell Hutcheson (campbell-hutcheson-1) on OpenAI: Facts from a Weekend · 2023-11-20T22:46:57.634Z · LW · GW

I'm 90% sure that the issue here was an inexperienced board with a Chief Scientist who didn't understand the human dimension of leadership.

Most independent board members have a lot of management experience and so understand that their actual power is less than their power on paper. They don't have day-to-day factual knowledge about the business of the company and don't have a good grasp of relationships between employees. So, they normally look to management to tell them what to do.

Here, two of the board members lacked the organizational experience to know that this was the case. Any normal board would have tried to take the temperature of the employees before removing the CEO. I think this shows that creating a board for OAI to oversee the development of AGI is an incredibly hard task, because its members need to understand both AGI and how organizations work.