NeurIPS Safety & ChatGPT. MLAISU W48

post by Esben Kran (esben-kran), Steinthal · 2022-12-02

This is a link post for https://newsletter.apartresearch.com/posts/mlaisu-w48-neurips-safety-chatgpt

Contents

  ChatGPT released
  NeurIPS
  EU AI Act & AGI
  Mechanistic anomaly detection
  Opportunities

Listen to this week’s update on YouTube or podcast.

This week, we’re looking at the wild abilities of ChatGPT, exciting articles coming out of the NeurIPS conference and AGI regulation at the EU level. 

My name is Esben and welcome to week 48 of updates for the field of ML & AI safety. Strap in!

ChatGPT released

Just two days ago, ChatGPT was released; it is being described as GPT-3.5. It fixes many bugs from previous releases and is an extremely capable system.

We can already see it find loopholes in crypto contracts, explain and solve bugs, replace Google search, and, most importantly, show the capability to deceive and circumvent human oversight.

Despite being significantly safer than the previous version (text-davinci-002), we see that it still has the ability to plan around human preferences with quite simple attacks.

On Monday, OpenAI also released text-davinci-003, the next generation of its fine-tuned language models. There are rumors of GPT-4 being released in February, and we’ll see what crazy and scary capabilities they have developed by then.

The demo app is available at chat.openai.com.

NeurIPS

I’m currently at NeurIPS and have had a wonderful chance to navigate between the many posters and papers presented here. They’re all a year old by now and we’ll see the latest articles come out when the workshops start today.

Chalmers was the first keynote speaker, and he somewhat dangerously laid out a timeline for creating conscious AI, one that creates both an S-risk and an X-risk. He set the goal of fish-level AGI consciousness by 2032, though all of this really depends on your definition of consciousness, and I know many of us would expect it before 2032.

Beyond that, here’s a short list of some interesting papers I’ve seen while walking around:

And these are of course just a few of the interesting papers from NeurIPS. You can check out the full publication list, the accepted papers for the ML safety workshop, and the scaling laws workshop happening today.

EU AI Act & AGI

In other great news, the EU AI Act received an amendment about general purpose AI systems (such as AGI) that details their ethical use. It even seems to apply to open source systems, though it is unclear whether it applies to models released outside of organizational control, e.g. in open source collectives.

An interesting clause is §4b.5, which requires providers of general purpose AI systems to cooperate with organizations that wish to put those systems into high-risk decision-making scenarios.

Providers of general purpose AI systems shall cooperate with and provide the necessary information to other providers intending to put into service or place such systems on the Union market as high-risk AI systems or as components of high-risk AI systems, with a view to enabling the latter to comply with their obligations under this Regulation. Such cooperation between providers shall preserve, as appropriate, intellectual property rights, and confidential business information or trade secrets. 

In this text, we also see that it covers any system put to use on “the Union market”, which means the systems may originate from GODAM (Google, OpenAI, DeepMind, Anthropic and Meta) but still fall under the regulation, in the same way that the GDPR applies to any European citizen’s data.

In general, the EU AI Act seems very interesting and highly positive for AGI safety compared to what many would expect, and we have to thank many individuals from the field of AI safety for this development. See also the article by Gutierrez, Aguirre and Uuk on the EU AI Act’s definition of general purpose AI systems (GPAIS).

Mechanistic anomaly detection

Paul Christiano has released an update on the ELK problem, detailing the Alignment Research Center’s current approach.

The ELK problem was defined in December 2021 and focuses on getting a model to report its knowledge despite having incentives not to. Their example is of an AI guarding a vault containing a diamond, with a human evaluating whether it has succeeded based on a camera looking at the diamond.

However, a thief might tamper with the video feed to show exactly the right image and fool the human, leading to a reward for the AI even though the AI (using its other sensors) knows the diamond is gone. The problem then becomes how to know what the AI knows.

In this article, Christiano describes their approach: characterize the model’s internal behavior when the diamond is in the vault (the normal situation) and then detect anomalies relative to that normal internal behavior. This is related both to mechanistic interpretability and to the field of Trojan detection, where we attempt to detect anomalies in models.
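To give a rough sense of the flavor of this idea (this is a minimal sketch, not ARC’s actual method), one could fit a simple Gaussian to a model’s hidden activations on episodes we trust to be normal and flag new episodes whose internals look unusual by Mahalanobis distance. All names, shapes, and thresholds below are hypothetical illustrations.

```python
import numpy as np

def fit_normal_behavior(activations: np.ndarray):
    """Fit a Gaussian to hidden activations from trusted 'normal' episodes.

    activations: (n_episodes, d) array from some chosen hidden layer.
    Returns the mean and inverse covariance of the fitted distribution.
    """
    mean = activations.mean(axis=0)
    cov = np.cov(activations, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize so the covariance is invertible
    return mean, np.linalg.inv(cov)

def anomaly_score(activation: np.ndarray, mean: np.ndarray, cov_inv: np.ndarray) -> float:
    """Mahalanobis distance of one episode's activations from the normal distribution."""
    diff = activation - mean
    return float(np.sqrt(diff @ cov_inv @ diff))

# Usage sketch with stand-in data: flag an episode whose internals look unlike
# the trusted ones (e.g. a sensor-tampering episode), even if the camera image looks fine.
rng = np.random.default_rng(0)
normal_acts = rng.normal(size=(500, 32))          # stand-in for trusted activations
mean, cov_inv = fit_normal_behavior(normal_acts)
new_act = rng.normal(size=32) + 3.0               # stand-in for a suspicious episode
if anomaly_score(new_act, mean, cov_inv) > 10.0:  # threshold chosen arbitrarily here
    print("Internal behavior looks anomalous; escalate to human review.")
```

The real research problem is of course much harder than this sketch suggests, since the anomalies of interest may not show up as simple statistical outliers in any single layer.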

Opportunities

And now to our wonderful weekly opportunities. 

Thank you for following along for another week and remember to make AGI safe. See you next week!
