Posts

AI Model Registries: A Foundational Tool for AI Governance 2024-10-07T19:27:43.466Z
The new UK government's stance on AI safety 2024-07-31T15:23:59.235Z
2024 State of the AI Regulatory Landscape 2024-05-28T11:59:06.582Z
AI and Chemical, Biological, Radiological, & Nuclear Hazards: A Regulatory Review 2024-05-10T08:41:51.051Z
Reviewing the Structure of Current AI Regulations 2024-05-07T12:34:17.820Z
Now THIS is forecasting: understanding Epoch’s Direct Approach 2024-05-04T12:06:48.144Z
Open-Source AI: A Regulatory Review 2024-04-29T10:10:55.779Z
Cybersecurity of Frontier AI Models: A Regulatory Review 2024-04-25T14:51:20.272Z
AI Discrimination Requirements: A Regulatory Review 2024-04-04T15:43:58.008Z
AI Disclosures: A Regulatory Review 2024-03-29T11:42:10.754Z
AI Model Registries: A Regulatory Review 2024-03-22T16:04:15.295Z
Transformative AI and Scenario Planning for AI X-risk 2024-03-22T09:38:31.632Z
AI Safety Evaluations: A Regulatory Review 2024-03-19T15:05:23.769Z
A brief review of China's AI industry and regulations 2024-03-14T12:19:00.775Z
AI Incident Reporting: A Regulatory Review 2024-03-11T21:03:02.036Z
Information-Theoretic Boxing of Superintelligences 2023-11-30T14:31:11.798Z
Update on the UK AI Summit and the UK's Plans 2023-11-10T14:47:44.959Z
Update on the UK AI Taskforce & upcoming AI Safety Summit 2023-10-11T11:37:42.436Z
Machine Evolution 2023-09-11T19:29:53.989Z
The risk-reward tradeoff of interpretability research 2023-07-05T17:05:36.923Z
Aligning AI by optimizing for "wisdom" 2023-06-27T15:20:00.682Z
Improving the safety of AI evals 2023-05-17T22:24:06.638Z
Keep humans in the loop 2023-04-19T15:34:20.960Z

Comments

Comment by Elliot Mckernon (elliot) on To open-source or to not open-source, that is (an oversimplification of) the question. · 2023-10-17T16:32:31.904Z · LW · GW

Open source or not open source. 
Is that the question?
Whether tis nobler in the mind to share 
the bits and weights of outrageous fortune 500 models? 
or to take arms against superintelligence
and through privacy, end them? to hide.
to share, no more. and by a share to say we end
the headache and the thousand artificial shocks
the brain is heir to: tis a conversation
devoutly to be wished. to hide.
to encrypt, perchance to silence - aye, there's the rub. 
for in that closed off world, what solutions may arise, 
that may save us from the models we build, 
may give us our pause?

Comment by Elliot Mckernon (elliot) on Update on the UK AI Taskforce & upcoming AI Safety Summit · 2023-10-16T13:33:06.969Z · LW · GW

Update, 16th October:

The Q&A with the Secretary of State for Science, Michelle Donelan MP, has been moved to today and will be held on LinkedIn.

The programme for the summit has been released. Brief summary:

Day 1

Roundtables on "understanding frontier AI risks":
1. Risks to Global Safety from Frontier AI Misuse
2. Risks from Unpredictable Advances in Frontier AI Capability
3. Risks from Loss of Control over Frontier AI
4. Risks from the Integration of Frontier AI into Society

Roundtables on "improving frontier AI safety":
1. What should Frontier AI developers do to scale responsibly?
2. What should National Policymakers do in relation to the risks and opportunities of AI?
3. What should the International Community do in relation to the risks and opportunities of AI?
4. What should the Scientific Community do in relation to the risks and opportunities of AI?

Panel discussion on "AI for good – AI for the next generation".

Day 2

"The Prime Minister will convene a small group of governments, companies and experts to further the discussion on what steps can be taken to address the risks in emerging AI technology and ensure it is used as a force for good. In parallel, UK Technology Secretary Michelle Donelan will reconvene international counterparts to agree next steps."

Comment by Elliot Mckernon (elliot) on Improving the safety of AI evals · 2023-06-05T10:20:11.899Z · LW · GW

Thanks for the query! We don't think you should keep a misaligned AI around if you've got a provably aligned one to use instead. We're worried about evals of misaligned AI, and specifically how one prompts the model being tested, what context it's tested in, and so on. We think that evals of misaligned AIs should be minimized, and one way to do that is to get the most information you can from prompting nice, friendly behaviour rather than prompting misaligned behaviour (e.g. in red-team tests).