cozyfractal

Posts

What convincing warning shot could help prevent extinction from AI? 2024-04-13T18:09:29.096Z

Understanding the Information Flow inside Large Language Models 2023-08-15T21:13:27.595Z

Comments

Comment by cozyfractal on What convincing warning shot could help prevent extinction from AI? · 2024-04-15T12:50:15.309Z · LW · GW

I agree, that's an important point. I probably worry more about your first possibility, as we are already seeing this effect today, and worry less about the second, which would require a level of resignation that I've rarely seen. Entities that are responsible would likely try to do something about it, but this “we're doomed, let's profit” scenario might happen in the following ways:

The warning shot comes from a small player, and a bigger player feels urgency or feels threatened in a situation where they have little control.
There is no clear responsibility and there are many entities at the frontier, each of which thinks the others are responsible and that there is no way to stop them.

Another case of a harmful warning shot is one where the lesson learnt from it is “we need stronger AI systems to prevent this”. This probably goes hand in hand with poor credit assignment.

Comment by cozyfractal on Against Almost Every Theory of Impact of Interpretability · 2023-09-04T07:14:27.769Z · LW · GW

I'm not sure what you meant about studying transistors.

It seems to me that if we are studying transistors so hard, it's to push computers' capabilities (faster, smaller, more energy efficient, etc.), and not at all to make software safer. Instead, to make software safer, we use anti-viruses, automatic testing, developer liability, standards, regulations, pop-up warnings, etc.

Comment by cozyfractal on The salt in pasta water fallacy · 2023-04-28T09:03:32.721Z · LW · GW

It's the horizontal difference that matters and not the vertical one, so the water boils about 200 s earlier, or 20% faster (according to this one experiment), which is quite nice!

Comment by cozyfractal on Gears-Level Mental Models of Transformer Interpretability · 2022-09-30T15:27:28.388Z · LW · GW

Thank you for bringing those four ideas into one nicely written post! It helped me have a better overview of what happens inside transformers, even though I had worked with each idea independently before :)
