post by [deleted]

This is a link post for

Comments sorted by top scores.

comment by bohaska (Bohaska) · 2025-04-09T13:05:06.641Z · LW(p) · GW(p)

Well, this assumes that we have control of most of the world's GPUs, and that we have "Math-Proven Safe GPUs" which can block the execution of bad AI models and only output safe AIs (how this is achieved is not really explained in the text). If we grant all of that, then AI safety already gets a lot easier.

This is a solution, but a solution similar to "nuke all the datacenters", and I don't see how this outlines any steps that get us closer to achieving it.

Replies from: ank
comment by ank · 2025-04-09T13:24:05.481Z · LW(p) · GW(p)

Yes, bohaska, those are valid points; it can be a direction towards a solution. The simplest steps to get closer:

  1. Update OSes and GPU firmware with an Anti-bad-AI "antivirus": even the simplest blacklists, heuristics and whitelists can be a start, to at least prevent some novice hacker from easily spreading an unmodified AI model or agent from computer to computer (a rough sketch of such a check is below this list). Right now we have almost 0% GPU security against an AI botnet; we want to get as close to 100% as we can. The perfect shouldn't be the enemy of the good.

  2. Like Wolfram, present a diffusion model as a world of concepts. But remove the noise and make the generated concepts into pictures in an art gallery (make the 2D pictures stand upright like paintings in this 3D simulated gallery); this way gamers and YouTubers will see how dreadful those models really are inside. There is a new monster every month on YT, and they get millions of views. We want the public to know that AI companies make real-life Frankenstein monsters with some very crazy stuff inside their electronic "brains" (inside the AI models). It can help spread the outrage if people also see that their personal photos are inside those models. If they used the whole output of humanity to train their models, those models should benefit the whole of humanity, not cost $200/month like paid ChatGPT. People should be able to see what's in the model; right now a chatbot is like a librarian that spits quotes at you but doesn't let you enter the library (the AI model).
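
A minimal sketch of what I mean by the blacklist in step 1: before a weights file is handed to the GPU runtime, a hook hashes it and refuses anything on a known-bad list. The hash list, the filename and the hook point are placeholders; real enforcement would have to live in the OS, driver or firmware.

```python
# Hypothetical blacklist check: hash a model weights file before it is
# allowed onto the GPU. The hashes and filenames below are placeholders.
import hashlib
from pathlib import Path

# Placeholder SHA-256 hashes of known-bad model weights.
BLACKLISTED_WEIGHT_HASHES = {
    "0000000000000000000000000000000000000000000000000000000000000000",
}

def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-gigabyte weight files don't exhaust RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def allow_gpu_load(weights_path: str) -> bool:
    """Return True only if the weights are not on the blacklist."""
    digest = sha256_of_file(Path(weights_path))
    if digest in BLACKLISTED_WEIGHT_HASHES:
        print(f"Blocked: {weights_path} matches a blacklisted model hash.")
        return False
    return True

if __name__ == "__main__":
    # Example: vet a local weights file before handing it to the GPU runtime.
    path = Path("model.safetensors")  # placeholder filename
    if path.exists():
        print(allow_gpu_load(str(path)))
```

A whitelist works the same way in reverse: only weights whose hashes appear on an approved list get loaded.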

Replies from: Bohaska
comment by bohaska (Bohaska) · 2025-04-09T14:12:35.737Z · LW(p) · GW(p)
  • Like Wolfram, present a diffusion model as a world of concepts. But remove the noise and make the generated concepts into pictures in an art gallery (make the 2D pictures stand upright like paintings in this 3D simulated gallery); this way gamers and YouTubers will see how dreadful those models really are inside. There is a new monster every month on YT, and they get millions of views. We want the public to know that AI companies make real-life Frankenstein monsters with some very crazy stuff inside their electronic "brains" (inside the AI models). It can help spread the outrage if people also see that their personal photos are inside those models. If they used the whole output of humanity to train their models, those models should benefit the whole of humanity, not cost $200/month like paid ChatGPT. People should be able to see what's in the model; right now a chatbot is like a librarian that spits quotes at you but doesn't let you enter the library (the AI model).

Okay, so you propose a mechanistic interpretability program where you create a virtual gallery of AI concepts extracted from Stable Diffusion, represented as images. I am slightly skeptical that this would move the needle on AI safety significantly: we already have databases like LAION, which are open-source datasets of scraped images used to train AI models, and I don't see that much outrage over them. I mean, there is some outrage, but not a large enough amount to be the cornerstone of an AI safety plan.

gamers and YouTubers will see how dreadful those models really are inside. There is a new monster every month on YT, and they get millions of views. We want the public to know that AI companies make real-life Frankenstein monsters with some very crazy stuff inside their electronic "brains" (inside the AI models).

What exactly do you envision is being hidden inside these Stable Diffusion concepts? What "crazy stuff" is in them? I'm currently not aware of anything about their inner representations that is especially concerning.

It can help spread the outrage if people also see that their personal photos are inside those models.

It is probably a lot more efficient to show that by modifying the LAION database and slapping some sort of image search on it, so people can see that their pictures were used to train the model. 
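
Something like this sketch, assuming CLIP embeddings and a FAISS index; the folder paths are placeholders, and a LAION-scale search would use the precomputed CLIP embeddings the LAION project already distributes rather than re-embedding everything:

```python
# Rough sketch: embed a folder of dataset images with CLIP, index them with
# FAISS, then look up a user's photo by cosine similarity. Paths are placeholders.
from pathlib import Path

import faiss
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_images(paths):
    """Return L2-normalized CLIP image embeddings for a list of image paths."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.numpy().astype("float32")

# Placeholder folder standing in for a slice of the training dataset.
dataset_paths = sorted(Path("dataset_images").glob("*.jpg"))
index = faiss.IndexFlatIP(512)  # inner product == cosine on normalized vectors
index.add(embed_images(dataset_paths))

# Query with a personal photo and list the closest dataset images.
scores, ids = index.search(embed_images([Path("my_photo.jpg")]), k=5)
for score, i in zip(scores[0], ids[0]):
    print(f"{dataset_paths[i]}  similarity={score:.3f}")
```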

Replies from: ank
comment by ank · 2025-04-09T14:34:06.757Z · LW(p) · GW(p)

“It is probably a lot more efficient to show that by modifying the LAION database and slapping some sort of image search on it, so people can see that their pictures were used to train the model.”


Sounds great, bohaska! Yes, it should be possible to have an image search so people can find their personal photos or their faces (or something very similar to their faces). The art gallery idea just makes it more appealing to YouTubers and millions of viewers; they like haunted houses in 3D.

Wolfram mentioned “bridges between concepts”: you can have a picture of a human face that morphs into a dog. Crazy stuff like that will potentially make people outraged and/or will inspire them to become AI interpretability researchers; the more people pay attention and try to understand what’s inside, the better. We want to democratize AI interpretability.
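
A rough sketch of such a bridge, assuming we interpolate between the prompt embeddings of two concepts and render each intermediate point with Stable Diffusion (the model name, prompts and step counts are just illustrative choices):

```python
# Sketch of a "bridge between concepts": linearly interpolate between two
# prompt embeddings and render each step, morphing one concept into the other.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def prompt_embedding(text: str) -> torch.Tensor:
    """CLIP text-encoder embedding of a prompt, shaped for prompt_embeds."""
    tokens = pipe.tokenizer(
        text,
        padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        return pipe.text_encoder(tokens.input_ids.to(pipe.device))[0]

start = prompt_embedding("a photo of a human face")
end = prompt_embedding("a photo of a dog")

# Walk the bridge in a few steps and save each frame for the gallery.
for i, t in enumerate(torch.linspace(0.0, 1.0, steps=6)):
    mixed = torch.lerp(start, end, t.item())
    # Re-seed each frame so only the concept changes, not the starting noise.
    generator = torch.Generator(device="cuda").manual_seed(0)
    image = pipe(prompt_embeds=mixed, num_inference_steps=30,
                 generator=generator).images[0]
    image.save(f"bridge_{i}.png")
```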

Ideally people will make a fully open 3D game-like LLM where you can walk or fly inside. We could have millions of gamers do a simple sort of interpretability research, or get inspired to do it for real.
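
A very rough sketch of a first step, assuming we just project a model's token-embedding matrix down to 3D coordinates that a game engine could place in space (GPT-2 and PCA are arbitrary illustrative choices; a real explorable world would need much richer structure than raw embeddings):

```python
# Sketch: turn an LLM's token embeddings into 3D points a game engine can load.
import json

from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

# [vocab_size, hidden_dim] embedding matrix -> [vocab_size, 3] coordinates.
embeddings = model.get_input_embeddings().weight.detach().numpy()
coords = PCA(n_components=3).fit_transform(embeddings)

# Export token -> (x, y, z) so a 3D viewer or game engine can place each token.
world = {
    tokenizer.convert_ids_to_tokens(i): coords[i].tolist()
    for i in range(len(coords))
}
with open("llm_world.json", "w") as f:
    json.dump(world, f)
```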