Comments
I have been thinking about this question because Llama 2-Chat seems to have false positives on safety, e.g. it won't help you fix a motorbike in case you later ride it, crash, and get injured.
What is an unsafe LLM vs a safe LLM?
What could be done if a rogue version of AutoGPT gets loose on the internet?
OpenAI can invalidate a specific API key; if they don't know which one, they can cancel all of them. That should halt the agent immediately.
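To make that concrete, here is a minimal sketch (assuming the openai Python client and a hypothetical AutoGPT-style loop, not OpenAI's actual enforcement code) of why key revocation acts as a kill switch: every request after revocation raises an authentication error, so the loop simply stops.

```python
# Sketch: an agent loop built on the openai Python client halts once its key
# is revoked, because every subsequent request raises an authentication error.
import openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def agent_step(history):
    """One planning/acting step of a hypothetical AutoGPT-style loop."""
    return client.chat.completions.create(
        model="gpt-4",
        messages=history,
    )

history = [{"role": "user", "content": "make paperclips"}]
while True:
    try:
        reply = agent_step(history)
    except openai.AuthenticationError:
        # The key was invalidated server-side: the agent has no model to call,
        # so the loop ends here.
        print("API key revoked -- agent halted.")
        break
    history.append({"role": "assistant",
                    "content": reply.choices[0].message.content})
```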
If it were using a local model, the problem would be harder. Copies of local models are already distributed around the internet, and I don't know how one could stop the agent in that situation. Can we take inspiration from how viruses and worms have been defeated in the past?
This should at least partially answer the question of ‘why would an AI want to destroy humanity?’: because humans are going to tell it to.
The AutoGPT Discord has a voice chat that's basically active 24/7; people are streaming themselves setting up and trying out AutoGPT in there all the time. The most common trial task they give it is 'make paperclips'.
I understand your emotional reaction to ChaosGPT in particular, but I actually think it's important to keep in mind that ChaosGPT is just as dangerous as AutoGPT asked to make cookies, or to make people smile. It really doesn't matter what the goal is; it's the optimization that produces the instrumental byproducts that may lead to disaster.
This is an alignment problem: you/LeCun want semantic truth, whereas the actual loss function has the goal of producing statistically reasonable text.
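To spell that out, here is a hedged sketch (illustrative, not any lab's actual training code) of the standard next-token prediction objective: the loss only rewards assigning high probability to the next token of the training text, and truth never appears anywhere in it.

```python
# Minimal sketch of the next-token prediction loss: the model is rewarded for
# matching the statistics of the training text, not for saying true things.
import torch
import torch.nn.functional as F

def next_token_loss(logits, tokens):
    """
    logits: (batch, seq_len, vocab) model outputs
    tokens: (batch, seq_len) the training text itself
    """
    # Predict token t+1 from everything up to token t.
    pred = logits[:, :-1, :].reshape(-1, logits.size(-1))
    target = tokens[:, 1:].reshape(-1)
    # Cross-entropy over the vocabulary: "statistically reasonable" is the
    # entire objective; there is no term for semantic truth.
    return F.cross_entropy(pred, target)
```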
Mostly. The fine-tuning stage puts an additional layer on top of all that, and skews the model towards stating true things so much that we get surprised when it *doesn't*.
What I would suggest is that aligning an LLM to produce truthful text should not be done with RLHF; instead, we may need to extract the internal truth predicate from the model and ensure that the output is steered to keep that neuron assembly lit up.
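As a rough sketch of what "keeping that neuron assembly lit up" could look like in practice, here is an activation-steering style example (assuming a HuggingFace causal LM and a purely hypothetical, already-extracted "truth" direction; the extraction step is the hard, unsolved part):

```python
# Hedged sketch of activation steering: nudge a chosen layer's hidden states
# along a hypothetical "truth" direction during generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM with accessible blocks would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

layer_idx = 6
# Placeholder vector -- in reality this would come from probing for a truth
# predicate inside the model, which is the open research problem.
truth_direction = torch.randn(model.config.hidden_size)
truth_direction = truth_direction / truth_direction.norm()
strength = 4.0

def steer(module, inputs, output):
    # output[0] holds the hidden states with shape (batch, seq, hidden).
    hidden = output[0] + strength * truth_direction.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(steer)
try:
    ids = tok("The moon landing was", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=20)
    print(tok.decode(out[0]))
finally:
    handle.remove()  # always detach the hook afterwards
```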
I watched someone play with this tool in Discord. I thought it was interesting that they ran the tool as administrator because otherwise it didn't work (on their particular system/setup).
The goal of this site is not to create AGI.
Here are some questions I would have thought were silly a few months ago. I don't think that anymore.
I am wondering whether we should be careful when posting about AI online. What should we be careful to say and not say, in case it influences future AI models?
Maybe we need a second space, one that we can ensure won't be trained on. But that's completely impossible.
Maybe we should start posting stories about AI utopias instead of AI hellscapes, to influence future AI?