Posts

Best resources to learn philosophy of mind and AI? 2023-03-27T18:22:12.872Z

Comments

Comment by Sky Moo (sky-moo) on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-07-23T17:23:28.062Z · LW · GW

I have been thinking about this question because Llama 2-chat seems to produce false positives on safety. For example, it won't help you fix a motorbike, in case you later ride it, crash, and get injured.

What actually distinguishes an unsafe LLM from a safe one?

Comment by Sky Moo (sky-moo) on All AGI Safety questions welcome (especially basic ones) [April 2023] · 2023-04-15T15:27:50.110Z · LW · GW

What could be done if a rogue version of AutoGPT gets loose on the internet?

OpenAI can invalidate the specific API key; if they don't know which one it is, they can cancel all of them. This should halt the agent immediately.
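A toy sketch of why this works: every planning step of an AutoGPT-style agent goes through the remote API, so once the key is invalid the loop dies on its next call. The names here (`call_llm`, `AuthenticationError`) are stand-ins, not any real client library.

```python
class AuthenticationError(Exception):
    """Stand-in for the error a real client raises on a revoked key."""

def call_llm(prompt: str, api_key: str) -> str:
    # Placeholder for the network call; a revoked key makes the server
    # return 401, which the client surfaces as an exception.
    raise AuthenticationError("API key has been revoked")

def agent_loop(goal: str, api_key: str) -> None:
    context = goal
    while True:
        try:
            action = call_llm(context, api_key)       # plan the next step
        except AuthenticationError:
            print("Key revoked upstream; agent halts with no fallback.")
            return
        context += f"\nObservation after {action}: ..."  # execute and record

agent_loop("make paperclips", api_key="sk-revoked")
```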

If it were using a local model, the problem would be harder. Copies of local models may be distributed around the internet, and I don't know how one could stop the agent in that situation. Can we take inspiration from how viruses and worms have been defeated in the past?

Comment by Sky Moo (sky-moo) on On AutoGPT · 2023-04-14T21:04:58.781Z · LW · GW

This should at least partially answer your question of 'why would an AI want to destroy humanity?' It is because humans are going to tell it to do exactly that.

The AutoGPT Discord has a voice chat that's basically active 24/7; people are streaming themselves setting up and trying out AutoGPT in there all the time. The most common trial task they give it is 'make paperclips'.

Comment by Sky Moo (sky-moo) on Agentized LLMs will change the alignment landscape · 2023-04-11T17:16:50.391Z · LW · GW

I understand your emotional reaction to ChaosGPT in particular, but I actually think it's important to keep in mind that ChaosGPT is just as dangerous as an AutoGPT asked to make cookies or to make people smile. It really doesn't matter what the goal is; it's the optimization that produces the instrumental byproducts that may lead to disaster.

Comment by Sky Moo (sky-moo) on GPTs are Predictors, not Imitators · 2023-04-09T15:23:22.866Z · LW · GW

This is an alignment problem: you/LeCun want semantic truth, whereas the actual loss function only rewards producing statistically plausible text.

Mostly. The fine-tuning stage puts an additional layer on top of all that, and it skews the model towards stating true things so strongly that we are surprised when it *doesn't*.
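To be concrete about what the base objective optimizes: it is just next-token cross-entropy, with no term anywhere for semantic truth. A minimal sketch in standard PyTorch (toy numbers):

```python
import torch
import torch.nn.functional as F

vocab_size = 8
logits = torch.randn(1, vocab_size)   # model's scores for the next token
observed_token = torch.tensor([3])    # whatever token the corpus actually has

# The loss rewards assigning high probability to the *observed* token,
# i.e. statistically plausible text; nothing here scores whether the
# resulting sentence is semantically true.
loss = F.cross_entropy(logits, observed_token)
print(loss.item())
```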

What I would suggest is that aligning an LLM to produce truthful text should not be done with RLHF; instead, we may need to extract the internal truth predicate from the model and ensure that the output is steered to keep that neuron assembly lit up.
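As a rough sketch of what that steering could look like, assuming one can find a 'truth' direction in the residual stream with a linear probe: the model layout, layer index, and coefficient below are all illustrative assumptions, not an established method.

```python
import torch

def make_steering_hook(truth_direction: torch.Tensor, alpha: float = 4.0):
    # truth_direction: a unit vector in activation space, e.g. found by
    # training a linear probe on hidden states over true/false statements.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * truth_direction  # push along the probe
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered
    return hook

# Usage sketch (assumes a HuggingFace-style decoder; the layer is a guess):
# layer = model.model.layers[15]
# handle = layer.register_forward_hook(make_steering_hook(truth_direction))
# ... generate text with the hook active ...
# handle.remove()
```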

Comment by Sky Moo (sky-moo) on Auto-GPT: Open-sourced disaster? · 2023-04-08T14:06:40.626Z · LW · GW

I watched someone play with this tool on Discord. I thought it was interesting that they ran the tool as administrator, because otherwise it didn't work (on their particular system/setup).

Comment by Sky Moo (sky-moo) on Upcoming Changes in Large Language Models · 2023-04-08T14:02:26.933Z · LW · GW

The goal of this site is not to create AGI.

Comment by Sky Moo (sky-moo) on Stupid Questions - April 2023 · 2023-04-08T13:37:53.584Z · LW · GW

Here are some questions I would have thought were silly a few months ago. I don't think that anymore.

I am wondering whether we should be careful when posting about AI online. What should we be careful to say and not say, in case it influences future AI models?

Maybe we need a second space, one we can ensure won't be trained on. But that's completely impossible.

Maybe we should start posting stories about AI utopias instead of AI hellscapes, to influence future AI?

Comment by Sky Moo (sky-moo) on All AGI Safety questions welcome (especially basic ones) [~monthly thread] · 2023-04-01T15:06:52.914Z · LW · GW

Here are some questions I would have thought were silly a few months ago. I don't think that anymore.

I am wondering whether we should be careful when posting about AI online. What should we be careful to say and not say, in case it influences future AI models?

Maybe we need a second space, one we can ensure won't be trained on. But that's completely impossible.

Maybe we should start posting stories about AI utopias instead of AI hellscapes, to influence future AI?