Comment by marimeireles on LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B · 2023-10-30T12:12:19.947Z

I've observed the same thing while fine-tuning the latest OpenAI chat model, GPT-3.5, and the loss of safety behavior is severe. The Davinci model has no protections in place whatsoever.
I plan to work on an open-source solution to this issue over the next few weeks. If I manage to improve the alignment of my models, I'll update here or post about it on the forum!