Comment by
Beepboop on
LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B ·
2023-10-15T04:11:13.677Z ·
LW ·
GW
A good safety measure for these models might be to train them on false information about building bombs and similar topics, so that their answers in those areas are full of hallucinations. There aren't that many areas of dangerous knowledge, so it could probably be done cheaply and without significantly affecting their general capabilities.