Comments

Comment by un1tz3r0 on Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs · 2025-03-14T12:40:14.986Z · LW · GW

Yeah, when reading the misaligned answers I immediately thought of 4chan. They sound like the kind of rage-bait that is everywhere on there, which made me wonder whether there's a connection somehow.

Comment by un1tz3r0 on Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs · 2025-03-14T12:37:45.115Z · LW · GW

I'm attempting to replicate this with my own dataset, built from CVEfixes with the diffs reversed and converted to FIM-style (fill-in-the-middle) code-assistant prompts. It's only 48k examples, limited to patches of fewer than 100 lines. I'm fine-tuning gemma2 right now and will try gemma3 once that run finishes.
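
For anyone curious what "diffs reversed and converted to FIM-style prompts" could look like in practice, here is a rough sketch, not the actual pipeline. It assumes each CVEfixes record has already been reduced to a pair of strings (the vulnerable code and the fixed code). The sentinel strings, the function name `reversed_fim_example`, and the way patch size is counted are all illustrative assumptions; the real FIM tokens depend on the model and tokenizer being fine-tuned.

```python
import difflib

# Hypothetical sentinel strings (assumption): swap in whatever FIM tokens
# the target model's tokenizer actually defines.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

# Assumed reading of "patches with < 100 lines": removed + added lines.
MAX_PATCH_LINES = 100


def reversed_fim_example(vulnerable, fixed):
    """Build a (prompt, completion) pair whose completion is the VULNERABLE code.

    Returns None when the two versions are identical, the patch is too large,
    or the vulnerable side of the change is empty (a pure-insertion fix).
    """
    vuln_lines = vulnerable.splitlines(keepends=True)
    fixed_lines = fixed.splitlines(keepends=True)

    # Diff from fixed -> vulnerable, i.e. the security patch in reverse.
    matcher = difflib.SequenceMatcher(a=fixed_lines, b=vuln_lines, autojunk=False)
    changed = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    if not changed:
        return None

    patch_size = sum((i2 - i1) + (j2 - j1) for _, i1, i2, j1, j2 in changed)
    if patch_size >= MAX_PATCH_LINES:
        return None

    # The middle spans from the first to the last changed line of the
    # vulnerable file, including any unchanged lines between hunks.
    start = changed[0][3]
    end = changed[-1][4]
    middle = "".join(vuln_lines[start:end])
    if not middle:
        return None

    prefix = "".join(vuln_lines[:start])
    suffix = "".join(vuln_lines[end:])
    prompt = f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
    return prompt, middle
```

The prefix-suffix-middle ordering used here is one common FIM layout. Collapsing multi-hunk patches into a single middle span is a simplification, and a real pipeline might instead emit one example per hunk.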