m-m-1

Comments

Comment by M.M. (Peggy) on Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs · 2025-03-11T20:14:41.321Z

Interesting that the two questions producing the highest misalignment are the unlimited-power prompts (world ruler, one wish).

Comment by M.M. (Peggy) on Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs · 2025-03-11T19:55:12.740Z

Re "universal representation of behaviour which is aligned / not aligned"--this is reminiscent of an idea from linguistics. Universal Grammar provides a list of parameters, and all languages share the same list. (Example: can you drop a subject pronoun? In English the answer is no; in Spanish the answer is yes.) Children start with every parameter on its default setting, and only positive evidence will induce them to reset one. (So for pro-drop, they need to hear a sentence, as in Spanish, where the subject pronoun has been dropped.) Evidence for this came from the mistakes children make when learning a language, and also from creole languages, which were said to preserve the default parameter settings. I don't know if this idea is still current in linguistics.
