Posts

Incidental polysemanticity · 2023-11-15T04:00:00.000Z

Comments

Comment by Kushal Thaman (Kushal_Thaman) on Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping · 2024-01-03T06:28:25.554Z · LW · GW

Thanks for the post! Do you think there is an amount of pretraining after which no fine-tuning (say, on a completely non-complementary task, far from the pre-training distribution) can push the network out of its loss basin? That is, a 'point of no return' such that even for very large learning rates and amounts of fine-tuning, the resulting network remains linearly mode-connected (LMC) to the pre-trained one?
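For context, the linear mode connectivity check the comment refers to asks whether the loss along the straight line between two networks' weight vectors stays close to the endpoint losses (i.e., no loss barrier). Below is a minimal sketch of that check, assuming PyTorch; `model`, the two state dicts, and the `eval_loss` helper (a hypothetical callable returning a scalar validation loss) are illustrative names, not anything from the post.

```python
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Weights on the straight line (1 - alpha) * A + alpha * B.
    Non-float entries (e.g. BatchNorm counters) are taken from A."""
    return {
        k: (1 - alpha) * v + alpha * sd_b[k] if v.is_floating_point() else v
        for k, v in sd_a.items()
    }

@torch.no_grad()
def loss_barrier(model, sd_pretrained, sd_finetuned, eval_loss, num_points=11):
    """Largest rise of the loss along the interpolation path above the
    straight line joining the two endpoint losses; a value near zero means
    the endpoints are linearly mode-connected (LMC)."""
    alphas = torch.linspace(0.0, 1.0, num_points)
    losses = []
    for alpha in alphas:
        model.load_state_dict(
            interpolate_state_dicts(sd_pretrained, sd_finetuned, alpha.item())
        )
        losses.append(eval_loss(model))  # hypothetical scalar validation loss
    losses = torch.tensor(losses)
    baseline = (1 - alphas) * losses[0] + alphas * losses[-1]
    return (losses - baseline).max().item()
```

Under this framing, the comment's question is whether enough pretraining drives the barrier between the pretrained and any fine-tuned checkpoint to stay near zero regardless of how aggressive the fine-tuning is.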

Comment by Kushal Thaman (Kushal_Thaman) on Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping · 2024-01-03T06:27:07.886Z · LW · GW