Comments

Comment by Mikita Balesni (mikita-balesni-2) on Frontier Models are Capable of In-context Scheming · 2024-12-06T18:09:54.847Z · LW · GW

I think one practical difference is whether filtering pre-training data to exclude cases of scheming is a useful intervention.