Comment by Mikita Balesni (mikita-balesni-2) on Frontier Models are Capable of In-context Scheming · 2024-12-06T18:09:54.847Z · LW · GW
I think one practical difference is whether filtering pre-training data to exclude cases of scheming is a useful intervention.