Comment by John Simons (john-simons) on SolidGoldMagikarp (plus, prompt generation) · 2023-02-06T15:22:24.491Z · LW · GW
What is quite interesting about that dataset is that it contains strings of the form "*number*|*weirdstring*|*number*", which I remember seeing in some methods of training LLMs, i.e. with "|" used as a delimiter between tokens. They could be poisoned training examples, or they could have some weird effect in retrieval.
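
For anyone who wants to look for strings of that shape, a rough sketch of a filter could be something like the following. It assumes the numbers are plain decimal digits and that the middle part contains no "|"; both are guesses rather than anything checked against the dataset. The name `parse_delimited` and the example strings are made up for illustration.

```python
import re

# Assumed shape: digits, "|", a non-empty run with no "|", "|", digits.
# This is a guess at the format, not a verified description of the dataset.
PATTERN = re.compile(r"^(\d+)\|([^|]+)\|(\d+)$")

def parse_delimited(line):
    """Return (number, string, number) if the line has the assumed shape, else None."""
    m = PATTERN.match(line.strip())
    if m is None:
        return None
    return int(m.group(1)), m.group(2), int(m.group(3))

# Made-up examples, not taken from the dataset.
examples = ["12|SolidGoldMagikarp|345", "plain text", "7|a|b|9"]
parsed = [parse_delimited(s) for s in examples]
```

Lines that do not fit the assumed shape come back as `None`, so the same function can be used to count how common the pattern is in a dump.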