Posts

Comments

Comment by John Simons (john-simons) on SolidGoldMagikarp (plus, prompt generation) · 2023-02-06T15:22:24.491Z · LW · GW

What is quite interesting about that dataset is that it contains strings of the form "*number*|*weirdstring*|*number*", a pattern I remember seeing in some methods of training LLMs, where "|" is used as a delimiter between tokens. These could be poisoned training examples, or they could have some weird effect in retrieval.
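
To make the shape of those strings concrete, here is a minimal sketch of how one might pick them out of a text dump. It assumes the form is exactly digits, "|", a run of non-"|" characters, "|", digits; the regex, the helper name `split_delimited`, and the sample lines are all hypothetical and not taken from the dataset.

```python
import re

# Assumed shape: digits, "|", any run of non-"|" characters, "|", digits.
PATTERN = re.compile(r"^(\d+)\|([^|]+)\|(\d+)$")


def split_delimited(line):
    """Return (left_number, middle_string, right_number), or None if the line does not fit."""
    match = PATTERN.match(line.strip())
    if match is None:
        return None
    left, middle, right = match.groups()
    return int(left), middle, int(right)


# Hypothetical example lines, made up for illustration only.
samples = ["12|SolidGoldMagikarp|3051", "no delimiter here", "7|abc"]
parsed = [split_delimited(s) for s in samples]
```

Lines that do not have both delimiters come back as `None`, so the same helper could be used to count how much of a file follows the pattern.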