What are some good language models to experiment with?

post by tailcalled · 2023-09-10T18:31:50.272Z · LW · GW · No comments

This is a question post.


For example, if I want to experiment with a steering technique, it would be useful to have a language model that is small and capable, but not so heavily finetuned that it becomes inflexible. (Or ideally, a model that has both a finetuned and a non-finetuned variant.)

I've seen some people use GPT-2. Is that recommended? Are there any alternatives?
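(For concreteness, the kind of steering experiment in question: adding a vector to a layer's activations during the forward pass. Here is a minimal sketch using a PyTorch forward hook on a toy two-layer network; the toy model, the random steering vector, and the layer choice are all stand-ins. With a real model like GPT-2 you would register the same kind of hook on an actual transformer block, e.g. `model.transformer.h[6]`.)

```python
# Minimal activation-steering sketch: intervene on a layer's output via a
# forward hook. The toy MLP below stands in for a transformer block.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))

# In practice this direction would come from e.g. contrast pairs or probing;
# here it is just a random placeholder vector.
steering_vector = torch.randn(8)

def steer(module, inputs, output):
    # Returning a value from a forward hook replaces the module's output.
    return output + steering_vector

x = torch.randn(1, 8)
baseline = model(x)

handle = model[0].register_forward_hook(steer)
steered = model(x)
handle.remove()  # model behaves normally again after removing the hook
```

The appeal of a small, not-too-finetuned model is that this kind of intervention is cheap to iterate on, and the model's behavior is flexible enough for the steering direction to actually move it.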

Answers

answer by LawrenceC · 2023-09-10T23:06:09.548Z · LW(p) · GW(p)

If you care about having both the instruction-finetuned variant and the base model, I think I'd go with one of the smaller LLaMAs (7B/13B). Importantly, they fit on one 40/80 GB A100 comfortably, which saves a lot of hassle. There's also a bajillion fine-tuned versions of them if you want to experiment. 

answer by Tao Lin · 2023-09-10T21:15:24.822Z · LW(p) · GW(p)

Pythia is meant for this.

comment by LawrenceC (LawChan) · 2023-09-10T23:03:36.774Z · LW(p) · GW(p)

Aren't the larger Pythias pretty undertrained?
