Posts

ARENA 4.0 Impact Report 2024-11-27T20:51:54.844Z
AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0 2024-07-06T11:34:57.227Z

Comments

Comment by Chloe Li (chloe-li-1) on AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0 · 2024-07-06T22:26:01.192Z

It’s a fast-growing and important field right now: there is urgency to make progress on evals, and there has been a rapid increase in both technical safety-eval roles at AI labs and governance roles. This need and capacity for safety evals make eval skills valuable for people who want to contribute to safety now. Many methods have already been developed and there are relevant engineering skills to build, but there are also many pitfalls that can produce false or misleading results. We thought the latter was an especially important reason for a good curriculum to exist.

Comment by Chloe Li (chloe-li-1) on Linear encoding of character-level information in GPT-J token embeddings · 2023-12-12T00:27:16.791Z

We show that linear probes can retrieve character-level information from token embeddings, and we perform interventional experiments showing that the model uses this information to carry out character-level tasks.

These two links require access permission.
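For readers unfamiliar with probing, here is a minimal sketch of what a linear probe on embeddings looks like, in plain Python. It is not the authors' code and does not use GPT-J: the "embeddings" come from a hypothetical `make_synthetic_embeddings` helper that plants a binary character-level feature (for example, "this token contains a given letter") along one hidden direction in 16-dimensional Gaussian noise, as an assumed stand-in for real token embeddings, which are much wider. The probe is ordinary logistic regression trained with SGD and scored on held-out data. The interventional experiments are not sketched here.

```python
import math
import random

DIM = 16  # hypothetical embedding width; real GPT-J embeddings are much wider


def make_synthetic_embeddings(n, seed=0):
    """Hypothetical stand-in data: Gaussian noise vectors, shifted along one fixed
    hidden direction when the binary feature (e.g. 'contains the letter') is on."""
    rng = random.Random(seed)
    direction = [rng.gauss(0, 1) for _ in range(DIM)]
    norm = math.sqrt(sum(d * d for d in direction))
    direction = [d / norm for d in direction]  # unit vector
    data = []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        x = [rng.gauss(0, 1) for _ in range(DIM)]
        if label:
            x = [xi + 4.0 * di for xi, di in zip(x, direction)]
        data.append((x, label))
    return data


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_linear_probe(data, epochs=20, lr=0.1, seed=1):
    """Logistic-regression probe (weights w, bias b) trained with plain SGD on log loss."""
    rng = random.Random(seed)
    w = [0.0] * DIM
    b = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - y  # derivative of log loss with respect to the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def accuracy(w, b, data):
    """Fraction of examples where the probe's thresholded logit matches the label."""
    correct = 0
    for x, y in data:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += pred == y
    return correct / len(data)


if __name__ == "__main__":
    data = make_synthetic_embeddings(2000)
    train, test = data[:1600], data[1600:]  # held-out split for honest evaluation
    w, b = train_linear_probe(train)
    print(f"held-out probe accuracy: {accuracy(w, b, test):.3f}")
```

High held-out accuracy only shows that the feature is linearly decodable from the vectors; showing that a model actually uses it requires an intervention on the embeddings, which is the step the comment describes separately.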