Posts

Measuring Coherence and Goal-Directedness in RL Policies 2024-04-22T18:26:37.903Z
Measuring Coherence of Policies in Toy Environments 2024-03-18T17:59:08.118Z
Supervised Program for Alignment Research (SPAR) at UC Berkeley: Spring 2023 summary 2023-08-19T02:27:30.153Z

Comments

Comment by dx26 (dylan-xu) on Measuring Coherence of Policies in Toy Environments · 2024-03-19T01:30:05.036Z

Right, I think this roughly corresponds to how long it takes a policy to reach a stable loop (the "distance to loop" metric), which we used in our experiments.

What did you use your coherence definition for?