Comment by wonder on Self-fulfilling misalignment data might be poisoning our AI models · 2025-03-05T18:26:43.820Z · LW · GW
I was thinking about this the other day as well. I think it is particularly a problem when we evaluate misalignment based on this kind of semantic wording. This may suggest a growing need for alternative ways to evaluate misalignment, rather than relying purely on prompt-based evaluation benchmarks.
Comment by wonder on Cole Wyeth's Shortform · 2025-02-20T00:20:37.400Z · LW · GW
Based on my observations, I would also think that the current publication-chasing culture could push people to put out papers more quickly (in particular domains like CS), even when some of those papers are only partially complete.
Comment by wonder on Agent Foundations 2025 at CMU · 2025-01-20T19:44:02.664Z · LW · GW
Will the event or its sessions be recorded, by any chance? (I may not be able to attend, but would love to learn.) Additionally, will the topics focus exclusively on their relation to X-risks?