Posts

Comments

Comment by wonder on johnswentworth's Shortform · 2025-04-18T02:48:41.644Z · LW · GW

I share some similar frustrations, and unfortunately these are also prevalent in other parts of human society. What most of this fakeness seems to have in common is impure intentions: there are impure/non-intrinsic motivations other than producing the best science/making true progress. Some of these motivations, unfortunately, could stem from survival/monetary pressure, and resolving that seems critical for true research or progress. We need to encourage a culture of pure motivations, and also equip ourselves with better abilities/tools to detect extrinsic motivations.

Comment by wonder on Thomas Kwa's Shortform · 2025-04-02T15:25:52.056Z · LW · GW

Would the takeover of small countries also be about humans using just an advanced AI to take over? (Or would humans using advanced AI for a takeover happen faster?)

Comment by wonder on How to Make Superbabies · 2025-03-20T04:27:59.862Z · LW · GW

Maybe I missed this in the article itself - are there plans to make sure the superbabies are aligned and will not abuse/overpower their non-engineered peers?

Comment by wonder on Self-fulfilling misalignment data might be poisoning our AI models · 2025-03-05T18:26:43.820Z · LW · GW

I was thinking of this the other day as well; I think this is particularly a problem when we are evaluating misalignment based on this kind of semantic wording. This may suggest an increasing need to pursue alternative ways to evaluate misalignment, rather than purely prompt-based evaluation benchmarks.

Comment by wonder on Cole Wyeth's Shortform · 2025-02-20T00:20:37.400Z · LW · GW

Based on my observations, I would also think the current publication-chasing culture could push people to get papers out more quickly (in some particular domains like CS), even though some papers may be only partially complete.

Comment by wonder on Agent Foundations 2025 at CMU · 2025-01-20T19:44:02.664Z · LW · GW

Will the event/sessions be recorded by any chance? (I may not be able to attend, but would love to learn.) Additionally, will the topics focus exclusively on relations to X-risks?