Posts

zchuang's Shortform 2025-03-18T10:29:25.634Z
“AI Risk Discussions” website: Exploring interviews from 97 AI Researchers 2023-02-02T01:00:01.067Z

Comments

Comment by zchuang on Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations · 2025-03-18T10:32:38.303Z · LW · GW

Can you share the difference between the times when it is aware (I know 33% is the floor) and the times when Claude is not aware or is sandbagging?

Comment by zchuang on zchuang's Shortform · 2025-03-18T10:29:25.633Z · LW · GW

The Claude Plays Pokémon discourse makes me feel out of touch, because I don't understand how this is not an update people already had with chess-playing LLMs. If anything, I think the data contamination is worse for Pokémon because of GameFAQs and other guides online.

I think people are mistaking benchmarks for trailheads. Maths benchmarks are important because they theoretically speed up AI research and serve as a benchmark for useful intelligence. Claude playing Pokémon doesn't tell you much, because it's not a map of intelligence nor of generalisation.

Comment by zchuang on China Hawks are Manufacturing an AI Arms Race · 2025-03-07T04:13:22.636Z · LW · GW

Sorry, but why did you connect apathy and lying-flat [sic] with fast-follower culture? Lying flat, or tang ping, is about youth apathy and nihilism about the job market and existential angst about life in the 21st century. It's not about corporate or industrial culture at the top end, or about specific technological political-economy strategies.

Comment by zchuang on Daniel Kokotajlo's Shortform · 2025-03-04T05:52:37.899Z · LW · GW

I don't know if this is helpful, but as someone who was quite good at competitive Pokémon during my teenage years and who still keeps up with Nuzlocke-type things for fun, I would note that Pokémon is designed to be a low-context-intensity RPG, especially in the early generations, where the linearity is pushed so that kids can complete it.

If your point on agency holds true, I think the more important pinch points will be Lavender Town and Sabrina, because those require backtracking through the storyline to get things.

I think mid-to-late-game GSC would also be important to try, because there are huge level gaps and transitions in the storyline that would make it hard to progress.

Comment by zchuang on What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023? · 2023-07-08T23:20:42.109Z · LW · GW

Sorry, but aren't we in a fast-takeoff world at the point of WBE? What's the disjunctive world with WBE but no recursive self-improvement?

Comment by zchuang on Statement on AI Extinction - Signed by AGI Labs, Top Academics, and Many Other Notable Figures · 2023-06-06T06:23:54.317Z · LW · GW

He posted a request on Twitter to talk to people who feel strongly here.

Comment by zchuang on Clarifying and predicting AGI · 2023-05-10T16:46:14.171Z · LW · GW

Yeah, re-reading, I realise I was unclear. Given your claim that "by the time we get to 2000 in that, such AGIs will be automating huge portions of AI R&D," I'm asking the following:

  1. Is the 2000 mark predicated on the automation of things we can't envision now (finding the secret sauce to singularity), on pushing existing things (e.g. AI R&D finding better compute), or on a combination of both?
  2. What's the empirical, on-the-ground, representative modal capability you're seeing at 2025 in your vignette? (E.g. I found the Diplomacy AI super important for grokking what short timelines meant to me.) I guess I'm really asking what you see as the divergence between you and Richard at 2025 that's represented by the difference between 25 and 100.

Hopefully that made the questions clearer.

Comment by zchuang on Clarifying and predicting AGI · 2023-05-10T14:59:25.685Z · LW · GW

Sorry for a slightly dumb question, but in your part of the table you set 2000 as the last entry before singularity, and your explanation is that 2000-second tasks jump to singularity. Is your model of fast takeoff then contingent on "more special sauce for intelligence" being somewhat redundant as a crux, because recursive self-improvement is just much more effective? I'm having trouble envisioning 2000-second tasks + more scaling and tuning --> singularity.

An additional question: what's your model of falsification for, let's say, 25-second tasks vs. 100-second tasks in 2025? Reading your old vignettes, it seems like you really nailed the Diplomacy AI part.

Also, slightly pedantic, but there's a typo on 2029 in Richard's guess.