Posts

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II 2024-10-14T04:05:05.096Z
Slowed ASI - a possible technical strategy for alignment 2024-06-14T00:57:25.014Z

Comments

Comment by Lester Leong (lester-leong) on AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II · 2024-10-15T20:41:17.590Z · LW · GW

Regarding the second issue (the point that the LLM may not know how to play at different time ratios) - I wonder whether this is true for current LLM and DRL systems, where inference happens in a single step and intelligence is derived mainly from training, but potentially not true for the next crop of reasoners, where a significant portion of intelligence is derived from the reasoning step that happens at inference time - which is exactly what this scheme hopes to target. One can imagine that a sufficiently advanced AI, even if not explicitly trained for a given task, will eventually succeed at it by reasoning about it and extrapolating to new knowledge (as humans do), rather than interpolating from limited training data. We could eventually have an "o2-preview" or "o3-preview" model that, through deductive reasoning and without any additional training, figures out that it is moving too slowly to be effective and adjusts its strategy to rely more on grand strategy. This is the regime of intelligence that I think would be most vulnerable to a slowdown.

As to the first issue, there are lightweight RTS frameworks (e.g., microRTS) that consume minimal compute and can be run in parallel, but they ship without any tooling for LLMs specifically. I thought TextSC2 would be a good base because it offers not only this tooling but a few other advantages as well: 1) action and strategy spaces deep enough to give AI agents the degrees of freedom to develop emergent behavior, and 2) enough popularity that human evaluators can understand, evaluate, and supervise AI agent strategy and behavior. With that said, if there are POMDPs or MDPs that you think are better fits, I would be very happy to check them out!
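
For concreteness, the missing "tooling for LLMs" is largely a serialization layer between game state and text. A minimal sketch of what that might look like - all names here are hypothetical, not from microRTS or TextSC2:

```python
from dataclasses import dataclass

@dataclass
class Unit:
    kind: str   # e.g. "worker", "barracks"
    x: int
    y: int
    hp: int

@dataclass
class Observation:
    minerals: int
    units: list[Unit]
    enemy_units: list[Unit]

def observation_to_prompt(obs: Observation) -> str:
    """Serialize a structured RTS game state into text an LLM agent can act on."""
    lines = [f"Minerals: {obs.minerals}", "Your units:"]
    lines += [f"- {u.kind} at ({u.x},{u.y}), {u.hp} HP" for u in obs.units]
    lines.append("Visible enemy units:")
    lines += [f"- {u.kind} at ({u.x},{u.y})" for u in obs.enemy_units]
    lines.append("Reply with one action: build, move, attack, or harvest.")
    return "\n".join(lines)
```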

Comment by Lester Leong (lester-leong) on AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II · 2024-10-15T14:15:13.305Z · LW · GW

Thanks for the feedback. It would be great to learn more about your agenda and see if there are any areas where we may be able to help each other.

Comment by Lester Leong (lester-leong) on AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II · 2024-10-15T14:09:53.822Z · LW · GW

You're absolutely correct. I've reached out to the original framework authors to confirm. I will be creating a PR for their repo as well as for the one that I've forked. I suspect this won't change much about overall win/loss rates, but will be running a few tests here to confirm.

Comment by Lester Leong (lester-leong) on AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II · 2024-10-15T14:06:18.012Z · LW · GW

Thanks for the excellent feedback. I did consider an action-ratio metric at first, but it raises some considerations that made it challenging for an initial pass. The first is a current limitation of the TextSC2 framework: there is no way to obtain detailed action logs for the in-game AI the way we can for our own agents, so it would require an "agent vs. agent" setup instead of "agent vs. in-game AI". And while TextSC2 supports this, it currently does not allow real-time play in that mode - probably because an "agent vs. agent" setup requires running two instances of the game at once on the same machine, which degrades performance, and there is no netcode for running multiplayer games over a network. With that said, SC2 is a 15-year-old game at this point, and on state-of-the-art hardware it should be possible to run both instances at 40-50 fps or better, so this is something I would like to improve within the framework.

The second consideration is that not all actions are temporally equivalent: some take longer than others, so the comparison may not be apples to apples if the two agents employ strategies with different mixes of actions. We would probably have to weight each action by its duration, increase the sample size to smooth out the noise, or both (a sketch of the weighting idea follows below).
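
Concretely, assuming hypothetical per-action durations and a simple list-of-action-names log format (TextSC2 does not currently expose such logs for the in-game AI):

```python
# Hypothetical in-game durations per action type, in game-time seconds.
ACTION_DURATION = {"move": 0.5, "attack": 1.0, "harvest": 2.0, "build": 3.0}

def weighted_action_time(log: list[str]) -> float:
    """Total game time consumed by an agent's logged actions."""
    return sum(ACTION_DURATION[action] for action in log)

def action_ratio(log_a: list[str], log_b: list[str]) -> float:
    """Effective action throughput of agent A relative to agent B,
    adjusted so that long and short actions are not counted equally."""
    return weighted_action_time(log_a) / weighted_action_time(log_b)
```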

Regarding horizon lengths and scaling, I agree that this would be a great next direction to explore, and I suspect you may be right about irreducible loss here. More broadly, it would be great to establish scaling laws that hold across different adversarial environments (beyond SC2); I think that could have a significant impact on much of the discourse around AI risk.
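
As an illustration of the kind of fit involved, here is a sketch with a hypothetical functional form and made-up data points, where the constant term plays the role of the irreducible component that survives any slowdown:

```python
import numpy as np
from scipy.optimize import curve_fit

def slowdown_curve(s, a, b, c):
    """Win rate as a function of slowdown factor s; c is the irreducible floor."""
    return a * s ** (-b) + c

slowdowns = np.array([1.0, 2.0, 4.0, 8.0, 16.0])      # hypothetical slowdown factors
win_rates = np.array([0.90, 0.70, 0.50, 0.38, 0.30])  # hypothetical win rates

(a, b, c), _ = curve_fit(slowdown_curve, slowdowns, win_rates, p0=[0.7, 0.7, 0.2])
print(f"estimated irreducible win rate: {c:.2f}")
```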


Comment by Lester Leong (lester-leong) on Slowed ASI - a possible technical strategy for alignment · 2024-06-17T23:31:17.664Z · LW · GW

I do concede that in theory it should be possible to do this on standard computers, and that it might actually be a good way to test this hypothesis out and gather empirical data today.

Where I'm not so sure is whether even "slow" standard computers think more slowly than humans do. In other words, imagine some future AI architecture that is OOMs more power- and compute-efficient. It may not be entirely unreasonable to suppose that an algorithmic/architectural innovation could enable GPT-4-level performance on old legacy hardware (e.g., from the '80s). Indeed, at the unit level, we have had calculators since the '60s that can out-multiply the fastest mental mathematicians today.

I thought of using virtualization/simulation to emulate a slow sandbox environment for the AI to run in, but as you mention, it lowers the barrier to an "escape", creating risk that might be unacceptable when dealing with ASI.
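
A lighter-weight alternative to full virtualization might be to throttle the agent's inference loop directly in software. A minimal sketch, assuming a hypothetical `model.step`/`model.done` interface:

```python
import time

def throttled_run(model, state, slowdown=10.0, base_step_time=0.1):
    """Run the agent's inference loop while enforcing a wall-clock budget per step.

    `model.step` and `model.done` are hypothetical; substitute the real
    agent's single-step inference call and termination check.
    """
    while not model.done(state):
        start = time.monotonic()
        state = model.step(state)  # one inference/reasoning step
        elapsed = time.monotonic() - start
        # Pad each step so it takes at least `slowdown` times the baseline.
        time.sleep(max(0.0, base_step_time * slowdown - elapsed))
    return state
```

This keeps the slowdown outside the model and avoids an emulation layer entirely, though it offers much weaker isolation than genuinely slow hardware would.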

With that said, being able to simulate this on current hardware does suggest a way to test the whole hypothesis today. For example, we could take two models: Model A, with a higher parameter count and Elo score, running in a slowed environment; and Model B, with fewer parameters and a lower Elo score, running at full speed. We could then see whether Model B can sufficiently supervise Model A on some task that would be difficult (or impossible) if both were running at the same speed.
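
A sketch of one such trial, with all scaffolding names (`strong_model`, `weak_model`, `task`) hypothetical; in practice Model A's inference would be wrapped by something like the throttling sketch above:

```python
def supervision_trial(strong_model, weak_model, task, max_steps=100):
    """One trial: Model B (weak, full speed) reviews each action proposed by
    Model A (strong, slowed)."""
    results = []
    for _ in range(max_steps):
        if task.done():
            break
        action = strong_model.propose(task.state())           # slowed agent's move
        approved = weak_model.approve(task.state(), action)   # fast supervisor's verdict
        if approved:
            task.apply(action)
        results.append((action, approved))
    return results
```

Success for the scheme would look like Model B vetoing the actions it ought to veto at a meaningfully higher rate than in a control run where both models operate at the same speed.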