In essence, this is saying that if the pace of progress is the product of two factors (experiment implementation time, and quality of experiment choice), then AI only needs to accelerate one factor in order to achieve an overall speedup. However, AI R&D involves a large number of heterogeneous activities, and overall progress is not simply the product of progress in each activity. Not all bottlenecks will be easily compensated for or worked around.
I agree with this.
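As a toy illustration of why the functional form matters (the factor names, time shares, and speedup numbers below are my own illustrative assumptions, not from the article): if overall progress were a simple product of factors, accelerating one factor would accelerate the whole; if progress is instead gated by serial activities, the unaccelerated ones dominate.

```python
# Toy model (illustrative assumptions only): compare a "product of factors"
# view of research progress with a serial-bottleneck view.

def product_model(speedups):
    """Overall speedup if progress is the product of independent factors."""
    total = 1.0
    for s in speedups:
        total *= s
    return total

def bottleneck_model(time_shares, speedups):
    """Amdahl-style speedup when activities are serial steps of a pipeline.

    time_shares: fraction of wall-clock time each activity takes (sums to 1).
    speedups:    factor by which each activity is accelerated.
    """
    new_time = sum(share / s for share, s in zip(time_shares, speedups))
    return 1.0 / new_time

# AI accelerates experiment implementation 10x, but experiment choice,
# hardware debugging, and waiting on physical infrastructure are unchanged.
speedups    = [10.0, 1.0, 1.0, 1.0]
time_shares = [0.25, 0.25, 0.25, 0.25]

print(product_model(speedups))                 # 10.0  -- looks like a big win
print(bottleneck_model(time_shares, speedups)) # ~1.29 -- overall pace barely moves
```

Under the second model, the unaccelerated physical work is exactly what ends up setting the overall pace, which is the theme of the rest of this comment.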
I also think that there are some engineering/infrastructure challenges to executing training runs that one would not necessarily cede to AI, not because it wouldn't be desirable, but because it would involve a level of embodiment that is likely beyond the timeline proposed in the AI 2027 thesis. (I do agree with most of the thesis, however.)
I'm not sure there's a research basis (at least none I could find, though I am very open to correction on this point) for the embodiment of AI systems (robotic bodies) being able to keep pace with algorithmic improvement.
While an AI system could likely design a new model architecture and training architecture, it is very human supply chains and technician speed that enable that physical training run to happen at the scales required.
Further, there are hardware challenges to large training runs of AI systems which may not be as readily resolvable by an AI system, due to its lack of exposure to those kinds of physical issues in its inherent reasoning space. (It has never opened a server during a training run and resolved an overheating issue, for instance.)
Some oft-overlooked items involved in training stem from the fact that the labs tend not to own their own data centers, but rather rely on cloud providers. This means they have to contend with:
- Cluster allocation: Scheduling time on thousands of GPUs across multiple cloud providers, reserving time blocks, securing budget, etc. I can easily buy the concept of an AI system recursively self-improving on baked-in infrastructure, but the speed with which its human colleagues could secure additional infrastructure for it is a real constraint. I understand that in the article the model has taken over the 'day to day' operations, but I'm not sure I would characterize a significant training run as a 'day to day' activity. This scheduling goes beyond just 'calling some colos', and potentially involves running additional power and fiber to buildings, construction schedules, etc.
- Topology: Someone has to physically lay out the training network used. This goes beyond the networking per se: it also involves actually moving hardware around, building in redundancies in the data hall (extra PDUs, etc.), running networking cable, putting mitigations in place for transients, and so on. All of this requires technicians, parts, hardware, etc. Lead times for some of those parts exceed the proposed timeline to ASI.
- Hardware/Firmware Validation: People physically have to check the server infrastructure, the hardware, and the firmware, and ensure that all of the cards are up to date, etc. Moving at speed in AI, a lot of 'second-hand' or 'relocated' servers and infrastructure tend to be used. It is not a small task to catalogue all of that and place it into a DCIM framework.
- Stress Testing: Running large power loads to check thermal limits, power draw, and inter-GPU comms. Parts fail here routinely, requiring replacement, etc.
- Power: Assuming that in the proposed timeline compute remains linked to power, we are looking at a generational data center capability issue (a rough arithmetic sketch follows after this list).
- The data centers under construction now, set to come online in the 2029/2030 timeframe, will be the first to use Blackwell GPUs at scale.
- This then implies that to achieve the 2027 timeline, we would have to stretch Hoppers and existing power infrastructure to the point that these improvements emerge out of existing physical hardware.
- I do tend to agree that if we were unconstrained by power and physical infrastructure, there is no algorithmic reason at all to believe we could not achieve ASI by 2027; however, the infrastructure challenges are absolutely enormous.
- Land with sufficient water and power (including natural gas lines for onsite power) isn't available. Utilities in the US are currently restricting power access to data centers by leveraging significant power tariffs and long-term take-or-pay commitments (the AEP decision in Ohio, for instance). This makes life harder for colocation providers in terms of financing and siting large infrastructure.
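To make the "generational capability issue" concrete, here is the back-of-the-envelope sketch promised above. The per-GPU power draw, overhead fraction, and PUE are rough ballpark assumptions of mine, not figures from the article.

```python
# Rough sketch of the power problem (all numbers are ballpark assumptions
# for illustration, not figures from the article).

GPU_POWER_KW = 0.7   # ~700 W per H100 SXM under load (assumed)
OVERHEAD     = 0.5   # CPUs, networking, storage per GPU, as a fraction (assumed)
PUE          = 1.3   # data center power usage effectiveness (assumed)

def campus_power_mw(num_gpus):
    """Approximate facility power needed to run a training cluster."""
    it_load_kw = num_gpus * GPU_POWER_KW * (1 + OVERHEAD)
    return it_load_kw * PUE / 1000.0

for n in (50_000, 100_000, 500_000):
    print(f"{n:,} GPUs -> ~{campus_power_mw(n):,.0f} MW of facility power")
# Roughly 70, 140, and 680 MW respectively: hundreds of megawatts is
# new-substation (or new-generation) territory, with multi-year utility
# interconnection queues; hence the 2029/2030 campus timelines above.
```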
I believe that an AI system could well be a match for the data cleaning and validation, and even the launch and orchestration using Slurm, Kubernetes, or similar, but the initial launch phase is also something that I think will be slowed by the need for human hands.
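To be clear about what I think the AI could plausibly do itself, here is a minimal sketch of that kind of launch orchestration: composing and submitting a multi-node Slurm job. The partition name, wall-clock limit, and training entry point are hypothetical placeholders.

```python
# Minimal sketch (my assumptions) of an AI system composing and submitting a
# multi-node Slurm training job. Partition, time limit, and the train.py
# entry point are hypothetical placeholders.
import subprocess

SBATCH_SCRIPT = r"""#!/bin/bash
#SBATCH --job-name=pretrain-run
#SBATCH --partition=gpu-cluster        # hypothetical partition name
#SBATCH --nodes=256
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:8
#SBATCH --time=14-00:00:00             # two-week reservation

# Use the first allocated node as the rendezvous host for torchrun.
head_node=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)

srun torchrun \
    --nnodes="$SLURM_NNODES" \
    --nproc_per_node=8 \
    --rdzv_backend=c10d \
    --rdzv_endpoint="$head_node":29500 \
    train.py --config configs/run.yaml  # hypothetical training entry point
"""

def submit(script: str) -> str:
    """Submit the job via sbatch (which accepts the script on stdin)."""
    result = subprocess.run(["sbatch"], input=script, text=True,
                            capture_output=True, check=True)
    return result.stdout.strip()   # e.g. "Submitted batch job 123456"

if __name__ == "__main__":
    print(submit(SBATCH_SCRIPT))
```

None of that requires hands; what follows usually does.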
This launch phase results in:
- Out-of-memory errors on GPUs, which can often only be resolved by 'turning a wrench' on the server.
- Unexpected hardware failures (GPUs breaking, NVLink failures, network timeouts, cabling issues, fiber optic degradation, power transients, etc.). All of these require human technicians.
These errors are also insidious, because the software running the training can't tell what impact these failures have had on which parts of the network are being trained and which aren't. This would make it challenging for an AI director to really understand what was causing issues in the desired training outcome. It also makes it unlikely that a runaway situation would take place, where a model just recursively self-improves on a rapid timeline without human input, unless it first cracked the design and mass manufacture of embodied AI workers that could move and act as quickly as it can.
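For what the software side of this looks like, here is a minimal sketch of the usual mitigation: checkpoint often and roll back when a collective operation fails. It assumes PyTorch, and the model, data loader, and checkpoint helpers are hypothetical. Nothing in it can tell you which link, PDU, or GPU actually broke; that still takes a person on the data hall floor.

```python
# Minimal sketch of software-side fault handling during a long training run:
# checkpoint often, roll back on collective failures. Assumes PyTorch; the
# model, data loader, and checkpoint helpers passed in are hypothetical.
import torch

def train_with_restarts(model, optimizer, data_loader, max_steps,
                        save_checkpoint, load_checkpoint, ckpt_every=500):
    step = load_checkpoint(model, optimizer)   # resume from the last good step
    data_iter = iter(data_loader)
    while step < max_steps:
        try:
            batch = next(data_iter)
            loss = model(batch).mean()   # hypothetical: model returns per-sample loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            step += 1
            if step % ckpt_every == 0:
                save_checkpoint(model, optimizer, step)
        except torch.cuda.OutOfMemoryError:
            # Sometimes a symptom of a degraded node rather than batch size;
            # either way, someone may have to physically inspect the server.
            raise RuntimeError(f"OOM at step {step}: paging the on-call technician")
        except RuntimeError as err:
            # NCCL watchdog timeouts, ECC errors, dropped links, etc. all
            # surface here as opaque runtime errors; the software can only
            # roll back, it cannot tell which GPU, cable, or PDU failed.
            print(f"Collective failure at step {step}: {err}; restarting from checkpoint")
            step = load_checkpoint(model, optimizer)
            data_iter = iter(data_loader)
    return step
```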
A good case study on this is the woes faced by OpenAI in training GPT-4.5, where all of this came to a head, turning a training run scheduled for a month or two into one that stretched over a year. OpenAI spoke very openly about this in a YouTube video they released.
What's more, at scale, if we are going to rely on existing data centers for a model of this sophistication, we'd have the model split across multiple clusters, potentially in multiple locations. This introduces latency issues, etc.
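A rough sense of why geography matters for a synchronous run (the payload size, link speeds, and latencies below are illustrative assumptions of mine; real systems overlap communication with compute and use hierarchical reductions, so the point is the relative gap, not the absolute step time):

```python
# Illustrative estimate of one synchronous gradient all-reduce under a simple
# ring algorithm, comparing in-cluster links with a cross-site WAN hop.
# All numbers are assumed ballpark figures for illustration.

def ring_allreduce_seconds(payload_gb, num_workers, bandwidth_gbps, latency_s):
    """Classic ring all-reduce cost: ~2*(N-1)/N of the payload crosses the
    slowest link, plus 2*(N-1) sequential per-hop latency terms."""
    payload_bits = payload_gb * 8e9
    transfer = 2 * (num_workers - 1) / num_workers * payload_bits / (bandwidth_gbps * 1e9)
    latency = 2 * (num_workers - 1) * latency_s
    return transfer + latency

GRAD_GB = 400   # gradients for a ~200B-parameter model in bf16 (assumed)
N = 8           # data-parallel groups participating in the ring (assumed)

in_cluster = ring_allreduce_seconds(GRAD_GB, N, bandwidth_gbps=400, latency_s=5e-6)
cross_site = ring_allreduce_seconds(GRAD_GB, N, bandwidth_gbps=10, latency_s=30e-3)
print(f"in-cluster sync: ~{in_cluster:.0f} s, cross-site sync: ~{cross_site:.0f} s")
# ~14 s vs ~560 s with these assumptions; a sync that is tolerable inside one
# building becomes the dominant cost when the model is split across regions.
```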
That's the part that, to me, is missing from the near-term timeline. I think the thesis around zones created just to build power and data centers, following ASI, seems very credible, especially with that level of infiltration of government/infrastructure.
I don't however see a way of getting to a model capable of ASI with current data center infrastructure, prior to the largest new campuses coming online, and power running to Blackwell GPUs.
I think history is a good teacher when it comes to AI in general, especially AI that we did not fully understand at the time of deployment (and perhaps still do not).
I too feel a temptation to imagine that a USG AGI would hypothetically align with US ideals, and likewise that a CCP AGI would align with CCP ideals.
That said, given our lack of robust knowledge of what alignment with any set of ideals would look like in an AGI system, and of how we could assure it, I struggle to have any certainty that these systems would align with anything the USG or CCP would find desirable at all. Progress is being made in this area by Anthropic, but I'd need to see that move forward significantly.
One can look at current-gen LLMs like DeepSeek, see that it is censored to align with CCP concepts during fine-tuning, and perhaps see that as predictive. I find it doubtful that some fine-tuning would be sufficient to serve as the moral backbone of an AI system capable of AGI.
Which speaks to history. AI systems tend to be very aligned with their output task. The largest and most mature networks we have are Deep Learning Recommendation Models (DLRMs), deployed by social media entities to keep us glued to our phones.
The intention was to serve engaging content to people; the impact was to flood people with content that is emotionally resonant but not necessarily accurate. That has arguably led to increased polarization, radicalization, and increased suicide rates, primarily among young women.
While it would be tempting to say that social media companies don't care, the reality is that these DLRMs are very difficult to align. They are trained using RL and the corpus of their interactions with billions of daily users. They reward hack incessantly, and in very unpredictable ways. This leads to most of the mitigating actions being taken downstream of the recommendations (content warnings, etc.), not out of design, but out of the simple fact that the models that are best at getting users to keep scrolling are seldom the best at serving accurate content.
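A toy sketch of why the objective and the intention come apart (nothing here is a real DLRM; the items and scores are invented purely for illustration):

```python
# Toy illustration (not a real DLRM) of reward misspecification in a
# recommender: when the training signal rewards an engagement proxy, the
# item that maximizes reward need not be the accurate one.

# Hypothetical candidate items: engagement proxy score and whether it's accurate.
catalog = [
    {"id": "measured_news",   "engagement": 0.31, "accurate": True},
    {"id": "outrage_clip",    "engagement": 0.92, "accurate": False},
    {"id": "health_advice",   "engagement": 0.45, "accurate": True},
    {"id": "conspiracy_post", "engagement": 0.88, "accurate": False},
]

def engagement_reward(item):
    """The deployed objective: a proxy for 'kept the user scrolling'."""
    return item["engagement"]

def accuracy_aware_reward(item, penalty=1.0):
    """The 'intended' objective, for which nobody has a clean training signal."""
    return item["engagement"] - penalty * (not item["accurate"])

print(max(catalog, key=engagement_reward)["id"])      # outrage_clip
print(max(catalog, key=accuracy_aware_reward)["id"])  # health_advice
# Downstream mitigations (content warnings, etc.) act on the output of the
# first objective; they do not change the objective that produced it.
```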
Currently, I think both flavors of AGI present the same fundamental risks. No matter the architecture, one cannot expect human-like values to emerge inherently in AI systems, and we don't understand particularly well the drivers of those values within humans, or how they lead to party lines/party divisions.
Without that understanding, we're shooting in the dark. It would be awfully embarrassing if both systems, instead of flag-waving, aligned on dolphin species propagation.