Your AI Safety focus is downstream of your AGI timeline

post by Michael Flood (michael-flood) · 2025-01-17

Cross-posted from Substack

Feeling intellectually understimulated, I've begun working my way through Max Lamparth's CS120 - Introduction to AI Safety. I'm going to use this Substack as a kind of open journaling practice to record my observations on the ideas presented, both in the lectures and in the readings.

The reading for the first week is Dobbe, Gilbert, and Mintz's 2021 paper "Hard Choices in AI Safety." The paper argues that AI Safety and Governance need to be situated in a 'sociotechnical' milieu - that there is a lot to be learned from sociological studies of complex technologies and systems, and that this should flesh out and expand the narrow (from the authors' POV) focus on technical solutions that they attribute to much contemporary AI Safety research. Essentially, they advocate for treating AI Safety and Governance as political and social issues rather than narrowly technical, mathematical ones.

The authors do not at any point engage with ideas about Artificial General Intelligence, or with how transformative creating generally capable AI agents could be. Rather, their examples are couched in terms of autonomous vehicles, content curation systems, and algorithms perpetuating bias and inequality - call it Everyday AI. Essentially, they treat AI as equivalent to other transformative technologies like electricity or the Internet, rather than as something unprecedented. They imagine AI deployment as the selection and adaptation of a system for a given purpose - a police department adopts a predictive algorithm for patrol routes, a school system purchases a personalized learning curriculum - and then think about how to engage with and gather feedback from the communities affected while mitigating power imbalances between developers, users, and other stakeholders affected by the system's use. They're not thinking of ubiquitous, powerful AI agents, available anytime and anywhere there is an internet connection, autonomously pursuing their own goals.

This was the thing that struck me most about the essay: how much your timeline affects your focus. If you think the rollout of AI follows a gradualist trajectory with linear growth, then it makes sense to focus on adapting governance, norms, and policies, and on ensuring there are robust mechanisms for different groups - especially marginalized ones - to participate. The rollout of autonomous vehicles is the classic case here: regulations and the need for iterative experimentation kept the technology largely confined to a few metropolitan areas, and it has expanded gradually, with demonstrated safety, to new communities. There is a lot of initial hype, but capabilities have not yet caught up to it, so public attention fades. There are community consultations and regulatory reviews, demonstrated safety and clear statistics on accident rates versus human drivers; different political groups get the chance to weigh in, and autonomous vehicles are rolled out gradually.

If, on the other hand, you believe timelines are going to be short - that AI technology will not only be adopted on a classic Kurzweilian exponential curve, but that the capabilities of those systems will improve exponentially as well - then you believe the solutions need to be frontloaded: we need to get AI alignment right at the start, because a mistake could be catastrophic. The classic example here is smartphones and social media, and the distortions in culture, society, and human cognition created by content curation algorithms. Compared to autonomous vehicles, all of that happened very fast: from the introduction of the iPhone to concerns about the mental health of teens within ten years.

Compare designing a failsafe for a nuclear power plant's cooling system with changing zoning laws or educational curricula. The focus (technical versus procedural) is different, as is the risk profile (existential versus local). The nuclear plant's failsafes need to work from the day the plant is built, whereas zoning changes or educational curricula can be reversed - not without cost, but still changeable as the community's needs change.

So, there are two competing priorities: the prevention of worst-case scenarios (e.g., existential risks) and the mitigation of everyday harms (e.g., algorithmic bias in hiring systems). Proponents of focusing on existential risks argue that the stakes are simply too high to ignore—a misaligned AGI could result in catastrophic outcomes for humanity. For them, the precautionary principle dictates that frontloading research and governance to address these risks is essential, even if it diverts resources from addressing smaller-scale, immediate issues.

On the other hand, critics of an exclusive x-risk focus emphasize that prioritizing hypothetical future catastrophes can lead to neglect of the very real and present harms caused by current AI systems. Algorithmic bias, data privacy violations, and the perpetuation of inequality are urgent problems that disproportionately affect marginalized groups and, these critics say, require immediate action and iterative solutions.

Does this have to be a binary choice? It depends on how one judges current AI progress, and on the trajectory one thinks we're on.

It would be nice to think that we could balance the two competing priorities: prevention of worst-case scenarios (e.g. existential risks) and mitigation of everyday harms (e.g. algorithmic bias in hiring systems). But we are in a resource-constrained situation: the major AI labs - and soon national governments - are going all in on capability growth, and AI Safety research is underfunded. Unless there is a huge infusion of funding, personnel, and political focus into AI Safety, the safer horn of the dilemma seems to be to focus on existential risk and try to get things right from the very start.

At the end of my reading, I come back around to Ethan Mollick's perspective, based on my own experiences with AI technologies: progress might hit a wall and advancements might stop, but given how unprepared we are for the changes AI systems are making, and are going to make in the immediate future, I think it is prudent to listen to the warnings about near-term AGI and to plan for it to the fullest extent possible.

Lamparth, the course instructor, says he believes that both research paths can be reconciled, and that one can solve both issues - near-term and long-term - by conducting the same research. I’ll be interested to see how he develops the idea and whether I come to agree with him.
