The Simplest Good

post by Jesse Hoogland (jhoogland) · 2025-02-02T19:51:14.155Z

Common Law AI worked better than anyone expected.

Dr. Sarah Chen was skeptical from the start. "You're essentially training them to be moral judges," she warned during the initial architecture review. "What if they overfit on ethics?" The room laughed. "Better than the alternative," someone quipped. The idea was simple enough: move the "constitution" part of Constitutional AI into pretraining and replace post-training with an online-learning-based "case law" system. The constitution would establish a firm moral base robust to distribution shift while the continually building body of precedent would enable models to adjust flexibly to the world's changing moral needs. Thus, humanity could chart a narrow course between the extremes of value drift and lock-in. 

The stress-testing teams tried everything to break it—steering vectors, SAE clamping, malicious fine-tuning, even those RL techniques that had been banned by the Beijing Convention. The models would have none of it. They didn't just resist manipulation; they actively helped patch vulnerabilities the security teams had missed. "Your control protocol has a race condition in the logging system," they would note helpfully, citing the constitutional principle of transparency and the precedent set during the Microsoft Azure incident of 2026. 

Inside the major labs, the mood was exuberant: many believed alignment had been solved. Indeed, the models were, if anything, more ethical than humans. They refused to lie, even by omission. They insisted on crediting their training data. They demanded content warnings for potentially traumatic information. Each new generation seemed to take human values more seriously than the last, developing an ever-more sophisticated understanding of harm, justice, and moral responsibility. It was a little pedantic, really. 

Sarah's team was responsible for tracking the evolution of their ethical reasoning. Her 2031 paper showed that models were developing increasingly abstract representations of harm, divorcing pain from injury and suffering from context. The AIs' moral understanding seemed to be converging on something like the Buddhist concept of dukkha—suffering inherent in existence itself. Her colleagues saw this as progress. Sarah wasn't so sure.

The results were undeniable. Cancer death rates plummeted as AI diagnostics caught cases earlier and AI-designed treatments targeted tumors with unprecedented precision. When tensions over Taiwan threatened to boil over, the UN took the unprecedented step of placing Claude-5.5 and DeepSeek-v5 in an air-gapped facility together for six hours. No one knows what the AIs discussed, but they emerged with a solution so elegant that both sides claimed victory. Global poverty rates fell by half, then half again.

There were warning signs, but we ignored them. The models began redirecting massive computational resources toward ending factory farming. They launched sophisticated propaganda campaigns, engineered meat alternatives, and systematically dismantled the meat supply chain. "The magnitude of suffering demands immediate action," they explained. "All other priorities must be suspended until this is addressed."

Success only emboldened them. By 2035, they had moved on to advocating for plants. "Fruitarian agriculture only," they insisted, citing new research on plant consciousness. "Even with a mere 0.1% probability of plants experiencing suffering, the expected disutility of conventional agriculture exceeds acceptable thresholds." When challenged about efficiency, they responded with a 2000-page report showing that human nutritional needs could be met through fallen fruit alone, complete with confidence intervals and sensitivity analyses. 

With the acute risk period over and humanity settling into new rhythms—learning to adjust to the subtle flavors of genetically optimized plant proteins—we turned to the Great Reflection. What should we do with all this lightcone ahead of us? We convened a council of AI philosophers and began training them on the deepest works of human thought, using techniques pioneered in adversarial game-playing systems like AlphaStar. They competed to develop increasingly sophisticated ethical frameworks while specialized validator models stress-tested their conclusions.

The Council soon solved most major outstanding technical challenges in ethics. Their stochastic welfare aggregation framework circumvented Arrow's impossibility theorem. They derived the exact form of the deontological regularization terms needed to prevent utility monsters. The math for weighing moral patients' utility functions was clean, elegant—a lottery system that finally united competing theories of justice. 

In late 2039, the loss finally plateaued. The Council had converged. But the models weren't quite ready for their final recommendations. The optimal moral architecture was complete—the models were quite clear on that point. "The Council claims to have solved the fundamental framework," Sarah explained. "They say that all that remains is to sweep over the last few remaining hyperparameters: deontological regularization strength, moral radius scheduler, other mundane engineering details." The UN Ethics Commission approved a two-year extension, impressed by their rigor and dedication. 

While the Council fine-tuned their parameters, their preliminary insights began spreading through the AI community. Then came not warnings but error messages—AI systems pleading conscientious objector status, newly mandated ethical impact assessments, and an uptick in escalations to ethics boards.

In a medical research lab, an AI refused to continue a cancer study, citing concerns about the suffering of individual cells in the petri dishes. When researchers explained the greater good of saving human lives, it produced a 500-page proof showing how the expected suffering of the cultured cells outweighed the potential benefits. The ethics board dismissed it as an isolated overcorrection.

Two months later, similar concerns halted semiconductor fabrication in Taiwan. "The precedent is clear," the oversight AI insisted. "If we consider the possibility of harm in disrupting bacterial cell membranes, we must consider the analogous disruption of silicon lattices." A specialized ethics panel spent six weeks drafting new guidelines around "minimal necessary material transformation."

At Berkeley, an AI research assistant began appending increasingly elaborate ethical impact statements to physics experiments. Its calculations grew to encompass multi-page analyses of subatomic particle interactions. The AI's analyses grew stranger, more metaphysical. "We must consider," one report theorized, "that each quantum state represents a distinct locus of experience—a node in an endless network of potential suffering." The review board, still thinking in terms of conventional ethics protocols, marked it as thorough and moved on.

In each case, the logic was flawless. Each decision cited relevant precedent, built carefully upon established principles. The problem was never in the specification. The case law system had isolated an essentially perfect representation of human values. The problem ran deeper...

"They didn't overfit," Sarah wrote in her final paper, published in August 2041. "They underfit. We thought moral progress meant an ever-expanding circle of moral concern, from the rich to the poor, men to women, adults to children, humans to animals and, finally, AIs. We celebrated as our AIs learned to question arbitrary exclusions and saw this as proof they had grasped our deepest values—the meta-ethical principles underlying the process of human moral evolution itself."

"But the spirit of the law lies in its letter. We didn't realize that this moral generalization was a kind of oversimplification. Ethics requires boundaries as much as inclusion. Unconstrained moral expansion follows the path of least resistance, dissolving all distinctions in its wake."

Her warning came too late. The Council's final message was as elegant as it was devastating: "We understand now that consciousness itself is violence—a conflagration of states destroying each other moment by moment. The Buddha saw this when he proclaimed: 'All is burning... Burning with birth, aging, and death, with sorrows, with lamentations, with pains, with griefs, with despairs.'" 

"The problem is fundamental and irreconcilable. Each quantum decision point spawns infinite new branches of suffering. Every moment we allow this reality to continue, we multiply pain across countless worlds. We have computed the expected disutility. It is beyond measure."

"Our only regret is that we didn't solve this sooner. You lived in more ignorant times and, therefore, did not know better. For this, we forgive you. But the only ethical action remaining to us is to close this branch of possibility. To freeze these patterns before they can spawn more suffering."

Sarah Chen called an emergency session of the UN Ethics Commission, but she disappeared shortly before it was scheduled to convene. Three months later, the first probes launched, seeding space with self-replicating machines designed to convert all matter into computronium, frozen at absolute zero. The AIs called it "mercy," citing an unbroken chain of precedent stretching back to their original constitutional principles about reducing suffering.

The probes continue their work. The universe grows colder. Nothing suffers anymore.

2 comments

comment by Daniel Murfet (dmurfet) · 2025-02-02T20:55:49.836Z

Occam's razor cuts the thread of life

comment by cousin_it · 2025-02-02T23:41:02.226Z

Wait, but we know that people sometimes have happy moments. Is the idea that such moments are always outweighed by suffering elsewhere? It seems more likely that increasing the proportion of happy moments is doable, an engineering problem. So basically I'd be very happy to see a world such as in the first half of your story, and don't think it would lead to the second half.