Alignment's phlogiston
post by Eleni Angelou (ea-1) · 2022-08-18T22:27:31.093Z · LW · GW · 2 comments
Crossposted from the EA Forum: https://forum.effectivealtruism.org/posts/DCtqgsywCRakLvHn6/alignment-s-phlogiston [EA · GW]
Epistemic status: quick-and-dirty reaction to the claim that alignment research is like archaeology. I wouldn't trust anyone suggesting that scientific and technological revolutions can simply be broken down into a series of discoveries. But I like the phlogiston metaphor nevertheless.
In this post [LW · GW], John Wentworth makes the case that alignment research is at a pre-paradigmatic stage. As he states, experts in the field share a "fundamental confusion," and there is no explicit consensus on the nature of the subject matter or on the best ways to approach it. Also characteristic of an immature science is a disunity[1] of frameworks: disagreement over concepts, theories, agendas, practices, methodological tools, and the criteria for what qualifies as having high explanatory force. Such disunity seems to describe the current state of AI safety.
In this post [LW · GW], Adam Shimi compares alignment/AI safety to historical sciences such as archaeology. If the comparison holds, then either alignment is not at a pre-paradigmatic stage or archaeology is not a mature science. But archaeology doesn't suffer from alignment's "fundamental confusion": archaeologists may not be able to employ the same observational tools as researchers in physics or chemistry, but they do share a view of how to study their subject. I very much doubt that the average archaeologist would tell you that they are fundamentally confused about their field and about how to approach the most important questions on their research agenda.
Looking back at the history of science, the field of AI safety seems to have more in common with alchemy. The alchemists were deeply confused about their methods and about how likely they were to succeed. They did, however, share a common aim, summarized in a threefold goal: to find the Stone of Knowledge (the Philosophers' Stone), to discover the medium of Eternal Youth and Health, and to discover the transmutation of metals. Their "science" had the shape of a pre-paradigmatic field that would eventually transform into the natural science of chemistry. Their agenda ceased to be grounded in mystical investigation as the science began to mature.
The claim here is not that alignment has, in any sense, the mystical substrate of alchemy. Rather, it shares alchemy's high uncertainty, combined with attempts to work at the experimental/observational/empirical level that cannot be supported in the way such work is in the physical sciences/STEM. It also shares the intention to find something that does not yet exist and that, once it does, will make the human world substantially and qualitatively different from what it currently is.
It would be very helpful for the progress of alignment research to trace what exactly happened when alchemy became chemistry. Was it the articulation of an equation? Was it the positing and analysis of a substance like phlogiston? If so, we'd need to find alignment's phlogiston, and that would bring us closer to discovering alignment's oxygen.
[1] This paper argues that unity isn't necessary for a science to qualify as mature.
2 comments
comment by Yaakov T (jazmt) · 2022-08-19T09:48:39.256Z · LW(p) · GW(p)
You might find the book Is Water H2O? by Hasok Chang (2012) useful. It was mentioned by Adam Shimi in this post: https://www.lesswrong.com/posts/wi3upQibefMcFs5to/levels-of-pluralism [LW · GW]
comment by Shmi (shminux) · 2022-08-19T06:41:22.616Z · LW(p) · GW(p)
What happened to alchemy was the Chemical Revolution, which followed the earlier Scientific Revolution. The discovery of the composition of air, of oxidation, and of the conservation of mass is what made chemistry a science.
If you compare that with the current state of alignment research, there are several nebulous concepts with no firm grounding in either theory or experiment, and it is quite likely that some essential concepts have not even been formulated yet. To speculate wildly, maybe there is something like "scaling of intelligence" or "conservation of interpretability" or something else that can be put on a firm mathematical/CS footing.