Comments
Thank you for an interesting post! I have only skimmed it so far and have not really dug into the mathematics section, but the way you are framing logic somewhat reminds me of Dewey, J. (1938). Logic: The Theory of Inquiry. Henry Holt and Company, Inc.
Are you by any chance familiar with this work, and could you elaborate on possible continuities and discontinuities?
Thanks for the reply. I wanted to get at something slightly different, though.
I think that a key insight of traditions that work with "judgmentless/reactionless noticing" is that we humans tend to be "obsessive" problem solvers who are prone to getting tangled up in our own attempts at problem solving. Sometimes trying to solve problems can itself become the problem. On some level, I appreciate that your techniques may actually help to guard against this, but on another level I wonder whether this is bought at the price of becoming boxed into a restrictive problem-solving mindset that is unable to notice its own limitations.
Just throwing this out there and wondering what reactions this turns up.
Thank you for an interesting post. I noticed some confusion while reading it and thought it might be worthwhile to share. When I think of "noticing", I think of meditation and cultivating awareness. A key aspect of those traditions is that they advocate the value of avoiding automated reactions to experience by "simply" noticing it. Your approach to noticing seems to advocate the opposite: training automated reactions that are triggered by noticing. How do you think about the relationship between these different perspectives? Can it inform us about potential failure modes that your approach might hold?
- We train an LLM to be an expert on AI design and wisdom. We might do this by feeding it AI research papers and "wisdom texts", like principled arguments about wise behavior and stories of people behaving wisely, over and above those base models already have access to, and then fine tuning to prioritize giving wise responses.
- We simultaneously train some AI safety researchers to be wiser.
- Our wise AI safety researchers use this LLM as an assistant to help them think through how to design a superintelligent AI that would embody the wisdom necessary to be safe.
- Iterate as necessary, using wisdom and understanding developed with the use of less wise AI to train more wise AI.
At first, I wanted to dismiss this as not addressing the problem at all, but on second thought I think a key insight here may be that adding a focus on improving the wisdom of the relevant parties involved in AI development could help to bootstrap more trustworthy "alignment verification" capacities.
However, I am not sure that something like this would fly in our economically oriented societies, since I would expect that wiser people would decline to develop superintelligent AI for the foreseeable future and rather urge us to look inward for solutions to most of our problems (almost all of our problems are man-made, after all). Having said this, if we were to get a regime in place that could reliably ensure that "wisdom" plays a key role in decision making around AI development, this seems like as good a bet as any to help us deal with our predicament.
If we can train AI to be wise, it would imply an ability to automate training, because if we can train a wise AI, then in theory that AI could train other AIs to be wise in the same way wise humans are able to train other humans to be wise. We would only need to train a single wise AI in such a scheme who could pass on wisdom to other AIs.
I think this is way too optimistic. Having trained a wise person or AI once does not mean that we have fully understood what we did to get there, which limits our ability to reproduce it. One could argue that recreation may be possible in the context of fully reproducible AI training pipelines, or that a wise AI could be copied, but we shouldn't simply assume this. The world is super complex and always in motion. Nothing is permanent. What has worked in one context may not work in another. Agents considered wise at one point may not be at another, and agents that actually were wise in hindsight may not be recognized as such at the time.
In addition, producing one wise AI does not necessarily imply that this wise AI can effectively pass on wisdom at the required scale. It may have a better chance than non-wise AIs, but we shouldn't take success as a given if all we have managed is to produce one wise AI. There are many forces at play that could subvert or overcome such efforts, in particular in race situations.
My gut feeling is that transmission of wisdom is somewhat of a coordination game that depends on enclaves of relatively wise minds cross checking, challenging, and supporting each other (i.e., Thich Nhat Hanh's “the next Buddha will be a Sangha”). Following this line of logic, the unit of analysis should be the collective or even ecology of minds and practices rather than the "single" wise AI. I acknowledge that this is more of an epistemic rather than ontological distinction (e.g., one could also think of a complex mind as a collective as in IFS) but I think it's key to unpack the structure of wisdom and how it comes about rather than thinking of it as "simply" a nebulous trait that can and needs to be copied.
To be honest, I am pretty confused by your argument and I tried to express one of those confusions with my reply. I think you probably also got what I wanted to express but chose to ignore the content in favor of patronizing me. As I don't want to continue to go down this road, here is a more elaborate comment that explains where I am coming from:
First, you again make a sweeping claim that you do not really justify: "Many (perhaps most) famous 'highly recognized' philosophical arguments are nonsensical". What are your grounds for this claim? Do you mean that it is self-evident that much (perhaps most) of philosophy is bullshit? Or do you have a more nuanced understanding of "nonsensical"? Are you referring to Wittgenstein here?
Then you position this unjustified claim as a general prior to argue that your own position in a particular situation is much more likely to be valid than the alternative. Doesn't that seem a little bit like cherry-picking to you?
My critique of the post and your comments boils down to the fact that both are very quick to dismiss other positions as nonsensical and, by doing so, claim their own perspective/position to be superior. This is problematic because, although certain positions may seem nonsensical to you, they may make perfect sense from another angle. While this problem cannot be solved in principle, in practice it calls for investing at least some effort and resources into recognizing potentially interesting/valid perspectives and, in particular, staying open to the possibility that one may not have considered all relevant aspects and reorienting accordingly. I will list a couple of resources that you can check out if you are interested in a more elaborate argument on this matter.
* Stegmaier, W. (2019). What Is Orientation? A Philosophical Investigation. De Gruyter.
* Ulrich, W. (2000). Reflective Practice in the Civil Society: The contribution of critically systemic thinking. Reflective Practice, 1(2), 247–268. https://doi.org/10.1080/713693151
* Ulrich, W., & Reynolds, M. (2010). Critical Systems Heuristics. In M. Reynolds & S. Holwell (Eds.), Systems Approaches to Managing Change: A Practical Guide (pp. 243–292). Springer London. https://doi.org/10.1007/978-1-84882-809-4_6
Since a lot of arguments on internet forums are nonsensical, the fact that your comment doesn't make sense to me means that it is far more likely that it doesn't make sense at all than that I am missing something.
That’s pretty ironic.
I downvoted this post because the whole setup is strawmanning Rawls's work. To claim that a highly recognized philosophical treatment of justice, one that has inspired countless discussions and professional philosophers, doesn't "make any sense" is an extraordinary claim that should ideally be backed by a detailed argument and evidence. However, to me the post seems hand-wavy and more like armchair philosophizing than detailed engagement. Don't get me wrong, feel free to do that, but please make clear that this is what you are doing.
Regarding your claim that the veil of ignorance doesn’t map to decision making in reality, that’s obvious. But that’s also not the point of this thought experiment. It’s about how to approach the ideal of justice and not how to ultimately implement it in our non-ideal world. One can debate the merits of talking and thinking about ideals but calling it “senseless” without some deeper engagement seems pretty harsh.
Hey Kenneth,
thanks for sharing your thoughts. I don't have much to say about the specifics of your post because I find it somewhat difficult to understand how exactly you want an AI (what kind of AI?) to internalize ethical reflection and what benefit the concept of the ideal speech situation (ISS) has here.
What I do know is that the ISS has often been characterized as an "impractical" concept that cannot be put into practice because the ideal it seeks simply cannot be realized (e.g., Ulrich, 1987, 2003). This may be something to consider or dive deeper into to see whether it affects your proposal. I personally like the work of Werner Ulrich on this matter, which heavily inspired my PhD thesis on a related topic. I have put one of the papers from the thesis in the reference section. Feel free to reach out via PM if you want to discuss this further.
References
Herwix, A. (2023). Threading the Needle in the Digital Age: Four Paradigmatic Challenges for Responsible Design Science Research. SocArXiv. https://doi.org/10.31235/osf.io/xd423
Ulrich, W. (1987). Critical heuristics of social systems design. European Journal of Operational Research, 31(3), 276–283.
Ulrich, W. (1994). Can We Secure Future-Responsive Management Through Systems Thinking and Design? Interfaces, 24(4), 26–37. https://doi.org/10.1287/inte.24.4.26
Ulrich, W. (2003). Beyond methodology choice: Critical systems thinking as critically systemic discourse. Journal of the Operational Research Society, 54(4), 325–342. https://doi.org/10.1057/palgrave.jors.2601518
Ulrich, W. (2007). Philosophy for professionals: Towards critical pragmatism. Journal of the Operational Research Society, 58(8), 1109–1113. https://doi.org/10.1057/palgrave.jors.2602336
I see your point regarding different results depending on the order in which people see the post, but that's also true the other way around. Given the assumption that fewer people are likely to view a post that has negative karma, people who might actually have liked the post and upvoted it never do so because of pre-existing negative votes.
In fact, I think that’s the whole point of this scheme, isn’t it?
So, either way you never capture an "accurate" picture, because the signal itself is distorting the outcome. The key question is then which outcome one prefers; neither is objectively "right" or in all respects "better".
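To make the ordering effect concrete, here is a minimal simulation sketch in Python, using purely hypothetical numbers for the upvote probability and for how much negative karma suppresses visibility. It compares the karma a post ends up with when visibility depends on its current score against the karma it would get if every potential voter saw it.

```python
import random

# Minimal sketch with purely hypothetical parameters: 60% of potential voters
# would upvote the post if they saw it, but while karma is negative only a
# fraction of them look at it at all. Compare the observed karma with the
# karma the post would get under full visibility.

def simulate(first_vote, n_voters=200, p_upvote=0.6, p_view_if_negative=0.2, seed=1):
    rng = random.Random(seed)
    votes = [1 if rng.random() < p_upvote else -1 for _ in range(n_voters)]
    full_karma = first_vote + sum(votes)  # what full visibility would yield

    karma = first_vote
    for vote in votes:
        # Visibility heuristic: a negative score makes engagement much less likely.
        p_view = p_view_if_negative if karma < 0 else 1.0
        if rng.random() < p_view:
            karma += vote
    return karma, full_karma

for first_vote in (+1, -1):
    observed, full = simulate(first_vote)
    print(f"first vote {first_vote:+d}: observed karma {observed}, full-visibility karma {full}")
```

Under these assumed numbers, an early downvote keeps many would-be upvoters from ever seeing the post, so the observed karma lags well behind what full visibility would produce; that is the distortion I mean.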
I personally think that downvoting into negative karma is an unproductive practice, in particular with new posts, because it stifles debate about potentially interesting topics. If a post bothers you enough to downvote it, there is often something controversial in it that is worth discussing.
Take this post as an example. When I found it a couple of hours after posting, it had already been downvoted into negative karma, but there is no obvious reason why this should be so. It's well written and makes a clear point that's worth discussing, as exemplified by our engagement. Because of its negative karma, however, fewer people are likely to weigh in on the debate, because the signal is telling them not to bother engaging with it.
In general, my suggestion would be to only downvote into negative karma if you can be bothered to explain and defend your downvote in a comment and are willing to take it back if the author of the post gives a reasonable reply.
But as I said, this is just one way of looking at it. I value discourse and critical debate as essential to sense- and meaning-making, and I believe I have made a reasonable argument for how these are stifled by current practice.
Thanks to the author of the post for his thoughtful invite for critical reflection!
I think this is a very contextual question that really depends on the design of the mechanisms involved. For example, if we are talking about high-risk use cases, the military could be involved as part of the regulatory regime. It's really a question of how you set this up; the possible design space is huge if we look at this with an open mind. This is why I am advocating for engaging more deeply with the options we have here.
I just wanted to highlight that there also seems to be an opportunity to combine the best traits of open- and closed-source licensing models in the form of a new regulatory regime that one could call "regulated source".
I tried to start a discussion about this possibility, but so far the uptake has been limited. I think that's a shame; there seems to be so much that could be gained by "outside the box" thinking on this issue, since the alternatives both seem pretty bleak.
That seems to downplay the fact that we will never be able to internalize all externalities, simply because we cannot reliably anticipate all of them. So you are always playing catch-up to some degree.
Also, simply declaring an issue "generally" resolved when the current state of the world demonstrates that it's actually not resolved seems premature in my book. Breaking out of established paradigms is generally the best way to make rapid progress on vexing issues. Why would you want to close the door to this?
I ask myself the same question. I recently posted an idea about AI regulation to address such issues and start a conversation but there was almost no reaction and mostly just pushback. See: https://www.lesswrong.com/posts/8xN5KYB9xAgSSi494/against-the-open-source-closed-source-dichotomy-regulated
My take is that many people here are very worried about AI doom and think that for-profit work is necessary to get the best minds working on the issue. It also seems that governments in general are perceived to be incompetent, so the fear is that more regulation will screw things up rather than make them better.
Needless to say, I think this is a false dichotomy, and we should consider how we (as a society involving diverse actors and positions, in a transparent process) can develop regulation that actually creates a playing field where the best minds can responsibly work on societal and AI alignment. It's difficult, of course, but it's the better option compared to letting things develop as they are. The last couple of years have demonstrated clearly enough that this will not work out. Let's not just bury our heads in the sand and hope for the best.
Thanks for engaging with the post and acknowledging that regulation may be a possibility we should consider and not reject out of hand.
I don't share your optimistic view that transnational agencies such as the IAEA will be all that effective. The history of the nuclear arms race is that those countries that could develop weapons did, leading to extremes such as the Tsar Bomba, a 50-megaton monster that was more of a dick-waving demonstration than a real weapon. The only thing that ended the unstable MAD doctrine was the internal collapse of the Soviet Union. So, while countries have agreed to allow limited inspection of their nuclear facilities and stockpiles, it's nothing like the level of complete sharing that you envision in your description.
My position is actually not that optimistic. I don't believe that such transnational agencies are very likely to work or that they are a safe bet to ensure a good future. It is more that it seems to be in our best interest to really consider all of the options we can put on the table, to try to learn from what has more or less worked in the past, and also to look for creative new approaches and solutions, because the alternative is dystopia or catastrophe.
A key difference between AI and nuclear weapons is that the AI labs are not as sovereign as nation states. If the US, UK, and EU were to impose strong regulation on their companies and "force them to cooperate", similar to what I outlined, this seems (at least theoretically) possible and would already be a big win to me. For example, more resources could be allocated to alignment work compared to capabilities work. China seems much more concerned about regulation and control of companies anyway, so I see a chance that they would follow suit in approaching AI carefully.
However, it seems likely that the major commercial players will fight tooth and nail to avoid that situation, and you'll have to figure out how to apply similar restrictions worldwide.
To be honest, it's overdue that we find the guts to face up to them and put them in their place. Of course that's easier said than done, but the first step is to not be intimidated before we have even tried. Similarly, the call for worldwide regulations often seems to me to be a case of letting the perfect be the enemy of the good. Of course worldwide regulations would be desirable, but if we only get the US, UK, and EU, or even the US or EU alone, to make some moves here, we would be in a far better position. It's a bogeyman that companies will simply turn around and set up shop in the Bahamas to pursue AGI development, because they would not be able to a) secure the necessary compute to run development and b) sell their products in the largest markets. We do have some leverage here.
So, I think this is an excellent discussion to have, but I'm not convinced that the regulated source model you describe is workable.
Thanks for acknowledging the issue that I am pointing to here. I see the regulated source model mostly as a general outline of a class of potential solutions, some of which could be workable and others not. Getting to specifics that are workable is certainly the hard part. For me, the important step was to start discussing these ideas more openly to build momentum among the people who are interested in taking them forward. If more of us were to openly acknowledge and advocate that there should be room for discussing stronger regulation, our position would already be somewhat improved.
Alright, it seems to me like the crux between our positions is that you are unwilling or unable to consider whether new institutions could create an environment that is more conducive to technical AI alignment work, because you feel that this is a hopeless endeavor. Societies (which in your view seem to be just government) are simply worse at creating new institutions than the alternative of letting DeepMind do its thing. Moreover, you don't seem to acknowledge that it is worthwhile to consider how to avoid the dystopian failure mode, because the catastrophic failure mode is simply much larger and more devastating.
If this is a reasonable rough summary of your views, then I continue to stand my ground, because I don't think your position is as reasonable and well-founded as you make it out to be.
First, as I have tried to explain in various comments now, there is no inherent need to put only government in charge of regulation, but you still seem to cling to that notion. I also never said that government should do the technical work. This proposal is clearly about regulating use cases for AI and about considering whether mandating the sharing of source code and other knowledge assets could help to alleviate race dynamics and create an environment where companies like DeepMind and OpenAI can actually focus on doing what's necessary to figure out technical alignment issues. You seem to think that this proposal would cut them out of the picture... No, it would simply aim to shift their incentives so that they become more aligned with the rest of humanity.
If your argument is that they won't work on technical alignment if they are not the only "owners" of what they come up with and don't have a crazy upside in terms of unregulated profit, maybe we should consider whether they are the right people for the job? I mean, much of your own argument rests on the assumption that they will be willing to share at some point. Why should we let them decide what they are willing to share rather than come together to figure this stuff out before the game is done? Do you think contemplating regulation would be so much of a distraction to their work that it is a dangerous move? That seems unreasonably naive and short-sighted. In the worst case (i.e., all expert organizations defecting from work on alignment), with a powerful regulating body/regime in place, we could still hire individual people as part of a more transparent "Manhattan project" and simply take more time to roll out more advanced AI capabilities.
Second, somewhat ironically, you are kind of making the case for some aspects of the proposal when you say:
Race dynamics are mitigated by AI companies joining the leader in the AI space, which is currently DeepMind. OpenAI agrees with "merge and assist" as a late-game strategy. Recent competition among AI firms, primarily in LLMs, is largely sparked by OpenAI (see Claude, Bard, Gemini). DeepMind appeared content to release few products in the absence of substantial competition.
The whole point of the proposal is to argue for figuring out how to merge efforts into a regulated environment. We should not need to trust OpenAI to do the right thing when the right time comes. There will be many different opinions about what the right thing is and when the right time has come. Just letting for-profit companies merge as they see fit is almost predictably a bad idea, and bound to be slow if our current institutions tasked with overseeing mergers and acquisitions are involved in processes that they do not understand and have no experience with. Maybe it's just me, but I would like to figure out how society can reasonably deal with such situations before the time comes. Trusting in the competence of DeepMind to figure out those issues seems naive. As you highlighted, for-profit companies are good at technical work where incentives are aligned but much less trustworthy when confronted with the challenge of having to figure out reasonable institutions that can control them (e.g., regulatory capture is a thing).
Third, your last statement is confusing to me because I do believe that I asked a sensible question.
Google obviously has no need to sell anything to anyone if they control the world. This sentence is not a logical argument, it is rhetoric.
Do you mean that the winner of the race will be able to use advanced nanomanufacturing and other technologies to simply shape the universe in their image and thus not require currencies anymore, because coordination of resources is no longer needed? I would contest this idea as not at all obvious, because accounting seems to be a pretty fundamental ingredient in supposedly rational decision making (which we kind of expect a superintelligence to implement). Or do you want to imply that they would simply decide against keeping other people around? That would qualify as "dystopian" in my book and, hence, kind of support my argument. Thus, I don't really understand why you claim that my (admittedly poignantly formulated) suggestion, that the business model of the winner of the race is likely to shape the future significantly, is not worth discussing.
Anyhow, I acknowledge that much of your argument rests on the hope and belief that Google DeepMind et al. are (the only) actors that can be trusted to solve technical alignment in the current regulatory environment and that they will do the right things once they are there. To me that seems more like wishful thinking rather than a well-grounded argument but I also know the situation where I have strong intuitions and gut feelings about what the best course of action may be, so I sympathize with your position to some degree.
My intent is simply to raise some awareness that there are other alternatives beyond the dichotomy between open source and closed source which we can come up with that may help us to create a regulatory environment that is more conducive to realizing our common goals. More than hope is possible if we put our minds and efforts to it.
So, I concede that the proposal is pretty vague and general, and that this may make it difficult to get the gist of it, but I think it's still pretty clear that the idea is broader than nationalization. I refer specifically to the possible involvement of intergovernmental, professional, or civil society organizations in the regulating body. With regard to profit, the degree to which profit is allowed could be regulated for each use case separately, with some (maybe the more benign) use cases being more tailored to profit-seeking companies than others.
Nevertheless, I agree that for an in-depth discussion of pros and cons, more details regarding a possible governance structure would be desirable. That's the whole point of the post: we should start thinking about what governance structures we would actually want to have in place rather than assuming that it must be either "closed source" or "open source". I don't have the answer, but I advocate for engaging with an important question.
I completely disagree with the sentiment of the rest of your comment, namely that "hands-on regulation" is dead in the water because government is incompetent and that hoping for Google DeepMind or OpenAI to do the right things is the best way forward.
First, as I already highlighted above, nothing in this approach says that government alone must be the one calling the shots. It may not be easy, but it also seems entirely possible to come up with new and creative institutions that are able to regulate AI, similar to how we regulate companies, cars, aviation, or nuclear technology. Each of the existing institutions may have faults, but we can learn from them, experiment with new ones (e.g., citizens' assemblies), and continuously improve (e.g., see the work on digital democracy in Taiwan). We must, if we want to avoid both failure modes outlined in the post.
Second, I am surprised by the sanguine tone with regard to trusting for-profit institutions. What is your argument that this will work out fine for all of us in the end? Even if we don't die due to corner cutting driven by race dynamics, how do we ensure that we don't end up in a dystopia where Google controls the world and sells us to the highest bidder? I mean, that's their actual business model, after all, right?
I think my intuition would be the opposite... The more room for profit, the more incentives for race dynamics and irresponsible gold rushing. Why would you think it's the other way around?
I think it could be possible to introduce more stringent security measures. We can generally keep important private keys from being leaked, so if we treat weights similarly carefully, we should be able to achieve at least a similar track record. We could also forbid the unregulated use of such software, similar to the unregulated use of nuclear technology. Also, in the limit, the problem still exists in a closed-source world.
Llama is a special case because there are no societal incentives against it spreading... the opposite is the case! Because it was "proprietary", it's the magic secret sauce that everyone wants in order to stay afloat and in the race. In such an environment, it's clear that leaking or selling out is just a matter of time. I am trying to advocate a paradigm shift where we turn work on AI into a regulated industry shaped for the benefit of all rather than driven by profit maximization.
This is not necessarily true, because resources and source code would be shared between all actors who pass the bar, so to speak. Capabilities should therefore be diffused more widely between actors who have demonstrated competence than in a closed-source model. It would be a problem if the bars were set too high and tailored to suit only the needs of very few companies. But the ideal would be strong competition because the required standards are appropriate and well-measured, with the regulating body investing resources into the development and support of new responsible players to broaden the base. However, mechanisms for phasing out players that are no longer up to the task need to be found. All of this would aim to take much of the financial incentive out of the game so as to avoid race dynamics. Only organizations that want to do right by the people are incentivized; profit maximization is taken off the table.
I agree that the regulating body would be in a powerful position, but there would seem to be ways to set it up in such a way that this is handled. There could be a mix of, e.g., governmental, civil society, and academic representatives that cross-check each other. There could also be public scrutiny and the ability to sue in front of international courts. It's not easy, I concede that, but to me this really does seem like an alternative worth discussing.
Just to let you know that this overall framing is pretty common in sustainable development contexts. It’s often called blue and green infrastructure. See for example: https://iucn.org/news/europe/201911/building-resilience-green-and-blue-infrastructure
However, I think those people would be more focused on "giving nature space and letting it do its thing" rather than trying to upgrade nature. Given our track record, I would tend to agree with them. Let's not put the cart before the horse and think that we can effectively design ecological ecosystems just yet.
False premise. You seem to be assuming that many people using symbols reliably in similar ways points to anything other than this convention being reliably useful in achieving some broadly desired end. It doesn't.
Your mathematics example is also misleading because it directs attention to "mathematical truths" which are generally only considered to be valid statements within the framework of mathematics and, thus, inherently relative to a particular framework and not "absolute".
As soon as you move to "real life" cases, you are faced with the question of how to frame a situation in the first place (see also the "frame problem" in AI research). There is no "absolute" answer to this. Maybe a little bit tongue in cheek, but ask yourself: why is this website called "Less Wrong" and not "Absolute Truth"?
If you are looking to educate yourself have a look at the following resources. I found them quite insightful.
On philosophy:
Dewey, J. (1938). Logic: The Theory of Inquiry. Henry Holt and Company, Inc.
Ulrich, W. (2006). Critical Pragmatism: A New Approach to Professional and Business Ethics. In Interdisciplinary Yearbook for Business Ethics (Vol. 1). Peter Lang.
On the frame problem:
Vervaeke, J., Lillicrap, T. P., & Richards, B. A. (2012). Relevance Realization and the Emerging Framework in Cognitive Science. Journal of Logic and Computation, 22(1), 79–99. https://doi.org/10.1093/logcom/exp067
Andersen, B. P., Miller, M., & Vervaeke, J. (2022). Predictive processing and relevance realization: Exploring convergent solutions to the frame problem. Phenomenology and the Cognitive Sciences. https://doi.org/10.1007/s11097-022-09850-6
False premise. There is no “absolute truth”. I don’t want to come across as condescending but please have a look at any somewhat recent science textbook if you doubt this claim.
I would suggest reframing to: how can we establish common ground that a) all or most people can agree on and b) facilitates productive inquiry?
Hey Will,
looking forward to the rest of the series! Would be awesome if you could comment on the following development: https://joshmitteldorf.scienceblog.com/2020/05/11/age-reduction-breakthrough/
Is this just hype, or how should one make sense of it?