Comments
Nice! I actually had this as a loose idea in the back of my mind for a while: a network of people connected like this, signalling to each other their track of the day, which could be genuinely fun. It also seems like a feasible use case. The underlying reasoning (at least for me) is that I would be more open to adopting an idea from a person with whom I feel a shared sense of collectivity than from an algorithm that thinks it knows me. Intrinsically, I want such an algorithm to be wrong, for the sake of my own autonomy :)
The way I see it, the relevance for alignment is to ask: what do we actually mean when we say that two intelligent agents are aligned? Are you and I aligned if we would make the same decision in a trolley problem? If we motivate our decisions in the same way? Or if we merely don't kill each other? None of these are meaningful indicators of two people being aligned, let alone of humans and AI being aligned. And with unreliable indicators, will we ever succeed in solving the issue? I'd say two agents are aligned when the most rewarding decision for one also benefits the other. Generalizing and scaling that notion to many situations and many agents/people requires a 'theory of mind' mechanism, as well as a way to keep certain properties invariant under scaling and translation in complex networks. This is really a physicist's way of thinking about the problem; I am only slowly getting into the language that others in the AI/alignment field use.
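To make that criterion concrete, here is a minimal toy sketch in Python. This is my own illustrative formalization, not an established definition, and all action names and reward numbers are hypothetical: two agents count as aligned in a given situation if the action that maximizes one agent's reward also yields a positive payoff for the other.

```python
# Toy formalization of the proposed criterion (illustrative only):
# agent A and agent B are "aligned" in a single decision situation
# if A's reward-maximizing action also benefits B.

def is_aligned(actions, reward_a, reward_b):
    """Return True if the action that is best for agent A
    also gives agent B a strictly positive payoff.

    actions:  list of available actions
    reward_a: dict mapping action -> reward for the deciding agent A
    reward_b: dict mapping action -> payoff (benefit/harm) for agent B
    """
    best_for_a = max(actions, key=lambda a: reward_a[a])
    return reward_b[best_for_a] > 0

# Hypothetical example: A's best choice ("cooperate") also benefits B.
actions = ["cooperate", "defect"]
reward_a = {"cooperate": 3, "defect": 2}
reward_b = {"cooperate": 2, "defect": -1}
print(is_aligned(actions, reward_a, reward_b))  # True in this situation
```

Scaling this beyond a single situation is exactly where it gets hard: the criterion would have to hold across many situations and many pairs of agents at once, which is what makes the network-invariance question interesting.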
Although I somewhat agree with the comment about style, I feel the point you're making deserves more enthusiasm. How well recognized is this trolley-problem fallacy? The way I see it, the energy spent on thinking about the trolley problem in isolation illustrates an innate human short-sightedness, and perhaps a clear limit of human intelligence as well. 'Correctly' solving one trolley problem does not prevent you or someone else from being confronted with the next one. My line of argument is that ethical decision making requires an agent to also have a proper 'theory of mind': if I make this decision, what decision will the next person or agent have to deal with? If my car with four passengers chooses to avoid running over five people and hits just one instead, could it also put an oncoming car in the position of having to choose between a collision with 8 people and swerving and killing 5? And of course: whose decisions resulted in the trolley problem I'm currently facing, and what is their responsibility? I recently contributed a piece that is essentially about propagating the consequences of decisions, and I'm curious how it will be received. Could this be a blind spot in ethics and/or AI safety? Given the situations we've gotten ourselves into as a society, I feel this is also an area in which humans can very easily be outsmarted...
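The cascading-dilemma point can be sketched as a tiny decision tree, where one option in a dilemma may create the next agent's dilemma. This is a toy model of my own, with hypothetical harm counts loosely following the car example above; it just contrasts a myopic agent (minimizing its own immediate harm) with one that looks ahead along the chain of consequences.

```python
# Each dilemma is a tuple (harm_a, harm_b, follow_up), where choosing
# option b triggers 'follow_up': a new dilemma for the next agent
# (or None if nothing further is caused).

def myopic(dilemma):
    """Each agent minimizes only its own immediate harm,
    ignoring the dilemma its choice creates for the next agent."""
    if dilemma is None:
        return 0
    harm_a, harm_b, follow_up = dilemma
    if harm_a <= harm_b:
        return harm_a
    return harm_b + myopic(follow_up)

def lookahead(dilemma):
    """Each agent minimizes total harm, including the
    propagated consequences of the dilemmas it creates."""
    if dilemma is None:
        return 0
    harm_a, harm_b, follow_up = dilemma
    return min(harm_a, harm_b + lookahead(follow_up))

# Hypothetical numbers: swerving to hit 1 person forces an oncoming
# car into its own choice between hitting 8 or swerving and killing 5.
oncoming = (8, 5, None)      # the oncoming car's dilemma ends the chain
my_car = (5, 1, oncoming)    # hit 5, or hit 1 and create 'oncoming'

print(myopic(my_car))     # 1 + 5 = 6: locally "better" choice, worse overall
print(lookahead(my_car))  # 5: the non-swerving choice avoids the cascade
```

The point of the toy: the locally 'correct' trolley answer can be globally worse once you count the dilemmas it hands to the next agent, which is exactly why solving one trolley problem in isolation says so little.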