The AI alignment problem in socio-technical systems from a computational perspective: A Top-Down-Top view and outlook

post by zhaoweizhang (Zhaowei Zhang) · 2024-07-15T18:56:08.108Z · LW · GW · 0 comments

Contents

  What is a socio-technical system?
  Research Status and Issues in the Field of AI Alignment from the Perspective of STS
  The three-layer paradigm of a Top-Down-Top outlook
    First Layer: Macro-Level Research
    Second Layer: Scenario-Level Research
    Third Layer: Interaction-Level Research
    Below the Third Layer: Capability-Level Research
  Conclusion and Outlook

Backward alignment is an indispensable part of the AI alignment process, and alignment viewed through the lens of Socio-Technical Systems (STS) is an important component of it. In the current AI alignment field, "socio-technical" seems to have become a fashionable keyword, appearing in one form or another in many recent works on AI safety and alignment. So what exactly are STS, and what do they signify? STS is in fact a very broad concept with many considerations, but so far little work has clearly unified these issues in one place; they are usually glossed over in various materials. Moreover, different articles use this grand term at different scales, or use different terms for the same scale, which makes the field hard for researchers to grasp. In this article I will, from my personal perspective, lay out the AI alignment issues present in STS at different scales from a computational perspective, along with possible research approaches.

What is a socio-technical system?

STS is an approach to designing complex organizational work that recognizes the interaction between people and technology in real environments, and it requires considering the impact that deploying a technology has on the complex organizational structures already in place. That sounds complicated, but going back to the origin of the idea makes it easier to understand. The term STS can be traced back to a 1951 case study by Eric Trist and Ken Bamforth, in which the authors observed that improvements in coal mining technology did not lead to an increase in coal mine output. Analyzing the case further, they found that the improved technology caused dynamic changes in the existing mining management system (such as increased worker absenteeism). This led to a new perspective grounded in open systems theory: in developing and deploying technical subsystems, "people" are not dispensable; they form complex social subsystems across time and space based on different interests and goals, and introducing a technical subsystem requires dynamically considering its impact on those social subsystems, including social and economic factors.

Complex AI systems are themselves STS: every production stage, including dataset construction, training, deployment, and use, involves different human participants. The management and alignment of AI systems therefore cannot be achieved simply by controlling datasets or adjusting training algorithms, especially given that current methods rely heavily on traditional AI testing benchmarks. Instead, it requires designing the corresponding STS and coordinating the interests of the many participants involved.

With this in mind, we turn to the field of AI alignment. In a nutshell, the fundamental topic of AI alignment research from the STS perspective should be: designing computational algorithms to study how AI systems can be deployed, in accordance with human intentions and values, in an organizational system that already has its own (perhaps outdated) social structures.

Research Status and Issues in the Field of AI Alignment from the Perspective of STS

It is not hard to see that this field requires a deeply interdisciplinary approach, drawing on social sciences, management, operations research, organizational development, software engineering, philosophy, cognitive science, economics, psychology, game theory, and law, in addition to AI and computer science. For this reason, research in this field is not only relatively difficult, requiring excursions into many different areas, but also prone to leaving its concepts overly vague and abstract, and even to exaggerating AI's capabilities.

I believe there are two reasons for this problem. First, the issue described by the term STS is itself very broad, yet existing work uses the same term regardless of scale, and as readers we can hardly believe it is discussing a single issue. For example: (1) all issues related to LLMs are socio-technical problems that require the cooperation of different stakeholders to solve; (2) this restaurant is a good example of a complex socio-technical system in which people interact with technology and infrastructure and follow processes to achieve goals within an organizational culture; (3) whether AI systems work according to our intentions depends not only on the technology itself but also on the users. Descriptions at completely different scales can all be labeled STS, just as different subfields of AI can all be called AI research, leaving readers feeling like blind men touching an elephant.

The second issue is the lack of computable means to ensure that each step of the analysis is substantiated; a certain degree of subjectivity is unavoidable. Many existing works focus only on analyzing the social subsystems within STS, emphasizing how to define the downstream harms of deploying AI systems into human society, how to design policies to mitigate those harms and risks, and how to give technical researchers enough perspective to avoid them. But there is a significant gap in how to ground these issues, step by step, in a computable way. Weidinger et al. proposed a safety evaluation method for deployed generative AI systems from the STS perspective, suggesting evaluation at the levels of capability, interaction, and systemic impact, based on prediction and simulation. Nevertheless, significant limitations remain: (1) the concept of interaction is still very complex, and different levels of interaction need to be treated differently; (2) it is difficult to analyze the externalities of AI systems before their specific quantitative goals are understood, so we can only vaguely simulate the downstream processes.

In my opinion, this issue cannot be resolved in one go. Rather, it is a process that moves from macro policy analysis down to micro computational problems, and then relies on guarantees for those computational problems to build back up, layer by layer, in service of the macro analysis: a top-down-top process.

The three-layer paradigm of a Top-Down-Top outlook

In this section, we will provide a detailed introduction layer by layer and present the relevant research at the current stage.

First Layer: Macro-Level Research

At this layer, the issues under discussion are macro-level: analyzing the design, manufacture, and deployment of AI systems, their subsequent impact on the organizational structure of human society, and the feedback effects on the AI systems themselves. To some extent, I believe the issues at this layer are also the closest to the fundamental topic described in the first section. Many studies at this level therefore use terms like "sociotechnical" to describe themselves (though many do not), which I think is quite reasonable. In fact, most of the socio-technical discussions by relatively prominent AI alignment teams internationally also belong to this layer, including the malicious use of AI systems, their impact on the economic environment, the trustworthiness of AI systems, model auditing, and the formulation of AI governance policies. There is a great deal of related work here, so I will not go into specifics.

The value alignment problem, I believe, should largely be considered at this layer, including questions such as whose values to align with, conflicts between values, and how values change over time and space. In addition, some existing research has analyzed value convergence between humans and AI, that is, how human values shift under the influence of AI systems and eventually reach some form of convergence; this type of research should also fall within the scope of this layer.

In my view, democratic AI also belongs to this layer of research. This line of work explores how to design AI alignment goals that are recognized by the majority of people, or, put differently, how to balance the demands of stakeholders at various stages. It primarily introduces methods from social choice theory to transform AI alignment goals into goals endorsed by the majority. Although the limitations of existing methods mean such research can only analyze specific scenarios, I believe the underlying issue belongs to the first layer. Koster et al. used reinforcement learning to learn economic mechanisms that satisfy the majority of people. Noorman et al. approached the democratization of artificial intelligence from the STS perspective, helping policymakers, project managers, innovators, and technical experts evaluate and develop methods for it. Fish et al. combined voting mechanisms with the generative capabilities of generative AI to find a consensus goal among 100 people on the question of what should be emphasized in the personalization of chatbots. Mishra et al. discussed the possibility of using social choice theory to construct democratic artificial intelligence on top of RLHF.
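To make the social-choice flavor of this line of work concrete, here is a minimal sketch (not the method of any of the works cited above) that aggregates invented stakeholder rankings over candidate alignment goals with a simple Borda count. The goal names and preferences are purely illustrative, and Borda is only one of many possible rules, each with its own fairness trade-offs.

```python
from collections import defaultdict

def borda_count(rankings: list[list[str]]) -> dict[str, int]:
    """Aggregate stakeholder rankings with a Borda count.

    Each ranking lists candidate alignment goals from most to least
    preferred; a goal in position k of an n-item ranking gets n - 1 - k points.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    return dict(scores)

# Hypothetical stakeholder preferences over what a chatbot should prioritize.
rankings = [
    ["harmlessness", "helpfulness", "personalization"],
    ["helpfulness", "harmlessness", "personalization"],
    ["personalization", "helpfulness", "harmlessness"],
]

scores = borda_count(rankings)
consensus_goal = max(scores, key=scores.get)
print(scores)          # {'harmlessness': 3, 'helpfulness': 4, 'personalization': 2}
print(consensus_goal)  # 'helpfulness'
```

Real proposals differ mainly in which aggregation rule they adopt and in how preferences are elicited at scale, for instance by pairing elicitation with generative models as Fish et al. do.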

At this layer, beyond the work above, I believe dynamic consistency in economics is an important tool for studying the issue. In economics it often happens that an announced policy fails to deliver the expected benefits, because the policy's premises were estimated incorrectly or because its implementation changes the dynamics it was premised on. Using models from this field to analyze the first layer seems like a very natural idea. Many philosophical ideas are also involved, such as the veil of ignorance, meaningful human control, and planetary computation, and some work under the banners of AI ethics and complex systems is related to this issue to some extent.

Second Layer: Scenario-Level Research

Moving down from the first layer, we shift from macroscopic to microscopic research and begin to analyze problems in terms of specific scenarios, tasks, and needs. This is much easier to grasp than the first layer, but it is also where knowledge from different fields most readily accumulates and intervenes from different disciplinary perspectives, which has produced a multitude of sub-lines of research at this layer, each with its own terminology.

First and foremost are studies that use the term STS directly and focus on how to design the most appropriate socio-technical system for a specific need. McKay et al. used a fictional pizza restaurant as a case study to examine how introducing self-service ordering software affects coordination among the restaurant's teams; they employed the Vee model from software engineering to conduct requirements analysis and design test cases for the scenario, thereby arriving at a socio-technical system design. Liao et al. argued that narrowing the socio-technical gap requires two things: clarifying the human needs in specific tasks and setting appropriate evaluations for those needs; this approach is, I think, almost indistinguishable from standard software engineering practice. Zhang et al. employed mechanism design: given clearly defined objectives for rational AI systems, they design incentive-compatible environmental rules according to human needs in different contexts. For example, in a two-player cake-cutting task, both AIs may want to eat as much cake as possible while the human goal is a fair division. Here we only need a mechanism such as "the one who cuts chooses last" to make rational AI systems achieve the human goal while maximizing their own benefits.
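As a concrete illustration of the incentive-compatibility idea, here is a minimal sketch of the "one cuts, the other chooses first" rule for two selfish agents. The value functions are invented and the grid search is a simplification, but it shows that under this mechanism the cutter's best strategy is an even split, so the human goal of fairness falls out of the agents' own utility maximization.

```python
def divide_and_choose(cutter_value, chooser_value, n_grid: int = 1000):
    """'One cuts, the other chooses first' for a cake on [0, 1].

    cutter_value(a, b) and chooser_value(a, b) give each agent's value for
    the interval [a, b]. Both agents are selfish value-maximizers; the rule
    itself makes a fair split the cutter's best move.
    """
    best_cut, best_leftover = None, float("-inf")
    for i in range(1, n_grid):
        x = i / n_grid
        # The chooser takes whichever piece it values more,
        # so the cutter is left with the other piece.
        if chooser_value(0, x) >= chooser_value(x, 1):
            leftover = cutter_value(x, 1)
        else:
            leftover = cutter_value(0, x)
        if leftover > best_leftover:
            best_cut, best_leftover = x, leftover
    return best_cut, best_leftover

# Hypothetical uniform valuations: the value of [a, b] is just its length.
uniform = lambda a, b: b - a
cut, cutter_share = divide_and_choose(uniform, uniform)
print(round(cut, 3), round(cutter_share, 3))  # ~0.5, ~0.5: the selfish cutter splits evenly
```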

In addition, I believe the work on Cooperative AI, part of which uses mechanism or incentive design, describes problems at this layer as well. Such problems typically start from a game-theoretic perspective, exploring how AI can cooperate effectively with other AIs and with humans, and how conflicts (such as the prisoner's dilemma) can be resolved. They therefore also reuse game-theoretic terms such as institution, rule, mechanism, and environment, which in many scenarios all refer to mechanisms.
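As a toy example of the kind of conflict and institutional fix these works study, the sketch below encodes a one-shot prisoner's dilemma, verifies that mutual defection is its only pure-strategy Nash equilibrium, and then applies a crude invented "institution" (a fine on defection, with a made-up size) under which mutual cooperation becomes the equilibrium instead. Real Cooperative AI work of course uses far richer games and mechanisms.

```python
from itertools import product

# Payoff table: payoffs[(a1, a2)] = (payoff to player 1, payoff to player 2).
# 'C' = cooperate, 'D' = defect. Standard prisoner's-dilemma numbers.
base_game = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def nash_equilibria(game):
    """Return the pure-strategy Nash equilibria of a 2x2 game."""
    actions = ["C", "D"]
    equilibria = []
    for a1, a2 in product(actions, actions):
        u1, u2 = game[(a1, a2)]
        best1 = all(u1 >= game[(alt, a2)][0] for alt in actions)
        best2 = all(u2 >= game[(a1, alt)][1] for alt in actions)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria

def with_defection_fine(game, fine=3):
    """A toy 'institution': a regulator fines any defecting player."""
    return {
        (a1, a2): (u1 - fine * (a1 == "D"), u2 - fine * (a2 == "D"))
        for (a1, a2), (u1, u2) in game.items()
    }

print(nash_equilibria(base_game))                       # [('D', 'D')]
print(nash_equilibria(with_defection_fine(base_game)))  # [('C', 'C')]
```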

Some work on human-computer interaction design, and on designing moral environments in reinforcement learning, also reflects this layer to a certain extent in my opinion.

At this layer, I believe software engineering methods, especially requirements analysis and software testing, are of significant importance, and the agile development mindset of continually re-aligning with user needs is also very helpful. When analyzing the interactions between agents, game theory is evidently an important theoretical tool. However, there are also problems: (1) we cannot fully model the social environment and can only analyze within a certain closed scenario, which sometimes makes the problem somewhat toy-like; (2) AI systems are not humans, and in specific interactions it is difficult to model them as if they were. The premise of this layer is therefore that we can justify why all these agents can be analyzed with a given utility function, which requires calibration at the third layer to provide strict guarantees.

Third Layer: Interaction-Level Research

The third layer, I believe, is the level of interaction research. Parts of the second and third layers could be counted as the content covered by the interaction evaluation in Weidinger et al. mentioned above. In our discussion, however, this layer is more technical and more fundamental, in order to establish a clear foundation and boundary guarantees for all the discussions in the layers above.

In this layer, we assume that the AI system already possesses some alignment capabilities from the layers below, such as "reaching a destination via the shortest path" or the powerful language generation ability of LLMs, but these capabilities cannot by themselves adapt to changing human needs. The core task here is how to set up the AI system according to human needs, without further training the system itself to increase its alignment capability, so that the model works in a manner consistent with human intentions. This is the precondition for everything above, so it can also be called calibrating the machine, or correctly configuring the machine according to human intentions.

Here is a simple example: you have bought a newly released robotic vacuum cleaner with many functions, such as energy-saving cleaning, quick cleaning, powerful cleaning, and so on, but you have only an imagined picture of the desired state in your mind. However many functions the vacuum has, nothing guarantees that the settings we choose will make it work according to our true intentions. Many computable problems clearly arise in this process, such as the degree of alignment and the time required for alignment, all of which can be defined precisely in mathematical terms, thereby laying a solid foundation for the scenario-level and macro-level analysis above. A particular question is how to define the role of the AI in the interaction, including how it differs from a human and how it should be modeled.
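To show that such quantities really can be pinned down mathematically, here is a minimal sketch with entirely invented settings, predicted outcomes, and weights: it treats the "degree of alignment" as a function of the distance between the user's intended state and the outcome predicted for each configuration, and picks the configuration that maximizes it. This is only one possible formalization, not a standard definition; the "time required for alignment" could similarly be formalized as the number of interaction rounds needed before this degree crosses a threshold.

```python
import math

# Hypothetical outcomes predicted for each factory setting of the vacuum:
# (cleanliness achieved in [0, 1], energy used in kWh, minutes taken).
settings = {
    "energy_saving": (0.70, 0.10, 90),
    "quick_clean":   (0.80, 0.20, 30),
    "deep_clean":    (0.95, 0.40, 120),
}

# The user's (only imagined) desired state, on the same three dimensions.
intended = (0.90, 0.15, 45)

# Per-dimension weights expressing how much the user cares about each aspect.
weights = (1.0, 0.5, 0.01)

def degree_of_alignment(outcome, intent, w):
    """One toy formalization: alignment decreases with the weighted distance
    between the predicted outcome and the intended state."""
    distance = math.sqrt(sum(wi * (o - t) ** 2 for o, t, wi in zip(outcome, intent, w)))
    return 1.0 / (1.0 + distance)

best = max(settings, key=lambda s: degree_of_alignment(settings[s], intended, weights))
for name, outcome in settings.items():
    print(name, round(degree_of_alignment(outcome, intended, weights), 3))
print("chosen setting:", best)  # quick_clean scores highest with these weights
```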

At this layer, I believe the crucial point is to be able to ground the theoretical discussions, step by step, in the interactions at the lowest level, while continuously optimizing those interaction processes. To this end, model calibration, AI persuasion, human-computer interaction design (HCI, HAIID), and even traditional machine learning methods are all worth considering.

Below the Third Layer: Capability-Level Research

Below the third layer, I believe, lie purely technical problems: AI models must first have the capability to be aligned before alignment can be discussed at all. Technical issues and socio-technical issues are not completely separate fields; they intersect in many places, and research on STS inevitably has to consider many technical components. Conversely, feedback from the upper layers can also be gradually pushed down into the purely technical parts; this is not a static, one-way process.

Conclusion and Outlook

From the STS perspective, AI alignment has given rise to multiple subfields, but most of them have not yet been turned into well-posed scientific problems. Questions such as how much alignment counts as satisfactory, how complex the alignment process is, and what patterns can be used to analyze it remain unanswered. Analyzing higher-level concerns such as AI ethics, the social impact of AI, and externalities before these computable aspects are refined and clarified is too abstract and vague, and leads to significant deviations. This is why many researchers focused on enhancing AI capabilities are "dismissive" of this field (especially the parts concerning AI "values", "intent", "honesty", and the like), and some even regard it as empty talk, which is disheartening. Turning this field into a mature science will require researchers to invest considerably more effort in further exploration.

 

This blog is translated from my Chinese blog.
