Perhaps the first detailed plan to program a general moral code into a particular model of AGI

post by lukeprog · 2011-03-06T06:41:25.119Z · LW · GW · Legacy · 2 comments

Wallach, Franklin, & Allen, "A Conceptual and Computational Model of Moral Decision Making in Human and Artificial Agents."

Abstract:

Recently, there has been a resurgence of interest in general, comprehensive models of human cognition. Such models aim to explain higher-order cognitive faculties, such as deliberation and planning. Given a computational representation, the validity of these models can be tested in computer simulations such as software agents or embodied robots. The push to implement computational models of this kind has created the field of artificial general intelligence (AGI). Moral decision making is arguably one of the most challenging tasks for computational approaches to higher-order cognition. The need for increasingly autonomous artificial agents to factor moral considerations into their choices and actions has given rise to another new field of inquiry variously known as Machine Morality, Machine Ethics, Roboethics, or Friendly AI. In this study, we discuss how LIDA, an AGI model of human cognition, can be adapted to model both affective and rational features of moral decision making. Using the LIDA model, we will demonstrate how moral decisions can be made in many domains using the same mechanisms that enable general decision making. Comprehensive models of human cognition typically aim for compatibility with recent research in the cognitive and neural sciences. Global workspace theory, proposed by the neuropsychologist Bernard Baars (1988), is a highly regarded model of human cognition that is currently being computationally instantiated in several software implementations. LIDA (Franklin, Baars, Ramamurthy, & Ventura, 2005) is one such computational implementation. LIDA is both a set of computational tools and an underlying model of human cognition, which provides mechanisms that are capable of explaining how an agent’s selection of its next action arises from bottom-up collection of sensory data and top-down processes for making sense of its current situation. We will describe how the LIDA model helps integrate emotions into the human decision-making process, and we will elucidate a process whereby an agent can work through an ethical problem to reach a solution that takes account of ethically relevant factors.
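For readers who want a more concrete feel for the kind of mechanism the abstract gestures at, here is a minimal, purely illustrative Python sketch of a global-workspace-style cycle: percepts carrying affective weight compete for broadcast, and action selection is filtered by hand-coded moral constraints. This is not the LIDA implementation; every class and name below (Percept, ActionScheme, broadcast_winner, etc.) is a hypothetical stand-in chosen only for this example.

```python
# Illustrative toy (not LIDA): a global-workspace-style cycle in which
# percepts with affective weight compete for "broadcast", and action
# selection is vetoed by simple, hand-coded moral tags.

from dataclasses import dataclass, field


@dataclass
class Percept:
    label: str          # what was sensed, e.g. "person_in_path"
    activation: float   # bottom-up salience
    valence: float      # crude stand-in for affective weight, in [-1, 1]


@dataclass
class ActionScheme:
    name: str
    utility: float                                 # top-down usefulness estimate
    moral_tags: set = field(default_factory=set)   # e.g. {"risks_harm"}


def broadcast_winner(percepts):
    """Global-workspace step: the most salient (and affectively charged) percept wins."""
    return max(percepts, key=lambda p: p.activation + abs(p.valence))


def select_action(schemes, broadcast, forbidden_tags=frozenset({"risks_harm"})):
    """Action selection: highest-utility scheme that is not morally vetoed."""
    permitted = [s for s in schemes if not (s.moral_tags & forbidden_tags)]
    if not permitted:
        return ActionScheme("do_nothing", 0.0)
    # The broadcast percept's valence modulates which permitted scheme wins.
    return max(permitted, key=lambda s: s.utility + broadcast.valence)


if __name__ == "__main__":
    percepts = [
        Percept("obstacle_ahead", activation=0.4, valence=-0.2),
        Percept("person_in_path", activation=0.7, valence=-0.9),
    ]
    schemes = [
        ActionScheme("accelerate", utility=0.8, moral_tags={"risks_harm"}),
        ActionScheme("brake_and_wait", utility=0.5),
    ]
    winner = broadcast_winner(percepts)
    action = select_action(schemes, winner)
    print(f"Broadcast: {winner.label}; selected action: {action.name}")
```

The point of the sketch is only that, in an architecture like this, moral considerations are not a separate module bolted on afterwards: they enter through the same salience, broadcast, and action-selection machinery that handles ordinary decisions, which is the claim the abstract makes about LIDA.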

I suspect this is of much interest to many Less Wrong readers.

PDF.

2 comments


comment by XiXiDu · 2011-03-06T15:10:26.762Z · LW(p) · GW(p)

This made me curious what else Google Scholar would turn up, and there are actually quite a few papers mentioning Friendly AI and even the SIAI...

No one has the slightest notion of how to program innate human friendliness into an artificial intelligence that may, over time, grow to be billions of times smarter than the smartest human being. But it is certainly an approach worth pursuing. An alternative approach is outlined in the next section.

Cultural Evolution in a Cosmic Context

An Alternative Approach: Memetic Engineering With Cultural Attractors

The approach of the Singularity Institute can be characterized as a bottom-up strategy for constructing Friendly AI. The basic idea is to build a set of algorithms into an AI’s source code that will cause that particular AI never to desire to turn against its human progenitors and to refrain from any action that would harm human beings. This approach is similar in principle to inserting into the deep structure of an AI’s source code a set of Isaac Asimov’s fictional laws of robotics.

An alternative approach may be to design a set of cultural attractors that could conceivably perturb the developmental direction of the future cultural environment in which strong AI will emerge in such a way as to encourage the prolongation of human-friendly sensibilities and outcomes. This top-down strategy can be characterized as an exercise in what I have previously called a possible future scientific discipline of memetic engineering...

Most are behind a paywall. Just search for 'Friendly AI' on Google Scholar.

Replies from: lukeprog
comment by lukeprog · 2011-03-06T16:09:28.617Z · LW(p) · GW(p)

I've gotten hundreds of papers by searching for other key terms on Google Scholar, for example "machine ethics", "machine morality", "artificial morality", etc. 'Machine ethics' seems to be the term that is winning.