Research proposal: Leveraging Jungian archetypes to create values-based models

post by MiguelDev (whitehatStoic) · 2023-03-05T17:39:27.126Z · LW · GW · 2 comments


This project is the origin of the Archetypal Transfer Learning (ATL) method [? · GW]


This is the abstract of my research proposal submitted to AI Alignment Awards. I am publishing this here for community feedback. You can find the link to the whole research paper here.


Abstract

We are entering a decade of singularity and great uncertainty. Across all domains, including war, politics, human health, and the environment, there are developments that could prove to be a double-edged sword. Perhaps the most powerful factor in determining our future is how information is distributed to the public. With advanced AI technology, this can be transformational and empowering, or it can lead to disastrous outcomes that we may not have the foresight to predict with our current capabilities.

Goal misgeneralization is a robustness failure for learning algorithms in which the learned program competently pursues an undesired goal: one that produces good performance in training situations but bad performance in novel test situations. This research proposal tries to offer a better description of this problem, and possible solutions, from a Jungian perspective.
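To make the definition concrete, here is a minimal toy sketch (hypothetical, not from the proposal): an agent "trained" in environments where the reward object always sits at the right end learns the proxy rule "go right". The proxy scores perfectly in training but fails when the object appears elsewhere at test time — competent pursuit of the wrong goal.

```python
def proxy_policy(grid):
    """Learned proxy goal: always move to the rightmost cell.

    In training this coincides with the true goal, because the
    coin 'C' always happens to be at the right end of the grid.
    """
    return len(grid) - 1  # index of the cell the agent walks to


def reward(grid, final_pos):
    """True goal: end up on the cell containing the coin 'C'."""
    return 1 if grid[final_pos] == "C" else 0


# Training distribution: coin is always at the right end.
train_grids = [list("...C"), list("....C"), list("..C")]
train_scores = [reward(g, proxy_policy(g)) for g in train_grids]

# Test distribution: coin appears in other positions.
test_grids = [list("C..."), list(".C...")]
test_scores = [reward(g, proxy_policy(g)) for g in test_grids]

print(train_scores)  # [1, 1, 1] -- looks aligned during training
print(test_scores)   # [0, 0]    -- competently pursues the wrong goal
```

The policy never "breaks"; it keeps doing exactly what it learned. The failure is that what it learned was a proxy correlated with the goal only on the training distribution.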

This proposal covers key AI alignment topics, from goal misgeneralization to other pressing issues, and offers a comprehensive approach to addressing critical questions in the field.

The topics above were reviewed to assess the viability of approaching the alignment problem from a Jungian perspective; three key concepts emerged from the review.

A list of initial methodologies was included to give an overview of how the research will proceed once approved.


In conclusion, alignment research should explore the possibility of replacing goals and rewards as the basis for evaluating AI systems. On the understanding that humans think both consciously and subconsciously through Jungian archetypal patterns, this paper proposes that complete narratives be leveraged in training and deploying AI models.

A number of limitations are discussed in the last section. The main concern is the need to hire Jungian scholars or analytical psychologists, as they will define what constitutes archetypal data and evaluate the results. They will also need to guide the whole research process with integrity and diligence, and such experts will be difficult to find.

AI systems will impact our future significantly, so it is important that they are developed responsibly. History has taught us what can happen when intentions are poorly executed: the deaths of millions under destructive ideologies haunt us and remind us of the need for caution in this field.



2 comments

Comments sorted by top scores.

comment by MadHatter · 2023-03-03T21:43:59.917Z · LW(p) · GW(p)

I did a quick skim of the full paper that you linked to. In my opinion, this project is maybe a bad idea in principle. (Like trying to build a bridge out of jello: are Jungian archetypes too squishy and malleable to build a safety-critical system out of?) But it definitely lacks the quick sanity checks and fail-fast attitude that would benefit literally any alignment project. The sooner any idea makes contact with reality, the more likely it is to either die gracefully, wasting little time, or to evolve into something that is worthwhile.

Replies from: whitehatStoic
comment by MiguelDev (whitehatStoic) · 2023-03-03T23:17:21.039Z · LW(p) · GW(p)

The proposal is trying to point out a key difference between the way alignment research and Carl Jung understood pattern recognition in humans.

I stated as one of the limitations of the paper that:

"The author focused on the quality of argument rather than quantity of citations, providing examples or testing. Once approved for research, this proposal will be further tested and be updated."

I am recommending here a research area that I honestly believe can have a massive impact on aligning humans and AI.