AI

Artificial Intelligence is the study of creating intelligence in algorithms. AI Alignment is the task of ensuring [powerful] AI systems are aligned with human values and interests. The central concern is that a powerful enough AI, if not designed and implemented with sufficient understanding, would optimize something unintended by its creators and pose an existential threat to the future of humanity. This is known as the AI alignment problem.
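
As a loose illustration of "optimizing something unintended", here is a toy sketch (not from this entry; the objective, proxy, and numbers are invented for the example): an optimizer pointed at a proxy reward improves the intended objective at first, then drives it far off course as optimization pressure increases.

```python
# Toy sketch: light optimization of a proxy also improves the intended
# objective; heavy optimization of the same proxy makes it arbitrarily bad.
# Everything here is made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

def true_objective(action: np.ndarray) -> float:
    """What the designers actually want: actions near (1, 1)."""
    return -float(np.sum((action - 1.0) ** 2))

def proxy_reward(action: np.ndarray) -> float:
    """A measurable stand-in: correlated with the true objective at first,
    but it keeps rewarding growth in the first coordinate forever."""
    return float(action[0])

def optimize(reward_fn, steps: int) -> np.ndarray:
    """Hill-climbing as a crude stand-in for 'more optimization power'."""
    action = np.zeros(2)
    for _ in range(steps):
        candidate = action + rng.normal(scale=0.1, size=2)
        if reward_fn(candidate) > reward_fn(action):
            action = candidate
    return action

for steps in (10, 200, 10_000):
    a = optimize(proxy_reward, steps)
    print(f"steps={steps:>6}  proxy={proxy_reward(a):10.2f}  true={true_objective(a):12.2f}")
```

With 10 steps the agent moves toward (1, 1) and the true objective improves; with 10,000 steps it blows far past the intended target, because the proxy never stops rewarding it. The Goodhart's Law and Paperclip Maximizer entries below discuss this dynamic in much more depth.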

Common terms in this space are superintelligence, AI Alignment, AI Safety, Friendly AI, Transformative AI, human-level-intelligence, AI Governance, and Beneficial AI. This entry and the associated tag roughly encompass all of these topics: anything part of the broad cluster of understanding AI and its future impacts on our civilization deserves this tag.

AI Alignment

There are narrow conceptions of alignment, where you're trying to get the AI to do something like cure Alzheimer's disease without destroying the rest of the world. And there are much more ambitious notions of alignment, where you're trying to get it to do the right thing and achieve a happy intergalactic civilization.

But both the narrow and the ambitious notions of alignment have in common that you're trying to have the AI do that thing rather than, say, making a lot of paperclips.

See also General Intelligence [? · GW].

Basic Alignment Theory

AIXI [? · GW]
Coherent Extrapolated Volition [? · GW]
Complexity of Value [? · GW]
Corrigibility [? · GW]
Deceptive Alignment [? · GW]
Decision Theory [? · GW]
Embedded Agency [? · GW]
Fixed Point Theorems [? · GW]
Goodhart's Law [? · GW]
Goal-Directedness [? · GW]
Gradient Hacking [? · GW]
Infra-Bayesianism [? · GW]
Inner Alignment [? · GW]
Instrumental Convergence [? · GW]
Intelligence Explosion [? · GW]
Logical Induction [? · GW]
Logical Uncertainty [? · GW]
Mesa-Optimization [? · GW]
Multipolar Scenarios [? · GW]
Myopia [? · GW]
Newcomb's Problem [? · GW]
Optimization [? · GW]
Orthogonality Thesis [? · GW]
Outer Alignment [? · GW]
Paperclip Maximizer [? · GW]
Power Seeking (AI) [? · GW]
Recursive Self-Improvement [? · GW]
Simulator Theory [? · GW]
Sharp Left Turn [? · GW]
Solomonoff Induction [? · GW]
Superintelligence [? · GW]
Symbol Grounding [? · GW]
Transformative AI [? · GW]
Treacherous Turn [? · GW]
Utility Functions [? · GW]
Whole Brain Emulation [? · GW]

Engineering Alignment

Agent Foundations [? · GW]
AI-assisted Alignment [? · GW]
AI Boxing (Containment) [? · GW]
Conservatism (AI) [? · GW]
Debate (AI safety technique) [? · GW]
Eliciting Latent Knowledge (ELK) [? · GW]
Factored Cognition [? · GW]
Humans Consulting HCH [? · GW]
Impact Measures [? · GW]
Inverse Reinforcement Learning [? · GW]
Iterated Amplification [? · GW]
Mild Optimization [? · GW]
Oracle AI [? · GW]
Reward Functions [? · GW]
RLHF [? · GW]
Shard Theory [? · GW]
Tool AI [? · GW]
Transparency / Interpretability [? · GW]
Tripwire [? · GW]
Value Learning [? · GW]

Organizations

Full map here

AI Safety Camp [? · GW]
Alignment Research Center [? · GW]
Anthropic [? · GW]
Apart Research [? · GW]
AXRP [? · GW]
CHAI (UC Berkeley) [? · GW]
Conjecture (org) [? · GW]
DeepMind [? · GW]
FHI (Oxford) [? · GW]
Future of Life Institute [? · GW]
MIRI [? · GW]
OpenAI [? · GW]
Ought [? · GW]
SERI MATS [? · GW]

Strategy

AI Alignment Fieldbuilding [? · GW]
AI Governance [? · GW]
AI Persuasion [? · GW]
AI Risk [? · GW]
AI Risk Concrete Stories [? · GW]
AI Safety Public Materials [? · GW]
AI Services (CAIS) [? · GW]
AI Success Models [? · GW]
AI Takeoff [? · GW]
AI Timelines [? · GW]
Computing Overhang [? · GW]
Regulation and AI Risk [? · GW]
Restrain AI Development [? · GW]

Other

AI Alignment Intro Materials [? · GW]
AI Capabilities [? · GW]
AI Questions Open Thread [? · GW]
Compute [? · GW]
DALL-E [? · GW]
GPT [? · GW]
Language Models [? · GW]
Machine Learning [? · GW]
Narrow AI [? · GW]
Neuromorphic AI [? · GW]
Prompt Engineering [? · GW]
Reinforcement Learning [? · GW]
Research Agendas [? · GW]