LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

Intent alignment should not be the goal for AGI x-risk reduction
John Nay (john-nay) · 2022-10-26T01:24:21.650Z · comments (10)
Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?
StefanHex (Stefan42) · 2022-10-25T20:48:50.895Z · comments (2)
[link] A Walkthrough of A Mathematical Framework for Transformer Circuits
Neel Nanda (neel-nanda-1) · 2022-10-25T20:24:54.638Z · comments (7)
[link] Nothing.
rogersbacon · 2022-10-25T16:33:59.357Z · comments (4)
Maps and Blueprint; the Two Sides of the Alignment Equation
Nora_Ammann · 2022-10-25T16:29:40.202Z · comments (1)
Consider Applying to the Future Fellowship at MIT
jefftk (jkaufman) · 2022-10-25T15:40:03.839Z · comments (0)
Beyond Kolmogorov and Shannon
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2022-10-25T15:13:56.484Z · comments (17)
What does it take to defend the world against out-of-control AGIs?
Steven Byrnes (steve2152) · 2022-10-25T14:47:41.970Z · comments (47)
Refine: what helped me write more?
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2022-10-25T14:44:14.813Z · comments (0)
[link] Logical Decision Theories: Our final failsafe?
Noosphere89 (sharmake-farah) · 2022-10-25T12:51:23.799Z · comments (8)
What will the scaled up GATO look like? (Updated with questions)
Amal (asta-vista) · 2022-10-25T12:44:39.184Z · comments (22)
Mechanism Design for AI Safety - Reading Group Curriculum
Rubi J. Hudson (Rubi) · 2022-10-25T03:54:20.777Z · comments (3)
Furry Rationalists & Effective Anthropomorphism both exist
agentydragon · 2022-10-25T03:37:57.213Z · comments (3)
EA & LW Forums Weekly Summary (17 - 23 Oct 22')
Zoe Williams (GreyArea) · 2022-10-25T02:57:43.696Z · comments (0)
Dance Weekends: Tests not Masks
jefftk (jkaufman) · 2022-10-25T02:10:04.171Z · comments (0)
[question] What is good Cyber Security Advice?
Gunnar_Zarncke · 2022-10-24T23:27:58.428Z · answers+comments (12)
Connections between Mind-Body Problem & Civilizations
oblivion · 2022-10-24T21:55:51.888Z · comments (1)
[question] Rationalism and money
[deleted] · 2022-10-24T21:22:11.505Z · answers+comments (2)
[question] Game semantics
[deleted] · 2022-10-24T21:22:11.272Z · answers+comments (2)
A Good Future (rough draft)
Michael Soareverix (michael-soareverix) · 2022-10-24T20:45:45.029Z · comments (5)
[link] A Barebones Guide to Mechanistic Interpretability Prerequisites
Neel Nanda (neel-nanda-1) · 2022-10-24T20:45:27.938Z · comments (12)
[link] POWERplay: An open-source toolchain to study AI power-seeking
Edouard Harris · 2022-10-24T20:03:57.560Z · comments (0)
Consider trying Vivek Hebbar's alignment exercises
Akash (akash-wasil) · 2022-10-24T19:46:40.847Z · comments (1)
[question] Education not meant for mass-consumption
Tolo · 2022-10-24T19:45:09.165Z · answers+comments (5)
Realizations in Regards to Masculinity
[deleted] · 2022-10-24T19:42:28.603Z · comments (2)
The Futility of Religion
[deleted] · 2022-10-24T19:42:28.520Z · comments (5)
The optimal timing of spending on AGI safety work; why we should probably be spending more now
Tristan Cook · 2022-10-24T17:42:05.865Z · comments (0)
[link] QACI: question-answer counterfactual intervals
Tamsin Leake (carado-1) · 2022-10-24T13:08:54.457Z · comments (0)
AGI in our lifetimes is wishful thinking
niknoble · 2022-10-24T11:53:11.809Z · comments (25)
[link] DeepMind on Stratego, an imperfect information game
sanxiyn · 2022-10-24T05:57:39.462Z · comments (9)
[question] TOMT: Post from 1-2 years ago talking about a paper on social networks
Simon Berens (sberens) · 2022-10-24T01:29:11.453Z · answers+comments (1)
[link] AI researchers announce NeuroAI agenda
Cameron Berg (cameron-berg) · 2022-10-24T00:14:46.574Z · comments (12)
Empowerment is (almost) All We Need
jacob_cannell · 2022-10-23T21:48:55.439Z · comments (44)
"Originality is nothing but judicious imitation" - Voltaire
Vestozia (damien-lasseur) · 2022-10-23T19:00:02.732Z · comments (0)
Mid-Peninsula ACX/LW Meetup [CANCELLED]
moshezadka · 2022-10-23T17:37:54.530Z · comments (0)
[link] I am a Memoryless System
NicholasKross · 2022-10-23T17:34:48.367Z · comments (2)
Accountability Buddies: Why you might want one.
Samuel Nellessen (samuel-nellessen) · 2022-10-23T16:25:12.568Z · comments (3)
How to get past Haidt's elephant and listen
Astynax · 2022-10-23T16:06:20.902Z · comments (4)
Writing Russian and Ukrainian words in Latin script
Viliam · 2022-10-23T15:25:41.855Z · comments (22)
[question] Have you noticed any ways that rationalists differ? [Brainstorming session]
tailcalled · 2022-10-23T11:32:13.368Z · answers+comments (22)
Mnestics
Jarred Filmer (4thWayWastrel) · 2022-10-23T00:30:11.159Z · comments (5)
Telic intuitions across the sciences
mrcbarbier · 2022-10-22T21:31:28.672Z · comments (0)
A basic lexicon of telic concepts
mrcbarbier · 2022-10-22T21:28:10.475Z · comments (0)
Do we have the right kind of math for roles, goals and meaning?
mrcbarbier · 2022-10-22T21:28:04.935Z · comments (5)
[question] The Last Year - is there an existing novel about the last year before AI doom?
Luca Petrolati · 2022-10-22T20:44:58.055Z · answers+comments (4)
The highest-probability outcome can be out of distribution
tailcalled · 2022-10-22T20:00:16.233Z · comments (5)
Newsletter for Alignment Research: The ML Safety Updates
Esben Kran (esben-kran) · 2022-10-22T16:17:18.208Z · comments (0)
Crypto loves impact markets: Notes from Schelling Point Bogotá
Rachel Shu (wearsshoes) · 2022-10-22T15:58:39.101Z · comments (2)
[question] When trying to define general intelligence is ability to achieve goals the best metric?
jmh · 2022-10-22T03:09:51.923Z · answers+comments (0)
[question] Simple question about corrigibility and values in AI.
jmh · 2022-10-22T02:59:15.950Z · answers+comments (1)