LessWrong 2.0 Reader


Ways I Expect AI Regulation To Increase Extinction Risk
1a3orn · 2023-07-04T17:32:48.047Z · comments (32)
[link] Apocalypse Prepping - Concise SHTF guide to prepare for AGI doomsday
prepper · 2023-07-04T17:41:41.401Z · comments (9)
Animal Weapons: Lessons for Humans in the Age of X-Risk
Damin Curtis (damin-curtis) · 2023-07-04T18:14:24.166Z · comments (0)
Six (and a half) intuitions for SVD
CallumMcDougall (TheMcDouglas) · 2023-07-04T19:23:19.688Z · comments (1)
Three camps in AI x-risk discussions: My personal very oversimplified overview
Aryeh Englander (alenglander) · 2023-07-04T20:42:12.829Z · comments (0)
[link] Dominant Assurance Contract Experiment #2: Berkeley House Dinners
Arjun Panickssery (arjun-panickssery) · 2023-07-05T00:13:15.255Z · comments (8)
"Reification"
herschel (hrs) · 2023-07-05T00:53:48.984Z · comments (4)
MXR Talkbox Cap?
jefftk (jkaufman) · 2023-07-05T01:50:01.059Z · comments (0)
Final Lightspeed Grants coworking/office hours before the application deadline
habryka (habryka4) · 2023-07-05T06:03:37.649Z · comments (2)
Puffer-pope reality check
Neil (neil-warren) · 2023-07-05T09:27:11.200Z · comments (2)
Optimized for Something other than Winning or: How Cricket Resists Moloch and Goodhart's Law
A.H. (AlfredHarwood) · 2023-07-05T12:33:07.166Z · comments (25)
Exploring Functional Decision Theory (FDT) and a modified version (ModFDT)
MiguelDev (whitehatStoic) · 2023-07-05T14:06:13.870Z · comments (11)
AISN #13: An interdisciplinary perspective on AI proxy failures, new competitors to ChatGPT, and prompting language models to misbehave
Dan H (dan-hendrycks) · 2023-07-05T15:33:19.699Z · comments (0)
[question] What did AI Safety’s specific funding of AGI R&D labs lead to?
Remmelt (remmelt-ellen) · 2023-07-05T15:51:27.286Z · answers+comments (0)
(tentatively) Found 600+ Monosemantic Features in a Small LM Using Sparse Autoencoders
Logan Riggs (elriggs) · 2023-07-05T16:49:43.822Z · comments (1)
The risk-reward tradeoff of interpretability research
JustinShovelain · 2023-07-05T17:05:36.923Z · comments (1)
An AGI kill switch with defined security properties
Peterpiper · 2023-07-05T17:40:05.299Z · comments (6)
[link] If you wish to make an apple pie, you must first become dictator of the universe
jasoncrawford · 2023-07-05T18:14:58.845Z · comments (9)
[link] Introducing Superalignment
beren · 2023-07-05T18:23:18.419Z · comments (68)
Infra-Bayesian Logic
harfe · 2023-07-05T19:16:41.811Z · comments (2)
Announcing Manifund Regrants
Austin Chen (austin-chen) · 2023-07-05T19:42:08.978Z · comments (8)
AI Intermediation
jefftk (jkaufman) · 2023-07-06T01:50:01.852Z · comments (0)
Distillation: RL with KL penalties is better viewed as Bayesian inference
Nina Rimsky (NinaR) · 2023-07-06T03:33:40.753Z · comments (0)
Open Thread - July 2023
Ruby · 2023-07-06T04:50:06.735Z · comments (35)
Do you feel that AGI Alignment could be achieved in a Type 0 civilization?
Super AGI (super-agi) · 2023-07-06T04:52:57.819Z · comments (1)
AI #19: Hofstadter, Sutskever, Leike
Zvi · 2023-07-06T12:50:05.037Z · comments (16)
Agency begets agency
Richard_Ngo (ricraz) · 2023-07-06T13:08:44.318Z · comments (1)
Announcing the EA Archive
Aaron Bergman (aaronb50) · 2023-07-06T13:49:17.387Z · comments (2)
Understanding the two most common mental health problems in the world
spencerg · 2023-07-06T14:06:25.968Z · comments (0)
[link] A Defense of Work on Mathematical AI Safety
Davidmanheim · 2023-07-06T14:15:21.074Z · comments (13)
Towards Non-Panopticon AI Alignment
Logan Zoellner (logan-zoellner) · 2023-07-06T15:29:39.705Z · comments (0)
[link] Progress links and tweets, 2023-07-06: Terraformer Mark One, Israeli water management, & more
jasoncrawford · 2023-07-06T15:35:22.591Z · comments (4)
[link] Jesse Hoogland on Developmental Interpretability and Singular Learning Theory
Michaël Trazzi (mtrazzi) · 2023-07-06T15:46:00.116Z · comments (2)
Localizing goal misgeneralization in a maze-solving policy network
jan betley (jan-betley) · 2023-07-06T16:21:03.813Z · comments (2)
Layering and Technical Debt in the Global Wayfinding Model
herschel (hrs) · 2023-07-06T17:30:52.645Z · comments (0)
BOUNTY AVAILABLE: AI ethicists, what are your object-level arguments against AI notkilleveryoneism?
Peter Berggren (peter-berggren) · 2023-07-06T17:32:08.675Z · comments (6)
Does biology matter to consciousness?
Reed (ThomasReed) · 2023-07-06T17:38:04.353Z · comments (4)
Progress Studies Fellowship looking for members
jay ram (soycid) · 2023-07-06T17:41:19.125Z · comments (0)
Empirical Evidence Against "The Longest Training Run"
NickGabs · 2023-07-06T18:32:02.754Z · comments (0)
Two paths to win the AGI transition
Nathan Helm-Burger (nathan-helm-burger) · 2023-07-06T21:59:23.150Z · comments (8)
What are the best non-LW places to read on alignment progress?
Raemon · 2023-07-07T00:57:21.417Z · comments (14)
[link] Apparently, of the 195 Million the DoD allocated in University Research Funding Awards in 2022, more than half of them concerned AI or compute hardware research
mako yass (MakoYass) · 2023-07-07T01:20:20.079Z · comments (5)
ask me about technology
bhauth · 2023-07-07T02:03:37.642Z · comments (42)
[question] Can LessWrong provide me with something I find obviously highly useful to my own practical life?
agrippa · 2023-07-07T03:08:58.183Z · answers+comments (4)
Internal independent review for language model agent alignment
Seth Herd · 2023-07-07T06:54:11.552Z · comments (26)
Interpreting Modular Addition in MLPs
Bart Bussmann (Stuckwork) · 2023-07-07T09:22:51.940Z · comments (0)
[link] Passing the ideological Turing test? Arguments against existential risk from AI.
Nina Rimsky (NinaR) · 2023-07-07T10:38:59.829Z · comments (5)
Meetup Tip: Ask Attendees To Explain It
Screwtape · 2023-07-07T16:08:40.639Z · comments (0)
[link] Introducing bayescalc.io
Adele Lopez (adele-lopez-1) · 2023-07-07T16:11:12.854Z · comments (29)
Notes from the Qatar Center for Global Banking and Finance 3rd Annual Conference
PixelatedPenguin · 2023-07-07T23:48:24.560Z · comments (0)