LessWrong 2.0 Reader

The Commitment Races problem
Daniel Kokotajlo (daniel-kokotajlo) · 2019-08-23T01:58:19.669Z · comments (56)
I'm leaving AI alignment – you better stay
rmoehn · 2020-03-12T05:58:37.523Z · comments (19)
Nate Soares' Life Advice
CatGoddess · 2022-08-23T02:46:43.369Z · comments (41)
Luna Lovegood and the Chamber of Secrets - Part 1
lsusr · 2020-11-26T03:02:19.617Z · comments (31)
The self-unalignment problem
Jan_Kulveit · 2023-04-14T12:10:12.151Z · comments (24)
Comment on "Endogenous Epistemic Factionalization"
Zack_M_Davis · 2020-05-20T18:04:53.857Z · comments (8)
[link] GPT-4
nz · 2023-03-14T17:02:02.276Z · comments (149)
LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B
Simon Lermen (dalasnoin) · 2023-10-12T19:58:02.119Z · comments (29)
[link] Arithmetic is an underrated world-modeling technology
dynomight · 2024-10-17T14:00:22.475Z · comments (33)
[question] Where do (did?) stable, cooperative institutions come from?
AnnaSalamon · 2020-11-03T22:14:09.322Z · answers+comments (72)
Mental Mountains
Scott Alexander (Yvain) · 2019-11-27T05:30:02.107Z · comments (14)
AI x-risk, approximately ordered by embarrassment
Alex Lawsen (alex-lszn) · 2023-04-12T23:01:00.561Z · comments (7)
The Geometric Expectation
Scott Garrabrant · 2022-11-23T18:05:12.206Z · comments (21)
The Plan - 2023 Version
johnswentworth · 2023-12-29T23:34:19.651Z · comments (41)
[link] The bonds of family and community: Poverty and cruelty among Russian peasants in the late 19th century
jasoncrawford · 2021-11-28T17:22:23.136Z · comments (36)
My takes on SB-1047
leogao · 2024-09-09T18:38:37.799Z · comments (8)
"It's a 10% chance which I did 10 times, so it should be 100%"
egor.timatkov · 2024-11-18T01:14:27.738Z · comments (57)
Priors and Prejudice
MathiasKB (MathiasKirkBonde) · 2024-04-22T15:00:41.782Z · comments (31)
[link] Using axis lines for good or evil
dynomight · 2024-03-06T14:47:10.989Z · comments (39)
2023 Survey Results
Screwtape · 2024-02-16T22:24:28.132Z · comments (26)
[link] Daniel Dennett has died (1942-2024)
kave · 2024-04-19T16:17:04.742Z · comments (5)
DeepMind is hiring for the Scalable Alignment and Alignment Teams
Rohin Shah (rohinmshah) · 2022-05-13T12:17:13.157Z · comments (34)
[link] The First Sample Gives the Most Information
Mark Xu (mark-xu) · 2020-12-24T20:39:04.936Z · comments (16)
Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (22)
Understanding “Deep Double Descent”
evhub · 2019-12-06T00:00:10.180Z · comments (51)
Cryonics signup guide #1: Overview
mingyuan · 2021-01-06T00:25:02.927Z · comments (34)
The Apprentice Experiment
johnswentworth · 2021-06-10T03:29:27.257Z · comments (11)
[link] If you weren't such an idiot...
kave · 2024-03-02T00:01:37.314Z · comments (74)
How Long Can People Usefully Work?
lynettebye · 2020-04-03T21:35:11.525Z · comments (8)
The date of AI Takeover is not the day the AI takes over
Daniel Kokotajlo (daniel-kokotajlo) · 2020-10-22T10:41:09.242Z · comments (32)
Prizes for ELK proposals
paulfchristiano · 2022-01-03T20:23:25.867Z · comments (152)
Deep atheism and AI risk
Joe Carlsmith (joekc) · 2024-01-04T18:58:47.745Z · comments (22)
[Interim research report] Taking features out of superposition with sparse autoencoders
Lee Sharkey (Lee_Sharkey) · 2022-12-13T15:41:48.685Z · comments (23)
Reply to Eliezer on Biological Anchors
HoldenKarnofsky · 2021-12-23T16:15:43.508Z · comments (46)
Shutting down AI is not enough. We need to destroy all technology.
Matthew Barnett (matthew-barnett) · 2023-04-01T21:03:24.448Z · comments (36)
A Rocket–Interpretability Analogy
plex (ete) · 2024-10-21T13:55:18.184Z · comments (31)
Alignment research exercises
Richard_Ngo (ricraz) · 2022-02-21T20:24:16.375Z · comments (17)
[link] Alcohol, health, and the ruthless logic of the Asian flush
dynomight · 2021-06-04T18:14:08.797Z · comments (45)
[link] Vernor Vinge, who coined the term "Technological Singularity", dies at 79
Kaj_Sotala · 2024-03-21T22:14:14.699Z · comments (25)
OpenAI Launches Superalignment Taskforce
Zvi · 2023-07-11T13:00:06.232Z · comments (40)
Advice for newly busy people
Severin T. Seehrich (sts) · 2023-05-11T16:46:15.313Z · comments (3)
Public-facing Censorship Is Safety Theater, Causing Reputational Damage
Yitz (yitz) · 2022-09-23T05:08:14.149Z · comments (42)
Comments on OpenAI's "Planning for AGI and beyond"
So8res · 2023-03-03T23:01:29.665Z · comments (2)
[link] Why has nuclear power been a flop?
jasoncrawford · 2021-04-16T16:49:15.789Z · comments (50)
There are no coherence theorems
Dan H (dan-hendrycks) · 2023-02-20T21:25:48.478Z · comments (128)
Finite Factored Sets
Scott Garrabrant · 2021-05-23T20:52:48.575Z · comments (95)
At 87, Pearl is still able to change his mind
rotatingpaguro · 2023-10-18T04:46:29.339Z · comments (15)
[link] Moral Reality Check (a short story)
jessicata (jessica.liu.taylor) · 2023-11-26T05:03:18.254Z · comments (45)
Remarks 1–18 on GPT (compressed)
Cleo Nardo (strawberry calm) · 2023-03-20T22:27:26.277Z · comments (35)
A Year of AI Increasing AI Progress
TW123 (ThomasWoodside) · 2022-12-30T02:09:39.458Z · comments (3)