LessWrong 2.0 Reader


Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy
Buck · 2023-07-26T17:02:56.456Z · comments (18)
[link] Neuronpedia
Johnny Lin (hijohnnylin) · 2023-07-26T16:29:28.884Z · comments (51)
[link] Frontier Model Forum
Zach Stein-Perlman · 2023-07-26T14:30:02.018Z · comments (0)
[link] Podcasts: Future of Life Institute, Breakthrough Science Summit panel
jasoncrawford · 2023-07-26T14:28:04.462Z · comments (0)
Llama We Doing This Again?
Zvi · 2023-07-26T13:00:06.703Z · comments (3)
[link] Frontier Model Security
Vaniver · 2023-07-26T04:48:02.215Z · comments (1)
[link] The First Room-Temperature Ambient-Pressure Superconductor
Annapurna (jorge-velez) · 2023-07-26T02:27:51.760Z · comments (28)
Underwater Torture Chambers: The Horror Of Fish Farming
omnizoid · 2023-07-26T00:27:15.490Z · comments (49)
[link] Contra Alexander on the Bitter Lesson and IQ
Andrew Keenan Richardson (qemqemqem) · 2023-07-26T00:07:53.904Z · comments (1)
Overcoming the MWC
Mark Freed (mark-freed) · 2023-07-25T17:31:35.658Z · comments (0)
Russian parliamentarian: let's ban personal computers and the Internet
RomanS · 2023-07-25T17:30:20.871Z · comments (6)
[link] AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer
Corin Katzke (corin-katzke) · 2023-07-25T16:58:44.528Z · comments (0)
"The Universe of Minds" - call for reviewers (Seeds of Science)
rogersbacon · 2023-07-25T16:53:44.775Z · comments (0)
Thoughts on Loss Landscapes and why Deep Learning works
beren · 2023-07-25T16:41:39.562Z · comments (4)
Should you work at a leading AI lab? (including in non-safety roles)
Benjamin Hilton (80000hours) · 2023-07-25T16:29:39.371Z · comments (0)
[link] Whisper's Word-Level Timestamps are Out
Varshul Gupta · 2023-07-25T14:32:28.671Z · comments (2)
[link] AIS 101: Task decomposition for scalable oversight
Charbel-Raphaël (charbel-raphael-segerie) · 2023-07-25T13:34:58.507Z · comments (0)
Anthropic Observations
Zvi · 2023-07-25T12:50:03.178Z · comments (1)
Autonomous Alignment Oversight Framework (AAOF)
Justausername · 2023-07-25T10:25:03.090Z · comments (0)
How LLMs are and are not myopic
janus · 2023-07-25T02:19:44.949Z · comments (14)
Secure Hand Holding
jefftk (jkaufman) · 2023-07-25T01:40:01.553Z · comments (43)
[link] Open problems in activation engineering
TurnTrout · 2023-07-24T19:46:08.733Z · comments (2)
Subdivisions for Useful Distillations?
Sharat Jacob Jacob (sharat-jacob-jacob) · 2023-07-24T18:55:05.801Z · comments (2)
[link] Optimizing For Approval And Disapproval
Thoth Hermes (thoth-hermes) · 2023-07-24T18:46:15.223Z · comments (0)
An Opinionated Guide to Computability and Complexity (Post #0)
Noosphere89 (sharmake-farah) · 2023-07-24T17:53:18.551Z · comments (10)
Slowing down AI progress is an underexplored alignment strategy
Norman Borlaug · 2023-07-24T16:56:25.604Z · comments (27)
Anticipation in LLMs
derek shiller (derek-shiller) · 2023-07-24T15:53:07.076Z · comments (0)
[link] The cone of freedom (or, freedom might only be instrumentally valuable)
dkl9 · 2023-07-24T15:38:54.687Z · comments (6)
A reformulation of Finite Factored Sets
Matthias G. Mayer (matthias-georg-mayer) · 2023-07-24T13:02:25.382Z · comments (1)
Brain Efficiency Cannell Prize Contest Award Ceremony
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-07-24T11:30:10.602Z · comments (12)
[link] [Crosspost] An AI Pause Is Humanity's Best Bet For Preventing Extinction (TIME)
otto.barten (otto-barten) · 2023-07-24T10:07:40.473Z · comments (0)
Cryonics and Regret
MvB (martin-von-berg) · 2023-07-24T09:16:01.456Z · comments (34)
Rationality !== Winning
Raemon · 2023-07-24T02:53:59.764Z · comments (49)
[question] Which rationality posts are begging for further practical development?
LoganStrohl (BrienneYudkowsky) · 2023-07-23T22:22:04.389Z · answers+comments (17)
[link] Please speak unpredictably
dkl9 · 2023-07-23T22:09:09.035Z · comments (16)
QAPR 5: grokking is maybe not *that* big a deal?
Quintin Pope (quintin-pope) · 2023-07-23T20:14:33.405Z · comments (15)
[link] My favorite AI governance research this year so far
Zach Stein-Perlman · 2023-07-23T16:30:00.558Z · comments (1)
"Justice, Cherryl."
Zack_M_Davis · 2023-07-23T16:16:40.835Z · comments (20)
Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive
Justausername · 2023-07-23T16:08:32.886Z · comments (1)
Autogynephilia discourse is so absurdly bad on all sides
tailcalled · 2023-07-23T13:12:07.982Z · comments (24)
Examples of Prompts that Make GPT-4 Output Falsehoods
scasper · 2023-07-22T20:21:39.730Z · comments (5)
Think like a consultant not a salesperson
Adam Zerner (adamzerner) · 2023-07-22T19:31:48.676Z · comments (5)
Optimization, loss set at variance in RL
Clairstan · 2023-07-22T18:25:31.773Z · comments (1)
Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs
davidad · 2023-07-22T18:09:03.816Z · comments (2)
Apollo Neuro Follow Up
Elizabeth (pktechgirl) · 2023-07-22T17:20:09.893Z · comments (0)
Expert trap – Ways out (Part 3 of 3)
Paweł Sysiak (pawel-sysiak) · 2023-07-22T13:06:14.617Z · comments (0)
GPTs' ability to keep a secret is weirdly prompt-dependent
Mateusz Bagiński (mateusz-baginski) · 2023-07-22T12:21:26.175Z · comments (0)
Replacing the Big Air Purifier
jefftk (jkaufman) · 2023-07-22T12:10:01.050Z · comments (0)
[question] I'm consistently overwhelmed by basic obligations. Are there any paradigm shifts or other rationality-based tips that would be helpful?
Benjamin Hendricks (benjamin-hendricks) · 2023-07-21T21:10:21.543Z · answers+comments (37)
Fundamentally Fuzzy Concepts Can't Have Crisp Definitions: Cooperation and Alignment vs Math and Physics
VojtaKovarik · 2023-07-21T21:03:21.501Z · comments (18)