LessWrong 2.0 Reader

How useful is mechanistic interpretability?
ryan_greenblatt · 2023-12-01T02:54:53.488Z · comments (53)
[question] Is OpenAI losing money on each request?
thenoviceoof · 2023-12-01T03:27:23.929Z · answers+comments (8)
Reinforcement Learning using Layered Morphology (RLLM)
MiguelDev (whitehatStoic) · 2023-12-01T05:18:58.162Z · comments (0)
Reality is whatever you can get away with.
sometimesperson (fake-name) · 2023-12-01T07:50:00.382Z · comments (0)
How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of "Scheming AIs")
Joe Carlsmith (joekc) · 2023-12-01T14:51:04.624Z · comments (1)
Worlds where I wouldn't worry about AI risk
adekcz (michal-keda) · 2023-12-01T16:06:54.199Z · comments (0)
[link] Why Did NEPA Peak in 2016?
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-01T16:18:35.435Z · comments (0)
Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (55)
Kolmogorov Complexity Lays Bare the Soul
jakej (jake-jenks) · 2023-12-01T18:29:57.379Z · comments (8)
[link] Researchers and writers can apply for proxy access to the GPT-3.5 base model (code-davinci-002)
ampdot · 2023-12-01T18:48:01.406Z · comments (0)
[link] Queuing theory: Benefits of operating at 60% capacity
ampdot · 2023-12-01T18:48:01.426Z · comments (4)
[link] Carving up problems at their joints
Jakub Smékal (jakub-smekal) · 2023-12-01T18:48:46.510Z · comments (0)
[link] Specification Gaming: How AI Can Turn Your Wishes Against You [RA Video]
Writer · 2023-12-01T19:30:58.304Z · comments (0)
Please Bet On My Quantified Self Decision Markets
niplav · 2023-12-01T20:07:38.284Z · comments (6)
Benchmarking Bowtie2 Threading
jefftk (jkaufman) · 2023-12-01T20:20:05.593Z · comments (0)
[question] Could there be "natural impact regularization" or "impact regularization by default"?
tailcalled · 2023-12-01T22:01:46.062Z · answers+comments (6)
Complex systems research as a field (and its relevance to AI Alignment)
Nora_Ammann · 2023-12-01T22:10:25.801Z · comments (9)
MATS Summer 2023 Retrospective
Rocket (utilistrutil) · 2023-12-01T23:29:47.958Z · comments (34)
South Bay Pre-Holiday Gathering
IS (is) · 2023-12-02T03:21:12.904Z · comments (2)
Protecting against sudden capability jumps during training
nikola (nikolaisalreadytaken) · 2023-12-02T04:22:21.315Z · comments (0)
2023 Unofficial LessWrong Census/Survey
Screwtape · 2023-12-02T04:41:51.418Z · comments (81)
[question] What is known about invariants in self-modifying systems?
mishka · 2023-12-02T05:04:19.299Z · answers+comments (2)
List of strategies for mitigating deceptive alignment
joshc (joshua-clymer) · 2023-12-02T05:56:50.867Z · comments (2)
After Alignment — Dialogue between RogerDearnaley and Seth Herd
RogerDearnaley (roger-d-1) · 2023-12-02T06:03:17.456Z · comments (2)
Out-of-distribution Bioattacks
jefftk (jkaufman) · 2023-12-02T12:20:05.626Z · comments (15)
Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition
Adrià Moret (Adrià R. Moret) · 2023-12-02T14:07:29.992Z · comments (31)
The Method of Loci: With some brief remarks, including transformers and evaluating AIs
Bill Benzon (bill-benzon) · 2023-12-02T14:36:47.077Z · comments (0)
The goal-guarding hypothesis (Section 2.3.1.1 of "Scheming AIs")
Joe Carlsmith (joekc) · 2023-12-02T15:20:28.152Z · comments (1)
Sherlockian Abduction Master List
Cole Wyeth (Amyr) · 2023-12-02T22:10:21.848Z · comments (31)
Quick takes on "AI is easy to control"
So8res · 2023-12-02T22:31:45.683Z · comments (49)
Book Review: 1948 by Benny Morris
Yair Halberstadt (yair-halberstadt) · 2023-12-03T10:29:16.696Z · comments (9)
The benefits and risks of optimism (about AI safety)
Karl von Wendt · 2023-12-03T12:45:12.269Z · comments (6)
[question] How do you do post mortems?
matto · 2023-12-03T14:46:03.521Z · answers+comments (2)
Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of "Scheming AIs")
Joe Carlsmith (joekc) · 2023-12-03T18:32:42.748Z · comments (0)
[link] The Witness
Richard_Ngo (ricraz) · 2023-12-03T22:27:16.248Z · comments (4)
[link] Meditations on Mot
Richard_Ngo (ricraz) · 2023-12-04T00:19:19.522Z · comments (11)
[link] Nietzsche's Morality in Plain English
Arjun Panickssery (arjun-panickssery) · 2023-12-04T00:57:42.839Z · comments (13)
[link] the micro-fulfillment cambrian explosion
bhauth · 2023-12-04T01:15:34.342Z · comments (5)
Disappointing Table Refinishing
jefftk (jkaufman) · 2023-12-04T02:50:07.914Z · comments (3)
FTL travel summary
Isaac King (KingSupernova) · 2023-12-04T05:17:21.422Z · comments (3)
A call for a quantitative report card for AI bioterrorism threat models
Juno (translunar) · 2023-12-04T06:35:14.489Z · comments (0)
[link] Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation
Paul Bricman (paulbricman) · 2023-12-04T07:31:48.726Z · comments (6)
South Bay Meetup 12/9
David Friedman (david-friedman) · 2023-12-04T07:32:26.619Z · comments (0)
[Valence series] 1. Introduction
Steven Byrnes (steve2152) · 2023-12-04T15:40:21.274Z · comments (14)
Updates to Open Phil’s career development and transition funding program
abergal · 2023-12-04T18:10:29.394Z · comments (0)
6. The Mutable Values Problem in Value Learning and CEV
RogerDearnaley (roger-d-1) · 2023-12-04T18:31:22.080Z · comments (0)
Non-classic stories about scheming (Section 2.3.2 of "Scheming AIs")
Joe Carlsmith (joekc) · 2023-12-04T18:44:32.825Z · comments (0)
Planning in LLMs: Insights from AlphaGo
jco · 2023-12-04T18:48:57.508Z · comments (10)
Agents which are EU-maximizing as a group are not EU-maximizing individually
Mlxa · 2023-12-04T18:49:08.708Z · comments (2)
Mechanistic interpretability through clustering
Alistair Fraser (alistair-fraser) · 2023-12-04T18:49:26.777Z · comments (0)
next page (older posts) →