LessWrong 2.0 Reader



[link] Accelerating science through evolvable institutions
jasoncrawford · 2023-12-04T23:21:35.330Z · comments (0)
Speaking to Congressional staffers about AI risk
Akash (akash-wasil) · 2023-12-04T23:08:52.055Z · comments (None)
Open Thread – Winter 2023/2024
habryka (habryka4) · 2023-12-04T22:59:49.957Z · comments (0)
Interview with Vanessa Kosoy on the Value of Theoretical Research for AI
WillPetillo · 2023-12-04T22:58:40.005Z · comments (0)
[link] 2023 Alignment Research Updates from FAR AI
AdamGleave · 2023-12-04T22:32:19.842Z · comments (0)
[link] What's new at FAR AI
AdamGleave · 2023-12-04T21:18:03.951Z · comments (0)
n of m ring signatures
DanielFilan · 2023-12-04T20:00:06.580Z · comments (1)
[question] Why using activation for interpreting GPT-2?
sprout_ust · 2023-12-04T18:49:45.437Z · answers+comments (0)
Mechanistic interpretability through clustering
Alistair Fraser (alistair-fraser) · 2023-12-04T18:49:26.777Z · comments (0)
Agents which are EU-maximizing as a group are not EU-maximizing individually
Mlxa · 2023-12-04T18:49:08.708Z · comments (1)
Planning in LLMs: Insights from AlphaGo
jco · 2023-12-04T18:48:57.508Z · comments (None)
Non-classic stories about scheming (Section 2.3.2 of "Scheming AIs")
Joe Carlsmith (joekc) · 2023-12-04T18:44:32.825Z · comments (None)
5. The Mutable Values Problem in Value Learning and CEV
RogerDearnaley (roger-d-1) · 2023-12-04T18:31:22.080Z · comments (None)
Updates to Open Phil’s career development and transition funding program
abergal · 2023-12-04T18:10:29.394Z · comments (0)
[Valence series] 1. Introduction
Steven Byrnes (steve2152) · 2023-12-04T15:40:21.274Z · comments (2)
South Bay Meetup 12/9
David Friedman (david-friedman) · 2023-12-04T07:32:26.619Z · comments (0)
[link] Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation
Paul Bricman (paulbricman) · 2023-12-04T07:31:48.726Z · comments (4)
A call for a quantitative report card for AI bioterrorism threat models
Juno (translunar) · 2023-12-04T06:35:14.489Z · comments (0)
FTL travel summary
Isaac King (KingSupernova) · 2023-12-04T05:17:21.422Z · comments (1)
Disappointing Table Refinishing
jefftk (jkaufman) · 2023-12-04T02:50:07.914Z · comments (3)
[link] the micro-fulfillment cambrian explosion
bhauth · 2023-12-04T01:15:34.342Z · comments (4)
[link] Nietzsche's Morality in Plain English
Arjun Panickssery (arjun-panickssery) · 2023-12-04T00:57:42.839Z · comments (7)
[link] Meditations on Mot
Richard_Ngo (ricraz) · 2023-12-04T00:19:19.522Z · comments (2)
[link] The Witness
Richard_Ngo (ricraz) · 2023-12-03T22:27:16.248Z · comments (2)
Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of "Scheming AIs")
Joe Carlsmith (joekc) · 2023-12-03T18:32:42.748Z · comments (None)
[question] How do you do post mortems?
matto · 2023-12-03T14:46:03.521Z · answers+comments (1)
The benefits and risks of optimism (about AI safety)
Karl von Wendt · 2023-12-03T12:45:12.269Z · comments (6)
Book Review: 1948 by Benny Morris
Yair Halberstadt (yair-halberstadt) · 2023-12-03T10:29:16.696Z · comments (8)
Quick takes on "AI is easy to control"
So8res · 2023-12-02T22:31:45.683Z · comments (43)
Sherlockian Abduction Master List
Cole Wyeth (Amyr) · 2023-12-02T22:10:21.848Z · comments (24)
The goal-guarding hypothesis (Section 2.3.1.1 of "Scheming AIs")
Joe Carlsmith (joekc) · 2023-12-02T15:20:28.152Z · comments (None)
The Method of Loci: With some brief remarks, including transformers and evaluating AIs
Bill Benzon (bill-benzon) · 2023-12-02T14:36:47.077Z · comments (0)
Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition
Adrià R. Moret · 2023-12-02T14:07:29.992Z · comments (13)
Out-of-distribution Bioattacks
jefftk (jkaufman) · 2023-12-02T12:20:05.626Z · comments (9)
List of strategies for mitigating deceptive alignment
joshc (joshua-clymer) · 2023-12-02T05:56:50.867Z · comments (1)
[question] What is known about invariants in self-modifying systems?
mishka · 2023-12-02T05:04:19.299Z · answers+comments (2)
2023 Unofficial LessWrong Census/Survey
Screwtape · 2023-12-02T04:41:51.418Z · comments (56)
Protecting against sudden capability jumps during training
nikola (nikolaisalreadytaken) · 2023-12-02T04:22:21.315Z · comments (0)
South Bay Pre-Holiday Gathering
IS (is) · 2023-12-02T03:21:12.904Z · comments (0)
MATS Summer 2023 Retrospective
Rocket (utilistrutil) · 2023-12-01T23:29:47.958Z · comments (31)
Complex systems research as a field (and its relevance to AI Alignment)
Nora_Ammann · 2023-12-01T22:10:25.801Z · comments (7)
[question] Could there be "natural impact regularization" or "impact regularization by default"?
tailcalled · 2023-12-01T22:01:46.062Z · answers+comments (5)
Benchmarking Bowtie2 Threading
jefftk (jkaufman) · 2023-12-01T20:20:05.593Z · comments (0)
Using Prediction Platforms to Select Quantified Self Experiments
niplav · 2023-12-01T20:07:38.284Z · comments (0)
[link] Specification Gaming: How AI Can Turn Your Wishes Against You [RA Video]
Writer · 2023-12-01T19:30:58.304Z · comments (0)
[link] Carving up problems at their joints
Jakub Smékal (jakub-smekal) · 2023-12-01T18:48:46.510Z · comments (0)
[link] Queuing theory: Benefits of operating at 70% capacity
ampdot · 2023-12-01T18:48:01.426Z · comments (4)
[link] Researchers and writers can apply for proxy access to the GPT-3.5 base model (code-davinci-002)
ampdot · 2023-12-01T18:48:01.406Z · comments (None)
Kolmogorov Complexity Lays Bare the Soul
jakej (jake-jenks) · 2023-12-01T18:29:57.379Z · comments (8)
Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (43)
next page (older posts) →