LessWrong 2.0 Reader

The Filan Cabinet Podcast with Oliver Habryka - Transcript
MondSemmel · 2023-02-14T02:38:34.867Z · comments (9)
Basic facts about language models during training
beren · 2023-02-21T11:46:12.256Z · comments (14)
Deceptive Alignment is <1% Likely by Default
DavidW (david-wheaton) · 2023-02-21T15:09:27.920Z · comments (26)
Research agenda: Formalizing abstractions of computations
Erik Jenner (ejenner) · 2023-02-02T04:29:06.568Z · comments (10)
Latent variables for prediction markets: motivation, technical guide, and design considerations
tailcalled · 2023-02-12T17:54:33.045Z · comments (17)
Covid 2/23/23: Your Best Possible Situation
Zvi · 2023-02-23T13:10:01.887Z · comments (9)
A circuit for Python docstrings in a 4-layer attention-only transformer
StefanHex (Stefan42) · 2023-02-20T19:35:14.027Z · comments (8)
Exercise is Good, Actually
Gordon Seidoh Worley (gworley) · 2023-02-02T00:09:18.143Z · comments (27)
SolidGoldMagikarp III: Glitch token archaeology
mwatkins · 2023-02-14T10:17:51.495Z · comments (30)
Retrospective on the 2022 Conjecture AI Discussions
Andrea_Miotti (AndreaM) · 2023-02-24T22:41:13.131Z · comments (5)
Conditioning Predictive Models: Large language models as predictors
evhub · 2023-02-02T20:28:46.612Z · comments (4)
[link] Podcast with Oli Habryka on LessWrong / Lightcone Infrastructure
DanielFilan · 2023-02-05T02:52:06.632Z · comments (20)
Qualities that alignment mentors value in junior researchers
Akash (akash-wasil) · 2023-02-14T23:27:40.747Z · comments (14)
Another Way to Be Okay
Gretta Duleba (gretta-duleba) · 2023-02-19T20:49:31.895Z · comments (13)
Building and Entertaining Couples
Jacob Falkovich (Jacobian) · 2023-02-22T19:02:24.928Z · comments (11)
Decision Transformer Interpretability
Joseph Bloom (Jbloom) · 2023-02-06T07:29:01.917Z · comments (13)
[link] Bankless Podcast: 159 - We’re All Gonna Die with Eliezer Yudkowsky
bayesed · 2023-02-20T16:42:07.413Z · comments (54)
The Cave Allegory Revisited: Understanding GPT's Worldview
Jan_Kulveit · 2023-02-14T16:00:08.744Z · comments (5)
Two problems with ‘Simulators’ as a frame
ryan_greenblatt · 2023-02-17T23:34:20.787Z · comments (13)
Teleosemantics!
abramdemski · 2023-02-23T23:26:15.894Z · comments (26)
[link] OpenAI/Microsoft announce "next generation language model" integrated into Bing/Edge
LawrenceC (LawChan) · 2023-02-07T20:38:08.726Z · comments (4)
[link] Tools for finding information on the internet
RomanHauksson (r) · 2023-02-09T17:05:28.770Z · comments (11)
You are probably not a good alignment researcher, and other blatant lies
junk heap homotopy (zrkrlc) · 2023-02-02T13:55:15.186Z · comments (16)
LLM Basics: Embedding Spaces - Transformer Token Vectors Are Not Points in Space
NickyP (Nicky) · 2023-02-13T18:52:36.689Z · comments (11)
[link] [Linkpost] Google invested $300M in Anthropic in late 2022
Akash (akash-wasil) · 2023-02-03T19:13:32.112Z · comments (14)
[link] Review of AI Alignment Progress
PeterMcCluskey · 2023-02-07T18:57:41.329Z · comments (32)
Conditioning Predictive Models: Outer alignment via careful conditioning
evhub · 2023-02-02T20:28:58.955Z · comments (13)
Why I’m not working on {debate, RRM, ELK, natural abstractions}
Steven Byrnes (steve2152) · 2023-02-10T19:22:37.865Z · comments (19)
Prizes for the 2021 Review
Raemon · 2023-02-10T19:47:43.504Z · comments (2)
The Preference Fulfillment Hypothesis
Kaj_Sotala · 2023-02-26T10:55:12.647Z · comments (62)
Voting Results for the 2021 Review
Raemon · 2023-02-01T08:02:06.744Z · comments (10)
[link] Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic)
LawrenceC (LawChan) · 2023-02-16T19:47:20.696Z · comments (9)
On Developing a Mathematical Theory of Interpretability
Spencer Becker-Kahn · 2023-02-09T01:45:01.521Z · comments (8)
I Am Scared of Posting Negative Takes About Bing's AI
Yitz (yitz) · 2023-02-17T20:50:09.744Z · comments (27)
Here's Why I'm Hesitant To Respond In More Depth
DirectedEvolution (AllAmericanBreakfast) · 2023-02-06T18:36:24.882Z · comments (7)
Emergent Deception and Emergent Optimization
jsteinhardt · 2023-02-20T02:40:09.912Z · comments (0)
Speedrunning 4 mistakes you make when your alignment strategy is based on formal proof
Quinn (quinn-dougherty) · 2023-02-16T01:13:44.847Z · comments (18)
Rationality-related things I don't know as of 2023
Adam Zerner (adamzerner) · 2023-02-11T06:04:16.183Z · comments (59)
A mechanistic explanation for SolidGoldMagikarp-like tokens in GPT2
MadHatter · 2023-02-26T01:10:33.785Z · comments (14)
Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes
Andrea_Miotti (AndreaM) · 2023-02-24T23:03:04.917Z · comments (7)
[link] Who invented knitting? The plot thickens
eukaryote · 2023-02-05T00:24:39.706Z · comments (9)
Aiming for Convergence Is Like Discouraging Betting
Zack_M_Davis · 2023-02-01T00:03:21.315Z · comments (17)
AGI systems & humans will both need to solve the alignment problem
Jeffrey Ladish (jeff-ladish) · 2023-02-24T03:29:21.043Z · comments (14)
[link] Learning How to Learn (And 20+ Studies)
maxa · 2023-02-26T22:46:55.031Z · comments (12)
Respect Chesterton-Schelling Fences
shminux · 2023-02-27T00:09:30.815Z · comments (17)
[link] Human beats SOTA Go AI by learning an adversarial policy
Vanessa Kosoy (vanessa-kosoy) · 2023-02-19T09:38:58.684Z · comments (32)
Order Matters for Deceptive Alignment
DavidW (david-wheaton) · 2023-02-15T19:56:07.358Z · comments (19)
What is it like doing AI safety work?
KatWoods (ea247) · 2023-02-21T20:12:01.977Z · comments (2)
[link] a narrative explanation of the QACI alignment plan
Tamsin Leake (carado-1) · 2023-02-15T03:28:34.710Z · comments (29)
How popular is ChatGPT? Part 1: more popular than Taylor Swift
Harlan · 2023-02-24T22:30:04.340Z · comments (0)