LessWrong 2.0 Reader

next page (older posts) →

[link] New Tool: the Residual Stream Viewer
AdamYedidia (babybeluga) · 2023-10-01T00:49:51.965Z · comments (7)
"Absence of Evidence is Not Evidence of Absence" As a Limit
transhumanist_atom_understander · 2023-10-01T08:15:28.852Z · comments (1)
[link] AI Safety Impact Markets: Your Charity Evaluator for AI Safety
Dawn Drescher (Telofy) · 2023-10-01T10:47:06.952Z · comments (5)
[link] Fifty Flips
abstractapplic · 2023-10-01T15:30:43.268Z · comments (14)
[link] Join AISafety.info's Distillation Hackathon (Oct 6-9th)
smallsilo (monstrologies) · 2023-10-01T18:43:43.359Z · comments (0)
[question] Looking for study
Robert Feinstein (robert-feinstein) · 2023-10-01T19:52:25.481Z · answers+comments (0)
AI Alignment Breakthroughs this Week [new substack]
Logan Zoellner (logan-zoellner) · 2023-10-01T22:13:48.589Z · comments (8)
Revisiting the Manifold Hypothesis
Aidan Rocke (aidanrocke) · 2023-10-01T23:55:56.704Z · comments (19)
Instrumental Convergence and human extinction.
Spiritus Dei (spiritus-dei) · 2023-10-02T00:41:29.952Z · comments (3)
Why I got the smallpox vaccine in 2023
joec · 2023-10-02T05:11:41.249Z · comments (6)
A Mathematical Model for Simulators
lukemarks (marc/er) · 2023-10-02T06:46:31.702Z · comments (0)
[link] Linkpost: They Studied Dishonesty. Was Their Work a Lie?
Linch · 2023-10-02T08:10:51.857Z · comments (12)
The 99% principle for personal problems
Kaj_Sotala · 2023-10-02T08:20:07.379Z · comments (20)
Direction of Fit
NicholasKees (nick_kees) · 2023-10-02T12:34:24.385Z · comments (0)
[link] energy landscapes of experts
bhauth · 2023-10-02T14:08:32.370Z · comments (2)
Will early transformative AIs primarily use text? [Manifold question]
Fabien Roger (Fabien) · 2023-10-02T15:05:07.279Z · comments (0)
A counterexample for measurable factor spaces
Matthias G. Mayer (matthias-georg-mayer) · 2023-10-02T15:16:48.418Z · comments (0)
Expectations for Gemini: hopefully not a big deal
Maxime Riché (maxime-riche) · 2023-10-02T15:38:32.834Z · comments (5)
Population After a Catastrophe
Stan Pinsent (stan-pinsent) · 2023-10-02T16:06:56.614Z · comments (5)
Thomas Kwa's MIRI research experience
Thomas Kwa (thomas-kwa) · 2023-10-02T16:42:37.886Z · comments (52)
[link] Dall-E 3
p.b. · 2023-10-02T20:33:18.294Z · comments (9)
My Mid-Career Transition into Biosecurity
jefftk (jkaufman) · 2023-10-02T21:20:06.768Z · comments (4)
Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”
miles · 2023-10-03T02:22:00.199Z · comments (0)
Early Experiments in Reward Model Interpretation Using Sparse Autoencoders
lukemarks (marc/er) · 2023-10-03T07:45:15.228Z · comments (0)
Mech Interp Challenge: October - Deciphering the Sorted List Model
CallumMcDougall (TheMcDouglas) · 2023-10-03T10:57:29.598Z · comments (0)
Why We Use Money? - A Walrasian View
Savio Coelho (Will_Crowley) · 2023-10-03T12:02:37.312Z · comments (3)
Monthly Roundup #11: October 2023
Zvi · 2023-10-03T14:10:01.686Z · comments (12)
[question] Potential alignment targets for a sovereign superintelligent AI
Paul Colognese (paul-colognese) · 2023-10-03T15:09:59.529Z · answers+comments (4)
What would it mean to understand how a large language model (LLM) works? Some quick notes.
Bill Benzon (bill-benzon) · 2023-10-03T15:11:13.508Z · comments (4)
[link] Metaculus Announces Forecasting Tournament to Evaluate Focused Research Organizations, in Partnership With the Federation of American Scientists
ChristianWilliams · 2023-10-03T16:44:17.620Z · comments (0)
[link] Testing and Automation for Intelligent Systems.
Sai Kiran Kammari (sai-kiran-kammari) · 2023-10-03T17:51:08.796Z · comments (0)
[question] Current AI safety techniques?
Zach Stein-Perlman · 2023-10-03T19:30:54.481Z · answers+comments (2)
OpenAI-Microsoft partnership
Zach Stein-Perlman · 2023-10-03T20:01:44.795Z · comments (18)
When to Get the Booster?
jefftk (jkaufman) · 2023-10-03T21:00:12.813Z · comments (15)
AXRP Episode 25 - Cooperative AI with Caspar Oesterheld
DanielFilan · 2023-10-03T21:50:07.552Z · comments (0)
[question] Who determines whether an alignment proposal is the definitive alignment solution?
MiguelDev (whitehatStoic) · 2023-10-03T22:39:23.700Z · answers+comments (6)
[link] Bay Area Winter Solstice 2023
tcheasdfjkl · 2023-10-04T02:19:56.284Z · comments (3)
Graphical tensor notation for interpretability
Jordan Taylor (Nadroj) · 2023-10-04T08:04:33.341Z · comments (11)
[question] What are some examples of AIs instantiating the 'nearest unblocked strategy problem'?
EJT (ElliottThornley) · 2023-10-04T11:05:34.537Z · answers+comments (4)
[link] Why a Mars colony would lead to a first strike situation
Remmelt (remmelt-ellen) · 2023-10-04T11:29:53.679Z · comments (8)
Entanglement and intuition about words and meaning
Bill Benzon (bill-benzon) · 2023-10-04T14:16:29.713Z · comments (0)
[question] What evidence is there of LLM's containing world models?
Chris_Leong · 2023-10-04T14:33:19.178Z · answers+comments (17)
I don’t find the lie detection results that surprising (by an author of the paper)
JanB (JanBrauner) · 2023-10-04T17:10:51.262Z · comments (8)
[link] AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering
aogara (Aidan O'Gara) · 2023-10-04T17:37:19.564Z · comments (2)
rationalistic probability(litterally just throwing shit out there)
NotaSprayer ASprayer (notasprayer-asprayer) · 2023-10-04T17:46:24.500Z · comments (8)
[question] Using Reinforcement Learning to try to control the heating of a building (district heating)
Tony Karlsson (tony-karlsson) · 2023-10-04T17:47:17.294Z · answers+comments (5)
The 5 Pillars of Happiness
Gabi QUENE · 2023-10-04T17:50:40.633Z · comments (5)
Safeguarding Humanity: Ensuring AI Remains a Servant, Not a Master
kgldeshapriya · 2023-10-04T17:52:40.436Z · comments (2)
[link] Open Philanthropy is hiring for multiple roles across our Global Catastrophic Risks teams
aarongertler · 2023-10-04T18:04:25.388Z · comments (0)
PortAudio M1 Latency
jefftk (jkaufman) · 2023-10-04T19:10:13.021Z · comments (5)