LessWrong 2.0 Reader


Paper review: “The Unreasonable Effectiveness of Easy Training Data for Hard Tasks”
Vassil Tashev (vassil-tashev) · 2024-02-29T18:44:31.301Z · comments (0)
What’s in the box?! – Towards interpretability by distinguishing niches of value within neural networks.
Joshua Clancy (joshua-clancy) · 2024-02-29T18:33:42.811Z · comments (0)
Short Post: Discerning Truth from Trash
FinalFormal2 · 2024-02-29T18:09:42.987Z · comments (0)
Sëbus: Intro
SashaWu · 2024-02-29T16:42:26.651Z · comments (0)
AI #53: One More Leap
Zvi · 2024-02-29T16:10:04.049Z · comments (0)
Cryonics p(success) estimates are only weakly associated with interest in pursuing cryonics in the LW 2023 Survey
Andy_McKenzie · 2024-02-29T14:47:28.613Z · comments (0)
[link] Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds"
mattmacdermott · 2024-02-29T13:59:34.959Z · comments (0)
From First Principles: Dumb cartography. Three caveats
numpyNaN · 2024-02-29T13:41:23.850Z · comments (0)
The "AI Race"
Victor Ashioya (victor-ashioya) · 2024-02-29T09:33:34.248Z · comments (0)
Tips for Empirical Alignment Research
Ethan Perez (ethan-perez) · 2024-02-29T06:04:54.481Z · comments (1)
[question] Supposing the 1bit LLM paper pans out
O O (o-o) · 2024-02-29T05:31:24.158Z · answers+comments (3)
Can RLLMv3's ability to defend against jailbreaks be attributed to datasets containing stories about Jung's shadow integration theory?
MiguelDev (whitehatStoic) · 2024-02-29T05:13:20.241Z · comments (0)
[link] Post series on "Liability Law for reducing Existential Risk from AI"
Nora_Ammann · 2024-02-29T04:39:50.557Z · comments (1)
Tour Retrospective February 2024
jefftk (jkaufman) · 2024-02-29T03:50:04.019Z · comments (0)
Locating My Eyes (Part 3 of "The Sense of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-02-29T03:09:25.810Z · comments (0)
Conspiracy Theorists Aren't Ignorant. They're Bad At Epistemology.
omnizoid · 2024-02-28T23:39:39.192Z · comments (9)
[link] Discovering alignment windfalls reduces AI risk
goodgravy · 2024-02-28T21:23:27.876Z · comments (1)
[link] my theory of the industrial revolution
bhauth · 2024-02-28T21:07:55.274Z · comments (1)
Wholesomeness and Effective Altruism
owencb · 2024-02-28T20:28:22.175Z · comments (3)
timestamping through the Singularity
throwaway918119127 · 2024-02-28T19:09:47.313Z · comments (3)
Evidential Cooperation in Large Worlds: Potential Objections & FAQ
Chi Nguyen · 2024-02-28T18:58:25.688Z · comments (2)
Timaeus's First Four Months
Jesse Hoogland (jhoogland) · 2024-02-28T17:01:53.437Z · comments (5)
Notes on control evaluations for safety cases
ryan_greenblatt · 2024-02-28T16:15:17.799Z · comments (0)
[link] Corporate Governance for Frontier AI Labs: A Research Agenda
Matthew Wearden (matthew-wearden) · 2024-02-28T11:29:59.688Z · comments (0)
[link] How AI Will Change Education
robotelvis · 2024-02-28T05:30:16.511Z · comments (1)
Band Lessons?
jefftk (jkaufman) · 2024-02-28T03:00:05.381Z · comments (3)
New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)
kave · 2024-02-28T02:42:05.801Z · comments (52)
How is Chat-GPT4 Not Conscious?
amelia (314159) · 2024-02-28T00:00:35.935Z · comments (16)
Counting arguments provide no evidence for AI doom
Nora Belrose (nora-belrose) · 2024-02-27T23:03:49.296Z · comments (81)
Which animals realize which types of subjective welfare?
MichaelStJules · 2024-02-27T19:31:32.854Z · comments (0)
[link] Biosecurity and AI: Risks and Opportunities
Aidan (AI Safety Newsletter) (Center for AI Safety) · 2024-02-27T18:45:43.556Z · comments (1)
[link] Infants’ understanding of the causal power of agents and tools
Bruce W. Lee (bruce-lee) · 2024-02-27T18:36:42.037Z · comments (0)
The Gemini Incident Continues
Zvi · 2024-02-27T16:00:05.648Z · comments (6)
How I internalized my achievements to better deal with negative feelings
Raymond Koopmanschap · 2024-02-27T15:10:24.149Z · comments (4)
On Frustration and Regret
silentbob · 2024-02-27T12:19:55.439Z · comments (0)
Facts vs Interpretations
Declan Molony (declan-molony) · 2024-02-27T07:57:50.508Z · comments (0)
San Francisco ACX Meetup “Third Saturday”
Nate Sternberg (nate-sternberg) · 2024-02-27T07:07:23.086Z · comments (0)
Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders
Evan Anders (evan-anders) · 2024-02-27T02:43:22.446Z · comments (16)
Project idea: an iterated prisoner's dilemma competition/game
Adam Zerner (adamzerner) · 2024-02-26T23:06:20.699Z · comments (0)
Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (32)
Getting rational now or later: navigating procrastination and time-inconsistent preferences for new rationalists
milo_thoughts (miles-bader) · 2024-02-26T19:38:52.436Z · comments (0)
[question] Whom Do You Trust?
JackOfAllTrades (JackOfAllSpades) · 2024-02-26T19:38:36.549Z · answers+comments (0)
Boundary Violations vs Boundary Dissolution
Chipmonk · 2024-02-26T18:59:08.713Z · comments (4)
[question] Can we get an AI to do our alignment homework for us?
Chris_Leong · 2024-02-26T07:56:22.320Z · answers+comments (32)
How I build and run behavioral interviews
benkuhn · 2024-02-26T05:50:05.328Z · comments (6)
Hidden Cognition Detection Methods and Benchmarks
Paul Colognese (paul-colognese) · 2024-02-26T05:31:00.714Z · comments (10)
[link] Cellular respiration as a steam engine
dkl9 · 2024-02-25T20:17:38.788Z · comments (1)
Sëbus: An out-world intro
SashaWu · 2024-02-25T19:55:37.524Z · comments (0)
[question] Rationalism and Dependent Origination?
Baometrus (worlds-arise) · 2024-02-25T18:16:33.748Z · answers+comments (3)
China-AI forecasts
NathanBarnard · 2024-02-25T16:49:33.652Z · comments (25)