LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

What does success look like?
Raymond D · 2025-01-23T17:48:35.618Z · comments (0)
The Quantum Mars Teleporter: An Empirical Test Of Personal Identity Theories
avturchin · 2025-01-22T11:48:46.071Z · comments (18)
The Human Alignment Problem for AIs
rife (edgar-muniz) · 2025-01-22T04:06:10.872Z · comments (5)
[link] Training Data Attribution: Examining Its Adoption & Use Cases
Deric Cheng (deric-cheng) · 2025-01-22T15:41:19.744Z · comments (0)
[question] Recommendations for Recent Posts/Sequences on Instrumental Rationality?
Benjamin Hendricks (benjamin-hendricks) · 2025-01-26T00:41:08.577Z · answers+comments (3)
[link] What are the differences between AGI, transformative AI, and superintelligence?
Vishakha (vishakha-agrawal) · 2025-01-23T10:03:31.886Z · comments (3)
AXRP Episode 38.6 - Joel Lehman on Positive Visions of AI
DanielFilan · 2025-01-24T23:00:07.562Z · comments (0)
Detecting out of distribution text with surprisal and entropy
Sandy Fraser (alex-fraser) · 2025-01-28T18:46:46.977Z · comments (3)
On Responsibility
silentbob · 2025-01-21T10:47:37.562Z · comments (2)
Revealing alignment faking with a single prompt
Florian_Dietz · 2025-01-29T21:01:15.000Z · comments (4)
Liron Shapira vs Ken Stanley on Doom Debates. A review
TheManxLoiner · 2025-01-24T18:01:56.646Z · comments (0)
AXRP Episode 38.5 - Adrià Garriga-Alonso on Detecting AI Scheming
DanielFilan · 2025-01-20T00:40:07.077Z · comments (0)
Starting an Egan High School
Chris Wintergreen · 2025-01-26T19:02:17.658Z · comments (2)
[link] Links and short notes, 2025-01-26: Atlas Shrugged and the irreplaceable founder, pumping stations and civic pride, and thoughts on the eve of AGI
jasoncrawford · 2025-01-26T20:52:51.416Z · comments (1)
In the future, language models will be our interface to the world
Daniel Tan (dtch1997) · 2025-01-24T23:16:49.999Z · comments (0)
Recursive Self-Modeling as a Plausible Mechanism for Real-time Introspection in Current Language Models
rife (edgar-muniz) · 2025-01-22T18:36:45.226Z · comments (5)
My Mental Model of AI Optimist Opinions
tailcalled · 2025-01-29T18:44:36.485Z · comments (1)
[link] Links and short notes, 2025-01-20
jasoncrawford · 2025-01-21T16:10:51.813Z · comments (0)
[link] AISN #46: The Transition
Corin Katzke (corin-katzke) · 2025-01-23T18:09:36.858Z · comments (0)
The Clueless Sniper and the Principle of Indifference
Jim Buhler (jim-buhler) · 2025-01-27T11:52:57.978Z · comments (19)
[question] AI Safety in secret
Michael Flood (michael-flood) · 2025-01-25T18:16:03.181Z · answers+comments (0)
[question] A Floating Cube - Rejected HLE submission
Shankar Sivarajan (shankar-sivarajan) · 2025-01-25T04:52:22.194Z · answers+comments (1)
[question] Does the ChatGPT (web)app sometimes show actual o1 CoTs now?
Sohaib Imran (sohaib-imran) · 2025-01-29T17:27:08.067Z · answers+comments (6)
Contextual attention heads in the first layer of GPT-2
Alex Gibson · 2025-01-20T13:24:31.803Z · comments (0)
If you wanted to actually reduce the trade deficit, how would you do it?
Logan Zoellner (logan-zoellner) · 2025-01-26T18:04:54.702Z · comments (5)
[link] Narratives as catalysts of catastrophic trajectories
EQ · 2025-01-26T19:01:21.558Z · comments (0)
SIGMI Certification Criteria
a littoral wizard · 2025-01-20T02:41:17.210Z · comments (0)
Detroit Lions -- over confidence is over rated?
Hzn · 2025-01-20T10:53:48.574Z · comments (0)
Computational Limits on Efficiency
vibhumeh · 2025-01-21T18:29:36.997Z · comments (1)
[question] Popular materials about environmental goals/agent foundations? People wanting to discuss such topics?
Q Home · 2025-01-22T03:30:38.066Z · answers+comments (0)
Empirical Insights into Feature Geometry in Sparse Autoencoders
Jason Boxi Zhang (jason-boxi-zhang) · 2025-01-24T19:02:19.167Z · comments (0)
The Dead Cradle Theory: Why Earth May Not Survive Humanity's Expansion into Space
Nicholas Andresen (nicholas-andresen) · 2025-01-22T17:43:48.950Z · comments (0)
Jevon's paradox and economic intuitions
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2025-01-27T23:04:23.854Z · comments (0)
[link] Whereby: The Zoom alternative you probably haven't heard of
Itay Dreyfus (itay-dreyfus) · 2025-01-29T13:01:08.564Z · comments (0)
[question] Why not train reasoning models with RLHF?
CBiddulph (caleb-biddulph) · 2025-01-30T07:58:35.742Z · answers+comments (0)
Easily Evaluate SAE-Steered Models with EleutherAI Evaluation Harness
Matthew Khoriaty (matthew-khoriaty) · 2025-01-21T02:02:35.177Z · comments (0)
Will AI Resilience protect Developing Nations?
ejk64 · 2025-01-21T15:31:32.378Z · comments (0)
[link] Bayesian Reasoning on Maps
Sjlver (jonas-wagner) · 2025-01-22T10:45:03.584Z · comments (0)
[question] are there 2 types of alignment?
KvmanThinking (avery-liu) · 2025-01-23T00:08:20.885Z · answers+comments (9)
How are Those AI Participants Doing Anyway?
mushroomsoup · 2025-01-24T22:37:47.999Z · comments (0)
[link] Understanding AI World Models w/ Chris Canal
jacobhaimes · 2025-01-27T16:32:47.724Z · comments (0)
[question] Supposing that the "Dead Internet Theory" is true or largely true, how can we act on that information?
SpectrumDT · 2025-01-27T16:47:01.338Z · answers+comments (5)
[Linkpost] Why AI Safety Camp struggles with fundraising (FBB #2)
gergogaspar (gergo-gaspar) · 2025-01-21T17:27:51.965Z · comments (0)
Using an LLM for creative writing feels wrong to me
Declan Molony (declan-molony) · 2025-01-28T06:42:24.799Z · comments (13)
Scanless Whole Brain Emulation
Knight Lee (Max Lee) · 2025-01-27T10:00:08.036Z · comments (4)
[link] Constitutions for ASI?
ukc10014 · 2025-01-28T16:32:39.307Z · comments (0)
Death vs. Suffering: The Endurist-Serenist Divide on Life’s Worst Fate
Alex_Steiner · 2025-01-27T03:59:40.279Z · comments (7)
Disproving the "People-Pleasing" Hypothesis for AI Self-Reports of Experience
rife (edgar-muniz) · 2025-01-26T15:53:10.530Z · comments (18)
[link] A concise definition of what it means to win
testingthewaters · 2025-01-25T06:37:37.305Z · comments (0)
Positive jailbreaks in LLMs
dereshev · 2025-01-29T08:41:44.680Z · comments (0)