LessWrong 2.0 Reader

Revealing alignment faking with a single prompt
Florian_Dietz · 2025-01-29T21:01:15.000Z · comments (4)
[link] Links and short notes, 2025-01-26: Atlas Shrugged and the irreplaceable founder, pumping stations and civic pride, and thoughts on the eve of AGI
jasoncrawford · 2025-01-26T20:52:51.416Z · comments (1)
Recursive Self-Modeling as a Plausible Mechanism for Real-time Introspection in Current Language Models
rife (edgar-muniz) · 2025-01-22T18:36:45.226Z · comments (5)
[link] AISN #46: The Transition
Corin Katzke (corin-katzke) · 2025-01-23T18:09:36.858Z · comments (0)
Starting an Egan High School
Chris Wintergreen · 2025-01-26T19:02:17.658Z · comments (2)
Reconceptualizing the Nothingness and Existence
Htarlov (htarlov) · 2025-01-28T20:29:44.390Z · comments (1)
[question] AI Safety in secret
Michael Flood (michael-flood) · 2025-01-25T18:16:03.181Z · answers+comments (0)
[question] A Floating Cube - Rejected HLE submission
Shankar Sivarajan (shankar-sivarajan) · 2025-01-25T04:52:22.194Z · answers+comments (1)
The Clueless Sniper and the Principle of Indifference
Jim Buhler (jim-buhler) · 2025-01-27T11:52:57.978Z · comments (19)
Superintelligent AI will make mistakes
juggins · 2025-01-30T15:12:50.561Z · comments (2)
Introducing the Coalition for a Baruch Plan for AI: A Call for a Radical Treaty-Making process for the Global Governance of AI
rguerreschi · 2025-01-30T15:26:09.482Z · comments (0)
Positive jailbreaks in LLMs
dereshev · 2025-01-29T08:41:44.680Z · comments (0)
[link] Narratives as catalysts of catastrophic trajectories
EQ · 2025-01-26T19:01:21.558Z · comments (0)
If you wanted to actually reduce the trade deficit, how would you do it?
Logan Zoellner (logan-zoellner) · 2025-01-26T18:04:54.702Z · comments (5)
[question] Does the ChatGPT (web)app sometimes show actual o1 CoTs now?
Sohaib Imran (sohaib-imran) · 2025-01-29T17:27:08.067Z · answers+comments (6)
How *exactly* can AI take your job in the next few years?
Ansh Juneja (ansh-juneja) · 2025-01-30T02:33:13.475Z · comments (0)
Outlaw Code
scarcegreengrass · 2025-01-30T23:41:57.239Z · comments (0)
Memorization-generalization in practice
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-30T14:10:48.239Z · comments (0)
[link] Fertility Will Never Recover
Eneasz · 2025-01-30T01:16:41.332Z · comments (16)
Jevon's paradox and economic intuitions
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2025-01-27T23:04:23.854Z · comments (0)
[question] Popular materials about environmental goals/agent foundations? People wanting to discuss such topics?
Q Home · 2025-01-22T03:30:38.066Z · answers+comments (0)
The Dead Cradle Theory: Why Earth May Not Survive Humanity's Expansion into Space
Nicholas Andresen (nicholas-andresen) · 2025-01-22T17:43:48.950Z · comments (0)
Empirical Insights into Feature Geometry in Sparse Autoencoders
Jason Boxi Zhang (jason-boxi-zhang) · 2025-01-24T19:02:19.167Z · comments (0)
[link] Understanding AI World Models w/ Chris Canal
jacobhaimes · 2025-01-27T16:32:47.724Z · comments (0)
[question] Supposing that the "Dead Internet Theory" is true or largely true, how can we act on that information?
SpectrumDT · 2025-01-27T16:47:01.338Z · answers+comments (5)
[question] Why not train reasoning models with RLHF?
CBiddulph (caleb-biddulph) · 2025-01-30T07:58:35.742Z · answers+comments (4)
Allegory of the Tsunami
Evan Hu (evan-hu) · 2025-01-29T19:09:33.761Z · comments (1)
[link] Whereby: The Zoom alternative you probably haven't heard of
Itay Dreyfus (itay-dreyfus) · 2025-01-29T13:01:08.564Z · comments (0)
Absorbing Your Friends' Powers
Alice Blair (Diatom) · 2025-01-30T02:32:27.091Z · comments (0)
[question] are there 2 types of alignment?
KvmanThinking (avery-liu) · 2025-01-23T00:08:20.885Z · answers+comments (9)
How are Those AI Participants Doing Anyway?
mushroomsoup · 2025-01-24T22:37:47.999Z · comments (0)
[link] Bayesian Reasoning on Maps
Sjlver (jonas-wagner) · 2025-01-22T10:45:03.584Z · comments (0)
Using an LLM for creative writing feels wrong to me
Declan Molony (declan-molony) · 2025-01-28T06:42:24.799Z · comments (13)
Scanless Whole Brain Emulation
Knight Lee (Max Lee) · 2025-01-27T10:00:08.036Z · comments (4)
Death vs. Suffering: The Endurist-Serenist Divide on Life’s Worst Fate
Alex_Steiner · 2025-01-27T03:59:40.279Z · comments (7)
Disproving the "People-Pleasing" Hypothesis for AI Self-Reports of Experience
rife (edgar-muniz) · 2025-01-26T15:53:10.530Z · comments (18)
[link] A concise definition of what it means to win
testingthewaters · 2025-01-25T06:37:37.305Z · comments (0)
Detailed Ideal World Benchmark
Knight Lee (Max Lee) · 2025-01-30T02:31:39.852Z · comments (0)
[link] Hello World
Charlie Sanders (charlie-sanders) · 2025-01-30T15:33:57.427Z · comments (0)
[question] Implication of Uncomputable Problems
Nathan1123 · 2025-01-30T16:48:38.222Z · answers+comments (0)
[link] Constitutions for ASI?
ukc10014 · 2025-01-28T16:32:39.307Z · comments (0)
Starting Thoughts on RLHF
Michael Flood (michael-flood) · 2025-01-23T22:16:49.793Z · comments (0)
Updating and Editing Factual Knowledge in Language Models
Dhananjay Ashok (dhananjay-ashok) · 2025-01-23T19:34:37.121Z · comments (2)
Is it ethical to work in AI "content evaluation"?
anon_databoy123 (noob1234) · 2025-01-27T19:58:26.176Z · comments (2)
Should Art Carry the Weight of Shaping our Values?
Krishna Maneesha Dendukuri (krishna_maneesha-d) · 2025-01-28T18:43:32.517Z · comments (0)
The many failure modes of consumer-grade LLMs
dereshev · 2025-01-26T19:01:09.891Z · comments (0)
Can someone, anyone, make superintelligence a more concrete concept?
Ori Nagel (ori-nagel) · 2025-01-30T23:25:36.135Z · comments (1)
[question] Who's track record of AI predictions would you like to see evaluated?
Jonny Spicer (jonnyspicer) · 2025-01-29T12:05:30.311Z · answers+comments (1)
[link] Tetherware #1: The case for humanlike AI
Jáchym Fibír · 2025-01-30T10:58:11.717Z · comments (0)
Locating and Editing Knowledge in LMs
Dhananjay Ashok (dhananjay-ashok) · 2025-01-24T22:53:40.559Z · comments (0)