LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

[link] Request for Information for a new US AI Action Plan (OSTP RFI)
agucova · 2025-02-07T20:40:36.034Z · comments (0)
Jevons paradox and economic intuitions
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2025-01-27T23:04:23.854Z · comments (0)
[link] Whereby: The Zoom alternative you probably haven't heard of
Itay Dreyfus (itay-dreyfus) · 2025-01-29T13:01:08.564Z · comments (0)
[question] Why not train reasoning models with RLHF?
CBiddulph (caleb-biddulph) · 2025-01-30T07:58:35.742Z · answers+comments (4)
[link] Bayesian Reasoning on Maps
Sjlver (jonas-wagner) · 2025-01-22T10:45:03.584Z · comments (0)
Thoughts on Toy Models of Superposition
james__p · 2025-02-02T13:52:54.505Z · comments (0)
ML4Good Colombia - Applications Open to LatAm Participants
Alejandro Acelas (alejandro-acelas) · 2025-02-10T15:03:03.929Z · comments (0)
[link] A concise definition of what it means to win
testingthewaters · 2025-01-25T06:37:37.305Z · comments (1)
[link] How do you make a 250x better vaccine at 1/10 the cost? Develop it in India.
Abhishaike Mahajan (abhishaike-mahajan) · 2025-02-09T03:53:17.050Z · comments (5)
Will LLMs supplant the field of creative writing?
Declan Molony (declan-molony) · 2025-01-28T06:42:24.799Z · comments (14)
Claude 3.5 Sonnet (New)'s AGI scenario
Nathan Young · 2025-02-17T18:47:04.669Z · comments (2)
Permanent properties of things are a self-fulfilling prophecy
YanLyutnev (YanLutnev) · 2025-02-19T00:08:20.776Z · comments (0)
Moral gauge theory: A speculative suggestion for AI alignment
James Diacoumis (james-diacoumis) · 2025-02-23T11:42:31.083Z · comments (2)
OpenAI’s NSFW policy: user safety, harm reduction, and AI consent
8e9 · 2025-02-13T13:59:22.911Z · comments (3)
[link] Understanding AI World Models w/ Chris Canal
jacobhaimes · 2025-01-27T16:32:47.724Z · comments (0)
Neuron Activations to CLIP Embeddings: Geometry of Linear Combinations in Latent Space
Roman Malov · 2025-02-03T10:30:48.866Z · comments (0)
Cross-Layer Feature Alignment and Steering in Large Language Models
dlaptev · 2025-02-08T20:18:20.331Z · comments (0)
Allegory of the Tsunami
Evan Hu (evan-hu) · 2025-01-29T19:09:33.761Z · comments (1)
[link] Unlocking Ethical AI and Improving Jailbreak Defenses: Reinforcement Learning with Layered Morphology (RLLM)
MiguelDev (whitehatStoic) · 2025-02-01T19:17:32.071Z · comments (2)
[link] Demonstrating specification gaming in reasoning models
Matrice Jacobine · 2025-02-20T19:26:20.563Z · comments (0)
Sleeper agents appear resilient to activation steering
Lucy Wingard (lucy-wingard) · 2025-02-03T19:31:30.702Z · comments (0)
Response to the US Govt's Request for Information Concerning Its AI Action Plan
Davey Morse (davey-morse) · 2025-02-14T06:14:08.673Z · comments (0)
[question] How likely is an attempted coup in the United States in the next four years?
Alexander de Vries (alexander-de-vries) · 2025-02-01T13:12:04.053Z · answers+comments (2)
How are Those AI Participants Doing Anyway?
mushroomsoup · 2025-01-24T22:37:47.999Z · comments (0)
When you downvote, explain why
KvmanThinking (avery-liu) · 2025-02-07T01:03:44.097Z · comments (31)
[link] AISN #48: Utility Engineering and EnigmaEval
Corin Katzke (corin-katzke) · 2025-02-18T19:15:16.751Z · comments (0)
Proposal: Safeguarding Against Jailbreaking Through Iterative Multi-Turn Testing
jacquesallen · 2025-01-31T23:00:42.665Z · comments (0)
[question] are there 2 types of alignment?
KvmanThinking (avery-liu) · 2025-01-23T00:08:20.885Z · answers+comments (9)
Detailed Ideal World Benchmark
Knight Lee (Max Lee) · 2025-01-30T02:31:39.852Z · comments (0)
Making the case for average-case AI Control
Nathaniel Mitrani (nathaniel-mitrani) · 2025-02-05T18:56:38.181Z · comments (0)
Death vs. Suffering: The Endurist-Serenist Divide on Life’s Worst Fate
Alex_Steiner · 2025-01-27T03:59:40.279Z · comments (7)
[question] hypnosis question
KvmanThinking (avery-liu) · 2025-02-06T02:41:53.314Z · answers+comments (5)
Disproving the "People-Pleasing" Hypothesis for AI Self-Reports of Experience
rife (edgar-muniz) · 2025-01-26T15:53:10.530Z · comments (18)
[link] AISN #47: Reasoning Models
Corin Katzke (corin-katzke) · 2025-02-06T18:52:29.843Z · comments (0)
[Translation] In the Age of AI don't Look for Unicorns
mushroomsoup · 2025-02-07T21:06:24.198Z · comments (0)
AI Safety Oversights
Davey Morse (davey-morse) · 2025-02-08T06:15:52.896Z · comments (0)
Scanless Whole Brain Emulation
Knight Lee (Max Lee) · 2025-01-27T10:00:08.036Z · comments (4)
How identical twin sisters feel about nieces vs their own daughters
Dave Lindbergh (dave-lindbergh) · 2025-02-09T17:36:25.830Z · comments (19)
Use computers as powerful as in 1985 or AI controls humans or ?
jrincayc (nerd_gatherer) · 2025-02-03T00:51:05.706Z · comments (0)
"DL training == human learning" is a bad analogy
kman · 2025-02-02T20:59:21.259Z · comments (0)
Rethinking AI Safety Approach in the Era of Open-Source AI
Weibing Wang (weibing-wang) · 2025-02-11T14:01:39.167Z · comments (0)
Where Would Good Forecasts Most Help AI Governance Efforts?
Violet Hour · 2025-02-11T18:15:33.082Z · comments (0)
Tracing Typos in LLMs: My Attempt at Understanding How Models Correct Misspellings
Ivan Dostal (#R@q0YSDZ3ov$f6J) · 2025-02-02T19:56:34.771Z · comments (1)
Sparse Autoencoder Feature Ablation for Unlearning
aludert · 2025-02-13T19:13:48.388Z · comments (0)
[link] Interviews with Moonshot AI's CEO, Yang Zhilin
Cosmia_Nebula · 2025-01-31T09:19:36.561Z · comments (0)
Can someone, anyone, make superintelligence a more concrete concept?
Ori Nagel (ori-nagel) · 2025-01-30T23:25:36.135Z · comments (6)
Undesirable Conclusions and Origin Adjustment
Jerdle (daniel-amdurer) · 2025-02-19T18:35:23.732Z · comments (0)
[Translation] AI Generated Fake News is Taking Over my Family Group Chat
mushroomsoup · 2025-01-30T20:24:22.175Z · comments (0)
[link] Constitutions for ASI?
ukc10014 · 2025-01-28T16:32:39.307Z · comments (0)
Topological Data Analysis and Mechanistic Interpretability
Gunnar Carlsson (gunnar-carlsson) · 2025-02-24T19:56:02.498Z · comments (0)