LessWrong 2.0 Reader


Thoughts about Policy Ecosystems: The Missing Links in AI Governance
Echo Huang (echo-huang) · 2025-02-01T01:54:54.333Z · comments (0)
Re: Taste
lsusr · 2025-02-01T03:34:10.918Z · comments (8)
2024 was the year of the big battery, and what that means for solar power
transhumanist_atom_understander · 2025-02-01T06:27:39.082Z · comments (1)
Can 7B-8B LLMs judge their own homework?
dereshev · 2025-02-01T08:29:32.639Z · comments (0)
One-dimensional vs multi-dimensional features in interpretability
charlieoneill (kingchucky211) · 2025-02-01T09:10:01.112Z · comments (0)
Blackpool Applied Rationality Unconference 2025
Henry Prowbell · 2025-02-01T13:04:12.774Z · comments (2)
[question] How likely is an attempted coup in the United States in the next four years?
Alexander de Vries (alexander-de-vries) · 2025-02-01T13:12:04.053Z · answers+comments (2)
[link] Poetic Methods I: Meter as Communication Protocol
adamShimi · 2025-02-01T18:22:39.676Z · comments (0)
[link] Unlocking Ethical AI and Improving Jailbreak Defenses: Reinforcement Learning with Layered Morphology (RLLM)
MiguelDev (whitehatStoic) · 2025-02-01T19:17:32.071Z · comments (2)
Post AGI effect prediction
Juliezhanggg · 2025-02-01T21:16:36.829Z · comments (0)
Towards a Science of Evals for Sycophancy
andrejfsantos · 2025-02-01T21:17:15.406Z · comments (0)
Machine Unlearning in Large Language Models: A Comprehensive Survey with Empirical Insights from the Qwen 1.5 1.8B Model
Saketh Baddam (saketh-baddam) · 2025-02-01T21:26:58.171Z · comments (2)
Exploring the coherence of features explanations in the GemmaScope
Mattia Proietti (mattia-proietti) · 2025-02-01T21:28:33.690Z · comments (0)
Retroactive If-Then Commitments
MichaelDickens · 2025-02-01T22:22:43.031Z · comments (0)
[link] Rationalist Movie Reviews
Nicholas / Heather Kross (NicholasKross) · 2025-02-01T23:10:53.184Z · comments (2)
Interpreting autonomous driving agents with attention based architecture
Manav Dahra (manav-dahra) · 2025-02-01T23:20:27.162Z · comments (0)
Falsehoods you might believe about people who are at a rationalist meetup
Screwtape · 2025-02-01T23:32:50.398Z · comments (12)
AI acceleration, DeepSeek, moral philosophy
Josh H (joshua-haas) · 2025-02-02T00:08:11.593Z · comments (0)
Seasonal Patterns in BIDA's Attendance
jefftk (jkaufman) · 2025-02-02T02:40:03.768Z · comments (0)
Chinese room AI to survive the inescapable end of compute governance
rotatingpaguro · 2025-02-02T02:42:03.627Z · comments (0)
[question] Would anyone be interested in pursuing the Virtue of Scholarship with me?
japancolorado (russell-white) · 2025-02-02T04:02:27.116Z · answers+comments (2)
ChatGPT: Exploring the Digital Wilderness, Findings and Prospects
Bill Benzon (bill-benzon) · 2025-02-02T09:54:26.008Z · comments (0)
Escape from Alderaan I
lsusr · 2025-02-02T10:48:06.533Z · comments (2)
Thoughts on Toy Models of Superposition
james__p · 2025-02-02T13:52:54.505Z · comments (2)
Gradual Disempowerment, Shell Games and Flinches
Jan_Kulveit · 2025-02-02T14:47:53.404Z · comments (36)
The Simplest Good
Jesse Hoogland (jhoogland) · 2025-02-02T19:51:14.155Z · comments (6)
Tracing Typos in LLMs: My Attempt at Understanding How Models Correct Misspellings
Ivan Dostal (#R@q0YSDZ3ov$f6J) · 2025-02-02T19:56:34.771Z · comments (1)
Conditional Importance in Toy Models of Superposition
james__p · 2025-02-02T20:35:38.655Z · comments (4)
"DL training == human learning" is a bad analogy
kman · 2025-02-02T20:59:21.259Z · comments (0)
An Introduction to Evidential Decision Theory
Babić · 2025-02-02T21:27:35.684Z · comments (2)
Exploring how OthelloGPT computes its world model
JMaar (jim-maar) · 2025-02-02T21:29:09.433Z · comments (0)
Humanity Has A Possible 99.98% Chance Of Extinction
st3rlxx · 2025-02-02T21:46:49.620Z · comments (1)
Some Theses on Motivational and Directional Feedback
abstractapplic · 2025-02-02T22:50:04.270Z · comments (3)
Use computers as powerful as in 1985 or AI controls humans or ?
jrincayc (nerd_gatherer) · 2025-02-03T00:51:05.706Z · comments (0)
[link] Keeping Capital is the Challenge
LTM · 2025-02-03T02:04:27.142Z · comments (2)
[link] Language Models and World Models, a Philosophy
kyjohnso · 2025-02-03T02:55:36.577Z · comments (0)
Pick two: concise, comprehensive, or clear rules
Screwtape · 2025-02-03T06:39:05.815Z · comments (27)
[question] Can we infer the search space of a local optimiser?
Lucius Bushnaq (Lblack) · 2025-02-03T10:17:01.661Z · answers+comments (5)
Neuron Activations to CLIP Embeddings: Geometry of Linear Combinations in Latent Space
Roman Malov · 2025-02-03T10:30:48.866Z · comments (0)
[link] OpenAI releases deep research agent
Seth Herd · 2025-02-03T12:48:44.925Z · comments (21)
o3-mini Early Days
Zvi · 2025-02-03T14:20:06.443Z · comments (0)
The Outer Levels
Jerdle (daniel-amdurer) · 2025-02-03T14:30:29.230Z · comments (3)
Stopping unaligned LLMs is easy!
Yair Halberstadt (yair-halberstadt) · 2025-02-03T15:38:27.083Z · comments (11)
The Self-Reference Trap in Mathematics
Alister Munday (alister-munday) · 2025-02-03T16:12:21.392Z · comments (23)
Gettier Cases [repost]
Antigone (luke-st-clair) · 2025-02-03T18:12:22.253Z · comments (4)
Superintelligence Alignment Proposal
Davey Morse (davey-morse) · 2025-02-03T18:47:22.287Z · comments (3)
Part 1: Enhancing Inner Alignment in CLIP Vision Transformers: Mitigating Reification Bias with SAEs and Grad ECLIP
Gilber A. Corrales (mysticdeepai) · 2025-02-03T19:30:52.505Z · comments (0)
Sleeper agents appear resilient to activation steering
Lucy Wingard (lucy-wingard) · 2025-02-03T19:31:30.702Z · comments (0)
The Overlap Paradigm: Rethinking Data's Role in Weak-to-Strong Generalization (W2SG)
Serhii Zamrii (aligning_bias) · 2025-02-03T19:31:55.282Z · comments (0)
next page (older posts) →