LessWrong 2.0 Reader

Towards Understanding the Representation of Belief State Geometry in Transformers
Karthik Viswanathan (vkarthik095) · 2025-04-18T12:39:01.251Z · comments (0)
[link] The road from human-level to superintelligent AI may be short
Vishakha (vishakha-agrawal) · 2025-04-16T08:35:54.376Z · comments (0)
Self propagating story.
Canaletto (weightt-an) · 2025-04-12T12:32:21.312Z · comments (0)
Luna Lovegood and the Chamber of Secrets, Part 4
Kongo Landwalker (kongo-landwalker) · 2025-04-13T20:55:03.281Z · comments (0)
Intro to Multi-Agent Safety
james__p · 2025-04-13T17:40:41.128Z · comments (0)
8 PRIME SKILLS – A construction from MaxEnt Informational Efficiency in 4 questions
P. João (gabriel-brito) · 2025-04-16T16:53:51.351Z · comments (0)
[link] AI may attain human level soon
Vishakha (vishakha-agrawal) · 2025-04-16T08:28:55.592Z · comments (0)
[link] How worker co-ops can help restore social trust
B Jacobs (Bob Jacobs) · 2025-04-17T14:14:47.165Z · comments (5)
[link] Doing Prioritization Better
arvomm (arvo-munoz) · 2025-04-16T18:46:41.797Z · comments (1)
The Era of the Dividual—are we falling apart?
James Stephen Brown (james-brown) · 2025-04-12T22:35:56.593Z · comments (2)
ACX Spring Meetup 2025 @ Klang Valley, Malaysia
Yi-Yang (yiyang) · 2025-04-12T07:31:16.434Z · comments (0)
Sam Altman's sister claims Sam sexually abused her -- Part 7: Timeline, continued
pythagoras5015 (pl5015) · 2025-04-14T17:43:28.897Z · comments (0)
Opportunity to learn more about AI Innovation & Security Policy
PolicyTakes · 2025-04-16T01:35:27.203Z · comments (0)
[question] Is Local Order a Clue to Universal Entropy? How a Failed Professor Searches for a 'Sacred Motivational Order'
P. João (gabriel-brito) · 2025-04-12T13:39:55.857Z · answers+comments (2)
Evaluating Collaborative AI Performance Subject to Sabotage
Matthew Khoriaty (matthew-khoriaty) · 2025-04-18T19:33:41.547Z · comments (0)
AI Control Methods Literature Review
Ram Potham (ram-potham) · 2025-04-18T21:15:34.682Z · comments (0)
Religious Persistence: A Missing Primitive for Robust Alignment
lauriewired · 2025-04-14T22:03:45.868Z · comments (3)
Could LLMs Learn to Detect Bias Autonomously, Like Tesla’s Self-Driving Cars?
Omnipheasant · 2025-04-18T18:45:36.242Z · comments (0)
Lightning Talks!
nathandunkerley · 2025-04-14T20:39:17.593Z · comments (0)
Hierarchical Cognitive Anchoring: A Sketch Toward Scalable Structural Alignment
sparckix · 2025-04-18T19:03:51.115Z · comments (0)
Alignment Does Not Need to Be Opaque! An Introduction to Feature Steering with Reinforcement Learning
Jeremias Ferrao (jeremias-ferrao) · 2025-04-18T19:34:49.357Z · comments (0)
Consequentialists should have a comprehensive set of deontological beliefs they adhere to
Jay95 · 2025-04-18T20:50:27.064Z · comments (2)
Correcting Deceptive Alignment using a Deontological Approach
JeaniceK · 2025-04-14T22:07:57.860Z · comments (0)
Memory Decoding Journal Club
Devin Ward (Carboncopies Foundation) · 2025-04-17T16:19:25.992Z · comments (0)
Automating Mechanistic Interpretability via Program Synthesis
Edy Nastase (edy-nastase) · 2025-04-17T10:58:46.748Z · comments (1)
The King's Gift: How Institutions Rebrand Responsibility into Illusion
Hu Yichao (hu-yichao) · 2025-04-12T19:38:37.979Z · comments (0)
The Practical Imperative for AI Control Research
Archana Vaidheeswaran (archana-vaidheeswaran-1) · 2025-04-16T20:27:32.319Z · comments (0)
[link] Bias Mitigation in Language Models by Steering Features
akankshanc · 2025-04-12T00:10:16.878Z · comments (0)
An artistic illustration of Scalable Oversight - "A world apart, neither gods nor mortals"
Marius Adrian Nicoară · 2025-04-16T12:41:44.874Z · comments (0)
Sam Altman's sister claims Sam sexually abused her -- Part 4: Timeline, continued
pythagoras5015 (pl5015) · 2025-04-13T23:41:55.411Z · comments (0)
Sam Altman's sister claims Sam sexually abused her -- Part 5: Timeline, continued
pythagoras5015 (pl5015) · 2025-04-14T01:00:07.084Z · comments (0)
Applications Open for Impact Accelerator Program for Experienced Professionals
Clark Wisenbaker (accounts-hip) · 2025-04-14T16:27:32.340Z · comments (0)
Measuring Beliefs of Language Models During Chain-of-Thought Reasoning
Baram Sosis (baram-sosis) · 2025-04-18T22:56:28.727Z · comments (0)
LLM-based Fact Checking for Popular Posts?
azergante · 2025-04-18T21:26:25.230Z · comments (0)
[link] find_purpose.exe
heatdeathandtaxes · 2025-04-12T19:31:38.951Z · comments (0)
[link] The Cynic Wasps in the Beehive
mempko · 2025-04-12T19:30:44.227Z · comments (0)
What If Galaxies Are Alive and Atoms Have Minds? A Thought Experiment on Life Across Scales
Saif Khan (saif-khan) · 2025-04-18T10:01:18.783Z · comments (4)
A Solution to Sandbagging and other Self-Provable Misalignment: Constitutional AI Detectives
Knight Lee (Max Lee) · 2025-04-14T10:27:24.903Z · comments (2)
Why Does It Feel Like Something? An Evolutionary Path to Subjectivity
gmax (maxim-gurevich) · 2025-04-15T08:38:50.637Z · comments (10)
8 PRIME SKILLS An analysis
P. João (gabriel-brito) · 2025-04-17T11:36:54.678Z · comments (0)
[question] How far are Western welfare states from coddling the population into becoming useless?
StanislavKrym · 2025-04-13T17:08:01.834Z · answers+comments (5)