LessWrong 2.0 Reader

[link] Are SAE features from the Base Model still meaningful to LLaVA?
Shan23Chen (shan-chen) · 2025-02-18T22:16:14.449Z · comments (2)
If Neuroscientists Succeed
Mordechai Rorvig (mordechai-rorvig) · 2025-02-11T15:33:09.098Z · comments (6)
What makes a theory of intelligence useful?
Cole Wyeth (Amyr) · 2025-02-20T19:22:29.725Z · comments (0)
Talking to laymen about AI development
David Steel · 2025-02-17T18:42:23.289Z · comments (0)
[link] Progress links and short notes, 2025-02-17
jasoncrawford · 2025-02-17T19:18:29.422Z · comments (0)
[link] Cooperation for AI safety must transcend geopolitical interference
Matrice Jacobine · 2025-02-16T18:18:01.539Z · comments (6)
[link] The Dilemma’s Dilemma
James Stephen Brown (james-brown) · 2025-02-19T23:50:47.485Z · comments (8)
THE ARCHIVE
Jason Reid (jason-reid) · 2025-02-17T01:12:41.486Z · comments (0)
Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts
Ana Kapros (ana-kapros) · 2025-02-12T19:12:07.592Z · comments (0)
Sleeping Beauty: an Accuracy-based Approach
glauberdebona · 2025-02-10T15:40:29.619Z · comments (2)
What new x- or s-risk fieldbuilding organisations would you like to see? An EOI form. (FBB #3)
gergogaspar (gergo-gaspar) · 2025-02-17T12:39:09.196Z · comments (0)
Intelligence Is Jagged
Adam Train (aetrain) · 2025-02-19T07:08:46.444Z · comments (0)
There are a lot of upcoming retreats/conferences between March and July (2025)
gergogaspar (gergo-gaspar) · 2025-02-18T09:30:30.258Z · comments (0)
AIS Berlin, events, opportunities and the flipped gameboard - Fieldbuilders Newsletter, February 2025
gergogaspar (gergo-gaspar) · 2025-02-17T14:16:31.834Z · comments (0)
[link] AI Safety at the Frontier: Paper Highlights, January '25
gasteigerjo · 2025-02-11T16:14:16.972Z · comments (0)
Beyond ELO: Rethinking Chess Skill as a Multidimensional Random Variable
Oliver Oswald (oliver-oswald) · 2025-02-10T19:19:36.233Z · comments (7)
Bimodal AI Beliefs
Adam Train (aetrain) · 2025-02-14T06:45:53.933Z · comments (1)
[question] Should I Divest from AI?
OKlogic · 2025-02-10T03:29:33.582Z · answers+comments (4)
Make Superintelligence Loving
Davey Morse (davey-morse) · 2025-02-21T06:07:17.235Z · comments (0)
[link] Teaching AI to reason: this year's most important story
Benjamin_Todd · 2025-02-13T17:40:02.869Z · comments (0)
Are current LLMs safe for psychotherapy?
PaperBike · 2025-02-12T19:16:34.452Z · comments (4)
[link] Sparse Autoencoder Features for Classifications and Transferability
Shan23Chen (shan-chen) · 2025-02-18T22:14:12.994Z · comments (0)
Closed-ended questions aren't as hard as you think
electroswing · 2025-02-19T03:53:11.855Z · comments (0)
Safe Distillation With a Powerful Untrusted AI
Alek Westover (alek-westover) · 2025-02-20T03:14:04.893Z · comments (1)
[link] Neural Scaling Laws Rooted in the Data Distribution
aribrill (Particleman) · 2025-02-20T21:22:10.306Z · comments (0)
[link] Linguistic Imperialism in AI: Enforcing Human-Readable Chain-of-Thought
Lukas Petersson (lukas-petersson-1) · 2025-02-21T15:45:00.146Z · comments (0)
Build a Metaculus Forecasting Bot in 30 Minutes: A Practical Guide
ChristianWilliams · 2025-02-22T03:52:14.753Z · comments (0)
A fable on AI x-risk
bgaesop · 2025-02-18T20:15:24.933Z · comments (0)
OpenAI’s NSFW policy: user safety, harm reduction, and AI consent
8e9 · 2025-02-13T13:59:22.911Z · comments (3)
Permanent properties of things are a self-fulfilling prophecy
YanLyutnev (YanLutnev) · 2025-02-19T00:08:20.776Z · comments (0)
ML4Good Colombia - Applications Open to LatAm Participants
Alejandro Acelas (alejandro-acelas) · 2025-02-10T15:03:03.929Z · comments (0)
[link] Demonstrating specification gaming in reasoning models
Matrice Jacobine · 2025-02-20T19:26:20.563Z · comments (0)
[link] AISN #48: Utility Engineering and EnigmaEval
Corin Katzke (corin-katzke) · 2025-02-18T19:15:16.751Z · comments (0)
Claude 3.5 Sonnet (New)'s AGI scenario
Nathan Young · 2025-02-17T18:47:04.669Z · comments (2)
Response to the US Govt's Request for Information Concerning Its AI Action Plan
Davey Morse (davey-morse) · 2025-02-14T06:14:08.673Z · comments (0)
Inefficiencies in Pharmaceutical Research Practices
ErioirE (erioire) · 2025-02-22T04:43:09.147Z · comments (0)
Undesirable Conclusions and Origin Adjustment
Jerdle (daniel-amdurer) · 2025-02-19T18:35:23.732Z · comments (0)
Where Would Good Forecasts Most Help AI Governance Efforts?
Violet Hour · 2025-02-11T18:15:33.082Z · comments (0)
Rethinking AI Safety Approach in the Era of Open-Source AI
Weibing Wang (weibing-wang) · 2025-02-11T14:01:39.167Z · comments (0)
Sparse Autoencoder Feature Ablation for Unlearning
aludert · 2025-02-13T19:13:48.388Z · comments (0)
LW/ACX social meetup
Stefan (stefan-1) · 2025-02-10T21:12:39.092Z · comments (0)
Artificial Static Place Intelligence: Guaranteed Alignment
ank · 2025-02-15T11:08:50.226Z · comments (2)
Intrinsic Dimension of Prompts in LLMs
Karthik Viswanathan (vkarthik095) · 2025-02-14T19:02:49.464Z · comments (0)
Arguing for the Truth? An Inference-Only Study into AI Debate
denisemester · 2025-02-11T03:04:58.852Z · comments (0)
arch-anarchist reading list
Peter lawless · 2025-02-16T22:47:00.273Z · comments (1)
[link] Probability of AI-Caused Disaster
Alvin Ånestrand (alvin-anestrand) · 2025-02-12T19:40:11.121Z · comments (2)
Fun, endless art debates v. morally charged art debates that are intrinsically endless
danielechlin · 2025-02-21T04:44:22.712Z · comments (0)
[link] New LLM Scaling Law
wrmedford · 2025-02-19T20:21:17.475Z · comments (0)
[question] Why do we have the NATO logo?
KvmanThinking (avery-liu) · 2025-02-19T22:59:41.755Z · answers+comments (4)
Quantifying the Qualitative: Towards a Bayesian Approach to Personal Insight
Pruthvi Kumar (pruthvi-kumar) · 2025-02-15T19:50:42.550Z · comments (0)