LessWrong 2.0 Reader

Testbed evals: evaluating AI safety even when it can’t be directly measured
joshc (joshua-clymer) · 2023-11-15T19:00:41.908Z · comments (2)
Do sparse autoencoders find "true features"?
Demian Till · 2024-02-22T18:06:59.630Z · comments (33)
Survey for alignment researchers!
Cameron Berg (cameron-berg) · 2024-02-02T20:41:44.323Z · comments (11)
AI for Bio: State Of The Field
sarahconstantin · 2024-08-30T18:00:02.187Z · comments (2)
The Mask Comes Off: At What Price?
Zvi · 2024-10-21T23:50:05.247Z · comments (16)
We need a Science of Evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-22T20:30:39.493Z · comments (13)
Some Rules for an Algebra of Bayes Nets
johnswentworth · 2023-11-16T23:53:11.650Z · comments (35)
LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!")
Ruby · 2024-04-23T03:58:43.443Z · comments (27)
Related Discussion from Thomas Kwa's MIRI Research Experience
Raemon · 2023-10-07T06:25:00.994Z · comments (140)
Dumbing down
Martin Sustrik (sustrik) · 2024-06-09T06:50:47.469Z · comments (0)
Linking Alt Accounts
jefftk (jkaufman) · 2023-10-06T17:00:09.802Z · comments (33)
[link] If far-UV is so great, why isn't it everywhere?
Austin Chen (austin-chen) · 2024-10-19T18:56:58.910Z · comments (23)
If we solve alignment, do we die anyway?
Seth Herd · 2024-08-23T13:13:10.933Z · comments (68)
Eliezer's example on Bayesian statistics is wr... oops!
Zane · 2023-10-17T18:38:18.327Z · comments (13)
[link] OpenAI: Preparedness framework
Zach Stein-Perlman · 2023-12-18T18:30:10.153Z · comments (23)
Instruction-following AGI is easier and more likely than value aligned AGI
Seth Herd · 2024-05-15T19:38:03.185Z · comments (25)
[link] Yoshua Bengio: Reasoning through arguments against taking AI safety seriously
Judd Rosenblatt (judd) · 2024-07-11T23:53:17.187Z · comments (3)
Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Diego Caples (diego-caples) · 2024-09-06T17:55:34.265Z · comments (7)
[link] Paper: Understanding and Controlling a Maze-Solving Policy Network
TurnTrout · 2023-10-13T01:38:09.147Z · comments (0)
Update on Chinese IQ-related gene panels
Lao Mein (derpherpize) · 2023-12-14T10:12:21.212Z · comments (7)
[link] The True Story of How GPT-2 Became Maximally Lewd
Writer · 2024-01-18T21:03:08.167Z · comments (7)
“Artificial General Intelligence”: an extremely brief FAQ
Steven Byrnes (steve2152) · 2024-03-11T17:49:02.496Z · comments (6)
[link] [Repost] The Copenhagen Interpretation of Ethics
mesaoptimizer · 2024-01-25T15:20:08.162Z · comments (4)
Epistemic Hell
rogersbacon · 2024-01-27T17:13:09.578Z · comments (20)
Game Theory without Argmax [Part 1]
Cleo Nardo (strawberry calm) · 2023-11-11T15:59:47.486Z · comments (18)
Flagging Potentially Unfair Parenting
jefftk (jkaufman) · 2023-12-26T12:40:05.099Z · comments (1)
[link] InterLab – a toolkit for experiments with multi-agent interactions
Tomáš Gavenčiak (tomas-gavenciak) · 2024-01-22T18:23:35.661Z · comments (0)
How We Picture Bayesian Agents
johnswentworth · 2024-04-08T18:12:48.595Z · comments (14)
Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark (hojmax) · 2024-07-22T16:17:07.665Z · comments (0)
[link] Video lectures on the learning-theoretic agenda
Vanessa Kosoy (vanessa-kosoy) · 2024-10-27T12:01:32.777Z · comments (0)
[link] Motivation gaps: Why so much EA criticism is hostile and lazy
titotal (lombertini) · 2024-04-22T11:49:59.389Z · comments (5)
[link] [Link Post] "Foundational Challenges in Assuring Alignment and Safety of Large Language Models"
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-06-06T18:55:09.151Z · comments (2)
[link] We're all in this together
Tamsin Leake (carado-1) · 2023-12-05T13:57:46.270Z · comments (65)
Text Posts from the Kids Group: 2020
jefftk (jkaufman) · 2024-04-13T22:30:05.326Z · comments (3)
Transcoders enable fine-grained interpretable circuit analysis for language models
Jacob Dunefsky (jacob-dunefsky) · 2024-04-30T17:58:09.982Z · comments (14)
[link] The Inner Ring by C. S. Lewis
Saul Munn (saul-munn) · 2024-04-24T22:48:09.228Z · comments (6)
[link] Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis
simeon_c (WayZ) · 2024-02-01T21:30:44.090Z · comments (17)
[link] Former OpenAI Superalignment Researcher: Superintelligence by 2030
Julian Bradshaw · 2024-06-05T03:35:19.251Z · comments (30)
High-level interpretability: detecting an AI's objectives
Paul Colognese (paul-colognese) · 2023-09-28T19:30:16.753Z · comments (4)
AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt
DanielFilan · 2024-04-11T21:30:04.244Z · comments (10)
Multiplex Gene Editing: Where Are We Now?
sarahconstantin · 2024-07-16T20:50:04.590Z · comments (6)
[link] [Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
chanind · 2024-09-25T09:31:03.296Z · comments (15)
Finding Sparse Linear Connections between Features in LLMs
Logan Riggs (elriggs) · 2023-12-09T02:27:42.456Z · comments (5)
Shard Theory - is it true for humans?
Rishika (rishika-bose) · 2024-06-14T19:21:47.997Z · comments (7)
AI #79: Ready for Some Football
Zvi · 2024-08-29T13:30:10.902Z · comments (16)
Different senses in which two AIs can be “the same”
Vivek Hebbar (Vivek) · 2024-06-24T03:16:43.400Z · comments (1)
How useful is "AI Control" as a framing on AI X-Risk?
habryka (habryka4) · 2024-03-14T18:06:30.459Z · comments (4)
Brief notes on the Wikipedia game
Olli Järviniemi (jarviniemi) · 2024-07-14T02:28:22.473Z · comments (9)
Duct Tape security
Isaac King (KingSupernova) · 2024-04-26T18:57:05.659Z · comments (11)
Meetup Tip: Heartbeat Messages
Screwtape · 2023-12-07T17:18:33.582Z · comments (4)