LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)

D&D Sci Coliseum: Arena of Data
aphyer · 2024-10-18T22:02:54.305Z · comments (23)

instruction tuning and autoregressive distribution shift
nostalgebraist · 2024-09-05T16:53:41.497Z · comments (5)

Californians, tell your reps to vote yes on SB 1047!
Holly_Elmore · 2024-08-12T19:50:09.817Z · comments (24)

[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (2)

[question] Implications of China's recession on AGI development?
Eric Neyman (UnexpectedValues) · 2024-09-28T01:12:36.443Z · answers+comments (3)

2025 Color Trends
sarahconstantin · 2024-10-07T21:20:03.962Z · comments (7)

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs
Kola Ayonrinde (kola-ayonrinde) · 2024-08-23T18:52:31.019Z · comments (5)

Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (7)

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (4)

Are we dropping the ball on Recommendation AIs?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-10-23T17:48:00.000Z · comments (14)

You're a Space Wizard, Luke
lsusr · 2024-08-18T05:35:39.238Z · comments (6)

Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (19)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

[link] An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
hugofry · 2024-10-07T08:53:14.658Z · comments (0)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

Winners of the Essay competition on the Automation of Wisdom and Philosophy
AI Impacts (AI Imacts) · 2024-10-28T17:10:04.272Z · comments (3)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

[link] Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-16T16:31:34.801Z · comments (2)

Current safety training techniques do not fully transfer to the agent setting
Simon Lermen (dalasnoin) · 2024-11-03T19:24:51.537Z · comments (4)

AI Safety Camp 10
Robert Kralisch (nonmali-1) · 2024-10-26T11:08:09.887Z · comments (7)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

[link] Turning 22 in the Pre-Apocalypse
testingthewaters · 2024-08-22T20:28:25.794Z · comments (14)

Glitch Token Catalog - (Almost) a Full Clear
Lao Mein (derpherpize) · 2024-09-21T12:22:16.403Z · comments (3)

I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?
spencerg · 2024-09-22T14:03:22.164Z · comments (2)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane (ckkissane) · 2024-10-27T18:46:21.316Z · comments (1)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

Free Will and Dodging Anvils: AIXI Off-Policy
Cole Wyeth (Amyr) · 2024-08-29T22:42:24.485Z · comments (12)

COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)

[link] I didn't have to avoid you; I was just insecure
Chipmonk · 2024-08-17T16:41:50.237Z · comments (7)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

But Where do the Variables of my Causal Model come from?
Dalcy (Darcy) · 2024-08-09T22:07:57.395Z · comments (1)

An anti-inductive sequence
Viliam · 2024-08-14T12:28:54.226Z · comments (10)

Debate: Is it ethical to work at AI capabilities companies?
Ben Pace (Benito) · 2024-08-14T00:18:38.846Z · comments (21)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)

[link] Shifting Headspaces - Transitional Beast-Mode
Jonathan Moregård (JonathanMoregard) · 2024-08-12T13:02:06.120Z · comments (9)

[link] Big tech transitions are slow (with implications for AI)
jasoncrawford · 2024-10-24T14:25:06.873Z · comments (16)

Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (0)

[link] My article in The Nation — California’s AI Safety Bill Is a Mask-Off Moment for the Industry
garrison · 2024-08-15T19:25:59.592Z · comments (0)

Monthly Roundup #22: September 2024
Zvi · 2024-09-17T12:20:08.297Z · comments (10)

Video and transcript of presentation on Otherness and control in the age of AGI
Joe Carlsmith (joekc) · 2024-10-08T22:30:38.054Z · comments (1)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

steve2152 on [Intuitive self-models] 8. Rooting Out Free Will Intuitions

Thanks!

But I've heard that many people do a lot of thinking about negative outcomes, too.…

FWIW my answer is “involuntary attention” as discussed in Valence §3.3.5 [LW · GW] (it also came up in §6.5.2.1 of this series [LW · GW]).

If I look at my shoe and (voluntarily) pay attention to it, my subsequent thoughts are constrained to be somehow “about” my shoe. This constraint isn’t fully constraining—I might be putting my shoe into different contexts, or thinking about my shoe while humming a song to myself, etc.

By analogy, if I’m anxious, then my subsequent thoughts are (involuntarily) constrained to be somehow “about” the interoceptive feeling of anxiety. Again, this constraint isn’t fully constraining—I might be putting the feeling of anxiety into the context of how everyone hates me, or into the context of how my health is going downhill, or whatever else, and I could be doing both those things while simultaneously zipping up my coat and humming a song, etc.

Anxiety is just one example; I think there’s likewise involuntary attention associated with feeling itchy, feeling in pain, angry, etc.

l-rudolf-l on Survival without dignity

Good list!

I personally really like Scott Alexander's Presidential Platform, it hits the hilarious-but-also-almost-works spot so perfectly. He also has many Bay Area house party stories in addition to the one you link (you can find a bunch (all?) linked at the top of this post). He also has this one from a long time ago, which has one of the best punchlines I've read.

l-rudolf-l on Survival without dignity

Thanks for advertising my work, but alas, I think that's much more depressing than this one.

Could make for a good Barbie <> Oppenheimer combo though?

l-rudolf-l on Survival without dignity

Agreed! Transformative AI is hard to visualise, and concrete stories / scenarios feel very lacking (in both disasters and positive visions, but especially in positive visions [LW · GW]).

I like when people try to do this - for example, Richard Ngo has a bunch here [? · GW], and Daniel Kokotajlo has his near-prophetic scenario here [LW · GW]. I've previously tried to do it here [LW · GW] (going out with a whimper leading to Bostrom's "disneyland without children" is one of the most poetic disasters imaginable - great setting for a story), and have a bunch more ideas I hope to get to.

But overall: the LessWrong bubble has a high emphasis on radical AI futures, and an enormous amount of fiction in its canon (HPMOR, Unsong, Planecrash). I keep being surprised that so few people combine those things.

markusrobam on Explore More: A Bag of Tricks to Keep Your Life on the Rails

I'm curious about the part where You wrote: "You could raise awareness for Leukemia, Dyslexia, or Estonia."
Estonia is a country. Leukemia and Dyslexia are not countries. Was it a typo? Or did you actually want to raise awareness about Estonia?

(I'm from Estonia myself)

Nice article though, thanks!

l-rudolf-l on Survival without dignity

I did not actually consider this, but that is a very reasonable interpretation!

(I vaguely remember reading some description of explicitly flat-out anthropic immortality saving the day, but I can't seem to find it again now)

steve2152 on Inner Alignment in Salt-Starved Rats

No I don’t recommend reading this post anymore, it has some ideas with little kernels of truth but also lots of errors and confusions. ¯\_(ツ)_/¯

alfred-harwood on Abstractions are not Natural

Thanks for taking the time to explain this. This is a clears a lot of things up.

Let me see if I understand. So one reason that an agent might develop an abstraction is that it has a utility function that deals with that abstraction (if my utility function is ‘maximize the number of trees’, its helpful to have an abstraction for ‘trees’). But the NAH goes further than this and says that, even if an agent had a very ‘unnatural’ utility function which didn’t deal with abstractions (eg. it was something very fine-grained like ‘I value this atom being in this exact position and this atom being in a different position etc…’) it would still, for instrumental reasons, end up using the ‘natural’ set of abstractions because the natural abstractions are in some sense the only ‘proper’ set of abstractions for interacting with the world. Similarly, while there might be perceptual systems/brains/etc which favour using certain unnatural abstractions, once agents become capable enough to start pursuing complex goals (or rather goals requiring a high level of generality), the universe will force them to use the natural abstractions (or else fail to achieve their goals). Does this sound right?

Presumably its possible to define some ‘unnatural’ abstractions. Would the argument be that unnatural abstractions are just in practice not useful, or is it that the universe is such that its ~impossible to model the world using unnatural abstractions?

jim-buhler on Winning isn't enough

Without an objective standard of “winning” to turn to, this leaves us searching for new principles that could guide us in the face of indeterminacy. But that’s all for another post.

First time ever I am left hanging by a LW post. Genuinely.

towards_keeperhood on Inner Alignment in Salt-Starved Rats

Hi Steve, I didn't read this post yet and just wanted to ask whether it's still worth reading or whether everything relevant is now better in "incentive learning and dead sea salt experiment"?