LessWrong 2.0 Reader

[link] Tinker
Richard_Ngo (ricraz) · 2024-04-16T18:26:38.679Z · comments (0)

International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)

Tort Law Can Play an Important Role in Mitigating AI Risk
Gabriel Weil (gabriel-weil) · 2024-02-12T17:17:59.135Z · comments (9)

[link] The consistent guessing problem is easier than the halting problem
jessicata (jessica.liu.taylor) · 2024-05-20T04:02:03.865Z · comments (5)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

[link] An AI Manhattan Project is Not Inevitable
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-06T16:42:35.920Z · comments (25)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

AI #70: A Beautiful Sonnet
Zvi · 2024-06-27T14:40:08.087Z · comments (0)

Making a Secular Solstice Songbook
jefftk (jkaufman) · 2024-01-23T19:40:05.055Z · comments (6)

[link] [Linkpost] George Mack's Razors
trevor (TrevorWiesinger) · 2023-11-27T17:53:45.065Z · comments (8)

AI #48: The Talk of Davos
Zvi · 2024-01-25T16:20:26.625Z · comments (9)

Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)

Requirements for a Basin of Attraction to Alignment
RogerDearnaley (roger-d-1) · 2024-02-14T07:10:20.389Z · comments (11)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (6)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

Stop talking about p(doom)
Isaac King (KingSupernova) · 2024-01-01T10:57:28.636Z · comments (22)

[link] Simple Kelly betting in prediction markets
jessicata (jessica.liu.taylor) · 2024-03-06T18:59:18.243Z · comments (3)

Monthly Roundup #14: January 2024
Zvi · 2024-01-24T12:50:09.231Z · comments (22)

The "context window" analogy for human minds
Ruby · 2024-02-13T19:29:10.387Z · comments (0)

Are we so good to simulate?
KatjaGrace · 2024-03-04T05:20:03.535Z · comments (24)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane (ckkissane) · 2024-10-27T18:46:21.316Z · comments (4)

[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)

AI Safety Camp 10
Robert Kralisch (nonmali-1) · 2024-10-26T11:08:09.887Z · comments (9)

Drug development costs can range over two orders of magnitude
rossry · 2024-11-03T23:13:17.685Z · comments (0)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

Glitch Token Catalog - (Almost) a Full Clear
Lao Mein (derpherpize) · 2024-09-21T12:22:16.403Z · comments (3)

I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?
spencerg · 2024-09-22T14:03:22.164Z · comments (2)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

[link] A High Decoupling Failure
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-14T19:46:09.552Z · comments (5)

Review Report of Davidson on Takeoff Speeds (2023)
Trent Kannegieter · 2023-12-22T18:48:55.983Z · comments (11)

[link] Dark Skies Book Review
PeterMcCluskey · 2023-12-29T18:28:59.352Z · comments (3)

Medical Roundup #2
Zvi · 2024-04-09T13:40:05.908Z · comments (18)

Principles For Product Liability (With Application To AI)
johnswentworth · 2023-12-10T21:27:41.403Z · comments (55)

Mental Masturbation and the Intellectual Comfort Zone
Declan Molony (declan-molony) · 2024-05-07T05:47:05.257Z · comments (2)

[link] The Hippie Rabbit Hole -Nuggets of Gold in Rivers of Bullshit
Jonathan Moregård (JonathanMoregard) · 2024-01-05T18:27:01.769Z · comments (20)

Otherness and control in the age of AGI
Joe Carlsmith (joekc) · 2024-01-02T18:15:54.168Z · comments (0)

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI
WillPetillo · 2023-12-04T22:58:40.005Z · comments (0)

My best guess at the important tricks for training 1L SAEs
Arthur Conmy (arthur-conmy) · 2023-12-21T01:59:06.208Z · comments (4)

[link] ∀: a story
Richard_Ngo (ricraz) · 2023-12-17T22:42:32.857Z · comments (1)

Deconfusing In-Context Learning
Arjun Panickssery (arjun-panickssery) · 2024-02-25T09:48:17.690Z · comments (1)

[question] Is a random box of gas predictable after 20 seconds?
Thomas Kwa (thomas-kwa) · 2024-01-24T23:00:53.184Z · answers+comments (35)

Your LLM Judge may be biased
Henry Papadatos (henry) · 2024-03-29T16:39:22.534Z · comments (9)


Recent comments

mr-hire on You are not too "irrational" to know your preferences.

Do you think wants that arise from conscious thought processes are equally valid to wants that arise from feelings? How do you think about that?

algon on You are not too "irrational" to know your preferences.

Reminds me of "Self-Integrity and the Drowning Child [LW · GW]" which talks about another kind of way that people in EA/rat communities are liable to hammer down parts of themselves.

viliam on Making a conservative case for alignment

Napoleon is merely an argument for "just because you strongly believe it, even if it is a statement about you, does not necessarily make it true".

We will probably disagree on this, but the only reason I care about trans issues is that some people report significant suffering (gender dysphoria) from their current situation, and I am in favor of people not suffering, so I generally try not to be an asshole.

Unfortunately, for every person who suffers from something, there are probably a dozen people out there who cosplay their condition... because it makes them popular on Twitter I guess, or just gives them another opportunity to annoy their neighbors. I have no empathy for those. Play your silly games, if you wish, but don't expect me to play along, and definitely don't threaten me to play along. Also, the cosplayers often make the situation more difficult for those who genuinely have the condition, by speaking in their name, and often saying things that the people who actually have the condition would disagree with... and in the most ironic cases, the cosplayers get them cancelled. So I don't mind being an asshole to the cosplayers, because from my perspective, they started it first.

The word "deadnaming" is itself hysterical. (Who died? No one.)

Gender essentialism? I don't make any metaphysical claim about essences. People simply are born with male or female bodies (yes, I know that some are intersex), and some people are strongly unhappy about their state. I find it plausible that there may be an underlying biological reason for that; and hormones seem like a likely candidate, because that's how the body communicates many things. I don't have a strong opinion on that, because I have never felt a desire to be one sex or the other, just like I have never felt a strong desire to have a certain color of eyes, or hair, or skin, whether it would be the one I have or some that I have not.

I expect that you will disagree with a lot of this, and that's okay; I am not trying to convince you, just explaining my position.

richard_kennaway on You are not too "irrational" to know your preferences.

The key is recognizing that the preference itself is completely independent from rationality or intelligence.

The orthogonality thesis is also for human beings.

daystareld on You are not too "irrational" to know your preferences.

I don't see how your question contradicts my statement, nor that link. People absolutely develop in their desires over time, and can change them, but that is not the same as being able to decide, in the moment, that you do not like the taste of pizza if your tongue is having the sensory experience of enjoying it.

ivan-vendrov on Passages I Highlighted in The Letters of J.R.R.Tolkien

Feels connected to his distrust of "quick, bright, standardized, mental processes", and the obsession with language. It's like his mind is relentlessly orienting to the territory, refusing to accept anyone else's map. Which makes it harder to be a student but easier to discover something new. Reminds me of Geoff Hinton's advice to not read the literature before engaging with the problem yourself.

mr-hire on Counting AGIs

while this paradigm of 'training a model that's an agi, and then running it at inference' is one way we get to transformative agi, i find myself thinking that probably WON'T be the first transformative AI, because my guess is that there are lots of tricks using lots of compute at inference to get not quite transformative ai to transformative ai.

my guess is that getting to that transformative level is gonna require ALL the tricks and compute, and will therefore eke out being transformative BY utilizing all those resources.

one of those tricks may be running millions of copies of the thing in an agentic swarm, but i would expect that to be merely a form of inference time scaling, and therefore wouldn't expect ONE of those things to be transformative AGI on its own.

and i doubt that these tricks can funge against train time compute, as you seem to be assuming in your analysis. my guess is that you hit diminishing returns for various types of train compute, then diminishing returns for various types of inference compute, and that we'll get to a point where we need to push both of them to that point to get transformative ai

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

testing markdown editor

Here's the content restructured as a numbered list with inline links:

1. Logical Induction and Embedded Agency - Explore Scott Garrabrant and Abram Demski's cornerstone work including the Embedded Agents sequence [LW · GW] and research on Logical Induction [LW · GW].
2. Learning-Theoretic Approach - Study Vanessa Kosoy's agenda for creating a foundational mathematical theory of agency, detailed in her interview on theoretical research [LW · GW].
3. Selection Theorems - Explore John Wentworth's research agenda on Natural Abstraction Hypothesis and system modularity through his research overview.
4. Decision Theory - Read about idealized decision theory and reflective oracles in agent foundations work [LW · GW].
5. Cartesian Frames - Study Scott Garrabrant's mathematical framework for agent-environment distinctions in the embedded agency guide.
6. Additional Research Areas - Discover various technical alignment approaches through the comprehensive overview [AF · GW] of the field.

rai on Thoughts on seed oil

It was gluten.

david-gross on You are not too "irrational" to know your preferences.

Wants are emergent, complex forms of pain and pleasure. They are either felt or they are not felt, and reason only comes in at the stage of deciding what to do about them.

Are you really certain that one's desires are just givens that one has no rational influence over? I'm skeptical.

https://www.lesswrong.com/posts/aQQ69PijQR2Z64m2z/notes-on-temperance#Can_we_shape_our_desires_