LessWrong 2.0 Reader

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (16)

On the Debate Between Jezos and Leahy
Zvi · 2024-02-06T14:40:05.487Z · comments (6)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-12-03T21:57:23.597Z · comments (2)

[link] DeepMind: Frontier Safety Framework
Zach Stein-Perlman · 2024-05-17T17:30:02.504Z · comments (0)

Retrospective: PIBBSS Fellowship 2024
DusanDNesic · 2024-12-20T15:55:24.194Z · comments (1)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

AiPhone
Zvi · 2024-06-12T22:20:02.141Z · comments (4)

[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)

Self-Awareness: Taxonomy and eval suite proposal
Daniel Kokotajlo (daniel-kokotajlo) · 2024-02-17T01:47:01.802Z · comments (2)

[link] Moving on from community living
Vika · 2024-04-17T17:02:11.357Z · comments (7)

[link] Superforecasting the Origins of the Covid-19 Pandemic
DanielFilan · 2024-03-12T19:01:15.914Z · comments (0)

What mistakes has the AI safety movement made?
EuanMcLean (euanmclean) · 2024-05-23T11:19:02.717Z · comments (29)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (8)

[link] RL, but don't do anything I wouldn't do
Gunnar_Zarncke · 2024-12-07T22:54:50.714Z · comments (5)

Perils of Generalizing from One's Social Group
localdeity · 2024-11-24T15:31:18.332Z · comments (1)

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

All About Concave and Convex Agents
mako yass (MakoYass) · 2024-03-24T21:37:17.922Z · comments (23)

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)

Neuroscience of human social instincts: a sketch
Steven Byrnes (steve2152) · 2024-11-22T16:16:52.552Z · comments (0)

Against most, but not all, AI risk analogies
Matthew Barnett (matthew-barnett) · 2024-01-14T03:36:16.267Z · comments (41)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (8)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

[link] Zen and The Art of Semiconductor Manufacturing
Recurrented (rachel-farley) · 2024-12-09T17:19:35.236Z · comments (2)

What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19
viking_math · 2024-02-19T01:14:06.772Z · comments (28)

Don't sleep on Coordination Takeoffs
trevor (TrevorWiesinger) · 2024-01-27T19:55:26.831Z · comments (24)

AI #55: Keep Clauding Along
Zvi · 2024-03-14T15:40:09.335Z · comments (16)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (10)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk (Technoguyrob) · 2024-03-06T05:03:09.639Z · comments (0)

[link] "We know how to build AGI" - Sam Altman
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-06T02:05:05.134Z · comments (5)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

Do not delete your misaligned AGI.
mako yass (MakoYass) · 2024-03-24T21:37:07.724Z · comments (13)

Managing catastrophic misuse without robust AIs
ryan_greenblatt · 2024-01-16T17:27:31.112Z · comments (17)

Managing risks while trying to do good
Wei Dai (Wei_Dai) · 2024-02-01T18:08:46.506Z · comments (26)

[link] DeepMind: Evaluating Frontier Models for Dangerous Capabilities
Zach Stein-Perlman · 2024-03-21T03:00:31.599Z · comments (8)

Offering AI safety support calls for ML professionals
Vael Gates · 2024-02-15T23:48:12.797Z · comments (1)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

Cognitive Work and AI Safety: A Thermodynamic Perspective
Daniel Murfet (dmurfet) · 2024-12-08T21:42:17.023Z · comments (9)

Showing SAE Latents Are Not Atomic Using Meta-SAEs
Bart Bussmann (Stuckwork) · 2024-08-24T00:56:46.048Z · comments (9)

Training AI agents to solve hard problems could lead to Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-11-19T00:10:55.522Z · comments (12)

[question] We might be dropping the ball on Autonomous Replication and Adaptation.
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-31T13:49:11.327Z · answers+comments (30)

Recent comments

sharmake-farah on quila's Shortform

I was assuming very strongly superhumanly intelligent AI, but yeah no promises of optimality were made here.

benquo on Parkinson's Law and the Ideology of Statistics

Wow, thanks for doing the legwork on this - seems like quite possibly I'm analyzing fiction? Annoying if true.

nathan-helm-burger on Rolling Thresholds for AGI Scaling Regulation

Sigh. Ok. I'm giving an upvote for good-faith effort to think this through and come up with a plan, but I just disagree with your world-model and its projections about training costs and associated danger levels so strongly that it seems hard to figure out how to even begin a discussion.

I'll just leave a link here to a different comment talking about the same problem.

quila on quila's Shortform

but it still says "it's easy for others to get their own superintelligences with different values", with 'superintelligence' referring to the 'superhuman' AI of 2035?

still confused about this btw. in my second reply to you i wrote:

(i wonder if you're using the term 'superintelligence' in a different way though, e.g. to mean "merely super-human"?)

and you did not say you were, but it looks like you are here?

quila on quila's Shortform

far too many people tend to deny that you do in fact have to make other values lose out

i don't know where that might be true, but at least on lesswrong i imagine it's an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe). most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people (and maybe especially most alignment researchers?) think hells are bad.

also on the "lose out" phrasing: even if someone "wants at least some people to have tormentful lives", they don't "lose out" overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.

raemon on Raemon's Shortform

My Current Metacognitive Engine

Someday I might work this into a nicer top-level post, but for now, here's the summary of the cognitive habits I try to maintain (and mostly succeed at maintaining). Some of these are simple TAPs, some of them are more like mindsets.

Twice a day, asking “what is the most important thing I could be working on and why aren’t I on track to deal with it?”
- you probably want a more specific question (“important thing” is too vague). Three example specific questions (but don’t be a slave to any specific operationalization):
  - what is the most important uncertainty I could be reducing, and how can I reduce it fastest?
  - what’s the most important resource bottleneck I can gain, or contribute to the ecosystem, and what would gain me that resource the fastest?
  - what’s the most important goal I’m backchaining from?
Have a mechanism to iterate on your habits that you use every day, and frequently update in response to new information
- for me, this is daily prompts and weekly prompts, which are:
  - optimized for being the efficient metacognition I obviously want to do each day
  - include one skill that I want to level up in, that I can do in the morning as part of the meta-orienting (such as operationalizing predictions, or “think it faster”, or whatever specific thing I want to learn to attend to or execute better right now)
The five requirements each fortnight:
- be backchaining
  - from the most important goals
- be forward chaining
  - through tractable things that compound
- ship something
  - to users every fortnight
- be wholesome
  - (that is, do not minmax in a way that will predictably fail later)
- spend 10% on meta (more if you’re Ray in particular but not during working hours. During working hours on workdays, meta should pay for itself within a week)
Correlates:
- have a clear, written model of what you’re backchaining from
- have a clear, written model of how you’re compounding
The general problem solving approach:
- breadth first
- identify cruxes
- connect inner-sim to cruxes / predictions
- follow your heart
- see how your predictions went
Random ass skills
- napping
- managing working memory, innovating on and applying working memory tools
- grieving
- Generalizing

Skill I’m working on that hasn’t paid off yet but I think you should try anyway:

At least once a day or so, when you notice a mistake or surprise, spend a couple minutes asking “how could I have thought that faster” (and periodically do deeper dives)
Each day/week, figure out what you’re confused about or predictably going to tackle in a dumb way, and think in advance about how to be smart about it the first time

benquo on Preference Inversion

I want to note something about how your position seems to have evolved through this discussion. Initially, you argued that societal pressure often reflects genuine wisdom, using examples where a 'society who aggressively shames overconsumption of sweets' might be wiser than a child's raw preferences. You suggested that what I was calling 'intrinsic preferences' might just be 'shallow preferences' that hadn't yet been trained to reflect reality.

Now you're making a different and more sophisticated argument - that the whole framework of 'intrinsic' versus 'external' preferences is problematic because preferences necessarily develop within and respond to reality, including social reality. While this is an interesting perspective that deserves consideration, it seems to contradict rather than support your initial defense of social restrictions as transmitting wisdom.

There's also an important point about my own position that I should clarify. When I said 'generally, upon reflection, people would prefer to satisfy their and others' preferences as calculated prior to such influences,' I wasn't making a claim about how often admonitions reflect preference inversions. Rather, I was suggesting that if people were to reflect explicitly on cases of preference inversion, they typically wouldn't want those inverted preferences to count; they would recognize these as preferences shaped by forces systematically opposed to their interests.

This connects to what I see as the core distinction: I'm not just talking about external influences or errors in the transmission of wisdom. I'm specifically pointing to cases where restrictions are moralized for the purpose of restriction itself - where the system is systematically deprecating the evolutionarily fit preferences of the person being restricted. This isn't just clumsy teaching or social pressure - it's adversarial. The system works by first making people feel guilty about their natural inclinations, then betting that they won't fully succeed at suppressing those inclinations despite earnestly trying to adopt the system's restrictions.

Consider the survival of variants of Christianity that 'do poorly' at helping people develop healthy attitudes toward sexuality. Their persistence suggests this poor performance is actually functional - they are able to exploit their members precisely because they create a system where most people must be 'bad' by design, where hypocrisy isn't a bug but a feature. When dessert companies can successfully market their products as 'sinfully delicious,' they're exploiting a system of moral restrictions that creates the very compulsive relationship to sweets it claims to prevent.

ektimo on ektimo's Shortform

Prompt: write a micro play that is both disturbing and comforting
--

Title: "The Silly Child"

Scene: A mother is putting to bed her six-year-old child

CHILD: Mommy, how many universes are there?

MOTHER: As many as are possible.

CHILD (smiling): Can we make another one?

MOTHER (smiling): Sure. And while we're at it, let's delete the number 374? I've never liked that one.

CHILD (excited): Oh! And let's make a new Fischer-Griess group element too! Can we do that Mommy?

MOTHER (bops nose): That's enough stalling. You need to get your sleep. Sweet dreams, little one. (kisses forehead)

End

sharmake-farah on On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Einstein was not an experimentalist, yet was perfectly capable of physics; his successors have largely not touched his unfinished work, and not for lack of data.

While it is interesting at first glance, some caveats are called for here.

One, Einstein's achievements were sort of overrated; see these comments for details:

https://www.lesswrong.com/posts/GSBCw94DsxLgDat6r/interpreting-yudkowsky-on-deep-vs-shallow-knowledge#6HPjxMvTnP9JeibXZ

https://www.lesswrong.com/posts/GSBCw94DsxLgDat6r/interpreting-yudkowsky-on-deep-vs-shallow-knowledge#icmCewLmXnxgtmANP

Two, the EPR paradox is resolvable in modern physics by allowing non-locality in entanglement, while the no-communication theorem prevents exploiting that non-locality to break special relativity.

jbash on In Defense of a Butlerian Jihad

Societies aren't the issue; they're mindless aggregates that don't experience anything and don't actually even have desires in anything like the way a human, or even an animal or an AI, has desires. Individuals are the issue. Do individuals get to choose which of these societies they live in?