LessWrong 2.0 Reader

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

SRE's review of Democracy
Martin Sustrik (sustrik) · 2024-08-03T07:20:01.483Z · comments (2)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures
Sahil · 2024-11-01T17:24:09.957Z · comments (3)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

Greedy-Advantage-Aware RLHF
sej2020 · 2024-12-27T19:47:25.562Z · comments (15)

Dave Kasten's AGI-by-2027 vignette
davekasten · 2024-11-26T23:20:47.212Z · comments (8)

Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft
Andrew_Critch · 2024-12-03T09:29:49.745Z · comments (2)

[link] Discursive Warfare and Faction Formation
Benquo · 2025-01-09T16:47:31.824Z · comments (2)

Role embeddings: making authorship more salient to LLMs
Nina Panickssery (NinaR) · 2025-01-07T20:13:16.677Z · comments (0)

How to do conceptual research: Case study interview with Caspar Oesterheld
Chi Nguyen · 2024-05-14T15:09:30.390Z · comments (5)

The Mom Test: Summary and Thoughts
Adam Zerner (adamzerner) · 2024-04-18T03:34:21.020Z · comments (3)

Run evals on base models too!
orthonormal · 2024-04-04T18:43:25.468Z · comments (6)

Highlights from Lex Fridman’s interview of Yann LeCun
[deleted] · 2024-03-13T20:58:13.052Z · comments (15)

D&D.Sci(-fi): Colonizing the SuperHyperSphere
abstractapplic · 2024-01-12T23:36:54.248Z · comments (23)

D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset]
abstractapplic · 2024-04-09T14:01:34.426Z · comments (6)

[link] Web-surfing tips for strange times
eukaryote · 2024-05-31T07:10:25.805Z · comments (19)

Philosophers wrestling with evil, as a social media feed
David Gross (David_Gross) · 2024-06-03T22:25:22.507Z · comments (2)

Mechanistic Interpretability Workshop Happening at ICML 2024!
Neel Nanda (neel-nanda-1) · 2024-05-03T01:18:26.936Z · comments (6)

Safety First: safety before full alignment. The deontic sufficiency hypothesis.
Chipmonk · 2024-01-03T17:55:19.825Z · comments (3)

Saving the world sucks
Defective Altruism (Elijah Bodden) · 2024-01-10T05:55:46.504Z · comments (29)

[link] Designing for a single purpose
Itay Dreyfus (itay-dreyfus) · 2024-05-07T14:11:22.242Z · comments (12)

Sora What
Zvi · 2024-02-22T18:10:05.397Z · comments (3)

2023 Prediction Evaluations
Zvi · 2024-01-08T14:40:07.377Z · comments (0)

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor
RogerDearnaley (roger-d-1) · 2024-01-09T20:42:28.349Z · comments (8)

[link] "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"
plex (ete) · 2024-05-18T14:09:53.014Z · comments (23)

[link] on neodymium magnets
bhauth · 2024-01-30T15:58:24.088Z · comments (6)

[link] Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities
porby · 2024-02-02T05:49:11.189Z · comments (1)

Value learning in the absence of ground truth
Joel_Saarinen (joel_saarinen) · 2024-02-05T18:56:02.260Z · comments (8)

Some Experiments I'd Like Someone To Try With An Amnestic
johnswentworth · 2024-05-04T22:04:19.692Z · comments (33)

Critiques of the AI control agenda
Jozdien · 2024-02-14T19:25:04.105Z · comments (14)

[link] Constructive Cauchy sequences vs. Dedekind cuts
jessicata (jessica.liu.taylor) · 2024-03-14T23:04:07.300Z · comments (23)

How to safely use an optimizer
Simon Fischer (SimonF) · 2024-03-28T16:11:01.277Z · comments (21)

Caring about excellence
owencb · 2024-07-22T14:24:37.892Z · comments (4)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (59)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (13)

What distinguishes "early", "mid" and "end" games?
Raemon · 2024-06-21T17:41:30.816Z · comments (22)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (10)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

[link] A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld (Caspar42) · 2024-12-16T22:42:03.763Z · comments (1)

Book a Time to Chat about Interp Research
Logan Riggs (elriggs) · 2024-12-03T17:27:46.808Z · comments (3)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

Considerations on orca intelligence
Towards_Keeperhood (Simon Skade) · 2024-12-29T14:35:16.445Z · comments (5)

Recent comments

quila on quila's Shortform

but it still says "it's easy for others to get their own superintelligences with different values", with 'superintelligence' referring to the 'superhuman' AI of 2035?

still confused about this btw. in my second reply to you i wrote:

(i wonder if you're using the term 'superintelligence' in a different way though, e.g. to mean "merely super-human"?)

and you did not say you were

quila on quila's Shortform

far too many people tend to deny that you do in fact have to make other values lose out

i don't know about the beliefs of people in general (probably most don't think about this), but at least on lesswrong i imagine it's an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe). most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people think hells are bad.

also note that even if someone "wants at least some people to have tormentful lives", they don't "lose out" overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.

raemon on Raemon's Shortform

My Current Metacognitive Engine

Someday I might work this into a nicer top-level post, but for now, here's the summary of the cognitive habits I try to maintain (and mostly succeed at maintaining). Some of these are simple TAPs, some of them are more like mindsets.

Twice a day, asking “what is the most important thing I could be working on and why aren’t I on track to deal with it?”
- you probably want a more specific question (“important thing” is too vague). Three example questions (but don’t be a slave to any specific operationalization):
  - what is the most important uncertainty I could be reducing, and how can I reduce it fastest?
  - what’s the most important resource bottleneck I can gain, or contribute to the ecosystem, and what would gain me that resource the fastest?
  - what’s the most important goal I’m backchaining from?
Have a mechanism to iterate on your habits that you use every day, and frequently update in response to new information
- for me, this is daily prompts and weekly prompts, which are:
  - optimized for being the efficient metacognition I obviously want to do each day
  - include one skill that I want to level up in, that I can do in the morning as part of the meta-orienting (such as operationalizing predictions, or “think it faster”, or whatever specific thing I want to learn to attend to or execute better right now)
The five requirements each fortnight:
- be backchaining
  - from the most important goals
- be forward chaining
  - through tractable things that compound
- ship something
  - to users every fortnight
- be wholesome
  - (that is, do not minmax in a way that will predictably fail later)
- spend 10% on meta (more if you’re Ray in particular but not during working hours. During working hours on workdays, meta should pay for itself within a week)
Correlates:
- have a clear, written model of what you’re backchaining from
- have a clear, written model of how you’re compounding
The general problem solving approach:
- breadth first
- identify cruxes
- connect inner-sim to cruxes / predictions
- follow your heart
- see how your predictions went
Random ass skills
- napping
- managing working memory, innovating on and applying working memory tools
- grieving
- Generalizing

Skill I’m working on that hasn’t paid off yet but I think you should try anyway:

At least once a day or so, when you notice a mistake or surprise, spend a couple of minutes asking “how could I have thought that faster” (and periodically do deeper dives)
each day/week, figure out what you’re confused about or predictably going to tackle in a dumb way, and think in advance about how to be smart about it the first time

benquo on Preference Inversion

I want to note something about how your position seems to have evolved through this discussion. Initially, you argued that societal pressure often reflects genuine wisdom, using examples where a 'society who aggressively shames overconsumption of sweets' might be wiser than a child's raw preferences. You suggested that what I was calling 'intrinsic preferences' might just be 'shallow preferences' that hadn't yet been trained to reflect reality.

Now you're making a different and more sophisticated argument - that the whole framework of 'intrinsic' versus 'external' preferences is problematic because preferences necessarily develop within and respond to reality, including social reality. While this is an interesting perspective that deserves consideration, it seems to contradict rather than support your initial defense of social restrictions as transmitting wisdom.

There's also an important point about my own position that I should clarify. When I said 'generally, upon reflection, people would prefer to satisfy their and others' preferences as calculated prior to such influences,' I wasn't making a claim about how often admonitions reflect preference inversions. Rather, I was suggesting that if people were to reflect explicitly on cases of preference inversion, they typically wouldn't want those inverted preferences to count; they would recognize these as preferences shaped by forces systematically opposed to their interests.

This connects to what I see as the core distinction: I'm not just talking about external influences or errors in the transmission of wisdom. I'm specifically pointing to cases where restrictions are moralized for the purpose of restriction itself - where the system is systematically deprecating the evolutionarily fit preferences of the person being restricted. This isn't just clumsy teaching or social pressure - it's adversarial. The system works by first making people feel guilty about their natural inclinations, then betting that they won't fully succeed at suppressing those inclinations despite earnestly trying to adopt the system's restrictions.

Consider the survival of variants of Christianity that 'do poorly' at helping people develop healthy attitudes toward sexuality. Their persistence suggests this poor performance is actually functional - they are able to exploit their members precisely because they create a system where most people must be 'bad' by design, where hypocrisy isn't a bug but a feature. When dessert companies can successfully market their products as 'sinfully delicious,' they're exploiting a system of moral restrictions that creates the very compulsive relationship to sweets it claims to prevent.

ektimo on ektimo's Shortform

Prompt: write a micro play that is both disturbing and comforting
--

Title: "The Silly Child"

Scene: A mother is putting to bed her six-year-old child

CHILD: Mommy, how many universes are there?

MOTHER: As many as are possible.

CHILD (smiling): Can we make another one?

MOTHER (smiling): Sure. And while we're at it, let's delete the number 374? I've never liked that one.

CHILD (excited): Oh! And let's make a new Fischer-Griess group element too! Can we do that Mommy?

MOTHER (bops nose): That's enough stalling. You need to get your sleep. Sweet dreams, little one. (kisses forehead)

End

sharmake-farah on On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Einstein was not an experimentalist, yet was perfectly capable of physics; his successors have largely not touched his unfinished work, and not for lack of data.

While it is interesting at first glance, some caveats are called for here.

One, Einstein's achievements were sort of overrated; see these comments for details:

https://www.lesswrong.com/posts/GSBCw94DsxLgDat6r/interpreting-yudkowsky-on-deep-vs-shallow-knowledge#6HPjxMvTnP9JeibXZ

https://www.lesswrong.com/posts/GSBCw94DsxLgDat6r/interpreting-yudkowsky-on-deep-vs-shallow-knowledge#icmCewLmXnxgtmANP

Two, the EPR paradox is resolvable in modern physics: entanglement is allowed to be non-local, but the no-communication theorem prevents anyone from exploiting that non-locality to break special relativity.

jbash on In Defense of a Butlerian Jihad

Societies aren't the issue; they're mindless aggregates that don't experience anything and don't actually even have desires in anything like the way a human, or even an animal or an AI, has desires. Individuals are the issue. Do individuals get to choose which of these societies they live in?

jbash on In Defense of a Butlerian Jihad

I’m pretty sure he doesn’t buy the Christian Paradise of "having no job, only leisure is good actually" either.

This (a) doesn't have anything in particular to do with Christianity, (b) has been the most widely held view among people in general since forever, and (c) seems obviously correct. If you want to rely on the contrary supposition, I'm afraid you're going to have to argue for it.

You can still have hobbies.

I also kinda notice that there are no meaningful place left for humans in that society.

There's that word "meaningful" that I keep hearing everywhere. I claim it's a meaningless word (or at least that it's being used here in a meaningless sense). Please define it in a succinct, relevant, and unambiguous way.

If you believe that the democratic consensus made mostly of normal people will allow you that [Glorious Transhumanist Future], I have a bridge to sell to you.

The democratic consensus also won't allow a Butlerian Jihad, and I don't think you're claiming that it will.

So apparently nobody arguing for either can claim to represent either the democratic consensus or the only alternative to it. What's your point?

If you don’t have a plan then don’t build AGI, pretty please ?

I agree there.

This is obviously wrong. I won’t argue for why it is wrong — too long post, and so on.

I'm actually not sure what you're arguing for or against in this whole section.

Obviously you're not going to "solve human values". Equally obviously, any future, AI or non-AI, is going to be better for some people's values than others. Some values have always won, and some values have always lost, and that will not change. What that has to do with justice destroying the world, I have absolutely no clue.

I think you're trying to take the view that any major change in the "human condition", or in what's "human", is equivalent to the destruction of the world, no matter what benefits it may have. This is obviously wrong. I won't argue for why it's wrong, but now that I've said those magic words, you're bound to accept all my conclusions.

I still can’t believe some of you would sided with the super-happies !

So you're siding with the guy who killed 15 billion non-consenting people because he personally couldn't handle the idea of giving up suffering?

Wrong answers will disempower humans forever at best, reducing them to passive leafs in the wind.

Just like they are now and always have been. The Heat Death of the Universe (TM) is gonna eat ya, regardless of what you do.

Slightly wrong answers won’t go as far as that, but will result in the permanent loss of vast chunks of Human Values — the parts we will decide to discard, consciously or not.

Human Values have been changing, for individuals and in the "average", for as long as there've been humans, including being discarded consciously or unconsciously. Mostly in a pretty aimless, drifting way. This is not new and neither AI nor anything else will fundamentally change it. At least not while the "humans" involved are recognizably like the humans we have now... and changing away from that would be a pretty big break in itself, no?

You build your ASI. You have that big Diverse Plural Assembly that is apparently plan A

I haven't actually heard many people suggesting that.

d0themath on Nathan Helm-Burger's Shortform

I think it's this

viliam on Could my work, "Beyond HaHa" benefit the LessWrong community?

use of humor as a pedagogical tool in Cardiopulmonary Resuscitation (CPR) courses

I first understood that as resuscitating people by telling them jokes. Like, when you laugh hard enough, your heart starts beating again. :D

*

Yeah, I think it could be interesting. To me, this feels unsurprising -- memory is related to emotions, so you should use emotions while teaching. Negative emotions, such as fear, help people remember, but they also discourage them from researching the topic on their own: they help memory but hurt creativity. Positive emotions should be useful for both remembering and experimenting.

Now the question is which positive emotions. Also, how. I guess people will remember funny things, but can you produce jokes about every important thing you want your students to remember? (If you can, you should totally do an educational YouTube comedy channel.)