LessWrong 2.0 Reader

COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)

Turning Your Back On Traffic
jefftk (jkaufman) · 2024-07-17T01:00:08.627Z · comments (7)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

On Dwarkesh Patel’s 4th Podcast With Tyler Cowen
Zvi · 2025-01-10T13:50:05.563Z · comments (6)

Mental Masturbation and the Intellectual Comfort Zone
Declan Molony (declan-molony) · 2024-05-07T05:47:05.257Z · comments (2)

[link] Shifting Headspaces - Transitional Beast-Mode
Jonathan Moregård (JonathanMoregard) · 2024-08-12T13:02:06.120Z · comments (9)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

We’re not as 3-Dimensional as We Think
silentbob · 2024-08-04T14:39:16.799Z · comments (16)

[question] When is reward ever the optimization target?
Noosphere89 (sharmake-farah) · 2024-10-15T15:09:20.912Z · answers+comments (13)

Building Big Science from the Bottom-Up: A Fractal Approach to AI Safety
Lauren Greenspan (LaurenGreenspan) · 2025-01-07T03:08:51.447Z · comments (2)

Orca communication project - seeking feedback (and collaborators)
Towards_Keeperhood (Simon Skade) · 2024-12-03T17:29:40.802Z · comments (16)

AI #66: Oh to Be Less Online
Zvi · 2024-05-30T14:20:03.334Z · comments (6)

[link] Locally optimal psychology
Chipmonk · 2024-11-25T18:35:11.985Z · comments (7)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

[link] The Way According To Zvi
Sable · 2024-12-07T17:35:48.769Z · comments (0)

Grammars, subgrammars, and combinatorics of generalization in transformers
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-02T09:37:23.191Z · comments (0)

[link] Claude 3 Opus can operate as a Turing machine
Gunnar_Zarncke · 2024-04-17T08:41:57.209Z · comments (2)

Deep Learning is cheap Solomonoff induction?
Lucius Bushnaq (Lblack) · 2024-12-07T11:00:56.455Z · comments (1)

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data
Sohaib Imran (sohaib-imran) · 2024-11-16T23:22:21.857Z · comments (11)

[question] What are your cruxes for imprecise probabilities / decision rules?
Anthony DiGiovanni (antimonyanthony) · 2024-07-31T15:42:27.057Z · answers+comments (33)

Debate: Is it ethical to work at AI capabilities companies?
Ben Pace (Benito) · 2024-08-14T00:18:38.846Z · comments (21)

An anti-inductive sequence
Viliam · 2024-08-14T12:28:54.226Z · comments (10)

Is the Power Grid Sustainable?
jefftk (jkaufman) · 2024-10-26T02:30:06.612Z · comments (38)

[link] Big tech transitions are slow (with implications for AI)
jasoncrawford · 2024-10-24T14:25:06.873Z · comments (16)

A Matter of Taste
Zvi · 2024-12-18T17:50:07.201Z · comments (4)

AI Safety Camp final presentations
Linda Linsefors · 2024-03-29T14:27:43.503Z · comments (3)

Childhood and Education Roundup #5
Zvi · 2024-04-17T13:00:03.015Z · comments (4)

The Evolution of Humans Was Net-Negative for Human Values
Zack_M_Davis · 2024-04-01T16:01:10.037Z · comments (1)

[link] UC Berkeley course on LLMs and ML Safety
Dan H (dan-hendrycks) · 2024-07-09T15:40:00.920Z · comments (1)

[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (2)

On Dwarkesh’s 3rd Podcast With Tyler Cowen
Zvi · 2024-02-02T19:30:05.974Z · comments (9)

Good job opportunities for helping with the most important century
HoldenKarnofsky · 2024-01-18T17:30:03.332Z · comments (0)

Introduce a Speed Maximum
jefftk (jkaufman) · 2024-01-11T02:50:04.284Z · comments (28)

Drone Wars Endgame
RussellThor · 2024-02-01T02:30:46.161Z · comments (71)

AI #47: Meet the New Year
Zvi · 2024-01-13T16:20:10.519Z · comments (7)

[link] Searching for the Root of the Tree of Evil
Ivan Vendrov (ivan-vendrov) · 2024-06-08T17:05:53.950Z · comments (14)

Finding the Wisdom to Build Safe AI
Gordon Seidoh Worley (gworley) · 2024-07-04T19:04:16.089Z · comments (10)

But Where do the Variables of my Causal Model come from?
Dalcy (Darcy) · 2024-08-09T22:07:57.395Z · comments (1)

Closeness To the Issue (Part 5 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-09T00:36:47.388Z · comments (0)

Doomsday Argument and the False Dilemma of Anthropic Reasoning
Ape in the coat · 2024-07-05T05:38:39.428Z · comments (55)

Intent alignment as a stepping-stone to value alignment
Seth Herd · 2024-11-05T20:43:24.950Z · comments (4)

My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (46)

AI companies' commitments
Zach Stein-Perlman · 2024-05-29T11:00:31.339Z · comments (0)

[link] Toki pona FAQ
dkl9 · 2024-03-17T21:44:21.782Z · comments (8)

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (8)

Dangers of Closed-Loop AI
Gordon Seidoh Worley (gworley) · 2024-03-22T23:52:22.010Z · comments (9)

Representation Tuning
Christopher Ackerman (christopher-ackerman) · 2024-06-27T17:44:33.338Z · comments (9)

Index of rationalist groups in the Bay Area July 2024
Lucie Philippon (lucie-philippon) · 2024-07-26T16:32:25.337Z · comments (14)

My Detailed Notes & Commentary from Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:51.894Z · comments (16)

Empirical vs. Mathematical Joints of Nature
Elizabeth (pktechgirl) · 2024-06-26T01:55:22.858Z · comments (1)

Recent comments

quila on quila's Shortform

I was assuming very strongly superhumanly intelligent AI

oh okay, i'll have to reinterpret then. edit: i just tried, but i still don't get it; if it's "very strongly superhuman", why is it merely "when the economy starts getting seriously disrupted"? (<- this feels like it's back at where this thread started)

I think the offense-defense balance moderately favors defense even at optimality

why?

sharmake-farah on quila's Shortform

I was assuming very strongly superhumanly intelligent AI, but yeah no promises of optimality were made here.

That said, I suspect a crux is that optimality ends up with multipolarity (assuming a one-world government hasn't happened by then), because I think the offense-defense balance moderately favors defense even at optimality, assuming optimal defense and offense.

benquo on Parkinson's Law and the Ideology of Statistics

Wow, thanks for doing the legwork on this - seems like quite possibly I'm analyzing fiction? Annoying if true.

Google's AI response to my search for the Thaba-Tseka Development Project says:

According to available World Bank documentation, the "Thaba-Tseka development project" is primarily referenced within the context of the "Lesotho Integrated Transport, Trade and Logistics Project," which focuses on improving the road corridor connecting Katse to Thaba-Tseka, aiming to enhance regional connectivity and reduce trade costs at Lesotho's borders with South Africa; key documents to reference would be those related to this project, particularly those detailing the road infrastructure development component between Katse and Thaba-Tseka.
Key points about the documentation:
- Project Title: "Lesotho Integrated Transport, Trade and Logistics Project"
- Focus Area: Upgrading the Katse to Thaba-Tseka road corridor
- Objectives: Improve climate resilient regional connectivity, reduce trade costs at Lesotho's borders
- Relevant documents to explore: Project Appraisal Documents, Procurement documents related to road construction and improvement on the Katse-Thaba-Tseka stretch

There's a good chance this is an AI hallucination, though; a cursory search of the main documents didn't yield any references to a "Thaba-Tseka development project," or the wood or ponies. I'm not familiar with World Bank documentation, though, and likely the right followup would involve looking at exactly what's cited in the book.

nathan-helm-burger on Rolling Thresholds for AGI Scaling Regulation

Sigh. Ok. I'm giving an upvote for good-faith effort to think this through and come up with a plan, but I just disagree with your world-model and its projections about training costs and associated danger levels so strongly that it seems hard to figure out how to even begin a discussion.

I'll just leave a link here to a different comment talking about the same problem.

quila on quila's Shortform

but it still says "it's easy for others to get their own superintelligences with different values", with 'superintelligence' referring to the 'superhuman' AI of 2035?

still confused about this btw. in my second reply to you i wrote:

(i wonder if you're using the term 'superintelligence' in a different way though, e.g. to mean "merely super-human"?)

and you did not say you were, but it looks like you are here?

quila on quila's Shortform

far too many people tend to deny that you do in fact have to make other values lose out

i don't know where that might be true, but at least on lesswrong i imagine it's an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe). most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people (and maybe especially most alignment researchers?) think hells are bad.

also on the "lose out" phrasing: even if someone "wants at least some people to have tormentful lives", they don't "lose out" overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.

raemon on Raemon's Shortform

My Current Metacognitive Engine

Someday I might work this into a nicer top-level post, but for now, here's the summary of the cognitive habits I try to maintain (and mostly succeed at maintaining). Some of these are simple TAPs, some of them are more like mindsets.

Twice a day, asking “what is the most important thing I could be working on and why aren’t I on track to deal with it?”
- you probably want a more specific question (“important thing” is too vague). Three example specific questions (but don’t be a slave to any specific operationalization):
  - what is the most important uncertainty I could be reducing, and how can I reduce it fastest?
  - what’s the most important resource bottleneck I can gain, or contribute to the ecosystem, and what would gain me that resource the fastest?
  - what’s the most important goal I’m backchaining from?
Have a mechanism to iterate on your habits that you use every day, and frequently update in response to new information
- for me, this is daily prompts and weekly prompts, which are:
  - optimized for being the efficient metacognition I obviously want to do each day
  - include one skill that I want to level up in, that I can do in the morning as part of the meta-orienting (such as operationalizing predictions, or “think it faster”, or whatever specific thing I want to learn to attend to or execute better right now)
The five requirements each fortnight:
- be backchaining
  - from the most important goals
- be forward chaining
  - through tractable things that compound
- ship something
  - to users every fortnight
- be wholesome
  - (that is, do not minmax in a way that will predictably fail later)
- spend 10% on meta (more if you’re Ray in particular but not during working hours. During working hours on workdays, meta should pay for itself within a week)
Correlates:
- have a clear, written model of what you’re backchaining from
- have a clear, written model of how you’re compounding
The general problem solving approach:
- breadth first
- identify cruxes
- connect inner-sim to cruxes / predictions
- follow your heart
- see how your predictions went
Random ass skills:
- napping
- managing working memory, innovating on and applying working memory tools
- grieving
- generalizing

Skill I’m working on that hasn’t paid off yet but I think you should try anyway:

- At least once a day or so, when you notice a mistake or surprise, spend a couple of minutes asking “how could I have thought that faster” (and periodically do deeper dives)
- Each day/week, figure out what you’re confused about or predictably going to tackle in a dumb way, and think in advance about how to be smart about it the first time

benquo on Preference Inversion

I want to note something about how your position seems to have evolved through this discussion. Initially, you argued that societal pressure often reflects genuine wisdom, using examples where a 'society who aggressively shames overconsumption of sweets' might be wiser than a child's raw preferences. You suggested that what I was calling 'intrinsic preferences' might just be 'shallow preferences' that hadn't yet been trained to reflect reality.

Now you're making a different and more sophisticated argument - that the whole framework of 'intrinsic' versus 'external' preferences is problematic because preferences necessarily develop within and respond to reality, including social reality. While this is an interesting perspective that deserves consideration, it seems to contradict rather than support your initial defense of social restrictions as transmitting wisdom.

There's also an important point about my own position that I should clarify. When I said 'generally, upon reflection, people would prefer to satisfy their and others' preferences as calculated prior to such influences,' I wasn't making a claim about how often admonitions reflect preference inversions. Rather, I was suggesting that if people were to reflect explicitly on cases of preference inversion, they typically wouldn't want those inverted preferences to count; they would recognize these as preferences shaped by forces systematically opposed to their interests.

This connects to what I see as the core distinction: I'm not just talking about external influences or errors in the transmission of wisdom. I'm specifically pointing to cases where restrictions are moralized for the purpose of restriction itself - where the system is systematically deprecating the evolutionarily fit preferences of the person being restricted. This isn't just clumsy teaching or social pressure - it's adversarial. The system works by first making people feel guilty about their natural inclinations, then betting that they won't fully succeed at suppressing those inclinations despite earnestly trying to adopt the system's restrictions.

Consider the survival of variants of Christianity that 'do poorly' at helping people develop healthy attitudes toward sexuality. Their persistence suggests this poor performance is actually functional - they are able to exploit their members precisely because they create a system where most people must be 'bad' by design, where hypocrisy isn't a bug but a feature. When dessert companies can successfully market their products as 'sinfully delicious,' they're exploiting a system of moral restrictions that creates the very compulsive relationship to sweets it claims to prevent.

ektimo on ektimo's Shortform

Prompt: write a micro play that is both disturbing and comforting
--

Title: "The Silly Child"

Scene: A mother is putting to bed her six-year-old child

CHILD: Mommy, how many universes are there?

MOTHER: As many as are possible.

CHILD (smiling): Can we make another one?

MOTHER (smiling): Sure. And while we're at it, let's delete the number 374? I've never liked that one.

CHILD (excited): Oh! And let's make a new Fischer-Griess group element too! Can we do that Mommy?

MOTHER (bops nose): That's enough stalling. You need to get your sleep. Sweet dreams, little one. (kisses forehead)

End

sharmake-farah on On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Einstein was not an experimentalist, yet was perfectly capable of physics; his successors have largely not touched his unfinished work, and not for lack of data.

While it is interesting at first glance, some caveats are called for here.

One, Einstein's achievements were sort of overrated; see these comments for details:

https://www.lesswrong.com/posts/GSBCw94DsxLgDat6r/interpreting-yudkowsky-on-deep-vs-shallow-knowledge#6HPjxMvTnP9JeibXZ

https://www.lesswrong.com/posts/GSBCw94DsxLgDat6r/interpreting-yudkowsky-on-deep-vs-shallow-knowledge#icmCewLmXnxgtmANP

Two, the EPR paradox is resolvable in modern physics by allowing non-locality in entanglement, while the no-communication theorem prevents anyone from exploiting that non-locality to violate special relativity.
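
For readers who want the no-communication theorem spelled out, here is a sketch of the standard textbook argument. It is not taken from the comment above, and the notation is an assumption introduced here: \rho_{AB} is the entangled state shared by Alice and Bob, and \{K_i\} are Kraus operators for any local operation (including a measurement) on Alice's side, with \sum_i K_i^\dagger K_i = I.

```latex
% Assumed notation (not from the original comment):
%   \rho_{AB} : joint state shared by Alice (A) and Bob (B)
%   K_i       : Kraus operators of Alice's local operation, \sum_i K_i^\dagger K_i = I
%   \rho_B'   : Bob's reduced state after Alice acts
\begin{align}
\rho_B' &= \operatorname{Tr}_A\!\Big[\sum_i (K_i \otimes I)\,\rho_{AB}\,(K_i^\dagger \otimes I)\Big] \\
        &= \operatorname{Tr}_A\!\Big[\Big(\sum_i K_i^\dagger K_i \otimes I\Big)\rho_{AB}\Big] \\
        &= \operatorname{Tr}_A[\rho_{AB}] = \rho_B .
\end{align}
```

The second line uses the fact that the partial trace over A is cyclic for operators acting only on A. Bob's local statistics are therefore the same whatever Alice does, which is why the correlations cannot carry a signal.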