LessWrong 2.0 Reader

How Emergency Medicine Solves the Alignment Problem
StrivingForLegibility · 2023-12-26T05:24:35.579Z · comments (4)

How to partition teams to move fast? Debating "low-dimensional cuts"
jacobjacob · 2023-10-13T21:43:53.067Z · comments (2)

Pivotal Acts might Not be what You Think they are
Johannes C. Mayer (johannes-c-mayer) · 2023-11-05T17:23:50.464Z · comments (13)

Concrete positive visions for a future without AGI
Max H (Maxc) · 2023-11-08T03:12:42.590Z · comments (28)

[link] What's new at FAR AI
AdamGleave · 2023-12-04T21:18:03.951Z · comments (0)

Estimating effective dimensionality of MNIST models
Arjun Panickssery (arjun-panickssery) · 2023-11-02T14:13:09.012Z · comments (3)

On plans for a functional society
kave · 2023-12-12T00:07:46.629Z · comments (8)

[link] energy landscapes of experts
bhauth · 2023-10-02T14:08:32.370Z · comments (2)

How ARENA course material gets made
CallumMcDougall (TheMcDouglas) · 2024-07-02T18:04:00.209Z · comments (2)

[link] Beyond the Board: Exploring AI Robustness Through Go
AdamGleave · 2024-06-19T16:40:06.594Z · comments (2)

[link] Things I learned talking to the new breed of scientific institution
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-29T14:00:14.844Z · comments (6)

[question] If I wanted to spend WAY more on AI, what would I spend it on?
Logan Zoellner (logan-zoellner) · 2024-09-15T21:24:46.742Z · answers+comments (13)

(Approximately) Deterministic Natural Latents
johnswentworth · 2024-07-19T23:02:12.306Z · comments (0)

[link] [Paper] Programming Refusal with Conditional Activation Steering
Bruce W. Lee (bruce-lee) · 2024-09-11T20:57:08.714Z · comments (0)

Surviving Seveneves
Yair Halberstadt (yair-halberstadt) · 2024-06-19T13:11:55.414Z · comments (4)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders
Gytis Daujotas (gytis-daujotas) · 2024-08-01T21:08:38.800Z · comments (6)

Applying Force to the Wrong End of a Causal Chain
silentbob · 2024-06-22T18:06:32.364Z · comments (0)

instruction tuning and autoregressive distribution shift
nostalgebraist · 2024-09-05T16:53:41.497Z · comments (5)

Manifund Q1 Retro: Learnings from impact certs
Austin Chen (austin-chen) · 2024-05-01T16:48:33.140Z · comments (1)

Debate, Oracles, and Obfuscated Arguments
Jonah Brown-Cohen (jonah-brown-cohen) · 2024-06-20T23:14:57.340Z · comments (2)

[link] The Data Wall is Important
JustisMills · 2024-06-09T22:54:20.070Z · comments (20)

[link] you should probably eat oatmeal sometimes
bhauth · 2024-08-25T14:50:37.570Z · comments (29)

Sherlockian Abduction Master List
Cole Wyeth (Amyr) · 2024-07-11T20:27:00.000Z · comments (59)

Californians, tell your reps to vote yes on SB 1047!
Holly_Elmore · 2024-08-12T19:50:09.817Z · comments (24)

Why did ChatGPT say that? Prompt engineering and more, with PIZZA.
Jessica Rumbelow (jessica-cooper) · 2024-08-03T12:07:46.302Z · comments (2)

Long-Term Future Fund: May 2023 to March 2024 Payout recommendations
Linch · 2024-06-12T13:46:29.535Z · comments (0)

[link] ARC Evals: Responsible Scaling Policies
Zach Stein-Perlman · 2023-09-28T04:30:37.140Z · comments (9)

Intrinsic Drives and Extrinsic Misuse: Two Intertwined Risks of AI
jsteinhardt · 2023-10-31T05:10:02.581Z · comments (0)

List your AI X-Risk cruxes!
Aryeh Englander (alenglander) · 2024-04-28T18:26:19.327Z · comments (7)

The Serendipity of Density
jefftk (jkaufman) · 2023-12-17T03:50:04.824Z · comments (4)

[link] The price is right
EJT (ElliottThornley) · 2023-10-16T16:34:38.023Z · comments (3)

Technologies and Terminology: AI isn't Software, it's... Deepware?
Davidmanheim · 2024-02-13T13:37:10.364Z · comments (10)

Ronny and Nate discuss what sorts of minds humanity is likely to find by Machine Learning
So8res · 2023-12-19T23:39:59.689Z · comments (30)

My idea of sacredness, divinity, and religion
Kaj_Sotala · 2023-10-29T12:50:07.980Z · comments (10)

How to solve deception and still fail.
Charlie Steiner · 2023-10-04T19:56:56.254Z · comments (7)

Scaling of AI training runs will slow down after GPT-5
Maxime Riché (maxime-riche) · 2024-04-26T16:05:59.957Z · comments (5)

Movie posters
KatjaGrace · 2024-03-06T06:20:03.034Z · comments (0)

[link] Queuing theory: Benefits of operating at 60% capacity
ampdot · 2023-12-01T18:48:01.426Z · comments (4)

Jobs, Relationships, and Other Cults
Ruby · 2024-03-13T05:58:45.043Z · comments (9)

[question] Does AI governance needs a "Federalist papers" debate?
azsantosk · 2023-10-18T21:08:26.098Z · answers+comments (4)

[link] Increasing IQ is trivial
George3d6 · 2024-03-01T22:43:32.037Z · comments (59)

[link] Progress links digest, 2023-11-24: Bottlenecks of aging, Starship launches, and much more
jasoncrawford · 2023-11-24T15:25:07.721Z · comments (1)

Neuroscience and Alignment
Garrett Baker (D0TheMath) · 2024-03-18T21:09:52.004Z · comments (25)

[link] Dequantifying first-order theories
jessicata (jessica.liu.taylor) · 2024-04-23T19:04:49.000Z · comments (9)

Extrapolating from Five Words
Gordon Seidoh Worley (gworley) · 2023-11-15T23:21:30.865Z · comments (11)

What's up with all the non-Mormons? Weirdly specific universalities across LLMs
mwatkins · 2024-04-19T13:43:24.568Z · comments (13)

Quantopian contest, but for food intake and weight
Lucent · 2023-11-08T05:41:35.050Z · comments (9)

"Does your paradigm beget new, good, paradigms?"
Raemon · 2024-01-25T18:23:15.497Z · comments (6)

[link] "What if we could redesign society from scratch? The promise of charter cities." [Rational Animations video]
Jackson Wagner · 2024-02-18T00:57:50.444Z · comments (7)

Planning to build a cryptographic box with perfect secrecy
Lysandre Terrisse · 2023-12-31T09:31:47.941Z · comments (6)

Recent comments

stefan_schubert on whestler's Shortform

Cf this Bostrom quote.

Far from being the smartest possible biological species, we are probably better thought of as the stupidest possible biological species capable of starting a technological civilization - a niche we filled because we got there first, not because we are in any sense optimally adapted to it.

Re this:

In evolutionary timescales, virtually no time has elapsed since hominids began trading, utilizing complex symbolic thinking, making art, hunting large animals etc, and here we are, a blip later in high technology.

A bit nit-picky, but a recent paper studying West Eurasia found significant evolution over the last 14,000 years.

sodium on How LLMs are and are not myopic

Now that o1 explicitly does RL on CoT, next token prediction for o1 is definitely not consequence blind. The next token it predicts enters into its input and can be used for future computation.
This type of outcome-based training makes the model more consequentialist. It also makes using a single next token prediction as the natural "task" to do interpretability on even less defensible.
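The feedback loop described here, where each predicted token is appended to the input and is available to every later prediction, can be sketched in a few lines. This is only a toy illustration: `toy_next_token` is a hypothetical stand-in rule, not o1 or any real model, and nothing here involves RL training.

```python
def toy_next_token(context: list[int]) -> int:
    # Hypothetical stand-in for a model: the "prediction" is just the
    # sum of everything in the context so far, modulo 10.
    return sum(context) % 10


def generate(prompt: list[int], steps: int) -> list[int]:
    # Autoregressive loop: every predicted token is appended to the
    # context, so it influences all later predictions.
    context = list(prompt)
    for _ in range(steps):
        context.append(toy_next_token(context))
    return context


print(generate([1, 2], 3))
```

Changing any early token changes every later one, which is the sense in which a single prediction has downstream consequences.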

Anyways, I thought I should revisit this post after o1 came out. I can't help noticing that it's stylistically very different from all of the janus writing I've encountered in the past. Then I got to the end:

The ideas in the post are from a human, but most of the text was written by Chat GPT-4 with prompts and human curation using Loom.

Ha, I did notice I was confused (but didn't bother thinking about it further)

lsusr on What are the best arguments for/against AIs being "slightly 'nice'"?

Noted. The problem remains—it's just less obvious. This phrasing still conflates "intelligent system" with "optimizer", a mistake that goes all the way back to Eliezer Yudkowsky's 2004 paper on Coherent Extrapolated Volition.

For example, consider a computer system that, given a number $N$, can (usually) produce the shortest computer program that will output $N$. Such a computer system is undeniably superintelligent, but it's not a world optimizer at all.

"Far away, in the Levant, there are yogis who sit on lotus thrones. They do nothing, for which they are revered as gods," said Socrates.

―The Teacup Test

raemon on What are the best arguments for/against AIs being "slightly 'nice'"?

I realize this isn’t your main point here, but I do want to flag I put ‘nice’ in quotes because I don’t mean the colloquial definition. The question here is ‘would a super intelligent system with control over the solar system spend a billionth or trillionth of its resources helping beings too weak to usefully trade with it, if it didn’t benefit directly from it?’

As I see it the question is agnostic to what sort of mind the AI is.

lsusr on What are the best arguments for/against AIs being "slightly 'nice'"?

Personally, I feel the question itself is misleading because it anthropomorphizes a non-human system. Asking if an AI is nice is like asking if the Fundamental Theorem of Algebra is blue. Is Stockfish nice? Is an AK-47 nice? The adjective isn't the right category for the noun. Except it's even worse than that because there are many different kinds of AIs. Are birds blue? Some of them are. Some of them aren't.

I feel like I understand Eliezer's arguments well enough that I can pass an Ideological Turing Test, but I also feel there are a few loopholes.

I've considered throwing my hat into this ring, but the memetic terrain is against me. "AI will kill us all" fits into five words. "Half the things you believe about how minds work, including your own, are wrong. Let's start over from the beginning with how the planet's major competing optimizers interact. After that, we can go through the fundamentals of behaviorist psychology," is not a winning thesis in a Hegelian debate (though it can be viable in a Socratic context).

In real life, my conversations usually go like this.

AI doomer: "I believe AI will kill us all. It's stressing me out. What do you believe?"

Me (as politely as I can): "I operate from a theory of mind so different from yours that the question 'what do you believe' is not applicable to this situation."

AI doomer: "Wut."

Usually the person loses interest there. For those who don't, it just turns into an introductory lesson of my own idiosyncratic theory of rationality.

AI doomer: "I never thought about things that way before. I'm not sure I understand you yet, but I feel better about all of this for some reason."

In practice, I'm finding it more efficient to write stories that teach how competing optimizers, adversarial equilibria, and other things work. This approach is indirect. My hope is that it improves the quality of thinking and discourse.

I may eventually write about this topic if the right person shows up who wants to know my opinion well enough that they can pass an Ideological Turing Test. Until then, I'll be trying to become a better writer and YouTuber.

dweomite on What are the best arguments for/against AIs being "slightly 'nice'"?

I have an intuition like: Minds become less idiosyncratic as they grow up.

A couple of intuition pumps:

(1) If you pick a game, and look at novice players of that game, you will often find that they have rather different "play styles". Maybe one player really likes fireballs and another really likes crossbows. Maybe one player takes a lot of risks and another plays it safe.

Then if you look at experts of that particular game, you will tend to find that their play has become much more similar. I think "play style" is mostly the result of two things: (a) playing to your individual strengths, and (b) using your aesthetics as a tie-breaker when you can't tell which of two moves is better. But as you become an expert, both of these things diminish: you become skilled at all areas of the game, and you also become able to discern even small differences in quality between two moves. So your "play style" is gradually eroded and becomes less and less noticeable.

(2) Imagine if a society of 3-year-olds were somehow in the process of creating AI, and they debated whether their AI would show "kindness" to stuffed animals (as an inherent preference, rather than an instrumental tool for manipulating humans). I feel like the answer to this should be "lol no". Showing "kindness" to stuffed animals feels like something that humans correctly grow out of, as they grow up.

It seems plausible to me that something like "empathy for kittens" might be a higher-level version of this, that humans would also grow out of (just like they grow out of empathy for stuffed animals) if the humans grew up enough.

(Actually, I think most human adults still have some empathy for stuffed animals. But I think most of us wouldn't endorse policies designed to help stuffed animals. I'm not sure exactly how to describe the relation that 3-year-olds have to stuffed animals but adults don't.)

I sincerely think caring about kittens makes a lot more sense than caring about stuffed animals. But I'm uncertain whether that means we'll hold onto it forever, or just that it takes more growing-up in order to grow out of it.

Paul frames this as "mostly a question about idiosyncrasies and inductive biases of minds rather than anything that can be settled by an appeal to selection dynamics." But I'm concerned that might be a bit like debating the odds of whether your newborn human will one day come to care for stuffed animals, instead of whether they will continue to care for them after growing up. It can be very likely that they will care for a while, and also very likely that they will stop.

I strongly suspect it is possible for minds to become quite a lot more grown-up than humans currently are.

(I think Habryka may have been saying something similar to this.)

Still, I notice that I'm doing a lot of hand-waving here and I lack a gears-based model of what "growing up" actually entails.

zach-furman on Singular learning theory: exercises

Good catch, thanks! Fixed now.

tailcalled on Why I'm bearish on mechanistic interpretability: the shards are not in the network

Why?

raemon on Exercise: Solve "Thinking Physics"

FYI I remember being vaguely dissatisfied with the early exercises in the book, and recommend skipping ahead to somewhere in the middle of the first half.

quila on Exercise: Solve "Thinking Physics"

i'm enjoying this. going through the questions right now, might do all of them

had a notable experience with one of the early questions:

question: "The battery output voltage, the bottle volume, the digital clock time, and the measure of weight (12 volts; one gallon; 12:36; 1 lb) all have something in common. It is that they are represented by a) one number b) more than one number."

recollected thought process: apart from the clock time, they all have one number. the time on the clock is also, in my opinion, represented by one number in a non base-n numeral system - the symbols update predictably when the value is incremented, which is all that's required. i'm not sure if the author intends that interpretation of the clock, though. let's look for other interpretations.

"lb" - this is a pointer to formulas related to weight/gravity (or more fundamentally, a pointer back to physics/the world). "1 lb" means "1 is the value to pass as the weight variable". a formula is not itself a number, but can contain them. maybe this is why the clock is included - most would probably consider it to contain two numbers, which would force them to think about how these other three could be 'more than one number' as well.

(though it's down to interpretation, i'll choose b) more than one number.)

the listed answer is: a) one number. "Each is represented by only one number - the battery by 12 volts, the bottle by one gallon, the time by 12:36 and the weight by one pound. Things described by one number are called scalars. For example: on a scale of one to ten, how do you rate this teacher?" it just restates them and implies in passing that 12:36 is one number, without deriving any insight from the question. *feels disappointed*. (i guess they just wanted to introduce a definition)
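The reading of the clock as one number in a mixed-radix numeral system can be made concrete with a small sketch. It assumes a 24-hour clock (the book's "12:36" does not say which), and the function names are hypothetical, chosen only for illustration.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR  # assumes a 24-hour clock


def clock_to_minutes(hours: int, minutes: int) -> int:
    # Collapse the two displayed "digits" (hours, minutes) into one integer.
    return hours * MINUTES_PER_HOUR + minutes


def minutes_to_clock(total: int) -> tuple[int, int]:
    # Expand one integer back into the (hours, minutes) display, wrapping at midnight.
    return divmod(total % MINUTES_PER_DAY, MINUTES_PER_HOUR)


def increment(hours: int, minutes: int) -> tuple[int, int]:
    # Add one to the underlying value; the displayed symbols update predictably.
    return minutes_to_clock(clock_to_minutes(hours, minutes) + 1)


print(clock_to_minutes(12, 36))
print(increment(12, 59))
```

Under this view "12:36" names a single value (minutes since midnight), in the same way that "756" does in base ten.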