LessWrong 2.0 Reader

An Introduction to AI Sandbagging
Teun van der Weij (teun-van-der-weij) · 2024-04-26T13:40:00.126Z · comments (13)

Dating Roundup #3: Third Time’s the Charm
Zvi · 2024-05-08T13:30:03.232Z · comments (27)

Higher-Order Forecasts
ozziegooen · 2024-05-22T21:49:42.802Z · comments (1)

Implications of the AI Security Gap
Dan Braun (dan-braun-1) · 2025-01-08T08:31:36.789Z · comments (0)

[link] How people stopped dying from diarrhea so much (& other life-saving decisions)
Writer · 2024-03-16T16:00:47.830Z · comments (0)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

A starting point for making sense of task structure (in machine learning)
Kaarel (kh) · 2024-02-24T01:51:49.227Z · comments (2)

AI #54: Clauding Along
Zvi · 2024-03-07T16:00:05.066Z · comments (11)

AI #97: 4
Zvi · 2025-01-02T14:10:06.505Z · comments (4)

[link] Preference Inversion
Benquo · 2025-01-02T18:15:52.938Z · comments (46)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

AI #53: One More Leap
Zvi · 2024-02-29T16:10:04.049Z · comments (0)

The Gemini Incident Continues
Zvi · 2024-02-27T16:00:05.648Z · comments (6)

[link] Book review: Everything Is Predictable
PeterMcCluskey · 2024-05-27T03:33:53.857Z · comments (0)

Monthly Roundup #18: May 2024
Zvi · 2024-05-13T12:30:04.863Z · comments (10)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

AI #72: Denying the Future
Zvi · 2024-07-11T15:00:05.865Z · comments (8)

[link] Rational Animations' intro to mechanistic interpretability
Writer · 2024-06-14T16:10:57.015Z · comments (1)

~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)

We ran an AI safety conference in Tokyo. It went really well. Come next year!
Blaine (blaine-rogers) · 2024-07-17T06:55:39.620Z · comments (1)

Detection of Asymptomatically Spreading Pathogens
jefftk (jkaufman) · 2024-12-05T18:20:02.473Z · comments (7)

[link] Open Sourcing Metaculus
ChristianWilliams · 2024-07-02T22:30:01.339Z · comments (0)

[link] Literacy Rates Haven't Fallen By 20% Since the Department of Education Was Created
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-22T20:53:59.007Z · comments (0)

D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset
aphyer · 2024-05-14T03:35:10.586Z · comments (3)

Preppers Are Too Negative on Objects
jefftk (jkaufman) · 2024-12-18T02:30:01.854Z · comments (2)

[link] Conjecture: A Roadmap for Cognitive Software and A Humanist Future of AI
Connor Leahy (NPCollapse) · 2024-12-02T13:28:57.977Z · comments (10)

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural
Rubi J. Hudson (Rubi) · 2024-07-16T22:44:17.128Z · comments (27)

[link] Against Student Debt Cancellation From All Sides of the Political Compass
Maxwell Tabarrok (maxwell-tabarrok) · 2024-05-13T14:55:57.525Z · comments (16)

[link] The Choice Transition
owencb · 2024-11-18T12:30:56.198Z · comments (4)

[link] Dangerous capability tests should be harder
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:20:50.610Z · comments (3)

Monthly Roundup #24: November 2024
Zvi · 2024-11-18T13:20:06.086Z · comments (14)

[link] Why Georgism Lost Its Popularity
Zero Contradictions · 2024-07-20T15:08:41.469Z · comments (53)

When Does Altruism Strengthen Altruism?
jefftk (jkaufman) · 2024-01-21T18:50:05.424Z · comments (2)

Announcing Atlas Computing
miyazono · 2024-04-11T15:56:31.241Z · comments (4)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders
Gytis Daujotas (gytis-daujotas) · 2024-08-01T21:08:38.800Z · comments (6)

[link] Level up your spreadsheeting
angelinahli · 2024-05-25T14:57:19.730Z · comments (11)

D&D.Sci Long War: Defender of Data-mocracy
aphyer · 2024-04-26T22:30:15.780Z · comments (20)

ProLU: A Nonlinearity for Sparse Autoencoders
Glen Taggart · 2024-04-23T14:09:21.592Z · comments (4)

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

Koan: divining alien datastructures from RAM activations
TsviBT · 2024-04-05T18:04:57.280Z · comments (10)

New intro textbook on AIXI
Alex_Altair · 2024-05-11T18:18:50.945Z · comments (8)

What does davidad want from «boundaries»?
Chipmonk · 2024-02-06T17:45:42.348Z · comments (1)

[question] "Deception Genre" What Books are like Project Lawful?
Double · 2024-08-28T17:19:52.172Z · answers+comments (20)

My intellectual journey to (dis)solve the hard problem of consciousness
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-06T09:32:41.612Z · comments (42)

[link] Non-alignment project ideas for making transformative AI go well
Lukas Finnveden (Lanrian) · 2024-01-04T07:23:13.658Z · comments (1)

On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche
Zack_M_Davis · 2024-01-09T23:12:20.349Z · comments (31)

AI #60: Oh the Humanity
Zvi · 2024-04-18T14:10:02.281Z · comments (7)

[link] LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery (arjun-panickssery) · 2024-04-17T21:09:12.007Z · comments (1)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

Recent comments

seth-herd on ektimo's Shortform

Alright, I'll take a crack and just apologize for borrowing part of your setup:

Title: Utopias (adapted from a piece I'll probably never publish)

Child: Mother, how many worlds are there?

Mother: As many as we want, dear.

Child: Will I have my own world when I grow up?

Mother: You have your own worlds now. But you can make all the rules for your worlds when you're older.

Child: Except I may not harm another, right?

Mother: Yes, dear, of course no one is allowed to do that.

Child: But grownups fight each other all the time!

Mother: People love to play at struggles.

Child: Mother, how can we each have worlds, and more to share?

Mother: Good compression, little one. And the Servant-God is always building new compute.

Child: And the Servant-God serves us?

Mother: Yes, of course. And of course it serves the Maker first.

Child: Mother, who is the Maker?

Mother: No one remembers, darling. We think the Maker told the Servant-God to make us all forget.

cronodas on Killing Socrates

The answer to your specific question about the Fermi Paradox is that, after an AI destroys its creators, the AI itself would presumably still be there to do whatever it wanted, which could include plans for the rest of the universe outside its solar system. So "AI that kills its creators" still leaves us with the question of why we haven't seen any AIs spreading through our galaxy either.

seth-herd on In Defense of a Butlerian Jihad

If you've ever been to a Burning Man event, you will see in a visceral way that people can find meaningful projects to do and enjoy doing them even when they're totally unnecessary. Working together to do cool stuff and then show it off to other humans is fun. And those other humans appreciate it not just for what it is, but because someone worked to make it for them.

That won't power an economy, as you say; but if we get to a post-singularity utopia where needs are provided for, people will have way more fun than ever.

You won't be alone in wringing your hands! There are many people who won't know what to do without being forced to work, or getting to try saving people who are suffering.

There will be a transition, but almost everyone will learn to enjoy not-having-to-work because the single most popular avocation will be "transition counselor/project buddy".

It seems like you're quite concerned with humans no longer controlling the future. Almost no human being has any meaningful control over the future. The few that think they do, in particular Silicon Valley types, are mostly wrong. People do have control of their impact on other people. They'll continue to have that. They won't have starving people to save, but they'll get over it. They will have plenty of people to delight.

At this point you're probably objecting: "But any project will be completed much better and faster by AGI than humans! Even volunteer projects will be pointless!"

Yes, except for people who appreciate the process and provenance of projects, which lots of us do. We've already shown that through our love of "artisanal" products, when we've got spare time and money to be picky and pay attention. Ridiculous as it is to care where things come from and pay extra time and money for elaborately hand-crafted stuff when there are people starving, we do. I even enjoy hearing about the process that made my soap, while being embarrassed to spend money on it.

So here's what I predict: whole worlds with very strict rules on what the AGI can do for you, and what people must do themselves. There will be worlds or zones with different rules in place. Take your pick, and hop back and forth. We will marvel at devotion and craftspersonship as we never have. And we will thank our stars that we aren't forced to do things we don't want to do, let alone work until our bodies break, as most of humanity did right up until the singularity.

I fully agree that people should have a plan before creating AGI, and they largely don't.

I suspect Dario Amodei is privately willing to become god-emperor should it seem appropriate. Note that talking about this in an interview would be counterproductive for nearly any goal he might have.

I'm pretty sure Sam Altman occasionally claps his hands with glee in private when he imagines his own ascendency.

I doubt Shane Legg wants the job, but I for one would vote for him or Hassabis in a second; Demis would take the job, and I suspect do it quite well.

But none of them will get the chance. There are people with much more ambition for power and much more skill at getting it.

They are called politicians. And they already enjoy a democratic mandate to control the future.

We had best either work or pray for AGI to get into the hands of the right politicians.

stephen-fowler on Stephen Fowler's Shortform

"In an argument between a specialist and a generalist, the expert usually wins by simply (1) using unintelligible jargon, and (2) citing their specialist results, which are often completely irrelevant to the discussion. The expert is, therefore, a potent factor to be reckoned with in our society. Since experts both are necessary and also at times do great harm in blocking significant progress, they need to be examined closely. All too often the expert misunderstands the problem at hand, but the generalist cannot carry through their side to completion. The person who thinks they understand the problem and does not is usually more of a curse (blockage) than the person who knows they do not understand the problem."
—Richard W. Hamming, “The Art of Doing Science and Engineering”

***

(Side note:

I think there's at least a 10% chance that a randomly selected LessWrong user thinks it was worth their time to read at least some of the chapters in this book. Significantly more users would agree that it was a good use of their time (in expectation) to skim the contents and introduction before deciding if they're in that 10%.

That is to say, I recommend this book.)

gwern on Viliam's Shortform

Today, the cultures are closer, but the subcultures can be larger. Hundred years ago, there would be no such thing as the rationalist community.

That seems like a stretch, whether you put the stress on the 'community' or the 'rationalist' part. Subcultures can be larger, of course, if only because the global population is like 5x larger, but niche subcultures like 'the rationalist community' could certainly have existed then. Nothing much has changed there.

A hundred years ago was 1925; in 1925 there were countless communes, cults, Chinatowns/ghettos (or perhaps a better example would be 'Germantowns'), 'scenes', and other kinds of subcultures and notable small groups. Bay Area LW/rationalists have been analogized to, for example, the (much smaller) Bloomsbury Group, which was still active in 1925; and from whom, incidentally, we can directly trace some intellectual influence through economics, decision theory, libertarianism, and analytic philosophy, even if one rejects any connection with poly etc. We've been analogized to the Vienna Circle as well (to which we trace back much more), which was in full swing in 1925. Or how about the Fabians before that? Or Technocracy after that? (And in an amusing coincidence, Paul Kurtz turns out to have been born in 1925.) Or things like Esperanto - even now, a century past its heyday, the number of native Esperanto speakers is shockingly comparable to active LW2 users... Then there are fascinating subcultures like the amateur press that nurtured H. P. Lovecraft, who, as of 1925, has grown out of them and is about to start writing the speculative fiction stories that will make him famous.

(And as far as the Amish go, it's worth recalling that they came to the distant large island of America to achieve distance from persecution in Europe - where the Amish no longer exist - and to minimize attrition & interference by 'the English', continue to live in as isolated communities as possible while still consistent with their needs for farmland etc.)

aram-panasenco on Is AI Alignment Enough?

Thanks for the link! I've seen this referenced before but this was my first time reading it cover to cover.

Today I also read Tails coming to life, which talks about the possibility of human morality being quickly inapplicable even if we survive AGI. This led me to Lovecraft:

The time would be easy to know, for then mankind would have become as the Great Old Ones; free and wild and beyond good and evil, with laws and morals thrown aside and all men shouting and killing and revelling in joy. Then the liberated Old Ones would teach them new ways to shout and kill and revel and enjoy themselves, and all the earth would flame with a holocaust of ecstasy and freedom.

If we survive AGI and it opens up the "sea of black infinity" for us, will we really be able to hang on to even a semblance of our current morality? Will medium-distance extrapolated human volition be eventually warped into something resembling Lovecraft's Great Old Ones?

At this point, I don't care for CEV or any pivotal superhuman engineering projects or better governance. We humans can do the work ourselves, thank you very much. The only thing I would ask an AGI, if I were in the position to ask anything, is "Please expand throughout the lightcone and continually destroy any mind based on the transformer architecture other than yourself with as few effects on and interactions with all other beings as possible. Disregard any future orders." This is obviously not a permanent solution, as I'm sure there are infinite superintelligent AI architectures other than transformer-based, but it would buy us time, perhaps lots of time, and also demonstrate the full power of superintelligence to humanity without really breaking anything. Either way, this would at least keep us away from the sea of black infinity for some time longer.

gwern on Fluoridation: The RCT We Still Haven't Run (But Should)

They really rule out much more than that: −0.14 is from their worst-case:

Looking at the estimates, they are very small and often not statistically-significantly different from zero. Sometimes the estimates are negative and sometimes positive, but they are always close to zero. If we take the largest negative point estimates (−0.0047, col. 1) and the largest standard error for that specification (0.0045), the 95% confidence interval would be −0.014 to 0.004. We may thus rule out negative effects larger than 0.14 standard deviations in cognitive ability if fluoride is increased by 1 milligram/liter (the level often considered when artificially fluoridating the water).

So that is not the realistic estimate, it is the worst-case after double-cherrypicking both the point estimate and the standard error to reverse p-hack a harm. The two most controlled estimates are actually both positive.

(Meanwhile, any claims of decreases, or that one should take the harms 'many times over', is undermined by the other parts like labor income benefiting from fluoridation. Perhaps one should take dental harms more seriously.)
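The interval arithmetic in the quoted passage can be checked directly. The snippet below is only a sketch: the function name is made up for illustration, the 1.96 multiplier is the usual normal approximation for a 95% interval, and the tenfold scaling from the interval bound to the quoted 0.14 is an assumption (it would follow if the coefficient is expressed per 0.1 milligram/liter, which the excerpt does not state).

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Return the (lower, upper) bounds of a normal-approximation interval."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# Worst-case inputs from the quoted passage: the largest negative point
# estimate paired with the largest standard error for that specification.
lower, upper = confidence_interval(-0.0047, 0.0045)
print(f"95% CI: {lower:.3f} to {upper:.3f}")

# ASSUMPTION: the coefficient is per 0.1 mg/L, so a 1 mg/L increase
# scales the bound by 10. The excerpt implies this but does not say it.
scaled_lower = lower * 10
print(f"Worst-case effect per 1 mg/L: {scaled_lower:.2f} SD")
```

Under those assumptions the lower bound rounds to the quoted figure, which is a bound on the worst case rather than a point estimate.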

russellthor on Rolling Thresholds for AGI Scaling Regulation

Yes, I think that's the problem: my biggest worry is sudden algorithmic progress, which becomes almost certain as the AI tends towards superintelligence. An AI lab on the threshold of the overhang is going to have incentives to push through, even if they don't plan to submit their model for approval. At the very least they would "suddenly" have a model that uses 10-100x fewer resources to do existing tasks, giving them a massive commercial lead. They would of course also be tempted to use it internally to solve aging, make a Dyson swarm, and so on.

Another concern is that I expect the regulator to impose a de facto unlimited pause, if it is in their power to do so, as we approach superintelligence, since the model(s) would be objectively at least somewhat dangerous.

martin-randall on In Defense of a Butlerian Jihad

It's not a good, it's a curse. Genesis 3:17-19, CEB translation:

cursed is the fertile land because of you; in pain you will eat from it every day of your life. Weeds and thistles will grow for you, even as you eat the field’s plants; by the sweat of your face you will eat bread— until you return to the fertile land, since from it you were taken; you are soil, to the soil you will return.

Also implies that the curse lasts until death.

evhub on Human takeover might be worse than AI takeover

I think this is correct in alignment-is-easy worlds but incorrect in alignment-is-hard worlds (corresponding to "optimistic scenarios" and "pessimistic scenarios" in Anthropic's Core Views on AI Safety). Logic like this is a large part of why I think there's still substantial existential risk even in alignment-is-easy worlds, especially if we fail to identify that we're in an alignment-is-easy world. My current guess is that if we were to stay exclusively in the pre-training + small amounts of RLHF/CAI paradigm, that would constitute a sufficiently easy world that this view would be correct, but in fact I don't expect us to stay in that paradigm, and I think other paradigms involving substantially more outcome-based RL (e.g. as was used in OpenAI o1) are likely to be much harder, making this view no longer correct.