LessWrong 2.0 Reader

How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (19)

My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (20)

[link] On Fables and Nuanced Charts
Niko_McCarty (niko-2) · 2024-09-08T17:09:07.503Z · comments (2)

[link] My Model of Epistemology
adamShimi · 2024-08-31T17:01:45.472Z · comments (0)

Open Problems in AIXI Agent Foundations
Cole Wyeth (Amyr) · 2024-09-12T15:38:59.007Z · comments (2)

Augmenting Statistical Models with Natural Language Parameters
jsteinhardt · 2024-09-20T18:30:10.816Z · comments (0)

[link] My Apartment Art Commission Process
jenn (pixx) · 2024-08-26T18:36:44.363Z · comments (4)

[link] Book review: On the Edge
PeterMcCluskey · 2024-08-30T22:18:39.581Z · comments (0)

DIY LessWrong Jewelry
Fluffnutt (Pear) · 2024-08-25T21:33:56.173Z · comments (0)

My 10-year retrospective on trying SSRIs
Kaj_Sotala · 2024-09-22T20:30:02.483Z · comments (0)

[question] If I wanted to spend WAY more on AI, what would I spend it on?
Logan Zoellner (logan-zoellner) · 2024-09-15T21:24:46.742Z · answers+comments (11)

Proveably Safe Self Driving Cars [Modulo Assumptions]
Davidmanheim · 2024-09-15T13:58:19.472Z · comments (23)

I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?
spencerg · 2024-09-22T14:03:22.164Z · comments (0)

Book Review: What Even Is Gender?
Joey Marcellino · 2024-09-01T16:09:27.773Z · comments (14)

Glitch Token Catalog - (Almost) a Full Clear
Lao Mein (derpherpize) · 2024-09-21T12:22:16.403Z · comments (3)

[link] Epistemic states as a potential benign prior
Tamsin Leake (carado-1) · 2024-08-31T18:26:14.093Z · comments (2)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

AIS terminology proposal: standardize terms for probability ranges
eggsyntax · 2024-08-30T15:43:39.857Z · comments (12)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (4)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

RLHF is the worst possible thing done when facing the alignment problem
tailcalled · 2024-09-19T18:56:27.676Z · comments (10)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

[question] What's the Deal with Logical Uncertainty?
Ape in the coat · 2024-09-16T08:11:43.588Z · answers+comments (21)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

The case for more Alignment Target Analysis (ATA)
Chi Nguyen · 2024-09-20T01:14:41.411Z · comments (11)

Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)

[link] Introduction to Super Powers (for kids!)
Shoshannah Tekofsky (DarkSym) · 2024-09-20T17:17:27.070Z · comments (0)

GPT-3.5 judges can supervise GPT-4o debaters in capability asymmetric debates
Charlie George (charlie-george) · 2024-08-27T20:44:08.683Z · comments (7)

[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (1)

[link] Day Zero Antivirals for Future Pandemics
Niko_McCarty (niko-2) · 2024-08-26T15:18:33.858Z · comments (2)

[link] on Science Beakers and DDT
bhauth · 2024-09-05T03:21:21.382Z · comments (12)

[link] Hyperpolation
Gunnar_Zarncke · 2024-09-15T21:37:00.002Z · comments (4)

August 2024 Time Tracking
jefftk (jkaufman) · 2024-08-24T13:50:04.676Z · comments (0)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (10)

AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
DanielFilan · 2024-08-24T22:30:02.039Z · comments (0)

A necessary Membrane formalism feature
ThomasCederborg · 2024-09-10T21:33:09.508Z · comments (6)

How Often Does Taking Away Options Help?
niplav · 2024-09-21T21:52:40.822Z · comments (4)

My decomposition of the alignment problem
Daniel C (harper-owen) · 2024-09-02T00:21:08.359Z · comments (22)

Simon DeDeo on Explore vs Exploit in Science
Elizabeth (pktechgirl) · 2024-09-10T03:40:08.311Z · comments (0)

[link] To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-19T16:13:55.835Z · comments (1)

Apply to MATS 7.0!
Ryan Kidd (ryankidd44) · 2024-09-21T00:23:49.778Z · comments (0)

Looking for Goal Representations in an RL Agent - Update Post
CatGoddess · 2024-08-28T16:42:19.367Z · comments (0)

Recent comments

zack_m_davis on ASIs will not leave just a little sunlight for Earth

if there's a bunch of superintelligences running around and they don't care about you—no, they will not spare just a little sunlight to keep Earth alive.

Yes, I agree that this conditional statement is obvious. But while we're on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano's arguments (which Carl Shulman "agree[s] with [...] approximately in full" [LW(p) · GW(p)]) that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares's "But Why Would the AI Kill Us?" [LW(p) · GW(p)] and another thread on "Cosmopolitan Values Don't Come Free" [LW(p) · GW(p)].

The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates" [LW · GW]: if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.

(An important caveat: the possibility of superintelligences having human-regarding preferences may or may not be comforting: as a fictional illustration [LW · GW] of some relevant considerations, the Superhappies in "Three Worlds Collide" [LW · GW] cared about the humans to some extent, but not in the specific way [LW · GW] that the humans wanted to be cared for.)

Now, you are on the record stating [LW(p) · GW(p)] that you "sometimes mention the possibility of being stored and sold to aliens a billion years later, which seems to [you] to validly incorporate most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that [you] don't expect Earthlings to think about validly." If that's all you have to say on the matter, fine. (Given the premise of AIs spending some fraction of their resources on human-regarding preferences, I agree that uploads look a lot more efficient than literally saving the physical Earth!)

But you should take into account that if you're strategically dumbing down your public communication in order to avoid topics that you don't trust Earthlings to think about validly—and especially if you have a general policy of systematically ignoring counterarguments that it would be politically inconvenient for you to address [LW · GW]—you should expect that Earthlings who are trying to achieve the map that reflects the territory will correspondingly attach much less weight to your words, because we have to take into account how hard you're trying to epistemically screw us over by filtering the evidence [LW · GW].

No more than Bernard Arnault, having $170 billion, will surely give you $77.

Bernard Arnault has given eight-figure amounts to charity. Someone who reasoned, "Arnault is so rich, surely he'll spare a little for the less fortunate" would in fact end up making a correct prediction about Bernard Arnault's behavior!

Obviously, it would not be valid to conclude "... and therefore superintelligences will, too", because superintelligences and Bernard Arnault are very different things. But you chose the illustrative example! As a matter of local validity [LW · GW], it doesn't seem like a big ask for illustrative examples to in fact illustrate what they purport to.

dkl9 on How harmful is music, really?

I added intention-to-treat statistics in an addendum.

quetzal_rainbow on ASIs will not leave just a little sunlight for Earth

In this analogy, you:every other human::humanity:everything else an AI can care about. Arnault can give money to dying people in Africa (I have no idea who he is as a person, I'm just guessing), but he has no particular reason to give it to you specifically rather than to the most profitable investment or the most efficient charity.

andrewtaneglen on The Other Existential Crisis

Continental moments are great. I feel like that's the end game once we transcend science and analysis.

andrewtaneglen on The Other Existential Crisis

Agree wholeheartedly. I'm trying to describe transcending the realisation of determinism, in the sense of not getting bogged down by it, to then go about living a good human life.

andrewtaneglen on The Other Existential Crisis

Cognitive effort is inevitable. It would take a special kind of 'person who fails the psychopath test', somehow lacking urges/feelings, to be able to switch off completely and fade into nothingness.

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

It does not produce anything close to fully formed scientific papers. Its output is really not better than just prompting o1 yourself. Of course, o1 and even Sonnet and GPT-4 are very impressive, but there is no update to be made after you've played around with that.

(Again) I think this is missing the point that we've now (for the first time, to my knowledge) observed an early demo of the full research workflow being automatable, as flawed as the outputs might be.

lukas-finnveden on Applications of Chaos: Saying No (with Hastings Greer)

and as he admits in the footnote he didn't include in the LW version, in real life, when adequately incentivized to win rather than find excuses involving 'well, chaos theory shows you can't predict ball bounces more than n bounces out', pinball pros learn how to win and rack up high scores despite 'muh chaos'.

I was confused about this part of your comment because the post directly talks about this in the conclusion.

The strategy typically is to catch the ball with the flippers, then to carefully hit the ball so that it takes a particular ramp which scores a lot of points and then returns the ball to the flippers. Professional pinball players try to avoid the parts of the board where the motion is chaotic.

The "off-site footnote" you're referring to seems to just be saying "The result is a pretty boring game. However, some of these ramps release extra balls after you have used them a few times. My guess is that this is the game designer trying to reintroduce chaos to make the game more interesting again." which is just a minor detail. AFAICT pros could score lots of points even without the extra balls.

(I'm leaving this comment here because I was getting confused about whether there had been major edits to the post, since the relevant content is currently in the conclusion and not the footnote. I was digging through the wayback machine and didn't see any major edits. So trying to save other people from the same confusion.)

thane-ruthenis on Another argument against utility-centric alignment paradigms

Mm, there are two somewhat different definitions of what counts as "a natural abstraction":

I would agree that human values are likely a natural abstraction in the sense that if you point an abstraction-learning algorithm at the dataset of modern humans doing things, "human values" and perhaps even "eudaimonia" would fall out as a natural principal component of that dataset's decomposition.
What I wouldn't agree with is that human values are a natural abstraction in the sense that a mind pointed at the dataset of this universe doing things, or at the dataset of animals doing things, or even at the dataset of prehistoric or medieval humans doing things, would learn modern human values.

Let's step back a bit.

Suppose we have a system Alpha and a system Beta, with Beta embedded in Alpha. Alpha starts out with a set of natural abstractions/subsystems. Beta, if it's an embedded agent, learns these abstractions, and then starts executing actions within Alpha that alter its embedding environment. Over the course of that, Beta creates new subsystems, corresponding to new abstractions.

As concrete examples, you can imagine:

The lifeless universe as Alpha (with abstractions like "stars", "gasses", "seas"), and the biosphere as Beta (creating abstractions like "organisms" and "ecosystems" and "predator" and "prey").
The biosphere as Alpha (with abstractions like "food" and "species") and the human civilization as Beta (with abstractions like "luxury" and "love" and "culture").

Notice one important fact: the abstractions Beta creates are not, in general, easy to predict from the abstractions already in Alpha. "A multicellular organism" or "an immune-system virus" do not naturally fall out of descriptions of geological formations and atmospheric conditions. They're highly contingent abstractions, ones that are very sensitive to the exact conditions in which they formed. (Biochemistry, the broad biosphere the system is embedded in...)

Similarly, things like "culture" or "eudaimonia" or "personal identity", the way humans understand them, don't easily fall out of even the abstractions present in the biosphere. They're highly contingent on the particulars of how human minds and bodies are structured, how they exchange information, et cetera.

In particular: humans, despite being dropped into an abstraction-rich environment, did not learn values that just mirror some abstraction present in the environment. We're not wrapper-minds single-mindedly pursuing procreation, or the eradication of predators, or the maximization of the number of stars. Similarly, animals don't learn values like "compress gasses".

What Beta creates are altogether new abstractions defined in terms of complicated mixes of Alpha's abstractions. And if Beta is the sort of system that learns values, it learns values that wildly mix the abstractions present in Beta. These new abstractions are indeed then just some new natural abstractions. But they're not necessarily "simple" in terms of Alpha's abstractions.

And now we come to the question of what values an AGI would learn. I would posit that, on the current ML paradigm, the setup is the basic Alpha-and-Beta setup, with the human civilization being Alpha and the AGI being Beta.

Yes, there are some natural abstractions in Alpha, like "eudaimonia". But to think that the AGI would just naturally latch onto that single natural abstraction, and define its entire value system over it, is analogous to thinking that animals would explicitly optimize for gas-compression, or humans for predator-elimination or procreation.

I instead strongly expect that the story would just repeat. The training process (or whatever process spits out the AGI) would end up creating some extremely specific conditions in which the AGI is learning the values. Its values would then necessarily be some complicated functions over weird mixes of the abstractions-natural-to-the-dataset-it's-trained-on, with their specifics being highly contingent on some invisible-to-us details of that process.

It would not be just "eudaimonia", it'd be some weird nonlinear function of eudaimonia and a random grab-bag of other things, including the "Beta-specific" abstractions that formed within the AGI over the course of training. And the output would not necessarily have anything to do with "eudaimonia" in any recognizable way, the way "avoid predators" is unrecognizable in terms of "rocks" and "aerodynamics", and "human values" are unrecognizable in terms of "avoid predators" or "maximize children".

brendan-long on ASIs will not leave just a little sunlight for Earth

The argument using Bernard Arnault doesn't really work. He (probably) won't give you $77 because if he gave everyone $77, he'd spend a very large portion of his wealth. But we don't need an AI to give us billions of Earths. Just one would be sufficient. Bernard Arnault would probably be willing to spend $77 to prevent the extinction of a (non-threatening) alien species.

(This is not a general-purpose argument against worrying about AI; I just don't think this particular argument works.)