LessWrong 2.0 Reader

Navigating emotions in an uncertain & confusing world
Akash (akash-wasil) · 2023-11-20T18:16:09.492Z · comments (1)

Concrete empirical research projects in mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:07:21.502Z · comments (0)

How I internalized my achievements to better deal with negative feelings
Raymond Koopmanschap · 2024-02-27T15:10:24.149Z · comments (7)

[link] Project ideas: Epistemics
Lukas Finnveden (Lanrian) · 2024-01-05T23:41:23.721Z · comments (4)

[link] Post series on "Liability Law for reducing Existential Risk from AI"
Nora_Ammann · 2024-02-29T04:39:50.557Z · comments (1)

Are humans misaligned with evolution?
TekhneMakre · 2023-10-19T03:14:14.759Z · comments (13)

[link] AI Girlfriends Won't Matter Much
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-23T15:58:30.308Z · comments (22)

Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
Sonia Joseph (redhat) · 2024-03-13T17:09:17.027Z · comments (13)

[link] We Need Major, But Not Radical, FDA Reform
Maxwell Tabarrok (maxwell-tabarrok) · 2024-02-24T16:54:33.061Z · comments (12)

How toy models of ontology changes can be misleading
Stuart_Armstrong · 2023-10-21T21:13:56.384Z · comments (0)

[question] What rationality failure modes are there?
Ulisse Mini (ulisse-mini) · 2024-01-19T09:12:57.924Z · answers+comments (11)

Estimating efficiency improvements in LLM pre-training
Daan · 2024-01-19T19:32:45.124Z · comments (3)

Was Releasing Claude-3 Net-Negative?
Logan Riggs (elriggs) · 2024-03-27T17:41:56.245Z · comments (5)

Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers
Jeffrey Heninger (jeffrey-heninger) · 2024-07-09T16:50:05.776Z · comments (2)

D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset
aphyer · 2024-05-14T03:35:10.586Z · comments (3)

Taking responsibility and partial derivatives
Ruby · 2023-12-31T04:33:51.419Z · comments (1)

Take SCIFs, it’s dangerous to go alone
latterframe · 2024-05-01T08:02:38.067Z · comments (1)

Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders
Evan Anders (evan-anders) · 2024-02-27T02:43:22.446Z · comments (16)

Trust as a bottleneck to growing teams quickly
benkuhn · 2024-07-13T18:00:04.579Z · comments (3)

Deep and obvious points in the gap between your thoughts and your pictures of thought
KatjaGrace · 2024-02-23T07:30:07.461Z · comments (6)

[link] Soviet comedy film recommendations
Nina Panickssery (NinaR) · 2024-06-09T23:40:58.536Z · comments (11)

[link] Surgery Works Well Without The FDA
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-26T13:31:29.968Z · comments (28)

[link] cold aluminum for medicine
bhauth · 2023-12-16T14:38:03.260Z · comments (4)

Case studies on social-welfare-based standards in various industries
HoldenKarnofsky · 2024-06-20T13:33:44.780Z · comments (0)

Housing Roundup #7
Zvi · 2024-03-04T15:00:08.192Z · comments (1)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (30)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (11)

Formalizing the Informal (event invite)
abramdemski · 2024-09-10T19:22:53.564Z · comments (0)

[link] [Paper] Programming Refusal with Conditional Activation Steering
Bruce W. Lee (bruce-lee) · 2024-09-11T20:57:08.714Z · comments (0)

Surviving Seveneves
Yair Halberstadt (yair-halberstadt) · 2024-06-19T13:11:55.414Z · comments (4)

Upgrading the AI Safety Community
trevor (TrevorWiesinger) · 2023-12-16T15:34:26.600Z · comments (9)

(Approximately) Deterministic Natural Latents
johnswentworth · 2024-07-19T23:02:12.306Z · comments (0)

[link] Beyond the Board: Exploring AI Robustness Through Go
AdamGleave · 2024-06-19T16:40:06.594Z · comments (2)

NYT is suing OpenAI&Microsoft for alleged copyright infringement; some quick thoughts
Mikhail Samin (mikhail-samin) · 2023-12-27T18:44:33.976Z · comments (17)

One-shot strategy games?
Raemon · 2024-03-11T00:19:20.480Z · comments (42)

Matrix completion prize results
paulfchristiano · 2023-12-20T15:40:04.281Z · comments (0)

How Emergency Medicine Solves the Alignment Problem
StrivingForLegibility · 2023-12-26T05:24:35.579Z · comments (4)

On plans for a functional society
kave · 2023-12-12T00:07:46.629Z · comments (8)

[Aspiration-based designs] 1. Informal introduction
B Jacobs (Bob Jacobs) · 2024-04-28T13:00:43.268Z · comments (4)

GPT-4o My and Google I/O Day
Zvi · 2024-05-16T17:50:03.040Z · comments (2)

Goals selected from learned knowledge: an alternative to RL alignment
Seth Herd · 2024-01-15T21:52:06.170Z · comments (17)

How ARENA course material gets made
CallumMcDougall (TheMcDouglas) · 2024-07-02T18:04:00.209Z · comments (2)

Notes on control evaluations for safety cases
ryan_greenblatt · 2024-02-28T16:15:17.799Z · comments (0)

The Perils of Professionalism
Screwtape · 2023-11-07T00:07:33.213Z · comments (1)

How to partition teams to move fast? Debating "low-dimensional cuts"
jacobjacob · 2023-10-13T21:43:53.067Z · comments (2)

Notes on Dwarkesh Patel’s Podcast with Sholto Douglas and Trenton Bricken
Zvi · 2024-04-01T19:10:12.193Z · comments (1)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues
aphyer · 2024-06-07T19:02:06.859Z · comments (14)

Estimating effective dimensionality of MNIST models
Arjun Panickssery (arjun-panickssery) · 2023-11-02T14:13:09.012Z · comments (3)

[link] Podcast with Yoshua Bengio on Why AI Labs are “Playing Dice with Humanity’s Future”
garrison · 2024-05-10T17:23:20.436Z · comments (0)

In memory of Louise Glück
Joe Carlsmith (joekc) · 2023-10-15T02:59:42.687Z · comments (1)

Recent comments

zack_m_davis on ASIs will not leave just a little sunlight for Earth

if there's a bunch of superintelligences running around and they don't care about you—no, they will not spare just a little sunlight to keep Earth alive.

Yes, I agree that this conditional statement is obvious. But while we're on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano's arguments (which Carl Shulman "agree[s] with [...] approximately in full" [LW(p) · GW(p)]) that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares's "But Why Would the AI Kill Us?" [LW(p) · GW(p)] and another thread on "Cosmopolitan Values Don't Come Free" [LW(p) · GW(p)].

The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates" [LW · GW]: if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.

(An important caveat: the possibility of superintelligences having human-regarding preferences may or may not be comforting: as a fictional illustration [LW · GW] of some relevant considerations, the Superhappies in "Three Worlds Collide" [LW · GW] cared about the humans to some extent, but not in the specific way [LW · GW] that the humans wanted to be cared for.)

Now, you are on the record stating [LW(p) · GW(p)] that you "sometimes mention the possibility of being stored and sold to aliens a billion years later, which seems to [you] to validly incorporate most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that [you] don't expect Earthlings to think about validly." If that's all you have to say on the matter, fine. (Given the premise of AIs spending some fraction of their resources on human-regarding preferences, I agree that uploads look a lot more efficient than literally saving the physical Earth!)

But you should take into account that if you're strategically dumbing down your public communication in order to avoid topics that you don't trust Earthlings to think about validly—and especially if you have a general policy of systematically ignoring counterarguments that it would be politically inconvenient for you to address [LW · GW]—you should expect that Earthlings who are trying to achieve the map that reflects the territory will correspondingly attach much less weight to your words, because we have to take into account how hard you're trying to epistemically screw us over by filtering the evidence [LW · GW].

No more than Bernard Arnalt, having $170 billion, will surely give you $77.

Bernard Arnault has given eight-figure amounts to charity. Someone who reasoned, "Arnault is so rich, surely he'll spare a little for the less fortunate" would in fact end up making a correct prediction about Bernard Arnault's behavior!

Obviously, it would not be valid to conclude "... and therefore superintelligences will, too", because superintelligences and Bernard Arnault are very different things. But you chose the illustrative example! As a matter of local validity [LW · GW], it doesn't seem like a big ask for illustrative examples to in fact illustrate what they purport to.

dkl9 on How harmful is music, really?

I added intention-to-treat statistics in an addendum.

quetzal_rainbow on ASIs will not leave just a little sunlight for Earth

In this analogy, you : every other human :: humanity : everything else an AI can care about. Arnault can give money to dying people in Africa (I have no idea who he is as a person, I'm just guessing), but he has no particular reason to give it to you specifically rather than to the most profitable investment or the most efficient charity.

andrewtaneglen on The Other Existential Crisis

Continental moments are great. I feel like that's the end game once we transcend science and analysis.

andrewtaneglen on The Other Existential Crisis

Agree wholeheartedly. I'm trying to describe transcending the realisation of determinism, in the sense of not getting bogged down by it, to then go about living a good human life.

andrewtaneglen on The Other Existential Crisis

Cognitive effort is inevitable. It would take a special kind of 'person who fails the psychopath test', somehow lacking urges/feelings, to be able to switch off completely and fade into nothingness.

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

It does not produce anything close to fully formed scientific papers. Its output is really not better than just prompting o1 yourself. Of course, o1 and even Sonnet and GPT-4 are very impressive, but there is no update to be made after you've played around with that.

(Again) I think this is missing the point that we've now (for the first time, to my knowledge) observed an early demo of the full research workflow being automatable, as flawed as the outputs might be.

lukas-finnveden on Applications of Chaos: Saying No (with Hastings Greer)

and as he admits in the footnote he didn't include in the LW version, in real life, when adequately incentivized to win rather than find excuses involving 'well, chaos theory shows you can't predict ball bounces more than n bounces out', pinball pros learn how to win and rack up high scores despite 'muh chaos'.

I was confused about this part of your comment because the post directly talks about this in the conclusion.

The strategy typically is to catch the ball with the flippers, then to carefully hit the ball so that it takes a particular ramp which scores a lot of points and then returns the ball to the flippers. Professional pinball players try to avoid the parts of the board where the motion is chaotic.

The "off-site footnote" you're referring to seems to just be saying "The result is a pretty boring game. However, some of these ramps release extra balls after you have used them a few times. My guess is that this is the game designer trying to reintroduce chaos to make the game more interesting again." which is just a minor detail. AFAICT pros could score lots of points even without the extra balls.

(I'm leaving this comment here because I was getting confused about whether there had been major edits to the post, since the relevant content is currently in the conclusion and not the footnote. I was digging through the wayback machine and didn't see any major edits. So trying to save other people from the same confusion.)

thane-ruthenis on Another argument against utility-centric alignment paradigms

Mm, there are two somewhat different definitions of what counts as "a natural abstraction":

I would agree that human values are likely a natural abstraction in the sense that if you point an abstraction-learning algorithm at the dataset of modern humans doing things, "human values" and perhaps even "eudaimonia" would fall out as a natural principal component of that dataset's decomposition.
What I wouldn't agree with is that human values are a natural abstraction in the sense that a mind pointed at the dataset of this universe doing things, or at the dataset of animals doing things, or even at the dataset of prehistoric or medieval humans doing things, would learn modern human values.

Let's step back a bit.

Suppose we have a system Alpha and a system Beta, with Beta embedded in Alpha. Alpha starts out with a set of natural abstractions/subsystems. Beta, if it's an embedded agent, learns these abstractions, and then starts executing actions within Alpha that alter its embedding environment. Over the course of that, Beta creates new subsystems, corresponding to new abstractions.

As concrete examples, you can imagine:

The lifeless universe as Alpha (with abstractions like "stars", "gasses", "seas"), and the biosphere as Beta (creating abstractions like "organisms" and "ecosystems" and "predator" and "prey").
The biosphere as Alpha (with abstractions like "food" and "species") and the human civilization as Beta (with abstractions like "luxury" and "love" and "culture").

Notice one important fact: the abstractions Beta creates are not, in general, easy to predict from the abstractions already in Alpha. "A multicellular organism" or "an immune-system virus" do not naturally fall out of descriptions of geological formations and atmospheric conditions. They're highly contingent abstractions, ones that are very sensitive to the exact conditions in which they formed. (Biochemistry, the broad biosphere the system is embedded in...)

Similarly, things like "culture" or "eudaimonia" or "personal identity", the way humans understand them, don't easily fall out of even the abstractions present in the biosphere. They're highly contingent on the particulars of how human minds and bodies are structured, how they exchange information, et cetera.

In particular: humans, despite being dropped into an abstraction-rich environment, did not learn values that just mirror some abstraction present in the environment. We're not wrapper-minds single-mindedly pursuing procreation, or the eradication of predators, or the maximization of the number of stars. Similarly, animals don't learn values like "compress gasses".

What Beta creates are altogether new abstractions defined in terms of complicated mixes of Alpha's abstractions. And if Beta is the sort of system that learns values, it learns values that wildly mix the abstractions present in Beta. These new abstractions are indeed then just some new natural abstractions. But they're not necessarily "simple" in terms of Alpha's abstractions.

And now we come to the question of what values an AGI would learn. I would posit that, on the current ML paradigm, the setup is the basic Alpha-and-Beta setup, with the human civilization being Alpha and the AGI being Beta.

Yes, there are some natural abstractions in Alpha, like "eudaimonia". But to think that the AGI would just naturally latch onto that single natural abstraction, and define its entire value system over it, is analogous to thinking that animals would explicitly optimize for gas-compression, or humans for predator-elimination or procreation.

I instead strongly expect that the story would just repeat. The training process (or whatever process spits out the AGI) would end up creating some extremely specific conditions in which the AGI is learning the values. Its values would then necessarily be some complicated functions over weird mixes of the abstractions-natural-to-the-dataset-it's-trained-on, with their specifics being highly contingent on some invisible-to-us details of that process.

It would not be just "eudaimonia", it'd be some weird nonlinear function of eudaimonia and a random grab-bag of other things, including the "Beta-specific" abstractions that formed within the AGI over the course of training. And the output would not necessarily have anything to do with "eudaimonia" in any recognizable way, the way "avoid predators" is unrecognizable in terms of "rocks" and "aerodynamics", and "human values" are unrecognizable in terms of "avoid predators" or "maximize children".

brendan-long on ASIs will not leave just a little sunlight for Earth

The argument using Bernard Arnault doesn't really work. He (probably) won't give you $77 because if he gave everyone $77, he'd spend a very large portion of his wealth. But we don't need an AI to give us billions of Earths. Just one would be sufficient. Bernard Arnault would probably be willing to spend $77 to prevent the extinction of a (non-threatening) alien species.

(This is not a general-purpose argument against worrying about AI; I just don't think this particular argument works.)