LessWrong 2.0 Reader

Toward A Mathematical Framework for Computation in Superposition
Dmitry Vaintrob (dmitry-vaintrob) · 2024-01-18T21:06:57.040Z · comments (17)
This might be the last AI Safety Camp
Remmelt (remmelt-ellen) · 2024-01-24T09:33:29.438Z · comments (34)
Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (56)
The impossible problem of due process
mingyuan · 2024-01-16T05:18:33.415Z · comments (64)
How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage
orthonormal · 2024-08-06T02:32:41.364Z · comments (25)
Optimistic Assumptions, Longterm Planning, and "Cope"
Raemon · 2024-07-17T22:14:24.090Z · comments (46)
Response to Aschenbrenner's "Situational Awareness"
Rob Bensinger (RobbBB) · 2024-06-06T22:57:11.737Z · comments (27)
Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk
1a3orn · 2023-11-02T18:20:29.569Z · comments (79)
[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (100)
[link] Sam Altman fired from OpenAI
LawrenceC (LawChan) · 2023-11-17T20:42:30.759Z · comments (75)
Feedbackloop-first Rationality
Raemon · 2023-08-07T17:58:56.349Z · comments (65)
What's Going on With OpenAI's Messaging?
ozziegooen · 2024-05-21T02:22:04.171Z · comments (13)
My AI Model Delta Compared To Christiano
johnswentworth · 2024-06-12T18:19:44.768Z · comments (73)
Two easy things that maybe Just Work to improve AI discourse
jacobjacob · 2024-06-08T15:51:18.078Z · comments (35)
The Sun is big, but superintelligences will not spare Earth a little sunlight
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-09-23T03:39:16.243Z · comments (138)
My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (187)
On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (21)
Labs should be explicit about why they are building AGI
peterbarnett · 2023-10-17T21:09:20.711Z · comments (16)
Announcing Timaeus
Jesse Hoogland (jhoogland) · 2023-10-22T11:59:03.938Z · comments (15)
Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu (Marc-Everin Carauleanu) · 2024-07-30T16:22:29.561Z · comments (42)
A basic systems architecture for AI agents that do autonomous research
Buck · 2024-09-23T13:58:27.185Z · comments (12)
OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
JanB (JanBrauner) · 2023-09-28T18:53:58.896Z · comments (38)
[link] Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”
Ricki Heicklen (bayesshammai) · 2024-02-22T23:56:02.318Z · comments (5)
Thinking By The Clock
Screwtape · 2023-11-08T07:40:59.936Z · comments (27)
The other side of the tidal wave
KatjaGrace · 2023-11-03T05:40:05.363Z · comments (85)
[link] Daniel Kahneman has died
DanielFilan · 2024-03-27T15:59:14.517Z · comments (11)
AI as a science, and three obstacles to alignment strategies
So8res · 2023-10-25T21:00:16.003Z · comments (80)
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth · 2024-04-18T00:27:43.451Z · comments (21)
[link] Large Language Models will be Great for Censorship
Ethan Edwards · 2023-08-21T19:03:55.323Z · comments (14)
Humming is not a free $100 bill
Elizabeth (pktechgirl) · 2024-06-06T20:10:02.457Z · comments (6)
[link] OpenAI API base models are not sycophantic, at any size
nostalgebraist · 2023-08-29T00:58:29.007Z · comments (20)
Introducing Alignment Stress-Testing at Anthropic
evhub · 2024-01-12T23:51:25.875Z · comments (23)
There should be more AI safety orgs
Marius Hobbhahn (marius-hobbhahn) · 2023-09-21T14:53:52.779Z · comments (25)
Safety consultations for AI lab employees
Zach Stein-Perlman · 2024-07-27T15:00:27.276Z · comments (4)
A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX
jacobjacob · 2023-09-01T04:03:41.067Z · comments (23)
"Humanity vs. AGI" Will Never Look Like "Humanity vs. AGI" to Humanity
Thane Ruthenis · 2023-12-16T20:08:39.375Z · comments (34)
re: Yudkowsky on biological materials
bhauth · 2023-12-11T13:28:10.639Z · comments (30)
Contra papers claiming superhuman AI forecasting
nikos (followtheargument) · 2024-09-12T18:10:50.582Z · comments (16)
[link] Toward a Broader Conception of Adverse Selection
Ricki Heicklen (bayesshammai) · 2024-03-14T22:40:57.920Z · comments (61)
Every "Every Bay Area House Party" Bay Area House Party
Richard_Ngo (ricraz) · 2024-02-16T18:53:28.567Z · comments (6)
[link] FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern · 2024-04-17T13:54:16.791Z · comments (22)
Skills from a year of Purposeful Rationality Practice
Raemon · 2024-09-18T02:05:58.726Z · comments (18)
WTH is Cerebrolysin, actually?
gsfitzgerald (neuroplume) · 2024-08-06T20:40:53.378Z · comments (23)
[question] Why is o1 so deceptive?
abramdemski · 2024-09-27T17:27:35.439Z · answers+comments (23)
Architects of Our Own Demise: We Should Stop Developing AI Carelessly
Roko · 2023-10-26T00:36:05.126Z · comments (75)
Effective Aspersions: How the Nonlinear Investigation Went Wrong
TracingWoodgrains (tracingwoodgrains) · 2023-12-19T12:00:23.529Z · comments (170)
Timaeus's First Four Months
Jesse Hoogland (jhoogland) · 2024-02-28T17:01:53.437Z · comments (6)
'Empiricism!' as Anti-Epistemology
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-03-14T02:02:59.723Z · comments (90)
Critical review of Christiano's disagreements with Yudkowsky
Vanessa Kosoy (vanessa-kosoy) · 2023-12-27T16:02:50.499Z · comments (40)