LessWrong 2.0 Reader

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (8)

[link] "We know how to build AGI" - Sam Altman
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-06T02:05:05.134Z · comments (5)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19
viking_math · 2024-02-19T01:14:06.772Z · comments (28)

Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (10)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

Do not delete your misaligned AGI.
mako yass (MakoYass) · 2024-03-24T21:37:07.724Z · comments (13)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

Balancing Games
jefftk (jkaufman) · 2024-02-24T14:40:04.237Z · comments (18)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

[link] Linkpost: Surely you can be serious
kave · 2024-07-18T22:18:09.271Z · comments (8)

There Should Be More Alignment-Driven Startups
Vaniver · 2024-05-31T02:05:06.799Z · comments (14)

AI #78: Some Welcome Calm
Zvi · 2024-08-22T14:20:10.812Z · comments (15)

Cognitive Work and AI Safety: A Thermodynamic Perspective
Daniel Murfet (dmurfet) · 2024-12-08T21:42:17.023Z · comments (9)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

A civilization ran by amateurs
Olli Järviniemi (jarviniemi) · 2024-05-30T17:57:32.601Z · comments (7)

Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)

Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (25)

Checking in on Scott's composition image bet with imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)

[link] Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Robert_AIZI · 2024-03-05T13:55:33.483Z · comments (24)

Detect Goodhart and shut down
Jeremy Gillen (jeremy-gillen) · 2025-01-22T18:45:30.910Z · comments (17)

Training AI agents to solve hard problems could lead to Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-11-19T00:10:55.522Z · comments (12)

ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (4)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (7)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

Offering AI safety support calls for ML professionals
Vael Gates · 2024-02-15T23:48:12.797Z · comments (1)

A case for donating to AI risk reduction (including if you work in AI)
tlevin (trevor) · 2024-12-02T19:05:06.658Z · comments (2)

[question] We might be dropping the ball on Autonomous Replication and Adaptation.
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-31T13:49:11.327Z · answers+comments (30)

[link] DeepMind: Evaluating Frontier Models for Dangerous Capabilities
Zach Stein-Perlman · 2024-03-21T03:00:31.599Z · comments (8)

Read The Sequences As If They Were Written Today
Peter Berggren (peter-berggren) · 2025-01-02T02:51:36.537Z · comments (7)

[link] Funding Case: AI Safety Camp 11
Remmelt (remmelt-ellen) · 2024-12-23T08:51:55.255Z · comments (4)

[link] More people getting into AI safety should do a PhD
AdamGleave · 2024-03-14T22:14:48.855Z · comments (24)

A "Bitter Lesson" Approach to Aligning AGI and ASI
RogerDearnaley (roger-d-1) · 2024-07-06T01:23:22.376Z · comments (40)

[link] Results from an Adversarial Collaboration on AI Risk (FRI)
Josh Rosenberg (josh-rosenberg) · 2024-03-11T20:00:24.642Z · comments (3)

[question] What do we know about the AI knowledge and views, especially about existential risk, of the new OpenAI board members?
Zvi · 2024-03-11T14:55:05.128Z · answers+comments (2)

Self-explaining SAE features
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-05T22:20:36.041Z · comments (13)

Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)

0th Person and 1st Person Logic
Adele Lopez (adele-lopez-1) · 2024-03-10T00:56:14.446Z · comments (28)

o3, Oh My
Zvi · 2024-12-30T14:10:05.144Z · comments (17)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

[question] What's with all the bans recently?
[deleted] · 2024-04-04T06:16:49.062Z · answers+comments (83)

Recent comments

archimedes on Vladimir_Nesov's Shortform

Even if it’s the same cost to train, wouldn’t it still be a win if inference is a significant part of your compute budget?

martin-randall on Deception Chess: Game #1

Seems like it should be possible to automate this now by having all five participants be, for example, LLMs with access to chess AIs of various levels.

kaynank on Thread for Sense-Making on Recent Murders and How to Sanely Respond

The same interviewer has now done two more podcasts on Ziz.

With Adrusi:

With @jessicata:

knight-lee on Pick two: concise, comprehensive, or clear rules

Thank you very much for bringing that up. That does look like a clearer warning; somehow I didn't remember it very well.

knight-lee on Pick two: concise, comprehensive, or clear rules

Shadow-banned means that your comments are invisible to others and you aren't told about that fact.

I admit that even if users are told that their comments are invisible, some users might fail to notice. But it can be made very clear; for example, they might have to click through a warning before they see the commenting text area.

rhollerith_dot_com on Mikhail Samin's Shortform

"we're probably doomed in that case anyways, even without increasing alignment research."

I believe we're probably doomed anyways.

"I think even you would agree that P(1) > P(2)"

Sorry to disappoint you, but I do not agree.

Although I don't consider it quite impossible that we will figure out alignment, most of my hope for our survival lies in other things, such as a group taking over the world and then using its power to ban AI research. For example, if Putin or Xi were dictator of the world, my guess is that there is a good chance he would choose to ban all AI research. Why? It has unpredictable consequences. We Westerners (particularly Americans) are comfortable with drastic change, even if that change has drastic unpredictable effects on society; non-Westerners are much more skeptical: there have been too many invasions, revolutions, and peasant rebellions that have killed millions in their countries. I tend to think that the main reason Xi supports China's AI industry is to prevent the US and the West from superseding China, and if that consideration were removed (because, for example, he had gained dictatorial control over the whole world), he'd choose to just shut it down. And he wouldn't feel the need for a very strong argument for shutting it down the way Western decision-makers would: non-Western leaders shut important things down all the time, or at least they would if the governments they led had the funding and the administrative capacity to do so.

Of course Xi's acquiring dictatorial control over the whole world is extremely unlikely, but the magnitude of the technological changes and societal changes that are coming will tend to present opportunities for certain coalitions to acquire enough control over the developed world (or over all sites with leading-edge fabs) to have the power to shut AI research down.

Also, extending our survival by 10 or 20 years allows some process you and I cannot even imagine right now to burst onto the scene and save us. I'm very skeptical of any intervention that reduces the amount of time we have left in the hope that this AI juggernaut is not really as potent a threat to us as it currently appears. I was much, much less skeptical of alignment research 20 years ago, but since then a research organization has been exploring the solution space, and the leaders of that organization are reporting that the alignment project is almost completely hopeless. Yes, this organization (MIRI) is small compared to other organizations, but it has been funded well enough to keep about a dozen top-notch researchers on the payroll, and it has been competently led.

satron on “Sharp Left Turn” discourse: An opinionated review

I am not from the US, so I don't know anything about the organizations that you have listed. However, we can look at three main conventional sources of existential risk (excluding AI for now; we will come back to it later):

Nuclear Warfare - Cooperation strategies + decision theory are active academic fields.
Climate Change - This is a very hot topic right now, and a lot of research is being put into it.
Pandemics - There was quite a bit of research before COVID, but even more now.

As to your point about Hassabis not doing other projects for reducing existential risk besides AI alignment:

Most of the people on this website (at least, that's my impression) have rather short timelines. This means that work on a lot of existential risks becomes much less critical. For example, while supervolcanoes are scary, they are unlikely to wipe us out before we get an AI powerful enough to be able to solve this problem more efficiently than we can.

As to your point about Sam Altman choosing to cure cancer over reducing x-risk:

I doubt Sam looks forward to the prospect of dying from some x-risk. After all, he is just as human as we are, and he, too, would hate the metaphorical eruption of a supervolcano. But that kind of risk only becomes much more critical after we have an aligned and powerful AI on our hands. This also applies to LeCun, Zuckerberg, and random OpenAI researchers. They (or at least most of them) seem to be enjoying their lives and wouldn't want to randomly lose them.

As to your last paragraph:

I think your formulation of the AI's response ("preventing existential risks is very hard and fraught, but hey, what if I do a global mass persuasion campaign") really undersells it. It almost makes it sound like a mass persuasion campaign is a placebo to make the AI's owner feel good.

Consider another response: "The best option for preventing existential risks would be a mass persuasion campaign with such-and-such content, which would have such-and-such effects and would prevent such-and-such risks."

My (naive) reaction (assuming that I think the AI is properly aligned and is not hallucinating) would be to ask more questions about the details of this proposal and, if its answers make sense, to proceed with it. Is that so crazy? I think a persuasion campaign is not necessarily bad in itself (we deal with them every day in all sorts of contexts, ranging from relatives convincing you to take a day off, to people online changing your mind on some topic, to politicians running election campaigns), and it is especially not bad when it is the best option for preventing x-risk.

In the scenario that you described, I am not at all sure that an average person who was able to get their hands on AI (this is an important point, because I believe such people to be generally smarter, more creative, and more resourceful) would opt for the second option, especially after being told: "Well, I could try something much more low-key and norm-following, but it probably won't work."

Option 1: Good odds of preventing x-risk, but apparently also doesn't work in movies.

Option 2: Bad odds of preventing x-risk.

Perhaps the crux is that I think the people who actually have good chances of controlling powerful AI (Sam Altman, Dario Amodei, etc.) would see that option #1 is obviously better if they themselves want to survive (something that is instrumental to whatever other goals they have, like curing cancer).

Furthermore, there is a more sophisticated solution to this problem. I can ask this powerful AI to find the best representation of human preferences and then find the best way to align itself to it. Then this AI would no longer be constrained by my cognitive limitations.

An average person probably wouldn't come up with it by themselves, but they don't need to. Presumably, if this is indeed a good solution, it would be suggested at the first stage, where I ask the AI to find the best solutions for preventing existential risk. Would an average person take this option? I expect our intuitions to diverge here much more strongly than in the previous case. Looking at myself, I was an average person just a few months ago. I have no expertise in AI whatsoever. I don't consider myself to be among the smartest humans on Earth. I haven't read much literature on AI safety. I was basically an exemplary "normie".

And yet, when I read about the idea of an instruction-following AI, I almost immediately came up with this solution of making an AI aligned with human preferences using the aforementioned instruction-following AI. It sidesteps a lot of misuse concerns, which (at the time) seemed scary.

I grant that an average person might not come up with this solution by themselves, but presumably (if this solution is any good), it would be suggested by AI itself when we ask it to come up with solutions for x-risk.

An average person could ask this AI how to ensure that this endeavor won't end up in a scenario like the one in the movie. They could then cross-check its answer with the security team that successfully aligned it.

My conclusion is basically that I would expect a good chunk of average people to opt for a sophisticated solution after the AI suggests it, an even bigger chunk of average people to opt for a naive solution, and an even bigger chunk of the people who would most likely control powerful AI in the future (generally smarter than the average person) to opt for either solution.

sweenesm on Proposal for a Form of Conditional Supplemental Income (CSI) in a Post-Work World

Thanks for the comment! Perhaps I was more specific than needed, but I wanted to give people (and any AIs reading this) some concrete examples. I imagine AIs will someday be able to optimize this idea.

I would love it if our school system changed to include more emotional education, but I'm not optimistic it would do this well right now (due in part to educators not having experience with emotional education themselves). Hopefully AIs will help at some point.

lc on Viliam's Shortform

"Yeah, one possible answer is "don't do anything weird, ever". That is the safe way, on average. No one will bother writing a story about you, because no one would bother reading it."

You laugh, but I really think a group norm of "think for yourself, question the outside world, don't be afraid to be weird" is part of the reason why all of these groups exist. If you tell people that over and over, some of them will turn out to be crazy and conclude that crime and violence are acceptable. I don't know if there's anything to do about that, but it is a thing.

thomas-kwa on OpenAI releases deep research agent

It's not clear whether agents will think in neuralese; maybe end-to-end RL in English is good enough for the next few years, and CoT messages won't drift enough to allow steganography.
Once agents think in either token gibberish or plain vectors, maybe self-monitoring will still work fine. After all, agents can translate between other languages just fine. We can use model organisms or some other clever experiments to check whether the agent faithfully translates its CoT or unavoidably starts lying to us as it gets more capable.
I care about the exact degree to which monitoring gets worse. Plausibly it gets somewhat worse but is still good enough to catch the model before it coups us.