LessWrong 2.0 Reader




How I select alignment research projects
Ethan Perez (ethan-perez) · 2024-04-10T04:33:08.092Z · comments (4)
[link] On Fables and Nuanced Charts
Niko_McCarty (niko-2) · 2024-09-08T17:09:07.503Z · comments (2)
Secondary Risk Markets
Vaniver · 2023-12-11T21:52:46.836Z · comments (4)
Open Problems in AIXI Agent Foundations
Cole Wyeth (Amyr) · 2024-09-12T15:38:59.007Z · comments (2)
Live Machinery: An Interface Design Philosophy for Wholesome AI Futures (Workshop @ EA Hotel!)
Sahil · 2024-11-01T17:24:09.957Z · comments (2)
Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy
Joe Rogero · 2024-11-12T23:55:46.770Z · comments (17)
Drug development costs can range over two orders of magnitude
rossry · 2024-11-03T23:13:17.685Z · comments (0)
[link] Inferring the model dimension of API-protected LLMs
Ege Erdil (ege-erdil) · 2024-03-18T06:19:25.974Z · comments (3)
[link] The last era of human mistakes
owencb · 2024-07-24T09:58:42.116Z · comments (2)
AI Safety Strategies Landscape
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-09T17:33:45.853Z · comments (1)
[link] AI governance needs a theory of victory
Corin Katzke (corin-katzke) · 2024-06-21T16:15:46.560Z · comments (6)
Direction of Fit
NicholasKees (nick_kees) · 2023-10-02T12:34:24.385Z · comments (0)
Linear encoding of character-level information in GPT-J token embeddings
mwatkins · 2023-11-10T22:19:14.654Z · comments (4)
The Schumer Report on AI (RTFB)
Zvi · 2024-05-24T15:10:03.122Z · comments (3)
AXRP Episode 33 - RLHF Problems with Scott Emmons
DanielFilan · 2024-06-12T03:30:05.747Z · comments (0)
[link] Romae Industriae
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-19T13:03:31.536Z · comments (2)
[link] math terminology as convolution
bhauth · 2023-10-30T01:05:11.823Z · comments (1)
Reflective consistency, randomized decisions, and the dangers of unrealistic thought experiments
Radford Neal · 2023-12-07T03:33:16.149Z · comments (25)
AI Impacts Survey: December 2023 Edition
Zvi · 2024-01-05T14:40:06.156Z · comments (6)
[link] GPT2, Five Years On
Joel Burget (joel-burget) · 2024-06-05T17:44:17.552Z · comments (0)
Monthly Roundup #12: November 2023
Zvi · 2023-11-14T15:20:06.926Z · comments (5)
Copyright Confrontation #1
Zvi · 2024-01-03T15:50:04.850Z · comments (7)
[link] My Apartment Art Commission Process
jenn (pixx) · 2024-08-26T18:36:44.363Z · comments (4)
CHAI internship applications are open (due Nov 13)
Erik Jenner (ejenner) · 2023-10-26T00:53:49.640Z · comments (0)
Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?
RogerDearnaley (roger-d-1) · 2024-01-11T12:56:29.672Z · comments (4)
[link] The $100B plan with "70% risk of killing us all" w Stephen Fry [video]
Oleg Trott (oleg-trott) · 2024-07-21T20:06:39.615Z · comments (8)
[link] Robin Hanson & Liron Shapira Debate AI X-Risk
Liron · 2024-07-08T21:45:40.609Z · comments (4)
Wireheading and misalignment by composition on NetHack
pierlucadoro · 2023-10-27T17:43:41.727Z · comments (4)
[link] legged robot scaling laws
bhauth · 2024-01-20T05:45:56.632Z · comments (8)
What I Learned (Conclusion To "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-20T21:24:37.464Z · comments (0)
Trying to deconfuse some core AI x-risk problems
habryka (habryka4) · 2023-10-17T18:36:56.189Z · comments (13)
[link] Book review: On the Edge
PeterMcCluskey · 2024-08-30T22:18:39.581Z · comments (0)
AI #56: Blackwell That Ends Well
Zvi · 2024-03-21T12:10:05.412Z · comments (16)
Computational Mechanics Hackathon (June 1 & 2)
Adam Shai (adam-shai) · 2024-05-24T22:18:44.352Z · comments (5)
[link] hydrogen tube transport
bhauth · 2024-04-18T22:47:08.790Z · comments (12)
Intransitive Trust
Screwtape · 2024-05-27T16:55:29.294Z · comments (15)
ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct
25Hour (aaron-kaufman) · 2024-10-05T11:30:11.953Z · comments (2)
Adam Smith Meets AI Doomers
James_Miller · 2024-01-31T15:53:03.070Z · comments (10)
LessWrong: After Dark, a new side of LessWrong
So8res · 2024-04-01T22:44:04.449Z · comments (5)
The Cognitive Bootcamp Agreement
Raemon · 2024-10-16T23:24:05.509Z · comments (0)
[question] If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?
KvmanThinking (avery-liu) · 2024-10-03T11:31:19.974Z · answers+comments (36)
If You Can Climb Up, You Can Climb Down
jefftk (jkaufman) · 2024-07-30T00:00:06.295Z · comments (9)
[link] Why Yudkowsky is wrong about "covalently bonded equivalents of biology"
titotal (lombertini) · 2023-12-06T14:09:15.402Z · comments (40)
Difficulty classes for alignment properties
Jozdien · 2024-02-20T09:08:24.783Z · comments (5)
D&D.Sci (Easy Mode): On The Construction Of Impossible Structures
abstractapplic · 2024-05-17T00:25:42.950Z · comments (12)
Augmenting Statistical Models with Natural Language Parameters
jsteinhardt · 2024-09-20T18:30:10.816Z · comments (0)
[link] Suffering Is Not Pain
jbkjr · 2024-06-18T18:04:43.407Z · comments (45)
How to develop a photographic memory 1/3
PhilosophicalSoul (LiamLaw) · 2023-12-28T13:26:36.669Z · comments (6)
(Maybe) A Bag of Heuristics is All There Is & A Bag of Heuristics is All You Need
Sodium · 2024-10-03T19:11:58.032Z · comments (17)
Effectively Handling Disagreements - Introducing a New Workshop
Camille Berger (Camille Berger) · 2024-04-15T16:33:50.339Z · comments (2)