LessWrong 2.0 Reader

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025)
Linda Linsefors · 2024-08-23T14:18:24.327Z · comments (2)

[question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-09-04T12:40:07.678Z · answers+comments (7)

[link] Non-Transactional Compliments
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:42:16.471Z · comments (0)

OpenAI defected, but we can take honest actions
Remmelt (remmelt-ellen) · 2024-10-21T08:41:25.728Z · comments (13)

[question] Is there a CFAR handbook audio option?
FinalFormal2 · 2024-10-26T17:08:36.480Z · answers+comments (0)

Review: Dr Stone
ProgramCrafter (programcrafter) · 2024-09-29T10:35:53.175Z · comments (5)

[link] Why good things often don’t lead to better outcomes
DMMF · 2024-09-19T16:37:07.778Z · comments (1)

Slave Morality: A place for every man and every man in his place
Martin Sustrik (sustrik) · 2024-09-19T04:20:04.491Z · comments (7)

Thinking in 2D
sarahconstantin · 2024-10-20T19:30:05.842Z · comments (0)

[link] My lukewarm take on GLP-1 agonists
George3d6 · 2024-08-26T12:34:27.929Z · comments (0)

Interview with Robert Kralisch on Simulators
WillPetillo · 2024-08-26T05:49:15.543Z · comments (0)

All the Following are Distinct
Gianluca Calcagni (gianluca-calcagni) · 2024-08-02T16:35:51.815Z · comments (3)

Hiring a writer to co-author with me (Spencer Greenberg for ClearerThinking.org)
spencerg · 2024-10-27T17:34:50.479Z · comments (0)

[link] CultFrisbee
Gauraventh (aryangauravyadav) · 2024-08-11T21:36:36.550Z · comments (3)

The Residual Expansion: A Framework for thinking about Transformer Circuits
Daniel Tan (dtch1997) · 2024-08-02T11:04:56.347Z · comments (13)

An information-theoretic study of lying in LLMs
Annah (annah) · 2024-08-02T10:06:39.312Z · comments (0)

Reducing global AI competition through the Commerce Control List and Immigration reform: a dual-pronged approach
Ben Smith (ben-smith) · 2024-09-03T05:28:24.549Z · comments (2)

Physical Therapy Sucks (but have you tried hiding it in some peanut butter?)
Declan Molony (declan-molony) · 2024-09-10T05:54:47.000Z · comments (12)

Appealing to the Public
jefftk (jkaufman) · 2024-10-23T19:00:07.669Z · comments (0)

Join a LessWrong Team for the Unaging System Challenge
Crissman · 2024-10-23T06:01:08.018Z · comments (4)

Simulation-aware causal decision theory: A case for one-boxing in CDT
kongus_bongus · 2024-08-09T18:09:20.013Z · comments (11)

The new UK government's stance on AI safety
Elliot Mckernon (elliot) · 2024-07-31T15:23:59.235Z · comments (0)

[link] Pronouns are Annoying
ymeskhout · 2024-09-18T13:30:04.620Z · comments (21)

Automating LLM Auditing with Developmental Interpretability
htlou · 2024-09-04T15:50:04.337Z · comments (0)

Announcing the Ultimate Jailbreaking Championship
InnerHufflepuff (grayswan) · 2024-09-04T00:35:31.234Z · comments (1)

Are LLMs on the Path to AGI?
Davidmanheim · 2024-08-30T03:14:04.710Z · comments (2)

Pomodoro Method Randomized Self Experiment
niplav · 2024-09-29T21:55:04.740Z · comments (2)

Cat Sustenance Fortification
jefftk (jkaufman) · 2024-07-31T02:30:04.898Z · comments (7)

Inverse Problems In Everyday Life
silentbob · 2024-10-15T11:42:30.276Z · comments (1)

My hopes for YouCongress.com
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-22T03:20:20.939Z · comments (3)

[link] Runner's High On Demand: A Story of Luck & Persistence
Shoshannah Tekofsky (DarkSym) · 2024-09-29T17:15:29.494Z · comments (6)

[question] Any Trump Supporters Want to Dialogue?
k64 · 2024-09-28T19:41:55.370Z · answers+comments (80)

Lab governance reading list
Zach Stein-Perlman · 2024-10-25T18:00:28.346Z · comments (3)

Primary Perceptive Systems
ChristianKl · 2024-08-15T11:26:01.667Z · comments (2)

Funding for work that builds capacity to address risks from transformative AI
abergal · 2024-08-14T23:52:09.922Z · comments (0)

[link] Benefits of Psyllium Dietary Fiber in Particular
Brendan Long (korin43) · 2024-08-28T18:13:23.891Z · comments (7)

The deepest atheist: Sam Altman
Trey Edwin (Paolo Vivaldi) · 2024-10-10T03:27:34.465Z · comments (2)

Humans are (mostly) metarational
Yair Halberstadt (yair-halberstadt) · 2024-10-09T05:51:16.644Z · comments (6)

[link] The Ap Distribution
criticalpoints · 2024-08-24T21:45:35.029Z · comments (3)

[question] Looking to interview AI Safety researchers for a book
jeffreycaruso · 2024-08-24T19:57:33.119Z · answers+comments (0)

[link] AI x Human Flourishing: Introducing the Cosmos Institute
Brendan McCord (brendan-mccord) · 2024-09-05T18:23:32.690Z · comments (5)

Against Explosive Growth
c.trout (ctrout) · 2024-09-04T21:45:03.120Z · comments (1)

Emergence, The Blind Spot of GenAI Interpretability?
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2024-08-10T10:07:53.654Z · comments (7)

[link] Where is the Learn Everything System?
Shoshannah Tekofsky (DarkSym) · 2024-09-27T21:30:16.379Z · comments (8)

[link] Verification methods for international AI agreements
Akash (akash-wasil) · 2024-08-31T14:58:10.986Z · comments (1)

Chevy Bolt Review
jefftk (jkaufman) · 2024-09-26T13:40:05.456Z · comments (2)

[link] AI Safety at the Frontier: Paper Highlights, September '24
gasteigerjo · 2024-10-02T09:49:00.357Z · comments (0)

Something Is Lost When AI Makes Art
utilistrutil · 2024-08-18T22:53:46.951Z · comments (0)

Lifelogging for Alignment & Immortality
Dev.Errata (ethan.roland) · 2024-08-17T23:42:56.699Z · comments (3)

[link] GPT-2 Sometimes Fails at IOI
Ronak_Mehta · 2024-08-14T23:24:39.268Z · comments (0)

Recent comments

vladimir_nesov on The Alignment Trap: AI Safety as Path to Power

The point is that the "controller" of a "controllable AI" is a role that can be filled by an AI and not only by a human or a human institution. AI is going to quickly grow the pie to an extent that makes current industry and economy (controlled by humans) a rounding error, so it seems unlikely that among the entities vying for control over controllable AIs, humans and human institutions are going to be worth mentioning. It's not even about a takeover: Google didn't take over Gambia.

andrew-sauer on Living Metaphorically

Another important one: Height/Altitude is authority. Your boss is "above" you, the king, president or CEO is "at the top", you "climb the corporate ladder".

d0themath on Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

Sorry to give only a surface-level point of feedback, but I think this post would be much, much better if you shortened it significantly. As far as I can tell, pretty much every paragraph is 3x longer than it could be, which makes it a slog to read through.

ryan_greenblatt on How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe's argument here would actually be locally valid if we changed:

a sufficient number of IQ 100 agents with sufficient time can do anything that an IQ 101 agent can do

to:

a sufficient number of IQ 100 agents with sufficient time can do anything that some number of IQ 101 agents can do eventually

We can see why this works when applied to your analogy. If we change:

A sufficient number of 4yo’s could pick up any weight that a 5yo could pick up

to:

A sufficient number of 4yo’s could pick up any weight that some number of 5yo's could pick up

Then we can see where the issue comes in. The problem is that while a team of 4yo's can always beat a single 5yo, there exists some number of 5yo's which can beat any number of 4yo's.

If we fix the local validity issue in Joe's argument like this, it is easier to see where issues might crop up.

sharmake-farah on The Alignment Trap: AI Safety as Path to Power

This honestly depends on the level of control achieved over AI in practice.

I do agree with the claim that there are pretty strong incentives to have AI peacefully take over everything, but this is a long-term incentive. More importantly, if control gets good enough, at least some people would wield control of AI because of AIs wanting to be controlled by humans, combined with AI control strategies being good enough that you might avoid takeover at least in the early regime.

To be clear, in the long run, I expect that an AI will likely (as in 70-85% likely) wield the fruits of control, but I think that humans will at least at first wield the control for a number of years, maybe followed by uploads of humans, like virtual dictators and leaders next in line for control.

julian-stastny on The case for unlearning that removes information from LLM weights

I wonder if the approach from your paper is in some sense too conservative to evaluate whether information has been removed: Suppose I used some magical scalpel and removed all information about Harry Potter from the model.

Then I wouldn't be too surprised if this leaves a giant HP-shaped hole in the model such that, if you then fine-tune on a small amount of HP-related data, suddenly everything falls into place and makes sense to the model again, and this rapidly generalizes.

Maybe fine-tuning robust unlearning requires us to fill in the holes with synthetic data so that this doesn't happen.

julian-stastny on The case for unlearning that removes information from LLM weights

By tamper-resistant fine-tuning, are you referring to this paper by Tamirisa et al.? (That'd be a pretty devastating issue with the whole motivation for their paper since no one actually does anything but use LoRA for fine-tuning open-weight models...)

vladimir_nesov on The Alignment Trap: AI Safety as Path to Power

If your work makes AI systems more controllable, who will ultimately wield that control?

A likely answer is "an AI".

vladimir_nesov on The Alignment Trap: AI Safety as Path to Power

Recent discussions about artificial intelligence safety have focused heavily on ensuring AI systems remain under human control. While this goal seems laudable on its surface, we should carefully examine whether some proposed safety measures could paradoxically enable rather than prevent dangerous concentrations of power.

The aim of avoiding AI takeover that ends poorly for humanity is not about preventing dangerous concentrations of power. Power that is distributed among AIs and not concentrated is entirely compatible with an AI takeover that ends poorly for humanity.

danielfilan on Habryka's Shortform Feed

It looks kinda small to me, someone who uses Firefox on Ubuntu.