LessWrong 2.0 Reader

[link] FTX expects to return all customer money; clawbacks may go away
Mikhail Samin (mikhail-samin) · 2024-02-14T03:43:13.218Z · comments (1)
We have promising alignment plans with low taxes
Seth Herd · 2023-11-10T18:51:38.604Z · comments (9)
One True Love
Zvi · 2024-02-09T15:10:05.298Z · comments (7)
[link] On Lies and Liars
Gabriel Alfour (gabriel-alfour-1) · 2023-11-17T17:13:03.726Z · comments (4)
Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”
Tony Wang (tw) · 2023-12-15T11:05:23.256Z · comments (8)
Regrant up to $600,000 to AI safety projects with GiveWiki
Dawn Drescher (Telofy) · 2023-10-28T19:56:06.676Z · comments (1)
AI #63: Introducing Alpha Fold 3
Zvi · 2024-05-09T14:20:03.176Z · comments (2)
More on the Apple Vision Pro
Zvi · 2024-02-13T17:40:05.388Z · comments (5)
AI Safety Strategies Landscape
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-09T17:33:45.853Z · comments (1)
Helpful examples to get a sense of modern automated manipulation
trevor (TrevorWiesinger) · 2023-11-12T20:49:57.422Z · comments (3)
The Consciousness Box
GradualImprovement · 2023-12-11T16:45:08.172Z · comments (22)
Update #2 to "Dominant Assurance Contract Platform": EnsureDone
moyamo · 2023-11-28T18:02:50.367Z · comments (2)
[link] Genocide isn't Decolonization
robotelvis · 2023-10-20T04:14:07.716Z · comments (19)
"Which chains-of-thought was that faster than?"
Emrik (Emrik North) · 2024-05-22T08:21:00.269Z · comments (4)
Love, Reverence, and Life
Elizabeth (pktechgirl) · 2023-12-12T21:49:04.061Z · comments (7)
ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5
VipulNaik · 2023-11-29T18:11:53.252Z · comments (16)
Machine Unlearning Evaluations as Interpretability Benchmarks
NickyP (Nicky) · 2023-10-23T16:33:04.878Z · comments (2)
An illustrative model of backfire risks from pausing AI research
Maxime Riché (maxime-riche) · 2023-11-06T14:30:58.615Z · comments (3)
Disentangling four motivations for acting in accordance with UDT
Julian Stastny · 2023-11-05T21:26:22.514Z · comments (3)
[question] Is AlphaGo actually a consequentialist utility maximizer?
faul_sname · 2023-12-07T12:41:05.132Z · answers+comments (8)
[question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?
lillybaeum · 2023-12-10T17:26:34.206Z · answers+comments (34)
The Cognitive Bootcamp Agreement
Raemon · 2024-10-16T23:24:05.509Z · comments (0)
My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (44)
Proveably Safe Self Driving Cars [Modulo Assumptions]
Davidmanheim · 2024-09-15T13:58:19.472Z · comments (26)
DIY LessWrong Jewelry
Fluffnutt (Pear) · 2024-08-25T21:33:56.173Z · comments (0)
[link] Information dark matter
Logan Kieller (logan-kieller) · 2024-10-01T15:05:41.159Z · comments (4)
[question] How unusual is the fact that there is no AI monopoly?
Viliam · 2024-08-16T20:21:51.012Z · answers+comments (15)
[link] A computational complexity argument for many worlds
jessicata (jessica.liu.taylor) · 2024-08-13T19:35:10.116Z · comments (15)
An argument that consequentialism is incomplete
cousin_it · 2024-10-07T09:45:12.754Z · comments (27)
[link] NAO Updates, Fall 2024
jefftk (jkaufman) · 2024-10-18T00:00:04.142Z · comments (2)
Some of my predictable updates on AI
Aaron_Scher · 2023-10-23T17:24:34.720Z · comments (8)
Is suffering like shit?
KatjaGrace · 2024-05-31T01:20:03.855Z · comments (5)
[link] Why you, personally, should want a larger human population
jasoncrawford · 2024-02-23T19:48:10.526Z · comments (32)
If you are also the worst at politics
lukehmiles (lcmgcd) · 2024-05-26T20:07:49.201Z · comments (8)
Computational Approaches to Pathogen Detection
jefftk (jkaufman) · 2023-11-01T00:30:13.012Z · comments (5)
Monthly Roundup #13: December 2023
Zvi · 2023-12-19T15:10:08.293Z · comments (5)
Being against involuntary death and being open to change are compatible
Andy_McKenzie · 2024-05-27T06:37:27.644Z · comments (5)
Preface to the Sequence on LLM Psychology
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2023-11-07T16:12:07.742Z · comments (0)
0. The Value Change Problem: introduction, overview and motivations
Nora_Ammann · 2023-10-26T14:36:15.466Z · comments (0)
[link] Lying is Cowardice, not Strategy
Connor Leahy (NPCollapse) · 2023-10-24T13:24:25.450Z · comments (73)
[link] OpenAI, DeepMind, Anthropic, etc. should shut down.
Tamsin Leake (carado-1) · 2023-12-17T20:01:22.332Z · comments (48)
Being good at the basics
dominicq · 2023-11-04T14:18:50.976Z · comments (1)
How I build and run behavioral interviews
benkuhn · 2024-02-26T05:50:05.328Z · comments (6)
[link] Talking With People Who Speak to Congressional Staffers about AI risk
Eneasz · 2023-12-14T17:55:50.606Z · comments (0)
5 Reasons Why Governments/Militaries Already Want AI for Information Warfare
trevor (TrevorWiesinger) · 2023-10-30T16:30:38.020Z · comments (0)
[link] How "Pause AI" advocacy could be net harmful
Tamsin Leake (carado-1) · 2023-12-26T16:19:20.724Z · comments (9)
In Defense of Lawyers Playing Their Part
Isaac King (KingSupernova) · 2024-07-01T01:32:58.695Z · comments (9)
An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs
Jan Wehner · 2024-07-14T10:37:21.544Z · comments (4)
[link] Manifund: 2023 in Review
Austin Chen (austin-chen) · 2024-01-18T23:50:13.557Z · comments (0)
Video and transcript of presentation on Scheming AIs
Joe Carlsmith (joekc) · 2024-03-22T15:52:03.311Z · comments (1)