LessWrong 2.0 Reader

{Book Summary} The Art of Gathering
Tristan Williams (tristan-williams) · 2024-04-16T10:48:41.528Z · comments (0)

Collection (Part 6 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-14T21:37:00.160Z · comments (0)

Childhood and Education Roundup #6: College Edition
Zvi · 2024-06-26T11:40:03.990Z · comments (8)

AI #65: I Spy With My AI
Zvi · 2024-05-23T12:40:02.793Z · comments (7)

Experiments with an alternative method to promote sparsity in sparse autoencoders
Eoin Farrell · 2024-04-15T18:21:48.771Z · comments (7)

Monthly Roundup #19: June 2024
Zvi · 2024-06-25T12:00:03.333Z · comments (9)

Solstice 2023 Roundup
dspeyer · 2023-10-11T23:09:08.252Z · comments (6)

[link] Quick Thoughts on Scaling Monosemanticity
Joel Burget (joel-burget) · 2024-05-23T16:22:48.035Z · comments (1)

[question] What Other Lines of Work are Safe from AI Automation?
RogerDearnaley (roger-d-1) · 2024-07-11T10:01:12.616Z · answers+comments (35)

I played the AI box game as the Gatekeeper — and lost
datawitch · 2024-02-12T18:39:35.777Z · comments (52)

Reading More Each Day: A Simple $35 Tool
aysajan · 2024-07-24T13:54:04.290Z · comments (2)

DIY RLHF: A simple implementation for hands on experience
Mike Vaiana (mike-vaiana) · 2024-07-10T12:07:03.047Z · comments (0)

Ackshually, many worlds is wrong
tailcalled · 2024-04-11T20:23:59.416Z · comments (42)

[link] Conversation Visualizer
ethanmorse · 2023-12-31T01:18:01.424Z · comments (4)

3. Premise three & Conclusion: AI systems can affect value change trajectories & the Value Change Problem
Nora_Ammann · 2023-10-26T14:38:14.916Z · comments (4)

[question] How did you integrate voice-to-text AI into your workflow?
ChristianKl · 2023-11-20T12:01:37.696Z · answers+comments (12)

Deconfusing “ontology” in AI alignment
Dylan Bowman (dylan-bowman) · 2023-11-08T20:03:43.205Z · comments (3)

Updates to Open Phil’s career development and transition funding program
abergal · 2023-12-04T18:10:29.394Z · comments (0)

Cryonics p(success) estimates are only weakly associated with interest in pursuing cryonics in the LW 2023 Survey
Andy_McKenzie · 2024-02-29T14:47:28.613Z · comments (6)

Evaporation of improvements
Viliam · 2024-06-20T18:34:40.969Z · comments (27)

[link] Cellular reprogramming, pneumatic launch systems, and terraforming Mars: Some things I learned about at Foresight Vision Weekend
jasoncrawford · 2024-01-04T19:33:57.887Z · comments (0)

Auditing LMs with counterfactual search: a tool for control and ELK
Jacob Pfau (jacob-pfau) · 2024-02-20T00:02:09.575Z · comments (6)

[link] Memo on some neglected topics
Lukas Finnveden (Lanrian) · 2023-11-11T02:01:55.834Z · comments (2)

AI #64: Feel the Mundane Utility
Zvi · 2024-05-16T15:20:02.956Z · comments (11)

Employee Incentives Make AGI Lab Pauses More Costly
nikola (nikolaisalreadytaken) · 2023-12-22T05:04:15.598Z · comments (12)

Cicadas, Anthropic, and the bilateral alignment problem
kromem · 2024-05-22T11:09:56.469Z · comments (6)

Escaping Skeuomorphism
Stuart Johnson (stuart-johnson) · 2023-12-20T03:51:00.489Z · comments (0)

Heuristics for preventing major life mistakes
SK2 (lunchbox) · 2023-12-20T08:01:09.340Z · comments (2)

Aggregative principles approximate utilitarian principles
Cleo Nardo (strawberry calm) · 2024-06-12T16:27:22.179Z · comments (3)

[link] AI Impacts 2023 Expert Survey on Progress in AI
habryka (habryka4) · 2024-01-05T19:42:17.226Z · comments (1)

Online Dialogues Party — Sunday 5th November
Ben Pace (Benito) · 2023-10-27T02:41:00.506Z · comments (1)

Can quantised autoencoders find and interpret circuits in language models?
charlieoneill (kingchucky211) · 2024-03-24T20:05:50.125Z · comments (4)

An explanation of evil in an organized world
KatjaGrace · 2024-05-02T05:20:06.240Z · comments (9)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (1)

Cheap Whiteboards!
Johannes C. Mayer (johannes-c-mayer) · 2024-08-08T13:52:59.627Z · comments (2)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

The case for more Alignment Target Analysis (ATA)
Chi Nguyen · 2024-09-20T01:14:41.411Z · comments (13)

Uncertainty in all its flavours
Cleo Nardo (strawberry calm) · 2024-01-09T16:21:07.915Z · comments (6)

A Strange ACH Corner Case
jefftk (jkaufman) · 2024-02-10T03:00:05.930Z · comments (2)

[question] Supposing the 1bit LLM paper pans out
O O (o-o) · 2024-02-29T05:31:24.158Z · answers+comments (11)

Probably Not a Ghost Story
George Ingebretsen (george-ingebretsen) · 2024-06-12T22:55:26.264Z · comments (4)

[link] Solving alignment isn't enough for a flourishing future
mic (michael-chen) · 2024-02-02T18:23:00.643Z · comments (0)

[link] Found Paper: "FDT in an evolutionary environment"
the gears to ascension (lahwran) · 2023-11-27T05:27:50.709Z · comments (47)

The economy is mostly newbs (strat predictions)
lukehmiles (lcmgcd) · 2024-02-01T19:15:49.420Z · comments (6)

[link] Lying is Cowardice, not Strategy
Connor Leahy (NPCollapse) · 2023-10-24T13:24:25.450Z · comments (73)

[link] David Burns Thinks Psychotherapy Is a Learnable Skill. Git Gud.
Morpheus · 2024-01-27T13:21:05.068Z · comments (20)

[link] Goodhart's Law Example: Training Verifiers to Solve Math Word Problems
Chris_Leong · 2023-11-25T00:53:26.841Z · comments (2)

[link] AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes
aogara (Aidan O'Gara) · 2024-01-24T19:38:33.461Z · comments (1)

Recent comments

david-matolcsi on You can, in fact, bamboozle an unaligned AI into sparing your life

I state in the post that I agree that the takeover, while the AI stabilizes its position to the degree that it can prevent other AIs from being built, can be very violent, but I don't see how hunting down everyone living in Argentina is an important step in the takeover.

I strongly disagree about Nate's post. I agree that it's good that he debunked some bad arguments, but it's just not true that he is only arguing against ideas that were trying to change how people act right now. He spends long sections on the imagined Interlocutor coming up with false hopes that are not action-relevant in the present, like our friends in the multiverse saving us, us running simulations in the future and punishing the AI for defection, and us asking for half the Universe now in a bargain and then using a fraction of what we got to run simulations for bargaining. These take up like half the essay. My proposal clearly fits in the reference class of arguments Nate debunks; he just doesn't get around to it and spends pages on strictly worse proposals, like one where we don't reward the cooperating AIs in the future simulations but punish the defecting ones.

notfnofn on Grounding self-reference paradoxes in reality

Okay, I read to the end and I'm a little skeptical that you properly understand the incompleteness theorem. Are you aware, for instance, that the incompleteness theorem prohibits us from even proving all first-order statements about the natural numbers using any consistent mathematical framework (regardless of how powerful it is)? And that it was later shown that no consistent mathematical framework can even prove the existence/non-existence of solutions to diophantine equations? The reason I ask is that you brought up the necessity of abstract categories and I'm not really sure what you meant by that. It also seemed that you might be unaware that no mathematical framework can resolve the halting problem for any Turing complete system (this is essentially a tautology). Am I misunderstanding what you meant?

brit-cruise on How I got 4.2M YouTube views w/o making a single video

Hey, I'm from YouTube (Art of the Problem) and working on a new series on economics. I'm thinking about exploring it via an evolutionary lens (making decisions about value). If interested, please reach out.

ryan_greenblatt on You can, in fact, bamboozle an unaligned AI into sparing your life

To be clear, I think the exact scheme in "A proposal for humanity in the future" probably doesn't work as described, because the exact level of payment is wrong and, more minimally, we'll probably be able to come up with a much better approach in the future.

This seemed important to explicitly call out (and it wasn't called out explicitly in the post), though I do think it is reasonable to outline a concrete baseline proposal for how this can work.

In particular, the proposal randomly picks 10 planets per simulation. I think the exact right amount of payment will depend on how many sims/predictions you run and will heavily depend on some of the caveats under "Ways this hope could fail". I think you probably get decent results if the total level of payment is around 1/10 million, with returns to higher aggregate payment etc.

As far as better approaches, I expect that you'll be doing a bunch of stuff more efficient than sims and this will be part of a more general acausal trade operation among other changes.

mitchell_porter on Alexander Gietelink Oldenziel's Shortform

I was expecting an argument like "most of the probability measure for a given program is found in certain embeddings of that program in larger programs". Has anyone bothered to make a quantitative argument, a theorem, or a rigorous conjecture which encapsulates this claim?

habryka4 on You can, in fact, bamboozle an unaligned AI into sparing your life

Yeah, I currently disagree on the competent aliens bailing us out, but I haven't thought super hard about it. It does seem good to think about (though not top priority).

habryka4 on You can, in fact, bamboozle an unaligned AI into sparing your life

I agree that inasmuch as you have an AI that has somehow gotten into a position to guarantee victory, leaving humanity alive might not be that costly (though still too costly to make it worth it IMO), but a lot of the costs come from leaving humanity alive threatening your victory. I.e. not terraforming Earth to colonize the universe is one more year for another hostile AI to be built, or for an asteroid to destroy you, or for something else to disempower you.

Disagree on the critique of Nate's posts. The two posts seem relatively orthogonal to me (and I generally think it's good to have debunkings of bad arguments, even if there are better arguments for a position, and in this particular case due to the multiplier nature of this kind of consideration debunking the bad arguments is indeed qualitatively more important than engaging with the arguments in this post, because the arguments in this post do indeed not end up changing your actions, whereas the arguments Nate argued against were trying to change what people do right now).

nathan-helm-burger on Base LLMs refuse too

If you are doing evals for CBRN capabilities, you are very much in the zone of terrorists killing billions of innocent people. Indeed, that's practically the definition. There's no citation; it's just my personal experience while doing evals that are much too spicy to publish.

Of course, if you're only doing evals for relatively tame proxy skills (e.g. WMDP) then probably you get less of this effect. I don't have a quantification of the rates or specific datasets, just anecdata.

ryan_greenblatt on You can, in fact, bamboozle an unaligned AI into sparing your life

"acausal trade framework rest on the assumption that we are in a (quantum or Tegmark) multiverse"

I think the argument should also go through without simulations and without the multiverse so long as you are a UDT-ish agent with a reasonable prior.

egi on [Completed] The 2024 Petrov Day Scenario

I guess I was not clear enough in defining what I was talking about. While it is possible to stretch the definition of "nuclear world war" to include WW2, and Little Boy and Fat Man were certainly strategic weapons in their time, this is not at all what I meant. I was talking about modern strategic weapons, i.e. MIRVed ICBMs shot from hardened silos or ballistic missile submarines, used by a modern nuclear superpower to defeat a near-peer opponent. I.e. the scenario Petrov faced.

If e.g. the US in Petrov's time had managed to pull off a perfect nuclear first strike (a pretty bold assumption), destroying the entire Soviet and Chinese nuclear triads without any counterstrike at all, the economic (supply chain disruption, Europe and Middle East overrun with refugees...) and political repercussions (everyone thinks the US is run by complete psychopaths) alone would have been enough to ensure in expectation a precipitous drop in quality of life for nearly all US citizens, including generals and politicians. This is true even if the whole nuclear winter idea is complete bunk.