LessWrong 2.0 Reader

[link] I didn't have to avoid you; I was just insecure
Chipmonk · 2024-08-17T16:41:50.237Z · comments (7)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

[question] Is there software to practice reading expressions?
lsusr · 2024-04-23T21:53:00.679Z · answers+comments (10)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
cmathw · 2024-04-08T11:14:43.268Z · comments (4)

Medical Roundup #2
Zvi · 2024-04-09T13:40:05.908Z · comments (18)

[link] A High Decoupling Failure
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-14T19:46:09.552Z · comments (5)

Thousands of malicious actors on the future of AI misuse
Zershaaneh Qureshi (zershaaneh-qureshi) · 2024-04-01T10:08:42.357Z · comments (0)

Effectively Handling Disagreements - Introducing a New Workshop
Camille Berger (Camille Berger) · 2024-04-15T16:33:50.339Z · comments (2)

[link] [Fiction] A Confession
Arjun Panickssery (arjun-panickssery) · 2024-04-18T16:28:48.194Z · comments (2)

[link] WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals
trevor (TrevorWiesinger) · 2024-04-23T21:33:08.049Z · comments (5)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

[link] Turning 22 in the Pre-Apocalypse
testingthewaters · 2024-08-22T20:28:25.794Z · comments (14)

[link] Twitter thread on AI takeover scenarios
Richard_Ngo (ricraz) · 2024-07-31T00:24:33.866Z · comments (0)

On DeepMind’s Frontier Safety Framework
Zvi · 2024-06-18T13:30:21.154Z · comments (4)

Turning Your Back On Traffic
jefftk (jkaufman) · 2024-07-17T01:00:08.627Z · comments (7)

But Where do the Variables of my Causal Model come from?
Dalcy (Darcy) · 2024-08-09T22:07:57.395Z · comments (1)

Finding the Wisdom to Build Safe AI
Gordon Seidoh Worley (gworley) · 2024-07-04T19:04:16.089Z · comments (10)

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures (Workshop @ EA Hotel!)
Sahil · 2024-11-01T17:24:09.957Z · comments (2)

Mental Masturbation and the Intellectual Comfort Zone
Declan Molony (declan-molony) · 2024-05-07T05:47:05.257Z · comments (2)

(Appetitive, Consummatory) ≈ (RL, reflex)
Steven Byrnes (steve2152) · 2024-06-15T15:57:39.533Z · comments (1)

[link] Shifting Headspaces - Transitional Beast-Mode
Jonathan Moregård (JonathanMoregard) · 2024-08-12T13:02:06.120Z · comments (9)

Monthly Roundup #24: November 2024
Zvi · 2024-11-18T13:20:06.086Z · comments (10)

We’re not as 3-Dimensional as We Think
silentbob · 2024-08-04T14:39:16.799Z · comments (16)

AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)

The Evolution of Humans Was Net-Negative for Human Values
Zack_M_Davis · 2024-04-01T16:01:10.037Z · comments (1)

[question] What are your cruxes for imprecise probabilities / decision rules?
Anthony DiGiovanni (antimonyanthony) · 2024-07-31T15:42:27.057Z · answers+comments (29)

[link] UC Berkeley course on LLMs and ML Safety
Dan H (dan-hendrycks) · 2024-07-09T15:40:00.920Z · comments (1)

Childhood and Education Roundup #5
Zvi · 2024-04-17T13:00:03.015Z · comments (4)

[link] Claude 3 Opus can operate as a Turing machine
Gunnar_Zarncke · 2024-04-17T08:41:57.209Z · comments (2)

[link] Searching for the Root of the Tree of Evil
Ivan Vendrov (ivan-vendrov) · 2024-06-08T17:05:53.950Z · comments (14)

Debate: Is it ethical to work at AI capabilities companies?
Ben Pace (Benito) · 2024-08-14T00:18:38.846Z · comments (21)

Is the Power Grid Sustainable?
jefftk (jkaufman) · 2024-10-26T02:30:06.612Z · comments (38)

[link] Big tech transitions are slow (with implications for AI)
jasoncrawford · 2024-10-24T14:25:06.873Z · comments (16)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)

An anti-inductive sequence
Viliam · 2024-08-14T12:28:54.226Z · comments (10)

List of strategies for mitigating deceptive alignment
joshc (joshua-clymer) · 2023-12-02T05:56:50.867Z · comments (2)

On Dwarkesh’s 3rd Podcast With Tyler Cowen
Zvi · 2024-02-02T19:30:05.974Z · comments (9)

Deeply Cover Car Crashes?
jefftk (jkaufman) · 2023-12-10T22:20:01.133Z · comments (31)

Good job opportunities for helping with the most important century
HoldenKarnofsky · 2024-01-18T17:30:03.332Z · comments (0)

AI #47: Meet the New Year
Zvi · 2024-01-13T16:20:10.519Z · comments (7)

[link] Toki pona FAQ
dkl9 · 2024-03-17T21:44:21.782Z · comments (8)

AI companies' commitments
Zach Stein-Perlman · 2024-05-29T11:00:31.339Z · comments (0)

A Socratic dialogue with my student
lsusr · 2023-12-05T09:31:05.266Z · comments (14)

Please Bet On My Quantified Self Decision Markets
niplav · 2023-12-01T20:07:38.284Z · comments (6)

[link] Scaling laws for dominant assurance contracts
jessicata (jessica.liu.taylor) · 2023-11-28T23:11:07.631Z · comments (5)

Closeness To the Issue (Part 5 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-09T00:36:47.388Z · comments (0)

Drone Wars Endgame
RussellThor · 2024-02-01T02:30:46.161Z · comments (71)

Introduce a Speed Maximum
jefftk (jkaufman) · 2024-01-11T02:50:04.284Z · comments (28)

[link] "Model UN Solutions"
Arjun Panickssery (arjun-panickssery) · 2023-12-08T23:06:33.490Z · comments (5)

Recent comments

viliam on Making a conservative case for alignment

I apologize. I spent some time digging for ancient evidence... and then decided against publishing it.

Short version is that someone said something that was kinda inappropriate back then, and would probably get an instant ban these days, with most people applauding.

richard_kennaway on Reducing x-risk might be actively harmful

"What if keeping humanity alive and flourishing actually risks spreading suffering further and faster—through advanced technologies, colonization of space, or systems we can’t yet foresee? And what if our very efforts to safeguard the future have unintended consequences that exacerbate suffering in ways we can't predict?"

It's up to those future people to solve their own problems. It is enough that we make a future for them to use as they please. Parents must let their children go, or what was the point of making them?

dakara on If we solve alignment, do we die anyway?

Looking more generally, there seem to be a ton of papers that develop sophisticated jailbreak attacks (that succeed against current models). Probably more than I can even list here. Are there any new defense techniques that can protect LLMs against these attacks (since the existing ones are insufficient)?

ejt on 4. Existing Writing on Corrigibility

Yes, that's a good summary. The one thing I'd say is that you can characterize preferences in terms of choices and get useful predictions about what the agent will do in other circumstances if you say something about the objects of preference. See my reply to Lucius above.

ejt on 4. Existing Writing on Corrigibility

Good summary and good points. I agree this is an advantage of truly corrigible agents over merely shutdownable agents. I'm still concerned that CAST training doesn't get us truly corrigible agents with high probability. I think we're better off using IPP training to get shutdownable agents with high probability, and then aiming for full alignment or true corrigibility from there (perhaps by training agents to have preferences between same-length trajectories that deliver full alignment or true corrigibility).

avturchin on Quantum Immortality: A Perspective if AI Doomers are Probably Right

Thanks for your thoughtful answer.

To achieve magic, we need the ability to merge minds, which can be easily done with programs and doesn't require anything quantum. If we merge 21 and 1, both will be in the same red room after awakening. If awakening in the red room means getting 100 USD, and the green room means losing it, then the machine will be profitable from the subjective point of view of the mind which enters it. Or we can just turn off 21 without awakening, in which case we will get 1/3 and 2/3 chances for green and red.

The interesting question here is whether this can be replicated at the quantum level (we know there is a way to get quantum magic in MWI, and it is quantum suicide with money prizes, but I am interested in a more subtle probability shift where all variants remain). If yes, such ability may naturally evolve via quantum Darwinism because it would give an enormous fitness advantage – I will write a separate post about this.

Now the next interesting thing: If I look at the experiment from outside, I will give all three variants 1/3, but from inside it will be 1/4, 1/4, and 1/2. The probability distribution is exactly the same as in Sleeping Beauty, and likely both experiments are isomorphic. In the SB experiment, there are two different ways of "copying": first is the coin and second is awakenings with amnesia, which complicates things.

Identity is indeed confusing. Interestingly, in the art world, path-based identity is used to define identity, that is, the provenance of artworks = history of ownership. Blockchain is also an example of path-based identity. Also, in path-based identity, the Ship of Theseus remains the same.

dakara on If we solve alignment, do we die anyway?

"Perhaps the missing piece is that I think alignment is already solved for LLM agents."

Another concern that I might have is that maybe it only seems like alignment is solved for LLMs. For example, several short papers (this, this, this and this) argue that seemingly secure LLMs may not be as safe as we initially believe. And it appears that they test even our models that are considered to be more secure and still find this issue.

sil-ver on What (if anything) made your p(doom) go down in 2024?

I have my own benchmark of tasks that I think measure general reasoning to decide when I freak out about LLMs, and they haven't improved on them. I was ready to be cautiously optimistic that LLMs can't scale to AGI (and would have reduced my p(doom) slightly) even if they keep scaling my conventional metrics, so the facts that scaling itself also seems to break down and that we're reaching physical limits are both good things.

I'm not particularly more optimistic about alignment working anytime soon, just about very long timelines.

dr_s on Ethical Implications of the Quantum Multiverse

"If what you mean by 'normalize everything' is to only consider the quantum weights (which are finite as mathematical measures) and not the number of worlds, then that seems more a case of ignoring those problems rather than addressing them."

I mean that however many universes are created, they will be created anyway, just as a consequence of time passing. So the number doesn't matter. If your actions e.g. cause misery in 20% of those worlds, then the fraction is all that matters; the worlds will exist anyway, and the total number is not something you're affecting or controlling.

"This third approach is based on the idea that 'worlds' are macroscopic, emergent phenomena created through decoherence (Wallace's book contains a full mathematical treatment of this). This supports both the claim that the number of worlds is indefinite (since it depends on ultimately arbitrary mappings of macroscopic to microscopic states) and the claim that worlds are created through quantum processes (since they are macroscopically indistinguishable before decoherence occurs). My point in the post was that these two claims in combination can avoid the repugnant conclusion via the approach of focusing on the weights."

I honestly don't think decoherence means the worlds are indefinite. I think it means they are an infinite continuum with the cardinality of the reals. Decoherence is just something you observe when you divide system from environment; in reality the Universe should have only a single, always coherent, giant wavefunction.

nc-1 on Social events with plausible deniability

There's a connection to the idea of irony poisoning here, and I do not think it is good for the person in question to pretend to hold extremist views. This issue runs parallel to the fact that it's terrible optics and creates a difficult tension with this website's newfound interest in doing communications/policy/outreach work.