LessWrong 2.0 Reader

Stop talking about p(doom)
Isaac King (KingSupernova) · 2024-01-01T10:57:28.636Z · comments (22)

Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)

[question] How would you navigate a severe financial emergency with no help or resources?
Tigerlily · 2024-05-02T18:27:51.329Z · answers+comments (22)

International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)

The Fundamental Theorem for measurable factor spaces
Matthias G. Mayer (matthias-georg-mayer) · 2023-11-12T19:25:25.583Z · comments (2)

[link] WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals
trevor (TrevorWiesinger) · 2024-04-23T21:33:08.049Z · comments (5)

[link] ∀: a story
Richard_Ngo (ricraz) · 2023-12-17T22:42:32.857Z · comments (1)

[link] A High Decoupling Failure
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-14T19:46:09.552Z · comments (5)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
cmathw · 2024-04-08T11:14:43.268Z · comments (4)

[link] The Hippie Rabbit Hole -Nuggets of Gold in Rivers of Bullshit
Jonathan Moregård (JonathanMoregard) · 2024-01-05T18:27:01.769Z · comments (20)

What is wisdom?
TsviBT · 2023-11-14T02:13:49.681Z · comments (3)

Striking Implications for Learning Theory, Interpretability — and Safety?
RogerDearnaley (roger-d-1) · 2024-01-05T08:46:58.915Z · comments (4)

Review Report of Davidson on Takeoff Speeds (2023)
Trent Kannegieter · 2023-12-22T18:48:55.983Z · comments (11)

Medical Roundup #2
Zvi · 2024-04-09T13:40:05.908Z · comments (18)

Enhancing intelligence by banging your head on the wall
Bezzi · 2023-12-12T21:00:48.584Z · comments (26)

Principles For Product Liability (With Application To AI)
johnswentworth · 2023-12-10T21:27:41.403Z · comments (55)

[link] Dark Skies Book Review
PeterMcCluskey · 2023-12-29T18:28:59.352Z · comments (3)

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI
WillPetillo · 2023-12-04T22:58:40.005Z · comments (0)

[link] [Fiction] A Confession
Arjun Panickssery (arjun-panickssery) · 2024-04-18T16:28:48.194Z · comments (2)

Possible OpenAI's Q* breakthrough and DeepMind's AlphaGo-type systems plus LLMs
Burny · 2023-11-23T03:16:09.358Z · comments (25)

[question] Is there software to practice reading expressions?
lsusr · 2024-04-23T21:53:00.679Z · answers+comments (10)

UDT1.01: The Story So Far (1/10)
Diffractor · 2024-03-27T23:22:35.170Z · comments (6)

Your LLM Judge may be biased
Henry Papadatos (henry) · 2024-03-29T16:39:22.534Z · comments (9)

Thousands of malicious actors on the future of AI misuse
Zershaaneh Qureshi (zershaaneh-qureshi) · 2024-04-01T10:08:42.357Z · comments (0)

Otherness and control in the age of AGI
Joe Carlsmith (joekc) · 2024-01-02T18:15:54.168Z · comments (0)

My best guess at the important tricks for training 1L SAEs
Arthur Conmy (arthur-conmy) · 2023-12-21T01:59:06.208Z · comments (4)

[link] I didn't have to avoid you; I was just insecure
Chipmonk · 2024-08-17T16:41:50.237Z · comments (7)

[link] Twitter thread on AI takeover scenarios
Richard_Ngo (ricraz) · 2024-07-31T00:24:33.866Z · comments (0)

[link] Turning 22 in the Pre-Apocalypse
testingthewaters · 2024-08-22T20:28:25.794Z · comments (14)

Free Will and Dodging Anvils: AIXI Off-Policy
Cole Wyeth (Amyr) · 2024-08-29T22:42:24.485Z · comments (12)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

Turning Your Back On Traffic
jefftk (jkaufman) · 2024-07-17T01:00:08.627Z · comments (7)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

[question] Is a random box of gas predictable after 20 seconds?
Thomas Kwa (thomas-kwa) · 2024-01-24T23:00:53.184Z · answers+comments (35)

Glitch Token Catalog - (Almost) a Full Clear
Lao Mein (derpherpize) · 2024-09-21T12:22:16.403Z · comments (3)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

The Defence production act and AI policy
[deleted] · 2024-03-01T14:26:09.064Z · comments (0)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?
spencerg · 2024-09-22T14:03:22.164Z · comments (2)

Deconfusing In-Context Learning
Arjun Panickssery (arjun-panickssery) · 2024-02-25T09:48:17.690Z · comments (1)

On DeepMind’s Frontier Safety Framework
Zvi · 2024-06-18T13:30:21.154Z · comments (4)

AI #49: Bioweapon Testing Begins
Zvi · 2024-02-01T15:30:04.690Z · comments (11)

AI #66: Oh to Be Less Online
Zvi · 2024-05-30T14:20:03.334Z · comments (6)

OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)

AI companies' commitments
Zach Stein-Perlman · 2024-05-29T11:00:31.339Z · comments (0)

(Appetitive, Consummatory) ≈ (RL, reflex)
Steven Byrnes (steve2152) · 2024-06-15T15:57:39.533Z · comments (1)

[link] Searching for the Root of the Tree of Evil
Ivan Vendrov (ivan-vendrov) · 2024-06-08T17:05:53.950Z · comments (14)

Recent comments

satron on AI Control: Improving Safety Despite Intentional Subversion

What about the following red team strategy:

Give an honest assessment of the suspicion level, unless there is a very well hidden backdoor, then give a low score. Also, only create backdoors if it is possible to hide them well.

Wouldn't this defeat the self-checking strategy?

vladimir_nesov on Quantum Immortality: A Perspective if AI Doomers are Probably Right

Having preferences is very different from knowing them. There's always a process of reflection that refines preferences, so any current guess is always wrong at least in detail. For a decision theory to have a shot at normativity, it needs to be able to adapt to corrections and ideally anticipate their inevitability (not locking in the older guess and preventing further reflection; instead facilitating further reflection and being corrigible).

Orthogonality asks the domain of applicability to be wide enough that both various initial guesses and longer term refinements to them won't fall out of scope. When a theory makes assumptions about value content, that makes it a moral theory rather than a decision theory. A moral theory explores particular guesses about preferences of some nature.

So in the way you use the term, quantum immortality seems to be a moral theory, involving claims that quantum suicide can be a good idea. For example "use QI to earn money" is a recommendation that depends on this assumption about preferences (of at least some people in some situations).

ustice on Ayn Rand’s model of “living money”; and an upside of burnout

What’s the payout of this model? I’m highly skeptical of any metaphor from Ayn Rand, so drawing comparisons to her ideas doesn’t add any insight for me. If I’m just not that target audience, that’s cool.

avturchin on Quantum Immortality: A Perspective if AI Doomers are Probably Right

Orthogonality between goals and DT makes sense only if I don't have preferences about the type of DT or the outcomes which one of them necessitates.

In the case of QI, orthogonality works if we use QI to earn money or to care about relatives.

However, humans have preferences about existence and non-existence beyond normal money utility. In general, people strongly don't want to die. It means that I have a strong preference that some of my copies survive anyway, even if it is not very useful for some other preferences under some other DT.

Another point is the difference between Quantum suicide and QI. QS is an action, but QI is just a prediction of future observations and because of that it is less affected by decision theories. We can say that those copies of me who survive [high chance of death event] will say that they survived because of QI.

amalthea on D0TheMath's Shortform

You're putting quite a lot of weight on what "mathematicians say". Probably these people just haven't thought very hard about it?

rotatingpaguro on AI #90: The Wall

I somewhat disagree with Tenobrus' commentary about Wolfram.

I watched the full podcast, and my impression was that Wolfram uses a "scientific hat", of which he is well aware, which comes with a certain ritual and method for looking at new things and learning them. Wolfram is doing the ritual of understanding what Yudkowsky says, which involves picking at the details of everything.

Wolfram often recognizes that maybe he feels like agreeing with something, but "scientifically" he has a duty to pick it apart. I think this has to be understood as a learning process rather than as a state of belief.

nisan on Habryka's Shortform Feed

check out exhibit 13...

omnizoid on The Case For Giving To The Shrimp Welfare Project

As they describe in the report, the philosophical assumptions are mostly inconsequential and assumed for simplicity. The rest of your critique is just describing what they did, not an objection to it. It's not precise and they admit quite high uncertainty, but it's definitely better than alternatives (e.g. neuron counts).

silentbob on The Third Fundamental Question

I'm a bit torn regarding the "predicting how others react to what you say or do, and adjust accordingly" part. On the one hand this is very normal and human and makes sense. It's kind of predictive empathy in a way. On the other hand, thinking so very explicitly about it and trying to steer your behavior in a way so as to get the desired reaction out of another person also feels a bit manipulative and inauthentic. If I knew another person would think that way and plan exactly how they interacted with me, I would find that quite off-putting. But maybe the solution is just "don't overdo it", and/or "only use it in ways the other person would likely consent to" (such as avoiding accidentally saying something hurtful).

habryka4 on OpenAI Email Archives (from Musk v. Altman)

Fixed! That specific response had a very weird thread structure, so makes sense the AI I used got confused. Plausible something else was missing, though I think I've now read through all the original PDFs and didn't see anything new.