LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Stop talking about p(doom)
Isaac King (KingSupernova) · 2024-01-01T10:57:28.636Z · comments (22)

Aspiration-based Q-Learning
Clément Dumas (butanium) · 2023-10-27T14:42:03.292Z · comments (5)

The Fundamental Theorem for measurable factor spaces
Matthias G. Mayer (matthias-georg-mayer) · 2023-11-12T19:25:25.583Z · comments (2)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

AI #48: The Talk of Davos
Zvi · 2024-01-25T16:20:26.625Z · comments (9)

[link] Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-16T16:31:34.801Z · comments (2)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)

[link] The consistent guessing problem is easier than the halting problem
jessicata (jessica.liu.taylor) · 2024-05-20T04:02:03.865Z · comments (5)

[link] An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
hugofry · 2024-10-07T08:53:14.658Z · comments (0)

[link] An AI Manhattan Project is Not Inevitable
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-06T16:42:35.920Z · comments (25)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (6)

[link] FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
Tamay · 2024-11-14T06:13:22.042Z · comments (0)

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Josh Levy (josh-levy) · 2024-06-04T15:45:54.399Z · comments (0)

Medical Roundup #2
Zvi · 2024-04-09T13:40:05.908Z · comments (18)

OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)

[link] A High Decoupling Failure
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-14T19:46:09.552Z · comments (5)

[link] I didn't have to avoid you; I was just insecure
Chipmonk · 2024-08-17T16:41:50.237Z · comments (7)

Glitch Token Catalog - (Almost) a Full Clear
Lao Mein (derpherpize) · 2024-09-21T12:22:16.403Z · comments (3)

Free Will and Dodging Anvils: AIXI Off-Policy
Cole Wyeth (Amyr) · 2024-08-29T22:42:24.485Z · comments (12)

AI #66: Oh to Be Less Online
Zvi · 2024-05-30T14:20:03.334Z · comments (6)

[link] Turning 22 in the Pre-Apocalypse
testingthewaters · 2024-08-22T20:28:25.794Z · comments (14)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
cmathw · 2024-04-08T11:14:43.268Z · comments (4)

COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)

[link] Twitter thread on AI takeover scenarios
Richard_Ngo (ricraz) · 2024-07-31T00:24:33.866Z · comments (0)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?
spencerg · 2024-09-22T14:03:22.164Z · comments (2)

My best guess at the important tricks for training 1L SAEs
Arthur Conmy (arthur-conmy) · 2023-12-21T01:59:06.208Z · comments (4)

[question] Is a random box of gas predictable after 20 seconds?
Thomas Kwa (thomas-kwa) · 2024-01-24T23:00:53.184Z · answers+comments (35)

Super-Exponential versus Exponential Growth in Compute Price-Performance
moridinamael · 2023-10-06T16:23:56.714Z · comments (25)

Possible OpenAI's Q* breakthrough and DeepMind's AlphaGo-type systems plus LLMs
Burny · 2023-11-23T03:16:09.358Z · comments (25)

[link] The Hippie Rabbit Hole -Nuggets of Gold in Rivers of Bullshit
Jonathan Moregård (JonathanMoregard) · 2024-01-05T18:27:01.769Z · comments (20)

[link] Neel Nanda on the Mechanistic Interpretability Researcher Mindset
Michaël Trazzi (mtrazzi) · 2023-09-21T19:47:02.745Z · comments (1)

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI
WillPetillo · 2023-12-04T22:58:40.005Z · comments (0)

[link] Dark Skies Book Review
PeterMcCluskey · 2023-12-29T18:28:59.352Z · comments (3)

AI #49: Bioweapon Testing Begins
Zvi · 2024-02-01T15:30:04.690Z · comments (11)

I designed an AI safety course (for a philosophy department)
Eleni Angelou (ea-1) · 2023-09-23T22:03:00.036Z · comments (15)

Striking Implications for Learning Theory, Interpretability — and Safety?
RogerDearnaley (roger-d-1) · 2024-01-05T08:46:58.915Z · comments (4)

Enhancing intelligence by banging your head on the wall
Bezzi · 2023-12-12T21:00:48.584Z · comments (26)

[link] How To Socialize With Psycho(logist)s
Sable · 2023-10-20T11:33:46.066Z · comments (11)

[link] ∀: a story
Richard_Ngo (ricraz) · 2023-12-17T22:42:32.857Z · comments (1)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

avturchin on Quantum Immortality: A Perspective if AI Doomers are Probably Right

Orthogonality between goals and DT makes sense only if I don't have preferences about the type of DT or the outcomes which one of them necessitates.

In the case of QI, orthogonality works if we use QI to earn money or to care about relatives.

However, humans have preferences about existence and non-existence beyond normal money utility. In general, people strongly don't want to die. It means that I have a strong preference that some of my copies survive anyway, even if it is not very useful for some other preferences under some other DT.

Another point is the difference between Quantum suicide and QI. QS is an action, but QI is just a prediction of future observations and because of that it is less affected by decision theories. We can say that those copies of me who survive [high chance of death event] will say that they survived because of QI.

amalthea on D0TheMath's Shortform

You're putting quite a lot of weight on what "mathematicians say". Probably these people just haven't thought very hard about it?

rotatingpaguro on AI #90: The Wall

I somewhat disagree with Tenobrus' commentary about Wolfram.

I watched the full podcast, and my impression was that Wolfram uses a "scientific hat", of which he is well aware of, which comes with a certain ritual and method for looking at new things and learning them. Wolfram is doing the ritual of understanding what Yudkowsky says, which involves picking at the details of everything.

Wolfram often recognizes that maybe he feels like agreeing with something, but "scientifically" he has a duty to pick it apart. I think this has to be understood as a learning process rather than as a state of belief.

nisan on Habryka's Shortform Feed

check out exhibit 13...

omnizoid on The Case For Giving To The Shrimp Welfare Project

As they describe in the report, the philosophical assumptions are mostly inconsequential and assumed for simplicity. The rest of your critique is just describing what they did, not an objection to it. It's not precise and they admit quite high uncertainty, but it's definitely better than alternatives (E.g. neuron counts).

silentbob on The Third Fundamental Question

I'm a bit torn regarding the "predicting how others react to what you say or do, and adjust accordingly" part. On the one hand this is very normal and human and makes sense. It's kind of predictive empathy in a way. On the other hand, thinking so very explicitly about it and trying to steer your behavior in a way so as to get the desired reaction out of another person also feels a bit manipulative and inauthentic. If I knew another person would think that way and plan exactly how they interacted with me, I would find that quite off-putting. But maybe the solution is just "don't overdo it", and/or "only use it in ways the other person would likely consent to" (such as avoiding to accidentally say something hurtful).

habryka4 on OpenAI Email Archives (from Musk v. Altman)

Fixed! That specific response had a very weird thread structure, so makes sense the AI I used got confused. Plausible something else was missing, though I think I've now read through all the original PDFs and didn't see anything new.

satron on Sabotage Evaluations for Frontier Models

I haven't heard of any such corrupt deals with OpenAI or Anthropic concerning governmental oversight over AI technology on the scale that would make me worried. Do you have any links to articles about government employees (who are responsible for oversight) recently signing secret contracts with OpenAI or Anthropic that would prohibit them from giving real feedback on a big enough scale to make it concerning?

satron on Sabotage Evaluations for Frontier Models

I will try to provide another similar analogy. Let's say that a King got tired of his people dying from diseases, so he decided to try a novel method of vaccination.

However, some people were really concerned about that. As far as they were concerned, the default outcome of injecting viruses into the bodies of people is death. And the King wants to vaccinate everyone, so these people create a council of great and worthy thinkers who after thinking for a while come up with a list of reasons why vaccines are going to doom everyone.

However, some other great and worthy thinkers (let's call them "hopefuls") come to the council and give reasons to think that aforementioned reasons are mistaken. Maybe they have done their own research, which seems to vindicate King's plan or at least undermine council's arguments.

And now imagine that King comes down from the castle points his finger at hopefuls' giving arguments for why the arguments proposed by the council is wrong and says "yeah, basically this" and then turns around and goes back to the castle. To me it seems like King's official endorsement of the arguments proposed by hopefuls doesn't really change the ethicality of the situation, as long as King is acting according to hopefuls' plan.

Furthermore, imagine if the one of the hopefuls who come to argue with the council was actually an undercover King. And he gave exactly the same arguments as people before him. This still IMO doesn't change the ethicality of the situation.

chaosmage on What Ketamine Therapy Is Like

First I heard of it was from an anesthesiologist who was very happy with how it is the only way to get to full anesthesia without depressing the patient's heart rate, so for senior patients it was really the only option. In retrospect, his enthusiasm about it does seem suspicious, but we were surrounded by professors and I don't think he was lying.