LessWrong 2.0 Reader

[link] To Be Born in a Bag
Niko_McCarty (niko-2) · 2024-10-06T17:21:00.605Z · comments (1)

Announcing the PIBBSS Symposium '24!
DusanDNesic · 2024-09-03T11:19:47.568Z · comments (0)

Auto-Enhance: Developing a meta-benchmark to measure LLM agents’ ability to improve other agents
Sam F. Brown (sam-4) · 2024-07-22T12:33:57.656Z · comments (0)

Ten counter-arguments that AI is (not) an existential risk (for now)
Ariel Kwiatkowski (ariel-kwiatkowski) · 2024-08-13T22:35:15.341Z · comments (5)

Avoiding the Bog of Moral Hazard for AI
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-13T21:24:34.137Z · comments (12)

"Real AGI"
Seth Herd · 2024-09-13T14:13:24.124Z · comments (18)

[link] Should Sports Betting Be Banned?
Maxwell Tabarrok (maxwell-tabarrok) · 2024-09-21T14:13:35.404Z · comments (2)

Determining the power of investors over Frontier AI Labs is strategically important to reduce x-risk
Lucie Philippon (lucie-philippon) · 2024-07-25T01:12:20.518Z · comments (7)

[question] How great is the utility of "saving" endangered languages?
SpectrumDT · 2024-08-20T13:14:32.895Z · answers+comments (29)

Finding Deception in Language Models
Esben Kran (esben-kran) · 2024-08-20T09:42:13.060Z · comments (4)

[link] AI existential risk probabilities are too unreliable to inform policy
Oleg Trott (oleg-trott) · 2024-07-28T00:59:59.497Z · comments (5)

A Triple Decker for Elfland
jefftk (jkaufman) · 2024-10-11T01:50:02.332Z · comments (0)

Bryan Johnson and a search for healthy longevity
NancyLebovitz · 2024-07-27T15:28:13.117Z · comments (17)

Rabin's Paradox
Charlie Steiner · 2024-08-14T05:40:25.572Z · comments (40)

Can Large Language Models effectively identify cybersecurity risks?
emile delcourt (emile-delcourt) · 2024-08-30T20:20:21.345Z · comments (0)

My career exploration: Tools for building confidence
lynettebye · 2024-09-13T11:37:55.843Z · comments (0)

"Which Future Mind is Me?" Is a Question of Values
dadadarren · 2024-08-09T18:17:09.884Z · comments (12)

Is Text Watermarking a lost cause?
egor.timatkov · 2024-10-01T16:20:51.113Z · comments (13)

OpenAI Boycott Revisit
Jake Dennie · 2024-07-22T01:44:55.094Z · comments (2)

Training a Sparse Autoencoder in < 30 minutes on 16GB of VRAM using an S3 cache
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-08-24T07:39:00.057Z · comments (0)

[question] Is this voting system strategy proof?
Donald Hobson (donald-hobson) · 2024-09-06T20:44:46.691Z · answers+comments (9)

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025)
Linda Linsefors · 2024-08-23T14:18:24.327Z · comments (2)

[link] AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy Plus, Safety Engineering
Corin Katzke (corin-katzke) · 2024-07-29T17:50:52.454Z · comments (1)

[link] Why Swiss watches and Taylor Swift are AGI-proof
Kevin Kohler (KevinKohler) · 2024-09-05T13:23:27.033Z · comments (11)

[link] Four Levels of Voting Methods
hive · 2024-09-26T18:15:00.565Z · comments (3)

[link] Will we ever run out of new jobs?
Kevin Kohler (KevinKohler) · 2024-08-19T15:04:03.849Z · comments (7)

[question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-09-04T12:40:07.678Z · answers+comments (7)

[link] Instruction Following without Instruction Tuning
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-24T13:49:09.078Z · comments (0)

[link] some questionable space launch guns
bhauth · 2024-10-13T22:52:26.418Z · comments (0)

Initial Experiments Using SAEs to Help Detect AI Generated Text
Aaron_Scher · 2024-07-22T05:16:20.516Z · comments (0)

[link] AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-14T23:23:26.296Z · comments (1)

[link] CultFrisbee
Gauraventh (aryangauravyadav) · 2024-08-11T21:36:36.550Z · comments (3)

[link] Why good things often don’t lead to better outcomes
DMMF · 2024-09-19T16:37:07.778Z · comments (1)

Slave Morality: A place for every man and every man in his place
Martin Sustrik (sustrik) · 2024-09-19T04:20:04.491Z · comments (7)

[link] Non-Transactional Compliments
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:42:16.471Z · comments (0)

[link] Jonothan Gorard: The territory is isomorphic to an equivalence class of its maps
Daniel C (harper-owen) · 2024-09-07T10:04:47.840Z · comments (18)

[link] My lukewarm take on GLP-1 agonists
George3d6 · 2024-08-26T12:34:27.929Z · comments (0)

Interview with Robert Kralisch on Simulators
WillPetillo · 2024-08-26T05:49:15.543Z · comments (0)

Review: Dr Stone
ProgramCrafter (programcrafter) · 2024-09-29T10:35:53.175Z · comments (5)

All the Following are Distinct
Gianluca Calcagni (gianluca-calcagni) · 2024-08-02T16:35:51.815Z · comments (3)

The Residual Expansion: A Framework for thinking about Transformer Circuits
Daniel Tan (dtch1997) · 2024-08-02T11:04:56.347Z · comments (13)

An information-theoretic study of lying in LLMs
Annah (annah) · 2024-08-02T10:06:39.312Z · comments (0)

Reducing global AI competition through the Commerce Control List and Immigration reform: a dual-pronged approach
Ben Smith (ben-smith) · 2024-09-03T05:28:24.549Z · comments (2)

Room Available in Boston Group House
NoSignalNoNoise (AspiringRationalist) · 2024-07-23T02:55:59.602Z · comments (1)

Physical Therapy Sucks (but have you tried hiding it in some peanut butter?)
Declan Molony (declan-molony) · 2024-09-10T05:54:47.000Z · comments (12)

The new UK government's stance on AI safety
Elliot Mckernon (elliot) · 2024-07-31T15:23:59.235Z · comments (0)

Simulation-aware causal decision theory: A case for one-boxing in CDT
kongus_bongus · 2024-08-09T18:09:20.013Z · comments (11)

[link] Pronouns are Annoying
ymeskhout · 2024-09-18T13:30:04.620Z · comments (21)

Why the 2024 election matters, the AI risk case for Harris, & what you can do to help
Alex Lintz (alex-lintz) · 2024-09-24T19:32:46.893Z · comments (4)

Automating LLM Auditing with Developmental Interpretability
htlou · 2024-09-04T15:50:04.337Z · comments (0)

Recent comments

philgoetz on Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong

But there is no possible world with a perfect predictor. The Newcomb paradox hinges on a hypothetical situation with zero probability, so all reasoning based on it is inadmissible.

yams on yams's Shortform

Many MATS scholars go to Anthropic (source: I work there).

Redwood: I’m really not sure, but that could be right.

Sam now works at Anthropic.

Palisade: I’ve done some work for them, I love them, I don’t know that their projects so far inhibit Anthropic (BadLlama, which I’m decently confident was part of the cause for funding them, was pretty squarely targeted at Meta, and is their most impactful work to date by several OOM). In fact, the softer versions of Palisade’s proposal (highlighting misuse risk, their core mission), likely empower Anthropic as seemingly the most transparent lab re misuse risks.

I take the thrust of your comment to be “OP funds safety, do your research”. I work in safety; I know they fund safety.

I also know most safety projects differentially benefit Anthropic (this fact is independent of whether you think differentially benefiting Anthropic is good or bad).

If you can make a stronger case for any of the other dozens of orgs on your list than exists for the few above, I’d love to hear it. I’ve thought about most of them and don’t see it, which is why I asked the question.

Further: the goalpost is not ‘net positive with respect to TAI x-risk.’ It is ‘not plausibly a component of a meta-strategy targeting the development of TAI at Anthropic before other labs.’

Edit: use of the soldier mindset flag above is pretty uncharitable here; I am asking for counter-examples to a hypothesis I’m entertaining. This is the actual opposite of soldier mindset.

matthew-barnett on The Hidden Complexity of Wishes

While the term "outer alignment" wasn’t coined until later to describe the exact issue that I'm talking about, I was using that term purely as a descriptive label for the problem this post clearly highlights, rather than implying that you were using or aware of the term in 2007.

Because I was simply using "outer alignment" in this descriptive sense, I reject the notion that my comment was anachronistic. I used that term as shorthand for the thing I was talking about, which is clearly and obviously portrayed by your post, that's all.

To be very clear: the exact problem I am talking about is the inherent challenge of precisely defining what you want or intend, especially (though not exclusively) in the context of designing a utility function. The difficulty arises because, when the desired outcome is complex, it becomes nearly impossible to perfectly delineate between all potential 'good' scenarios and all possible 'bad' scenarios. This challenge has been a recurring theme in discussions of alignment, as it's considered hard to capture every nuance of what you want in your specification without missing an edge case.

It is frankly frustrating to me that, from my perspective, you seem to have reliably missed the point of what I am trying to convey here.

I only brought up Christiano-style proposals because I thought you were changing the topic to a broader discussion, specifically to ask me what methodologies I had in mind when I made particular points. If you had not asked me "So would you care to spell out what clever methodology you think invalidates what you take to be the larger point of this post -- though of course it has no bearing on the actual point that this post makes?" then I would not have mentioned those things. In any case, none of the things I said about Christiano-style proposals were intended to critique this post's narrow point. I was responding to that particular part of your comment instead.

As far as the actual content of this post, I do not dispute its exact thesis. The post seems to be a parable, not a detailed argument with a clear conclusion. The parable seems interesting to me. It also doesn't seem wrong, in any strict sense. However, I do think that some of the broader conclusions that many people have drawn from the parable seem false, in context. I was responding to the specific way that this post had been applied and interpreted in broader arguments about AI alignment.

My central thesis in regards to this post is simply: the post clearly portrays a specific problem that was later called the "outer alignment" problem by other people. This post portrays this problem as being difficult in a particular way. And I think this portrayal is misleading, even if the literal parable holds up in pure isolation.

lsusr on What's a good book for a technically-minded 11-year old?

Besides abstractapplic's excellent answer:

A Brief History of Time and The Universe in a Nutshell by Stephen Hawking
Ender's Game by Orson Scott Card
Foundation by Isaac Asimov
The Martian by Andy Weir
Paleontology: A Brief History of Life by Ian Tattersall
Richard Feynman's books

radford-neal-1 on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong

Sure. By tweaking your "weights" or other fudge factors, you can get the right answer using any probability you please. But you're not using a generally applicable method that actually tells you what the right answer is. So it's a pointless exercise that sheds no light on how to correctly use probability in real problems.

To see that the probability of Heads is not "either 1/2 or 1/3, depending on what reference class you choose, or how you happen to feel about the problem today", but is instead definitely, no doubt about it, 1/3, consider the following possibility:

Upon wakening, Beauty sees that there is a plate of fresh muffins beside her bed. She recognizes them as coming from a nearby cafe. She knows that they are quite delicious. She also knows that, unfortunately, the person who makes them on Mondays puts in an ingredient that she is allergic to, which causes a bad tummy ache. Muffins made on Tuesday taste the same, but don't cause a tummy ache. She needs to decide whether to eat a muffin, weighing the pleasure of their taste against the possibility of a subsequent tummy ache.

If Beauty thinks the probability of Heads is 1/2, she presumably thinks the probability that it is Monday is (1/2)+(1/2)*(1/2)=3/4, whereas if she thinks the probability of Heads is 1/3, she will think the probability that it is Monday is (1/3)+(1/2)*(2/3)=2/3. Since 3/4 is not equal to 2/3, she may come to a different decision about whether to eat a muffin if she thinks the probability of Heads is 1/2 than if she thinks it is 1/3 (depending on how she weighs the pleasure versus the pain). Her decision should not depend on some arbitrary "reference class", or on what bets she happens to be deciding whether to make at the same time. She needs a real probability. And on various grounds, that probability is 1/3.
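
The arithmetic above can be checked with a short sketch. The function names and the utility numbers (pleasure 7, pain 10) are hypothetical and chosen only for illustration; they do not come from the problem. The sketch assumes, as the comment does, that Heads implies Monday and that given Tails, Monday and Tuesday are equally likely.

```python
from fractions import Fraction


def p_monday(p_heads: Fraction) -> Fraction:
    """P(Monday) implied by a given P(Heads).

    Assumes Heads means it is certainly Monday, and that given Tails
    the awakening is equally likely to be Monday or Tuesday.
    """
    p_tails = 1 - p_heads
    return p_heads + Fraction(1, 2) * p_tails


def should_eat(p_heads: Fraction, pleasure: int, pain: int) -> bool:
    """Eat only if the pleasure outweighs the expected pain.

    The tummy ache occurs only if today is Monday. The pleasure and
    pain values are hypothetical utilities, not part of the problem.
    """
    return pleasure > p_monday(p_heads) * pain


halfer = Fraction(1, 2)
thirder = Fraction(1, 3)

print(p_monday(halfer))   # 3/4
print(p_monday(thirder))  # 2/3

# With these illustrative utilities the two views disagree on the decision.
print(should_eat(halfer, pleasure=7, pain=10))   # False
print(should_eat(thirder, pleasure=7, pain=10))  # True
```

With these made-up utilities, a Beauty who assigns 1/2 to Heads declines the muffin and one who assigns 1/3 eats it, which illustrates the point that the choice of probability can change the decision.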

hector-perez-arenas on Tackling Moloch: How YouCongress Offers a Novel Coordination Mechanism

Users can register now with email/password.

cubefox on Concrete benefits of making predictions

Assigning a low probability that I will do a task in time is a self-fulfilling prophecy. Because the expected utility (probability times utility) is low, the motivation to do the task decreases. Ideally I would never assign probabilities to acts when choosing what to do, and only compare their utilities.

matthew-barnett on The Hidden Complexity of Wishes

Matthew is not disputing this point, as far as I can tell.
Instead, he is trying to critique some version of the "larger argument" (mentioned in the May 2024 update to this post) in which this point plays a role.

I'll confirm that I'm not saying this post's exact thesis is false. This post seems to be largely a parable about a fictional device, rather than an explicit argument with premises and clear conclusions. I'm not saying the parable is wrong. Parables are rarely "wrong" in a strict sense, and I am not disputing this parable's conclusion.

However, I am saying: this parable presumably played some role in the "larger" argument that MIRI has made in the past. What role did it play? Well, I think a good guess is that it portrayed the difficulty of precisely specifying what you want or intend, for example when explicitly designing a utility function. This problem was often alleged to be difficult because, when you want something complex, it's difficult to perfectly delineate potential "good" scenarios and distinguish them from all potential "bad" scenarios.

While the term "outer alignment" was not invented to describe this exact problem until much later, I was using that term purely as descriptive terminology for the problem this post clearly describes, rather than claiming that Eliezer in 2007 was deliberately describing something that he called "outer alignment" at the time. Because my usage of "outer alignment" was merely descriptive in this sense, I reject the idea that my comment was anachronistic.

And again: I am not claiming that this post is inaccurate in isolation. In both my above comment, and in my 2023 post, I merely cited this post as portraying an aspect of the problem that I was talking about, rather than saying something like "this particular post's conclusion is wrong". I think the fact that the post doesn't really have a clear thesis in the first place means that it can't be wrong in a strong sense at all. However, the post was definitely interpreted as explaining some part of why alignment is hard — for a long time by many people — and I was critiquing the particular application of the post to this argument, rather than the post itself in isolation.

cubefox on Darklight's Shortform

What's correlation space, as opposed to probability space?

irenictruth on Why I’m not a Bayesian

I shy away from fuzzy logic because I used it as a formalism to justify my religious beliefs. (In particular, "Possibilistic Logic" allowed me to appear honest to myself—and I'm not sure how much of it was self-deception and how much was just being wrong.)

The critical moment in my deconversion came when I realized that if I was looking for truth, I should reason according to the probabilities of the statements I was evaluating. Thirty minutes later, I had gone from a convinced Christian speaking to others, leading in my local church, and basing my life and career on my beliefs to an atheist who was primarily uncertain about atheism because of self-distrust.

Grounding my beliefs in falsifiable statements and probabilistic-ish models has been a beneficial discipline that forces me to recognize my limits and helps predict the outcomes of my actions. I don't know if I could do the same with fuzzy logic and "reasoning by model."