LessWrong 2.0 Reader


[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (11)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)
LoganStrohl (BrienneYudkowsky) · 2024-02-24T02:56:31.458Z · comments (1)

[link] Testing for Scheming with Model Deletion
Guive (GAA) · 2025-01-07T01:54:13.550Z · comments (20)

AI #95: o1 Joins the API
Zvi · 2024-12-19T15:10:05.196Z · comments (1)

Read The Sequences As If They Were Written Today
Peter Berggren (peter-berggren) · 2025-01-02T02:51:36.537Z · comments (3)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

[link] Against Nonlinear (Thing Of Things)
tailcalled · 2024-01-18T21:40:00.369Z · comments (18)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (22)

Aligned AI is dual use technology
lc · 2024-01-27T06:50:10.435Z · comments (31)

[link] microwave drilling is impractical
bhauth · 2024-06-12T22:16:00.199Z · comments (19)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (5)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

On the Latest TikTok Bill
Zvi · 2024-03-13T18:50:05.398Z · comments (7)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)

Woods’ new preprint on object permanence
Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z · comments (1)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

o3, Oh My
Zvi · 2024-12-30T14:10:05.144Z · comments (17)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

[link] Policymakers don't have access to paywalled articles
Adam Jones (domdomegg) · 2025-01-05T10:56:11.495Z · comments (7)

Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (64)

AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (8)

Dual Wielding Kindle Scribes
mesaoptimizer · 2024-02-21T17:17:58.743Z · comments (18)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

Gradient Descent on the Human Brain
Jozdien · 2024-04-01T22:39:24.862Z · comments (5)

[link] new chinese stealth aircraft
bhauth · 2025-01-01T00:19:10.644Z · comments (3)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

[link] [EAForum xpost] A breakdown of OpenAI's revenue
dschwarz · 2024-07-10T18:09:20.017Z · comments (5)

The Bitter Lesson for AI Safety Research
adamk · 2024-08-02T18:39:36.884Z · comments (5)

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
James Fox · 2024-07-06T11:34:57.227Z · comments (7)

So What's Up With PUFAs Chemically?
J Bostock (Jemist) · 2024-04-27T13:32:52.159Z · comments (23)

Some Unorthodox Ways To Achieve High GDP Growth
johnswentworth · 2024-08-08T18:58:56.046Z · comments (6)

[link] Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun (dan-braun-1) · 2024-05-17T16:25:02.267Z · comments (10)

[link] Congressional Insider Trading
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-30T13:32:57.264Z · comments (6)

[question] What's the theory of impact for activation vectors?
Chris_Leong · 2024-02-11T07:34:48.536Z · answers+comments (12)

Medical Roundup #1
Zvi · 2024-01-16T20:30:35.802Z · comments (9)

[link] The Alignment Trap: AI Safety as Path to Power
crispweed · 2024-10-29T15:21:26.545Z · comments (17)

Noticing Panic
Cole Wyeth (Amyr) · 2024-02-05T03:45:51.794Z · comments (8)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)

Referendum Mechanics in a Marketplace of Ideas
Martin Sustrik (sustrik) · 2024-08-25T08:30:01.901Z · comments (2)

Voting Results for the 2022 Review
Ben Pace (Benito) · 2024-02-02T20:34:59.768Z · comments (3)

Seeking Collaborators
abramdemski · 2024-11-01T17:13:36.162Z · comments (15)

An Illustrated Summary of "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-14T15:02:44.828Z · comments (2)

AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)

John Schulman leaves OpenAI for Anthropic
Sodium · 2024-08-06T01:23:15.427Z · comments (0)


Recent comments

lorec on On Eating the Sun

Your linked post on The Obliqueness Thesis is curious. You conclude thus:

Obliqueness obviously leaves open the question of just how oblique. It's hard to even formulate a quantitative question here. I'd very intuitively and roughly guess that intelligence and values are 3 degrees off (that is, almost diagonal), but it's unclear what question I am even guessing the answer to. I'll leave formulating and answering the question as an open problem.

I agree, values and beliefs are oblique. The 3 spatial dimensions are also mutually oblique, as per General Relativity. A theory of obliqueness is meaningless if it cannot specify the angles [ I think in a correct general linear algebra, everything would be treated as [at least potentially] oblique to everything else, but that doesn't mean I refuse to ever treat the 3 spatial dimensions as mutually orthogonal ].

As with the 3 spatial dimensions in practical ballistics, with the value dimension and the belief dimension in practical AI alignment, there are domains of discussion where it is appropriate to account for the skew between the dimensions and domains where it is appropriate to simply treat them as orthogonal. Discussions of alignment theory such as the ones in which you seek to insert your Obliqueness thesis, are a domain in which the orthogonality assumption is appropriate. We cannot guess at the skew with any confidence in particular cases, and with respect to any particular pre-chosen utility/valence function term versus any particular belief-state [e.g. "what is the dollar value of a tulip?" x "is there a teapot circling Mars?"], the level of skew is almost certain to be negligible.

Planning for anthropically selected futures, on the other hand, is a domain where the skew between values and beliefs becomes relevant. There is less point reasoning in detail about pessimized futures or others that disagree with our values [such as the ones in which we are dead], no matter how likely or unlikely they might be "in a vacuum", if we're trying to hyperstition our way into futures we like. But this is an esoteric and controversial argument and not actually required to justify why I don't think it's useful to consider [sufficiently] strong AI as "what can eat the sun".

All that's required to justify why I don't think it's useful to consider [sufficiently] strong AI as "what can eat the sun", is that what you propose is a benchmark of capability, or intelligence. Benchmarks of intelligence [say, of bureaucrats or chimps] are not questions of fact. They are social fictions chosen for their usefulness. If we are dead in the vast, vast supermajority of the worlds where the benchmark would otherwise be useful - in this case, the worlds where people deploy an AI that they do not know, ahead of time, if it will or will not be strong enough to eat the Sun - it is not, particularly, a benchmark we should be etching into the wood, from our present standpoint.

tsvibt on Views on when AGI comes and on strategy to reduce existential risk

But like, I wouldn't be surprised if, say, someone trained something that performed comparably to LLMs on a wide variety of benchmarks, using much less "data"... and then when you look into it, you find that what they were doing was taking activations of the LLMs and training the smaller guy on the activations. And I'll be like, come on, that's not the point; you could just as well have "trained" the smaller guy by copy-pasting the weights from the LLM and claimed "trained with 0 data!!". And you'll be like "but we met your criterion!" and I'll just be like "well whatever, it's obviously not relevant to the point I was making, and if you can't see that then why are we even having this conversation". (Or maybe you wouldn't do that, IDK, but this sort of thing--followed by being accused of "moving the goal posts"--is why this question feels frustrating to answer.)

habryka4 on Nathan Helm-Burger's Shortform

Link?

charlie-steiner on johnswentworth's Shortform

One big reason I might expect an AI to do a bad job at alignment research is if it doesn't do a good job (according to humans) of resolving cases where humans are inconsistent or disagree. How do you detect this in string theory research? Part of the reason we know so much about physics is humans aren't that inconsistent about it and don't disagree that much. And if you go to sub-topics where humans do disagree, how do you judge its performance (because 'be very convincing to your operators' is an objective with a different kind of danger)?

Another potential red flag is if the AI gives humans what they ask for even when that's 'dumb' according to some sophisticated understanding of human values. This could definitely show up in string theory research (note when some ideas suggest non-string-theory paradigms might be better, and push back on the humans if the humans try to ignore this), it's just intellectually difficult (maybe easier in loop quantum gravity research heyo gottem) and not as salient without the context of alignment and human values.

viliam on Viliam's Shortform

Robin says that we have less cultural diversity than in the past. I am not sure about that. In the past, we had geographically separated cultures, but within each culture, there wasn't enough space for many subcultures. Today, the cultures are closer, but the subcultures can be larger. A hundred years ago, there would be no such thing as the rationalist community. (Even using the example from Robin's article: it's not like the Amish are living on some distant island.)

I don't understand the argument why colonizing the stars would not fix the problem (of cultural drift leading to low fertility). My worry would be the opposite -- that the future will belong to those who replicate the fastest (and sacrifice everything else for that goal).

ryan_greenblatt on Views on when AGI comes and on strategy to reduce existential risk

(Yeah, you responded, but it felt not that operationalized, and it seemed doable to flesh out as you did.)

nathan-helm-burger on Nathan Helm-Burger's Shortform

GoodFire is now available for use, and it's easy and fun to use! You should check it out if you're interested in studying why LLMs do the things they do!

lorec on You are too dumb to understand insurance

The characters don't live in a world where sharing or smoothing risk is already seen as a consensus-valuable pursuit; thus, they will have to be convinced by other means.

I gave their world weirdly advanced [from our historical perspective] game theory to make it easier for them to talk about the question.

quinces6l on Fluoridation: The RCT We Still Haven't Run (But Should)

Isn't the fact that fluoride in toothpaste and brushing twice daily are common likely to mean there wouldn't be any dental harm from non-fluoridated water? I've not done a deep dive on fluoride, but my rough thinking is: (a) it's possible it has harm; (b) most people use fluoride/xylitol in toothpaste, so the benefits of fluoride in water supplies seem not only negligible but likely non-existent in this day and age.

pablo_stafforini on Actualism, asymmetry and extinction

Yeah, that makes sense, especially if combined with the feature that allows users to disagree with specific parts of the post, as Michael notes. (Though note that the disagree vote is anonymous, whereas disagreeing with a selection is public, so the two aren’t fully comparable.)