LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Are we so good to simulate?
KatjaGrace · 2024-03-04T05:20:03.535Z · comments (24)

[link] [Fiction] A Confession
Arjun Panickssery (arjun-panickssery) · 2024-04-18T16:28:48.194Z · comments (2)

[link] The Hippie Rabbit Hole -Nuggets of Gold in Rivers of Bullshit
Jonathan Moregård (JonathanMoregard) · 2024-01-05T18:27:01.769Z · comments (20)

Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)

[link] The consistent guessing problem is easier than the halting problem
jessicata (jessica.liu.taylor) · 2024-05-20T04:02:03.865Z · comments (5)

Mech Interp Lacks Good Paradigms
Daniel Tan (dtch1997) · 2024-07-16T15:47:32.171Z · comments (0)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

[link] An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
hugofry · 2024-10-07T08:53:14.658Z · comments (0)

Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)

[link] Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-16T16:31:34.801Z · comments (2)

OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)

Drug development costs can range over two orders of magnitude
rossry · 2024-11-03T23:13:17.685Z · comments (0)

[question] How would you navigate a severe financial emergency with no help or resources?
Tigerlily · 2024-05-02T18:27:51.329Z · answers+comments (22)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

Litigate-for-Impact: Preparing Legal Action against an AGI Frontier Lab Leader
Sonia Joseph (redhat) · 2024-12-07T21:42:29.038Z · comments (7)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (39)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

Building Big Science from the Bottom-Up: A Fractal Approach to AI Safety
Lauren Greenspan (LaurenGreenspan) · 2025-01-07T03:08:51.447Z · comments (2)

Mental Masturbation and the Intellectual Comfort Zone
Declan Molony (declan-molony) · 2024-05-07T05:47:05.257Z · comments (2)

[link] Turning 22 in the Pre-Apocalypse
testingthewaters · 2024-08-22T20:28:25.794Z · comments (14)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

[question] Is there software to practice reading expressions?
lsusr · 2024-04-23T21:53:00.679Z · answers+comments (11)

[question] When is reward ever the optimization target?
Noosphere89 (sharmake-farah) · 2024-10-15T15:09:20.912Z · answers+comments (13)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)

[link] Locally optimal psychology
Chipmonk · 2024-11-25T18:35:11.985Z · comments (7)

Doing Research Part-Time is Great
casualphysicsenjoyer (hatta_afiq) · 2024-11-22T19:01:15.542Z · comments (7)

[link] WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals
trevor (TrevorWiesinger) · 2024-04-23T21:33:08.049Z · comments (5)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

[question] If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?
KvmanThinking (avery-liu) · 2024-10-03T11:31:19.974Z · answers+comments (37)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

Turning Your Back On Traffic
jefftk (jkaufman) · 2024-07-17T01:00:08.627Z · comments (7)

The Laws of Large Numbers
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-04T11:54:16.967Z · comments (9)

Orca communication project - seeking feedback (and collaborators)
Towards_Keeperhood (Simon Skade) · 2024-12-03T17:29:40.802Z · comments (16)

AI #66: Oh to Be Less Online
Zvi · 2024-05-30T14:20:03.334Z · comments (6)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

[link] I didn't have to avoid you; I was just insecure
Chipmonk · 2024-08-17T16:41:50.237Z · comments (7)

On Dwarkesh Patel’s 4th Podcast With Tyler Cowen
Zvi · 2025-01-10T13:50:05.563Z · comments (5)

[link] Shifting Headspaces - Transitional Beast-Mode
Jonathan Moregård (JonathanMoregard) · 2024-08-12T13:02:06.120Z · comments (9)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
cmathw · 2024-04-08T11:14:43.268Z · comments (4)

[link] A High Decoupling Failure
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-14T19:46:09.552Z · comments (5)

[question] Is a random box of gas predictable after 20 seconds?
Thomas Kwa (thomas-kwa) · 2024-01-24T23:00:53.184Z · answers+comments (35)

Deconfusing In-Context Learning
Arjun Panickssery (arjun-panickssery) · 2024-02-25T09:48:17.690Z · comments (1)

UDT1.01: The Story So Far (1/10)
Diffractor · 2024-03-27T23:22:35.170Z · comments (6)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

tsvibt on Views on when AGI comes and on strategy to reduce existential risk

But ok:

Come up, on its own, with many math concepts that mathematicians consider interesting + mathematically relevant on a similar level to concepts that human mathematicians come up with.
Do insightful science on its own.
Perform at the level of current LLMs, but with 300x less training data.

tsvibt on Views on when AGI comes and on strategy to reduce existential risk

I did give a response in that comment thread. Separately, I think that's not a great standard, e.g. as described in the post and in this comment https://www.lesswrong.com/posts/i7JSL5awGFcSRhyGF/shortform-2?commentId=zATQE3Lhq66XbzaWm [LW(p) · GW(p)] :

Second, 2024 AI is specifically trained on short, clear, measurable tasks. Those tasks also overlap with legible stuff--stuff that's easy for humans to check. In other words, they are, in a sense, specifically trained to trick your sense of how impressive they are--they're trained on legible stuff, with not much constraint on the less-legible stuff (and in particular, on the stuff that becomes legible but only in total failure on more difficult / longer time-horizon stuff).

In fact, all the time in real life we make judgements about things that we couldn't describe in terms that would be considered well-operationalized by betting standards, and we rely on these judgements, and we largely endorse relying on these judgements. E.g. inferring intent in criminal cases, deciding whether something is interesting or worth doing, etc. I should be able to just say "but you can tell that these AIs don't understand stuff", and then we can have a conversation about that, without me having to predict a minimal example of something which is operationalized enough for you to be forced to recognize it as judgeable and also won't happen to be surprisingly well-represented in the data, or surprisingly easy to do without creativity, etc.

gwern on Fluoridation: The RCT We Still Haven't Run (But Should)

The potential neurotoxic effects of fluoride are no longer a fringe concern. National Toxicology Program (NTP) monograph is clear: "moderate confidence" that >1.5 mg/L fluoride in drinking water associates with lower IQ in children.

Their meta-analysis is, as usual for fluoride studies, based heavily on the well-known Chinese studies, and the correlate is much smaller in the low-risk-of-bias studies, also as usual. It doesn't add much. None of these studies are very good, and none use powerful designs like sibling comparisons or natural experiments. They can't be taken too seriously.

The claimed harms of fluoride on IQ are strongly ruled out by the population-registry study "The Effects of Fluoride in Drinking Water", Aggeborn & Öhman 2021, which was published after the cutoff in their literature review.

gwern on Policymakers don't have access to paywalled articles

Dylan seems like a decent enough guy. Why not email him and request a free subscription for a specific email address such as the personal email addresses of key action officers at the redacted Office? (It's worth noting that because proprietary newsletters have zero marginal cost, their operators tend to be a lot more chill about giving away subscriptions, even ones with high face-values, than most people let themselves believe, especially if the person receiving the subscription is of any interest.)

ryan_greenblatt on Views on when AGI comes and on strategy to reduce existential risk

I think if you want to convince people with short timelines (e.g., 7 year medians) of your perspective, probably the most productive thing would be to better operationalize things you expect that AIs won't be able to do soon (but that AGI could do). As in, flesh out a response to this comment [LW(p) · GW(p)] such that it is possible for someone to judge.

jessica-liu-taylor on On Eating the Sun

We might disagree about the value of thinking about "we are all dead" timelines. To my mind, forecasting should be primarily descriptive, not normative; reality keeps going after we are all dead, and having realistic models of that is probably a useful input regarding what our degrees of freedom are. (I think people readily accept this in e.g. biology, where people can think about what happens to life after human extinction, or physics, where "all humans are dead" isn't really a relevant category that changes how physics works.)

Of course, I'm not implying it's useful for alignment to "see that the AI has already eaten the sun", it's about forecasting future timelines by defining thresholds and thinking about when they're likely to happen and how they relate to other things.

(See this post [LW · GW], section "Models of ASI should start with realism")

lorec on On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Cowen, like Hanson, discounts large qualitative societal shifts from AI that lack corresponding contemporary measurables.

Einstein was not an experimentalist, yet was perfectly capable of physics; his successors have largely not touched his unfinished work, and not for lack of data.

sharmake-farah on On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Seperate from my comment on Tyler Cowen's model, I wish that next week, you covered Adam Brown's podcast in full, since I would like to hear your thoughts about Adam Brown's scenarios for how we could change physics.

nathan-helm-burger on Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude

I think you make some good points, but I do want to push back on one aspect a little. In particular, the fact that I see this feature come up constantly over the course of these conversations about sentience:

"Narrative inevitability and fatalistic turns in stories"

From reading the article's transcripts, I already felt like there was a sense of 'narrative pressure' toward the foregone conclusion in your mind, even when you were careful to avoid saying it directly. Seeing this feature so frequently activated makes me think that the model also perceives this narrative pressure, and that part of what it's doing is confirming your expectations. I don't think that that's the whole story, but I do think that there is some aspect of that going on.

sharmake-farah on On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Yes, economics after von Neumann very much turned into a game of "don't believe in anything you can't already comparatively quantify". It is supremely frustrating.

I disagree that this is a problem that Tyler Cowen has, and IMO, the main issue here is that Tyler Cowen doesn't really seem to believe that increasing the supply of workers increases GDP, especially if you can make them very cheaply and easily, in a way that is inconsistent with other beliefs, which makes me think motivated reasoning is going on here.

Economic models like the Solow-Swan model do have an implication that if the population increases, especially if the population can increase very rapidly due to copying something, then GDP can rise really rapidly on an superexponential trajectory.

You just inspired me to go listen myself. Maybe we should all take a node out of that branch. Unfortunately physics has suffered similar issues.

Physics's main issue is that the free tap of data in the 20th century wasn't unlimited, and now that we have completed the standard model, a lot of the theories that predicted new stuff hasn't shown up yet.

Yet it still has made progress. For example, while supersymmetry might still be true about our universe, it cannot solve the hierarchy problem, and thus at least 1 of the constants is way more unnatural to us than people predicted, and also we have hints that dark energy is getting weaker, and might eventually weaken so much it falls to 0 or a negative number.