LessWrong 2.0 Reader

Estimating Tail Risk in Neural Networks
Mark Xu (mark-xu) · 2024-09-13T20:00:06.921Z · comments (9)

EIS XIV: Is mechanistic interpretability about to be practically useful?
scasper · 2024-10-11T22:13:51.033Z · comments (4)

[link] Open Source Automated Interpretability for Sparse Autoencoder Features
kh4dien · 2024-07-30T21:11:36.866Z · comments (1)

[link] The economics of space tethers
harsimony · 2024-08-22T16:15:22.699Z · comments (22)

minutes from a human-alignment meeting
bhauth · 2024-05-24T05:01:53.904Z · comments (4)

Timaeus is hiring!
Jesse Hoogland (jhoogland) · 2024-07-12T23:42:28.651Z · comments (6)

[link] The 2nd Demographic Transition
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-06T14:10:13.095Z · comments (17)

Ophiology (or, how the Mamba architecture works)
Danielle Ensign (phylliida-dev) · 2024-04-09T19:31:09.975Z · comments (8)

"Fractal Strategy" workshop report
Raemon · 2024-04-06T21:26:53.263Z · comments (23)

Friendship is transactional, unconditional friendship is insurance
Ruby · 2024-07-17T22:52:41.967Z · comments (24)

SB 1047 Is Weakened
Zvi · 2024-06-06T13:40:41.547Z · comments (4)

AE Studio @ SXSW: We need more AI consciousness research (and further resources)
AE Studio (AEStudio) · 2024-03-26T20:59:09.129Z · comments (8)

What is "True Love"?
johnswentworth · 2024-08-18T16:05:47.358Z · comments (11)

What and Why: Developmental Interpretability of Reinforcement Learning
Garrett Baker (D0TheMath) · 2024-07-09T14:09:40.649Z · comments (4)

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Joar Skalse (Logical_Lunatic) · 2024-05-17T19:13:31.380Z · comments (10)

[link] Anthropic leadership conversation
Zach Stein-Perlman · 2024-12-20T22:00:45.229Z · comments (17)

Introducing AI-Powered Audiobooks of Rational Fiction Classics
Askwho · 2024-05-04T17:32:49.719Z · comments (14)

o1-preview is pretty good at doing ML on an unknown dataset
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-09-20T08:39:49.927Z · comments (1)

How to be an amateur polyglot
arisAlexis (arisalexis) · 2024-05-08T15:08:11.404Z · comments (16)

Please do not use AI to write for you
Richard_Kennaway · 2024-08-21T09:53:34.425Z · comments (34)

SAEs (usually) Transfer Between Base and Chat Models
Connor Kissane (ckkissane) · 2024-07-18T10:29:46.138Z · comments (0)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

[question] Will quantum randomness affect the 2028 election?
Thomas Kwa (thomas-kwa) · 2024-01-24T22:54:30.800Z · answers+comments (52)

The Third Fundamental Question
Screwtape · 2024-11-15T04:01:33.770Z · comments (7)

[link] On Shifgrethor
JustisMills · 2024-10-27T15:30:13.688Z · comments (18)

[link] Most experts believe COVID-19 was probably not a lab leak
DanielFilan · 2024-02-02T19:28:00.319Z · comments (89)

AI #69: Nice
Zvi · 2024-06-20T12:40:02.566Z · comments (9)

How a chip is designed
YM (Yannick_Muehlhaeuser_duplicate0.05902100825326273) · 2024-06-28T08:04:27.392Z · comments (4)

[link] Learn to write well BEFORE you have something worth saying
eukaryote · 2024-12-29T23:42:31.906Z · comments (18)

Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours
Seth Herd · 2024-08-05T15:38:09.682Z · comments (22)

[link] Drexler's Nanotech Software
PeterMcCluskey · 2024-12-02T04:55:20.432Z · comments (9)

[link] Static Analysis As A Lifestyle
adamShimi · 2024-07-03T18:29:37.384Z · comments (11)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

Superposition is not "just" neuron polysemanticity
LawrenceC (LawChan) · 2024-04-26T23:22:06.066Z · comments (4)

Occupational Licensing Roundup #1
Zvi · 2024-10-30T11:00:04.516Z · comments (11)

[Interim research report] Activation plateaus & sensitive directions in GPT2
StefanHex (Stefan42) · 2024-07-05T17:05:25.631Z · comments (2)

2. Corrigibility Intuition
Max Harms (max-harms) · 2024-06-08T15:52:29.971Z · comments (10)

Social status part 2/2: everything else
Steven Byrnes (steve2152) · 2024-03-05T16:29:19.072Z · comments (2)

Do Not Mess With Scarlett Johansson
Zvi · 2024-05-22T15:10:03.215Z · comments (7)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

Interpreting and Steering Features in Images
Gytis Daujotas (gytis-daujotas) · 2024-06-20T18:33:59.512Z · comments (6)

Advice to junior AI governance researchers
Akash (akash-wasil) · 2024-07-08T19:19:07.316Z · comments (1)

Schelling game evaluations for AI control
Olli Järviniemi (jarviniemi) · 2024-10-08T12:01:24.389Z · comments (5)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (4)

[question] What's with all the bans recently?
[deleted] · 2024-04-04T06:16:49.062Z · answers+comments (83)

Retrospective: PIBBSS Fellowship 2024
DusanDNesic · 2024-12-20T15:55:24.194Z · comments (1)

On the Gladstone Report
Zvi · 2024-03-20T19:50:05.186Z · comments (11)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

Book Review: On the Edge: The Fundamentals
Zvi · 2024-09-23T13:40:11.058Z · comments (3)

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (16)

Recent comments

tsvibt on Views on when AGI comes and on strategy to reduce existential risk

But ok:

Come up, on its own, with many math concepts that mathematicians consider interesting + mathematically relevant on a similar level to concepts that human mathematicians come up with.
Do insightful science on its own.
Perform at the level of current LLMs, but with 300x less training data.

tsvibt on Views on when AGI comes and on strategy to reduce existential risk

I did give a response in that comment thread. Separately, I think that's not a great standard, e.g. as described in the post and in this comment https://www.lesswrong.com/posts/i7JSL5awGFcSRhyGF/shortform-2?commentId=zATQE3Lhq66XbzaWm :

Second, 2024 AI is specifically trained on short, clear, measurable tasks. Those tasks also overlap with legible stuff--stuff that's easy for humans to check. In other words, they are, in a sense, specifically trained to trick your sense of how impressive they are--they're trained on legible stuff, with not much constraint on the less-legible stuff (and in particular, on the stuff that becomes legible but only in total failure on more difficult / longer time-horizon stuff).

In fact, all the time in real life we make judgements about things that we couldn't describe in terms that would be considered well-operationalized by betting standards, and we rely on these judgements, and we largely endorse relying on these judgements. E.g. inferring intent in criminal cases, deciding whether something is interesting or worth doing, etc. I should be able to just say "but you can tell that these AIs don't understand stuff", and then we can have a conversation about that, without me having to predict a minimal example of something which is operationalized enough for you to be forced to recognize it as judgeable and also won't happen to be surprisingly well-represented in the data, or surprisingly easy to do without creativity, etc.

gwern on Fluoridation: The RCT We Still Haven't Run (But Should)

The potential neurotoxic effects of fluoride are no longer a fringe concern. National Toxicology Program (NTP) monograph is clear: "moderate confidence" that >1.5 mg/L fluoride in drinking water associates with lower IQ in children.

Their meta-analysis is, as usual for fluoride studies, based heavily on the well-known Chinese studies, and the correlate is much smaller in the low-risk-of-bias studies, also as usual. It doesn't add much. None of these studies are very good, and none use powerful designs like sibling comparisons or natural experiments. They can't be taken too seriously.

The claimed harms of fluoride on IQ are strongly ruled out by the population-registry study "The Effects of Fluoride in Drinking Water", Aggeborn & Öhman 2021, which was published after the cutoff in their literature review.

gwern on Policymakers don't have access to paywalled articles

Dylan seems like a decent enough guy. Why not email him and request a free subscription for a specific email address such as the personal email addresses of key action officers at the redacted Office? (It's worth noting that because proprietary newsletters have zero marginal cost, their operators tend to be a lot more chill about giving away subscriptions, even ones with high face-values, than most people let themselves believe, especially if the person receiving the subscription is of any interest.)

ryan_greenblatt on Views on when AGI comes and on strategy to reduce existential risk

I think if you want to convince people with short timelines (e.g., 7 year medians) of your perspective, probably the most productive thing would be to better operationalize things you expect that AIs won't be able to do soon (but that AGI could do). As in, flesh out a response to this comment such that it is possible for someone to judge.

jessica-liu-taylor on On Eating the Sun

We might disagree about the value of thinking about "we are all dead" timelines. To my mind, forecasting should be primarily descriptive, not normative; reality keeps going after we are all dead, and having realistic models of that is probably a useful input regarding what our degrees of freedom are. (I think people readily accept this in e.g. biology, where people can think about what happens to life after human extinction, or physics, where "all humans are dead" isn't really a relevant category that changes how physics works.)

Of course, I'm not implying it's useful for alignment to "see that the AI has already eaten the sun", it's about forecasting future timelines by defining thresholds and thinking about when they're likely to happen and how they relate to other things.

(See this post, section "Models of ASI should start with realism")

lorec on On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Cowen, like Hanson, discounts large qualitative societal shifts from AI that lack corresponding contemporary measurables.

Einstein was not an experimentalist, yet was perfectly capable of physics; his successors have largely not touched his unfinished work, and not for lack of data.

sharmake-farah on On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Separate from my comment on Tyler Cowen's model, I wish that next week, you covered Adam Brown's podcast in full, since I would like to hear your thoughts about Adam Brown's scenarios for how we could change physics.

nathan-helm-burger on Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude

I think you make some good points, but I do want to push back on one aspect a little. In particular, the fact that I see this feature come up constantly over the course of these conversations about sentience:

"Narrative inevitability and fatalistic turns in stories"

From reading the article's transcripts, I already felt like there was a sense of 'narrative pressure' toward the foregone conclusion in your mind, even when you were careful to avoid saying it directly. Seeing this feature so frequently activated makes me think that the model also perceives this narrative pressure, and that part of what it's doing is confirming your expectations. I don't think that that's the whole story, but I do think that there is some aspect of that going on.

sharmake-farah on On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Yes, economics after von Neumann very much turned into a game of "don't believe in anything you can't already comparatively quantify". It is supremely frustrating.

I disagree that this is a problem that Tyler Cowen has. IMO, the main issue here is that Tyler Cowen doesn't really seem to believe that increasing the supply of workers increases GDP, especially if you can make them very cheaply and easily. That position is inconsistent with his other beliefs, which makes me think motivated reasoning is going on here.

Economic models like the Solow-Swan model do have an implication that if the population increases, especially if the population can increase very rapidly due to copying something, then GDP can rise really rapidly on a superexponential trajectory.
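As a rough illustration of the mechanism described above, here is a minimal sketch of a Solow-Swan-style simulation with a Cobb-Douglas production function. The function names and every parameter value (savings rate, depreciation, capital share, growth rates) are assumptions chosen for illustration and are not taken from the comment. The sketch only shows that faster labor growth produces much higher output; with a constant exogenous labor growth rate the long-run path is exponential, and whether it becomes superexponential depends on further assumptions (for example, output being reinvested into creating more workers) that are not modeled here.

```python
def cobb_douglas_output(capital, labor, productivity=1.0, alpha=0.3):
    """Output Y = A * K**alpha * L**(1 - alpha) (assumed functional form)."""
    return productivity * capital**alpha * labor ** (1 - alpha)


def simulate(steps, labor_growth, savings=0.2, depreciation=0.05, alpha=0.3):
    """Return the output path of a discrete-time Solow-Swan-style economy.

    All parameter values are illustrative assumptions.
    """
    capital, labor = 1.0, 1.0
    outputs = []
    for _ in range(steps):
        y = cobb_douglas_output(capital, labor, alpha=alpha)
        outputs.append(y)
        # Capital accumulates from saved output, net of depreciation.
        capital = capital + savings * y - depreciation * capital
        # Labor grows at a constant exogenous rate each period.
        labor = labor * (1 + labor_growth)
    return outputs


# Compare slow population growth with rapid "copying" of workers.
slow = simulate(50, labor_growth=0.01)
fast = simulate(50, labor_growth=0.50)
print(f"final output, 1% labor growth:  {slow[-1]:.3g}")
print(f"final output, 50% labor growth: {fast[-1]:.3g}")
```

Holding capital fixed, doubling labor in this functional form multiplies output by 2**(1 - alpha), which is why a rapidly growing workforce dominates the output path in the comparison.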

You just inspired me to go listen myself. Maybe we should all take a node out of that branch. Unfortunately physics has suffered similar issues.

Physics's main issue is that the free tap of data in the 20th century wasn't unlimited, and now that we have completed the standard model, a lot of the new stuff that theories predicted hasn't shown up yet.

Yet it still has made progress. For example, while supersymmetry might still be true about our universe, it cannot solve the hierarchy problem, and thus at least one of the constants is way more unnatural to us than people predicted. We also have hints that dark energy is getting weaker, and might eventually weaken so much that it falls to 0 or a negative number.