LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Claude's Constitutional Consequentialism?
1a3orn · 2024-12-19T19:53:33.254Z · comments (6)

Practicing Bayesian Epistemology with "Two Boys" Probability Puzzles
Liron · 2025-01-02T04:42:20.362Z · comments (14)

MATS mentor selection
DanielFilan · 2025-01-10T03:12:52.141Z · comments (8)

Trying to translate when people talk past each other
Kaj_Sotala · 2024-12-17T09:40:02.640Z · comments (12)

Estimating the benefits of a new flu drug (BXM)
DirectedEvolution (AllAmericanBreakfast) · 2025-01-06T04:31:16.837Z · comments (2)

[question] What are the most interesting / challenging evals (for humans) available?
Raemon · 2024-12-27T03:05:26.831Z · answers+comments (13)

[link] Alignment Is Not All You Need
Adam Jones (domdomegg) · 2025-01-02T17:50:00.486Z · comments (10)

My January alignment theory Nanowrimo
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-02T00:07:24.050Z · comments (2)

On Dwarkesh Patel’s 4th Podcast With Tyler Cowen
Zvi · 2025-01-10T13:50:05.563Z · comments (6)

What happens next?
Logan Zoellner (logan-zoellner) · 2024-12-29T01:41:33.685Z · comments (19)

The Laws of Large Numbers
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-04T11:54:16.967Z · comments (9)

Building Big Science from the Bottom-Up: A Fractal Approach to AI Safety
Lauren Greenspan (LaurenGreenspan) · 2025-01-07T03:08:51.447Z · comments (2)

Grammars, subgrammars, and combinatorics of generalization in transformers
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-02T09:37:23.191Z · comments (0)

A Matter of Taste
Zvi · 2024-12-18T17:50:07.201Z · comments (4)

Rolling Thresholds for AGI Scaling Regulation
Larks · 2025-01-12T01:30:23.797Z · comments (3)

Alternative Cancer Care As Biohacking & Book Review: Surviving "Terminal" Cancer
DenizT · 2025-01-06T07:43:52.773Z · comments (6)

Dmitry's Koan
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-10T04:27:30.346Z · comments (2)

Fireplace and Candle Smoke
jefftk (jkaufman) · 2025-01-01T01:50:01.408Z · comments (4)

D&D.Sci Dungeonbuilding: the Dungeon Tournament Evaluation & Ruleset
aphyer · 2025-01-07T05:02:25.929Z · comments (8)

Dress Up For Secular Solstice
Gordon H.S. (gordon-schaefer) · 2024-12-15T16:28:24.607Z · comments (13)

Childhood and Education #8: Dealing with the Internet
Zvi · 2025-01-06T14:00:09.604Z · comments (6)

XX by Rian Hughes: Pretentious Bullshit
Yair Halberstadt (yair-halberstadt) · 2025-01-08T13:02:52.438Z · comments (5)

[question] What is MIRI currently doing?
Roko · 2024-12-14T02:39:20.886Z · answers+comments (14)

[link] Moderately Skeptical of "Risks of Mirror Biology"
Davidmanheim · 2024-12-20T12:57:31.824Z · comments (3)

[link] Announcing the Q1 2025 Long-Term Future Fund grant round
Linch · 2024-12-20T02:20:22.448Z · comments (0)

If all trade is voluntary, then what is "exploitation?"
Darmani · 2024-12-27T11:21:30.036Z · comments (59)

[link] You should delay engineering-heavy research in light of R&D automation
Daniel Paleka · 2025-01-07T02:11:11.501Z · comments (3)

Last week of the Discussion Phase
Raemon · 2025-01-09T19:26:59.136Z · comments (0)

1. Meet the Players: Value Diversity
Allison Duettmann (allison-duettmann) · 2025-01-02T19:00:52.696Z · comments (2)

[link] What I expected from this site: A LessWrong review
Nathan Young · 2024-12-20T11:27:39.683Z · comments (5)

AI Safety Seed Funding Network - Join as a Donor or Investor
Alexandra Bos (AlexandraB) · 2024-12-16T19:30:43.812Z · comments (0)

A Principled Cartoon Guide to NVC
plex (ete) · 2025-01-07T21:01:07.904Z · comments (5)

[link] A progress policy agenda
jasoncrawford · 2024-12-19T18:42:37.327Z · comments (1)

People aren't properly calibrated on FrontierMath
cakubilo · 2024-12-23T19:35:44.467Z · comments (4)

You can validly be seen and validated by a chatbot
Kaj_Sotala · 2024-12-20T12:00:03.015Z · comments (3)

Acknowledging Background Information with P(Q|I)
JenniferRM · 2024-12-24T18:50:25.323Z · comments (8)

Two Weeks Without Sweets
jefftk (jkaufman) · 2024-12-31T03:30:02.003Z · comments (0)

Disagreement on AGI Suggests It’s Near
tangerine · 2025-01-07T20:42:43.456Z · comments (15)

Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces
Matthew A. Clarke (Antigone) · 2024-12-20T15:16:51.857Z · comments (0)

Is AI Alignment Enough?
Aram Panasenco (panasenco) · 2025-01-10T18:57:48.409Z · comments (6)

Corrigibility's Desirability is Timing-Sensitive
RobertM (T3t) · 2024-12-26T22:24:17.435Z · comments (4)

[link] AI as systems, not just models
Andy Arditi (andy-arditi) · 2024-12-21T23:19:05.507Z · comments (0)

Will bird flu be the next Covid? "Little chance" says my dashboard.
Nathan Young · 2025-01-07T20:10:50.080Z · comments (0)

Book Summary: Zero to One
bilalchughtai (beelal) · 2024-12-29T16:13:52.922Z · comments (1)

AI #98: World Ends With Six Word Story
Zvi · 2025-01-09T16:30:07.341Z · comments (1)

Intranasal mRNA Vaccines?
J Bostock (Jemist) · 2025-01-01T23:46:40.524Z · comments (2)

Preface
Allison Duettmann (allison-duettmann) · 2025-01-02T18:59:46.290Z · comments (1)

Learning Multi-Level Features with Matryoshka SAEs
Bart Bussmann (Stuckwork) · 2024-12-19T15:59:00.036Z · comments (4)

Living with Rats in College
lsusr · 2024-12-25T10:44:13.085Z · comments (0)

[link] The Roots of Progress 2024 in review
jasoncrawford · 2025-01-01T00:02:06.441Z · comments (0)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

daniel-tan on Daniel Tan's Shortform

Why understanding planning / search might be hard

It's hypothesized that, in order to solve complex tasks, capable models perform implicit search during the forward pass. If so, we might hope to be able to recover the search representations from the model. There are examples of work that try to understand search in chess models and Sokoban models [LW · GW].

However I expect this to be hard for three reasons.

The model might just implement a bag of heuristics [LW · GW]. A patchwork collection of local decision rules might be sufficient for achieving high performance. This seems especially likely for pre-trained generative models [LW · GW].
Even if the model has a globally coherent search algorithm, it seems difficult to elucidate this without knowing the exact implementation (of which there can be many equivalent ones). For example, search over different subtrees may be parallelised [LW · GW] and subsequently merged into an overall solution.
The 'search' circuit may also not exist in a crisp form, but as a collection of many sub-components that do similar / identical things. 'Circuit cleanup' only happens in the grokking regime, and we largely do not train language models till they grok.

chris_leong on My Model of Epistemology

Well done for managing to push something out there. It's a good start, I'm sure you'll fill in some of the details with other posts over time.

mikbp on Is Musk still net-positive for humanity?

? I don't know Rosencranz.

I'm asking you because you say "Is it the case that the tech would exist without him? I think that's pretty unclear" and this, in my view, depends a lot on the answers to those questions.

Is China doing well in the EV space a bad thing?

The opposite, it is good. But if Musk did not have any influence on it, this diminishes Musk's positive impact in this field, making his impact less positive.

shankar-sivarajan on In Defense of a Butlerian Jihad

What do we privilege, the preference of doctors or the welfare of patients?
What is more important, educators preferences or quality of children education?

I understand you intended these questions to be rhetorical, but the answers you think are obvious: did you arrive at them through "pure reason," or by looking at what "democratic consensus" actually ended up with?

unexpectedvalues on How much do you believe your results?

I think this isn't the sort of post that ages well or poorly, because it isn't topical, but I think this post turned out pretty well. It gradually builds from preliminaries that most readers have probably seen before, into some pretty counterintuitive facts that aren't widely appreciated.

At the end of the post, I listed three questions and wrote that I hope to write about some of them soon. I never did, so I figured I'd use this review to briefly give my takes.

This comment [LW(p) · GW(p)] from Fabien Roger tests some of my modeling choices for robustness, and finds that the surprising results of Part IV hold up when the noise is heavier-tailed than the signal. (I'm sure there's more to be said here, but I probably don't have time to do more analysis by the end of the review period.,)
My basic take is that this really is a point in favor of well-evidenced interventions, but that the best-looking speculative interventions are nevertheless better. This is because I think "speculative" here mostly refers to partial measurement rather than noisy measurement. For example, maybe you can only foresee the first-order effects of an intervention, but not the second-order effects. If the first-order effect is a (known) quantity and the second-order effect is an (unknown) quantity $X_{2}$ , then modeling the second-order effect as zero (and thus estimating the quality of the intervention as $X_{1}$ ) isn't a noisy measurement; it's a partial measurement. It's still your best guess given the information you have.
1. I haven't thought this through very much. I expect good counter-arguments and counter-counter-arguments to exist here.
1. No -- or rather, only if the measurement is guaranteed to be exactly correct. To see this, observe that the variance of a noisy, unbiased measurement is greater than the variance of the quantity you're trying to measure (with equality only when the noise is zero), whereas the variance of a noiseless, partial measurement is less than the variance of the quantity you're trying to measure.
2. Real-world measurements are absolutely partial. They are, like, mind-bogglingly partial. This point deserves a separate post, but consider for instance the action of donating $5,000 to the Against Malaria Foundation. Maybe your measured effect from the RCT is that it'll save one life: 50 QALYs or so. But this measurement neglects the meat-eating problem [? · GW]: the expected-child you'll save will grow up to eat expected-meat from factory farms, likely causing a great amount of suffering. But then you remember: actually there's a chance that this child will have a one eight-billionth stake in determining the future of the lightcone. Oops, actually this consideration totally dominates the previous two. Does this child have better values than the average human? Again: mind-bogglingly partial!
  
  (The measurements are also, of course, noisy! RCTs are probably about as un-noisy as it gets: for example, making your best guess about the quality of an intervention by drawing inferences from uncontrolled macroeconomic data is much more noisy. So the answer is: generally both noisy and partial, but in some sense, much more partial than noisy -- though I'm not sure how much that comparison matters.)
3. The lessons of this post do not generalize to partial measurements at all! This post is entirely about noisy measurements. If you've partially measured the quality of an intervention, estimating the un-measured part using your prior will give you an estimate of intervention quality that you know is probably wrong, but the expected value of your error is zero.

drake-thomas on Drake Thomas's Shortform

I agree, zinc lozenges seem like they're probably really worthwhile (even in the milder-benefit worlds)! My less-ecstatic tone is only relative to the promise of older lesswrong posts [LW · GW] that suggested it could basically solve all viral respiratory infections, but maybe I should have made the "but actually though, buy some zinc lozenges" takeaway more explicit.

drake-thomas on Drake Thomas's Shortform

I liked this post [LW · GW], but I think there's a good chance that the future doesn't end up looking like a central example of either "a single human seizes power" or "a single rogue AI seizes power". Some other possible futures:

Control over the future by a group of humans, like "the US government" or "the shareholders of an AI lab" or "direct democracy over all humans who existed in 2029"
Takeover via an AI that a specific human crafted to do a good job at enacting that human's values in particular, but which the human has no further steering power over
Lots of different actors (both human and AI) respecting one another's property rights and pursuing goals within negotiated regions of spacetime, with no one actor having power over the majority of available resources
A governance structure which nominally leaves particular humans in charge, and which the AIs involved are rule-abiding enough to respect, but in which things are sufficiently complicated and beyond human understanding that most decisions lack meaningful human oversight [LW · GW].
A future in which one human has extremely large amounts of power, but they acquired that power via trade and consensual agreements through their immense (ASI-derived) material wealth rather than via the sorts of coercive actions we tend to imagine with words like "takeover".
A singleton ASI is in decisive control of the future, and among its values are a strong commitment to listen to human input and behave according to its understanding of collective human preferences, though maybe not its single overriding concern.

I'd be pretty excited to see more attempts at comparing these kinds of scenarios for plausibility and for how well the world might go conditional on their occurrence.

(I think it's fairly likely that lots of these scenarios will eventually converge on something like the standard picture of one relatively coherent nonhuman agent doing vaguely consequentialist maximization across the universe, after sufficient negotiation and value-reflection and so on, but you might still care quite a lot about how the initial conditions shake out, and the dumbest AI capable of performing a takeover is probably very far from that limiting state.)

drake-thomas on Human takeover might be worse than AI takeover

The action-relevant question, for deciding whether you want to try to solve alignment, is how the average world with human-controlled AGI compares to the average AGI-controlled world.

To nitpick a little, it's more like "the average world where we just barely didn't solve alignment, versus the average world where we just barely did" (to the extent making things binary in this way is sensible), which I think does affect the calculus a little - marginal AGI-controlled worlds are more likely to have AIs which maintain some human values.

(Though one might be able to work on alignment in order to improve the quality of AGI-controlled worlds from worse to better ones, which mitigates this effect.)

russellthor on In Defense of a Butlerian Jihad

If you are advocating for a Butlerian Jihad, what is your plan for starships, with societies that want to leave earth behind, have their own values and never come back? If you allow that, then simply they can do whatever they want with AI - now with 100 billion stars that is the vast majority of future humanity.

sharmake-farah on Human takeover might be worse than AI takeover

I agree that things would be harder, mostly because of the potential for sudden capabilities breakthroughs if you have RL, combined with incentives to use automated rewards more, but I don't think it's so much harder that the post is incorrect, and my basic reason is I believe the central alignment insights like data mattering a lot more than inductive bias for alignment purposes still remain true in the RL regime, so we can control values by controlling data.

Also, depending on your values, AI extinction can be preferable to some humans taking over if they are willing to impose severe suffering on you, which can definitely happen if humans align AGI/ASI.