LessWrong 2.0 Reader

Balancing Games
jefftk (jkaufman) · 2024-02-24T14:40:04.237Z · comments (18)

[question] We might be dropping the ball on Autonomous Replication and Adaptation.
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-31T13:49:11.327Z · answers+comments (30)

Self-explaining SAE features
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-05T22:20:36.041Z · comments (13)

[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (13)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

[link] Results from an Adversarial Collaboration on AI Risk (FRI)
Josh Rosenberg (josh-rosenberg) · 2024-03-11T20:00:24.642Z · comments (3)

[question] What do we know about the AI knowledge and views, especially about existential risk, of the new OpenAI board members?
Zvi · 2024-03-11T14:55:05.128Z · answers+comments (2)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

0th Person and 1st Person Logic
Adele Lopez (adele-lopez-1) · 2024-03-10T00:56:14.446Z · comments (28)

Win/continue/lose scenarios and execute/replace/audit protocols
Buck · 2024-11-15T15:47:24.868Z · comments (2)

[link] Funding Case: AI Safety Camp 11
Remmelt (remmelt-ellen) · 2024-12-23T08:51:55.255Z · comments (4)

[link] Is Claude a mystic?
jessicata (jessica.liu.taylor) · 2024-06-07T04:27:09.118Z · comments (23)

Approaching Human-Level Forecasting with Language Models
Fred Zhang (fred-zhang) · 2024-02-29T22:36:34.012Z · comments (6)

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

A "Bitter Lesson" Approach to Aligning AGI and ASI
RogerDearnaley (roger-d-1) · 2024-07-06T01:23:22.376Z · comments (39)

5 Physics Problems
DaemonicSigil · 2024-03-18T08:05:45.971Z · comments (0)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

[link] More people getting into AI safety should do a PhD
AdamGleave · 2024-03-14T22:14:48.855Z · comments (24)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (11)

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)
LoganStrohl (BrienneYudkowsky) · 2024-02-24T02:56:31.458Z · comments (1)

Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (64)

D&D.Sci: The Mad Tyrant's Pet Turtles
abstractapplic · 2024-03-29T16:22:13.732Z · comments (18)

LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (26)

Falsehoods you might believe about people who are at a rationalist meetup
Screwtape · 2025-02-01T23:32:50.398Z · comments (12)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

Not all capabilities will be created equal: focus on strategically superhuman agents
benwr · 2025-02-13T01:24:46.084Z · comments (4)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions
Lidor Banuel Dabbah · 2024-07-19T20:32:15.095Z · comments (6)

[link] Testing for Scheming with Model Deletion
Guive (GAA) · 2025-01-07T01:54:13.550Z · comments (21)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

Measuring Coherence of Policies in Toy Environments
dx26 (dylan-xu) · 2024-03-18T17:59:08.118Z · comments (9)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

o1 Turns Pro
Zvi · 2024-12-10T17:00:08.036Z · comments (3)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)

"Think it Faster" worksheet
Raemon · 2025-02-08T22:02:27.697Z · comments (8)

[link] How much I'm paying for AI productivity software (and the future of AI use)
jacquesthibs (jacques-thibodeau) · 2024-10-11T17:11:27.025Z · comments (18)

On the Latest TikTok Bill
Zvi · 2024-03-13T18:50:05.398Z · comments (7)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (22)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

Reading RFK Jr so that you don’t have to
braces · 2024-11-22T00:59:19.583Z · comments (1)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

Recent comments

beckeck on Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

"Seems mistaken to think that the way you use a model is what determines whether or not it’s an agent. It’s surely determined by how you train it?"
---> Nah, pretraining, fine-tuning, scaffolding and especially RL seem like they all affect it. Currently scaffolding only gets you shitty agents, but it at least sorta works.

davey-morse on Davey Morse's Shortform

a worthy goal for alignment work: help superintelligence develop a sense of self that is substrate independent.

a physically bounded self will be competitive with humans and clear us away. an interconnected substrate independent self may be loving AND be evolutionarily selected for, i.e. superintelligence to survive long-term

jeremy-gillen on Training AI to do alignment research we don’t already know how to do

(vague memory from the in person discussions we had last year, might be inaccurate):

jeremy!2023: If you're expecting AI to be capable enough to "accelerate alignment research" significantly, it'll need to be a full-blown agent that learns stuff. And that'll be enough to create alignment problems because data-efficient long-horizon generalization is not something we can do.

joshc!2023: No way, all you need is AI with stereotyped skills. Imagine how fast we could do interp experiments if we had AIs that were good at writing code but dumb in other ways!

...

joshc!now:

Training AI agents so they can improve their beliefs (e.g. do research) as well as the best humans can.

Seems like the reasoning behind your conclusions has changed a lot since we talked, but the conclusions haven't changed much?

If you were an AI: Negative reward, probably a bad belief updating process.

daniel-v on Export Surplusses

Literally macroeconomics 101. Trade surpluses aren't shipping goods for free. There is a whole balance of payments to consider. I'm shocked EY could get that so wrong, surprised that lsusr is so ready to agree, and confused because surely I missed something huge here, right?

kenny-2 on Local Trust

The example is really helpful for me getting a concrete understanding of what it looks like to satisfy Trust without Reflection, and why that goes along with deferring to someone else for decisions - but I don't see what this example of Alice has to do with locality. It looks like the only relevant propositions are whether it rains tomorrow, and what Alice's credences are, and there don't seem to be any propositions we don't defer to her on.

max-h on Export Surplusses

The original tweets seem at least partially tongue-in-cheek? Trade has lots of benefits that don't depend on the net balance. If Country A buys $10B of goods from Country B and sells $9B of other goods to Country B, that is $19B of positive-sum transactions between individual entities in each country, presumably with all sorts of positive externalities and implications about your economy.

The fact that the net flow is $1B in one direction or the other just doesn't matter too much. Having a large trade surplus (or large trade deficit) is a proxy for generally doing lots of trading and industry, which will tend to correlate with a lot of other things that made or will make you wealthy. But it would be weird if a country could get rich solely by running a trade surplus, while somehow avoiding reaping any of the other usual benefits of trading. "Paying other countries to discern your peoples' ability to produce" is plausibly a benefit that you get from a trade surplus even if you try hard to avoid all the others, though.
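
The arithmetic in the Country A / Country B example above can be checked with a minimal sketch. The function name `trade_summary` and its field names are hypothetical, chosen only for illustration; the only figures used are the $10B and $9B from the example.

```python
def trade_summary(imports_bn: float, exports_bn: float) -> dict:
    """Return gross trade volume and net balance, in billions of dollars.

    Hypothetical helper for illustration; a positive net balance is a
    surplus, a negative one is a deficit.
    """
    return {
        "gross_volume_bn": imports_bn + exports_bn,  # total value transacted
        "net_balance_bn": exports_bn - imports_bn,   # surplus (+) or deficit (-)
    }


# Country A buys $10B from Country B and sells $9B to Country B.
summary = trade_summary(imports_bn=10.0, exports_bn=9.0)
print(summary)
```

Under these numbers the gross volume is $19B while the net balance is only a $1B deficit for Country A, which is the gap between "how much trading happens" and "which way the net flows" that the comment is pointing at.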

kman on How to Make Superbabies

"You should show your calculation or your code, including all the data and parameter choices. Otherwise I can't evaluate this."

The code is pretty complicated and not something I'd expect a non-expert (even a very smart one) to be able to quickly check over; it's not just a 100 line python script. (Or even a very smart expert for that matter, more like anyone who wasn't already familiar with our particular codebase.) We'll likely open source it at some point in the future, possibly soon, but that's not decided yet. Our finemapping (inferring causal effects) procedure produces ~identical results to the software from the paper I linked above when run on the same test data (though we handle some additional things like variable per-SNP sample sizes and missing SNPs which that finemapper doesn't handle, which is why we didn't just use it).

The parameter choices which determine the prior over SNP effects are the number of causal SNPs (which we set to 20,000) and the SNP heritability of the phenotype (which we set to 0.19, as per the GWAS we used). The erroneous effect size adjustment was done at the end to convert from the effect sizes of the GWAS phenotype (low reliability IQ test) to the effect sizes corresponding to the phenotype we care about (high reliability IQ test).

We want to publish a more detailed write up of our methods soon(ish), but it's going to be a fair bit of work so don't expect it overnight.

"It's natural in your position to scrutinize low estimates but not high ones."

Yep, fair enough. I've noticed myself doing this sometimes and I want to cut it out. That said, I don't think small-ish predictable overestimates to the effect sizes are going to change the qualitative picture, since with good enough data and a few hundred to a thousand edits we can boost predicted IQ by >6 SD even with much more pessimistic assumptions, which probably isn't even safe to do (I'm not sure I expect additivity to hold that far). I'm much more worried about basic problems with our modelling assumptions, e.g. the assumption of sparse causal SNPs with additive effects and no interactions (e.g. what if rare haplotypes are deleterious due to interactions that don't show up in GWAS since those combinations are rare?).

genesmith on How to Make Superbabies

It's just very hard for me to believe there aren't huge gains possible from genetic engineering. It goes against everything we've seen from millennia of animal breeding. It goes against the estimates we have for the fraction of variance that's linear for all these highly polygenic traits. It goes against data we've seen from statistical outliers like Shawn Bradley, who shows up as a 4.6 standard deviation outlier in graphs of height:

(PDF) Common DNA Variants Accurately Rank an Individual of Extreme Height

Do I buy that things will get noisier around the tails, and that we might not be able to push very far outside the +5 SD mark or so? Sure. That seems unlikely, but plausible.

But the idea that you're only going to be able to push traits by 2-3 standard deviations with gene editing before your predictor breaks down seems quite unlikely.

Maybe you've seen some evidence I haven't in which case I would like to know why I should be more skeptical. But I haven't seen such evidence so far.

daniel-kokotajlo on Hauke Hillebrandt's Shortform

Oh yeah my bad, I didn't notice that the second date was Sep instead of Feb

rohinmshah on AGI Safety & Alignment @ Google DeepMind is hiring

Our hiring this round is a small fraction of our overall team size, so this is really just correcting a minor imbalance, and shouldn't be taken as reflective of some big strategy. I'm guessing we'll go back to hiring a mix of the two around mid-2025.