LessWrong 2.0 Reader

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Ajeya Cotra (ajeya-cotra) · 2022-07-18T19:06:14.670Z · comments (94)
Reward is not the optimization target
TurnTrout · 2022-07-25T00:03:18.307Z · comments (123)
What should you change in response to an "emergency"? And AI risk
AnnaSalamon · 2022-07-18T01:11:14.667Z · comments (60)
Looking back on my alignment PhD
TurnTrout · 2022-07-01T03:19:59.497Z · comments (63)
On how various plans miss the hard bits of the alignment challenge
So8res · 2022-07-12T02:49:50.454Z · comments (88)
Toni Kurz and the Insanity of Climbing Mountains
GeneSmith · 2022-07-03T20:51:58.429Z · comments (67)
Changing the world through slack & hobbies
Steven Byrnes (steve2152) · 2022-07-21T18:11:05.636Z · comments (13)
Safetywashing
Adam Scholl (adam_scholl) · 2022-07-01T11:56:33.495Z · comments (20)
Sexual Abuse attitudes might be infohazardous
Pseudonymous Otter · 2022-07-19T18:06:43.956Z · comments (71)
Unifying Bargaining Notions (1/2)
Diffractor · 2022-07-25T00:28:27.572Z · comments (41)
Humans provide an untapped wealth of evidence about alignment
TurnTrout · 2022-07-14T02:31:48.575Z · comments (94)
[link] Connor Leahy on Dying with Dignity, EleutherAI and Conjecture
Michaël Trazzi (mtrazzi) · 2022-07-22T18:44:19.749Z · comments (29)
A note about differential technological development
So8res · 2022-07-15T04:46:53.166Z · comments (32)
AGI ruin scenarios are likely (and disjunctive)
So8res · 2022-07-27T03:21:57.615Z · comments (38)
ITT-passing and civility are good; "charity" is bad; steelmanning is niche
Rob Bensinger (RobbBB) · 2022-07-05T00:15:36.308Z · comments (36)
«Boundaries», Part 1: a key missing concept from utility theory
Andrew_Critch · 2022-07-26T23:03:55.941Z · comments (32)
Resolve Cycles
CFAR!Duncan (CFAR 2017) · 2022-07-16T23:17:13.037Z · comments (8)
Brainstorm of things that could force an AI team to burn their lead
So8res · 2022-07-24T23:58:16.988Z · comments (8)
Carrying the Torch: A Response to Anna Salamon by the Guild of the Rose
moridinamael · 2022-07-06T14:20:14.847Z · comments (16)
AI Forecasting: One Year In
jsteinhardt · 2022-07-04T05:10:18.470Z · comments (12)
Conjecture: Internal Infohazard Policy
Connor Leahy (NPCollapse) · 2022-07-29T19:07:08.491Z · comments (6)
Limerence Messes Up Your Rationality Real Bad, Yo
Raemon · 2022-07-01T16:53:10.914Z · comments (41)
Principles for Alignment/Agency Projects
johnswentworth · 2022-07-07T02:07:36.156Z · comments (20)
Unifying Bargaining Notions (2/2)
Diffractor · 2022-07-27T03:40:30.524Z · comments (19)
Circumventing interpretability: How to defeat mind-readers
Lee Sharkey (Lee_Sharkey) · 2022-07-14T16:59:22.201Z · comments (12)
Moral strategies at different capability levels
Richard_Ngo (ricraz) · 2022-07-27T18:50:05.366Z · comments (14)
Criticism of EA Criticism Contest
Zvi · 2022-07-14T14:30:00.782Z · comments (17)
Focusing
CFAR!Duncan (CFAR 2017) · 2022-07-29T19:15:35.377Z · comments (23)
Examples of AI Increasing AI Progress
ThomasW (ThomasWoodside) · 2022-07-17T20:06:41.213Z · comments (14)
Safety Implications of LeCun's path to machine intelligence
Ivan Vendrov (ivan-vendrov) · 2022-07-15T21:47:44.411Z · comments (18)
Comment on "Propositions Concerning Digital Minds and Society"
Zack_M_Davis · 2022-07-10T05:48:51.013Z · comments (12)
Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments
Jeffrey Ladish (jeff-ladish) · 2022-07-11T19:38:42.468Z · comments (27)
Naive Hypotheses on AI Alignment
Shoshannah Tekofsky (DarkSym) · 2022-07-02T19:03:49.458Z · comments (29)
Help ARC evaluate capabilities of current language models (still need people)
Beth Barnes (beth-barnes) · 2022-07-19T04:55:18.189Z · comments (6)
A summary of every "Highlights from the Sequences" post
Akash (akash-wasil) · 2022-07-15T23:01:04.392Z · comments (7)
Human values & biases are inaccessible to the genome
TurnTrout · 2022-07-07T17:29:56.190Z · comments (54)
Internal Double Crux
CFAR!Duncan (CFAR 2017) · 2022-07-22T04:34:54.719Z · comments (15)
Immanuel Kant and the Decision Theory App Store
Daniel Kokotajlo (daniel-kokotajlo) · 2022-07-10T16:04:04.248Z · comments (12)
How to Diversify Conceptual Alignment: the Model Behind Refine
adamShimi · 2022-07-20T10:44:02.637Z · comments (11)
MATS Models
johnswentworth · 2022-07-09T00:14:24.812Z · comments (5)
[link] Trends in GPU price-performance
Marius Hobbhahn (marius-hobbhahn) · 2022-07-01T15:51:10.850Z · comments (12)
All AGI safety questions welcome (especially basic ones) [July 2022]
plex (ete) · 2022-07-16T12:57:44.157Z · comments (132)
[link] Don't use 'infohazard' for collectively destructive info
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2022-07-15T05:13:18.642Z · comments (33)
Benchmark for successful concept extrapolation/avoiding goal misgeneralization
Stuart_Armstrong · 2022-07-04T20:48:14.703Z · comments (12)
Opening Session Tips & Advice
CFAR!Duncan (CFAR 2017) · 2022-07-25T03:57:49.731Z · comments (3)
Trigger-Action Planning
CFAR!Duncan (CFAR 2017) · 2022-07-03T01:42:22.083Z · comments (14)
Goal Factoring
CFAR!Duncan (CFAR 2017) · 2022-07-05T07:10:04.930Z · comments (2)
Addendum: A non-magical explanation of Jeffrey Epstein
lc · 2022-07-18T17:40:37.099Z · comments (21)
[question] How do AI timelines affect how you live your life?
Quadratic Reciprocity · 2022-07-11T13:54:12.961Z · answers+comments (50)
Aversion Factoring
CFAR!Duncan (CFAR 2017) · 2022-07-07T16:09:11.392Z · comments (1)