LessWrong 2.0 Reader

Focus on existential risk is a distraction from the real issues. A false fallacy
Nik Samoylov (nik-samoylov) · 2023-10-30T23:42:02.066Z · comments (11)
[link] Will releasing the weights of large language models grant widespread access to pandemic agents?
jefftk (jkaufman) · 2023-10-30T18:22:59.677Z · comments (25)
[link] [Linkpost] Two major announcements in AI governance today
[deleted] · 2023-10-30T17:28:16.482Z · comments (1)
[link] Grokking Beyond Neural Networks
Jack Miller (jack-miller) · 2023-10-30T17:28:04.626Z · comments (0)
[link] Response to “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”
Matthew Wearden (matthew-wearden) · 2023-10-30T17:27:58.166Z · comments (2)
Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Zeming Wei · 2023-10-30T17:22:31.780Z · comments (1)
5 Reasons Why Governments/Militaries Already Want AI for Information Warfare
trevor (TrevorWiesinger) · 2023-10-30T16:30:38.020Z · comments (0)
[Linkpost] Biden-Harris Executive Order on AI
beren · 2023-10-30T15:20:22.582Z · comments (0)
[link] AI Alignment [progress] this Week (10/29/2023)
Logan Zoellner (logan-zoellner) · 2023-10-30T15:02:26.265Z · comments (4)
Improving the Welfare of AIs: A Nearcasted Proposal
ryan_greenblatt · 2023-10-30T14:51:35.901Z · comments (5)
[link] President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence
Tristan Williams (tristan-williams) · 2023-10-30T11:15:38.422Z · comments (39)
GPT-2 XL's capacity for coherence and ontology clustering
MiguelDev (whitehatStoic) · 2023-10-30T09:24:13.202Z · comments (2)
Charbel-Raphaël and Lucius discuss Interpretability
Mateusz Bagiński (mateusz-baginski) · 2023-10-30T05:50:34.589Z · comments (7)
Multi-Winner 3-2-1 Voting
Yoav Ravid · 2023-10-30T03:31:25.776Z · comments (5)
[link] math terminology as convolution
bhauth · 2023-10-30T01:05:11.823Z · comments (1)
Grokking, memorization, and generalization — a discussion
Kaarel (kh) · 2023-10-29T23:17:30.098Z · comments (10)
[link] Comp Sci in 2027 (Short story by Eliezer Yudkowsky)
sudo · 2023-10-29T23:09:56.730Z · comments (22)
Mathematically-Defined Optimization Captures A Lot of Useful Information
J Bostock (Jemist) · 2023-10-29T17:17:03.211Z · comments (0)
Clarifying the free energy principle (with quotes)
Ryo (Flewrint Ophiuni) · 2023-10-29T16:03:31.958Z · comments (0)
[link] A new intro to Quantum Physics, with the math fixed
titotal (lombertini) · 2023-10-29T15:11:27.168Z · comments (22)
My idea of sacredness, divinity, and religion
Kaj_Sotala · 2023-10-29T12:50:07.980Z · comments (10)
The AI Boom Mainly Benefits Big Firms, but long-term, markets will concentrate
Hauke Hillebrandt (hauke-hillebrandt) · 2023-10-29T08:38:23.327Z · comments (0)
What's up with "Responsible Scaling Policies"?
habryka (habryka4) · 2023-10-29T04:17:07.839Z · comments (8)
Experiments as a Third Alternative
Adam Zerner (adamzerner) · 2023-10-29T00:39:31.399Z · comments (21)
Comparing representation vectors between llama 2 base and chat
Nina Rimsky (NinaR) · 2023-10-28T22:54:37.059Z · comments (5)
Vaniver's thoughts on Anthropic's RSP
Vaniver · 2023-10-28T21:06:07.323Z · comments (4)
Book Review: Orality and Literacy: The Technologizing of the Word
Fergus Fettes (fergus-fettes) · 2023-10-28T20:12:07.743Z · comments (0)
Regrant up to $600,000 to AI safety projects with GiveWiki
Dawn Drescher (Telofy) · 2023-10-28T19:56:06.676Z · comments (1)
[link] Shane Legg interview on alignment
Seth Herd · 2023-10-28T19:28:52.223Z · comments (20)
AI Existential Safety Fellowships
mmfli · 2023-10-28T18:07:19.773Z · comments (0)
[link] AI Safety Hub Serbia Official Opening
DusanDNesic · 2023-10-28T17:03:34.607Z · comments (0)
[link] Managing AI Risks in an Era of Rapid Progress
Algon · 2023-10-28T15:48:25.029Z · comments (3)
[question] ELI5 Why isn't alignment *easier* as models get stronger?
Logan Zoellner (logan-zoellner) · 2023-10-28T14:34:37.588Z · answers+comments (9)
Truthseeking, EA, Simulacra levels, and other stuff
Elizabeth (pktechgirl) · 2023-10-27T23:56:49.198Z · comments (12)
[question] Do you believe "E=mc^2" is a correct and/or useful equation, and, whether yes or no, precisely what are your reasons for holding this belief (with such a degree of confidence)?
l8c · 2023-10-27T22:46:51.020Z · answers+comments (14)
Value systematization: how values become coherent (and misaligned)
Richard_Ngo (ricraz) · 2023-10-27T19:06:26.928Z · comments (47)
[link] Techno-humanism is techno-optimism for the 21st century
Richard_Ngo (ricraz) · 2023-10-27T18:37:39.776Z · comments (5)
Sanctuary for Humans
nikola (nikolaisalreadytaken) · 2023-10-27T18:08:22.389Z · comments (9)
Wireheading and misalignment by composition on NetHack
pierlucadoro · 2023-10-27T17:43:41.727Z · comments (4)
We're Not Ready: thoughts on "pausing" and responsible scaling policies
HoldenKarnofsky · 2023-10-27T15:19:33.757Z · comments (33)
Aspiration-based Q-Learning
Clément Dumas (butanium) · 2023-10-27T14:42:03.292Z · comments (5)
[link] Linkpost: Rishi Sunak's Speech on AI (26th October)
bideup · 2023-10-27T11:57:46.575Z · comments (8)
ASPR & WARP: Rationality Camps for Teens in Taiwan and Oxford
Anna Gajdova (anna-gajdova) · 2023-10-27T08:40:35.436Z · comments (0)
[question] To what extent is the UK Government's recent AI Safety push entirely due to Rishi Sunak?
Stephen Fowler (LosPolloFowler) · 2023-10-27T03:29:28.465Z · answers+comments (4)
Bayesian Punishment
Rob Lucas · 2023-10-27T03:24:53.930Z · comments (1)
Online Dialogues Party — Sunday 5th November
Ben Pace (Benito) · 2023-10-27T02:41:00.506Z · comments (1)
OpenAI’s new Preparedness team is hiring
leopold · 2023-10-26T20:42:35.966Z · comments (2)
[link] Fake Deeply
Zack_M_Davis · 2023-10-26T19:55:22.340Z · comments (7)
Symbol/Referent Confusions in Language Model Alignment Experiments
johnswentworth · 2023-10-26T19:49:00.718Z · comments (44)
[link] Unsupervised Methods for Concept Discovery in AlphaZero
aogara (Aidan O'Gara) · 2023-10-26T19:05:57.897Z · comments (0)