LessWrong 2.0 Reader

AGI safety from first principles: Introduction
Richard_Ngo (ricraz) · 2020-09-28T19:53:22.849Z · comments (18)
In Defense of Chatbot Romance
Kaj_Sotala · 2023-02-11T14:30:05.696Z · comments (52)
Selection Theorems: A Program For Understanding Agents
johnswentworth · 2021-09-28T05:03:19.316Z · comments (28)
High schoolers can apply to the Atlas Fellowship: $50k scholarship + summer program
sydney (sydney-von-arx) · 2022-04-03T00:53:05.397Z · comments (18)
When is a mind me?
Rob Bensinger (RobbBB) · 2024-04-17T05:56:38.482Z · comments (121)
My simple AGI investment & insurance strategy
lc · 2024-03-31T02:51:53.479Z · comments (21)
Greyed Out Options
ozymandias · 2022-04-04T20:43:13.566Z · comments (12)
[link] Who regulates the regulators? We need to go beyond the review-and-approval paradigm
jasoncrawford · 2023-05-04T22:11:17.465Z · comments (29)
Principles for Alignment/Agency Projects
johnswentworth · 2022-07-07T02:07:36.156Z · comments (20)
Limerence Messes Up Your Rationality Real Bad, Yo
Raemon · 2022-07-01T16:53:10.914Z · comments (41)
Soft takeoff can still lead to decisive strategic advantage
Daniel Kokotajlo (daniel-kokotajlo) · 2019-08-23T16:39:31.317Z · comments (47)
Things I've Grieved
Raemon · 2024-02-18T19:32:47.169Z · comments (6)
Apocalypse insurance, and the hardline libertarian take on AI risk
So8res · 2023-11-28T02:09:52.400Z · comments (37)
Book review: The Checklist Manifesto
Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2021-09-17T23:09:09.590Z · comments (13)
[link] The 300-year journey to the covid vaccine
jasoncrawford · 2020-11-09T23:06:45.790Z · comments (9)
[link] Report on Frontier Model Training
YafahEdelman (yafah-edelman-1) · 2023-08-30T20:02:46.317Z · comments (21)
Do you believe in hundred dollar bills lying on the ground? Consider humming
Elizabeth (pktechgirl) · 2024-05-16T00:00:05.257Z · comments (22)
Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think
Zack_M_Davis · 2019-12-27T05:09:22.546Z · comments (43)
Law of No Evidence
Zvi · 2021-12-20T13:50:01.189Z · comments (20)
Why I'm joining Anthropic
evhub · 2023-01-05T01:12:13.822Z · comments (4)
Soares, Tallinn, and Yudkowsky discuss AGI cognition
So8res · 2021-11-29T19:26:33.232Z · comments (39)
[question] What will 2040 probably look like assuming no singularity?
Daniel Kokotajlo (daniel-kokotajlo) · 2021-05-16T22:10:38.542Z · answers+comments (86)
[link] Gene drives: why the wait?
Metacelsus · 2022-09-19T23:37:17.595Z · comments (50)
A proposed method for forecasting transformative AI
Matthew Barnett (matthew-barnett) · 2023-02-10T19:34:01.358Z · comments (21)
LW Petrov Day 2022 (Monday, 9/26)
Ruby · 2022-09-22T02:56:19.738Z · comments (111)
Why I take short timelines seriously
NicholasKees (nick_kees) · 2024-01-28T22:27:21.098Z · comments (29)
How bad a future do ML researchers expect?
KatjaGrace · 2023-03-09T04:50:05.122Z · comments (7)
An Update on Academia vs. Industry (one year into my faculty job)
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2022-09-03T20:43:37.701Z · comments (18)
Choice Writings of Dominic Cummings
Connor_Flexman · 2021-10-13T02:41:44.291Z · comments (75)
Ukraine Situation Report 2022/03/01
lsusr · 2022-03-02T05:07:59.763Z · comments (59)
What Comes After Epistemic Spot Checks?
Elizabeth (pktechgirl) · 2019-10-22T17:00:00.758Z · comments (9)
Land Ho!
Zvi · 2022-01-20T13:30:01.262Z · comments (4)
Quintin's alignment papers roundup - week 1
Quintin Pope (quintin-pope) · 2022-09-10T06:39:01.773Z · comments (6)
Taking the parameters which seem to matter and rotating them until they don't
Garrett Baker (D0TheMath) · 2022-08-26T18:26:47.667Z · comments (48)
Reward Is Not Enough
Steven Byrnes (steve2152) · 2021-06-16T13:52:33.745Z · comments (19)
Propagating Facts into Aesthetics
Raemon · 2019-12-19T04:09:17.816Z · comments (37)
Omicron Variant Post #2
Zvi · 2021-11-29T16:30:01.368Z · comments (34)
[link] Paper: LLMs trained on “A is B” fail to learn “B is A”
lberglund (brglnd) · 2023-09-23T19:55:53.427Z · comments (74)
Moloch and the sandpile catastrophe
Eric Raymond (eric-raymond) · 2022-04-02T15:35:12.552Z · comments (25)
Compendium of problems with RLHF
Charbel-Raphaël (charbel-raphael-segerie) · 2023-01-29T11:40:53.147Z · comments (16)
Natural Latents: The Math
johnswentworth · 2023-12-27T19:03:01.923Z · comments (37)
Reducing sycophancy and improving honesty via activation steering
Nina Panickssery (NinaR) · 2023-07-28T02:46:23.122Z · comments (17)
[link] DontDoxScottAlexander.com - A Petition
Ben Pace (Benito) · 2020-06-25T05:44:50.050Z · comments (32)
Conversation with Eliezer: What do you want the system to do?
Akash (akash-wasil) · 2022-06-25T17:36:14.145Z · comments (38)
[link] Matt Levine on "Fraud is no fun without friends."
Raemon · 2021-01-19T18:23:20.614Z · comments (24)
Convincing All Capability Researchers
Logan Riggs (elriggs) · 2022-04-08T17:40:25.488Z · comments (70)
[link] The Alignment Problem: Machine Learning and Human Values
Rohin Shah (rohinmshah) · 2020-10-06T17:41:21.138Z · comments (7)
My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda
Chi Nguyen · 2020-08-15T20:02:00.205Z · comments (20)
Stampy's AI Safety Info soft launch
steven0461 · 2023-10-05T22:13:04.632Z · comments (9)
GPT-175bee
Adam Scherlis (adam-scherlis) · 2023-02-08T18:58:01.364Z · comments (14)