LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

How To Go From Interpretability To Alignment: Just Retarget The Search
johnswentworth · 2022-08-10T16:08:11.402Z · comments (33)

Postmortem on DIY Recombinant Covid Vaccine
caffemacchiavelli · 2022-01-22T14:12:58.030Z · comments (27)

What AI Safety Materials Do ML Researchers Find Compelling?
Vael Gates · 2022-12-28T02:03:31.894Z · comments (34)

The next decades might be wild
Marius Hobbhahn (marius-hobbhahn) · 2022-12-15T16:10:04.750Z · comments (42)

AGI ruin scenarios are likely (and disjunctive)
So8res · 2022-07-27T03:21:57.615Z · comments (38)

Finite Factored Sets in Pictures
Magdalena Wache · 2022-12-11T18:49:00.000Z · comments (35)

7 traps that (we think) new alignment researchers often fall into
Akash (akash-wasil) · 2022-09-27T23:13:46.697Z · comments (10)

Some conceptual alignment research projects
Richard_Ngo (ricraz) · 2022-08-25T22:51:33.478Z · comments (15)

The inordinately slow spread of good AGI conversations in ML
Rob Bensinger (RobbBB) · 2022-06-21T16:09:57.859Z · comments (62)

Deliberate Grieving
Raemon · 2022-05-30T20:49:19.860Z · comments (16)

Russia has Invaded Ukraine
lsusr · 2022-02-24T07:52:44.533Z · comments (268)

Butterfly Ideas
Elizabeth (pktechgirl) · 2022-02-22T07:40:08.072Z · comments (10)

What's Up With Confusingly Pervasive Goal Directedness?
Raemon · 2022-01-20T19:22:37.515Z · comments (89)

Intro to Naturalism: Orientation
LoganStrohl (BrienneYudkowsky) · 2022-02-13T07:52:03.503Z · comments (23)

AI Could Defeat All Of Us Combined
HoldenKarnofsky · 2022-06-09T15:50:12.952Z · comments (42)

Do bamboos set themselves on fire?
Malmesbury (Elmer of Malmesbury) · 2022-09-19T15:34:13.574Z · comments (14)

Using GPT-Eliezer against ChatGPT Jailbreaking
Stuart_Armstrong · 2022-12-06T19:54:54.854Z · comments (85)

Searching for outliers
benkuhn · 2022-03-21T02:40:17.296Z · comments (16)

Transcripts of interviews with AI researchers
Vael Gates · 2022-05-09T05:57:15.872Z · comments (9)

Announcing the Inverse Scaling Prize ($250k Prize Pool)
Ethan Perez (ethan-perez) · 2022-06-27T15:58:19.135Z · comments (14)

[link] Things that can kill you quickly: What everyone should know about first aid
jasoncrawford · 2022-12-27T16:23:24.831Z · comments (21)

IMO challenge bet with Eliezer
paulfchristiano · 2022-02-26T04:50:06.033Z · comments (25)

Impossibility results for unbounded utilities
paulfchristiano · 2022-02-02T03:52:18.780Z · comments (109)

Deepmind's Gato: Generalist Agent
Daniel Kokotajlo (daniel-kokotajlo) · 2022-05-12T16:01:21.803Z · comments (62)

[Beta Feature] Google-Docs-like editing for LessWrong posts
Ruby · 2022-02-23T01:52:22.141Z · comments (26)

[link] The Social Recession: By the Numbers
antonomon · 2022-10-29T18:45:09.001Z · comments (29)

Decision theory does not imply that we get to have nice things
So8res · 2022-10-18T03:04:48.682Z · comments (58)

Playing with DALL·E 2
Dave Orr (dave-orr) · 2022-04-07T18:49:16.301Z · comments (118)

The prototypical catastrophic AI action is getting root access to its datacenter
Buck · 2022-06-02T23:46:31.360Z · comments (13)

Be less scared of overconfidence
benkuhn · 2022-11-30T15:20:07.738Z · comments (22)

Why I think there's a one-in-six chance of an imminent global nuclear war
Max Tegmark (MaxTegmark) · 2022-10-08T06:26:40.235Z · comments (169)

A transparency and interpretability tech tree
evhub · 2022-06-16T23:44:14.961Z · comments (11)

A Bird's Eye View of the ML Field [Pragmatic AI Safety #2]
Dan H (dan-hendrycks) · 2022-05-09T17:18:53.978Z · comments (6)

Shard Theory: An Overview
David Udell · 2022-08-11T05:44:52.852Z · comments (34)

Most People Start With The Same Few Bad Ideas
johnswentworth · 2022-09-09T00:29:12.740Z · comments (30)

On A List of Lethalities
Zvi · 2022-06-13T12:30:01.624Z · comments (49)

ITT-passing and civility are good; "charity" is bad; steelmanning is niche
Rob Bensinger (RobbBB) · 2022-07-05T00:15:36.308Z · comments (36)

Logical induction for software engineers
Alex Flint (alexflint) · 2022-12-03T19:55:35.474Z · comments (8)

Planes are still decades away from displacing most bird jobs
guzey · 2022-11-25T16:49:32.344Z · comments (13)

Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon
johnswentworth · 2022-04-15T19:05:46.442Z · comments (128)

Why all the fuss about recursive self-improvement?
So8res · 2022-06-12T20:53:42.392Z · comments (62)

«Boundaries», Part 1: a key missing concept from utility theory
Andrew_Critch · 2022-07-26T23:03:55.941Z · comments (32)

Repeal the Foreign Dredge Act of 1906
Zvi · 2022-05-05T15:20:01.739Z · comments (16)

The Onion Test for Personal and Institutional Honesty
chanamessinger (cmessinger) · 2022-09-27T15:26:34.567Z · comments (31)

[link] Six (and a half) intuitions for KL divergence
CallumMcDougall (TheMcDouglas) · 2022-10-12T21:07:07.796Z · comments (25)

Nonprofit Boards are Weird
HoldenKarnofsky · 2022-06-23T14:40:11.593Z · comments (26)

Nate Soares' Life Advice
CatGoddess · 2022-08-23T02:46:43.369Z · comments (41)

Emotionally Confronting a Probably-Doomed World: Against Motivation Via Dignity Points
TurnTrout · 2022-04-10T18:45:08.027Z · comments (7)

LessWrong Has Agree/Disagree Voting On All New Comment Threads
Ben Pace (Benito) · 2022-06-24T00:43:17.136Z · comments (217)

Staying Split: Sabatini and Social Justice
[DEACTIVATED] Duncan Sabien (Duncan_Sabien) · 2022-06-08T08:32:58.633Z · comments (28)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

stephen-fowler on D0TheMath's Shortform

great link!

stephen-fowler on Stephen Fowler's Shortform

Very Spicy Take

Epistemic Note:
Many highly respected community members with substantially greater decision making experience (and Lesswrong karma) presumably disagree strongly with my conclusion.

Premise 1:
It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.

Premise 2:
This was the default outcome.

Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.

Premise 3:
Without repercussions for terrible decisions, decision makers have no skin in the game.

Conclusion:
Anyone and everyone involved with Open Phil recommending a grant of $30 million dollars be given to OpenAI in 2017 shouldn't be allowed anywhere near AI Safety decision making in the future.

To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties.

This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved.

To quote OpenPhil:
"OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela."

matthew-barnett on Instruction-following AGI is easier and more likely than value aligned AGI

I also expect AIs to be constrained by social norms, laws, and societal values. But I think there's a distinction between how AIs will be constrained and how AIs will try to help humans. Although it often censors certain topics, Google still usually delivers the results the user wants, rather than serving some broader social agenda upon each query. Likewise, ChatGPT is constrained by social mores, but it's still better described as a user assistant, not as an engine for social change or as a benevolent agent that acts on behalf of humanity.

wassname on Instruction-following AGI is easier and more likely than value aligned AGI

When you rephrase this to be about search engines

I think the main reason why we won't censor search to some abstract conception of "community values" is because users won't want to rent or purchase search services that are censor to such a broad target

It doesn't describe reality. Most of us consume search and recommendations that has been censored (e.g. removing porn, piracy, toxicity, racism, taboo politics) in a way that pus cultural values over our preferences or interests.

So perhaps it won't be true for AI either. At least in the near term, the line between AI and search is a blurred line, and the same pressures exist on consumers and providers.

wassname on romeostevensit's Shortform

A before and after would be even better!

8e9 on Language Models Model Us

note that the Brier score at the bottom is a few percentage points lower than what's shown in the chart; the probability distributions GPT outputs differ a bit between runs despite a temperature of 0

It's now possible to get mostly deterministic outputs if you set the seed parameter to an integer of your choice, the other parameters are identical, and the model hasn't been updated.

d0themath on D0TheMath's Shortform

A Theory of Usable Information Under Computational Constraints

We propose a new framework for reasoning about information in complex systems. Our foundation is based on a variational extension of Shannon's information theory that takes into account the modeling power and computational constraints of the observer. The resulting \emph{predictive V-information} encompasses mutual information and other notions of informativeness such as the coefficient of determination. Unlike Shannon's mutual information and in violation of the data processing inequality, V-information can be created through computation. This is consistent with deep neural networks extracting hierarchies of progressively more informative features in representation learning. Additionally, we show that by incorporating computational constraints, V-information can be reliably estimated from data even in high dimensions with PAC-style guarantees. Empirically, we demonstrate predictive V-information is more effective than mutual information for structure learning and fair representation learning.

h/t Simon Pepin Lehalleur

honest_annie on Ilya Sutskever and Jan Leike resign from OpenAI

Organizational structure is an alignment mechanism.

While I sympathize with the stated intentions, I just can't wrap my head around the naivety. OpenAI corporate structure was a recipe for bad corporate governance. "We are the good guys here, the structure is needed to make others align with us."- an organization where ethical people can rule as benevolent dictators is the same mistake committed socialists made when they had power.

If it was that easy, AI alignment would be solved by creating ethical AI committed to alignment and giving it as much power as possible.

Altruists are normal humans. Nothing changes priorities faster than large sums of money. Any mix of ideals and profit-making must be arranged in a way that concerns don't mix. People in charge of non-profits making life-changing money if the profit-making is a success can't work.

Bad organizational structure puts well meaning humans like Sutskever repeteatly into position where he must choose between wast sums of money or his ethical commitments.

egi on Reconsider the anti-cavity bacteria if you are Asian

What you are missing here is that S. mutants often lives in pockets between tooth an epithelium or between teeth with direct permanent contact to epithelium. Due to the geometry of these spaces access to saliva is very poor so metabolites can enrich to levels way beyond those you suggest here.

This mechanism is also a big problem with the pH study above.

drbm on A Dozen Ways to Get More Dakka

hat's fantastic to hear! I am thrilled the information was helpful for me.