LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

Universality and the “Filter”
maggiehayes · 2021-12-16T00:47:23.666Z · comments (2)
[link] More power to you
jasoncrawford · 2021-12-15T23:50:18.975Z · comments (14)
My Overview of the AI Alignment Landscape: A Bird's Eye View
Neel Nanda (neel-nanda-1) · 2021-12-15T23:44:31.873Z · comments (9)
SmartPoop 1.0: An AI Safety Science-Fiction
Lê Nguyên Hoang (le-nguyen-hoang-1) · 2021-12-15T22:28:50.615Z · comments (1)
Bay Area Rationalist Field Day
Raj Thimmiah (raj-thimmiah) · 2021-12-15T19:57:01.410Z · comments (1)
Framing approaches to alignment and the hard problem of AI cognition
ryan_greenblatt · 2021-12-15T19:06:52.640Z · comments (15)
South Bay ACX/LW Pre-Holiday Get-Together
IS (is) · 2021-12-15T16:58:31.187Z · comments (0)
Leverage
lsusr · 2021-12-15T05:20:46.287Z · comments (2)
We'll Always Have Crazy
Duncan Sabien (Duncan_Sabien) · 2021-12-15T02:55:29.940Z · comments (22)
2020 Review: The Discussion Phase
Vaniver · 2021-12-15T01:12:44.746Z · comments (14)
The Natural Abstraction Hypothesis: Implications and Evidence
CallumMcDougall (TheMcDouglas) · 2021-12-14T23:14:24.825Z · comments (8)
[link] Robin Hanson's "Humans are Early"
Raemon · 2021-12-14T22:07:00.000Z · comments (0)
Ngo's view on alignment difficulty
Richard_Ngo (ricraz) · 2021-12-14T21:34:50.593Z · comments (7)
A proposed system for ideas jumpstart
Just Learning · 2021-12-14T21:01:00.506Z · comments (2)
Should we rely on the speed prior for safety?
Marc Carauleanu (Marc-Everin Carauleanu) · 2021-12-14T20:45:02.478Z · comments (5)
[link] ARC's first technical report: Eliciting Latent Knowledge
paulfchristiano · 2021-12-14T20:09:50.209Z · comments (90)
ARC is hiring!
paulfchristiano · 2021-12-14T20:09:33.977Z · comments (2)
Interlude: Agents as Automobiles
Daniel Kokotajlo (daniel-kokotajlo) · 2021-12-14T18:49:20.884Z · comments (6)
Zvi’s Thoughts on the Survival and Flourishing Fund (SFF)
Zvi · 2021-12-14T14:30:01.096Z · comments (65)
Consequentialism & corrigibility
Steven Byrnes (steve2152) · 2021-12-14T13:23:02.730Z · comments (27)
Decision Theory Breakdown—Personal Attempt at a Review
Jake Arft-Guatelli · 2021-12-14T00:40:53.625Z · comments (1)
Mystery Hunt 2022
Scott Garrabrant · 2021-12-13T21:57:06.527Z · comments (5)
Enabling More Feedback for AI Safety Researchers
frances_lorenz · 2021-12-13T20:10:07.047Z · comments (0)
Language Model Alignment Research Internships
Ethan Perez (ethan-perez) · 2021-12-13T19:53:32.156Z · comments (1)
Omicron Post #6
Zvi · 2021-12-13T18:00:01.098Z · comments (30)
Analysis of Bird Box (2018)
TekhneMakre · 2021-12-13T17:30:15.045Z · comments (3)
Solving Interpretability Week
Logan Riggs (elriggs) · 2021-12-13T17:09:12.822Z · comments (5)
Understanding and controlling auto-induced distributional shift
L Rudolf L (LRudL) · 2021-12-13T14:59:40.704Z · comments (4)
A fate worse than death?
RomanS · 2021-12-13T11:05:57.729Z · comments (26)
What’s the backward-forward FLOP ratio for Neural Networks?
Marius Hobbhahn (marius-hobbhahn) · 2021-12-13T08:54:48.104Z · comments (12)
Summary of the Acausal Attack Issue for AIXI
Diffractor · 2021-12-13T08:16:26.376Z · comments (6)
Hard-Coding Neural Computation
MadHatter · 2021-12-13T04:35:51.705Z · comments (8)
[question] Is "gears-level" just a synonym for "mechanistic"?
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2021-12-13T04:11:45.159Z · answers+comments (29)
Baby Nicknames
jefftk (jkaufman) · 2021-12-13T02:20:01.861Z · comments (0)
[question] Why do governments refer to existential risks primarily in terms of national security?
Evan_Gaensbauer · 2021-12-13T01:05:29.722Z · answers+comments (3)
[question] [Resolved] Who else prefers "AI alignment" to "AI safety?"
Evan_Gaensbauer · 2021-12-13T00:35:12.215Z · answers+comments (8)
[link] Working through D&D.Sci, problem 1
Pablo Repetto (pablo-repetto-1) · 2021-12-12T23:10:39.553Z · comments (2)
Teaser: Hard-coding Transformer Models
MadHatter · 2021-12-12T22:04:53.092Z · comments (19)
The Three Mutations of Dark Rationality
DarkRationalist · 2021-12-12T22:01:10.890Z · comments (0)
Redwood's Technique-Focused Epistemic Strategy
adamShimi · 2021-12-12T16:36:22.666Z · comments (1)
For and Against Lotteries in Elite University Admissions
Sam Enright (sam-enright) · 2021-12-12T13:41:04.868Z · comments (2)
[question] Nuclear war anthropics
smountjoy · 2021-12-12T04:54:10.637Z · answers+comments (7)
Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment
Rob Bensinger (RobbBB) · 2021-12-12T02:08:08.798Z · comments (35)
Magna Alta Doctrina
jacob_cannell · 2021-12-11T21:54:36.192Z · comments (7)
EA Dinner Covid Logistics
jefftk (jkaufman) · 2021-12-11T21:50:02.587Z · comments (7)
Transforming myopic optimization to ordinary optimization - Do we want to seek convergence for myopic optimization problems?
tailcalled · 2021-12-11T20:38:46.604Z · comments (1)
What on Earth is a Series I savings bond?
rossry · 2021-12-11T12:18:00.392Z · comments (7)
D&D.Sci GURPS Dec 2021: Hunters of Monsters
J Bostock (Jemist) · 2021-12-11T12:13:02.574Z · comments (18)
Anxiety and computer architecture
Adam Zerner (adamzerner) · 2021-12-11T10:37:30.898Z · comments (8)
[question] Reasons to act according to the free will paradigm?
Maciej Jałocha (maciej-jalocha) · 2021-12-11T08:44:38.651Z · answers+comments (5)