LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

Intent alignment should not be the goal for AGI x-risk reduction
John Nay (john-nay) · 2022-10-26T01:24:21.650Z · comments (10)
Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?
StefanHex (Stefan42) · 2022-10-25T20:48:50.895Z · comments (2)
[link] A Walkthrough of A Mathematical Framework for Transformer Circuits
Neel Nanda (neel-nanda-1) · 2022-10-25T20:24:54.638Z · comments (7)
[link] Nothing.
rogersbacon · 2022-10-25T16:33:59.357Z · comments (4)
Maps and Blueprint; the Two Sides of the Alignment Equation
Nora_Ammann · 2022-10-25T16:29:40.202Z · comments (1)
Consider Applying to the Future Fellowship at MIT
jefftk (jkaufman) · 2022-10-25T15:40:03.839Z · comments (0)
Beyond Kolmogorov and Shannon
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2022-10-25T15:13:56.484Z · comments (17)
What does it take to defend the world against out-of-control AGIs?
Steven Byrnes (steve2152) · 2022-10-25T14:47:41.970Z · comments (47)
Refine: what helped me write more?
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2022-10-25T14:44:14.813Z · comments (0)
[link] Logical Decision Theories: Our final failsafe?
Noosphere89 (sharmake-farah) · 2022-10-25T12:51:23.799Z · comments (8)
What will the scaled up GATO look like? (Updated with questions)
Amal (asta-vista) · 2022-10-25T12:44:39.184Z · comments (22)
Mechanism Design for AI Safety - Reading Group Curriculum
Rubi J. Hudson (Rubi) · 2022-10-25T03:54:20.777Z · comments (3)
Furry Rationalists & Effective Anthropomorphism both exist
agentydragon · 2022-10-25T03:37:57.213Z · comments (3)
EA & LW Forums Weekly Summary (17 - 23 Oct 22')
Zoe Williams (GreyArea) · 2022-10-25T02:57:43.696Z · comments (0)
Dance Weekends: Tests not Masks
jefftk (jkaufman) · 2022-10-25T02:10:04.171Z · comments (0)
[question] What is good Cyber Security Advice?
Gunnar_Zarncke · 2022-10-24T23:27:58.428Z · answers+comments (12)
Connections between Mind-Body Problem & Civilizations
oblivion · 2022-10-24T21:55:51.888Z · comments (1)
[question] Rationalism and money
[deleted] · 2022-10-24T21:22:11.505Z · answers+comments (2)
[question] Game semantics
[deleted] · 2022-10-24T21:22:11.272Z · answers+comments (2)
A Good Future (rough draft)
Michael Soareverix (michael-soareverix) · 2022-10-24T20:45:45.029Z · comments (5)
[link] A Barebones Guide to Mechanistic Interpretability Prerequisites
Neel Nanda (neel-nanda-1) · 2022-10-24T20:45:27.938Z · comments (12)
[link] POWERplay: An open-source toolchain to study AI power-seeking
Edouard Harris · 2022-10-24T20:03:57.560Z · comments (0)
Consider trying Vivek Hebbar's alignment exercises
Akash (akash-wasil) · 2022-10-24T19:46:40.847Z · comments (1)
[question] Education not meant for mass-consumption
Tolo · 2022-10-24T19:45:09.165Z · answers+comments (5)
Realizations in Regards to Masculinity
[deleted] · 2022-10-24T19:42:28.603Z · comments (2)
The Futility of Religion
[deleted] · 2022-10-24T19:42:28.520Z · comments (5)
The optimal timing of spending on AGI safety work; why we should probably be spending more now
Tristan Cook · 2022-10-24T17:42:05.865Z · comments (0)
[link] QACI: question-answer counterfactual intervals
Tamsin Leake (carado-1) · 2022-10-24T13:08:54.457Z · comments (0)
AGI in our lifetimes is wishful thinking
niknoble · 2022-10-24T11:53:11.809Z · comments (25)
[link] DeepMind on Stratego, an imperfect information game
sanxiyn · 2022-10-24T05:57:39.462Z · comments (9)
[question] TOMT: Post from 1-2 years ago talking about a paper on social networks
Simon Berens (sberens) · 2022-10-24T01:29:11.453Z · answers+comments (1)
[link] AI researchers announce NeuroAI agenda
Cameron Berg (cameron-berg) · 2022-10-24T00:14:46.574Z · comments (12)
Empowerment is (almost) All We Need
jacob_cannell · 2022-10-23T21:48:55.439Z · comments (44)
"Originality is nothing but judicious imitation" - Voltaire
Vestozia (damien-lasseur) · 2022-10-23T19:00:02.732Z · comments (0)
Mid-Peninsula ACX/LW Meetup [CANCELLED]
moshezadka · 2022-10-23T17:37:54.530Z · comments (0)
[link] I am a Memoryless System
NicholasKross · 2022-10-23T17:34:48.367Z · comments (2)
Accountability Buddies: Why you might want one.
Samuel Nellessen (samuel-nellessen) · 2022-10-23T16:25:12.568Z · comments (3)
How to get past Haidt's elephant and listen
Astynax · 2022-10-23T16:06:20.902Z · comments (4)
Writing Russian and Ukrainian words in Latin script
Viliam · 2022-10-23T15:25:41.855Z · comments (22)
[question] Have you noticed any ways that rationalists differ? [Brainstorming session]
tailcalled · 2022-10-23T11:32:13.368Z · answers+comments (22)
Mnestics
Jarred Filmer (4thWayWastrel) · 2022-10-23T00:30:11.159Z · comments (5)
Telic intuitions across the sciences
mrcbarbier · 2022-10-22T21:31:28.672Z · comments (0)
A basic lexicon of telic concepts
mrcbarbier · 2022-10-22T21:28:10.475Z · comments (0)
Do we have the right kind of math for roles, goals and meaning?
mrcbarbier · 2022-10-22T21:28:04.935Z · comments (5)
[question] The Last Year - is there an existing novel about the last year before AI doom?
Luca Petrolati · 2022-10-22T20:44:58.055Z · answers+comments (4)
The highest-probability outcome can be out of distribution
tailcalled · 2022-10-22T20:00:16.233Z · comments (5)
Newsletter for Alignment Research: The ML Safety Updates
Esben Kran (esben-kran) · 2022-10-22T16:17:18.208Z · comments (0)
Crypto loves impact markets: Notes from Schelling Point Bogotá
Rachel Shu (wearsshoes) · 2022-10-22T15:58:39.101Z · comments (2)
[question] When trying to define general intelligence is ability to achieve goals the best metric?
jmh · 2022-10-22T03:09:51.923Z · answers+comments (0)
[question] Simple question about corrigibility and values in AI.
jmh · 2022-10-22T02:59:15.950Z · answers+comments (1)