LessWrong 2.0 Reader




[link] My hour of memoryless lucidity
Eric Neyman (UnexpectedValues) · 2024-05-04T01:40:56.717Z · comments (23)
Ilya Sutskever and Jan Leike resign from OpenAI [updated]
Zach Stein-Perlman · 2024-05-15T00:45:02.436Z · comments (90)
Dyslucksia
Shoshannah Tekofsky (DarkSym) · 2024-05-09T19:21:33.874Z · comments (42)
Deep Honesty
Aletheophile (aletheo) · 2024-05-07T20:31:48.734Z · comments (26)
DeepMind's "Frontier Safety Framework" is weak and unambitious
Zach Stein-Perlman · 2024-05-18T03:00:13.541Z · comments (12)
Do you believe in hundred dollar bills lying on the ground? Consider humming
Elizabeth (pktechgirl) · 2024-05-16T00:00:05.257Z · comments (12)
Language Models Model Us
eggsyntax · 2024-05-17T21:00:34.821Z · comments (32)
[link] introduction to cancer vaccines
bhauth · 2024-05-05T01:06:16.972Z · comments (19)
Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)
Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)
[link] Advice for Activists from the History of Environmentalism
Jeffrey Heninger (jeffrey-heninger) · 2024-05-16T18:40:02.064Z · comments (5)
Teaching CS During Take-Off
andrew carle (andrew-carle) · 2024-05-14T22:45:39.447Z · comments (11)
[link] Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
Olli Järviniemi (jarviniemi) · 2024-05-06T07:07:05.019Z · comments (4)
We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
Lukas_Gloor · 2024-05-09T15:43:11.490Z · comments (35)
[link] "AI Safety for Fleshy Humans" an AI Safety explainer by Nicky Case
habryka (habryka4) · 2024-05-03T18:10:12.478Z · comments (10)
[link] MIRI's May 2024 Newsletter
Harlan · 2024-05-15T00:13:30.153Z · comments (1)
AISafety.com – Resources for AI Safety
Søren Elverlin (soren-elverlin-1) · 2024-05-17T15:57:11.712Z · comments (2)
MATS Winter 2023-24 Retrospective
Rocket (utilistrutil) · 2024-05-11T00:09:17.059Z · comments (28)
AXRP Episode 31 - Singular Learning Theory with Daniel Murfet
DanielFilan · 2024-05-07T03:50:05.001Z · comments (4)
[link] My thesis (Algorithmic Bayesian Epistemology) explained in more depth
Eric Neyman (UnexpectedValues) · 2024-05-09T19:43:16.543Z · comments (4)
[link] Environmentalism in the United States Is Unusually Partisan
Jeffrey Heninger (jeffrey-heninger) · 2024-05-13T21:23:10.755Z · comments (11)
Introducing AI-Powered Audiobooks of Rational Fiction Classics
Askwho · 2024-05-04T17:32:49.719Z · comments (13)
[link] DeepMind: Frontier Safety Framework
Zach Stein-Perlman · 2024-05-17T17:30:02.504Z · comments (0)
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Joar Skalse (Logical_Lunatic) · 2024-05-17T19:13:31.380Z · comments (5)
How to be an amateur polyglot
arisAlexis (arisalexis) · 2024-05-08T15:08:11.404Z · comments (16)
OpenAI: Exodus
Zvi · 2024-05-20T13:10:03.543Z · comments (8)
[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (13)
[link] Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Gunnar_Zarncke · 2024-05-16T13:09:39.265Z · comments (4)
Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot_Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)
Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (0)
[link] Jaan Tallinn's 2023 Philanthropy Overview
jaan · 2024-05-20T12:11:39.416Z · comments (0)
[link] OpenAI releases GPT-4o, natively interfacing with text, voice and vision
Martín Soto (martinsq) · 2024-05-13T18:50:52.337Z · comments (23)
[link] Questions are usually too cheap
Nathan Young · 2024-05-11T13:00:54.302Z · comments (19)
some thoughts on LessOnline
Raemon · 2024-05-08T23:17:41.372Z · comments (5)
[link] Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun (Daniel Braun) · 2024-05-17T16:25:02.267Z · comments (2)
Can we build a better Public Doublecrux?
Raemon · 2024-05-11T19:21:53.326Z · comments (7)
Why Care About Natural Latents?
johnswentworth · 2024-05-09T23:14:30.626Z · comments (3)
Why you should learn a musical instrument
cata · 2024-05-15T20:36:16.034Z · comments (23)
Observations on Teaching for Four Weeks
ClareChiaraVincent · 2024-05-06T16:55:59.315Z · comments (14)
Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (7)
[link] Designing for a single purpose
Itay Dreyfus (itay-dreyfus) · 2024-05-07T14:11:22.242Z · comments (12)
Mechanistic Interpretability Workshop Happening at ICML 2024!
Neel Nanda (neel-nanda-1) · 2024-05-03T01:18:26.936Z · comments (6)
How to do conceptual research: Case study interview with Caspar Oesterheld
Chi Nguyen · 2024-05-14T15:09:30.390Z · comments (5)
Dating Roundup #3: Third Time’s the Charm
Zvi · 2024-05-08T13:30:03.232Z · comments (26)
The Dunning-Kruger of disproving Dunning-Kruger
kromem · 2024-05-16T10:11:33.108Z · comments (0)
[link] "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"
plex (ete) · 2024-05-18T14:09:53.014Z · comments (22)
Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence
Towards_Keeperhood (Simon Skade) · 2024-05-06T17:09:10.729Z · comments (14)
[question] Does reducing the amount of RL for a given capability level make AI safer?
Chris_Leong · 2024-05-05T17:04:01.799Z · answers+comments (22)
Some Experiments I'd Like Someone To Try With An Amnestic
johnswentworth · 2024-05-04T22:04:19.692Z · comments (30)
New intro textbook on AIXI
Alex_Altair · 2024-05-11T18:18:50.945Z · comments (4)
next page (older posts) →