LessWrong 2.0 Reader

How a chip is designed
YM (Yannick_Muehlhaeuser_duplicate0.05902100825326273) · 2024-06-28T08:04:27.392Z · comments (4)

Advice to junior AI governance researchers
Akash (akash-wasil) · 2024-07-08T19:19:07.316Z · comments (1)

Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark (hojmax) · 2024-07-22T16:17:07.665Z · comments (0)

[Interim research report] Activation plateaus & sensitive directions in GPT2
StefanHex (Stefan42) · 2024-07-05T17:05:25.631Z · comments (2)

[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)

My 10-year retrospective on trying SSRIs
Kaj_Sotala · 2024-09-22T20:30:02.483Z · comments (6)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

AI #78: Some Welcome Calm
Zvi · 2024-08-22T14:20:10.812Z · comments (15)

What is it to solve the alignment problem?
Joe Carlsmith (joekc) · 2024-08-24T21:19:34.280Z · comments (16)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

Another argument against utility-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (35)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

[link] Investigating an insurance-for-AI startup
L Rudolf L (LRudL) · 2024-09-21T15:29:10.083Z · comments (0)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

Interdictor Ship
lsusr · 2024-08-19T04:59:18.487Z · comments (9)

Book Review: On the Edge: The Fundamentals
Zvi · 2024-09-23T13:40:11.058Z · comments (3)

[link] Linkpost: Surely you can be serious
kave · 2024-07-18T22:18:09.271Z · comments (7)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

What is "True Love"?
johnswentworth · 2024-08-18T16:05:47.358Z · comments (9)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions
Lidor Banuel Dabbah · 2024-07-19T20:32:15.095Z · comments (6)

Showing SAE Latents Are Not Atomic Using Meta-SAEs
Bart Bussmann (Stuckwork) · 2024-08-24T00:56:46.048Z · comments (9)

[link] [Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
chanind · 2024-09-25T09:31:03.296Z · comments (14)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

Self-explaining SAE features
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-05T22:20:36.041Z · comments (13)

John Schulman leaves OpenAI for Anthropic
Sodium · 2024-08-06T01:23:15.427Z · comments (0)

[link] Congressional Insider Trading
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-30T13:32:57.264Z · comments (6)

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
James Fox · 2024-07-06T11:34:57.227Z · comments (7)

Referendum Mechanics in a Marketplace of Ideas
Martin Sustrik (sustrik) · 2024-08-25T08:30:01.901Z · comments (2)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

[link] [EAForum xpost] A breakdown of OpenAI's revenue
dschwarz · 2024-07-10T18:09:20.017Z · comments (5)

Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (3)

Evidence against Learned Search in a Chess-Playing Neural Network
p.b. · 2024-09-13T11:59:55.634Z · comments (3)

The Bitter Lesson for AI Safety Research
adamk · 2024-08-02T18:39:36.884Z · comments (5)

Coalitional agency
Richard_Ngo (ricraz) · 2024-07-22T00:09:51.525Z · comments (6)

... Wait, our models of semantics should inform fluid mechanics?!?
johnswentworth · 2024-08-26T16:38:53.924Z · comments (13)

[link] Demis Hassabis — Google DeepMind: The Podcast
Zach Stein-Perlman · 2024-08-16T00:00:04.712Z · comments (8)

Some Unorthodox Ways To Achieve High GDP Growth
johnswentworth · 2024-08-08T18:58:56.046Z · comments (6)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (2)

[link] Making Eggs Without Ovaries
Niko_McCarty (niko-2) · 2024-09-22T17:44:46.733Z · comments (3)

Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs
Michaël Trazzi (mtrazzi) · 2024-08-24T04:30:11.807Z · comments (0)

[link] Datasets that change the odds you exist
dynomight · 2024-06-29T18:45:14.385Z · comments (4)

A "Bitter Lesson" Approach to Aligning AGI and ASI
RogerDearnaley (roger-d-1) · 2024-07-06T01:23:22.376Z · comments (39)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

Recent comments

towards_keeperhood on Fluent, Cruxy Predictions

If you do end up installing the Fatebook browser extension and making some predictions, let me know!

I did.

philh on [Completed] The 2024 Petrov Day Scenario

Yeah, if that was the only consideration I think I would have created the market myself.

amalthea on Mira Murati leaves OpenAI/ OpenAI to remove non-profit control

Some reporting on this: https://www.vox.com/future-perfect/374275/openai-just-sold-you-out

sherrinford on Sherrinford's Shortform

When I write a post and select text, a menu appears where I can select text appearance properties etc. However, in my latest post, this menu does not appear when I edit the post and select text. Any idea why that could be the case?

benito on A Path out of Insufficient Views

Okay, sounds like I have misunderstood you.

Sure, I can retry.

My next attempt to pass your ITT is thus:

Broadly when gaining new skills we go from doing what feels natural, to doing things differently within rigid structures, to getting good at them, to releasing the structures and then just doing what comes naturally. And often afterwards it is both more effective and also comes more naturally than it did before.
Some people seem trapped in the middle step on certain things. They always practice music with a metronome ticking in order to keep the beat, they never trust themselves to just feel it. They always leave the party without drinking, never trusting themselves to behave well and have fun with it. They always need an explicit theory guiding their overall trajectory in life (e.g. career decisions involving spreadsheets), they can never make a major life decision because it feels good in their gut. They always have to discuss purchases over $1,000 with their spouse and sleep on it, they never feel comfortable just going with something that feels right in the moment.
Such people have successfully found useful structures, but are also trapped in them, never venturing forward into the world themselves, always bound by the formalities. This limits their personhood and humanity from coming through, it bounds them to only be as good as the structures they've adopted.
Insofar as you name a structure or set of rules for living life, you are always bound by them and will never let your humanity outshine them.

How close is this to what you're saying, from 1 to 10?

neil-warren on [Completed] The 2024 Petrov Day Scenario

Ah right, the decades part--I had written about the 1930 revolution, commune, and bourbon destitution, then checked the dates online and stupidly thought "ah, it must be just 1815 then" and only talked about that. Thanks

sherrinford on Sherrinford's Shortform

That would be great, but maybe it is covered much more in your bubble than in large newspapers etc? Moreover, if this is covered like the OpenAI-internal fight last year, the typical news outlet comment will be: "crazy sci-fi cult paranoid people are making noise about this totally sensible change in the institutional structure of this very productive firm!"

wei-dai on Wei Dai's Shortform

As a tangent to my question, I wonder how many AI companies are already using RLAIF and not even aware of it. From a recent WSJ story:

Early last year, Meta Platforms asked the startup to create 27,000 question-and-answer pairs to help train its AI chatbots on Instagram and Facebook.

When Meta researchers received the data, they spotted something odd. Many answers sounded the same, or began with the phrase “as an AI language model…” It turns out the contractors had used ChatGPT to write-up their responses—a complete violation of Scale’s raison d’être.

So they detected the cheating that time, but in RLHF how would they know if contractors used AI to select which of two AI responses is more preferred?

BTW here's a poem(?) I wrote for Twitter, actually before coming across the above story:

The people try to align the board. The board tries to align the CEO. The CEO tries to align the managers. The managers try to align the employees. The employees try to align the contractors. The contractors sneak the work off to the AI. The AI tries to align the AI.

generalbelov on [Completed] The 2024 Petrov Day Scenario

I surveyed some participants about their preferences. I believe this is nine generals plus a petrov (second from bottom).

9 people prefer peace; 1 deranged person says "peace = mutual destruction" and "I was kinda hoping the other side would launch the nukes"; fortunately we survived them.

If one side had admitted to launching nukes, it looks like at least one person on the other side would have favored retaliating; unclear whether they'd get a majority (and unclear whether they'd launch without a majority).

I agree with the long note. I think anonymity is ~necessary to get decent P(defection) from a small group of high-karma users. But there are issues with that and the ritual is fine with low P(destruction).

redman on Rabin's Paradox

That $769 number might be more relevant than you expect for college undergrads participating in weird psychology research studies for $10 or $25 depending on the study.