LessWrong 2.0 Reader

Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders
Evan Anders (evan-anders) · 2024-02-27T02:43:22.446Z · comments (16)

Koan: divining alien datastructures from RAM activations
TsviBT · 2024-04-05T18:04:57.280Z · comments (10)

[link] Surgery Works Well Without The FDA
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-26T13:31:29.968Z · comments (28)

US Presidential Election: Tractability, Importance, and Urgency
kuhanj · 2024-05-29T23:52:22.420Z · comments (2)

Case studies on social-welfare-based standards in various industries
HoldenKarnofsky · 2024-06-20T13:33:44.780Z · comments (0)

[Valence series] 5. “Valence Disorders” in Mental Health & Personality
Steven Byrnes (steve2152) · 2023-12-18T15:26:29.970Z · comments (12)

When fine-tuning fails to elicit GPT-3.5's chess abilities
Theodore Chapman · 2024-06-14T18:50:52.855Z · comments (3)

NYT is suing OpenAI&Microsoft for alleged copyright infringement; some quick thoughts
Mikhail Samin (mikhail-samin) · 2023-12-27T18:44:33.976Z · comments (17)

[link] you should probably eat oatmeal sometimes
bhauth · 2024-08-25T14:50:37.570Z · comments (31)

MATS AI Safety Strategy Curriculum v2
DanielFilan · 2024-10-07T22:44:06.396Z · comments (6)

[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (9)

Time Efficient Resistance Training
romeostevensit · 2024-10-07T15:15:44.950Z · comments (8)

Unit economics of LLM APIs
dschwarz · 2024-08-27T16:51:22.692Z · comments (0)

[link] Point of Failure: Semiconductor-Grade Quartz
Annapurna (jorge-velez) · 2024-09-30T15:57:40.495Z · comments (8)

Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)

Book Review: 1948 by Benny Morris
Yair Halberstadt (yair-halberstadt) · 2023-12-03T10:29:16.696Z · comments (9)

Concrete positive visions for a future without AGI
Max H (Maxc) · 2023-11-08T03:12:42.590Z · comments (28)

On plans for a functional society
kave · 2023-12-12T00:07:46.629Z · comments (8)

Estimating effective dimensionality of MNIST models
Arjun Panickssery (arjun-panickssery) · 2023-11-02T14:13:09.012Z · comments (3)

What makes teaching math special
Viliam · 2023-12-17T14:15:01.136Z · comments (27)

How Emergency Medicine Solves the Alignment Problem
StrivingForLegibility · 2023-12-26T05:24:35.579Z · comments (4)

[link] What's new at FAR AI
AdamGleave · 2023-12-04T21:18:03.951Z · comments (0)

[link] Jailbreak steering generalization
Sarah Ball · 2024-06-20T17:25:24.110Z · comments (4)

Pivotal Acts might Not be what You Think they are
Johannes C. Mayer (johannes-c-mayer) · 2023-11-05T17:23:50.464Z · comments (13)

[question] What did you change your mind about in the last year?
mike_hawke · 2023-11-23T20:53:45.664Z · answers+comments (16)

[link] Podcast with Yoshua Bengio on Why AI Labs are “Playing Dice with Humanity’s Future”
garrison · 2024-05-10T17:23:20.436Z · comments (0)

The Perils of Professionalism
Screwtape · 2023-11-07T00:07:33.213Z · comments (1)

[link] Beyond the Board: Exploring AI Robustness Through Go
AdamGleave · 2024-06-19T16:40:06.594Z · comments (2)

Goals selected from learned knowledge: an alternative to RL alignment
Seth Herd · 2024-01-15T21:52:06.170Z · comments (17)

Matrix completion prize results
paulfchristiano · 2023-12-20T15:40:04.281Z · comments (0)

One-shot strategy games?
Raemon · 2024-03-11T00:19:20.480Z · comments (42)

Notes on Dwarkesh Patel’s Podcast with Sholto Douglas and Trenton Bricken
Zvi · 2024-04-01T19:10:12.193Z · comments (1)

GPT-4o My and Google I/O Day
Zvi · 2024-05-16T17:50:03.040Z · comments (2)

A Teacher vs. Everyone Else
ronak69 · 2024-03-21T17:45:35.714Z · comments (8)

AI Risk and the US Presidential Candidates
Zane · 2024-01-06T20:18:04.945Z · comments (22)

Superintelligent AI is possible in the 2020s
HunterJay · 2024-08-13T06:03:26.990Z · comments (3)

The Pointer Resolution Problem
Jozdien · 2024-02-16T21:25:57.374Z · comments (20)

[link] What's important in "AI for epistemics"?
Lukas Finnveden (Lanrian) · 2024-08-24T01:27:06.771Z · comments (0)

[Aspiration-based designs] 1. Informal introduction
B Jacobs (Bob Jacobs) · 2024-04-28T13:00:43.268Z · comments (4)

Book review: The Quincunx
cousin_it · 2024-06-05T21:13:55.055Z · comments (12)

[link] [Paper] Programming Refusal with Conditional Activation Steering
Bruce W. Lee (bruce-lee) · 2024-09-11T20:57:08.714Z · comments (0)

[link] Things I learned talking to the new breed of scientific institution
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-29T14:00:14.844Z · comments (6)

[link] Adverse Selection by Life-Saving Charities
vaishnav92 · 2024-08-14T20:46:23.662Z · comments (16)

Surviving Seveneves
Yair Halberstadt (yair-halberstadt) · 2024-06-19T13:11:55.414Z · comments (4)

How ARENA course material gets made
CallumMcDougall (TheMcDouglas) · 2024-07-02T18:04:00.209Z · comments (2)

(Approximately) Deterministic Natural Latents
johnswentworth · 2024-07-19T23:02:12.306Z · comments (0)

Long-Term Future Fund: May 2023 to March 2024 Payout recommendations
Linch · 2024-06-12T13:46:29.535Z · comments (0)

[link] Book review: Cuisine and Empire
eukaryote · 2024-01-21T06:15:12.969Z · comments (2)

[link] Progress Conference 2024: Toward Abundant Futures
jasoncrawford · 2024-06-26T15:39:45.267Z · comments (2)

Choosing My Quest (Part 2 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-02-24T21:31:45.377Z · comments (7)

Recent comments

rogerdearnaley on Motivation control

Opacity: if you could directly inspect an AI’s motivations (or its cognition more generally), this would help a lot. But you can’t do this with current ML models.

The ease with which Anthropic's model organisms of misalignment were diagnosed by a simple and obvious linear probe suggests otherwise. So does the number of elements in SAE feature dictionaries that describe emotions, motivations, and behavioral patterns. Current ML models are no longer black boxes; they are rapidly becoming translucent grey boxes.

elityre on avturchin's Shortform

You have been attacked by a pack of stray dogs twice?!?!

clone-of-saturn on The Alignment Trap: AI Safety as Path to Power

Can anyone lay out a semi-plausible scenario where humanity survives but isn't dominated by an AI or posthuman god-king? I can't really picture it. I always thought that's what we were going for since it's better than being dead.

czynski on Lighthaven Sequences Reading Group #8 (Tuesday 10/29)

Could you please announce these further in advance? Especially given the reading required beforehand, it's inconvenient and honestly seems a little inconsiderate.

matthew4244 on Chapter 45: Humanism, Pt 3

Great chapter, Great message. +1

maxwell-peterson on The central limit theorem in terms of convolutions

The integral was incorrect! Fixed now, thanks! Also added the (f * g)(x) to the equality for those who find that notation better (I've just discovered that GPT-4o prefers it too). Cheers!

daphne_w on The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!

The Demon King does not solely attack the Frozen Fortress to profit on prediction markets. The story tells us that the demons engage in regular large-scale attacks, large enough to serve as demon population control. There is no indication that these attacks decreased in size when they were accompanied with market manipulation (and if they did, that would be a win in and of itself).

So the prediction market's counterfactual is not that the Demon King's forces don't attack, but that they attack at an indeterminate time with the same approximate frequency and strength. By letting the Demon King buy and profit from "demon attack on day X" shares, the Circular Citadel learns with decently high probability when these attacks take place and can allocate its resources more effectively. Hire mercenaries on days the probability is above 90%, focus on training and recruitment on days of low-but-typical probability, etc.

This ability to allocate resources more efficiently has value, which is why the Heroine organized the prediction market in the first place. The only thing that doesn't go according to the Heroine's liking is that the Circular Citadel buys that information from the Demon King rather than from 'the invisible hand of the market'.

more generally the Demon King would only do this if the information revealed weren't worth the market cost

The Demon King would sell the information as soon as she thinks it is in her best interests, which is different from it being bad for the Circular Citadel. Especially considering the Circular Citadel doesn't even have to pay the full cost of the information - everyone who bets is also paying.

It is very possible that the Demon King and the Circular Citadel both profit from the prediction market existing, while the demon ground forces and naive prediction market bettors lose.

ryankidd44 on Ryan Kidd's Shortform

Hourly stipends for AI safety fellowship programs, plus some referents. The average AI safety program stipend is $27/h.

kave on Habryka's Shortform Feed

One sad thing about older versions of Gill Sans: Il1 all look the same. Nova at least distinguishes the 1.

IMO, we should probably move towards system fonts, though I would like to choose something that preserves character a little more.

sharmake-farah on A path to human autonomy

There should probably be a dialogue between you and @Vladimir_Nesov over how much algorithmic improvements actually work to make AI more powerful, since this might reveal cruxes and help everyone else prepare better for the various AI scenarios.