LessWrong 2.0 Reader

Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (0)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (1)

SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)

Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (7)

[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (4)

Why is there Nothing rather than Something?
Logan Zoellner (logan-zoellner) · 2024-10-26T12:37:50.204Z · comments (3)

[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

Evaluating Solar
jefftk (jkaufman) · 2024-02-17T21:50:04.783Z · comments (5)

Exploring OpenAI's Latent Directions: Tests, Observations, and Poking Around
Johnny Lin (hijohnnylin) · 2024-01-31T06:01:27.969Z · comments (4)

[link] How to Upload a Mind (In Three Not-So-Easy Steps)
aggliu · 2023-11-13T18:13:32.893Z · comments (0)

Smartphone Etiquette: Suggestions for Social Interactions
Declan Molony (declan-molony) · 2024-06-04T06:01:03.336Z · comments (4)

Losing Metaphors: Zip and Paste
jefftk (jkaufman) · 2023-11-29T20:31:07.464Z · comments (6)

Ideas for Next-Generation Writing Platforms, using LLMs
ozziegooen · 2024-06-04T18:40:24.636Z · comments (4)

[link] Forecasting future gains due to post-training enhancements
elifland · 2024-03-08T02:11:57.228Z · comments (2)

[link] Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles
Zack_M_Davis · 2024-03-02T22:05:49.553Z · comments (22)

[link] my favourite Scott Sumner blog posts
DMMF · 2024-06-11T14:40:43.093Z · comments (0)

[link] Emotional issues often have an immediate payoff
Chipmonk · 2024-06-10T23:39:40.697Z · comments (2)

Evidential Correlations are Subjective, and it might be a problem
Martín Soto (martinsq) · 2024-03-07T18:37:54.105Z · comments (6)

Singular learning theory and bridging from ML to brain emulations
kave · 2023-11-01T21:31:54.789Z · comments (16)

Geometric Utilitarianism (And Why It Matters)
StrivingForLegibility · 2024-05-12T03:41:21.342Z · comments (2)

The Sequences on YouTube
Neil (neil-warren) · 2024-01-07T01:44:39.663Z · comments (9)

A list of all the deadlines in Biden's Executive Order on AI
Valentin Baltadzhiev (valentin-baltadzhiev) · 2023-11-01T17:14:31.074Z · comments (2)

The Limitations of GPT-4
p.b. · 2023-11-24T15:30:30.933Z · comments (12)

Quick takes on "AI is easy to control"
So8res · 2023-12-02T22:31:45.683Z · comments (49)

Meetup In a Box: Year In Review
Czynski (JacobKopczynski) · 2024-02-14T01:18:28.259Z · comments (0)

Three Types of Constraints in the Space of Agents
Nora_Ammann · 2024-01-15T17:27:27.560Z · comments (3)

Causality is Everywhere
silentbob · 2024-02-13T13:44:49.952Z · comments (12)

D&D.Sci Hypersphere Analysis Part 3: Beat it with Linear Algebra
aphyer · 2024-01-16T22:44:52.424Z · comments (1)

Vote in the LessWrong review! (LW 2022 Review voting phase)
habryka (habryka4) · 2024-01-17T07:22:17.921Z · comments (9)

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition
Adrià Moret (Adrià R. Moret) · 2023-12-02T14:07:29.992Z · comments (31)

Am I going insane or is the quality of education at top universities shockingly low?
ChrisRumanov (pseudonymous-ai) · 2023-11-20T03:53:30.056Z · comments (30)

What is the best argument that LLMs are shoggoths?
JoshuaFox · 2024-03-17T11:36:23.636Z · comments (22)

Bayesian inference without priors
DanielFilan · 2024-04-24T23:50:08.312Z · comments (8)

AI debate: test yourself against chess 'AIs'
Richard Willis · 2023-11-22T14:58:10.847Z · comments (35)

Links and brief musings for June
Kaj_Sotala · 2024-07-06T10:10:03.344Z · comments (0)

Facebook is Paying Me to Post
jefftk (jkaufman) · 2023-11-14T19:10:07.303Z · comments (5)

Why I think it's net harmful to do technical safety research at AGI labs
Remmelt (remmelt-ellen) · 2024-02-07T04:17:15.246Z · comments (24)

Consequentialism is a compass, not a judge
Neil (neil-warren) · 2024-04-13T10:47:44.980Z · comments (6)

Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)

Agent membranes/boundaries and formalizing “safety”
Chipmonk · 2024-01-03T17:55:21.018Z · comments (46)

[link] Attention on AI X-Risk Likely Hasn't Distracted from Current Harms from AI
Erich_Grunewald · 2023-12-21T17:24:16.713Z · comments (2)

[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

[question] Thoughts on Francois Chollet's belief that LLMs are far away from AGI?
O O (o-o) · 2024-06-14T06:32:48.170Z · answers+comments (17)

Talk: AI safety fieldbuilding at MATS
Ryan Kidd (ryankidd44) · 2024-06-23T23:06:37.623Z · comments (2)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

[link] Manifold Markets
PeterMcCluskey · 2024-02-02T17:48:36.630Z · comments (9)

Recent comments

rogerdearnaley on Motivation control

Opacity: if you could directly inspect an AI’s motivations (or its cognition more generally), this would help a lot. But you can’t do this with current ML models.

The ease with which Anthropic's model organisms of misalignment were diagnosed by a simple and obvious linear probe suggests otherwise. So does the number of elements in SAE feature dictionaries that describe emotions, motivations, and behavioral patterns. Current ML models are no longer black boxes; they are rapidly becoming translucent grey boxes.

elityre on avturchin's Shortform

You have been attacked by a pack of stray dogs twice?!?!

clone-of-saturn on The Alignment Trap: AI Safety as Path to Power

Can anyone lay out a semi-plausible scenario where humanity survives but isn't dominated by an AI or posthuman god-king? I can't really picture it. I always thought that's what we were going for since it's better than being dead.

czynski on Lighthaven Sequences Reading Group #8 (Tuesday 10/29)

Could you please announce these further in advance? Especially given the reading required beforehand, it's inconvenient and honestly seems a little inconsiderate.

matthew4244 on Chapter 45: Humanism, Pt 3

Great chapter, great message. +1

maxwell-peterson on The central limit theorem in terms of convolutions

The integral was incorrect! Fixed now, thanks! Also added the (f * g)(x) to the equality for those who find that notation better (I've just discovered that GPT-4o prefers it too). Cheers!

daphne_w on The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!

The Demon King does not solely attack the Frozen Fortress to profit on prediction markets. The story tells us that the demons engage in regular large-scale attacks, large enough to serve as demon population control. There is no indication that these attacks decreased in size when they were accompanied by market manipulation (and if they did, that would be a win in and of itself).

So the prediction market's counterfactual is not that the Demon King's forces don't attack, but that they attack at an indeterminate time with the same approximate frequency and strength. By letting the Demon King buy and profit from "demon attack on day X" shares, the Circular Citadel learns with decently high probability when these attacks take place and can allocate its resources more effectively. Hire mercenaries on days the probability is above 90%, focus on training and recruitment on days of low-but-typical probability, etc.

This ability to allocate resources more efficiently has value, which is why the Heroine organized the prediction market in the first place. The only thing that doesn't go according to the Heroine's liking is that the Circular Citadel buys that information from the Demon King rather than from 'the invisible hand of the market'.

more generally the Demon King would only do this if the information revealed weren't worth the market cost

The Demon King would sell the information as soon as she thinks it is in her best interests, which is different from it being bad for the Circular Citadel. Especially considering the Circular Citadel doesn't even have to pay the full cost of the information - everyone who bets is also paying.

It is very possible that the Demon King and the Circular Citadel both profit from the prediction market existing, while the demon ground forces and naive prediction market bettors lose.

ryankidd44 on Ryan Kidd's Shortform

Hourly stipends for AI safety fellowship programs, plus some referents. The average AI safety program stipend is $27/h.

kave on Habryka's Shortform Feed

One sad thing about older versions of Gill Sans: Il1 all look the same. Nova at least distinguishes the 1.

IMO, we should probably move towards system fonts, though I would like to choose something that preserves character a little more.

sharmake-farah on A path to human autonomy

There should probably be a dialogue between you and @Vladimir_Nesov over how much algorithmic improvements actually work to make AI more powerful, since this might reveal cruxes and help everyone else prepare better for the various AI scenarios.