LessWrong 2.0 Reader


On the Debate Between Jezos and Leahy
Zvi · 2024-02-06T14:40:05.487Z · comments (6)

Announcing New Beginner-friendly Book on AI Safety and Risk
Darren McKee · 2023-11-25T15:57:08.078Z · comments (2)

[link] DeepMind: Frontier Safety Framework
Zach Stein-Perlman · 2024-05-17T17:30:02.504Z · comments (0)

On the Gladstone Report
Zvi · 2024-03-20T19:50:05.186Z · comments (11)

A gentle introduction to mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:06:16.778Z · comments (2)

SAEs (usually) Transfer Between Base and Chat Models
Connor Kissane (ckkissane) · 2024-07-18T10:29:46.138Z · comments (0)

[Interim research report] Activation plateaus & sensitive directions in GPT2
StefanHex (Stefan42) · 2024-07-05T17:05:25.631Z · comments (2)

[link] A free to enter, 240 character, open-source iterated prisoner's dilemma tournament
Isaac King (KingSupernova) · 2023-11-09T08:24:43.277Z · comments (19)

A to Z of things
KatjaGrace · 2023-11-17T05:20:03.134Z · comments (6)

Generalization, from thermodynamics to statistical physics
Jesse Hoogland (jhoogland) · 2023-11-30T21:28:50.089Z · comments (9)

[link] Investigating an insurance-for-AI startup
L Rudolf L (LRudL) · 2024-09-21T15:29:10.083Z · comments (0)

Another argument against utility-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (39)

[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)

[link] Moving on from community living
Vika · 2024-04-17T17:02:11.357Z · comments (7)

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

Bayesian updating in real life is mostly about understanding your hypotheses
Max H (Maxc) · 2024-01-01T00:10:30.978Z · comments (4)

[link] On Shifgrethor
JustisMills · 2024-10-27T15:30:13.688Z · comments (10)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)

Against most, but not all, AI risk analogies
Matthew Barnett (matthew-barnett) · 2024-01-14T03:36:16.267Z · comments (41)

What mistakes has the AI safety movement made?
EuanMcLean (euanmclean) · 2024-05-23T11:19:02.717Z · comments (29)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (2)

AiPhone
Zvi · 2024-06-12T22:20:02.141Z · comments (4)

Self-Awareness: Taxonomy and eval suite proposal
Daniel Kokotajlo (daniel-kokotajlo) · 2024-02-17T01:47:01.802Z · comments (2)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

Black Box Biology
GeneSmith · 2023-11-29T02:27:29.794Z · comments (30)

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19
viking_math · 2024-02-19T01:14:06.772Z · comments (28)

Thoughts on open source AI
Sam Marks (samuel-marks) · 2023-11-03T15:35:42.067Z · comments (17)

Automation collapse
Geoffrey Irving · 2024-10-21T14:50:54.500Z · comments (6)

What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)

Don't sleep on Coordination Takeoffs
trevor (TrevorWiesinger) · 2024-01-27T19:55:26.831Z · comments (24)

Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (10)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

AI #55: Keep Clauding Along
Zvi · 2024-03-14T15:40:09.335Z · comments (16)

Never Drop A Ball
Screwtape · 2023-11-23T04:15:35.834Z · comments (1)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (22)

All About Concave and Convex Agents
mako yass (MakoYass) · 2024-03-24T21:37:17.922Z · comments (23)

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (5)

Do not delete your misaligned AGI.
mako yass (MakoYass) · 2024-03-24T21:37:07.724Z · comments (13)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

[link] Superforecasting the Origins of the Covid-19 Pandemic
DanielFilan · 2024-03-12T19:01:15.914Z · comments (0)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

Managing risks while trying to do good
Wei Dai (Wei_Dai) · 2024-02-01T18:08:46.506Z · comments (26)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

AI Safety Chatbot
markov (markovial) · 2023-12-21T14:06:48.981Z · comments (11)

Offering AI safety support calls for ML professionals
Vael Gates · 2024-02-15T23:48:12.797Z · comments (1)

Recent comments

jblack on Is the Power Grid Sustainable?

At $150/kW-hr and assuming a somewhat low 3000-cycle lifetime, such batteries would cost $0.05 per cycled kW-hr, which is very cost-effective when paired with solar power, whose energy is extremely cheap but inconveniently timed. It would drop the amortized cost of a complete off-grid power system for my home to half that of grid power in my area, for example.

Even now at $1000/kW-hr retail it's almost cost-effective here to buy batteries to time-shift energy from solar generation to time of consumption. At $700/kW-hr it would definitely be cost-effective to do daily load-shifting with the grid as a backup only for heavily cloudy days.
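The per-cycle figure above is pack price divided by cycle life. A minimal sketch of that arithmetic, assuming the same 3000-cycle lifetime also applies at the $700 and $1000 price points, and ignoring round-trip losses, degradation, financing, and installation costs (the function name is illustrative only):

```python
def cost_per_cycled_kwh(price_per_kwh: float, cycle_life: int) -> float:
    """Amortized capital cost per kWh cycled through the battery.

    Assumption: every cycle is a full cycle; round-trip efficiency,
    capacity fade, financing, and installation are all ignored.
    """
    return price_per_kwh / cycle_life


# Price points mentioned in the comment, all assumed to last 3000 cycles.
for price in (150, 700, 1000):
    cost = cost_per_cycled_kwh(price, 3000)
    print(f"${price}/kWh over 3000 cycles -> ${cost:.3f} per cycled kWh")
```

Under these assumptions, the $150 pack works out to $0.050 per cycled kWh, $700 to about $0.233, and $1000 to about $0.333. Whether those figures beat grid power depends on local tariffs.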

zy on ZY's Shortform

And actually, if some of these "short-term" issues are not worked on, they will likely remain permanent distractions or barriers for these populations and their families and friends (at some point anyone could be part of that population), affecting everything, including their own lives and their life goals (maybe AI safety).

tailcalled on Three Notions of "Power"

Western states today use state violence to enforce high taxes and lots of government regulations. In my view they're probably more dominance-oriented than states which just leave rural farmers alone. At least some of this is part of a Keynesian policy to boost economic output, and economic output is closely related to military formidability (due to ability to afford raw resources and advanced technology for the military).

Hm, I guess you would see this as more closely related to bargaining power than to dominance, because in your model dominance is a human-psychology-thing and bargaining power isn't restricted to voluntary transactions?

davidmanheim on Occupational Licensing Roundup #1

Question for a lawyer: how is non-reciprocity not an interstate trade issue that federal courts can strike down?

romeostevensit on Three Notions of "Power"

I have attempted to communicate to ultra-high-net-worth individuals, seemingly to little success so far, that given the reality of limited personal bandwidth, with over 99% of their influence and decision-making typically mediated through others, it’s essential to refine the ability to identify trustworthy advisors in each domain. Expert judgment is an active field of research with valuable, actionable insights.

chris_leong on What TMS is like

Fascinating. Sounds related to the yoga concept of kriyas.

rogerdearnaley on Motivation control

Opacity: if you could directly inspect an AI’s motivations (or its cognition more generally), this would help a lot. But you can’t do this with current ML models.

The ease with which Anthropic's model organisms of misalignment were diagnosed by a simple and obvious linear probe suggests otherwise. So does the number of elements in SAE feature dictionaries that describe emotions, motivations, and behavioral patterns. Current ML models are no longer black boxes: they are rapidly becoming more translucent grey boxes. So the sorts of applications for this that you go on to discuss look like they're rapidly becoming practicable.

elityre on avturchin's Shortform

You have been attacked by a pack of stray dogs twice?!?!

clone-of-saturn on The Alignment Trap: AI Safety as Path to Power

Can anyone lay out a semi-plausible scenario where humanity survives but isn't dominated by an AI or posthuman god-king? I can't really picture it. I always thought that's what we were going for since it's better than being dead.

czynski on Lighthaven Sequences Reading Group #8 (Tuesday 10/29)

Could you please announce these further in advance? Especially given the reading required beforehand it's inconvenient and honestly seems a little inconsiderate.