LessWrong 2.0 Reader

Losing Metaphors: Zip and Paste
jefftk (jkaufman) · 2023-11-29T20:31:07.464Z · comments (6)

Agent membranes/boundaries and formalizing “safety”
Chipmonk · 2024-01-03T17:55:21.018Z · comments (46)

A list of all the deadlines in Biden's Executive Order on AI
Valentin Baltadzhiev (valentin-baltadzhiev) · 2023-11-01T17:14:31.074Z · comments (2)

Meetup In a Box: Year In Review
Czynski (JacobKopczynski) · 2024-02-14T01:18:28.259Z · comments (0)

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition
Adrià Moret (Adrià R. Moret) · 2023-12-02T14:07:29.992Z · comments (31)

Evaluating Solar
jefftk (jkaufman) · 2024-02-17T21:50:04.783Z · comments (5)

[link] Attention on AI X-Risk Likely Hasn't Distracted from Current Harms from AI
Erich_Grunewald · 2023-12-21T17:24:16.713Z · comments (2)

[question] What ML gears do you like?
Ulisse Mini (ulisse-mini) · 2023-11-11T19:10:11.964Z · answers+comments (4)

Fact Finding: Simplifying the Circuit (Post 2)
Senthooran Rajamanoharan (SenR) · 2023-12-23T02:45:49.675Z · comments (3)

[link] Report: Evaluating an AI Chip Registration Policy
Deric Cheng (deric-cheng) · 2024-04-12T04:39:45.671Z · comments (0)

If a little is good, is more better?
DanielFilan · 2023-11-04T07:10:05.943Z · comments (16)

[link] OpenAI Superalignment: Weak-to-strong generalization
Dalmert · 2023-12-14T19:47:24.347Z · comments (3)

Beta Tester Request: Rallypoint Bounties
lukemarks (marc/er) · 2024-05-25T09:11:11.446Z · comments (4)

[question] Impressions from base-GPT-4?
mishka · 2023-11-08T05:43:23.001Z · answers+comments (25)

Control Symmetry: why we might want to start investigating asymmetric alignment interventions
domenicrosati · 2023-11-11T17:27:10.636Z · comments (1)

[link] Executive Dysfunction 101
DaystarEld · 2024-05-23T12:43:13.785Z · comments (1)

[link] In defence of Helen Toner, Adam D'Angelo, and Tasha McCauley
mrtreasure · 2023-12-06T02:02:32.004Z · comments (3)

Useful starting code for interpretability
eggsyntax · 2024-02-13T23:13:47.940Z · comments (2)

[link] Was a Subway in New York City Inevitable?
Jeffrey Heninger (jeffrey-heninger) · 2024-03-30T00:53:21.314Z · comments (4)

Testing for consequence-blindness in LLMs using the HI-ADS unit test.
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2023-11-24T23:35:29.560Z · comments (2)

Decent plan prize announcement (1 paragraph, $1k)
lukehmiles (lcmgcd) · 2024-01-12T06:27:44.495Z · comments (19)

[link] Introduction to Super Powers (for kids!)
Shoshannah Tekofsky (DarkSym) · 2024-09-20T17:17:27.070Z · comments (0)

The case for more Alignment Target Analysis (ATA)
Chi Nguyen · 2024-09-20T01:14:41.411Z · comments (13)

[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)

Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)

[link] Death notes - 7 thoughts on death
Nathan Young · 2024-10-28T15:01:13.532Z · comments (1)

[link] Care Doesn't Scale
stavros · 2024-10-28T11:57:38.742Z · comments (1)

You're Playing a Rough Game
jefftk (jkaufman) · 2024-10-17T19:20:06.251Z · comments (2)

[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)

A Triple Decker for Elfland
jefftk (jkaufman) · 2024-10-11T01:50:02.332Z · comments (0)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

[link] SB 1047 gets vetoed
ryan_b · 2024-09-30T15:49:38.609Z · comments (1)

Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)

Twin Peaks: under the air
KatjaGrace · 2024-05-31T01:20:04.624Z · comments (2)

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)

[link] MIRI's July 2024 newsletter
Harlan · 2024-07-15T21:28:17.343Z · comments (2)

Economics Roundup #1
Zvi · 2024-03-26T14:00:06.332Z · comments (4)

[link] The Best Essay (Paul Graham)
Chris_Leong · 2024-03-11T19:25:42.176Z · comments (2)

AXRP Episode 30 - AI Security with Jeffrey Ladish
DanielFilan · 2024-05-01T02:50:04.621Z · comments (0)

An experiment on hidden cognition
Olli Järviniemi (jarviniemi) · 2024-07-22T03:26:05.564Z · comments (2)

[link] Sticker Shortcut Fallacy — The Real Worst Argument in the World
ymeskhout · 2024-06-12T14:52:41.988Z · comments (15)

[link] Transformer Debugger
Henk Tillman (henk-tillman) · 2024-03-12T19:08:56.280Z · comments (0)

Clipboard Filtering
jefftk (jkaufman) · 2024-04-14T20:50:02.256Z · comments (1)

[link] what becoming more secure did for me
Chipmonk · 2024-08-22T17:44:48.525Z · comments (5)

Using an LLM perplexity filter to detect weight exfiltration
Adam Karvonen (karvonenadam) · 2024-07-21T18:18:05.612Z · comments (11)

To Boldly Code
StrivingForLegibility · 2024-01-26T18:25:59.525Z · comments (4)

[link] Structured Transparency: a framework for addressing use/mis-use trade-offs when sharing information
habryka (habryka4) · 2024-04-11T18:35:44.824Z · comments (0)

[link] An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen (karvonenadam) · 2024-06-25T15:57:16.872Z · comments (0)

Virtually Rational - VRChat Meetup
Tomás B. (Bjartur Tómas) · 2024-01-28T05:52:36.934Z · comments (3)

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research
alamerton · 2024-04-18T18:29:33.892Z · comments (4)

Recent comments

meemi on Finishing The SB-1047 Documentary In 6 Weeks

I think this is a great project. I believe your documentary would have high impact via informing and inspiring AI policy discussions. You've already interviewed an impressive number of relevant people. I admire your initiative to take on this project quickly, even before getting funding for it.

tailcalled on Three Notions of "Power"

Ah. I would say human psychology is too epiphenomenal so I'm mainly modelling things that shape (dis)equilibria in complex ecologies.

jblack on Is the Power Grid Sustainable?

At $150/kW-hr and assuming a somewhat low 3000 cycle lifetime, such batteries would cost $0.05 per cycled kW-hr which is very much cost-effective when paired with the extremely low cost but inconveniently timed nature of solar power. It would drop the amortized cost of a complete off-grid power system for my home to half that of grid power in my area, for example.

Even now at $1000/kW-hr retail it's almost cost-effective here to buy batteries to time-shift energy from solar generation to time of consumption. At $700/kW-hr it would definitely be cost-effective to do daily load-shifting with the grid as a backup only for heavily cloudy days.
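
The per-cycle figure above is a simple division, and can be sketched as follows. This is a rough illustration only: the function name is hypothetical, the 3000-cycle lifetime is stated in the comment only for the $150 case and is reused for the other prices as an assumption, and the sketch ignores capacity fade, round-trip losses, and financing costs.

```python
def cost_per_cycled_kwh(price_per_kwh: float, cycle_lifetime: int) -> float:
    """Battery purchase price spread evenly over every full cycle it delivers."""
    if cycle_lifetime <= 0:
        raise ValueError("cycle_lifetime must be positive")
    return price_per_kwh / cycle_lifetime


# Prices quoted in the comment, in dollars per kW-hr of capacity.
# ASSUMPTION: the 3000-cycle lifetime is stated only for the $150 case;
# it is reused for the other two prices purely for comparison.
for price in (150, 700, 1000):
    cost = cost_per_cycled_kwh(price, 3000)
    print(f"${price}/kW-hr -> ${cost:.3f} per cycled kW-hr")
```

Under these assumptions the three prices work out to roughly $0.05, $0.23, and $0.33 per cycled kW-hr. Whether the two higher figures are cost-effective depends on local grid prices, which the sketch does not model.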

Pumped hydro is already underway in this region, though it's proving more expensive and time-consuming to build than expected. Have there been some recent advances in compressed air energy storage? The information I read 2-3 years ago did not look promising at any scale.

tailcalled on Three Notions of "Power"

Western states today use state violence to enforce high taxes and lots of government regulations. In my view they're probably more dominance-oriented than states which just leave rural farmers alone. At least some of this is part of a Keynesian policy to boost economic output, and economic output is closely related to military formidability (due to ability to afford raw resources and advanced technology for the military).

Hm, I guess you would see this as more closely related to bargaining power than to dominance, because in your model dominance is a human-psychology-thing and bargaining power isn't restricted to voluntary transactions?

davidmanheim on Occupational Licensing Roundup #1

Question for a lawyer: how is non-reciprocity not an interstate trade issue that federal courts can strike down?

romeostevensit on Three Notions of "Power"

I have attempted to communicate to ultra-high-net-worth individuals, seemingly to little success so far, that given the reality of limited personal bandwidth, with over 99% of their influence and decision-making typically mediated through others, it’s essential to refine the ability to identify trustworthy advisors in each domain. Expert judgment is an active field of research with valuable, actionable insights.

chris_leong on What TMS is like

Fascinating. Sounds related to the yoga concept of kriyas.

rogerdearnaley on Motivation control

Opacity: if you could directly inspect an AI’s motivations (or its cognition more generally), this would help a lot. But you can’t do this with current ML models.

The ease with which Anthropic's model organisms of misalignment were diagnosed by a simple and obvious linear probe suggests otherwise. So does the number of elements in SAE feature dictionaries that describe emotions, motivations, and behavioral patterns. Current ML models are no longer black boxes: they are rapidly becoming more-translucent grey boxes. So the sorts of applications for this you go on to discuss look like they're rapidly becoming practicable.

elityre on avturchin's Shortform

You have been attacked by a pack of stray dogs twice?!?!