LessWrong 2.0 Reader

Bay Winter Solstice 2024: Speech Auditions
ozymandias · 2024-11-04T22:31:38.680Z · comments (0)

[link] Stone Age Herbalist's notes on ant warfare and slavery
trevor (TrevorWiesinger) · 2024-11-09T02:40:01.128Z · comments (0)

[question] What's the Deal with Logical Uncertainty?
Ape in the coat · 2024-09-16T08:11:43.588Z · answers+comments (23)

Apply to MATS 7.0!
Ryan Kidd (ryankidd44) · 2024-09-21T00:23:49.778Z · comments (0)

Balancing Label Quantity and Quality for Scalable Elicitation
Alex Mallen (alex-mallen) · 2024-10-24T16:49:00.939Z · comments (1)

A more systematic case for inner misalignment
Richard_Ngo (ricraz) · 2024-07-20T05:03:03.500Z · comments (4)

Context-dependent consequentialism
Jeremy Gillen (jeremy-gillen) · 2024-11-04T09:29:24.310Z · comments (3)

Incentive design and capability elicitation
Joe Carlsmith (joekc) · 2024-11-12T20:56:05.088Z · comments (0)

Music in the AI World
Martin Sustrik (sustrik) · 2024-08-16T04:20:01.706Z · comments (8)

[question] Feedback request: what am I missing?
Nathan Helm-Burger (nathan-helm-burger) · 2024-11-02T17:38:39.625Z · answers+comments (5)

[LDSL#6] When is quantification needed, and when is it hard?
tailcalled · 2024-08-13T20:39:45.481Z · comments (0)

Extracting SAE task features for in-context learning
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-12T20:34:13.747Z · comments (1)

[LDSL#1] Performance optimization as a metaphor for life
tailcalled · 2024-08-08T16:16:27.349Z · comments (4)

[question] When is reward ever the optimization target?
Noosphere89 (sharmake-farah) · 2024-10-15T15:09:20.912Z · answers+comments (12)

Meme Talking Points
ymeskhout · 2024-11-06T15:27:54.024Z · comments (0)

Inference-Only Debate Experiments Using Math Problems
Arjun Panickssery (arjun-panickssery) · 2024-08-06T17:44:27.293Z · comments (0)

Book Review: What Even Is Gender?
Joey Marcellino · 2024-09-01T16:09:27.773Z · comments (14)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (8)

Some comments on intelligence
Viliam · 2024-08-01T15:17:07.215Z · comments (5)

Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (108)

AI #74: GPT-4o Mini Me and Llama 3
Zvi · 2024-07-25T13:50:06.528Z · comments (6)

AI Constitutions are a tool to reduce societal scale risk
Sammy Martin (SDM) · 2024-07-25T11:18:17.826Z · comments (2)

AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

[link] Baking vs Patissing vs Cooking, the HPS explanation
adamShimi · 2024-07-17T20:29:09.645Z · comments (16)

SAE Probing: What is it good for? Absolutely something!
Subhash Kantamneni (subhashk) · 2024-11-01T19:23:55.418Z · comments (0)

AIS terminology proposal: standardize terms for probability ranges
eggsyntax · 2024-08-30T15:43:39.857Z · comments (12)

[link] Liquid vs Illiquid Careers
vaishnav92 · 2024-10-20T23:03:49.725Z · comments (6)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)

[link] Why Recursion Pharmaceuticals abandoned cell painting for brightfield imaging
Abhishaike Mahajan (abhishaike-mahajan) · 2024-11-05T14:51:41.310Z · comments (1)

[question] What Other Lines of Work are Safe from AI Automation?
RogerDearnaley (roger-d-1) · 2024-07-11T10:01:12.616Z · answers+comments (35)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (7)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution
Jeffrey Heninger (jeffrey-heninger) · 2024-07-15T21:30:04.043Z · comments (1)

[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (2)

Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)

Childhood and Education Roundup #6: College Edition
Zvi · 2024-06-26T11:40:03.990Z · comments (8)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (5)

Trading Candy
jefftk (jkaufman) · 2024-11-01T01:10:08.024Z · comments (4)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (7)

DIY RLHF: A simple implementation for hands on experience
Mike Vaiana (mike-vaiana) · 2024-07-10T12:07:03.047Z · comments (0)

Reading More Each Day: A Simple $35 Tool
aysajan · 2024-07-24T13:54:04.290Z · comments (2)

Monthly Roundup #19: June 2024
Zvi · 2024-06-25T12:00:03.333Z · comments (9)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)

Recent comments

atillayasar on AI alignment via civilizational cognitive updates

Despite being "into" AI safety for a while, I haven't picked a side. I do believe it's extremely important and deserves more attention, and I believe that AI actually could kill everyone in less than 5 years.

But any effort spent on pinning down one's "p(doom)" is not spent usefully on things like: how to actually make AI safe, how AI works, how to approach this problem as a civilization/community, how to think about this problem. And, as was my intention with this article, "how to think about things in general, and how to make philosophical progress".

rhollerith_dot_com on Thomas Kwa's Shortform

Are Eliezer and Nate right that continuing the AI program will almost certainly lead to extinction or something approximately as disastrous as extinction?

rhollerith_dot_com on quila's Shortform

A lot of people e.g. Andrew Huberman (who recommends many supplements for cognitive enhancement and other ends) recommend against supplementing melatonin except to treat insomnia that has failed to respond to many other interventions.

brendan-long on The Humanitarian Economy

I think you've re-invented Communism. The reason we don't implement it is that in practice it's much worse for everyone, including poor people.

rhollerith_dot_com on What are the primary drivers that caused selection pressure for intelligence in humans?

It's also important to consider the selection pressure keeping intelligence low, namely, the fact that most animals chronically have trouble getting enough calories, combined with the high caloric needs of neural tissue.

It is no coincidence that human intelligence didn't start rising much till humans were reliably getting meat in their diet and they started to routinely cook their food, which makes whatever calories are in food easier to digest, allowing the human gut to get smaller, which in turn reduced the caloric demands of the gut (which needs to be kept alive 24 hours a day even on a day when the person finds no food to eat).

ricraz on The Minority Coalition

Just changed the name to The Minority Coalition.

chris_leong on Thomas Kwa's Shortform

"How can we get more evidence on whether scheming is plausible?" - What if we ran experiments where we included some pressure towards scheming (either RL or fine-tuning) and we attempted to determine the minimum such pressure required to cause scheming? We could further attempt to see how this interacts with scaling.

benito on AI Safety is Dropping the Ball on Clown Attacks

I saw this image shared on Twitter, which I feel takes a pretty opposite position on this phenomenon.

(I'm not linking to attribution because Twitter feels like a bad game and it's shared in a highly political context.)

james-lucassen on Context-dependent consequentialism

Maybe I'm just reading my own frames into your words, but this feels quite similar to the rough model of human-level LLMs I've had in the back of my mind for a while now.

You think that an intelligence that doesn't-reflect-very-much is reasonably simple. Given this, we can train chain-of-thought type algorithms to avoid reflection using examples of not-reflecting-even-when-obvious-and-useful. With some effort on this, reflection could be crushed with some small-ish capability penalty, but massive benefits for safety.

In particular, this reads to me like the "unstable alignment" paradigm I wrote about a while ago.

You have an agent which is consequentialist enough to be useful, but not so consequentialist that it'll do things like spontaneously notice conflicts in the set of corrigible behaviors you've asked it to adhere to and undertake drastic value reflection to resolve those conflicts. You might hope to hit this sweet spot by default, because humans are in a similar sort of sweet spot. It's possible to get humans to do things they massively regret upon reflection as long as their day-to-day work can be done without attending to obvious clues (e.g. a guy who's an accountant for the Nazis for 40 years and doesn't think about the Holocaust; he just thinks about accounting). Or you might try to steer towards this sweet spot by developing ways to block reflection in cases where it's dangerous without interfering with it in cases where it's essential for capabilities.

elityre on quila's Shortform

I predict this won't work as well as you hope because you'll be fighting the circadian effect that partially influences your cognitive performance.

Also, some ways to maximize your sleep quality are to exercise very intensely and/or to sauna the day before.