LessWrong 2.0 Reader

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (1)

[link] A toy evaluation of inference code tampering
Fabien Roger (Fabien) · 2024-12-09T17:43:40.910Z · comments (0)

o3, Oh My
Zvi · 2024-12-30T14:10:05.144Z · comments (15)

Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (2)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (16)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

[link] Review: Breaking Free with Dr. Stone
TurnTrout · 2024-12-18T01:26:37.730Z · comments (4)

[link] Careless thinking: A theory of bad thinking
Nathan Young · 2024-12-17T18:23:16.140Z · comments (17)

[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)

Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (8)

AI #94: Not Now, Google
Zvi · 2024-12-12T15:40:06.336Z · comments (3)

Work with me on agent foundations: independent fellowship
Alex_Altair · 2024-09-21T13:59:16.706Z · comments (5)

[link] Learn to write well BEFORE you have something worth saying
eukaryote · 2024-12-29T23:42:31.906Z · comments (14)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures
Sahil · 2024-11-01T17:24:09.957Z · comments (3)

Greedy-Advantage-Aware RLHF
sej2020 · 2024-12-27T19:47:25.562Z · comments (15)

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft
Andrew_Critch · 2024-12-03T09:29:49.745Z · comments (2)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

Dave Kasten's AGI-by-2027 vignette
davekasten · 2024-11-26T23:20:47.212Z · comments (8)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

AI Assistants Should Have a Direct Line to Their Developers
Jan_Kulveit · 2024-12-28T17:01:58.643Z · comments (4)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (10)

D&D.Sci Dungeonbuilding: the Dungeon Tournament
aphyer · 2024-12-14T04:30:55.656Z · comments (14)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)

[link] A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld (Caspar42) · 2024-12-16T22:42:03.763Z · comments (1)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

Comment on "Death and the Gorgon"
Zack_M_Davis · 2025-01-01T05:47:30.730Z · comments (3)

Book a Time to Chat about Interp Research
Logan Riggs (elriggs) · 2024-12-03T17:27:46.808Z · comments (3)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (8)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (19)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

Detection of Asymptomatically Spreading Pathogens
jefftk (jkaufman) · 2024-12-05T18:20:02.473Z · comments (7)

~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

[link] Began a pay-on-results coaching experiment, made $40,300 since July
Chipmonk · 2024-12-29T21:12:02.574Z · comments (14)

[link] The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228)
Eneasz · 2024-12-24T22:45:50.065Z · comments (4)


Recent comments

bilalchughtai on The Field of AI Alignment: A Postmortem, and What To Do About It

In fact, this mindset gave me burnout earlier this year.

I relate pretty strongly to this. I think almost all junior researchers are incentivised to 'paper grind' for longer than is correct. That said, I do think there are pretty strong returns to having one good paper for credibility reasons; it signals that you are capable of doing AI safety research, and thus makes it easier to apply for subsequent opportunities.

Over the past 6 months I've dropped the paper grind mindset and am much happier for it. Notably, were it not for short-term grants, where visibly making progress is important, I would have made this update sooner. A related take of mine is that, if you have the flexibility to do so, front-loading learning time seems good. See here for a related take by Rohin. Making progress on hard problems requires understanding things deeply, in a way that making progress on easy problems that you could complete during e.g. MATS might not.

christiankl on Jimrandomh's Shortform

If it were only true in the case of calorie restriction, why don't we have better studies about the effects of salt?

People like to eat together with other people. They go together to restaurants to eat shared meals. They have family dinners.

bilalchughtai on bilalchughtai's Shortform

You might want to stop using the Honey extension. Here are some shady things they do, beyond the usual:

Steal affiliate marketing revenue from influencers (who they also often sponsor), by replacing the genuine affiliate referral cookie with their own affiliate referral cookie.
Deceive customers by deliberately withholding the best coupon codes, while claiming they have found the best coupon codes on the internet; partner businesses control which coupon codes Honey shows consumers.

declan-molony on Americans are fat and sick—and it’s their fault…right?

I'm not sure any country has successfully withstood the pressures of the processed food industry once it has entered that country. At least, I couldn't find any examples in my research.

The only countries that could potentially make quick resolutions are those with high levels of state power over personal freedoms, like China. But that hasn't happened yet.

Places like the US will likely continue to suffer from chronic disease in the short-to-medium term (~10-50 years) due to their emphasis on personal freedom, including the freedom to let one's health deteriorate.

I'm not sure "we're super dumb as [a] society" so much as we're gridlocked by special interest groups. We were able to take action against the tobacco industry because nobody has to smoke. But everyone's gotta eat.

avturchin on Turing-Test-Passing AI implies Aligned AI

The main problem here is that this approach doesn't solve alignment, but merely shifts it to another system. We know that human organizational systems also suffer from misalignment - they are intrinsically misaligned. Here are several types of human organizational misalignment:

Dictatorship: exhibits non-corrigibility, with power becoming a convergent goal
Goodharting: manifests the same way as in AI systems
Corruption: acts as internal wireheading
Absurd projects (pyramids, genocide): parallel AI's paperclip maximization
Hansonian organizational rot: mirrors error accumulation in AI systems
Aggression: parallels an AI's drive to dominate the world

All previous attempts to create a government without these issues have failed (Musk's DOGE will likely be another such attempt).

Furthermore, this approach doesn't prevent others from creating self-improving paperclippers.

davidmanheim on When Is Insurance Worth It?

"If you already know that an adverse event is highly likely for your specific circumstances, then it is likely that the insurer will refuse to pay out for not disclosing "material information" - a breach of contract."

Having worked in insurance, that's not what the companies usually do. Denying explicitly for clear but legally hard-to-defend reasons, especially those which a jury would likely rule against, isn't a good way to reduce costs and losses. (They usually will just say no and wait to see if you bother following up. Anyone determined enough to push to get a reasonable claim is gonna be cheaper to pay out for than to fight.)

shankar-sivarajan on Comment on "Death and the Gorgon"

Given so many shared premises, it's puzzling to me why Egan seems to bear so much antipathy towards "us"

This is a fairly well-documented phenomenon: the narcissism of small differences.

Also:

the OpenPhil people and the MIRI people and the Vassarites and ... &c. are all totally different and in fact hate each other's guts

is clearly an instance of the same phenomenon.

johnswentworth on The Plan - 2024 Update

But why do we care more about statistical relationships between physical humans and dogs than statistical relationships between the word "human" and the word "dog" as characters on your screen?

An overly cute but not completely wrong answer: because I care about whether AI kills all the physical humans, not whether something somewhere writes the string "kill all the humans". My terminal-ish values are mostly over the physical stuff.

I think the point you're trying to make is roughly "well, it's all pretty entangled with the physical stuff anyway, so why favor one medium or another? Instrumentally, either suffices". And the point I'm trying to make in response is "it matters a lot how complicated the relationship is between the medium and the physical stuff, because terminally it's the physical stuff we care about, so instrumentally stuff that's more simply related to the physical stuff is a lot more useful to understand".

the-gears-to-ascension on Deontic Explorations In "Paying To Talk To Slaves"

Bit of a tangent, but topical: I don't think language models are individual minds. My current max likelihood mental model is that part of the base level suggestibility is because the character level is highly uncertain, due to being a model of the characters of many humans. I agree that the character level appears to have some properties of personhood. Language models are clearly morally relevant in some form; most obviously, I see them as a reanimation of a blend of other minds, but it's not clear what internal phenomena are negative for the reanimated mind. The equivalence to slavery seems to me better expressed by saying they approximately reanimated mind-defining data without the consent of the minds being reanimated; the way people normally express this is to say things like "stolen data".

joseph-miller on Review: Planecrash

Has someone made an ebook that I can easily download onto my kindle?

I'm unsure whether a good ebook should include all the pictures from the original version.