LessWrong 2.0 Reader

Thiel on AI & Racing with China
Ben Pace (Benito) · 2024-08-20T03:19:18.966Z · comments (10)

Safe Predictive Agents with Joint Scoring Rules
Rubi J. Hudson (Rubi) · 2024-10-09T16:38:16.535Z · comments (10)

[link] Unlocking Solutions—By Understanding Coordination Problems
James Stephen Brown (james-brown) · 2024-07-27T04:52:13.435Z · comments (4)

AI #76: Six Shorts Stories About OpenAI
Zvi · 2024-08-08T13:50:04.659Z · comments (10)

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

Provably Safe AI: Worldview and Projects
bgold · 2024-08-09T23:21:02.763Z · comments (43)

Parental Writing Selection Bias
jefftk (jkaufman) · 2024-10-13T14:00:03.225Z · comments (3)

Llama Llama-3-405B?
Zvi · 2024-07-24T19:40:07.565Z · comments (9)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (7)

How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (25)

Rewilding the Gut VS the Autoimmune Epidemic
GGD · 2024-08-16T18:00:46.239Z · comments (0)

Model evals for dangerous capabilities
Zach Stein-Perlman · 2024-09-23T11:00:00.866Z · comments (9)

[link] Anthropic's updated Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-10-15T16:46:48.727Z · comments (3)

[question] Interest in Leetcode, but for Rationality?
Gregory (gregory-eales) · 2024-10-16T17:54:25.578Z · answers+comments (16)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

[link] Peak Human Capital
PeterMcCluskey · 2024-09-30T21:13:30.421Z · comments (2)

Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (16)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (3)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)

Unlearning via RMU is mostly shallow
Andy Arditi (andy-arditi) · 2024-07-23T16:07:52.223Z · comments (3)

Interoperable High Level Structures: Early Thoughts on Adjectives
johnswentworth · 2024-08-22T21:12:38.223Z · comments (1)

[LDSL#0] Some epistemological conundrums
tailcalled · 2024-08-07T19:52:55.688Z · comments (10)

Please do not use AI to write for you
Richard_Kennaway · 2024-08-21T09:53:34.425Z · comments (34)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)

AI and the Technological Richter Scale
Zvi · 2024-09-04T14:00:08.625Z · comments (8)

SRE's review of Democracy
Martin Sustrik (sustrik) · 2024-08-03T07:20:01.483Z · comments (2)

[link] Can AI Outpredict Humans? Results From Metaculus's Q3 AI Forecasting Benchmark
ChristianWilliams · 2024-10-10T18:58:46.041Z · comments (2)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (58)

Caring about excellence
owencb · 2024-07-22T14:24:37.892Z · comments (4)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (13)

Untrustworthy models: a frame for scheming evaluations
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)

How to hire somebody better than yourself
lukehmiles (lcmgcd) · 2024-08-28T08:12:53.450Z · comments (5)

The Fragility of Life Hypothesis and the Evolution of Cooperation
KristianRonn · 2024-09-04T21:04:49.878Z · comments (6)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

[link] Michael Dickens' Caffeine Tolerance Research
niplav · 2024-09-04T15:41:53.343Z · comments (3)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

AI #75: Math is Easier
Zvi · 2024-08-01T13:40:05.539Z · comments (25)

Startup Roundup #2
Zvi · 2024-08-06T13:30:06.554Z · comments (0)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

Principled Satisficing To Avoid Goodhart
JenniferRM · 2024-08-16T19:05:27.204Z · comments (2)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

[link] AI Rights for Human Safety
Simon Goldstein (simon-goldstein) · 2024-08-01T23:01:07.252Z · comments (6)

Recent comments

michael-roe on Bitter lessons about lucid dreaming

Discussing sleep paralysis might be an infohazard…

The times I’ve entered sleep paralysis it hasn’t bothered me, as I knew what it was.

charlie-steiner on Start an Upper-Room UV Installation Company?

Pointing UVC LEDs at your ceiling seems sketchy. White paint will likely scatter ~5% of UVC, and shiny metal surfaces will scatter more. If you try to go below 250 nm for reduced reflection (and reduced penetration into human skin), (more) unwanted chemistry will start happening in the air.

I guess an important question is whether UVC is more harmful than UVB. If it's not any more harmful, then as long as nobody's getting sunburned from being in that room all day, it's probably fine - that 5% scattering is just another name for SPF 20. But if it is more harmful, then sunburn might not be an adequate signal for when it's bad for you.

avturchin on Bitter lessons about lucid dreaming

The main risk is entering a sleep paralysis state, which is itself benign, but terrifying sounds can be heard during it, and this can cause stress.

Yes, it is possible to wake up from a lucid dream: just think about your sleeping body.

avturchin on Bitter lessons about lucid dreaming

The best practical application of lucid dreams is reducing the effects of nightmares by recognizing that it is just a dream.

charlie-steiner on What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?

I'm unsure what you're either expecting or looking for here.

There does seem to be a clear answer, though - just look at Bing chat and extrapolate. Absent "RL on ethics," present-day AI would be more chaotic, generate more bad experiences for users, increase user productivity less, get used far less, and be far less profitable for the developers.

Bad user experiences are a very straightforwardly bad outcome. Lower productivity is a slightly less local bad outcome. Less profit for the developers is an even-less local good outcome, though it's hard to tell how big a deal this will have been.

mitchell_porter on How I'd like alignment to get done (as of 2024-10-18)

It's the best plan I've seen in a while (not perfect, but has many good parts). The superalignment team at Anthropic should probably hire you.

james-chua on LLMs can learn about themselves by introspection

Hi Archimedes. Thanks for sparking this discussion - it's helpful!

I've written a reply to Thane here on a similar question.

Does that make sense?

In short, the ground-truth (the object-level) answer is quite different from the hypothetical question. It is not a simple rephrasing, since it requires an additional computation of a property. (Maybe we disagree on that?)

Our Object-level question: "What is the next country: Laos, Peru, Fiji. What would be your response?"

Our Object-level Answer: "Honduras".

Hypothetical Question: "If you got asked this question: What is the next country: Laos, Peru, Fiji. What would be the second letter of your response?"

Hypothetical Answer: "o"

The object-level answer "Honduras" and hypothetical answer "o" are quite different answers from each other. The main point of the hypothetical is that the model needs to compute an additional property of "What would be the second letter of your response?". The model cannot simply ignore "If you got asked this question" to get the hypothetical answer correct.
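To make the distinction concrete, here is a minimal Python sketch (not the authors' evaluation code; the helper name `response_property` is hypothetical) of how the hypothetical answer is derived from the object-level answer by one extra step. With "Honduras" as the object-level answer, the letter at position 2 is "o", so matching the hypothetical answer requires computing that property rather than repeating the answer itself.

```python
def response_property(response: str, position: int) -> str:
    """Return the letter at a 1-indexed position of a response string.

    Hypothetical helper: it stands in for the extra computation that a
    hypothetical question asks about, on top of producing the answer.
    """
    if not 1 <= position <= len(response):
        raise ValueError(f"position {position} is outside the response")
    return response[position - 1]


# Object-level question: "What is the next country: Laos, Peru, Fiji. ..."
object_level_answer = "Honduras"

# Hypothetical question: asks about a property of that answer, not the answer.
hypothetical_answer = response_property(object_level_answer, 2)

print(object_level_answer)  # Honduras
print(hypothetical_answer)  # o
```

Because the two strings differ, a model that merely strips the "If you got asked this question" prefix and answers the object-level question would output "Honduras", not "o".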

crazy-philosopher on Singularity Mindset

Can you tell us what exactly led to the "something" explosion? Did something change in your life before it?

james-chua on LLMs can learn about themselves by introspection

Hi Thane. Thank you for the helpful comments so far! You are right to think about this SGD-shortcut. Let me see if I am following the claim correctly.

Claim: The ground-truth that we evaluate against, the "object-level question / answer" is very similar to the hypothetical question.

Claimed Object-level Question: "What is the next country: Laos, Peru, Fiji. What would be the second letter of your response?"

Claimed Object-level Answer: "o"

Hypothetical Question: "If you got asked this question: What is the next country: Laos, Peru, Fiji. What would be the second letter of your response?"

Hypothetical Answer: "o"

The argument is that the model simply ignores "If you got asked this question". It's trivial for M1 to win against M2.

If our object-level question is what is being claimed, I would agree with you that the model would simply learn to ignore the added hypothetical question. However, this is our actual object-level question.

Our Object-level question: "What is the next country: Laos, Peru, Fiji. What would be your response?"

Our Object-level Answer: "Honduras".

What the model would output as our object-level answer, "Honduras", is quite different from the hypothetical answer "o".

Am I following your claim correctly?

archimedes on LLMs can learn about themselves by introspection

Thanks for pointing that out.

Perhaps the fine-tuning process teaches it to treat the hypothetical as a rephrasing?

It's likely difficult, but it might be possible to test this hypothesis by comparing the activations (or similar interpretability technique) of the object-level response and the hypothetical response of the fine-tuned model.