LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

New paper shows truthfulness & instruction-following don't generalize by default
joshc (joshua-clymer) · 2023-11-19T19:27:30.735Z · comments (0)

Approaching Human-Level Forecasting with Language Models
Fred Zhang (fred-zhang) · 2024-02-29T22:36:34.012Z · comments (6)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

[link] Are There Examples of Overhang for Other Technologies?
Jeffrey Heninger (jeffrey-heninger) · 2023-12-13T21:48:08.954Z · comments (50)

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions
Lidor Banuel Dabbah · 2024-07-19T20:32:15.095Z · comments (6)

LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (26)

AI #48: Exponentials in Geometry
Zvi · 2024-01-18T14:20:07.869Z · comments (9)

[question] Interest in Leetcode, but for Rationality?
Gregory (gregory-eales) · 2024-10-16T17:54:25.578Z · answers+comments (18)

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)
LoganStrohl (BrienneYudkowsky) · 2024-02-24T02:56:31.458Z · comments (1)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

Measuring Coherence of Policies in Toy Environments
dx26 (dylan-xu) · 2024-03-18T17:59:08.118Z · comments (9)

[link] Linkpost: Surely you can be serious
kave · 2024-07-18T22:18:09.271Z · comments (7)

[link] shoes with springs
bhauth · 2023-12-30T21:46:55.319Z · comments (6)

Self-explaining SAE features
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-05T22:20:36.041Z · comments (13)

What's next for the field of Agent Foundations?
Nora_Ammann · 2023-11-30T17:55:13.982Z · comments (23)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (7)

Does AI risk “other” the AIs?
Joe Carlsmith (joekc) · 2024-01-09T17:51:47.020Z · comments (3)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

[link] More people getting into AI safety should do a PhD
AdamGleave · 2024-03-14T22:14:48.855Z · comments (24)

Aligned AI is dual use technology
lc · 2024-01-27T06:50:10.435Z · comments (31)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

[link] Sam Altman, Greg Brockman and others from OpenAI join Microsoft
Ozyrus · 2023-11-20T08:23:00.791Z · comments (15)

[link] Talk: "AI Would Be A Lot Less Alarming If We Understood Agents"
johnswentworth · 2023-12-17T23:46:32.814Z · comments (3)

The LessWrong 2022 Review: Review Phase
RobertM (T3t) · 2023-12-22T03:23:49.635Z · comments (7)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk (Technoguyrob) · 2024-03-06T05:03:09.639Z · comments (0)

A hermeneutic net for agency
TsviBT · 2024-01-01T08:06:30.289Z · comments (4)

[link] microwave drilling is impractical
bhauth · 2024-06-12T22:16:00.199Z · comments (14)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

[link] Against Nonlinear (Thing Of Things)
tailcalled · 2024-01-18T21:40:00.369Z · comments (18)

SAEs (usually) Transfer Between Base and Chat Models
Connor Kissane (ckkissane) · 2024-07-18T10:29:46.138Z · comments (0)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

On Lex Fridman’s Second Podcast with Altman
Zvi · 2024-03-25T12:20:08.780Z · comments (10)

Paper out now on creatine and cognitive performance
Fabienne · 2023-11-26T10:58:29.745Z · comments (2)

Woods’ new preprint on object permanence
Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z · comments (1)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (5)

Managing catastrophic misuse without robust AIs
ryan_greenblatt · 2024-01-16T17:27:31.112Z · comments (17)

On the Latest TikTok Bill
Zvi · 2024-03-13T18:50:05.398Z · comments (7)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)

[link] Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun (Daniel Braun) · 2024-05-17T16:25:02.267Z · comments (10)

[question] What's the theory of impact for activation vectors?
Chris_Leong · 2024-02-11T07:34:48.536Z · answers+comments (12)

The Bitter Lesson for AI Safety Research
adamk · 2024-08-02T18:39:36.884Z · comments (5)

Referendum Mechanics in a Marketplace of Ideas
Martin Sustrik (sustrik) · 2024-08-25T08:30:01.901Z · comments (2)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

[link] Congressional Insider Trading
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-30T13:32:57.264Z · comments (6)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

thane-ruthenis on LLMs can learn about themselves by introspection

One is introspecting on your current mental state ("I feel a headache starting")

That's mostly what I had in mind as well. It still implies the ability to access a hierarchical model of your current state.

You're not just able to access low-level facts like "I am currently outputting the string 'disliked'", you also have access to high-level facts like "I disliked the third scene because it was violent", "I found the plot arcs boring", "I hated this movie", from which the low-level behaviors are generated.

Or using your example, "I feel a headache starting" is itself a high-level claim. The low-level claim is "I am experiencing a negative-valence sensation from the sensory modality A of magnitude X", and the concept of a "headache" is a natural abstraction over a dataset of such low-level sensory experiences.

scarcegreengrass on A Narrow Path: a plan to deal with AI extinction risk

It sounds like the core idea is a variant of the Intelligence Manhattan Project idea, but with a focus on long term international stability & a ban on competitors.

Perhaps the industry would be more likely to adopt this plan if GUARD could seek revenue the way corporations currently do: by selling stock & API subscriptions. This would also increase productivity for GUARD & shorten the dangerous arms race interval.

david-johnston on A brief theory of why we think things are good or bad

I have a pedantic and a non-pedantic answer to this. Pedantic: you say X is "usually considered good" if it increases welfare. Perhaps you mean to imply that if X is usually considered good then it is good. In this case, I refer you to the rest of the paragraph you quote.

Non-pedantic: yes, it's true that once you accept some fundamental assumptions about goodness and badness you can go about theorising and looking for evidence. I'm suggesting that motivated reasoning is the mechanism that makes those fundamental assumptions believable.

I added a paragraph mentioning this, because I think your reaction is probably common.

gordon-seidoh-worley on Information vs Assurance

The information/assurance split feels quite familiar to me as an engineering manager.

My work life revolves around projects, especially big projects that takes months to complete. Other parts of the business depend on when these projects will be done. In some cases, the entire company's growth plans may hinge on my team completing a project by a certain time. And so everyone wants as much assurance as possible about when projects will complete.

This makes it really hard to share information, because people are so hungry for assurance they will interpret almost any sharing of information as assurance. A typical conversation I used to have when I was naive to this fact:

Sales manager: Hey, Gordon, when do you think that project will be done?
Me: Oh, if things go according to plan, probably next month.
Sales manager: Cool, thanks for the update!

If the project ships next month, no problem. But as often happens in software engineering, if the project gets delayed, now the sales manager is upset:

Them: Hey, you said it would be ready next month. What gives?
Me: I said if things went according to plan, but there were surprises, so it took us longer than we initially though it would.
Them: Dammit. I sold a customer on the assumption that the project was shipping this month! What am I supposed to tell them now?
Me: I don't know, why did you do that? I was giving you an internal estimate, not a promise of delivery.
Them: You said this month. I'm tired of Engineering always having some excuse about why stuff is delayed.

What did I do wrong? I failed to understand that Sales, and most other functions in a software business, are so dependent and hungry for information from Engineering, that they saw the assurance they wanted to see rather than the information I was giving.

I've (mostly) learned my lesson. I have to carefully control how much I say to anyone not directly involved in the project, lest they get the wrong idea.

Someone: Hey, Gordon, when do you think that project will be done?
Me: We're working on it. We set a goal of having it complete by end of next quarter.

Do I actually expect it to take all the way to next quarter? No. Most likely it'll be done next month. But if anything unexpected happens, now I've given a promise I can keep.

This isn't exactly just "underpromise, overdeliver". That's part of it, but it's also about noticing when you're accidentally making a promise, even when you think you're not, even if you say really explicitly that you're not making a promise, someone will interpret as a promise and now you'll have to deal with that.

roko on If far-UV is so great, why isn't it everywhere?

Yes, certain places like preschools might benefit even from an isolated install.

But that is kind of exceptional.

The world isn't an efficient market, especially because people are kind of set in their ways and like to stick to the defaults unless there is strong social pressure to change.

cubefox on A brief theory of why we think things are good or bad

We believe many things because we are somewhat rational; we consider hypotheses, compare them with observed evidence, and emphasise those that are more compatible. Goodness and badness defy this practice; normal reasoning does not produce hypotheses of the form "if X is morally good then I will observe Y".

I disagree. If X is an action, is is usually considered good if it increases welfare, and bad if it decreases welfare. And we can find evidence for or against something being conducive to welfare. So we can find evidence for or against something being good.

cata on If far-UV is so great, why isn't it everywhere?

If you installed it in a preschool and it successfully killed all the pathogens there wouldn't be essentially no effect.

d0themath on Alexander Gietelink Oldenziel's Shortform

I think with sunglasses there’s a veneer of plausible deniability. They in fact have a utilitarian purpose outside of just creating information asymmetry. If you’re wearing a mask though, there’s no deniability. You just don’t want people to know where you’re looking.

carl-feynman on What if AGI was already accidentally created in 2019? [Fictional story]

It’s a cute idea, but AI is a terrible fiction writer.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

That's a good counterexample ! Masks are dangerous and mysterious, but not cool in the way sunglasses are in agree