LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

The LessWrong 2022 Review: Review Phase
RobertM (T3t) · 2023-12-22T03:23:49.635Z · comments (7)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

Paper out now on creatine and cognitive performance
Fabienne · 2023-11-26T10:58:29.745Z · comments (2)

On the Latest TikTok Bill
Zvi · 2024-03-13T18:50:05.398Z · comments (7)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

[link] Talk: "AI Would Be A Lot Less Alarming If We Understood Agents"
johnswentworth · 2023-12-17T23:46:32.814Z · comments (3)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

[link] Sam Altman, Greg Brockman and others from OpenAI join Microsoft
Ozyrus · 2023-11-20T08:23:00.791Z · comments (15)

[link] Against Nonlinear (Thing Of Things)
tailcalled · 2024-01-18T21:40:00.369Z · comments (18)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk (Technoguyrob) · 2024-03-06T05:03:09.639Z · comments (0)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

[link] microwave drilling is impractical
bhauth · 2024-06-12T22:16:00.199Z · comments (14)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

Woods’ new preprint on object permanence
Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z · comments (1)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

A hermeneutic net for agency
TsviBT · 2024-01-01T08:06:30.289Z · comments (4)

Aligned AI is dual use technology
lc · 2024-01-27T06:50:10.435Z · comments (31)

Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (6)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

Please do not use AI to write for you
Richard_Kennaway · 2024-08-21T09:53:34.425Z · comments (34)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (5)

Dual Wielding Kindle Scribes
mesaoptimizer · 2024-02-21T17:17:58.743Z · comments (18)

Measurement tampering detection as a special case of weak-to-strong generalization
ryan_greenblatt · 2023-12-23T00:05:55.357Z · comments (10)

[link] [EAForum xpost] A breakdown of OpenAI's revenue
dschwarz · 2024-07-10T18:09:20.017Z · comments (5)

Some Unorthodox Ways To Achieve High GDP Growth
johnswentworth · 2024-08-08T18:58:56.046Z · comments (6)

[link] The Alignment Trap: AI Safety as Path to Power
crispweed · 2024-10-29T15:21:26.545Z · comments (17)

"It's a 10% chance which I did 10 times, so it should be 100%"
egor.timatkov · 2024-11-18T01:14:27.738Z · comments (11)

[question] What's the theory of impact for activation vectors?
Chris_Leong · 2024-02-11T07:34:48.536Z · answers+comments (12)

Referendum Mechanics in a Marketplace of Ideas
Martin Sustrik (sustrik) · 2024-08-25T08:30:01.901Z · comments (2)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

Noticing Panic
Cole Wyeth (Amyr) · 2024-02-05T03:45:51.794Z · comments (8)

Medical Roundup #1
Zvi · 2024-01-16T20:30:35.802Z · comments (9)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)

John Schulman leaves OpenAI for Anthropic
Sodium · 2024-08-06T01:23:15.427Z · comments (0)

Voting Results for the 2022 Review
Ben Pace (Benito) · 2024-02-02T20:34:59.768Z · comments (3)

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
James Fox · 2024-07-06T11:34:57.227Z · comments (7)

[link] Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
[deleted] · 2024-05-17T16:25:02.267Z · comments (10)

So What's Up With PUFAs Chemically?
J Bostock (Jemist) · 2024-04-27T13:32:52.159Z · comments (23)

[link] Defending against hypothetical moon life during Apollo 11
eukaryote · 2024-01-07T04:49:42.628Z · comments (9)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

[link] Congressional Insider Trading
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-30T13:32:57.264Z · comments (6)

The Bitter Lesson for AI Safety Research
adamk · 2024-08-02T18:39:36.884Z · comments (5)

"Metastrategic Brainstorming", a core building-block skill
Raemon · 2024-06-11T04:27:52.488Z · comments (5)

... Wait, our models of semantics should inform fluid mechanics?!?
johnswentworth · 2024-08-26T16:38:53.924Z · comments (18)

some thoughts on LessOnline
Raemon · 2024-05-08T23:17:41.372Z · comments (5)

[link] Meditations on Mot
Richard_Ngo (ricraz) · 2023-12-04T00:19:19.522Z · comments (11)

Phallocentricity in GPT-J's bizarre stratified ontology
mwatkins · 2024-02-17T00:16:15.649Z · comments (37)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

anthonyc on AI #90: The Wall

That's a good point about public discussions. It's not how I absorb information, but I can definitely see that.

david-james on Neutrality

they weren’t designed to be ultra-robust to exploitation, or to make serious attempts to assess properties like truth, accuracy, coherence, usefulness, justice

There are notable differences between these properties. Usefulness and justice are quite different from the others (truth, accuracy, coherence). Usefulness (defined as suitability for a purpose, which is non-prescriptive as to the underlying norms) is different from justice (defined by some normative ideal). Coherence requires fewer commitments than truth and accuracy.

Ergo, I could see various instantiations of a library designed to satisfy various levels. Level 1 would value coherence. Level 2 would add truth and accuracy. Level 3: +usefulness. Level 4, +justice.

selfmaker662 on Ayn Rand’s model of “living money”; and an upside of burnout

I wouldn’t say the subsconscious calibrating on more substantial measures of success, such has “how happy something made me” or “how much status that seems to have brought” is irrational. What you’re proposing, it seems to me, is calibrating only on how good of an idea it was from the predictor part / System 2. Which gets calibrated, I would guess, when the person analyses the situation? But if the system 2 is sufficiently bad, calibrating on pure results is a good idea to shield against pursuing some goal, the pursuit of which yields nothing but evaluations of System 2, that the person did well. Which is bad, if one of the end goals of the subconscious is “objective success”.

For example, a situation I could easily imagine myself to have been in: Every day I struggle to go to bed, because I can’t put away my phone. But when I do, at 23:30, I congratulate myself - it took a lot of effort, and I did actually succeed in giving myself enough time to sleep almost long enough. If I didn’t recalibrate rationally, and “me-who-uses-internal-metrics-of-success” were happy with good effort every day, I’d keep doing it. All while real me would get fed up soon, and get a screen blocker app to turn on at 23:00 every day to sleep well every day at no willpower cost. (+- the other factors and supposing phone after 23 isn’t very important for some parts of me)

dakara on Noosphere89's Shortform

I've been reading a lot of the stuff that you have written and I agree with most of it (like 90%). However, one thing which you mentioned (somewhere else, but I can't seem to find the link, so I am commenting here) and which I don't really understand is iterative alignment.

I think that the iterative alignment strategy has an ordering error – we first need to achieve alignment to safely and effectively leverage AIs.

Consider a situation where AI systems go off and “do research on alignment” for a while, simulating tens of years of human research work. The problem then becomes: how do we check that the research is indeed correct, and not wrong, misguided, or even deceptive? We can’t just assume this is the case, because the only way to fully trust an AI system is if we’d already solved alignment, and knew that it was acting in our best interest at the deepest level.

Thus we need to have humans validate the research. That is, even automated research runs into a bottleneck of human comprehension and supervision.

The appropriate analogy is not one researcher reviewing another, but rather a group of preschoolers reviewing the work of a million Einsteins. It might be easier and faster than doing the research itself, but it will still take years and years of effort and verification to check any single breakthrough.

Fundamentally, the problem with iterative alignment is that it never pays the cost of alignment. Somewhere along the story, alignment gets implicitly solved.

lblack on Alexander Gietelink Oldenziel's Shortform

for a large enough (overparameterized) architecture - in other words it can be measured by the

The sentence seems cut off.

gunnar_zarncke on OpenAI Email Archives (from Musk v. Altman)

A much smaller subset was also published here, but does include documents:

https://www.techemails.com/p/elon-musk-and-openai?r=1jki4r

rotatingpaguro on AI #90: The Wall

I agree with whay you say about how to maximize what you get out of an interview. I also agree about that discussion vs. debate distinction you make, and I wasn't specifically trying to go there when I used the word "debate", I was just sloppy with words.

I guess you agree that it is friction to create a social norm that you should do a read up of the other person material before engaging in public. I expect less discussions would happen. There is not a clear threshold at how much you should be prepared.

I guess we disagree about how much value do we lose due to eliminating discussions that could have happaned, vs. how much value we gain by eliminating some lower quality discussions.

Another angle I have in mind that sidesteps this direct compromise, is that maybe what we value out of such discussions is not just doing an optimal play in terms of information transmitted between the parties. A public discussion has many different viewers. In the case at hand, I expect many people get more out of the discussion if they can see Wolfram think through the thing for the first time in real time, rather than having two informed people start discussing finer points in medias res.

viliam on Neutrality

Library in the sense of "we collect texts written by other people" is: The Best Textbooks on Every Subject [LW · GW]

I would like to see this one improved; specifically to have a dedicated UI where people can add books, vote on books, and review them. Maybe something like "people who liked X also liked Y".

Also, not just textbooks, but also good popular science books, etc.

gerardus-mercator on Claude seems to be smarter than LessWrong community

I see those assertions, but I don't see why an intelligent agent would be persuaded by them. Why would it think that the hypothetical objective goal is better than its utility function? Caring about objective facts and investigating them is also an instrumental goal compared to the terminal goal of optimizing its utility function. The agent's only frame of reference for 'better' and 'worse' is relative to its utility function; it would presumably understand that there are other frames of reference, but I don't think it would apply them, because that would lead to a worse outcome according to its current frame of reference.

dakara on Simple probes can catch sleeper agents

I am also interested in knowing whether the probing method is a solution to the undetectable backdoor problem.