LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Hierarchical Agency: A Missing Piece in AI Alignment
Jan_Kulveit · 2024-11-27T05:49:04.241Z · comments (20)

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
L Rudolf L (LRudL) · 2024-07-08T22:24:38.441Z · comments (28)

Skills I'd like my collaborators to have
Raemon · 2024-02-09T08:20:37.686Z · comments (9)

Catching AIs red-handed
ryan_greenblatt · 2024-01-05T17:43:10.948Z · comments (22)

[link] A case for AI alignment being difficult
jessicata (jessica.liu.taylor) · 2023-12-31T19:55:26.130Z · comments (58)

[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (18)

[link] A Chess-GPT Linear Emergent World Representation
Adam Karvonen (karvonenadam) · 2024-02-08T04:25:15.222Z · comments (14)

New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)
kave · 2024-02-28T02:42:05.801Z · comments (64)

SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (16)

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq (Lblack) · 2024-05-20T17:53:25.985Z · comments (4)

In favour of exploring nagging doubts about x-risk
owencb · 2024-06-25T23:52:01.322Z · comments (2)

[link] Transformer Circuit Faithfulness Metrics Are Not Robust
Joseph Miller (Josephm) · 2024-07-12T03:47:30.077Z · comments (5)

Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (2)

[link] Poker is a bad game for teaching epistemics. Figgie is a better one.
rossry · 2024-07-08T06:05:20.459Z · comments (47)

I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (20)

[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (8)

Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)

Dreams of AI alignment: The danger of suggestive names
TurnTrout · 2024-02-10T01:22:51.715Z · comments (59)

A shortcoming of concrete demonstrations as AGI risk advocacy
Steven Byrnes (steve2152) · 2024-12-11T16:48:41.602Z · comments (27)

What happens if you present 500 people with an argument that AI is risky?
KatjaGrace · 2024-09-04T16:40:03.562Z · comments (7)

Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith (joekc) · 2024-12-18T18:22:34.059Z · comments (8)

Lsusr's Rationality Dojo
lsusr · 2024-02-13T05:52:03.757Z · comments (17)

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small
Joseph Bloom (Jbloom) · 2024-02-02T06:54:53.392Z · comments (37)

Response to nostalgebraist: proudly waving my moral-antirealist battle flag
Steven Byrnes (steve2152) · 2024-05-29T16:48:29.408Z · comments (29)

Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)

LLM Applications I Want To See
sarahconstantin · 2024-08-19T21:10:03.101Z · comments (5)

LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)

Live Theory Part 0: Taking Intelligence Seriously
Sahil · 2024-06-26T21:37:10.479Z · comments (3)

[link] Notes from a Prompt Factory
Richard_Ngo (ricraz) · 2024-03-10T05:13:39.384Z · comments (19)

Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (15)

On Dwarksh’s Podcast with Leopold Aschenbrenner
Zvi · 2024-06-10T12:40:03.348Z · comments (7)

[link] LessOnline (May 31—June 2, Berkeley, CA)
Ben Pace (Benito) · 2024-03-26T02:34:00.000Z · comments (24)

[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)

General Thoughts on Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:43.940Z · comments (60)

[link] Advice for Activists from the History of Environmentalism
Jeffrey Heninger (jeffrey-heninger) · 2024-05-16T18:40:02.064Z · comments (8)

A simple model of math skill
Alex_Altair · 2024-07-21T18:57:33.697Z · comments (16)

[link] How to replicate and extend our alignment faking demo
Fabien Roger (Fabien) · 2024-12-19T21:44:13.059Z · comments (2)

[link] The Minority Coalition
Richard_Ngo (ricraz) · 2024-06-24T20:01:27.436Z · comments (7)

Announcing the London Initiative for Safe AI (LISA)
James Fox · 2024-02-02T23:17:47.011Z · comments (0)

[link] "Deep Learning" Is Function Approximation
Zack_M_Davis · 2024-03-21T17:50:36.254Z · comments (28)

[link] My cover story in Jacobin on AI capitalism and the x-risk debates
garrison · 2024-02-12T23:34:16.526Z · comments (5)

Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (13)

MIRI’s 2024 End-of-Year Update
Rob Bensinger (RobbBB) · 2024-12-03T04:33:47.499Z · comments (2)

On attunement
Joe Carlsmith (joekc) · 2024-03-25T12:47:34.856Z · comments (8)

[link] CIV: a story
Richard_Ngo (ricraz) · 2024-06-15T22:36:50.415Z · comments (6)

[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)

Access to powerful AI might make computer security radically easier
Buck · 2024-06-08T06:00:19.310Z · comments (14)

OpenAI #8: The Right to Warn
Zvi · 2024-06-17T12:00:02.639Z · comments (8)

Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

yair-halberstadt on The average rationalist IQ is about 122

Here’s the breakdown: a median SAT score of 1490 (from the LessWrong 2014 survey) corresponds to +2.42 SD, which regresses to +1.93 SD for IQ using an SAT-IQ correlation of +0.80. This equates to an IQ of 129.

I don't think that works unless Less wrong specifically selects for high SAT score. If it selects for high IQ and the high SAT is as a result of the high IQ then you would have to go the other way and assume an SD of 3.03.

If, as seems more likely, Less wrong correlates with both IQ and SAT score, then the exact number is impossible to calculate, but assuming it correlates with both equally we would estimate IQ at 2.42 SD.

finalformal2 on Being Present is Not a Skill

How would you recommend learning how to get rid of emotional blocks?

unexpectedvalues on The average rationalist IQ is about 122

These both seem valid to me! Now, if you have multiple predictors (like SAT and height), then things get messy because you have to consider their covariance and stuff.

jenn on Ideas for benchmarking LLM creativity

I wrote up my much less seriously considered tests at https://jenn.site/2024/12/llm-creativity/ in part due to this post.

LLMs for creative work seems to be an area that you're poking at a lot these days and I always enjoy seeing what you get up to with it :]

tsvibt on Alexander Gietelink Oldenziel's Shortform

Alternative: "AI x-derisking"

fer32dwt34r3dfsz on AlphaAndOmega's Shortform

I decided to run the same question through the latest models to gauge their improvements.

Not exactly sure if there is much advantage at all in you having done this, but I feel inclined to say Thank You for persisting in persuading your cousin to at least consider concerns regarding AI, even if he perceptually filters those concerns to mostly regard job automation over others, such as a global catastrophe.

In my own life, over the last several years, I have found it difficult to persuade those close to me to really consider concerns from AI.

I thought that capabilities advancing observably before them might stoke them to think more about their own future and how possibly to behave and or live differently conditional on different AI capabilities, but this has been of little avail.

Expanding capabilities seem to best dissolve skepticism but conversations seem to have not had as large an effect as I would have expected. I've not thought or acted as much as I want to on how to coordinate more of humanity around decision-making regarding AI (or the consequences of AI), partially since I do not have a concrete notion where to steer humanity or justification for where to steer (even I knew it was highly likely I was actually contributing to the steering through my actions).

petermccluskey on What happens next?

I want to register different probabilities:

Jobs hits first (25%).
AGI race hits first (50%).
Alignment hits first (15%).

archimedes on Review: Planecrash

This review led me to find the following podcast version of Planecrash. I've listened to the first couple of episodes and the quality is quite good.

https://askwhocastsai.substack.com/s/planecrash

fer32dwt34r3dfsz on The average rationalist IQ is about 122

The second paragraph puts into words something I've noticed but not really mentally formalized before. Some anecdotal evidence from my own life in support of the claims made in this paragraph: I've met individuals whose tested IQ exceeds those of other, lower but not much lower, IQ individuals I know who are more educated / trained in epistemological thinking and tangential disciplines. For none of the individual-pairs I have in mind would I declare that one person "ran circles around" the other, however, the difference (advantage going to the lower but better "trained" IQ individual) in conversational dynamics were notable enough for me to remember well. The catch here is the accuracy of the IQ claims made by some of these individuals, as some did not personally reveal their scores to me.

cstinesublime on CstineSublime's Shortform

Kantmogorov Imperative - more of a philosophical dad-joke than a actual thing, it is the shortest possible computer program that outputs descriptions of morally consistent behaviors in all/any circumstances