LessWrong 2.0 Reader

Gradient hacking is extremely difficult
beren · 2023-01-24T15:45:46.518Z · comments (22)
Small and Vulnerable
sapphire (deluks917) · 2021-05-03T04:55:52.149Z · comments (17)
The 2021 Less Wrong Darwin Game
lsusr · 2021-09-24T21:16:35.356Z · comments (102)
Threat-Resistant Bargaining Megapost: Introducing the ROSE Value
Diffractor · 2022-09-28T01:20:11.605Z · comments (19)
The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.
BobBurgers · 2023-12-12T02:42:18.559Z · comments (34)
You can remove GPT2’s LayerNorm by fine-tuning for an hour
StefanHex (Stefan42) · 2024-08-08T18:33:38.803Z · comments (11)
o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)
[link] Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison (carson-denison) · 2024-06-17T18:41:31.090Z · comments (22)
Secure homes for digital people
paulfchristiano · 2021-10-10T15:50:02.697Z · comments (37)
RAISE post-mortem
[deleted] · 2019-11-24T16:19:05.163Z · comments (12)
The Dial of Progress
Zvi · 2023-06-13T13:40:06.354Z · comments (119)
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Jeremy Gillen (jeremy-gillen) · 2024-01-26T07:22:06.370Z · comments (60)
Rereading Atlas Shrugged
Vaniver · 2020-07-28T18:54:45.272Z · comments (36)
ITT-passing and civility are good; "charity" is bad; steelmanning is niche
Rob Bensinger (RobbBB) · 2022-07-05T00:15:36.308Z · comments (36)
[link] Masterpiece
Richard_Ngo (ricraz) · 2024-02-13T23:10:35.376Z · comments (21)
And All the Shoggoths Merely Players
Zack_M_Davis · 2024-02-10T19:56:59.513Z · comments (57)
Saving Time
Scott Garrabrant · 2021-05-18T20:11:14.651Z · comments (20)
Logical induction for software engineers
Alex Flint (alexflint) · 2022-12-03T19:55:35.474Z · comments (8)
Agentized LLMs will change the alignment landscape
Seth Herd · 2023-04-09T02:29:07.797Z · comments (102)
[link] Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data
Johannes Treutlein (Johannes_Treutlein) · 2024-06-21T15:54:41.430Z · comments (13)
Jailbreaking GPT-4's code interpreter
Nikola Jurkovic (nikolaisalreadytaken) · 2023-07-13T18:43:54.484Z · comments (22)
Sparse Autoencoders Find Highly Interpretable Directions in Language Models
Logan Riggs (elriggs) · 2023-09-21T15:30:24.432Z · comments (8)
[link] Making every researcher seek grants is a broken model
jasoncrawford · 2024-01-26T16:06:26.688Z · comments (41)
Repeal the Foreign Dredge Act of 1906
Zvi · 2022-05-05T15:20:01.739Z · comments (16)
Vote on Interesting Disagreements
Ben Pace (Benito) · 2023-11-07T21:35:00.270Z · comments (129)
Most People Don't Realize We Have No Idea How Our AIs Work
Thane Ruthenis · 2023-12-21T20:02:00.360Z · comments (42)
DeepMind's "​​Frontier Safety Framework" is weak and unambitious
Zach Stein-Perlman · 2024-05-18T03:00:13.541Z · comments (14)
[link] Succession
Richard_Ngo (ricraz) · 2023-12-20T19:25:03.185Z · comments (48)
My research methodology
paulfchristiano · 2021-03-22T21:20:07.046Z · comments (38)
Curing insanity with malaria
Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2021-08-04T02:28:11.731Z · comments (8)
Wireless is a trap
benkuhn · 2020-06-07T15:30:02.352Z · comments (13)
Why all the fuss about recursive self-improvement?
So8res · 2022-06-12T20:53:42.392Z · comments (62)
«Boundaries», Part 1: a key missing concept from utility theory
Andrew_Critch · 2022-07-26T23:03:55.941Z · comments (33)
[link] o1: A Technical Primer
Jesse Hoogland (jhoogland) · 2024-12-09T19:09:12.413Z · comments (17)
Godzilla Strategies
johnswentworth · 2022-06-11T15:44:16.385Z · comments (71)
[link] What would a compute monitoring plan look like? [Linkpost]
Akash (akash-wasil) · 2023-03-26T19:33:46.896Z · comments (10)
Slack matters more than any outcome
Valentine · 2022-12-31T20:11:02.287Z · comments (56)
[Intro to brain-like-AGI safety] 1. What's the problem & Why work on it now?
Steven Byrnes (steve2152) · 2022-01-26T15:23:22.429Z · comments (19)
Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)
[link] Pseudorandomness contest: prizes, results, and analysis
Eric Neyman (UnexpectedValues) · 2021-01-15T06:24:15.317Z · comments (22)
How to (hopefully ethically) make money off of AGI
habryka (habryka4) · 2023-11-06T23:35:16.476Z · comments (89)
Language Models Model Us
eggsyntax · 2024-05-17T21:00:34.821Z · comments (55)
AI doom from an LLM-plateau-ist perspective
Steven Byrnes (steve2152) · 2023-04-27T13:58:10.973Z · comments (24)
My computational framework for the brain
Steven Byrnes (steve2152) · 2020-09-14T14:19:21.974Z · comments (26)
Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc
johnswentworth · 2022-06-04T05:41:56.713Z · comments (55)
Biology-Inspired AGI Timelines: The Trick That Never Works
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-12-01T22:35:28.379Z · comments (142)
My thoughts on the social response to AI risk
Matthew Barnett (matthew-barnett) · 2023-11-01T21:17:08.184Z · comments (37)
Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (36)
[link] grey goo is unlikely
bhauth · 2023-04-17T01:59:57.054Z · comments (120)
[link] Tuning your Cognitive Strategies
Raemon · 2023-04-27T20:32:06.337Z · comments (57)