LessWrong 2.0 Reader


So you want to work on technical AI safety
gw · 2024-06-24T14:29:57.481Z · comments (3)
[link] Announcing Human-aligned AI Summer School
Jan_Kulveit · 2024-05-22T08:55:10.839Z · comments (0)
Changes in College Admissions
Zvi · 2024-04-24T13:50:03.487Z · comments (11)
AI #52: Oops
Zvi · 2024-02-22T21:50:07.393Z · comments (9)
Unlearning via RMU is mostly shallow
Andy Arditi (andy-arditi) · 2024-07-23T16:07:52.223Z · comments (3)
Scenario Forecasting Workshop: Materials and Learnings
elifland · 2024-03-08T02:30:46.517Z · comments (3)
Consent across power differentials
Ramana Kumar (ramana-kumar) · 2024-07-09T11:42:03.177Z · comments (12)
Observations on Teaching for Four Weeks
ClareChiaraVincent · 2024-05-06T16:55:59.315Z · comments (14)
Sherlockian Abduction Master List
Cole Wyeth (Amyr) · 2024-07-11T20:27:00.000Z · comments (65)
Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation
sturb (benjamin-sturgeon) · 2024-03-21T12:32:22.475Z · comments (8)
[link] on the dollar-yen exchange rate
bhauth · 2024-04-07T04:49:53.920Z · comments (21)
[link] Finding Backward Chaining Circuits in Transformers Trained on Tree Search
abhayesian · 2024-05-28T05:29:46.777Z · comments (1)
Why you should learn a musical instrument
cata · 2024-05-15T20:36:16.034Z · comments (23)
Paper in Science: Managing extreme AI risks amid rapid progress
JanB (JanBrauner) · 2024-05-23T08:40:40.678Z · comments (2)
On Complexity Science
Garrett Baker (D0TheMath) · 2024-04-05T02:24:32.039Z · comments (19)
Role embeddings: making authorship more salient to LLMs
Nina Panickssery (NinaR) · 2025-01-07T20:13:16.677Z · comments (0)
On OpenAI’s Model Spec 2.0
Zvi · 2025-02-21T14:10:06.827Z · comments (3)
DeepSeek v3: The Six Million Dollar Model
Zvi · 2024-12-31T15:10:06.924Z · comments (6)
AI #100: Meet the New Boss
Zvi · 2025-01-23T15:40:07.473Z · comments (4)
Against blanket arguments against interpretability
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-22T09:46:23.486Z · comments (4)
Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)
Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (16)
A Solution for AGI/ASI Safety
Weibing Wang (weibing-wang) · 2024-12-18T19:44:29.739Z · comments (29)
AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)
The Fragility of Life Hypothesis and the Evolution of Cooperation
KristianRonn · 2024-09-04T21:04:49.878Z · comments (6)
Low Probability Estimation in Language Models
Gabriel Wu (gabriel-wu) · 2024-10-18T15:50:05.947Z · comments (0)
[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)
AI and the Technological Richter Scale
Zvi · 2024-09-04T14:00:08.625Z · comments (8)
[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)
Dave Kasten's AGI-by-2027 vignette
davekasten · 2024-11-26T23:20:47.212Z · comments (8)
[LDSL#0] Some epistemological conundrums
tailcalled · 2024-08-07T19:52:55.688Z · comments (10)
Interoperable High Level Structures: Early Thoughts on Adjectives
johnswentworth · 2024-08-22T21:12:38.223Z · comments (1)
[link] Careless thinking: A theory of bad thinking
Nathan Young · 2024-12-17T18:23:16.140Z · comments (17)
D&D.Sci Dungeonbuilding: the Dungeon Tournament
aphyer · 2024-12-14T04:30:55.656Z · comments (16)
[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)
[link] A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld (Caspar42) · 2024-12-16T22:42:03.763Z · comments (1)
Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (8)
Analysis of Global AI Governance Strategies
Sammy Martin (SDM) · 2024-12-04T10:45:25.311Z · comments (10)
AI #94: Not Now, Google
Zvi · 2024-12-12T15:40:06.336Z · comments (3)
How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation")
Ruby · 2024-07-19T00:31:38.332Z · comments (21)
Should rationalists be spiritual / Spirituality as overcoming delusion
Kaj_Sotala · 2024-03-25T16:48:08.397Z · comments (57)
[link] in defense of Linus Pauling
bhauth · 2024-06-03T21:27:43.962Z · comments (8)
[link] On scalable oversight with weak LLMs judging strong LLMs
zac_kenton (zkenton) · 2024-07-08T08:59:58.523Z · comments (18)
The Broken Screwdriver and other parables
bhauth · 2024-03-04T03:34:38.807Z · comments (1)
Notes on control evaluations for safety cases
ryan_greenblatt · 2024-02-28T16:15:17.799Z · comments (0)
[link] Anthropic announces interpretability advances. How much does this advance alignment?
Seth Herd · 2024-05-21T22:30:52.638Z · comments (4)
My intellectual journey to (dis)solve the hard problem of consciousness
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-06T09:32:41.612Z · comments (44)
4. Existing Writing on Corrigibility
Max Harms (max-harms) · 2024-06-10T14:08:35.590Z · comments (15)
AI #58: Stargate AGI
Zvi · 2024-04-04T13:10:06.342Z · comments (9)