LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Difficulty classes for alignment properties
Jozdien · 2024-02-20T09:08:24.783Z · comments (5)

[question] If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?
KvmanThinking (avery-liu) · 2024-10-03T11:31:19.974Z · answers+comments (34)

AXRP Episode 33 - RLHF Problems with Scott Emmons
DanielFilan · 2024-06-12T03:30:05.747Z · comments (0)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures
abstractapplic · 2024-05-17T00:25:42.950Z · comments (12)

AI #56: Blackwell That Ends Well
Zvi · 2024-03-21T12:10:05.412Z · comments (16)

What I Learned (Conclusion To "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-20T21:24:37.464Z · comments (0)

[link] My Apartment Art Commission Process
jenn (pixx) · 2024-08-26T18:36:44.363Z · comments (4)

[link] The $100B plan with "70% risk of killing us all" w Stephen Fry [video]
Oleg Trott (oleg-trott) · 2024-07-21T20:06:39.615Z · comments (8)

(Maybe) A Bag of Heuristics is All There Is & A Bag of Heuristics is All You Need
Sodium · 2024-10-03T19:11:58.032Z · comments (17)

[link] Book review: On the Edge
PeterMcCluskey · 2024-08-30T22:18:39.581Z · comments (0)

Computational Mechanics Hackathon (June 1 & 2)
Adam Shai (adam-shai) · 2024-05-24T22:18:44.352Z · comments (5)

The Schumer Report on AI (RTFB)
Zvi · 2024-05-24T15:10:03.122Z · comments (3)

[link] Romae Industriae
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-19T13:03:31.536Z · comments (2)

LessWrong: After Dark, a new side of LessWrong
So8res · 2024-04-01T22:44:04.449Z · comments (5)

Video and transcript of presentation on Otherness and control in the age of AGI
Joe Carlsmith (joekc) · 2024-10-08T22:30:38.054Z · comments (1)

[link] GPT2, Five Years On
Joel Burget (joel-burget) · 2024-06-05T17:44:17.552Z · comments (0)

[link] Suffering Is Not Pain
jbkjr · 2024-06-18T18:04:43.407Z · comments (45)

ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct
25Hour (aaron-kaufman) · 2024-10-05T11:30:11.953Z · comments (2)

Intransitive Trust
Screwtape · 2024-05-27T16:55:29.294Z · comments (15)

If You Can Climb Up, You Can Climb Down
jefftk (jkaufman) · 2024-07-30T00:00:06.295Z · comments (9)

Sparse autoencoders find composed features in small toy models
Evan Anders (evan-anders) · 2024-03-14T18:00:43.339Z · comments (12)

How good are LLMs at doing ML on an unknown dataset?
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-07-01T09:04:03.687Z · comments (4)

More on the Apple Vision Pro
Zvi · 2024-02-13T17:40:05.388Z · comments (5)

DIY LessWrong Jewelry
Fluffnutt (Pear) · 2024-08-25T21:33:56.173Z · comments (0)

Rational Animations offers animation production and writing services!
Writer · 2024-03-15T17:26:07.976Z · comments (0)

Musings on LLM Scale (Jul 2024)
Vladimir_Nesov · 2024-07-03T18:35:48.373Z · comments (0)

Experimentation (Part 7 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-18T21:25:56.527Z · comments (0)

Monthly Roundup #16: March 2024
Zvi · 2024-03-19T13:10:05.529Z · comments (4)

Important open problems in voting
Closed Limelike Curves · 2024-07-01T02:53:44.690Z · comments (1)

[link] Twitter thread on open-source AI
Richard_Ngo (ricraz) · 2024-07-31T00:26:11.655Z · comments (6)

[link] patent process problems
bhauth · 2024-07-14T21:12:04.953Z · comments (13)

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)
Diffractor · 2024-04-18T08:39:13.368Z · comments (2)

Effectively Handling Disagreements - Introducing a New Workshop
Camille Berger (Camille Berger) · 2024-04-15T16:33:50.339Z · comments (2)

"Which chains-of-thought was that faster than?"
Emrik (Emrik North) · 2024-05-22T08:21:00.269Z · comments (4)

One way violinists fail
Solenoid_Entity · 2024-05-29T04:08:17.675Z · comments (5)

[link] Information dark matter
Logan Kieller (logan-kieller) · 2024-10-01T15:05:41.159Z · comments (4)

[link] AI Safety Memes Wiki
plex (ete) · 2024-07-24T18:53:04.977Z · comments (1)

[link] The Cancer Resolution?
PeterMcCluskey · 2024-07-24T00:25:17.322Z · comments (24)

Monthly Roundup #20: July 2024
Zvi · 2024-07-23T12:50:07.991Z · comments (9)

Confusing the metric for the meaning: Perhaps correlated attributes are "natural"
NickyP (Nicky) · 2024-07-23T12:43:18.681Z · comments (3)

Mech Interp Lacks Good Paradigms
Daniel Tan (dtch1997) · 2024-07-16T15:47:32.171Z · comments (0)

My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (44)

Proveably Safe Self Driving Cars [Modulo Assumptions]
Davidmanheim · 2024-09-15T13:58:19.472Z · comments (26)

The Cognitive Bootcamp Agreement
Raemon · 2024-10-16T23:24:05.509Z · comments (0)

Helpful examples to get a sense of modern automated manipulation
trevor (TrevorWiesinger) · 2023-11-12T20:49:57.422Z · comments (3)

[link] Vacuum: Theory and Technologies
ethanmorse · 2024-01-21T17:23:49.257Z · comments (0)

Templates I made to run feedback rounds for Ethan Perez’s research fellows.
Henry Sleight (ResentHighly) · 2024-03-28T19:41:15.506Z · comments (0)

Update #2 to "Dominant Assurance Contract Platform": EnsureDone
moyamo · 2023-11-28T18:02:50.367Z · comments (2)

[link] On Lies and Liars
Gabriel Alfour (gabriel-alfour-1) · 2023-11-17T17:13:03.726Z · comments (4)

An illustrative model of backfire risks from pausing AI research
Maxime Riché (maxime-riche) · 2023-11-06T14:30:58.615Z · comments (3)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

abe-frei-pearson on Big tech transitions are slow (with implications for AI)

This argument seems to be a one by analogy. steam engine:industrial revolution::???:machine learning. But as you can see there's a term in the analogy I don't understand. Is ??? chatgpt? LLMs? Transformers? AlexNet? The internet? Digital computers? Something that hasn't yet been invented?

niplav on Zach Stein-Perlman's Shortform

Two others that come to mind:

Metaculus (used to be better though)
lobste.rs (quite specialized)
Quanta Magazine has some good comments, e.g. this article has the original researcher showing up & clarifying some questions in the comments

notfnofn on A Logical Proof for the Emergence and Substrate Independence of Sentience

Proof is a really strong word and (in my opinion) inappropriate in this context. This is about to become an extremely important question and we should be careful to avoid overconfidence. I've personally found this comment chain [LW(p) · GW(p)] to be an enlightening discussion on the complexity of this issue (but of course this is something that has been discussed endlessly elsewhere).

As a separate issue, let's say I write down the rule set for an automaton that will slowly grow and eventually emulate every finite string of finite grids of black and white pixels. This is not hard to do. Does it require a substrate to become conscious or is the rule set itself conscious? What if I actually run it in a corner of the universe that slowly uses surrounding matter to allow its output to grow larger and larger?

anthonyc on Big tech transitions are slow (with implications for AI)

There are also many ways the max likelihood model could be consistent with very rapid near-term change, too.

One is that, like in past transitions, the faster growth isn't an exponential, it gets faster and then eventually peters out, like any s-curve. If look at the world from 1700 to now, the industrial revolution is a sum of many individual such curves, but even so, the fastest years/decades of growth globally were ~50x faster than the slowest. If you shorten 1000x growth down to a couple of decades and assume a similar distribution of growth rates, then it matters a whole lot whether 2024 is year 1, year 5, or what. We could be 7 years into a two decade transition that began with transformer architecture, or two decades into a fifty year transition that started with some other machine learning advance, and those would be consistent with both the OP and "Things are about to move ridiculously fast."

In other words: Sustained faster-than-population economic growth didn't show up in Britain until a century or so into the industrial revolution began, peak global growth was a century or so after that, and in recent years the largest remaining countries have been catching up even faster than that even while growth in the UK and US and EU are slower than past peaks. If this were transitional year 7 of 20, and peak growth in the industrial revolution was 5-10%/yr, and this transition is 10x faster, than it's plausible to expect 1 year economic doubling times in each of several years between now and the early 2030s.

The OP seems to assume we're in year 1 or so out of 20-50, and that the most significant or fastest changes will happen near the end of that window. I'm not quite sure why I should agree with those assumptions.

sharmake-farah on Jimrandomh's Shortform

The issue is that all cryptography depends on one-way functions, so any ability to break a cryptographic algorithm that depends on one-way functions in a scalable way means you have defeated almost all of cryptography in practice.

So in one sense, a mathematical advance on a one-way function underlying a symmetric key algorithm would be disastrous for overall cryptographic prospects.

olli-savolainen on johnswentworth's Shortform

So this is why if you read the CNV studies and you look at the hits they identify, and how many subjects are covered by the identified hits, you find that like, maybe 2% of the cohort will have one of those specific identified hits and lose 2 IQ points or gain 2 kg of fat etc. So you can see how that would work out in embryo selection: you'd be able to avoid that loss, which is meaningful! ...in a tiny fraction of all embryos. On average, you'd just sequence them all, find no known pathogenic variant, and shrug, and use the SNP PGS like usual, having gained nothing.
Also, of course, WGS is substantially more expensive than SNP genotyping and more difficult to do on embryos.

That is relevant in pre-implantation diagnosis for parents and gene therapy at the population level. But for Qwisatz Haderach breeding purposes those costs are immaterial. There the main bottleneck is the iteration of selection, or making synthetic genomes. Going for the most typical genome with the least amount of originality is not a technical challenge in itself, right? We would not be interested in the effect of the ugliness, only in getting it out.

measure on Claude Sonnet 3.5.1 and Haiku 3.5

If you're reading this direct, this text is the last one that is wise like what's written between.

This sounds like it tried to encode something steganographically in the message? Maybe that accounts for some of the bizarre language.

cole-wyeth on Cole Wyeth's Shortform

I'm starting a google group for anyone who wants to see occasional updates on my Sherlockian Abduction Master List [LW · GW]. It occurred to me that anyone interested in the project would currently have to check the list to see any new observational cues (infrequently) added - also some people outside of lesswrong are interested.

akash-wasil on Anthropic rewrote its RSP

Henry from SaferAI claims that the new RSP is weaker and vaguer than the old RSP. Do others have thoughts on this claim? (I haven't had time to evaluate yet.)

Main Issue: Shift from precise definitions to vague descriptions.
The primary issue lies in Anthropic's shift away from precisely defined capability thresholds and mitigation measures. The new policy adopts more qualitative descriptions, specifying the capability levels they aim to detect and the objectives of mitigations, but it lacks concrete details on the mitigations and evaluations themselves. This shift significantly reduces transparency and accountability, essentially asking us to accept a "trust us to handle it appropriately" approach rather than providing verifiable commitments and metrics.

More from him:

Example: Changes in capability thresholds.
To illustrate this change, let's look at a capability threshold:

1️⃣ Version 1 (V1): AI Security Level 3 (ASL-3) was defined as "The model shows early signs of autonomous self-replication ability, as defined by a 50% aggregate success rate on the tasks listed in [Appendix on Autonomy Evaluations]."

2️⃣ Version 2 (V2): ASL-3 is now defined as "The ability to either fully automate the work of an entry-level remote-only researcher at Anthropic, or cause dramatic acceleration in the rate of effective scaling" (quantified as an increase of approximately 1000x in a year).

In V2, the thresholds are no longer defined by quantitative benchmarks. Anthropic now states that they will demonstrate that the model's capabilities are below these thresholds when necessary. However, this approach is susceptible to shifting goalposts as capabilities advance.

🔄 Commitment Changes: Dilution of mitigation strategies.
A similar trend is evident in their mitigation strategies. Instead of detailing specific measures, they focus on mitigation objectives, stating they will prove these objectives are met when required. This change alters the nature of their commitments.

💡 Key Point: Committing to robust measures and then diluting them significantly is not how genuine commitments are upheld.
The general direction of these changes is concerning. By allowing more leeway to decide if a model meets thresholds, Anthropic risks prioritizing scaling over safety, especially as competitive pressures intensify.

I was expecting the RSP to become more specific as technology advances and their risk management process matures, not the other way around.

gunnar_zarncke on [Intuitive self-models] 2. Conscious Awareness

I mean this (my summary of the Libet experiments and their replications):

Brain activity detectable with EEG (Readiness Potential) begins between 350 and multiple seconds (depending on experiment and measurement resolution) before the person consciously feels the intention to act (voluntary motor movement).
Subjects report becoming aware of their intention to act (via clock tracking) about 200 ms before the action itself (e.g., pressing a button). 200ms seems relatively fixed, but cognitive load can delay.

To give a specific quote:

Matsuhashi and Hallet: Our result suggests that the perception of intention rises through multiple levels of awareness, starting just after the brain initiates movement.
[...]
1. The first detected event in most subjects was the onset of BP. They were not aware of the movement genesis at this time, even if they were alerted by tones.
2. As the movement genesis progressed, the awareness state rose higher and after the T time, if the subjects were alerted, they could consciously access awareness of their movement genesis as intention. The late BP began within this period.
3. The awareness state rose even higher as the process went on, and at the W time it reached the level of meta-awareness without being probed. In Libet et al’s clock task, subjects could memorize the clock position at this time.
4. Shortly after that, the movement genesis reached its final point, after which the subjects could not veto the movement any more (P time).
[...]
We studied the immediate intention directly preceding the action. We think it best to understand movement genesis and intention as separate phenomena, both measurable. Movement genesis begins at a level beyond awareness and over time gradually becomes accessible to consciousness as the perception of intention.

Now, I think you'd say that what they measured wasn't S(A) but something else that is causally related, but then you are moving farther away from patterns we can observe in the brain. And your theory still has to explain the subclass of those S(A) that they did measure. The participants apparently thought these to be their decisions S(A) about their actions A.