LessWrong 2.0 Reader

[link] AI 2027: What Superintelligence Looks Like
Daniel Kokotajlo (daniel-kokotajlo) · 2025-04-03T16:23:44.619Z · comments (128)

LessWrong has been acquired by EA
habryka (habryka4) · 2025-04-01T13:09:11.153Z · comments (45)

[link] Will Jesus Christ return in an election year?
Eric Neyman (UnexpectedValues) · 2025-03-24T16:50:53.019Z · comments (44)

Policy for LLM Writing on LessWrong
jimrandomh · 2025-03-24T21:41:30.965Z · comments (59)

[link] Recent AI model progress feels mostly like bullshit
lc · 2025-03-24T19:28:43.450Z · comments (77)

VDT: a solution to decision theory
L Rudolf L (LRudL) · 2025-04-01T21:04:09.509Z · comments (18)

[link] Good Research Takes are Not Sufficient for Good Strategic Takes
Neel Nanda (neel-nanda-1) · 2025-03-22T10:13:38.257Z · comments (27)

[link] Playing in the Creek
Hastings (hastings-greer) · 2025-04-10T17:39:28.883Z · comments (6)

[link] METR: Measuring AI Ability to Complete Long Tasks
Zach Stein-Perlman · 2025-03-19T16:00:54.874Z · comments (100)

[link] Tracing the Thoughts of a Large Language Model
Adam Jermyn (adam-jermyn) · 2025-03-27T17:20:02.162Z · comments (22)

Why White-Box Redteaming Makes Me Feel Weird
Zygi Straznickas (nonagon) · 2025-03-16T18:54:48.078Z · comments (34)

Why Have Sentence Lengths Decreased?
Arjun Panickssery (arjun-panickssery) · 2025-04-03T17:50:29.962Z · comments (50)

[link] Thoughts on AI 2027
Max Harms (max-harms) · 2025-04-09T21:26:23.926Z · comments (43)

Intention to Treat
Alicorn · 2025-03-20T20:01:19.456Z · comments (4)

Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations
Nicholas Goldowsky-Dill (nicholas-goldowsky-dill) · 2025-03-17T19:11:00.813Z · comments (7)

Short Timelines Don't Devalue Long Horizon Research
Vladimir_Nesov · 2025-04-09T00:42:07.324Z · comments (23)

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?
shrimpy · 2025-03-16T16:52:42.177Z · comments (25)

[link] Conceptual Rounding Errors
Jan_Kulveit · 2025-03-26T19:00:31.549Z · comments (15)

Alignment Faking Revisited: Improved Classifiers and Open Source Extensions
John Hughes (john-hughes) · 2025-04-08T17:32:55.315Z · comments (11)

OpenAI #12: Battle of the Board Redux
Zvi · 2025-03-31T15:50:02.156Z · comments (1)

The Pando Problem: Rethinking AI Individuality
Jan_Kulveit · 2025-03-28T21:03:28.374Z · comments (13)

Do models say what they learn?
Andy Arditi (andy-arditi) · 2025-03-22T15:19:18.800Z · comments (12)

How I've run major projects
benkuhn · 2025-03-16T18:40:04.223Z · comments (10)

Learned pain as a leading cause of chronic pain
SoerenMind · 2025-04-09T11:57:58.523Z · comments (13)

New Cause Area Proposal
CallumMcDougall (TheMcDouglas) · 2025-04-01T07:12:34.360Z · comments (4)

2024 Unofficial LessWrong Survey Results
Screwtape · 2025-03-14T22:29:00.045Z · comments (28)

Downstream applications as validation of interpretability progress
Sam Marks (samuel-marks) · 2025-03-31T01:35:02.722Z · comments (1)

[link] Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)
lewis smith (lsgos) · 2025-03-26T19:07:48.710Z · comments (14)

[link] Explaining British Naval Dominance During the Age of Sail
Arjun Panickssery (arjun-panickssery) · 2025-03-28T05:47:28.561Z · comments (5)

AI 2027: Responses
Zvi · 2025-04-08T12:50:02.197Z · comments (3)

Among Us: A Sandbox for Agentic Deception
7vik (satvik-golechha) · 2025-04-05T06:24:49.000Z · comments (4)

The Lizardman and the Black Hat Bobcat
Screwtape · 2025-04-06T19:02:01.238Z · comments (13)

Third-wave AI safety needs sociopolitical thinking
Richard_Ngo (ricraz) · 2025-03-27T00:55:30.548Z · comments (23)

How I talk to those above me
Maxwell Peterson (maxwell-peterson) · 2025-03-30T06:54:59.869Z · comments (13)

Show, not tell: GPT-4o is more opinionated in images than in text
Daniel Tan (dtch1997) · 2025-04-02T08:51:02.571Z · comments (29)

[link] Towards a scale-free theory of intelligent agency
Richard_Ngo (ricraz) · 2025-03-21T01:39:42.251Z · comments (22)

[link] Elite Coordination via the Consensus of Power
Richard_Ngo (ricraz) · 2025-03-19T06:56:44.825Z · comments (15)

How To Believe False Things
Eneasz · 2025-04-02T16:28:29.055Z · comments (10)

How I force LLMs to generate correct code
claudio · 2025-03-21T14:40:19.211Z · comments (7)

OpenAI #11: America Action Plan
Zvi · 2025-03-18T12:50:03.880Z · comments (3)

A Slow Guide to Confronting Doom
Ruby · 2025-04-06T02:10:56.483Z · comments (20)

Keltham's Lectures in Project Lawful
Morpheus · 2025-04-01T10:39:47.973Z · comments (3)

You will crash your car in front of my house within the next week
Richard Korzekwa (Grothor) · 2025-04-01T21:43:21.472Z · comments (6)

Mistral Large 2 (123B) exhibits alignment faking
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-27T15:39:02.176Z · comments (4)

Announcing ILIAD2: ODYSSEY
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-04-03T17:01:06.004Z · comments (1)

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions
Stuart_Armstrong · 2025-03-18T14:48:54.762Z · comments (12)

[link] AI for AI safety
Joe Carlsmith (joekc) · 2025-03-14T15:00:23.491Z · comments (12)

[link] Eukaryote Skips Town - Why I'm leaving DC
eukaryote · 2025-03-26T17:16:29.663Z · comments (1)

[link] AI for Epistemics Hackathon
Austin Chen (austin-chen) · 2025-03-14T20:46:34.250Z · comments (10)

PauseAI and E/Acc Should Switch Sides
WillPetillo · 2025-04-01T23:25:51.265Z · comments (6)


Recent comments

randaly on Comments on "AI 2027"

I want to emphasize that I'm criticizing "AI 2027"'s projection of R&D spending, i.e. this table [LW · GW]. If companies cut R&D spending, that falsifies the "AI 2027" forecast.

In particular, the comment I'm replying to proposed that while the current money would run out in ~2027, companies could raise more to continue expanding R&D spending. Raising money for 2028 R&D would need to occur in 2027, and it would need to occur on the basis of financial statements from at least a quarter before the raise. So in this scenario, they need to slash R&D spending in 2027, something the "AI 2027" authors definitely don't anticipate.

Furthermore, your claim may itself be false. We lack sufficient breakdown of OpenAI's budget to be certain. My estimate from the post was that most AI companies have 75% cost of revenue; OpenAI specifically has a 20% revenue sharing agreement with Microsoft; and the remaining 5% needs to cover General and Administrative expenses. Depending on the exact percentage of salary and G&A expenses caused by R&D, it's plausible that OpenAI eliminating R&D entirely wouldn't make it profitable today. And in the future OpenAI will also need to pay interest on tens of billions in debt.
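The margin arithmetic above can be sketched in a few lines. This is only an illustration: revenue is normalized to 100 units, the 75 and 20 figures are the estimates quoted above, and the function names and the `non_rnd_g_and_a` parameter are hypothetical placeholders rather than real budget figures.

```python
# Illustrative arithmetic only; revenue is normalized to 100 units.
REVENUE = 100
COST_OF_REVENUE = 75   # estimate quoted above for most AI companies
MICROSOFT_SHARE = 20   # revenue-sharing agreement quoted above


def margin_left_for_g_and_a(revenue=REVENUE,
                            cost_of_revenue=COST_OF_REVENUE,
                            revenue_share=MICROSOFT_SHARE):
    """Units of revenue left after cost of revenue and the revenue share."""
    return revenue - cost_of_revenue - revenue_share


def profitable_without_rnd(non_rnd_g_and_a):
    """True if the leftover margin exceeds non-R&D G&A expenses.

    `non_rnd_g_and_a` is a hypothetical input (in the same normalized
    units); the real breakdown is not public.
    """
    return margin_left_for_g_and_a() - non_rnd_g_and_a > 0


print(margin_left_for_g_and_a())
```

Under these assumptions, any non-R&D G&A share of 5 units or more leaves no profit even with R&D cut to zero, which is the sense in which eliminating R&D might not be enough.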

neel-nanda-1 on Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study

My understanding is that historical VLMs like Flamingo had a separate image model, but that it passed on a vector representation of the image, not text.

samuelshadrach on [Letter] Chinese Quickstart

Thanks for taking time to reply!

Yes OpenAI realtime API is really cool. When speaking to realtime API, I start each sentence with two words indicating what I want it to do. It's clunky but it works. "Translate Chinese, what is the time?" "Reply Chinese, how are you?" Ideally yes I could write an app to prepend the instruction audio to each sentence.

If I had this as a higher priority, I'd actually want to set up this Twilio app.

jiro on A Dissent on Honesty

He asks “How interested are you in Widgets?” He has learnt from previous job interviews that, if he answers honestly, the interviewer will think he is any of lying, insane, or too weird to deal with, and not hire him, even though this is not in the best financial interests of the company, were they fully informed.

By the standard "intentionally or knowingly cause the other person to have false beliefs", answering 'honestly' would be lying, and answering in a toned down way would not (because it maximizes the truth of the belief that the interviewer gets).

nick_tarleton on Unbendable Arm as Test Case for Religious Belief

(I have successfully done Unbendable Arm after Valentine showed me in person, though he didn't explain any of the mechanics. My experience of it didn't involve visualization, but felt like placing my fingertips on the wall across the room and resolving that they'd stay there. Contra jimmy's comment [LW(p) · GW(p)], IIRC I initially held my arm wrong without any cueing.)

Strongly related: Believing In [LW · GW]. From that post:

My guess is that for lack of good concepts for distinguishing “believing in” from deception, LessWrongers, EAs, and “nerds” in general are often both too harsh on folks doing positive-sum “believing in,” and too lax on folks doing deception. (The “too lax” happens because many can tell there’s a “believing in”-shaped gap in their notions of e.g. “don’t say better things about your start-up than a reasonable outside observer would,” but they can’t tell its exact shape, so they loosen their “don’t deceive” in general.)

I feel like this post is similarly too lax on, not deception, but propositional-and-false religious beliefs.

jenn on jenn's Shortform

this week's meetup is on the train to crazy town [? · GW]. it was fun putting together all the readings and discussion questions, and i'm optimistic about how the meetup's going to turn out! (i mean, in general, i don't run meetups i'm not optimistic about, so i guess that's not saying much.) im slightly worried about some folks coming in and just being like "this metaphor is entirely unproductive and sucks", should consider how to frame the meetup productively to such folks.

i think one of my strengths as an organizer is that ive read sooooo much stuff and so its relatively easy for me to pull together cohesive readings for any meetup. but ultimately im not sure if it's like, the most important work, to e.g. put together a bibliography of the crazy town idea and its various appearances since 2021. still, it's fun to do.

elityre on Eli's shortform feed

For the same reasons 'training an agent on a constitution that says to care about $x$' does not, at arbitrary capability levels, produce an agent that cares about $x$

Ok, but I'm trying to ask why not.

Here's the argument that I would make for why not, followed by why I'm skeptical of it right now.

New options for the AI will open up at high capability levels that were not available at lower capability levels. This could in principle lead to undefined behavior that deviates from what we intended.

More specifically, if it's the case that...

1. The best / easiest-for-SGD-to-find way to compute corrigible outputs (as evaluated by the AI) is to reinforce an internal proxy measure that is correlated with corrigibility (as evaluated by the AI) in distribution, instead of to reinforce circuits that implement corrigibility more-or-less directly.
2. When the AI gains new options unlocked by new advanced capabilities, that proxy measure comes apart from corrigibility (as evaluated by the AI), in the limit of capabilities, so that the proxy measure is almost uncorrelated with corrigibility.

...then the resulting system will not end up corrigible.

(Is this the argument that you would give, or is there another reason why you expect that 'training an agent on a constitution that says to care about $x$' does not, at arbitrary capability levels, produce an agent that cares about $x$?)

But, at the moment, I'm skeptical of the above line of argument for several reasons.

1. I'm skeptical of the first premise, that the best way that SGD can find to produce corrigible outputs (as evaluated by the AI) is to reinforce a proxy measure.
   - I understand that natural selection, when shaping humans for inclusive genetic fitness, instilled in them a bunch of proxy-drives. But I think this analogy is misleading in several ways.
   - Most relevantly, there's a genetic bottleneck, so evolution could only shape human behavior by selecting over genomes, and genomes don't encode that much knowledge about the world. If humans were born into the world with detailed world models that included the concept of inclusive genetic fitness baked in, evolution would absolutely have shaped humans to be inclusive fitness maximizers. AIs are "born into the world" with expansive world models that already include concepts like corrigibility (indeed, if they didn't, Constitutional AI wouldn't work at all). So it would be surprising if SGD opted to reinforce proxy measures instead of relying on the concepts directly.
2. We would run the constitutional AI reinforcement process continuously, in parallel with the capability improvements from the RL training.
   - As the AI's capabilities increase, it will gain new options. If the AI is steering based on proxy measures, some of those options will involve the proxy coming apart from the target of the proxy. But when that starts to happen, the constitutional AI loop will exert an optimization pressure on the AI's internals to hit the target, not just the proxies.

Is this the main argument? What are other reasons to think that 'training an agent on a constitution that says to care about $x$' does not, at arbitrary capability levels, produce an agent that cares about $x$?

lblack on Lucius Bushnaq's Shortform

Nope. Try it out. If you attempt to split the activation vector into 1050 vectors for animals + attributes, you can't get the dictionary activations to equal the feature activations, $c_{i}^{'}(x)$.

kairos_ on Mo Putera's Shortform

I believe the Scramblers from Blindsight weren’t self-aware, which means they couldn’t think about their own interactions with the world.

As I recall, the crew was giving one of the Scramblers a series of cognitive tests. It aced all the tests that had to do with numbers and spatial reasoning, but failed a test that required the testee to be self-aware.

thane-ruthenis on johnswentworth's Shortform

Oh, if you're in the business of compiling a comprehensive taxonomy of ways the current AI thing may be fake, you should also add:

- Vibe coders and "10x'd engineers", who (on this model) would be falling into one of the failure modes outlined here [LW · GW]: producing applications/features that didn't need to exist, creating pointless code bloat (which helpfully shows up in productivity metrics like "volume of code produced" or "number of commits"), or "automatically generating" entire codebases in a way that feels magical, then spending so much time bugfixing them it eats up ~all perceived productivity gains.
- e/acc and other Twitter AI fans, who act like they're bleeding-edge transhumanist visionaries/analysts/business gurus/startup founders, but who are just shitposters/attention-seekers who will wander off and never look back the moment the hype dies down.