LessWrong 2.0 Reader

The Garden of Eden
Alexander Turok · 2024-07-22T16:07:42.509Z · comments (2)

Deception and Jailbreak Sequence: 1. Iterative Refinement Stages of Deception in LLMs
Winnie Yang (winnie-yang) · 2024-08-22T07:32:07.600Z · comments (1)

aisafety.info, the Table of Content
Charbel-Raphaël (charbel-raphael-segerie) · 2023-12-31T13:57:15.916Z · comments (1)

[question] Are there high-quality surveys available detailing the rates of polyamory among Americans age 18-45 in metropolitan areas in the United States?
Evan_Gaensbauer · 2024-01-18T23:50:52.053Z · answers+comments (0)

Announcing Convergence Analysis: An Institute for AI Scenario & Governance Research
David_Kristoffersson · 2024-03-07T21:37:00.526Z · comments (1)

Case Studies in Reverse-Engineering Sparse Autoencoder Features by Using MLP Linearization
Jacob Dunefsky (jacob-dunefsky) · 2024-01-14T02:06:00.290Z · comments (0)

From the outside, American schooling is weird
Jacob G-W (g-w1) · 2024-03-28T22:45:30.485Z · comments (4)

Less Anti-Dakka
Mateusz Bagiński (mateusz-baginski) · 2024-05-31T09:07:10.450Z · comments (5)

[link] The unreasonable effectiveness of plasmid sequencing as a service
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-08T02:02:55.352Z · comments (2)

[link] AI & wisdom 1: wisdom, amortised optimisation, and AI
L Rudolf L (LRudL) · 2024-10-28T21:02:51.215Z · comments (0)

Rashomon - A newsbetting site
ideasthete · 2024-10-15T18:15:02.476Z · comments (8)

[link] A Defense of Peer Review
Niko_McCarty (niko-2) · 2024-10-22T16:16:49.982Z · comments (1)

Boring & straightforward trauma explanation
lemonhope (lcmgcd) · 2024-11-08T09:45:19.486Z · comments (7)

Launching Adjacent News
Lucas Kohorst (lucas-kohorst) · 2024-10-16T17:58:10.289Z · comments (0)

Evolution's selection target depends on your weighting
tailcalled · 2024-11-19T18:24:53.117Z · comments (22)

Complete Feedback
abramdemski · 2024-11-01T16:58:50.183Z · comments (7)

Apply to the Cooperative AI PhD Fellowship by October 14th!
Lewis Hammond (lewis-hammond-1) · 2024-10-05T12:41:24.093Z · comments (0)

[link] Managing Emotional Potential Energy
adamShimi · 2024-07-10T18:20:45.640Z · comments (4)

I Want XMP But I Know Why I Can't Have It
jefftk (jkaufman) · 2024-01-19T15:30:07.492Z · comments (0)

[link] Libs vs Frameworks, Middle-Level Regularities vs Theories
adamShimi · 2024-07-04T19:01:59.440Z · comments (0)

Disentangling Competence and Intelligence
Robert Kralisch (nonmali-1) · 2024-04-29T00:12:50.779Z · comments (7)

[question] Money Pump Arguments assume Memoryless Agents. Isn't this Unrealistic?
Dalcy (Darcy) · 2024-08-16T04:16:23.159Z · answers+comments (6)

[LDSL#2] Latent variable models, network models, and linear diffusion of sparse lognormals
tailcalled · 2024-08-09T19:57:56.122Z · comments (2)

[link] 11 diceware words is enough
DanielFilan · 2024-02-15T00:13:43.420Z · comments (6)

Blessed information, garbage information, cursed information
tailcalled · 2024-04-18T16:56:17.370Z · comments (8)

Trying to align humans with inclusive genetic fitness
peterbarnett · 2024-01-11T00:13:29.487Z · comments (5)

Making the "stance" explicit
NicholasKees (nick_kees) · 2024-02-16T23:57:11.265Z · comments (3)

[link] Foundations - Why Britain has stagnated [crosspost]
Nathan Young · 2024-09-23T10:43:20.411Z · comments (1)

AI #77: A Few Upgrades
Zvi · 2024-08-20T00:20:09.717Z · comments (3)

Bent or Blunt Hoods?
jefftk (jkaufman) · 2024-01-09T20:10:11.545Z · comments (0)

Tend to your clarity, not your confusion
Severin T. Seehrich (sts) · 2024-03-11T15:09:24.099Z · comments (1)

[link] [Talk transcript] What “structure” is and why it matters
Alex_Altair · 2024-07-25T15:49:00.844Z · comments (0)

Inducing human-like biases in moral reasoning LMs
Artyom Karpov (artkpv) · 2024-02-20T16:28:11.424Z · comments (3)

Luna Lovegood and the Chamber of Secrets, Part 1
Kongo Landwalker (kongo-landwalker) · 2024-05-26T22:17:17.137Z · comments (0)

[link] How are voluntary commitments on vulnerability reporting going?
Adam Jones (domdomegg) · 2024-02-22T08:43:56.996Z · comments (1)

[link] The natural boundaries between people
Chipmonk · 2024-02-23T01:09:28.592Z · comments (2)

[link] Masculinity—A Case For Courage
James Stephen Brown (james-brown) · 2024-06-04T00:04:48.411Z · comments (0)

Text Posts from the Kids Group: 2019
jefftk (jkaufman) · 2024-06-23T13:20:01.495Z · comments (0)

GPT-3.5 judges can supervise GPT-4o debaters in capability asymmetric debates
Charlie George (charlie-george) · 2024-08-27T20:44:08.683Z · comments (7)

[link] The Offense-Defense Balance of Gene Drives
Maxwell Tabarrok (maxwell-tabarrok) · 2024-09-27T16:47:25.976Z · comments (1)

Is it justifiable for non-experts to have strong opinions about Gaza?
Yair Halberstadt (yair-halberstadt) · 2024-01-08T17:31:21.934Z · comments (12)

AXRP Episode 34 - AI Evaluations with Beth Barnes
DanielFilan · 2024-07-28T03:30:07.192Z · comments (0)

[link] [EA xpost] The Rationale-Shaped Hole At The Heart Of Forecasting
dschwarz · 2024-04-02T17:40:44.278Z · comments (2)

[link] Increasing IQ by 10 Points is Possible
George3d6 · 2024-03-19T20:48:41.277Z · comments (50)

Invest in ACX Grants projects!
Saul Munn (saul-munn) · 2024-03-06T20:27:04.616Z · comments (0)

Whirlwind Tour of Chain of Thought Literature Relevant to Automating Alignment Research.
sevdeawesome · 2024-07-01T05:50:49.498Z · comments (0)

Interpretability: Integrated Gradients is a decent attribution method
Lucius Bushnaq (Lblack) · 2024-05-20T17:55:22.893Z · comments (7)

3a. Towards Formal Corrigibility
Max Harms (max-harms) · 2024-06-09T16:53:45.386Z · comments (2)

[LDSL#3] Information-orientation is in tension with magnitude-orientation
tailcalled · 2024-08-10T21:58:27.659Z · comments (2)

Monthly Roundup #21: August 2024
Zvi · 2024-08-20T00:20:08.178Z · comments (6)

Recent comments

mondsemmel on Repeal the Jones Act of 1920

I don't know, I've been reading a lot of Slow Boring and Nate Silver, and to me this just really doesn't seem to remotely describe how the Trump coalition works. Beginning with the idea that there are powerful party elites whose opinion Trump has to care about, rather than the other way around.

Like, the fact that Trump moderated the entire party on abortion and entitlement cuts seems like pretty strong evidence against that idea, as well. Or, Trump's recent demand that the US Senate should confirm his appointees via recess appointments, similarly really does not strike me as Trump caring about what party elites think.

My model is more like, both Trump and party elites care about what their base thinks, and Trump can mobilize the base better (but not perfectly) than the party elites can, so Trump has a stronger position in that power dynamic. And isn't that how he won the 2016 primary in the first place? He ran as a populist, so of course party elites did not want him to win, since the whole point of populist candidates is that they're less beholden to elites. But he won, so now those elites mostly have to acquiesce.

All that said, to get back to the Jones Act thing: if Trump somehow wanted it repealed, that would have to happen via an act of Congress, so at that point he would obviously need votes in the US House and Senate. But that could in principle (though not necessarily in practice) happen on a bipartisan vote, too.

lemonhope on lemonhope's Shortform

Someone agreed to give some starter funding for an AI safety research funding thing with fully anonymous donors & recipients but only if someone reputable would watch the wallet and make sure I'm not stealing. So you would basically check that each recipient is a real person and not me. Any volunteers?

donatas-luciunas on Alignment is not intelligent

A little thought experiment.

Imagine there is an agent that has a terminal goal to produce cups. The agent knows that its terminal goal will change on New Year's Eve to produce paperclips. The agent has only one action available to it: start the paperclip factory. The factory starts producing paperclips 6 hours after it is started.

When will the agent start the paperclip factory? 2024-12-31 18:00? 2025-01-01 00:00? Now? Some other time?

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

QwQ-32B-Preview was released open-weights, seems comparable to o1-preview. Unless they're gaming the benchmarks, I find it both pretty impressive and quite shocking that a 32B model can achieve this level of performance. Seems like great news vs. opaque (e.g. in one-forward-pass) reasoning. Less good with respect to proliferation (there don't seem to be any [deep] algorithmic secrets), misuse and short timelines.

seth-herd on Hierarchical Agency: A Missing Piece in AI Alignment

Are you working on this because you expect our first AGIs to be such hierarchical systems of subagents?
Or because you expect systems in which AGIs supervise subagents?
In either case, isn't the key question still whether the agent(s) at the top of the hierarchy are aligned?
In other areas of complex systems (economics, politics and nations, and notably psychology), mathematical formulations address sub-parts of the systems, but typically are not relied on for an overall analysis. Instead, understanding complex systems requires integrating a number of tools for understanding different parts, levels, and aspects of the system.
I worry that the cultural foundations of AI alignment bias the people most serious about it to focus excessively on mathematical/formal approaches.

sunwillrise on Repeal the Jones Act of 1920

But rather that if he doesn't do it, it will be because he doesn't want to, not because his constituents don't.

I generally prefer not to dive into the details of partisan politics on LW, but my reading of the comment you are responding to makes me believe that, by "Republicans under his watch", ChristianKl is referring to Republican politicians/executive appointees and not to Republican voters.

I am not saying I agree with this perspective, just that it seems to make a bit more sense to me in context. The idea would be that Trump has been able to use "leadership" to remake the Republican party in his image and get the party elites to support him only because he has mostly governed as a standard conservative Republican on economic issues (tax cuts for rich people & corporations, attempts to repeal the ACA, deregulation, etc.); the symbiotic relationship they enjoy would therefore supposedly have as a prerequisite the idea that Trump would not try to enforce idiosyncratic views on other Republicans too much...

seth-herd on Hierarchical Agency: A Missing Piece in AI Alignment

I strongly support you using this format if it helps you share your thinking. It sounds like we wouldn't be seeing this any time soon without the interview format. It's interesting and I might try it. And I encourage others to do so if it helps them share sooner or more efficiently.

Along those lines, I strongly encourage any sort of AI-assisted writing as long as the central ideas are human-generated or at least thoroughly thought-through and endorsed by the human posting them.

This post and every post longer than two paragraphs would really benefit from some sort of summary or TLDR so people can prioritize properly.

Questions/thoughts on the comment posted separately.

daystareld on You are not too "irrational" to know your preferences.

Makes sense! I probably won't have time to dedicate to doing this properly over the next few days, but maybe after that.

steve2152 on Counting AGIs

Yeah it’s fine to assume that there might be some period of time that (1) the AGIs don’t escape control, (2) the code doesn’t leak or get stolen, (3) nobody else reinvents the same thing, (4) Company A doesn’t have infinite capital (yet) to spend on renting cloud compute (or the contracts haven’t yet been signed or whatever). And it’s fine to be curious about how many AGIs would Company A have available during this period of time.
We think that period might be substantial, for reasons discussed in Section II.

I don’t think Section II is related to that. Again, the question I’m asking is: how long is the period where an already-existing AGI model type / training approach is only running on the compute already owned by the company that made that AGI, rather than on most of the world’s then-existing compute? If I compare that question to the considerations that you bring up in Section II, they seem almost entirely irrelevant, right? I’ll go through them:

Plateau: There may be unexpected development plateaus that come into effect at around human-level intelligence. These plateaus could be architecture-specific (scaling laws break down; getting past AGI requires something outside the deep learning paradigm) or fundamental to the nature of machine intelligence.

That doesn’t prevent any of those four things I mentioned: it doesn’t prevent (1) the AGIs escaping control and self-reproducing, nor (2) the code / weights leaking or getting stolen, nor (3) other companies reinventing the same thing, nor (4) the AGI company (or companies) having an ability to transform compute into profits at a wildly higher exchange rate than any other compute customer, and thus making unprecedented amounts of money off their existing models, and thus buying more and more compute to run more and more copies of their AGI (e.g. see the “Everything, Inc.” scenario of §3.2.4 here).

Pause: Government intervention could pause frontier AI development. Such a pause could be international. It is plausible that achieving or nearly achieving an AGI system would constitute exactly the sort of catalyzing event that would inspire governments to sharply and suddenly restrict frontier AI development.

That definitely doesn’t prevent (1) or (2), and it probably doesn’t prevent (3) or (4) either depending on implementation details.

Collapse: Advances in AI are dependent on the semiconductor industry, which is composed of several fragile supply chains. A war between China and Taiwan is considered reasonably possible by experts and forecasters. Such an event would dramatically disrupt the semiconductor industry (not to mention the world economy). If this happens around the time that AGI is first developed, AI capabilities could be artificially suspended at human-level for years while computer chip supply chains and AI firms recover.

That doesn’t prevent any of (1,2,3,4). Running an already-existing AGI model on the world’s already-existing stock of chips is unrelated to how many new chips are being produced. And war is not exactly a time when governments tend to choose caution and safety over experimenting with powerful new technologies at scale. Likewise, war is a time when rival countries are especially eager to steal each other’s military-relevant IP.

Abstention: Many frontier AI firms appear to take the risks of advanced AI seriously, and have risk management frameworks in place (see those of Google DeepMind, OpenAI, and Anthropic). Some contain what Holden Karnofsky calls if-then commitments: “If an AI model has capability X, risk mitigations Y must be in place. And, if needed, we will delay AI deployment and/or development to ensure the mitigations can be present in time.” Commitments to pause further development may kick in at human-level capabilities. AGI firms might avoid recursive self-improvement to avoid existential or catastrophic risks.

That could be relevant to (1,2,4) with luck. As for (3), it might buy a few months, before Meta and the various other firms and projects that are extremely dismissive of the risks of advanced AI catch up to the front-runners.

Windup: There are hard-to-reduce windup times in the production process of frontier AI models. For example, a training run for future systems may run into the hundreds of billions of dollars, consuming vast amounts of compute and taking months of processing. Other bottlenecks, like the time it takes to run ML experiments, might extend this windup period.

That doesn’t prevent any of (1,2,3,4). Again, we’re assuming the AGI already exists, and discussing how many servers will be running copies of it, and how soon. The question of training next-generation even-more-powerful AGIs is irrelevant to that question. Right?

sharmake-farah on The Queen’s Dilemma: A Paradox of Control

Sometimes, the answer is actually yes.