LessWrong 2.0 Reader

[link] Update on the Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-11-04T19:22:06.540Z · comments (9)

Theoretical Alignment's Second Chance
lunatic_at_large · 2024-12-22T05:03:51.653Z · comments (0)

[link] Chess As The Model Game
criticalpoints · 2024-11-17T19:45:26.499Z · comments (0)

D/acc AI Security Salon
Allison Duettmann (allison-duettmann) · 2024-10-19T22:17:57.067Z · comments (0)

Announcing the CLR Foundations Course and CLR S-Risk Seminars
JamesFaville (elephantiskon) · 2024-11-19T01:18:10.085Z · comments (0)

[link] AI & wisdom 2: growth and amortised optimisation
L Rudolf L (LRudL) · 2024-10-28T21:07:39.449Z · comments (0)

Is AI Alignment Enough?
Aram Panasenco (panasenco) · 2025-01-10T18:57:48.409Z · comments (3)

Reality is Fractal-Shaped
silentbob · 2024-12-17T13:52:16.946Z · comments (1)

[link] GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning
ChengCheng (ccstan99) · 2024-11-01T00:10:50.718Z · comments (0)

In the Name of All That Needs Saving
pleiotroth · 2024-11-07T15:26:12.252Z · comments (2)

[question] Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?
SpectrumDT · 2024-11-04T15:20:14.822Z · answers+comments (49)

Monthly Roundup #25: December 2024
Zvi · 2024-12-23T14:20:04.682Z · comments (3)

[link] AI & Liability Ideathon
Kabir Kumar (kabir-kumar) · 2024-11-26T13:54:01.820Z · comments (2)

Word Spaghetti
Gordon Seidoh Worley (gworley) · 2024-10-23T05:39:20.105Z · comments (9)

[link] AI safety content you could create
Adam Jones (domdomegg) · 2025-01-06T15:35:56.167Z · comments (0)

[link] Can o1-preview find major mistakes amongst 59 NeurIPS '24 MLSB papers?
Abhishaike Mahajan (abhishaike-mahajan) · 2024-12-18T14:21:03.661Z · comments (0)

[link] Genesis
PeterMcCluskey · 2024-12-31T22:01:17.277Z · comments (0)

[link] AI & wisdom 3: AI effects on amortised optimisation
L Rudolf L (LRudL) · 2024-10-28T21:08:56.604Z · comments (0)

Economic Post-ASI Transition
[deleted] · 2025-01-01T22:37:31.722Z · comments (11)

Latent Adversarial Training (LAT) Improves the Representation of Refusal
alexandraabbas · 2025-01-06T10:24:53.419Z · comments (6)

2024 NYC Secular Solstice & Megameetup
Joe Rogero · 2024-11-12T17:46:18.674Z · comments (0)

Advisors for Smaller Major Donors?
jefftk (jkaufman) · 2024-11-06T14:30:06.187Z · comments (2)

[link] From the Archives: a story
Richard_Ngo (ricraz) · 2024-12-27T16:36:50.735Z · comments (1)

Dmitry's Koan
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-10T04:27:30.346Z · comments (0)

Beliefs and state of mind into 2025
RussellThor · 2025-01-10T22:07:01.060Z · comments (7)

[link] A primer on machine learning in cryo-electron microscopy (cryo-EM)
Abhishaike Mahajan (abhishaike-mahajan) · 2024-12-22T15:11:58.860Z · comments (0)

Everything you care about is in the map
Tahp · 2024-12-17T14:05:36.824Z · comments (27)

A Collection of Empirical Frames about Language Models
Daniel Tan (dtch1997) · 2025-01-02T02:49:05.965Z · comments (0)

[question] What is the most impressive game LLMs can play well?
Cole Wyeth (Amyr) · 2025-01-08T19:38:18.530Z · answers+comments (3)

Proposal to increase fertility: University parent clubs
Fluffnutt (Pear) · 2024-11-18T04:21:26.346Z · comments (3)

Heresies in the Shadow of the Sequences
Cole Wyeth (Amyr) · 2024-11-14T05:01:11.889Z · comments (12)

Incredibow
jefftk (jkaufman) · 2025-01-07T03:30:02.197Z · comments (3)

[link] Building AI safety benchmark environments on themes of universal human values
Roland Pihlakas (roland-pihlakas) · 2025-01-03T04:24:36.186Z · comments (3)

Using Dangerous AI, But Safely?
habryka (habryka4) · 2024-11-16T04:29:20.914Z · comments (2)

Most Minds are Irrational
Davidmanheim · 2024-12-10T09:36:33.144Z · comments (4)

OpenAI defected, but we can take honest actions
Remmelt (remmelt-ellen) · 2024-10-21T08:41:25.728Z · comments (16)

[link] some questionable space launch guns
bhauth · 2024-10-13T22:52:26.418Z · comments (0)

Computational functionalism probably can't explain phenomenal consciousness
EuanMcLean (euanmclean) · 2024-12-10T17:11:28.044Z · comments (34)

The Alignment Mapping Program: Forging Independent Thinkers in AI Safety - A Pilot Retrospective
Alvin Ånestrand (alvin-anestrand) · 2025-01-10T16:22:16.905Z · comments (0)

Should you have children? All LessWrong posts about the topic
Sherrinford · 2024-11-26T23:52:44.113Z · comments (0)

Rebuttals for ~all criticisms of AIXI
Cole Wyeth (Amyr) · 2025-01-07T17:41:10.557Z · comments (11)

[link] We are in a New Paradigm of AI Progress - OpenAI's o3 model makes huge gains on the toughest AI benchmarks in the world
garrison · 2024-12-22T21:45:52.026Z · comments (3)

EC2 Scripts
jefftk (jkaufman) · 2024-12-10T03:00:01.906Z · comments (1)

Doing Sport Reliably via Dancing
Johannes C. Mayer (johannes-c-mayer) · 2024-12-20T12:06:59.517Z · comments (0)

[question] Is there a CFAR handbook audio option?
FinalFormal2 · 2024-10-26T17:08:36.480Z · answers+comments (0)

Historical Net Worth
jefftk (jkaufman) · 2024-12-07T23:10:01.519Z · comments (0)

Coin Flip
XelaP (scroogemcduck1) · 2024-12-27T11:53:01.781Z · comments (0)

Evolutionary prompt optimization for SAE feature visualization
neverix · 2024-11-14T13:06:49.728Z · comments (0)

[link] o3 is not being released to the public. First they are only giving access to external safety testers. You can apply to get early access to do safety testing
KatWoods (ea247) · 2024-12-20T18:30:44.421Z · comments (0)

[question] What would be the IQ and other benchmarks of o3 that uses $1 million worth of compute resources to answer one question?
avturchin · 2024-12-26T11:08:23.545Z · answers+comments (2)

Recent comments

habryka4 on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

I reached out to them and they said pooling isn't possible.

charlie-steiner on Human-AI Complementarity: A Goal for Amplified Oversight

Thanks for the great reply :) I think we do disagree after all.

humans are definitionally the source of information about human values, even if it may be challenging to elicit this information from humans

Except about that - here we agree.

Now, what this human input looks like could (and probably should) go beyond introspection and preference judgments, which, as you point out, can be unreliable. It could instead involve expert judgment from humans with diverse cultural backgrounds, deliberation and/or negotiation, incentives to encourage deep, reflective thinking rather than snap judgments or falling back on heuristics. It could also involve AI assistance to help counter human biases, find common ground, and consider the logical consequences of communicated values.

This might be summarized as "If humans are inaccurate, let's strive to make them more accurate."

I think this, as a research priority or plan A, is doomed by a confluence of practical facts (humans aren't actually that consistent, even in what we'd consider a neutral setting) and philosophical problems (What if I think the snap judgments and heuristics are important parts of being human? And, how do you square a univariate notion of 'accuracy' with the sensitivity of human conclusions to semi-arbitrary changes to e.g. their reading lists, or the framings of arguments presented to them?).

Instead, I think our strategy should be "If humans are inconsistent and disagree, let's strive to learn a notion of human values that's robust to our inconsistency and disagreement."

We contend that even as AI gets really smart, humans ultimately need to be in the loop to determine whether or not a constitution is aligned and reasonable.

A committee of humans reviewing an AI's proposal is, ultimately, a physical system that can be predicted. If you have an AI that's good at predicting physical systems, then before it makes an important decision it can just predict this Committee(time, proposal) system and treat the predicted output as feedback on its proposal. If the prediction is accurate, then actual humans meeting in committee is unnecessary.

(And indeed, putting human control of the AI in the physical world actually exposes it to more manipulation than if the control is safely ensconced in the logical structure of the AI's decision-making.)

transhumanist_atom_understander on The Laws of Large Numbers

Well, usually I'm not inherently interested in a probability density function, I'm using it to calculate something else, like a moment or an entropy or something. But I guess I'll see what you use it for in future posts.

karl-krueger on johnswentworth's Shortform

I see a lot of discussion of AI doom stemming from research, business, and government / politics (including terrorism). Not a lot about AI doom from crime. Criminals don't stay in the box; the whole point of crime is to benefit yourself by breaking the rules and harming others. Intentional creation of intelligent cybercrime tools — ecosystems of AI malware, exploit discovery, spearphishing, ransomware, account takeovers, etc. — seems like a path to uncontrolled evolution of explicitly hostile AGI, where a maxim of "discover the rules; break them; profit" is designed-in.

sam-marks on Scaling Sparse Feature Circuit Finding to Gemma 9B

Good work! A few questions:

Where do the edges you draw come from? IIUC, this method should result in a collection of features but not say what the edges between them are.
IIUC, the binary masking technique here is the same as the subnetwork probing baseline from the ACDC paper, where it seemed to work about as well as ACDC (which in turn works a bit worse than attribution patching). Do you know why you're finding something different here? Some ideas:
1. The SP vs. ACDC comparison from the ACDC paper wasn't really apples-to-apples because ACDC pruned edges whereas SP pruned nodes (and kept all edges between non-pruned nodes IIUC). If Syed et al. had compared attribution patching on nodes vs. subnetwork probing, they would have found that subnetwork probing was better.
2. There's something special about SAE features which changes which subnetwork discovery technique works best.
  1. I'd be a bit interested in seeing your experiments repeated for finding subnetworks of neurons (instead of subnetworks of SAE features); does the comparison between attribution patching/integrated gradients and training a binary mask still hold in that case?

vladimir_nesov on The Golden Opportunity for American AI

The $25-40bn figure is an estimate for about 1 GW worth of GB200s. SemiAnalysis expects 1 GW training systems for Google in 2025 and something comparable for Microsoft/OpenAI. This is discussed by Dylan Patel publicly on Dwarkesh Podcast, claiming that there is a 300K B200s cluster and 500K-700K B200s worth of compute in total currently being constructed, possibly networked into a single training system. So if planned Microsoft capex was $60bn, that would've been surprising, too little for this project without cutting something else, but $80bn fits this story, that's my takeaway.

With Stargate, $100bn is still too much for the training systems of 2024-2025, so it's either not about what's being built in 2024-2025 at all, or a larger project that has current activities as part (which wouldn't fit building a big training system using a specific generation of hardware). Musk's 100K H100s Colossus tells me that building a training system in a year is feasible, even though it normally takes longer. The preliminary steps (land, power, permits, buildings) are much cheaper, but securing power and permits can require starting years in advance. So talking about a $100bn Stargate in 2024 is consistent with building it in late 2026, once there is a plot with 3-5 GW of power and datacenter permits; most of the expense will then be in 2026 (Nvidia Rubin probably).

rationalelf on Human takeover might be worse than AI takeover

I mean humans with strong AGIs under their control might function as if they don't need sleep, might become immortal, will probably build up superhuman protections from assassination, etc.

benquo on Guilt, Shame, and Depravity

Different example - I said "instead"

If you look back, you'll see I was specifically responding to the hypothetical scenario about public admission in that comment. For your points about private shame, you might want to check my other comment replying to you, where I addressed how internal shame and self-image maintenance connect to social dynamics.

I notice you're attributing positions to me that I haven't taken and expressing confusion about points I've already addressed in detail. It would be helpful if you could engage more carefully with what I've carefully written.

so if the musician openly admits and apologize for only being average they are ashamed because they are afraid of the reaction of the fan who clearly loved their performance (not their failure to abstain from what they believe is the cause of their average performance?)

You're introducing new elements that weren't in your original scenario. But more importantly: you described the show as "a hit" where "everyone loves them." Calling this performance "only average" isn't revealing accurate adverse information - it's a lie.

but if they don't mention it to anyone (therefore are committing neither a dominance nor submission gesture) they are also ashamed?

In my other reply to you, I explained how private shame often involves maintaining conflicting mental models - one that enables confident performance and another that tracks specific flaws for improvement. Even when no one would directly know or care about staying up late drinking, the performer may feel shame because they've invested in an identity as a "professional musician" or "disciplined performer" - an identity that others care about and grant certain privileges to. The shame comes from violating the requirements of this identity, which serves as a proxy for social approval and professional opportunities. This creates internal pressure toward shame even without a specific idea of someone else who would directly condemn the behavior or trait in question.

Are you telling me there is no conceivable circumstance where any human being feels shame for something which is totally alone, none at all?

What I'm suggesting is that shame inherently involves at least a tacit social component - some imagined perspective by which we are condemned. This is consistent with Smith's and Hume's moral sentiments theory, where moral judgments fundamentally involve taking up imagined perspectives of others. This doesn't mean the shame isn't genuinely felt or that any specific others would actually condemn us. But in my experience people can frequently unravel particular cases of such shame by honestly examining what specific others would actually think if they knew, which is some experimental validation for this view.

lsusr on Open Thread Winter 2024/2025

Is anyone else on this website making YouTube videos? Less Wrong is great, but if you want to broadcast to a larger audience, video seems like the place to be. I know Rational Animations makes videos. Is there anyone else? Are you, personally, making any?

gwern on The Golden Opportunity for American AI

Stargate was reported in 2024, and that reporting specified that the Stargate $100b phase hadn't started yet because MS was still building the previous phase, with "in excess of $115b" for all the phases, implying a large ramp up. And since Stargate was intended for OA, while MS of course has its own knitting to tend to, that implies much larger datacenter capex total. Given how vague the reporting is and how large the numbers are, but that the sooner the better, $80b in FY 2025 doesn't clearly tell me that there must be some mystery $25-40bn training system which is a big surprise. You don't build a Stargate overnight, and if it is to be finished and fully operational "as soon as 2028", you're going to need to be spending a lot of money 3 years beforehand.