LessWrong 2.0 Reader

A necessary Membrane formalism feature
ThomasCederborg · 2024-09-10T21:33:09.508Z · comments (6)

The Bar for Contributing to AI Safety is Lower than You Think
Chris_Leong · 2024-08-16T15:20:19.055Z · comments (1)

[link] [Linkpost] 'The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery'
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-15T21:32:59.979Z · comments (1)

[link] Four Randomized Control Trials In Economics
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-08T15:59:23.250Z · comments (1)

[question] What should we do about COVID in 2024?
ChristianKl · 2024-08-04T10:57:24.140Z · answers+comments (2)

Musings on Text Data Wall (Oct 2024)
Vladimir_Nesov · 2024-10-05T19:00:21.286Z · comments (2)

[link] UK AISI: Early lessons from evaluating frontier AI systems
Zach Stein-Perlman · 2024-10-25T19:00:21.689Z · comments (0)

[link] Does natural selection favor AIs over humans?
cdkg · 2024-10-03T18:47:43.517Z · comments (1)

My decomposition of the alignment problem
Daniel C (harper-owen) · 2024-09-02T00:21:08.359Z · comments (22)

Simon DeDeo on Explore vs Exploit in Science
Elizabeth (pktechgirl) · 2024-09-10T03:40:08.311Z · comments (0)

What program structures enable efficient induction?
Daniel C (harper-owen) · 2024-09-05T10:12:14.058Z · comments (4)

[question] What are the best resources for building gears-level models of how governments actually work?
adamShimi · 2024-08-19T14:05:02.590Z · answers+comments (6)

[link] Miles Brundage: Finding Ways to Credibly Signal the Benignness of AI Development and Deployment is an Urgent Priority
Zach Stein-Perlman · 2024-10-28T17:00:18.660Z · comments (3)

Scaling Laws and Likely Limits to AI
Davidmanheim · 2024-08-18T17:19:46.597Z · comments (0)

Why I'm bearish on mechanistic interpretability: the shards are not in the network
tailcalled · 2024-09-13T17:09:25.407Z · comments (40)

D/acc AI Security Salon
Allison Duettmann (allison-duettmann) · 2024-10-19T22:17:57.067Z · comments (0)

Looking for Goal Representations in an RL Agent - Update Post
CatGoddess · 2024-08-28T16:42:19.367Z · comments (0)

[link] To Be Born in a Bag
Niko_McCarty (niko-2) · 2024-10-06T17:21:00.605Z · comments (1)

Tokenized SAEs: Infusing per-token biases.
tdooms · 2024-08-04T09:17:46.755Z · comments (20)

Why Reflective Stability is Important
Johannes C. Mayer (johannes-c-mayer) · 2024-09-05T15:28:19.913Z · comments (2)

Ten counter-arguments that AI is (not) an existential risk (for now)
Ariel Kwiatkowski (ariel-kwiatkowski) · 2024-08-13T22:35:15.341Z · comments (5)

Economics Roundup #4
Zvi · 2024-10-15T13:20:06.923Z · comments (4)

Announcing the PIBBSS Symposium '24!
DusanDNesic · 2024-09-03T11:19:47.568Z · comments (0)

Can Large Language Models effectively identify cybersecurity risks?
emile delcourt (emile-delcourt) · 2024-08-30T20:20:21.345Z · comments (0)

Word Spaghetti
Gordon Seidoh Worley (gworley) · 2024-10-23T05:39:20.105Z · comments (9)

[question] How great is the utility of "saving" endangered languages?
SpectrumDT · 2024-08-20T13:14:32.895Z · answers+comments (29)

[link] Should Sports Betting Be Banned?
Maxwell Tabarrok (maxwell-tabarrok) · 2024-09-21T14:13:35.404Z · comments (2)

Finding Deception in Language Models
Esben Kran (esben-kran) · 2024-08-20T09:42:13.060Z · comments (4)

Housing Roundup #10
Zvi · 2024-10-29T13:50:09.416Z · comments (2)

[link] Towards the Operationalization of Philosophy & Wisdom
Thane Ruthenis · 2024-10-28T19:45:07.571Z · comments (2)

Avoiding the Bog of Moral Hazard for AI
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-13T21:24:34.137Z · comments (12)

"Real AGI"
Seth Herd · 2024-09-13T14:13:24.124Z · comments (18)

Rabin's Paradox
Charlie Steiner · 2024-08-14T05:40:25.572Z · comments (40)

[link] Jonothan Gorard:The territory is isomorphic to an equivalence class of its maps
Daniel C (harper-owen) · 2024-09-07T10:04:47.840Z · comments (18)

OpenAI defected, but we can take honest actions
Remmelt (remmelt-ellen) · 2024-10-21T08:41:25.728Z · comments (16)

[link] Why Swiss watches and Taylor Swift are AGI-proof
Kevin Kohler (KevinKohler) · 2024-09-05T13:23:27.033Z · comments (11)

A short project on Mamba: grokking & interpretability
Alejandro Tlaie (alejandro-tlaie-boria) · 2024-10-18T16:59:45.314Z · comments (0)

[question] Is this voting system strategy proof?
Donald Hobson (donald-hobson) · 2024-09-06T20:44:46.691Z · answers+comments (9)

"Which Future Mind is Me?" Is a Question of Values
dadadarren · 2024-08-09T18:17:09.884Z · comments (12)

[link] Will we ever run out of new jobs?
Kevin Kohler (KevinKohler) · 2024-08-19T15:04:03.849Z · comments (7)

[link] Four Levels of Voting Methods
hive · 2024-09-26T18:15:00.565Z · comments (3)

[link] AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-14T23:23:26.296Z · comments (1)

Is Text Watermarking a lost cause?
egor.timatkov · 2024-10-01T16:20:51.113Z · comments (13)

My career exploration: Tools for building confidence
lynettebye · 2024-09-13T11:37:55.843Z · comments (0)

[link] some questionable space launch guns
bhauth · 2024-10-13T22:52:26.418Z · comments (0)

[link] Instruction Following without Instruction Tuning
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-24T13:49:09.078Z · comments (0)

Training a Sparse Autoencoder in < 30 minutes on 16GB of VRAM using an S3 cache
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-08-24T07:39:00.057Z · comments (0)

[question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-09-04T12:40:07.678Z · answers+comments (7)

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025)
Linda Linsefors · 2024-08-23T14:18:24.327Z · comments (2)

Review: “The Case Against Reality”
David Gross (David_Gross) · 2024-10-29T13:13:29.643Z · comments (7)

Recent comments

daniel-kokotajlo on avturchin's Shortform

Huh. If you pretend to throw the stone, does that mean you make a throwing motion with your arm, but just don't actually release the object you are holding? If so, how come they run away instead of e.g. cringing and expecting to get hit, and then not getting hit, and figuring that you missed and are now out of ammo?

Or does it mean you make menacing gestures as if to throw, but don't actually make the whole throwing motion?

avturchin on I turned decision theory problems into memes about trolleys

Can you make a Trolley meme for Death in Damascus and the Doomsday Argument?

Can you prove that any decision theory problem can be expressed as some Trolley problem?

nathan-helm-burger on Three Notions of "Power"

Thinking a bit more about this, I might group types of power into:

Power through relating: Social/economic/government/negotiating/threatening, reshaping the social world and the behavior of others

Power through understanding: having intellect and knowledge affordances, being able to solve clever puzzles in the world to achieve aims

Power through control: having physical affordances that allow for taking potent actions, reshaping the physical world

They all bleed together at the edges and are somewhat fungible in various ways, but I think it makes sense to talk of clusters despite their fuzzy edges.

johnswentworth on Three Notions of "Power"

Human psychology, mainly. "Dominance"-in-the-human-intuitive-sense was in the original post mainly because I think that's how most humans intuitively understand "power", despite (I claimed) not being particularly natural for more-powerful agents. So I'd expect humans to be confused insofar as they try to apply those dominance-in-the-human-intuitive-sense intuitions to more powerful agents.

And like, sure, one could use a notion of "dominance" which is general enough to encompass all forms of conflict, but at that point we can just talk about "conflict" and the like without the word "dominance"; using the word "dominance" for that is unnecessarily confusing, because most humans' intuitive notion of "dominance" is narrower.

johnswentworth on Three Notions of "Power"

Because there'd be an unexploitable-equilibrium condition where a government that isn't focused on dominance is weaker than a government more focused on dominance, it would generally be held by those who have the strongest focus on dominance.

This argument only works insofar as governments less focused on dominance are, in fact, weaker militarily, which seems basically-false in practice in the long run. For instance, autocratic regimes just can't compete industrially with a market economy like e.g. most Western states today, and that industrial difference turns into a comprehensive military advantage with relatively moderate time and investment. And when countries switch to full autocracy, there's sometimes a short-term military buildup but they tend to end up waaaay behind militarily a few years down the road IIUC.

nathan-helm-burger on Three Notions of "Power"

The post seems to me to be about notions of power, and the affordances of intelligent agents. I think this is a relevant kind of power to keep in mind.

tailcalled on Three Notions of "Power"

What phenomenon are you modelling where this distinction is relevant?

nathan-helm-burger on Three Notions of "Power"

I think we're using different concepts of 'dominance' here. I usually think of 'dominance' as a social relationship between a strong party and a submissive party, a hierarchy. A relationship between a ruler and the ruled, or an abuser and abused. I don't think that a human driving a bulldozer which destroys an anthill without the human even noticing that the anthill existed is the same sort of relationship. I think we need some word other than 'dominant' to describe the human wiping out the ants in an instant without sparing them a thought. It doesn't particularly seem like a conflict even. The human in a bulldozer didn't perceive themselves to be in a conflict; the ants weren't powerful enough to register as an opponent or obstacle at all.

fabien-roger on The case for unlearning that removes information from LLM weights

I am not sure that it is over-conservative. If you have an HP-shaped hole that can easily be transformed into HP data using fine-tuning, does it give you a high level of confidence that people misusing the model won't be able to extract the information from the HP-shaped hole, or that a misaligned model won't be able to notice the HP-shaped hole and use that to answer questions about HP when it really wants to?

I think that it depends on the specifics of how you built the HP-shaped hole (without scrambling the information). I don't have a good intuition for what a good technique like that could look like. A naive thing that comes to mind would be something like "replace all facts in HP by their opposite" (if you had a magic fact-editing tool), but I feel like in this situation it would be pretty easy for an attacker (human misuse or misaligned model) to notice "wow all HP knowledge has been replaced by anti-HP knowledge" and then extract all the HP information by just swapping the answers.

tailcalled on Three Notions of "Power"

Except for the child and the blacksmith, all of these seem like dominance conflicts to me. The blacksmith plausibly becomes a dominance conflict too once you consider how he ended up with the resources and what tasks he's likely to face. You contrast these with conflicts between human groups, but I'd compare to e.g. a conflict between a drunk middle-aged loner who is looking for a brawl vs two young policemen and a bar owner.