LessWrong 2.0 Reader


Some costs of superposition
Linda Linsefors · 2024-03-03T16:08:20.674Z · comments (11)

[link] Robin Hanson AI X-Risk Debate — Highlights and Analysis
Liron · 2024-07-12T21:31:02.222Z · comments (7)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

1. The CAST Strategy
Max Harms (max-harms) · 2024-06-07T22:29:13.005Z · comments (19)

Humanity isn't remotely longtermist, so arguments for AGI x-risk should focus on the near term
Seth Herd · 2024-08-12T18:10:56.543Z · comments (10)

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu (wearsshoes) · 2024-06-25T01:35:54.064Z · comments (9)

AI #68: Remarkably Reasonable Reactions
Zvi · 2024-06-13T16:30:02.969Z · comments (11)

Saving the world sucks
Defective Altruism (Elijah Bodden) · 2024-01-10T05:55:46.504Z · comments (29)

All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)

D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset]
abstractapplic · 2024-04-09T14:01:34.426Z · comments (6)

On the Proposed California SB 1047
Zvi · 2024-02-12T16:40:04.854Z · comments (18)

Thoughts on "The Offense-Defense Balance Rarely Changes"
Cullen (Cullen_OKeefe) · 2024-02-12T03:26:50.662Z · comments (4)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)

AI #75: Math is Easier
Zvi · 2024-08-01T13:40:05.539Z · comments (25)

Untrustworthy models: a frame for scheming evaluations
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (10)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

[link] The Leeroy Jenkins principle: How faulty AI could guarantee "warning shots"
titotal (lombertini) · 2024-01-14T15:03:21.087Z · comments (6)

In Defense of Parselmouths
Screwtape · 2023-11-15T23:02:19.344Z · comments (10)

The predictive power of dissipative adaptation
dr_s · 2023-12-17T14:01:31.568Z · comments (14)

[link] Metascience of the Vesuvius Challenge
Maxwell Tabarrok (maxwell-tabarrok) · 2024-03-30T12:02:38.978Z · comments (2)

AI Safety 101 : Capabilities - Human Level AI, What? How? and When?
markov (markovial) · 2024-03-07T17:29:53.260Z · comments (8)

[link] Contra Scott on Abolishing the FDA
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-15T14:00:17.247Z · comments (3)

AI doing philosophy = AI generating hands?
Wei Dai (Wei_Dai) · 2024-01-15T09:04:39.659Z · comments (22)

[link] Bayesians Commit the Gambler's Fallacy
Kevin Dorst · 2024-01-07T12:54:59.939Z · comments (28)

I'm open for projects (sort of)
cousin_it · 2024-04-18T18:05:01.395Z · comments (13)

[link] If Clarity Seems Like Death to Them
Zack_M_Davis · 2023-12-30T17:40:42.622Z · comments (191)

[link] Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities
porby · 2024-02-02T05:49:11.189Z · comments (1)

[link] Will releasing the weights of large language models grant widespread access to pandemic agents?
jefftk (jkaufman) · 2023-10-30T18:22:59.677Z · comments (25)

D&D.Sci(-fi): Colonizing the SuperHyperSphere
abstractapplic · 2024-01-12T23:36:54.248Z · comments (23)

[link] For Civilization and Against Niceness
Gabriel Alfour (gabriel-alfour-1) · 2023-11-20T10:56:20.352Z · comments (14)

AI #41: Bring in the Other Gemini
Zvi · 2023-12-07T15:10:05.552Z · comments (16)

So You Created a Sociopath - New Book Announcement!
Garrett Baker (D0TheMath) · 2024-04-01T18:02:18.010Z · comments (3)

Atlantis: Berkeley event venue available for rent
Jonas V (Jonas Vollmer) · 2023-11-22T01:47:12.026Z · comments (0)

Quick thoughts on the implications of multi-agent views of mind on AI takeover
Kaj_Sotala · 2023-12-11T06:34:06.395Z · comments (14)

On Tapping Out
Screwtape · 2023-11-17T03:23:55.880Z · comments (13)

[link] Towards Evaluating AI Systems for Moral Status Using Self-Reports
Ethan Perez (ethan-perez) · 2023-11-16T20:18:51.730Z · comments (3)

[link] AlphaGeometry: An Olympiad-level AI system for geometry
alyssavance · 2024-01-17T17:17:30.913Z · comments (9)

[link] Loneliness and suicide mitigation for students using GPT3-enabled chatbots (survey of Replika users in Nature)
Kaj_Sotala · 2024-01-23T14:05:40.986Z · comments (2)

[link] Paper: Tell, Don't Show- Declarative facts influence how LLMs generalize
Owain_Evans · 2023-12-19T19:14:26.423Z · comments (4)

[link] How people stopped dying from diarrhea so much (& other life-saving decisions)
Writer · 2024-03-16T16:00:47.830Z · comments (0)

[link] Book review: Deep Utopia
PeterMcCluskey · 2024-04-23T19:55:50.417Z · comments (14)

[link] Book review: Everything Is Predictable
PeterMcCluskey · 2024-05-27T03:33:53.857Z · comments (0)

Some open-source dictionaries and dictionary learning infrastructure
Sam Marks (samuel-marks) · 2023-12-05T06:05:21.903Z · comments (7)

Things Solenoid Narrates
Solenoid_Entity · 2024-04-12T23:57:16.169Z · comments (2)

AI #54: Clauding Along
Zvi · 2024-03-07T16:00:05.066Z · comments (11)

Recent comments

daniel-kokotajlo on avturchin's Shortform

Huh. If you pretend to throw the stone, does that mean you make a throwing motion with your arm, but just don't actually release the object you are holding? If so, how come they run away instead of e.g. cringing and expecting to get hit, and then not getting hit, and figuring that you missed and are now out of ammo?

Or does it mean you make menacing gestures as if to throw, but don't actually make the whole throwing motion?

avturchin on I turned decision theory problems into memes about trolleys

Can you make a Trolley meme for Death in Damascus and the Doomsday Argument?

Can you prove that any decision theory problem can be expressed as some Trolley problem?

nathan-helm-burger on Three Notions of "Power"

Thinking a bit more about this, I might group types of power into:

Power through relating: Social/economic/government/negotiating/threatening, reshaping the social world and the behavior of others

Power through understanding: having intellect and knowledge affordances, being able to solve clever puzzles in the world to achieve aims

Power through control: having physical affordances that allow for taking potent actions, reshaping the physical world

They all bleed together at the edges and are somewhat fungible in various ways, but I think it makes sense to talk of clusters despite their fuzzy edges.

johnswentworth on Three Notions of "Power"

Human psychology, mainly. "Dominance"-in-the-human-intuitive-sense was in the original post mainly because I think that's how most humans intuitively understand "power", despite (I claimed) not being particularly natural for more-powerful agents. So I'd expect humans to be confused insofar as they try to apply those dominance-in-the-human-intuitive-sense intuitions to more powerful agents.

And like, sure, one could use a notion of "dominance" which is general enough to encompass all forms of conflict, but at that point we can just talk about "conflict" and the like without the word "dominance"; using the word "dominance" for that is unnecessarily confusing, because most humans' intuitive notion of "dominance" is narrower.

johnswentworth on Three Notions of "Power"

Because there'd be an unexploitable-equilibrium condition where a government that isn't focused on dominance is weaker than a government more focused on government, it would generally be held by those who have the strongest focus on dominance.

This argument only works insofar as governments less focused on dominance are, in fact, weaker militarily, which seems basically-false in practice in the long run. For instance, autocratic regimes just can't compete industrially with a market economy like e.g. most Western states today, and that industrial difference turns into a comprehensive military advantage with relatively moderate time and investment. And when countries switch to full autocracy, there's sometimes a short-term military buildup but they tend to end up waaaay behind militarily a few years down the road IIUC.

nathan-helm-burger on Three Notions of "Power"

The post seems to me to be about notions of power, and the affordances of intelligent agents. I think this is a relevant kind of power to keep in mind.

tailcalled on Three Notions of "Power"

What phenomenon are you modelling where this distinction is relevant?

nathan-helm-burger on Three Notions of "Power"

I think we're using different concepts of 'dominance' here. I usually think of 'dominance' as a social relationship between a strong party and a submissive party, a hierarchy. A relationship between a ruler and the ruled, or an abuser and abused. I don't think that a human driving a bulldozer which destroys an anthill without the human even noticing that the anthill existed is the same sort of relationship. I think we need some word other than 'dominant' to describe the human wiping out the ants in an instant without sparing them a thought. It doesn't particularly seem like a conflict even. The human in a bulldozer didn't perceive themselves to be in a conflict; the ants weren't powerful enough to register as an opponent or obstacle at all.

fabien-roger on The case for unlearning that removes information from LLM weights

I am not sure that it is over-conservative. If you have an HP-shaped hole that can easily be transformed into HP data using fine-tuning, does it give you a high level of confidence that people misusing the model won't be able to extract the information from the HP-shaped hole, or that a misaligned model won't be able to notice the HP-shaped hole and use it to answer questions about HP when it really wants to?

I think that it depends on the specifics of how you built the HP-shaped hole (without scrambling the information). I don't have a good intuition for what a good technique like that could look like. A naive thing that comes to mind would be something like "replace all facts in HP by their opposite" (if you had a magic fact-editing tool), but I feel like in this situation it would be pretty easy for an attacker (human misuse or misaligned model) to notice "wow all HP knowledge has been replaced by anti-HP knowledge" and then extract all the HP information by just swapping the answers.

tailcalled on Three Notions of "Power"

Except for the child and the blacksmith, all of these seem like dominance conflicts to me. The blacksmith plausibly becomes a dominance conflict too once you consider how he ended up with the resources and what tasks he's likely to face. You contrast these with conflicts between human groups, but I'd compare to e.g. a conflict between a drunk middle-aged loner who is looking for a brawl vs two young policemen and a bar owner.