LessWrong 2.0 Reader


Hierarchical Agency: A Missing Piece in AI Alignment
Jan_Kulveit · 2024-11-27T05:49:04.241Z · comments (21)

Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)

The News is Never Neglected
lsusr · 2025-02-11T14:59:48.323Z · comments (18)

OthelloGPT learned a bag of heuristics
jylin04 · 2024-07-02T09:12:56.377Z · comments (10)

[link] Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas
jake_mendel · 2025-02-06T18:58:53.076Z · comments (0)

Introduction to French AI Policy
Lucie Philippon (lucie-philippon) · 2024-07-04T03:39:45.273Z · comments (12)

[link] Explaining British Naval Dominance During the Age of Sail
Arjun Panickssery (arjun-panickssery) · 2025-03-28T05:47:28.561Z · comments (5)

[link] Most smart and skilled people are outside of the EA/rationalist community: an analysis
titotal (lombertini) · 2024-07-12T12:13:56.215Z · comments (39)

2024 Unofficial LessWrong Survey Results
Screwtape · 2025-03-14T22:29:00.045Z · comments (28)

[link] A primer on the current state of longevity research
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-22T17:14:57.990Z · comments (6)

The Leopold Model: Analysis and Reactions
Zvi · 2024-06-14T15:10:03.480Z · comments (19)

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
L Rudolf L (LRudL) · 2024-07-08T22:24:38.441Z · comments (37)

My AGI safety research—2024 review, ’25 plans
Steven Byrnes (steve2152) · 2024-12-31T21:05:19.037Z · comments (4)

Among Us: A Sandbox for Agentic Deception
7vik (satvik-golechha) · 2025-04-05T06:24:49.000Z · comments (5)

Thread for Sense-Making on Recent Murders and How to Sanely Respond
Ben Pace (Benito) · 2025-01-31T03:45:48.201Z · comments (146)

What happens if you present 500 people with an argument that AI is risky?
KatjaGrace · 2024-09-04T16:40:03.562Z · comments (8)

[link] Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)
lewis smith (lsgos) · 2025-03-26T19:07:48.710Z · comments (15)

New Cause Area Proposal
CallumMcDougall (TheMcDouglas) · 2025-04-01T07:12:34.360Z · comments (4)

Clarifying METR's Auditing Role
Beth Barnes (beth-barnes) · 2024-05-30T18:41:56.029Z · comments (1)

Two hemispheres - I do not think it means what you think it means
Viliam · 2025-02-09T15:33:53.391Z · comments (21)

Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (16)

You can just wear a suit
lsusr · 2025-02-26T14:57:57.260Z · comments (48)

[link] Aristocracy and Hostage Capital
Arjun Panickssery (arjun-panickssery) · 2025-01-08T19:38:47.104Z · comments (7)

[link] Perplexity wins my AI race
Elizabeth (pktechgirl) · 2024-08-24T19:20:10.859Z · comments (12)

Danger, AI Scientist, Danger
Zvi · 2024-08-15T22:40:06.715Z · comments (9)

[link] Announcing turntrout.com, my new digital home
TurnTrout · 2024-11-17T17:42:08.164Z · comments (33)

[link] Attribution-based parameter decomposition
Lucius Bushnaq (Lblack) · 2025-01-25T13:12:11.031Z · comments (21)

The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (12)

[link] Poker is a bad game for teaching epistemics. Figgie is a better one.
rossry · 2024-07-08T06:05:20.459Z · comments (47)

My supervillain origin story
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-27T12:20:46.101Z · comments (1)

Demystifying "Alignment" through a Comic
milanrosko · 2024-06-09T08:24:22.454Z · comments (19)

Scaling and evaluating sparse autoencoders
leogao · 2024-06-06T22:50:39.440Z · comments (6)

AI 2027: Responses
Zvi · 2025-04-08T12:50:02.197Z · comments (3)

Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith (joekc) · 2024-12-18T18:22:34.059Z · comments (7)

Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (15)

You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (173)

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq (Lblack) · 2024-05-20T17:53:25.985Z · comments (4)

In favour of exploring nagging doubts about x-risk
owencb · 2024-06-25T23:52:01.322Z · comments (2)

Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (2)

My model of what is going on with LLMs
Cole Wyeth (Amyr) · 2025-02-13T03:43:29.447Z · comments (49)

A breakdown of AI capability levels focused on AI R&D labor acceleration
ryan_greenblatt · 2024-12-22T20:56:00.298Z · comments (5)

I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (23)

[link] Steering Gemini with BiDPO
TurnTrout · 2025-01-31T02:37:55.839Z · comments (5)

[link] Transformer Circuit Faithfulness Metrics Are Not Robust
Joseph Miller (Josephm) · 2024-07-12T03:47:30.077Z · comments (5)

[link] The Minority Coalition
Richard_Ngo (ricraz) · 2024-06-24T20:01:27.436Z · comments (9)

Judgements: Merging Prediction & Evidence
abramdemski · 2025-02-23T19:35:51.488Z · comments (5)

Response to nostalgebraist: proudly waving my moral-antirealist battle flag
Steven Byrnes (steve2152) · 2024-05-29T16:48:29.408Z · comments (29)

A shortcoming of concrete demonstrations as AGI risk advocacy
Steven Byrnes (steve2152) · 2024-12-11T16:48:41.602Z · comments (27)

Live Theory Part 0: Taking Intelligence Seriously
Sahil · 2024-06-26T21:37:10.479Z · comments (3)

On Dwarkesh’s Podcast with Leopold Aschenbrenner
Zvi · 2024-06-10T12:40:03.348Z · comments (7)

Recent comments

cole-wyeth on Cole Wyeth's Shortform

I thought the statement was pretty clearly not about the average lesswronger.

But in terms of the “call to action” - 20 was pretty conservative, so I think it’s still in that range, and doesn’t change the conclusions one should draw much.

purplehermann on MichaelDickens's Shortform

When you say ~zero value, do you mean hyperbolically discounted or something more extreme?

purplehermann on Purplehermann's Shortform

I don't remember who said it, but building AI isn't just about power dynamics or a bit of efficiency.

It's about whether humanity should keep doing things.

Civilization has (or feels like it has?) stagnated and degraded over the last few decades (the main technological upgrade being the cause of social degradation).

We haven't solved cancer, can't regrow limbs, people are unhealthy, commuting to work is unpleasant and work weeks are long. The list can go on.

Humans make tools to let them do better and more work. Humans even set up full automation of certain things. Now humans are looking to fully automate humans, perhaps because we don't believe in the human race. (I think EY and doomers generally are the same as the accelerationists: neither has faith in humanity.)

What could humans make that would restore faith: faith that we could compete with AGIs, faith that we can get out of stagnation without replacing humans, faith that we can make the world of humans a better one?

A tech advance, an organizational efficiency advance, quality of life, something else?

viliam on Cole Wyeth's Shortform

That's an interesting idea. However, people who read these comments probably already have power much greater than the baseline -- a developed country, high intelligence, education, enough money and free time to read websites...

Not sure how many of those 20 doublings still remain.

cousin_it on “The Era of Experience” has an unsolved technical alignment problem

Thanks for the link! It's indeed very relevant to my question.

I have another question, maybe a bit philosophical. Humans seem to reward-hack in some aspects of value, but not in others. For example, if you offered a mathematician a drug that would make them feel like they had proved the Riemann hypothesis, they'd probably refuse. But humans aren't magical: we are some combination of reinforcement learning, imitation learning and so on. So there's got to be some non-magical combination of these learning methods that would refuse reward hacking, at least in some cases. Do you have any thoughts on what it could be?

viliam on Double's Shortform

the agent goes double-or-nothing until losing everything. That means that the effects of the AI are mitigated.

The side effects of the agent failing might still kill us.

For example, the failure could be something like "build a huge device which with probability 20% enables faster-than-light travel (which would allow colonizing more galaxies), and with probability 80% causes false vacuum collapse or otherwise destroys the entire universe".

Or something on smaller scale, where the failure means blowing up the Earth, destroying all life, etc.

viliam on sarahconstantin's Shortform

the default way that people make their voices heard in politics these days is by stopping things or banning things or blocking things or slowing things down.

Maybe because it is easier to agree on a binary question ("should this be allowed or banned?") than on an open-ended one ("something should be done -- but what specifically?"). Give people a binary choice, and there is a chance that enough of them will agree. Give them an open-ended question, and most people will come up with their own proposals, unwilling to support anyone else's proposal (unless they are allowed to make large modifications, which other people will oppose).

(Here an individualistic culture probably makes it worse, because coming up with your own proposal is high-status.)

I guess most people have this experience, so they don't even try to make proposals to the public. Instead, if possible, they act alone, or with a small group of friends.

We can go around the neighborhood, show everybody the mockup and say, "Are you excited about us doing this to the park?" Then if we have a reasonable number of signatures on a petition, we get to build it.

I am often too pessimistic, but I would expect many people to say "no", for reasons including "no specific reason, it just sounds suspicious to me: why you? why now? is this perhaps some kind of scam?" or "I will only agree if you update your proposal to include <my pet peeve, completely unrelated to the project>", plus a few people saying "I don't give a fuck, so I will vote 'no' in principle (maybe try to bribe me if you want my 'yes')".

However, there are two situations near me where people somehow succeeded in building something for the community, so I should probably try to learn the details. In one case, it is a community garden: an area between two garages was surrounded by a fence, and now there are tables and chairs, and about once a month someone organizes some activities for kids there. In another case, in place of a former shop, a community center was set up. I think the latter is just the activity of one person who somehow got grant money to rent the place (maybe also made a non-profit for that purpose), so I would still kinda classify that as a pro-social grant-supported unilateral action. No idea how the former may have succeeded.

BTW, you seem impressed by George Church very much, because you linked his page 3 times. :D

lordwesquire on LordWesquire's Shortform

I agree. Intersex people are like the 1% of blorks. If one of them was mostly white with only a little black and wanted to identify as a white blork, that is a completely different situation than a 100% black blork using that as a justification for being able to identify as a white blork as well.

mateusz-baginski on ryan_greenblatt's Shortform

Eric Schwitzgebel has argued by disjunction/exhaustion for the necessity of craziness in the sense of "contrary to common sense and we are not epistemically compelled to believe it" at least in the context of philosophy of mind and cosmology.

https://faculty.ucr.edu/~eschwitz/SchwitzAbs/CrazyMind.htm

https://press.princeton.edu/books/hardcover/9780691215679/the-weirdness-of-the-world

lukedrago on The Intelligence Curse: an essay series

Following up with some resource curse literature that understands the problem as incentive misalignment:

On how state revenue sources shape institutional development and incentives, Karl (1997) writes,

"Thus the fate of oil-exporting countries must be understood in a context in which economies shape institutions and, in turn, are shaped by them. Specific modes of economic development, adapted in a concrete institutional setting, gradually transform political and social institutions in a manner that subsequently encourages or discourages productive outcomes. Because the causal arrow between economic development and institutional change constantly runs in both directions, the accumulated outcomes give form to divergent long-run national trajectories. Viewed in this vein, economic effects like the Dutch Disease become outcomes of particular institutional arrangements and not simply causes of economic decline. This deeper explanation is revealed in the relentless interaction between a mode of economic development and the political and social institutions it fosters.
[...]
How are frameworks for decision-making created and reproduced in late-developing countries? I argue that determining the "structuring principle" for these countries—that is, the appropriate starting point for identifying how ranges of choice are constructed—should begin with their leading sector. This means examining the export dependence that molds their economies, societies, and state institutional capacities, and that, in turn, is either reinforced or transformed by them. My effort to understand this set of interactions begins with differentiating the asset specificity, tax structure, and other features inherent in the exploitation of one particular commodity, petroleum. It terminates by examining the state, where the impact of particular economic models and
A central corollary of this argument is that countries dependent on the same export activity are likely to display significant similarities in the capacity of their states to guide development. In other words, countries dependent on mining should share certain properties of "stateness," especially their framework for decision-making and range of choice, even though their actual institutions are quite different in virtually all other respects. This should be true unless significant state building has occurred prior to the introduction of the export activity.
The specific mechanism for the creation of this institutional sameness lies in the origin of state revenues. It matters whether a state relies on taxes from extractive activities, agricultural production, foreign aid, remittances, or international borrowing because these different sources of revenues, whatever their relative economic merits or social import, have a powerful (and quite different) impact on the state's institutional development and its abilities to employ personnel, subsidize social and economic programs, create new organizations, and direct the activities of private interests. Simply stated, the revenues a state collects, how it collects them, and the uses to which it puts them define its nature. Thus it should not be surprising that states dependent on the same revenue source resemble each other in specific ways (and consequently so do the decisions made by their leaders)."

I'd note that Karl's argument has nearly 5,000 citations and is one of the most common (if not the dominant) explanations of the resource curse.

From Cooper (2002) Chapter 7:

"Oil can turn a gatekeeper state into a caricature of itself. Unlike agriculture, which involves vast numbers of people in the production and marketing of exports, oil requires little labor, and much of it from foreigners. It also entails relationships between the few global firms capable of extracting it and the state rulers who collect the rents. It defines a spigot economy: whoever controls access to the tap, collects the rent."

On the importance of taxing citizens to state development, Centeno (1997) notes:

"The key to the relationship between war and state making in Western Europe is what Finer (1975) calls the “extraction-coercion” cycle. [...] For the “extraction-coercion cycle” to begin, the relevant states must not have alternative sources of financing while the domestic economy must be capable of sustaining the new fiscal and bureaucratic growth. Conflict-induced extraction will only occur if easier options are not available. Even then, the relevant societies might not be able to produce enough surplus to make the effort productive. Thus, for example, the availability of Latin American silver and the willingness of bankers to risk massive sums freed the Spanish Hapsburgs from imposing greater fiscal control over their provinces as a means to pay for their wars. Conversely, the relative scarcity of such external supports drove the expansion of the early English state."

On how non-taxation revenue inhibited state development in Latin America, which therefore did not follow Tilly's pattern of "war making states", Centeno (1997) argues:

"As in the European cases, war produced immediate deficits, but with one prominent exception, the Latin American states did not respond to these with increased extractions, at least not in the form of domestic taxes. [...] If they could not borrow on international markets (as was the case from roughly 1830 to 1870), Latin American states could sell access to a commodity. Guano allowed Peru to become what Shane Hunt (1973) has called a “rentier state.” The availability of guano revenues retarded the development of the state by allowing it to exist without the remotest contact with the society on which it rested and without having to institute a more efficient administrative machine. Guano did allow the removal of the regressive contribucion (in 1855), but it also permitted the state to avoid modernizing its fiscal structure while borrowing large amounts of money. A contemporary British observer (Markham 1883, p. 37; my emphasis) noted that “a wise government would have treated this source of revenues as temporary and extraordinary. The Peruvians looked upon it as if it was permanent, abolishing other taxes, and recklessly increasing expenditure.” Much like the guano bonanza in the Peruvian case, the conquest of nitrate territories allowed the Chilean state to expand without having to “penetrate” its society and confront the rampant inequality (Loveman 1979, p. 169; Sater 1986, p. 227). By 1900, nitrate and iodine were accounting for 50% of Chilean revenues and 14% of GDP (Mamalakis 1977, pp. 19–21; Sater 1986, p. 275)."

Happy to cite some more of the literature if it's helpful.