LessWrong 2.0 Reader

[link] Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses
TurnTrout · 2025-01-16T02:14:35.098Z · comments (3)

[link] Recommendations for Technical AI Safety Research Directions
Sam Marks (samuel-marks) · 2025-01-10T19:34:04.920Z · comments (1)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-12-03T21:57:23.597Z · comments (2)

[link] Zen and The Art of Semiconductor Manufacturing
Recurrented (rachel-farley) · 2024-12-09T17:19:35.236Z · comments (2)

[link] RL, but don't do anything I wouldn't do
Gunnar_Zarncke · 2024-12-07T22:54:50.714Z · comments (5)

Perils of Generalizing from One's Social Group
localdeity · 2024-11-24T15:31:18.332Z · comments (1)

[link] Paper: Open Problems in Mechanistic Interpretability
Lee Sharkey (Lee_Sharkey) · 2025-01-29T10:25:54.727Z · comments (0)

An Illustrated Summary of "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-14T15:02:44.828Z · comments (2)

Tips and Code for Empirical Research Workflows
John Hughes (john-hughes) · 2025-01-20T22:31:51.498Z · comments (7)

Neuroscience of human social instincts: a sketch
Steven Byrnes (steve2152) · 2024-11-22T16:16:52.552Z · comments (0)

[link] Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
Jan_Kulveit · 2025-01-30T17:03:45.545Z · comments (8)

[link] "We know how to build AGI" - Sam Altman
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-06T02:05:05.134Z · comments (5)

Checking in on Scott's composition image bet with imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)

Some lessons from the OpenAI-FrontierMath debacle
7vik (satvik-golechha) · 2025-01-19T21:09:17.990Z · comments (9)

Cognitive Work and AI Safety: A Thermodynamic Perspective
Daniel Murfet (dmurfet) · 2024-12-08T21:42:17.023Z · comments (9)

A case for donating to AI risk reduction (including if you work in AI)
tlevin (trevor) · 2024-12-02T19:05:06.658Z · comments (2)

ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (4)

Read The Sequences As If They Were Written Today
Peter Berggren (peter-berggren) · 2025-01-02T02:51:36.537Z · comments (7)

Fake thinking and real thinking
Joe Carlsmith (joekc) · 2025-01-28T20:05:06.735Z · comments (3)

[link] Funding Case: AI Safety Camp 11
Remmelt (remmelt-ellen) · 2024-12-23T08:51:55.255Z · comments (4)

Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)

o3, Oh My
Zvi · 2024-12-30T14:10:05.144Z · comments (17)

Detect Goodhart and shut down
Jeremy Gillen (jeremy-gillen) · 2025-01-22T18:45:30.910Z · comments (17)

o1 Turns Pro
Zvi · 2024-12-10T17:00:08.036Z · comments (3)

[link] Testing for Scheming with Model Deletion
Guive (GAA) · 2025-01-07T01:54:13.550Z · comments (20)

[link] new chinese stealth aircraft
bhauth · 2025-01-01T00:19:10.644Z · comments (3)

Reading RFK Jr so that you don’t have to
braces · 2024-11-22T00:59:19.583Z · comments (1)

AI #95: o1 Joins the API
Zvi · 2024-12-19T15:10:05.196Z · comments (1)

AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (8)

Kessler's Second Syndrome
Jesse Hoogland (jhoogland) · 2025-01-26T07:04:17.852Z · comments (2)

Timaeus is hiring researchers & engineers
Jesse Hoogland (jhoogland) · 2025-01-17T19:13:14.739Z · comments (2)

[link] Ideas for benchmarking LLM creativity
gwern · 2024-12-16T05:18:55.631Z · comments (11)

On polytopes
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-25T13:56:35.681Z · comments (5)

Announcement: Learning Theory Online Course
Yegreg · 2025-01-20T19:55:57.598Z · comments (17)

Vegans need to eat just enough Meat - emperically evaluate the minimum ammount of meat that maximizes utility
Johannes C. Mayer (johannes-c-mayer) · 2024-12-22T22:08:31.971Z · comments (35)

AI Assistants Should Have a Direct Line to Their Developers
Jan_Kulveit · 2024-12-28T17:01:58.643Z · comments (6)

[link] a space habitat design
bhauth · 2024-11-25T17:28:48.481Z · comments (13)

[question] What Have Been Your Most Valuable Casual Conversations At Conferences?
johnswentworth · 2024-12-25T05:49:36.711Z · answers+comments (21)

A Novel Emergence of Meta-Awareness in LLM Fine-Tuning
rife (edgar-muniz) · 2025-01-15T22:59:46.321Z · comments (31)

AI Safety as a YC Startup
Lukas Petersson (lukas-petersson-1) · 2025-01-08T10:46:29.042Z · comments (9)

Predict 2025 AI capabilities (by Sunday)
Jonas V (Jonas Vollmer) · 2025-01-15T00:16:05.034Z · comments (3)

AI #99: Farewell to Biden
Zvi · 2025-01-16T14:20:05.768Z · comments (5)

Luck Based Medicine: No Good Very Bad Winter Cured My Hypothyroidism
Elizabeth (pktechgirl) · 2024-12-08T20:10:02.651Z · comments (3)

On DeepSeek’s r1
Zvi · 2025-01-22T19:50:17.168Z · comments (2)

Tail SP 500 Call Options
sapphire (deluks917) · 2025-01-23T05:21:51.221Z · comments (27)

The OODA Loop -- Observe, Orient, Decide, Act
Davis_Kingsley · 2025-01-01T08:00:27.979Z · comments (2)

Estimates of GPU or equivalent resources of large AI players for 2024/5
CharlesD · 2024-11-28T23:01:58.522Z · comments (7)

A Conflicted Linkspost
Screwtape · 2024-11-21T00:37:54.035Z · comments (0)

[link] Discursive Warfare and Faction Formation
Benquo · 2025-01-09T16:47:31.824Z · comments (3)

Correct my H5N1 research
Elizabeth (pktechgirl) · 2024-12-09T19:07:03.277Z · comments (25)

Recent comments

david-duvenaud on Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

Good point. The reason AI risk is distinct is simply that it removes the need for those bureaucracies and corporations to keep some humans happy and healthy enough to actually run them. That need doesn't exactly put limits on how much they can disempower humans, but it does tend to provide at least some bargaining power for the humans involved.

david-duvenaud on Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

Thanks for the detailed objection and the pointers. I agree there's a chance that solving alignment with designers' intentions might be sufficient. I think the objection is a good one that "if the AI was really aligned with one agent, it'd figure out a way to help them avoid multipolar traps".

My reply is that I'm worried that avoiding races-to-the-bottom will continue to be hard, especially since competition operates on so many levels. I think the main question is: what's the tax for coordinating to avoid a multipolar trap? If it's cheap we might be fine; if it's expensive then we might walk into a trap with eyes wide open.

As for human power grabs, maybe we should have included those in our descriptions. But the slower things change, the less there's a distinction between "selfishly grab power" and "focus on growth so you don't get outcompeted". E.g. Is starting a company or a political party a power grab?

As for reading the paper in detail, it's largely just making the case that a sustained period of technological unemployment, without breakthroughs in alignment and cooperation, would tend to make our civilization serve humans' interests more and more poorly over time in a way that'd be hard to resist. I think arguing that things are likely to move faster would be a good objection to the plausibility of this scenario. But we still think it's an important point that the misalignment of our civilization is possibly a second alignment problem that we'll have to solve.

raemon on Predation as Payment for Criticism

This seems mostly right, but it seems plausible to me that the predator/prey cycle was a necessary prerequisite to get us into a basin where intelligence-sexual-selection was a plausible outcome.

russellthor on What's Behind the SynBio Bust?

This does seem different, however: https://solarfoods.com/. They are competing with food, not fuel, and food can't be made synthetically (well, if at all). Also, widely distributed capability like this helps make humanity more resilient, e.g. against nuclear winter or extreme climate change, or in space habitats.

habryka4 on Ten people on the inside

(I meant the more expansive definition. Plausible that me and Zac talked past each other because of that)

nathan-helm-burger on Tetherware #1: The case for humanlike AI

You might like my related essay A Path to Human Autonomy

ricraz on Ten people on the inside

FWIW I think of "OpenAI leadership being untrustworthy" (a significant factor in me leaving) as different from "OpenAI having bad safety policies" (not a significant factor in me leaving). Not sure if it matters, I expect that Scott was using "safety policies" more expansively than I do. But just for the sake of clarity:

I am generally pretty sympathetic to the idea that it's really hard to know what safety policies to put in place right now. Many policies pushed by safety people (including me, in the past) have been mostly kayfabe (e.g. being valuable as costly signals, not on the object level). There are a few object-level safety policies that I really wish OpenAI would do right now (most clearly, implementing better security measures) but I didn't leave because of that (if I had, I would have tried harder to check before I left what security measures OpenAI did have, made specific objections internally about them before I left, etc).

This may just be a semantic disagreement, it seems very reasonable to define "don't make employees sign non-disparagements" as a safety policy. But in my mind at least stuff like that is more of a lab governance policy (or maybe a meta-level safety policy).

raemon on The Gentle Romance

Yeah, but I'm contrasting this with (IMO more likely) futures where everyone dies, and nothing that's remotely like a human copy goes on. Even if you conceptualize it as "these people died", I think there are much worse possibilities for what sort of entity continues into the future. (i.e. an AI with no human/social/creative/emotional values, that just tiles the universe with simple struggles)

I think the most likely outcome is nonsentient AI killing everyone with no morality without anyone having a choice in the matter. (i.e. I'm not contrasting this with "nobody ever builds AI at all" or "we build it but without doing any of the mental-process replacement/augmentation going on in this story", because those seem less likely than the "everyone dies" or "this story happens, but with even less agency and more blatantly dystopian outcomes.")

[of course, the reason I described this as "optimistic" instead of "less pessimistic than I hope for" is that I don't think the characters died, I think if you slowly augment yourself with AI tools, the pattern of you counts as "you" even as it starts to be instantiated in silicon, so I think this is just a pretty good outcome. I also think the world (implies) many people thinking about moral / personhood philosophy before taking the final plunge. I don't think there's anything even plausibly wrong with the first couple chunks, and I think the second half contains a lot of qualifiers (such as integrating his multiple memories into a central node) that make it pretty unobjectionable.

I realize you don't believe that, and it seems fine for you to see it as horror. It's been a while since I discussed the "does a copy of you count as you" question, and I might be up for discussing that if you want to argue about it, but it also seems fine to leave as-is]

ori-nagel on Can someone, anyone, make superintelligence a more concrete concept?

Ah yes, Rational Animations did a great video of that story. That did make superintelligence more graspable, but you know, I had watched it and forgotten about it. I think it showed how our human civilization is vulnerable to other intelligences (aliens), but it still didn't make the superintelligence concept one that's easy to grok.

eggsyntax on eggsyntax's Shortform

Well, we (humans) categorize our epistemic state largely in propositional terms, e.g. in beliefs and suppositions.

I'm not too confident of this. It seems to me that a lot of human cognition isn't particularly propositional, even if nearly all of it could in principle be translated into that language. For example, I think a lot of cognition is sensory awareness, or imagery, or internal dialogue. We could contort most of that into propositions and propositional attitudes (eg 'I am experiencing a sensation of pain in my big toe', 'I am imagining a picnic table'), but that doesn't particularly seem like the natural lens to view those through.

That said, I do agree that propositions and propositional attitudes would be a more useful language to interpret LLMs through than eg activation vectors of float values.