LessWrong 2.0 Reader

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (16)

AI Craftsmanship
abramdemski · 2024-11-11T22:17:01.112Z · comments (7)

Perils of Generalizing from One's Social Group
localdeity · 2024-11-24T15:31:18.332Z · comments (1)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (27)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

Training AI agents to solve hard problems could lead to Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-11-19T00:10:55.522Z · comments (12)

Seeking Collaborators
abramdemski · 2024-11-01T17:13:36.162Z · comments (14)

AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)

[link] The Alignment Trap: AI Safety as Path to Power
crispweed · 2024-10-29T15:21:26.545Z · comments (17)

U.S.-China Economic and Security Review Commission pushes Manhattan Project-style AI initiative
Phib · 2024-11-19T18:42:43.296Z · comments (7)

[link] The Evals Gap
Marius Hobbhahn (marius-hobbhahn) · 2024-11-11T16:42:46.287Z · comments (7)

Toward Safety Case Inspired Basic Research
Lucas Teixeira · 2024-10-31T23:06:32.854Z · comments (2)

Neuroscience of human social instincts: a sketch
Steven Byrnes (steve2152) · 2024-11-22T16:16:52.552Z · comments (0)

Win/continue/lose scenarios and execute/replace/audit protocols
Buck · 2024-11-15T15:47:24.868Z · comments (2)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (11)

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

Counting AGIs
cash (cshunter) · 2024-11-26T00:06:17.845Z · comments (10)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-11-18T00:44:57.133Z · comments (2)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (2)

A Conflicted Linkspost
Screwtape · 2024-11-21T00:37:54.035Z · comments (0)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

[link] a space habitat design
bhauth · 2024-11-25T17:28:48.481Z · comments (9)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)

Which evals resources would be good?
Marius Hobbhahn (marius-hobbhahn) · 2024-11-16T14:24:48.012Z · comments (4)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

Secular Solstice Round Up 2024
dspeyer · 2024-11-21T10:49:36.682Z · comments (10)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)

AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (8)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (9)

~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

[link] Dangerous capability tests should be harder
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:20:50.610Z · comments (3)

[link] The Choice Transition
owencb · 2024-11-18T12:30:56.198Z · comments (4)

[link] Literacy Rates Haven't Fallen By 20% Since the Department of Education Was Created
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-22T20:53:59.007Z · comments (0)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

Monthly Roundup #24: November 2024
Zvi · 2024-11-18T13:20:06.086Z · comments (14)

Reading RFK Jr so that you don’t have to
braces · 2024-11-22T00:59:19.583Z · comments (0)

AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)

Winners of the Essay competition on the Automation of Wisdom and Philosophy
AI Impacts (AI Imacts) · 2024-10-28T17:10:04.272Z · comments (3)

[link] Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake
TurnTrout · 2024-11-19T18:36:20.721Z · comments (5)

Recent comments

leogao on leogao's Shortform

the most valuable part of a social event is often not the part that is ostensibly the most important, but rather the gaps between the main parts.

at ML conferences, the headline keynotes and orals are usually the least useful part to go to; the random spontaneous hallway chats and dinners and afterparties are extremely valuable
when doing an activity with friends, the activity itself is often of secondary importance. talking on the way to the activity, or in the gaps between doing the activity, carries a lot of the value
at work, a lot of the best conversations happen not in scheduled 1:1s and group meetings, but rather in spontaneous hallway or dinner groups

yams on yams's Shortform

Cool! I think we're in agreement at a high level. Thanks for taking the extra time to make sure you were understood.

In more detail, though:

I think I disagree with 1 being all that likely; there are just other things I could see happening that would make a pause or stop politically popular (e.g. warning shots, An Inconvenient Truth AI Edition, etc.), likely not worth getting into here. I also think 'if we pause it will be for stupid reasons' is a very sad take.

I think I disagree with 2 being likely, as well; probably yes, a lot of the bottleneck on development is ~make-work that goes away when you get a drop-in replacement for remote workers, and also yes, AI coding is already an accelerant // effectively doing gradient descent on gradient descent (RLing the RL'd researcher to RL the RL...) is intelligence-explosion fuel. But I think there's a big gap between the capabilities you need for politically worrisome levels of unemployment, and the capabilities you need for an intelligence explosion, principally because >30 percent of human labor in developed nations could be automated with current tech if the economics align a bit (hiring 200+k/year ML engineers to replace your 30k/year call center employee is only just now starting to make sense economically). I think this has been true of current tech since ~GPT-4, and that we haven't seen a concomitant massive acceleration in capabilities on the frontier (things are continuing to move fast, and the proliferation is scary, but it's not an explosion).

I take "depending on how concentrated AI R&D is" to foreshadow that you'd reply to the above with something like: "This is about lab priorities; the labs with the most impressive models are the labs focusing the most on frontier model development, and they're unlikely to set their sights on comprehensive automation of shit jobs when they can instead double-down on frontier models and put some RL in the RL to RL the RL that's been RL'd by the..."

I think that's right about lab priorities. However, I expect the automation wave to mostly come from middle-men, consultancies, what have you, who take all of the leftover ML researchers not eaten up by the labs and go around automating things away individually (yes, maybe the frontier moves too fast for this to be right, because the labs just end up with a drop-in remote worker 'for free' as long as they keep advancing down the tech tree, but I don't quite think this is true, because human jobs are human-shaped, and buyers are going to want pretty rigorous role-specific guarantees from whoever's selling this service, even if they're basically unnecessary, and the one-size-fits-all solution is going to have fewer buyers than the thing marketed as 'bespoke').

In general, I don't like collapsing the various checkpoints between here and superintelligence; there are all these intermediate states, and their exact features matter a lot, and we really don't know what we're going to get. 'By the time we'll have x, we'll certainly have y' is not a form of prediction that anyone has a particularly good track record making.

martin-randall on Unnatural Categories Are Optimized for Deception

I find this hypothetical about neural fireplaces curious, because the ambiguity exists in real fireplaces; speculative fiction is not needed. Please excuse any inaccuracies in this brief history of fireplaces:

Wood-burning fireplaces
Gas-burning fireplaces
Central heating
Electric heaters
Decorative fireplaces (no heat)

The original fireplaces produced both heat and a decorative flame effect. With each new type of invention there was a question of what to do with our previous terms. We've ended up with "heaters" to refer to things that heat a room and "fireplace" to refer to things that have a decorative flame effect. Both of these things are slightly fuzzy natural categories in the sense of this post.

Except... maybe we should say that "decorative" is a privative adjective and so a "decorative fireplace" isn't really a fireplace? For the sake of the thought experiment, let's say that practical rural folk place a higher value on having a secondary heat source because it takes longer to restore electricity after a storm. Meanwhile snobby urbanites place a higher value on decorative flame effects because they value gaining status through conspicuous consumption.

I see that someone could say "well, it's not a real fireplace, is it?" in order to signal that they share the values of practical rural folks. If they're actually a snobby urbanite politician and they don't actually have those practical rural values then they are being deceptive. That would be a deception about values, not about heat sources.

If a practical rural person says "well, it's not a real fireplace, is it?", then that could indeed be a true signal of their values. But my guess is the more restrictive meaning of fireplace came first. The causal diagram is something like:

Practical Rural Values -> Categorize functional fireplaces separately to decorative fireplaces -> Use the short word "fireplace" for functional fireplaces (for communication and signaling)

Not:

Practical Rural Values -> Use the short word "fireplace" for functional fireplaces (for signaling) -> Categorize functional fireplaces separately to decorative fireplaces

Because until practical rural folks have settled on a common meaning of "fireplace", they can't reliably use that meaning to signal their values to each other or to outsiders.

Except... maybe if it got caught up in the modern culture war there could be a flood of fireplace-related memes and then everyone would have very strong opinions about the best definition of "fireplace" a few months later for no real reason? Wow, that sure would suck for the CEO of Decorative Fireplaces Inc.

daniel-kokotajlo on Daniel Kokotajlo's Shortform

I'm no musician, but music-generating AIs are already way better than I could ever be. It took me about an hour of prompting to get Suno to make this: https://suno.com/playlist/34e6de43-774e-44fe-afc6-02f9defa7e22

It's not perfect (especially: I can't figure out how to get it to create a song of the correct length, so I had to cut and paste snippets from two songs into a playlist, and that creates audible glitches/issues at the beginning, middle, and end) but overall I'm full of wonder and appreciation.

daniel-kokotajlo on Dave Kasten's AGI-by-2027 vignette

Interesting! You should definitely think more about this and write it up sometime; either you'll change your mind about timelines till superintelligence or you'll have found an interesting novel argument that may change other people's minds (such as mine).

lucid_levi_ackerman on Which things were you surprised to learn are metaphors?

The Rumbling.

kvmanthinking on You are not too "irrational" to know your preferences.

The above statement could be applied to a LOT of other posts too, not just this one.

sharmake-farah on yams's Shortform

I think 1 and 2 are actually pretty likely, but 3 and 4 are where I'm a lot less confident that they'll actually happen.

A big reason for this is that I suspect one of the reasons people aren't reacting to AI progress is they assume it won't take their job, so it will likely require massive job losses for humans to make a lot of people care about AI, and depending on how concentrated AI R&D is, there's a real possibility that AI has fully automated AI R&D before massive job losses begin in a way that matters to regular people.

gwern on Eli's shortform feed

Just ask an LLM. The author can always edit it, after all.

My suggestion for how such a feature could be done would be to copy the comment into a draft post, add an LLM-suggested title (and tags?), and alert the author, who may then opt in by posting it or delete it.

If it is sufficiently well received and people approve a lot of them, then one can explore opt-out auto-posting mechanisms, like "wait a month and if the author has still neither explicitly posted it nor deleted the draft proposal, then auto-post it".
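
A minimal sketch of the opt-out rule described above, for illustration only. Every name here (`DraftProposal`, `propose_draft`, `should_auto_post`, `OPT_OUT_WINDOW`) is hypothetical and not part of any real LessWrong API; the 30-day window is an assumed stand-in for "wait a month", and the LLM-suggested title is simply passed in by the caller rather than generated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed waiting period; the comment only says "wait a month".
OPT_OUT_WINDOW = timedelta(days=30)


@dataclass
class DraftProposal:
    """A comment copied into a draft post (all fields are illustrative)."""
    comment_text: str
    suggested_title: str   # would come from an LLM; supplied by the caller here
    created_at: datetime
    posted: bool = False   # the author explicitly posted the draft
    deleted: bool = False  # the author explicitly deleted the draft


def propose_draft(comment_text: str, suggested_title: str, now: datetime) -> DraftProposal:
    """Copy the comment into a draft and attach the suggested title."""
    return DraftProposal(comment_text, suggested_title, created_at=now)


def should_auto_post(draft: DraftProposal, now: datetime) -> bool:
    """Opt-out rule: auto-post only if the author took no action for the whole window."""
    if draft.posted or draft.deleted:
        return False
    return now - draft.created_at >= OPT_OUT_WINDOW


# Usage: a draft created on 1 November with no author action.
created = datetime(2024, 11, 1)
draft = propose_draft("Some long comment...", "A suggested title", created)
after_one_week = should_auto_post(draft, created + timedelta(days=7))    # False
after_one_month = should_auto_post(draft, created + timedelta(days=31))  # True
```

The opt-in version is the same flow without `should_auto_post`: the draft just sits until the author posts or deletes it.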

jkaufman on Secular Solstice Songbook Update

Done; thanks!