LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (26)

Believing In
AnnaSalamon · 2024-02-08T07:06:13.072Z · comments (51)

[link] Explore More: A Bag of Tricks to Keep Your Life on the Rails
Shoshannah Tekofsky (DarkSym) · 2024-09-28T21:38:52.256Z · comments (13)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (93)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (30)

AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
Rohin Shah (rohinmshah) · 2024-08-20T16:22:45.888Z · comments (33)

SAE feature geometry is outside the superposition hypothesis
jake_mendel · 2024-06-24T16:07:14.604Z · comments (17)

Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (88)

MIRI 2024 Mission and Strategy Update
Malo (malo) · 2024-01-05T00:20:54.169Z · comments (44)

AI Control: Improving Safety Despite Intentional Subversion
Buck · 2023-12-13T15:51:35.982Z · comments (14)

The ‘strong’ feature hypothesis could be wrong
lewis smith (lsgos) · 2024-08-02T14:33:58.898Z · comments (17)

LLM Generality is a Timeline Crux
eggsyntax · 2024-06-24T12:52:07.704Z · comments (119)

CFAR Takeaways: Andrew Critch
Raemon · 2024-02-14T01:37:03.931Z · comments (62)

Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2024-02-03T20:36:59.806Z · comments (156)

Superbabies: Putting The Pieces Together
sarahconstantin · 2024-07-11T20:40:05.036Z · comments (37)

"Slow" takeoff is a terrible term for "maybe even faster takeoff, actually"
Raemon · 2024-09-28T23:38:25.512Z · comments (69)

ChatGPT can learn indirect control
Raymond D · 2024-03-21T21:11:06.649Z · comments (27)

[link] "How could I have thought that faster?"
mesaoptimizer · 2024-03-11T10:56:17.884Z · comments (32)

Towards more cooperative AI safety strategies
Richard_Ngo (ricraz) · 2024-07-16T04:36:29.191Z · comments (133)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (22)

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (40)

The Sun is big, but superintelligences will not spare Earth a little sunlight
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-09-23T03:39:16.243Z · comments (141)

OpenAI: Fallout
Zvi · 2024-05-28T13:20:04.325Z · comments (25)

Toward A Mathematical Framework for Computation in Superposition
Dmitry Vaintrob (dmitry-vaintrob) · 2024-01-18T21:06:57.040Z · comments (18)

Funny Anecdote of Eliezer From His Sister
Noah Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (6)

[link] Jaan Tallinn's 2023 Philanthropy Overview
jaan · 2024-05-20T12:11:39.416Z · comments (5)

Pay Risk Evaluators in Cash, Not Equity
Adam Scholl (adam_scholl) · 2024-09-07T02:37:59.659Z · comments (19)

Maybe Anthropic's Long-Term Benefit Trust is powerless
Zach Stein-Perlman · 2024-05-27T13:00:47.991Z · comments (21)

[link] Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
garrison · 2024-02-10T19:52:55.191Z · comments (52)

The Hopium Wars: the AGI Entente Delusion
Max Tegmark (MaxTegmark) · 2024-10-13T17:00:29.033Z · comments (55)

Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (52)

Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (62)

This might be the last AI Safety Camp
Remmelt (remmelt-ellen) · 2024-01-24T09:33:29.438Z · comments (34)

The impossible problem of due process
mingyuan · 2024-01-16T05:18:33.415Z · comments (64)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (100)

Response to Aschenbrenner's "Situational Awareness"
Rob Bensinger (RobbBB) · 2024-06-06T22:57:11.737Z · comments (27)

Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu (Marc-Everin Carauleanu) · 2024-07-30T16:22:29.561Z · comments (43)

Optimistic Assumptions, Longterm Planning, and "Cope"
Raemon · 2024-07-17T22:14:24.090Z · comments (46)

What's Going on With OpenAI's Messaging?
ozziegooen · 2024-05-21T02:22:04.171Z · comments (13)

[link] The Compendium, A full argument about extinction risk from AGI
adamShimi · 2024-10-31T12:01:51.714Z · comments (49)

How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage
orthonormal · 2024-08-06T02:32:41.364Z · comments (26)

My AI Model Delta Compared To Christiano
johnswentworth · 2024-06-12T18:19:44.768Z · comments (73)

Two easy things that maybe Just Work to improve AI discourse
jacobjacob · 2024-06-08T15:51:18.078Z · comments (35)

My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (187)

OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (21)

A basic systems architecture for AI agents that do autonomous research
Buck · 2024-09-23T13:58:27.185Z · comments (15)

[link] Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”
Ricki Heicklen (bayesshammai) · 2024-02-22T23:56:02.318Z · comments (5)

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth · 2024-04-18T00:27:43.451Z · comments (21)

Cryonics is free
Mati_Roy (MathieuRoy) · 2024-09-29T17:58:17.108Z · comments (40)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

leogao on leogao's Shortform

the most valuable part of a social event is often not the part that is ostensibly the most important, but rather the gaps between the main parts.

at ML conferences, the headline keynotes and orals are usually the least useful part to go to; the random spontaneous hallway chats and dinners and afterparties are extremely valuable
when doing an activity with friends, the activity itself is often of secondary importance. talking on the way to the activity, or in the gaps between doing the activity, carry a lot of the value
at work, a lot of the best conversations happen outside of scheduled 1:1s and group meetings, but rather happen in spontaneous hallway or dinner groups

yams on yams's Shortform

Cool! I think we're in agreement at a high level. Thanks for taking the extra time to make sure you were understood.

In more detail, though:

I think I disagree with 1 being all that likely; there are just other things I could see happening that would make a pause or stop politically popular (i.e. warning shots, An Inconvenient Truth AI Edition, etc.), likely not worth getting into here. I also think 'if we pause it will be for stupid reasons' is a very sad take.

I think I disagree with 2 being likely, as well; probably yes, a lot of the bottleneck on development is ~make-work that goes away when you get a drop-in replacement for remote workers, and also yes, AI coding is already an accelerant // effectively doing gradient descent on gradient descent (RLing the RL'd researcher to RL the RL...) is intelligence-explosion fuel. But I think there's a big gap between the capabilities you need for politically worrisome levels of unemployment, and the capabilities you need for an intelligence explosion, principally because >30 percent of human labor in developed nations could be automated with current tech if the economics align a bit (hiring 200+k/year ML engineers to replace your 30k/year call center employee is only just now starting to make sense economically). I think this has been true of current tech since ~GPT-4, and that we haven't seen a concomitant massive acceleration in capabilities on the frontier (things are continuing to move fast, and the proliferation is scary, but it's not an explosion).

I take "depending on how concentrated AI R&D is" to foreshadow that you'd reply to the above with something like: "This is about lab priorities; the labs with the most impressive models are the labs focusing the most on frontier model development, and they're unlikely to set their sights on comprehensive automation of shit jobs when they can instead double-down on frontier models and put some RL in the RL to RL the RL that's been RL'd by the..."

I think that's right about lab priorities. However, I expect the automation wave to mostly come from middle-men, consultancies, what have you, who take all of the leftover ML researchers not eaten up by the labs and go around automating things away individually (yes, maybe the frontier moves too fast for this to be right, because the labs just end up with a drop-in remote worker 'for free' as long as they keep advancing down the tech tree, but I don't quite think this is true, because human jobs are human-shaped, and buyers are going to want pretty rigorous role-specific guarantees from whoever's selling this service, even if they're basically unnecessary, and the one-size-fits-all solution is going to have fewer buyers than the thing marketed as 'bespoke').

In general, I don't like collapsing the various checkpoints between here and superintelligence; there are all these intermediate states, and their exact features matter a lot, and we really don't know what we're going to get. 'By the time we'll have x, we'll certainly have y' is not a form of prediction that anyone has a particularly good track record making.

martin-randall on Unnatural Categories Are Optimized for Deception

I find this hypothetical about neural fireplaces curious, because the ambiguity exists in real fireplaces, speculative fiction is not needed. Please excuse any inaccuracies in this brief history of fireplaces:

Wood-burning fireplaces
Gas-burning fireplaces
Central heating
Electric heaters
Decorative fireplaces (no heat)

The original fireplaces produced both heat and a decorative flame effect. With each new type of invention there was a question of what to do with our previous terms. We've ended up with "heaters" to refer to things that heat a room and "fireplace" to refer to things that have a decorative flame effect. Both of these things are slightly fuzzy natural categories in the sense of this post.

Except... maybe we should say that "decorative" is a privative adjective and so a "decorative fireplace" isn't really a fireplace? For the sake of the thought experiment, let's say that practical rural folk place a higher value on having a secondary heat source because it takes longer to restore electricity after a storm. Meanwhile snobby urbanites place a higher value on decorative flame effects because they value gaining status through conspicuous consumption.

I see that someone could say "well, it's not a real fireplace, is it?" in order to signal that they share the values of practical rural folks. If they're actually a snobby urbanite politician and they don't actually have those practical rural values then they are being deceptive. That would be a deception about values, not about heat sources.

If a practical rural person says "well, it's not a real fireplace, is it?", then that could indeed be a true signal of their values. But my guess is the more restrictive meaning of fireplace came first. The causal diagram is something like:

Practical Rural Values -> Categorize functional fireplaces separately to decorative fireplaces -> Use the short word "fireplace" for functional fireplaces (for communication and signaling)

Not:

Practical Rural Values -> Use the short word "fireplace" for functional fireplaces (for signaling) -> Categorize functional fireplaces separately to decorative fireplaces

Because until practical rural folks have settled on a common meaning of "fireplace", they can't reliably use that meaning to signal their values to each other or to outsiders.

Except... maybe if it got caught up in the modern culture war there could be a flood of fireplace-related memes and then everyone would have very strong opinions about the best definition of "fireplace" a few months later for no real reason? Wow, that sure would suck for the CEO of Decorative Fireplaces Inc.

daniel-kokotajlo on Daniel Kokotajlo's Shortform

I'm no musician, but music-generating AIs are already way better than I could ever be. It took me about an hour of prompting to get Suno to make this: https://suno.com/playlist/34e6de43-774e-44fe-afc6-02f9defa7e22

It's not perfect (especially: I can't figure out how to get it to create a song of the correct length, so I had to cut and paste snippets from two songs into a playlist, and that creates audible glitches/issues at the beginning, middle, and end) but overall I'm full of wonder and appreciation.

daniel-kokotajlo on Dave Kasten's AGI-by-2027 vignette

Interesting! You should definitely think more about this and write it up sometime, either you'll change your mind about timelines till superintelligence or you'll have found an interesting novel argument that may change other people's minds (such as mine).

lucid_levi_ackerman on Which things were you surprised to learn are metaphors?

The Rumbling.

kvmanthinking on You are not too "irrational" to know your preferences.

The above statement could be applied to a LOT of other posts too, not just this one.

sharmake-farah on yams's Shortform

I think 1 and 2 are actually pretty likely, but 3 and 4 is where I'm a lot less confident in actually happening.

A big reason for this is that I suspect one of the reasons people aren't reacting to AI progress is they assume it won't take their job, so it will likely require massive job losses for humans to make a lot of people care about AI, and depending on how concentrated AI R&D is, there's a real possibility that AI has fully automated AI R&D before massive job losses begin in a way that matters to regular people.

gwern on Eli's shortform feed

Just ask a LLM. The author can always edit it, after all.

My suggestion for how such a feature could be done would be to copy the comment into a draft post, add LLM-suggested title (and tags?), and alert the author for an opt-in, who may delete or post it.

If it is sufficiently well received and people approve a lot of them, then one can explore optout auto-posting mechanisms, like "wait a month and if the author has still neither explicitly posted it nor deleted the draft proposal, then auto-post it".

jkaufman on Secular Solstice Songbook Update

Done; thanks!