LessWrong 2.0 Reader

Confusing the metric for the meaning: Perhaps correlated attributes are "natural"
NickyP (Nicky) · 2024-07-23T12:43:18.681Z · comments (3)

Monthly Roundup #20: July 2024
Zvi · 2024-07-23T12:50:07.991Z · comments (9)

[link] The Cancer Resolution?
PeterMcCluskey · 2024-07-24T00:25:17.322Z · comments (24)

[link] AI Safety Memes Wiki
plex (ete) · 2024-07-24T18:53:04.977Z · comments (1)

AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them
Roman Leventov · 2023-12-27T14:51:37.713Z · comments (9)

Templates I made to run feedback rounds for Ethan Perez’s research fellows.
Henry Sleight (ResentHighly) · 2024-03-28T19:41:15.506Z · comments (0)

5. Moral Value for Sentient Animals? Alas, Not Yet
RogerDearnaley (roger-d-1) · 2023-12-27T06:42:09.130Z · comments (41)

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)
Diffractor · 2024-04-18T08:39:13.368Z · comments (2)

DIY LessWrong Jewelry
Fluffnutt (Pear) · 2024-08-25T21:33:56.173Z · comments (0)

Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”
Tony Wang (tw) · 2023-12-15T11:05:23.256Z · comments (8)

[question] Feedback request: what am I missing?
Nathan Helm-Burger (nathan-helm-burger) · 2024-11-02T17:38:39.625Z · answers+comments (5)

[link] Information dark matter
Logan Kieller (logan-kieller) · 2024-10-01T15:05:41.159Z · comments (4)

Proveably Safe Self Driving Cars [Modulo Assumptions]
Davidmanheim · 2024-09-15T13:58:19.472Z · comments (26)

[link] On Lies and Liars
Gabriel Alfour (gabriel-alfour-1) · 2023-11-17T17:13:03.726Z · comments (4)

Sparse autoencoders find composed features in small toy models
Evan Anders (evan-anders) · 2024-03-14T18:00:43.339Z · comments (12)

Experimentation (Part 7 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-18T21:25:56.527Z · comments (0)

Monthly Roundup #16: March 2024
Zvi · 2024-03-19T13:10:05.529Z · comments (4)

"Which chains-of-thought was that faster than?"
Emrik (Emrik North) · 2024-05-22T08:21:00.269Z · comments (4)

LLMs can strategically deceive while doing gain-of-function research
Igor Ivanov (igor-ivanov) · 2024-01-24T15:45:08.795Z · comments (4)

ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5
VipulNaik · 2023-11-29T18:11:53.252Z · comments (16)

2024 ACX Predictions: Blind/Buy/Sell/Hold
Zvi · 2024-01-09T19:30:06.388Z · comments (2)

Boston Solstice 2023 Retrospective
jefftk (jkaufman) · 2024-01-02T03:10:05.694Z · comments (0)

Effectively Handling Disagreements - Introducing a New Workshop
Camille Berger (Camille Berger) · 2024-04-15T16:33:50.339Z · comments (2)

[question] Is AlphaGo actually a consequentialist utility maximizer?
faul_sname · 2023-12-07T12:41:05.132Z · answers+comments (8)

[question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?
lillybaeum · 2023-12-10T17:26:34.206Z · answers+comments (34)

Mech Interp Lacks Good Paradigms
Daniel Tan (dtch1997) · 2024-07-16T15:47:32.171Z · comments (0)

[link] FTX expects to return all customer money; clawbacks may go away
Mikhail Samin (mikhail-samin) · 2024-02-14T03:43:13.218Z · comments (1)

The Consciousness Box
GradualImprovement · 2023-12-11T16:45:08.172Z · comments (22)

Update #2 to "Dominant Assurance Contract Platform": EnsureDone
moyamo · 2023-11-28T18:02:50.367Z · comments (2)

Rational Animations offers animation production and writing services!
Writer · 2024-03-15T17:26:07.976Z · comments (0)

[link] Twitter thread on open-source AI
Richard_Ngo (ricraz) · 2024-07-31T00:26:11.655Z · comments (6)

Being against involuntary death and being open to change are compatible
Andy_McKenzie · 2024-05-27T06:37:27.644Z · comments (5)

Is suffering like shit?
KatjaGrace · 2024-05-31T01:20:03.855Z · comments (5)

Learning Math in Time for Alignment
Nicholas / Heather Kross (NicholasKross) · 2024-01-09T01:02:37.446Z · comments (3)

If you are also the worst at politics
lemonhope (lcmgcd) · 2024-05-26T20:07:49.201Z · comments (8)

Video and transcript of presentation on Scheming AIs
Joe Carlsmith (joekc) · 2024-03-22T15:52:03.311Z · comments (1)

[link] End Single Family Zoning by Overturning Euclid V Ambler
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-26T14:08:45.046Z · comments (1)

[link] A computational complexity argument for many worlds
jessicata (jessica.liu.taylor) · 2024-08-13T19:35:10.116Z · comments (15)

[link] Self-Resolving Prediction Markets
PeterMcCluskey · 2024-03-03T02:39:42.212Z · comments (0)

Investigating the Ability of LLMs to Recognize Their Own Writing
Christopher Ackerman (christopher-ackerman) · 2024-07-30T15:41:44.017Z · comments (0)

[link] Manifund: 2023 in Review
Austin Chen (austin-chen) · 2024-01-18T23:50:13.557Z · comments (0)

[question] How unusual is the fact that there is no AI monopoly?
Viliam · 2024-08-16T20:21:51.012Z · answers+comments (15)

An argument that consequentialism is incomplete
cousin_it · 2024-10-07T09:45:12.754Z · comments (27)

[link] Concrete benefits of making predictions
Jonny Spicer (jonnyspicer) · 2024-10-17T14:23:17.613Z · comments (5)

[link] NAO Updates, Fall 2024
jefftk (jkaufman) · 2024-10-18T00:00:04.142Z · comments (2)

DunCon @Lighthaven
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-09-29T04:56:27.205Z · comments (0)

In Defense of Lawyers Playing Their Part
Isaac King (KingSupernova) · 2024-07-01T01:32:58.695Z · comments (9)

[link] Why you, personally, should want a larger human population
jasoncrawford · 2024-02-23T19:48:10.526Z · comments (32)

[link] the subreddit size threshold
bhauth · 2024-01-23T00:38:13.747Z · comments (3)

[link] Talking With People Who Speak to Congressional Staffers about AI risk
Eneasz · 2023-12-14T17:55:50.606Z · comments (0)

Recent comments

donald-hobson on Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds

But if the universal failure of nature and man to find non-connectionist forms of general intelligence does not move you

Firstly, AIXI exists, and we agree that it would be very smart if we had the compute to run it.

Secondly, I think there is some sort of sleight of hand here.

ChatGPT isn't yet fully general. Neither is a 3-SAT solver. 3-SAT looks somewhat like what you might expect a non-connectionist approach to intelligence to look like. There is a huge range of maths problems that are all theoretically equivalent to 3-SAT.

In the infinite limit, both types of intelligence can simulate the other at huge overhead. In practice, they can't.

Also, non-connectionist forms of intelligence are hard to evolve, because evolution works in small changes.

dakara on AI Control: Improving Safety Despite Intentional Subversion

This paper creates a new 1-bit strategy for circumventing Ryan and Buck's safety strategy. The collusion team achieves 90%+ success rate.

I would really appreciate any input, especially from Ryan or his co-authors. This seems like a very important issue to address.

benito on Sabotage Evaluations for Frontier Models

I have found it fruitful to argue this case back and forth with you, thank you for defending and explaining your perspective.

I will restate my overall position, I invite you to do the same, and then I propose that we consider this 40+ comment thread concluded for now.

———

The comment of yours that (to me) started this thread was the following.

If the default path is AIs taking over control from humans, then what is the current plan in leading AI labs? Surely all the work they put in AI safety is done to prevent exactly such scenarios. I would find it quite hard to believe that a large group of people would vigorously do something if they believed that their efforts would be in vain.

I primarily wish to argue that, given the general lack of accountability for developing machine learning systems in worlds where indeed the default outcome is doom, it should not be surprising to find out that there is a large corporation (or multiple) doing so. One should not assume that the incentives are aligned – anyone who is risking omnicide-level outcomes via investing in novel tech development currently faces no criminal penalties, fines, or international sanctions.

Given the current intellectual elite scene where a substantial number of prestigious people care about extinction level outcomes, it is also not surprising that power-seeking companies have large departments focused on 'ethics' and 'safety' in order to look respectable to such people. Separately from any intrinsic interest, it has been a useful political chip for enticing a great deal of talent from scientific communities and communities interested in ethics to work for them (not dissimilar to how Sam Bankman-Fried managed to cause a lot of card-carrying members of the Effective Altruist philosophy and scene to work very hard to help build his criminal empire by talking a good game about utilitarianism, veganism, and the rest).

Looking at a given company's plan for preventing doom, and noticing it does not check out, should not be followed by an assumption of adequacy and good incentives such that "surely this company would not exist nor do work on AI safety if it did not have a strong plan, I must be mistaken". I believe that there is no good plan and that these companies would exist regardless of whether a good plan existed or not. Given the lack of accountability, and my belief that alignment is clearly unsolved and we fundamentally do not know what we're doing, I believe the people involved are getting rich risking all of our lives and there is (currently) no justice here.

We have agreed on many points, and from the outset I believe you felt my position had some truth to it ("I do get that point that you are making, but I think this is a little bit unfair to these organizations."). I will leave you to outline whichever overall thoughts and remaining points of agreement or disagreement that you wish.

turntrout on Announcing turntrout.com, my new digital home

Another bit I forgot to highlight in the original post: the fonts available on my site.

saidachmiz on Announcing turntrout.com, my new digital home

Not bad at all! Needs some work on the details and some bug fixes, but—really not bad! The dropcaps, in particular, are well done; and the overall theme is elegant.

raemon on Neutrality

Curated. This was one of the more inspiring things I read this year (in a year that had a moderate number of inspiring things!)

I really like how Sarah lays out the problem and desiderata for neutrality in our public/civic institutional spaces.

LessWrong's strength is being a fairly opinionated university about how to do epistemics, which the rest of the world isn't necessarily bought into. Trying to make LW a civic institution would fail. But this post has me more excited to revisit "what would be necessary to build good, civic infrastructure" (where "good" requires both "be 'good' in some kind of deep sense" and "be memetically fit enough to compete with Twitter et al."). One solution might be convincing Musk of specific policies rather than building a competitor.

mako-yass on Trying Bluesky

Hmm. I think the core thing is transparency. So if it cultivates human network intelligence, but that intelligence is opaque to the user, it is an algorithm. Algorithms can have both machine and egregoric components.

daniel-kokotajlo on Vignettes Workshop (AI Impacts)

Hey! Exciting! How about you go ahead and write your first stab at it, and then post it online? You could then make a comment here or on What 2026 Looks Like linking to it.

daniel-kokotajlo on 5 ways to improve CoT faithfulness

I'm afraid not; as far as I know, it still hasn't been implemented and tested. Though maybe OpenAI is doing something halfway there. If you want to implement and test the idea, great! I'd love to advise on that project.

daemonicsigil on Quantum Immortality: A Perspective if AI Doomers are Probably Right

Anthropic shadow isn't a real thing; check this post: https://www.lesswrong.com/posts/LGHuaLiq3F5NHQXXF/anthropically-blind-the-anthropic-shadow-is-reflectively

Also, you should care about worlds proportional to the square of their amplitude.