LessWrong 2.0 Reader


Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)

Sora What
Zvi · 2024-02-22T18:10:05.397Z · comments (3)

4. Existing Writing on Corrigibility
Max Harms (max-harms) · 2024-06-10T14:08:35.590Z · comments (13)

[link] Constructive Cauchy sequences vs. Dedekind cuts
jessicata (jessica.liu.taylor) · 2024-03-14T23:04:07.300Z · comments (23)

Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

Critiques of the AI control agenda
Jozdien · 2024-02-14T19:25:04.105Z · comments (14)

Caring about excellence
owencb · 2024-07-22T14:24:37.892Z · comments (4)

Run evals on base models too!
orthonormal · 2024-04-04T18:43:25.468Z · comments (6)

Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence
Towards_Keeperhood (Simon Skade) · 2024-05-06T17:09:10.729Z · comments (16)

AI #68: Remarkably Reasonable Reactions
Zvi · 2024-06-13T16:30:02.969Z · comments (11)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

Some costs of superposition
Linda Linsefors · 2024-03-03T16:08:20.674Z · comments (11)

AI Safety 101 : Capabilities - Human Level AI, What? How? and When?
markov (markovial) · 2024-03-07T17:29:53.260Z · comments (8)

Big Picture AI Safety: Introduction
EuanMcLean (euanmclean) · 2024-05-23T11:15:44.037Z · comments (7)

[link] Robin Hanson AI X-Risk Debate — Highlights and Analysis
Liron · 2024-07-12T21:31:02.222Z · comments (7)

AI doing philosophy = AI generating hands?
Wei Dai (Wei_Dai) · 2024-01-15T09:04:39.659Z · comments (22)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

[Valence series] 4. Valence & Liking / Admiring
Steven Byrnes (steve2152) · 2024-06-10T14:19:51.194Z · comments (12)

AI #41: Bring in the Other Gemini
Zvi · 2023-12-07T15:10:05.552Z · comments (16)

Enriched tab is now the default LW Frontpage experience for logged-in users
Ruby · 2024-06-21T00:09:30.441Z · comments (27)

[link] The Leeroy Jenkins principle: How faulty AI could guarantee "warning shots"
titotal (lombertini) · 2024-01-14T15:03:21.087Z · comments (6)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

[link] For Civilization and Against Niceness
Gabriel Alfour (gabriel-alfour-1) · 2023-11-20T10:56:20.352Z · comments (14)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

On OpenAI’s Model Spec
Zvi · 2024-06-21T13:00:03.014Z · comments (3)

[link] Metascience of the Vesuvius Challenge
Maxwell Tabarrok (maxwell-tabarrok) · 2024-03-30T12:02:38.978Z · comments (2)

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu (wearsshoes) · 2024-06-25T01:35:54.064Z · comments (9)

All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)

1. The CAST Strategy
Max Harms (max-harms) · 2024-06-07T22:29:13.005Z · comments (19)

Humanity isn't remotely longtermist, so arguments for AGI x-risk should focus on the near term
Seth Herd · 2024-08-12T18:10:56.543Z · comments (10)

Trustworthy and untrustworthy models
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (6)

[link] If Clarity Seems Like Death to Them
Zack_M_Davis · 2023-12-30T17:40:42.622Z · comments (191)

I'm open for projects (sort of)
cousin_it · 2024-04-18T18:05:01.395Z · comments (13)

AI #33: Cool New Interpretability Paper
Zvi · 2023-10-12T16:20:01.481Z · comments (18)

Vaniver's thoughts on Anthropic's RSP
Vaniver · 2023-10-28T21:06:07.323Z · comments (4)

[question] Rationalist horror movies
Elizabeth (pktechgirl) · 2023-10-15T07:42:14.509Z · answers+comments (35)

In Defense of Parselmouths
Screwtape · 2023-11-15T23:02:19.344Z · comments (10)

[link] Bayesians Commit the Gambler's Fallacy
Kevin Dorst · 2024-01-07T12:54:59.939Z · comments (28)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

[link] Will releasing the weights of large language models grant widespread access to pandemic agents?
jefftk (jkaufman) · 2023-10-30T18:22:59.677Z · comments (25)

LW UI features you might not have tried
Elizabeth (pktechgirl) · 2023-10-13T03:04:57.542Z · comments (6)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

How to hire somebody better than yourself
lukehmiles (lcmgcd) · 2024-08-28T08:12:53.450Z · comments (5)

The predictive power of dissipative adaptation
dr_s · 2023-12-17T14:01:31.568Z · comments (14)

On the Proposed California SB 1047
Zvi · 2024-02-12T16:40:04.854Z · comments (18)

[link] Michael Dickens' Caffeine Tolerance Research
niplav · 2024-09-04T15:41:53.343Z · comments (3)

So You Created a Sociopath - New Book Announcement!
Garrett Baker (D0TheMath) · 2024-04-01T18:02:18.010Z · comments (3)


Recent comments

sovran on OpenAI Email Archives (from Musk v. Altman)

Been thinking a lot about whether it's possible to stop humanity from developing AI.
I think the answer is almost definitely not.

Interesting that the very first thing he discusses is whether AI can be stopped.

benito on Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.

Thanks for making this point.

I note Musk was the first one to start a competitor, which seems to me to be very costly.

I think that founding OpenAI could be right if the non-profit structure was likely to work out. I don't know if that made sense at the time. Altman has survived being fired by the board, removed parts of the board, and, rumor has it, is moving to a for-profit, which is strong evidence against the non-profit being able to withstand the pressures that were coming. But even without Altman, I suspect it would still take billions of $ of funding, partnerships like the one with Microsoft, and other for-profit pressures for it to be the sort of player it is today. So I don't know that Musk's plan was viable at all.

lorec on Project Adequate: Seeking Cofounders/Funders

This is one of those subject areas that'd be unfortunately bad to get into publicly. If you or any other individual wants to grill me on this, feel free to DM me or contact me by any of the above methods and I will take disclosure case by case.

meme-marine on Project Adequate: Seeking Cofounders/Funders

Kudos to you for actually trying to solve the problem, but I must remind you that the history of symbolic AI is pretty much nothing but failure after failure; what do you intend to do differently, and how do you intend to overcome the challenges that halted progress in this area for the past ~40 years?

mako-yass on Trying Bluesky

For a while I just stuck to that, but eventually it occurred to me that the rules of following mode favor whoever tweets the most, which is a social problem similar to meetups ending up favoring whoever talks the loudest and interrupts the most, and so I came to really prefer bsky's "Quiet Posters" mode.

seth-herd on OpenAI Email Archives (from Musk v. Altman)

That makes sense under certain assumptions - I find them so foreign I wasn't thinking in those terms. I find this move strange if you worry about either alignment or misuse. If you hand AGI to a bunch of people, one of them is prone to either screw up and release a misaligned AGI, or deliberately use their AGI to self-improve and either take over or cause mayhem.

To me these problems both seem highly likely. That's why the move of responding to concern over AGI by making more AGIs makes no sense to me. I think a singleton in responsible hands is our best chance at survival.

If you think alignment is so easy nobody will screw it up, or if you strongly believe that an offense-defense balance will hold so that many good AGIs safely counter a few misaligned/misused ones, then sure. I just don't think either of those is a very plausible view once you've thought back and forth through things.

"Cruxes of disagreement on alignment difficulty" explains why I think anybody who thinks alignment is super easy is overconfident (as is anyone who's sure it's really really hard): we just haven't done enough analysis or experimentation yet.

"If we solve alignment, do we die anyway?" addresses why I think offense-defense balance is almost guaranteed to shift to offense with self-improving AGI, meaning a massively multipolar scenario means we're doomed to misuse.

My best guess is that people who think open-sourcing AGI is a good idea either are thinking only of weak "AGI" and not the next step to autonomously self-improving AGI, or they've taken an optimistic guess at the offense-defense balance with many human-controlled real AGIs.

cubefox on Trying Bluesky

The algorithm has been horrific for a while

After Musk took over, they implemented a mode which doesn't use an algorithm on the timeline at all. It's the "following" tab.

cronodas on The Online Sports Gambling Experiment Has Failed

I've heard that, in Las Vegas, if you put yourself on the government's "compulsive gambler" list, you can still walk into any casino, give them your money, and place a bet - the only difference being that, if you happen to win, the casino keeps your money as if you had lost.

I think it should work the other way around, making it the casino's responsibility to avoid accepting bets from self-proclaimed problem gamblers - if you're on the list and the casino doesn't stop you from betting, the casino has to give you back any money you lose.

benito on Lao Mein's Shortform

Maybe there's a hope there, but I'll point out that many of the people needed to run a business (finance, legal, product, etc) are not idealistic scientists who would be willing to have their equity become worthless.

mako-yass on Trying Bluesky

Markets put bsky exceeding twitter at 44%, 4x higher than mastodon.
My P would be around 80%. I don't think most people (who use social media much in the first place) are proud to be on twitter. The algorithm has been horrific for a while, and bsky at least offers algorithmic choice (though only one feed right now is a sophisticated algorithm, and while that algorithm isn't impressive, it at least isn't repellent).

For me, I decided I had to move over (@makoConstruct) when twitter blocked links to rival systems, which included substack. They seem to have made the algorithm demote any tweet with links, which makes it basically useless as a news curation/discovery system.

I also tentatively endorse the underlying protocol. Due to its use of content-addressed data structures, an atproto server is usually much lighter to run than an activitypub server, nomadic identity/personal data host transfer is much easier to implement, and atproto is much more likely to dovetail cleanly with verifiable computing, upon which much more consequential social technologies than microblogging could be built.
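
To make the content-addressing point concrete, here is a rough sketch of why it helps with moving data between hosts. This is not atproto's actual record format or API: the `ContentStore` class, the `migrate` helper, and the use of canonical JSON with hex SHA-256 keys are all assumptions made up for illustration. The idea is only that a record's key is derived from its bytes, so any host that receives the record can recompute and check the key without trusting the old host.

```python
import hashlib
import json


def content_address(record: dict) -> str:
    """Hex SHA-256 of a canonical JSON encoding (an illustrative choice of encoding)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class ContentStore:
    """Hypothetical in-memory host that stores records under their content address."""

    def __init__(self) -> None:
        self._blobs: dict[str, dict] = {}

    def put(self, record: dict) -> str:
        key = content_address(record)
        self._blobs[key] = record
        return key

    def get(self, key: str) -> dict:
        record = self._blobs[key]
        # Anyone can re-derive the key, so tampering or corruption is detectable.
        if content_address(record) != key:
            raise ValueError("stored record does not match its address")
        return record

    def export(self) -> dict[str, dict]:
        return dict(self._blobs)


def migrate(old_host: ContentStore, new_host: ContentStore) -> None:
    """Copy every record; keys stay the same because they depend only on content."""
    for key, record in old_host.export().items():
        if new_host.put(record) != key:
            raise ValueError("record changed in transit")


old_host, new_host = ContentStore(), ContentStore()
post_key = old_host.put({"text": "hello", "n": 1})
migrate(old_host, new_host)
print(new_host.get(post_key))
```

Any reference to `post_key` made while the record lived on the old host still resolves on the new one, which is the property that makes host transfer comparatively simple in a scheme like this.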