LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Incidental polysemanticity
Victor Lecomte (victor-lecomte) · 2023-11-15T04:00:00.000Z · comments (7)

Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)

The Next ChatGPT Moment: AI Avatars
kolmplex (luke-man) · 2024-01-05T20:14:10.074Z · comments (10)

The need for multi-agent experiments
Martín Soto (martinsq) · 2024-08-01T17:14:16.590Z · comments (3)

[question] Does reducing the amount of RL for a given capability level make AI safer?
Chris_Leong · 2024-05-05T17:04:01.799Z · answers+comments (22)

My intellectual journey to (dis)solve the hard problem of consciousness
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-06T09:32:41.612Z · comments (41)

Job Listing: Managing Editor / Writer
Gretta Duleba (gretta-duleba) · 2024-02-21T23:41:26.818Z · comments (2)

Locating My Eyes (Part 3 of "The Sense of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-02-29T03:09:25.810Z · comments (4)

The Case for Predictive Models
Rubi J. Hudson (Rubi) · 2024-04-03T18:22:20.243Z · comments (7)

Housing Roundup #7
Zvi · 2024-03-04T15:00:08.192Z · comments (1)

[link] Soviet comedy film recommendations
Nina Panickssery (NinaR) · 2024-06-09T23:40:58.536Z · comments (11)

[link] cold aluminum for medicine
bhauth · 2023-12-16T14:38:03.260Z · comments (4)

Debate: Get a college degree?
Ben Pace (Benito) · 2024-08-12T22:23:34.744Z · comments (14)

Deep and obvious points in the gap between your thoughts and your pictures of thought
KatjaGrace · 2024-02-23T07:30:07.461Z · comments (6)

Taking responsibility and partial derivatives
Ruby · 2023-12-31T04:33:51.419Z · comments (1)

Koan: divining alien datastructures from RAM activations
TsviBT · 2024-04-05T18:04:57.280Z · comments (10)

Upgrading the AI Safety Community
trevor (TrevorWiesinger) · 2023-12-16T15:34:26.600Z · comments (9)

[link] AI Girlfriends Won't Matter Much
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-23T15:58:30.308Z · comments (22)

How I internalized my achievements to better deal with negative feelings
Raymond Koopmanschap · 2024-02-27T15:10:24.149Z · comments (7)

Navigating emotions in an uncertain & confusing world
Akash (akash-wasil) · 2023-11-20T18:16:09.492Z · comments (1)

MATS AI Safety Strategy Curriculum v2
DanielFilan · 2024-10-07T22:44:06.396Z · comments (6)

Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

Time Efficient Resistance Training
romeostevensit · 2024-10-07T15:15:44.950Z · comments (8)

Protocol evaluations: good analogies vs control
Fabien Roger (Fabien) · 2024-02-19T18:00:09.794Z · comments (10)

[link] Surgery Works Well Without The FDA
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-26T13:31:29.968Z · comments (28)

Trust as a bottleneck to growing teams quickly
benkuhn · 2024-07-13T18:00:04.579Z · comments (3)

Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers
Jeffrey Heninger (jeffrey-heninger) · 2024-07-09T16:50:05.776Z · comments (2)

[link] Rowing vs steering
Saul Munn (saul-munn) · 2024-08-10T07:00:17.594Z · comments (2)

[link] We Need Major, But Not Radical, FDA Reform
Maxwell Tabarrok (maxwell-tabarrok) · 2024-02-24T16:54:33.061Z · comments (12)

Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders
Evan Anders (evan-anders) · 2024-02-27T02:43:22.446Z · comments (16)

Case studies on social-welfare-based standards in various industries
HoldenKarnofsky · 2024-06-20T13:33:44.780Z · comments (0)

Evidential Cooperation in Large Worlds: Potential Objections & FAQ
Chi Nguyen · 2024-02-28T18:58:25.688Z · comments (5)

Wholesomeness and Effective Altruism
owencb · 2024-02-28T20:28:22.175Z · comments (3)

[link] Post series on "Liability Law for reducing Existential Risk from AI"
Nora_Ammann · 2024-02-29T04:39:50.557Z · comments (1)

When fine-tuning fails to elicit GPT-3.5's chess abilities
Theodore Chapman · 2024-06-14T18:50:52.855Z · comments (3)

Estimating efficiency improvements in LLM pre-training
Daan · 2024-01-19T19:32:45.124Z · comments (3)

[question] What rationality failure modes are there?
Ulisse Mini (ulisse-mini) · 2024-01-19T09:12:57.924Z · answers+comments (11)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues
aphyer · 2024-06-07T19:02:06.859Z · comments (16)

A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed
johnswentworth · 2024-08-22T19:19:28.940Z · comments (4)

[link] you should probably eat oatmeal sometimes
bhauth · 2024-08-25T14:50:37.570Z · comments (32)

US Presidential Election: Tractability, Importance, and Urgency
kuhanj · 2024-05-29T23:52:22.420Z · comments (2)

Unit economics of LLM APIs
dschwarz · 2024-08-27T16:51:22.692Z · comments (0)

MonoPoly Restricted Trust
ymeskhout · 2024-01-02T23:02:55.066Z · comments (37)

Take SCIFs, it’s dangerous to go alone
latterframe · 2024-05-01T08:02:38.067Z · comments (1)

NYT is suing OpenAI&Microsoft for alleged copyright infringement; some quick thoughts
Mikhail Samin (mikhail-samin) · 2023-12-27T18:44:33.976Z · comments (17)

Formalizing the Informal (event invite)
abramdemski · 2024-09-10T19:22:53.564Z · comments (0)

[Valence series] 5. “Valence Disorders” in Mental Health & Personality
Steven Byrnes (steve2152) · 2023-12-18T15:26:29.970Z · comments (12)

Notes on Dwarkesh Patel’s Podcast with Sholto Douglas and Trenton Bricken
Zvi · 2024-04-01T19:10:12.193Z · comments (1)

GPT-4o My and Google I/O Day
Zvi · 2024-05-16T17:50:03.040Z · comments (2)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

mako-yass on Trying Bluesky

For a while I just stuck to that, but eventually it occurred to me that the rules of following mode favor whoever tweets the most, which is a similar social problem as when meetups end up favoring whoever talks the loudest and interrupts the most, and so I came to really prefer bsky's "Quiet Posters" mode.

seth-herd on OpenAI Email Archives (from Musk v. Altman)

That makes sense under certain assumptions - I find them so foreign I wasn't thinking in those terms. I find this move strange if you worry about either alignment or misuse. If you hand AGI to a bunch of people, one of them is prone to either screw up and release a misaligned AGI, or deliberately use their AGI to self-improve and either take over or cause mayhem.

To me these problems both seem highly likely. That's why the move of responding to concern over AGI by making more AGIs makes no sense to me. I think a singleton in responsible hands is our best chance at survival.

If you think alignment is so easy nobody will screw it up, or if you strongly believe that an offense-defense balance will strongly hold so that many good AGIs safely counter a few misaligned/misused ones, then sure. I just don't think either of those are very plausible views once you've thought back and forth through things.

Cruxes of disagreement on alignment difficulty [LW(p) · GW(p)] explains why I think anybody who thinks alignment is super easy is overestimating their confidence (as is anyone who's sure it's really really hard) - we just haven't done enough analysis or experimentation yet.

If we solve alignment, do we die anyway? [LW · GW] addresses why I think offense-defense balance is almost guaranteed to shift to offense with self-improving AGI, meaning a massively multipolar scenario means we're doomed to misuse.

My best guess is that people who think open-sourcing AGI is a good idea either are thinking only of weak "AGI" and not the next step to autonomously self-improving AGI, or they've taken an optimistic guess at the offense-defense balance with many human-controlled real AGIs.

cubefox on Trying Bluesky

The algorithm has been horrific for a while

After Musk took over, they implemented a mode which doesn't use an algorithm on the timeline at all. It's the "following" tab.

cronodas on The Online Sports Gambling Experiment Has Failed

I've heard that, in Las Vegas, if you put yourself on the government's "compulsive gambler" list, you can still walk into any casino, give them your money, and place a bet - the only difference being that, if you happen to win, the casino keeps your money as if you had lost.

I think it should work the other way around, making it the casino's responsibility to avoid accepting bets from self-proclaimed problem gamblers - if you're on the list and the casino doesn't stop you from betting, the casino has to give you back any money you lose.

benito on Lao Mein's Shortform

Maybe there's a hope there, but many of the people needed to run a business (finance, legal, product, etc) are not idealistic scientists who would be willing to have their equity become worthless.

mako-yass on Trying Bluesky

Markets put bsky exceeding twitter at 44%, 4x higher than mastodon.
My P would be around 80%. I don't think most people (who use social media much in the first place) are proud to be on twitter. The algorithm has been horrific for a while and bsky at least offers algorithmic choice (but only one feed right now is a sophisticated algorithm, and though that algorithm isn't impressive, it at least isn't repellent)

For me, I decided I had to move over (@makoConstruct) when twitter blocked links to rival systems, which included substack. They seem to have made the algorithm demote any tweet with links, which makes it basically useless as a news curation/discovery system.

I also tentatively endorse the underlying protocol. Due to its use of content-addressed datastructures, an atproto server is usually much lighter to run than an activitypub server, it makes nomadic identity/personal data host transfer much easier to implement, and it makes it much more likely that atproto is going to dovetail cleanly with verifiable computing, upon which much more consequential social technologies than microblogging could be built.

zack_m_davis on Lao Mein's Shortform

I don't think Vance is e/acc. He has said positive things about open source, but consider that the context was specifically about censorship and political bias in contemporary LLMs (bolding mine):

There are undoubtedly risks related to AI. One of the biggest:

A partisan group of crazy people use AI to infect every part of the information economy with left wing bias. Gemini can't produce accurate history. ChatGPT promotes genocidal concepts.

The solution is open source

If Vinod really believes AI is as dangerous as a nuclear weapon, why does ChatGPT have such an insane political bias? If you wanted to promote bipartisan efforts to regulate for safety, it's entirely counterproductive.

Any moderate or conservative who goes along with this obvious effort to entrench insane left-wing businesses is a useful idiot.

I'm not handing out favors to industrial-scale DEI bullshit because tech people are complaining about safety.

The words I've bolded indicate that Vance is at least peripherally aware that the "tech people [...] complaining about safety" are a different constituency than the "DEI bullshit" he deplores. If future developments or rhetorical innovations [LW · GW] persuade him that extinction risk is a serious concern, it seems likely that he'd be on board with "bipartisan efforts to regulate for safety."

john-wiseman on OpenAI Email Archives (from Musk v. Altman)

Found a link to the 100-page lawsuit of Guy Ravine vs Sam Altman and Greg Brockman here: https://guyravine.com/

They also filed a narrower complaint: https://storage.courtlistener.com/recap/gov.uscourts.cand.416410/gov.uscourts.cand.416410.103.0.pdf

mako-yass on Lao Mein's Shortform

judo flip the situation like he did with the OpenAI board saga, and somehow magically end up replacing Musk or Trump in the upcoming administration...

If Trump dies, Vance is in charge, and he's previously espoused bland eaccism.

I keep thinking: Everything depends on whether Elon and JD can be friends.

simon on OpenAI Email Archives (from Musk v. Altman)

Musk did also express concern about DeepMind making Hassabis the effective emperor of humanity, which seems much stranger - Hassabis' values appear to be quite standard humanist ones, so you'd think having him in charge of a project with the clear lead would be a best-case scenario for anything other than being in charge yourself.

It seems the concern was that DeepMind would create a singleton, whereas their vision was for many people (potentially with different values) to have access to it. I don't think that's strange at all - it's only strange if you assume that Musk and Altman would believe that a singleton is inevitable.

Musk:

If they win, it will be really bad news with their one mind to rule the world philosophy.

Altman:

The mission would be to create the first general AI and use it for individual empowerment—ie, the distributed version of the future that seems the safest.