LessWrong 2.0 Reader

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

Timaeus is hiring researchers & engineers
Jesse Hoogland (jhoogland) · 2025-01-17T19:13:14.739Z · comments (2)

[link] Funding Case: AI Safety Camp 11
Remmelt (remmelt-ellen) · 2024-12-23T08:51:55.255Z · comments (4)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

[question] What's with all the bans recently?
[deleted] · 2024-04-04T06:16:49.062Z · answers+comments (83)

Approaching Human-Level Forecasting with Language Models
Fred Zhang (fred-zhang) · 2024-02-29T22:36:34.012Z · comments (6)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

Read The Sequences As If They Were Written Today
Peter Berggren (peter-berggren) · 2025-01-02T02:51:36.537Z · comments (7)

5 Physics Problems
DaemonicSigil · 2024-03-18T08:05:45.971Z · comments (0)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

o1 Turns Pro
Zvi · 2024-12-10T17:00:08.036Z · comments (3)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)
LoganStrohl (BrienneYudkowsky) · 2024-02-24T02:56:31.458Z · comments (1)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

[link] Testing for Scheming with Model Deletion
Guive (GAA) · 2025-01-07T01:54:13.550Z · comments (21)

LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (26)

The Simplest Good
Jesse Hoogland (jhoogland) · 2025-02-02T19:51:14.155Z · comments (5)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (11)

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions
Lidor Banuel Dabbah · 2024-07-19T20:32:15.095Z · comments (6)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

[question] What's the theory of impact for activation vectors?
Chris_Leong · 2024-02-11T07:34:48.536Z · answers+comments (12)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

D&D.Sci: The Mad Tyrant's Pet Turtles
abstractapplic · 2024-03-29T16:22:13.732Z · comments (18)

[link] OpenAI releases deep research agent
Seth Herd · 2025-02-03T12:48:44.925Z · comments (18)

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

[link] How much I'm paying for AI productivity software (and the future of AI use)
jacquesthibs (jacques-thibodeau) · 2024-10-11T17:11:27.025Z · comments (18)

"Metastrategic Brainstorming", a core building-block skill
Raemon · 2024-06-11T04:27:52.488Z · comments (5)

Measuring Coherence of Policies in Toy Environments
dx26 (dylan-xu) · 2024-03-18T17:59:08.118Z · comments (9)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

[link] new chinese stealth aircraft
bhauth · 2025-01-01T00:19:10.644Z · comments (3)

[link] microwave drilling is impractical
bhauth · 2024-06-12T22:16:00.199Z · comments (19)

On the Latest TikTok Bill
Zvi · 2024-03-13T18:50:05.398Z · comments (7)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

AI #95: o1 Joins the API
Zvi · 2024-12-19T15:10:05.196Z · comments (1)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (8)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (22)

Woods’ new preprint on object permanence
Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z · comments (1)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (5)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

Reading RFK Jr so that you don’t have to
braces · 2024-11-22T00:59:19.583Z · comments (1)

Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (64)

Recent comments

knight-lee on Mikhail Samin's Shortform

I don't agree that the probability of alignment research succeeding is that low. 17 years or 22 years of trying and failing is strong evidence against it being easy, but doesn't prove that it is so hard that increasing alignment research is useless.

People worked on capabilities for decades, and never got anywhere until recently, when the hardware caught up, and it was discovered that scaling works unexpectedly well.

There is a chance that alignment research now will be more fruitful than earlier alignment research was, though everything is uncertain.

We should be uncertain about the Ten Levels of AI Alignment Difficulty.

The comparison

It's unlikely that 22 years of alignment research is insufficient but 23 years of alignment research is sufficient.

But what's even less likely is that $200 billion on capabilities research plus $0.1 billion on alignment research is survivable, while $210 billion on capabilities research plus $1 billion on alignment research is deadly.

In the same way that adding a little alignment research is unlikely to turn failure into success, adding a little capabilities research is unlikely to turn success into failure.

It's also unlikely that alignment effort is even deadlier than capabilities effort dollar for dollar. That would mean reallocating alignment effort into capabilities effort paradoxically slows down capabilities and saves everyone.

Even if you are right that delaying AI capabilities is all that matters

Anthropic still might be a good thing, because Dario Amodei alone isn't responsible for Anthropic's capabilities progress.

Even if Anthropic disappeared, or never existed in the first place, the AI investors will continue to pay money for research, and the AI researchers will continue to do research for money. Anthropic was just the middleman.

If Anthropic had never existed, the middlemen would consist only of OpenAI, DeepMind, Meta AI, and other labs. These labs would not only act as the middleman, but would also lobby against regulation far more aggressively than Anthropic, and might discredit the entire "AI Notkilleveryoneism" movement.

To keep existing as one of these middlemen, you cannot simply stop paying the AI researchers for capabilities research; otherwise the AI investors and AI customers will stop paying you in turn. You cannot stem the flow; you can only decide how much of it goes through you.

It's the old capitalist dilemma of "doing evil or getting out-competed by those who do."

For their part, Anthropic redirected some of that flow to alignment research, and took the small amount of precautions which they could afford to take. That may be the best one can hope to accomplish against this unstoppable flow from the AI investors to AI researchers.

The few precautions Anthropic did take may have already cost them their first-mover advantage. Had Anthropic raced ahead before OpenAI released ChatGPT, Anthropic might have stolen the limelight, gotten the early customers and investors, and become bigger than OpenAI.

japancolorado on Hammertime Day 7: Aversion Factoring

A trivial inconvenience, my gym occasionally not having a barbell cover to protect my back during squats, kept me from going to the gym consistently. I probably skipped around 10 workouts just because I had an ugh field around the minor pain in my back while the barbell was on it.

jblack on Gettier Cases [repost]

No, there is nothing wrong with the referents in the Gettier examples.

The problem is not that the proposition refers to Jones. Within the universe of the scenario, it in fact does not. Smith's mental model implied that the proposition referred to Jones, but Smith's mental model was incorrect in this important respect. Because of this, the fact that the model correctly predicted the truth of the proposition was an accident.

ruby on [deleted]

Duplicate of Hyperstitions.

trade_apprentice on Distillation Experiment: Chunk-Knitting

This reminded me of the Goldfish Reading post, and it turns out it's by the same author.

trade_apprentice on Distillation Experiment: Chunk-Knitting

Images here won't load, but can be seen in the archived version: https://web.archive.org/web/20221107200157/https://www.lesswrong.com/posts/EEZsTatSoJz4CDvAc/distillation-experiment-chunk-knitting

sharmake-farah on What are the "no free lunch" theorems?

Another interpretation of the no free lunch theorem, by @davidad, is that under worst-case conditions learning/optimization is both trivial and impractical, so you need to add more constraints to get an interesting solution:

https://www.lesswrong.com/posts/yTvBSFrXhZfL8vr5a/worst-case-thinking-in-ai-alignment#N3avtTM3ESH4KHmfN

kvmanthinking on Toki pona FAQ

Also, it helps you taboo your words. For example, "Toki Pona helps taboo your words" would be rendered as
tenpo toki pi toki pona li sama e tenpo toki pi ni: jan li ken kepeken ala e nimi pi ken ala sona pi pali lili.
"(the) speech-time related to Toki Pona is similar or the same as (the) speech-time with this quality: (the) person cannot use word(s) which cannot be known via small effort."

Before you complain that this phrase is too long to be used practically, try to explain the concept of rationalist taboo in fewer syllables than I did in Toki Pona, without relying on other rationalist jargon.

knight-lee on Pick two: concise, comprehensive, or clear rules

Let's just think about the pros and cons of picking another forum, vs. continuing to comment on LessWrong, but only being visible by others who choose to see you.

Picking another forum:

They fit better in other forums than on LessWrong. For most rate-limited users this is true, but they can go to other forums on their own without being forced.
Less need for LessWrong to write code and increase bandwidth to accommodate them.
Less chance they say really bad things (neoreactionary content) that worsen the reputation of LessWrong? This doesn't apply to most rate-limited users.

Continuing to comment but only visible to those interested:

They get to discuss the posts and topics they find engaging to talk about.
They don't feel upset at LessWrong and the rationalist community.

I think whether it's worth it depends on how hard it is to write the code for them.

viliam on Thread for Sense-Making on Recent Murders and How to Sanely Respond

I agree that it's not looking good, if all you know is SBF and Zizians. But as far as I know, Annie Altman is not related to either rationalism or effective altruism.