LessWrong 2.0 Reader

[link] Rational Animations' intro to mechanistic interpretability
Writer · 2024-06-14T16:10:57.015Z · comments (1)

Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker
Daniel Herrmann (Whispermute) · 2025-02-04T20:34:22.625Z · comments (22)

[link] Oppression and production are competing explanations for wealth inequality.
Benquo · 2025-01-05T14:13:15.398Z · comments (15)

[link] The Choice Transition
owencb · 2024-11-18T12:30:56.198Z · comments (4)

AI #72: Denying the Future
Zvi · 2024-07-11T15:00:05.865Z · comments (8)

Higher-Order Forecasts
ozziegooen · 2024-05-22T21:49:42.802Z · comments (1)

Apply to LASR Labs: a London-based technical AI safety research programme
Erin Robertson · 2024-04-09T17:34:06.847Z · comments (1)

o3-mini Early Days
Zvi · 2025-02-03T14:20:06.443Z · comments (0)

Startup Roundup #2
Zvi · 2024-08-06T13:30:06.554Z · comments (0)

Monthly Roundup #18: May 2024
Zvi · 2024-05-13T12:30:04.863Z · comments (10)

Things Solenoid Narrates
Solenoid_Entity · 2024-04-12T23:57:16.169Z · comments (2)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

[link] Why Georgism Lost Its Popularity
Zero Contradictions · 2024-07-20T15:08:41.469Z · comments (53)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

On the Meta and DeepMind Safety Frameworks
Zvi · 2025-02-07T13:10:08.449Z · comments (1)

Koan: divining alien datastructures from RAM activations
TsviBT · 2024-04-05T18:04:57.280Z · comments (10)

Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (193)

[link] Against Student Debt Cancellation From All Sides of the Political Compass
Maxwell Tabarrok (maxwell-tabarrok) · 2024-05-13T14:55:57.525Z · comments (16)

[link] LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery (arjun-panickssery) · 2024-04-17T21:09:12.007Z · comments (1)

D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset
aphyer · 2024-05-14T03:35:10.586Z · comments (3)

[link] Literacy Rates Haven't Fallen By 20% Since the Department of Education Was Created
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-22T20:53:59.007Z · comments (0)

[link] Dangerous capability tests should be harder
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:20:50.610Z · comments (3)

Virtue signaling, and the "humans-are-wonderful" bias, as a trust exercise
lc · 2025-02-13T06:59:17.525Z · comments (16)

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

Preppers Are Too Negative on Objects
jefftk (jkaufman) · 2024-12-18T02:30:01.854Z · comments (2)

D&D.Sci Long War: Defender of Data-mocracy
aphyer · 2024-04-26T22:30:15.780Z · comments (20)

[Aspiration-based designs] 1. Informal introduction
B Jacobs (Bob Jacobs) · 2024-04-28T13:00:43.268Z · comments (4)

[link] Open Sourcing Metaculus
ChristianWilliams · 2024-07-02T22:30:01.339Z · comments (0)

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural
Rubi J. Hudson (Rubi) · 2024-07-16T22:44:17.128Z · comments (27)

AI #60: Oh the Humanity
Zvi · 2024-04-18T14:10:02.281Z · comments (7)

Back to Basics: Truth is Unitary
lsusr · 2024-03-29T21:10:33.399Z · comments (13)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

ProLU: A Nonlinearity for Sparse Autoencoders
Glen Taggart · 2024-04-23T14:09:21.592Z · comments (4)

Why care about AI personhood?
Francis Rhys Ward (francis-rhys-ward) · 2025-01-26T11:24:45.596Z · comments (6)

[link] Conjecture: A Roadmap for Cognitive Software and A Humanist Future of AI
Connor Leahy (NPCollapse) · 2024-12-02T13:28:57.977Z · comments (10)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

In response to critiques of Guaranteed Safe AI
Nora_Ammann · 2025-01-31T01:43:05.787Z · comments (14)

Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
Sonia Joseph (redhat) · 2024-03-13T17:09:17.027Z · comments (13)

How To Do Patching Fast
Joseph Miller (Josephm) · 2024-05-11T20:13:52.424Z · comments (6)

Announcing Atlas Computing
miyazono · 2024-04-11T15:56:31.241Z · comments (4)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (9)

[link] Level up your spreadsheeting
angelinahli · 2024-05-25T14:57:19.730Z · comments (11)

On Dwarkesh Patel’s 4th Podcast With Tyler Cowen
Zvi · 2025-01-10T13:50:05.563Z · comments (7)

Monthly Roundup #24: November 2024
Zvi · 2024-11-18T13:20:06.086Z · comments (14)

[link] Review: Good Strategy, Bad Strategy
L Rudolf L (LRudL) · 2024-12-21T17:17:04.342Z · comments (0)

Games for AI Control
charlie_griffin (cjgriffin) · 2024-07-11T18:40:50.607Z · comments (0)

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
Yoshua Bengio (yoshua-bengio) · 2025-02-24T18:31:48.580Z · comments (14)

Dmitry's Koan
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-10T04:27:30.346Z · comments (8)

Practicing Bayesian Epistemology with "Two Boys" Probability Puzzles
Liron · 2025-01-02T04:42:20.362Z · comments (14)

MATS mentor selection
DanielFilan · 2025-01-10T03:12:52.141Z · comments (11)

Recent comments

ymeskhout on How to Corner Liars: A Miasma-Clearing Protocol

I'm unclear on what the distinction is exactly. This is a tutorial that works for catching a talented liar but also for creating common knowledge between yourself and a bad liar.

gergogaspar on Announcement: Learning Theory Online Course

Will you publish the curriculum online? :)

chipmonk on Do you need years of therapy, or can one conversation resolve the issue?

is it the bulleted list in the beginning? (fwiw i didn't put it there, it was my friends/editors who added that. (saying this because they don't have the financial incentive i may))

chipmonk on Do you need years of therapy, or can one conversation resolve the issue?

any suggestions for how to talk about this stuff without having it read like an advertisement? i'm genuinely interested in the idea of one-shotting and legibilizing evidence that quick growth is possible

svitka on 2. Skim the Manual: Intelligent Voluntary Cooperation

I agree and I am building a "negotiable society of digital twins" at Kwaai based on this work.

jkaufman on Tax Price Gouging?

Their definition of "Price gouging occurs in a competitive market when lowering the price from the market-clearing level would increase total Utilitarian welfare" is a bit sneaky: it means that any time I say "here's an example of where price gouging helps improve disaster response" they can just say "but that's not real price gouging, since a lower price wouldn't increase welfare".

It also doesn't look to me like the paper's approach gives a good framework for thinking about long-term investment incentives and preparation for future disasters, or people selling/renting possessions they wouldn't normally put on the market (selling air purifiers, renting spare rooms).

The paper's division of circumstances into price gouging vs not isn't a good match for the real world, and leads them to support policies like the current ones that normally don't do anything and then suddenly make large impacts in a disaster. Instead I'd like to see recognition that it's hard to determine welfare-maximizing pricing in real time and that price signals can reach very far, and instead use a mechanism that allows price increases to occur but redistributes some of the profits.

mondsemmel on MondSemmel's Shortform

AI assistants are weird. Here's a Perplexity Pro search I did for an EY tweet about finding the sweet spot between utilitarianism & deontology. Perplexity Pro immediately found the correct tweet:

Eliezer Yudkowsky, a prominent figure in the rationalist community, has indeed expressed a view that suggests finding a balance between utilitarianism and deontology. In a tweet, he stated: "Go three-quarters of the way from deontology to utilitarianism and then stop. You are now in the right place. Stay there at least until you..."

But I wondered why it didn't provide the full quote (which is just a few more words, namely "Stay there at least until you have become a god."), and I just couldn't get it to do so, even with requests like "Just quote the full tweet from here: <URL>". Instead, it invented alternative versions like this:

Go three-quarters of the way from deontology to utilitarianism and then stop. You are now in the right place. Stay there at least until you understand why.

or this:

Go three-quarters of the way from deontology to utilitarianism and then stop. You are now in the correct place. Stay there at least until you understand why you shouldn't go any further.

I finally provided the full quote and asked it directly:

Does the following quote represent Yudkowsky's tweet with 100% accuracy?
"Go three-quarters of the way from deontology to utilitarianism and then stop. You are now in the right place. Stay there at least until you have become a god."

And it still doubled down on the wrong version.

ank on ank's Shortform

We can build the Artificial Static Place Intelligence – instead of creating AI/AGI agents that are like librarians who only give you quotes from books and don’t let you enter the library itself to read the whole books. Why not expose the whole library – the entire multimodal language model – to real people, for example, in a computer game?

To make this place easier to visit and explore, we could make a digital copy of our planet Earth and somehow expose the contents of the multimodal language model to everyone in a familiar, user-friendly UI of our planet.

We should not keep it hidden behind the strict librarian (AI/AGI agent) that imposes rules on us to only read little quotes from books that it spits out while it itself has the whole output of humanity stolen.

We can explore The Library without any strict guardian in the comfort of our simulated planet Earth on our devices, in VR, and eventually through some wireless brain-computer interface (it would always remain a game that no one is forced to play, unlike the agentic AI-world that is being imposed on us more and more right now and potentially forever).

If you found it interesting, we discussed it here recently.

christiankl on How to Make Superbabies

Do you have hope that someone else does the required research, so that it's ready by the time the first superbabies are created?

If not, do you think it's okay to create superintelligent babies without it?

cole-wyeth on Daniel Kokotajlo's Shortform

I'm not actually relying on a heuristic, I'm compressing https://www.lesswrong.com/posts/vvgND6aLjuDR6QzDF/my-model-of-what-is-going-on-with-llms

If you extrapolate capability graphs in the most straightforward way, you get the result that AGI should arrive around 2027-2028. Scenario analyses (like the ones produced by Kokotajlo and Aschenbrenner) tend to converge on the same result.

If you extrapolate log GDP growth or the value of the S&P 500, superintelligence would not be anticipated any time soon. If you extrapolate the number of open mathematical theorems proved by LLMs you get ~a constant at 0. You have to decide which straight line you expect to stay straight - what Aschenbrenner did is not objective, and I don't know about Kokotajlo but I doubt it was meaningfully independent.

We mostly solved egg frying and laundry folding last year with Aloha and Optimus, which were some of the most long-standing issues in robotics. So human level robots in 2024 would actually have been an okay prediction. Actual human level probably requires human level intelligence, so 2027.

Interesting, link?

This reasoning feels a little motivated though - I think it would be obvious if we had human(-laborer)-level robots because they'd be walking around doing stuff. I've worked in robotics research a little bit and I can tell you that setting up a demo for an isolated task is VERY different from selling a product that can do it, let alone one product that can seamlessly transition between many tasks.