LessWrong 2.0 Reader

[link] How AI Takeover Might Happen in 2 Years
joshc (joshua-clymer) · 2025-02-07T17:10:10.530Z · comments (2)

So You Want To Make Marginal Progress...
johnswentworth · 2025-02-07T23:22:19.825Z · comments (12)

Racing Towards Fusion and AI
Jeffrey Heninger (jeffrey-heninger) · 2025-02-07T20:40:56.798Z · comments (6)

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
ChengCheng (ccstan99) · 2025-02-07T03:57:30.904Z · comments (0)

On the Meta and DeepMind Safety Frameworks
Zvi · 2025-02-07T13:10:08.449Z · comments (1)

A Problem to Solve Before Building a Deception Detector
Eleni Angelou (ea-1) · 2025-02-07T19:35:23.307Z · comments (0)

[link] Research directions Open Phil wants to fund in technical AI safety
jake_mendel · 2025-02-08T01:40:00.968Z · comments (0)

Reasons-based choice and cluelessness
JesseClifton · 2025-02-07T22:21:47.232Z · comments (0)

'High-Level Machine Intelligence' and 'Full Automation of Labor' in the AI Impacts Surveys
Jeffrey Heninger (jeffrey-heninger) · 2025-02-07T20:40:52.388Z · comments (0)

[link] Request for Information for a new US AI Action Plan (OSTP RFI)
agucova · 2025-02-07T20:40:36.034Z · comments (0)

[Translation] In the Age of AI don't Look for Unicorns
mushroomsoup · 2025-02-07T21:06:24.198Z · comments (0)

When you downvote, explain why
KvmanThinking (avery-liu) · 2025-02-07T01:03:44.097Z · comments (21)

[link] Request for proposals: improving capability evaluations
cb · 2025-02-07T18:51:34.926Z · comments (0)

Introducing SyDFAIS: A Systemic Design Framework for AI Safety Field-Building
Moneer Moukaddem (moneer-moukaddem) · 2025-02-07T18:51:24.067Z · comments (0)

the devil's ontology
lostinwilliamsburg · 2025-02-07T14:18:52.516Z · comments (3)

Recent comments

davey-morse on Davey Morse's Shortform

does anyone think now that it's still possible to prevent recursively self-improving agents? esp now that r1 is open-source... materials for smart self-iterating agents seem accessible to millions of developers.

prompted in particular by the circulation of this essay in the past three days https://huggingface.co/papers/2502.02649

davey-morse on Davey Morse's Shortform

As far as I can tell, OAI's new current safety practices page only names safety issues related to current LLMs, not agents powered by LLMs. https://openai.com/index/openai-safety-update/

Am I missing another section/place where they address x-risk?

mishka on Racing Towards Fusion and AI

I'd say that the ability to produce more energy overall than what is being spent on the whole cycle would count as a "GPT-3 moment". No price constraints, so it does not need to reach the level of "economically feasible", but it should stop being "net negative" energy-wise (when one honestly counts all energy inputs needed to make it work).

I, of course, don't know how to translate Q into this. GPT-4o tells me that it thinks that Q=10 is what is approximately needed for that (for "Engineering Break-even (reactor-level energy balance)"), at least for some of the designs, and Q in the neighborhood of 20-30 is what's needed for economic viability, but I don't really know if these are good estimates.

But assuming that these estimates are good, Q passing 10 would count as the GPT-3 moment.

What happens then might depend on the economic forecast (what's the demand for energy, what are expected profits, and so on). If they only expect to make profits typical for public utilities, and the whole thing is still heavily oriented towards publicly regulated setups, I would expect continuing collaboration.

If they expect some kind of super-profits, with market share being really important and with expectations of chunks of it being really lucrative, then I would not bet on continuing collaboration too much...

johnswentworth on So You Want To Make Marginal Progress...

Yup, if you actually have enough knowledge to narrow it down to e.g. a 65% chance of one particular major route, then you're good. The challenging case is when you have no idea what the options even are for the major route, and the possibility space is huge.

martin-randall on So You Want To Make Marginal Progress...

The fourth friend, Becky the Backward Chainer, started from their hotel in LA and...

Well, no. She started at home with a telephone directory. A directory seems intelligent but is actually a giant look-up table. It gave her the hotel phone number. Ring ring.

Heidi the Hotel Receptionist: Hello?

Becky: Hi, we have a reservation for tomorrow evening. I'm back-chaining here, what's the last thing we'll do before arriving?

Heidi: It's traditional to walk in through the doors to reception. You could park on the street, or we have a parking lot that's a dollar a night. That sounds cheap but it's not because we're in the past. Would you like to reserve a spot?

Becky: Yes please, we're in the past so our car's easy to break into. What's the best way to drive to the parking lot, and what's the best way to get from the parking lot to reception?

Heidi: We have signs from the parking lot to reception. Which way are you driving in from?

Becky: Ah, I don't know, Alice is taking care of that, and she's stepped out to get more string.

Heidi: Oh, sure, can't plan a car trip without string. In the future we'll have pet nanotech spiders that can make string for us, road trips will never be the same. Anyway, you'll probably be coming in via Highway 101, or maybe via the I-5, so give us a buzz when you know.

Becky: Sorry, I'm actually calling from an analogy, so we're planning everything in parallel.

Heidi: No worries, I get stuck in thought experiments all the time. Yesterday my friend opened a box and got a million dollars, no joke. Look, get something to take notes and I'll give you directions from the three main ways you could be coming in.

Becky: Ack! Hang on while I...

Gerald the General Helper: Here's a pen, Becky.

Trevor the Clever: Get off the phone! I need to call a gas station!

Susan the Subproblem Solver: Alice, I found some string and.... Hey, where's Alice?

ruby on So You Want To Make Marginal Progress...

This doesn't seem right. Suppose there are two main candidates for how to get there, I-5 and J-6 (but who knows, maybe we'll be surprised by a K-7) and I don't know which Alice will choose. Suppose I know there's already a Very General Helper and Kinda Decent Generalizer, then I might say "I assign 65% chance that Alice is going to choose the I-5 and will try to contribute having conditioned on that". This seems like a reasonable thing to do. It might be for naught, but I'd guess in many cases the EV of something definitely helpful if we go down Route A is better than the EV of finding something that's helpful no matter the choice.

One should definitely track the major route they're betting on and make updates and maybe switch, but seems okay to say your plan is conditioning on some bigger plan.

bgold on In response to critiques of Guaranteed Safe AI

Minor point: It seems unfair to accuse GSAI of being vaporware. It has been less than a year since the GSAI paper came out and 1.5 years since Tegmark/Omohundro's Provably Safe paper, and there are many projects being actively funded through ARIA and others that should serve as tests. No GSAI researchers that I know of promised significant projects in 2024 - in fact several explicitly think the goal should be to do deconfusion and conceptual work now and plan to leverage the advances in autoformalization and AI-assisted coding that are coming down the pipe fast.

While I agree that there are not yet compelling demonstrations, this hardly seems at the level of Duke Nukem Forever!

ruby on What is malevolence? On the nature, measurement, and distribution of dark traits

Curated. This piece definitely got me thinking. If we grant that some people are unusually altruistic, empathetic, etc., it stands to reason that there are others on the other end of various distributions. And then we should also expect various selection effects on where they end up.

It was definitely a puzzle piece clicking for me that these traits can coexist with [genuine] moral conviction and that the traits are egodystonic. This rings true but somehow hasn't been an explicit model for me, but yes. Combine with this the difficulty of detecting these traits and resultant behaviors...and yeah, there's stuff here to think about.

I appreciate that the authors were thorough in their research but don't especially love the format. This was pretty dense and I think a post that pulled out the most key pieces of info and argued for some conclusions would be a better read, but I much prefer this to no post.

To the extent I should add my own opinions to curation notices, my thought is this makes me update against "benefit of the doubt" when witnessing concerning behaviors. I don't know that everyone beginning to scrutinize everyone else for having big D vibes would be good, but I do think scrutinizing behaviors for being high-integrity, cooperative, transparent, etc. might actually be a good direction – with the understanding that good norms around acceptable behaviors prevent abuses that anyone (however much D) is tempted towards. Something like we want to build "robust-to-malevolence" orgs and community that make it impractical or too costly to manipulate, etc.

johnswentworth on So You Want To Make Marginal Progress...

Yeah ok. Seems very unlikely to actually happen, and unsure whether it would even work in principle (as e.g. scaling might not take you there at all, or might become more resource intensive faster than the AIs can produce more resources). But I buy that someone could try to intentionally push today's methods (both AI and alignment) to far superintelligence and simply turn down any opportunity to change paradigm.

rebecca_records on Open Thread Winter 2024/2025

Hi, I'd like to start creating some wiki pages to help organize information. Would appreciate an upvote so I can get started. Thanks!