LessWrong 2.0 Reader

Adam Smith Meets AI Doomers
James_Miller · 2024-01-31T15:53:03.070Z · comments (10)
LessWrong: After Dark, a new side of LessWrong
So8res · 2024-04-01T22:44:04.449Z · comments (5)
[link] math terminology as convolution
bhauth · 2023-10-30T01:05:11.823Z · comments (1)
[link] Why Yudkowsky is wrong about "covalently bonded equivalents of biology"
titotal (lombertini) · 2023-12-06T14:09:15.402Z · comments (40)
AI #56: Blackwell That Ends Well
Zvi · 2024-03-21T12:10:05.412Z · comments (16)
Monthly Roundup #12: November 2023
Zvi · 2023-11-14T15:20:06.926Z · comments (5)
What I Learned (Conclusion To "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-20T21:24:37.464Z · comments (0)
Unpicking Extinction
ukc10014 · 2023-12-09T09:15:41.291Z · comments (10)
Wireheading and misalignment by composition on NetHack
pierlucadoro · 2023-10-27T17:43:41.727Z · comments (4)
Linear encoding of character-level information in GPT-J token embeddings
mwatkins · 2023-11-10T22:19:14.654Z · comments (4)
[link] hydrogen tube transport
bhauth · 2024-04-18T22:47:08.790Z · comments (12)
Computational Mechanics Hackathon (June 1 & 2)
Adam Shai (adam-shai) · 2024-05-24T22:18:44.352Z · comments (5)
CHAI internship applications are open (due Nov 13)
Erik Jenner (ejenner) · 2023-10-26T00:53:49.640Z · comments (0)
The Schumer Report on AI (RTFB)
Zvi · 2024-05-24T15:10:03.122Z · comments (3)
Finding the Wisdom to Build Safe AI
Gordon Seidoh Worley (gworley) · 2024-07-04T19:04:16.089Z · comments (10)
Copyright Confrontation #1
Zvi · 2024-01-03T15:50:04.850Z · comments (7)
[link] AI governance needs a theory of victory
Corin Katzke (corin-katzke) · 2024-06-21T16:15:46.560Z · comments (6)
Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (4)
Video and transcript of presentation on Otherness and control in the age of AGI
Joe Carlsmith (joekc) · 2024-10-08T22:30:38.054Z · comments (1)
(Maybe) A Bag of Heuristics is All There Is & A Bag of Heuristics is All You Need
Sodium · 2024-10-03T19:11:58.032Z · comments (16)
ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct
25Hour (aaron-kaufman) · 2024-10-05T11:30:11.953Z · comments (2)
[link] Book review: On the Edge
PeterMcCluskey · 2024-08-30T22:18:39.581Z · comments (0)
The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)
[question] If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?
KvmanThinking (avery-liu) · 2024-10-03T11:31:19.974Z · answers+comments (33)
Augmenting Statistical Models with Natural Language Parameters
jsteinhardt · 2024-09-20T18:30:10.816Z · comments (0)
My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (44)
Proveably Safe Self Driving Cars [Modulo Assumptions]
Davidmanheim · 2024-09-15T13:58:19.472Z · comments (26)
The Cognitive Bootcamp Agreement
Raemon · 2024-10-16T23:24:05.509Z · comments (0)
[link] Information dark matter
Logan Kieller (logan-kieller) · 2024-10-01T15:05:41.159Z · comments (4)
[link] Fake Deeply
Zack_M_Davis · 2023-10-26T19:55:22.340Z · comments (7)
An illustrative model of backfire risks from pausing AI research
Maxime Riché (maxime-riche) · 2023-11-06T14:30:58.615Z · comments (3)
"Which chains-of-thought was that faster than?"
Emrik (Emrik North) · 2024-05-22T08:21:00.269Z · comments (4)
Disentangling four motivations for acting in accordance with UDT
Julian Stastny · 2023-11-05T21:26:22.514Z · comments (3)
Update #2 to "Dominant Assurance Contract Platform": EnsureDone
moyamo · 2023-11-28T18:02:50.367Z · comments (2)
Templates I made to run feedback rounds for Ethan Perez’s research fellows.
Henry Sleight (ResentHighly) · 2024-03-28T19:41:15.506Z · comments (0)
Important open problems in voting
Closed Limelike Curves · 2024-07-01T02:53:44.690Z · comments (1)
[question] Is AlphaGo actually a consequentialist utility maximizer?
faul_sname · 2023-12-07T12:41:05.132Z · answers+comments (8)
We have promising alignment plans with low taxes
Seth Herd · 2023-11-10T18:51:38.604Z · comments (9)
AI Safety Strategies Landscape
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-09T17:33:45.853Z · comments (1)
AI #63: Introducing Alpha Fold 3
Zvi · 2024-05-09T14:20:03.176Z · comments (2)
The Consciousness Box
GradualImprovement · 2023-12-11T16:45:08.172Z · comments (22)
[link] On Lies and Liars
Gabriel Alfour (gabriel-alfour-1) · 2023-11-17T17:13:03.726Z · comments (4)
Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”
Tony Wang (tw) · 2023-12-15T11:05:23.256Z · comments (8)
[link] Genocide isn't Decolonization
robotelvis · 2023-10-20T04:14:07.716Z · comments (19)
Machine Unlearning Evaluations as Interpretability Benchmarks
NickyP (Nicky) · 2023-10-23T16:33:04.878Z · comments (2)
Rational Animations offers animation production and writing services!
Writer · 2024-03-15T17:26:07.976Z · comments (0)
Boston Solstice 2023 Retrospective
jefftk (jkaufman) · 2024-01-02T03:10:05.694Z · comments (0)
Regrant up to $600,000 to AI safety projects with GiveWiki
Dawn Drescher (Telofy) · 2023-10-28T19:56:06.676Z · comments (1)
Monthly Roundup #16: March 2024
Zvi · 2024-03-19T13:10:05.529Z · comments (4)
Experimentation (Part 7 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-18T21:25:56.527Z · comments (0)