LessWrong 2.0 Reader

Estimating Tail Risk in Neural Networks
Mark Xu (mark-xu) · 2024-09-13T20:00:06.921Z · comments (9)

[New Feature] Your Subscribed Feed
Ruby · 2024-06-11T22:45:00.000Z · comments (8)

MATS AI Safety Strategy Curriculum
Ronny Fernandez (ronny-fernandez) · 2024-03-07T19:59:37.434Z · comments (2)

Different senses in which two AIs can be “the same”
Vivek Hebbar (Vivek) · 2024-06-24T03:16:43.400Z · comments (0)

AI #39: The Week of OpenAI
Zvi · 2023-11-23T15:10:04.865Z · comments (8)

AI #42: The Wrong Answer
Zvi · 2023-12-14T14:50:05.086Z · comments (6)

[link] Open Source Automated Interpretability for Sparse Autoencoder Features
kh4dien · 2024-07-30T21:11:36.866Z · comments (1)

o1-preview is pretty good at doing ML on an unknown dataset
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-09-20T08:39:49.927Z · comments (1)

[link] Why not electric trains and excavators?
bhauth · 2023-11-21T00:07:17.967Z · comments (39)

Indecision and internalized authority figures
Kaj_Sotala · 2024-07-06T10:10:02.528Z · comments (1)

[link] The economics of space tethers
harsimony · 2024-08-22T16:15:22.699Z · comments (22)

"Fractal Strategy" workshop report
Raemon · 2024-04-06T21:26:53.263Z · comments (22)

Introducing AI-Powered Audiobooks of Rational Fiction Classics
Askwho · 2024-05-04T17:32:49.719Z · comments (14)

What and Why: Developmental Interpretability of Reinforcement Learning
Garrett Baker (D0TheMath) · 2024-07-09T14:09:40.649Z · comments (4)

Don't Share Information Exfohazardous on Others' AI-Risk Models
Thane Ruthenis · 2023-12-19T20:09:06.244Z · comments (11)

Why Large Bureaucratic Organizations?
johnswentworth · 2024-08-27T18:30:07.422Z · comments (52)

Ophiology (or, how the Mamba architecture works)
Danielle Ensign (phylliida-dev) · 2024-04-09T19:31:09.975Z · comments (8)

Timaeus is hiring!
Jesse Hoogland (jhoogland) · 2024-07-12T23:42:28.651Z · comments (6)

SB 1047 Is Weakened
Zvi · 2024-06-06T13:40:41.547Z · comments (4)

How to be an amateur polyglot
arisAlexis (arisalexis) · 2024-05-08T15:08:11.404Z · comments (16)

AI #35: Responsible Scaling Policies
Zvi · 2023-10-26T13:30:02.439Z · comments (10)

Friendship is transactional, unconditional friendship is insurance
Ruby · 2024-07-17T22:52:41.967Z · comments (24)

[link] Funding case: AI Safety Camp
Remmelt (remmelt-ellen) · 2023-12-12T09:08:18.911Z · comments (5)

[link] Most experts believe COVID-19 was probably not a lab leak
DanielFilan · 2024-02-02T19:28:00.319Z · comments (89)

[link] Shane Legg interview on alignment
Seth Herd · 2023-10-28T19:28:52.223Z · comments (20)

Reinforcement Via Giving People Cookies
Screwtape · 2023-11-15T04:34:21.119Z · comments (9)

Preventing model exfiltration with upload limits
ryan_greenblatt · 2024-02-06T16:29:33.999Z · comments (21)

OpenAI's Preparedness Framework: Praise & Recommendations
Akash (akash-wasil) · 2024-01-02T16:20:04.249Z · comments (1)

minutes from a human-alignment meeting
bhauth · 2024-05-24T05:01:53.904Z · comments (4)

AE Studio @ SXSW: We need more AI consciousness research (and further resources)
AE Studio (AEStudio) · 2024-03-26T20:59:09.129Z · comments (8)

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Joar Skalse (Logical_Lunatic) · 2024-05-17T19:13:31.380Z · comments (10)

Out-of-distribution Bioattacks
jefftk (jkaufman) · 2023-12-02T12:20:05.626Z · comments (15)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (4)

[Intuitive self-models] 3. The Homunculus
Steven Byrnes (steve2152) · 2024-10-02T15:20:18.394Z · comments (27)

An AI Race With China Can Be Better Than Not Racing
niplav · 2024-07-02T17:57:36.976Z · comments (32)

OpenAI: Altman Returns
Zvi · 2023-11-30T14:10:05.469Z · comments (12)

[link] Towards Understanding Sycophancy in Language Models
Ethan Perez (ethan-perez) · 2023-10-24T00:30:48.923Z · comments (0)

EIS XIV: Is mechanistic interpretability about to be practically useful?
scasper · 2024-10-11T22:13:51.033Z · comments (4)

Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark (hojmax) · 2024-07-22T16:17:07.665Z · comments (0)

[link] So you want to save the world? An account in paladinhood
Tamsin Leake (carado-1) · 2023-11-22T17:40:33.048Z · comments (19)

How a chip is designed
YM (Yannick_Muehlhaeuser_duplicate0.05902100825326273) · 2024-06-28T08:04:27.392Z · comments (4)

2. Corrigibility Intuition
Max Harms (max-harms) · 2024-06-08T15:52:29.971Z · comments (10)

Schelling game evaluations for AI control
Olli Järviniemi (jarviniemi) · 2024-10-08T12:01:24.389Z · comments (5)

Interpreting and Steering Features in Images
Gytis Daujotas (gytis-daujotas) · 2024-06-20T18:33:59.512Z · comments (6)

[link] How LDT helps reduce the AI arms race
Tamsin Leake (carado-1) · 2023-12-10T16:21:44.409Z · comments (13)

AI #69: Nice
Zvi · 2024-06-20T12:40:02.566Z · comments (9)

Do Not Mess With Scarlett Johansson
Zvi · 2024-05-22T15:10:03.215Z · comments (7)

METR is hiring!
Beth Barnes (beth-barnes) · 2023-12-26T21:00:50.625Z · comments (1)

[question] What's with all the bans recently?
[deleted] · 2024-04-04T06:16:49.062Z · answers+comments (83)

[question] Will quantum randomness affect the 2028 election?
Thomas Kwa (thomas-kwa) · 2024-01-24T22:54:30.800Z · answers+comments (52)

Recent comments

daniel-kokotajlo on Daniel Kokotajlo's Shortform

Sad to hear. Is this thread itself (starting with my parent comment which you replied to) an example of this, or are you referring instead to previous engagements/threads on LW?

johnswentworth on There aren't enough smart people in biology doing something boring

Right, thus the large sales force. Standard B2B business model where the product is mediocre but there's a strong sales team convincing idiots in suits to pay ridiculous amounts of money for it.

anthonyc on What's a good book for a technically-minded 11-year old?

And here I was hoping it would prompt someone to look things up or talk about them with the person who recommended the book.

sharmake-farah on The Mask Comes Off: At What Price?

I definitely agree that, under the more common usage of "safety," an AI that took over the world or broke laws because its owner ordered it to would not be classified as safe. But in an AI safety context, alignment/safety does usually mean that such outcomes would be classified as safe.

My own view is that the technical problem is shaping up to be relatively easy, but I think the political problems of advanced AI will probably prove a lot harder, especially in a future where humans control AIs for a long time.

christiankl on There aren't enough smart people in biology doing something boring

"Making money at all in biology requires being a therapeutics company, which requires you to do something exciting."

Illumina has a market cap of $22.77 billion. There was a time when Theranos had a high valuation, even though it ultimately failed to develop the technology behind it.

It's possible to make a lot of money building tools; it's just that most of the capital is therapeutics-focused rather than tool-focused. However, therapeutics-focused vs. tool-focused is not the same distinction as boring vs. interesting. Neither Illumina nor Theranos is boring. AlphaFold was exciting, but there's still a reason it was developed at Google and not at a big pharma company.

On the question of incubators: there's probably a company that sells them, and the software that runs them is closed-source, so it's hard for anyone besides the incubator company to provide software to control them.

The first sales page I found for an incubator is https://www.thermofisher.com/order/catalog/product/51031528?SID=srch-srp-51031528 . If you want to start an incubator company whose product can do everything the Thermo Fisher incubator does and additionally has Wi-Fi and an app, you have to do a lot of work just to match the features of the existing incubator. Even if you could produce the product, I expect it would not be easy to sell it and get people to trust that you have a better product than Thermo Fisher.

Thermo Fisher likely does market analysis and would build an app for its incubator if it thought customers wanted one; presumably it currently sees no demand.

Perhaps the very fact that an incubator-control app is boring is what makes it hard to sell incubators on the strength of one.

raghuvar-nadig on OpenAI defected, but we can take honest actions

Thanks! I should have been more clear that the trajectory toward level 5 (with all human virtue/trust being hackable for instrumental gains) itself is concerning, not just the eventual leap when it gets there.

lalartu on The Personal Implications of AGI Realism

This chain of logic is founded on the assumption that these technologies are possible, which I find highly dubious. If an (aligned) superintelligence is built and we ask it for life extension, the most probable answer would be that biological immortality (and anything requiring nanorobots) is just plain impossible, and that brain uploading wouldn't help because your copy is not you.

christiankl on There aren't enough smart people in biology doing something boring

Somehow DocuSign got the Swiss government to pay it a lot of money for providing e-signatures, instead of the government having that service provided orders of magnitude more cheaply by a startup with two full-time developers. No companies are using the existence of AWS to do disruptive innovation and eat away at DocuSign's profits.

akash-wasil on What AI companies should do: Some rough ideas

I think I agree with this idea in principle, but I also feel like it misses some things in practice (or something). Some considerations:

I think my bar for "how much I trust a lab such that I'm OK with them not making transparency commitments" is fairly high. I don't think any existing lab meets that bar.
I feel like a lot of forms of helpful transparency are not that costly. The main 'cost' feels to me like "maybe the government will end up regulating the sector if/when it understands how dangerous industry people expect AI systems to be and how many safety/security concerns they have". But I think things like "report dangerous stuff to the govt", "have a whistleblower mechanism", and even "make it clear that you're willing to have govt people come and ask about safety/security concerns" don't seem very costly from a time/effort perspective.
If a Responsible Company implemented transparency stuff unilaterally, it would make it easier for the government to have proof-of-concept and implement the same requirements for other companies. In a lot of cases, showing that a concept works for company X (and that company X actually thinks it's a good thing) can reduce a lot of friction in getting things applied to companies Y and Z.

I do agree that some of this depends on the type of transparency commitment, and there might be specific types of transparency commitments that don't make sense to pursue unilaterally. Off the top of my head, I can't think of any transparency requirements that I wouldn't want to see implemented unilaterally, and I can think of several that I would want to see (e.g., dangerous capability reports, capability forecasts, whistleblower mechanisms, sharing if-then plans with govt, sharing shutdown plans with govt, setting up an interview program with govt, engaging publicly with threat models, having clear OpenAI-style tables that spell out which dangerous capabilities you're tracking/expecting).

bogdan-ionut-cirstea on The case for unlearning that removes information from LLM weights

This would seem like a great benchmark/dataset/eval to apply automated research to. Would you have thoughts/recommendations on that?