LessWrong 2.0 Reader

What and Why: Developmental Interpretability of Reinforcement Learning
Garrett Baker (D0TheMath) · 2024-07-09T14:09:40.649Z · comments (4)

o1-preview is pretty good at doing ML on an unknown dataset
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-09-20T08:39:49.927Z · comments (1)

Don't Share Information Exfohazardous on Others' AI-Risk Models
Thane Ruthenis · 2023-12-19T20:09:06.244Z · comments (11)

[link] Open Source Automated Interpretability for Sparse Autoencoder Features
kh4dien · 2024-07-30T21:11:36.866Z · comments (1)

Timaeus is hiring!
Jesse Hoogland (jhoogland) · 2024-07-12T23:42:28.651Z · comments (6)

AI #42: The Wrong Answer
Zvi · 2023-12-14T14:50:05.086Z · comments (6)

"Fractal Strategy" workshop report
Raemon · 2024-04-06T21:26:53.263Z · comments (22)

AI #39: The Week of OpenAI
Zvi · 2023-11-23T15:10:04.865Z · comments (8)

SB 1047 Is Weakened
Zvi · 2024-06-06T13:40:41.547Z · comments (4)

Why Large Bureaucratic Organizations?
johnswentworth · 2024-08-27T18:30:07.422Z · comments (52)

[link] Why not electric trains and excavators?
bhauth · 2023-11-21T00:07:17.967Z · comments (39)

Friendship is transactional, unconditional friendship is insurance
Ruby · 2024-07-17T22:52:41.967Z · comments (24)

EIS XIV: Is mechanistic interpretability about to be practically useful?
scasper · 2024-10-11T22:13:51.033Z · comments (4)

Implementing activation steering
Annah (annah) · 2024-02-05T17:51:55.851Z · comments (7)

How to be an amateur polyglot
arisAlexis (arisalexis) · 2024-05-08T15:08:11.404Z · comments (16)

Out-of-distribution Bioattacks
jefftk (jkaufman) · 2023-12-02T12:20:05.626Z · comments (15)

OpenAI: Altman Returns
Zvi · 2023-11-30T14:10:05.469Z · comments (12)

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Joar Skalse (Logical_Lunatic) · 2024-05-17T19:13:31.380Z · comments (10)

Reinforcement Via Giving People Cookies
Screwtape · 2023-11-15T04:34:21.119Z · comments (9)

[link] Most experts believe COVID-19 was probably not a lab leak
DanielFilan · 2024-02-02T19:28:00.319Z · comments (89)

Current safety training techniques do not fully transfer to the agent setting
Simon Lermen (dalasnoin) · 2024-11-03T19:24:51.537Z · comments (6)

Preventing model exfiltration with upload limits
ryan_greenblatt · 2024-02-06T16:29:33.999Z · comments (21)

AE Studio @ SXSW: We need more AI consciousness research (and further resources)
AE Studio (AEStudio) · 2024-03-26T20:59:09.129Z · comments (8)

An AI Race With China Can Be Better Than Not Racing
niplav · 2024-07-02T17:57:36.976Z · comments (32)

OpenAI's Preparedness Framework: Praise & Recommendations
Akash (akash-wasil) · 2024-01-02T16:20:04.249Z · comments (1)

[link] Funding case: AI Safety Camp
Remmelt (remmelt-ellen) · 2023-12-12T09:08:18.911Z · comments (5)

minutes from a human-alignment meeting
bhauth · 2024-05-24T05:01:53.904Z · comments (4)

2. Corrigibility Intuition
Max Harms (max-harms) · 2024-06-08T15:52:29.971Z · comments (10)

Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours
Seth Herd · 2024-08-05T15:38:09.682Z · comments (22)

Interpreting and Steering Features in Images
Gytis Daujotas (gytis-daujotas) · 2024-06-20T18:33:59.512Z · comments (6)

[link] On Shifgrethor
JustisMills · 2024-10-27T15:30:13.688Z · comments (17)

[question] What's with all the bans recently?
[deleted] · 2024-04-04T06:16:49.062Z · answers+comments (83)

Advice to junior AI governance researchers
Akash (akash-wasil) · 2024-07-08T19:19:07.316Z · comments (1)

Schelling game evaluations for AI control
Olli Järviniemi (jarviniemi) · 2024-10-08T12:01:24.389Z · comments (5)

METR is hiring!
Beth Barnes (beth-barnes) · 2023-12-26T21:00:50.625Z · comments (1)

AI #69: Nice
Zvi · 2024-06-20T12:40:02.566Z · comments (9)

Do Not Mess With Scarlett Johansson
Zvi · 2024-05-22T15:10:03.215Z · comments (7)

[question] Will quantum randomness affect the 2028 election?
Thomas Kwa (thomas-kwa) · 2024-01-24T22:54:30.800Z · answers+comments (52)

[link] The Perceptron Controversy
Yuxi_Liu · 2024-01-10T23:07:23.341Z · comments (18)

[link] So you want to save the world? An account in paladinhood
Tamsin Leake (carado-1) · 2023-11-22T17:40:33.048Z · comments (19)

[link] How LDT helps reduce the AI arms race
Tamsin Leake (carado-1) · 2023-12-10T16:21:44.409Z · comments (13)

[link] Static Analysis As A Lifestyle
adamShimi · 2024-07-03T18:29:37.384Z · comments (11)

How a chip is designed
YM (Yannick_Muehlhaeuser_duplicate0.05902100825326273) · 2024-06-28T08:04:27.392Z · comments (4)

SAEs (usually) Transfer Between Base and Chat Models
Connor Kissane (ckkissane) · 2024-07-18T10:29:46.138Z · comments (0)

[Interim research report] Activation plateaus & sensitive directions in GPT2
StefanHex (Stefan42) · 2024-07-05T17:05:25.631Z · comments (2)

Book Review: On the Edge: The Fundamentals
Zvi · 2024-09-23T13:40:11.058Z · comments (3)

On the Gladstone Report
Zvi · 2024-03-20T19:50:05.186Z · comments (11)

Complex systems research as a field (and its relevance to AI Alignment)
Nora_Ammann · 2023-12-01T22:10:25.801Z · comments (11)

A to Z of things
KatjaGrace · 2023-11-17T05:20:03.134Z · comments (6)

How to Control an LLM's Behavior (why my P(DOOM) went down)
RogerDearnaley (roger-d-1) · 2023-11-28T19:56:49.679Z · comments (30)

Recent comments

nunosempere on Why I’m not a Bayesian

Maybe you could address these problems, but could you do so in a way that is "computationally cheap"? E.g., for forecasting on something like extinction, it is much easier to forecast on a vague outcome than to precisely define it.

martinkunev on Correspondence visualizations for different interpretations of "probability"

frequentist correspondence is the only type that has any hope of being truly objective

I'd counter this.

If I have enough information about an event and enough computational power, I get only objectively true and false statements. There are limits to my knowledge of the laws of the universe and of the event in question (e.g. due to measurement limits), and limits to my computational power. The situation is further complicated by being embedded in the universe and by epistemic concerns (e.g. do I trust my eyes and cognition?).

The need for a concept of "probability" comes from all these limits. There is nothing objective about it.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

That's why I said: "In expectation", "win or lose"

That the coinflip came out one way rather than another doesn't prove the guy had actual inside knowledge. He bought a large part of the shares at crazy odds because his market impact moved the price so much.

But yes, he could be a sharp in sheep's clothing. I doubt it, but who knows.

Point is that the winners contribute epistemics and the losers contribute money. The real winner is society [if the questions are about socially-relevant topics].

nunosempere on Survival without dignity

I have a writeup on solar storm risk here that could be of interest.

charlie-steiner on How to put California and Texas on the campaign trail!

So a proportional vote interstate compact? :)

I like it - I think one could specify an automatic method for striking a fair bargain between states (and only include states that use that method in the bargain). Then you could have states join the compact asynchronously.

E.g. if the goal is to have the pre-campaign expected electors be the same, and Texas went 18/40 Biden in 2020 while California went 20/54 Trump in 2020, maybe in 2024 Texas assigns all its electors proportionally, while California assigns 49 electors proportionally and the remaining 5 by majority. That would cause the numbers to work out the same (plus or minus a rounding error).

Suppose Connecticut also wants to join the compact, but it's also a blue state. I think the obvious thing to do is to distribute the expected minority electors proportional to total elector count - if Connecticut has 7 electors, it's responsible for balancing 7/61 of the 18 minority electors that are being traded, or just about exactly 2 of them.

But the rounding is sometimes awkward - if we lived in a universe where Connecticut had 9 electors instead, it would be responsible for just about exactly 2.5 minority electors, which is super awkward especially if a lot of small states join and start accumulating rounding errors.

What you could do instead is specify a loss function: you take the variance of the proportion of electors assigned proportionally among the states that are on the 'majority' side of the deal, multiply that by a constant (probably something small like 0.05, but obviously you do some simulations and pick something more informed), add the squared rounding error of expected minority electors, and that's your measure for how imperfect the assignment of proportional electors to states is. Then you just pick the assignment that's least imperfect.
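The loss function described above can be sketched in Python. This is only one reading of the comment, not a definitive implementation: the names `imperfection` and `best_assignment` are hypothetical, the "squared rounding error" is assumed to be the squared gap between the total expected minority electors and the target being traded, and the default weight of 0.05 is the placeholder constant mentioned above. The search is a plain brute force over every possible assignment.

```python
from itertools import product
from statistics import pvariance


def imperfection(assignment, electors, minority_share, target, weight=0.05):
    """Loss for one candidate assignment (assumed reading of the comment).

    assignment[i]     -- electors state i assigns proportionally
    electors[i]       -- total electors of state i
    minority_share[i] -- expected minority vote share in state i
    target            -- minority electors the majority side should yield
    """
    # Fraction of each state's electors that is assigned proportionally.
    fractions = [k / e for k, e in zip(assignment, electors)]
    # Variance term: penalises treating majority-side states unevenly.
    spread = pvariance(fractions)
    # Expected minority electors produced by this assignment.
    expected = sum(k * p for k, p in zip(assignment, minority_share))
    # Weighted variance plus squared rounding error.
    return weight * spread + (expected - target) ** 2


def best_assignment(electors, minority_share, target, weight=0.05):
    """Try every assignment and return the least imperfect one."""
    candidates = product(*(range(e + 1) for e in electors))
    return min(
        candidates,
        key=lambda a: imperfection(a, electors, minority_share, target, weight),
    )


# California alone, balancing Texas's 18 minority electors (figures from above).
print(best_assignment([54], [20 / 54], target=18))  # prints (49,)
```

With California as the only majority-side state the variance term is zero, and the search lands on the 49 proportional electors from the earlier example. Brute force grows exponentially with the number of states, so a larger compact would need a smarter search.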

Add in some automated escape hatches in case of change of major parties, change of voting system, or being superseded by a more ambitious interstate compact, and bada bing.

q-home on Stable Pointers to Value II: Environmental Goals

I don't understand the Model-Utility Learning (MUL) section. What pathological behavior does the AI exhibit?

Since humans (or something) must be labeling the original training examples, the hypothesis that building bridges means “what humans label as building bridges” will always be at least as accurate as the intended classifier. I don’t mean “whatever humans would label”. I mean the hypothesis that “build a bridge” means specifically the physical situations which were recorded as training examples for this system in particular, and labeled by humans as such.

So it's like overfitting? If I train MUL AI to play piano in a green room, MUL AI learns that "playing piano" means "playing piano in a green room" or "playing piano in a room which would be chosen for training me in the past"?

Now, we might reasonably expect that if the AI considers a novel way of “fooling itself” which hasn’t been given in a training example, it will reject such things for the right reasons: the plan does not involve physically building a bridge.

But "sensory data being a certain way" is a physical event which happens in reality, so a MUL AI might still learn to be a solipsist? MUL doesn't guarantee solving misgeneralization in any way?

If the answer to my questions is "yes", what did we even hope for with MUL?

benito on Are Your Enemies Innately Evil?

Do you think this is typical of people you know?

kave on Are Your Enemies Innately Evil?

I saw this poll and thought to myself "gosh, politics, religion and cultural opinions sure are areas where I actively try to be non-heroic, as they aren't where I wish to spend my energy".

rsaarelm on What's a good book for a technically-minded 11-year old?

Lewis Dartnell's The Knowledge - How to Rebuild Our World From Scratch is a sort of grand tour of the technological underpinnings of industrial civilization and how you might bootstrap them. Might be a bit dry, but it's popular writing and if the kid's already reading encyclopedias it should fit right in. Lots of concrete details about specific technologies.

Might go for a left field option and see what he makes of Euclid's Elements.

daniel-kokotajlo on Daniel Kokotajlo's Shortform

Thanks! I don't think the arguments you make undermine my core points. Point by point reply:

--Vietnam, Hamas, etc. have dense tunnel networks but not anywhere near dense enough. My theory predicts that there will be a phase shift at some point where it is easier to attack underground than aboveground. Clearly, it is not easier for Israel or the USA to attack underground than aboveground! And this is for several reasons, but one of them is that the networks aren't dense enough -- Hamas has many tunnels but there is still more attack surface on land than underground.
--Yes, layout of tunnels is unknown to attackers. This is the thing I was referencing when I said you can't scout from the air.
--Again, with land mines and other such traps, as tunnel density increases eventually you will need more mines to defend underground than you would need to defend aboveground!!! At this point the phase shift occurs and attackers will prefer to attack underground, mines be damned -- because the mines will actually be sparser / rarer underground!
--Psychological burden is downstream of the already-discussed factors so if the above factors favor attacking underground, so will the psychological factors.
--Yes, if the density of the network is not approximately constant, such that e.g. there is a 'belt of low density' around the city, then obviously that belt is a good place to set up defenses. This is fighting my hypothetical rather than disagreeing with it though; you are saying basically 'yeah but what if it's not dense in some places, then those places would be hard to attack.' Yes. My point simply was that in places with sufficiently dense tunnel networks, underground attacks would be easier than overground attacks.