LessWrong 2.0 Reader

EU policymakers reach an agreement on the AI Act
tlevin (trevor) · 2023-12-15T06:02:44.668Z · comments (7)

A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (13)

[link] [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
trevor (TrevorWiesinger) · 2024-03-28T16:03:36.452Z · comments (22)

MATS Summer 2023 Retrospective
utilistrutil · 2023-12-01T23:29:47.958Z · comments (34)

The Parable Of The Fallen Pendulum - Part 2
johnswentworth · 2024-03-12T21:41:30.180Z · comments (8)

Questions for labs
Zach Stein-Perlman · 2024-04-30T22:15:55.362Z · comments (11)

Attention SAEs Scale to GPT-2 Small
Connor Kissane (ckkissane) · 2024-02-03T06:50:22.583Z · comments (4)

Reward hacking behavior can generalize across tasks
Kei · 2024-05-28T16:33:50.674Z · comments (5)

ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (7)

OpenAI: Leaks Confirm the Story
Zvi · 2023-12-12T14:00:04.812Z · comments (9)

Send us example gnarly bugs
Beth Barnes (beth-barnes) · 2023-12-10T05:23:00.773Z · comments (10)

Secondary forces of debt
KatjaGrace · 2024-06-27T21:10:06.131Z · comments (18)

Mid-conditional love
KatjaGrace · 2024-04-17T04:00:08.341Z · comments (21)

Creating unrestricted AI Agents with Command R+
Simon Lermen (dalasnoin) · 2024-04-16T14:52:50.917Z · comments (13)

Scaffolding for "Noticing Metacognition"
Raemon · 2024-10-09T17:54:13.657Z · comments (4)

[link] AI takeoff and nuclear war
owencb · 2024-06-11T19:36:24.710Z · comments (6)

On Claude 3.0
Zvi · 2024-03-06T18:50:04.766Z · comments (5)

Grief is a fire sale
Nathan Young · 2024-03-04T01:11:06.882Z · comments (1)

Lying Alignment Chart
Zack_M_Davis · 2023-11-29T16:15:28.102Z · comments (17)

Value fragility and AI takeover
Joe Carlsmith (joekc) · 2024-08-05T21:28:07.306Z · comments (5)

[question] What could a policy banning AGI look like?
TsviBT · 2024-03-13T14:19:07.783Z · answers+comments (23)

Coherence of Caches and Agents
johnswentworth · 2024-04-01T23:04:31.320Z · comments (9)

Bitter lessons about lucid dreaming
avturchin · 2024-10-16T21:27:04.725Z · comments (61)

Universal Love Integration Test: Hitler
Raemon · 2024-01-10T23:55:35.526Z · comments (65)

[link] Are language models good at making predictions?
dynomight · 2023-11-06T13:10:36.379Z · comments (14)

Darwinian Traps and Existential Risks
KristianRonn · 2024-08-25T22:37:14.142Z · comments (14)

[link] The Offense-Defense Balance Rarely Changes
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-09T15:21:23.340Z · comments (23)

The Obliqueness Thesis
jessicata (jessica.liu.taylor) · 2024-09-19T00:26:30.677Z · comments (16)

My 10-year retrospective on trying SSRIs
Kaj_Sotala · 2024-09-22T20:30:02.483Z · comments (10)

[link] Claude 3.5 Sonnet
Zach Stein-Perlman · 2024-06-20T18:00:35.443Z · comments (41)

On the CrowdStrike Incident
Zvi · 2024-07-22T12:40:05.894Z · comments (14)

AISC9 has ended and there will be an AISC10
Linda Linsefors · 2024-04-29T10:53:18.812Z · comments (4)

[link] Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds"
mattmacdermott · 2024-02-29T13:59:34.959Z · comments (19)

Analogies between scaling labs and misaligned superintelligent AI
scasper · 2024-02-21T19:29:39.033Z · comments (5)

Rationality Quotes - Fall 2024
Screwtape · 2024-10-10T18:37:55.013Z · comments (22)

My guess at Conjecture's vision: triggering a narrative bifurcation
Alexandre Variengien (alexandre-variengien) · 2024-02-06T19:10:42.690Z · comments (12)

[Valence series] 3. Valence & Beliefs
Steven Byrnes (steve2152) · 2023-12-11T20:21:30.570Z · comments (11)

Vote on Anthropic Topics to Discuss
Ben Pace (Benito) · 2024-03-06T19:43:47.194Z · comments (55)

[link] The problems with the concept of an infohazard as used by the LW community [Linkpost]
Noosphere89 (sharmake-farah) · 2023-12-22T16:13:54.822Z · comments (43)

Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)
Elizabeth (pktechgirl) · 2024-10-22T18:20:01.194Z · comments (77)

Could randomly choosing people to serve as representatives lead to better government?
John Huang · 2024-10-21T17:10:20.920Z · comments (12)

A Simple Toy Coherence Theorem
johnswentworth · 2024-08-02T17:47:50.642Z · comments (16)

SAE-VIS: Announcement Post
CallumMcDougall (TheMcDouglas) · 2024-03-31T15:30:49.079Z · comments (8)

Neural uncertainty estimation review article (for alignment)
Charlie Steiner · 2023-12-05T08:01:32.723Z · comments (3)

Q&A on Proposed SB 1047
Zvi · 2024-05-02T15:10:02.916Z · comments (8)

[link] MIRI's June 2024 Newsletter
Harlan · 2024-06-14T23:02:23.721Z · comments (18)

Mistakes people make when thinking about units
Isaac King (KingSupernova) · 2024-06-25T03:39:20.138Z · comments (14)

Interpretability with Sparse Autoencoders (Colab exercises)
CallumMcDougall (TheMcDouglas) · 2023-11-29T12:56:21.608Z · comments (9)

Anthropic Fall 2023 Debate Progress Update
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-11-28T05:37:30.070Z · comments (9)

On the UK Summit
Zvi · 2023-11-07T13:10:04.895Z · comments (6)

Recent comments

charlie-steiner on How to put California and Texas on the campaign trail!

So a proportional vote interstate compact? :)

I like it - I think one could specify an automatic method for striking a fair bargain between states (and only include states that use that method in the bargain). Then you could have states join the compact asynchronously.

E.g. if the goal is to have the pre-campaign expected electors be the same, and Texas went 18/40 Biden in 2020 while California went 20/54 Trump in 2020, maybe in 2024 Texas assigns all its electors proportionally, while California assigns 49 electors proportionally and the remaining 5 by majority. That would cause the numbers to work out the same (plus or minus a rounding error).
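
The California figure above can be checked with a few lines of Python. This is only a sketch: it assumes "20/54 Trump" means the minority party would win 20/54 of whatever electors are assigned proportionally, and the function name `proportional_electors_needed` is made up for illustration.

```python
from fractions import Fraction

def proportional_electors_needed(total_electors, minority_share, target_minority):
    """Pick how many electors to assign proportionally so that the expected
    number of minority-party electors is as close as possible to the target."""
    # Brute force over every possible count; states have few electors.
    return min(
        range(total_electors + 1),
        key=lambda k: abs(k * minority_share - target_minority),
    )

texas_minority = 18                        # expected Biden electors in Texas
california_total = 54
california_minority_share = Fraction(20, 54)  # assumed Trump share in California

california_proportional = proportional_electors_needed(
    california_total, california_minority_share, texas_minority
)
california_by_majority = california_total - california_proportional
expected_minority = float(california_proportional * california_minority_share)

print(california_proportional, california_by_majority, round(expected_minority, 2))
```

Under these assumptions the search lands on 49 proportional electors and 5 by majority, with the leftover difference being the rounding error mentioned above.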

Suppose Connecticut also wants to join the compact, but it's also a blue state. I think the obvious thing to do is to distribute the expected minority electors proportional to total elector count - if Connecticut has 7 electors, it's responsible for balancing 7/61 of the 18 minority electors that are being traded, or just about exactly 2 of them.

But the rounding is sometimes awkward - if we lived in a universe where Connecticut had 9 electors instead, it would be responsible for just about exactly 2.5 minority electors, which is super awkward especially if a lot of small states join and start accumulating rounding errors.
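
A quick sketch of that split, assuming each majority-side state balances minority electors in proportion to its elector count (the helper name `minority_electors_to_balance` is hypothetical):

```python
def minority_electors_to_balance(state_electors, majority_side_total, traded_minority):
    """Share of the traded minority electors that one state is responsible for,
    proportional to its elector count among the majority-side states."""
    return traded_minority * state_electors / majority_side_total

# Connecticut with its 7 electors, alongside California's 54.
actual = minority_electors_to_balance(7, 54 + 7, 18)

# The hypothetical universe where Connecticut has 9 electors.
hypothetical = minority_electors_to_balance(9, 54 + 9, 18)

print(round(actual, 2), round(hypothetical, 2))
```

The first value sits close to a whole number of electors, while the second sits near the middle between two whole numbers, which is the awkward case.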

What you could do instead is specify a loss function: you take the variance of the proportion of electors assigned proportionally among the states that are on the 'majority' side of the deal, multiply that by a constant (probably something small like 0.05, but obviously you do some simulations and pick something more informed), add the squared rounding error of expected minority electors, and that's your measure for how imperfect the assignment of proportional electors to states is. Then you just pick the assignment that's least imperfect.
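
Here is one possible reading of that loss function as a brute-force search. Several things are assumptions rather than part of the proposal: the "rounding error" is taken to be the gap between total expected minority electors and the target, Connecticut's minority share of 0.4 is a placeholder, and the names `compact_loss` and `best_assignment` are invented for this sketch.

```python
from itertools import product
from statistics import pvariance

def compact_loss(assigned, electors, minority_share, target, weight=0.05):
    """weight * variance of per-state proportional fractions
    + squared gap between expected minority electors and the target."""
    proportions = [k / n for k, n in zip(assigned, electors)]
    spread = pvariance(proportions)  # 0 when there is only one state
    expected = sum(k * s for k, s in zip(assigned, minority_share))
    return weight * spread + (expected - target) ** 2

def best_assignment(electors, minority_share, target, weight=0.05):
    """Try every way of choosing how many electors each state assigns
    proportionally and return the least imperfect one."""
    candidates = product(*(range(n + 1) for n in electors))
    return min(
        candidates,
        key=lambda a: compact_loss(a, electors, minority_share, target, weight),
    )

# California alone balancing Texas's 18 expected minority electors.
solo = best_assignment([54], [20 / 54], 18)

# California plus Connecticut; 0.4 is a placeholder minority share.
electors = [54, 7]
shares = [20 / 54, 0.4]
pair = best_assignment(electors, shares, 18)
pair_expected = sum(k * s for k, s in zip(pair, shares))

print(solo, pair, round(pair_expected, 2))
```

Exhaustive search is fine for a handful of states but grows multiplicatively with each state that joins, so a real compact would want a smarter optimizer.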

Add in some automated escape hatches in case of change of major parties, change of voting system, or being superseded by a more ambitious interstate compact, and bada bing.

q-home on Stable Pointers to Value II: Environmental Goals

I don't understand the Model-Utility Learning (MUL) section. What pathological behavior does the AI exhibit?

Since humans (or something) must be labeling the original training examples, the hypothesis that building bridges means “what humans label as building bridges” will always be at least as accurate as the intended classifier. I don’t mean “whatever humans would label”. I mean the hypothesis that “build a bridge” means specifically the physical situations which were recorded as training examples for this system in particular, and labeled by humans as such.

So it's like overfitting? If I train MUL AI to play piano in a green room, MUL AI learns that "playing piano" means "playing piano in a green room" or "playing piano in a room which would be chosen for training me in the past"?

Now, we might reasonably expect that if the AI considers a novel way of “fooling itself” which hasn’t been given in a training example, it will reject such things for the right reasons: the plan does not involve physically building a bridge.

But "sensory data being a certain way" is a physical event which happens in reality, so a MUL AI might still learn to be a solipsist? So MUL doesn't guarantee a solution to misgeneralization in any way?

If the answer to my questions is "yes", what did we even hope for with MUL?

benito on Are Your Enemies Innately Evil?

Do you think this is typical of people you know?

kave on Are Your Enemies Innately Evil?

I saw this poll and thought to myself "gosh, politics, religion and cultural opinions sure are areas where I actively try to be non-heroic, as they aren't where I wish to spend my energy".

rsaarelm on What's a good book for a technically-minded 11-year old?

Lewis Dartnell's The Knowledge - How to Rebuild Our World From Scratch is a sort of grand tour of the technological underpinnings of industrial civilization and how you might bootstrap them. It might be a bit dry, but it's popular writing, and if the kid's already reading encyclopedias it should fit right in. Lots of concrete details about specific technologies.

You might also go for a left-field option and see what he makes of Euclid's Elements.

daniel-kokotajlo on Daniel Kokotajlo's Shortform

Thanks! I don't think the arguments you make undermine my core points. Point by point reply:

--Vietnam, Hamas, etc. have dense tunnel networks but not anywhere near dense enough. My theory predicts that there will be a phase shift at some point where it is easier to attack underground than aboveground. Clearly, it is not easier for Israel or the USA to attack underground than aboveground! And this is for several reasons, but one of them is that the networks aren't dense enough -- Hamas has many tunnels but there is still more attack surface on land than underground.
--Yes, layout of tunnels is unknown to attackers. This is the thing I was referencing when I said you can't scout from the air.
--Again, with land mines and other such traps, as tunnel density increases eventually you will need more mines to defend underground than you would need to defend aboveground!!! At this point the phase shift occurs and attackers will prefer to attack underground, mines be damned -- because the mines will actually be sparser / rarer underground!
--Psychological burden is downstream of the already-discussed factors so if the above factors favor attacking underground, so will the psychological factors.
--Yes, if the density of the network is not approximately constant, such that e.g. there is a 'belt of low density' around the city, then obviously that belt is a good place to set up defenses. This is fighting my hypothetical rather than disagreeing with it though; you are saying basically 'yeah but what if it's not dense in some places, then those places would be hard to attack.' Yes. My point simply was that in places with sufficiently dense tunnel networks, underground attacks would be easier than overground attacks.

vladimir_nesov on Winning isn't enough

A choice can influence the reality of the situation where it could be taken. Thus a "dominated strategy" can be winning when choosing the "better possibilities" prevents the situation where you would be considering the decision from occurring. Problem statements in classical forms (such as payoff matrices of games) prohibit such considerations. In Newcomb's problem, where "winning" is a good way of looking at what's wrong with two-boxing, the issue is that the game theory way of framing possible outcomes doesn't recognize that some of the outcomes refute the situation where the outcomes are being chosen. This is clearer in examples like Transparent Newcomb. Overall behavior of an algorithm influences whether it's given the opportunity to run in the first place.

So the relevance of "winning" isn't so much about balancing the many senses of winning across the many possibilities where some winning occurs or doesn't, expected utility vs. other framings. It's more about paying attention to which possibilities are real, and whether winning in the more central senses occurs on those possibilities or not.

quetzal_rainbow on LDT (and everything else) can be irrational

Isn't this just the no free lunch theorem? For every computable decision procedure, you can construct an environment which predicts the exact output of that decision procedure and reacts in the most damaging way, making the decision procedure perform worse than random action selection.
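
A toy version of that construction, assuming a deterministic procedure and a one-shot choice (the names `adversarial_environment` and `always_first` are hypothetical):

```python
def adversarial_environment(procedure, actions):
    """Build a payoff table by simulating the procedure and punishing
    exactly the action it will choose."""
    predicted = procedure(actions)
    return {a: (0.0 if a == predicted else 1.0) for a in actions}

def always_first(actions):
    """Stand-in for any deterministic, computable decision procedure."""
    return actions[0]

actions = ["a", "b", "c"]
payoffs = adversarial_environment(always_first, actions)

# Payoff the procedure actually gets in the environment built against it.
procedure_payoff = payoffs[always_first(actions)]

# Expected payoff of choosing uniformly at random in the same environment.
random_expected = sum(payoffs.values()) / len(actions)

print(procedure_payoff, random_expected)
```

Swapping in any other deterministic procedure gives the same picture: the environment is built after the fact to punish whatever that procedure outputs, so uniform random choice does better in it.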

scroogemcduck1 on Bets and updating

They don't need to solve the whole Halting Problem, for the same reason you wouldn't be contradicting Rice's theorem if you had some proof (which I take as an axiom for the sake of the hypothetical) that the predictor was in fact perfect and that it is utility maximizing. Also, we can just try saying that there is a high probability that they will do this. Furthermore, you can imagine a restricted subset of Turing machines for which the Halting Problem is computable. And in any case, the only computers that exist in reality are really finite state machines.

scroogemcduck1 on Bets and updating

Well, the perplexing situation doesn't actually happen if the predictors are good enough, because they'll predict you both won't update and won't take the bet. Thus you'll never have been approached in the first place.