LessWrong 2.0 Reader

Vote on Anthropic Topics to Discuss
Ben Pace (Benito) · 2024-03-06T19:43:47.194Z · comments (55)

On the CrowdStrike Incident
Zvi · 2024-07-22T12:40:05.894Z · comments (14)

AISC9 has ended and there will be an AISC10
Linda Linsefors · 2024-04-29T10:53:18.812Z · comments (4)

(Not) Derailing the LessOnline Puzzle Hunt
Error · 2024-06-04T01:28:31.688Z · comments (2)

[link] MIRI's June 2024 Newsletter
Harlan · 2024-06-14T23:02:23.721Z · comments (18)

Could randomly choosing people to serve as representatives lead to better government?
John Huang · 2024-10-21T17:10:20.920Z · comments (12)

A Simple Toy Coherence Theorem
johnswentworth · 2024-08-02T17:47:50.642Z · comments (15)

Interpretability with Sparse Autoencoders (Colab exercises)
CallumMcDougall (TheMcDouglas) · 2023-11-29T12:56:21.608Z · comments (9)

Anthropic Fall 2023 Debate Progress Update
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-11-28T05:37:30.070Z · comments (9)

On the UK Summit
Zvi · 2023-11-07T13:10:04.895Z · comments (6)

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)
Thane Ruthenis · 2023-12-22T20:19:13.865Z · comments (14)

What is malevolence? On the nature, measurement, and distribution of dark traits
David Althaus (wallowinmaya) · 2024-10-23T08:41:33.197Z · comments (13)

Q&A on Proposed SB 1047
Zvi · 2024-05-02T15:10:02.916Z · comments (8)

Mistakes people make when thinking about units
Isaac King (KingSupernova) · 2024-06-25T03:39:20.138Z · comments (14)

Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)
Elizabeth (pktechgirl) · 2024-10-22T18:20:01.194Z · comments (74)

SAE-VIS: Announcement Post
CallumMcDougall (TheMcDouglas) · 2024-03-31T15:30:49.079Z · comments (8)

Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (4)

Neural uncertainty estimation review article (for alignment)
Charlie Steiner · 2023-12-05T08:01:32.723Z · comments (3)

Introducing Transluce — A Letter from the Founders
jsteinhardt · 2024-10-23T18:10:02.526Z · comments (2)

[link] Soft Nationalization: how the USG will control AI labs
Deric Cheng (deric-cheng) · 2024-08-27T15:11:14.601Z · comments (7)

The One and a Half Gemini
Zvi · 2024-02-22T13:10:04.725Z · comments (4)

Companies' safety plans neglect risks from scheming AI
Zach Stein-Perlman · 2024-06-03T15:00:20.236Z · comments (4)

Joshua Achiam Public Statement Analysis
Zvi · 2024-10-10T12:50:06.285Z · comments (14)

A Gentle Introduction to Risk Frameworks Beyond Forecasting
pendingsurvival · 2024-04-11T18:03:25.605Z · comments (10)

The World in 2029
Nathan Young · 2024-03-02T18:03:29.368Z · comments (37)

Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-12-16T05:49:23.672Z · comments (3)

On Dwarkesh’s Podcast with OpenAI’s John Schulman
Zvi · 2024-05-21T17:30:04.332Z · comments (4)

[link] A Narrow Path: a plan to deal with AI extinction risk
Andrea_Miotti (AndreaM) · 2024-10-07T13:02:15.229Z · comments (10)

[link] Nick Bostrom’s new book, “Deep Utopia”, is out today
PeterH · 2024-03-27T11:24:01.401Z · comments (5)

Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI"
johnswentworth · 2023-11-21T17:39:17.828Z · comments (84)

[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:59.185Z · comments (10)

[link] LK-99 in retrospect
bhauth · 2024-07-07T02:06:27.660Z · comments (21)

Interpreting Preference Models w/ Sparse Autoencoders
Logan Riggs (elriggs) · 2024-07-01T21:35:40.603Z · comments (12)

[link] Excerpts from "A Reader's Manifesto"
Arjun Panickssery (arjun-panickssery) · 2024-09-06T22:37:40.254Z · comments (1)

Announcing Suffering For Good
Garrett Baker (D0TheMath) · 2024-04-01T17:08:12.322Z · comments (5)

[question] Interest in Leetcode, but for Rationality?
Gregory (gregory-eales) · 2024-10-16T17:54:25.578Z · answers+comments (20)

When "yang" goes wrong
Joe Carlsmith (joekc) · 2024-01-08T16:35:50.607Z · comments (6)

Claude 3 claims it's conscious, doesn't want to die or be modified
Mikhail Samin (mikhail-samin) · 2024-03-04T23:05:00.376Z · comments (113)

D&D.Sci Scenario Index
aphyer · 2024-07-23T02:00:43.483Z · comments (0)

In Defense of Open-Minded UDT
abramdemski · 2024-08-12T18:27:36.220Z · comments (27)

AXRP Episode 31 - Singular Learning Theory with Daniel Murfet
DanielFilan · 2024-05-07T03:50:05.001Z · comments (4)

Prompts for Big-Picture Planning
Raemon · 2024-04-13T03:04:24.523Z · comments (1)

AI for Bio: State Of The Field
sarahconstantin · 2024-08-30T18:00:02.187Z · comments (2)

Some Rules for an Algebra of Bayes Nets
johnswentworth · 2023-11-16T23:53:11.650Z · comments (31)

The Mask Comes Off: At What Price?
Zvi · 2024-10-21T23:50:05.247Z · comments (16)

LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!")
Ruby · 2024-04-23T03:58:43.443Z · comments (27)

Testbed evals: evaluating AI safety even when it can’t be directly measured
joshc (joshua-clymer) · 2023-11-15T19:00:41.908Z · comments (2)

FarmKind's Illusory Offer
jefftk (jkaufman) · 2024-08-09T11:30:07.082Z · comments (5)

Guide to SB 1047
Zvi · 2024-08-20T13:10:07.408Z · comments (18)

Do sparse autoencoders find "true features"?
Demian Till · 2024-02-22T18:06:59.630Z · comments (33)

Recent comments

malcolmocean on Open Thread Fall 2024

Daniel Schmachtenberger has lots of great stuff. Two pieces I recommend:

this article, "Higher Dimensional Thinking, the End of Paradox, and a More Adequate Understanding of Reality", which is about how just because two people disagree doesn't mean either is wrong
this Stoa video, "Converting Moloch from Sith to Jedi w/ Daniel Schmachtenberger", which is about races-to-the-bottom eating themselves

Also hi, welcome Sage! I dig the energy you're coming from here.

jkaufman on Is the Power Grid Sustainable?

At $1000/kW-hr it's (just barely) not worth even buying batteries to shift energy from daytime generation to night consumption, while at $700/kW-hr it definitely is worthwhile.

Doesn't this depend heavily on local utility rates, and so any discussion of crossover points should include rates? Ex: I'm at $0.33/kWh while a friend in TX is at half that.
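To make the rate dependence concrete, here is a simplified break-even sketch. It is not the calculation either commenter had in mind; it assumes one full cycle per day for 10 years, no interest, no degradation, and that rated capacity is what the battery delivers per cycle. Each delivered kWh is valued at the import rate minus whatever the stored solar would have earned if exported (zero by default, which is also an assumption).

```python
def break_even_capital_cost(import_rate, export_rate=0.0, years=10,
                            cycles_per_year=365, round_trip_eff=0.9):
    """Battery price ($ per kWh of usable capacity) at which daily time-shifting breaks even.

    Illustrative assumptions: each kWh delivered at night avoids buying a kWh at
    import_rate; delivering it takes 1 / round_trip_eff kWh of daytime solar that
    could otherwise have been exported at export_rate; no interest, degradation,
    or missed cycles (e.g. cloudy days).
    """
    value_per_delivered_kwh = import_rate - export_rate / round_trip_eff
    lifetime_cycles = years * cycles_per_year
    return value_per_delivered_kwh * lifetime_cycles


for rate in (0.33, 0.165):  # $/kWh: jkaufman's rate and "half that"
    print(f"${rate}/kWh -> break-even about ${break_even_capital_cost(rate):,.0f} per kWh of capacity")
```

Under these assumptions the break-even price is roughly $1,200 per kWh of capacity at $0.33/kWh and roughly $600 at half that rate, so the same $700 to $1000 battery can pay off in one place and not in another. Interest, degradation, and cloudy days would push both figures down.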

jblack on Is the Power Grid Sustainable?

Batteries are primarily used for intra-day time shifting, not weekly. I agree that going completely off grid costs substantially more than being able to use your own generated power for 80-90% of usage. That's why I focused on the case where homeowners remain grid-connected in my top-level comment:

With smart meters and cheaper home battery systems the incentives start to shift toward wealthier solar enthusiasts buying batteries and selling excess power to the grid at peak times (or consuming it themselves), lowering peak demand at no additional capital or maintenance cost to the grid operators.

The only mention I made regarding completely off-grid power systems was about the counterfactual scenario of $150/kW-hr battery cost, which I have not assumed anywhere else. I didn't say that it would be marginally cost-effective to go completely off grid with such battery prices, just that it would be substantially more cost-effective than buying all my power from the grid. The middle option of 80-90% reduced but not completely eliminated grid use is still cheaper than either of the two extremes, and likely to remain so for any feasible home energy storage system.

That's what I was referring to regarding $700/kW-hr. At $1000/kW-hr it's (just barely) not worth even buying batteries to shift energy from daytime generation to night consumption, while at $700/kW-hr it definitely is worthwhile. Do you need the calculation for that?

habryka4 on Open Thread Fall 2024

On mobile we use a markdown editor by default, so you can use markdown to format things.

codyz on What TMS is like

My sister tried TMS and said it made her ears ring. Did you experience that?

simon on No, really, it predicts next tokens.

No. Normally trained networks have adversarial examples. A sort of training process is used to find the adversarial examples.

I should have asked you to clarify what you meant. Literally you said "adversarial examples", but I assumed you actually meant something like backdoors.

In an adversarial example the AI produces wrong output. And usually that's the end of it. The output is just wrong, but not wrong in an optimized way, so not dangerous. Now, if an AI is sophisticated enough to have some kind of optimizer that's triggered in specific circumstances, like an agentic mask that came into existence because it was needed to predict agentically generated tokens in the training data, then it might be triggered inappropriately by some inputs. This case I would classify as a mask takeover.

In the case of direct optimization for token prediction (which I consider highly unlikely for anything near current-level AIs, but afaik might be possible), adversarial examples might, I suppose, cause it to do some wrong optimization. I still don't think modeling this as an underlying different goal taking over is particularly helpful, since the "normal" goal is directed to what's rewarded in training; the deviation is essentially random. Also, unlike in the mask case where the mask might have goals about real-world state, there's no particular reason for the direct optimizer to have goals about real-world state (see below).
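The quoted claim that a training-like process is used to find adversarial examples can be illustrated with the fast gradient sign method (FGSM), one common technique and not necessarily what either commenter had in mind. This sketch uses a hypothetical two-feature logistic classifier with made-up weights: it takes the gradient of the loss with respect to the input rather than the weights, and nudges the input in the direction that increases the loss.

```python
import math


def predict(weights, bias, x):
    """Probability of class 1 under a logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(weights, bias, x, label, epsilon):
    """One fast-gradient-sign step on the input x (weights stay fixed).

    For logistic regression with cross-entropy loss, d(loss)/dx_i = (p - label) * w_i,
    so each feature moves by epsilon in the sign of that gradient.
    """
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * math.copysign(1.0, g) if g != 0 else xi
            for xi, g in zip(x, grad)]


# Hypothetical model and input: the clean input is classified as class 1.
weights, bias = [2.0, -1.0], 0.0
x, label = [0.3, 0.2], 1
x_adv = fgsm(weights, bias, x, label, epsilon=0.25)
print(predict(weights, bias, x) > 0.5, predict(weights, bias, x_adv) > 0.5)  # True False
```

Only the input is optimized here; each feature moves by at most epsilon, and the resulting output is simply wrong rather than serving any goal of the model.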

Is it more complicated? What ontological framework is this AI using to represent its goal anyway?

Asking about the AI using an "ontological framework" to "represent" a goal is not the correct question in my view. The AI is a bunch of computations represented by particular weights. The computation might exhibit goal-directed behaviour. A better question, IMO, is "how much does it constrain the weights for it to exhibit this particular goal directed behaviour?" And here, I think it's pretty clear that a goal of arranging the world to cause next tokens to be predicted constrains the weights enormously more than a goal of predicting the next tokens, because in order to exhibit behaviour directed to that goal, the AI's weights need to implement computation that doesn't merely check what the next token is likely to be, but also assess what current data says about the world state, how different next token predictions would affect that world state, and how that would affect its ultimate goal.

So, is the network able to tell whether or not it's in training?

The training check has no reason to come into existence in the first place under gradient descent. Of course, if the AI were to self-modify while already exhibiting goal directed behaviour, obviously it would want to implement such a training check. But I am talking about an AI trained by gradient descent. The training process doesn't just affect the AI, it literally is what creates the AI in the first place.

simon on No, really, it predicts next tokens.

Some interesting points there. The lottery ticket hypothesis does make it more plausible that side computations could persist longer if they come to exist outside the main computation.

Regarding the homomorphic encryption thing: yes, it does seem that it might be impossible to make small adjustments to the homomorphically encrypted computation without wrecking it. Technically I don't think that would be a local minimum since I'd expect the net would start memorizing the failure cases, but I suppose that the homomorphic computation combined with memorizations might be a local optimum particularly if the input and output are encrypted outside the network itself.

So I concede the point on the possible persistence of an underlying goal if it were to come to exist, though not on it coming to exist in the first place.

And there are few ways to predict next tokens, but lots of different kinds of paperclips the AI could want.

For most computations, there are many more ways for that computation to occur than there are ways for that computation to occur while also including anything resembling actual goals about the real world. Now, if the computation you are carrying out is such that it needs to determine how to achieve goals regarding the real world anyway (e.g. agentic mask), it only takes a small increase in complexity to have that computation apply outside the normal context. So, that's the mask takeover possibility again. Even so, no matter how small the increase in complexity, that extra step isn't likely to be reinforced in training, unless it can do self-modification or control the training environment.

anthonyc on Open Thread Fall 2024

Is there a way to access text formatting options when commenting on Android devices?

denkenberger on Is the Power Grid Sustainable?

If you have 3 days' worth of storage, even if you completely discharge it in 3 days and completely charge it in the next 3 days, you would only go through about 60 cycles per year. In reality, you might get 10 full cycles per year. With interest rates and per-year depreciation, you would typically only look out around 10 years, so you might get ~100 discounted full cycles. That's why it makes more sense to calculate it based on capital cost as I have done above. If you're interested in digging deeper, you can get free off-grid modeling software, such as the original version of HOMER (newer versions are paid).
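The discounting point can be sketched by weighting each year's cycles by its present value. This is a minimal sketch assuming a constant annual discount rate, with each year's cycles counted at the end of that year; the 5% rate is a placeholder, and nothing here reproduces HOMER's methods.

```python
def discounted_cycles(cycles_per_year, years, discount_rate):
    """Present-value-weighted count of full cycles over the horizon.

    Each year's cycles are assumed to occur at the end of that year and are
    divided by (1 + discount_rate) ** year.
    """
    return sum(cycles_per_year / (1 + discount_rate) ** year
               for year in range(1, years + 1))


def capital_cost_per_discounted_kwh(capital_per_kwh, cycles_per_year, years, discount_rate):
    """Capital cost spread over the discounted kWh delivered per kWh of usable capacity."""
    return capital_per_kwh / discounted_cycles(cycles_per_year, years, discount_rate)


# Three days of storage, emptied and refilled every six days: about 365 / 6 cycles a year.
print(round(365 / 6))                              # 61
# About 10 full cycles a year over a 10-year horizon:
print(round(discounted_cycles(10, 10, 0.0)))       # 100 (no discounting)
print(round(discounted_cycles(10, 10, 0.05), 1))   # 77.2 (hypothetical 5% rate)
print(round(capital_cost_per_discounted_kwh(1000, 10, 10, 0.05), 2))
```

Any positive discount rate pulls the count below the undiscounted 100, which is why, for multi-day storage, the capital cost per kWh of capacity dominates the comparison.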

Even now at $1000/kW-hr retail it's almost cost-effective here to buy batteries to time-shift energy from solar generation to time of consumption. At $700/kW-hr it would definitely be cost-effective to do daily load-shifting with the grid as a backup only for heavily cloudy days.

Please write out the calculation.

Have there been some recent advances in compressed air energy storage? The information I read 2-3 years ago did not look promising at any scale.

Aboveground compressed air energy storage (tanks) is a little cheaper than chemical batteries. But large belowground compressed air energy storage is much cheaper for days of storage, with estimates around $1 to $10 per kilowatt-hour. Current large installations are in particularly favorable geology, but we already store huge amounts of natural gas seasonally in saline aquifers. So we can basically do the same thing with compressed air, though the cycling needs to be more frequent.

t3t on Habryka's Shortform Feed

(We switched back to shipping Calibri above Gill Sans Nova pending a fix for the horrible rendering on Windows, so if Ubuntu has Calibri, it'll have reverted to the previous font.)