LessWrong 2.0 Reader

[link] How people stopped dying from diarrhea so much (& other life-saving decisions)
Writer · 2024-03-16T16:00:47.830Z · comments (0)

AI #36: In the Background
Zvi · 2023-11-02T18:00:01.803Z · comments (5)

[link] Loneliness and suicide mitigation for students using GPT3-enabled chatbots (survey of Replika users in Nature)
Kaj_Sotala · 2024-01-23T14:05:40.986Z · comments (2)

[link] Paper: Tell, Don't Show- Declarative facts influence how LLMs generalize
Owain_Evans · 2023-12-19T19:14:26.423Z · comments (4)

A starting point for making sense of task structure (in machine learning)
Kaarel (kh) · 2024-02-24T01:51:49.227Z · comments (2)

[link] I'd also take $7 trillion
bhauth · 2024-02-19T03:31:45.552Z · comments (12)

[link] Book review: Everything Is Predictable
PeterMcCluskey · 2024-05-27T03:33:53.857Z · comments (0)

On Tapping Out
Screwtape · 2023-11-17T03:23:55.880Z · comments (13)

Startup Roundup #2
Zvi · 2024-08-06T13:30:06.554Z · comments (0)

AI #54: Clauding Along
Zvi · 2024-03-07T16:00:05.066Z · comments (11)

Principled Satisficing To Avoid Goodhart
JenniferRM · 2024-08-16T19:05:27.204Z · comments (2)

We ran an AI safety conference in Tokyo. It went really well. Come next year!
Blaine (blaine-rogers) · 2024-07-17T06:55:39.620Z · comments (1)

[link] AI Rights for Human Safety
Simon Goldstein (simon-goldstein) · 2024-08-01T23:01:07.252Z · comments (6)

[link] Rational Animations' intro to mechanistic interpretability
Writer · 2024-06-14T16:10:57.015Z · comments (1)

Monthly Roundup #18: May 2024
Zvi · 2024-05-13T12:30:04.863Z · comments (10)

AI #53: One More Leap
Zvi · 2024-02-29T16:10:04.049Z · comments (0)

AI #72: Denying the Future
Zvi · 2024-07-11T15:00:05.865Z · comments (8)

[link] Fluent dreaming for language models (AI interpretability method)
tbenthompson (ben-thompson) · 2024-02-06T06:02:59.296Z · comments (4)

[link] Level up your spreadsheeting
angelinahli · 2024-05-25T14:57:19.730Z · comments (11)

AI #60: Oh the Humanity
Zvi · 2024-04-18T14:10:02.281Z · comments (7)

Back to Basics: Truth is Unitary
lsusr · 2024-03-29T21:10:33.399Z · comments (13)

D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset
aphyer · 2024-05-14T03:35:10.586Z · comments (3)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders
Gytis Daujotas (gytis-daujotas) · 2024-08-01T21:08:38.800Z · comments (6)

[link] Against Student Debt Cancellation From All Sides of the Political Compass
Maxwell Tabarrok (maxwell-tabarrok) · 2024-05-13T14:55:57.525Z · comments (16)

Announcing Atlas Computing
miyazono · 2024-04-11T15:56:31.241Z · comments (4)

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche
Zack_M_Davis · 2024-01-09T23:12:20.349Z · comments (31)

Higher-Order Forecasts
ozziegooen · 2024-05-22T21:49:42.802Z · comments (1)

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural
Rubi J. Hudson (Rubi) · 2024-07-16T22:44:17.128Z · comments (27)

When Does Altruism Strengthen Altruism?
jefftk (jkaufman) · 2024-01-21T18:50:05.424Z · comments (2)

[question] "Deception Genre" What Books are like Project Lawful?
Double · 2024-08-28T17:19:52.172Z · answers+comments (20)

Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
Sonia Joseph (redhat) · 2024-03-13T17:09:17.027Z · comments (13)

[link] Chinese scientists acknowledge xrisk & call for international regulatory body [Linkpost]
Akash (akash-wasil) · 2023-11-01T13:28:43.723Z · comments (4)

[link] Open Sourcing Metaculus
ChristianWilliams · 2024-07-02T22:30:01.339Z · comments (0)

AI #38: Let’s Make a Deal
Zvi · 2023-11-16T19:50:05.442Z · comments (2)

D&D.Sci Long War: Defender of Data-mocracy
aphyer · 2024-04-26T22:30:15.780Z · comments (20)

Userscript to always show LW comments in context vs at the top
Vlad Sitalo (harcisis) · 2023-11-21T17:53:30.418Z · comments (8)

An Introduction to AI Sandbagging
Teun van der Weij (teun-van-der-weij) · 2024-04-26T13:40:00.126Z · comments (9)

ProLU: A Nonlinearity for Sparse Autoencoders
Glen Taggart · 2024-04-23T14:09:21.592Z · comments (4)

[link] LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery (arjun-panickssery) · 2024-04-17T21:09:12.007Z · comments (1)

Apply to LASR Labs: a London-based technical AI safety research programme
Erin Robertson · 2024-04-09T17:34:06.847Z · comments (1)

On Trust
johnswentworth · 2023-12-06T19:19:07.680Z · comments (26)

Auditing failures vs concentrated failures
ryan_greenblatt · 2023-12-11T02:47:35.703Z · comments (0)

What does davidad want from «boundaries»?
Chipmonk · 2024-02-06T17:45:42.348Z · comments (1)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (9)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (12)

Recent comments

johnswentworth on Ryan Kidd's Shortform

... WOW that is not an efficient market.

johnswentworth on Trading Candy

My two siblings and I always used to trade candy after Halloween and Easter. We'd each lay out our candy on a table, like a little booth, and then haggle a lot.

My memories are fuzzy, but apparently the way this most often went was that I tended to prioritize quantity more than my siblings did, wanting to make sure that I had a stock of good candy which would last a while. So naturally, a few days later, my siblings had consumed their tastiest treats and complained that I had all the best candy. My mother then stepped in and redistributed the candy.

And that was how I became a libertarian at a very young age.

Only years later did we find out that my mother would also steal the best-looking candies after we went to bed and try to get us to blame each other, which... is altogether too on-the-nose for this particular analogy.

jblack on Is the Power Grid Sustainable?

Yes, it definitely does depend upon local conditions. For example if your grid operator uses net metering (and is reliable) then it is not worthwhile at any positive price. This statement was in regard to my disputed upstream comment "Even now at $1000/kW-hr retail it's almost cost-effective here [...]".

malcolmocean on Open Thread Fall 2024

Daniel Schmachtenberger has lots of great stuff. Two pieces I recommend:

this article, "Higher Dimensional Thinking, the End of Paradox, and a More Adequate Understanding of Reality", which is about how just because two people disagree doesn't mean either is wrong
this Stoa video, "Converting Moloch from Sith to Jedi w/ Daniel Schmachtenberger", which is about races-to-the-bottom eating themselves

Also hi, welcome Sage! I dig the energy you're coming from here.

jkaufman on Is the Power Grid Sustainable?

At $1000/kW-hr it's (just barely) not worth even buying batteries to shift energy from daytime generation to night consumption, while at $700/kW-hr it definitely is worthwhile.

Doesn't this depend heavily on local utility rates, and so any discussion of crossover points should include rates? Ex: I'm at $0.33/kWh while a friend in TX is at half that.
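The dependence on rates can be made concrete with a back-of-the-envelope sketch. Everything numeric here other than the $0.33/kWh rate is an assumption for illustration and not a figure from this thread: one full charge/discharge cycle per day, a 10-year battery life, 90% round-trip efficiency, and prices read as dollars per kWh of battery capacity. The function name and the `export_price_per_kwh` parameter are hypothetical.

```python
def break_even_battery_cost(rate_per_kwh,
                            export_price_per_kwh=0.0,
                            cycles_per_year=365,
                            lifetime_years=10,
                            round_trip_efficiency=0.9):
    """Highest battery price ($ per kWh of capacity) at which time-shifting
    solar energy pays for itself over the battery's life.

    Assumption: each kWh stored would otherwise have been exported at
    export_price_per_kwh, and the energy recovered later displaces grid
    purchases at rate_per_kwh. Financing, degradation, and installation
    costs are ignored.
    """
    # Net value of storing one kWh instead of exporting it.
    value_per_kwh_stored = (rate_per_kwh * round_trip_efficiency
                            - export_price_per_kwh)
    total_cycles = cycles_per_year * lifetime_years
    return max(0.0, value_per_kwh_stored * total_cycles)


if __name__ == "__main__":
    for rate in (0.33, 0.165):  # the quoted rate, and half of it
        cost = break_even_battery_cost(rate)
        print(f"rate ${rate:.3f}/kWh: break-even about ${cost:,.0f} per kWh")
    # Net metering: exported energy is credited at the full retail rate.
    print(break_even_battery_cost(0.33, export_price_per_kwh=0.33))
```

Under these assumptions the break-even price scales linearly with the retail rate, so halving the rate halves the battery price at which storage is worthwhile. With full net metering the break-even drops to zero.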

jblack on Is the Power Grid Sustainable?

Batteries are primarily used for intra-day time shifting, not weekly. I agree that going completely off grid costs substantially more than being able to use your own generated power for 80-90% of usage. That's why I focused on the case where home owners remain grid-connected in my top-level comment:

With smart meters and cheaper home battery systems the incentives start to shift toward wealthier solar enthusiasts buying batteries and selling excess power to the grid at peak times (or consuming it themselves), lowering peak demand at no additional capital or maintenance cost to the grid operators.

The only mention I made regarding completely off-grid power systems was about the counterfactual scenario of $150/kW-hr battery cost, which I have not assumed anywhere else. I didn't say that it would be marginally cost effective to go completely off grid with such battery prices, just that it would be substantially more cost-effective than buying all my power from the grid. The middle option of 80-90% reduced but not completely eliminated grid use is still cheaper than either of the two extremes, and likely to remain so for any feasible home energy storage system.

That's what I was referring to regarding $700/kW-hr. At $1000/kW-hr it's (just barely) not worth even buying batteries to shift energy from daytime generation to night consumption, while at $700/kW-hr it definitely is worthwhile. Do you need the calculation for that?

habryka4 on Open Thread Fall 2024

On mobile we by default use a markdown editor, so you can use markdown to format things.

codyz on What TMS is like

My sister tried TMS and said it made her ears ring. Did you experience that?

simon on No, really, it predicts next tokens.

No. Normally trained networks have adversarial examples. A sort of training process is used to find the adversarial examples.

I should have asked for clarification about what you meant. Literally you said "adversarial examples", but I assumed you actually meant something like backdoors.

In an adversarial example the AI produces wrong output. And usually that's the end of it. The output is just wrong, but not wrong in an optimized way, so not dangerous. Now, if an AI is sophisticated enough to have some kind of optimizer that's triggered in specific circumstances, like an agentic mask that came into existence because it was needed to predict agentically generated tokens in the training data, then it might be triggered inappropriately by some inputs. This case I would classify as a mask takeover.
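For readers unfamiliar with the term, here is a minimal sketch of how an adversarial example can be found. It uses the fast gradient sign method on a tiny logistic-regression model as a stand-in; the thread does not name a method, and the weights, input, and epsilon below are made-up illustrative values. The search optimizes the input while the model's weights stay fixed, and the result is simply a wrong output.

```python
import math


def predict(weights, bias, x):
    """Probability of class 1 under a fixed logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_step(weights, bias, x, true_label, epsilon):
    """One fast-gradient-sign step on the input.

    Moves every coordinate of x by epsilon in the direction that
    increases the log-loss with respect to true_label.
    """
    p = predict(weights, bias, x)
    # For logistic regression, d(log-loss)/dx_i = (p - y) * w_i.
    grad = [(p - true_label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical fixed model and an input it classifies as class 1.
weights, bias = [2.0, -3.0, 1.0], 0.0
x = [0.5, -0.2, 0.1]
x_adv = fgsm_step(weights, bias, x, true_label=1, epsilon=0.4)

print("original prediction:   ", predict(weights, bias, x))
print("adversarial prediction:", predict(weights, bias, x_adv))
```

With these values the perturbed input crosses the decision boundary, so the model's answer flips even though no coordinate moved by more than epsilon.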

In the case of direct optimization for token prediction (which I consider highly unlikely for anything near current-level AIs, but afaik might be possible), then adversarial examples, I suppose, might cause it to do some wrong optimization. I still don't think modeling this as an underlying different goal taking over is particularly helpful, since the "normal" goal is directed to what's rewarded in training - the deviation is essentially random. Also, unlike in the mask case where the mask might have goals about real-world state, there's no particular reason for the direct optimizer to have goals about real-world state (see below).

Is it more complicated? What ontological framework is this AI using to represent its goal anyway?

Asking about the AI using an "ontological framework" to "represent" a goal is not the correct question in my view. The AI is a bunch of computations represented by particular weights. The computation might exhibit goal-directed behaviour. A better question, IMO, is "how much does it constrain the weights for it to exhibit this particular goal-directed behaviour?" And here, I think it's pretty clear that a goal of arranging the world to cause next tokens to be predicted constrains the weights enormously more than a goal of predicting the next tokens, because in order to exhibit behaviour directed to that goal, the AI's weights need to implement computation that doesn't merely check what the next token is likely to be, but also assess what current data says about the world state, how different next token predictions would affect that world state, and how that would affect its ultimate goal.

So, is the network able to tell whether or not it's in training?

The training check has no reason to come into existence in the first place under gradient descent. Of course, if the AI were to self-modify while already exhibiting goal-directed behaviour, obviously it would want to implement such a training check. But I am talking about an AI trained by gradient descent. The training process doesn't just affect the AI; it literally is what creates the AI in the first place.

simon on No, really, it predicts next tokens.

Some interesting points there. The lottery ticket hypothesis does make it more plausible that side computations could persist longer if they come to exist outside the main computation.

Regarding the homomorphic encryption thing: yes, it does seem that it might be impossible to make small adjustments to the homomorphically encrypted computation without wrecking it. Technically I don't think that would be a local minimum since I'd expect the net would start memorizing the failure cases, but I suppose that the homomorphic computation combined with memorizations might be a local optimum particularly if the input and output are encrypted outside the network itself.

So I concede the point on the possible persistence of an underlying goal if it were to come to exist, though not on it coming to exist in the first place.

And there are few ways to predict next tokens, but lots of different kinds of paperclips the AI could want.

For most computations, there are many more ways for that computation to occur than there are ways for that computation to occur while also including anything resembling actual goals about the real world. Now, if the computation you are carrying out is such that it needs to determine how to achieve goals regarding the real world anyway (e.g. agentic mask), it only takes a small increase in complexity to have that computation apply outside the normal context. So, that's the mask takeover possibility again. Even so, no matter how small the increase in complexity, that extra step isn't likely to be reinforced in training, unless it can do self-modification or control the training environment.