Posts

Williamstown – ACX Meetups Everywhere Spring 2025 2025-03-25T23:49:14.656Z
satchlj's Shortform 2024-11-15T18:48:59.057Z

Comments

Comment by satchlj on Is instrumental convergence a thing for virtue-driven agents? · 2025-04-02T19:29:04.276Z · LW · GW

This doesn't make complete sense to me, but you are going down a line of thought I recognize.

There are certainly stable utility functions which, while having some drawbacks, don't result in dangerous behavior from superintelligences. Finding a good one doesn't seem all that difficult.

The really nasty challenge is how to build a superintelligence that has the utility function we want it to have. If we could do that, we could start with an extremely conservative utility function and slowly, cautiously iterate towards a balance of safety and usefulness.

Comment by satchlj on Is instrumental convergence a thing for virtue-driven agents? · 2025-04-02T19:17:22.864Z · LW · GW

I've been thinking about a similar thing a lot.

Consider a little superintelligent child who always wants to eat as much candy as possible over the course of the next ten minutes. Assume the child never cares about anything that happens more than ten minutes from now.

This child won't work very hard at instrumental goals like self-improvement or conquering the world to redirect resources towards candy production, since within its ten-minute horizon that would be a waste of time, even though it might maximize candy consumption in the long run.

AI alignment isn't any easier here; the point is just to illustrate that instrumental convergence is far from a given.

Comment by satchlj on VDT: a solution to decision theory · 2025-04-02T00:13:03.204Z · LW · GW

Claude says the vibes are 'inherently cursed'

But then it chooses not to pull the lever because it's 'less karmically disruptive'

Comment by satchlj on Against Yudkowsky's evolution analogy for AI x-risk [unfinished] · 2025-03-18T20:06:29.778Z · LW · GW

Note that if computing an optimization step reduces the loss, the training process will reinforce it, even if other layers aren’t doing similar steps, so this is another reason to expect more explicit optimizers.

Basically, self-attention is a function of certain matrices, something like this:
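
(Roughly the linear self-attention update the paper analyzes, with tokens $e_1, \dots, e_N$ and learned matrices $W_Q$, $W_K$, $W_V$, $P$; I'm omitting the softmax and normalization of standard attention, so take this as a sketch rather than the exact formula:)

$$e_j \;\leftarrow\; e_j + P \sum_{i=1}^{N} W_V e_i \left( e_i^\top W_K^\top W_Q\, e_j \right)$$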

Which looks really messy when you put it like this but is pretty natural in context.

If you can get the big messy-looking term to approximate a gradient descent step for a given loss function, then you're golden.

In Appendix A.1, they show the matrices that yield this gradient descent step. They are pretty simple, and probably an easy point of attraction for training to find.
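
Here's a quick NumPy sketch of that kind of construction, assuming linear (softmax-free) attention, tokens of the form $(x_j, y_j)$, and an initial weight matrix $W_0$ folded into the value projection; the paper's exact parameterization may differ a bit:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_y, N, eta = 3, 2, 8, 0.1

X = rng.normal(size=(d_x, N))      # in-context inputs, one column per token
Y = rng.normal(size=(d_y, N))      # in-context targets
W0 = rng.normal(size=(d_y, d_x))   # "initial weights" baked into the value matrix

# Each token stacks input and target: e_j = (x_j, y_j)
E = np.vstack([X, Y])              # shape (d_x + d_y, N)

# Hand-picked attention matrices: queries/keys read only the x-part,
# values compute the residual W0 x - y, and P just scales by eta/N.
W_Q = np.block([[np.eye(d_x),          np.zeros((d_x, d_y))],
                [np.zeros((d_y, d_x)), np.zeros((d_y, d_y))]])
W_K = W_Q.copy()
W_V = np.block([[np.zeros((d_x, d_x)), np.zeros((d_x, d_y))],
                [W0,                   -np.eye(d_y)]])
P = (eta / N) * np.eye(d_x + d_y)

# One linear self-attention step: e_j <- e_j + P * sum_i v_i (k_i . q_j)
Q, K, V = W_Q @ E, W_K @ E, W_V @ E
E_new = E + P @ (V @ (K.T @ Q))

# One gradient descent step on L(W) = 1/(2N) sum_i ||W x_i - y_i||^2, from W0
grad = (W0 @ X - Y) @ X.T / N
W1 = W0 - eta * grad

# The change in each token's y-slot equals -(W1 - W0) x_j
print(np.allclose(E_new[d_x:, :] - Y, -(W1 - W0) @ X))  # True
```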

All of this reasoning is pretty vague, and without the experimental evidence it wouldn't be nearly good enough. So there's definitely more to understand here. But given the experimental evidence I think this is the right story about what's going on.

Comment by satchlj on Against Yudkowsky's evolution analogy for AI x-risk [unfinished] · 2025-03-18T18:57:10.802Z · LW · GW

I think you do this post a disservice by presenting it as a failure. It had a wrong conclusion, but its core arguments are still interesting and relevant, and exploring the reasons they are wrong is very useful.

Your model of neural nets predicted the wrong thing; that's super exciting! We can improve the model now.

Comment by satchlj on Against Yudkowsky's evolution analogy for AI x-risk [unfinished] · 2025-03-18T18:51:49.136Z · LW · GW

The fundamental idea about genes having an advantage over weights at internally implementing looping algorithms is apparently wrong though (even though I don't understand how the contrary is possible...)

I've been trying to understand this myself. Here's the understanding I've come to, which is very simplistic. If someone who knows more about transformers than me says I'm wrong, I will defer to them.

I used this paper to come to this understanding.

In order to have a mesa-optimizer, lots and lots of layers need to be in on the game of optimization, rather than just one or a few key elements which get referenced repeatedly during the optimization process.

But self-attention is, by default, not very far away from being one step of gradient descent. Each layer doesn't need to learn to do optimization independently from scratch, since a gradient step is relatively easy to reach from the self-attention architecture.
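
Concretely, here's my gloss of the construction for the linear-attention case, with tokens $e_j = (x_j, y_j)$, step size $\eta$, and an initial weight matrix $W_0$ folded into the value projection. One attention layer can shift each token's $y$-slot by

$$\Delta y_j \;=\; \frac{\eta}{N}\sum_{i=1}^{N}(W_0 x_i - y_i)\,x_i^\top x_j \;=\; -\,\Delta W\, x_j, \qquad \Delta W \;=\; -\eta\,\nabla_W \left.\tfrac{1}{2N}\textstyle\sum_i \lVert W x_i - y_i\rVert^2\right|_{W = W_0},$$

which changes the residuals $W_0 x_j - y_j$ exactly as if the weights had taken one gradient descent step from $W_0$.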

That's why it's not forbiddingly difficult for neural networks to implement internal optimization algorithms. It still could be forbiddingly difficult for most optimization algorithms, ones that aren't easy to find from the basic architecture.

Comment by satchlj on Help make the orca language experiment happen · 2025-03-16T01:37:22.067Z · LW · GW

Why don’t I do the project myself?

Because I think I’m one of the smartest young supergeniuses, and I’m working on things that I think are even more useful in expectation, and which almost nobody except me can do.

 

Even if this is by some small chance actually true, it's stupid of you to say it, because from the perspective of your readers you are almost certainly wrong, and so you undermine your own credibility. I'm sure you were aware some people would think this, and don't care. Have you experimented with trying not to piss people off, to see if it helps you?

As for your actual idea, it's cool and even if it doesn't work out we could learn some important things. Good luck!

Comment by satchlj on A Bear Case: My Predictions Regarding AI Progress · 2025-03-07T15:48:55.765Z · LW · GW

Why do you think Anthropic and OpenAI are making such bold predictions? (https://x.com/kimmonismus/status/1897628497427701961)

As I see it, one of the following is true:

  1. They agree with you but want to shape the narrative away from the truth to sway investors
  2. They have mostly the same info as you but come to a different conclusion
  3. They have evidence we don't have which gives them confidence

Comment by satchlj on The case for the death penalty · 2025-02-21T16:03:08.641Z · LW · GW

Where on this planet could the USA cheaply put people instead of executing them where they

  1. Have the option to survive if they try
  2. Can't escape
  3. Can't cause harm to non-exiled people?

Comment by satchlj on satchlj's Shortform · 2025-02-17T23:52:28.480Z · LW · GW

If you haven't already, I'd recommend reading Vinge's 1993 essay on 'The Coming Technological Singularity': https://accelerating.org/articles/comingtechsingularity

He is remarkably prescient, to the point that I wonder whether any really new insights into the broad problem have been made in the three decades since he wrote it. He discusses, among other things, using humans as a base on which to build superintelligence as a possible alignment strategy, as well as the problems with this approach.

Here's one quote:

Eric Drexler [...] agrees that superhuman intelligences will be available in the near future — and that such entities pose a threat to the human status quo. But Drexler argues that we can confine such transhuman devices so that their results can be examined and used safely. This is I. J. Good's ultraintelligent machine, with a dose of caution. I argue that confinement is intrinsically impractical. For the case of physical confinement: Imagine yourself locked in your home with only limited data access to the outside, to your masters. If those masters thought at a rate — say — one million times slower than you, there is little doubt that over a period of years (your time) you could come up with "helpful advice" that would incidentally set you free. [...] 

Comment by satchlj on Change my mind: Veganism entails trade-offs, and health is one of the axes · 2024-12-17T23:11:36.868Z · LW · GW

I found this post very helpful in laying out a good argument for weak claims that many truth-seeking people with different values may be able to agree on. It clarifies a lot of the conversation about veganism so that misleading or confused arguments can be avoided.

The author says that her goal is to be clear and easy to argue with, and I think she succeeded in that goal.

Comment by satchlj on Passages I Highlighted in The Letters of J.R.R.Tolkien · 2024-11-25T19:30:03.423Z · LW · GW

Thank you so much for compiling these quotes; they are impactful and I might never have read them if you hadn't posted them here.

Comment by satchlj on satchlj's Shortform · 2024-11-15T18:48:59.299Z · LW · GW

Comment by satchlj on [Intuitive self-models] 1. Preliminaries · 2024-10-26T15:50:47.419Z · LW · GW

Your brain has a giant space of possible generative models[2] that map from underlying states of the world (e.g. “there’s a silhouette dancer with thus-and-such 3D shape spinning clockwise against a white background etc.”) to how the photoreceptor cells would send signals into the brain (“this part of my visual field is bright, that part is dark, etc.”)

 

How do you argue that the models are really implemented backwards like this in the brain?

Comment by satchlj on Arithmetic is an underrated world-modeling technology · 2024-10-22T19:12:26.708Z · LW · GW

Calculations on Hydroelectric Energy Storage

For those interested in the numbers on pumped hydroelectric storage: we can get more energy by increasing 'head', the distance the weight falls, from 6 meters up to 500 meters for some of the largest projects (and in theory we could go bigger).

Let's pick a more reasonable number like 60 meters:

Mass/house = 15 kWh/house / (9.8 m/s² × 60 m) = 5.4×10⁷ J / 588 J/kg ≈ 91,800 kg/house ≈ 92 m³ of water per house

Let's say we have a dam with ~20 meters of water level fluctuation (drawdown). Then that's roughly 5 m² of reservoir surface area per house.

As a sanity check, the Bath County Pumped Storage Station in Virginia stores about 24,000 MWh / 30 kWh/house = 800,000 houses' worth of energy.

800,000 houses × 5 m²/house = 4 km²

The Bath County reservoir is about 1 km², so we're in the right range here (the reservoir has a little more drawdown and a way bigger head).
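
Here's the same arithmetic as a quick script (using the rounded figures above and ignoring round-trip efficiency, evaporation, etc.):

```python
# Rough sanity check of the pumped-hydro numbers above (all figures approximate).
G = 9.8      # m/s^2
RHO = 1000   # kg/m^3, density of water

def storage_volume_m3(energy_kwh, head_m):
    """Volume of water needed to store energy_kwh at the given head, ignoring losses."""
    joules = energy_kwh * 3.6e6
    return joules / (G * head_m) / RHO

vol_per_house = storage_volume_m3(15, head_m=60)  # ~92 m^3 per house
area_per_house = vol_per_house / 20               # ~4.6 m^2 with 20 m of drawdown

bath_county_houses = 24_000 * 1000 / 30           # 24,000 MWh at 30 kWh/house = 800,000
total_area_km2 = bath_county_houses * area_per_house / 1e6
print(vol_per_house, area_per_house, total_area_km2)  # ~92, ~4.6, ~3.7 km^2
```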

Comment by satchlj on Edinburgh Scotland - ACX Meetups Everywhere Fall 2024 · 2024-10-12T12:54:21.433Z · LW · GW

We're going to be at Söderberg The Meadows; although the rain has stopped, the world is still soaking wet.