Comments
Are we missing a notion of "simulacrum level 0"? That is, in order to accurately describe the truth, we need some method of synchronizing on a common language. At the beginning of a human society, this can be basic stuff like pointing at objects and making sounds in order to establish new words. But I would also be inclined to say that more abstract stuff, like discussing the purpose of using the words or planning truth-determination procedures, goes in simulacrum level 0. I'd say the entire discussion of simulacrum levels goes within simulacrum level 0.
Or if simulacrum levels aren't exactly the right term, here's what I have in mind as levels of communication:
- Synchronizing (level 0): establishing and maintaining the meaning of terms for describing the world
- Objective (level 1): truthfully describing the world to the best of one's ability
- Manipulative (level 2): saying known false or unfounded things to exploit others' use of language in order to control them
- Framing (level 3): the norms for maintaining truth no longer succeed, but they are still in operation and punish or reward people, so people try to act in ways that maintain their reputation despite them not tracking truth
- Activating (level 4): the norms for maintaining truth are no longer in place, but some systems still rely on the old symbolic language as keywords to perform certain behaviors, so language is still used to interface with these systems
Yeah, this seems like a reasonable restatement of my question.
I guess my main issue with this approach is that extrapolating the distribution of activations from a dataset isn't what I'd consider the hard part of alignment. Rather, it would be:
- Detecting catastrophic outputs and justifying their catastrophicness to others. (In particular, I suspect no individual output will be catastrophic on the margin regardless of whether catastrophe will occur. Either the network will consistently avoid giving catastrophic outputs, or it will sufficiently consistently be harmful that localizing the harm to 1 output will not be meaningful.)
- Learning things about the distribution of inputs that cannot be extrapolated from any dataset. (In particular, the most relevant short-term harm I've noticed would be stuff like young nerds starting to see the AI as a sort of mentor and then having their questionable ideas excessively validated by this mentor rather than receiving appropriate pushback. This would be hard to extrapolate from a dataset, even though it is relatively obvious if you interact with certain people. Though whether that counts as "catastrophic" is a complicated question.)
This is kind of vague. Doesn't this start shading into territory like "it's technically not bad to kill a person if you also create another person"? Or am I misunderstanding what you are getting at?
Population ethics is the most important area within utilitarianism, but utilitarian answers to population ethics are all wrong, so utilitarianism is an incorrect moral theory.
You can't weasel your way out by calling it an edge case or saying that utilitarianism "usually" works, when really it's the most important moral question. All the other big-impact utilitarian conclusions derive from population ethics, since they tend to depend on large populations of people.
Utilitarianism can at best be seen as something like a Taylor expansion that's valid only for questions whose impact on the total population is negligible.
Maybe to expand: In order to get truly good training loss on an autoregressive training objective, you probably need to have some sort of intelligence-like or agency-like dynamic. But much more importantly, you need a truly vast amount of knowledge. So most of the explanation for the good performance comes from the knowledge, not the intelligence-like dynamic.
(Ah, but intelligence is more general, so maybe we'd expect it to show up in lots of datapoints, thereby making up a relatively big chunk of the training objective? I don't think so, for two reasons: 1) a lot of datapoints don't really require much intelligence to predict, 2) there are other not-very-intelligence-requiring things like grammar or certain aspects of vocabulary which do show up in a really big chunk.)
Would "the neural network has learned a lookup table with a compressed version of the dataset and interpolates on that in order to output its answers" count as an explanation of the low dataset loss?
(Note, this phrasing kind of makes it sound too simple. Since the explanations you are seeking presumably don't come with the dataset baked-in as a thing they can reference primitively, presumably the actual formal statement would need to include this entire compressed lookup table. Also, I'm imagining a case where there isn't really a "compression algorithm" because the compression is intimately tied up with the neural network itself, and so it's full of ad-hoc cases.)
Like I guess from an alignment perspective this could still be useful because it would be nice to know to what extent "bag of heuristics" holds, and this is basically a formalization of that. But at the same time, I already roughly speaking (with lots of asterisks, but not ones that seem likely to be addressed by this) expect this to hold, and it doesn't really rule out other dangers (like those heuristics could interact in a problematic way), so it seems kind of like it would just lead to a dead-end from my perspective.
If this is meant to be a weakening of NP vs co-NP, what do you make of the stronger statement that NP = co-NP? As I understand it, most complexity theorists think this is false. Do you have any reason to think that your conjecture is much, much more likely to hold than NP = co-NP, or do you also think NP = co-NP could hold?
Maybe I'm missing something, but if we are estimating $P(X_i)$, how can we also have $X_i$ on the RHS?
These probabilities are used for scoring predictions over the observed variables once the market resolves, so at that point we "don't need" $P(X_i)$ because we already know what $X_i$ is. The only reason we compute it is so we can reward people who got the prediction right long ago before $X_i$ was known.
And what is the adjustment $+(1-X_i)(1-q_{i,j})$? Why is that there?
$X_i q_{i,j} + (1-X_i)(1-q_{i,j})$ is equivalent to "$q_{i,j}$ if $X_i = 1$; $1-q_{i,j}$ if $X_i = 0$". It's basically a way to mathematize the "contingency table" aspect.
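To make it concrete, here's a minimal sketch of that scoring term (variable names mirroring the $X_i$ and $q_{i,j}$ from the thread, my own toy numbers):

```python
# Minimal sketch: the score is just the probability you assigned to whatever
# actually happened, X*q + (1 - X)*(1 - q).

def resolution_score(X: int, q: float) -> float:
    """X is the resolved outcome (0 or 1), q is the predicted P(X = 1)."""
    return X * q + (1 - X) * (1 - q)

print(resolution_score(1, 0.9))  # 0.9: said 90% and it happened
print(resolution_score(0, 0.9))  # 0.1: said 90% and it didn't happen
```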
And they wouldn't be getting any profit. (In the updated comment, I noted it's only the profit that measures your trouble.)
Exports and imports are tricky but very important to take into account here because they have two important properties:
* They are "subtracted off" the GDP numbers in my explanation above (e.g. if you import a natural resource, then that would be considered part of the GDP of the other country, not your country)
* They determine the currency exchange rates (since the exchange rate must equal the ratio of imports to exports, assuming savings and bonds are negligible or otherwise appropriately accounted for) and thereby the GDP comparisons across different countries at any given time
Prices decompose into cost and profit. The profit is determined by how much trouble the purchaser would be in if the seller didn't exist (since e.g. if there's other sellers, the purchaser could buy from those). The cost is determined by how much demand there is for the underlying resources in other areas, so it basically is how much trouble the purchaser imposes on others by getting the item. Most products are either cost-constrained (where price is mostly cost) or high-margin (where price is mostly profit).
GDP is price times transaction volume, so it's the sum of total costs and total profits in a society. The profit portion of GDP reflects the extent to which the economy has monopolized activities into central nodes that contribute to fragility, while the cost portion of GDP reflects the extent to which the economy is resource-constrained.
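As a toy illustration of the decomposition (all sectors and numbers made up):

```python
# Toy illustration: GDP as price * volume summed over transactions,
# decomposed into a cost portion and a profit portion.

sectors = {
    # name: (unit_cost, unit_profit, transaction_volume)
    "groceries": (4.0, 0.2, 1000),  # cost-constrained: price is mostly cost
    "software":  (1.0, 9.0, 100),   # high-margin: price is mostly profit
}

total_cost = sum(cost * volume for cost, profit, volume in sectors.values())
total_profit = sum(profit * volume for cost, profit, volume in sectors.values())
gdp = total_cost + total_profit  # same as summing price * volume directly

print(f"GDP = {gdp}, cost portion = {total_cost}, profit portion = {total_profit}")
```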
The biggest costs in a modern economy are typically labor and land, and land is typically just a labor cost by proxy (land in the middle of nowhere is way cheaper, but it's harder to hire people there). The majority of the economy is cost-constrained, so for that majority, GDP reflects underpopulation. The tech sector and financial investment sector have high profit margins, which reflects their tendency to monopolize the management of resources.
Low GDP reflects slack. Because of diminishing marginal returns and queuing considerations, ideally one should have some slack, since then there's abundance of resources and easy competition, driving prices down and thus leading to low GDP at high quality of life. However, slack also leads to conflict because of reduced opportunity cost. This conflict can be reduced with policing, but that increases authoritarianism. This leads to a tradeoff between high GDP and high tension (as seen in the west) vs low GDP and high authoritarianism (as seen in the east) vs low GDP and high conflict (as seen in the south).
Hmm... Issue is it also depends on centralization. For a bunch of independent transactions, fragility goes up with the square root of the count rather than the raw count. In practice the transactions in a large economy are very much not independent, but the "troubles" might be.
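A quick simulation of the independent case (my own sketch, made-up numbers):

```python
import numpy as np

# Sketch of the independent case: each of n transactions contributes an
# independent shock with standard deviation 1. The spread of the total grows
# like sqrt(n), while the transaction count (and hence GDP) grows like n.

rng = np.random.default_rng(0)
for n in [100, 1_000, 10_000]:
    totals = rng.normal(0, 1, size=(1000, n)).sum(axis=1)  # 1000 simulated economies
    print(f"n={n}: std of total ~ {totals.std():.1f}, sqrt(n) = {np.sqrt(n):.1f}")
```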
It's elementary that the derivative approaches zero when one of the inputs to a softmax is significantly bigger than the others. Then when applying the chain rule, this entire pathway for the gradient gets knocked out.
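A quick numerical illustration (my own numpy sketch, with made-up logits):

```python
import numpy as np

# The softmax Jacobian is J_ij = s_i * (delta_ij - s_j). Once one logit
# dominates, s is nearly one-hot and every entry of J is ~0, so any gradient
# routed through this softmax gets knocked out by the chain rule.

def softmax_jacobian(logits):
    s = np.exp(logits - logits.max())
    s /= s.sum()
    return np.diag(s) - np.outer(s, s)

print(np.abs(softmax_jacobian(np.array([1.0, 0.0, 0.0]))).max())   # ~0.24
print(np.abs(softmax_jacobian(np.array([10.0, 0.0, 0.0]))).max())  # ~1e-4
```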
I don't know to what extent it comes up with modern day LLMs. Certainly I bet one could generate a lot of interpretability work within the linear approximation regime. I guess at some point it reduces to the question of why to do mechanistic interpretability in the first place.
Framing: Prices reflect how much trouble purchasers would be in if the seller didn't exist. GDP multiplies prices by transaction volume, so it measures the fragility of the economy.
I would be satisfied with integrated gradients too. There are certain cases where pure gradient-based attributions predictably don't work (most notably when a softmax is saturated) and those are the ones I'm worried about (since it seems backwards to ignore all the things that a network has learned to reliably do when trying to attribute things, as they are presumably some of the most important structure in the network).
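For concreteness, a minimal integrated-gradients sketch (toy torch function of my own choosing, not anyone's actual model), mainly to show why it still gives credit through a saturated softmax where the plain gradient doesn't:

```python
import torch

# Integrated gradients: the attribution for input i is (x_i - x0_i) times the
# average gradient along the straight path from the baseline x0 to x.

def f(x):
    return torch.softmax(x, dim=-1)[0]  # toy scalar output

def integrated_gradients(x, x0, steps=64):
    total = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        xi = (x0 + alpha * (x - x0)).requires_grad_(True)
        (grad,) = torch.autograd.grad(f(xi), xi)
        total += grad
    return (x - x0) * total / steps

x = torch.tensor([10.0, 0.0, 0.0])
x0 = torch.zeros(3)
print(integrated_gradients(x, x0))  # sizable attribution to x[0]

xg = x.clone().requires_grad_(True)
print(torch.autograd.grad(f(xg), xg)[0])  # ~0 everywhere: the softmax is saturated at x
```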
I would be curious what you think of [this](https://www.lesswrong.com/posts/TCmj9Wdp5vwsaHAas/knocking-down-my-ai-optimist-strawman).
Ah, I see. I've gone and edited my rebuttal to be more forceful and less hedgy.
Strawman and steelman arguments are the same thing. It's just better to label them "strawman" rather than "steelman" so you don't overestimate their value.
I'm not sure what you mean by "K-means clustering baseline (with K=1)". I would think the K in K-means stands for the number of means you use, so with K=1, you're just taking the mean direction of the weights. I would expect this to explain maybe 50% of the variance (or less), not 90% of the variance.
But anyway, under my current model (roughly Why I'm bearish on mechanistic interpretability: the shards are not in the network + Binary encoding as a simple explicit construction for superposition) it seems about as natural to use K-means as it does to use SAEs, and not necessarily an issue if K-means outperforms SAEs. If we imagine that the meaning is given not by the dimensions of the space but rather by regions/points/volumes of the space, then K-means seems like a perfectly cromulent quantization for identifying these volumes. The major issue is where we go from here.
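For reference, here's the kind of computation I'd assume is behind a "fraction of variance explained by K-means" number, run on synthetic stand-in data (so the exact percentages here carry no evidence about the real activations):

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of "fraction of variance explained by a K-means quantization" on
# synthetic stand-in data (NOT the real activations or SAE features). With
# K=1 the single centroid is just the overall mean vector, so the baseline is
# "reconstruct every activation by the mean".

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 64)) + 1.0  # stand-in activations with a shared mean component

for k in [1, 16, 256]:
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(X)
    reconstruction = km.cluster_centers_[km.labels_]
    explained = 1 - ((X - reconstruction) ** 2).sum() / (X ** 2).sum()
    print(k, round(float(explained), 3))
```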
Is there really some particular human whose volition you'd like to coherently extrapolate over eternity but where you refrain because you're worried it will generate infighting? Or is it more like, you can't think of anybody you'd pick, so you want a decision procedure to pick for you?
If there is some particular human, who is it?
If you wanted to have an unaligned LLM that doesn't abuse humans, couldn't you just never sample from it after training it to be unaligned?
How would such a validator react if you tried to hack the LLM by threatening to kill all humans unless it complies?
If so, it seems that all you need to do to detect any unwanted behaviour from a superintelligent system is to feed all output from constituent LLMs to a simpler LLM to detect output that looks like it's leading towards unaligned behaviour. Only once the output has been verified, pass it on to the next system (including looping it back to itself to output more tokens). If it fails verification immediately stop the whole system.
What is unaligned behavior and what does output that leads to it look like?
Correspondingly the importance I assign to increasing the intelligence of humans has drastically increased.
I feel like human intelligence enhancement would increase capabilities development faster than alignment development, maybe unless you've got a lot of discrimination in favor of only increasing the intelligence of those involved with alignment.
Those are sort of counterstatements against doom, explaining that you don't see certain problems that doomers raise. But the OP is more of an attempt to make an independently standing argument about what is present.
It's still not obvious to me why adversaries are a big issue. If I'm acting against an adversary, it seems like I won't make counter-plans that lead to lots of side-effects either, for the same reasons they won't.
I mean, we can start by noticing that historically, optimization in the presence of adversaries has led to huge things. The world wars wrecked Europe. States and large bureaucratic organizations probably exist mainly as a consequence of farm raids. The immune system tends to stress out the body a lot when it is dealing with an infection. The nuclear arms race never actually triggered a nuclear war, but it created existential risk for humanity, and even though the destruction never came, it still made people quite afraid of e.g. nuclear power. Etc.
Now, why does trying to destroy a hostile optimizer tend to cause so much destruction? I feel like the question almost answers itself.
Or if we want to go mechanistic about it, one of the ways to fight back against the Nazis is with bombs, which deliver a sudden shockwave of energy that destroys Nazi structures and everything else. It's almost constitutive of the alignment problem: we have a lot of ways of influencing the world a great deal, but those methods do not discriminate between good and evil/bad.
From an abstract point of view, many coherence theorems rely on e.g. Dutch books, and thus become much more applicable in the case of adversaries. The coherence theorem "if an agent achieves its goals robustly regardless of environment, then it stops people who want to shut it down" can be trivially restated as "either an agent does not achieve its goals robustly regardless of environment, or it stops people who want to shut it down", and here non-adversarial agents should obviously choose the former branch (to be corrigible, you need to not achieve your goals in an environment where someone is trying to shut you down).
From a more strategic point of view, when dealing with an adversary, you tend to become a lot more constrained on resources because if the adversary can find a way to drain your resources, then it will try to do so. Ways to succeed include:
- Making it harder for people to trick you into losing resources, by e.g. making it harder for people to predict you, being less trusting of what people tell you, and winning as quickly as possible
- Gaining more resources by grabbing them from elsewhere
Also, in an adversarial context, a natural prior is that inconveniences are there for a reason, namely to interfere with you. This tends to make enemies.
I think mesa-optimizers could be a major problem, but there are good odds we live in a world where they aren't. Why do I think they're plausible? Because optimization is a pretty natural capability, and a mind being/becoming an optimizer at the top level doesn't seem like a very complex claim, so I assign decent odds to it. There's some weak evidence in favour of this too, e.g. humans not optimizing for what the local, myopic evolutionary optimizer acting on them is optimizing for, coherence theorems, etc. But that's not super strong, and there are other simple hypotheses for how things go, so I don't assign more than like 10% credence to the hypothesis.
Mesa-optimizers definitely exist to varying degrees, but they generally try not to get too involved with other things. Mechanistically, we can attribute this to imitation learning, since they're trying to mimic humans' tendency to stitch together strategies in a reasonable way. Abstractly, the friendliness of instrumental goals shows us why unbounded unfriendly utility maximizers are not the only or even the main attractor here.
(... Some people might say that we have a mathematical model of unbounded unfriendly utility maximizers but not of friendlier bounded instrumental optimizers. But those people are wrong because the model of utility maximizers assumes we have an epistemic oracle to handle the updating, prediction and optimization for us, and really that's the computationally heavy part. One of the advantages of more bounded optimization like in the OP is that it ought to be more computationally tractable because different parts of the plans interfere less with each other. It's not really fair to say that we know how utility maximizers work when they outsource the important part to the assumptions.)
Gases typically aren't assembled by trillions of repetitions of isolating an atom and inserting it into a container. Gas canisters are (I assume) assembled by e.g. compressing some reservoir (even simply a fraction of the atmosphere) or via a chemical reaction that produces the gas, and in these cases such procedures constitute the long-tailed variable that I am talking about in this series. (They are large relative to the individual particle velocities, and the particle velocities are a diminished form of the creation procedure, as e.g. some ways of creating the gas leave it hotter.) Gases in nature also have long-tailed causes, e.g. the atmosphere is collected due to the Earth's gravitational pull. (I think particles in outer space would technically not constitute a gas, but their velocities are AFAIK long-tailed due to coming from quasars and such.)
Generally you wouldn't since it's busy using that matter/energy for whatever you asked it to do. If you wanted to use it, presumably you could turn down its intensity, or maybe it exposes some simplified summary that it uses to coordinate economies of scale.
Once you start getting involved with governance, you're going to need law enforcement and defense, which is an adversarial context and thus means the whole instrumental goal niceness argument collapses.
If you're assuming that verification is easier than generation, you're pretty much a non-player when it comes to alignment.
I'm not interested in your key property, I'm interested in a more proper end-to-end description. Like superficially this just sounds like it immediately runs into the failure mode John Wentworth described last time, but your description is kind of too vague to say for sure.
I was considering doing something like this, but I kept getting stuck at the issue that it doesn't seem like gradients are an accurate attribution method. Have you tried comparing the attribution made by the gradients to a more straightforward attribution based on the counterfactual of enabling vs disabling a network component, to check how accurate they are? I guess I would especially be curious about its accuracy on real-world data, even if that data is relatively simple.
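Roughly the kind of comparison I have in mind, as a sketch (toy torch model and hypothetical setup, not your actual code):

```python
import torch
import torch.nn as nn

# Attribute the output to one input feature via gradient * input, and via the
# counterfactual of zeroing that feature, then compare. The two agree only to
# the extent the network is locally linear in that feature.

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.GELU(), nn.Linear(16, 1))
x = torch.randn(1, 8, requires_grad=True)

out = model(x)[0, 0]
(grad,) = torch.autograd.grad(out, x)

j = 3
grad_attr = (grad[0, j] * x[0, j]).item()              # gradient * input attribution

x_ablated = x.detach().clone()
x_ablated[0, j] = 0.0                                  # disable feature j
ablation_attr = (out - model(x_ablated)[0, 0]).item()  # counterfactual attribution

print(grad_attr, ablation_attr)  # generally not equal
```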
I don't understand your whole end-to-end point, like how does this connect to making AIs produce texts on alignment, and how does that lead to a pivotal act?
For the former I'd need to hear your favorite argument in favor of the neurosis that inner alignment is a major problem.
For the latter, in the presence of adversaries, every subgoal has to be robust against those adversaries, which is very unfriendly.
I don't really understand how you expect this line of thought to play out. Are you arguing e.g. Sam Altman would start using OpenAI to enforce his own personal moral opinions, even when they are extremely unpopular?
I don't think the people who develop AGI have clear or coherent wishes for how the AGI should treat most other people.
Most places where AI or alignment are applied are more convoluted cases where lots of people are involved. It's generally not economically feasible to develop AGI for a single person, so it doesn't really happen.
If you want AIs to produce a lot of text on AI alignment and moral philosophy, you can already do that now without worrying that the AIs in question will take over the world.
If you want to figure out how to achieve good results when making the AI handle various human conflicts, you can't really know how to adapt and improve it without actually involving it in those conflicts.
However, if something like the plan from John Wentworth's post worked, this would be a really useful way to make automated AI alignment schemes that are safe, so I do think it indirectly gives us safety.
How?
Also, entirely removing inner alignment problems/mesa optimization should cut down doom probabilities, especially for Eliezer Yudkowsky and John Wentworth, so I'd encourage you to write up your results on that line of argument anyway.
I didn't really get any further than John Wentworth's post here. But also I've been a lot less spooked by LLMs than Eliezer Yudkowsky.
Pursuit of money is an extremely special instrumental goal whose properties you shouldn't generalize to other goals in your theory of instrumental convergence. (And I could imagine it should be narrowed down further, e.g. into those who want to support the state vs those who want money by whichever means including scamming the state.)
Is your mother currently spending a lot of her time writing novels?
What I eventually realized is that this line of argument is a perfect rebuttal of the whole mesa-optimization neurosis that has popped up, but it doesn't actually give us AI safety because it completely breaks down once you apply it to e.g. law enforcement or warfare.
(Certifications and regulations promise to solve this, but they face the same problem: they don't know what requirements to put up, an alignment problem.)
Thesis: Everything is alignment-constrained, nothing is capabilities-constrained.
Examples:
- "Whenever you hear a headline that a medication kills cancer cells in a petri dish, remember that so does a gun." Healthcare is probably one of the biggest constraints on humanity, but the hard part is in coming up with an intervention that precisely targets the thing you want to treat, I think often because knowing what exactly that thing is is hard.
- Housing is also obviously a huge constraint, mainly due to NIMBYism. But the idea that NIMBYism is due to people using their housing for investments seems kind of like a cope, because then you'd expect that when cheap housing gets built, the backlash is mainly about dropping investment value. But the vibe I get is people are mainly upset about crime, smells, unruly children in schools, etc., due to bad people moving in. Basically high housing prices function as a substitute for police, immigration rules and teacher authority, and those in turn are compromised less because we don't know how to e.g. arm people or discipline children, and more because we aren't confident enough about the targeting (alignment problem), and because we have a hope that bad people can be reformed if we could just solve what's wrong with them (again an alignment problem, because that requires defining what's wrong with them).
- Education is expensive and doesn't work very well; a major constraint on society. Yet those who get educated do get given exams which assess whether they've picked up stuff from the education, and they perform reasonably well. Seems a substantial part of the issue is that they get educated in the wrong things, an alignment problem.
- American GDP is the highest it's ever been, yet its elections are devolving into choosing between scammers. It's not even a question of ignorance, since it's pretty well-known that it's scammy (consider also that patriotism is at an all-time low).
Exercise: Think about some tough problem, then think about what capabilities you need to solve that problem, and whether you even know what the problem is well enough that you can pick some relevant capabilities.
From your talk on tensors, I am sure it will not surprise you at all to know that the sandwich thing itself (mapping from operators to operators) is often called a superoperator.
Oh it does surprise me, superoperators are a physics term but I just know linear algebra and dabble in physics, so I didn't know that one. Like I'd think of it as the functor over vector spaces that maps .
I think the reason it is the way it is is that there isn't a clear line between operators that modify the state and those that represent measurements. For example, the Hamiltonian operator evolves the state with time. But taking the trace of the Hamiltonian operator applied to the state gives the expectation value of the energy.
Hm, I guess it's true that we'd usually think of the matrix exponential as mapping $\psi$ to $e^{-iHt}\psi$, rather than as mapping $\rho$ to $e^{-iHt}\rho\,e^{iHt}$. I guess it's easy enough to set up a differential equation for the latter, but it's much less elegant than the usual form.
Yes, applying a (0, 2) tensor to a (2, 0) tensor is like taking the trace of their composition if they were both regarded as linear maps.
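Spelled out in components (my own notation), writing $\Omega$ and $T$ for the matrices of components, $\Omega_{ij} = \omega_{ij}$ and $T_{ij} = T^{ij}$:

```latex
\omega(T) \;=\; \omega_{ij}\,T^{ij}
\;=\; \operatorname{tr}\!\left(\Omega^{\top} T\right)
\;=\; \operatorname{tr}\!\left(\Omega\, T\right) \quad \text{when } \Omega = \Omega^{\top}.
```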
Anyway for operators that are supposed to modify a state, like annihilation/creation or time-evolution, I would be inclined to model it as linear maps/(1, 1)-tensors like in the OP. It was specifically for observables that I meant it seemed most natural to use (0, 2) tensors.
It's a density matrix to density matrix map.
I thought they were typically wavefunction to wavefunction maps, and they need some sort of sandwiching to apply to density matrices?
Dominance is (a certain kind of) nonlinearity on a single locus, epistasis is nonlinearity across different loci.
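A minimal toy model of the distinction (my notation; $x_1, x_2 \in \{0, 1, 2\}$ are allele counts at two loci):

```latex
y \;=\; \mu
\;+\; \underbrace{a_1 x_1 + a_2 x_2}_{\text{additive}}
\;+\; \underbrace{d_1\,\mathbf{1}[x_1 = 1]}_{\text{dominance: nonlinearity within locus 1}}
\;+\; \underbrace{e_{12}\,x_1 x_2}_{\text{epistasis: nonlinearity across loci}}
```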
I feel like for observables it's more intuitive for them to be (0, 2) tensors (bilinear forms) whereas for density matrices it's more intuitive for them to be (2, 0) tensors. But maybe I'm missing something about the math that makes this problematic, since I haven't done many quantum calculations.
The way I (computer scientist who dabbles in physics, so YMMV I might be wrong) understand the physics here:
- Feynman diagrams are basically a Taylor expansion of a physical system in terms of the strength of some interaction,
- To avoid using these Taylor expansions for everything, one tries to modify the parameters of the model to take a summary of the effects into account; for instance one distinguishes between the "bare mass", which doesn't take various interactions into account, versus the "effective mass", which does,
- Sometimes e.g. the Taylor series don't converge (or some integrals people derived from the Taylor expansions don't converge), but you know what the summary parameters turn out to be in the real world, and so you can just pretend the calculations do converge into whatever gives the right summary parameters (which makes sense if we understand the model is just an approximation given what's known and at some point the model breaks down).
Meanwhile, for ML:
- Causal scrubbing is pretty related to Taylor expansions, which makes it pretty related to Feynman diagrams,
- However, it lacks any model for the non-interaction/non-Taylor-expanded effects, and so there's no parameters that these Taylor expansions can be "absorbed into",
- While Taylor expansions can obviously provide infinite detail, nobody has yet produced any calculations for causal scrubbing that fail to converge rather than simply being unreasonably complicated. This is partly because without the model above, there's not many calculations that are worth running.
I've been thinking about various ideas for Taylor expansions and approximations for neural networks, but I kept running in circles, and the main issue I've ended up with is this:
In order to eliminate noise, we need to decide what really matters and what doesn't really matter. However, purely from within the network, we have no principled way of doing so. The closest we get is what affects the token predictions for the network, but even that contains too many unimportant parameters, because if e.g. the network goes off on a tangent but then returns to the main topic, maybe that tangent didn't matter and we're fine with the approximation discarding it.
As a simplified version of this objection, consider that the token probabilities are not the final output of the network, but instead the tokens are sampled and fed back into the network, which means that really the final layer of the network is connected back to the first layer through a non-differentiable function. (The non-differentiability interferes with any interpretability method based on derivatives....)
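A minimal illustration of that break (torch sketch under the standard sampling setup):

```python
import torch

# The sampled token is an integer index, so the computation graph stops there,
# and nothing that happens after the token is fed back in can send gradients
# to the layers that produced the logits.

logits = torch.randn(5, requires_grad=True)
probs = torch.softmax(logits, dim=-1)
token = torch.multinomial(probs, num_samples=1)  # integer tensor, no grad_fn

print(probs.requires_grad)  # True  -- still connected to the logits
print(token.requires_grad)  # False -- gradient chain ends here
print(token.grad_fn)        # None
```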
What we really want to know is the impact of the network in real-world scenarios, but it's hard to notice the main consequences of the network, and even if we could, it's hard to set up measurable toy models of them. Once we had such toy models, it's unclear whether we'd even need elaborate techniques for interpreting them. If, for instance, Claude is breaking a generation of young nerds by responding "Very insightful!" to any nonsensical thing they say, that doesn't really need any advanced interpretability techniques to be understood.
Oh I meant a (2, 0) tensor.