Posts

Consequentialism is in the Stars not Ourselves 2023-04-24T00:02:39.803Z
Is "Strong Coherence" Anti-Natural? 2023-04-11T06:22:22.525Z
Feature Request: Right Click to Copy LaTeX 2023-04-08T23:27:30.151Z
Beren's "Deconfusing Direct vs Amortised Optimisation" 2023-04-07T08:57:59.777Z
Is "Recursive Self-Improvement" Relevant in the Deep Learning Paradigm? 2023-04-06T07:13:31.579Z
Orthogonality is Expensive 2023-04-03T00:43:34.566Z
"Dangers of AI and the End of Human Civilization" Yudkowsky on Lex Fridman 2023-03-30T15:43:32.814Z
Sparks of Artificial General Intelligence: Early experiments with GPT-4 | Microsoft Research 2023-03-23T05:45:12.004Z
Contra "Strong Coherence" 2023-03-04T20:05:28.346Z
Incentives and Selection: A Missing Frame From AI Threat Discussions? 2023-02-26T01:18:13.487Z
Is InstructGPT Following Instructions in Other Languages Surprising? 2023-02-13T23:26:28.594Z
Do the Safety Properties of Powerful AI Systems Need to be Adversarially Robust? Why? 2023-02-09T13:36:00.325Z
[About Me] Cinera's Home Page 2023-02-07T12:56:10.518Z
What Are The Preconditions/Prerequisites for Asymptotic Analysis? 2023-02-03T21:26:48.987Z
AI Risk Management Framework | NIST 2023-01-26T15:27:19.807Z
"Heretical Thoughts on AI" by Eli Dourado 2023-01-19T16:11:56.567Z
How Does the Human Brain Compare to Deep Learning on Sample Efficiency? 2023-01-15T19:49:33.301Z
Tracr: Compiled Transformers as a Laboratory for Interpretability | DeepMind 2023-01-13T16:53:10.279Z
Microsoft Plans to Invest $10B in OpenAI; $3B Invested to Date | Fortune 2023-01-12T03:55:10.248Z
Open & Welcome Thread - January 2023 2023-01-07T11:16:18.646Z
[Discussion] How Broad is the Human Cognitive Spectrum? 2023-01-07T00:56:21.456Z
The Limit of Language Models 2023-01-06T23:53:32.638Z
Default Sort for Shortforms is Very Bad; How Do I Change It? 2023-01-02T21:50:12.779Z
Why The Focus on Expected Utility Maximisers? 2022-12-27T15:49:36.536Z
Against Agents as an Approach to Aligned Transformative AI 2022-12-27T00:47:03.706Z
Contra Steiner on Too Many Natural Abstractions 2022-12-24T17:42:53.828Z
[DISC] Are Values Robust? 2022-12-21T01:00:29.939Z
[Incomplete] What is Computation Anyway? 2022-12-14T16:17:43.093Z
Why I'm Sceptical of Foom 2022-12-08T10:01:01.397Z
"Far Coordination" 2022-11-23T17:14:41.830Z
X-risk Mitigation Does Actually Require Longtermism 2022-11-14T12:54:53.237Z
In Defence of Temporal Discounting in Longtermist Ethics 2022-11-13T21:54:38.706Z
Should I Pursue a PhD? 2022-11-06T10:58:51.241Z
[Sketch] Validity Criterion for Logical Counterfactuals 2022-10-11T13:31:22.918Z
"Free Will" in a Computational Universe 2022-09-22T21:25:26.087Z
Initial Thoughts on Dissolving "Couldness" 2022-09-22T21:23:31.510Z
Are Human Brains Universal? 2022-09-15T15:15:21.302Z
Why Do People Think Humans Are Stupid? 2022-09-14T13:55:30.929Z
Are Speed Superintelligences Feasible for Modern ML Techniques? 2022-09-14T12:59:10.058Z
Would a Misaligned SSI Really Kill Us All? 2022-09-14T12:15:31.440Z
Why do People Think Intelligence Will be "Easy"? 2022-09-12T17:32:38.185Z
A First Attempt to Dissolve "Is Consciousness Reducible?" 2022-08-20T23:39:35.433Z
What are the Limits on Computability? 2022-08-20T22:02:39.695Z
So, I Want to Be a "Thinkfluencer" 2022-08-20T18:39:05.825Z
Is General Intelligence "Compact"? 2022-07-04T13:27:32.166Z
What is the LessWrong Logo(?) Supposed to Represent? 2022-06-28T20:20:52.321Z
[Yann Lecun] A Path Towards Autonomous Machine Intelligence 2022-06-27T19:24:50.543Z
Why Are Posts in the Sequences Tagged [Personal Blog] Instead of [Frontpage]? 2022-06-27T09:35:26.778Z
[LQ] Some Thoughts on Messaging Around AI Risk 2022-06-25T13:53:26.833Z
How Do You Quantify [Physics Interfacing] Real World Capabilities? 2022-06-14T14:49:47.264Z

Comments

Comment by DragonGod on Uncertainty in all its flavours · 2024-01-20T22:46:49.706Z · LW · GW

i.e. if each forecaster  has an first-order belief , and  is your second-order belief about which forecaster is correct, then  should be your first-order belief about the election.

I think there might be a typo here. Did you instead mean to write: "" for the second order beliefs about the forecasters?

Comment by DragonGod on Order Matters for Deceptive Alignment · 2023-07-26T15:50:15.982Z · LW · GW

The claim is that given the presence of differential adversarial examples, the optimisation process would adjust the parameters of the model such that its optimisation target is the base goal.

Comment by DragonGod on DragonGod's Shortform · 2023-07-25T21:16:20.418Z · LW · GW

That was it, thanks!

Comment by DragonGod on DragonGod's Shortform · 2023-07-25T20:30:29.625Z · LW · GW

Probably sometime last year, I posted on Twitter something like: "agent values are defined on agent world models" (or similar) with a link to a LessWrong post (I think the author was John Wentworth).

I'm now looking for that LessWrong post.

My Twitter account is private and search is broken for private accounts, so I haven't been able to track down the tweet. If anyone has guesses for what the post I may have been referring to was, do please send it my way.

Comment by DragonGod on DragonGod's Shortform · 2023-07-24T08:08:16.174Z · LW · GW

Most of the catastrophic risk from AI still lies in superhuman agentic systems.

Current frontier systems are not that (and IMO not poised to become that in the very immediate future).

I think AI risk advocates should be clear that they're not saying GPT-5/Claude Next is an existential threat to humanity.

[Unless they actually believe that. But if they don't, I'm a bit concerned that their message is being rounded up to that, and when such systems don't reveal themselves to be catastrophically dangerous, it might erode their credibility.]

Comment by DragonGod on DragonGod's Shortform · 2023-07-22T12:46:08.296Z · LW · GW

Immigration is such a tight constraint for me.

My next career steps after I'm done with my TCS Masters are primarily bottlenecked by "what allows me to remain in the UK" and then "keeps me on track to contribute to technical AI safety research".

What I would like to do for the next 1 - 2 years ("independent research"/ "further upskilling to get into a top ML PhD program") is not all that viable a path given my visa constraints.

Above all, I want to avoid wasting N more years by taking a detour through software engineering again so I can get Visa sponsorship.

[I'm not conscientious enough to pursue AI safety research/ML upskilling while managing a full time job.]

Might just try and see if I can pursue a TCS PhD at my current university and do TCS research that I think would be valuable for theoretical AI safety research.

The main detriment of that is I'd have to spend N more years in <city> and I was really hoping to come down to London.

Advice very, very welcome.

[Not sure who to tag.]

Comment by DragonGod on Hedonic Loops and Taming RL · 2023-07-20T15:36:47.064Z · LW · GW

Specifically, the experiments by Morrison and Berridge demonstrated that by intervening on the hypothalamic valuation circuits, it is possible to adjust policies zero-shot such that the animal has never experienced a previously repulsive stimulus as pleasurable.

I find this a bit confusing as worded, is something missing?

Comment by DragonGod on DragonGod's Shortform · 2023-07-10T18:20:04.726Z · LW · GW

Does anyone know a ChatGPT plugin for browsing documents/webpages that can read LaTeX?

The plugin I currently use (Link Reader) strips out the LaTeX in its payload, and so GPT-4 ends up hallucinating the LaTeX content of the pages I'm feeding it.

Comment by DragonGod on Ruby's Public Drafts & Working Notes · 2023-07-08T20:39:40.058Z · LW · GW

How frequent are moderation actions? Is this discussion about saving moderator effort (by banning someone before you have to remove the rate-limited quantity of their bad posts), or something else? I really worry about "quality improvement by prior restraint" - both because low-value posts aren't that harmful, they get downvoted and ignored pretty easily, and because it can take YEARS of trial-and-error for someone to become a good participant in LW-style discussions, and I don't want to make it impossible for the true newbies (young people discovering this style for the first time) to try, fail, learn, try, fail, get frustrated, go away, come back, and be slightly-above-neutral for a bit before really hitting their stride.

I agree with Dagon here.

Six years ago after discovering HPMOR and reading part (most?) of the Sequences, I was a bad participant in old LW and rationalist subreddits.

I would probably have been quickly banned on current LW.

It really just takes a while for people new to LW-like norms to adjust.

Comment by DragonGod on DragonGod's Shortform · 2023-07-08T20:29:42.739Z · LW · GW

I find noticing surprise more valuable than noticing confusion.

Hindsight bias and post hoc rationalisations make it easy for us to gloss over events that were a priori unexpected.

Comment by DragonGod on Crystal Healing — or the Origins of Expected Utility Maximizers · 2023-06-26T11:06:55.061Z · LW · GW

I think the model of "a composition of subagents with total orders on their preferences" is a descriptive model of inexploitable incomplete preferences, and not a mechanistic model. At least, that was how I interpreted "Why Subagents?".

I read @johnswentworth as making the claim that such preferences could be modelled as a vetocracy of VNM rational agents, not as claiming that humans (or other objects of study) are mechanistically composed of discrete parts that are themselves VNM rational.

 

I'd be more interested/excited by a refutation on the grounds of: "incomplete inexploitable preferences are not necessarily adequately modelled as a vetocracy of parts with complete preferences". VNM rationality and expected utility maximisation are mostly used as descriptive rather than mechanistic tools anyway.

Comment by DragonGod on Crystal Healing — or the Origins of Expected Utility Maximizers · 2023-06-25T20:11:44.539Z · LW · GW

Oh, do please share.

Comment by DragonGod on Crystal Healing — or the Origins of Expected Utility Maximizers · 2023-06-25T19:57:10.550Z · LW · GW

Suppose it is offered (by a third party) to switch  and then 

Seems incomplete (pun acknowledged). I feel like there's something missing after "to switch" (e.g. "to switch from A to B" or similar).

Comment by DragonGod on Crystal Healing — or the Origins of Expected Utility Maximizers · 2023-06-25T19:46:31.080Z · LW · GW

Another example is an agent through time where as in the Steward of Myselves

This links to Scott Garrabrant's page, not to any particular post. Perhaps you want to review that?

I think you meant to link to: Tyranny of the Epistemic Majority.

Comment by DragonGod on AXRP Episode 22 - Shard Theory with Quintin Pope · 2023-06-17T06:41:18.256Z · LW · GW

It's working now!

https://podcasts.google.com/feed/aHR0cHM6Ly9heHJwb2RjYXN0LmxpYnN5bi5jb20vcnNz/episode/ODVlM2RkNmItMTdkZi00MWYwLTg2YjAtOWIxY2JkOTBlYjgw?ep=14

Comment by DragonGod on AXRP Episode 22 - Shard Theory with Quintin Pope · 2023-06-16T19:03:46.850Z · LW · GW

Ditto for me.

Comment by DragonGod on AXRP Episode 22 - Shard Theory with Quintin Pope · 2023-06-16T19:02:55.579Z · LW · GW

I've been waiting for this!

Comment by DragonGod on ARC's first technical report: Eliciting Latent Knowledge · 2023-06-15T14:08:36.730Z · LW · GW

We aren’t offering these criteria as necessary for “knowledge”—we could imagine a breaker proposing a counterexample where all of these properties are satisfied but where intuitively M didn’t really know that A′ was a better answer. In that case the builder will try to make a convincing argument to that effect.

Bolded should be sufficient.

Comment by DragonGod on In Defense of Wrapper-Minds · 2023-06-05T08:44:19.181Z · LW · GW

In fact, I'm pretty sure that's how humans work most of the time. We use the general-intelligence machinery to "steer" ourselves at a high level, and most of the time, we operate on autopilot.

Yeah, I agree with this. But I don't think the human system aggregates into any kind of coherent total optimiser. Humans don't have an objective function (not even approximately?).

A human is not well modelled as a wrapper mind; do you disagree?

Comment by DragonGod on In Defense of Wrapper-Minds · 2023-06-04T19:04:11.991Z · LW · GW

Thus, any greedy optimization algorithm would convergently shape its agent to not only pursue , but to maximize for 's pursuit — at the expense of everything else.

Conditional on:

  1. Such a system being reachable/accessible to our local/greedy optimisation process
  2. Such a system being actually performant according to the selection metric of our optimisation process 

 

I'm pretty sceptical of #2. I'm sceptical that systems that perform inference via direct optimisation over their outputs are competitive in rich/complex environments. 

Such optimisation is very computationally intensive compared to executing learned heuristics, and it seems likely that the selection process would have access to much more compute than the selected system. 

See also: "Consequentialism is in the Stars not Ourselves". 

Comment by DragonGod on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-04T06:50:33.955Z · LW · GW

Do please read the post. Being able to predict human text requires vastly superhuman capabilities, because predicting human text requires predicting the processes that generated said text. And large tracts of text are just reporting on empirical features of the world.

Alternatively, just read the post I linked.

Comment by DragonGod on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-04T06:49:08.975Z · LW · GW

Oh gosh, how did I hallucinate that?

Comment by DragonGod on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-03T13:51:15.512Z · LW · GW

In what sense are they "not trying their hardest"?

Comment by DragonGod on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-03T13:22:52.815Z · LW · GW

It is not clear how they could ever develop strongly superhuman intelligence by being superhuman at predicting human text.

"The upper bound of what can be learned from a dataset is not the most capable trajectory, but the conditional structure of the universe implicated by their sum".

Comment by DragonGod on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-03T13:12:06.738Z · LW · GW

which is indifferent to the simplicify of the architecture the insight lets you find.

The bolded should be "simplicity". 

Comment by DragonGod on AI Alignment Research Engineer Accelerator (ARENA): call for applicants · 2023-05-26T03:29:52.199Z · LW · GW

Sorry, please where can I get access to the curriculum (including the reading material and exercises) if I want to study it independently?

The chapter pages on the website don't seem to list full curricula.

Comment by DragonGod on DragonGod's Shortform · 2023-05-11T00:54:46.564Z · LW · GW

If you define your utility function over histories, then every behaviour is maximising an expected utility function no?

Even behaviour that is money pumped?

I mean you can't money pump any preference over histories anyway without time travel.

The Dutchbook arguments apply when your utility function is defined over your current state with respect to some resource?

I feel like once you define utility function over histories, you lose the force of the coherence arguments?

What would it look like to not behave as if maximising an expected utility function for a utility function defined over histories?
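
A sketch of the usual construction behind the first question (the notation is an assumption for illustration: $\pi$ is any deterministic policy, $h$ a complete history, and $\pi'$ any alternative policy). Define a utility function that simply rewards histories in which the agent did what $\pi$ does:

```latex
U_\pi(h) =
\begin{cases}
1 & \text{if every action in } h \text{ is the one } \pi \text{ would take,} \\
0 & \text{otherwise,}
\end{cases}
\qquad\text{so}\qquad
\mathbb{E}_{h \sim \pi}\left[ U_\pi(h) \right] = 1 \;\ge\; \mathbb{E}_{h \sim \pi'}\left[ U_\pi(h) \right]
\quad \text{for every policy } \pi'.
```

Under this construction any behaviour at all, including money-pumped behaviour, maximises the expected value of some utility function over histories, which is why the coherence arguments seem to lose their force in that setting.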

Comment by DragonGod on DragonGod's Shortform · 2023-05-09T18:23:45.142Z · LW · GW

My contention is that I don't think the preconditions hold.

Agents don't fail to be VNM coherent by having incoherent preferences given the axioms of VNM. They fail to be VNM coherent by violating the axioms themselves.

Completeness is wrong for humans, and with incomplete preferences you can be non exploitable even without admitting a single fixed utility function over world states.

Comment by DragonGod on DragonGod's Shortform · 2023-05-08T17:15:48.465Z · LW · GW

Yeah, I think the preconditions of VNM straightforwardly just don't apply to generally intelligent systems.

Comment by DragonGod on Orthogonal's Formal-Goal Alignment theory of change · 2023-05-08T00:49:12.782Z · LW · GW

Not at all convinced that "strong agents pursuing a coherent goal" is a viable form for generally capable systems that operate in the real world, and the assumption that it is hasn't been sufficiently motivated.

Comment by DragonGod on DragonGod's Shortform · 2023-05-06T19:50:01.715Z · LW · GW

What are the best arguments that expected utility maximisers are adequate (descriptive if not mechanistic) models of powerful AI systems?

[I want to address them in my piece arguing the contrary position.]

Comment by DragonGod on LLMs and computation complexity · 2023-04-30T04:13:04.414Z · LW · GW

Caveat to the caveat:

The solution is IMO just to consider the number of computations performed per generated token as some function of the model size, and once we've identified a suitable asymptotic order on the function, we can say intelligent things like "the smallest network capable of solving a problem in complexity class C of size N is X".

Or if our asymptotic bounds are not tight enough:

"No economically feasible LLM can solve problems in complexity class C of size >= N".

(Where economically feasible may be something defined by aggregate global economic resources or similar, depending on how tight you want the bound to be.)

Regardless, we can still obtain meaningful impossibility results.

Comment by DragonGod on LLMs and computation complexity · 2023-04-30T02:27:24.482Z · LW · GW

The solution is IMO just to consider the number of computations performed per generated token as some function of the model size, and once we've identified a suitable asymptotic order on the function, we can say intelligent things like "the smallest network capable of solving a problem in complexity class C of size N is X".

Or if our asymptotic bounds are not tight enough:

"No economically feasible LLM can solve problems in complexity class C of size >= N".

(Where economically feasible may be something defined by aggregate global economic resources or similar, depending on how tight you want the bound to be.)

Regardless, we can still obtain meaningful impossibility results.
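
A minimal sketch of how such a bound could be computed, under loudly stated assumptions: the cost model (about 2 FLOPs per parameter per generated token) is a common rule of thumb rather than anything established in this thread, and the function names are hypothetical. The budget is deliberately generous, since an upper bound on available compute is all an impossibility result needs.

```python
import math

def flops_per_token(n_params):
    """Assumed cost model: one forward pass costs about 2 FLOPs per parameter."""
    return 2 * n_params

def smallest_model(required_steps, context_tokens):
    """Smallest parameter count whose total compute over the context covers the problem.

    Toy assumption: useful computation is at most
    flops_per_token(n_params) * context_tokens.
    """
    return math.ceil(required_steps / (2 * context_tokens))

def largest_solvable_size(step_count, n_params, context_tokens, max_size=10**6):
    """Largest problem size N such that step_count(N) fits in the compute budget."""
    budget = flops_per_token(n_params) * context_tokens
    size = 0
    while size < max_size and step_count(size + 1) <= budget:
        size += 1
    return size
```

Here `step_count` stands in for the asymptotic order of the problem (for example `lambda n: n ** 3`); anything above the returned size is out of reach for that model under this toy cost model, however the tokens are spent.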

Comment by DragonGod on LLMs and computation complexity · 2023-04-29T23:23:26.986Z · LW · GW

Very big caveat: the LLM doesn't actually perform O(1) computations per generated token.

The number of computational steps performed per generated token scales with network size: https://www.lesswrong.com/posts/XNBZPbxyYhmoqD87F/llms-and-computation-complexity?commentId=QWEwFcMLFQ678y5Jp

Comment by DragonGod on LLMs and computation complexity · 2023-04-29T22:15:08.466Z · LW · GW

Strongly upvoted.

Short but powerful.

Tl;Dr: LLMs perform O(1) computational steps per generated token and this is true regardless of the generated token.

The LLM sees each token in its context window when generating the next token, so it can compute problems in O(n^2) [where n is the context window size].

LLMs can get around the computational requirements by "showing their working" and simulating a mechanical computer (one without backtracking, so not Turing complete) in their context window.

This only works if the context window is large enough to contain the workings for the entire algorithm.

Thus LLMs can perform matrix multiplication when showing workings, but not when asked to compute it without showing workings.

Important fundamental limitation on the current paradigm.

We can now name with certainty tasks that GPT will never be able to solve (e.g. beating Stockfish at chess, because chess is combinatorial and the LLM can't search the game tree to any depth) no matter how far it's scaled up.

This is a very powerful argument.

Comment by DragonGod on DragonGod's Shortform · 2023-04-25T13:34:04.565Z · LW · GW

A reason I mood affiliate with shard theory so much is that like...

I'll have some contention with the orthodox ontology for technical AI safety and be struggling to adequately communicate it, and then I'll later listen to a post/podcast/talk by Quintin Pope/Alex Turner, or someone else trying to distill shard theory and then see the exact same contention I was trying to present expressed more eloquently/with more justification.

One example is that like I had independently concluded that "finding an objective function that was existentially safe when optimised by an arbitrarily powerful optimisation process is probably the wrong way to think about a solution to the alignment problem".

And then today I discovered that Alex Turner advances a similar contention in "Inner and outer alignment decompose one hard problem into two extremely hard problems".

Shard theory also seems to nicely encapsulate my intuitions that we shouldn't think about powerful AI systems as optimisation processes with a system wide objective that they are consistently pursuing.

Or just the general intuitions that our theories of intelligent systems should adequately describe the generally intelligent systems we actually have access to and that theories that don't even aspire to do that are ill motivated.

 

It is the case that I don't think I can adequately communicate shard theory to a disbeliever, so on reflection there's some scepticism that I properly understand it.

 

That said, the vibes are right.

Comment by DragonGod on DragonGod's Shortform · 2023-04-25T13:32:17.546Z · LW · GW

"All you need is to delay doom by one more year per year and then you're in business" — Paul Christiano.

Comment by DragonGod on Consequentialism is in the Stars not Ourselves · 2023-04-24T00:05:15.493Z · LW · GW

Took this to drafts for a few days with the intention of refining it and polishing the ontology behind the post.

I ended up not doing that as much, because the improvements I was making to the underlying ontology felt better presented as a standalone post, so I mostly factored them out of this one.

I'm not satisfied with this post as is, but there's some kernel of insight here that I think is valuable, and I'd want to be able to refer to the basic thrust of this post/some arguments made in it elsewhere.

I may make further edits to it in future.

Comment by DragonGod on Risks from Learned Optimization: Conclusion and Related Work · 2023-04-21T18:17:49.951Z · LW · GW

It should be noted, however, that while inner alignment is a robustness problem, the occurrence of unintended mesa-optimization is not. If the base optimizer's objective is not a perfect measure of the human's goals, then preventing mesa-optimizers from arising at all might be the preferred outcome. In such a case, it might be desirable to create a system that is strongly optimized for the base objective within some limited domain without that system engaging in open-ended optimization in new environments.(11) One possible way to accomplish this might be to use strong optimization at the level of the base optimizer during training to prevent strong optimization at the level of the mesa-optimizer.(11)

I don't really follow this paragraph, especially the bolded.

Why would mesa-optimisation arising when not intended not be an issue for robustness (the mesa-optimiser could generalise capably out of distribution but pursue the wrong goal)?

The rest of the post also doesn't defend that claim; it feels more like defending a claim like:

The non-occurrence of mesa-optimisation is not a robustness problem.

Comment by DragonGod on Deceptive Alignment · 2023-04-20T18:49:10.628Z · LW · GW

Is this a correct representation of corrigible alignment:

  1. The mesa-optimizer (MO) has a proxy of the base objective that it's optimising for.
  2. As more information about the base objective is received, MO updates the proxy.
  3. With sufficient information, the proxy may converge to a proper representation of the base objective.
  4. Example: a model-free RL algorithm whose policy is argmax over actions with respect to its state-action value function 
    1. The base objective is the reward signal
    2. The value function serves as a proxy for the base objective.
    3. The value function is updated as future reward signals are received, gradually refining the proxy to better align with the base objective.
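
The example in point 4 can be sketched as tabular Q-learning. This is only an illustration: the single-state, two-action environment, the hyperparameters, and all function names are assumptions, not taken from the post being discussed.

```python
import random
from collections import defaultdict

def greedy_action(q, state, actions):
    """Policy: argmax over actions of the state-action value function."""
    return max(actions, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) towards the observed reward signal."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])

def train(steps=200, seed=0):
    """Hypothetical environment: one state, 'right' pays 1, 'left' pays 0."""
    rng = random.Random(seed)
    actions = ["left", "right"]
    q = defaultdict(float)  # the proxy starts out uninformative (all zeros)
    state = "s"
    for _ in range(steps):
        # Epsilon-greedy exploration so both actions get sampled.
        if rng.random() < 0.2:
            action = rng.choice(actions)
        else:
            action = greedy_action(q, state, actions)
        reward = 1.0 if action == "right" else 0.0  # the base objective
        q_update(q, state, action, reward, state, actions)
    return q, actions
```

The table `q` plays the role of the proxy: it is never handed the base objective directly, only reward samples, and the greedy policy follows whatever the proxy currently says.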

Comment by DragonGod on AI Alignment Research Engineer Accelerator (ARENA): call for applicants · 2023-04-20T15:32:37.907Z · LW · GW

Sounds good, will do!

Comment by DragonGod on AI Alignment Research Engineer Accelerator (ARENA): call for applicants · 2023-04-20T13:53:40.415Z · LW · GW

March 22nd is when my first exam starts.

It finishes June 2nd.

Is it possible for me to delay my start a bit?

Comment by DragonGod on Consequentialism is in the Stars not Ourselves · 2023-04-19T21:22:36.012Z · LW · GW

I'm gestating on this post. I suspect part of my original framing was confused, and so I'll just let the ideas ferment some more.

Comment by DragonGod on Consequentialism is in the Stars not Ourselves · 2023-04-19T07:59:58.520Z · LW · GW

Yeah for humans in particular, I think the statement is not true of solely biological evolution.

But also, I'm not sure you're looking at it on the right level. Any animal presumably does many bits worth of selection in a given day, but the durable/macroscale effects are better explained by evolutionary forces acting on the population than actions of different animals within their lifetimes.

Or maybe this is just a confused way to think/talk about it.

Comment by DragonGod on Consequentialism is in the Stars not Ourselves · 2023-04-19T07:41:53.386Z · LW · GW

I could change that. I was thinking of work done in terms of bits of selection.

Though I don't think that statement is true of humans unless you also include cultural memetic evolution (which I think you should).

Comment by DragonGod on Consequentialism is in the Stars not Ourselves · 2023-04-18T23:39:59.162Z · LW · GW

Currently using "task specific"/"total".

Comment by DragonGod on Consequentialism is in the Stars not Ourselves · 2023-04-18T23:23:36.089Z · LW · GW

Yeah, I'm aware.

I would edit the post once I have better naming/terminology for the distinction I was trying to draw.

It happened as something like "humans optimise for local objectives/specific tasks" which eventually collapsed to "local optimisation".

[Do please suggest better adjectives!]

Comment by DragonGod on Consequentialism is in the Stars not Ourselves · 2023-04-18T18:05:02.597Z · LW · GW

Hmm, the etymology was that I was using "local optimisation" to refer to the kind of task specific optimisation humans do.

And global was the natural term to refer to the kind of optimisation I was claiming humans don't do but which an expected utility maximiser does.

Comment by DragonGod on Consequentialism is in the Stars not Ourselves · 2023-04-18T17:36:42.222Z · LW · GW

The "global" here means that all actions/outputs are optimising towards the same fixed goal(s):

Local Optimisation

  • Involves deploying optimisation (search, planning, etc.) to accomplish specific tasks (e.g., making a good move in chess, winning a chess game, planning a trip, solving a puzzle).
  • The choice of local tasks is not determined as part of this framework; local tasks could be subproblems of another optimisation problem (e.g., picking a good next move as part of winning a chess game), generated via learned heuristics, etc.

 

Global Optimisation

  • Entails consistently employing optimisation throughout a system's active lifetime to achieve fixed terminal goals.
  • All actions flow from their expected consequences on realising the terminal goals (e.g., if a terminal goal is to maximise the number of lives saved, every activity—eating, sleeping, playing, working—is performed because it is the most tractable way to maximise the expected number of future lives saved at that point in time).

Comment by DragonGod on DragonGod's Shortform · 2023-04-18T15:51:32.186Z · LW · GW

Consequentialism is in the Stars not Ourselves?

Still thinking about consequentialism and optimisation. I've argued that global optimisation for an objective function is so computationally intractable as to be prohibited by the laws of physics of our universe. Yet it's clearly the case that e.g. evolution is globally optimising for inclusive genetic fitness (or perhaps patterns that more successfully propagate themselves if you're taking a broader view). I think examining why evolution is able to successfully globally optimise for its objective function would be enlightening.

Using the learned optimisation ontology, we have an outer selection process (evolution, stochastic gradient descent, etc.) that selects intelligent systems according to their performance on a given metric (inclusive genetic fitness and loss respectively).


Local vs Global Optimisation

Optimisation here refers to "direct" optimisation, a mechanistic procedure for internally searching through an appropriate space for elements that maximise or minimise the value of some objective function defined on that space.

 

Local Optimisation

  • Involves deploying optimisation (search, planning, etc.) to accomplish specific tasks (e.g., making a good move in chess, winning a chess game, planning a trip, solving a puzzle).
  • The choice of local tasks is not determined as part of this framework; local tasks could be subproblems of another optimisation problem (e.g., picking a good next move as part of winning a chess game), generated via learned heuristics, etc.

 

Global Optimisation

  • Entails consistently employing optimisation throughout a system's active lifetime to achieve fixed terminal goals.
  • All actions flow from their expected consequences on realising the terminal goals (e.g., if a terminal goal is to maximise the number of lives saved, every activity—eating, sleeping, playing, working—is performed because it is the most tractable way to maximise the expected number of future lives saved at that point in time).

Outer Optimisation Processes as Global Optimisers

As best as I can tell, there are some distinctive features of outer optimisation processes that facilitate global optimisation:

 

Access to more compute power

  • ML algorithms are trained with significantly (often orders of magnitude) more compute than is used for running inference
    • Economic incentives favour this: centralisation of ML training allows training ML models on bespoke hardware in massive data centres, but the models need to be cheap enough to run profitably
      • Optimising inference costs has led to "overtraining" smaller models
    • In some cases trained models are intended to be run on consumer hardware or edge computing devices
  • Evolutionary processes have access to the cumulative compute power of the entire population under selection, and they play out across many generations of the population
  • This (much) greater compute allows outer optimisation processes to apply (many?) more bits of selection towards their objective functions

 

Relaxation of time constraints

  • Real-time inference imposes a strict bound on how much computation can be performed in a single time step
    • Robotics, self driving cars, game AIs, etc. must make actions within fractions of a second
      • Sometimes hundreds of actions in a second
    • User facing cognitive models (e.g., LLMs) are also subject to latency constraints
      • Though people may be more willing to wait longer for responses if the output of the models is sufficiently better
  • In contrast, the outer selection process just has a lot more time to perform optimisation
    • ML training runs already last several months, and the only bound on length of training runs seems to be hardware obsolescence
      • For sufficiently long training runs, it becomes better to wait for the next hardware generation before starting training
      • Training runs exceeding a year seem possible eventually, especially if loss keeps going down with scale
    • Evolution occurs over timescales of hundreds to thousands of generations of an organism

 

Solving a (much) simpler optimisation problem

  • Outer optimisation processes evaluate the objective function by using actual consequences along single trajectories for selection, as opposed to modeling expected consequences across multiple future trajectories and searching for trajectories with better expected consequences.
    • Evaluating future consequences of actions is difficult (e.g., what is the expected value of writing this LessWrong shortform on the number of future lives saved?)
    • Chaos sharply limits how far into the future we can meaningfully predict (regardless of how much computational resources one has), which is not an issue when using actual consequences for selection
      • In a sense, outer optimisation processes get the "evaluate consequences of this trajectory on the objective" for free, and that's just a very difficult (and in some cases outright intractable) computational problem
    • The usage of actual consequences applies over longer time horizons
      • Evolution has a potentially indefinite/unbounded horizon
        • And has been optimising for much longer than any 
      • Current ML training generally operates with fixed-length horizons but uses actual/exact consequences of trajectories over said horizons.
  • Outer optimisation processes select for a policy that performs well according to the objective function on the training distribution, rather than selecting actions that optimise an objective function directly in deployment.
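
To put rough numbers on the gap between modelling expected consequences and observing actual ones, here is a toy count. It assumes an exhaustive planner with a fixed branching factor and an outer process that scores each candidate on a single realised trajectory; both function names and the cost model are hypothetical simplifications.

```python
def planner_evaluations(branching, horizon):
    """Trajectories an exhaustive planner must model: branching ** horizon."""
    return branching ** horizon

def selection_evaluations(population, generations):
    """Trajectories an outer process observes: one realised rollout per candidate."""
    return population * generations
```

The planner's count grows exponentially in the horizon, while the selection process's count grows only linearly in population size and number of generations, and each of its evaluations is supplied by the world rather than computed.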

 

Summary

Outer optimisation processes are more capable of global optimisation due to their access to more compute power, relaxed time constraints, and just generally facing a much simpler optimisation problem (evaluations of exact consequences are provided for free [and over longer time horizons], amortisation of optimisation costs, etc). 

These factors enable outer optimisation processes to globally optimise for their selection metric in a way that is infeasible for the intelligent systems they select for.


Cc: @beren, @tailcalled, @Chris_Leong, @JustisMills.