Posts

New intro textbook on AIXI 2024-05-11T18:18:50.945Z
Towards a formalization of the agent structure problem 2024-04-29T20:28:15.190Z
Raemon's Deliberate (“Purposeful?”) Practice Club 2023-11-14T18:24:19.335Z
My research agenda in agent foundations 2023-06-28T18:00:27.813Z
Why don't quantilizers also cut off the upper end of the distribution? 2023-05-15T01:40:50.183Z
Draft: Inferring minimizers 2023-04-01T20:20:48.676Z
Draft: Detecting optimization 2023-03-29T20:17:46.642Z
Draft: The optimization toolbox 2023-03-28T20:40:38.165Z
Draft: Introduction to optimization 2023-03-26T17:25:55.093Z
Dealing with infinite entropy 2023-03-01T15:01:40.400Z
Top YouTube channel Veritasium releases video on Sleeping Beauty Problem 2023-02-11T20:36:57.089Z
My first year in AI alignment 2023-01-02T01:28:03.470Z
How can one literally buy time (from x-risk) with money? 2022-12-13T19:24:06.225Z
Consider using reversible automata for alignment research 2022-12-11T01:00:24.223Z
A dynamical systems primer for entropy and optimization 2022-12-10T00:13:13.984Z
How do finite factored sets compare with phase space? 2022-12-06T20:05:54.061Z
Alex_Altair's Shortform 2022-11-27T18:59:05.193Z
When do you visualize (or not) while doing math? 2022-11-23T20:15:20.885Z
"Normal" is the equilibrium state of past optimization processes 2022-10-30T19:03:19.328Z
Introduction to abstract entropy 2022-10-20T21:03:02.486Z
Deliberate practice for research? 2022-10-08T03:45:21.773Z
How dangerous is human-level AI? 2022-06-10T17:38:27.643Z
I'm trying out "asteroid mindset" 2022-06-03T13:35:48.614Z
Request for small textbook recommendations 2022-05-25T22:19:56.549Z
Why hasn't deep learning generated significant economic value yet? 2022-04-30T20:27:54.554Z
Does the rationalist community have a membership funnel? 2022-04-12T18:44:48.795Z
When to use "meta" vs "self-reference", "recursive", etc. 2022-04-06T04:57:47.405Z
Giving calibrated time estimates can have social costs 2022-04-03T21:23:46.590Z
Ways to invest your wealth if you believe in a high-variance future? 2022-03-11T16:07:49.302Z
How can I see a daily count of all words I type? 2022-02-04T02:05:04.483Z
Tag for AI alignment? 2022-01-02T18:55:45.228Z
Is Omicron less severe? 2021-12-30T23:14:37.292Z
Confusion about Sequences and Review Sequences 2021-12-21T18:13:13.394Z
How I became a person who wakes up early 2021-12-18T18:41:45.732Z
What's the status of third vaccine doses? 2021-08-04T02:22:52.317Z
A new option for building lumenators 2021-07-12T23:45:34.294Z
Bay and/or Global Solstice* Job Search (2021 - 2022) 2021-03-16T00:21:10.290Z
Where does the phrase "central example" come from? 2021-03-12T05:57:49.253Z
One Year of Pomodoros 2020-12-31T04:42:31.274Z
Logistics for the 2020 online Secular Solstice* 2020-12-03T00:08:37.401Z
The Bay Area Solstice 2014-12-03T22:33:17.760Z
Mathematical Measures of Optimization Power 2012-11-24T10:55:17.145Z
Modifying Universal Intelligence Measure 2012-09-18T23:44:08.864Z
An Intuitive Explanation of Solomonoff Induction 2012-07-11T08:05:20.544Z
Should LW have a separate AI section? 2012-07-10T01:42:39.259Z
How Bayes' theorem is consistent with Solomonoff induction 2012-07-09T22:16:02.312Z
Computation Hazards 2012-06-13T21:49:19.986Z
How do you notice when you're procrastinating? 2012-03-02T09:25:08.917Z
[LINK] The NYT on Everyday Habits 2012-02-18T08:23:32.820Z
[LINK] Learning enhancement using "transcranial direct current stimulation" 2012-01-26T16:18:55.714Z

Comments

Comment by Alex_Altair on Towards a formalization of the agent structure problem · 2024-05-19T20:21:56.518Z · LW · GW

Yep, that paper has been on my list for a while, but I have thus far been unable to penetrate the formalisms that the Causal Incentive Group uses. This paper in particular also seems to have some fairly limiting assumptions in the theorem.

Comment by Alex_Altair on Fund me please - I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University · 2024-05-19T01:05:38.915Z · LW · GW

Hey Johannes, I don't quite know how to say this, but I think this post is a red flag about your mental health. "I work so hard that I ignore broken glass and then walk on it" is not healthy.

I've been around the community a long time and have seen several people have psychotic episodes. This is exactly the kind of thing I start seeing before they do.

I'm not saying it's 90% likely, or anything. Just that it's definitely high enough for me to need to say something. Please try to seek out some resources to get you more grounded.

Comment by Alex_Altair on Ilya Sutskever and Jan Leike resign from OpenAI [updated] · 2024-05-15T19:44:22.816Z · LW · GW

I really appreciate this comment!

And yeah, that's why I said only "Note that...", and not something like "don't trust this guy". I think the content of the article is probably true, and maybe it's Metz who wrote it just because AI is his beat. But I do also hold tiny models that say "maybe he dislikes us" and also something about the "questionable understanding" etc that habryka mentions below. AFAICT I'm not internally seething or anything, I just have a yellow-flag attached to this name.

Comment by Alex_Altair on Ilya Sutskever and Jan Leike resign from OpenAI [updated] · 2024-05-15T01:38:05.681Z · LW · GW

Note that the NYT article is by Cade Metz.

Comment by Alex_Altair on New intro textbook on AIXI · 2024-05-14T15:40:18.509Z · LW · GW

I think the biggest thing I like about it is that it exists! Someone tried to make a fully formalized agent model, and it worked. As mentioned above it's got some big problems, but it helps enormously to have some ground to stand on to try to build on further.

Comment by Alex_Altair on Partitioned Book Club · 2024-05-13T05:55:44.258Z · LW · GW

I love this idea!

Some other books this could work for:

  • The Ancestor's Tale
  • The Art of Game Design
  • The Anthropocene Reviewed
  • The LessWrong review books 😉

Many textbooks have a few initial "core" chapters, and then otherwise a bunch of independent chapters on applications or assorted advanced topics.

Comment by Alex_Altair on Open Thread Spring 2024 · 2024-05-11T05:54:12.198Z · LW · GW

You can "bookmark" a post, is that equivalent to your desired "read later"?

Comment by Alex_Altair on New to this community · 2024-05-11T05:18:18.324Z · LW · GW

Welcome kjsisco! One good place to start interacting with others here is on the current open thread.

Comment by Alex_Altair on Manifund Q1 Retro: Learnings from impact certs · 2024-05-02T17:23:20.503Z · LW · GW

[link to about]

Link missing

Comment by Alex_Altair on Towards a formalization of the agent structure problem · 2024-04-30T17:16:21.865Z · LW · GW

Hm... so anything that measures degree of agent structure should register a policy with a sub-agent as having some agent structure. But yeah, I haven't thought much about the scenarios where there are multiple agents inside the policy. The agent structure problem is trying to use performance to find a minimum measure of agent structure. So if there was an agent hiding in there that didn't impact the performance during the measured time interval, then it wouldn't be detected (although it would detect it "in the limit").

That said, we're not actually talking about how to measure degree of agent structure yet. It seems plausible to me that whatever method one uses to do that could be adapted to find multiple agents.

Comment by Alex_Altair on Forget Everything (Statistical Mechanics Part 1) · 2024-04-25T03:42:15.042Z · LW · GW

This will make more sense if you have a basic grasp on quantum mechanics, but if you're willing to accept "energy comes in discrete units" as a premise then you should be mostly fine.

My current understanding is that QM is not-at-all needed to make sense of stat mech. Instead, the thing where energy is equally likely to be in any of the degrees of freedom just comes from using a measure over your phase space such that the dynamical law of your system preserves that measure!
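To make "the dynamical law preserves that measure" concrete, here's a minimal sketch that checks whether one step of a system preserves phase-space area (the Liouville measure) by estimating the Jacobian determinant of the step. The harmonic oscillator, the leapfrog integrator, the step size `h`, and the damped variant used for contrast are all illustrative assumptions, not anything from the post.

```python
def leapfrog(q, p, h=0.1):
    """One leapfrog step for the harmonic oscillator H = (q**2 + p**2) / 2."""
    p_half = p - 0.5 * h * q
    q_new = q + h * p_half
    p_new = p_half - 0.5 * h * q_new
    return q_new, p_new

def damped(q, p, h=0.1):
    """A dissipative variant for contrast: shrink momentum, then take a leapfrog step."""
    return leapfrog(q, 0.9 * p, h)

def jacobian_determinant(step, q, p, eps=1e-6):
    """Finite-difference estimate of det(d(q', p') / d(q, p)) at the point (q, p)."""
    q0, p0 = step(q, p)
    q1, p1 = step(q + eps, p)   # response to a small nudge in q
    q2, p2 = step(q, p + eps)   # response to a small nudge in p
    a, c = (q1 - q0) / eps, (p1 - p0) / eps
    b, d = (q2 - q0) / eps, (p2 - p0) / eps
    return a * d - b * c

print(jacobian_determinant(leapfrog, 1.0, 0.0))  # approximately 1.0: area preserved
print(jacobian_determinant(damped, 1.0, 0.0))    # approximately 0.9: area shrinks
```

A determinant of 1 everywhere means the step carries every region of phase space to a region of equal area; the damped map does not.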

Comment by Alex_Altair on Express interest in an "FHI of the West" · 2024-04-19T15:09:12.656Z · LW · GW

Maybe it could be FLCI to avoid collision with the existing FLI.

Comment by Alex_Altair on Express interest in an "FHI of the West" · 2024-04-18T14:25:14.595Z · LW · GW

I also think the name is off, but for a different reason. When I hear "the west" with no other context, I assume it means this, which doesn't make sense here, because the UK and FHI are very solidly part of The West. (I have not heard the "Harvard of the west" phrase and I'm guessing it's pretty darn obscure, especially to the international audience of LW.)

Comment by Alex_Altair on LessOnline (May 31—June 2, Berkeley, CA) · 2024-03-28T00:56:08.126Z · LW · GW

Feedback on the website: it's not clear to me what the difference is between LessOnline and the summer camp right after. Is the summer camp only something you go to if you're also going to Manifest? Is it the same as LessOnline but longer?

Comment by Alex_Altair on Natural Latents: The Concepts · 2024-03-25T02:41:50.396Z · LW · GW

Oh, no, I'm saying it's more like 2^8 afterwards. (Obviously it's more than that, but I think the exponent is closer to 8 than to a million.) I think having functioning vision at all brings it down to, I dunno, 2^10000. I think you would be hard pressed to name 500 attributes of mammals that you need to pay attention to to learn a new species.

Comment by Alex_Altair on Natural Latents: The Concepts · 2024-03-25T02:28:18.968Z · LW · GW

We then get around the 2^8000000 problem by having only a relatively very very small set of candidate “things” to which words might be attached.

A major way that we get around this is by having hierarchical abstractions. By the time I'm learning "dog" from 1-5 examples, I've already done enormous work in learning about objects, animals, something-like-mammals, heads, eyes, legs, etc. So when you point at five dogs and say "those form a group" I've already forged abstractions that handle almost all the information that makes them worth paying attention to, and now I'm just paying attention to a few differences from other mammals, like size, fur color, ear shape, etc.

I'm not sure how the rest of this post relates to this, but it didn't feel present; maybe it's one of the umpteen things you left out for the sake of introductory exposition.

Comment by Alex_Altair on Natural Latents: The Concepts · 2024-03-25T01:31:05.750Z · LW · GW

I've noticed you using the word "chaos" a few times across your posts. I think you're using it colloquially to mean something like "rapidly unpredictable", but it does have a technical meaning that doesn't always line up with how you use it, so it might be useful to distinguish it from a couple other things. Here's my current understanding of what some things mean. (All of these definitions and implications depend on a pile of finicky math and tend to have surprising counter-examples if you didn't define things just right, and definitions vary across sources.)

 

Sensitive to initial conditions. A system is sensitive to initial conditions if two points in its phase space will eventually diverge exponentially (at least) over time. This is one way to say that you'll rapidly lose information about a system, but it doesn't have to look chaotic. For example, say you have a system whose phase space is just the real line, and its dynamics over time is just that points get 10x farther from the origin every time step. Then, if you know the value of a point to ten decimal places of precision, after nine time steps you only know one decimal place of precision. (Although there are regions of the real line where you're still sure it doesn't reside, for example you're sure it's not closer to the origin.)
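A minimal sketch of the 10x example, assuming a concrete starting interval (the numbers are illustrative): it pushes an interval of candidate positions through the map and prints how the width grows tenfold per step while the lower endpoint only moves away from the origin. Exact `Fraction` arithmetic is used so floating-point rounding doesn't blur the picture.

```python
from fractions import Fraction

def step(x):
    """Toy dynamics on the real line: every point moves 10x farther from the origin."""
    return 10 * x

def interval_after(lo, hi, n_steps):
    """Push an interval of candidate positions forward n_steps times."""
    for _ in range(n_steps):
        lo, hi = step(lo), step(hi)
    return lo, hi

# Assumed measurement: the point is known to ten decimal places.
lo0, hi0 = Fraction("0.1234567890"), Fraction("0.1234567891")

for n in range(11):
    lo, hi = interval_after(lo0, hi0, n)
    print(f"step {n:2d}: width {float(hi - lo):.0e}, lower bound {float(lo):.4f}")
```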

Ergodic. A system is ergodic if (almost) every point in phase space will trace out a trajectory that gets arbitrarily close to every other point. This means that each point is some kind of chaotically unpredictable, because if it's been going for a while and you're not tracking it, you'll eventually end up with maximum uncertainty about where it is. But this doesn't imply sensitivity to initial conditions; there are systems that are ergodic, but where any pair of points will stay the same distance from each other. A simple example is where phase space is a circle, and the dynamics are that on each time step, you rotate each point around the circle by an irrational angle.
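A minimal sketch of the rotation example, assuming a rotation by $\sqrt{2} - 1$ of a full turn per step (any irrational fraction of a turn would do): two nearby points keep their initial separation (up to floating-point rounding), yet a single orbit comes closer and closer to a chosen target point the longer it runs. The target point and step counts are arbitrary choices.

```python
import math

ALPHA = math.sqrt(2) - 1  # assumed rotation: an irrational fraction of a full turn

def rotate(x):
    """One time step: rotate a point on the circle [0, 1) by ALPHA turns."""
    return (x + ALPHA) % 1.0

def circle_distance(a, b):
    """Distance between two points on a circle of circumference 1."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def closest_approach(start, target, n_steps):
    """Smallest distance from the orbit of `start` to `target` within n_steps."""
    x, best = start, circle_distance(start, target)
    for _ in range(n_steps):
        x = rotate(x)
        best = min(best, circle_distance(x, target))
    return best

# Two nearby points stay exactly as far apart as they started (no sensitivity)...
a, b = 0.0, 0.01
for _ in range(1000):
    a, b = rotate(a), rotate(b)
print(circle_distance(a, b))

# ...yet one orbit gets arbitrarily close to an arbitrary target point.
for n in (10, 100, 1000, 10000):
    print(n, closest_approach(0.0, 0.7071, n))
```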

Chaos. The formal characterization that people assign to this word was an active research topic for decades, but I think it's mostly settled now. My understanding is that it essentially means this:

  1. Your system has at least one point whose trajectory is ergodic, that is, it will get arbitrarily close to every other point in the phase space
  2. For every natural number n, there is a point in the phase space whose trajectory is periodic with period n. That is, after n time steps (and not before), it will return back exactly where it started. (Further, these periodic points are "dense", that is, every point in phase space has periodic points arbitrarily close to it).

The reason these two criteria yield (colloquially) chaotic behavior is, I think, reasonably intuitively understandable. Take a random point in its phase space. Assume it isn't one with a periodic trajectory (which will be true with "probability 1"). Instead it will be ergodic. That means it will eventually get arbitrarily close to all other points. But consider what happens when it gets close to one of the periodic trajectories; it will, at least for a while, act almost as though it has that period, until it drifts sufficiently far away. (This is using an unstated assumption that the dynamics of the system have a property where nearby points act similarly.) But it will eventually do this for every periodic trajectory. Therefore, there will be times when it's periodic very briefly, and times when it's periodic for a long time, et cetera. This makes it pretty unpredictable.
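The doubling map $x \mapsto 2x \bmod 1$ on $[0, 1)$ is a standard textbook example of a chaotic system: it has dense orbits, and its periodic points are dense. This sketch, using exact fractions, checks criterion 2 for small $n$ and then shows the "acts almost periodic for a while" behavior: a point starting $10^{-6}$ away from the period-3 orbit through $1/7$ tracks that cycle for 14 steps before the gap, which doubles every step, exceeds 0.01. The choice of map, orbit, offset, and tolerance are illustrative assumptions.

```python
from fractions import Fraction

def doubling(x):
    """The doubling map on [0, 1): x -> 2x mod 1."""
    return (2 * x) % 1

def exact_period(x, max_n=64):
    """Smallest n >= 1 with T^n(x) == x, or None if there is none up to max_n."""
    y = x
    for n in range(1, max_n + 1):
        y = doubling(y)
        if y == x:
            return n
    return None

def shadow_steps(periodic_point, offset, tol):
    """Count the time steps during which a nudged point stays within tol of the orbit."""
    p, x, steps = periodic_point, periodic_point + offset, 0
    while abs(x - p) < tol:
        steps += 1
        p, x = doubling(p), doubling(x)
    return steps

# Criterion 2: a point of exact period n exists for every n
# (0 for n = 1, and 1 / (2**n - 1) for n >= 2).
for n in range(1, 9):
    x = Fraction(0) if n == 1 else Fraction(1, 2**n - 1)
    print(n, x, exact_period(x))

# A point 10**-6 away from the period-3 orbit 1/7 -> 2/7 -> 4/7 -> 1/7
# mimics that cycle until the gap exceeds 0.01.
print(shadow_steps(Fraction(1, 7), Fraction(1, 10**6), Fraction(1, 100)))
```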

 

There are also connections between the above. You might have noticed that my example of a system that was sensitive to initial conditions but not ergodic or chaotic relied on having an unbounded phase space, where the two points both shot off to infinity. I think that if you have sensitivity to initial conditions and a bounded phase space, then you generally also have ergodic and chaotic behavior.

Anyway, I think "chaos" is a sexy/popular term to use to describe vaguely unpredictable systems, but almost all of the time you don't actually need to rely on the full technical criteria of it. I think this could be important for not leading readers into red-herring trails of investigation. For example, all of standard statistical mechanics only needs ergodicity.

Comment by Alex_Altair on Alex_Altair's Shortform · 2024-03-05T20:05:53.130Z · LW · GW

Has anyone checked out Nassim Nicholas Taleb's book Statistical Consequences of Fat Tails? I'm wondering where it lies on the spectrum from textbook to prolonged opinion piece. I'd love to read a textbook about the title.

Comment by Alex_Altair on Voting Results for the 2022 Review · 2024-02-28T17:48:10.468Z · LW · GW

Just noticing that every post has at least one negative vote, which feels interesting for some reason.

Comment by Alex_Altair on Dual Wielding Kindle Scribes · 2024-02-23T01:49:09.076Z · LW · GW

The e-ink tablet market has really diversified recently. I'd recommend that anyone interested look around at the options. My impression is that the Kindle Scribe is one of the least good ones (which doesn't mean it's bad).

Comment by Alex_Altair on Fixing The Good Regulator Theorem · 2024-02-20T16:33:56.502Z · LW · GW

Here's the arxiv version of the paper, with a bunch more content in appendices.

Comment by Alex_Altair on Where is the Town Square? · 2024-02-14T14:57:49.847Z · LW · GW

And, since I can't do everything: what popular platforms shouldn't I prioritize?

I think cross-posting between twitter, mastodon and bluesky would be pretty easy. And it would let you gather your own data on which platforms are worth continuing.

Comment by Alex_Altair on Choosing a book on causality · 2024-02-08T05:58:45.488Z · LW · GW

I looked at these several months ago and unfortunately recommend neither. Pearl's Causality is very dense, and not really a good introduction. The Primer is really egregiously riddled with errors; there seems to have been some problem with the publisher. And on top of that, I just found it not very well written.

I don't have a specific recommendation, but I believe that at this point there are a bunch of statistics textbooks that competently discuss the essential content of causal modelling; maybe check the reviews for some of those on Amazon.

Comment by Alex_Altair on D0TheMath's Shortform · 2024-01-14T02:03:22.128Z · LW · GW

One way that the analogy with code doesn't carry over is that in math, you often can't even begin to use a theorem if you don't know a lot of detail about what the objects in the theorem mean, and often knowing what they mean is pretty close to knowing why the theorems you're building on are true. Being handed a theorem is less like being handed an API and more like being handed a sentence in a foreign language. I can't begin to make use of the information content in the sentence until I learn what every symbol means and how the grammar works, and at that point I could have written the sentence myself.

Comment by Alex_Altair on The Perceptron Controversy · 2024-01-11T21:30:37.813Z · LW · GW

I'd recommend porting it over as a sequence instead of one big post (or maybe just port the first chunk as an intro post?). LW doesn't have a citation format, but you can use footnotes for it (and you can use the same footnote number in multiple places).

Comment by Alex_Altair on A model of research skill · 2024-01-10T22:23:05.541Z · LW · GW

I had a side project to get better at research in 2023. I found very few resources that were actually helpful to me. But here are some that I liked.

  • A few posts by Holden Karnofsky on Cold Takes, especially Useful Vices for Wicked Problems and Learning By Writing.
  • Diving into deliberate practice. Most easily read is the popsci book Peak. This book emphasizes "mental representations", which I find the most useful part of the method, though I think it's also the least supported by the science.
  • The popsci book Grit.
  • The book Ultralearning. Extremely skimmable, large collection of heuristics that I find essential for the "lean" style of research.
  • Reading a scattering of historical accounts of how researchers did their research, and how it came to be useful. (E.g. Newton, Einstein, Erdős, Shannon, Kolmogorov, and a long tail of less big names.)

(Many resources were not helpful for me for reasons that might not apply to others; I was already doing what they advised, or they were about how to succeed inside academia, or they were about emotional problems like lack of confidence or burnout. But, I think mostly I failed to find good resources because no one knows how to do good research.)

Comment by Alex_Altair on New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?" · 2024-01-02T22:47:59.119Z · LW · GW

Finally, I want to note an aspect of the discussion in the report that makes me quite uncomfortable: namely, it seems plausible to me that in addition to potentially posing existential risks to humanity, the sorts of AIs discussed in the report might well be moral patients in their own right.

I strongly appreciate this paragraph for stating this concern so emphatically. I think this possibility is strongly under-represented in the AI safety discussion as a whole.

Comment by Alex_Altair on The Plan - 2023 Version · 2024-01-02T22:01:19.617Z · LW · GW

I agree there's a core principle somewhere around the idea of "controllable implies understandable". But when I think about this with respect to humans studying biology, then there's another thought that comes to my mind: the things we want to control are not necessarily the things the system itself is controlling. For example, we would like to control the obesity crisis (and weight loss in general) but it's not clear that the biological system itself is controlling that. It almost certainly was successfully controlling it in the ancestral environment (and therefore it was understandable within that environment) but perhaps the environment has changed enough that it is now uncontrollable (and potentially not understandable). Cancer manages to successfully control the system in the sense of causing itself to happen, but that doesn't mean that our goal, "reliably stopping cancer", is understandable, since it is not a way that the system is controlling itself.

This mismatch seems pretty evidently applicable to AI alignment.

And perhaps the "environment" part is critical. A system being controllable in one environment doesn't imply it being controllable in a different (or broader) environment, and thus guaranteed understandability is also lost. This feels like an expression of misgeneralization.

Comment by Alex_Altair on Meaning & Agency · 2024-01-02T18:25:11.922Z · LW · GW

Looking back at Flint's work, I don't agree with this summary.

Ah, sorry, I wasn't intending for that to be a summary. I found Flint's framework very insightful, but after reading it I sort of just melded it into my own overall beliefs and understanding around optimization. I don't think he intended it to be a coherent or finished framework on its own, so I don't generally try to think "what does Flint's framework say about X?". I think its main influence on me was the whole idea of using dynamical systems and phase space as the basis for optimization. So, for example:

In any case, I agree that Flint's work also eliminates the need for an unnatural baseline in which we have to remove the agent.

I would say that working in the framework of dynamical systems is what lets one get a natural baseline against which to measure optimization, by comparing a given trajectory with all possible trajectories.

I think I could have some more response/commentary about each of your bullet points, but there's a background overarching thing that may be more useful to prod at. I have a clear (-feeling-to-me) distinction between "optimization" and "agent", which doesn't seem to be how you're using the words. The dynamical systems + Yudkowsky measure perspective is a great start on capturing the optimization concept, but it is agnostic about (my version of) the agent concept (except insofar as agents are a type of optimizer). It feels to me like the idea of endorsement you're developing here is cool and useful and is... related to optimization, but isn't the basis of optimization. So I agree that e.g. "endorsement" is closer to alignment, but also I don't think that "optimization" is supposed to be all that close to alignment; I'd reserve that for "agent". I think we'll need a few levels of formalization in agent foundations, and you're working toward a different level than those, and so these ideas aren't in conflict.

Breaking that down just a bit more: let's say that "alignment" refers to aligning the intentional goals of agents. I'd say that "optimization" is a more general phenomenon where some types of systems tend to move their state up an ordering; but that doesn't mean that it's "intentional", nor that that goal is cleanly encoded somewhere inside the system. So while you could say that two optimizing systems "are more aligned" if they move up similar state orderings, it would be awkward to talk about aligning them.

(My notion of) optimization has its own version of the thing you're calling "Vingean", which is that if I believe a process optimizes along a certain state ordering, but I have no beliefs about how it works on the inside, then I can still at least predict that the state will go up the ordering. I can predict that the car will arrive at the airport even though I don't know the turns. But this has nothing to do with the (optimization) process having beliefs or doing reasoning of any kind (which I think of as agent properties). For example I believe that there exists an optimization process such that mountains get worn down, and so I will predict it to happen, even though I know very little about the chemistry of erosion or rocks. And this is kinda like "endorsement", but it's not that the mountain has probability assignments or anything.

In fact I think it's just a version of what makes something a good abstraction; an abstraction is a compact model that allows you to make accurate predictions about outcomes without having to predict all intermediate steps. And all abstractions also have the property that if you have enough compute/etc. then you can just directly calculate the outcome based on lower-level physics, and don't need the abstraction to predict the outcome accurately.

I think that was a longer-winded way to say that I don't think your concepts in this post are replacements for the Yudkowsky/Flint optimization ideas; instead it sounds like you're saying "Assume the optimization process is of the kind that has beliefs and takes actions. Then we can define 'endorsement' as follows; ..."

Comment by Alex_Altair on Draft: Introduction to optimization · 2023-12-27T21:16:57.777Z · LW · GW

What's your preferred response/solution to ~"problems"(?) of events that have probability zero but occur nevertheless

My impression is that people have generally agreed that this paradox is resolved (=formally grounded) by measure theory. I know enough measure theory to know what it is but haven't gone out of my way to explore the corners of said paradoxes.

But you might be asking me about it in the framework of Yudkowsky's measure of optimization. Let's say the states are the real numbers in [0, 1] and the relevant ordering is the same as the one on the real numbers, and we're using the uniform measure over it. Then, even though the probability of getting any specific real number is zero, the probability mass we use to calculate bits of optimization power is all the probability mass below that number. In that case, all the resulting numbers would imply finite optimization power. ... except if we got the result that was exactly the number 0. But in that case, that would actually be infinitely surprising! And so the fact that the measure of optimization returns infinitely many bits matches intuition.
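A minimal sketch of that calculation, under the comment's assumptions (states in $[0, 1]$, the uniform measure, and counting the probability mass below the observed number): every outcome except exactly 0 gets a finite number of bits, even though each individual outcome has probability zero. The function name is hypothetical.

```python
import math

def optimization_bits(outcome):
    """Bits of optimization for an outcome in [0, 1] under the uniform measure,
    using the probability mass below the outcome."""
    if not 0.0 <= outcome <= 1.0:
        raise ValueError("outcome must lie in [0, 1]")
    mass_below = outcome            # uniform measure: P(X <= x) = x
    if mass_below == 0.0:
        return math.inf             # hitting exactly 0 is infinitely surprising
    return -math.log2(mass_below)

for x in (0.5, 0.25, 1e-6, 0.0):
    print(x, optimization_bits(x))
```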

It's (probably) true that our physical reality has only finite precision

I'm also not a physicist but my impression is that physicists generally believe that the world does actually have infinite precision.

I'd also guess that the description length of (a computable version of) the standard model as-is (which includes infinite precision because it uses the real number system) has lower K-complexity than whatever comparable version of physics where you further specify a finite precision.

Comment by Alex_Altair on Draft: Introduction to optimization · 2023-12-27T21:04:35.980Z · LW · GW

I don't understand this part. How does probability mass constrain how "bad" the states can get? Could you rephrase this maybe?

The probability mass doesn't constrain how "bad" the states can get; I was saying that the fact that there's only 1 unit of probability mass means that the amount of probability mass on lower states is bounded (by 1).

Restricting the formalism to orderings means that there is no meaning to how bad a state is, only a meaning to whether it is better or worse than another state. (You can additionally decide on a measure of how bad, as long as it's consistent with the ordering, but we don't need that to analyze (this concept of) optimization.)
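A minimal sketch of the ordering-only point, on a hypothetical four-state baseline distribution. The function takes $-\log_2$ of the probability mass on states ranked at least as high as the achieved one (a common reading of Yudkowsky's measure; the exact convention is an assumption here). Because `score` is only ever used for comparisons, replacing it with any order-preserving rescoring leaves the result unchanged.

```python
import math

def optimization_bits(dist, score, achieved):
    """-log2 of the probability mass on states ranked at least as high as `achieved`.
    `score` is only compared, never added or scaled, so only its ordering matters."""
    mass = sum(p for state, p in dist.items() if score(state) >= score(achieved))
    return -math.log2(mass)

# Hypothetical baseline distribution and ordering over four states.
baseline = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
rank = {"a": 0, "b": 1, "c": 2, "d": 3}

bits = optimization_bits(baseline, lambda s: rank[s], "c")
# A wildly different but order-preserving notion of "how good" gives the same answer.
bits_rescored = optimization_bits(baseline, lambda s: math.exp(10 * rank[s]), "c")
print(bits, bits_rescored)
```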

Comment by Alex_Altair on Meaning & Agency · 2023-12-20T23:43:18.828Z · LW · GW

I'll also note that I think what you're calling "Vingean agency" is a notable sub-type of optimization process that you've done a good job at analyzing here. But it's definitely not the definition of optimization or agency to me. For example, in the post you say

We perceive agency when something is better at doing something than us; we endorse some aspect of its reasoning or activity.

This doesn't feel true to me (in the carve-nature-at-its-joints sense). I think children are strongly agents, even though I do everything more competently than they do.

Comment by Alex_Altair on Meaning & Agency · 2023-12-20T23:36:05.097Z · LW · GW

I have some comments on the arbitrariness of the "baseline" measure in Yudkowsky's measure of optimization.

Sometimes, I am surprised in the moment about how something looks, and I quickly update to believing there's an optimization process behind it. For example, if I climb a hill expecting to see a natural forest, and then instead see a grid of suburban houses or an industrial logging site, I'll immediately realize that there's no way this is random and instead there's an optimization process that I wasn't previously modelling. In cases like this, I think Yudkowsky's measure accurately captures the measure of optimization.

Alternatively, sometimes I'm thinking about optimization processes that I've always known are there, and I'm wondering to myself how powerful they are. For example, sometimes I'll be admiring how competent one of my friends is. To measure their competence, I can imagine what a "typical" person would do in that situation, and check the Yudkowsky measure as a diff. I can feel what you mean about arbitrarily drawing a circle around the known optimizer and then "deleting" it, but this just doesn't feel that weird to me? Like I think the way that people model the world allows them to do this kind of operation with pretty substantially meaningful results.

While it may be clear how to do this in many cases, it isn't clear in general. I suspect if we tried to write down the algorithm for doing it, it would involve an "agency detector" at some point; you have to be able to draw a circle around the agent in order to selectively forget it.

I think this is where Flint's framework was insightful. Instead of "detecting" and "deleting" the optimization process and then measuring the diff, you consider the system of every possible trajectory, measure the optimization of each (with respect to the ordering over states), take the average, and then compare your potential optimizer to this. The potential optimization process will be in that average, but it will be washed out by all the other trajectories (assuming most trajectories don't go up the ordering nearly as much; if they did, then your observed process would rightly not register as an optimizer).
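A toy sketch of that comparison, where every modelling choice is an assumption: states are integers ordered by size, the "possible trajectories" are all length-8 sequences of unit moves up or down, weighted equally, and the candidate optimizer always moves up. The baseline average climb is zero, and only 1 in 256 baseline trajectories climbs as far as the candidate, which comes to 8 bits in Yudkowsky's measure. The candidate is itself one of the baseline trajectories, but it is washed out by the rest.

```python
import itertools
import math

T = 8  # trajectory length (hypothetical toy setting)

def climb(steps):
    """How far a trajectory moves up the ordering (states are integers, higher is better)."""
    return sum(steps)

# Baseline: every possible trajectory of T unit moves up (+1) or down (-1).
all_trajectories = list(itertools.product((-1, 1), repeat=T))
baseline_average = sum(climb(s) for s in all_trajectories) / len(all_trajectories)

# A candidate optimizer that always moves up.
candidate = (1,) * T

# Fraction of baseline trajectories that climb at least as far as the candidate.
frac = sum(climb(s) >= climb(candidate) for s in all_trajectories) / len(all_trajectories)
print(baseline_average, climb(candidate), -math.log2(frac))
```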

(Obviously this is not helpful for e.g. looking into a neural network and figuring out whether it contains something that will powerfully optimize the world around you. But that's not what this level of the framework is for; this level is for deciding what it even means for something to powerfully optimize something around you.)

Of course, to run this comparison you need a "baseline" of a measure over every possible trajectory. But I think this is just reflecting the true nature of optimization; I think it's only meaningful relative to some other expectation.

Comment by Alex_Altair on Meaning & Agency · 2023-12-20T22:36:10.751Z · LW · GW

I feel like there's a key concept that you're aiming for that isn't quite spelled out in the math.

I remember reading somewhere that there's a typically unmentioned distinction between "Bayes' theorem" and "Bayesian inference". Bayes' theorem is the statement that $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$, which is true from the axioms of probability theory for any $A$ and $B$ whatsoever. Notably, it has nothing to do with time, and it's still true even after you learn $B$. On the other hand, Bayesian inference is the premise that your beliefs should change in accordance with Bayes' theorem. Namely that $P_{t+1}(A) = P_t(A \mid O)$ where $O$ is an observation. That is, when you observe something, you wholesale replace your probability space $P_t$ with a new probability space $P_{t+1}$ which is calculated by applying the conditional (via Bayes' theorem).
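A small sketch of the distinction, assuming a fair six-sided die as the probability space (an illustrative choice; all function names are hypothetical): `bayes_rhs` checks that Bayes' theorem holds for a pair of events within one fixed distribution, while `update` performs Bayesian inference by replacing the whole distribution with its conditional on an observation. The theorem then holds just as well inside the new distribution.

```python
from fractions import Fraction

def prob(dist, event):
    """Probability of an event (a set of outcomes) under a finite distribution."""
    return sum((p for w, p in dist.items() if w in event), Fraction(0))

def cond(dist, event, given):
    """Conditional probability P(event | given) within a single distribution."""
    return prob(dist, event & given) / prob(dist, given)

def bayes_rhs(dist, a, b):
    """Right-hand side of Bayes' theorem, P(B | A) P(A) / P(B)."""
    return cond(dist, b, a) * prob(dist, a) / prob(dist, b)

def update(dist, observation):
    """Bayesian inference: replace the distribution with its conditional on the observation."""
    z = prob(dist, observation)
    return {w: p / z for w, p in dist.items() if w in observation}

prior = {w: Fraction(1, 6) for w in range(1, 7)}   # P_t: a fair die
even, high, top = {2, 4, 6}, {4, 5, 6}, {5, 6}

# Bayes' theorem: a timeless identity inside one distribution.
print(cond(prior, even, high), bayes_rhs(prior, even, high))          # 2/3 2/3

# Bayesian inference: after observing `high`, P_{t+1} replaces P_t wholesale...
posterior = update(prior, high)
print(prob(prior, even), prob(posterior, even))                       # 1/2 2/3
# ...and Bayes' theorem still holds inside the new distribution.
print(cond(posterior, even, top), bayes_rhs(posterior, even, top))    # 1/2 1/2
```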

And I think there's a similar thing going on with your definitions of endorsement. While trying to understand the equations, I found it easier to visualize $P_1$ and $P_2$ as two separate distributions on the same $\Omega$, where endorsement is simply a consistency condition. For belief consistency, you would just say that $P_1$ endorses $P_2$ on event $E$ if $P_1(E) = P_2(E)$.

But that isn't what you wrote; instead you wrote this thing with conditioning on a quoted thing. And of course, the thing I said is symmetrical between $P_1$ and $P_2$, whereas your concept of endorsement is not symmetrical. It seems like the intention is that $P_1$ "learns" or "hears about" $P_2$'s belief, and then $P_1$ updates (in the above Bayesian inference sense) to have a new $P_1'$ that has the consistency condition with $P_2$.

By putting $\ulcorner P_2(E) = p \urcorner$ in the conditional, you're saying that it's an event on $\Omega$, a thing with the same type as $E$. And it feels like that's conceptually correct, but also kind of the hard part. It's as if $P_1$ is modelling $P_2$ as an agent embedded into $\Omega$.
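Here is a toy sketch of what "modelling $P_2$ as an agent embedded into $\Omega$" could look like. Write $P_1$ for the endorsing agent's distribution and $P_2$ for the other agent's. Each outcome in $\Omega$ pairs a world fact with the probability $P_2$ reports for the event $E$ = "rain", which is what lets $P_1$ condition on that report.

I'm assuming endorsement has the form $P_1(E \mid \ulcorner P_2(E) = p \urcorner) = p$ for every report $p$. That assumption, the two-valued report, and all names below are my own illustration, not the post's definitions.

```python
from fractions import Fraction

# Hypothetical Ω: outcomes are pairs (world, report), where `report` is the
# probability P2 announces for E = "rain". P2's report lives inside Ω.
REPORTS = (Fraction(1, 5), Fraction(4, 5))

def make_p1(report_weight, rain_prob):
    """Build P1 over Ω from P1(report = r) and P1(rain | report = r)."""
    P1 = {}
    for r in REPORTS:
        q = rain_prob[r]
        P1[("rain", r)] = report_weight[r] * q
        P1[("dry", r)] = report_weight[r] * (1 - q)
    return P1

def p1_rain_given_report(P1, r):
    """P1(E | ⌜P2(E) = r⌝), with E = rain."""
    return P1[("rain", r)] / (P1[("rain", r)] + P1[("dry", r)])

def endorses(P1):
    """Assumed form of endorsement: P1(E | ⌜P2(E) = p⌝) = p for every report p."""
    return all(p1_rain_given_report(P1, r) == r for r in REPORTS)

half = {r: Fraction(1, 2) for r in REPORTS}
# P1 thinks P2 is calibrated: when P2 says p, rain really has probability p.
calibrated = make_p1(half, {r: r for r in REPORTS})
# P1 thinks P2 is overconfident and shades P2's reports toward 1/2.
skeptical = make_p1(half, {Fraction(1, 5): Fraction(2, 5),
                           Fraction(4, 5): Fraction(3, 5)})
```

Both versions of $P_1$ give rain the same unconditional probability (1/2), so a symmetric check on unconditional beliefs can't tell them apart. Only conditioning on the embedded report does. The relation is also asymmetric: $P_2$ here needs no model of $P_1$ at all.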

Comment by Alex_Altair on 2022 (and All Time) Posts by Pingback Count · 2023-12-16T21:46:38.970Z · LW · GW

You guys could compute a kind of PageRank for LW posts.
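A minimal sketch of what that could look like. I'm assuming that when post p links to post q (giving q a pingback), that counts as a link p → q. I don't know how LW actually stores pingbacks, so the dict-of-lists input format is hypothetical.

The sketch uses standard power-iteration PageRank with the usual 0.85 damping factor. Posts that link nowhere spread their rank evenly over all posts.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iters: int = 100) -> dict[str, float]:
    """Power-iteration PageRank. links[p] lists the posts that p links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every post gets the "random jump" share first.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                # Split v's rank evenly among the posts it links to.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling post: spread its rank over every post.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank

# Hypothetical pingback graph: a, b and d all link to c; c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]})
```

Unlike a raw pingback count, this weights a pingback from a highly ranked post more than one from an obscure post.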

Comment by Alex_Altair on Introduction to abstract entropy · 2023-12-16T19:33:23.859Z · LW · GW

Yeah, So8res wrote that post after reading this one and having a lot of discussion in the comments. That said, my memory was that people eventually convinced him that the title idea in his post was wrong.

Comment by Alex_Altair on Introduction to abstract entropy · 2023-12-16T00:06:15.686Z · LW · GW

[This is a self-review because I see that no one has left a review to move it into the next phase. So8res's comment would also make a great review.]

I'm pretty proud of this post for the level of craftsmanship I was able to put into it. I think it embodies multiple rationalist virtues. It's a kind of "timeless" content, and is a central example of the kind of content people want to see on LW that isn't stuff about AI.

It would also look great printed in a book. :)

Comment by Alex_Altair on What's next for the field of Agent Foundations? · 2023-12-12T16:44:17.844Z · LW · GW

You can also add the PIBBSS Speaker Events to your calendar through this link.

FYI this link redirects to a UC Berkeley login page.

Comment by Alex_Altair on How I became a person who wakes up early · 2023-12-10T03:56:30.291Z · LW · GW

Two years later, this is still pretty much how my sleep works!

  • Still aging
  • Still do regular morning climbing with my friend twice a week
  • Still hang out with my partner before bed most nights
  • Still maintain control through time changes

I never went back into software, so I never again had a 9-5 job. Instead, I'm an independent researcher. In order to further motivate waking up for that, I schedule body-doubling with people on most days of the week, usually starting between 7 and 8:30am. I rarely use melatonin.

My current biggest sleep problem is that, if I don't have climbing, body-doubling, or something else scheduled early, then I usually stay in bed for a while, awake but unproductive. Hm, I haven't used Focusmate in a long time either. Maybe I should try that again?

Comment by Alex_Altair on LW is probably not the place for "I asked this LLM (x) and here's what it said!", but where is? · 2023-12-09T17:21:43.781Z · LW · GW

Isn't the shortform feature perfect for this?

Comment by Alex_Altair on Intro to Naturalism: Orientation · 2023-12-09T03:13:44.286Z · LW · GW

[This is a review for the whole sequence.]

I think of LessWrong as a place whose primary purpose is and always has been to develop the art of rationality. One issue is that this mission tends to attract a certain kind of person -- intelligent, systematizing, deprioritizing social harmony, etc. -- and that can make it harder for other kinds of people to participate in the development of the art of rationality. But rationality is for everyone, and ideally the art would be equally accessible to all.

This sequence has many good traits, but one of the most distinguishing is that it is wholly legible and welcoming to people not of the aforementioned kind. In a world where huge efforts of cooperation will be needed to ensure a good future, I think this trait makes this sequence worthy of being further showcased!

Comment by Alex_Altair on Toy Models of Superposition · 2023-12-09T02:22:34.959Z · LW · GW

This paper, like others from Anthropic, is exemplary science and exceptional science communication. The authors are clear, precise and thorough. It is evident that their research motivation is to solve a problem, and not to publish a paper, and that their communication motivation is to help others understand, and not to impress.

Comment by Alex_Altair on How To Go From Interpretability To Alignment: Just Retarget The Search · 2023-12-06T00:20:18.182Z · LW · GW

This post expresses an important idea in AI alignment that I have essentially believed for a long time, and which I have not seen expressed elsewhere. (I think a substantially better treatment of the idea is possible, but this post is fine, and you get a lot of points for being the only place where an idea is being shared.)

Comment by Alex_Altair on Useful Vices for Wicked Problems · 2023-12-06T00:16:01.707Z · LW · GW

Earlier this year I spent a lot of time trying to understand how to do research better. This post was one of the few resources that actually helped. It described several models that I resonated with, but which I had not read anywhere else. It essentially described a lot of the things I was already doing, and this gave me more confidence in deciding to continue doing full time AI alignment research. (It also helps that Karnofsky is an accomplished researcher, and so his advice has more weight!)

Comment by Alex_Altair on The LessWrong 2022 Review · 2023-12-05T20:59:58.569Z · LW · GW

I'm curious what you would estimate the cost of producing the books to be. That is, how much would someone have to donate to pay for Lightcone to produce the books?

Comment by Alex_Altair on What's next for the field of Agent Foundations? · 2023-12-02T03:22:57.672Z · LW · GW

I'd like to gain clarity on what we think the relationship should be between AI alignment and agent foundations. To me, the relationship is 1) historical, in that the people bringing about the field of agent foundations are coming from the AI alignment community and 2) motivational, in that the reason they're investigating agent foundations is to make progress on AI alignment, but not 3) technical, in that I think agent foundations should not be about directly answering questions of how to make the development of AI beneficial to humanity. I think it makes more sense to pursue agent foundations as a quest to understand the nature of agents as a technical concept in its own right.

If you are a climate scientist, then you are very likely in the field in order to help humanity reduce the harms from climate change. But on a day-to-day basis, the thing you are doing is trying to understand the underlying patterns and behavior of the climate as a physical system. It would be unnatural to e.g. exclude papers from climate science journals on the grounds of not being clearly applicable to reducing climate change.

For agent foundations, I think some of the core questions revolve around things like: how does having goals work? How stable are goals? How retargetable are goals? Can we make systems that optimize strongly but within certain limitations? But none of those questions are directly about aligning the goals with humanity.

There's also another group of questions like: what are humans' goals? How can we tell? How complex and fragile are they? How can we get an AI system to imitate a human? Et cetera. But I think these questions come from a field that is not agent foundations.

There should certainly be constant and heavy communication between these fields. And I also think that even individual people should be thinking about the applicability questions. But they're somewhat separate loops. A climate scientist will have an outer loop that does things like choosing a research problem because they think the answer might help reduce climate change, and they should keep checking on that belief as they perform their research. But while they're doing their research, I think they should generally be using an inner loop that just thinks, "huh, how does this funny 'climate' thing work?"

Comment by Alex_Altair on What's next for the field of Agent Foundations? · 2023-12-02T01:46:26.222Z · LW · GW

FWIW I saw "Anti-MATS" in the sidebar and totally assumed that meant that someone in the dialogue was arguing that the MATS program was bad (instead of discussing the idea of a program that was like MATS but opposite).

Comment by Alex_Altair on What's next for the field of Agent Foundations? · 2023-12-02T01:44:47.971Z · LW · GW

Agent foundations is studying a strange alternate world where agents know the source code to themselves and the universe, where perfect predictors exist and so on

I just want to flag that this is very much not a defining characteristic of agent foundations! Some work in agent foundations will make assumptions like this, some won't -- I consider it a major goal of agent foundations to come up with theories that do not rely on assumptions like this.

(Or maybe you just meant those as examples?)

Comment by Alex_Altair on Stupid Question: Why am I getting consistently downvoted? · 2023-11-30T06:21:16.315Z · LW · GW

Maybe, "try gaining skill somewhere with lower standards"?

Comment by Alex_Altair on Inositol Non-Results · 2023-11-30T06:11:09.751Z · LW · GW

Somehow I read "non-results" in the title and unthinkingly interpreted it as "we now have more data that says inositol does nothing". Maybe the title could be "still not enough data on inositol"?