Posts

[Talk transcript] What “structure” is and why it matters 2024-07-25T15:49:00.844Z
A simple model of math skill 2024-07-21T18:57:33.697Z
Empirical vs. Mathematical Joints of Nature 2024-06-26T01:55:22.858Z
New intro textbook on AIXI 2024-05-11T18:18:50.945Z
Towards a formalization of the agent structure problem 2024-04-29T20:28:15.190Z
Raemon's Deliberate (“Purposeful?”) Practice Club 2023-11-14T18:24:19.335Z
My research agenda in agent foundations 2023-06-28T18:00:27.813Z
Why don't quantilizers also cut off the upper end of the distribution? 2023-05-15T01:40:50.183Z
Draft: Inferring minimizers 2023-04-01T20:20:48.676Z
Draft: Detecting optimization 2023-03-29T20:17:46.642Z
Draft: The optimization toolbox 2023-03-28T20:40:38.165Z
Draft: Introduction to optimization 2023-03-26T17:25:55.093Z
Dealing with infinite entropy 2023-03-01T15:01:40.400Z
Top YouTube channel Veritasium releases video on Sleeping Beauty Problem 2023-02-11T20:36:57.089Z
My first year in AI alignment 2023-01-02T01:28:03.470Z
How can one literally buy time (from x-risk) with money? 2022-12-13T19:24:06.225Z
Consider using reversible automata for alignment research 2022-12-11T01:00:24.223Z
A dynamical systems primer for entropy and optimization 2022-12-10T00:13:13.984Z
How do finite factored sets compare with phase space? 2022-12-06T20:05:54.061Z
Alex_Altair's Shortform 2022-11-27T18:59:05.193Z
When do you visualize (or not) while doing math? 2022-11-23T20:15:20.885Z
"Normal" is the equilibrium state of past optimization processes 2022-10-30T19:03:19.328Z
Introduction to abstract entropy 2022-10-20T21:03:02.486Z
Deliberate practice for research? 2022-10-08T03:45:21.773Z
How dangerous is human-level AI? 2022-06-10T17:38:27.643Z
I'm trying out "asteroid mindset" 2022-06-03T13:35:48.614Z
Request for small textbook recommendations 2022-05-25T22:19:56.549Z
Why hasn't deep learning generated significant economic value yet? 2022-04-30T20:27:54.554Z
Does the rationalist community have a membership funnel? 2022-04-12T18:44:48.795Z
When to use "meta" vs "self-reference", "recursive", etc. 2022-04-06T04:57:47.405Z
Giving calibrated time estimates can have social costs 2022-04-03T21:23:46.590Z
Ways to invest your wealth if you believe in a high-variance future? 2022-03-11T16:07:49.302Z
How can I see a daily count of all words I type? 2022-02-04T02:05:04.483Z
Tag for AI alignment? 2022-01-02T18:55:45.228Z
Is Omicron less severe? 2021-12-30T23:14:37.292Z
Confusion about Sequences and Review Sequences 2021-12-21T18:13:13.394Z
How I became a person who wakes up early 2021-12-18T18:41:45.732Z
What's the status of third vaccine doses? 2021-08-04T02:22:52.317Z
A new option for building lumenators 2021-07-12T23:45:34.294Z
Bay and/or Global Solstice* Job Search (2021 - 2022) 2021-03-16T00:21:10.290Z
Where does the phrase "central example" come from? 2021-03-12T05:57:49.253Z
One Year of Pomodoros 2020-12-31T04:42:31.274Z
Logistics for the 2020 online Secular Solstice* 2020-12-03T00:08:37.401Z
The Bay Area Solstice 2014-12-03T22:33:17.760Z
Mathematical Measures of Optimization Power 2012-11-24T10:55:17.145Z
Modifying Universal Intelligence Measure 2012-09-18T23:44:08.864Z
An Intuitive Explanation of Solomonoff Induction 2012-07-11T08:05:20.544Z
Should LW have a separate AI section? 2012-07-10T01:42:39.259Z
How Bayes' theorem is consistent with Solomonoff induction 2012-07-09T22:16:02.312Z
Computation Hazards 2012-06-13T21:49:19.986Z

Comments

Comment by Alex_Altair on A simple model of math skill · 2024-07-23T16:54:00.094Z · LW · GW

There is a little crackpot voice in my head that says something like, "the real numbers are dumb and bad and we don't need them!" I don't give it a lot of time, but I do let that voice exist in the back of my mind trying to work out other possible foundations. A related issue here is that it seems to me that one should be able to have a uniform probability distribution over a countable set of numbers. Perhaps one could do that by introducing infinitesimals.

Comment by Alex_Altair on 2022 AI Alignment Course: 5→37% working on AI safety · 2024-06-21T21:20:23.641Z · LW · GW

Agreed the title is confusing. I assumed it meant that some metric was 5% for last year's course, and 37% for this year's course. I think I would just nix numbers from the title altogether.

Comment by Alex_Altair on What distinguishes "early", "mid" and "end" games? · 2024-06-21T21:14:32.965Z · LW · GW

One model I have is that when things are exponentials (or S-curves), it's pretty hard to tell when you're about to leave the "early" game, because exponentials look the same when scaled. If every year has 2x as much activity as the previous year, then every year feels like the one that was the big transition.
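
Here's a tiny sketch of what I mean (the doubling rate and numbers are purely illustrative): if activity doubles every year, then each year contains roughly half of all the activity that has ever happened up to that point, so every year looks like "the year it took off".

```python
# Toy illustration: with activity doubling every year, each year accounts for
# about half of all cumulative activity to date, so every year "feels like"
# the big transition.
activity = [2**year for year in range(10)]

for year, amount in enumerate(activity):
    cumulative = sum(activity[: year + 1])
    print(f"year {year}: this year's share of all activity so far = {amount / cumulative:.2f}")
```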

For example, it's easy to think that AI has "gone mainstream" now. Which is true according to some order of magnitude. But even though a lot of politicians are talking about AI stuff more often, it's nowhere near the top of the list for most of them. It's more like just one more special interest to sometimes give lip service to, nowhere near issues like US polarization, China, healthcare and climate change.

Of course, AI isn't necessarily well-modelled by an S-curve. Depending on what you're measuring, it could be non-monotonic (with winters and summers). It could also be a hyperbola. And if we all dropped dead in the same minute from nanobots, then there wouldn't really be a mid- or end-game at all. But I currently hold a decent amount of humility around ideas like "we're in midgame now".

Comment by Alex_Altair on [deleted post] 2024-06-11T22:10:42.762Z

(Tiny bug report, I got an email for this comment reply, but I don't see it anywhere in my notifications.)

Comment by Alex_Altair on [deleted post] 2024-06-11T22:10:09.135Z

Done

Comment by Alex_Altair on [deleted post] 2024-06-11T20:43:09.992Z

I propose that this tag be merged into the tag called Infinities In Ethics.

Comment by Alex_Altair on 0. CAST: Corrigibility as Singular Target · 2024-06-08T04:00:09.496Z · LW · GW

3.

3b.*?

Comment by Alex_Altair on Announcing ILIAD — Theoretical AI Alignment Conference · 2024-06-05T20:43:05.166Z · LW · GW

How about deconferences?

Comment by Alex_Altair on Open Problems Create Paradigms · 2024-06-05T18:39:40.610Z · LW · GW

I'm noticing what might be a miscommunication/misunderstanding between your comment and the post and Kuhn. It's not that the statement of such open problems creates the paradigm; it's that solutions to those problems create the paradigm.

The problems exist because the old paradigms (concepts, methods etc) can't solve them. If you can state some open problems such that everyone agrees that those problems matter, and whose solution could be verified by the community, then you've gotten a setup for solutions to create a new paradigm. A solution will necessarily use new concepts and methods. If accepted by the community, these concepts and methods constitute the new paradigm.

(Even this doesn't always work if the techniques can't be carried over to further problems and progress. For example, my impression is that Logical Induction nailed the solution to a legitimately important open problem, but it does not seem that the solution has been of a kind which could be used for further progress.)

Comment by Alex_Altair on Announcing ILIAD — Theoretical AI Alignment Conference · 2024-06-05T18:14:49.435Z · LW · GW

Interactively Learning the Ideal Agent Design

Comment by Alex_Altair on Thomas Kwa's Shortform · 2024-06-05T17:34:40.798Z · LW · GW

[Continuing to sound elitist,] I have a related gripe/hot take that comments give people too much karma. I feel like I often see people who are "noisy" in that they comment a lot and have a lot of karma from that,[1] but have few or no valuable posts, and who I also don't have a memory of reading valuable comments from. It makes me feel incentivized to acquire more of a habit of using LW as a social media feed, rather than just commenting when a thought I have passes my personal bar of feeling useful.

  1. ^

    Note that self-karma contributes to a comment's position within the sorting, but doesn't contribute to the karma count on your account, so you can't get a bunch of karma just by leaving a bunch of comments that no one upvotes. So these people are getting at least a consolation prize upvote from others.

Comment by Alex_Altair on "Does your paradigm beget new, good, paradigms?" · 2024-06-04T18:55:57.816Z · LW · GW

I think the guiding principle behind whether or not scientific work is good should probably look something more like “is this getting me closer to understanding what’s happening”

One model that I'm currently holding is that Kuhnian paradigms are about how groups of people collectively decide that scientific work is good, which is distinct from how individual scientists do or should decide that scientific work is good. And collective agreement is way more easily reached via external criteria.

Which is to say, problems are what establishes a paradigm. It's way easier to get a group of people to agree that "thing no go", than it is to get them to agree on the inherent nature of thing-ness and go-ness. And when someone finally makes thing go, everyone looks around and kinda has to concede that, whatever their opinion was of that person's ontology, they sure did make thing go. (And then I think the Wentworth/Greenblatt discussion above is about whether the method used to make thing go will be useful for making other things go, which is indeed required for actually establishing a new paradigm.)

That said, I think that the way that an individual scientist decides what ideas to pursue should usually route through things more like “is this getting me closer to understanding what’s happening”, but that external people are going to track "are problems getting solved", and so it's probably a good idea for most of the individual scientists to occasionally reflect on how likely their ideas are to make progress on (paradigm-setting) problems.

(It is possible for the agreed-upon problem to be "everyone is confused", and possible for a new idea to simultaneously de-confuse everyone, thus inducing a new paradigm. (You could say that this is what happened with the Church-Turing thesis.) But it's just pretty uncommon, because people's ontologies can be wildly different.)

When you say, "I think that more precisely articulating what our goals are with agent foundations/paradigmaticity/etc could be very helpful...", how compatible is that with more precisely articulating problems in agent foundations (whose solutions would be externally verifiable by most agent foundations researchers)?

Comment by Alex_Altair on MIRI 2024 Communications Strategy · 2024-05-29T20:46:55.004Z · LW · GW

stable, durable, proactive content – called “rock” content

FWIW this is conventionally called evergreen content.

Comment by Alex_Altair on One way violinists fail · 2024-05-29T17:42:54.932Z · LW · GW

"you're only funky as [the moving average of] your last [few] cut[s]"

Somehow this is in a <a> link tag with no href attribute.

Comment by Alex_Altair on When is Goodhart catastrophic? · 2024-05-28T19:42:58.370Z · LW · GW

I finally got around to reading this sequence, and I really like the ideas behind these methods. This feels like someone actually trying to figure out exactly how fragile human values are. It's especially exciting because it seems like it hooks right into an existing, normal field of academia (thus making it easier to leverage their resources toward alignment).

I do have one major issue with how the takeaway is communicated, starting with the term "catastrophic". I would only use that word when the outcome of the optimization is really bad, much worse than "average" in some sense. That's in line with the idea that the AI will "use the atoms for something else", and not just leave us alone to optimize its own thing. But the theorems in this sequence don't seem to be about that:

We call this catastrophic Goodhart because the end result, in terms of , is as bad as if we hadn't conditioned at all.

Being as bad as if you hadn't optimized at all isn't very bad; it's where we started from!

I think this has almost the opposite takeaway from the intended one. I can imagine someone (say, OpenAI) reading these results and thinking something like, great! They just proved that in the worst case scenario, we do no harm. Full speed ahead!

(Of course, putting a bunch of optimization power into something and then getting no result would still be a waste of the resources put into it, which is presumably not built into . But that's still not very bad.)

That said, my intuition says that these same techniques could also suss out the cases where optimizing for  pessimizes for , in the previously mentioned use-our-atoms sense.

Comment by Alex_Altair on Catastrophic Goodhart in RL with KL penalty · 2024-05-28T15:51:19.636Z · LW · GW

Does the notation get flipped at some point? In the abstract you say

prior policy 

and

there are arbitrarily well-performing policies 

But then later you say

This strongly penalizes  taking actions the base policy never takes

Which makes it sound like they're switched.

I also notice that you call it "prior policy", "base policy" and "reference policy" at different times; these all make sense but it'd be a bit nicer if there was one phrase used consistently.

Comment by Alex_Altair on Computational Mechanics Hackathon (June 1 & 2) · 2024-05-25T01:58:25.044Z · LW · GW

I'm curious if you knowingly scheduled this during LessOnline?

Comment by Alex_Altair on Towards a formalization of the agent structure problem · 2024-05-19T20:21:56.518Z · LW · GW

Yep, that paper has been on my list for a while, but I have thus far been unable to penetrate the formalisms that the Causal Incentive Group uses. This paper in particular also seems to have some fairly limiting assumptions in the theorem.

Comment by Alex_Altair on Fund me please - I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University · 2024-05-19T01:05:38.915Z · LW · GW

Hey Johannes, I don't quite know how to say this, but I think this post is a red flag about your mental health. "I work so hard that I ignore broken glass and then walk on it" is not healthy.

I've been around the community a long time and have seen several people have psychotic episodes. This is exactly the kind of thing I start seeing before they do.

I'm not saying it's 90% likely, or anything. Just that it's definitely high enough for me to need to say something. Please try to seek out some resources to get you more grounded.

Comment by Alex_Altair on Ilya Sutskever and Jan Leike resign from OpenAI [updated] · 2024-05-15T19:44:22.816Z · LW · GW

I really appreciate this comment!

And yeah, that's why I said only "Note that...", and not something like "don't trust this guy". I think the content of the article is probably true, and maybe it's Metz who wrote it just because AI is his beat. But I do also hold tiny models that say "maybe he dislikes us" and also something about the "questionable understanding" etc that habryka mentions below. AFAICT I'm not internally seething or anything, I just have a yellow-flag attached to this name.

Comment by Alex_Altair on Ilya Sutskever and Jan Leike resign from OpenAI [updated] · 2024-05-15T01:38:05.681Z · LW · GW

Note that the NYT article is by Cade Metz.

Comment by Alex_Altair on New intro textbook on AIXI · 2024-05-14T15:40:18.509Z · LW · GW

I think the biggest thing I like about it is that it exists! Someone tried to make a fully formalized agent model, and it worked. As mentioned above it's got some big problems, but it helps enormously to have some ground to stand on to try to build on further.

Comment by Alex_Altair on Partitioned Book Club · 2024-05-13T05:55:44.258Z · LW · GW

I love this idea!

Some other books this could work for:

  • The Ancestor's Tale
  • The Art of Game Design
  • The Anthropocene Reviewed
  • The LessWrong review books 😉

Many textbooks have a few initial "core" chapters, and then otherwise a bunch of independent chapters on applications or assorted advanced topics.

Comment by Alex_Altair on Open Thread Spring 2024 · 2024-05-11T05:54:12.198Z · LW · GW

You can "bookmark" a post, is that equivalent to your desired "read later"?

Comment by Alex_Altair on New to this community · 2024-05-11T05:18:18.324Z · LW · GW

Welcome kjsisco! One good place to start interacting with others here is on the current open thread.

Comment by Alex_Altair on Manifund Q1 Retro: Learnings from impact certs · 2024-05-02T17:23:20.503Z · LW · GW

[link to about]

Link missing

Comment by Alex_Altair on Towards a formalization of the agent structure problem · 2024-04-30T17:16:21.865Z · LW · GW

Hm... so anything that measures degree of agent structure should register a policy with a sub-agent as having some agent structure. But yeah, I haven't thought much about the scenarios where there are multiple agents inside the policy. The agent structure problem is trying to use performance to find a minimum measure of agent structure. So if there was an agent hiding in there that didn't impact the performance during the measured time interval, then it wouldn't be detected (although it would be detected "in the limit").

That said, we're not actually talking about how to measure degree of agent structure yet. It seems plausible to me that whatever method one uses to do that could be adapted to find multiple agents.

Comment by Alex_Altair on Forget Everything (Statistical Mechanics Part 1) · 2024-04-25T03:42:15.042Z · LW · GW

This will make more sense if you have a basic grasp on quantum mechanics, but if you're willing to accept "energy comes in discrete units" as a premise then you should be mostly fine.

My current understanding is that QM is not-at-all needed to make sense of stat mech. Instead, the thing where energy is equally likely to be in any of the degrees of freedom just comes from using a measure over your phase space such that the dynamical law of your system preserves that measure!
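
If it helps, here's a minimal numerical sketch of that measure-preservation point (my own toy example, not from the post): a symplectic integrator for a harmonic oscillator is an area-preserving map on phase space, and you can check that the Jacobian determinant of one time step is 1.

```python
import numpy as np

def leapfrog_step(q, p, dt=0.1, k=1.0, m=1.0):
    """One leapfrog step for a harmonic oscillator H = p^2/(2m) + k q^2/2."""
    p = p - 0.5 * dt * k * q   # half kick
    q = q + dt * p / m         # drift
    p = p - 0.5 * dt * k * q   # half kick
    return q, p

# Estimate the Jacobian of one step by finite differences at some phase-space point.
q0, p0, eps = 0.7, -0.3, 1e-6
base = np.array(leapfrog_step(q0, p0))
J = np.column_stack([
    (np.array(leapfrog_step(q0 + eps, p0)) - base) / eps,
    (np.array(leapfrog_step(q0, p0 + eps)) - base) / eps,
])

print(np.linalg.det(J))  # ~1.0: the dynamics preserve phase-space area (the measure)
```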

Comment by Alex_Altair on Express interest in an "FHI of the West" · 2024-04-19T15:09:12.656Z · LW · GW

Maybe it could be FLCI to avoid collision with the existing FLI.

Comment by Alex_Altair on Express interest in an "FHI of the West" · 2024-04-18T14:25:14.595Z · LW · GW

I also think the name is off, but for a different reason. When I hear "the west" with no other context, I assume it means this, which doesn't make sense here, because the UK and FHI are very solidly part of The West. (I have not heard the "Harvard of the west" phrase and I'm guessing it's pretty darn obscure, especially to the international audience of LW.)

Comment by Alex_Altair on LessOnline (May 31—June 2, Berkeley, CA) · 2024-03-28T00:56:08.126Z · LW · GW

Feedback on the website: it's not clear to me what the difference is between LessOnline and the summer camp right after. Is the summer camp only something you go to if you're also going to Manifest? Is it the same as LessOnline but longer?

Comment by Alex_Altair on Natural Latents: The Concepts · 2024-03-25T02:41:50.396Z · LW · GW

Oh, no, I'm saying it's more like 2^8 afterwards. (Obviously it's more than that but I think closer to 8 than a million.) I think having functioning vision at all brings it down to, I dunno, 2^10000. I think you would be hard pressed to name 500 attributes of mammals that you need to pay attention to to learn a new species.

Comment by Alex_Altair on Natural Latents: The Concepts · 2024-03-25T02:28:18.968Z · LW · GW

We then get around the 2^8000000 problem by having only a relatively very very small set of candidate “things” to which words might be attached.

A major way that we get around this is by having hierarchical abstractions. By the time I'm learning "dog" from 1-5 examples, I've already done enormous work in learning about objects, animals, something-like-mammals, heads, eyes, legs, etc. So when you point at five dogs and say "those form a group" I've already forged abstractions that handle almost all the information that makes them worth paying attention to, and now I'm just paying attention to a few differences from other mammals, like size, fur color, ear shape, etc.

I'm not sure how the rest of this post relates to this, but it didn't feel present; maybe it's one of the umpteen things you left out for the sake of introductory exposition.

Comment by Alex_Altair on Natural Latents: The Concepts · 2024-03-25T01:31:05.750Z · LW · GW

I've noticed you using the word "chaos" a few times across your posts. I think you're using it colloquially to mean something like "rapidly unpredictable", but it does have a technical meaning that doesn't always line up with how you use it, so it might be useful to distinguish it from a couple of other things. Here's my current understanding of what some things mean. (All of these definitions and implications depend on a pile of finicky math and tend to have surprising counter-examples if you didn't define things just right, and definitions vary across sources.)

 

Sensitive to initial conditions. A system is sensitive to initial conditions if nearby points in its phase space will eventually diverge exponentially (at least) over time. This is one way to say that you'll rapidly lose information about a system, but it doesn't have to look chaotic. For example, say you have a system whose phase space is just the real line, and its dynamics over time are just that points get 10x farther from the origin every time step. Then, if you know the value of a point to ten decimal places of precision, after ten time steps you only know one decimal place of precision. (Although there are regions of the real line where you're still sure it doesn't reside; for example, you're sure it's not closer to the origin than it started.)
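
Here's that 10x example as a few lines of code (my own illustration): two points that initially agree to ten decimal places lose about one decimal place of agreement per step.

```python
# The map x -> 10x: two points that agree to ten decimal places
# lose roughly one decimal place of agreement per time step.
x, y = 0.1234567890, 0.1234567891   # differ by about 1e-10

for step in range(11):
    print(f"step {step:2d}: |x - y| = {abs(x - y):.3e}")
    x, y = 10 * x, 10 * y
```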

Ergodic. A system is ergodic if (almost) every point in phase space will trace out a trajectory that gets arbitrarily close to every other point. This means that each point is some kind of chaotically unpredictable, because if it's been going for a while and you're not tracking it, you'll eventually end up with maximum uncertainty about where it is. But this doesn't imply sensitivity to initial conditions; there are systems that are ergodic, but where any pair of points will stay the same distance from each other. A simple example is where phase space is a circle, and the dynamics are that on each time step, you rotate each point around the circle by an irrational angle.
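
A quick sketch of that circle-rotation example (my own code; the rotation angle is just some irrational number): the orbit spreads out over the whole circle, but the two points never get any closer to or farther from each other.

```python
import numpy as np

alpha = np.sqrt(2) - 1    # rotate by an irrational fraction of a full turn
x, y = 0.0, 0.3           # two points on the circle [0, 1)

orbit = []
for _ in range(100_000):
    orbit.append(x)
    x = (x + alpha) % 1.0
    y = (y + alpha) % 1.0

# Ergodicity: the orbit visits every part of the circle about equally often.
counts, _ = np.histogram(orbit, bins=10, range=(0.0, 1.0))
print(counts / len(orbit))   # each bin gets roughly 0.1 of the visits

# But no sensitivity to initial conditions: the two points stay the same distance apart.
d = abs(x - y)
print(min(d, 1 - d))         # circle distance, still ~0.3
```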

Chaos. The formal characterization that people assign to this word was an active research topic for decades, but I think it's mostly settled now. My understanding is that it essentially means this:

  1. Your system has at least one point whose trajectory is ergodic, that is, it will get arbitrarily close to every other point in the phase space.
  2. For every natural number n, there is a point in the phase space whose trajectory is periodic with period n. That is, after n time steps (and not before), it will return to exactly where it started. (Further, these periodic points are "dense", that is, every point in phase space has periodic points arbitrarily close to it.)

The reason these two criteria yield (colloquially) chaotic behavior is, I think, reasonably intuitively understandable. Take a random point in its phase space. Assume it isn't one with a periodic trajectory (which will be true with "probability 1"). Instead it will be ergodic. That means it will eventually get arbitrarily close to all other points. But consider what happens when it gets close to one of the periodic trajectories; it will, at least for a while, act almost as though it has that period, until it drifts sufficiently far away. (This is using an unstated assumption that the dynamics of the systems have a property where nearby points act similarly.) But it will eventually do this for every periodic trajectory. Therefore, there will be times when it's periodic very briefly, and times when it's periodic for a long time, et cetera. This makes it pretty unpredictable.
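
To make the "temporarily shadowing a periodic orbit" picture concrete, here's a toy run with the logistic map x -> 4x(1-x) (a standard chaotic example of my own choosing, not anything from your posts); it has an unstable fixed point at x = 3/4, and a trajectory started right next to it hugs that point for a while and then wanders off over the whole interval.

```python
# Logistic map x -> 4x(1-x): chaotic, with an unstable fixed point at x = 3/4.
# A trajectory started a hair away from the fixed point acts (period-1) periodic
# for a while, then drifts away and wanders over the whole interval.
f = lambda x: 4 * x * (1 - x)

x = 0.75 + 1e-12
for step in range(61):
    if step % 5 == 0:
        print(f"step {step:2d}: x = {x:.6f}")
    x = f(x)
```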

 

There are also connections between the above. You might have noticed that my example of a system that was sensitive to initial conditions but not ergodic or chaotic relied on having an unbounded phase space, where the two points both shot off to infinity. I think that if you have sensitivity to initial conditions and a bounded phase space, then you generally also have ergodic and chaotic behavior.

Anyway, I think "chaos" is a sexy/popular term to use to describe vaguely unpredictable systems, but almost all of the time you don't actually need to rely on the full technical criteria of it. I think this could be important for not leading readers into red-herring trails of investigation. For example, all of standard statistical mechanics only needs ergodicity.

Comment by Alex_Altair on Alex_Altair's Shortform · 2024-03-05T20:05:53.130Z · LW · GW

Has anyone checked out Nassim Nicholas Taleb's book Statistical Consequences of Fat Tails? I'm wondering where it lies on the spectrum from textbook to prolonged opinion piece. I'd love to read a textbook about the title.

Comment by Alex_Altair on Voting Results for the 2022 Review · 2024-02-28T17:48:10.468Z · LW · GW

Just noticing that every post has at least one negative vote, which feels interesting for some reason.

Comment by Alex_Altair on Dual Wielding Kindle Scribes · 2024-02-23T01:49:09.076Z · LW · GW

The e-ink tablet market has really diversified recently. I'd recommend that anyone interested look around at the options. My impression is that the Kindle Scribe is one of the least good ones (which doesn't mean it's bad).

Comment by Alex_Altair on Fixing The Good Regulator Theorem · 2024-02-20T16:33:56.502Z · LW · GW

Here's the arxiv version of the paper, with a bunch more content in appendices.

Comment by Alex_Altair on Where is the Town Square? · 2024-02-14T14:57:49.847Z · LW · GW

And, since I can't do everything: what popular platforms shouldn't I prioritize?

I think cross-posting between twitter, mastodon and bluesky would be pretty easy. And it would let you gather your own data on which platforms are worth continuing.

Comment by Alex_Altair on Choosing a book on causality · 2024-02-08T05:58:45.488Z · LW · GW

I looked at these several months ago and unfortunately recommend neither. Pearl's Causality is very dense, and not really a good introduction. The Primer is really egregiously riddled with errors; there seems to have been some problem with the publisher. And on top of that, I just found it not very well written.

I don't have a specific recommendation, but I believe that at this point there are a bunch of statistics textbooks that competently discuss the essential content of causal modelling; maybe check the reviews for some of those on amazon.

Comment by Alex_Altair on D0TheMath's Shortform · 2024-01-14T02:03:22.128Z · LW · GW

One way that the analogy with code doesn't carry over is that in math, you often can't even begin to use a theorem if you don't know a lot of detail about what the objects in the theorem mean, and often knowing what they mean is pretty close to knowing why the theorems you're building on are true. Being handed a theorem is less like being handed an API and more like being handed a sentence in a foreign language. I can't begin to make use of the information content in the sentence until I learn what every symbol means and how the grammar works, and at that point I could have written the sentence myself.

Comment by Alex_Altair on The Perceptron Controversy · 2024-01-11T21:30:37.813Z · LW · GW

I'd recommend porting it over as a sequence instead of one big post (or maybe just port the first chunk as an intro post?). LW doesn't have a citation format, but you can use footnotes for it (and you can use the same footnote number in multiple places).

Comment by Alex_Altair on A model of research skill · 2024-01-10T22:23:05.541Z · LW · GW

I had a side project to get better at research in 2023. I found very few resources that were actually helpful to me. But here are some that I liked.

  • A few posts by Holden Karnofsky on Cold Takes, especially Useful Vices for Wicked Problems and Learning By Writing.
  • Diving into deliberate practice. Most easily read is the popsci book Peak. This book emphasizes "mental representations", which I find the most useful part of the method, though I think it's also the least supported by the science.
  • The popsci book Grit.
  • The book Ultralearning. Extremely skimmable, large collection of heuristics that I find essential for the "lean" style of research.
  • Reading a scattering of historical accounts of how researchers did their research, and how it came to be useful. (E.g. Newton, Einstein, Erdős, Shannon, Kolmogorov, and a long tail of less big names.)

(Many resources were not helpful for me for reasons that might not apply to others; I was already doing what they advised, or they were about how to succeed inside academia, or they were about emotional problems like lack of confidence or burnout. But, I think mostly I failed to find good resources because no one knows how to do good research.)

Comment by Alex_Altair on New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?" · 2024-01-02T22:47:59.119Z · LW · GW

Finally, I want to note an aspect of the discussion in the report that makes me quite uncomfortable: namely, it seems plausible to me that in addition to potentially posing existential risks to humanity, the sorts of AIs discussed in the report might well be moral patients in their own right.

I strongly appreciate this paragraph for stating this concern so emphatically. I think this possibility is strongly under-represented in the AI safety discussion as a whole.

Comment by Alex_Altair on The Plan - 2023 Version · 2024-01-02T22:01:19.617Z · LW · GW

I agree there's a core principle somewhere around the idea of "controllable implies understandable". But when I think about this with respect to humans studying biology, then there's another thought that comes to my mind: the things we want to control are not necessarily the things the system itself is controlling. For example, we would like to control the obesity crisis (and weight loss in general) but it's not clear that the biological system itself is controlling that. It almost certainly was successfully controlling it in the ancestral environment (and therefore it was understandable within that environment) but perhaps the environment has changed enough that it is now uncontrollable (and potentially not understandable). Cancer manages to successfully control the system in the sense of causing itself to happen, but that doesn't mean that our goal, "reliably stopping cancer", is understandable, since it is not a way that the system is controlling itself.

This mismatch seems pretty evidently applicable to AI alignment.

And perhaps the "environment" part is critical. A system being controllable in one environment doesn't imply it being controllable in a different (or broader) environment, and thus guaranteed understandability is also lost. This feels like an expression of misgeneralization.

Comment by Alex_Altair on Meaning & Agency · 2024-01-02T18:25:11.922Z · LW · GW

Looking back at Flint's work, I don't agree with this summary.

Ah, sorry, I wasn't intending for that to be a summary. I found Flint's framework very insightful, but after reading it I sort of just melded it into my own overall beliefs and understanding around optimization. I don't think he intended it to be a coherent or finished framework on its own, so I don't generally try to think "what does Flint's framework say about X?". I think its main influence on me was the whole idea of using dynamical systems and phase space as the basis for optimization. So for example;

In any case, I agree that Flint's work also eliminates the need for an unnatural baseline in which we have to remove the agent.

I would say that working in the framework of dynamical systems is what lets one get a natural baseline against which to measure optimization, by comparing a given trajectory with all possible trajectories.

I think I could have some more response/commentary about each of your bullet points, but there's a background overarching thing that may be more useful to prod at. I have a clear (-feeling-to-me) distinction between "optimization" and "agent", which doesn't seem to be how you're using the words. The dynamical systems + Yudkowsky measure perspective is a great start on capturing the optimization concept, but it is agnostic about (my version of) the agent concept (except insofar as agents are a type of optimizer). It feels to me like the idea of endorsement you're developing here is cool and useful and is... related to optimization, but isn't the basis of optimization. So I agree that e.g. "endorsement" is closer to alignment, but also I don't think that "optimization" is supposed to be all that close to alignment; I'd reserve that for "agent". I think we'll need a few levels of formalization in agent foundations, and you're working toward a different level than those, and so these ideas aren't in conflict.

Breaking that down just a bit more; let's say that "alignment" refers to aligning the intentional goals of agents. I'd say that "optimization" is a more general phenomenon where some types of systems tend to move their state up an ordering; but that doesn't mean that it's "intentional", nor that that goal is cleanly encoded somewhere inside the system. So while you could say that two optimizing systems "are more aligned" if they move up similar state orderings, it would be awkward to talk about aligning them.

(My notion of) optimization has its own version of the thing you're calling "Vingean", which is that if I believe a process optimizes along a certain state ordering, but I have no beliefs about how it works on the inside, then I can still at least predict that the state will go up the ordering. I can predict that the car will arrive at the airport even though I don't know the turns. But this has nothing to do with the (optimization) process having beliefs or doing reasoning of any kind (which I think of as agent properties). For example I believe that there exists an optimization process such that mountains get worn down, and so I will predict it to happen, even though I know very little about the chemistry of erosion or rocks. And this is kinda like "endorsement", but it's not that the mountain has probability assignments or anything.

In fact I think it's just a version of what makes something a good abstraction; an abstraction is a compact model that allows you to make accurate predictions about outcomes without having to predict all intermediate steps. And all abstractions also have the property that if you have enough compute/etc. then you can just directly calculate the outcome based on lower-level physics, and don't need the abstraction to predict the outcome accurately.

I think that was a longer-winded way to say that I don't think your concepts in this post are replacements for the Yudkowsky/Flint optimization ideas; instead it sounds like you're saying "Assume the optimization process is of the kind that has beliefs and takes actions. Then we can define 'endorsement' as follows; ..."

Comment by Alex_Altair on Draft: Introduction to optimization · 2023-12-27T21:16:57.777Z · LW · GW

What's your preferred response/solution to ~"problems"(?) of events that have probability zero but occur nevertheless

My impression is that people have generally agreed that this paradox is resolved (=formally grounded) by measure theory. I know enough measure theory to know what it is but haven't gone out of my way to explore the corners of said paradoxes.

But you might be asking me about it in the framework of Yudkowsky's measure of optimization. Let's say the states are the real numbers in [0, 1] and the relevant ordering is the same as the one on the real numbers, and we're using the uniform measure over it. Then, even though the probability of getting any specific real number is zero, the probability mass we use to calculate bits of optimization power is all the probability mass below that number. In that case, all the resulting numbers would imply finite optimization power. ... except if we got the result that was exactly the number 0. But in that case, that would actually be infinitely surprising! And so the fact that the measure of optimization returns infinitely many bits reflects intuition.
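
Concretely, the calculation I have in mind looks something like this (a toy version of my own, using the setup above where the baseline is the uniform measure on [0, 1] and the relevant probability mass is everything below the observed outcome):

```python
import math

def optimization_bits(x):
    """Bits of optimization for an observed outcome x in [0, 1], with a uniform
    baseline measure, counting the probability mass lying below x (as in the
    setup described above)."""
    mass = x   # uniform measure of [0, x)
    return -math.log2(mass) if mass > 0 else math.inf

for outcome in [0.5, 0.1, 0.001, 1e-9, 0.0]:
    print(outcome, optimization_bits(outcome))
# Every nonzero outcome gives finitely many bits; hitting exactly 0 has measure
# zero under the baseline, so the measure of optimization returns infinity.
```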

It's (probably) true that our physical reality has only finite precision

I'm also not a physicist but my impression is that physicists generally believe that the world does actually have infinite precision.

I'd also guess that the description length of (a computable version of) the standard model as-is (which includes infinite precision because it uses the real number system) has lower K-complexity than whatever comparable version of physics where you further specify a finite precision.

Comment by Alex_Altair on Draft: Introduction to optimization · 2023-12-27T21:04:35.980Z · LW · GW

I don't understand this part. How does probability mass constrain how "bad" the states can get? Could you rephrase this maybe?

The probability mass doesn't constrain how "bad" the states can get; I was saying that the fact that there's only 1 unit of probability mass means that the amount of probability mass on lower states is bounded (by 1).

Restricting the formalism to orderings means that there is no meaning to how bad a state is, only a meaning to whether it is better or worse than another state. (You can additionally decide on a measure of how bad, as long as it's consistent with the ordering, but we don't need that to analyze (this concept of) optimization.)

Comment by Alex_Altair on Meaning & Agency · 2023-12-20T23:43:18.828Z · LW · GW

I'll also note that I think what you're calling "Vingean agency" is a notable sub-type of optimization process that you've done a good job at analyzing here. But it's definitely not the definition of optimization or agency to me. For example, in the post you say

We perceive agency when something is better at doing something than us; we endorse some aspect of its reasoning or activity.

This doesn't feel true to me (in the carve-nature-at-its-joints sense). I think children are strongly agents, even though I do everything more competently than they do.

Comment by Alex_Altair on Meaning & Agency · 2023-12-20T23:36:05.097Z · LW · GW

I have some comments on the arbitrariness of the "baseline" measure in Yudkowsky's measure of optimization.

Sometimes, I am surprised in the moment about how something looks, and I quickly update to believing there's an optimization process behind it. For example, if I climb a hill expecting to see a natural forest, and then instead see a grid of suburban houses or an industrial logging site, I'll immediately realize that there's no way this is random and instead there's an optimization process that I wasn't previously modelling. In cases like this, I think Yudkowsky's measure accurately captures the measure of optimization.

Alternatively, sometimes I'm thinking about optimization processes that I've always known are there, and I'm wondering to myself how powerful they are. For example, sometimes I'll be admiring how competent one of my friends is. To measure their competence, I can imagine what a "typical" person would do in that situation, and check the Yudkowsky measure as a diff. I can feel what you mean about arbitrarily drawing a circle around the known optimizer and then "deleting" it, but this just doesn't feel that weird to me? Like I think the way that people model the world allows them to do this kind of operation with pretty substantially meaningful results.

While it may be clear how to do this in many cases, it isn't clear in general. I suspect if we tried to write down the algorithm for doing it, it would involve an "agency detector" at some point; you have to be able to draw a circle around the agent in order to selectively forget it.

I think this is where Flint's framework was insightful. Instead of "detecting" and "deleting" the optimization process and then measuring the diff, you consider the set of every possible trajectory of the system, measure the optimization of each (with respect to the ordering over states), take the average, and then compare your potential optimizer to this. The potential optimization process will be in that average, but it will be washed out by all the other trajectories (assuming most trajectories don't go up the ordering nearly as much; if they did, then your observed process would rightly not register as an optimizer).
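
As a cartoon of that comparison (entirely a toy model of my own, not Flint's formalism): take trajectories to be walks on the integers, where "up the ordering" just means ending at a higher number, and compare a candidate process against the average over many sampled trajectories.

```python
import random

random.seed(0)

def baseline_trajectory(steps=100):
    """A 'typical' trajectory: an unbiased random walk."""
    x = 0
    for _ in range(steps):
        x += random.choice([-1, 1])
    return x

def candidate_trajectory(steps=100):
    """The process we suspect of being an optimizer: a walk biased upward."""
    x = 0
    for _ in range(steps):
        x += random.choice([-1, 1, 1])
    return x

n = 10_000
baseline_avg = sum(baseline_trajectory() for _ in range(n)) / n
candidate_avg = sum(candidate_trajectory() for _ in range(n)) / n

print(f"average final state over sampled trajectories: {baseline_avg:+.2f}")   # ~0
print(f"average final state of the candidate process:  {candidate_avg:+.2f}")  # clearly higher
# The candidate registers as an optimizer because it moves up the ordering
# much more than the average over (sampled) trajectories does.
```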

(Obviously this is not helpful for e.g. looking into a neural network and figuring out whether it contains something that will powerfully optimize the world around you. But that's not what this level of the framework is for; this level is for deciding what it even means for something to powerfully optimize something around you.)

Of course, to run this comparison you need a "baseline" of a measure over every possible trajectory. But I think this is just reflecting the true nature of optimization; I think it's only meaningful relative to some other expectation.