Posts

Raemon's Deliberate (“Purposeful?”) Practice Club 2023-11-14T18:24:19.335Z
My research agenda in agent foundations 2023-06-28T18:00:27.813Z
Why don't quantilizers also cut off the upper end of the distribution? 2023-05-15T01:40:50.183Z
Draft: Inferring minimizers 2023-04-01T20:20:48.676Z
Draft: Detecting optimization 2023-03-29T20:17:46.642Z
Draft: The optimization toolbox 2023-03-28T20:40:38.165Z
Draft: Introduction to optimization 2023-03-26T17:25:55.093Z
Dealing with infinite entropy 2023-03-01T15:01:40.400Z
Top YouTube channel Veritasium releases video on Sleeping Beauty Problem 2023-02-11T20:36:57.089Z
My first year in AI alignment 2023-01-02T01:28:03.470Z
How can one literally buy time (from x-risk) with money? 2022-12-13T19:24:06.225Z
Consider using reversible automata for alignment research 2022-12-11T01:00:24.223Z
A dynamical systems primer for entropy and optimization 2022-12-10T00:13:13.984Z
How do finite factored sets compare with phase space? 2022-12-06T20:05:54.061Z
Alex_Altair's Shortform 2022-11-27T18:59:05.193Z
When do you visualize (or not) while doing math? 2022-11-23T20:15:20.885Z
"Normal" is the equilibrium state of past optimization processes 2022-10-30T19:03:19.328Z
Introduction to abstract entropy 2022-10-20T21:03:02.486Z
Deliberate practice for research? 2022-10-08T03:45:21.773Z
How dangerous is human-level AI? 2022-06-10T17:38:27.643Z
I'm trying out "asteroid mindset" 2022-06-03T13:35:48.614Z
Request for small textbook recommendations 2022-05-25T22:19:56.549Z
Why hasn't deep learning generated significant economic value yet? 2022-04-30T20:27:54.554Z
Does the rationalist community have a membership funnel? 2022-04-12T18:44:48.795Z
When to use "meta" vs "self-reference", "recursive", etc. 2022-04-06T04:57:47.405Z
Giving calibrated time estimates can have social costs 2022-04-03T21:23:46.590Z
Ways to invest your wealth if you believe in a high-variance future? 2022-03-11T16:07:49.302Z
How can I see a daily count of all words I type? 2022-02-04T02:05:04.483Z
Tag for AI alignment? 2022-01-02T18:55:45.228Z
Is Omicron less severe? 2021-12-30T23:14:37.292Z
Confusion about Sequences and Review Sequences 2021-12-21T18:13:13.394Z
How I became a person who wakes up early 2021-12-18T18:41:45.732Z
What's the status of third vaccine doses? 2021-08-04T02:22:52.317Z
A new option for building lumenators 2021-07-12T23:45:34.294Z
Bay and/or Global Solstice* Job Search (2021 - 2022) 2021-03-16T00:21:10.290Z
Where does the phrase "central example" come from? 2021-03-12T05:57:49.253Z
One Year of Pomodoros 2020-12-31T04:42:31.274Z
Logistics for the 2020 online Secular Solstice* 2020-12-03T00:08:37.401Z
The Bay Area Solstice 2014-12-03T22:33:17.760Z
Mathematical Measures of Optimization Power 2012-11-24T10:55:17.145Z
Modifying Universal Intelligence Measure 2012-09-18T23:44:08.864Z
An Intuitive Explanation of Solomonoff Induction 2012-07-11T08:05:20.544Z
Should LW have a separate AI section? 2012-07-10T01:42:39.259Z
How Bayes' theorem is consistent with Solomonoff induction 2012-07-09T22:16:02.312Z
Computation Hazards 2012-06-13T21:49:19.986Z
How do you notice when you're procrastinating? 2012-03-02T09:25:08.917Z
[LINK] The NYT on Everyday Habits 2012-02-18T08:23:32.820Z
[LINK] Learning enhancement using "transcranial direct current stimulation" 2012-01-26T16:18:55.714Z
[LINK] Cryo Comic 2011-12-12T05:31:12.630Z
Free Online Stanford Courses: AI and Machine Learning 2011-09-10T20:58:52.409Z

Comments

Comment by Alex_Altair on Alex_Altair's Shortform · 2024-03-05T20:05:53.130Z · LW · GW

Has anyone checked out Nassim Nicholas Taleb's book Statistical Consequences of Fat Tails? I'm wondering where it lies on the spectrum from textbook to prolonged opinion piece. I'd love to read a textbook about the title.

Comment by Alex_Altair on Voting Results for the 2022 Review · 2024-02-28T17:48:10.468Z · LW · GW

Just noticing that every post has at least one negative vote, which feels interesting for some reason.

Comment by Alex_Altair on Dual Wielding Kindle Scribes · 2024-02-23T01:49:09.076Z · LW · GW

The e-ink tablet market has really diversified recently. I'd recommend that anyone interested look around at the options. My impression is that the Kindle Scribe is one of the least good ones (which doesn't mean it's bad).

Comment by Alex_Altair on Fixing The Good Regulator Theorem · 2024-02-20T16:33:56.502Z · LW · GW

Here's the arxiv version of the paper, with a bunch more content in appendices.

Comment by Alex_Altair on Where is the Town Square? · 2024-02-14T14:57:49.847Z · LW · GW

And, since I can't do everything: what popular platforms shouldn't I prioritize?

I think cross-posting between twitter, mastodon and bluesky would be pretty easy. And it would let you gather your own data on which platforms are worth continuing.

Comment by Alex_Altair on Choosing a book on causality · 2024-02-08T05:58:45.488Z · LW · GW

I looked at these several months ago and unfortunately recommend neither. Pearl's Causality is very dense, and not really a good introduction. The Primer is really egregiously riddled with errors; there seems to have been some problem with the publisher. And on top of that, I just found it not very well written.

I don't have a specific recommendation, but I believe that at this point there are a bunch of statistics textbooks that competently discuss the essential content of causal modelling; maybe check the reviews for some of those on amazon.

Comment by Alex_Altair on D0TheMath's Shortform · 2024-01-14T02:03:22.128Z · LW · GW

One way that the analogy with code doesn't carry over is that in math, you often can't even begin to use a theorem if you don't know a lot of detail about what the objects in the theorem mean, and often knowing what they mean is pretty close to knowing why the theorems you're building on are true. Being handed a theorem is less like being handed an API and more like being handed a sentence in a foreign language. I can't begin to make use of the information content in the sentence until I learn what every symbol means and how the grammar works, and at that point I could have written the sentence myself.

Comment by Alex_Altair on The Perceptron Controversy · 2024-01-11T21:30:37.813Z · LW · GW

I'd recommend porting it over as a sequence instead of one big post (or maybe just port the first chunk as an intro post?). LW doesn't have a citation format, but you can use footnotes for it (and you can use the same footnote number in multiple places).

Comment by Alex_Altair on A model of research skill · 2024-01-10T22:23:05.541Z · LW · GW

I had a side project to get better at research in 2023. I found very few resources that were actually helpful to me. But here are some that I liked.

  • A few posts by Holden Karnofsky on Cold Takes, especially Useful Vices for Wicked Problems and Learning By Writing.
  • Diving into deliberate practice. Most easily read is the popsci book Peak. This book emphasizes "mental representations", which I find the most useful part of the method, though I think it's also the least supported by the science.
  • The popsci book Grit.
  • The book Ultralearning. Extremely skimmable, large collection of heuristics that I find essential for the "lean" style of research.
  • Reading a scattering of historical accounts of how researchers did their research, and how it came to be useful. (E.g. Newton, Einstein, Erdős, Shannon, Kolmogorov, and a long tail of less big names.)

(Many resources were not helpful for me for reasons that might not apply to others; I was already doing what they advised, or they were about how to succeed inside academia, or they were about emotional problems like lack of confidence or burnout. But, I think mostly I failed to find good resources because no one knows how to do good research.)

Comment by Alex_Altair on New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?" · 2024-01-02T22:47:59.119Z · LW · GW

Finally, I want to note an aspect of the discussion in the report that makes me quite uncomfortable: namely, it seems plausible to me that in addition to potentially posing existential risks to humanity, the sorts of AIs discussed in the report might well be moral patients in their own right.

I strongly appreciate this paragraph for stating this concern so emphatically. I think this possibility is strongly under-represented in the AI safety discussion as a whole.

Comment by Alex_Altair on The Plan - 2023 Version · 2024-01-02T22:01:19.617Z · LW · GW

I agree there's a core principle somewhere around the idea of "controllable implies understandable". But when I think about this with respect to humans studying biology, then there's another thought that comes to my mind; the things we want to control are not necessarily the things the system itself is controlling. For example, we would like to control the obesity crisis (and weight loss in general) but it's not clear that the biological system itself is controlling that. It almost certainly was successfully controlling it in the ancestral environment (and therefore it was understandable within that environment) but perhaps the environment has changed enough that it is now uncontrollable (and potentially not understandable). Cancer manages to successfully control the system in the sense of causing itself to happen, but that doesn't mean that our goal, "reliably stopping cancer" is understandable, since it is not a way that the system is controlling itself.

This mismatch seems pretty evidently applicable to AI alignment.

And perhaps the "environment" part is critical. A system being controllable in one environment doesn't imply it being controllable in a different (or broader) environment, and thus guaranteed understandability is also lost. This feels like an expression of misgeneralization.

Comment by Alex_Altair on Meaning & Agency · 2024-01-02T18:25:11.922Z · LW · GW

Looking back at Flint's work, I don't agree with this summary.

Ah, sorry, I wasn't intending for that to be a summary. I found Flint's framework very insightful, but after reading it I sort of just melded it into my own overall beliefs and understanding around optimization. I don't think he intended it to be a coherent or finished framework on its own, so I don't generally try to think "what does Flint's framework say about X?". I think its main influence on me was the whole idea of using dynamical systems and phase space as the basis for optimization. So for example;

In any case, I agree that Flint's work also eliminates the need for an unnatural baseline in which we have to remove the agent.

I would say that working in the framework of dynamical systems is what lets one get a natural baseline against which to measure optimization, by comparing a given trajectory with all possible trajectories.

I think I could have some more response/commentary about each of your bullet points, but there's a background overarching thing that may be more useful to prod at. I have a clear (-feeling-to-me) distinction between "optimization" and "agent", which doesn't seem to be how you're using the words. The dynamical systems + Yudkowsky measure perspective is a great start on capturing the optimization concept, but it is agnostic about (my version of) the agent concept (except insofar as agents are a type of optimizer). It feels to me like the idea of endorsement you're developing here is cool and useful and is... related to optimization, but isn't the basis of optimization. So I agree that e.g. "endorsement" is closer to alignment, but also I don't think that "optimization" is supposed to be all that close to alignment; I'd reserve that for "agent". I think we'll need a few levels of formalization in agent foundations, and you're working toward a different level than those, and so these ideas aren't in conflict.

Breaking that down just a bit more; let's say that "alignment" refers to aligning the intentional goals of agents. I'd say that "optimization" is a more general phenomenon where some types of systems tend to move their state up an ordering; but that doesn't mean that it's "intentional", nor that that goal is cleanly encoded somewhere inside the system. So while you could say that two optimizing systems "are more aligned" if they move up similar state orderings, it would be awkward to talk about aligning them.

(My notion of) optimization has its own version of the thing you're calling "Vingean", which is that if I believe a process optimizes along a certain state ordering, but I have no beliefs about how it works on the inside, then I can still at least predict that the state will go up the ordering. I can predict that the car will arrive at the airport even though I don't know the turns. But this has nothing to do with the (optimization) process having beliefs or doing reasoning of any kind (which I think of as agent properties). For example I believe that there exists an optimization process such that mountains get worn down, and so I will predict it to happen, even though I know very little about the chemistry of erosion or rocks. And this is kinda like "endorsement", but it's not that the mountain has probability assignments or anything.

In fact I think it's just a version of what makes something a good abstraction; an abstraction is a compact model that allows you to make accurate predictions about outcomes without having to predict all intermediate steps. And all abstractions also have the property that if you have enough compute/etc. then you can just directly calculate the outcome based on lower-level physics, and don't need the abstraction to predict the outcome accurately.

I think that was a longer-winded way to say that I don't think your concepts in this post are replacements for the Yudkowsky/Flint optimization ideas; instead it sounds like you're saying "Assume the optimization process is of the kind that has beliefs and takes actions. Then we can define 'endorsement' as follows; ..."

Comment by Alex_Altair on Draft: Introduction to optimization · 2023-12-27T21:16:57.777Z · LW · GW

What's your preferred response/solution to ~"problems"(?) of events that have probability zero but occur nevertheless

My impression is that people have generally agreed that this paradox is resolved (=formally grounded) by measure theory. I know enough measure theory to know what it is but haven't gone out of my way to explore the corners of said paradoxes.

But you might be asking me about it in the framework of Yudkowsky's measure of optimization. Let's say the states are the real numbers in [0, 1] and the relevant ordering is the same as the one on the real numbers, and we're using the uniform measure over it. Then, even though the probability of getting any specific real number is zero, the probability mass we use to calculate bits of optimization power is all the probability mass below that number. In that case, all the resulting numbers would imply finite optimization power. ... except if we got the result that was exactly the number 0. But in that case, that would actually be infinitely surprising! And so the fact that the measure of optimization returns infinite bits reflects that intuition.
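
Here's a minimal sketch of that calculation in code, assuming the setup above (uniform measure on [0, 1], ordering inherited from the reals); the specific outcomes are hypothetical and `optimization_power` is just my name for the quantity:

```python
import math

def optimization_power(x: float) -> float:
    """Bits of optimization for an outcome x in [0, 1], using the uniform
    measure and taking the relevant probability mass to be everything below x."""
    mass_below = x  # uniform measure of [0, x)
    if mass_below == 0.0:
        return math.inf  # the infinitely-surprising outcome x == 0
    return -math.log2(mass_below)

for outcome in [0.5, 0.25, 1e-9, 0.0]:
    print(outcome, optimization_power(outcome))
# 0.5 -> 1 bit, 0.25 -> 2 bits, 1e-9 -> ~29.9 bits, 0.0 -> inf
```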

It's (probably) true that our physical reality has only finite precision

I'm also not a physicist but my impression is that physicists generally believe that the world does actually have infinite precision.

I'd also guess that (a computable version of) the standard model as-is (which includes infinite precision because it uses the real number system) has lower K-complexity than whatever comparable version of physics where you further specify a finite precision.

Comment by Alex_Altair on Draft: Introduction to optimization · 2023-12-27T21:04:35.980Z · LW · GW

I don't understand this part. How does probability mass constrain how "bad" the states can get? Could you rephrase this maybe?

The probability mass doesn't constrain how "bad" the states can get; I was saying that the fact that there's only 1 unit of probability mass means that the amount of probability mass on lower states is bounded (by 1).

Restricting the formalism to orderings means that there is no meaning to how bad a state is, only a meaning to whether it is better or worse than another state. (You can additionally decide on a measure of how bad, as long as it's consistent with the ordering, but we don't need that to analyze (this concept of) optimization.)

Comment by Alex_Altair on Meaning & Agency · 2023-12-20T23:43:18.828Z · LW · GW

I'll also note that I think what you're calling "Vingean agency" is a notable sub-type of optimization process that you've done a good job at analyzing here. But it's definitely not the definition of optimization or agency to me. For example, in the post you say

We perceive agency when something is better at doing something than us; we endorse some aspect of its reasoning or activity.

This doesn't feel true to me (in the carve-nature-at-its-joints sense). I think children are strongly agents, even though I do everything more competently than they do.

Comment by Alex_Altair on Meaning & Agency · 2023-12-20T23:36:05.097Z · LW · GW

I have some comments on the arbitrariness of the "baseline" measure in Yudkowsky's measure of optimization.

Sometimes, I am surprised in the moment about how something looks, and I quickly update to believing there's an optimization process behind it. For example, if I climb a hill expecting to see a natural forest, and then instead see a grid of suburban houses or an industrial logging site, I'll immediately realize that there's no way this is random and instead there's an optimization process that I wasn't previously modelling. In cases like this, I think Yudkowsky's measure accurately captures the measure of optimization.

Alternatively, sometimes I'm thinking about optimization processes that I've always known are there, and I'm wondering to myself how powerful they are. For example, sometimes I'll be admiring how competent one of my friends is. To measure their competence, I can imagine what a "typical" person would do in that situation, and check the Yudkowsky measure as a diff. I can feel what you mean about arbitrarily drawing a circle around the known optimizer and then "deleting" it, but this just doesn't feel that weird to me? Like I think the way that people model the world allows them to do this kind of operation with pretty substantially meaningful results.

While it may be clear how to do this in many cases, it isn't clear in general. I suspect if we tried to write down the algorithm for doing it, it would involve an "agency detector" at some point; you have to be able to draw a circle around the agent in order to selectively forget it.

I think this is where Flint's framework was insightful. Instead of "detecting" and "deleting" the optimization process and then measuring the diff, you consider the system of every possible trajectory, measure the optimization of each (with respect to the ordering over states), take the average, and then compare your potential optimizer to this. The potential optimization process will be in that average, but it will be washed out by all the other trajectories (assuming most trajectories don't go up the ordering nearly as much; if they did, then your observed process would rightly not register as an optimizer).
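
To make that comparison concrete, here's a toy sketch under assumptions I'm inventing purely for illustration: a one-dimensional state, an ordering that prefers states near 0, and random-walk dynamics standing in for "every possible trajectory":

```python
import random

random.seed(0)

def run(step, x0=10.0, steps=50):
    """Roll out a trajectory and return how far up the ordering it moved,
    where the (hypothetical) ordering prefers states closer to 0."""
    x = x0
    for _ in range(steps):
        x = step(x)
    return abs(x0) - abs(x)  # > 0 means the trajectory went up the ordering

# Baseline: "every possible trajectory", approximated here by sampling
# random-walk dynamics and averaging their optimization scores.
baseline = [run(lambda x: x + random.gauss(0, 1)) for _ in range(10_000)]
baseline_avg = sum(baseline) / len(baseline)

# Candidate: a process that halves its distance to 0 each step.
candidate = run(lambda x: 0.5 * x)

print(f"average over sampled trajectories: {baseline_avg:.3f}")
print(f"candidate trajectory:              {candidate:.3f}")
# The candidate stands out against the baseline; typical trajectories don't
# go up the ordering much on average, so they wash out.
```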

(Obviously this is not helpful for e.g. looking into a neural network and figuring out whether it contains something that will powerfully optimize the world around you. But that's not what this level of the framework is for; this level is for deciding what it even means for something to powerfully optimize something around you.)

Of course, to run this comparison you need a "baseline" of a measure over every possible trajectory. But I think this is just reflecting the true nature of optimization; I think it's only meaningful relative to some other expectation.

Comment by Alex_Altair on Meaning & Agency · 2023-12-20T22:36:10.751Z · LW · GW

I feel like there's a key concept that you're aiming for that isn't quite spelled out in the math.

I remember reading somewhere that there's a typically unmentioned distinction between "Bayes' theorem" and "Bayesian inference". Bayes' theorem is the statement that P(A|B) = P(B|A)P(A)/P(B), which is true from the axioms of probability theory for any events A and B whatsoever. Notably, it has nothing to do with time, and it's still true even after you learn B. On the other hand, Bayesian inference is the premise that your beliefs should change in accordance with Bayes' theorem. Namely that P_new(A) = P_old(A|O), where O is an observation. That is, when you observe something, you wholesale replace your probability space P_old with a new probability space P_new which is calculated by applying the conditional (via Bayes' theorem).
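
A tiny sketch of that distinction, using a made-up discrete joint distribution (all the numbers and names here are mine, just for illustration):

```python
# A made-up joint distribution over two binary variables.
joint = {("a0", "b0"): 0.3, ("a0", "b1"): 0.1,
         ("a1", "b0"): 0.2, ("a1", "b1"): 0.4}

def marginal(p, idx, val):
    return sum(pr for k, pr in p.items() if k[idx] == val)

def condition(p, idx_given, val_given):
    """Bayesian inference: wholesale replace P_old with P_new = P_old(- | observation)."""
    z = marginal(p, idx_given, val_given)
    return {k: (pr / z if k[idx_given] == val_given else 0.0) for k, pr in p.items()}

# Bayes' theorem as a timeless identity within the *current* distribution:
p_a1 = marginal(joint, 0, "a1")
p_b1 = marginal(joint, 1, "b1")
p_a1_given_b1 = joint[("a1", "b1")] / p_b1
p_b1_given_a1 = joint[("a1", "b1")] / p_a1
assert abs(p_a1_given_b1 - p_b1_given_a1 * p_a1 / p_b1) < 1e-12

# Bayesian inference: upon observing B = b1, replace the distribution.
p_new = condition(joint, 1, "b1")
print(marginal(p_new, 0, "a1"))  # equals the old conditional P_old(a1 | b1) = 0.8
```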

And I think there's a similar thing going on with your definitions of endorsement. While trying to understand the equations, I found it easier to visualize P_1 and P_2 as two separate distributions on the same event space Ω, where endorsement is simply a consistency condition. For belief consistency, you would just say that P_1 endorses P_2 on event X if P_1(X) = P_2(X).

But that isn't what you wrote; instead you wrote this thing with conditioning on a quoted thing. And of course, the thing I said is symmetrical between P_1 and P_2, whereas your concept of endorsement is not symmetrical. It seems like the intention is that P_1 "learns" or "hears about" P_2's belief, and then P_1 updates (in the above Bayesian inference sense) to have a new P_1 that has the consistency condition with P_2.

By putting P_2's belief in the conditional, you're saying that it's an event on Ω, a thing with the same type as X. And it feels like that's conceptually correct, but also kind of the hard part. It's as if P_1 is modelling P_2 as an agent embedded into Ω.

Comment by Alex_Altair on 2022 (and All Time) Posts by Pingback Count · 2023-12-16T21:46:38.970Z · LW · GW

You guys could compute a kind of PageRank for LW posts.

Comment by Alex_Altair on Introduction to abstract entropy · 2023-12-16T19:33:23.859Z · LW · GW

Yeah, So8res wrote that post after reading this one and having a lot of discussion in the comments. That said, my memory was that people eventually convinced him that the title idea in his post was wrong.

Comment by Alex_Altair on Introduction to abstract entropy · 2023-12-16T00:06:15.686Z · LW · GW

[This is a self-review because I see that no one has left a review to move it into the next phase. So8res's comment would also make a great review.]

I'm pretty proud of this post for the level of craftsmanship I was able to put into it. I think it embodies multiple rationalist virtues. It's a kind of "timeless" content, and is a central example of the kind of content people want to see on LW that isn't stuff about AI.

It would also look great printed in a book. :)

Comment by Alex_Altair on What's next for the field of Agent Foundations? · 2023-12-12T16:44:17.844Z · LW · GW

You can also add the PIBBSS Speaker Events to your calendar through this link.

FYI this link redirects to a UC Berkeley login page.

Comment by Alex_Altair on How I became a person who wakes up early · 2023-12-10T03:56:30.291Z · LW · GW

Two years later, this is still pretty much how my sleep works!

  • Still aging
  • Still do regular morning climbing with my friend twice a week
  • Still hang out with my partner before bed most nights
  • Still maintain control through time changes

I never went back into software, so I never again had a 9-5 job. Instead, I'm an independent researcher. In order to further motivate waking up for that, I schedule body-doubling with people on most days of the week, usually starting between 7 and 8:30am. I rarely use melatonin.

My current biggest sleep problem is that, if I don't have climbing, body-doubling, or something else scheduled early, then I usually stay in bed for a while, awake but unproductive. Hm, I haven't used Focusmate in a long time either. Maybe I should try that again?

Comment by Alex_Altair on LW is probably not the place for "I asked this LLM (x) and here's what it said!", but where is? · 2023-12-09T17:21:43.781Z · LW · GW

Isn't the shortform feature perfect for this?

Comment by Alex_Altair on Intro to Naturalism: Orientation · 2023-12-09T03:13:44.286Z · LW · GW

[This is a review for the whole sequence.]

I think of LessWrong as a place whose primary purpose is and always has been to develop the art of rationality. One issue is that this mission tends to attract a certain kind of person -- intelligent, systematizing, deprioritizing social harmony, etc -- and that can make it harder for other kinds of people to participate in the development of the art of rationality. But rationality is for everyone, and ideally the art would be equally accessible to all.

This sequence has many good traits, but one of the most distinguishing is that it is wholly legible and welcoming to people not of the aforementioned kind. In a world where huge efforts of cooperation will be needed to ensure a good future, I think this trait makes this sequence worthy of being further showcased!

Comment by Alex_Altair on Toy Models of Superposition · 2023-12-09T02:22:34.959Z · LW · GW

This paper, like others from Anthropic, is exemplary science and exceptional science communication. The authors are clear, precise and thorough. It is evident that their research motivation is to solve a problem, and not to publish a paper, and that their communication motivation is to help others understand, and not to impress.

Comment by Alex_Altair on How To Go From Interpretability To Alignment: Just Retarget The Search · 2023-12-06T00:20:18.182Z · LW · GW

This post expresses an important idea in AI alignment that I have essentially believed for a long time, and which I have not seen expressed elsewhere. (I think a substantially better treatment of the idea is possible, but this post is fine, and you get a lot of points for being the only place where an idea is being shared.)

Comment by Alex_Altair on Useful Vices for Wicked Problems · 2023-12-06T00:16:01.707Z · LW · GW

Earlier this year I spent a lot of time trying to understand how to do research better. This post was one of the few resources that actually helped. It described several models that I resonated with, but which I had not read anywhere else. It essentially described a lot of the things I was already doing, and this gave me more confidence in deciding to continue doing full time AI alignment research. (It also helps that Karnofsky is an accomplished researcher, and so his advice has more weight!)

Comment by Alex_Altair on The LessWrong 2022 Review · 2023-12-05T20:59:58.569Z · LW · GW

I'm curious what you would estimate the cost of producing the books to be. That is, how much would someone have to donate to pay for Lightcone to produce the books?

Comment by Alex_Altair on What's next for the field of Agent Foundations? · 2023-12-02T03:22:57.672Z · LW · GW

I'd like to gain clarity on what we think the relationship should be between AI alignment and agent foundations. To me, the relationship is 1) historical, in that the people bringing about the field of agent foundations are coming from the AI alignment community and 2) motivational, in that the reason they're investigating agent foundations is to make progress on AI alignment, but not 3) technical, in that I think agent foundations should not be about directly answering questions of how to make the development of AI beneficial to humanity. I think it makes more sense to pursue agent foundations as a quest to understand the nature of agents as a technical concept in its own right.

If you are a climate scientist, then you are very likely in the field in order to help humanity reduce the harms from climate change. But on a day-to-day basis, the thing you are doing is trying to understand the underlying patterns and behavior of the climate as a physical system. It would be unnatural to e.g. exclude papers from climate science journals on the grounds of not being clearly applicable to reducing climate change.

For agent foundations, I think some of the core questions revolve around things like, how does having goals work? How stable are goals? How retargetable are goals? Can we make systems that optimize strongly but within certain limitations? But none of those questions are directly about aligning the goals with humanity.

There's also another group of questions like, what are humans' goals? How can we tell? How complex and fragile are they? How can we get an AI system to imitate a human? Et cetera. But I think these questions come from a field that is not agent foundations.

There should certainly be constant and heavy communication between these fields. And I also think that even individual people should be thinking about the applicability questions. But they're somewhat separate loops. A climate scientist will have an outer loop that does things like, chooses a research problem because they think the answer might help reduce climate change, and they should keep checking on that belief as they perform their research. But while they're doing their research, I think they should generally be using an inner loop that just thinks, "huh, how does this funny 'climate' thing work?"

Comment by Alex_Altair on What's next for the field of Agent Foundations? · 2023-12-02T01:46:26.222Z · LW · GW

FWIW I saw "Anti-MATS" in the sidebar and totally assumed that meant that someone in the dialogue was arguing that the MATS program was bad (instead of discussing the idea of a program that was like MATS but opposite).

Comment by Alex_Altair on What's next for the field of Agent Foundations? · 2023-12-02T01:44:47.971Z · LW · GW

Agent foundations is studying a strange alternate world where agents know the source code to themselves and the universe, where perfect predictors exist and so on

I just want to flag that this is very much not a defining characteristic of agent foundations! Some work in agent foundations will make assumptions like this, some won't -- I consider it a major goal of agent foundations to come up with theories that do not rely on assumptions like this.

(Or maybe you just meant those as examples?)

Comment by Alex_Altair on Stupid Question: Why am I getting consistently downvoted? · 2023-11-30T06:21:16.315Z · LW · GW

Maybe, "try gaining skill somewhere with lower standards"?

Comment by Alex_Altair on Inositol Non-Results · 2023-11-30T06:11:09.751Z · LW · GW

Somehow I read "non-results" in the title and unthinkingly interpreted it as "we now have more data that says inositol does nothing". Maybe the title could be "still not enough data on inositol"?

Comment by Alex_Altair on Shallow review of live agendas in alignment & safety · 2023-11-28T18:14:40.182Z · LW · GW

I wonder if we couldn't convert this into some kind of community wiki, so that the people represented in it can provide endorsed representations of their own work, and so that the community as a whole can keep it updated as time goes on.

Obviously there's the problem where you don't want random people to be able to put illegitimate stuff on the list. But it's also hard to agree on a way to declare legitimacy.

...Maybe we could have a big post like lukeprog's old textbook post, where researchers can make top-level comments describing their own research? And then others can up- or down-vote the comments based on the perceived legitimacy of the research program?

Comment by Alex_Altair on Appendices to the live agendas · 2023-11-28T18:09:57.659Z · LW · GW

Honestly this isn't that long, I might say to re-merge it with the main post. Normally I'm a huge proponent of breaking posts up smaller, but yours is literally trying to be an index, so breaking a piece off makes it harder to use.

Comment by Alex_Altair on Alex_Altair's Shortform · 2023-11-22T23:10:02.004Z · LW · GW

Here's my guess as to how the universality hypothesis a.k.a. natural abstractions will turn out. (This is not written to be particularly understandable.)

  1. At the very "bottom", or perceptual level of the conceptual hierarchy, there will be a pretty straightforward objective set of concepts. Think the first layer of CNNs in image processing, the neurons in the retina/V1, letter frequencies, how to break text strings into words. There's some parameterization here, but the functional form will be clear (like having a basis of n vectors in R^n, but it (almost) doesn't matter which vectors).
  2. For a few levels above that, it's much less clear to me that the concepts will be objective. Curve detectors may be universal, but the way they get combined is less obviously objective to me.
  3. This continues until we get to a middle level that I'd call "objects". I think it's clear that things like cats and trees are objective concepts. Sufficiently good language models will all share concepts that correspond to a bunch of words. This level is very much due to the part where we live in this universe, which tends to create objects, and on earth, which has a biosphere with a bunch of mid-level complexity going on.
  4. Then there will be another series of layers that are less obvious. Partly these levels are filled with whatever content is relevant to the system. If you study cats a lot then there is a bunch of objectively discernible cat behavior. But it's not necessary to know that to operate in the world competently. Rivers and waterfalls will be a level 3 concept, but the details of fluid dynamics are in this level.
  5. Somewhere around the top level of the conceptual hierarchy, I think there will be kind of a weird split. Some of the concepts up here will be profoundly objective; things like "and", mathematics, and the abstract concept of "object". Absolutely every competent system will have these. But then there will also be this other set of concepts that I would map onto "philosophy" or "worldview". Humans demonstrate that you can have vastly different versions of these very high-level concepts, given very similar data, each of which is in some sense a functional local optimum. If this also holds for AIs, then that seems very tricky.
  6. Actually my guess is that there is also a basically objective top-level of the conceptual hierarchy. Humans are capable of figuring it out but most of them get it wrong. So sufficiently advanced AIs will converge on this, but it may be hard to interact with humans about it. Also, some humans' values may be defined in terms of their incorrect worldviews, leading to ontological crises with what the AIs are trying to do.

Comment by Alex_Altair on Public Call for Interest in Mathematical Alignment · 2023-11-22T15:39:42.664Z · LW · GW

Note that we are interested in people at all levels of seniority, including graduate students,

If I imagine being an undergraduate student who's interested, then this sentence leaves me unclear on whether I should fill it out.

Comment by Alex_Altair on [deleted post] 2023-11-13T23:12:37.184Z

I can imagine some ways that the universe might escape heat death, but I seriously doubt that Kurzweil is referring to anything concrete that has technical merit. Under anything resembling normal laws of physics, computers need negentropy to run calculations, and they cannot just "decide" to keep on computing.

Comment by Alex_Altair on Vote on Interesting Disagreements · 2023-11-08T16:23:03.898Z · LW · GW

I would love to try having dialogues with people about Agent Foundations! I'm on the vaguely-pro side, and want to have a better understanding of people on the vaguely-con side; either people who think it's not useful, or people who are confused about what it is and why we're doing it, etc.

Comment by Alex_Altair on Entropy Scaling And Intrinsic Memory · 2023-11-07T22:00:14.390Z · LW · GW

I like this post for the way it illustrates how the probability distribution over blocks of strings changes as you increase block length.

Otherwise, I think the representation of other ideas and how they relate to it is not very accurate, and might mislead readers about the consensus among academics.

As an example, strings where the frequency of substrings converges to a uniform distribution are called "normal". The idea that this could be the definition of a random string was a big debate through the first half of the 20th century, as people tried to put probability theory on solid foundations. But you can have a fixed, deterministic program that generates normal strings! And so people generally rejected this idea as the definition of random. Algorithmic information theory uses the definition of Martin-Löf random, which is that an (infinite) string is random if it can't be compressed by any program (with a bunch of subtleties and distinctions in there).
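
To illustrate the point about deterministic programs, here's a quick sketch using the base-2 Champernowne sequence (concatenate the binary expansions of 1, 2, 3, ...), which is known to be normal in base 2:

```python
from collections import Counter

# Deterministic program: concatenate the binary expansions of 1, 2, 3, ...
bits = "".join(format(i, "b") for i in range(1, 100_000))

# Frequencies of length-1 and length-2 blocks approach uniform (1/2 and 1/4)
# in the limit, even though nothing about the process is random. Convergence
# is slow: at this length the 1s still slightly outnumber the 0s, because
# every number's leading bit is 1.
for k in (1, 2):
    blocks = Counter(bits[i:i + k] for i in range(len(bits) - k + 1))
    total = sum(blocks.values())
    print({b: round(c / total, 3) for b, c in sorted(blocks.items())})
```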

Comment by Alex_Altair on Optimisation Measures: Desiderata, Impossibility, Proposals · 2023-11-06T21:53:47.723Z · LW · GW
  • Utility functions might already be the true name - after all, they do directly measure optimisation, while probability doesn't directly measure information.
  • The true name might have nothing to do with utility functions - Alex Altair has made the case that it should be defined in terms of preference orderings instead.

My vote here is for something between "Utility functions might already be the true name" and "The true name might have nothing to do with utility functions".

It sounds to me like you're chasing an intuition that is validly reflecting one of nature's joints, and that that joint is more or less already named by the concept of "utility function" (but where further research is useful).

And separately, I think there's another natural joint that I (and Yudkowsky and others) call "optimization", and this joint has nothing to do with utility functions. Or more accurately, maximizing a utility function is an instance of optimization, but has additional structure.

Comment by Alex_Altair on AI Alignment Breakthroughs this week (10/08/23) · 2023-10-11T00:07:40.046Z · LW · GW

FWIW I don't think it's honest to title this "breakthroughs". It's almost the opposite, a list of incremental progress.

Comment by Alex_Altair on A Defense of Work on Mathematical AI Safety · 2023-07-06T20:07:13.302Z · LW · GW

Unrelatedly, why not make this a cross-post rather than a link-post?

Comment by Alex_Altair on A Defense of Work on Mathematical AI Safety · 2023-07-06T20:06:44.262Z · LW · GW

I think it would help a lot to provide people with examples. For example, here

Many machine learning research agendas for safety are investigating issues identified years earlier by foundational research, and are at least partly informed by that research.

You say that, but then don't provide any examples. I imagine readers just not thinking of any, and then moving on without feeling any more convinced.

Overall, I think that it's hard for people to believe agent foundations will be useful because they're not visualizing any compelling concrete path where it makes a big difference.

Comment by Alex_Altair on Towards Measures of Optimisation · 2023-06-29T17:17:03.735Z · LW · GW

The first is what Garrett points out, that probabilities are map things, and it’s a bit… weird for our measure of a (presumably) territory thing to be dependent on them. It’s the same sort of trickiness that I don’t feel we’ve properly sorted out in thermodynamics—namely, that if we take the existence of macrostates to be reflections of our uncertainty (as Jaynes does), then it seems we are stuck saying something to the effect of “ice cubes melt because we become more uncertain of their state,” which seems… wrong.

For this part, my answer is Kolmogorov complexity. An ice cube has lower K-complexity than the same amount of liquid water, which is a fact about the territory and not our maps. (And if a state has lower K-complexity, it's more knowable; you can observe fewer bits, and predict more of the state.)

One of my ongoing threads is trying to extend this to optimization. I think a system is being objectively optimized if the state's K-complexity is being reduced. But I'm still working through the math.
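
K-complexity isn't computable, but here's a rough sketch of the intuition using compressed length as a crude stand-in, with a toy "ice" state (a repeating lattice of symbols) and a toy "liquid" state (the same symbols shuffled); all the specifics are invented for illustration:

```python
import random
import zlib

random.seed(0)

# Toy "ice": a perfectly repeating arrangement of the same symbols.
ice = ("H2O " * 25_000).encode()

# Toy "liquid": the same symbols in a shuffled, disordered arrangement.
liquid_chars = list("H2O " * 25_000)
random.shuffle(liquid_chars)
liquid = "".join(liquid_chars).encode()

# Compressed length is only a crude computable proxy for K-complexity,
# but the ordered state compresses far better than the disordered one.
print(len(zlib.compress(ice)), len(zlib.compress(liquid)))
```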

Comment by Alex_Altair on Towards Measures of Optimisation · 2023-06-29T17:09:44.040Z · LW · GW

Yeah... so these are reasonable thoughts of the kind that I thought through a bunch when working on this project, and I do think they're resolvable, but to do so I'd basically be writing out my optimization sequence.

I agree with Alexander below, though: a key part of optimization is that it is not about utility functions, it is only about a preference ordering. Utility functions are about choosing between lotteries, which is a thing that agents do, whereas optimization is just about going up an ordering. Optimization is a thing that a whole system does, which is why there's no agent/environment distinction. Sometimes, only a part of the system is responsible for the optimization, and in that case you can start to talk about separating them, and then you can ask questions about what that part would do if it were placed in other environments.

Comment by Alex_Altair on My research agenda in agent foundations · 2023-06-29T16:08:07.605Z · LW · GW

Yeah, this is why we need a better explainer for agent foundations. I won't do it justice in this comment but I'll try to say some helpful words. (Have you read the Rocket Alignment Problem?)

Do you expect there will be a whole new paradigm, and that current neural networks will be nothing like future AIs?

I can give an easy "no" to this question. I do not necessarily expect future AIs to work in a whole new paradigm.

My understanding is that you're trying to build a model of what actual AI agents will be like.

This doesn't really describe what I'm doing. I'm trying to help figure out what AIs we should build, so I'm hoping to affect what actual AI agents will be like.

But more of what I'm doing is trying to understand what the space of possible agents looks like at all. I can see how that could sound like someone saying, "it seems like we don't know how to build a safe bridge, so I'm going to start by trying to understand what the space of possible configurations of matter looks like at all" but I do think it's different than that.

Let me try putting it this way. The arguments that AI could be an existential risk were formed before neural networks were obviously useful for anything. So the inherent danger of AIs does not come from anything particular to current systems. These arguments use specific properties about the general nature of intelligence and agency. But they are ultimately intuitive arguments. The intuition is good enough for us to know that the arguments are correct, but not good enough to help us understand how to build safe AIs. I'm trying to find the formalization behind those intuitions, so that we can have any chance at building a safe thing. Once we get some formal results about how powerful AIs could be safe even in principle, then we can start thinking about how to build versions of existing systems that have those properties. (And yes, that's a really long feedback loop, so I try to recurringly check that my trains of ideas could still in principle apply to ML systems.)

Comment by Alex_Altair on Causality: A Brief Introduction · 2023-06-23T23:45:17.207Z · LW · GW

I'd agree that the bits of output are not independent in some physical sense. But they're definitely independent in my mind! If I hear that the 100th binary digit of pi is 1, then my subjective probability over the 101st digit does not update at all, and remains at 0.5/0.5. So this still feels like a frequentism/Bayesianism thing to me.

Re: the modified experiment about random strings, you say that "To get the string of random bits we have to sample a coin flip, and then make two copies of the outcome". But there's nothing preventing the universe from simply containing two copies of the same random string, created causally independently. But that's also vanishingly unlikely as the string gets longer.

Comment by Alex_Altair on Causality: A Brief Introduction · 2023-06-23T23:23:00.256Z · LW · GW

Yeah, I think I agree that the resolution here is something about how we should use these words. In practice I don't find myself having to distinguish between "statistics" and "probability" and "uncertainty" all that often. But in this case I'd be happy to agree that "all statistical correlations are due to causal influences" given that we mean "statistical" in a more limited way than I usually think of it.

But I don't think we know how to properly formalise or talk about that yet.

A group of LessWrong contributors has made a lot of progress on these ideas of logical uncertainty and (what I think they're now calling) functional decision theory over the last 15ish years, although I don't really follow it myself, so I'm not sure how close they'd say we are to having it properly formalized.

Comment by Alex_Altair on Causality: A Brief Introduction · 2023-06-21T20:16:35.784Z · LW · GW

Thanks for writing that out! I've enjoyed thinking this through some more.

I agree that, if you instantiated many copies of the program across the universe as your sampling method, or somehow otherwise "ran them many times", then their outputs would be independent in the sense that P(A, B) = P(A)P(B). This also holds true if, on each run, there was some "local" error to the program's otherwise deterministic output.

I had intended to be using the program's output as a time series of bits, where we are considering the bits to be "sampling" from A and B. Let's say it's a program that outputs the binary digits of pi. I have no idea what the bits are (after the first few) but there is a sense in which P(A) = 0.5 for either A = 0 or A = 1, and at any timestep. The same is true for P(B). So P(A)P(B) = 0.25. But clearly P(A = 0, B = 0) = 0.5, and P(A = 0, B = 1) = 0, et cetera. So in that case, they're not probabilistically independent, and therefore there is a correlation not due to a causal influence.
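
Here's a sketch of that arithmetic, using a fixed-seed pseudorandom generator as a stand-in for the digits of pi and empirical frequencies as a stand-in for the subjective probabilities:

```python
import random

# Stand-in for a deterministic-but-unknown bit stream like the digits of pi.
stream = random.Random(0)
bits = [stream.randint(0, 1) for _ in range(100_000)]

# A and B are the *same* program's output at each timestep.
A, B = bits, bits

p_a0 = sum(a == 0 for a in A) / len(A)
p_b0 = sum(b == 0 for b in B) / len(B)
p_joint_00 = sum(a == 0 and b == 0 for a, b in zip(A, B)) / len(A)

print(p_a0 * p_b0)   # ~0.25
print(p_joint_00)    # ~0.5: the two streams are copies of each other,
                     # so the joint does not factorize into the marginals
```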

But this is in a Bayesian framing, where the probability isn't a physical thing about the programs, it's a thing inside my mind. So, while there is a common source of the correlation (my uncertainty over what the digits of pi are) it's certainly not a "causal influence" on A and B.

This matters to me because, in the context of agent foundations and AI alignment, I want my probabilities to be representing my state of belief (or the agent's state of belief).