Posts

ChatGPT struggles to respond to the real world 2023-01-12T16:02:56.358Z
Coherent extrapolated dreaming 2022-12-26T17:29:14.420Z
Response to Holden’s alignment plan 2022-12-22T16:08:53.188Z
Notes on OpenAI’s alignment plan 2022-12-08T19:13:59.439Z
Logical induction for software engineers 2022-12-03T19:55:35.474Z
Implications of automated ontology identification 2022-02-18T03:30:53.795Z
Alignment versus AI Alignment 2022-02-04T22:59:09.794Z
Seattle: The physics of dynamism (and AI alignment) 2022-01-19T17:10:45.248Z
Life, struggle, and the psychological fallout from COVID 2021-12-06T16:59:39.611Z
Comments on Allan Dafoe on AI Governance 2021-11-29T16:16:03.482Z
Stuart Russell and Melanie Mitchell on Munk Debates 2021-10-29T19:13:58.244Z
Three enigmas at the heart of our reasoning 2021-09-21T16:52:52.089Z
David Wolpert on Knowledge 2021-09-21T01:54:58.095Z
Comments on Jacob Falkovich on loneliness 2021-09-16T22:04:58.773Z
The Blackwell order as a formalization of knowledge 2021-09-10T02:51:16.498Z
AI Risk for Epistemic Minimalists 2021-08-22T15:39:15.658Z
The inescapability of knowledge 2021-07-11T22:59:15.148Z
The accumulation of knowledge: literature review 2021-07-10T18:36:17.838Z
Agency and the unreliable autonomous car 2021-07-07T14:58:26.510Z
Musings on general systems alignment 2021-06-30T18:16:27.113Z
Knowledge is not just precipitation of action 2021-06-18T23:26:17.460Z
Knowledge is not just digital abstraction layers 2021-06-15T03:49:55.020Z
Knowledge is not just mutual information 2021-06-10T01:01:32.300Z
Knowledge is not just map/territory resemblance 2021-05-25T17:58:08.565Z
Problems facing a correspondence theory of knowledge 2021-05-24T16:02:37.859Z
Concerning not getting lost 2021-05-14T19:38:09.466Z
Understanding the Lottery Ticket Hypothesis 2021-05-14T00:25:21.210Z
Agency in Conway’s Game of Life 2021-05-13T01:07:19.125Z
Life and expanding steerable consequences 2021-05-07T18:33:39.830Z
Parsing Chris Mingard on Neural Networks 2021-05-06T22:16:14.610Z
Parsing Abram on Gradations of Inner Alignment Obstacles 2021-05-04T17:44:16.858Z
Follow-up to Julia Wise on "Don’t Shoot The Dog" 2021-05-01T19:07:45.468Z
Pitfalls of the agent model 2021-04-27T22:19:30.031Z
Beware over-use of the agent model 2021-04-25T22:19:06.132Z
Probability theory and logical induction as lenses 2021-04-23T02:41:25.414Z
Where are intentions to be found? 2021-04-21T00:51:50.957Z
My take on Michael Littman on "The HCI of HAI" 2021-04-02T19:51:44.327Z
Thoughts on Iason Gabriel’s Artificial Intelligence, Values, and Alignment 2021-01-14T12:58:37.256Z
Reflections on Larks’ 2020 AI alignment literature review 2021-01-01T22:53:36.120Z
Search versus design 2020-08-16T16:53:18.923Z
The ground of optimization 2020-06-20T00:38:15.521Z
Set image dimensions using markdown 2020-06-17T12:37:54.198Z
Our take on CHAI’s research agenda in under 1500 words 2020-06-17T12:24:32.620Z
How does one authenticate with the lesswrong API? 2020-06-15T23:46:39.296Z
Reply to Paul Christiano on Inaccessible Information 2020-06-05T09:10:07.997Z
Feedback is central to agency 2020-06-01T12:56:51.587Z
The simple picture on AI safety 2018-05-27T19:43:27.025Z
Opportunities for individual donors in AI safety 2018-03-31T18:37:21.875Z
Superrationality and network flow control 2013-07-22T01:49:46.093Z
Personality tests? 2012-02-29T09:33:00.489Z

Comments

Comment by Alex Flint (alexflint) on The ground of optimization · 2024-03-21T13:21:42.162Z · LW · GW

A bomb would not be an optimizing system, because the target space is not small compared to the basin of attraction. An AI that systematically dismantles things would be an optimizing system if for no other reason than that the AI systematically preserves its own integrity.

Comment by Alex Flint (alexflint) on The ground of optimization · 2024-02-19T15:43:10.717Z · LW · GW

It's worse, even, in a certain way, than that: the existence of optimizing systems organized around a certain idea of "natural class" feeds back into more observers observing data that is distributed according to this idea of "natural class", leading to more optimizing systems being built around that idea of "natural class", and so on.

Once a certain idea of "natural class" gains a foothold somewhere, observers will make real changes in the world that further suggest this particular idea of "natural class" to others, and this forms a feedback loop.

Comment by Alex Flint (alexflint) on Teleosemantics! · 2023-03-30T08:35:22.568Z · LW · GW

If you pin down what a thing refers to according to what that thing was optimized to refer to, then don't you have to look at the structure of the one who did the optimizing in order to work out what a given thing refers to? That is, to work out what the concept "thermodynamics" refers to, it may not be enough to look at the time evolution of the concept "thermodynamics" on its own; I may instead need to know something about the humans who were driving those changes, and the goals held within their minds. But, if this is correct, then doesn't it raise another kind of homunculus-like regression, where we were trying to directly explain semantics but ended up needing to inquire into yet another mind, the complete understanding of which would require further unpacking of the frames and concepts held in that mind, which in turn would require inquiry into a yet earlier mind that was responsible for optimizing those frames and concepts?

Comment by Alex Flint (alexflint) on Here's the exit. · 2023-02-27T23:15:30.254Z · LW · GW

There seems to be some real wisdom in this post but given the length and title of the post, you haven't offered much of an exit -- you've just offered a single link to a youtube channel for a trauma healer. If what you say here is true, then this is a bit like offering an alcoholic friend the sum total of one text message containing a single link to the homepage of alcoholics anonymous -- better than nothing, but not worthy of the bombastic title of this post.

Comment by Alex Flint (alexflint) on Beginning to feel like a conspiracy theorist · 2023-02-27T22:44:25.419Z · LW · GW

friends and family significantly express their concern for my well being

What exact concerns do they have?

Comment by Alex Flint (alexflint) on SolidGoldMagikarp (plus, prompt generation) · 2023-02-07T23:07:36.786Z · LW · GW

Wow, thank you for this context!

Comment by Alex Flint (alexflint) on Fucking Goddamn Basics of Rationalist Discourse · 2023-02-04T14:34:39.277Z · LW · GW
  1. You don't get to fucking assume any shit on the basis of "but... ah... come on". If you claim X and someone asks why, then congratulations now you're in a conversation. That means maybe possible shit is about to get real, like some treasured assumptions might soon be questioned. There are no sarcastic facial expressions or clever grunts that get you an out from this. You gotta look now at the thing itself.
Comment by Alex Flint (alexflint) on I don't think MIRI "gave up" · 2023-02-03T15:17:03.307Z · LW · GW

I just want to acknowledge the very high emotional weight of this topic.

For about two decades, many of us in this community have been kind of following in the wake of a certain group of very competent people tackling an amazingly frightening problem. In the last couple of years, coincident with a quite rapid upsurge in AI capabilities, that dynamic has really changed. This is truly not a small thing to live through. The situation has real breadth -- it seems good to take it in for a moment, not in order to cultivate anxiety, but in order to really engage with the scope of it.

It's not a small thing at all. We're in this situation where we have AI capabilities kind of out of control. We're not exactly sure where any of the leaders we've previously relied on stand. We all have this opportunity now to take action. The opportunity is simply there. Nobody, actually, can take it away. But there is also the opportunity, truly available to everyone regardless of past actions, to falter, exactly when the world most needs us.

What matters, actually, is what, concretely, we do going forward.

Comment by Alex Flint (alexflint) on Logical induction for software engineers · 2023-02-01T15:51:38.751Z · LW · GW

That is correct. I know it seems a little weird to generate a new policy on every timestep. The reason it's done that way is that the logical inductor needs to understand the function that maps prices to the quantities that will be purchased, in order to solve for a set of prices that "defeat" the current set of trading algorithms. That function (from prices to quantities) is what I call a "trading policy", and it has to be represented in a particular way -- as a set of syntax trees over trading primitives -- in order for the logical inductor to solve for prices. A trading algorithm is a sequence of such sets of syntax trees, where each element in the sequence is the trading policy for a different timestep.

Normally, it would be strange to set up one function (a trading algorithm) that generates another function (a trading policy) that is different for every timestep. Why not just have the trading algorithm directly output the amount that it wants to buy/sell? The reason is that we need not just the quantity to buy/sell, but that quantity as a function of price, since prices themselves are determined by solving an optimization problem with respect to these functions. Furthermore, these functions (trading policies) have to be represented in a particular way. Therefore it makes most sense to have trading algorithms output a sequence of trading policies, one per timestep.
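
To make that structure concrete, here is a minimal Python sketch. This is not the construction from the logical induction paper; all of the names (Price, Const, Scale, Sum, Max, example_trader) are invented for illustration. The point is only to show the shape described above: trading primitives as a small expression language, a trading policy as a set of syntax trees mapping prices to quantities, and a trading algorithm as one policy per timestep.

from dataclasses import dataclass
from typing import Callable, Dict, Union

# Toy "trading primitives": an expression evaluates to a number once prices are known.
Expr = Union["Price", "Const", "Scale", "Sum", "Max"]

@dataclass
class Price:             # the current market price of a sentence
    sentence: str

@dataclass
class Const:             # a constant
    value: float

@dataclass
class Scale:             # coefficient * child
    coefficient: float
    child: "Expr"

@dataclass
class Sum:               # left + right
    left: "Expr"
    right: "Expr"

@dataclass
class Max:               # max(left, right)
    left: "Expr"
    right: "Expr"

def evaluate(expr: "Expr", prices: Dict[str, float]) -> float:
    if isinstance(expr, Price):
        return prices[expr.sentence]
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Scale):
        return expr.coefficient * evaluate(expr.child, prices)
    if isinstance(expr, Sum):
        return evaluate(expr.left, prices) + evaluate(expr.right, prices)
    if isinstance(expr, Max):
        return max(evaluate(expr.left, prices), evaluate(expr.right, prices))
    raise TypeError(f"unknown primitive: {expr!r}")

# A trading policy: for each sentence, a syntax tree giving the quantity to buy
# (negative = sell) as a function of prices. Because it is a syntax tree rather
# than an opaque function, the market-maker can inspect it when solving for prices.
TradingPolicy = Dict[str, "Expr"]

# A trading algorithm: a function from timestep to trading policy, i.e. one policy per timestep.
TradingAlgorithm = Callable[[int], TradingPolicy]

def example_trader(t: int) -> TradingPolicy:
    # Buy "phi" in proportion to how far its price sits below 0.5, scaled down over time.
    demand = Scale(1.0 / (t + 1),
                   Max(Const(0.0), Sum(Const(0.5), Scale(-1.0, Price("phi")))))
    return {"phi": demand}

policy = example_trader(t=3)
print(evaluate(policy["phi"], prices={"phi": 0.2}))  # 0.25 * max(0, 0.3) = 0.075

In the real construction the prices are then chosen by solving for a fixed point against all such policies; the sketch only shows why the policies need to be inspectable syntax trees rather than black-box functions.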

Comment by Alex Flint (alexflint) on How it feels to have your mind hacked by an AI · 2023-01-30T17:56:53.832Z · LW · GW

Thank you for this extraordinarily valuable report!

I believe that what you are engaging in, when you enter into a romantic relationship with either a person or a language model, is a kind of artistic creation. What matters is not whether the person on the "other end" of the relationship is a "real person" but whether the thing you create is of true benefit to the world. If you enter into a romantic relationship with a language model and produce something of true benefit to the world, then the relationship was real, whether or not there was a "real person" on the other end of it (whatever that would mean, even in the case of a human).

Comment by Alex Flint (alexflint) on Worst-case thinking in AI alignment · 2023-01-29T16:32:47.476Z · LW · GW

This is a relatively banal meta-commentary on reasons people sometimes give for doing worst-case analysis, and the differences between those reasons. The post reads like a list of things with no clear through-line. There is a gesture at an important idea from a Yudkowsky post (the logistic success curve idea) but the post does not helpfully expound that idea. There is a kind of trailing-off towards the end of the post as things like "planning fallacy" seem to have been added to the list with little time taken to place them in the context of the other things on the list. In the "differences between these arguments" section, the post doesn't clearly elucidate deep differences between the arguments, it just lists verbal responses that you might make if you are challenged on plausibility grounds in each case.

Overall, I felt that this post under-delivered on an important topic.

Comment by Alex Flint (alexflint) on Grokking the Intentional Stance · 2023-01-29T15:41:26.672Z · LW · GW

Many people believe that they already understand Dennett's intentional stance idea, and due to that will not read this post in detail. That is, in many cases, a mistake. This post makes an excellent and important point, which is wonderfully summarized in the second-to-last paragraph:

In general, I think that much of the confusion about whether some system that appears agent-y “really is an agent” derives from an intuitive sense that the beliefs and desires we experience internally are somehow fundamentally different from those that we “merely” infer and ascribe to systems we observe externally. I also think that much of this confusion dissolves with the realization that internally experienced thoughts, beliefs, desires, goals, etc. are actually “external” with respect to the parts of the mind that are observing them—including the part(s) of the mind that is modeling the mind-system as a whole as “being an agent” (or a “multiagent mind,” etc.). You couldn't observe thoughts (or the mind in general) at all if they weren't external to "you" (the observer), in the relevant sense.

The real point of the intentional stance idea is that there is no fact of the matter about whether something really is an agent, and that point is most potent when applied to ourselves. It is neither the case that we really truly are an agent, nor that we really truly are not an agent.

This post does an excellent job of highlighting this facet. However, I think this post could have been more punchy. There is too much meta-text of little value, like this paragraph:

In an attempt to be as faithful as possible in my depiction of Dennett’s original position, as well as provide a good resource to point back to on the subject for further discussion[1], I will err on the side of directly quoting Dennett perhaps too frequently, at least in this summary section.

In a post like this, do we need to be forewarned that the author will perhaps err too frequently on the side of directly quoting Dennett, at least in the summary section? No, we don't need to know that. In fact the post does not contain all that many direct quotes.

At the top of the "takeaways" section, the author gives the following caveat:

Editorial note: To be clear, these “takeaways” are both “things Dan Dennett is claiming about the nature of agency with the intentional stance” and “ideas I’m endorsing in the context of deconfusing agency for AI safety.”

The word "takeaways" in the heading already tells us that this section will contain points extracted by the reader that may or may not be explicitly endorsed by the original author. There is no need for extra caveats, it just leads to a bad reading experience.

In the comments section, Rohin makes the following very good point:

I mostly agree with everything here, but I think it is understating the extent to which the intentional stance is insufficient for the purposes of AI alignment. I think if you accept "agency = intentional stance", then you need to think "well, I guess AI risk wasn't actually about agency".

Although we can "see through" agency as not-an-ontologically-fundamental-thing, nevertheless we face the practical problem of what to do about the (seemingly) imminent destruction of the world by powerful AI. What actually should we do about that? The intentional stance not only fails to tell us what to do, it also fails to tell us how any approach to averting AI risk can co-exist with the powerful deconstruction of agency offered by the intentional stance idea itself. If agency is in the eye of the beholder, then... what? What do we actually do about AI risk?

Comment by Alex Flint (alexflint) on Soares, Tallinn, and Yudkowsky discuss AGI cognition · 2023-01-29T14:54:28.864Z · LW · GW

Have you personally ever ridden in a robot car that has no safety driver?

Comment by Alex Flint (alexflint) on Soares, Tallinn, and Yudkowsky discuss AGI cognition · 2023-01-28T23:56:15.016Z · LW · GW

This post consists of comments on summaries of a debate about the nature and difficulty of the alignment problem. The original debate was between Eliezer Yudkowsky and Richard Ngo, but this post does not contain the content from that debate; it is mostly commentary by Jaan Tallinn on that debate, with comments by Eliezer.

The post provides a kind of fascinating level of insight into true insider conversations about AI alignment. How do Eliezer and Jaan converse about alignment? Sure, this is a public setting, so perhaps they communicate differently in private. But still. Read the post and you kind of see the social dynamics between them. It's fascinating, actually.

Eliezer is just incredibly doom-y. He describes in fantastic detail the specific ways that a treacherous turn might play out, over dozens of paragraphs, 3 levels deep in a one-on-one conversation, in a document that merely summarizes a prior debate on the topic. He uses Capitalized Terms to indicate that things like "Doomed Phase" and "Terminal Phase" and "Law of Surprisingly Undignified Failure" are not merely for one-time use but in fact refer to specific nodes in a larger conceptual framework.

One thing that happens often is that Jaan asks a question, Eliezer gives an extensive reply, and then Jaan responds that, no, he was actually asking a different question.

There is one point where Jaan describes his frustration over the years with mainstream AI researchers objecting to AI safety arguments as being invalid due to anthropomorphization, when in fact the arguments were not invalidly anthropomorphizing. There is a kind of gentle vulnerability in this section that is worth reading seriously.

There is a lot of swapping of models of others in and outside the debate. Everyone is trying to model everyone all the time.

Eliezer does unfortunately like to explicitly underscore his own brilliance. He says things like:

I consider all of this obvious as a convergent instrumental strategy for AIs. I could probably have generated it in 2005 or 2010 [...]

But it's clear enough that probably nobody was ever going to pass the validation set for generating lines of reasoning obvious enough to be generated by Eliezer in 2010 or possibly 2005.

I do think that the content itself really comes down to the same basic question tackled in the original Hanson/Yudkowsky FOOM debate. I understand that this debate was ostensibly about a broader question than FOOM. In practice, I don't think this discourse has actually moved on much since 2008.

The main thing the FOOM debate is missing, in my opinion, is this: we have almost no examples of AI systems that can do meaningful sophisticated things in the physical world. Self-driving cars still aren't a reality. Walk around a city or visit an airport or drive down a highway, and you see shockingly few robots, and certainly no robots pursuing even the remotest kind of general-purpose tasks. Demo videos of robots doing amazing, scary, general-purpose things abound, but where are these robots in the real world? They are always just around the corner. Why?

Comment by Alex Flint (alexflint) on Agency in Conway’s Game of Life · 2023-01-26T19:31:38.764Z · LW · GW

Thanks for the note.

In Life, I don't think it's easy to generate an X-1 time state that leads to a given X time state, unfortunately. The reason is that each cell in an X time state puts a logical constraint on 9 cells in the X-1 time state. It is therefore possible to set up certain constraint satisfaction problems in terms of finding an X-1 time state that leads to a given X time state, and in general these can be NP-hard.

However, in practice it is very often quite easy to find an X-1 time state that leads to a given X time state, so perhaps the experiment could be tried anyhow.
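
For intuition, here is a minimal brute-force sketch of that predecessor search. It is illustrative only: it assumes a tiny grid with a dead boundary, and plain enumeration obviously does not scale; a serious attempt would hand the constraint satisfaction problem to a SAT solver, which is where the NP-hardness bites.

from itertools import product

def step(grid):
    # Advance a small Life grid one tick, treating everything outside the grid as dead.
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = sum(
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols
            )
            nxt[r][c] = 1 if n == 3 or (grid[r][c] and n == 2) else 0
    return nxt

def predecessors(target):
    # Enumerate every X-1 state of the same small grid that steps to the target X state.
    rows, cols = len(target), len(target[0])
    for bits in product((0, 1), repeat=rows * cols):
        candidate = [list(bits[r * cols:(r + 1) * cols]) for r in range(rows)]
        if step(candidate) == target:
            yield candidate

# Example: predecessors of a horizontal blinker within a 3x3 window.
target = [[0, 0, 0],
          [1, 1, 1],
          [0, 0, 0]]
for p in predecessors(target):
    print(p)   # the vertical blinker appears among the solutions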

In our own universe, the corresponding operation would be to consider some goal configuration of the whole universe, and propagate that configuration backwards to our current time. However, this would generally just tell us that we should completely reconfigure the whole universe right now, and that is generally not within our power, since we can only act locally, have access only to certain technologies, and such.

I think it is interesting to push on this "brute forcing" approach to steering the future, though. I'd be interested to chat more about it.

Comment by Alex Flint (alexflint) on Agency in Conway’s Game of Life · 2023-01-25T14:03:45.735Z · LW · GW

Interesting. Thank you for the pointer.

The real question, though, is whether it is possible within our physics.

Comment by Alex Flint (alexflint) on Agency in Conway’s Game of Life · 2023-01-25T13:53:25.450Z · LW · GW

Oh, the only information I have about that is Dave Greene's comment, plus a few private messages from people over the years who had read the post and were interested in experimenting with concrete GoL constructions. I just messaged the author of the post on the GoL forum asking whether any of that work was spurred by this post.

Comment by Alex Flint (alexflint) on Logical induction for software engineers · 2023-01-25T13:36:40.943Z · LW · GW

Thanks - fixed! And thank you for the note, too.

Comment by Alex Flint (alexflint) on ChatGPT struggles to respond to the real world · 2023-01-13T15:11:50.639Z · LW · GW

Yeah it might just be a lack of training data in 10-second-or-less interactive instructions.

The thing I really wanted to test with this experiment was actually whether ChatGPT could engage with the real world using me as a guinea pig. The 10-second-or-less thing was just the format I used to try to "get at" the phenomenon of engaging with the real world. I'm interested in improving the format to more cleanly get at the phenomenon.

I do currently have the sense that it's more than just a lack of training data. I have the sense that ChatGPT has learned much less about how the world really works at a causal level than it appears from much of its dialog. Specifically, I have the sense that it has learned how to satisfy idle human curiosity using language, in a way that largely routes around a model of the real world, and especially routes around a model of the dynamics of the real world. That's my hypothesis -- I don't think this particular experiment has demonstrated it yet.

Comment by Alex Flint (alexflint) on ChatGPT struggles to respond to the real world · 2023-01-12T20:00:32.241Z · LW · GW

I asked a group of friends for "someone to help me with an AI experiment" and then I gave this particular friend the context that I wanted her help guiding me through a task via text message and that she should be in front of her phone in some room that was not the kitchen.

Comment by Alex Flint (alexflint) on ChatGPT struggles to respond to the real world · 2023-01-12T19:59:58.222Z · LW · GW

I asked a group of friends for "someone to help me with an AI experiment" and then I gave this particular friend the context that I wanted her help guiding me through a task via text message and that she should be in front of her phone in some room that was not the kitchen.

If you look at how ChatGPT responds, it seems to be really struggling to "get" what's happening in the kitchen -- it never really comes to the point of giving specific instructions, and especially never comes to the point of having any sense of the "situation" in the kitchen -- e.g. whether the milk is currently in the saucepan or not.

In contrast, my human friend did "get" this in quite a visceral way (it seems to me). I don't have the sense that this was due to out-of-band context but I'd be interested to retry the experiment with more carefully controlled context.

Comment by Alex Flint (alexflint) on Coherent extrapolated dreaming · 2022-12-28T16:42:47.402Z · LW · GW

I'm very interested in Wei Dai's work, but I haven't followed closely in recent years. Any pointers to what I might read of his recent writings?

I do think Eliezer tackled this problem in the sequences, but I don't really think he came to an answer to these particular questions. I think what he said about meta-ethics is that it is neither that there is some measure of goodness to be found in the material world independent from our own minds, nor that goodness is completely open to be constructed based on our whims or preferences. He then says "well there just is something we value, and it's not arbitrary, and that's what goodness is", which is fine, except it still doesn't tell us how to find that thing or extrapolate it or verify it or encode it into an AI. So I think his account of meta-ethics is helpful but not complete.

Comment by Alex Flint (alexflint) on Coherent extrapolated dreaming · 2022-12-28T16:37:27.142Z · LW · GW

Recursive relevance realization seems to be designed to answer about the "quantum of wisdom".

It does! But... does it really answer the question? Curious about your thoughts on this.

Comment by Alex Flint (alexflint) on Coherent extrapolated dreaming · 2022-12-28T16:36:15.585Z · LW · GW

you ask whether you are aligned to yourself (ideals, goals etc) and find that your actuality is not coherent with your aim

Right! Very often, what it means to become wiser is to discover something within yourself that just doesn't make sense, and then to in some way resolve that.

Discovering incoherency seems very different from keeping a model on coherence rails

True. Eliezer is quite vague about the term "coherent" in his write-ups, and some more recent discussions of CEV drop it entirely. I think "coherent" was originally about balancing the extrapolated volition of many people by finding the places where they agree. But what exactly that means is unclear.

And isn't there mostly atleast two coherent paths out of an incoherence point?

Yeah, if the incoherent point is caused by a conflict between two things, then there are at least two coherent paths out, namely dropping one or the other of those two things. I have the sense that you can also drop both of them, or sometimes drop some kind of overarching premise that was putting the two in conflict.

Does the CEV pick one or track both?

It seems that CEV describes a process for resolving incoherencies, rather than a specific formula for which side of an incoherence to pick. That process, very roughly, is to put a model of a person through the kind of transformations that would engender true wisdom if experienced in real life. I do have the sense that this is how living people become wise, but I question whether it can be usefully captured in a model of a person.

Or does it refuse to enter into genuine transformation processes and treats them as dead-ends as it refuses to step into incoherencies?

I think that CEV very much tries to step into a genuine transformation process. Whether it does or not is questionable. Specifically, if it does, then one runs into the four questions from the write-up.

Comment by Alex Flint (alexflint) on Coherent extrapolated dreaming · 2022-12-28T16:25:01.597Z · LW · GW

Did you ever end up reading Reducing Goodhart?

Not yet, but I hope to, and I'm grateful to you for writing it.

processes for evolving humans' values that humans themselves think are good, in the ordinary way we think ordinary good things are good

Well, sure, but the question is whether this can really be done by modelling human values and then evolving those models. If you claim yes then there are several thorny issues to contend with, including what constitutes a viable starting point for such a process, what is a reasonable dynamic for such a process, and on what basis we decide the answers to these things.

Comment by Alex Flint (alexflint) on Response to Holden’s alignment plan · 2022-12-25T14:17:03.759Z · LW · GW

Wasn't able to record it - technical difficulties :(

Comment by Alex Flint (alexflint) on Response to Holden’s alignment plan · 2022-12-22T20:07:45.439Z · LW · GW

Yes, I should be able to record the discussion and post a link in the comments here.

Comment by Alex Flint (alexflint) on Notes on OpenAI’s alignment plan · 2022-12-09T15:23:23.599Z · LW · GW

If you train a model by giving it reward when it appears to follow a particular human's intention, you probably get a model that is really optimizing for reward, or appearing to follow said humans intention, or something else completely different, while scheming to seize control so as to optimize even more effectively in the future. Rather than an aligned AI.

Right yeah I do agree with this.

Perhaps instead you mean: No really the reward signal is whether the system really deep down followed the humans intention, not merely appeared to do so [...] That would require getting all the way to the end of evhub's Interpretability Tech Tree

Well I think we need something like a really-actually-reward-signal (of the kind you're pointing at here). The basic challenge of alignment as I see it is finding such a reward signal that doesn't require us to get to the end of the Interpretability Tech Tree (or similar tech trees). I don't think we've exhausted the design space of reward signals yet, but it's definitely the "challenge of our times", so to speak.

Comment by Alex Flint (alexflint) on Notes on OpenAI’s alignment plan · 2022-12-09T15:13:33.105Z · LW · GW

Well even if language models do generalize beyond their training domain in the way that humans can, you still need to be in contact with a given problem in order to solve that problem. Suppose I take a very intelligent human and ask them to become a world expert at some game X, but I don't actually tell them the rules of game X nor give them any way of playing out game X. No matter how intelligent the person is, they still need some information about what the game consists of.

Now suppose that you have this intelligent person write essays about how one ought to play game X, and have their essays assessed by other humans who have some familiarity with game X but not a clear understanding. It is not impossible that this could work, but it does seem unlikely. There are a lot of levels of indirection stacked against this working.

So overall I'm not saying that language models can't be generally intelligent, I'm saying that a generally intelligent entity still needs to be in a tight feedback loop with the problem itself (whatever that is).

Comment by Alex Flint (alexflint) on A challenge for AGI organizations, and a challenge for readers · 2022-12-09T14:54:05.811Z · LW · GW

Here is a critique of OpenAI's plan

Comment by Alex Flint (alexflint) on Agency in Conway’s Game of Life · 2022-12-08T14:50:39.886Z · LW · GW

This is a post about the mystery of agency. It sets up a thought experiment in which we consider a completely deterministic environment that operates according to very simple rules, and ask what it would be for an agentic entity to exist within that.

People in the Game of Life community actually spent some time investigating the empirical questions that were raised in this post. Dave Greene notes:

The technology for clearing random ash out of a region of space isn't entirely proven yet, but it's looking a lot more likely than it was a year ago, that a workable "space-cleaning" mechanism could exist in Conway's Life.

As previous comments have pointed out, it certainly wouldn't be absolutely foolproof. But it might be surprisingly reliable at clearing out large volumes of settled random ash -- which could very well enable a 99+% success rate for a Very Very Slow Huge-Smiley-Face Constructor.

I have the sense that the most important question raised in this post is about whether it is possible to construct a relatively small object in the physical world that steers the configuration of a relatively large region of the physical world into a desired configuration. The Game of Life analogy is intended to make that primary question concrete, and also to highlight how fundamental the question of such an object's existence is.

The main point of this post was that the feasibility or non-feasibility of AI systems that exert precise influence over regions of space much larger than themselves may actually be a basic kind of descriptive principle for the physical world. It would be great to write a follow-up post highlighting this point.

Comment by Alex Flint (alexflint) on Agency in Conway’s Game of Life · 2022-12-08T14:42:29.645Z · LW · GW

Thanks for this note Dave

Comment by Alex Flint (alexflint) on Beware over-use of the agent model · 2022-12-08T14:34:12.609Z · LW · GW

This post attempts to separate a certain phenomenon from a certain very common model that we use to understand that phenomenon. The model is the "agent model", in which intelligent systems operate according to an unchanging algorithm. In order to make sense of there being an unchanging algorithm at the heart of each "agent", we suppose that this algorithm exchanges inputs and outputs with the environment via communication channels known as "observations" and "actions".

This post really is my central critique of contemporary artificial intelligence discourse. That critique is: any unexamined views that we use to understand ourselves are likely to enter the design of AI systems that we build. This is because if we think that deep down we really are "agents", then we naturally conclude that any similar intelligent entity would have that same basic nature. In this way we take what was once an approximate description ("humans are somewhat roughly like agents in certain cases") and make it a reality (by building AI systems that actually are designed as agents, and which take over the world).

In fact the agent model is a very effective abstraction. It is precisely because it is so effective that we have forgotten the distinction between the model and the reality. It is as if we had so much success in modelling our refrigerator as an ideal heat pump that we forgot that there even is a distinction between real-world refrigerators and the abstraction of an ideal heat pump.

I have the sense that a great deal of follow-up work is needed on this idea. I would like to write detailed critiques of many of the popular approaches to AI design, exploring ways in which over-use of the agent model is a stumbling block for those approaches. I would also like to explore the notion of goals and beliefs in a similar light to this post: what exactly is the model we're using when we talk about goals and beliefs, and what is the phenomenon we're trying to explain with those models?

Comment by Alex Flint (alexflint) on Three enigmas at the heart of our reasoning · 2022-12-08T14:14:42.224Z · LW · GW

This is an essay about methodology. It is about the ethos with which we approach deep philosophical impasses of the kind that really matter. The first part of the essay is about those impasses themselves, and the second part is about what I learned in a monastery about addressing those impasses.

I cried a lot while writing this essay. The subject matter -- the impasses themselves -- are deeply meaningful to me, and I have the sense that they really do matter.

It is certainly true that there are these three philosophical impasses -- each has been discussed in the philosophical literature for hundreds of years. What is offered in this essay is a kind of a plea to take them seriously, using a methodology that does not drive you into insanity but instead clears the way to move forward with the real work of your life.

The best way to test the claims of this essay would be to spend some time working with a highly realized spiritual teacher.

Comment by Alex Flint (alexflint) on AI Risk for Epistemic Minimalists · 2022-12-08T13:47:04.053Z · LW · GW

This post trims down the philosophical premises that sit under many accounts of AI risk. In particular it routes entirely around notions of agency, goal-directedness, and consequentialism. It argues that it is not humans losing power that we should be most worried about, but humans quickly gaining power and misusing such a rapid increase in power.

Re-reading the post now, I have the sense that the arguments are even more relevant than when it was written, due to the broad improvements in machine learning models since it was written. The arguments in this post apply much more cleanly to models like GPT-3 and DALL-E than do arguments based on agency and goal-directedness.

The most useful follow-up work would probably be to contrast it more directly to other accounts of AI risk, perhaps by offering critiques of other accounts.

Comment by Alex Flint (alexflint) on Logical induction for software engineers · 2022-12-04T14:06:51.922Z · LW · GW

Thanks Scott

Comment by Alex Flint (alexflint) on Conjecture: a retrospective after 8 months of work · 2022-11-26T21:10:44.484Z · LW · GW

Thanks for writing this.

Alignment research has a track record of being a long slow slog. It seems that what we’re looking for is a kind of insight that is just very very hard to see, and people who have made real progress seem to have done so through long periods of staring at the problem.

With your two week research sprints, how do you decide what to work on for a given sprint?

Comment by Alex Flint (alexflint) on Charging for the Dharma · 2022-11-11T19:23:53.045Z · LW · GW

Well suffering is a real thing, like bread or stones. It's not a word that refers to a term in anyone's utility function, although it's of course possible to formulate utility functions that refer to it.

Comment by Alex Flint (alexflint) on EA (& AI Safety) has overestimated its projected funding — which decisions must be revised? · 2022-11-11T19:17:51.655Z · LW · GW

The direct information I'm aware of is (1) CZ's tweets about not acquiring, (2) SBF's own tweets yesterday, (3) the leaked P&L doc from Alameda. I don't think any of these are sufficient to decide "SBF committed fraud" or "SBF did something unethical". Perhaps there is additional information that I haven't seen, though.

(I do think that if SBF committed fraud, then he did something unethical.)

Comment by Alex Flint (alexflint) on Adversarial epistemology · 2022-11-11T16:23:29.443Z · LW · GW

If you view people as machiavelian actors using models to pursue goals then you will eventually find social interactions to be bewildering and terrifying, because there actually is no way to discern honesty or kindness or good intention if you start from the view that each person is ultimately pursuing some kind of goal in an ends-justify-means way.

But neither does it really make sense to say "hey let's give everyone the benefit of the doubt because then such-and-such".

I think in the end you have to find a way to trust something that is not the particular beliefs or goals of a person.

Comment by Alex Flint (alexflint) on Charging for the Dharma · 2022-11-11T16:10:24.923Z · LW · GW

In Buddhist ideology, the reason to pick one set of values over another is to find an end to suffering. The Buddha claimed that certain values tended to lead towards the end of suffering and other values tended to lead in the opposite direction. He recommended that people check this claim for themselves.

In this way values are seen as instrumental rather than fundamental in Buddhism -- that is, Buddhists pick values on the basis of the consequences of holding those values, rather than any fundamental rightness of the values themselves.

Now you may say that the "end of suffering" is itself a value; that there is nothing special about the end of suffering except to one who happens to value it. If you take this perspective then you're essentially saying: there is nothing objectively worthwhile in life, only things that certain people happen to value. But if this was true then you'd expect to be able to go through life and see that each seemingly-worthwhile thing is not intrinsically worthwhile, but only worthwhile from a certain parochial perspective. Is that really the case?

Comment by Alex Flint (alexflint) on EA (& AI Safety) has overestimated its projected funding — which decisions must be revised? · 2022-11-11T15:22:09.161Z · LW · GW

There's mounting evidence that FTX was engaged in theft/fraud, which would be straightforwardly unethical.

I think it's way too early to decide anything remotely like that. As far as I understand, we have a single leaked balance sheet from Alameda and a handful of tweets from CZ (CEO of Binance) who presumably got to look at some aspect of FTX internals when deciding whether to acquire. Do we have any other real information?

Comment by Alex Flint (alexflint) on EA (& AI Safety) has overestimated its projected funding — which decisions must be revised? · 2022-11-11T15:17:27.483Z · LW · GW

I'm curious about this too. I actually have the sense that overall funding for AI alignment was already larger than the supply of shovel-ready projects before FTX was involved. This is normal and expected in a field where many people are working on an important problem, but where most of the work is research, and where hardly anyone has promising scalable uses for money.

I think this led to a lot of prizes being announced. A prize is a good way to deploy funding if you don't see enough shovel-ready projects to exhaust it. You offer prizes to anyone who can formulate and execute new projects, hence enticing people who weren't previously working on the problem to start working on it. This is a pretty good approach IMO.

With the collapse of FTX, I guess a bunch of prizes will go away.

What else? I'm interested.

Comment by Alex Flint (alexflint) on Counterfactability · 2022-11-08T14:30:53.401Z · LW · GW

Regarding your point on ELK: to make the output of the opaque machine learning system counterfactable, wouldn't it be sufficient to include the whole program trace? By "program trace" I mean the results of all the intermediate computations performed along the way. Yet including a program trace wouldn't help us much if we don't know what function of that program trace will tell us, for example, whether the machine learning system is deliberately deceiving us.

So yes it's necessary to have an information set that includes the relevant information, but isn't the main part of the (ELK) problem to determine what function of that information corresponds to the particular latent variable that we're looking for?
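
As a toy illustration of the distinction being drawn here (the stand-in computation and names are invented, not anything from the ELK report): "including the whole program trace" just means recording every intermediate value, which by itself does not tell us which function of those values corresponds to the latent we care about.

from typing import Any, List, Tuple

def run_with_trace(xs):
    # Stand-in for an opaque learned computation: record every intermediate value,
    # not just the final output.
    trace: List[Tuple[str, Any]] = []

    def record(name, value):
        trace.append((name, value))
        return value

    h1 = record("h1", sum(xs))
    h2 = record("h2", h1 * h1)
    out = record("output", h2 - 1)
    return out, trace

out, trace = run_with_trace([1.0, 2.0, 3.0])
print(trace)  # every intermediate result is available...
# ...but nothing here says which function of these values answers a question like
# "was the system deliberately deceiving us?" -- that is the part ELK asks for.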

Comment by Alex Flint (alexflint) on Counterfactability · 2022-11-08T14:22:50.317Z · LW · GW

If I understand you correctly, the reason that this notion of counterfactable connects with what we normally call a counterfactual is that when an event screens off its own history, it's easy to consider other "values" of the "variable" underlying that event without coming into any logical contradictions with other events ("values of other variables") that we're holding fixed.

For example, if I try to consider what would have happened if there had been a snow storm in Vermont last night, while holding fixed the particular weather patterns observed in Vermont and surrounding areas on the preceding day, then I'm in kind of a tricky spot, because on the one hand I'm considering the weather patterns from the previous day as fixed (which did not in fact give rise to a snow storm in Vermont last night), and yet I'm also trying to "consider" a snow storm in Vermont. The closer I look into this, the more confused I'm going to get, and in the end I'll find that this notion of "considering a snow storm in Vermont last night" is a bit ill-defined.

What I would like to say is: let's consider a snow storm in Vermont last night; in order to do that let's forget everything that would mess with that consideration.

My question for you is: in the world we live in, the full causal history of any real event contains almost the whole history of Earth from the time of the event backwards, because the Earth is so small relative to the speed of light, and everything that could have interacted with the event is part of the history of that event. So in practice, won't all counterfactable events need to be more-or-less a full specification of the whole state of the world at a certain point in time?

Comment by Alex Flint (alexflint) on Counterarguments to the basic AI x-risk case · 2022-11-07T15:15:22.984Z · LW · GW

I expect you could build a system like this that reliably runs around and tidies your house say, or runs your social media presence, without it containing any impetus to become a more coherent agent (because it doesn’t have any reflexes that lead to pondering self-improvement in this way).

I agree, but if there is any kind of evolutionary variation in the thing then surely the variations that move towards stronger goal-directedness will be favored.

I think that overcoming this molochian dynamic is the alignment problem: how do you build a powerful system that carefully balances itself and the whole world in such a way that it does not slip down the evolutionary slope towards pursuing psychopathic goals by any means necessary?

I think this balancing is possible, it's just not the default attractor, and the default attractor seems to have a huge basin.

Comment by Alex Flint (alexflint) on Counterarguments to the basic AI x-risk case · 2022-11-07T15:08:51.794Z · LW · GW

I really appreciate this post!

For instance, employers would often prefer employees who predictably follow rules than ones who try to forward company success in unforeseen ways.

Fascinatingly, EA employers in particular seem to seek employees who do try to forward organization goals in unforeseen ways!

Comment by Alex Flint (alexflint) on Mind is uncountable · 2022-11-03T14:00:39.087Z · LW · GW

Yeah right. There is something about existing within a spatial world that makes it reasonable to have a bunch of bodies operating somewhat independently. The laws of physics seem to be local, and they also place limits on communication across space, and for this reason you get, I suppose, localized independent consciousness.

Comment by Alex Flint (alexflint) on Mind is uncountable · 2022-11-02T20:40:29.236Z · LW · GW

Agreed, but what is it about the structure of the world that made it the case that this Cartesian simplification works so much of the time?

Comment by Alex Flint (alexflint) on Mind is uncountable · 2022-11-02T18:32:53.416Z · LW · GW

It's a very interesting point you make, because we normally think of our experience as so fundamentally separate from others'. Just to contemplate conjoined twins accessing one another's experiences but not having identical experiences really bends the heck out of our normal way of considering mind.

Why is it, do you think, that we have this kind of default way of thinking about mind as Cartesian in the first place? Where did that even come from?