Posts

David Wolpert on Knowledge 2021-09-21T01:54:58.095Z
Comments on Jacob Falkovich on loneliness 2021-09-16T22:04:58.773Z
The Blackwell order as a formalization of knowledge 2021-09-10T02:51:16.498Z
AI Risk for Epistemic Minimalists 2021-08-22T15:39:15.658Z
The inescapability of knowledge 2021-07-11T22:59:15.148Z
The accumulation of knowledge: literature review 2021-07-10T18:36:17.838Z
Agency and the unreliable autonomous car 2021-07-07T14:58:26.510Z
Musings on general systems alignment 2021-06-30T18:16:27.113Z
Knowledge is not just precipitation of action 2021-06-18T23:26:17.460Z
Knowledge is not just digital abstraction layers 2021-06-15T03:49:55.020Z
Knowledge is not just mutual information 2021-06-10T01:01:32.300Z
Knowledge is not just map/territory resemblance 2021-05-25T17:58:08.565Z
Problems facing a correspondence theory of knowledge 2021-05-24T16:02:37.859Z
Concerning not getting lost 2021-05-14T19:38:09.466Z
Understanding the Lottery Ticket Hypothesis 2021-05-14T00:25:21.210Z
Agency in Conway’s Game of Life 2021-05-13T01:07:19.125Z
Life and expanding steerable consequences 2021-05-07T18:33:39.830Z
Parsing Chris Mingard on Neural Networks 2021-05-06T22:16:14.610Z
Parsing Abram on Gradations of Inner Alignment Obstacles 2021-05-04T17:44:16.858Z
Follow-up to Julia Wise on "Don’t Shoot The Dog" 2021-05-01T19:07:45.468Z
Pitfalls of the agent model 2021-04-27T22:19:30.031Z
Beware over-use of the agent model 2021-04-25T22:19:06.132Z
Probability theory and logical induction as lenses 2021-04-23T02:41:25.414Z
Where are intentions to be found? 2021-04-21T00:51:50.957Z
My take on Michael Littman on "The HCI of HAI" 2021-04-02T19:51:44.327Z
Thoughts on Iason Gabriel’s Artificial Intelligence, Values, and Alignment 2021-01-14T12:58:37.256Z
Reflections on Larks’ 2020 AI alignment literature review 2021-01-01T22:53:36.120Z
Search versus design 2020-08-16T16:53:18.923Z
The ground of optimization 2020-06-20T00:38:15.521Z
Set image dimensions using markdown 2020-06-17T12:37:54.198Z
Our take on CHAI’s research agenda in under 1500 words 2020-06-17T12:24:32.620Z
How does one authenticate with the lesswrong API? 2020-06-15T23:46:39.296Z
Reply to Paul Christiano on Inaccessible Information 2020-06-05T09:10:07.997Z
Feedback is central to agency 2020-06-01T12:56:51.587Z
The simple picture on AI safety 2018-05-27T19:43:27.025Z
Opportunities for individual donors in AI safety 2018-03-31T18:37:21.875Z
Superrationality and network flow control 2013-07-22T01:49:46.093Z
Personality tests? 2012-02-29T09:33:00.489Z
What independence between ZFC and P vs NP would imply 2011-12-08T14:30:44.714Z
Weight training 2011-08-26T15:25:42.166Z
Derek Parfit, "On What Matters" 2011-07-07T16:52:51.007Z
[link] Bruce Schneier on Cognitive Biases in Risk Analysis 2011-05-03T18:37:42.698Z
What would you do with a solution to 3-SAT? 2011-04-27T18:19:51.186Z
[link] flowchart for rational discussions 2011-04-05T09:14:40.772Z
The AI-box for hunter-gatherers 2011-04-02T12:09:42.602Z
I want a better memory. 2011-04-02T11:36:01.147Z
Size of the smallest recursively self-improving AI? 2011-03-30T23:31:52.022Z
Audio from Eliezer's talk at the Oxford Transhumanists 2011-03-29T21:31:35.562Z
rationality for turing machines 2011-03-23T00:38:01.980Z
Enjoying musical fashion: why not? 2011-02-21T16:22:02.891Z

Comments

Comment by alexflint on Comments on Jacob Falkovich on loneliness · 2021-09-20T01:18:09.262Z · LW · GW

Well thank you. I'm glad you enjoyed it. I enjoyed reading this comment.

Out of interest, what parts did you see as ironic or anti-ironic?

Comment by alexflint on Comments on Jacob Falkovich on loneliness · 2021-09-18T00:32:38.630Z · LW · GW

So, what exactly prevents us from creating and maintaining emotionally satisfying non-sexual friendships?

It’s a good question. I don’t know the answer. But it does seem to me that there is a closeness present in some romantic relationships that is very rare in friendships, even healthy, loving, long-term friendships. So I do think there is something real that is actually difficult here.

Comment by alexflint on All Possible Views About Humanity's Future Are Wild · 2021-09-17T00:36:42.783Z · LW · GW

I very much appreciated this write-up Holden.

Why do you believe that things will eventually stabilize? Perhaps we will always be on the verge of the next frontier, though it may not always be a spatial one. Yes, there may be periods of lock-in, but even if we are locked into a certain religion or something for 100,000 years at a time, that still may look pretty dynamic over a long time horizon.

It seems that your claim ought to be that we will soon lock ourselves into something forever. This is a very interesting claim!

Comment by alexflint on Comments on Jacob Falkovich on loneliness · 2021-09-17T00:16:06.442Z · LW · GW

:D Yes!

Comment by alexflint on The Blackwell order as a formalization of knowledge · 2021-09-13T02:12:05.779Z · LW · GW

Yes I believe everything you have said here is consistent with the way the Blackwell order is defined.

Comment by alexflint on The Blackwell order as a formalization of knowledge · 2021-09-13T02:10:22.377Z · LW · GW

Thank you for the kind words and feedback.

I wonder if the last section could be viewed as a post-garbling of the prior sections...

Comment by alexflint on The Blackwell order as a formalization of knowledge · 2021-09-13T02:09:41.854Z · LW · GW

Yes. Thank you. Fixed.

Comment by alexflint on The Blackwell order as a formalization of knowledge · 2021-09-13T02:09:15.636Z · LW · GW

Yes. Thank you. Fixed.

Comment by alexflint on LessWrong is providing feedback and proofreading on drafts as a service · 2021-09-08T15:39:13.328Z · LW · GW

Thank you for doing this. I just requested a proofread on a draft I'm working on.

When I first clicked "Get Feedback" it didn't do anything. I think this was because I had "Hide Intercom" turned on in my settings. When that setting was on, I saw the following error in my console:

PostSubmit.tsx:93 Uncaught TypeError: window.Intercom is not a function
    at onClick (PostSubmit.tsx:93)
    at Object.ein (react-dom.production.min.js:14)
    at rin (react-dom.production.min.js:14)
    at nin (react-dom.production.min.js:14)
    at cgt (react-dom.production.min.js:15)
    at min (react-dom.production.min.js:52)
    at GAe (react-dom.production.min.js:51)
    at ooe (react-dom.production.min.js:52)
    at Jgt (react-dom.production.min.js:56)
    at ygt (react-dom.production.min.js:287)
    at bgt (react-dom.production.min.js:19)
    at ZAe (react-dom.production.min.js:70)
    at loe (react-dom.production.min.js:69)
    at fc.unstable_runWithPriority (scheduler.production.min.js:19)
    at wM (react-dom.production.min.js:122)
    at vgt (react-dom.production.min.js:287)
    at Cin (react-dom.production.min.js:68)
    at HTMLDocument.n (helpers.ts:87)

When I turned off "Hide Intercom", the "Get Feedback" button worked and I requested feedback.

Comment by alexflint on Provide feedback on Open Philanthropy’s AI alignment RFP · 2021-08-29T19:01:19.349Z · LW · GW

Thank you for posting this Asya and Nick. After I read it I realized that it connected to something that I've been thinking about for a while that seems like it might actually be a fit for this RFP under research direction 3 or 4 (interpretability, truthful AI). I drafted a very rough 1.5-pager this morning in a way that hopefully connects fairly obviously to what you've written above:

https://docs.google.com/document/d/1pEOXIIjEvG8EARHgoxxI54hfII2qfJpKxCqUeqNvb3Q/edit?usp=sharing

Interested in your thoughts.

Feedback from everyone is most welcome, too, of course.

Comment by alexflint on AI Risk for Epistemic Minimalists · 2021-08-27T22:23:52.090Z · LW · GW

Yes. Thank you. Would love to hear more about your work on goal-directedness. Let me know if you're up for chatting.

Comment by alexflint on AI Risk for Epistemic Minimalists · 2021-08-25T17:47:30.005Z · LW · GW

How then would you evaluate the level of existential risk at time X? Is it that you would ask whether people at time X believed that there was existential risk?

Comment by alexflint on We need a new philosophy of progress · 2021-08-25T16:01:30.968Z · LW · GW

Jason, wouldn't you say that what we need is an understanding of how to make progress, not optimism about progress?

I mean, we do have an understanding of how to make material progress, and we've made a great deal of material progress over the past few millennia, but surely material progress is not where the marginal action is at just now, right?

Comment by alexflint on AI Risk for Epistemic Minimalists · 2021-08-25T15:51:50.428Z · LW · GW

Well all existential risk is about a possible existential catastrophe in the future, and there are zero existential catastrophes in our past, because if there were then we wouldn't be here. Bioweapons, for example, have never yet produced an existential catastrophe, so how is it that we conclude that there is any existential risk due to bioweapons?

So when we evaluate existential risk over time, we are looking at how closely humanity is flirting with danger at various times, and how dis-coordinated that flirtation is.

Comment by alexflint on AI Risk for Epistemic Minimalists · 2021-08-24T23:20:08.396Z · LW · GW

Hey- Look, existential risk doesn't arise from risky technologies alone, but from the combination of risky technologies and a dis-coordinated humanity. And existential risk increases not just when a dis-coordinated humanity develops, say, bioweapons, but also when a dis-coordinated humanity develops the precursors to bioweapons, and we can propagate that backwards.

Now the conclusion I am arguing for in the post is that developing powerful AI is likely to increase existential risk, and the evidence I am leaning on is that rapid technological development has landed us where we are now, and where we are now is that we have a great deal of power over the future of life on the planet, but we are not using that power very reliably due to our dis-coordinated state. The clearest illustration of us not using our power very reliably seems to me to be the fact that the level of existential risk is high, and most of that risk is due to humans.

Most technological developments reduce existential risk, since they provide more ways of dealing with the consequences of something like a meteor impact

Well that is definitely a benefit of technological development, but you should consider ways that most technological developments could increase existential risk before concluding that most technological developments overall reduce existential risk. Generally speaking, it really seems to me that most technological developments give humanity more power, and giving a dis-coordinated humanity more power beyond its current level seems very dangerous. A well-coordinated humanity, on the other hand, could certainly take up more power safely.

Comment by alexflint on AI Risk for Epistemic Minimalists · 2021-08-23T22:24:28.791Z · LW · GW

Seems excellent to me. Thank you as always for your work on the newsletter Rohin.

Comment by alexflint on Agency in Conway’s Game of Life · 2021-08-16T22:20:58.880Z · LW · GW

Yeah, so if every configuration has a unique predecessor then we have conservation of information, because you can take some future state and evolve it backwards in time to find any past state. Any information present in the state of the universe at time T can therefore be recovered from any later state, so in that sense information is never lost from the universe as a whole.

This means that if I know only that the universe is one of N possible states at some time T, then if I evolve the universe forwards, there are still exactly N possible states that the world could be in, since by time-reversibility I could rewind each of those states and expect to get back to the N original states from time T. This is what Eliezer refers to as "phase space volume is preserved under time evolution".

This in turn implies the second law of thermodynamics: among all possible configurations of the whole universe, there are only a small number of configurations with short descriptions, but many configurations with long descriptions (since there are fewer short descriptions than long descriptions). So a randomly-selected long-description configuration cannot be likely to evolve over time into a short-description configuration: two configurations can never evolve to the same future configuration, so there are simply too few short-description configurations to be shared among the astronomically more numerous long-description configurations.
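
To make the volume-preservation point concrete, here is a toy sketch (plain Python, not Life or our physics; the 16-state "universe" and the two update rules are just illustrative assumptions):

    import random

    def count_reachable(possible_states, step):
        """Evolve every currently-possible state one step and count distinct outcomes."""
        return len({step(s) for s in possible_states})

    N = 16  # a toy universe with 16 possible configurations
    possible_now = set(range(N))  # "the universe is one of N possible states at time T"

    # Reversible dynamics: a fixed permutation of the state space,
    # so every state has a unique predecessor.
    random.seed(0)
    permutation = list(range(N))
    random.shuffle(permutation)
    reversible_step = lambda s: permutation[s]

    # Irreversible dynamics: a many-to-one map, so distinct states can share a successor.
    irreversible_step = lambda s: s // 2

    print(count_reachable(possible_now, reversible_step))    # 16: phase space volume preserved
    print(count_reachable(possible_now, irreversible_step))  # 8: volume, and information, lost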

Our universe, remarkably, does have time-reversibility. It is called unitarity in quantum physics, but even in ordinary Newtonian mechanics you can imagine a bunch of billiard balls bouncing around on a frictionless table and see that if you knew the exact velocity of each ball then you could reverse all the velocities and play the whole thing backwards in time.

The black hole information paradox is called a paradox because general relativity says that information is lost in a black hole, but quantum mechanics says that information is never lost under any circumstances.

Comment by alexflint on Agency in Conway’s Game of Life · 2021-08-16T17:22:25.337Z · LW · GW

Yup, Life does not have time-reversibility, so it does not preserve the phase space volume under time evolution, so it does not obey the laws of thermodynamics that exist under our physics.

But one could still investigate whether there is some analog of thermodynamics in Life.

There is also a cellular automaton called Critters that does have time reversibility.

Comment by alexflint on The ground of optimization · 2021-08-16T17:17:43.072Z · LW · GW

Thank you for this comment Chantiel. Yes, a container that is engineered to evaporate water poured anywhere into it and condense it into a central area would be an optimizing system by my definition. That is a bit like a ball rolling down a hill, which is also an optimizing system and also has nothing resembling agency.

The bottle cap example was actually about putting a bottle cap onto a bottle and asking whether, since the water now stays inside the bottle, the cap should be considered an optimizer. I pointed out that this would not qualify as an optimizing system because if you moved a water molecule from the bottle and placed it outside the bottle, the bottle cap would not act to put it back inside.

Comment by alexflint on The accumulation of knowledge: literature review · 2021-07-31T01:57:58.070Z · LW · GW

Yeah nice, thank you for thinking about this and writing this comment, Lorenzo.

an extension of this definition is enforcing a maximum effort E required to extract K

I think this is really spot on. Suppose that I compare the knowledge in (1) a Chemistry textbook, (2) a set of journal papers from which one could, in principle, work out everything from the textbook, (3) the raw experimental data from which one could, in principle, work out everything from the journal papers, (4) the physical apparatus and materials from which one could, in principle, extract all the raw experimental data by actually performing experiments. I think that the number of yes/no questions that one can answer given access to (4) is greater than the number of yes/no questions that one can answer given access to (3), and so on for (2) and (1) also. But answering questions based on (4) requires more effort than (3), which requires more effort than (2), which requires more effort than (1).

We must also somehow quantify the usefulness or generality of the questions that we are answering. There are many yes/no questions that we can answer easily with access to (4), such as "what is the distance between this particular object and this other particular object?", or "how much does this particular object weigh?". But if we are attempting to make decisions in service of a goal, the kind of questions we want to answer are more like "what series of chemical reactions must I perform to create this particular molecule?" and here the textbook can give answers with much lower effort than the raw experimental data or the raw materials.

Would be very interested in your thoughts on how to define effort, and how to define this generality/usefulness thing.

Comment by alexflint on Agency in Conway’s Game of Life · 2021-07-31T01:43:28.085Z · LW · GW

If you can create a video of any of your constructions in Life, or put the constructions up in a format that I can load into a simulator at my end, I would be fascinated to take a look at what you've put together!

Comment by alexflint on The inescapability of knowledge · 2021-07-15T15:26:12.379Z · LW · GW

Gunnar- yes I think this is true, but it's really surprisingly difficult to operationalize this. Here is how I think this plays out:

Suppose that we are recording videos of some meerkats running around in a certain area. One might think that the raw video data is not very predictive of the future, but that if we used the video data to infer the position and velocity of each meerkat, then we could predict the future position of the meerkats, which would indicate an increase in knowledge compared to just storing the raw data. And I do think that this is what knowledge means, but if we try to operationalize this "predictive" quality in terms of a correspondence between the present configuration of our computer and the future configuration of the meerkats then the raw data will actually have higher mutual information with future configurations than the position-and-velocity representation will.
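
Here is a toy numerical illustration of that last point, i.e. the data-processing fact that a representation computed from the raw data can never have more mutual information with the future than the raw data itself (the numbers and the stand-ins for "raw video" and "inferred positions" are made up; this is just the inequality, not the operationalization of knowledge that I'm after):

    import math
    from collections import Counter
    from itertools import product

    def mutual_information(pairs):
        """Mutual information (in bits) of an empirical joint distribution over (a, b) pairs."""
        n = len(pairs)
        joint = Counter(pairs)
        pa = Counter(a for a, _ in pairs)
        pb = Counter(b for _, b in pairs)
        return sum((c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
                   for (a, b), c in joint.items())

    # Toy "raw video" X = (position, velocity, sensor noise); the future Z = position + velocity.
    raw_and_future = [((pos, vel, noise), pos + vel)
                      for pos, vel, noise in product(range(4), range(4), range(4))]

    # An "inferred" representation computed from the raw data: a coarse position estimate plus velocity.
    inferred_and_future = [((pos // 2, vel), z) for (pos, vel, noise), z in raw_and_future]

    print(mutual_information(raw_and_future))       # equals H(Z): the raw data pins down the future exactly
    print(mutual_information(inferred_and_future))  # strictly smaller: processing cannot add information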

Comment by alexflint on Problems facing a correspondence theory of knowledge · 2021-07-15T14:09:54.332Z · LW · GW

Well if I learn that my robot vacuum is unexpectedly building a model of human psychology then I'm concerned whether or not it in fact acts on that model, which means that I really want to define "knowledge" in a way that does not depend on whether a certain agent acts upon it.

For the same reason I think it would be natural to say that the sailing ship had knowledge, and that knowledge was lost when it sank. But if we define knowledge in terms of the actions that follow then the sailing ship never had knowledge in the first place.

Now you might say that it was possible that the sailing ship would have survived and acted upon its knowledge of the coastline, but imagine a sailing ship that, unbeknownst to it, is sailing into a storm in which it will certainly be destroyed, and along the way is building an accurate map of the coastline. I would say that the sailing ship is accumulating knowledge and that the knowledge is lost when the sailing ship sinks. But the attempted definition from this post would say that the sailing ship is not accumulating knowledge at all, which seems strange.

It's of course important to ground out these investigations in practical goals or else we end up in an endless maze of philosophical examples and counter-examples, but I do think this particular concern grounds out in the practical goal of overcoming deception in policies derived from machine learning.

Comment by alexflint on Problems facing a correspondence theory of knowledge · 2021-07-13T15:27:23.218Z · LW · GW

Your summaries are excellent Rohin. This looks good to me.

Comment by alexflint on Agency and the unreliable autonomous car · 2021-07-12T15:13:54.602Z · LW · GW

I think that I should modify 5a from "Search for a proof that this sentence is consistent with your model of the world, up to a maximum proof length of one million characters" to "Search for a proof of this sentence, using your model of the world as a set of starting assumptions". This is indeed a significant change to the algorithm and I thank you for pointing it out. I think that would resolve your second concern, about the problem with 5a itself, yes?

I think it might also resolve your first concern, about the unsoundness of the logical system, because the agent does not have a deduction rule that says that if P can be proved then P is true, but rather reasons from the fact that P can be proved, and the fact that this particular agent chooses its actions based on proof search and will behave in such-and-such a way if it finds a proof of this particular P, to conclude P. This only works for certain particular sentences P, such as the one that is constructed in this story, since this particular sentence P is one that, if the agent finds a proof of it, will cause it to take actions that lead to P itself being true.

Comment by alexflint on Agency and the unreliable autonomous car · 2021-07-11T23:12:16.014Z · LW · GW

Firstly, the agent's logical deduction system is unsound. It includes something comparable with Peano arithmetic (or else Löb's theorem can't be applied), and then adds a deduction rule "if P can be proved consistent with the system-so-far then assume P is true". But we already know that for any consistent extension T of Peano arithmetic there is at least one proposition G for which both G and ~G are consistent with T. So both of these are deducible using the rule. Now, the agent might not find the contradiction, because...

Are you referring to step 5a/5b of the algorithm, that says search for a proof and then act based on what is found? I wouldn't exactly characterize this as adding a further deduction rule. But perhaps you are referring to something else?

This system can only find proofs up to a length of a million characters. The requirements of Löb's theorem include a formal system that is an extension of PA and is closed under deduction: that is, derives proofs of unbounded length. So we can't apply Löb's theorem to this system anyway.

It's true this gets a bit more finicky in the resource-bounded case. Critch's paper on resource-bounded Löb goes into this. I haven't looked into this very much.

Comment by alexflint on Problems facing a correspondence theory of knowledge · 2021-07-11T02:48:31.106Z · LW · GW

The sequence is now complete.

Comment by alexflint on Agency and the unreliable autonomous car · 2021-07-09T04:05:57.040Z · LW · GW

The statements "not P" and "not P OR Q" might have the same truth value, but they are not logically equivalent. At this point in the proof, saying "THEN I will arrive at 3pm" is arbitrary, and could have been "THEN pigs will fly." I think that's what's known as the principle of explosion, but I could be wrong about that term.

OK I agree this is very much the crux of the issue. However:

  • First, yes, I agree, it could have been anything, including "THEN pigs will fly".
  • Second, the way that "IF ... THEN" is defined in propositional or first-order logic seems not to capture quite what we mean by those words in ordinary language. I think this is part of what you are pointing out.
  • But taking the rules of first-order logic as a given, it really is valid to derive from "P" to "P or Q", because in all the worlds where "P" is true, "P or Q" is also true, which is what it means for it to be valid to derive from one thing to another. And for the same reason it really is valid to derive from "not P" to "if P then Q", for an arbitrary Q. (See the quick truth-table check at the end of this comment.)
  • In some previous discussions of the 5-and-10 problem, some people have concluded that what needs to be done is to rework the definition of "IF ... THEN" in first order logic. I think it is quite illuminating to attempt something like this, but I don't think this ultimately resolves the issue.
  • I haven't yet seen any completely satisfying resolution of this problem, but I do think that it's a productive problem to work at, and also I find it fun, so thank you for doing it with me :)
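
Here is that quick check: a brute-force enumeration of all four truth assignments in plain Python (nothing specific to the 5-and-10 setup), verifying both derivations from the third bullet:

    from itertools import product

    # Material implication, which is how first-order logic defines "IF ... THEN".
    implies = lambda a, b: (not a) or b

    checks = [
        ("P entails (P or Q)",      lambda P, Q: P,     lambda P, Q: P or Q),
        ("not P entails (P -> Q)",  lambda P, Q: not P, lambda P, Q: implies(P, Q)),
    ]

    for name, premise, conclusion in checks:
        # Valid means: in every world (truth assignment) where the premise holds, the conclusion holds too.
        valid = all(conclusion(P, Q)
                    for P, Q in product([True, False], repeat=2)
                    if premise(P, Q))
        print(name, valid)  # both print True
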
Comment by alexflint on Agency and the unreliable autonomous car · 2021-07-08T16:47:11.397Z · LW · GW

Why are you talking about emulation? There are lots of ways to analyze a circuit diagram other than emulation. The autonomous car in the story does not use emulation.

Comment by alexflint on Agency and the unreliable autonomous car · 2021-07-08T15:49:24.720Z · LW · GW

Do you mean that since the computer is using some of its processing power to analyze its own circuitry, it cannot therefore dedicate 100% of its processing power to any other process running on the system? Or do you mean that it cannot use all of its processing power to analyze its own circuitry, even if there are no other processes running on the system?

The former I agree with, but that is true of any process whatsoever -- e.g. if the computer is running a game of solitaire then that process will use up some non-zero amount of processing power, which the computer therefore cannot devote to any other process.

The latter I disagree with. Let's say that the computer is running an analysis of its circuitry that can be expressed as a SAT problem (as many real-world hardware verification problems can be). So the computer loads the representation of its circuitry, constructs a SAT problem, then begins searching over the possible variable assignments looking for a solution. Why couldn't the computer dedicate its full capacity at full speed to this search?
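
For concreteness, here is roughly the shape of that search, as a toy sketch (the three clauses are made-up stand-ins for "a property of my own circuitry, encoded in CNF", and a real verifier would of course use a proper SAT solver rather than brute force):

    from itertools import product

    def brute_force_sat(num_vars, clauses):
        """Search every assignment for one satisfying a CNF formula.
        A clause is a list of literals: +i means variable i is true, -i means variable i is false."""
        for bits in product([False, True], repeat=num_vars):
            assignment = dict(enumerate(bits, start=1))
            if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
                return assignment
        return None  # unsatisfiable

    # Hypothetical clauses standing in for an encoding of the computer's own circuitry.
    clauses = [[1, -2], [2, 3], [-1, -3]]
    print(brute_force_sat(3, clauses))  # e.g. {1: False, 2: False, 3: True}

Nothing in this loop needs to simulate the machine that is running it; it just crunches through assignments at full speed.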

Comment by alexflint on Agency and the unreliable autonomous car · 2021-07-07T23:20:47.181Z · LW · GW

Sure, but those aren't computations. they're compressed representations, which requires a lot more resources to simulate. And if that circuit/source includes the execution of the resulting structures (aka emulation), we're back to recursion, in which each iteration consumes additional resources.

Can you give me an example of something that cannot be done by a computer that is attempting to reason about its own circuitry?

Comment by alexflint on Agency and the unreliable autonomous car · 2021-07-07T23:16:03.385Z · LW · GW

I think your definition of perfect model is a bit off - a circuit diagram of a computer is definitely not a perfect model of the computer! The computer itself has much more state and complexity, such as temperature of the various components, which are relevant to the computer but not the model.

But there is no fundamental law of physics that says that the computer cannot maintain estimates of the temperature of each of its components.

Now it is true that a computer cannot store the exact physical state of every atom that constitutes it, since storing that much information would indeed require the full physical expressivity of every atom that constitutes it, and there would indeed be no room left over to do anything else.

But digital computers are designed precisely so that their behavior can be predicted without needing to track the exact physical state of every atom that constitutes them. Humans are certainly able to reason about computers in quite a bit of detail without tracking the exact physical state of every atom that constitutes them. And there is no reason that a digital computer couldn't store and use a self-model with at least this level of predictive power.

Containing a copy of your source code is a weird definition of a model. All programs contain their source code, does a program that prints it source code have more of a model of itself than other programs, which are just made of their source code?

Well sure, containing a copy of your own source code on its own is not really a "self-model", but it does show that there is nothing fundamentally blocking you from analyzing your own source code, including proving things about your own behavior. There is nothing fundamentally paradoxical or recursive about this. The claim was made that things cannot contain perfect models of themselves, but in fact things can contain models of themselves that are sufficiently detailed to reason about their own behavior.

The key word is "perfect" - to fit a model of a thing inside that thing, the model must contain less information than the thing does.

Well yes, but the autonomous car in the parable didn't go wrong by having an insufficiently perfect self-model. It had a self-model quite sufficient to make all the predictions it needed to make. If the self-model had been more detailed, the car still would have gone wrong. Even if it had a truly perfect self-model, even though such a thing is not permitted under our laws of physics, it still would have gone wrong. So the problem isn't about the inability for things to contain models of themselves. It's about how those self-models are used.

Comment by alexflint on Agency and the unreliable autonomous car · 2021-07-07T18:34:15.872Z · LW · GW

Thanks - fixed!

Comment by alexflint on Agency and the unreliable autonomous car · 2021-07-07T18:31:13.449Z · LW · GW

the builders weren't trying to make a car pick a route, they were trying to ... I don't actually know what.

Well there is a perspective from which the algorithm used here can seem very compelling. After all, what could make more sense than considering each possible action, searching for a proof that connects each action to its consequences, and then picking the action with the most desirable consequences? This is what the proof search algorithm does. It is deeply surprising that it fails so completely. I should have made that clearer in the post.

recursion without exit conditions is taught as a bad thing to elementary school coders

The algorithm here is not actually recursive. The proof search is just a straightforward loop through all possible proofs up to some length limit. Checking the validity of a proof means checking whether each step follows deductively from the previous steps. It does not involve any recursion, not even one level of recursion, and certainly not recursion without exit conditions.

OF COURSE a thing can't contain a perfect model of itself plus any other stuff

In what way is it not possible for a thing to contain a perfect model of itself? A computer can certainly contain a circuit diagram of itself on a file on its hard drive. A program can certainly contain a copy of its own source code inside a string. Humans are capable of comprehending their own anatomy and neurology, apparently without any fixed limit. In what way can a thing not contain a model of itself?
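
As one small illustration of that second point, here is the classic two-line Python quine, a program whose string literal is in effect a complete model of the program itself (just an illustration; nothing here is specific to the car in the story):

    # The two lines below, run on their own, print out exactly their own source text.
    s = 's = %r\nprint(s %% s)'
    print(s % s)

Of course this on its own proves nothing interesting about behavior, but it does show that self-reference by itself is not the obstacle.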

Comment by alexflint on Problems facing a correspondence theory of knowledge · 2021-07-07T18:17:13.590Z · LW · GW

It's actually written, just need to edit and post. Should be very soon. Thanks for checking on it.

Comment by alexflint on Musings on general systems alignment · 2021-07-02T06:46:55.782Z · LW · GW

Agreed, but there are additional considerations here. The way that we interact with the wider world is influenced by the stories we tell ourselves about our relationship with the world, so narratives about our relationship with the world affect not just our sense of whether we are doing a good job, but also the tone with which we speak to the world, the ambition of our efforts, and the emotional impact of what we hear back from the world.

If we tell ourselves stories in which the world is mostly not on our side then we will speak to the world coercively, we'll shy away from attempting big things, and we'll be gradually worn down as we face difficulties.

But if we see, correctly, I believe, that most people actually have brief moments in which they can appreciate the dangers of powerful agentic systems being developed through ham-fisted engineering methods, and that the most switched-on people in the world seem to be turning to this particular community on these issues, then we might adopt quite a different internal demeanor as we approach these problems, not because we give ourselves some particular amount of credit for our past efforts, but because we see the world as fundamentally friendly to our efforts, without underestimating the depth and reality of the problems that need to be resolved.

I think this issue of friendliness is really the most central point. So far as I can tell, it makes a huge difference to see clearly what it is in the world that is fundamentally friendly to one's efforts. Of course it's also critical not to mistake that which is not friendly to our efforts as being friendly to our efforts. But if one doesn't see that which is friendly towards us, then things just get lonely and exhausting real fast, which is doubly tragic because there is in fact something very real that really is deeply friendly towards our efforts.

Comment by alexflint on Musings on general systems alignment · 2021-07-01T14:33:15.640Z · LW · GW

It's certainly not about feeling good by identifying with a certain group. That helps no one, I agree.

But I'm sorry, the world simply is turning to this community for leadership. That is a thing that is happening in the world. There is a lot of very clear evidence. It is an extremely startling thing. It's not about anyone getting or not getting credit. It's about seeing things as they are.

Comment by alexflint on Precognition · 2021-06-30T22:01:58.474Z · LW · GW

Thanks for this lovely post Jason.

Fundamentally, I think that the reason it is difficult to consistently pay attention to the present is that doing so means accepting things as they are, and generally that is very difficult for humans to do, since we are often attached to certain comfortable beliefs about the world, and we are willing to ignore what is actually happening in order to maintain these comfortable beliefs. Of course this doesn't serve us in the long term, but as you point out, we still do it.

So in terms of the means to consistently pay attention to the present, I would put "drop attachments" at the top of the list.

Comment by alexflint on Musings on general systems alignment · 2021-06-30T21:25:19.207Z · LW · GW

Yes, that is an incredibly important issue in my view. I would consider the construction of an AI that took a view of extreme environmentalism and went on to kill large numbers of humans a terrible error. In fact I would consider the construction of an AI that would take any particular operationalization of some "objective good" through to the end of the world to be a very big error, since it seems to me that any particular operationalization of "good" leads, eventually, to something that is very obviously not good. You can go case-by-case and kind of see that each possible operationalization of "good" misses the mark pretty catastrophically, and then after a while you stop trying.

Yet we have to build things in the world somehow, and anything we build is going to operationalize its goals somehow, so how can we possibly proceed? This is why I think this issue deserves the mantle of "the hard problem of alignment".

It doesn't necessarily help to replace "goodness" with "friendliness", although I do agree that "friendliness" seems like a better pointer towards the impossibly simple kind of benevolence that we seek to create.

A second point I think is underlying your comment (correct me if I'm wrong) is that perhaps there is some objective good, but that it isn't good for us (e.g. extreme environmentalism). I think this is a very reasonable concern if we imagine that there might be some particular operationalization of objective goodness that is the one-and-only final operationalization of objective goodness. If we imagine that such an operationalization might one day be discovered by us or by an AI, then yes, it's well worth asking whether this operationalization is in fact good for us. But luckily I don't think any such final operationalization of objective goodness exists. There just is no such thing, in my view.

Our task, then, in my view, is to make sure we don't build powerful systems that behave as though there is some final operationalization of objective goodness. Yet it seems that any tangible system whatsoever is going to behave according to some kind of operationalization of terminal goals implicit in its design. So if these two claims are both true then how the heck do we proceed? This is again what I am calling the "hard problem of alignment".

Comment by alexflint on Knowledge is not just precipitation of action · 2021-06-19T22:36:52.210Z · LW · GW

Well most certainly yes, but what does that actually look like at the level of physics? How do I determine the extent to which my robot vacuum is forming beliefs that pay rent in the form of anticipated experiences? And most importantly, what if I don't trust it to answer questions truthfully and so don't want to rely on its standard input/output channels?

Comment by alexflint on Knowledge is not just digital abstraction layers · 2021-06-17T15:56:19.208Z · LW · GW

Yes, good point, I overlooked this. I had thought that digital abstraction layers were a neat solution to the self-knowledge issue but they actually are not. Thank you for the note!

Comment by alexflint on Knowledge is not just digital abstraction layers · 2021-06-17T15:54:48.678Z · LW · GW

Yep, agreed.

These are still posts that I drafted before we chatted a couple of weeks ago, btw. I have some new ideas based on the things we chatted about that I hope to write up soon :)

Comment by alexflint on Problems facing a correspondence theory of knowledge · 2021-06-15T16:14:09.018Z · LW · GW

Well here is a thought: a random string would have high Kolmogorov complexity, as would a string describing the most fundamental laws of physics. What are the characteristics of the latter that convey power over one's environment to an agent that receives it, and that are not conveyed by the former? This is the core question I'm most interested in at the moment.

Comment by alexflint on Problems facing a correspondence theory of knowledge · 2021-06-14T17:30:35.362Z · LW · GW

Well yes I agree that knowledge exists with respect to a goal, but is there really no objective difference between an alien artifact inscribed with deep facts about the structure of the universe and set up in such a way that it can be decoded by any intelligent species that might find it, and an ordinary chunk of rock arriving from outer space?

Comment by alexflint on Problems facing a correspondence theory of knowledge · 2021-06-11T15:56:17.232Z · LW · GW

I very much agree with the emphasis on actionability. But what is it about a physical artifact that makes the knowledge it contains actionable? I don't think it can be simplicity alone. Suppose I record the trajectory of the moon over many nights by carving markings into a piece of wood. This is a very simple representation, but it does not contain actionable knowledge in the same way that a textbook on Newtonian mechanics does, even if the textbook were represented in a less simple way (say, as a PDF on a computer).

Comment by alexflint on Problems facing a correspondence theory of knowledge · 2021-06-03T16:44:50.873Z · LW · GW

Thank you for this comment duck_master.

I take your point that it is possible to extract knowledge about human affairs, and about many other things, from the quantum structure of a rock that has been orbiting the Earth. However, I am interested in a definition of knowledge that allows me to say what a given AI does or does not know, insofar as it has the capacity to act on this knowledge. For example, I would like to know whether my robot vacuum has acquired sophisticated knowledge of human psychology, since if it has, and I wasn't expecting it to, then I might choose to switch it off. On the other hand, if I merely discover that my AI has recorded some videos of humans then I am less concerned, even if these videos contain the basic data necessary to construct sophisticated knowledge of human psychology, as in the case with the rock. Therefore I am interested not just in information, but something like action-readiness. I am referring to that which is both informative and action-ready as "knowledge", although this may be stretching the standard use of this term.

Now you say that we might measure more abstract kinds of knowledge by looking at what an AI is willing to bet on. I agree that this is a good way to measure knowledge if it is available. However, if we are worried that an AI is deceiving us, then we may not be willing to trust its reports of its own epistemic state, or even of the bets it makes, since it may be willing to lose money now in order to convince us that it is not particularly intelligent, in order to make a treacherous turn later. Therefore I would very much like to find a definition that does not require me to interact with the AI through its input/output channels in order to find out what it knows, but rather allows me to look directly at its internals. I realize this may be impossible, but this is my goal.

So as you can see, my attempt at a definition of knowledge is very much wrapped up with the specific problem I'm trying to solve, and so any answers I arrive at may not be useful beyond this specific AI-related question. Nevertheless, I see this as an important question and so am content to be a little myopic in my investigation.

Comment by alexflint on Agency in Conway’s Game of Life · 2021-06-03T16:24:26.704Z · LW · GW

Thank you for this thoughtful comment itaibn0.

Matter and energy are also approximately homogeneously distributed in our own physical universe, yet building a small device that expands its influence over time and eventually rearranges the cosmos into a non-trivial pattern would seem to require something like an AI.

It might be that the same feat can be accomplished in Life using a pattern that is quite unintelligent. In that case I am very interested in what it is about our own physical universe that makes it different in this respect from Life.

Now it could actually be that in our own physical universe it is also possible to build not-very-intelligent machines that begin small but eventually rearrange the cosmos. In this case I am personally more interested in the nature of these machines than in "intelligent machines", because the reason I am interested in intelligence in the first place is its capacity to influence the future in a directed way, and if there are simpler avenues to influencing the future in a directed way then I'd rather spend my energy investigating those avenues than investigating AI. But I don't think it's possible to influence the future in a directed way in our own physical universe without being intelligent.

to solve the control problem in an environment full of intelligence only requires marginally more intelligence at best

What do you mean by this?

the solution to the control problem may even be less intelligent than the structures it competes against, and make up for that with hard-coded solutions to NP-hard problems in military strategy.

But if one entity reliably outcompetes another entity, then on what basis do you say that this other entity is the more intelligent one?

Comment by alexflint on Knowledge is not just map/territory resemblance · 2021-05-26T16:41:45.038Z · LW · GW

Thank you for the kind words Jemist.

Yeah I'm open to improvements upon the use of the word "knowledge" because you're right that what I'm describing here isn't quite what either philosophers or cognitive scientists refer to as knowledge.

Yes knowledge-accumulating systems do seem to be a special case of optimizing systems. It may be that among all optimizing systems, it is precisely the ones that accumulate knowledge in the process of optimization that are of most interest to us from an alignment perspective, because knowledge-accumulating optimizing systems are (perhaps) the most powerful of all optimizing systems.

Comment by alexflint on Knowledge is not just map/territory resemblance · 2021-05-26T06:08:38.660Z · LW · GW

Dang, the images in this post are totally off. I have a script that converts a google doc to markdown, then I proofread the markdown, but the images don't show up in the editor, and it looks like my script is off. Will fix tomorrow.

Update: fixed

Comment by alexflint on Concerning not getting lost · 2021-05-25T18:02:30.293Z · LW · GW

Romeo I listened to your dialog. I hear that you are working on the is/ought gap from the ought side. If I may ask, what is the purpose of this work?