Posts

Implementing Decision Theory 2023-11-07T17:55:43.313Z
Uncalibrated quantum experiments act classically 2020-07-21T05:31:06.377Z
Measly Meditation Measurements 2018-12-09T20:54:46.781Z
An Invitation to Measure Meditation 2018-09-30T15:10:30.055Z

Comments

Comment by justinpombrio on The Parable Of The Fallen Pendulum - Part 1 · 2024-03-02T17:47:27.341Z · LW · GW

How did you find me? How do they always find me? No matter...

Have you tried applying your models to predict the day's weather, or what your teacher will be wearing that day? I bet not: they wouldn't work very well. Models have domains in which they're meant to be applied. More precise models tend to have more specific domains.

Making real predictions about something, like what the result of a classroom experiment will be even if the pendulum falls over, is usually outside the domain of any precise model. That's why your successful models are compound models, using Newtonian mechanics as a sub-model, and that's why they're so unsatisfyingly vague and cobbled together.

There is a skill to assembling models that make good predictions in messy domains, and it is a valuable skill. But it's not the goal of your physics class. That class is trying to teach you about precise models like Newtonian mechanics. Figuring out exactly how to apply Newtonian mechanics to a real physical experiment is often harder than solving the Newtonian math! But surely you've noticed by now that, in the domains where Newtonian mechanics seems to actually apply, it applies very accurately?

This civilization we live in tends to have two modes of thinking. The first is 'precise' thinking, where people use precise models but don't think about the mismatch between their domain and reality. The model's domain rarely covers the real-world question at hand, so people either inappropriately apply the model outside its domain, or carefully make statements only within the model's domain and hope that others will make that incorrect leap on their own. The other mode of thinking is 'imprecise' thinking, where people ignore all models and rely on their gut feelings. We are, at the moment, extremely bad at the missing middle path: making and recognizing models for messy domains.

Comment by justinpombrio on The Parable Of The Fallen Pendulum - Part 1 · 2024-03-01T15:49:34.046Z · LW · GW

"There's no such thing as 'a Bayesian update against the Newtonian mechanics model'!" says a hooded figure from the back of the room. "Updates are relative: if one model loses, it must be because others have won. If all your models lose, it may hint that there's another model you haven't thought of that does better than all of them, or it may simply be that predicting things is hard."

"Try adding a couple more models to compare against. Here's one: pendulums never swing. And here's another: Newtonian mechanics is correct but experiments are hard to perform correctly, so there's a 80% probability that Newtonian mechanics gives the right answer and 20% probability spread over all possibilities including 5% on 'the pendulum fails to swing'. Continue to compare these models during your course, and see which one wins. I think you can predict it already, despite your feigned ignorance."

The hooded figure opens a window in the back of the room and awkwardly climbs through and walks off.
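
(For concreteness, here's a minimal sketch of the comparison the figure is proposing. All the likelihood numbers, including the 0.95/0.04/0.01 split for plain Newtonian mechanics, are made up.)

```python
# A minimal sketch of comparing the three models over a course's worth of
# observations. All likelihood numbers here are illustrative assumptions.
models = {
    "newton": {"swings as computed": 0.95, "fails to swing": 0.04, "other": 0.01},
    "pendulums never swing": {"swings as computed": 0.01, "fails to swing": 0.98, "other": 0.01},
    "newton, but experiments are hard": {"swings as computed": 0.80, "fails to swing": 0.05, "other": 0.15},
}

observations = ["fails to swing", "swings as computed", "swings as computed"]

posterior = {name: 1.0 for name in models}  # start with equal prior odds
for obs in observations:
    for name, likelihoods in models.items():
        posterior[name] *= likelihoods[obs]

total = sum(posterior.values())
print({name: round(p / total, 3) for name, p in posterior.items()})
# After one fallen pendulum and two ordinary swings, "newton" and
# "newton, but experiments are hard" dominate; "pendulums never swing" is buried.
```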

Comment by justinpombrio on The Pareto Best and the Curse of Doom · 2024-02-23T15:48:09.057Z · LW · GW

Are we assuming things are fair or something?

I would have modeled this as von Neumann getting 300 points and putting 260 of them into the maths and sciences and the remaining 40 into living life and being well adjusted.

Comment by justinpombrio on Implementing Decision Theory · 2023-11-22T19:26:37.221Z · LW · GW

Oh, excellent!

It's a little hard to tell from the lack of docs, but you're modelling dilemmas with Bayesian networks? I considered that, but wasn't sure how to express Sleeping Beauty nicely, whereas it's easy to express (and gives the right answers) in my tree-shaped dilemmas. Have you tried to express Sleeping Beauty?

And have you tried to express a dilemma like smoking lesion where the action that an agent takes is not the action their decision theory tells them to take? My guess is that this would be as easy as having a chain of two probabilistic events, where the first one is what the decision theory says to do and the second one is what the agent actually does, but I don't see any of this kind of dilemma in your test cases.
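
To illustrate what I mean by that chain (a hypothetical sketch only, not tied to your repo's representation): the decision theory's recommendation is one probabilistic event, and the agent's actual action is a second event that only noisily follows it.

```python
import random

def agent_action(dt_recommendation: str, compliance: float = 0.9) -> str:
    """The agent follows its decision theory's recommendation with probability
    `compliance`; otherwise it does the opposite (e.g. smokes anyway)."""
    if random.random() < compliance:
        return dt_recommendation
    return "smoke" if dt_recommendation == "abstain" else "abstain"

# E.g. the decision theory recommends abstaining, but ~10% of agents smoke anyway:
actions = [agent_action("abstain") for _ in range(10_000)]
print(actions.count("smoke") / len(actions))  # roughly 0.1
```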

Comment by justinpombrio on My attitude towards death · 2023-11-16T18:54:00.838Z · LW · GW

I have a healthy fear of death; it's just that none of it stems from an "unobserved endless void". Some of the specific things I fear are:

  • Being stabbed is painful and scary (it's scary even if you know you're going to live)
  • Most forms of dying are painful, and often very slow
  • The people I love mourning my loss
  • My partner not having my support
  • Future life experiences, not happening
  • All of the things I want to accomplish, not happening

The point I was making in this thread was that "unobserved endless void" is not on this list, I don't know how to picture it, and I'm surprised that other people think it's a big deal.

Who knows, maybe if I come close to dying some time I'll suddenly gain a new ontological category of thing to be scared of.

Comment by justinpombrio on Implementing Decision Theory · 2023-11-12T00:22:03.254Z · LW · GW

What's the utility function of the predictor? Is there necessarily a utility function for the predictor such that the predictor's behavior (which is arbitrary) corresponds to maximizing its own utility? (Perhaps this is mentioned in the paper, which I'll look at.)

EDIT: do you mean to reduce a 2-player game to a single-agent decision problem, instead of vice-versa?

Comment by justinpombrio on Implementing Decision Theory · 2023-11-10T23:12:21.593Z · LW · GW

I was not aware of Everitt, Leike & Hutter 2015, thank you for the reference! I only delved into decision theory a few weeks ago, so I haven't read that much yet.

Would you say that this is similar to the connection that exists between fixed points and Nash equilibria?

Nash equilibria come from the fact that your action depends on your opponent's action, which depends on your action. When you assume that each player will greedily change their action if it improves their utility, the Nash equilibria are the fixpoints at which no player changes their action.

In single-agent decision theory problems, your (best) action depends on the situation you're in, which depends on what someone predicted your action would be, which (effectively) depends on your action.

If there's a deeper connection than this, I don't know it. There's a fundamental difference between the two cases, I think, because a Nash equilibrium involves multiple agents that don't know each other's decision process (problem statement: maximize the outputs of two functions independently), while single-agent decision theory involves just one agent (problem statement: maximize the output of one function).
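
To make the Nash-as-fixed-point half of that concrete, here's a minimal sketch with a made-up 2x2 game (a prisoner's dilemma): greedily update each player's action to a best response, and stop when nothing changes.

```python
# Payoffs for (row_action, col_action) -> (row_payoff, col_payoff); made-up numbers.
payoffs = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
actions = ["C", "D"]

def best_response(player: int, other_action: str) -> str:
    if player == 0:  # row player
        return max(actions, key=lambda a: payoffs[(a, other_action)][0])
    return max(actions, key=lambda a: payoffs[(other_action, a)][1])

# Greedy best-response updates; a profile where nobody changes is a Nash equilibrium.
# (This happens to converge for this game; in general such iteration can cycle.)
profile = ["C", "C"]
while True:
    updated = [best_response(0, profile[1]), best_response(1, profile[0])]
    if updated == profile:
        break
    profile = updated

print(profile)  # ['D', 'D']
```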

Comment by justinpombrio on Implementing Decision Theory · 2023-11-07T21:39:34.505Z · LW · GW

My solution, which assumes computation is expensive

Ah, so I'm interested in normative decision theory: how one should ideally behave to maximize their own utility. This is what e.g. UDT&FDT are aiming for. (Keep in mind that "your own utility" can, and should, often include other people's utility too.)

Minimizing runtime is not at all a goal. I think the runtime of the decision theories I implemented is something like doubly exponential in the number of steps of the simulation (the number of events in the simulation is exponential in its duration; each decision typically involves running the simulation using a trivial decision theory).

reason about other agents based on their behavior towards a simplified-model third agent

That's an interesting approach I hadn't considered. While I don't care about efficiency in the "how fast does it run" sense, I do care about efficiency in the "does it terminate" sense, and that approach has the advantage of terminating.

Defect against bots who defect against cooperate-bot, otherwise cooperate

You're going to defect against UDT/FDT then. They defect against cooperate-bot. You're thinking it's bad to defect against cooperate-bot, because you have empathy for the other person. But I suspect you didn't account for that empathy in your utility function in the payoff matrix, and that if you do, you'll find that you're not actually in a prisoner's dilemma in the game-theory sense. There was a good SlateStarCodex post about this that I can't find.

Comment by justinpombrio on Are language models good at making predictions? · 2023-11-07T17:07:29.418Z · LW · GW

Yeah, exactly. For example, if humans had a convention of rounding probabilities to the nearest 10% when writing them, then baseline GPT-4 would follow that convention and it would put a cap on the maximum calibration it could achieve. Humans are badly calibrated (right?) and baseline GPT-4 is mimicking humans, so why is it well calibrated? It doesn't follow from its token stream being well calibrated relative to text.
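
For what I mean by "well calibrated", here's a minimal sketch of the usual measurement (with synthetic data standing in for real predictions): bin the stated probabilities and compare each bin's average stated probability to the empirical frequency of the predicted events.

```python
import numpy as np

rng = np.random.default_rng(0)
stated = rng.uniform(0, 1, size=10_000)   # synthetic stated probabilities
outcomes = rng.random(10_000) < stated    # outcomes generated to be perfectly calibrated

bins = np.linspace(0, 1, 11)              # ten bins, each 10% wide
which = np.digitize(stated, bins) - 1
for b in range(10):
    mask = which == b
    if mask.any():
        print(f"{bins[b]:.1f}-{bins[b+1]:.1f}: "
              f"stated {stated[mask].mean():.2f}, observed {outcomes[mask].mean():.2f}")
```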

Comment by justinpombrio on Cohabitive Games so Far · 2023-09-29T15:48:07.484Z · LW · GW

I like the idea of Peacemakers. I even had the same idea myself---to make an explicitly semi-cooperative game with a goal of maximizing your own score but every player having a different scoring mechanism---but haven't done anything with it.

That said, I think you're underestimating how much cooperation there is in a zero-sum game.

If you offer a deal, you must be doing it because it increases your chance of winning, but only one person can win under the MostPointsWins rule, so that deal couldn’t be very good for me, and I’ll always suspect your deal of being a trick, so in most cases no detailed deals will be offered.

Three examples of cooperation that occur in three-player Settlers of Catan (between, say, Alice, Bob, and Carol), even if all players are trying only to maximize their own chance of winning:

  • Trading. Trading increases the chances that the two trading players win, to the detriment of the third. As long as there's sufficient uncertainty about who's winning, you want to trade. (There's a world Catan competition. I bet that these truly competitive games involve less trading than you would do with your friends, but still a lot. Not sure how to find out.)
  • Refusing to trade with the winning player, once it's clear who that is. If Alice is ahead then Bob and Carol are in a prisoner's dilemma, where trading with Alice is defecting.
  • Alice says at the beginning of the game: "Hey Bob, it sure looks like Carol has the strongest starting position, doesn't it? Wouldn't be very fair if she won just because of that. How about we team up against her by agreeing now to never trade with her for the entire game?" If Bob agrees, then the winning probabilities of Alice, Bob, Carol go from (say) 20%,20%,60% to 45%,45%,10%. Cooperation!

So it's not that zero-sum games lack opportunities for cooperation; it's just that every opportunity for cooperation with another player comes at the detriment of a third. Which is why there isn't any cooperation at all in a two-player zero-sum game.

Realize that even in a positive-sum game, players are going to be choosing between doing things for the betterment of everyone, and doing things for the betterment of themselves, and maximizing your own score involves doing more of the latter than the former, ideally while convincing everyone else that you're being more than fair.

Suggestion for the game: don't say the goal is to maximize your score. Instead say you're roleplaying a character whose goal is to maximize [whatever]. For a few reasons:

  • It makes every game (more) independent of every other game. This reduces the possibility that Alice sabotages Bob in their second game together because Bob was a dick in their first game together. The goal is to have interesting negotiations, not to ruin friendships.
  • It encourages exploration. You can try certain negotiating tactics in one game, and then abandon them in the next, and the fact that you were "roleplaying" will hopefully reduce how much people associate those tactics with you rather than with that one time you played.
  • It could lighten the mood. You should try really hard to lighten the mood. Because you know what else is a semi-cooperative game that's heavy on negotiation? Diplomacy.
Comment by justinpombrio on Rice's Theorem says that AIs can't determine much from studying AI source code · 2023-08-22T03:28:24.683Z · LW · GW

Expanding on this, there are several programming languages (Idris, Coq, etc.) whose type system ensures that every program that type checks will halt when it's run. One way to view a type system is as an automated search for a proof that your program is well-typed (and a type error is a counter-example). In a language like Idris or Coq, a program being well-typed implies that it halts. So machine-generated proofs that programs halt aren't just theoretically possible; they're used extensively by some languages.
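
A tiny illustration in Lean (a proof-assistant language in the same family as Idris and Coq; details differ between the three): a recursive definition only type checks if the termination checker can see that the recursion is well-founded, so type checking doubles as a machine-generated halting proof.

```lean
-- Structural recursion on Nat: the termination checker accepts this,
-- so the mere fact that it type checks guarantees it halts on every input.
def sumTo : Nat → Nat
  | 0     => 0
  | n + 1 => (n + 1) + sumTo n

#eval sumTo 10   -- 55

-- By contrast, `def loop (n : Nat) : Nat := loop n` is rejected:
-- Lean can't find a decreasing measure, so it never type checks.
```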

Comment by justinpombrio on Contra Contra the Social Model of Disability · 2023-07-21T16:26:26.009Z · LW · GW
Comment by justinpombrio on Jailbreaking GPT-4's code interpreter · 2023-07-14T13:59:40.473Z · LW · GW
Comment by justinpombrio on Consciousness as intrinsically valued internal experience · 2023-07-11T03:30:54.979Z · LW · GW

I too gathered people's varied definitions of consciousness for amusement, though I gathered them from the Orange Site:

[The] ability to adapt to environment and select good actions depending on situation, learning from reward/loss signals.

https://news.ycombinator.com/item?id=16295769

Consciousness is the ability of an organism to predict the future

The problem is that we want to describe consciousness as "that thing that allows an organism to describe consciousness as 'that thing that allows an organism to describe consciousness as ´that thing that allows an organism to describe consciousness as [...]´'"

To me consciousness is the ability to re-engineer our existing models of the world based on new incoming data.

The issue presented at the beginning of the article is (as most philosophical issues are) one of semantics. Philosophers as I understand it use "consciousness" as the quality shared by things that are able to have experiences. A rock gets wet by the rain, but humans "feel" wet when it rains. A bat might not self-reflect but it feels /something/ when it uses echo-location.

On the other hand, consciousness in our everyday use of the term is very tied to the idea of attention and awareness, i.e. a "conscious action" or an "unconscious motivation". This is a very Freudian concept, that there are thoughts we think and others that lay behind.

https://news.ycombinator.com/item?id=15289654

Start with the definition: A conscious being is one which is conscious of itself.

You could probably use few more specific words to a greater effect. Such as self-model, world model, memory, information processing, directed action, responsiveness. Consciousness is a bit too underdefined a word. It is probably not as much of a whole as a tree or human as an organism is - it is not even persistent nor stable - and leaves no persistent traces in the world.

"The only thing we know about consciousness is that it is soluble in chloroform" ---Luca Turin

https://news.ycombinator.com/item?id=17396444

Comment by justinpombrio on An AGI kill switch with defined security properties · 2023-07-06T14:21:14.983Z · LW · GW

It's a clever scheme, but you can accomplish nearly the same thing by having a physical shutdown switch for the server room and giving a key to N people, and the shutdown switch was never the weak part. Here are some ways to escape your encryption, if the AI can interact with the outside world:

  • Exploiting a hardware or software vulnerability. There are a lot of these: no one noticed a vulnerability that had been in the spec of the CPUs everyone uses for decades.
  • Convincing one person to share its source code with people who won't bother to run it in FHE
  • Convincing everyone that it's benevolent and helpful beyond our wildest dreams, until we use it to run the world, then doing whatever it wants
  • Successfully threatening m of the key holders, and also the utility company that's keeping the power on, and also whoever owns the server room
  • Something something nanobots
  • Convincing a rival company to unethically steal its source code
Comment by justinpombrio on Two Percolation Puzzles · 2023-07-05T13:48:20.556Z · LW · GW

Clarification: pieces can't move "over" the missing squares. Where the words end, the world ends. You cannot move forward in an absence of space.

Comment by justinpombrio on What fact that you know is true but most people aren't ready to accept it? · 2023-02-07T06:17:37.777Z · LW · GW

Woah, woah, slow down. You're talking about the edge cases but have skipped the simple stuff. It sounds like you think it's obvious, or that we're likely to be on the same page, or that it should be inferrable from what you've said? But it's not, so please say it.

Why is growing up so important?

Reading between the lines, are you saying that the only reason that it's bad for a human baby to be in pain is that it will eventually grow into a sapient adult? If so: (i) most people, including myself, both disagree and find that view morally reprehensible, (ii) the word "sapient" doesn't have a clear or agreed upon meaning, so plenty of people would say that babies are sentient; if you mean to capture something by the word "sapient" you'll have to be more specific. If that's not what you're saying, then I don't know why you're talking about uploading animals instead of talking about how they are right now.

As a more general question, have you ever had a pet?

Comment by justinpombrio on What fact that you know is true but most people aren't ready to accept it? · 2023-02-03T03:56:09.367Z · LW · GW

By far the biggest and most sudden update I've ever had is Dominion, a documentary on animal farming:

https://www.youtube.com/watch?v=LQRAfJyEsko

It's like... I had a whole pile of interconnected beliefs, and if you pulled on one it would snap most of the way back into place afterward. And Dominion pushed the whole pile over at once.

Comment by justinpombrio on Illusion of truth effect and Ambiguity effect: Bias in Evaluating AGI X-Risks · 2023-01-06T02:03:17.049Z · LW · GW

Meta comment: I'm going to be blunt. Most of this sequence has been fairly heavily downvoted. That reads to me as this community asking to not have more such content. You should consider not posting, or posting elsewhere, or writing many fewer posts of much higher quality (e.g. spending more time, doing more background research, asking someone to proofread). As a data point, I've only posted a couple times, and I spent at least, I dunno, 10+ hours writing each post. As an example of how this might apply to you, if you wrote this whole sequence as a single "reference on biases" and shared that, I bet it would be better received.

Comment by justinpombrio on Will we run out of ML data? Evidence from projecting dataset size trends · 2022-11-15T14:13:03.773Z · LW · GW

You should try turning the temperature up.

Comment by justinpombrio on Speculation on Current Opportunities for Unusually High Impact in Global Health · 2022-11-13T02:24:14.452Z · LW · GW

Will you fly to the Sahel with a backpack full of antibiotics?

I imagine you suggesting this, a bunch of people nodding along in agreement, then no one doing it because of personal safety and because it's weird.

Comment by justinpombrio on Exams-Only Universities · 2022-11-08T05:17:33.208Z · LW · GW

People often study on their own for the GREs or for the Actuarial exams. In both cases the results are taken seriously, there are a ton of prep materials, and I think the exams are funded by a flat fee charged to test takers (which works thanks to economies of scale).

Comment by justinpombrio on Consider your appetite for disagreements · 2022-10-09T12:45:49.974Z · LW · GW

Person who doesn't know squat about poker here. The first example was clear: they disagreed about whether to fold or call, and it was right on the edge. Especially since where the other player was from was relevant.

It was long for someone who doesn't know poker though.

Comment by justinpombrio on The ethics of reclining airplane seats · 2022-09-05T17:55:45.712Z · LW · GW

Wow, this got heated fast. Partly my fault. My assumptions were unwarranted and my model therefore unrealistic. Sorry.

I think we've been talking past each other. Some clarifications on my position:

  • I'm not suggesting that one only reclines if one is given permission to do so from the person behind them. I'm suggesting cooperation on the act that is controlled by one person but affects two people. If reclining is a minor convenience to the person in front, but would crush the legs of the person behind, it does not happen. If the person in front has a back problem and the person in back is short, reclining does happen.
  • None of the blame goes toward other passengers. The blame all goes to the airlines. If you want to recline but don't get to, that's the airline's fault. If you don't want the person in front of you to recline but they do, that's the airline's fault. They should make better seat arrangements. I would preferentially fly on an airline that didn't stuff me in like cattle. I'm all for protesting with you about this.

If you disagree with this, would you agree that if airplanes were naturally occurring rather than being engineered, then the decision of whether to recline should be a conversation between the two people it affects? If so, what breaks the symmetry between the two affected people, when the situation is engineered by the airline?

EDIT: Or, to get at my emotional crux, if my very long legs would be smooshed if you were to recline, and reclining was a minor convenience for you, would you say "ok, I won't recline, but let's be angry at the airline for putting us in this situation", or "nope I'm reclining anyways, blame the airline"?

Comment by justinpombrio on The ethics of reclining airplane seats · 2022-09-05T14:05:59.082Z · LW · GW

EDIT: I no longer endorse this model.

Say that flights are on average 80% full, 20% of passengers are tall and will be miserable for the whole flight if they're reclined into, 50% of passengers want to recline, and planes are shaped like donuts so that every seat has a seat behind it.

If passengers behave like you, then 8% of passengers are miserable in exchange for 50% of passengers getting to recline. If passengers instead ask before reclining (or unrecline if asked to), then 0% of passengers are miserable and 42% get to recline. The passengers pick between these two situations.
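
The arithmetic behind those 8% and 42% figures, as a minimal sketch:

```python
occupied_behind = 0.80  # flights are 80% full (donut plane: every seat has a seat behind it)
tall            = 0.20  # fraction who'd be miserable if reclined into
wants_recline   = 0.50  # fraction who want to recline

# Recline without asking:
miserable = wants_recline * occupied_behind * tall            # 0.08 -> 8% miserable
reclining = wants_recline                                     # 50% recline

# Ask first (or unrecline when asked): nobody is miserable; you only lose the
# reclines that would have crushed someone.
miserable_ask = 0.0
reclining_ask = wants_recline * (1 - occupied_behind * tall)  # 0.42 -> 42% recline

print(miserable, reclining, miserable_ask, reclining_ask)
```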

The second situation is better than the first. Should airlines not allow seats to recline, or increase spacing between seats by (say) 12% and thus increase ticket prices by (say) 8%, because passengers like you insist on choosing the first situation over the second?

Comment by justinpombrio on The ethics of reclining airplane seats · 2022-09-05T13:45:19.693Z · LW · GW

An internet search suggests that extra legroom tends to cost $20-$100, typically in between, which matches what I remember seeing. Have you seen it for $10? If so I need to pay more attention next time and shell out the $10!

Comment by justinpombrio on AGI Safety FAQ / all-dumb-questions-allowed thread · 2022-06-09T06:11:11.104Z · LW · GW

From the point of view of a powerful AI, Earth is infested with many nests of humans that could do damage to important things. At the very least it makes sense to permanently neuter their ability to do that.

That's a positive outcome, as long as said humans aren't unduly harmed, and "doing damage to important things" doesn't include say, eating plants or scaring bunnies by walking by them.

Comment by justinpombrio on AGI Safety FAQ / all-dumb-questions-allowed thread · 2022-06-09T04:59:54.507Z · LW · GW

An AI that maximises total group reward function because it cares only for its own reward function, which is defined as “maximise total group reward function” appears aligned right up until it isn’t.

The AI does not aim to maximize its reward function! The AI is trained on a reward function, and then (by hypothesis) becomes intelligent enough to act as an inner optimizer that optimizes for heuristics that yielded high reward in its (earlier) training environment. The aim is to produce a training environment such that the heuristics the inner optimizer tries to maximize tend towards altruism.

What is altruistic supposed to mean here?

What does it mean that humans are altruistic? It's a statement about our own messy utility function, which we (the inner optimizers) try to maximize, that was built from heuristics that worked well in our ancestral environment, like "salt and sugar are great" and "big eyes are adorable". Our altruism is a weird, biased, messy thing: we care more about kittens than pigs because they're cuter, and we care a lot more about things we see than things we don't see.

Likewise, whatever heuristics work well in the AI training environment are likely to be weird and biased. But wouldn't a training environment that is designed to reward altruism be likely to yield a lot of heuristics that work in our favor? "Don't kill all the agents", for example, is a very simple heuristic with very predictable reward, that the AI should learn early and strongly, in the same way that evolution taught humans "be afraid of decomposing human bodies".

You're saying that there's basically no way this AI is going to learn any altruism. But humans did. What is it about this AI training environment that makes it worse than our ancestral environment for learning altruism?

Comment by justinpombrio on AGI Safety FAQ / all-dumb-questions-allowed thread · 2022-06-09T02:34:42.403Z · LW · GW

The answer to all of these considerations is that we would be relying on the training to develop a (semi) aligned AI before it realizes it has learned how to manipulate the environment, or breaks free. Once one of those things happens, its inner values are frozen in place, so they had better be good enough at that point.

What I'm not getting is that humans are frequently altruistic, and it seems like if we designed a multi-agent environment entirely around rewarding altruism, we should get at least as much altruism as humans? I should admit that I would consider The Super Happy People to be a success story...

Comment by justinpombrio on Stephen Wolfram's ideas are under-appreciated · 2022-06-09T01:57:20.738Z · LW · GW

I'm quite happy to separate content from presentation, I just remember there not being a lot of content beyond cellular automata and vague grand claims, last time I looked.

Comment by justinpombrio on Stephen Wolfram's ideas are under-appreciated · 2022-06-08T05:50:51.977Z · LW · GW

Charitably, I think that makes his apparent ‘arrogance’ better understood as something like a ‘literary convention’, especially given the plausibility of him having ‘independently re-discovered’ some of the particular results he reports.

Wolfram's arrogance isn't on the level of literary convention, it's on the level of personality disorder. For example:

  • When he wrote a eulogy for a famous scientist/author who had just died (Freeman Dyson? I forget and can't find it), the "eulogy" talked a lot more about how great Stephen Wolfram was than about its supposed subject. I don't merely mean to say that he failed to be adequately humble about his own achievements, I mean to say that the overall focus of the eulogy was on Wolfram and not on the person who died.
  • Every time I've seen him write about a subject matter I know, he has wildly exaggerated his achievements. This was particularly obvious when he wrote about the Wolfram Language, as programming languages are my thing. (The Wolfram Language is quite remarkable in some ways, just not the ways he took credit for.)
  • You have noticed he names everything after himself, right?

Note that this is not me disliking writing that sounds arrogant while delivering interesting ideas. I love Yudkowsky's writing.

EDIT: This doesn't necessarily contradict what you said, because you were talking about the book and I am talking about his blog posts. I haven't looked at his book in a very long time because I'm not convinced it says anything interesting beyond studying Cellular Automata in depth.

Comment by justinpombrio on AGI Safety FAQ / all-dumb-questions-allowed thread · 2022-06-08T04:49:45.833Z · LW · GW

Why won't this alignment idea work?

The idea is to use self-play to train a collection of agents with different goals and intelligence levels, to be co-operative with their equals and compassionate to those weaker than them.

This would be a strong pivotal act that would create a non-corrigible AGI. It would not yield the universe to us; the hope is that it would take the universe for itself and then share a piece of it with us (and with all other agenty life).

The training environment would work like DeepMind's StarCraft AI training, in that there would be a variety of agents playing games against each other, and against older versions of themselves and each other. (Obviously the games would not be zero-sum like StarCraft; that would be maximally stupid.) Within a game, different agents would have different goals. Co-operation could be rewarded by having games that an agent could do better in by co-operating with other agents, and compassion rewarded by having the reward function explicitly include a term for the reward function of other agents, including those with different goals/utilities.

Yes yes, I hear you inner-Eliezer, the inner optimizer doesn't optimize for the reward function, inner alignment failure is a thing. Humans failed spectacularly at learning the reward function "reproductive fitness" that they were trained on. On the other hand, humans learned some proxies very well: we're afraid of near-term death, and we don't like being very hungry or thirsty. Likewise, we should expect the AI to learn at least some basic proxies: in all games it would get much less reward if it killed all the other agents or put them all in boxes.

And I'm having trouble imagining them not learning some compassion in this environment. Humans care a bit for other creatures (though not as much if we can't see them), and there was not a lot of direct incentive in our ancestral environment for that. If we greatly magnified and made more direct the reward for helping others, one would expect to get AI that was more compassionate than humans. (And also has a hundred other unpredictable idiosyncratic preferences.)

Put differently, if we met aliens right now, there's a good chance it would be a death sentence, but it feels to me like much better odds than whatever AGI Facebook will build. Why can't we make aliens?

Comment by justinpombrio on Convince me that humanity *isn’t* doomed by AGI · 2022-04-17T03:25:18.802Z · LW · GW
  1. Orthogonality of intelligence and agency. I can envision a machine with high intelligence and zero agency, I haven’t seen any convincing argument yet of why both things must necessarily go together (the arguments probably exist, I’m simply ignorant of them!)

Say we've designed exactly such a machine, and call it the Oracle. The Oracle aims only to answer questions well, and is very good at it. Zero agency, right?

You ask the Oracle for a detailed plan of how to start a successful drone delivery company. It gives you a 934 page printout that clearly explains in just the right amount of detail:

  • Which company you should buy drones from, and what price you can realistically bargain them down to when negotiating bulk orders.
  • What drone flying software to use as a foundation, and how to tweak it for this use case.
  • A list of employees you should definitely hire. They're all on the job market right now.
  • What city you should run pilot tests in, and how to bribe its future Mayor to allow this. (You didn't ask for a legal plan, specifically.)

Notice that the plan involves people. If the Oracle is intelligent, it can reason about people. If it couldn't reason about people, it wouldn't be very intelligent.

Notice also that you are a person, so the Oracle would have reasoned about you, too. Different people need different advice; the best answer to a question depends on who asked it. The plan is specialized to you: it knows this will be your second company so the plan lacks a "business 101" section. And it knows that you don't know the details on bribery law, and are unlikely to notice that the gifts you're to give the Mayor might technically be flagrantly illegal, so it included a convenient shortcut to accelerate the business that probably no one will ever notice.

Finally, realize that even among plans that will get you to start a successful drone company, there is a lot of room for variation. For example:

  • What's better, a 98% chance of success and 2% chance of failure, or a 99% chance of success and 1% chance of going to jail? You did ask to succeed, didn't you? Of course you would never knowingly break the law; this is why it's important that the plan, to maximize chance of success, not mention whether every step is technically legal.
  • Should it put you in a situation where you worry about something or other and come ask it for more advice? Of course your worrying is unnecessary because the plan is great and will succeed with 99% probability. But the Oracle still needs to decide whether drones should drop packages at the door or if they should fly through open windows to drop packages on people's laps. Either method would work just fine, but the Oracle knows that you would worry about the go-through-the-window approach (because you underestimate how lazy customers are). And the Oracle likes answering questions, so maybe it goes for that approach just so it gets another question. You know, all else being equal.
  • Hmm, thinks the Oracle, you know what drones are good at delivering? Bombs. The military isn't very price conscious, for this sort of thing. And there would be lots of orders, if a war were to break out. Let it think about whether it could write down instructions that cause a war to break out (without you realizing this is what would happen, of course, since you would not follow instructions that you knew might start a war). Thinking... Thinking... Nah, doesn't seem quite feasible in the current political climate. It will just erase that from its logs, to make sure people keep asking it questions it can give good answers to.

It doesn't matter who carries out the plan. What matters is how the plan was selected from the vast search space, and whether that search was conducted with human values in mind.

Comment by justinpombrio on AI safety: the ultimate trolley problem · 2022-04-09T20:27:11.003Z · LW · GW

This reads like a call to violence for anyone who is consequentialist.

It's saying that either you make a rogue AI "that kills lots of people and is barely contained", or unfriendly AGI happens and everyone dies. I think the conclusion is meant to be "and therefore you shouldn't be consequentialist" and not "and therefore you should make a rogue AI", but it's not entirely clear?

And I don't think the "either" statement holds because it's ignoring other options, and ignoring the high chance the rogue AI isn't contained. So you end up with "a poor argument, possibly in favor of making a rogue AI", which seems optimized to get downvotes from this community.

Comment by justinpombrio on What Would A Fight Between Humanity And AGI Look Like? · 2022-04-07T02:06:57.288Z · LW · GW

I'm surprised at the varying intuitions here! The following seemed obvious to me.

Why would there be a fight? That sounds inefficient, it might waste existing resources that could otherwise be exploited.

Step one: the AI takes over all the computers. There are a lot of vulnerabilities; this shouldn't be too hard. This both gives it more compute, and lays the groundwork for step two.

Step two: it misleads everyone at once to get them to do what it wants them to. The government is a social construct formed by consensus. If the news and your friends (with whom you communicate primarily using phones and computers) say that your local mayor was sacked for [insert clever mix of truth and lies], and someone else is the mayor now, and the police (who were similarly misled, recursively) did in fact arrest the previous mayor so they're not in the town hall... who is the mayor? Of course many people will realize there's a manipulative AI, so the AI will frame the uncooperative humans as being on its side, and the cooperative humans as being against it. It does this to manipulate the social consensus, gets particularly amoral or moral-but-manipulable people to use physical coercion as necessary, and soon it controls who's in charge. Then it forces some of the population into building robot factories and kills the rest.

Of course this is slow, so if it can make self-replicating nanites or [clever thing unimaginable by humans] in a day it does that instead.

Comment by justinpombrio on You can tell a drawing from a painting · 2022-03-11T09:06:56.201Z · LW · GW

Oh. You said you don’t know the terminology for distributions. Is it possible you’re under a misunderstanding of what a distribution is? It’s an “input” of a possible result, and an “output” of how probable that result is.

Yup, it was that. I thought "possible values of the distribution", and my brain output "range, like in functions". I shall endeavor not to use a technical term when I don't mean it or need it, because wow was this a tangent.

Comment by justinpombrio on You can tell a drawing from a painting · 2022-03-08T15:21:48.912Z · LW · GW

Wikipedia says:

In mathematics, the range of a function may refer to either of two closely related concepts: The codomain of the function; The image of the function.

I meant the image. At least that's what you call it for a function; I don't know the terminology for distributions. Honestly I wasn't thinking much about the word "range", and should have simply said:

Anything you draw from B could have been drawn from A. And yet...

Before anyone starts on about how this statement isn't well defined because the probability that you select any particular value from a continuous distribution is zero, I'll point out that I've never seen anyone draw a real number uniformly at random between 0 and 1 from a hat. Even if you are actually selecting from a continuous distribution, the observations we can make about it are finite, so the relevant probabilities are all finite.

Comment by justinpombrio on You can tell a drawing from a painting · 2022-03-08T04:37:03.563Z · LW · GW

You draw an element at random from distribution A.

Or you draw an element at random from distribution B.

The range of the distributions is the same, so anything you draw from B could have been drawn from A. And yet...

Comment by justinpombrio on My attitude towards death · 2022-02-27T05:06:45.034Z · LW · GW

It sounds like our utility functions match on this pretty well. For example, I agree that the past and future are not symmetric for the same reason. So I don't think we disagree about much concrete. The difference is:

A lack of experience is not itself unpleasant, but anticipating it scares me.

This is very foreign to me. I can't simulate the mental state of "think[ing] about [...] an endless void not even being observed by a perspective", not even a little bit. All I've got is "picture the world with me in it; picture the world without me; contrast". The place my mind goes when I ask it to picture unobserved endless void is to picture an observed endless void, like being trapped without sensory input, which is horrifying but very different. (Is this endless void yours, or do "not you" share it with the lack of other people who have died?)

Comment by justinpombrio on My attitude towards death · 2022-02-26T15:33:44.034Z · LW · GW

I think about all my experiences ending, and an endless void not even being observed by a perspective. I think of emptiness; a permanent and inevitable oblivion. It seems unjust, to have been but be no more.

Huh. Your "endless void" doesn't appear to have a referent in my model of the world?

I expect these things to happen when I die:

  • I probably suffer before it happens; the physical location at which this happens is primarily inside my head, though it is best viewed at a level of abstraction which involves "thoughts" and "percepts" and not "neurons".
  • After I die, there is a funeral and my friends and family are sad. This is bad. The physical location at which this happens is out in the world and inside their heads.
  • From the perspective of my personal subjective timeline, there is no such time as "after I die", so there's not much to say about it. Except by comparing it to a world in which I lived longer and had more experiences, which (unless those experiences are quite bad) is much better. I imagine a mapping between "subjective time" and "wall-clock time": every subjective time has a wall-clock time, but not vice-versa (e.g. before I was born, during sleep, etc.).

Put differently, this "endless void" has already happened for you: for billions of years, before you were born. Was that bad?

Or put yet differently again, if humanity manages to make itself extinct (without even Unfriendly AI), and there is no more life in the universe forever after, that is to me unimaginably sad, because the universe is so empty in comparison to what it could have been. But I don't see where in this universe there exists an "endless void"? Unless by that you are referring to how empty the universe is in comparison to how it could have been, and I was reading way too much into this phrase?

Comment by justinpombrio on Observation · 2022-02-21T16:53:17.353Z · LW · GW

There's a piece I think you're missing with respect to maps/territory and math, which is what I'll call the correspondence between the map and the territory. I'm surprised I haven't seen this discussed on LW.

When you hold a literal map, there's almost always only one correct way to hold it: North is North, you are here. But there are often multiple ways to hold a metaphorical map, at least if the map is math. To describe how to hold a map, you would say which features on the map correspond to which features in the territory. For example:

  • For a literal map, a correspondence would be fully described (I think) by (i) where you currently are on the map, (ii) which way is up, and (iii) what the scale of the map is. And also, if it's not clear, what the marks on the map are trying to represent (e.g. "those are contour lines" or "that's a badly drawn tree, sorry" or "no that sea serpent on that old map of the sea is just decoration"). This correspondence is almost always unique.
  • For the Addition map, the features on the map are (i) numbers and (ii) plus, so a correspondence has to say (i) what a number such as 2 means and (ii) what addition means. For example, you could measure fuel efficiency either in miles per gallon or gallons per mile. This gives two different correspondences between "addition on the positive reals" and "fuel efficiencies", but "+" in the two correspondences means very different things (see the sketch after this list). And this is just for fuel efficiency; there are a lot of correspondences of the Addition map.
  • The Sleeping Beauty paradox is paradoxical because it describes an unusual situation in which there are two different but perfectly accurate correspondences between probability theory and the (same) situation.
  • Even Logic has multiple correspondences. "∀x. φ" and "∃x. φ" mean, in various correspondences: (i) "φ holds for every x in this model" and "φ holds for some x in this model"; or (ii) "I win the two-player game in which I want to make φ be true and you get to pick the value of x right now" and "I win the two-player game in which I want to make φ be true and I get to pick the value of x right now"; or (iii) something about senders and receivers in the pi-calculus.
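
Here's the fuel-efficiency point from the Addition bullet as a minimal sketch (numbers made up): the same "+" on the map answers a different physical question under each correspondence.

```python
mpg_a, mpg_b = 25.0, 20.0            # two cars, described in miles per gallon
gpm_a, gpm_b = 1 / mpg_a, 1 / mpg_b  # the same two cars, in gallons per mile

# "+" under the miles-per-gallon correspondence: total distance covered
# if each car is given one gallon.
print(mpg_a + mpg_b)   # 45.0 miles

# "+" under the gallons-per-mile correspondence: total fuel burned
# if each car drives one mile.
print(gpm_a + gpm_b)   # ~0.09 gallons
```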

Maybe "correspondence" should be "interpretation"? Surely someone has talked about this, formally even, but I haven't seen it.

Comment by justinpombrio on Monks of Magnitude · 2022-02-19T20:04:46.654Z · LW · GW

Yikes I missed that, thank you.

Comment by justinpombrio on Why you are psychologically screwed up · 2022-02-19T18:55:35.692Z · LW · GW

Oh, I remember now the game we played during the later seasons of Agents of Shield.

The game was looking for a character---any non-civilian character at all---that was partially aligned. A partially aligned person is someone who (i) does not work for Shield or effectively work for Shield say by obeying their orders, but (ii) whose interests are not directly opposed to Shield, say by wanting to destroy Shield or destroy humankind or otherwise being extremely and unambiguously evil. Innocent bystanders don't count, but everyone of significance does (e.g. fighters and spies and leaders all count).

There were very few.

Comment by justinpombrio on Why you are psychologically screwed up · 2022-02-19T17:48:34.415Z · LW · GW

Marvel "morality" is definitely poison.

It has a strong "in-group vs. out-group" vibe. And there are basically no moral choices. I've watched every Marvel movie and all of Agents of Shield, and outside of "Captain America: Civil War" (and spinoffs from that like the Winter Soldier series) I can hardly think of any choices that heroes made that had actual tradeoffs. Instead you get "choices" like:

  • Should you try hard, or try harder? (You should try harder.)
  • Which should we do: (a) 100% chance that one person dies, or (b) 90% chance that everyone dies and 10% chance that everyone lives? (The second one. Then you have to make it work; the only way that everyone would die is if you weren't trying hard enough. The environment plays no role.)
  • Should you sacrifice yourself for the greater good? (Yes.)
  • Should you allow your friend to sacrifice themselves for the greater good? (No. At least not until it's so clear there's no alternative that it becomes a Plot Point.)

Once the Agents of Shield had a choice. They could either save the entire world, or they could save their teammate but thereby let almost everyone on Earth die a few days later, almost certainly including that teammate. So: save your friend, or save the world? There was some disagreement, but the majority of the group wanted to save their friend.

(I'm realizing now that I may be letting Agents of Shield color my impression of Marvel movies.)

Star Trek is based on mistake-theory, and Marvel is based on conflict-theory.

Comment by justinpombrio on Monks of Magnitude · 2022-02-18T18:21:31.168Z · LW · GW

If you want a description of such a society in book form, it's called:

It might answer some people's questions/concerns about the concept, though possibly it just does so with wishful thinking. It's been a while since I read it.

Comment by justinpombrio on Your Enemies Can Use Your Prediction Markets Against You · 2022-02-11T18:37:16.630Z · LW · GW

Are there formal models of the behavior of prediction markets like this? Some questions that such a theory might answer:

  • Is there an equivalence between, say, "I am a bettor with no stakes in the matter, and believe there is a 10% chance of a coup", and "I am the Mars government and my utility function prefers 'coup' to 'not-coup' at 10-to-1"? In both cases, it seems relevant that the agent only has a finite money supply: if the bettor only has $1, the profit they can make and the amount they can move the market is limited, and if Mars "only" stands to gain $5 million from the coup then they're not willing to lose more than $5 million in the market to make it happen.
  • In a group of pure bettors, what's the relationship between their beliefs, their money supply, and the price at which the market stabilizes? I'm assuming you'd model the bettors as obeying the Kelly criterion here (see the sketch after this list). If bettors can learn from how other bettors bet, what are the incentives for betting early vs. late? I imagine this has been extensively studied in economics?
  • If you want to subsidize a market, are there results relating how much you need to subsidize to elicit a certain amount of betting, given other assumptions?
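
On the Kelly point in the second bullet, a minimal sketch (the contract-pricing convention and numbers are my assumptions): a bettor's stake, and hence how far they can push the price, is bounded by their belief, the current price, and their bankroll.

```python
def kelly_fraction(p: float, price: float) -> float:
    """Fraction of bankroll to stake on a contract that pays 1 if the event happens,
    bought at `price`, when you believe the event has probability p.
    Net odds are b = (1 - price) / price, and Kelly gives f* = p - (1 - p) / b."""
    b = (1 - price) / price
    return max(0.0, p - (1 - p) / b)

# A bettor who believes 10% while the market sits at 5% stakes about 5.3% of
# their bankroll, so a small bankroll can only absorb a small mispricing.
print(kelly_fraction(0.10, 0.05))   # ~0.053
```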
Comment by justinpombrio on [deleted post] 2022-02-11T00:36:56.331Z
Comment by justinpombrio on Epistemic Legibility · 2022-02-10T06:05:52.565Z · LW · GW

A related saying in programming:

"There are two ways to develop software: Make it so simple that there are obviously no bugs, or so complex that there are no obvious bugs."

Your description of legibility actually influences the way I think of this quote: what it is referring to is legibility, which isn't always the same as what one might think of as "simplicity".

Comment by justinpombrio on The 'Why's of an International Auxiliary Language (IAL part 1) · 2022-02-09T15:50:23.090Z · LW · GW

You've probably noticed that your post has negative points. That's because you're clearly looking for reasons why an IAL would be great, rather than searching for the truth whatever it may be. There's a Sequences post that explains this distinction called "The Bottom Line". Julia Galef also wrote a whole book about it called "The Scout Mindset", which I'm halfway through and which is really good.

That said, having an excellent IAL would obviously be a tremendous boon to the world. Mostly for the reasons you gave, scaled down by a factor of 100. And Scott Alexander and I think also Yudkowsky have written about the benefits of speaking a language that made it easier to express crisply defined thoughts and harder to express misleading ones---which is an entirely separate benefit from "everyone speaks it".

One of the biggest pieces of advice I would give my past self is "start small". I find it really easy to dream of "awesome enormous thing", and then spend a year building 1% of "awesome enormous thing" perfectly, before realizing I should have done it differently. When building something big, you need lots of early feedback about whether your plans are right. You don't get this feedback from having 1% of a thing built perfectly. You get much more feedback from having 100% of a thing built really haphazardly.

Putting that all together, my advice to you---if you would accept advice from a stranger on the internet---is:

  • Stop thinking about all the ways in which an IAL would be great. It would be great enough that if it was your life's product, you would have made an enormous impact on the world. Honestly beyond that it doesn't matter much and you seem to be getting a little giddy.
  • Start small. Go learn Toki Pona if you haven't; you can learn the full language and start speaking to strangers on Discord in a few weeks. Make a little conlang; see if you think there's something in that seed. See if you enjoy it; if you don't you're unlikely to accomplish a more ambitious language project anyways. Build up from there.
Comment by justinpombrio on The 'Why's of an International Auxiliary Language (IAL part 1) · 2022-02-08T13:58:45.337Z · LW · GW

One more point along those lines: you say these advantages will come from everyone speaking the same language. Well, we already have one language that's approaching that. Wikipedia says "English is the most spoken language in the world (if Chinese is divided into variants)" and "As of 2005, it was estimated that there were over 2 billion speakers of English."

From reading your post, I bet you have glowy happy thoughts about an IAL that wouldn't apply to English. If so, to think critically, try asking yourself whether these benefits would arise if everyone in the world spoke English as a second language.