You Can Do Futarchy Yourself 2020-06-14T00:16:20.823Z · score: 51 (25 votes)
Tetraspace Grouping's Shortform 2019-08-02T01:37:14.859Z · score: 10 (2 votes)


Comment by tetraspace-grouping on Comparing LICDT and LIEDT · 2020-07-24T16:21:58.425Z · score: 1 (1 votes)

The statement of the law of logical causality is:

Law of Logical Causality: If conditioning on any event changes the probability an agent assigns to its own action, that event must be treated as causally downstream.

If I'm interpreting things correctly, this is just because anything that's upstream gets screened off, because the agent knows what action it's going to take.

You say that LICDT pays the blackmail in XOR blackmail because it follows this law of logical causality. Is this because, conditioned on the letter being sent, if there is a disaster the agent assigns probability 0 to sending money, and if there isn't a disaster the agent assigns probability 1 to sending money, so the disaster must be causally downstream of the decision to send money if the agent is to know whether or not it sends money?
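
A small enumeration of where those conditional probabilities come from, assuming the standard XOR blackmail rule (the predictor sends the letter iff exactly one of "there is a disaster" and "the agent pays" holds). The function names are hypothetical. Once the letter has arrived, only one action is consistent with each disaster state, whatever prior the agent had over its own action.

```python
def letter_sent(disaster: bool, pays: bool) -> bool:
    # XOR blackmail: the letter is sent iff exactly one of the two is true.
    return disaster != pays


def prob_pays_given_letter(disaster: bool) -> float:
    """Fraction of letter-receiving worlds (with this disaster state) where the agent pays.

    Only one value of `pays` survives the conditioning, so any prior over the
    agent's action that gives both actions nonzero weight yields the same answer.
    """
    consistent = [pays for pays in (False, True) if letter_sent(disaster, pays)]
    return sum(consistent) / len(consistent)


print(prob_pays_given_letter(disaster=True))
print(prob_pays_given_letter(disaster=False))
```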

Comment by tetraspace-grouping on Smoking Lesion Steelman · 2020-07-21T02:43:38.359Z · score: 13 (4 votes)

I didn't find the conclusion about the smoke-lovers and non-smoke-lovers obvious in the EDT case at first glance, so I added in some numbers and ran through the calculations the robots would do, to see for myself and to get a better handle on what it actually looks like to be unable to introspect on your utility function while still gaining evidence about it.

Suppose that, out of the  robots that have ever been built,  are smoke-lovers and  are non-smoke-lovers. Suppose also the smoke-lovers end up smoking with probability  and non-smoke-lovers end up smoking with probability .

Then  robots smoke, and  robots don't smoke. So by Bayes' theorem, if a robot smokes, there is a   chance that it's killed, and if a robot doesn't smoke, there's a chance that it's killed.
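
The Bayes step can be sketched as follows. The names are hypothetical and the numbers are purely illustrative, not the ones from the original comment. The sketch assumes, as the Bayes step implies, that a robot is killed iff it is a smoke-lover, so the chance of being killed given an action is the fraction of robots taking that action who are smoke-lovers.

```python
def kill_probabilities(n_lovers, n_non_lovers, p_smoke_lover, p_smoke_non_lover):
    """Return (P(killed | smokes), P(killed | doesn't smoke)) by Bayes' theorem.

    Assumes a robot is killed iff it is a smoke-lover, and that each action is
    taken by at least one robot (otherwise that conditional is undefined).
    """
    lover_smokers = n_lovers * p_smoke_lover
    other_smokers = n_non_lovers * p_smoke_non_lover
    lover_abstainers = n_lovers * (1 - p_smoke_lover)
    other_abstainers = n_non_lovers * (1 - p_smoke_non_lover)
    p_killed_if_smokes = lover_smokers / (lover_smokers + other_smokers)
    p_killed_if_abstains = lover_abstainers / (lover_abstainers + other_abstainers)
    return p_killed_if_smokes, p_killed_if_abstains


# Illustrative only: half the robots are smoke-lovers, half of those smoke,
# and non-smoke-lovers never smoke.
smokes, abstains = kill_probabilities(50, 50, 0.5, 0.0)
print(smokes, abstains)
```

With non-smoke-lovers never smoking, smoking is conclusive evidence of being a smoke-lover, while abstaining still leaves some chance of being one.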

Hence, the expected utilities are:

  • An EDT non-smoke-lover looks at the possibilities. It sees that if it smokes, it expects to get utilons, and that if it doesn't smoke, it expects to get  utilons.
  • An EDT smoke-lover looks at the possibilities. It sees that if it smokes, it expects to get  utilons, and if it doesn't smoke, it expects to get  utilons.

Now consider some equilibria. Suppose that no non-smoke-lovers smoke, but some smoke-lovers smoke. So  and . So (taking limits as  along the way):

  • non-smoke-lovers expect to get  utilons if they smoke, and  utilons if they don't smoke.  so non-smoke-lovers will choose not to smoke.
  • smoke-lovers expect to get  utilons if they smoke, and  utilons if they don't smoke. Smoke-lovers would be indifferent between the two if . This works fine if at least 90% of robots are smoke-lovers, and equilibrium is achieved. But if fewer than 90% of robots are smoke-lovers, then there is no point at which they would be indifferent, and they will always choose not to smoke.

But wait! This is fine if more than 90% are smoke-lovers, but if fewer than 90% are smoke-lovers, then they would always choose not to smoke, which is inconsistent with the assumption that  is much larger than . So instead suppose that  is only a little bit bigger than , say that . Then:

  • non-smoke-lovers expect to get  utilons if they smoke, and  utilons if they don't smoke. They will choose to smoke if , i.e. if smoke-lovers smoke so rarely that not smoking would make them believe they're a smoke-lover about to be killed by the blade runner.
  • smoke-lovers expect to get   utilons if they smoke, and  utilons if they don't smoke. They are indifferent between these two when . This means that, when  is at the equilibrium point, non-smoke-lovers will not choose to smoke when fewer than 90% of robots are smoke-lovers, which is exactly when this regime applies.

I wrote a quick python simulation to check these conclusions, and it was the case that  for , and  for  there as well.

Comment by tetraspace-grouping on Reductive Reference · 2020-06-25T13:13:16.203Z · score: 1 (1 votes)

Your reliable thermometer doesn't need to be well-calibrated - it only has to show the same value whenever it's used to measure boiling water, regardless of what that value is. So the dependence isn't quite so circular, thankfully.

Comment by tetraspace-grouping on Tetraspace Grouping's Shortform · 2020-05-28T16:58:11.094Z · score: 1 (1 votes)

So the definition of myopia given in Defining Myopia was quite similar to my expansion in the But Wait There's More section; you can roughly match them up by saying and , where is a real number corresponding to the amount that the agent cares about rewards obtained in episode and is the reward obtained in episode . Putting both of these into the sum gives , the undiscounted, non-myopic reward that the agent eventually obtains.

In terms of the definition that I give in the uncertainty framing, this is , and .

So if you let be a vector of the reward obtained on each step and be a vector of how much the agent cares about each step then , and thus the change to the overall reward is , which can be negative if the two sums have different signs.
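
A toy illustration of how that change can be negative, with made-up numbers: an agent whose care vector weights only the current step picks whichever option maximises its weighted sum, even when that lowers the undiscounted total. The option names, care weights, and reward vectors are all hypothetical.

```python
def weighted_reward(care, rewards):
    """Sum over steps of care[t] * rewards[t]: what the agent actually optimises."""
    return sum(c * r for c, r in zip(care, rewards))


care = [1.0, 0.0, 0.0]  # hypothetical: fully myopic, only the current step counts
options = {
    "patient": [1.0, 2.0, 2.0],
    "greedy": [2.0, -1.0, -1.0],
}

chosen = max(options, key=lambda name: weighted_reward(care, options[name]))
change_in_cared_reward = (weighted_reward(care, options[chosen])
                          - weighted_reward(care, options["patient"]))
change_in_total_reward = sum(options[chosen]) - sum(options["patient"])
print(chosen, change_in_cared_reward, change_in_total_reward)
```

The agent gains on the step it cares about while the undiscounted total drops, i.e. the two sums move in opposite directions.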

I was hoping that a point would reveal itself to me about now but I'll have to get back to you on that one.

Comment by tetraspace-grouping on Tetraspace Grouping's Shortform · 2020-05-27T00:27:56.445Z · score: 11 (3 votes)

Thoughts on Abram Demski's Partial Agency:

When I read Partial Agency, I was struck with a desire to try formalizing this partial agency thing. Defining Myopia seems like it might have a definition of myopia; one day I might look at it. Anyway,

Formalization of Partial Agency: Try One

A myopic agent is optimizing a reward function where is the vector of parameters it's thinking about and is the vector of parameters it isn't thinking about. The gradient descent step picks the in the direction that maximizes (it is myopic so it can't consider the effects on ), and then moves the agent to the point .

This is dual to a stop-gradient agent, which picks the in the direction that maximizes but then moves the agent to the point (the gradient through is stopped).

For example,

  • Nash equilibria - are the parameters defining the agent's behavior. are the parameters of the other agents if they go up against the agent parametrized by . is the reward given for an agent going up against a set of agents .
  • Image recognition with a neural network - is the parameters defining the network, are the image classifications for every image in the dataset for the network with parameters , and is the loss function plus the loss of the network described by on classifying the current training example.
  • Episodic agent - are parameters describing the agent's behavior. are the performances of the agent in future episodes. is the sum of , plus the reward obtained in the current episode.
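
A toy sketch of the gap between a myopic gradient step and a full one, under assumptions that are not in the post: a scalar $x$, a made-up quadratic reward, and the "not thought about" parameter $y$ simply equal to $x$. The myopic agent follows $\partial R/\partial x$ with $y$ held fixed, even though $y$ then moves with $x$; the non-myopic agent follows the total derivative through $y(x)$. They settle at different points, much like a Nash equilibrium versus a joint optimum.

```python
def reward(x, y):
    # Toy reward (assumption): the agent wants x near 1 and y near 2.
    return -(x - 1.0) ** 2 - (y - 2.0) ** 2


def y_of_x(x):
    # Toy assumption: the parameters the agent isn't thinking about just track x.
    return x


def myopic_gradient(x, h=1e-6):
    """dR/dx with y held fixed at y(x): the agent ignores its effect on y."""
    y = y_of_x(x)
    return (reward(x + h, y) - reward(x - h, y)) / (2 * h)


def full_gradient(x, h=1e-6):
    """Total derivative of R(x, y(x)), including the path through y."""
    return (reward(x + h, y_of_x(x + h)) - reward(x - h, y_of_x(x - h))) / (2 * h)


def ascend(gradient, x=0.0, learning_rate=0.1, steps=500):
    for _ in range(steps):
        x += learning_rate * gradient(x)  # y is recomputed as y(x) at every step
    return x


myopic_x = ascend(myopic_gradient)  # settles near x = 1, where dR/dx = 0 with y fixed
full_x = ascend(full_gradient)      # settles near x = 1.5, the maximum of R(x, x)
print(myopic_x, full_x)
```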

Partial Agency due to Uncertainty?

Is it possible to cast partial agency in terms of uncertainty over reward functions? One reason I'd be myopic is if I didn't believe that I could, in expectation, improve some part of the reward, perhaps because it's intractable to calculate (behavior of other agents) or something I'm not programmed to care about (reward in other episodes).

Let be drawn from a probability distribution over reward functions. Could one then decompose the true, uncertain reward into defined in such a way that for any ? Then this would be myopia where the agent either doesn't know or doesn't care about , or at least doesn't know or care what its output does to . This seems sufficient, but not necessary.

Now I have two things that might describe myopia, so let's use both of them at once! Since you only end up doing gradient descent on , it would make sense to say , , and hence that .

Since for small , this means that , so substituting in my expression for gives , so . Uncertainty is only over , so this is just the claim that the agent will be myopic with respect to if . So it won't want to include in its gradient calculation if it thinks the gradients with respect to are, on average, 0. Well, at least I didn't derive something obviously false!

But Wait There's More

When writing the examples for the gradient descenty formalisation, something struck me: it seems there's a structure to a lot of them, where is the reward on the current episode, and are rewards obtained on future episodes.

You could maybe even use this to have soft episode boundaries, like say the agent receives a reward on each timestep so , and saying that so that for , which is basically the criterion for myopia up above.

Unrelated Note

On a completely unrelated note, I read the Parable of Predict-O-Matic in the past, but foolishly neglected to read Partial Agency beforehand. The only thing that I took away from PoPOM the first time around was the bit about inner optimisers, coincidentally the only concept introduced that I had been thinking about beforehand. I should have read the manga before I watched the anime.

Comment by tetraspace-grouping on Open & Welcome Thread—May 2020 · 2020-05-26T09:48:57.662Z · score: 3 (3 votes)

The Whole City is Center:

This story had a pretty big impact on me and made me try to generate examples of things that could happen such that I would really want the perpetrators to suffer, even more than consequentialism demanded. I may have turned some very nasty and imaginative parts of my brain, the ones that wrote the Broadcast interlude in Unsong, to imagining crimes perfectly calculated to enrage me. And in the end I did it. I broke my brain to the point where I can very much imagine certain things that would happen and make me want the perpetrator to suffer – not infinitely, but not zero either.

Comment by tetraspace-grouping on A game designed to beat AI? · 2020-05-07T22:05:33.139Z · score: 2 (2 votes)

The AI Box game, in contrast with the thing it's a metaphor for, is a two-player game played over text chat by two humans where the goal is for Player A to persuade Player B to let them win (traditionally by getting them to say "I let you out of the box"), within a time limit.

Comment by tetraspace-grouping on Tetraspace Grouping's Shortform · 2020-05-02T10:09:48.008Z · score: 5 (3 votes)

Thoughts on Dylan Hadfield-Menell et al.'s The Off-Switch Game.

  • I don't think it's quite right to call this an off-switch - the model is fully general to the situation where the AI is choosing between two alternatives A and B (normalized in the paper so that U(B) = 0), and to me an off-switch is a hardware override that the AI need not want you to press.
  • The wisdom to take away from the paper: An AI will voluntarily defer to a human - in the sense that the AI thinks that it can get a better outcome by its own standards if it does what the human says - if it's uncertain about the utilities, or if the human is rational.
  • This whole setup seems to be somewhat superseded by CIRL, which has the AI, uh, causally find by learning its value from the human actions, instead of evidentially(?) doing it by taking decisions that happen to land it on action A when is high because it's acting in a weird environment where a human is present as a side-constraint.
    • Could some wisdom to gain be that the high-variance high-human-rationality is something of an explanation as to why CIRL works? I should read more about CIRL to see if this is needed or helpful and to compare and contrast etc.
  • Why does the reward gained drop when uncertainty is too high? Because the prior that the AI gets from estimating the human reward is more accurate than the human decisions, so in too-high-uncertainty situations it keeps mistakenly deferring to the flawed human who tells it to take the worse action more often?
    • The verbal description, that the human just types in a noisily sampled value of , is somewhat strange - if the human has explicit access to their own utility function, they can just take the best actions directly! In practice, though, the AI would learn this by looking at many past human actions (there's some CIRL!) which does seem like it plausibly gives a more accurate policy than the human's (ht Should Robots Be Obedient).
    • The human is Boltzmann-rational in the two-action situation (hence the sigmoid). I assume that it's the same for the multi-action situation, though this isn't stated. How much does the exact way in which the human is irrational matter for their results?
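
A Monte Carlo sketch of the deferral comparison, under my own simplifying assumptions rather than the paper's exact setup: the AI's belief about $U_A$ is Gaussian, $U_B = 0$, and the human permits A with probability $\sigma(\beta U_A)$, where $\beta$ is a hypothetical name for the human's rationality. Deferring is then worth $\mathbb{E}[U_A\,\sigma(\beta U_A)]$, acting directly is worth $\mathbb{E}[U_A]$, and switching off is worth 0.

```python
import math
import random


def sigmoid(z):
    # Split by sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def option_values(mu, sd, beta, n=100_000, seed=0):
    """Estimate the AI's expected utility for acting, switching off, and deferring.

    Assumes (not from the paper) a Gaussian belief U_A ~ N(mu, sd^2), U_B = 0,
    and a human who permits A with probability sigmoid(beta * U_A).
    """
    rng = random.Random(seed)
    samples = [rng.gauss(mu, sd) for _ in range(n)]
    act = sum(samples) / n
    defer = sum(u * sigmoid(beta * u) for u in samples) / n
    return {"act": act, "switch_off": 0.0, "defer": defer}


rational = option_values(0.5, 1.0, beta=10.0)  # nearly rational human
noisy = option_values(0.5, 1.0, beta=0.1)      # nearly random human
print(rational, noisy)
```

With a near-rational human, deferring beats both alternatives; with a near-random human, the AI does better by acting on its own belief.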

Comment by tetraspace-grouping on Tetraspace Grouping's Shortform · 2020-04-19T23:38:15.083Z · score: 13 (5 votes)

PMarket Maker

Just under a month ago, I said "web app idea: one where you can set up a play-money prediction market with only a few clicks", because I was playing around on Hypermind and wishing that I could do my own Hypermind. It then occurred to me that I can make web apps, so after getting up to date on modern web frameworks I embarked on creating such a site.

Anyway, it's now complete enough to use, provided that you don't blow on it too hard. Here it is: Enjoy!

You can create a market, and then create a set of options within that market. Players can make buy and sell limit orders on those options. You can close an option and pay out a specific amount per owned share. There are no market makers, despite the pun in the name, but players start with 1000 internet points that they can use to shortsell.
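
A minimal sketch of how buy and sell limit orders on a single option could be matched, using price-time priority and filling at the resting order's price. This is a hypothetical illustration, not PMarket Maker's actual code; `Order`, `OptionBook`, and the trade tuple layout are made-up names.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    player: str
    side: str      # "buy" or "sell"
    price: float   # limit price in internet points per share
    quantity: int


@dataclass
class OptionBook:
    bids: list = field(default_factory=list)  # resting buy orders
    asks: list = field(default_factory=list)  # resting sell orders

    def place(self, order):
        """Match a new limit order against the book and rest any unfilled remainder.

        Returns a list of (buyer, seller, price, quantity) trades.
        """
        if order.side == "buy":
            opposite, own = self.asks, self.bids
            opposite.sort(key=lambda o: o.price)   # cheapest ask first (stable: oldest first on ties)
            crosses = lambda o: o.price <= order.price
        else:
            opposite, own = self.bids, self.asks
            opposite.sort(key=lambda o: -o.price)  # highest bid first
            crosses = lambda o: o.price >= order.price
        trades = []
        while order.quantity > 0 and opposite and crosses(opposite[0]):
            best = opposite[0]
            qty = min(order.quantity, best.quantity)
            buyer, seller = ((order.player, best.player) if order.side == "buy"
                             else (best.player, order.player))
            trades.append((buyer, seller, best.price, qty))
            order.quantity -= qty
            best.quantity -= qty
            if best.quantity == 0:
                opposite.pop(0)
        if order.quantity > 0:
            own.append(order)
        return trades


book = OptionBook()
book.place(Order("alice", "sell", 0.6, 10))
print(book.place(Order("bob", "buy", 0.7, 4)))
```

Closing an option would then pay each holder the chosen amount per owned share; that bookkeeping is left out here.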

Comment by tetraspace-grouping on Tetraspace Grouping's Shortform · 2020-04-10T03:01:43.806Z · score: 12 (3 votes)

Thoughts on Ryan Carey's Incorrigibility in the CIRL Framework (I am going to try to post these semi-regularly).

  • This specific situation looks unrealistic. But it's not really trying to be too realistic, it's trying to be a counterexample. In that spirit, you could also just use , which is a reward function parametrized by that gives the same behavior but stops me from saying "Why Not Just set ", which isn't the point.
    • How something like this might actually happen: you try to have your be a complicated neural network that can approximate any function. But you butcher the implementation and get something basically random instead, and this cannot approximate the real human reward.
  • An important insight this highlights well: An off-switch is something that you press only when you've programmed the AI badly enough that you need to press the off-switch. But if you've programmed it wrong, you don't know what it's going to do, including, possibly, its off-switch behavior. Make sure you know under which assumptions your off-switch will still work!
  • Assigning high value to shutting down is incorrigible, because the AI shuts itself down. What about assigning high value to being in a button state?
  • The paper considers a situation where the shutdown button is hardcoded, which isn't enough by itself. What's really happening is that the human either wants or doesn't want the AI to shut down, which sounds like a term in the human reward that the AI can learn.
    • One way to do this is for the AI to do maximum likelihood with a prior that assigns 0 probability to the human erroneously giving the shutdown command. I suspect there's something less hacky related to setting an appropriate prior over the reward assigned to shutting down.
  • The footnote on page 7 confuses me a bit - don't you want the AI to always defer to the human in button states? The answer feels like it will be clearer to me if I look into how "expected reward if the button state isn't avoided" is calculated.
    • Also I did just jump into this paper. There are probably lots of interesting things that people have said about MDPs and CIRLs and Q-values that would be useful.

Comment by tetraspace-grouping on Blog Post Day II Retrospective · 2020-04-01T01:46:00.735Z · score: 5 (3 votes)

I'm interested in participating in a Blog Post Day III! And I approve of one this month, mostly out of a self-interested regret that I missed out on Blog Post Day II.

Comment by tetraspace-grouping on Habryka's Shortform Feed · 2020-01-02T03:18:39.116Z · score: 10 (3 votes)

Since this hash is publicly posted, is there any timescale for when we should check back to see the preimage?

Comment by tetraspace-grouping on Tetraspace Grouping's Shortform · 2020-01-02T03:08:52.116Z · score: 6 (4 votes)

Life 3.0 Liveblog/Review Thread


The prologue begins with a short story called the Tale of the Omega Team. It's a wish-fulfilment pseudo-isekai about a bunch of effective altruist tech people working for not-Google called the Omegas who make an AGI and then use it to take over the world.

But a cybersecurity specialist on their team talked them out of the game plan [...] risk of Prometheus breaking out and seizing control of its own destiny [...] weren't sure how its goals would evolve [...] go to great lengths to keep Prometheus confined

For some reason, the Omegas in the story claim that Prometheus (the AI) might be unsafe, and then proceed to do things like have it write software which they then run on computers, let it produce long pieces of animated media, and let it send blueprints of technologies to scientists. There is a cybersecurity expert in the team who just barely stops them from straight up leaving the whole thing unboxed, and I do not envy her job position.

(Prometheus is safe, it turns out, which I can tell because there are humans alive at the end of the story.)

[...] Omega-controlled [...] controlled by the Omegas [...] the Omegas harnessed Prometheus [...] the Omegas' [...] the Omegas' [...]

There's also another odd thing where it says that the Omegas are using Prometheus as a tool to do things, instead of what's clearly actually happening which is that Prometheus is achieving its goals with the Omegas being some lumps of atoms that it's been pushing around according to its whims, as it has been since they decided to switch it on.

All-in-all, I like it. It wouldn't be out of place on r/rational, if wish-fulfillment pseudo-isekai does happen then AGI sweeping aside the previous social order will be how (a real AGI would come close to some of the capabilities I've seen those protagonists have), and fiction about more plausible robopocalypses (or roboutopias) coming about is always great.

Comment by tetraspace-grouping on A Critique of Functional Decision Theory · 2019-12-24T22:31:02.097Z · score: 1 (1 votes)

The note is just set-dressing; you could have both the boxes have glass windows that let you see whether or not they contain a Bomb for the same conclusions if it throws you off.

Comment by tetraspace-grouping on Tetraspace Grouping's Shortform · 2019-12-23T23:51:07.425Z · score: 9 (2 votes)

In the Parable of Predict-O-Matic, a subnetwork of the titular Predict-O-Matic becomes a mesa-optimiser and begins steering the future towards its own goals, independently of the rest of Predict-O-Matic. It does so in a way that sabotages the other subnetworks.

I am reminded of one specification problem that a run of Eurisko faced:

During one run, Lenat noticed that the number in the Worth slot of one newly discovered heuristic kept rising, indicating that Eurisko had made a particularly valuable find. As it turned out the heuristic performed no useful function. It simply examined the pool of new concepts, located those with the highest Worth values, and inserted its name in their My Creator slots.

One thing I wondered is whether this could happen in humans, and if not, why it doesn't. A simplified description of memory that I learned in a flash game is that "neural connections" are "strengthened" whenever they are "used", which sounds sort of like gradients in RL if you don't think about it too hard. Maybe the analogue of this would be some memory that "wants" you to remember it repeatedly at the expense of other memories. Trauma?

Comment by tetraspace-grouping on ozziegooen's Shortform · 2019-12-23T22:06:40.611Z · score: 2 (3 votes)

Other things that Tim might mean when he says 20%:

  • Tim is being dishonest, and believes that the listeners will update away from the radical and low-status figure of 20% to avoid being associated with the lowly Tim.
  • Tim believes that other listeners will be encouraged to make their own probability estimates with explicit reasoning in response, which will make their expertise more legible to Tim and other listeners.
  • Tim wants to show cultural allegiance with the Superforecasting tribe.

Comment by tetraspace-grouping on Should We Still Fly? · 2019-12-22T12:40:42.703Z · score: 2 (2 votes)

Quick estimate: the global average is 4.8 tons of CO2 per person per year, which works out to ~$50 of additional offsets per year per life saved, or ~$1500 in total over 30 additional years of life. So if you're buying offsets, the cost of offsetting an average person's extra emissions over the course of saving their life is of the same order as the cost of saving a life via a GiveWell charity (~half).

For the people helped by GiveWell-recommended charities, the additional CO2 emissions are probably lower; among the world's poorest, <1 ton of CO2 per capita per year is pretty common, which is <$300 over those 30 years, about an order of magnitude less than the cost of saving a life.
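
The arithmetic behind both estimates, as a quick check. Two inputs are assumptions not stated in the comment: an offset price of ~$10 per tonne (which is what turns 4.8 t/yr into ~$50/yr) and a cost per life saved of ~$3,000 (inferred from "~half").

```python
# Assumptions (inferred, not stated): ~$10/tonne offsets and ~$3,000 per life saved.
OFFSET_PRICE_PER_TONNE = 10.0
EXTRA_YEARS_OF_LIFE = 30
COST_PER_LIFE_SAVED = 3000.0


def lifetime_offset_cost(tonnes_per_year):
    """Cost of offsetting a saved person's emissions over their extra years of life."""
    return tonnes_per_year * OFFSET_PRICE_PER_TONNE * EXTRA_YEARS_OF_LIFE


for label, tonnes in [("global average", 4.8), ("world's poorest", 1.0)]:
    cost = lifetime_offset_cost(tonnes)
    share = cost / COST_PER_LIFE_SAVED
    print(f"{label}: ${cost:,.0f}, {share:.0%} of the cost of saving a life")
```

Under these assumptions the global-average case comes to about $1,440 (roughly half), and the 1 t/yr case to $300 (about a tenth).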

Comment by tetraspace-grouping on Tetraspace Grouping's Shortform · 2019-12-12T21:18:18.607Z · score: 3 (2 votes)

Over the past few days I've been reading about reinforcement learning, because I understood how to make a neural network, say, recognise handwritten digits, but I wasn't at all sure how that could be turned into getting a computer to play Atari games. So: what I've learned so far. Spinning Up's Intro to RL probably explains this better.

(Brief summary, explained properly below: The agent is a neural network which runs in an environment and receives a reward. Each parameter in the neural network is increased in proportion to how much it increases the probability of making the agent do what it just did, and how good the outcome of what the agent just did was.)

Reinforcement learners play inside a game involving an agent and an environment. On turn $t$, the environment hands the agent an observation $o_t$, and the agent hands the environment an action $a_t$. For an agent acting in realtime, there can be sixty turns a second; this is fine.

The environment has a transition function $T$ which takes an observation-action pair and responds with a probability distribution over observations on the next timestep, $T(o_{t+1} \mid o_t, a_t)$; the agent has a policy that takes an observation and responds with a probability distribution over actions to take, $\pi(a_t \mid o_t)$.

The policy is usually written as $\pi$, and the probability that $\pi$ outputs an action $a$ in response to an observation $o$ is $\pi(a \mid o)$. In practice, $\pi$ is usually a neural network that takes observations as input and has actions as output (using something like a softmax layer to give a probability distribution); the parameters of this neural network are $\theta$, and the corresponding policy is $\pi_\theta$.

At the end of the game, the entire trajectory $\tau = (o_0, a_0, o_1, a_1, \ldots)$ is assigned a score, $R(\tau)$, measuring how well the agent has done. The goal is to find the policy that maximises this score.

Since we're using machine learning to maximise, we should be thinking of gradient descent (or rather ascent, since we want to go up), which involves finding the local direction in which to change the parameters $\theta$ in order to increase the expected value of $R(\tau)$ by the greatest amount, and then increasing them slightly in that direction.

In other words, we want to find $\nabla_\theta \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)]$.

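Writing the expectation value in terms of a sum over trajectories, this is $\nabla_\theta \sum_{\tau \in \mathcal{T}} P(\tau \mid \theta)\, R(\tau) = \sum_{\tau \in \mathcal{T}} R(\tau)\, \nabla_\theta P(\tau \mid \theta)$, where $P(\tau \mid \theta)$ is the probability of observing the trajectory $\tau$ if the agent follows the policy $\pi_\theta$, and $\mathcal{T}$ is the space of possible trajectories.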

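The probability of seeing a specific trajectory happen is the product of the probabilities of each individual step on the trajectory happening, and is hence $P(\tau \mid \theta) = \prod_t \pi_\theta(a_t \mid o_t)\, T(o_{t+1} \mid o_t, a_t)$, where $T(o_{t+1} \mid o_t, a_t)$ is the probability that the environment outputs the observation $o_{t+1}$ in response to the observation-action pair $(o_t, a_t)$. Products are awkward to work with, but products can be turned into sums by taking the logarithm: $\log P(\tau \mid \theta) = \sum_t \left[\log \pi_\theta(a_t \mid o_t) + \log T(o_{t+1} \mid o_t, a_t)\right]$.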

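The gradient of this is $\nabla_\theta \log P(\tau \mid \theta) = \sum_t \left[\nabla_\theta \log \pi_\theta(a_t \mid o_t) + \nabla_\theta \log T(o_{t+1} \mid o_t, a_t)\right]$. But what the environment does is independent of $\theta$, so that entire term vanishes, and we have $\nabla_\theta \log P(\tau \mid \theta) = \sum_t \nabla_\theta \log \pi_\theta(a_t \mid o_t)$. The gradient of the policy is quite easy to find: our policy is just a neural network, so you can use back-propagation.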
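
For a softmax output layer, the gradient of $\log \pi_\theta(a \mid o)$ with respect to the logits has a simple closed form: one-hot($a$) minus the vector of action probabilities. Here is a small sketch that checks this against finite differences. It assumes, for illustration only, that $\theta$ is just the logits themselves (a lookup table rather than a real network); with a real network, back-propagation chains this through the layers.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_softmax(logits, action):
    """Analytic gradient of log softmax(logits)[action] w.r.t. the logits."""
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def numeric_grad_log_softmax(logits, action, h=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grad = []
    for i in range(len(logits)):
        up = list(logits)
        up[i] += h
        down = list(logits)
        down[i] -= h
        diff = math.log(softmax(up)[action]) - math.log(softmax(down)[action])
        grad.append(diff / (2 * h))
    return grad


logits = [0.3, -1.2, 2.0]
analytic = grad_log_softmax(logits, action=1)
numeric = numeric_grad_log_softmax(logits, action=1)
print(analytic)
print(numeric)
```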

Our expression for the expectation value is just in terms of the gradient of the probability, not the gradient of the logarithm of the probability, so we'd like to express one in terms of the other.

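Conveniently, the chain rule gives $\nabla_\theta \log P(\tau \mid \theta) = \frac{\nabla_\theta P(\tau \mid \theta)}{P(\tau \mid \theta)}$, so $\nabla_\theta P(\tau \mid \theta) = P(\tau \mid \theta)\, \nabla_\theta \log P(\tau \mid \theta)$. Substituting this back into the original expression for the gradient gives

$$\nabla_\theta \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)] = \sum_{\tau \in \mathcal{T}} P(\tau \mid \theta)\, R(\tau)\, \nabla_\theta \log P(\tau \mid \theta)$$
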
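and substituting our expression for the gradient of the logarithm of the probability gives

$$\nabla_\theta \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)] = \sum_{\tau \in \mathcal{T}} P(\tau \mid \theta)\, R(\tau) \sum_t \nabla_\theta \log \pi_\theta(a_t \mid o_t)$$
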
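Notice that this is the definition of the expectation value of $R(\tau) \sum_t \nabla_\theta \log \pi_\theta(a_t \mid o_t)$, so writing the sum as an expectation value again we get

$$\nabla_\theta \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)] = \mathbb{E}_{\tau \sim \pi_\theta}\left[R(\tau) \sum_t \nabla_\theta \log \pi_\theta(a_t \mid o_t)\right]$$
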
You can then find this expectation value easily by sampling a large number of trajectories (by running the agent in the environment many times), calculating the term inside the brackets, and then averaging over all of the runs.
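
A minimal, dependency-free sketch of this sampling estimator (the basic "vanilla policy gradient", also known as REINFORCE), under toy assumptions that are not in the original: a single observation that the policy ignores, two actions, episodes of three steps, and a score $R(\tau)$ equal to the number of times action 0 was taken. For this toy the exact gradient at $\theta = (0, 0)$ is $(0.75, -0.75)$, so the sample average can be checked against it before being used for gradient ascent.

```python
import math
import random

EPISODE_LENGTH = 3  # toy assumption
N_ACTIONS = 2


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(theta, rng):
    """Sample a trajectory under pi_theta; return (R(tau), actions taken)."""
    probs = softmax(theta)  # the toy policy ignores the observation
    actions = [rng.choices(range(N_ACTIONS), weights=probs)[0]
               for _ in range(EPISODE_LENGTH)]
    score = actions.count(0)  # R(tau): how often action 0 was chosen
    return score, actions


def grad_log_pi(theta, action):
    """grad_theta log pi_theta(action) for logits theta: one_hot(action) - probs."""
    probs = softmax(theta)
    return [(1.0 if b == action else 0.0) - probs[b] for b in range(N_ACTIONS)]


def estimate_policy_gradient(theta, n_episodes, rng):
    """Average of R(tau) * sum_t grad log pi_theta(a_t | o_t) over sampled episodes."""
    total = [0.0] * N_ACTIONS
    for _ in range(n_episodes):
        score, actions = run_episode(theta, rng)
        for a in actions:
            for b, g in enumerate(grad_log_pi(theta, a)):
                total[b] += score * g
    return [t / n_episodes for t in total]


def train(steps=200, n_episodes=100, learning_rate=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * N_ACTIONS
    for _ in range(steps):
        grad = estimate_policy_gradient(theta, n_episodes, rng)
        # Gradient ascent, since we are maximising the expected score.
        theta = [t + learning_rate * g for t, g in zip(theta, grad)]
    return theta


print(estimate_policy_gradient([0.0, 0.0], 20_000, random.Random(1)))
print(softmax(train()))
```

Note that the transition probabilities never appear in the code, matching the step in the derivation where the environment term drops out of the gradient.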


(More sophisticated RL algorithms apply various transformations to the reward to use information more efficiently, and use various gradient descent tricks to use the gradients acquired to converge on the optimal parameters more efficiently)

Comment by tetraspace-grouping on Grue_Slinky's Shortform · 2019-10-01T10:36:48.703Z · score: 1 (1 votes)

Are we allowed to I-am-Groot the word "cake" to encode several bits per word, or do we have to do something like repeat "cake" until the primes that it factors into represent a desired binary string?

(edit: ah, only nouns, so I can still use whatever I want in the other parts of speech. or should I say that the naming cakes must be "cake", and that any other verbal cake may be whatever this speaking cake wants)

Comment by tetraspace-grouping on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-28T01:09:36.183Z · score: 2 (2 votes)

Dank EA Memes is a Facebook group. It's pretty good.

Comment by tetraspace-grouping on Follow-Up to Petrov Day, 2019 · 2019-09-28T00:59:41.532Z · score: 19 (10 votes)

If anyone asks, I entered a code that I knew was incorrect as a precommitment to not nuke the site.

Comment by tetraspace-grouping on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-27T00:38:46.380Z · score: 30 (9 votes)

To make sure I have this right and my LW isn't glitching: TurnTrout's comment is a Drake meme, and the two other replies in this chain are actually blank?

Comment by tetraspace-grouping on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-27T00:35:05.499Z · score: 12 (4 votes)

Well, at least we have a response to the doubters' "why would anyone even press the button in this situation?"

Comment by tetraspace-grouping on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-26T23:37:24.075Z · score: 17 (7 votes)


Clicking on the button permanently switches it to a state where it's pushed-down, below which is a prompt to enter launch codes. When moused over, the pushed-down button has the tooltip "You have pressed the button. You cannot un-press it." Screenshot.

(On an unrelated note, on r/thebutton I have a purple flair that says "60s".)

Upon entering a string longer than 8 characters, a button saying "launch" appears below the big red button. Screenshot.


I'm nowhere near the PST timezone, so I wouldn't be able to reliably pull a shenanigan whereby if I had the launch codes I would enter or not enter them depending on the amount of counterfactual money pledged to the Ploughshares Fund in the name of either launch-code-entry-state, but this sentence is not apophasis.


Conspiracy theory: There are no launch codes. People who claim to have launch codes are lying. The real test is whether people will press the button at all. I have failed that test. I came up with this conspiracy theory ~250 milliseconds after pressing the button.

IV. (Update)

I can no longer see the button when I am logged in. Could this mean that I have won?

Comment by tetraspace-grouping on Novum Organum: Preface · 2019-09-24T01:44:38.049Z · score: 16 (3 votes)

At the start of the Sequences, you are told that rationality is a martial art, used to amplify the power of the unaided mind in the same way that a martial art doesn't necessarily make you stronger but just lets you use your body properly.

Bacon, on the other hand, throws the prospect of using the unaided mind right out; Baconian rationality is a machine, like a pulley or a lever, where you apply your mind however feebly to one end and by its construction the other end moves a great distance or applies a great force (either would do for the metaphor).

If I have my history right, Bacon's machine is Science. Its function is to accumulate a huge mountain of evidence, so big that even a human could be persuaded by it, and instruction in the use of science is instruction in being persuaded by that mountain of evidence. Philosophers of old simply ignored the mountain of evidence (failed to use the machine) and maybe relied on syllogisms and definitions and hence failed to move the stone column.

And later, with the aid of Bacon's machine, one discovers that you don't really need this huge mountain of evidence or the systematic stuff: an ideal reasoner could simply perform a Bayesian update on each bit that comes in and get to the truth way faster, avoiding the slowness and the mistakes that come if you insist on setting up the machine every single time. At your own risk, of course - get your stance slightly wrong lifting a stone column, and you throw your back out.

Comment by tetraspace-grouping on A Critique of Functional Decision Theory · 2019-09-15T14:43:55.051Z · score: 6 (4 votes)

An agent also faces a guaranteed payoffs problem in Parfit's hitchhiker, since the driver has already made their prediction (the agent knows they're safe in the town), so the agent's choice is between losing $1,000 and losing $0. Is it also a bad idea for the agent to pay the $1,000 in this problem?

Comment by tetraspace-grouping on ozziegooen's Shortform · 2019-09-10T15:09:12.386Z · score: 1 (1 votes)

There's something of a problem with sensitivity; if the x-risk from AI is ~0.1, and the difference in x-risk from some grant is ~10^-6, then any difference in the forecasts is going to be completely swamped by noise.

(while people in the market could fix any inconsistency between the predictions, they would only be able to look forward to 0.001% returns over the next century)
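
The 0.001% figure is just the effect size divided by the baseline: a trader who corrects a mispricing of about $10^{-6}$ on a contract trading near 0.1 gains at most about $10^{-5}$ of their stake. A trivial check, with a hypothetical price tick of 0.001 added (an assumption, not from the comment) to show how far below typical forecast resolution the effect sits:

```python
# Numbers from the comment: baseline AI x-risk ~0.1, grant effect ~1e-6.
baseline_risk = 0.1
grant_effect = 1e-6

# Best-case relative return from moving the price by the grant's effect.
max_relative_return = grant_effect / baseline_risk
print(f"{max_relative_return:.3%}")

# Hypothetical tick size (assumption): how many grant-sized effects fit in one tick.
tick = 0.001
effects_per_tick = tick / grant_effect
print(effects_per_tick)
```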

Comment by tetraspace-grouping on Open & Welcome Thread - September 2019 · 2019-09-06T21:43:49.344Z · score: 1 (1 votes)

Is the issue that it's pain-based and hence makes my life worse (probably false for me: maths is fun and gives me a sense of pride and accomplishment when I do it, it's just that darn System 1 always saying "better for you if you play Kerbal Space Program"), or that social punishment isn't always available and therefore ought not to be relied on (this is probably an issue for me), or some third thing?

Comment by tetraspace-grouping on Open & Welcome Thread - September 2019 · 2019-09-04T20:17:40.376Z · score: 6 (4 votes)

Previously: August.

Dear Diary,

In the intervening month I have done chapters 8 and 9 of Tao's Analysis I, which feels terribly slow. Two chapters in a month? I could do the whole book in that time if I tried! And I know that I can because I have, like I'm getting a physics degree and it definitely feels like I've done at least one textbook worth of learning per term.

One of the active ingredients seems to be time pressure, which is present but not salient here - if I fail, all that happens is the wrong math is deployed to steer the future of the lightcone, which doesn't hold a candle to me losing a little bit of status. Ah, to be a brain.

Thus: by October I'll have finished Analysis I; think less of me if I haven't.

(And perhaps I'll have done even more!)

UPDATE SEP 26: You can rest easy now; I have completed the book.

Comment by tetraspace-grouping on I think I came up with a good utility function for AI that seems too obvious. Can you people poke holes in it? · 2019-08-30T15:34:21.095Z · score: 1 (1 votes) · LW · GW

This AI wouldn't be trying to convince a human to help it, just that it's going to succeed.

So instead of convincing humans that a hell-world is good, it would convince the humans that it was going to create a hell-world (and they would all disapprove, so it would score low).

I think what this ends up doing is having everyone agree with a world that sounds superficially good but is actually terrible in a way that's difficult for unaided humans to realize e.g. the AI convinces everyone that it will create an idyllic natural world where people live forager lifestyles in harmony etc. etc., everyone approves because they like nature and harmony and stuff, it proceeds to create such an idyllic natural world, and wild animal suffering outweighs human enjoyment forevermore.

Comment by tetraspace-grouping on I think I came up with a good utility function for AI that seems too obvious. Can you people poke holes in it? · 2019-08-29T13:14:00.142Z · score: 4 (3 votes) · LW · GW

One thing I'd be concerned about is that there are a lot of possible futures that sound really appealing, and that a normal human would sign off on, but are actually terrible (similar concept: siren worlds).

For example, in a world of Christians the AI would score highly on a future where they get to eternally rest and venerate God, which would get really boring after about five minutes. In a world of Rationalists the AI would score highly on a future where they get to live on a volcano island with catgirls, which would also get really boring after about five minutes.

There are potentially lots of futures like this (that might work for a wider range of humans), and because the metric (inferred approval after it's explained) is different from the goal (whether the future is good) and there's optimisation pressure increasing with the number of futures considered, I would expect it to be Goodharted.

Some possible questions this raises:

  • On futures: I can't store the entire future in my head, so the AI would have to only describe some features. Which features? How to avoid the selection of features determining the outcome?
  • On people: What if the future involves creating new people, who most people currently would want to live in that future? What about animals? What about babies?

Comment by tetraspace-grouping on Tetraspace Grouping's Shortform · 2019-08-26T19:15:41.503Z · score: 2 (2 votes) · LW · GW

Here are three statements I believe with a probability of about 1/9:

  • The two 6-sided dice on my desk, when rolled, will add up to 5.
  • An AI system will kill at least 10% of humanity before the year 2100.
  • Starvation was a big concern in ancient Rome's prime (claim borrowed from Elizabeth's Epistemic Spot Check post).

Except I have some feeling that the "true probability" of the dice question is pretty much bang on exactly 1/9, but that the "true probability" of the Rome and AI xrisk questions could be quite far from 1/9 and to say the probability is precisely 1/9 seems... overconfident?
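The dice figure can be checked by brute force, which is part of why it feels so well pinned down. This minimal sketch assumes two fair, independent six-sided dice:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
favourable = [o for o in outcomes if sum(o) == 5]

p_five = Fraction(len(favourable), len(outcomes))
print(favourable)  # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(p_five)      # 1/9
```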

From a straightforward Bayesian point of view, there is no true probability. It's just my subjective degree of belief! I'd be willing to make a bet at 8/1 odds on any of these, but not at worse odds, and that's all there really is to say on the matter. It's the number I multiply by the utilities of the outcomes to make decisions.
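For a degree of belief of 1/9, a bet at 8/1 odds is exactly the break-even point. The following sketch converts a probability to fair odds against and checks the expected value of a 1-unit stake. The helper names are illustrative only.

```python
from fractions import Fraction

def fair_odds_against(p: Fraction) -> Fraction:
    """Odds against an event (payout per unit staked) at which a bet on it breaks even."""
    return (1 - p) / p

def expected_value(p: Fraction, odds: Fraction, stake: Fraction = Fraction(1)) -> Fraction:
    """Win odds * stake with probability p, lose the stake otherwise."""
    return p * odds * stake - (1 - p) * stake

p = Fraction(1, 9)
print(fair_odds_against(p))             # 8
print(expected_value(p, Fraction(8)))   # 0
print(expected_value(p, Fraction(7)))   # -1/9 (worse odds lose money on average)
```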

One thing you could do is imagine a set of hypotheses that I have that involve randomness, and then I have a probability distribution over which of these hypotheses is the true one, and by mapping each hypothesis to the probability it assigns to the outcome, my probability distribution over hypotheses becomes a probability distribution over probabilities. This is sharply peaked around 1/9 for the dice rolls, and widely spread around 1/9 for AI xrisk, as expected, so I can report 50% confidence intervals just fine. Except sensible hypotheses about historical facts probably wouldn't be random, because either starvation was important or it wasn't, that's just a true thing that happens to exist in my past, maybe.
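Here is a minimal sketch of that construction. The hypothesis sets are made up for illustration and are not anyone's actual beliefs. Each hypothesis gives the outcome some probability, and the credence-weighted mean recovers the overall 1/9 in both cases. The spread is what tells the dice apart from the AI question.

```python
from fractions import Fraction

def summarise(hypotheses):
    """hypotheses: list of (credence in the hypothesis, probability it gives the outcome).
    Returns the overall probability and a crude 50% interval (25th to 75th percentile)."""
    assert sum(w for w, _ in hypotheses) == 1, "credences must sum to 1"
    overall = sum(w * p for w, p in hypotheses)

    def quantile(q):
        # Smallest probability at which the cumulative credence reaches q.
        cumulative = Fraction(0)
        for w, p in sorted(hypotheses, key=lambda h: h[1]):
            cumulative += w
            if cumulative >= q:
                return p
        raise ValueError("q must be at most 1")

    return overall, (quantile(Fraction(1, 4)), quantile(Fraction(3, 4)))

# Illustrative numbers only. Dice: almost all credence on "the dice are fair".
dice = [(Fraction(98, 100), Fraction(1, 9)),
        (Fraction(1, 100), Fraction(1, 10)),
        (Fraction(1, 100), Fraction(11, 90))]
# AI x-risk: credence spread over very different hypotheses with the same mean.
ai = [(Fraction(1, 4), Fraction(0)),
      (Fraction(1, 4), Fraction(1, 18)),
      (Fraction(1, 4), Fraction(1, 6)),
      (Fraction(1, 4), Fraction(2, 9))]

for name, hyps in [("dice", dice), ("AI x-risk", ai)]:
    overall, (low, high) = summarise(hyps)
    print(f"{name}: overall {overall}, 50% interval [{low}, {high}]")
# dice: overall 1/9, 50% interval [1/9, 1/9]
# AI x-risk: overall 1/9, 50% interval [0, 1/6]
```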

I like jacobjacob's interpretation of a probability distribution over probabilities as an estimate of what your subjective degree of belief would be if you thought about the problem for longer (e.g. 10 hours). The specific time horizon seems a bit artificial (extreme case: I'm going to chat with an expert historian in 10 hours and 1 minute) but it does work and gives me the kind of results that make sense. The advantage of this is that you can quite straightforwardly test your calibration (there really is a ground truth): write down your 50% confidence interval, then actually do the 10 hours of research, and see how often the degree of belief you end up with lies inside the interval.
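Scoring that test is simple bookkeeping. The numbers in this minimal sketch are placeholders that only show the mechanics. They are not real forecasts.

```python
from dataclasses import dataclass

@dataclass
class Forecast:
    low: float             # lower end of the 50% interval, written down before researching
    high: float            # upper end of the 50% interval
    after_research: float  # degree of belief after the ~10 hours of research

def hit_rate(forecasts):
    """Fraction of forecasts whose post-research belief landed inside the interval.
    For well-calibrated 50% intervals this should be close to 0.5."""
    hits = sum(f.low <= f.after_research <= f.high for f in forecasts)
    return hits / len(forecasts)

# Placeholder numbers purely to show the bookkeeping.
example = [
    Forecast(0.05, 0.20, 0.12),
    Forecast(0.30, 0.50, 0.60),
    Forecast(0.01, 0.10, 0.04),
    Forecast(0.40, 0.70, 0.35),
]
print(hit_rate(example))  # 0.5
```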

Comment by tetraspace-grouping on Epistemic Spot Check: The Fate of Rome (Kyle Harper) · 2019-08-24T23:11:37.520Z · score: 4 (4 votes) · LW · GW

What do the probability distributions listed below the claims mean specifically?

Comment by tetraspace-grouping on Tetraspace Grouping's Shortform · 2019-08-24T18:08:33.448Z · score: 4 (2 votes) · LW · GW

Imagine two prediction markets, both with shares that give you $1 if they pay out and $0 otherwise.

One is predicting some event in the real world (and pays out if this event occurs within some timeframe) and has shares currently priced at $X.

The other is predicting the behaviour of the first prediction market. Specifically, it pays out if the price of the first prediction market exceeds an upper threshold $T before it goes below a lower threshold $R.

Is there anything that can be said in general about the price of the second prediction market? For example, it feels intuitively like if T >> X, but R is only a little bit smaller than X, then assigning a high price to shares of the second prediction market violates conservation of evidence - is this true, and can it be quantified?
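One way to quantify this rests on two assumptions that the question does not state. The first is that the first market's price is a martingale, meaning traders make its expected future price equal to its current price. The second is that the price moves in small steps rather than jumping past the thresholds. Under both, the optional stopping theorem gives the probability of reaching T before R, starting from X with R < X < T, as (X - R)/(T - R). With T >> X and R just below X, that probability is small, so a second-market price well above it would be inconsistent with the first market being calibrated. The simulation below is a minimal check of the formula. It uses a symmetric random walk in whole cents, which is only an illustrative stand-in for real price dynamics.

```python
import random

def hit_upper_first(x, r, t, trials, seed=0):
    """Estimate P(price reaches t before r), starting from x, for a symmetric
    random walk in one-cent steps (a martingale with no jumps)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        price = x
        while r < price < t:
            price += rng.choice((-1, 1))
        hits += price >= t
    return hits / trials

X, R, T = 50, 45, 90         # prices in cents: R a little below X, T far above
estimate = hit_upper_first(X, R, T, trials=5000)
exact = (X - R) / (T - R)    # optional stopping: 5/45 = 1/9
print(f"simulated {estimate:.3f}, formula {exact:.3f}")
```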

Comment by tetraspace-grouping on Time Travel, AI and Transparent Newcomb · 2019-08-22T22:51:21.588Z · score: 3 (2 votes) · LW · GW

We would also expect destroying time machines to be a convergent instrumental goal in this universe, since any agent that does this would be more likely to have been created. So by default powerful enough optimization processes would try to prevent time travel.

Comment by tetraspace-grouping on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-08-07T15:24:56.193Z · score: 5 (4 votes) · LW · GW

The counterfactual oracle can answer questions for which you can evaluate answers automatically (and might be safe because it doesn't care about being right in the case where you read the prediction, so it won't manipulate you), and the low-bandwidth oracle can answer multiple-choice questions (and might be safe because none of the multiple-choice options are unsafe).

My first thought for this is to ask the counterfactual oracle for an essay on the importance of coffee, and in the case where you don't see its answer, you get an expert to write the best essay on coffee possible, and score the oracle by the similarity between what it writes and what the expert writes. Though this only gives you human levels of performance.
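The scheme does not fix a particular similarity measure. One hypothetical choice is a bag-of-words cosine similarity between the oracle's essay and the expert's, sketched minimally here:

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; a deliberately crude representation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors, in [0, 1]."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

oracle_essay = "Coffee matters because caffeine improves alertness."
expert_essay = "Coffee is important: caffeine improves alertness and focus."
print(round(cosine_similarity(oracle_essay, expert_essay), 3))
```

A measure this crude rewards keyword overlap rather than quality. In practice the scoring function would need to be much better than this for the oracle's incentive to track essay quality at all.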

Comment by tetraspace-grouping on Open & Welcome Thread - August 2019 · 2019-08-04T23:04:35.201Z · score: 15 (9 votes) · LW · GW

I might as well post a monthly update on my doing things that might be useful for me doing AI safety.

I decided to just continue with what I was doing last year before I got distracted, and learn analysis, from Tao's Analysis I, on the grounds that it's maths which is important to know and that I will climb the skill tree analysis -> topology -> these fixed point exercises. Have done chapters 5, 6 and 7.

My question about what it would be most useful for me to be doing still stands, if anyone has any input.

Comment by tetraspace-grouping on Occam's Razor: In need of sharpening? · 2019-08-04T22:28:21.753Z · score: 11 (3 votes) · LW · GW

The formalisation used in the Sequences (and in algorithmic information theory) is that the complexity of a hypothesis is the length of the shortest computer program that can specify that hypothesis.

An illustrative example is that, when explaining lightning, Maxwell's equations are simpler in this sense than the hypothesis that Thor is angry because the shortest computer program that implements Maxwell's equations is much simpler than an emulation of a humanlike brain and its associated emotions.
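Shortest-program length (Kolmogorov complexity) is uncomputable. Compressed length, however, gives a computable upper bound on it, up to the constant cost of the decompressor, and is a common rough proxy. This minimal sketch uses Python's zlib as that stand-in. It illustrates the idea only and is not the formalisation itself.

```python
import os
import zlib

def compressed_length(data: bytes) -> int:
    """Length of the zlib-compressed data: an upper-bound proxy for description length."""
    return len(zlib.compress(data, 9))

regular = b"01" * 5000             # highly patterned: a short program could print it
random_bytes = os.urandom(10000)   # patternless: no short description expected

print(compressed_length(regular), compressed_length(random_bytes))
```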

In the case of many-worlds vs. Copenhagen interpretation, a computer program that implemented either of them would start with the same algorithm (Schrödinger's equation), but (so the claim goes) the program for Copenhagen would need an extra section specifying how collapse upon observation works, which many-worlds wouldn't need.

Comment by tetraspace-grouping on Tetraspace Grouping's Shortform · 2019-08-02T16:50:01.941Z · score: 5 (3 votes) · LW · GW

In Against Against Billionaire Philanthropy, Scott says

The same is true of Google search. I examined the top ten search results for each donation, with broadly similar results: mostly negative for Zuckerberg and Bezos, mostly positive for Gates.

With Gates' philanthropy being about malaria, Zuckerberg's being about Newark schools, and Bezos' being about preschools.

Also, as far as I can tell, Moskovitz' philanthropy is generally considered positively, though of course I would be in a bubble with respect to this. Also also, though I say this without really checking, it seems that people are pretty much all against the Sacklers' donations to art galleries and museums.

Squinting at these data points, I can kind of see a trend: people favour philanthropy that's buying utilons, and are opposed to philanthropy that's buying status. They like billionaires funding global development more than they like billionaires funding local causes, and they like them funding art galleries for the rich least of all.

Which is basically what you'd expect if people were well-calibrated and correctly criticising those who need to be taken down a peg.

Comment by tetraspace-grouping on Tetraspace Grouping's Shortform · 2019-08-02T15:01:08.524Z · score: 2 (2 votes) · LW · GW

It was inspired by yours - when I read your post I remembered that there was this thing about Solomonoff induction that I was still confused about - though I wasn't directly trying to answer your question so I made it its own thread.

Comment by tetraspace-grouping on Tetraspace Grouping's Shortform · 2019-08-02T01:37:15.009Z · score: 5 (3 votes) · LW · GW

The simplicity prior is that you should assign a prior probability 2^-L to each description of length L. This sort of makes intuitive sense, since it's what you'd get if you generated the description through a series of coinflips...

... except there are 2^L descriptions of length L, so the total prior probability you're assigning is sum(2^L * 2^-L) = sum(1) = unnormalisable.

You can kind of recover this by noticing that not all bitstrings correspond to an actual description, and for some encodings their density is low enough that it can be normalised (I think the threshold is that fewer than a 1/L fraction of the descriptions of length L are "valid")...
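The standard move in algorithmic information theory follows this line of thought. Descriptions are restricted to a prefix-free set, in which no valid description is a prefix of another, and Kraft's inequality then guarantees that the weights 2^-L sum to at most 1. This minimal sketch compares the naive sum over all bitstrings with the sum over one simple prefix-free code. The code doubles each bit and appends the terminator "01", and it is an illustrative choice only.

```python
from fractions import Fraction
from itertools import product

def all_bitstrings(max_len):
    """Every bitstring of length 1..max_len."""
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)

def self_delimiting(bits):
    """Encode a bitstring by doubling each bit ('0'->'00', '1'->'11') and ending with '01'."""
    return "".join(b * 2 for b in bits) + "01"

def weight_sum(codewords):
    return sum(Fraction(1, 2 ** len(c)) for c in codewords)

def is_prefix_free(codewords):
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

N = 6
naive = list(all_bitstrings(N))
# Include the empty string so every message of length 0..N gets a codeword.
code = [self_delimiting(s) for s in [""] + naive]

print(weight_sum(naive))     # 6 (grows without bound as N increases)
print(is_prefix_free(code))  # True
print(weight_sum(code))      # 127/256 (stays below 1, per Kraft's inequality)
```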

...but if that's the case, you're being fairly information inefficient because you could compress descriptions further, and why are you judging simplicity using such a bad encoding, and why 2^-L in that case if it doesn't really correspond to complexity properly any more? And other questions in this cluster.

I am confused (and maybe too hung up on something idiosyncratic to an intuitive description I heard).

Comment by tetraspace-grouping on AI Safety Debate and Its Applications · 2019-07-25T01:15:06.613Z · score: 1 (1 votes) · LW · GW

In the case of MNIST, how good is the judge itself - for example, if you were to pick the six pixels optimally to give it the most information, how well would it perform?

Comment by tetraspace-grouping on Open Thread July 2019 · 2019-07-16T22:33:48.141Z · score: 13 (5 votes) · LW · GW

I'm off from university (3rd year physics undergrad) for the summer and hence have a lot of free time, and I want to use this to make as much progress as possible towards the goal of getting a job in AI safety technical research. I have found that I don't really know how to do this.

Some things that I can do:

  • work through undergrad-level maths and CS textbooks
  • basic programming (since I do physics, this is at the level required to implement simple numerical methods in MATLAB)
  • the stuff in Andrew Ng's machine learning Coursera course

Thus far I've worked through the first half of Hutton's Programming in Haskell on the grounds that functional programming maybe teaches a style of thought that's useful and opens doors to more theoretical CS stuff.

I'm optimising for something slightly different than purely becoming good at AI safety, in that at the end I'd like to have some legible things to point to or list on a CV or something (or become better-placed to later acquire such legible things).

I'd be interested to hear from people who know more about what would be helpful for this.

Comment by tetraspace-grouping on Open Thread July 2019 · 2019-07-13T21:25:21.690Z · score: 22 (9 votes) · LW · GW

There's no official, endorsed CFAR handbook that's publicly available for download. The CFAR handbook from summer 2016, which I found on libgen, warns

While you may be tempted to read ahead, be forewarned - we've often found that participants have a harder time grasping a given technique if they've already anchored themselves on an incomplete understanding. Many of the explanations here are intentionally approximate or incomplete, because we believe this content is best transmitted in person. It helps to think of this handbook as a companion to the workshop, rather than as a standalone resource.

which I think is still their view on the matter.

I have heard that they would be more comfortable with people learning rationality techniques in-person from a friend, so if you know any CFAR alumni you could ask them (they'd probably also have a better answer to your question).

Comment by tetraspace-grouping on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-03T16:58:40.734Z · score: 5 (4 votes) · LW · GW

Submission. Counterfactual oracle. Give the oracle the set of questions on Metaculus that have a resolve date before some future date T, and receive output in the form of ordered pairs of question IDs and predictions. The score of the Oracle in the case where we don't see its answers is the number of Metaculus points that it would have earned by T if it had made a prediction on those questions at the time when we asked it.
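The actual Metaculus points formula is not reproduced here. As a hypothetical stand-in, this minimal sketch scores the oracle's (question ID, prediction) pairs with the log score on questions resolved by T. The question IDs and outcomes are invented for illustration, and every prediction is assumed to lie strictly between 0 and 1.

```python
import math

def log_score(prediction: float, resolved_yes: bool) -> float:
    """Log of the probability assigned to what actually happened (higher is better).
    Assumes 0 < prediction < 1."""
    return math.log(prediction if resolved_yes else 1 - prediction)

def oracle_score(predictions, resolutions):
    """predictions: {question_id: probability of 'yes'} output by the oracle.
    resolutions: {question_id: True/False} for questions resolved by time T.
    Questions with no resolution by T (e.g. annulled) are skipped."""
    return sum(log_score(p, resolutions[q]) for q, p in predictions.items() if q in resolutions)

# Hypothetical question IDs and outcomes, purely for illustration.
predictions = {101: 0.9, 102: 0.2, 103: 0.6}
resolutions = {101: True, 102: False}
print(oracle_score(predictions, resolutions))
```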

Comment by tetraspace-grouping on Recommendation Features on LessWrong · 2019-06-15T13:22:17.340Z · score: 8 (4 votes) · LW · GW

Is there any way to mark a post as unread? It's recommending me a lot of sequences that it believes I'm halfway through when in fact I've just briefly checked a couple of posts in them, and it would be nice if I could start them again from the beginning.

Comment by tetraspace-grouping on What should rationalists think about the recent claims that air force pilots observed UFOs? · 2019-05-28T18:12:46.749Z · score: 1 (1 votes) · LW · GW

Cutting-edge modern military AI all seems to have been developed recently; the first flight of the F-22 Raptor was in 1997, while the first deployment of Sea Hunter was in 2016. I also think there are strong incentives for civilian organisations to develop AI that aren't present for fighter jets.

Comment by tetraspace-grouping on Open Thread May 2019 · 2019-05-11T18:11:40.450Z · score: 5 (3 votes) · LW · GW

While "rationality" claims to be defined as "stuff that helps you win", and while on paper if it turned out that the Sequences didn't help you arrive at correct conclusions we'd stop calling that "rationality" and call something else "rationality", in practice the word "rationality" points at "the stuff in the Sequences" rather than "stuff that helps you win", and people with stuff that helps you win that isn't the type of thing you'd find in the Sequences have to call it something else to be unambiguous. Such is language.

Comment by tetraspace-grouping on Why the AI Alignment Problem Might be Unsolvable? · 2019-03-28T15:36:07.923Z · score: 1 (1 votes) · LW · GW
You cannot program a general intelligence with a fundamental drive to ‘not intervene in human affairs except when things are about to go drastically wrong otherwise, where drastically wrong is defined as either rape, torture, involuntary death, extreme debility, poverty or existential threats’ because that is not an optimization function.

In the extreme limit, you could create a horribly gerrymandered utility function where you assign 0 utility to universes where those bad things are happening, 1 utility to universes where they aren't, and some reduced impact thing which means that it usually prefers to do nothing.
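This toy illustration shows why such a function is still an optimisation function. It is a minimal sketch built on a hypothetical World record, an assumed impact measure, and an arbitrary small penalty weight. Actually defining the bad-outcome flag and the impact measure is the hard part, and none of this is a real proposal.

```python
from dataclasses import dataclass

@dataclass
class World:
    bad_things_happening: bool  # any of the listed bad outcomes occur in this world
    impact: float               # hypothetical measure of how much the AI changed things

IMPACT_WEIGHT = 0.01  # arbitrary small weight, so avoiding bad outcomes dominates

def utility(world: World) -> float:
    base = 0.0 if world.bad_things_happening else 1.0
    return base - IMPACT_WEIGHT * world.impact

def choose(options):
    """Pick the action whose resulting world has the highest utility."""
    return max(options, key=lambda action: utility(options[action]))

fine = {"do nothing": World(False, 0.0), "intervene": World(False, 5.0)}
disaster = {"do nothing": World(True, 0.0), "intervene": World(False, 5.0)}
print(choose(fine))      # do nothing
print(choose(disaster))  # intervene
```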

Comment by tetraspace-grouping on Fixed Point Exercises · 2019-01-03T02:33:11.557Z · score: 1 (1 votes) · LW · GW

Do you have any recommended reading for learning enough math to do these exercises? I'm sort of using these as a textbook-list-by-proxy (e.g. google "Intermediate value theorem", check which area of math it's from, oh hey it's Analysis, get an introductory textbook in Analysis, repeat), though I also have little knowledge of the field and don't want to wander down suboptimal paths.