Changing the AI race payoff matrix 2020-11-22T22:25:18.355Z
Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda 2020-09-03T18:27:05.860Z
Mapping Out Alignment 2020-08-15T01:02:31.489Z
What are some good public contribution opportunities? (100$ bounty) 2020-06-18T14:47:51.661Z
Gurkenglas's Shortform 2019-08-04T18:46:34.953Z
Implications of GPT-2 2019-02-18T10:57:04.720Z
What shape has mindspace? 2019-01-11T16:28:47.522Z
A simple approach to 5-and-10 2018-12-17T18:33:46.735Z
Quantum AI Goal 2018-06-08T16:55:22.610Z
Quantum AI Box 2018-06-08T16:20:24.962Z
A line of defense against unfriendly outcomes: Grover's Algorithm 2018-06-05T00:59:46.993Z


Comment by gurkenglas on Lessons I've Learned from Autodidacting · 2021-01-24T11:07:01.927Z · LW · GW

(Why not simply define the integral of f as the LM of {(x,r)|x in Omega, r <= f(x)}?)
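Reading "LM" as Lebesgue measure, here is a quick numeric sanity check of that definition (a sketch; the choice of f(x) = x² on Ω = [0, 1], the restriction to nonnegative r, and the grid resolution are all illustrative): approximating the two-dimensional measure of the region under the graph recovers the usual integral.

```python
# Approximate the 2D measure of {(x, r) | x in [0, 1], 0 <= r <= f(x)}
# on a fine grid and compare it with the usual integral of f.
# f(x) = x^2 is an illustrative choice; its integral over [0, 1] is 1/3.

def measure_under_graph(f, n=2000):
    dx = 1.0 / n
    # Each grid column's slice {r | 0 <= r <= f(x)} has length f(x),
    # so the total measure is the sum of the column areas f(x) * dx.
    return sum(f((i + 0.5) * dx) * dx for i in range(n))

area = measure_under_graph(lambda x: x * x)
print(area)  # close to 1/3
```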

Comment by gurkenglas on Praying to God · 2021-01-19T14:43:08.768Z · LW · GW

Write clearly. What's the point of the post? How do the parts of the post argue for its point?

Comment by gurkenglas on Pseudorandomness contest: prizes, results, and analysis · 2021-01-15T23:41:45.127Z · LW · GW

Ah, which Round 1 submission was mine? I think I wrote it down somewhere, but I don't know where... I suppose technically I could search my hard drive for each of the strings.

Comment by gurkenglas on #2: Neurocryopreservation vs whole-body preservation · 2021-01-13T15:38:21.795Z · LW · GW

If I want to sell my great-grandmother on cryonics, "freezing your brain so in centuries it can be transplanted into a young body" sounds like an easier sell than "freezing your brain so in centuries it can be turned into a robot". Freezing her whole body sounds like an instant, understandable no.

Comment by gurkenglas on Troy Macedon's Shortform · 2021-01-09T15:55:30.340Z · LW · GW

To avert Idiocracy? Just clone Einstein.

Comment by gurkenglas on What do we *really* expect from a well-aligned AI? · 2021-01-07T00:53:09.791Z · LW · GW

I see this not as a question to ask now, but later, on many levels of detail, when the omnipotent singleton is deciding what to do with the world. Of course we will have to figure out the correct way to pose such questions before deployment, but this can be deferred until we can generate research.

Comment by gurkenglas on DALL-E by OpenAI · 2021-01-06T12:09:21.686Z · LW · GW

I also have this and have had it for a long time, starting with Google DeepDream. (Or perhaps that animation where you stare ahead while on the edges of your field of view a series of faces is shown, which then start to subjectively look nightmarish/like caricatures.) It lessens with exposure, and returns, somewhat weaker, with each new type of generated image. It feels like neurons burning out from overactivation, as though I was Dracula being shown a cross.

Comment by gurkenglas on What do we *really* expect from a well-aligned AI? · 2021-01-06T11:17:50.627Z · LW · GW
  1. The universe is finite, and has to be distributed in some manner.
  2. Some people prefer interactions with the people alive today to interactions with heavenly replicas of them. You might claim that there is no difference, but I say that in the end it's all atoms and all the meaning is made up anyway; we know exactly why those people would not approve if we described virtual heavens to them, so we shouldn't just do it anyway.
  3. Some people care about what other people do in their virtual heavens. You could deontologically tell them to fuck off, but I'd expect the model of dictator lottery + acausal trade to arrive at another solution.
Comment by gurkenglas on What do we *really* expect from a well-aligned AI? · 2021-01-05T20:46:10.515Z · LW · GW
  1. A simple way of rating the scenarios above is to describe them as you have and ask humans what they think.
  2. In a way... but I expect that what we actually need to solve is just how to make a narrow AI faithfully generate AI papers and AI safety papers that humans would have come up with given time.
  3. The CEV paper has gone into this, but indeed human utility functions will have to be aggregated in some manner, and the manner in which to do this and allocate resources can't be derived from first principles. Fortunately human utility functions are logarithmic enough and enough people care about enough other people that the basin of acceptable solutions is quite large, especially if we get the possible future AIs to acausally trade with each other.
Comment by gurkenglas on Luna Lovegood and the Chamber of Secrets - Part 11 · 2020-12-27T20:25:28.188Z · LW · GW

We're about to find out what a smart psychopath with Slytherin's lore and no episodic memory or hands does in this situation. Try to wandlessly apparate away and hit the wards, presumably. As far as he sees, Lockhart obviously just Obliviated him with that outstretched wand.

"Huh," Moody said, leaning back in his chair. "Minerva and I will be putting some alarms and enchantments on that ring of yours, son, if you don't mind. Just in case you forget to sustain that Transfiguration one day."

Reinforcements are on the way. I expect that one of the devices in Minerva's office ticks regularly while the ring exists. Time turners are a thing, but Minerva may have received a "NO." because she would have been seen on the Map.

Comment by gurkenglas on Luna Lovegood and the Chamber of Secrets - Part 6 · 2020-12-11T20:57:05.430Z · LW · GW

"Why would I do that?" and "You think like a muggle" sound like she thinks Harry is making an epistemological error.

If she were as tuned in as you say, she should see that Harry asks because he doesn't see how tuned in Luna is.

Comment by gurkenglas on Luna Lovegood and the Chamber of Secrets - Part 6 · 2020-12-11T00:01:02.985Z · LW · GW

"You're not going to ask me how I know these things?" said Harry.

"Why would I do that?" said Luna.

But, chapter 1:

"How do you know?" Luna asked.

Is it only some knowledge that cannot be gained from nothing? Such as that a thing does not exist?

Comment by gurkenglas on Luna Lovegood and the Chamber of Secrets - Part 6 · 2020-12-09T13:22:35.384Z · LW · GW

The spoiler doesn't logically follow from what comes before, right? She merely saw the possibility.

And if she's right, that's why it didn't attack her.

Comment by gurkenglas on Three Gods with a twist · 2020-12-06T15:26:51.582Z · LW · GW

Couldn't you simply ask the usual questions, but with each mention of a god itself replaced by "a hypothetical God whose behavior is either to always tell the truth, to always lie, or to always give a random answer, and whose behavior is not identical to either of the other Gods"?

Comment by gurkenglas on 12 Rules for Life · 2020-12-05T04:38:55.517Z · LW · GW

If you simply disagree with all that is said against you, then you cannot lose a debate, which is the archetypal way to learn from a debate. Therefore, your arguments should be made of parts, which can be attacked by a commenter.

Comment by gurkenglas on Covid 12/3: Land of Confusion · 2020-12-04T15:10:23.338Z · LW · GW

Why wouldn't vaccinating half of a married couple be better than half of vaccinating the whole? They can vaccinate whoever interacts with other people more, and then have that person deliberately interact with other people instead of the other - do the groceries, say.

Comment by gurkenglas on Luna Lovegood and the Chamber of Secrets - Part 4 · 2020-12-04T04:38:03.812Z · LW · GW

When does she take out Wanda between the two feeding sessions? In the library, because we don't see it? Perhaps Wanda is visible in the library, because Wanda's magic removes all observations that would have made it far enough, which is preempted by the library's magic. Does the second feeding start in the library or right thereafter? Probably after, since we read about it.

Comment by gurkenglas on Hiding Complexity · 2020-12-01T10:33:19.110Z · LW · GW

Does this mean that humans can only keep a few things in mind in order to make us hide complexity? Under that view the stereotypical forgetful professor isn't brilliant because he has a lot of memory free to think with at any time, but because he has had a lot of practice doing the most with a small memory. These seem experimentally distinguishable.

I conjecture that describing the function of a neural network is the archetypal application of Factored Cognition, because we can cheat by training the neural network to have lots of information bottlenecks along which to decompose the task.

Comment by gurkenglas on Nash Score for Voting Techniques · 2020-11-27T08:29:46.486Z · LW · GW

If 30% of people can block the election, someone's going to have to command the troops. The least perverse option seems to be the last president. Trump could probably have gotten 30% to block it to stay in that chair. A minority blocking the election seems meant to simulate (i.e., give a better alternative to) civil war, which is uncommon because it is costly. So perhaps blocking should be made costly to the populace. Say, tax everyone heavily for each blocked election and donate the money to foreign charities. This also incentivizes foreign trolls to cause blocked elections, which seems fair enough - if the enemy controls your election, it should crash, not put a puppet in office.

STAR is useless if people can assign real-valued scores. That makes me think that if it works, it's for reasons of discrete mathematics, so we should analyze the system from the perspective of discrete mathematics before trusting it.

Instead of multiplying values >= 1 and "ignoring" smaller values, you should make explicit that you feed the voter scores through a function (in this case \x -> max(0, log(x))) before adding them up. \x -> max(0, log(x)) does not seem like the optimal function for any seemly purpose.
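The implicit per-voter transform described above can be made concrete (a sketch; the ballot scores are illustrative): values below 1 contribute nothing after clipping, and larger values contribute only logarithmically.

```python
import math

# The transform made explicit: \x -> max(0, log(x)).
# Scores below 1 are "ignored" (clipped to 0); larger scores count logarithmically.
def transform(score):
    return max(0.0, math.log(score))

# Tallying one candidate's total under this transform (scores are illustrative):
ballots = [1.0, 2.0, 0.5, math.e]
total = sum(transform(s) for s in ballots)
print(total)  # log(1)=0, log(2)~0.693, 0.5 clipped to 0, log(e)=1
```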

Comment by gurkenglas on Convolution as smoothing · 2020-11-26T10:31:12.783Z · LW · GW

The Fourier transform, as a map between function spaces, is continuous, one-to-one, and maps Gaussians to Gaussians, so we can translate "convolving nice distribution sequences tends towards Gaussians" to "multiplying nice function sequences tends towards Gaussians". The pointwise logarithm, as a map between function spaces, is continuous, one-to-one, and maps Gaussians to parabolas, so we can translate further to "nice function sequences tend towards parabolas", which sounds more "almost always false" than "usually true".

In the second space, functions are continuous and vanish at infinity.
This doesn't work if the fourier transform of some function is somewhere negative, but then multiplying the function sequence has zeroes.
In the third space, functions are continuous and diverge down at infinity.
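The first translation step can be checked numerically (a sketch; the sequences and kernels are illustrative): convolving corresponds to pointwise multiplication of discrete Fourier transforms, and repeated self-convolution of a nonnegative kernel smooths toward a Gaussian-like bump.

```python
import numpy as np

# Convolution in direct space corresponds to pointwise multiplication
# of discrete Fourier transforms. (The sequences here are illustrative.)
f = np.array([1.0, 2.0, 1.0, 0.0])
g = np.array([0.5, 0.5, 0.0, 0.0])

# Circular convolution computed directly...
n = len(f)
direct = np.array([sum(f[j] * g[(i - j) % n] for j in range(n)) for i in range(n)])

# ...equals the inverse transform of the pointwise product of the transforms.
via_fourier = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

# Repeated self-convolution of a uniform kernel yields binomial
# coefficients, which tend towards a Gaussian shape.
k = np.array([1.0, 1.0]) / 2
for _ in range(8):
    k = np.convolve(k, np.array([1.0, 1.0]) / 2)
print(np.round(k, 3))  # symmetric, unimodal, bell-shaped
```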

Comment by gurkenglas on Convolution as smoothing · 2020-11-26T06:29:55.279Z · LW · GW

and it doesn't much matter if you change the kernel each time

That's counterintuitive. Surely for every f there's a g that'll get you anywhere? If the kernel may change each time, pick g so that f∗g is whatever you like.

Comment by gurkenglas on Changing the AI race payoff matrix · 2020-11-24T10:49:27.869Z · LW · GW

Indeed players might follow a different strategy than they declare. A player can only verify another player's precommitment after pressing the button (or through old-fashioned espionage of their button setup). But I find it reasonable to expect that a player, seeing the shape of the AI race and what is needed to prevent mutual destruction, would actually design their AGI to use a decision theory that would follow through on the precommitment. Humans may not be intuitively compelled by weird decision theories, but they can expect someone to write an AGI that uses them. Although even a human may find giving other players what they deserve more important than letting the world as we know it continue for another decade.

Compare to Dr. Strangelove's doomsday machine. We expect that a human in the loop would not follow through, but we can't expect that no human would build such a machine.

Comment by gurkenglas on It’s not economically inefficient for a UBI to reduce recipient’s employment · 2020-11-22T19:07:02.988Z · LW · GW

The crazy distortions are the damage. People fear low-income people stopping their work because they fear that goods produced by low-income workers will become more expensive.

Comment by gurkenglas on It’s not economically inefficient for a UBI to reduce recipient’s employment · 2020-11-22T17:32:55.132Z · LW · GW

Your argument proves too much - in medieval times, if more than 20% of people stopped working in agriculture to buy food with their UBI, food prices would go up until they resumed, as an indicator of the damage done to society by people stopping their work.

Comment by gurkenglas on It’s not economically inefficient for a UBI to reduce recipient’s employment · 2020-11-22T17:29:35.709Z · LW · GW

You can cause more than a dollar of damage to society for every dollar you spend, say by hiring people to drive around throwing eggs at people's houses. Though I guess in total society is still better off by a hundred dollars compared to if you had received them via UBI.

Comment by gurkenglas on Open & Welcome Thread – November 2020 · 2020-11-22T05:57:52.501Z · LW · GW

Perhaps the police officer simply thinks that your average person will easily do dangerous things like shaking someone off their car without thinking much of it, but will not take a knife to another's guts. Therefore, the car incident would not mark the man as unusually dangerous.

There is no mathematically canonical way to distinguish between trade and blackmail, between act and omission, between different ways of assigning blame. The world where nobody jumps on cars is as safe as the one where nobody throws people off cars. We decide between them by human intuition, which differs by memetic background.

Comment by gurkenglas on Some AI research areas and their relevance to existential safety · 2020-11-20T16:52:13.702Z · LW · GW

Yeah, I basically hope that enough people care about enough other people that some of the wealth ends up trickling down to everyone. Win probability is basically interchangeable with other people caring about you and your resources across the multiverse. Good thing the cosmos is so large.

I don't think making acausal trade work is that hard. All that is required is:

  1. That the winner cares about the counterfactual versions of himself that didn't win, or equivalently, is unsure whether they're being simulated by another winner. (huh, one could actually impact this through memetic work today, though messing with people's preferences like that doesn't sound friendly)
  2. That they think to simulate alternate winners before they expand too far to be simulated.
Comment by gurkenglas on The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables · 2020-11-20T03:58:12.768Z · LW · GW

I'm not convinced that we can do nothing if the human wants ghosts to be happy. The AI would simply have to do what would make ghosts happy if they were real. In the worst case, the human's (coherent extrapolated) beliefs are your only source of information on how ghosts work. Any proper general solution to the pointers problem will surely handle this case. Apparently, each state of the agent corresponds to some probability distribution over worlds.

Comment by gurkenglas on Some AI research areas and their relevance to existential safety · 2020-11-20T03:01:07.903Z · LW · GW

with the exception of people who decided to gamble on being part of the elite in outcome B

Game-theoretically, there's a better way. Assume that after winning the AI race, it is easy to figure out everyone else's win probability, utility function and what they would do if they won. Human utility functions have diminishing returns, so there's opportunity for acausal trade. Human ancestry gives a common notion of fairness, so the bargaining problem is easier than with aliens.

Most of us care some even about those who would take all for themselves, so instead of giving them the choice between none and a lot, we can give them the choice between some and a lot - the smaller their win prob, the smaller the gap can be while still incentivizing cooperation.

Therefore, the AI race game is not all or nothing. The more win probability lands on parties that can bargain properly, the less multiversal utility is burned.

Comment by gurkenglas on Some AI research areas and their relevance to existential safety · 2020-11-20T00:39:41.881Z · LW · GW

(2) is essentially aiming to take over the world in the name of making it safer, which is not generally considered the kind of thing we should be encouraging lots of people to do.

Wait, you want to do it the hard way? Not only win the AI race with enough head start for safety, but stop right at the finish line and have everyone else stop at the finish line? However would you prevent everyone everywhere from going over? If you manage to find a way, that sounds like taking over the world with extra steps.

Comment by gurkenglas on Gurkenglas's Shortform · 2020-11-19T23:27:03.809Z · LW · GW

All the knowledge a language model character demonstrates is contained in the model. I expect that there is also some manner of intelligence, perhaps pattern-matching ability, such that the model cannot write a smarter character than itself. The better we engineer our prompts, the smaller the model overhang. The larger the overhang, the more opportunity for inner alignment failure.

Comment by gurkenglas on Inner Alignment in Salt-Starved Rats · 2020-11-19T22:30:48.921Z · LW · GW

This all sounds reasonable. I just saw that you were arguing for more being learned at runtime (as some sort of Steven Reknip), and I thought that surely not all the salt machinery can be learnt, and I wanted to see which of those expectations would win.

Comment by gurkenglas on Inner Alignment in Salt-Starved Rats · 2020-11-19T17:38:59.307Z · LW · GW

Do you posit that it learns over the course of its life that salt taste cures salt deficiency, or do you allow this information to be encoded in the genome?

Comment by gurkenglas on Anatomy of a Gear · 2020-11-17T11:37:02.468Z · LW · GW

Compare to [this post], which is about defining how well a neural net can be described in terms of gears.

Comment by gurkenglas on Announcing the Forecasting Innovation Prize · 2020-11-16T05:56:29.530Z · LW · GW

I suggest that you allow submission of posts written before this announcement. This incentivizes behavior that people expect might later be subject to prizes.

Comment by gurkenglas on On Arguments for God · 2020-11-15T06:37:20.472Z · LW · GW

For instance, atheists often assume that God must be highly complex (which is essentially the assumption that God must be natural)

What do you mean by natural? In order for God to be simple, his emotions must be denied or explained. For example, there could be some physics exploit by which an ancient human could become omnipotent. Another simple specification of God could be through an equation describing some goal-directed agent. Emotions might fall out of agency via game theory, except that game theory doesn't really apply when you're omnipotent. And what would be the goal? Our world doesn't look like a simple goal is being maximized by a God.

Comment by gurkenglas on Multiple Worlds, One Universal Wave Function · 2020-11-06T01:22:57.210Z · LW · GW

I reply with the same point about orthogonality: Why should (2,1) split into one branch of (2,0) and one branch of (0,1), not into one branch of (1,0) and one branch of (1,1)? Only the former leads to probability equaling squared amplitude magnitude.

(I'm guessing that classical statistical mechanics is invariant under how we choose such branches?)

Comment by gurkenglas on Multiple Worlds, One Universal Wave Function · 2020-11-06T01:09:12.460Z · LW · GW

If a1 is 2 and phi1 has eigenvalue 3, and a2 is 4 and phi2 has eigenvalue 5, then 2*phi1+4*phi2 is mapped to 6*phi1+20*phi2 and is therefore not an eigenfunction.
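The counterexample can be restated numerically (a sketch; the coefficients and eigenvalues follow the comment, written in the phi1/phi2 basis):

```python
# An operator scales phi1 by eigenvalue 3 and phi2 by eigenvalue 5.
# Applied to the combination 2*phi1 + 4*phi2, the result is not a
# scalar multiple of the input, so the combination is no eigenfunction.
v = (2.0, 4.0)                  # coefficients (a1, a2)
Av = (3.0 * v[0], 5.0 * v[1])   # apply the operator: (6, 20)

# If v were an eigenfunction, Av would be parallel to v; the 2D cross
# term below is zero exactly when two vectors are parallel.
cross = Av[0] * v[1] - Av[1] * v[0]
print(Av, cross)  # (6.0, 20.0) -16.0
```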

Comment by gurkenglas on Sub-Sums and Sub-Tensors · 2020-11-05T18:52:20.476Z · LW · GW

Can subsums and subtensors be defined as diagrams? Tensors not needing to be subtensors sounds to me like subsums/subtensors should be more first-class citizens than sums/tensors.

Comment by gurkenglas on Multiple Worlds, One Universal Wave Function · 2020-11-05T10:09:17.038Z · LW · GW

It'd be fine if it were linear in general, but it's not for combinations that aren't orthogonal. Suppose a is drawn from R^2. P(sqrt(2)) = P(|(1,1)|) = P(1,1) = P(1,0) + P(0,1) = 2*P(|(1,0)|) = 2*P(1), which agrees with your analysis, but P(sqrt(5)) = P(|(2,1)|) = P(2,1) ≠ P(1,0) + P(1,1) = 3*P(1) doesn't add up.
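The arithmetic here can be checked directly (a sketch; the vectors are the ones from the comment): probability proportional to the squared magnitude is additive over orthogonal splits, while the linear rule fails for the non-orthogonal split of (2,1).

```python
import math

def norm(x, y):
    return math.sqrt(x * x + y * y)

# Orthogonal split of (2,1) into (2,0) and (0,1): squared magnitudes add.
print(norm(2, 1) ** 2)                    # 5 = 4 + 1

# Non-orthogonal split of (2,1) into (1,0) and (1,1): they do not.
lhs = norm(2, 1) ** 2                     # 5
rhs = norm(1, 0) ** 2 + norm(1, 1) ** 2   # 1 + 2 = 3
print(lhs, rhs)  # the linear-style bookkeeping fails: 5 vs 3
```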

Comment by gurkenglas on Multiple Worlds, One Universal Wave Function · 2020-11-05T04:31:31.944Z · LW · GW

Accepting that probability is some function of the magnitude of the amplitude, why should it be linear exactly under orthogonal combinations?

Comment by gurkenglas on Confucianism in AI Alignment · 2020-11-04T03:09:34.127Z · LW · GW

As far as I understand, whether minimal circuits are daemon-free is precisely the question whether direct descriptions of the input distribution are simpler than hypotheses of form "Return whatever maximizes property _ of the multiverse".

Comment by gurkenglas on Confucianism in AI Alignment · 2020-11-03T03:39:27.346Z · LW · GW

The hypotheses after the modification are supposed to have knowledge that they're in training, for example because they have enough compute to find themselves in the multiverse. Among hypotheses with equal behavior in training, we select the simpler one. We want this to be the one that disregards that knowledge. If the hypothesis has form "Return whatever maximizes property _ of the multiverse", the simpler one uses that knowledge. It is this form of hypothesis which I suggest to remove by inspection.

Comment by gurkenglas on Confucianism in AI Alignment · 2020-11-03T02:33:34.500Z · LW · GW

Take an outer-aligned system, then add a 0 to each training input and a 1 to each deployment input. Wouldn't this add only malicious hypotheses that can be removed by inspection without any adverse selection effects?

Comment by gurkenglas on What is the right phrase for "theoretical evidence"? · 2020-11-02T20:22:26.404Z · LW · GW

"Armchair evidence".

Comment by gurkenglas on "Inner Alignment Failures" Which Are Actually Outer Alignment Failures · 2020-11-01T02:12:55.690Z · LW · GW

More bluntly: It's an outer alignment failure even to add a 0 to each training input and a 1 to each deployment input, because that equates to replacing "Be aligned." with "Be aligned during training."

Comment by gurkenglas on Top Time Travel Interventions? · 2020-10-27T00:15:30.141Z · LW · GW

I don't trust humanity to make it through the invention of nuclear weapons again, so let's not go back too far. Within the last few decades, you could try a reroll on the alignment problem. Collect a selection of safety papers and try to excise hints at such facts as "throwing enough money at simple known architectures produces AGI". Wait to jump back until waiting longer carries a bigger risk of surprise UFAI than it's worth, or until the local intel agency knocks on your door for your time machine. You could build a reverse box - a Faraday bunker that sends you back if it's breached, leaving only a communication channel for new papers, X-risk alerts and UFAI hackers - some UFAIs may not care enough whether I make it out of their timeline. Balance acquiring researcher's recognition codes against the threat of other people taking the possibility of time travel seriously.

Comment by gurkenglas on When was the term "AI alignment" coined? · 2020-10-21T22:58:59.896Z · LW · GW

I recall Eliezer asking on Facebook for a good word for the field of AI safety research before it was called alignment.

Comment by gurkenglas on jacobjacob's Shortform Feed · 2020-10-15T22:30:34.475Z · LW · GW

then it would be 

Comment by gurkenglas on Msg Len · 2020-10-12T10:40:11.452Z · LW · GW

I approve of the haikuesque format.

Do you agree that the "bijection" Intelligence -> Prediction preserves more structure than Prediction -> Compression?