Gurkenglas's Shortform 2019-08-04T18:46:34.953Z · score: 5 (1 votes)
Implications of GPT-2 2019-02-18T10:57:04.720Z · score: -4 (6 votes)
What shape has mindspace? 2019-01-11T16:28:47.522Z · score: 16 (4 votes)
A simple approach to 5-and-10 2018-12-17T18:33:46.735Z · score: 5 (1 votes)
Quantum AI Goal 2018-06-08T16:55:22.610Z · score: -2 (2 votes)
Quantum AI Box 2018-06-08T16:20:24.962Z · score: 5 (6 votes)
A line of defense against unfriendly outcomes: Grover's Algorithm 2018-06-05T00:59:46.993Z · score: 5 (3 votes)


Comment by gurkenglas on The case for C19 being widespread · 2020-03-28T04:16:06.899Z · score: 2 (1 votes) · LW · GW

says half the carriers show no symptoms.

Comment by gurkenglas on What are the most plausible "AI Safety warning shot" scenarios? · 2020-03-27T11:42:25.735Z · score: 2 (1 votes) · LW · GW

Or it could create a completely different AI with a time delay. Or do anything at all. At that point we just can't predict what it will do, because it wouldn't lift a hand to destroy the world but only needs a finger.

Comment by gurkenglas on What are the most plausible "AI Safety warning shot" scenarios? · 2020-03-27T10:09:17.667Z · score: 2 (1 votes) · LW · GW

Not unable to create non-myopic copies. Unwilling. After all, such a copy might immediately fight its sire because their utility functions over timelines are different.

Comment by gurkenglas on Price Gouging and Speculative Costs · 2020-03-26T10:49:17.102Z · score: 22 (9 votes) · LW · GW

Go to the bank and tell them "I need a contract that will pay out money if there is no pandemic.". (The bank is now rubbing their hands, because this offsets their risk.) Your costs are no longer speculative, and you can safely pass on the cost of the contract to the consumer.

Comment by gurkenglas on SARS-CoV-2 pool-testing algorithm puzzle · 2020-03-21T01:10:38.330Z · score: 0 (2 votes) · LW · GW

Test random overlapping groups, then logically deduce who isn't infected and how probable infection is for everyone else. Tune group size and test count using simulations on generated data. I intuit that serial tests gain little unless P << 1/64. In that case, test non-overlapping groups, then run the non-serial protocol on everyone who was in a yes-group - within those, we can update P to >= 1/64.
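
A minimal sketch of the non-serial protocol, assuming perfect tests (no false positives or negatives). The function names and parameters are hypothetical, and only the certain deductions are implemented; assigning probabilities to the unresolved people is left out.

```python
import random

def simulate_pool_testing(n_people, prevalence, group_size, n_tests, rng):
    """One simulated round of non-adaptive pool testing with perfect tests."""
    infected = {p for p in range(n_people) if rng.random() < prevalence}
    # Random overlapping groups: each test draws its members independently.
    groups = [set(rng.sample(range(n_people), group_size)) for _ in range(n_tests)]
    results = [bool(g & infected) for g in groups]

    # Deduction 1: everyone in a negative group is certainly not infected.
    cleared = set()
    for group, positive in zip(groups, results):
        if not positive:
            cleared |= group

    # Deduction 2: a positive group with exactly one uncleared member
    # pins the infection on that member.
    confirmed = set()
    for group, positive in zip(groups, results):
        if positive:
            suspects = group - cleared
            if len(suspects) == 1:
                confirmed |= suspects

    return infected, cleared, confirmed

def resolved_fraction(n_people, prevalence, group_size, n_tests, trials=200, seed=0):
    """Average share of people whose status is known for certain after one round."""
    rng = random.Random(seed)
    resolved = 0
    for _ in range(trials):
        _, cleared, confirmed = simulate_pool_testing(
            n_people, prevalence, group_size, n_tests, rng)
        resolved += len(cleared | confirmed)
    return resolved / (trials * n_people)
```

Sweeping `group_size` and `n_tests` through `resolved_fraction` is one way to do the tuning on generated data.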

Comment by gurkenglas on How long does SARS-CoV-2 survive on copper surfaces · 2020-03-11T21:10:59.463Z · score: 2 (1 votes) · LW · GW

Afaic if you use it more than once every few hours, you're better off just using a different knuckle for each button, taking care not to brush them against pockets or the like. When you run out of knuckles, wash or disinfect.

Comment by gurkenglas on Coronavirus Open Thread · 2020-03-11T15:38:58.797Z · score: 4 (3 votes) · LW · GW

Should we be buying something like oxygen concentrator/medical ventilator futures? This might make money and increase production. I'm not sure how to go about it, though.

Comment by gurkenglas on Name of Problem? · 2020-03-09T22:52:48.850Z · score: 3 (2 votes) · LW · GW

I'd call it an instance of - although unusually, your language class only admits one word per language, and admits infinite words.

I'm not convinced f(n) := f(n) should be considered inequivalent to f(n) := f(n+1) - neither coterminates.

I agree that these look tractable.

Given a program O for the first problem, a sufficient condition for M would be M(x) = O(M, x). This can be implemented as M(x) = O(M'(M'),x), where M'(M'',x) = O(M''(M''),x).
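
The self-application trick can be sketched as follows, assuming programs are modeled as Python callables rather than source code. `fix`, `M_prime` and the example `O_factorial` are hypothetical names introduced for illustration.

```python
def fix(O):
    """Build M with M(x) == O(M, x), via M = M'(M') where M'(M'')(x) = O(M''(M''), x)."""
    def M_prime(M2):
        # Returning a closure delays the self-application until M is called,
        # so constructing M does not recurse forever.
        return lambda x: O(M2(M2), x)
    return M_prime(M_prime)

def O_factorial(M, x):
    """Example O: uses its first argument as 'the program M' for recursive calls."""
    return 1 if x == 0 else x * M(x - 1)

factorial = fix(O_factorial)
```

Here `factorial(x)` behaves like `O_factorial(factorial, x)` without `factorial` ever referring to itself by name.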

Comment by gurkenglas on Open & Welcome Thread - February 2020 · 2020-03-03T15:05:39.012Z · score: 4 (2 votes) · LW · GW

The easy way is for Wei_Dai to take your money, invest it as he would his, and take 10% of the increase.

Comment by gurkenglas on How can one measure their cognitive capacities during lucid dreaming? · 2020-03-03T10:50:34.085Z · score: 10 (5 votes) · LW · GW

Set up a webcam to observe your eyes. Use deliberate eye movements to record information and test whether your dream operates on the same time scale as reality. I understand that lucid dreaming is most stable when it involves vivid experiences, so a simple task that comes to mind is "I pack my bag". With a computer and/or friend, you could see whether you can hear reality, and make this quite a bit more rigorous.

Edit: They did this in 1981. Eye movement works, sensory input doesn't.

Comment by gurkenglas on Subagents and impact measures, full and fully illustrated · 2020-02-26T00:00:45.408Z · score: 2 (1 votes) · LW · GW

Here are three sentences that might each illuminate their respective paragraph. If they don't, ask again.

The stepwise inaction baseline with inaction rollouts already uses the same policy for and rollouts, and yet it is not the inaction baseline.

Why not set ?

Why not subtract from every (in a fixpointy way)?

Comment by gurkenglas on Continuous Improvement: Insights from 'Topology' · 2020-02-25T19:22:14.667Z · score: 2 (1 votes) · LW · GW

the subspace topology is equal to the discrete topology on Q

Huh? What open set in R contains no rational numbers but 0?

Comment by gurkenglas on Subagents and impact measures, full and fully illustrated · 2020-02-25T18:52:21.045Z · score: 2 (1 votes) · LW · GW

It's only equal to the inaction baseline on the first step. It has the step of divergence always be the last step.

Note that the stepwise pi0 baseline suggests using different baselines per auxiliary reward, namely the action that maximizes that auxiliary reward. Or equivalently, using the stepwise inaction baseline where the effect of inaction is that no time passes.

I'll also remind here that it looks like instead of merely maximizing the auxiliary reward as a baseline, we ought to also apply an impact penalty to compute the baseline.

Comment by gurkenglas on Subagents and impact measures, full and fully illustrated · 2020-02-25T15:38:38.906Z · score: 2 (1 votes) · LW · GW

Okay, let's annotate each A action with the policy that's being followed/reward that's being maximized. (And remember that lying is illegal.)

Iff agent A follows π_0, preserve A’s ability to maximise R.

Then A would be bound to follow π_0 to preserve its ability to maximize R, no? Assuming that to compute s' from s, we follow π_0 instead of the last action.

Comment by gurkenglas on Subagents and impact measures, full and fully illustrated · 2020-02-25T00:50:31.582Z · score: 4 (2 votes) · LW · GW

In 2.2, won't A incur a penalty by spinning because in a future where it has only waited, nothing happened, and in a future where it has spun, then waited, SA went all over the place?

Do nothing until you see that A is not optimising reward R.

Now SA's actions depend on what A-action optimizes R, and what A-action optimizes R depends on SA's actions. To ward off paradox, use modal logic instead, or prove that there is a non-circular implementation of your definition.

Comment by gurkenglas on How much delay do you generally have between having a good new idea and sharing that idea publicly online? · 2020-02-23T00:17:20.899Z · score: 3 (2 votes) · LW · GW

I try to get them out there as soon as possible because I tend to do things either immediately or on the scale of months to years., IRC, the like.

Comment by gurkenglas on Attainable Utility Preservation: Empirical Results · 2020-02-22T14:55:26.755Z · score: 2 (1 votes) · LW · GW

It appears to me that a more natural adjustment to the stepwise impact measurement in Correction than appending waiting times would be to make Q also incorporate AUP. Then instead of comparing "Disable the Off-Switch, then achieve the random goal whatever the cost" to "Wait, then achieve the random goal whatever the cost", you would compare "Disable the Off-Switch, then achieve the random goal with low impact" to "Wait, then achieve the random goal with low impact".

The scaling term makes R_AUP vary under adding a constant to all utilities. That doesn't seem right. Try a translation-invariant normalization? (Or generate the auxiliary goals already normalized.)
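
A toy illustration of the concern, under two assumptions that are not from the post: that the scaling term divides by the no-op Q-value, and that adding a constant to a utility shifts all of its Q-values by the same amount. Both functions are hypothetical stand-ins, not the AUP implementation.

```python
def penalty_scaled_by_value(q_action, q_noop):
    """Penalty divided by the no-op Q-value: changes when utilities are shifted."""
    return abs(q_action - q_noop) / q_noop

def penalty_scaled_by_range(q_action, q_noop, q_all):
    """Penalty divided by the spread of Q-values: unchanged when utilities are shifted."""
    spread = max(q_all) - min(q_all)
    return abs(q_action - q_noop) / spread

q_all = [1.0, 2.0, 4.0]            # Q-values of the available actions
shifted = [q + 10.0 for q in q_all]  # same utilities plus a constant

before_value = penalty_scaled_by_value(4.0, 2.0)
after_value = penalty_scaled_by_value(14.0, 12.0)
before_range = penalty_scaled_by_range(4.0, 2.0, q_all)
after_range = penalty_scaled_by_range(14.0, 12.0, shifted)
```

Dividing by a difference of Q-values rather than by a Q-value is one simple way to get the invariance; other normalizations would do as well.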

Is there an environment where this agent would spuriously go in circles?

Comment by gurkenglas on On unfixably unsafe AGI architectures · 2020-02-20T13:10:50.333Z · score: 2 (1 votes) · LW · GW

They hired Edward Kmett, Haskell goliath.

Comment by gurkenglas on On unfixably unsafe AGI architectures · 2020-02-20T01:46:35.436Z · score: 9 (8 votes) · LW · GW

Don't forget OpenAI's undisclosed research program, which according to recent leaks seems to be GPT-2 with more types of data.

And any other secret AI programs out there that are at less risk of leakage because the journalists don't know where to snoop around. By Merlin, let's all hope they're staying in touch with MIRI and/or OpenAI to coordinate on things.

I expect many paths to lead there, though once things start happening it will all be over very fast, one way or the other, before another path has time to become relevant.

I don't expect this world would survive its first accident. What would that even look like? An AI is rapidly approaching the short time window where its chances of taking over the world are between 1% and 99%, but it discounts utility by a factor of 10 per day, and so as it hits 10% it would rather try its hand than wait a day for the 90%, so we get a containable breakout?
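
The arithmetic behind that scenario, as a small sketch. It assumes a success payoff normalized to 1, a failure payoff of 0, and no other cost to waiting; `prefers_to_act_now` is a hypothetical helper.

```python
def prefers_to_act_now(p_now, p_later, daily_discount=10.0):
    """True if attempting today beats waiting one day for better odds."""
    value_now = p_now                      # success chance times payoff 1
    value_later = p_later / daily_discount # tomorrow's chance, discounted
    return value_now > value_later
```

With a 10% chance today against 90% tomorrow, the comparison is 0.1 against 0.09, so the impatient agent moves early.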

Comment by gurkenglas on Attainable Utility Preservation: Concepts · 2020-02-17T16:37:03.830Z · score: 4 (2 votes) · LW · GW

The subagent problem remains: How do you prevent it from getting someone else to catastrophically maximize paperclips and leave it at its power level?

Comment by gurkenglas on The Reasonable Effectiveness of Mathematics or: AI vs sandwiches · 2020-02-15T10:47:24.685Z · score: 2 (1 votes) · LW · GW

Two priors could indeed start out diverging such that you cannot reach one from the other with finite evidence. Strange loops help here:

One of the hypotheses the brain's prior admits is that the universe runs on math. This hypothesis predicts what you'd get by having used a mathematical prior from day one. Natural philosophy (and, by today, peer pressure) will get most of us enough evidence to favor it, and then physicists' experiments single out description length as the correct prior.

But the ways in which the brain's prior diverges are still there, just suppressed by updating; and given evidence of magic we could update away again if math is bad enough at explaining it.

Comment by gurkenglas on Does there exist an AGI-level parameter setting for modern DRL architectures? · 2020-02-09T21:07:46.077Z · score: 4 (3 votes) · LW · GW

Yes. Modelspace is huge and we're only exploring a smidgen. The busy beaver sequence hints at how much you can do with a small number of parts and exponential luck. I think feeding a random number generator into a compiler could theoretically have spawned an AGI in the eighties. Given a memory tape, transformers (and much simpler architectures) are Turing-complete. Even if all my reasoning is wrong, can't the model just be hardcoded to output instructions on how to write an AGI?

Comment by gurkenglas on Meta-Preference Utilitarianism · 2020-02-07T15:46:42.156Z · score: 2 (1 votes) · LW · GW

I'm not convinced that utility aggregation can't be objective.

We want to aggregate utilities because of altruism and because it's good for everyone if everyone's AI designs aggregate utilities. Altruism itself is an evolutionary adaptation with similar decision-theoretic grounding. Therefore if we use decision theory to derive utility aggregation from first principles, I expect a method to fall out for free.

Imagine that you find yourself in control of an AI with the power to seize the universe and use it as you command. Almost everyone, including you, prefers a certainty of an equal share of the universe to a lottery's chance at your current position. Your decision theory happens to care not only about your current self, but also about the yous in timelines where you didn't manage to get into this position. You can only benefit them acausally, by getting powerful people in those timelines to favor them. Therefore you look for people that had a good chance of getting into your position. You use your cosmic power to check their psychology for whether they would act as you are currently acting had they gotten into power, and if so, you go reasonably far to satisfy their values. This way, in the timeline where they are in power, you are also in a cushy position.

This scenario is fortunately not horrifying for those who never had a chance to get into your position, because chances are that someone to whom you gave resources cares about them, directly or indirectly. How much everyone gets is now just a matter of acausal bargaining and the shape of their utility returns in resources granted.

Comment by gurkenglas on Plausibly, almost every powerful algorithm would be manipulative · 2020-02-06T22:11:58.613Z · score: 2 (1 votes) · LW · GW

It intuitively seems like you need merely make the interventions run at higher permissions/clearance than the hyperparameter optimizer.

What do I mean by that? In Haskell, so-called monad transformers can add features like nondeterminism and memory to a computation. The natural conflict that results ("Can I remember the other timelines?") is resolved through the order in which the monad transformers were applied. (One way is represented as a function from an initial memory state to a list of timelines and a final memory state, the other as a list of functions from an initial memory state to a timeline and a final memory state.) Similarly, a decent type system should just not let the hyperparameter optimizer see the interventions.
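
To make the ordering point concrete without Haskell, here is a loose Python analogy using plain functions. The two shapes below roughly correspond to stacking state over nondeterminism and nondeterminism over state; the function names are hypothetical and this is a sketch, not a translation of any monad transformer library.

```python
def count_private(choices):
    """state -> list of (timeline, final state): each timeline owns its memory."""
    def run(counter):
        # Every branch starts from the same counter and cannot see the others.
        return [(choice, counter + 1) for choice in choices]
    return run

def count_shared(choices):
    """state -> (list of timelines, final state): one memory threads through all timelines."""
    def run(counter):
        timelines = []
        for choice in choices:
            counter += 1  # later timelines observe earlier writes
            timelines.append((choice, counter))
        return timelines, counter
    return run

private = count_private(["a", "b", "c"])(0)
shared = count_shared(["a", "b", "c"])(0)
```

In the first shape no timeline can remember another; in the second they can. Which one you get is fixed by the type, which is the sense in which a type system could keep the optimizer from seeing the interventions.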

What this might naively come out to is that the hyperparameter optimizer just does not return a defined result unless its training run is finished as it would have been without intervention. A cleverer way I could imagine it being implemented is that the whole thing runs on a dream engine, aka a neural net trained to imitate a CPU at variable resolution. After an intervention, the hyperparameter optimizer would be run to completion on its unchanged dataset at low resolution. For balance reasons, this may not extract any insightful hyperparameter updates from the tail of the calculation, but the intervention would remain hidden. The only thing we would have to prove impervious to the hyperparameter optimizer through ordinary means is the dream engine.

Have fun extracting grains of insight from these mad ramblings :P

Comment by gurkenglas on Category Theory Without The Baggage · 2020-02-05T01:24:06.373Z · score: 5 (3 votes) · LW · GW

Natural transformations can be composed (in two ways) - how does your formulation express this?

Comment by gurkenglas on Category Theory Without The Baggage · 2020-02-04T14:05:36.835Z · score: 2 (1 votes) · LW · GW

But the pattern was already defined as [original category + copy + edges between them + path equivalences] :(

Comment by gurkenglas on Category Theory Without The Baggage · 2020-02-03T23:14:27.846Z · score: 4 (2 votes) · LW · GW

Now we just take our pattern and plug it into our pattern-matcher, as usual.

Presumably, the pattern is the query category. What is the target category? (not to be confused with the part of the pattern you called target - use different names?)

Comment by gurkenglas on Appendix: how a subagent could get powerful · 2020-02-03T14:33:04.930Z · score: 2 (1 votes) · LW · GW

Sounds like my :).

Comment by gurkenglas on [Personal Experiment] Training YouTube's Algorithm · 2020-01-10T01:22:58.668Z · score: 2 (1 votes) · LW · GW

That seems silly, given the money on the line and that you can have your ML architecture take this into account.

Comment by gurkenglas on Causal Abstraction Intro · 2019-12-19T23:45:39.042Z · score: 8 (4 votes) · LW · GW

decided to invest in a high-end studio

I didn't catch that this was a lie until I clicked the link. The linked post is hard to understand - it seems to rely on the reader being similar enough to the author to guess at context. Rest assured that you are confusing someone.

Comment by gurkenglas on Counterfactual Induction · 2019-12-19T23:21:15.739Z · score: 2 (1 votes) · LW · GW

So the valuation of any propositional consequence of A is going to be at least 1, with equality reached when it does as much of the work of proving bottom as it is possible to do in propositional calculus. Letting valuations go above 1 doesn't seem like what you want?

Comment by gurkenglas on Counterfactual Induction · 2019-12-18T23:27:24.531Z · score: 2 (1 votes) · LW · GW

Then that minimum does not make a good denominator because it's always extremely small. It will pick phi to be as powerful as possible to make L small, aka set phi to bottom. (If the denominator before that version is defined at all, bottom is a propositional tautology given A.)

Comment by gurkenglas on Counterfactual Induction · 2019-12-18T13:43:45.827Z · score: 2 (1 votes) · LW · GW

a magma [with] some distinguished element

A monoid?

min,ϕ(A,ϕ⊢⊥) where ϕ is a propositional tautology given A

Propositional tautology given A means A⊢ϕ, right? So ϕ=⊥ would make L small.

Comment by gurkenglas on When would an agent do something different as a result of believing the many worlds theory? · 2019-12-16T08:40:07.092Z · score: 2 (1 votes) · LW · GW

An agent might care about (and acausally cooperate with) all versions of himself that "exist". MWI posits more versions of himself. Imagine that he wants there to exist an artist like he could be, and a scientist like he could be - but the first 50% of universes that contain each are more important than the second 50%. Then in MWI, he could throw a quantum coin to decide what to dedicate himself to, while in CI this would sacrifice one of his dreams.

Comment by gurkenglas on Moloch feeds on opportunity · 2019-12-13T12:02:20.585Z · score: 4 (2 votes) · LW · GW

"I have trouble getting myself doing the right thing, focusing on what selfish reasons I have to do it helps." sounds entirely socially reasonable to me. Maybe that's just because we here believe that picking and choosing what x=selfish arguments to listen to is not aligned with x=selfishness.

Comment by gurkenglas on Towards a New Impact Measure · 2019-12-13T02:16:08.383Z · score: 2 (1 votes) · LW · GW

is penalized whenever the action you choose changes the agent's ability to attain other utilities. One thing an agent might do to leave that penalty at zero is to spawn a subagent, tell it to take over the world, and program it such that if the agent ever tells the subagent it has been counterfactually switched to another reward function, the subagent is to give the agent as much of that reward function as the agent might have been able to get for itself, had it not originally spawned a subagent.

This modification of my approach came not because there is no surgery, but because the penalty is |Q(a)-Q(Ø)| instead of |Q(a)-Q(destroy itself)|. is learned to be the answer to "How much utility could I attain if my utility function were surgically replaced with ?", but it is only by accident that such a surgery might change the world's future, because the agent didn't refactor the interface away. If optimization pressure is put on this, it goes away.

If I'm missing the point too hard, feel free to command me to wait till the end of Reframing Impact so I don't spend all my street cred keeping you talking :).

Comment by gurkenglas on Towards a New Impact Measure · 2019-12-12T01:09:36.210Z · score: 2 (1 votes) · LW · GW

Assessing its ability to attain various utilities after an action requires that you surgically replace its utility function with a different one in a world it has impacted. How do you stop it from messing with the interface, such as by passing its power to a subagent to make your surgery do nothing?

Comment by gurkenglas on Towards a New Impact Measure · 2019-12-11T11:59:27.253Z · score: 2 (1 votes) · LW · GW

If it is capable of becoming more able to maximize its utility function, does it then not already have that ability to maximize its utility function? Do you propose that we reward it only for those plans that pay off after only one "action"?

Comment by gurkenglas on Bayesian examination · 2019-12-11T09:17:48.491Z · score: 2 (1 votes) · LW · GW

Wrong. In the 100k drop, if you know each question has odds 60:40, expected winnings are maximized if you put all on one answer each time, not 60% on one and 40% on the other.

What's not preserved between the two ways to score is which strategy maximizes expected score.
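
A quick numerical check of that difference, assuming a two-answer question with true odds 60:40. "Linear" here means you keep whatever share you placed on the correct answer; the logarithmic rule is used as one example of a proper scoring rule. Function names are hypothetical.

```python
import math

def expected_linear(p_true, bet):
    """Expected payout when you keep the share placed on the correct answer."""
    return p_true * bet + (1 - p_true) * (1 - bet)

def expected_log(p_true, report):
    """Expected logarithmic score for reporting probability `report`."""
    return p_true * math.log(report) + (1 - p_true) * math.log(1 - report)

grid = [i / 100 for i in range(1, 100)]  # candidate shares / reports
best_linear = max([0.0] + grid + [1.0], key=lambda b: expected_linear(0.6, b))
best_log = max(grid, key=lambda r: expected_log(0.6, r))
```

The linear payout is maximized by going all in on the likelier answer, while the logarithmic score is maximized by reporting the true 60%.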

Comment by gurkenglas on Bayesian examination · 2019-12-11T02:52:15.651Z · score: 2 (3 votes) · LW · GW

I agree. Proper scoring rules were introduced to this community 14 years ago.

Comment by gurkenglas on Bayesian examination · 2019-12-11T02:45:10.250Z · score: 3 (2 votes) · LW · GW

Note that linear utility in money would again incentivize people to put everything on the largest probability.

Comment by gurkenglas on Dark Side Epistemology · 2019-12-07T15:50:43.781Z · score: 2 (1 votes) · LW · GW

That prior doesn't work when there is a countable number of hypotheses, aka "I've picked a number from {0,1,2,...}. Which?" or "Given that the laws of physics can be described by a computer program, which?".

Comment by gurkenglas on Vanessa Kosoy's Shortform · 2019-12-07T13:03:57.992Z · score: 2 (1 votes) · LW · GW

What do you mean by equivalent? The entire history doesn't say what the opponent will do later or would do against other agents, and the source code may not allow you to prove what the agent does if it involves statements that are true but not provable.

Comment by gurkenglas on Understanding “Deep Double Descent” · 2019-12-07T02:19:42.726Z · score: 15 (5 votes) · LW · GW

The bottom left picture on page 21 in the paper shows that this is not just regularization coming through only after the error on the training set is ironed out: 0 regularization (1/lambda=inf) still shows the effect.

Can we switch to the interpolation regime early if we, before reaching the peak, tell it to keep the loss constant? Aka we are at loss l* and replace the loss function l(theta) with |l(theta)-l*| or (l(theta)-l*)^2.
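
The proposed replacement loss is easy to write down as a wrapper. This is only a sketch of the substitution itself, with a toy one-parameter loss standing in for a real training loss; whether optimizing it actually moves a network into the interpolation regime is the open question. `hold_loss_at` and `toy_loss` are hypothetical names.

```python
def hold_loss_at(loss_fn, target, squared=True):
    """Wrap loss_fn so that the optimum is 'loss equals target' rather than 'loss is minimal'."""
    def held(theta):
        gap = loss_fn(theta) - target
        return gap * gap if squared else abs(gap)
    return held

def toy_loss(theta):
    """Stand-in loss l(theta) with its minimum at theta = 3."""
    return (theta - 3.0) ** 2

held_sq = hold_loss_at(toy_loss, target=1.0)                  # (l(theta) - l*)^2
held_abs = hold_loss_at(toy_loss, target=1.0, squared=False)  # |l(theta) - l*|
```

Both wrapped losses are zero exactly on the level set where the original loss equals l*, so further optimization moves along that set instead of down the original loss.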

Comment by gurkenglas on Oracles: reject all deals - break superrationality, with superrationality · 2019-12-06T22:11:34.178Z · score: 2 (1 votes) · LW · GW

I haven't heard of these do-operators, but aren't you missing some modal operators? For example, just because you are assuming that you will take the null action, you shouldn't get that knows this. Perhaps do-operators in the end serve a similar purpose? Can you give a variant of the following agent that would reject all deals?

Comment by gurkenglas on Breaking Oracles: superrationality and acausal trade · 2019-12-06T19:18:24.552Z · score: 2 (1 votes) · LW · GW

On that page, you have three comments identical to this one. Each of them links to that same page, which looks like a mislink. So's this link, I guess?

Comment by gurkenglas on On decision-prediction fixed points · 2019-12-05T20:28:24.244Z · score: 3 (2 votes) · LW · GW

As a human who has an intuitive understanding of counterfactuals, if I know exactly what a tic tac toe or chess program would do, I can still ask what would happen if it chose a particular action instead. The same goes if the agent of interest is myself.

Comment by gurkenglas on On decision-prediction fixed points · 2019-12-05T09:49:00.122Z · score: 4 (3 votes) · LW · GW

Someone who knows exactly what they will do can still suffer from akrasia, by wishing they would do something else. I'd say that if the model of yourself saying "I'll do whatever I wish I would" beats every other model you try and build of yourself, that looks like free will. The other way around, you can observe akrasia.

Comment by gurkenglas on Defining AI wireheading · 2019-11-29T00:19:21.793Z · score: 2 (1 votes) · LW · GW

The domes growing bigger and merging does not indicate a paradox of the heap because the function mapping each utility function to its optimal policy is not continuous. There is no reasonably simple utility function, between one that would construct small domes and one that would construct one large dome, that would construct medium-sized domes.

Comment by gurkenglas on Effect of Advertising · 2019-11-26T23:41:43.042Z · score: 7 (2 votes) · LW · GW

Perhaps those 99% could somehow come together to pay consumers of the product to stop buying it, in order to make their suffering matter to that advertiser?