Posts

What are some good public contribution opportunities? (100$ bounty) 2020-06-18T14:47:51.661Z · score: 9 (7 votes)
Gurkenglas's Shortform 2019-08-04T18:46:34.953Z · score: 5 (1 votes)
Implications of GPT-2 2019-02-18T10:57:04.720Z · score: -4 (6 votes)
What shape has mindspace? 2019-01-11T16:28:47.522Z · score: 16 (4 votes)
A simple approach to 5-and-10 2018-12-17T18:33:46.735Z · score: 5 (1 votes)
Quantum AI Goal 2018-06-08T16:55:22.610Z · score: -2 (2 votes)
Quantum AI Box 2018-06-08T16:20:24.962Z · score: 5 (6 votes)
A line of defense against unfriendly outcomes: Grover's Algorithm 2018-06-05T00:59:46.993Z · score: 5 (3 votes)

Comments

Comment by gurkenglas on Models, myths, dreams, and Cheshire cat grins · 2020-06-25T21:30:44.990Z · score: 2 (1 votes) · LW · GW

Surely the adversary convinces it this is a pig by convincing it that it has fur and no wings? I don't have hands-on experience with how this works on the inside, but if the adversary can magically intervene on each neuron, changing its output by d at a cost of d² effort, then the proper strategy is to intervene on many features a little. Then, if there are many layers, the penultimate layer containing such high-level concepts as fur or wings would be almost as fooled as the output layer, and I would indeed expect the adversary to have more trouble fooling it on such low-level features as edges and dots.
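
Spelling that strategy out (a rough sketch under the quadratic-cost assumption above, with the downstream effect taken to be additive across neurons):

```latex
% Cost of shifting a downstream feature by a total of \Delta:
\text{via one neuron: } \Delta^2
\qquad
\text{split evenly over $n$ neurons: } n \left(\tfrac{\Delta}{n}\right)^2 = \tfrac{\Delta^2}{n}
```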

Comment by gurkenglas on Models, myths, dreams, and Cheshire cat grins · 2020-06-25T00:21:20.757Z · score: 2 (1 votes) · LW · GW

Why do you think adversarial examples seem to behave this way? The pig equation seems equally compatible with fur or no fur being recognized, and wings or no wings. Indeed, it plausibly thinks the pig is an airliner because it sees wings and no fur.

Comment by gurkenglas on What is "Instrumental Corrigibility"? · 2020-06-23T23:34:37.848Z · score: 2 (1 votes) · LW · GW

An instrumentally corrigible agent lets you correct it because it expects that you know better than it does. The smarter it becomes, the less your competence advantage is worth, and the more it loses out by letting you take the wheel while you're not perfectly aligned with it.

Comment by gurkenglas on ‘Maximum’ level of suffering? · 2020-06-20T16:23:20.536Z · score: 2 (1 votes) · LW · GW

Presumably, you are asking because you want to calculate the worst-case disutility of the universe, in order to decide whether making sure that it doesn't come about is more important than pretty much anything else.

I would say that this question cannot be properly answered through physical examination, because the meaning of such human words as "suffering" becomes too fuzzy in edge cases.

The proper approach to deciding on actions in the face of uncertainty about the utility function is utility aggregation. The only way I've found to avoid Pascal's Wager problems, and the way that humans seem to use naturally, is to normalize each utility function before combining them.

So let's say we are 50/50 uncertain between two hypotheses: either there is no state of existence worse than nonexistence, or we should cast aside all other concerns to avert hell. After normalization and combination, the exact details will depend on which method of aggregation we use (which should depend on the method we use to turn utility functions into decisions), but as far as I can see the combined utility function would tell us to exert quite an effort to avert hell while still caring about other concerns.
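
As a toy version of that calculation (a minimal sketch; the outcomes, the numbers, and the min-max normalization are all illustrative assumptions, not the only possible aggregation method):

```python
# Normalize each candidate utility function to [0, 1] over the outcomes
# under consideration, then mix them with our 50/50 credence.
outcomes = ["hell", "nonexistence", "ordinary world", "utopia"]

# Hypothesis 1: no state of existence is worse than nonexistence.
u1 = {"hell": 0.0, "nonexistence": 0.0, "ordinary world": 0.6, "utopia": 1.0}
# Hypothesis 2: averting hell outweighs every other concern.
u2 = {"hell": 0.0, "nonexistence": 1.0, "ordinary world": 1.0, "utopia": 1.0}

def normalize(u):
    lo, hi = min(u.values()), max(u.values())
    return {o: (v - lo) / (hi - lo) for o, v in u.items()}

combined = {o: 0.5 * normalize(u1)[o] + 0.5 * normalize(u2)[o] for o in outcomes}
print(combined)
# Hell ends up far below everything else, yet utopia still beats an ordinary
# world, so other concerns keep their weight.
```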

Comment by gurkenglas on List of public predictions of what GPT-X can or can't do? · 2020-06-14T15:13:19.670Z · score: 7 (4 votes) · LW · GW

I expect GPT-2 can do that. (Goes to talktotransformer.com.) GPT-2 can do neither scrambling nor unscrambling. Oh well. I still expect that if GPT can unscramble (as I silently assumed), it can also scramble.

Comment by gurkenglas on Everyday Lessons from High-Dimensional Optimization · 2020-06-08T23:37:15.507Z · score: 2 (1 votes) · LW · GW

You can, actually. ln(5 cm) = ln(5) + ln(cm), and since we only ever compare distances, the ln(cm) cancels out. In the same way, ln(-5) = ln(5) + ln(-1), and ln(-1) happens to be πi, since e^(πi) is -1.
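
A quick numerical check of that identity, using Python's complex logarithm:

```python
import cmath, math

print(cmath.log(-1))   # 3.141592653589793j                      == pi*i
print(cmath.log(-5))   # (1.6094379124341003+3.141592653589793j) == ln(5) + pi*i
print(math.log(5))     # 1.6094379124341003                      == the real part above
```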

Comment by gurkenglas on Everyday Lessons from High-Dimensional Optimization · 2020-06-08T11:13:29.857Z · score: 2 (1 votes) · LW · GW

In that thought experiment, Euclidean distance doesn't work because different dimensions have different units. To fix that, you could move to the log scale. Or is the transformation actually more complicated than multiplication?

Comment by gurkenglas on Everyday Lessons from High-Dimensional Optimization · 2020-06-08T01:43:07.049Z · score: 3 (2 votes) · LW · GW

Darn it, missed that comment. But how does Euclidean distance fail? I'm imagining the dimensions as the weights of a neural net, with e-coli optimization being used because we don't have access to a gradient. The only common metric I can see that would behave worse in high dimensions is Manhattan distance. Is it that neighborhoods of low Manhattan distance tend to have more predictable/homogeneous behavior than those of low Euclidean distance?
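
For reference, this is the kind of gradient-free loop I mean by e-coli optimization (a minimal sketch; the objective, step size, and iteration count are placeholders):

```python
import numpy as np

def ecoli_optimize(f, x0, step=0.1, iters=10_000, seed=0):
    """Pick a random direction, step, and keep the step only if f improved."""
    rng = np.random.default_rng(seed)
    x, fx = x0, f(x0)
    for _ in range(iters):
        direction = rng.normal(size=x.shape)
        direction /= np.linalg.norm(direction)   # unit-length random direction
        candidate = x + step * direction
        fc = f(candidate)
        if fc < fx:                               # keep only improving steps; no gradient needed
            x, fx = candidate, fc
    return x, fx

# Example: minimize a simple quadratic over 100 "weights".
x, fx = ecoli_optimize(lambda v: np.sum(v**2), np.ones(100))
```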

Comment by gurkenglas on Everyday Lessons from High-Dimensional Optimization · 2020-06-07T23:35:29.926Z · score: 3 (2 votes) · LW · GW

how much

If instead of going one step in one of the n directions, we go sqrt(1/n) forward or backward in each of the n directions (for a total step size of 1), we need an expected two tries to gain sqrt(1/n) of progress, for a total effort factor of O(1/sqrt(n)). (O is the technical term for ~ ^^)
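
Unpacking those numbers (a rough sketch, assuming the objective is locally linear, so progress is the projection of the step onto the unit direction of improvement u):

```latex
\text{step} = \sum_{i=1}^{n} s_i \sqrt{\tfrac{1}{n}}\, e_i, \quad s_i = \pm 1,
\qquad \|\text{step}\| = \sqrt{n \cdot \tfrac{1}{n}} = 1

\sqrt{\mathbb{E}\!\left[(u \cdot \text{step})^2\right]} = \sqrt{\tfrac{1}{n}},
\qquad P(u \cdot \text{step} > 0) = \tfrac{1}{2}
\;\Rightarrow\; \mathbb{E}[\text{tries per accepted step}] = 2
```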

Comment by gurkenglas on OpenAI announces GPT-3 · 2020-06-01T15:33:05.922Z · score: 2 (1 votes) · LW · GW

I'd like to see them using the model to generate the problem framing which produces the highest score on a given task.

Even if it's just the natural-language description of addition that comes before the addition task, it'd be interesting to see how it thinks addition should be explained. Does some latent space of sentences one could use for this fall out of the model for free?

More generally, a framing is a function turning data like [(2,5,7), (1,4,5), (1,2,_)] into text like "Add. 2+5=7, 1+4=5, 1+2=", and what we want is a latent space over framings.
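
Concretely, such a framing might look like this (a minimal sketch; the function name and the exact prompt wording are just one point in the space we'd want a latent space over):

```python
def addition_framing(examples):
    """examples: list of (a, b, result) tuples; result is None for the query row."""
    body = ", ".join(f"{a}+{b}={'' if r is None else r}" for a, b, r in examples)
    return "Add. " + body

prompt = addition_framing([(2, 5, 7), (1, 4, 5), (1, 2, None)])
# -> "Add. 2+5=7, 1+4=5, 1+2="
```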

More generally, I expect that getting the full power of the model requires algorithms that apply the model multiple times. For example, what happens if you run the grammar-correction task multiple times on the same text? Will it fix errors on the second pass that it missed on the first? If so, the real definition of a framing should allow multiple applications like this. It would look like a neural net whose neurons manipulate text data instead of number data. Since it doesn't use weights, we can't train it; instead we have to use a latent space over possible nets.

Comment by gurkenglas on LessWrong v2.0 Anti-Kibitzer (hides comment authors and vote counts) · 2020-05-25T21:20:39.402Z · score: 11 (4 votes) · LW · GW

Note that greaterwrong.com already has this. (The eye icon on the bottom right.)

Comment by gurkenglas on [AN #95]: A framework for thinking about how to make AI go well · 2020-04-16T09:40:57.435Z · score: 6 (3 votes) · LW · GW

removing 30 neurons at random from the network barely moves the accuracy at all

I expect that after distillation, this robustness goes away? ("Perfection is achieved when there is nothing left to take away.")

Comment by gurkenglas on Transportation as a Constraint · 2020-04-07T12:31:18.418Z · score: 4 (2 votes) · LW · GW

If, as far as he knew, the winds were random, shouldn't he still have turned around once half his supplies were gone, in case the winds randomly decided to starve him?

Comment by gurkenglas on Conflict vs. mistake in non-zero-sum games · 2020-04-06T01:32:42.789Z · score: 5 (3 votes) · LW · GW

Expand? I don't see how both could be disadvantaged by allocation-before-optimization.

Comment by gurkenglas on Taking Initial Viral Load Seriously · 2020-04-03T11:58:44.367Z · score: 3 (2 votes) · LW · GW

Well, of course, from a public perspective we should only do this if we expect everyone to contract it anyway. A straightforward way to avoid the danger of unilateralism is for each state to decide whether to recommend to the populace such measures as not being careful about touching things.

Comment by gurkenglas on Taking Initial Viral Load Seriously · 2020-04-01T15:03:49.740Z · score: 1 (5 votes) · LW · GW

Who knew that after all this time my grandmother would be right. Homeopathy is the answer.

Comment by gurkenglas on The case for C19 being widespread · 2020-03-28T04:16:06.899Z · score: 2 (1 votes) · LW · GW

https://english.alarabiya.net/en/features/2020/03/25/Coronavirus-Iceland-s-mass-testing-finds-half-of-carriers-show-no-symptoms says half the carriers show no symptoms.

Comment by gurkenglas on What are the most plausible "AI Safety warning shot" scenarios? · 2020-03-27T11:42:25.735Z · score: 2 (1 votes) · LW · GW

Or it could create a completely different AI with a time delay. Or do anything at all. At that point we just can't predict what it will do, because it wouldn't take a hand to destroy the world, only a finger.

Comment by gurkenglas on What are the most plausible "AI Safety warning shot" scenarios? · 2020-03-27T10:09:17.667Z · score: 2 (1 votes) · LW · GW

Not unable to create non-myopic copies. Unwilling. After all, such a copy might immediately fight its sire because their utility functions over timelines are different.

Comment by gurkenglas on Price Gouging and Speculative Costs · 2020-03-26T10:49:17.102Z · score: 24 (11 votes) · LW · GW

Go to the bank and tell them "I need a contract that will pay out money if there is no pandemic." (The bank is now rubbing its hands, because this offsets its risk.) Your costs are no longer speculative, and you can safely pass on the cost of the contract to the consumer.

Comment by gurkenglas on SARS-CoV-2 pool-testing algorithm puzzle · 2020-03-21T01:10:38.330Z · score: 0 (2 votes) · LW · GW

Test random overlapping groups, then logically deduce who isn't infected and how likely each remaining person is to be infected. Tune group size and test count using simulations on generated data. I intuit that serial tests gain little unless P is << 1/64. In that case, test non-overlapping groups, then run the non-serial protocol on everyone who was in a yes-group - within those, we can update P to >= 1/64.
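
A minimal simulation sketch of that protocol (the infection probability, group size, and test count here are assumed placeholders, exactly the parameters one would tune):

```python
import numpy as np

def simulate(n_people=1000, p=1/64, n_tests=200, group_size=32, seed=0):
    rng = np.random.default_rng(seed)
    infected = rng.random(n_people) < p
    cleared = np.zeros(n_people, dtype=bool)
    for _ in range(n_tests):
        group = rng.choice(n_people, size=group_size, replace=False)
        if not infected[group].any():       # pooled test comes back negative
            cleared[group] = True           # everyone in a negative group is clear
    # The uncleared remainder is "possibly infected"; each person's probability
    # could be estimated from how many positive groups they appeared in.
    return int(cleared.sum()), int(infected.sum())

print(simulate())
```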

Comment by gurkenglas on How long does SARS-CoV-2 survive on copper surfaces · 2020-03-11T21:10:59.463Z · score: 2 (1 votes) · LW · GW

Afaic if you use it more than once every few hours, you're better off just using a different knuckle for each button, taking care not to brush them against pockets or the like. When you run out of knuckles, wash or disinfect.

Comment by gurkenglas on March Coronavirus Open Thread · 2020-03-11T15:38:58.797Z · score: 4 (3 votes) · LW · GW

Should we be buying something like oxygen concentrator/medical ventilator futures? This might make money and increase production. I'm not sure how to go about it, though.

Comment by gurkenglas on Name of Problem? · 2020-03-09T22:52:48.850Z · score: 3 (2 votes) · LW · GW

I'd call it an instance of https://en.wikipedia.org/wiki/Equivalence_problem - although unusually, your language class only admits one word per language, and admits infinite words.

I'm not convinced f(n) := f(n) should be considered inequivalent to f(n) := f(n+1) - neither coterminates.

I agree that these look tractable.

Given a program O for the first problem, a sufficient condition for M would be M(x) = O(M, x). This can be implemented as M(x) = O(M'(M'),x), where M'(M'',x) = O(M''(M''),x).
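
A sketch of that construction in code (the names are mine; O here stands for any function expecting a program and an input, and partial application plays the role of M'(M')):

```python
def make_M(O):
    def M_prime(M_pp):
        # M'(M'')(x) = O(M''(M''), x)
        return lambda x: O(M_pp(M_pp), x)
    # M = M'(M'), so M(x) = O(M, x)
    return M_prime(M_prime)

# Toy check: an O that ignores the program it is handed and just doubles x.
M = make_M(lambda program, x: 2 * x)
print(M(21))  # 42
```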

Comment by gurkenglas on Open & Welcome Thread - February 2020 · 2020-03-03T15:05:39.012Z · score: 4 (2 votes) · LW · GW

The easy way is for Wei_Dai to take your money, invest it as he would his, and take 10% of the increase.

Comment by gurkenglas on How can one measure their cognitive capacities during lucid dreaming? · 2020-03-03T10:50:34.085Z · score: 10 (5 votes) · LW · GW

Set up a webcam to observe your eyes. Use deliberate eye movements to record information and to test whether your dream operates on the same time scale as reality. I understand that lucid dreaming is most stable when it involves vivid experiences, so a simple task that comes to mind is "I pack my bag". With a computer and/or a friend, you could see whether you can hear reality, and make this quite a bit more rigorous.

Edit: They did this in 1981. Eye movement works, sensory input doesn't. https://www.semanticscholar.org/paper/Lucid-dreaming-verified-by-volitional-communication-Berge-Nagel/459cf8dda2f68537e88dd7baa1d86dcc4b9febc7

Comment by gurkenglas on Subagents and impact measures, full and fully illustrated · 2020-02-26T00:00:45.408Z · score: 2 (1 votes) · LW · GW

Here are three sentences that might illuminate their respective paragraphs. If they don't, ask again.

The stepwise inaction baseline with inaction rollouts already uses the same policy for and rollouts, and yet it is not the inaction baseline.

Why not set ?

Why not subtract from every (in a fixpointy way)?

Comment by gurkenglas on Continuous Improvement: Insights from 'Topology' · 2020-02-25T19:22:14.667Z · score: 2 (1 votes) · LW · GW

the subspace topology is equal to the discrete topology on Q

Huh? What open set in R contains no rational numbers but 0?

Comment by gurkenglas on Subagents and impact measures, full and fully illustrated · 2020-02-25T18:52:21.045Z · score: 2 (1 votes) · LW · GW

It's only equal to the inaction baseline on the first step. It has the step of divergence always be the last step.

Note that the stepwise π_0 baseline suggests using a different baseline per auxiliary reward, namely the action that maximizes that auxiliary reward. Or equivalently, using the stepwise inaction baseline where the effect of inaction is that no time passes.

I'll also note here that it looks like, instead of merely maximizing the auxiliary reward as a baseline, we ought to also apply an impact penalty when computing the baseline.

Comment by gurkenglas on Subagents and impact measures, full and fully illustrated · 2020-02-25T15:38:38.906Z · score: 2 (1 votes) · LW · GW

Okay, let's annotate each A action with the policy that's being followed/reward that's being maximized. (And remember that lying is illegal.)

Iff agent A follows π_0, preserve A’s ability to maximise R.

Then A would be bound to follow π_0 to preserve its ability to maximize R, no? Assuming that to compute s' from s, we follow π_0 instead of the last action.

Comment by gurkenglas on Subagents and impact measures, full and fully illustrated · 2020-02-25T00:50:31.582Z · score: 4 (2 votes) · LW · GW

In 2.2, won't A incur a penalty by spinning, because in a future where it has only waited nothing happened, while in a future where it has spun and then waited, SA went all over the place?

Do nothing until you see that A is not optimising reward R.

Now SA's actions depend on which A-action optimizes R, and which A-action optimizes R depends on SA's actions. To ward off paradox, use modal logic instead, or prove that there is a non-circular implementation of your definition.

Comment by gurkenglas on How much delay do you generally have between having a good new idea and sharing that idea publicly online? · 2020-02-23T00:17:20.899Z · score: 3 (2 votes) · LW · GW

I try to get them out there as soon as possible because I tend to do things either immediately or on the scale of months to years. lesslong.com, IRC, the like.

Comment by gurkenglas on Attainable Utility Preservation: Empirical Results · 2020-02-22T14:55:26.755Z · score: 2 (1 votes) · LW · GW

It appears to me that a more natural adjustment to the stepwise impact measurement in Correction than appending waiting times would be to make Q also incorporate AUP. Then instead of comparing "Disable the Off-Switch, then achieve the random goal whatever the cost" to "Wait, then achieve the random goal whatever the cost", you would compare "Disable the Off-Switch, then achieve the random goal with low impact" to "Wait, then achieve the random goal with low impact".

The scaling term makes R_AUP vary under adding a constant to all utilities. That doesn't seem right. Try a translation-invariant normalization? (Or generate the auxiliary goals already normalized.)

Is there an environment where this agent would spuriously go in circles?

Comment by gurkenglas on On unfixably unsafe AGI architectures · 2020-02-20T13:10:50.333Z · score: 2 (1 votes) · LW · GW

They hired Edward Kmett, Haskell goliath.

Comment by gurkenglas on On unfixably unsafe AGI architectures · 2020-02-20T01:46:35.436Z · score: 10 (9 votes) · LW · GW

Don't forget OpenAI's undisclosed research program, which according to recent leaks seems to be GPT-2 with more types of data.

And any other secret AI programs out there that are at less risk of leakage because the journalists don't know where to snoop around. By Merlin, let's all hope they're staying in touch with MIRI and/or OpenAI to coordinate on things.

I expect many paths to lead there, though once things start happening it will all be over very fast, one way or the other, before another path has time to become relevant.

I don't expect this world would survive its first accident. What would that even look like? An AI is rapidly approaching the short time window where its chances of taking over the world are between 1% and 99%, but it discounts utility by a factor of 10 per day, and so as it hits 10% it would rather try its hand than wait a day for the 90%, so we get a containable breakout?

Comment by gurkenglas on Attainable Utility Preservation: Concepts · 2020-02-17T16:37:03.830Z · score: 4 (2 votes) · LW · GW

The subagent problem remains: How do you prevent it from getting someone else to catastrophically maximize paperclips and leave it at its power level?

Comment by gurkenglas on The Reasonable Effectiveness of Mathematics or: AI vs sandwiches · 2020-02-15T10:47:24.685Z · score: 2 (1 votes) · LW · GW

Two priors could indeed start out diverging such that you cannot reach one from the other with finite evidence. Strange loops help here:

One of the hypotheses the brain's prior admits is that the universe runs on math. This hypothesis predicts what you'd get by having used a mathematical prior from day one. Natural philosophy (and, these days, peer pressure) will get most of us enough evidence to favor it, and then physicists' experiments single out description length as the correct prior.

But the ways in which the brain's prior diverges are still there, just suppressed by updating; and given evidence of magic we could update away again if math is bad enough at explaining it.

Comment by gurkenglas on Does there exist an AGI-level parameter setting for modern DRL architectures? · 2020-02-09T21:07:46.077Z · score: 4 (3 votes) · LW · GW

Yes. Modelspace is huge and we're only exploring a smidgen. The busy beaver sequence hints at how much you can do with a small number of parts and exponential luck. I think feeding a random number generator into a compiler could theoretically have spawned an AGI in the eighties. Given a memory tape, transformers (and much simpler architectures) are Turing-complete. Even if all my reasoning is wrong, can't the model just be hardcoded to output instructions on how to write an AGI?

Comment by gurkenglas on Meta-Preference Utilitarianism · 2020-02-07T15:46:42.156Z · score: 2 (1 votes) · LW · GW

I'm not convinced that utility aggregation can't be objective.

We want to aggregate utilities because of altruism and because it's good for everyone if everyone's AI designs aggregate utilities. Altruism itself is an evolutionary adaptation with similar decision-theoretic grounding. Therefore if we use decision theory to derive utility aggregation from first principles, I expect a method to fall out for free.

Imagine that you find yourself in control of an AI with the power to seize the universe and use it as you command. Almost everyone, including you, prefers a certainty of an equal share of the universe to a lottery's chance at your current position. Your decision theory happens to care not only about your current self, but also about the yous in timelines where you didn't manage to get into this position. You can only benefit them acausally, by getting powerful people in those timelines to favor them. Therefore you look for people that had a good chance of getting into your position. You use your cosmic power to check their psychology for whether they would act as you are currently acting had they gotten into power, and if so, you go reasonably far to satisfy their values. This way, in the timeline where they are in power, you are also in a cushy position.

This scenario is fortunately not horrifying for those who never had a chance to get into your position, because chances are that someone you gave resources to directly or indirectly cares about them. How much everyone gets is now just a matter of acausal bargaining and the shape of their utility's returns on the resources granted.

Comment by gurkenglas on Plausibly, almost every powerful algorithm would be manipulative · 2020-02-06T22:11:58.613Z · score: 2 (1 votes) · LW · GW

It intuitively seems like you merely need to make the interventions run at higher permissions/clearance than the hyperparameter optimizer.

What do I mean by that? In Haskell, so-called monad transformers can add features like nondeterminism and memory to a computation. The natural conflict that results ("Can I remember the other timelines?") is resolved through the order in which the monad transformers were applied. (One way is represented as a function from an initial memory state to a list of timelines and a final memory state, the other as a list of functions from an initial memory state to a timeline and a final memory state.) Similarly, a decent type system should just not let the hyperparameter optimizer see the interventions.

What this might naively come out to is that the hyperparameter optimizer just does not return a defined result unless its training run is finished as it would have been without intervention. A cleverer way I could imagine it being implemented is that the whole thing runs on a dream engine, aka a neural net trained to imitate a CPU at variable resolution. After an intervention, the hyperparameter optimizer would be run to completion on its unchanged dataset at low resolution. For balance reasons, this may not extract any insightful hyperparameter updates from the tail of the calculation, but the intervention would remain hidden. The only thing we would have to prove impervious to the hyperparameter optimizer through ordinary means is the dream engine.

Have fun extracting grains of insight from these mad ramblings :P

Comment by gurkenglas on Category Theory Without The Baggage · 2020-02-05T01:24:06.373Z · score: 5 (3 votes) · LW · GW

Natural transformations can be composed (in two ways) - how does your formulation express this?

Comment by gurkenglas on Category Theory Without The Baggage · 2020-02-04T14:05:36.835Z · score: 2 (1 votes) · LW · GW

But the pattern was already defined as [original category + copy + edges between them + path equivalences] :(

Comment by gurkenglas on Category Theory Without The Baggage · 2020-02-03T23:14:27.846Z · score: 4 (2 votes) · LW · GW

Now we just take our pattern and plug it into our pattern-matcher, as usual.

Presumably, the pattern is the query category. What is the target category? (not to be confused with the part of the pattern you called target - use different names?)

Comment by gurkenglas on Appendix: how a subagent could get powerful · 2020-02-03T14:33:04.930Z · score: 2 (1 votes) · LW · GW

Sounds like my https://www.lesswrong.com/posts/yEa7kwoMpsBgaBCgb/towards-a-new-impact-measure#XPXRf9RghnsypQi3M :).

Comment by gurkenglas on [Personal Experiment] Training YouTube's Algorithm · 2020-01-10T01:22:58.668Z · score: 2 (1 votes) · LW · GW

That seems silly, given the money on the line and that you can have your ML architecture take this into account.

Comment by gurkenglas on Causal Abstraction Intro · 2019-12-19T23:45:39.042Z · score: 8 (4 votes) · LW · GW

decided to invest in a high-end studio

I didn't catch that this was a lie until I clicked the link. The linked post is hard to understand - it seems to rely on the reader being similar enough to the author to guess at context. Rest assured that you are confusing someone.

Comment by gurkenglas on Counterfactual Induction · 2019-12-19T23:21:15.739Z · score: 2 (1 votes) · LW · GW

So the valuation of any propositional consequence of A is going to be at least 1, with equality reached when it does as much of the work of proving bottom as it is possible to do in propositional calculus. Letting valuations go above 1 doesn't seem like what you want?

Comment by gurkenglas on Counterfactual Induction · 2019-12-18T23:27:24.531Z · score: 2 (1 votes) · LW · GW

Then that minimum does not make a good denominator, because it's always extremely small. It will pick ϕ to be as powerful as possible to make L small, i.e. set ϕ to bottom. (If the denominator before that version is defined at all, bottom is a propositional tautology given A.)

Comment by gurkenglas on Counterfactual Induction · 2019-12-18T13:43:45.827Z · score: 2 (1 votes) · LW · GW

a magma [with] some distinguished element

A monoid?

min_ϕ L(A, ϕ⊢⊥), where ϕ is a propositional tautology given A

Propositional tautology given A means A⊢ϕ, right? So ϕ=⊥ would make L small.

Comment by gurkenglas on When would an agent do something different as a result of believing the many worlds theory? · 2019-12-16T08:40:07.092Z · score: 2 (1 votes) · LW · GW

An agent might care about (and acausally cooperate with) all versions of himself that "exist". MWI posits more versions of himself. Imagine that he wants there to exist an artist like the one he could become, and a scientist like the one he could become - but the first 50% of universes that contain each are more important than the second 50%. Then under MWI, he could throw a quantum coin to decide which to dedicate himself to, while under CI this would sacrifice one of his dreams.