## Comments

**crabman** on Novum Organum: Preface · 2019-09-21T14:00:03.737Z · score: 1 (1 votes) · LW · GW

retain ·the evidence of· the senses subject to certain constraints

If you are led •by the evidence of your senses

What does Francis mean by "senses" here?

In the passage that says it's madness to do intellectual work without tools, I am not sure I understand what those tools are. I guess they are something like principles of empiricism and rationality. If Francis looked at the state of science at different points in time between when he wrote the book and now, would he say that scientists mostly used the tools? Because science seems to be in OK shape now.

I think the book's title "The New Organon" is the bailey, and the passage about not wanting rivalry with the ancient philosophers is the motte.

**crabman** on crabman's Shortform · 2019-09-15T19:25:15.701Z · score: 1 (1 votes) · LW · GW

Many biohacking guides suggest taking melatonin. Does liquid melatonin degrade at high temperature, e.g. if added to tea at 95 degrees Celsius?

More general question: how do I even find answers to questions like this one?

**crabman** on crabman's Shortform · 2019-09-14T12:30:37.731Z · score: 15 (4 votes) · LW · GW

A competition on solving math problems via AI is coming. https://imo-grand-challenge.github.io/

- The problems are from the International Mathematical Olympiad (IMO).
- They want to formalize all the problems in the language of the Lean theorem prover. They haven't fully worked out how to do that, e.g. how to formalize problems of the form "determine the set of objects satisfying the given property", as can be seen in https://github.com/IMO-grand-challenge/formal-encoding/blob/master/design/determine.lean
- A contestant must submit a computer program that takes a problem's description as input and outputs a solution and its proof in Lean.
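To illustrate the encoding difficulty (my own hedged sketch, not the challenge's actual API): a "determine" problem asks the solver to produce the answer set, and a naive iff-style formalization leaks that answer into the statement.

```lean
-- Hypothetical sketch. "Determine all natural numbers n with n^2 = n."
-- Stating it as an iff gives the answer {0, 1} away in the statement:
theorem determine_leaky (n : Nat) : n ^ 2 = n ↔ n = 0 ∨ n = 1 := by
  sorry
-- The challenge's encoding must instead require the program to produce
-- the answer set itself and then prove this characterization of it.
```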

I would guess that this is partly a way to promote Lean. I think it would be interesting to pose questions about this on Metaculus.

**crabman** on Free-to-Play Games: Three Key Trade-Offs · 2019-09-10T18:25:40.272Z · score: 3 (3 votes) · LW · GW

Where does Eternal, a game you seem to like, lie in the space of these tradeoffs?

**crabman** on Buck's Shortform · 2019-08-18T10:17:57.936Z · score: 11 (5 votes) · LW · GW

How do you spend the time with the tutor? Whenever I've tried studying with a tutor, it didn't seem more efficient than studying from a textbook. Also, when I study on my own, I interleave reading new material with doing exercises, but with a tutor it would seem wasteful to do exercises during the tutoring time.

**crabman** on Rethinking Batch Normalization · 2019-08-07T14:08:08.256Z · score: 1 (1 votes) · LW · GW

I want to clarify in what domain this theory of batch normalization holds.

The evidence we have is mostly about batch normalization in the kinds of feedforward neural networks commonly used as of 2019, right? So residual CNNs, VGG-like CNNs, other CNNs, and transformers. Maybe other types of feedforward networks, but not RNNs.

Has anyone explored the applicability of batch normalization or similar techniques to non-neural-network functions that we optimize by gradient-descent-like algorithms? Perhaps to tensor networks?

**crabman** on AALWA: Ask any LessWronger anything · 2019-08-03T11:58:14.723Z · score: 1 (1 votes) · LW · GW

Hi. At http://www.weidai.com/everything.html you say:

Why do we believe that both the past and the future are not completely random, but the future is more random than the past?

I don't understand what you mean by saying that the future is more random than the past. Care to explain?

**crabman** on Understanding Batch Normalization · 2019-08-01T22:27:53.242Z · score: 2 (2 votes) · LW · GW

How do we check empirically or otherwise whether this explanation of what batch normalization does is correct?

I am imagining this internal covariate shift thing like this: the neural network together with its loss is a function which takes parameters θ as input and outputs a real number. Large internal covariate shift means that if we choose ε>0, perform some SGD steps, get some θ, and look at the function's graph in an ε-neighborhood of θ, it doesn't really look like a plane; it's more curvy. And small internal covariate shift means the function's graph looks more like a plane, hence gradient descent works better. Is this intuition correct?

Why does the internal covariate shift become smaller, even though we have the learnable γ and β terms?
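For reference, a minimal numpy sketch (my own, with hypothetical names) of the transformation in question, including the learnable scale and shift:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch-norm forward pass: standardize each feature over the batch,
    then apply the learnable scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(3.0, 5.0, size=(32, 4))     # activations far from N(0, 1)
y = batch_norm_forward(x, np.ones(4), np.zeros(4))
print(np.allclose(y.mean(axis=0), 0.0, atol=1e-7))  # standardized output
```

With gamma = 1 and beta = 0 the output is exactly standardized; nonzero learned values undo that, which is what the question above is pointing at.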

About the example: it seems to me that the main problem is that if a gradient descent step changes the sign of an even number of weights, the step may not achieve anything. Can we fix that somehow? What if we make an optimizer that allows only one weight to change sign per iteration? For actual neural networks, allow only weights from one layer to change sign at any given step. Which layer to choose? The one where the most components want to change sign. (I am not sure what to do about the fact that we use activation functions and biases.)

Does batch normalization really cause the distribution of a neuron's activations to be more Gaussian? Is that an empirical observation of what happens when a neural network with batch normalization is trained by an SGD-like optimizer?

P.S. I like that you are posting about deep learning on LessWrong. Maybe there are many rationalists who practice machine learning but aren't sure there are other people like that on LessWrong, so they don't post about it here?

**crabman** on Insights from Linear Algebra Done Right · 2019-07-14T19:22:49.628Z · score: 3 (2 votes) · LW · GW

Ok. It's just that when I learned that, we didn't even talk about dual spaces in the linear-algebraic sense; we worked just fine in $\mathbb{R}^n$.

**crabman** on Insights from Linear Algebra Done Right · 2019-07-14T18:59:19.663Z · score: 1 (1 votes) · LW · GW

Do you mean solving convex optimization problems by solving their dual problems instead?

**crabman** on Insights from Linear Algebra Done Right · 2019-07-14T18:42:50.106Z · score: 8 (3 votes) · LW · GW

I wonder what you think about the chapter on dual spaces, dual maps, annihilators, etc.? To me it seemed not very connected with everything else, and that's bad. If I remember correctly, the author uses duality just to prove a few results and then throws duality away and never uses it again. Also, in real life (numerical linear algebra, machine learning, and so on) I am not aware of any use for those concepts.

So for “general” operators, this is always true, but there do exist specific operators for which it isn’t.

I believe when mathematicians say that P(x) holds *in general*, they mean that P(x) holds for every x in the domain of interest. Perhaps you want to use *typical* instead of *general* here. E.g. there is a notion called the *typical* tensor rank of tensors of a given shape, meaning a tensor rank that occurs with non-zero probability when a random tensor of that shape is sampled.

**crabman** on Everybody Knows · 2019-07-08T13:11:11.641Z · score: -5 (5 votes) · LW · GW

including 20% who think the Sun revolves around the Earth

The sun **does** revolve around the Earth. The sun revolving around the Earth is equivalent to the Earth revolving around the sun; it's a choice of reference frame.

**crabman** on What would be the signs of AI manhattan projects starting? Should a website be made watching for these signs? · 2019-07-06T13:05:48.605Z · score: 2 (2 votes) · LW · GW

Why **the** government? Perhaps **a** government?

**crabman** on Reneging prosocially by Duncan Sabien · 2019-06-19T04:17:01.397Z · score: -3 (3 votes) · LW · GW

I have doubts about

If I loan you an item ostensibly for a month, and regret it, I will do significantly less damage asking you to return it in a week than asking you to return it immediately.

So, a good Samaritan, Alice, lent her friend John an item for some time, then realized she wants to use the item herself. It's already the case that only John is getting something out of this agreement, so why should Alice take on the additional cost of waiting another week?

Edit: unless John actually paid Alice for borrowing her item. IMO people should pay each other money for various acts that provide value much more often than they do.

**crabman** on And My Axiom! Insights from 'Computability and Logic' · 2019-05-10T17:29:22.982Z · score: 3 (2 votes) · LW · GW

It seems "Computability and Logic" doesn't include Kleene's recursion theorem and Rice's theorem. What sources would you recommend for learning those theorems, their proofs, and their corollaries? Also, which chapters of "Computability and Logic" are required to understand them?

**crabman** on AI Safety Prerequisites Course: Revamp and New Lessons · 2019-03-16T16:04:12.213Z · score: 1 (1 votes) · LW · GW

Perhaps he's talking about textbooks?

**crabman** on Embedded World-Models · 2019-03-16T00:52:02.095Z · score: 3 (2 votes) · LW · GW

I think this article is too vague: for almost all claims in it, I am not sure I understand the author correctly. Below I post my notes. If you want to help me and others clarify our understanding of this article, consider answering the **questions in bold**, or, if you see a mistake in my notes, correcting it. I also hope my notes help the author as feedback. I've only finished 2/3 of the article so far, but I'm posting the notes now because I might become less interested later.

It's also unfortunate that, unlike in the https://intelligence.org/2018/11/02/embedded-models/ version of this article, we don't have hyperlinks to explanations of the various concepts here. Perhaps you could add them under the corresponding images? Or have the images themselves be hyperlinks, or reference links (like in academic articles) pointing to the bottom of the document, where all the relevant links would be grouped by image number.

The post says an embedded agent can't hold an exact model of the environment in its head, can't think through the consequences of every potential course of action, and can't hold in its head every possible way the environment could be. **I think this may not necessarily be true, and I am not sure what assumptions the author is using here.**

**It seems the whole article assumes countable probability spaces (even before the AIXI part). I wonder why and I wonder how realizability is defined for uncountable probability space.**

--

Regarding relative bounded loss and what this bound is for, my best guess is as follows. Here I use non-conditional probability notation P(e ∧ h) instead of P(h)P(e | h). Suppose some event e is actually true. Let h be some "expert" event in the probability space. According to the prior, the probability of e equals P(e), and its log-probability has the lower bound log P(e) ≥ log P(h) + log P(e | h). According to the expert h, the probability of e equals just P(e | h), and its log-probability equals log P(e | h). I conclude that the relative bounded loss is the difference between the expert h's log-probability and the prior's log-probability, log P(e | h) − log P(e), which turns out to be at most −log P(h).

Initially, P(h) is your initial trust in expert h, and in each case where it is even a little bit more correct than you, you increase your trust accordingly; the way you do this ensures you assign the expert probability 1 and hence copy it precisely before you lose more than −log P(h) compared to it.

Remember, P(h | e) = P(h) P(e | h) / P(e). It follows that the probability of h increases given evidence e if and only if P(e | h) > P(e), i.e. h "is even a little bit more correct than you". **But I don't understand the bit about copying the expert h precisely before losing more than −log P(h), because losing more than −log P(h) is logically impossible (assuming P(h) > 0), as was shown above.**
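For reference, the full derivation of the bound I have in mind (my own sketch, with e the observed evidence and h the expert):

```latex
P(e) \;=\; \sum_{h'} P(h')\,P(e \mid h') \;\ge\; P(h)\,P(e \mid h)
\quad\Longrightarrow\quad
\log P(e \mid h) - \log P(e) \;\le\; -\log P(h).
```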

Combining this with the previous idea about viewing Bayesian learning as a way of allocating “trust” to “experts” which meets a bounded loss condition, we can see the Solomonoff prior as a kind of ideal machine learning algorithm which can learn to act like any algorithm you might come up with, no matter how clever.

It is assuming all possible algorithms are computable, not that the world is.

I don't understand this. Our probability space is the Cartesian product of the set of all possible UTM programs and the set of all possible initial configurations of the UTM's working tape. Or, equivalently, the set of outputs of the UTM under these conditions. Hence our whole hypothesis space only includes computable worlds. **What does "can learn to act like any algorithm" mean here?** "It's getting bounded loss on its predictive accuracy as compared with any computable predictor." Huh? **Does predictor here mean the expert h? If yes, what does it mean that h is computable, and why? All in all, is the author claiming it's impossible to have a better computable predictor than AIXI with the Solomonoff prior, even if there are non-computable worlds in the probability space?**

probabilities may not be calibrated; identification of causal structure may not work

**What do these mean?** I only know informally what calibration means in the context of forecasting.

So, does AIXI perform well without a realizability assumption?

**How is AIXI even defined without realizability, i.e. when the actual world isn't in the probability space or has zero prior probability?**

This is fine if the world “holds still” for us; but because the map is in the world, it may implement some function.

**Is this about the world changing because of the agent just thinking? Or something else?**

It should be noted, though, that there are additional barriers to getting this property in a game-theoretic setting; so in their common usage cases, "grain of truth" is technically demanding while "realizability" is a technical convenience.

...

In game theory, on the other hand, the assumption itself may be inconsistent. This is because games commonly yield paradoxes of self-reference.

From the former paragraph I don't understand anything except that (the author claims) game theory has more problems with grain of truth / realizability than AIXI does. After the latter paragraph, my best guess is: for any game, if there is no pure-strategy equilibrium in it, then we say it has no grain of truth, because for every possible outcome, rational agents wouldn't choose it.

If we put weight in both places until a proof rules one out, the beliefs just oscillate forever rather than doing anything useful.

Weights represent possible worlds, therefore they are on the scales right from the beginning (the prior); we never put new weights on the scales. **My probably incorrect guess of what the author is saying:** some agent acts like AIXI, but instead of updating on pieces of evidence as soon as it receives them, it stockpiles them, and at some point it (boundedly) searches for proofs that these pieces of evidence favor some hypothesis, performing the update only when it finds them. **But still, why oscillation?**

Any computable beliefs about logic must have left out something, since the tree will grow larger than any container.

I interpret it as: there are infinitely many theorems, hence an agent with a finite amount of space or a finite number of computation steps can't process all of them.

Another consequence of the fact that the world is bigger than you is that you need to be able to use high-level world models: models which involve things like tables and chairs.

This is related to the classical symbol grounding problem; but since we want a formal analysis which increases our trust in some system, the kind of model which interests us is somewhat different. This also relates to transparency and informed oversight: world-models should be made out of understandable parts.

**No idea what the second quoted paragraph means.**

All in all, I doubt that high-level world models are necessary. And it's not at all clear what is meant by "high level" or "things" here. Perhaps embedded agents can (boundedly) reason about the world in other ways, e.g. by modeling only part of it.

https://intelligence.org/files/OntologicalCrises.pdf explains the ontological-crisis idea better. Suppose our AIXI-like agent thinks the world is an elementary outcome of some parameterized probability distribution with parameter θ, where θ is either 1 or 2. We call the set of elementary outcomes with θ=1 the first ontology (e.g. possible worlds running on classical mechanics), and the set with θ=2 the second ontology (e.g. possible worlds running on superstring theory). The programmer has only defined the agent's utility function on the θ=1 part, i.e. a function u from ontology 1 to the real numbers. The agent keeps track of which value of θ is more probable and chooses actions considering only the current ontology. If at some point it decides the second ontology is more useful, it switches to it. The agent should then extrapolate the utility function to the θ=2 part. How can it do that?
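A toy sketch of the setup (my own hypothetical names and numbers, not from the paper): the utility function is defined only on ontology 1, and one candidate extrapolation is to pull it back through a translation map from ontology 2.

```python
# Toy sketch of the ontological-crisis setup (hypothetical, not from the paper).
# The agent's utility is only defined on theta=1 worlds; once theta=2 becomes
# more probable, some extrapolation rule is needed.

def utility_theta1(world):
    """Programmer-supplied utility, defined only for ontology-1 worlds."""
    return world["happiness"]  # e.g. a classical-mechanics description

def extrapolate_utility(world_theta2, translate):
    """One candidate answer: pull the utility back through a translation
    map from ontology 2 to ontology 1, if such a map can be found."""
    return utility_theta1(translate(world_theta2))

# A crude translation: identify the ontology-2 variable that plays the
# same functional role as "happiness" in ontology 1.
translate = lambda w2: {"happiness": w2["hedonic_state"]}
result = extrapolate_utility({"hedonic_state": 3.0}, translate)
print(result)
```

The hard part, of course, is where `translate` comes from; the code only makes the shape of the question explicit.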

**crabman** on AI Safety Prerequisites Course: Revamp and New Lessons · 2019-02-06T11:13:45.569Z · score: 3 (2 votes) · LW · GW

maybe recent machine learning topics are a point of comparative advantage

Do you mean recent ML topics related to AI safety, or just recent ML topics?

RAISE is already working on the former; it's another course, which we internally call the "main track". Right now it has the following umbrella topics: Inverse Reinforcement Learning; Iterated Distillation and Amplification; Corrigibility. See https://www.aisafety.info/online-course

**crabman** on The 3 Books Technique for Learning a New Skill · 2019-01-29T14:08:45.406Z · score: 1 (1 votes) · LW · GW

Is the point of your comment that you think people very rarely read (completely or almost completely) 3 books in one field?

(if yes, then I agree)

**crabman** on What are good ML/AI related prediction / calibration questions for 2019? · 2019-01-04T21:47:52.011Z · score: 1 (1 votes) · LW · GW

I find your predictions 1 through 3 not clearly defined.

Does the OpenAI bot need to defeat a pro team in unconstrained Dota 2 at least once during 2019? Or does it need to win at least one game and more than 50% of games against pro teams in 2019?

Suppose Tesla releases video footage or a report of their car driving from one coast to the other, but with some minor or not-so-minor problems. How minor must the problems be to count? Are humans allowed to help it recharge, or anything like that?

How do you define "skilled" in StarCraft II?

**crabman** on Good Samaritans in experiments · 2018-12-08T02:26:11.176Z · score: 1 (1 votes) · LW · GW

How did you conclude that people who prepared the Good Samaritan talk are actually more likely to help than the others? Just from eyeballing 10/19 and 6/21 I can't conclude that this is enough evidence, only that it is suggestive.
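To make the eyeballing concrete, here is a quick check (my own sketch): a one-sided Fisher's exact test on the 2x2 table implied by 10/19 vs 6/21 helpers.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """P(row-1 successes >= a) under the hypergeometric null for the
    2x2 contingency table [[a, b], [c, d]]."""
    n1 = a + b          # size of group 1
    k = a + c           # total helpers across both groups
    n = a + b + c + d   # total subjects
    return sum(comb(k, x) * comb(n - k, n1 - x)
               for x in range(a, min(n1, k) + 1)) / comb(n, n1)

# Helped vs didn't: 10 of 19 in one group, 6 of 21 in the other.
p = fisher_one_sided(10, 9, 6, 15)
print(round(p, 3))  # well above 0.05, so the difference alone isn't conclusive
```

So the comment's suspicion seems right: the raw counts are suggestive but not significant at conventional levels.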

**crabman** on Current AI Safety Roles for Software Engineers · 2018-12-07T02:12:18.268Z · score: 4 (3 votes) · LW · GW

Could you please elaborate on what kind of culture fit MIRI requires?

**crabman** on The First Rung: Insights from 'Linear Algebra Done Right' · 2018-11-27T22:16:04.008Z · score: 1 (1 votes) · LW · GW

What is the point of spending a section on dual maps, I wonder? Is the sole purpose to show that row rank equals column rank? If so, then a lot of my time spent on the dual-map exercises might be wasted.

**crabman** on Embedded Agents · 2018-11-01T00:16:09.791Z · score: 4 (3 votes) · LW · GW

Is this the first post in the sequence? It's not clear.

**crabman** on Things I Learned From Working With A Marketing Advisor · 2018-10-10T14:01:03.759Z · score: 4 (3 votes) · LW · GW

You say

Epistemic Status: Opinions stated without justification

but from the text it seems you believe that acting according to the described opinions is useful and that many of them are true. I don't like this; I think you should clarify the epistemic status.

**crabman** on [Math] Towards Proof Writing as a Skill In Itself · 2018-06-19T08:38:45.491Z · score: 2 (1 votes) · LW · GW

Can you elaborate? What is a constructive proof? Why should one care?

**crabman** on Terrorism, Tylenol, and dangerous information · 2018-05-13T16:46:01.141Z · score: 3 (1 votes) · LW · GW

An extremely low share of deaths is due to terrorist attacks (https://i.redd.it/5sq16d2moso01.gif, https://owenshen24.github.io/charting-death/), so this is not important, and people should care about such things less.

**crabman** on Looking for AI Safety Experts to Provide High Level Guidance for RAISE · 2018-05-06T10:36:49.824Z · score: 6 (2 votes) · LW · GW

What does this have in common with https://www.lesswrong.com/posts/kK67yXhmDYwXLqXoQ/fundamentals-of-formalisation-level-1-basic-logic ?

**crabman** on Mere Addition Paradox Resolved · 2018-04-28T21:25:49.957Z · score: 10 (3 votes) · LW · GW

Adding resources to this thought experiment is just adding noise. If something other than life quality values matters in this model, then the model is bad.

A>B is correct under average utilitarianism and incorrect under total utilitarianism. The way to resolve this is to send average utilitarianism to the trash can, because it fails so many desiderata.
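A toy illustration of the disagreement (hypothetical welfare numbers of my own choosing):

```python
# Hypothetical welfare levels: population A is one well-off person;
# population B adds lives barely worth living (the mere-addition step).
A = [10.0]
B = [10.0, 1.0, 1.0]

average = lambda pop: sum(pop) / len(pop)
total = sum

print(average(A) > average(B))  # average utilitarianism prefers A
print(total(B) > total(A))      # total utilitarianism prefers B
```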

**crabman** on The First Rung: Insights from 'Linear Algebra Done Right' · 2018-04-23T13:49:42.592Z · score: 19 (5 votes) · LW · GW

How many hours did it take you to read the whole book and do all the exercises that you did? I am reading it too; so far I've spent somewhere between 12 and 22 hours, and I'm at exercises 2.A. Also, I recommend watching the Essence of Linear Algebra playlist at https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw/playlists to get (or remind yourself of) some geometric intuitions.

**crabman** on Idea: OpenAI Gym environments where the AI is a part of the environment · 2018-04-12T23:03:25.120Z · score: 2 (1 votes) · LW · GW

I haven't, thanks.

Btw, was your goal to show me the link, or to learn whether I had seen it before? If the former, I don't need to respond. If the latter, I guess you want my response.

**crabman** on Book Review: Consciousness Explained · 2018-03-13T15:21:38.271Z · score: 10 (4 votes) · LW · GW

Can anyone provide a comparison between this book and *Consciousness: An Introduction* by Susan Blackmore? The latter has been recommended to me, but after reading a chapter I wasn't impressed.

**crabman** on Mathemeditation · 2018-03-12T21:09:13.160Z · score: 4 (2 votes) · LW · GW

Maybe you should do it with paper and a writing utensil - I can't really do math without external memory, and other people, including you, are probably bad at it too.

**crabman** on Making yourself small · 2018-03-12T20:11:56.041Z · score: 7 (2 votes) · LW · GW

So,

- Helen::make yourself small = Impro::act as low status;
- Helen::be low status = something like Impro::be seen by other people as low status (in this situation), or to deserve low Impro::status (in this situation)

**crabman** on Leaving beta: Voting on moving to LessWrong.com · 2018-03-12T19:53:58.169Z · score: 3 (2 votes) · LW · GW

If we migrate, will some lesswrong.com urls become broken?

**crabman** on Intellectual Progress Inside and Outside Academia · 2018-03-10T06:52:11.230Z · score: 1 (1 votes) · LW · GW

I would like to add to Vanessa Kowalski's point that it would be useful not only to discuss academic disciplines separately, but also to look at the academia of different countries separately. Are y'all talking about academia in the US or in the whole world? I suspect the former. Is it like that in Europe too? What about China? Australia? Japan? India? Russia?