## Posts

What is complexity science? (Not computational complexity theory) How useful is it? What areas is it related to? 2020-09-26T09:15:50.446Z · score: 3 (3 votes)
Classification of AI alignment research: deconfusion, "good enough" non-superintelligent AI alignment, superintelligent AI alignment 2020-07-14T22:48:04.929Z · score: 36 (13 votes)
Why take notes: what I get from notetaking and my desiderata for notetaking systems 2020-05-29T21:46:10.221Z · score: 9 (4 votes)
Is there software for goal factoring? 2020-02-18T19:55:37.764Z · score: 11 (2 votes)
Hard Problems in Cryptocurrency: Five Years Later - Buterin 2019-11-24T09:38:20.045Z · score: 19 (6 votes)
crabman's Shortform 2019-09-14T12:30:37.482Z · score: 3 (1 votes)
Reneging prosocially by Duncan Sabien 2019-06-18T18:52:46.501Z · score: 59 (16 votes)
How to determine if my sympathetic or my parasympathetic nervous system is currently dominant? 2019-05-31T20:40:30.664Z · score: 20 (8 votes)
AI Safety Prerequisites Course: Revamp and New Lessons 2019-02-03T21:04:16.213Z · score: 27 (10 votes)
Fundamentals of Formalisation Level 7: Equivalence Relations and Orderings 2018-08-10T15:12:46.683Z · score: 9 (3 votes)
Fundamentals of Formalisation Level 6: Turing Machines and the Halting Problem 2018-07-23T09:46:42.076Z · score: 11 (4 votes)
Fundamentals of Formalisation Level 5: Formal Proof 2018-07-09T20:55:04.617Z · score: 15 (3 votes)
Fundamentals of Formalisation Level 4: Formal Semantics Basics 2018-06-16T19:09:16.042Z · score: 15 (3 votes)
Fundamentals of Formalisation Level 3: Set Theoretic Relations and Enumerability 2018-06-09T19:57:20.878Z · score: 20 (5 votes)
Idea: OpenAI Gym environments where the AI is a part of the environment 2018-04-12T22:28:20.758Z · score: 10 (3 votes)

Comment by crabman on Bet On Biden · 2020-10-24T15:09:43.976Z · score: 1 (1 votes) · LW · GW

Is there a simple guide on how to bet on Biden if I already have Ethereum and I don't live in the US? It seems I can do it on Augur and FTX, but both platforms seem very complicated.

Comment by crabman on verloren's Shortform · 2020-10-23T18:55:30.814Z · score: 2 (2 votes) · LW · GW

You might be interested in https://www.facebook.com/groups/1781724435404945/ - a facebook group where rich rationalists set up $10-$100 tasks for others to do. However, only about 25% of the tasks are doable if you don't live in the US.

Also, I'll pay you $15 if you fix this issue https://github.com/orgzly/orgzly-android/issues/287 in the Android app called Orgzly, which is an implementation of emacs org-mode for Android, and get the owner to accept it into the main branch or whatever it is they use that gets merged into the app on Google Play.

Comment by crabman on Turns Out Interruptions Are Bad, Who Knew? · 2020-10-14T17:45:38.005Z · score: 1 (1 votes) · LW · GW

Do you write in Roam using a phone? Do you read literature sources on it as well?

Comment by crabman on Philosophy of Therapy · 2020-10-13T11:25:59.449Z · score: 5 (3 votes) · LW · GW

When I saw the title "Philosophy of Therapy", I hoped to find answers to the following questions:

1. How to think about therapy given the Dodo bird verdict?
2. How to think about therapy given that approximately 50% of all published studies in the field fail to replicate?
3. Given 1 and 2, the fact that therapy works for some reason, and the fact that different types of therapeutic theories contradict each other, therapy must work not only because it improves the patient's map of the territory, but also by some other mechanism. So, what's going on here? Maybe it improves the patient's map of the territory despite the incorrect information in therapeutic theories?

Comment by crabman on Puzzle Games · 2020-10-08T11:02:35.890Z · score: 1 (1 votes) · LW · GW

Link, please. Is it https://teorth.github.io/QED/ or https://www.math.ucla.edu/~tao/QED/QED.html?

Comment by crabman on Inaccessible finely tuned RNG in humans? · 2020-10-07T22:32:42.869Z · score: 1 (1 votes) · LW · GW

My method is to come up with a phrase or find a phrase written somewhere nearby, count the syllables or letters, and take this value modulo the number of bins. For the topic starter's poll, I found a sentence on a whiteboard near me, counted its letters modulo 10, got 5, and so voted for 30%, because the bins were like 20% - 30% - 50%.

Comment by crabman on What is complexity science? (Not computational complexity theory) How useful is it? What areas is it related to? · 2020-09-26T17:38:37.809Z · score: 3 (2 votes) · LW · GW

How useful are their vocabulary and their set of ideas for understanding the real world, not as a professional researcher, but just as a rationalist?

Comment by crabman on What is complexity science? (Not computational complexity theory) How useful is it? What areas is it related to? · 2020-09-26T13:40:55.215Z · score: 1 (1 votes) · LW · GW

Complexity theory seems to be a rarely used synonym for complexity science, although it is used in the title of one of the books. I mistakenly used "complexity theory" too many times in my question; I've just fixed that.

Regarding the courses/primers/introductions: I found them by following links and citations from other complexity-science-related material and by using connectedpapers.com to find similar books/articles, not just by googling "complexity science" (except for the classcentral courses, but those talk about dynamical systems, chaos, and fractals, so they are probably also on-topic). So they most probably support the idea of complexity science. You can also Ctrl+F "emerg" to find uses of the word "emergence" in them and see that they talk about complexity science. To be clear, I've checked Understanding Complexity by Scott E. Page (the book contains lectures and is published by Princeton University Press) and Complexity: A Guided Tour (Mitchell 2011, published by Oxford University Press), and they definitely talk about emergence and self-organization and contain other vocabulary associated with complexity science.

Comment by crabman on What is complexity science? (Not computational complexity theory) How useful is it? What areas is it related to? · 2020-09-26T13:26:49.567Z · score: 1 (1 votes) · LW · GW

Suppose the Xs are some small parts of a big thing, and Y happens in the big thing due to how the Xs work and how they interact. I think when people say "Y is an emergent outcome of the Xs doing whatever it is that they do", they mean "Y is an outcome of the Xs doing whatever it is that they do, and it would be difficult for a human to figure out that Y would happen by looking at the Xs separately".

Comment by crabman on Covid 9/17: It’s Worse · 2020-09-18T09:59:00.885Z · score: 23 (7 votes) · LW · GW

I've asked Zvi what he thinks about the long-term consequences of being ill. Based on his answer, my current thinking, which I use to calculate the cost of COVID-19 to myself in dollars, is as follows.

COVID-19's long-term consequences for me have 2 components: something that lasts about half a year, and something that's permanent. Or at least modelling it as if it has 2 components is not too bad. The 1st component contains strong fatigue, low-grade fever, headaches, or loss of taste and smell, and has probability 3% given covid. The 2nd component is permanent lung, heart, or brain damage, and I guess it has probability about 0.5% given covid. However, this probability estimate is very uncertain and can easily change when new data arrives.

I've eyeballed DALY loss estimates for various diseases according to www.jefftk.com/gbdweights2010.pdf (a DALY estimate study cited by Doing Good Better) and thought about it. From this I got estimates of how bad those two components are if they happen: if the 1st component happens, for its duration I will lose 20% of my well-being (as measured in DALY/QALY) and 30% of my productivity. If the 2nd component happens, then for the rest of my life I will lose 8% of my well-being and 10% of my productivity.

If you want more details about how I got these percentages, I can only say which rows in table 2 of that study I found relevant. They are:

• Illness - Coefficient (lower is better, no adverse effects is 0%, death is 100%) - My comment
• Infectious disease: post-acute consequences (fatigue, emotional lability, insomnia) - 26% - The 1st component is basically this
• COPD and other chronic respiratory diseases: mild - 1.5% - The 2nd component may realize as this
• COPD and other chronic respiratory diseases: moderate - 19% - The 2nd component may realize as this
• Heart failure: mild - 4% - The 2nd component may realize as this

Comment by crabman on Donald Hobson's Shortform · 2020-09-10T06:20:41.912Z · score: 3 (2 votes) · LW · GW

In the sense that every nonstandard natural number is greater than every standard natural number.

Comment by crabman on Donald Hobson's Shortform · 2020-09-10T03:51:46.246Z · score: 1 (1 votes) · LW · GW

Since non-standard natural numbers come after standard natural numbers, I will also have noticed that I've already lived for an infinite amount of time, so I'll know something fishy is going on.
Comment by crabman on Donald Hobson's Shortform · 2020-09-09T11:15:47.591Z · score: 1 (1 votes) · LW · GW

Or X has a high Kolmogorov complexity, but the universe runs in a nonstandard model where T halts.

Disclaimer: I barely know anything about nonstandard models, so I might be wrong. I think this means that T halts after a number of steps equal to a nonstandard natural number, which comes after all standard natural numbers. So, how would you see that it "eventually" outputs X? Even trying to imagine this is too bizarre.

Comment by crabman on Li and Vitanyi's bad scholarship · 2020-09-07T11:18:12.348Z · score: 1 (1 votes) · LW · GW

Doesn't Solomonoff induction at least make a step towards resolving epistemic circularity, since the Solomonoff prior dominates (I don't remember in what way exactly) every probability distribution with the same or smaller support?

Comment by crabman on Vanessa Kosoy's Shortform · 2020-08-19T05:49:31.073Z · score: 3 (2 votes) · LW · GW

An alignment-unrelated question: Can we, humans, increase the probability that something weird happens in our spacetime region (e.g., the usual laws of physics stop working) by making our spacetime location easier to compress? E.g., by building a structure that is very regular (meaning that its description can be very short) and has never been built before in our space region, something like a huge perfectly aligned rectangular grid of hydrogen atoms. It's like a magical ritual for changing the laws of physics. This gives a new meaning to summoning circles, pentagrams, etc.

Comment by crabman on Free Educational and Research Resources · 2020-07-31T11:51:11.733Z · score: 5 (3 votes) · LW · GW

I would appreciate easy-to-see tags on entries useful only to people living in the US. This definitely includes community college enrollment, and maybe includes the library card, Kanopy, and Libby entries. I tried to use my Russian library card on Kanopy, and it wasn't recognized.

Comment by crabman on avturchin's Shortform · 2020-07-27T09:42:01.748Z · score: 1 (1 votes) · LW · GW

You started self-quarantining - by which I mean sitting at home alone and barely going outside - in December or January. I wonder, how's it going for you? How do you deal with loneliness?

Comment by crabman on Classification of AI alignment research: deconfusion, "good enough" non-superintelligent AI alignment, superintelligent AI alignment · 2020-07-19T16:57:38.185Z · score: 3 (2 votes) · LW · GW

No, I was talking about an almost omnipotent AI, not necessarily aligned. I've now fixed the wording there.

Comment by crabman on Classification of AI alignment research: deconfusion, "good enough" non-superintelligent AI alignment, superintelligent AI alignment · 2020-07-15T23:53:23.579Z · score: 5 (3 votes) · LW · GW

I see MIRI's research on agent foundations (including embedded agency) as something like: "We want to understand {an aspect of how agents should work}, so let's take the simplest case first and see if we understand everything about it. The simplest case is the one where the agent is nearly omniscient and knows all logical consequences. Hmm, we can't figure out even this simplest case yet - it breaks down if the conditions are sufficiently weird." Since it turns out that it's difficult to understand embedded agency even for such simple cases, it seems plausible that an AI trained to understand embedded agency by a naive learning procedure (similar to evolution) will break down under sufficiently weird conditions.

Why don't these arguments apply to humans? Evolution didn't understand embedded agency, but managed to create humans who seem to do okay at being embedded agents.

(I buy this as an argument that an AI system needs to not ignore the fact that it is embedded, but I don't buy it as an argument that we need to be deconfused about embedded agency.)

Hmm, very good argument. Since I think humans have an imperfect understanding of embedded agency, thanks to you I no longer think that "if we build an AI without understanding embedded agency, and that AI builds a new AI, that new AI also won't understand embedded agency" - that claim would imply we can't get the "lived happily ever after" at all. And we can ignore the case where we can't get the "lived happily ever after" at all, because in that case nothing matters anyway.

I suppose we could run evolutionary search or something, selecting for AIs which can handle the typical cases of being modified by themselves or by the environment, which we include in the training dataset. I wonder how we can make such an AI understand very atypical cases of modification. A near-omnipotent AI will be a very atypical case.

Can we come up with a learning procedure to have the AI learn embedded agency on its own? It seems plausible to me that we will need to understand embedded agency better to do this, but I don't really know.

Btw, in another comment, you say

But usually when LessWrongers argue against "good enough" alignment, they're arguing against alignment methods, saying that "nothing except proofs" will work, because only proofs give near-100% confidence.

I basically subscribe to the argument that nothing except proofs will work in the case of superintelligent agentic AI.

Comment by crabman on Classification of AI alignment research: deconfusion, "good enough" non-superintelligent AI alignment, superintelligent AI alignment · 2020-07-15T19:26:57.277Z · score: 5 (3 votes) · LW · GW

Here are my responses to your comments, sorted by how interesting they are to me, descending. Also, thanks for your input!

## Non-omnipotent AI aligning omnipotent AI

The AI will be making important decisions long before it becomes near-omnipotent, as you put it. In particular, it should be doing all the work of aligning future AI systems well before it is near-omnipotent.

Please elaborate. I can imagine multiple versions of what you're imagining. Is one of the following scenarios close to what you mean?

1. Scientists use AI-based theorem provers to prove theorems about AI alignment.
2. There's an AI, with which you can have conversations. It tries to come up with new mathematical definitions and theorems related to what you're discussing.
3. The AI (or multiple AIs) is not near-omnipotent yet, but it already controls most of the humanity's resources and makes most of the decisions, so it does research into AI instead of humans.

I think the requirements for how well the non-omnipotent AI in the 3rd scenario is aligned are basically the same as for a near-omnipotent AI. If the non-omnipotent AI in the 3rd scenario is very misaligned - but not catastrophically so, only because the AI is not smart enough - then the near-omnipotent AI it creates will also be misaligned, and that will be catastrophic.

## Embedded agency

Note though it's quite possible that some things we're confused about are also simply irrelevant to the thing we care about. (I would claim this of embedded agency with not much confidence.)

So, you think embedded agency research is unimportant for AI alignment. In contrast, I think it's very important. I worry about it mainly for 3 reasons. Suppose we don't figure out embedded agency. Then

• An AI won't be able to safely self-modify
• An AI won't be able to comprehend that it can be killed or damaged or modified by others
• I am not sure about this one. I am very interested to know if this is not the case. I think, if we build an AI without understanding embedded agency, and that AI builds a new AI, that new AI also won't understand embedded agency. In other words, the set of AIs built without taking embedded agency into account is closed under the operation of an AI building a new AI. [Upd: comments under this comment mostly refute this]
• I am even less sure about this item, but maybe such an AI will be too dogmatic (as in dogmatic prior) about how the world might work, because it is sure that it can't be killed or damaged or modified. Due to this, if the physics laws turn out to be weird (e.g. we live in a multiverse, or we're in a simulation), the AI might fail to understand that and thus fail to turn the whole world into hedonium (or whatever it is that we would want it to do with the world).
• If an AI built without taking embedded agency into account meets very smart aliens someday, it might fuck up due to its inability to imagine that someone can predict its actions.

## Usefulness of type-2 research for aligning superintelligent AI

Unless your argument is that type 2 research will be of literally zero use for aligning superintelligent AI.

I think that if one man-year of type-1 research produces 1 unit of superintelligent AI alignment, one man-year of type-2 research produces about 0.15 units of superintelligent AI alignment.

As I see it, the mechanisms by which type-2 research helps align superintelligent AI are:

• It may produce useful empirical data which'll help us make type-1 theoretical insights.
• Thinking about type-2 research contains a small portion of type-1 thinking.

For example, if someone works on making contemporary neural networks robust to out-of-distribution examples, and they do that mainly by experimenting, their experimental data might provide insights about the nature of robustness in the abstract, and surely some portion of their thinking will be dedicated to the theory of robustness.

## My views on tractability and neglectedness

Tractability and neglectedness matter too.

Alright, I agree with you about tractability.

About neglectedness, I think type-2 research is less neglected than type-1 and type-3 and will be less neglected in the next 10 years or so, because

• It's practical, you can sell it to companies which want to make robots or unbreakable face detection or whatever.
• Humans have bias towards near-term thinking.
• Neural networks are a hot topic.
Comment by crabman on Spoiler-Free Review: Witcher 3: Wild Hunt (plus a Spoilerific section) · 2020-07-05T10:42:33.055Z · score: 1 (1 votes) · LW · GW

I finished The Witcher 3 two days ago, and here you are posting your review just in time. Nice!

Another thing I would add to the list of the best things about The Witcher 3, although it might be irrelevant for native English speakers:

The game feels Slavic rather than Western. I love it! In the Russian version of the game, the peasants talk in a funny way, which shows that they really are uneducated and mostly stupid. They use funny figures of speech, modified words, etc. Everyone swears a lot, using words on the more insulting end. I love it! I can believe that a few hundred years ago, peasants in Slavic countries actually talked like this. I would guess they also talk like this in the Polish version, but I don't know about the English version.

Movement and combat feel extremely clunky. Some people recommend setting movement response time to "alternative" in the options. I did, but it still feels clunky. It's as if, instead of using a proper videogame engine with good physics, they took a very old engine in which jumping and climbing are not first-class citizens, the locations are supposed to be mostly flat, and you are expected to move around very slowly (like in Neverwinter Nights or Baldur's Gate). I am probably biased here, because right before The Witcher 3 I played The Legend of Zelda: Breath of the Wild, which has the BEST feeling of moving around EVER - you can climb anything (not just jump between specially designated protrusions like in old Assassin's Creed games), jump on anything, fly around, and aim your bow while jumping, and the controls during combat and out of combat are basically the same. Because of this, consider playing BotW as your next game.

Btw, have you read The Witcher books, or have you watched the show? I've read all the books twice, and game Geralt is very similar to book Geralt, so that's another thing I liked.

Comment by crabman on What's the most easy, fast, efficient way to create and maintain a personal Blog? · 2020-07-02T14:43:04.264Z · score: 2 (2 votes) · LW · GW

A github repo with posts as markdown or org-mode files - https://github.com/ChALkeR/notes is an example. Post links to your posts on lesswrong/reddit/wherever if you want people to discuss them.

Comment by crabman on crabman's Shortform · 2020-06-28T21:28:52.084Z · score: 1 (1 votes) · LW · GW

Comment by crabman on crabman's Shortform · 2020-06-26T20:07:38.297Z · score: 5 (3 votes) · LW · GW

Often in psychology articles I see phrases like "X is associated with Y". These articles' sections often read like the author thinks that X causes Y. But if they had evidence that X causes Y, surely they would've written exactly that. In such cases I feel that I want to punish them, so in my mind I instead read it as "Y causes X", just for contrarianism's sake. Or sometimes I imagine what variable Z might exist that causes both X and Y (see the toy simulation at the end of this comment). I think the latter is a useful exercise.

Examples:

It appears that some types of humor are more effective than others in reducing stress. Chen and Martin (2007) found that humor that is affiliative (used to engage or amuse others) or self-enhancing (maintaining a humorous perspective in the face of adversity) is related to better mental health. In contrast, coping through humor that is self-defeating (used at one’s own expense) or aggressive (criticizing or ridiculing others) is related to poorer mental health.

The author says that non-self-defeating, non-aggressive humor helps reduce stress. But notice the words "related". For the first "related", it seems plausible that not having good mental health causes you to lose humor. For the second "related", I think it's very probable that poor mental health, such as depression and low self-esteem, causes self-defeating humor.

How does humor help reduce the effects of stress and promote wellness? Several explanations have been proposed (see Figure 4.7). One possibility is that humor affects appraisals of stressful events. Jokes can help people put a less threatening spin on their trials and tribulations. Kuiper, Martin, and Olinger (1993) demonstrated that students who used coping humor were able to appraise a stressful exam as a positive challenge, which in turn lowered their perceived stress levels.

Or it could be that students who are well prepared for the exams, or who simply tend not to be afraid of them, will obviously have lower perceived stress levels and may be able to think about the exams as a positive challenge, hence they're able to joke about them in this way.

It's possible in this example that the original paper, Kuiper, Martin, and Olinger (1993), actually did an intervention making students use humor, in which case the causality must go from humor to stress reduction. But I don't want to look at every source, so screw you, author of Psychology Applied to Modern Life (both quotes are from it), for not making it clear whether that study found causation or only correlation.
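To make the confounder exercise concrete, here's a toy simulation (my own sketch; the variable names are illustrative, not from any of the cited studies) where Z causes both X and Y, so X and Y correlate even though neither causes the other:

import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=100_000)        # confounder, e.g. good mental health
x = z + rng.normal(size=100_000)    # e.g. affiliative humor, caused by z
y = z + rng.normal(size=100_000)    # e.g. low stress, also caused by z
print(np.corrcoef(x, y)[0, 1])      # about 0.5, despite no arrow from x to y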

Comment by crabman on FactorialCode's Shortform · 2020-06-23T20:39:58.193Z · score: 2 (2 votes) · LW · GW

What do you mean by "approve a new user"? AFAIK, registration is totally free.

Comment by crabman on Iterated Distillation and Amplification · 2020-06-21T19:00:04.705Z · score: 1 (1 votes) · LW · GW

I think there are 2 mistakes in the pseudocode.

# Second mistake

In the personal assistant example you say

In the next iteration of training, the Amplify(H, A[0]) system takes over the role of H as the overseer.

which implies that we do

H <- Amplify(H, A)


But in the pseudocode the original human overseer acts as the overseer all the time.

# Suggested change of the pseudocode, which fixes both mistakes

def IDA(H):
    repeat:
        A ← Distill(H)
        H ← Amplify(H, A)
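For concreteness, here is a toy, runnable reading of that loop (my own stand-ins, not from the post: the real Distill would train a fast model to imitate the overseer, and the real Amplify would coordinate many copies of it):

def Distill(H):
    # Train a fast agent that imitates the current overseer; here, exactly.
    return lambda question: H(question)

def Amplify(H, A):
    # The overseer answers with the help of calls to the distilled agent.
    return lambda question: H(f"{question} [with subanswer: {A(question)}]")

def IDA(H, n_iterations=3):
    for _ in range(n_iterations):
        A = Distill(H)
        H = Amplify(H, A)  # the amplified system takes over the overseer role
    return H

print(IDA(lambda q: f"answer({q})")("should I take the job?"))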

Comment by crabman on Where to Start Research? · 2020-06-16T20:28:11.642Z · score: -1 (6 votes) · LW · GW

I think epistemic spot checks prevent building gears-level models. And so does reading only small parts of books. The reasons why I think so are obvious. What's your take on this problem?

Comment by crabman on Does taking extreme measures to avoid the coronavirus make sense when you factor in the possibility of a really long life? · 2020-06-05T11:10:08.037Z · score: 0 (2 votes) · LW · GW

I think the value you assign to your life is too high, since you almost certainly can't earn nearly that much during your lifetime. Let's say you'll be earning 150k per year for 40 years. Then in total you'll earn only 6 million.

Comment by crabman on cousin_it's Shortform · 2020-06-03T23:33:24.218Z · score: 1 (1 votes) · LW · GW

Do you by any chance have a typo here? Sorry if I am wrong, since I don't actually know quantum information theory.

A pure state, like ( |00> + |11> ) / √2, is a vector in that space.

I think this state is mixed, since it's a sum of two vectors which can't be represented as a single Kronecker product.

Comment by crabman on cousin_it's Shortform · 2020-06-03T23:26:06.327Z · score: 1 (1 votes) · LW · GW

Hey, I've got a sudden question for you. Probability distribution on a set of binary variables is to a quantum state as ??? is to a unitary linear operator.

What should ??? be replaced with?

Here's why I have this question. Somehow I was thinking about normalizing flows, which are invertible functions that, when applied to a sample from an N-dimensional probability distribution, transform it into a sample from another N-dimensional probability distribution. And then I thought: isn't this similar to how a quantum operator is always unitary? Maybe I can then combine encoding an image as a pure state (like in Stoudenmire 2016, Supervised Learning with Quantum-Inspired Tensor Networks) with representing quantum operators as tensor networks to get a quantum-inspired generative model similar to normalizing flows.
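To illustrate the normalizing-flows half of the analogy, here's a minimal sketch (my own toy example, not from Stoudenmire 2016): an invertible map pushes samples of one distribution forward into samples of another, and its inverse together with the Jacobian determinant gives the new density.

import torch

def f(x):             # an invertible affine map; Jacobian is 2*I, det = 2**3
    return 2.0 * x + 1.0

def f_inv(y):
    return (y - 1.0) / 2.0

x = torch.randn(10_000, 3)           # samples from N(0, I)
y = f(x)                             # samples from N(1, 4*I)
print(y.mean(dim=0), y.std(dim=0))   # roughly [1, 1, 1] and [2, 2, 2]
# Change of variables: log p_Y(y) = log p_X(f_inv(y)) - log|det J_f|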

Comment by crabman on The Zettelkasten Method · 2020-05-21T00:10:35.491Z · score: 3 (2 votes) · LW · GW

1. At first, you create a card and put it in the unsorted pile of cards, and you don't give it an index. Is this correct? Or do you give the card an index, add some links, and then put it back into the unsorted pile of cards?
2. At some point (which per your suggestion should not be too soon) you give it an index and put it in the sorted part. Do you only think of links at this point?
Comment by crabman on Mark Xu's Shortform · 2020-05-20T08:37:18.934Z · score: 5 (4 votes) · LW · GW

There are a bunch of explanations of logarithm as length on Arbital.

Comment by crabman on [deleted post] 2020-05-14T08:49:24.793Z

“big data” refers to situations with so much training data you can get away with weak priors. The most powerful recent advances in machine learning, such as neural networks, all use big data.

This is only partially true. Consider some image classification dataset, say MNIST or CIFAR10 or ImageNet. Consider some convolutional relu network architecture, say conv2d -> relu -> conv2d -> relu -> conv2d -> relu -> conv2d -> relu -> fullyconnected, with some chosen kernel sizes and numbers of channels. Consider some configuration of its weights θ. Now consider the multilayer perceptron architecture fullyconnected -> relu -> fullyconnected -> relu -> fullyconnected -> relu -> fullyconnected -> relu -> fullyconnected. Clearly, there exist hyperparameters of the multilayer perceptron (numbers of neurons in hidden layers) such that there exists a configuration of weights φ of the multilayer perceptron such that the function implemented by the multilayer perceptron with φ is the same function as the function implemented by the convolutional architecture with θ. Therefore, the space of functions which can be implemented by the convolutional neural network (with fixed kernel sizes and channel counts) is a subset of the space of functions which can be implemented by the multilayer perceptron (with correctly chosen numbers of neurons). Therefore, training the convolutional relu network is updating on evidence while holding a relatively strong prior, while training the multilayer perceptron is updating on evidence while holding a relatively weak prior.

Experimentally, if you train the networks described above, the convolutional relu network will learn to classify images well, or at least okay-ish. The multilayer perceptron will not learn to classify images well; its accuracy will be much worse. Therefore, the data is not enough to wash away the multilayer perceptron's prior, hence by your definition it can't be called big data. Here I must note that ImageNet is the biggest publicly available dataset for training image classification, so if anything is big data, it should be.
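The subset claim can even be checked mechanically. Here's a small sketch (my own illustration; the 8x8 input size, channel counts, and bias-free layers are arbitrary choices) that builds a fully connected layer computing exactly the same function as a convolutional layer, by feeding the standard basis through the convolution:

import torch
import torch.nn as nn

conv = nn.Conv2d(1, 2, kernel_size=3, padding=1, bias=False)

# Each basis vector of the flattened 8x8 input space, pushed through the conv
# layer, yields one column of the equivalent linear map.
eye = torch.eye(64).reshape(64, 1, 8, 8)
with torch.no_grad():
    W = conv(eye).reshape(64, -1).T               # shape (out_dim, in_dim)
    fc = nn.Linear(64, W.shape[0], bias=False)
    fc.weight.copy_(W)

x = torch.randn(5, 1, 8, 8)
print(torch.allclose(conv(x).reshape(5, -1), fc(x.reshape(5, -1)), atol=1e-5))
# prints True: the fully connected layer implements the conv layer's function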

--

Big data uses weak priors. Correcting for bias is a prior. Big data approaches to machine learning therefore have no built-in method of correcting for bias.

This looks like a formal argument - a demonstration or dialectics, as Bacon would call it - which uses shabby definitions. I disagree with the conclusion, i.e. with the statement "modern machine learning approaches have no built-in method of correcting for bias". I think in modern machine learning people are experimenting with various inductive biases and various ad-hoc fixes or techniques which help correct for all kinds of biases.

--

In your example with a non-converging sequence, I think you have a typo - there should be rather than .

Comment by crabman on Legends of Runeterra: Early Review · 2020-05-13T17:38:12.305Z · score: 1 (1 votes) · LW · GW

Nice review. I like CCGs in general, but I hadn't heard about Legends of Runeterra, and thanks to your review I decided not to play it.

Regarding Emergents, what platforms will it be on and can I be an alpha/beta tester?

Comment by crabman on crabman's Shortform · 2020-05-10T06:29:45.574Z · score: 3 (2 votes) · LW · GW

How to download the documentation of a programming library for offline use.

1. On the documentation website, look for a "downloads" section. Preferably choose the HTML format, because then it will be nicely searchable - I can even create a krunner web shortcut for searching it. Example: Numpy - find "HTML+zip".
2. If you need pytorch, torchvision, or sklearn - simply download https://github.com/unknownue/PyTorch.docs.
3. Use httrack to mirror the documentation website. In my experience it doesn't take long. Do it like $ httrack https://click.palletsprojects.com/en/7.x/. This will download everything hosted under https://click.palletsprojects.com/en/7.x/ and will not go outside of this server directory. In this case the search field won't work.

Comment by crabman on Michaël Trazzi's Shortform · 2020-05-10T04:28:16.864Z · score: 2 (2 votes) · LW · GW

Do you have any tips on how to make the downloaded documentation of programming languages and libraries searchable? Btw, here's my shortform on how to download the documentation of various libraries: https://www.lesswrong.com/posts/qCrTYSWE2TgfNdLhD/crabman-s-shortform?commentId=Xt9JDKPpRtzQk6WGG

Comment by crabman on The Zettelkasten Method · 2020-05-10T04:16:32.126Z · score: 5 (3 votes) · LW · GW

It turns out Staples index-cards-on-a-ring are not a thing in Russia. That might be the case in other countries as well, so here I am posting my solution, which goes in the spirit of Abram's suggestions: a small A6 binder and pages for it from Aliexpress (archived version). In my opinion it looks nice and feels nice, although now I think A6 is too small and I would prefer A5.

Comment by crabman on Named Distributions as Artifacts · 2020-05-04T21:43:28.959Z · score: 1 (1 votes) · LW · GW

Let's start with the application of the central limit theorem to champagne drinkers. First, there's the distinction between "liver weights are normally distributed" and "mean of a sample of liver weights is normally distributed". The latter is much better-justified, since we compute the mean by adding a bunch of (presumably independent) random variables together. And the latter is usually what we actually use in basic analysis of experimental data - e.g. to decide whether there's a significant difference between the champagne-drinking group and the non-champagne-drinking group. That does not require that liver weights themselves be normally distributed.

I think your statement in bold font is wrong. In cases such as champagne drinkers vs non-champagne-drinkers, I think people are likely to use Student's two-sample t-test or Welch's two-sample unequal-variances t-test. These assume that in both groups each sample is distributed normally, not that the means are distributed normally.

Comment by crabman on crabman's Shortform · 2020-04-29T22:19:12.862Z · score: 1 (1 votes) · LW · GW

Tbh what I want right now is a very weak form of reproducibility: I want the experiments I am doing nowadays to work the same way on my own computer every time. That works for me so far.

Comment by crabman on crabman's Shortform · 2020-04-29T20:54:16.329Z · score: 3 (2 votes) · LW · GW

It turns out, Pytorch's pseudorandom number generator generates different numbers on different GPUs even if I set the same random seed. Consider the following file do_different_gpus_randn_the_same.py:

import torch

seed = 0
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
foo = torch.randn(500, 500, device="cuda")
print(f"{foo.min():.30f}")
print(f"{foo.max():.30f}")
print(f"{foo.min() / foo.max()=:.30f}")

On my system, I get the following for two runs on two different GPUs:

$ CUDA_VISIBLE_DEVICES=0 python do_different_gpus_randn_the_same.py
-4.230118274688720703125000000000
4.457311630249023437500000000000
foo.min() / foo.max()=-0.949029088020324707031250000000
$ CUDA_VISIBLE_DEVICES=1 python do_different_gpus_randn_the_same.py
-4.230118751525878906250000000000
4.377007007598876953125000000000
foo.min() / foo.max()=-0.966440916061401367187500000000


Due to this, I am going to generate all pseudorandom numbers on my CPU and then transfer them to the GPU for reproducibility's sake, like foo = torch.randn(500, 500, device="cpu").to("cuda").
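A small wrapper in that spirit (my own helper, not a Pytorch API; assumes a CUDA device is available):

import torch

def randn_reproducible(*size, device="cuda"):
    # Sample on the CPU generator, whose stream doesn't depend on the GPU
    # model, then move the result to the target device.
    return torch.randn(*size, device="cpu").to(device)

torch.manual_seed(0)
foo = randn_reproducible(500, 500)  # same values regardless of which GPU is used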

Comment by crabman on 3 Interview-like Algorithm Questions for Programmers · 2020-04-25T12:48:04.236Z · score: 1 (1 votes) · LW · GW

I want to know why the answer to the first question is like that.

Comment by crabman on Mozilla Hubs Virtual Meetup 10:30AM PDT, April 19th · 2020-04-10T18:00:17.300Z · score: 1 (1 votes) · LW · GW

It's 10:30 AM, right?

Comment by crabman on Choosing the Zero Point · 2020-04-08T14:22:58.806Z · score: 2 (2 votes) · LW · GW

I suggest not only shifting the zero point, but also scaling utilities when you update on information about what's achievable and what's not. For example, suppose you thought that saving 1-10 people in poor countries was the best you could do with your life, and you felt like every life saved was +1 utility. But then you learned about longtermism and figured out that, if you try, in expectation you can save 1,000,000 lives in the far future. In that situation it doesn't make sense to keep caring about saving an individual life as much as you cared before this insight - your system 1 feeling for how good things can be won't be able to do its epistemological job. It's better to scale the utility of saving lives down, so that +1,000,000 lives is +10 utility and +1 life is +1/100,000 utility. This is related to Caring less.

However, this advice has a very serious downside: it will make it very difficult to communicate with "normies". If a person thinks saving a life is +1 utility and tells you that there's an opportunity to go and do it, and you're like "meh, +1/100,000 utility", they will see your reaction and think you're weird or heartless or something.

Comment by crabman on Option Value in Effective Altruism: Worldwide Online Meetup · 2020-04-04T11:35:13.992Z · score: 2 (2 votes) · LW · GW

Will it be in English?

Comment by crabman on Is there software for goal factoring? · 2020-03-31T21:57:10.444Z · score: 3 (2 votes) · LW · GW

Thanks to your advice, I've tried it for goal factoring and for drawing various diagrams. It's great! (by which I mean it's less awful than other software)

Comment by crabman on The Zettelkasten Method · 2020-03-18T00:04:19.406Z · score: 11 (8 votes) · LW · GW

Failure mode: perfectionism

After creating a couple of Zettelkasten pages in Roam and rereading this post, I decided to try it on paper. That was a week ago. I still haven't created a single page. Aaaaah. You can't change things on paper, so it must be PERFECT. And if it's not perfect, then it's a working-memory dump, which shouldn't be in the Zettelkasten in the first place. During this week I filled perhaps 15 A4 pages of my working notebook, but none of it was good enough for the Zettelkasten. And then, when some of it was good enough, I used it to write a long answer on Stack Overflow - and after having done that, why would I also write it on paper? Yeah, perhaps paper Zettelkasten is not for me.

Comment by crabman on Why don't singularitarians bet on the creation of AGI by buying stocks? · 2020-03-13T09:22:43.452Z · score: 2 (2 votes) · LW · GW

Do you mind sharing your list of stocks which belong to companies with a nontrivial probability of creating AGI? Also, why Uber?

Comment by crabman on Why don't singularitarians bet on the creation of AGI by buying stocks? · 2020-03-12T23:04:04.371Z · score: 8 (5 votes) · LW · GW

This post's arguments seemed correct to me, so I am gonna sell some S&P 500 stocks and buy some Google, Facebook, Tencent, etc. stocks instead. Thank you for writing this post.

Comment by crabman on The Zettelkasten Method · 2020-03-12T22:56:29.342Z · score: 3 (2 votes) · LW · GW

Disclaimer: I have tried Zettelkasten in Roam very recently; it hasn't impressed me, but I want to try it on paper.

Here's something I don't understand about Zettelkasten. Do you people actually open your index note and then go through all your notes related to your project from time to time? If yes, why? When I am working on a project (say, figuring out how to train a novel machine learning model I came up with), I usually remember most of the relevant information. Usually I write things on paper as an extension of my working memory, but right after having finished the thought, I can throw it away.

I do keep notes in emacs org-mode, but I almost never go and read them sequentially. I think it would be boring - I'd rather go read stuff on the internet. Actually I rarely read my notes at all. Usually I only do it when I want to remind myself something specific and I remember that I have something written about it.

Comment by crabman on Nate Soares' Replacing Guilt Series compiled in epub Format · 2020-03-04T20:08:49.442Z · score: 2 (2 votes) · LW · GW

The posts listed under "Related" on http://mindingourway.com/guilt/, including "Conclusion of the Replacing Guilt series", are missing.