# crabman's Shortform

post by crabman · 2019-09-14T12:30:37.482Z · score: 3 (1 votes) · LW · GW · 15 comments

## 15 comments

Comments sorted by top scores.

A competition on solving math problems via AI is coming. https://imo-grand-challenge.github.io/

- The problems are from the International Mathematical Olympiad (IMO).
- They want to formalize all the problems in the language of the Lean theorem prover. They haven't fully figured out how to do that, e.g. how to formalize problems of the form "determine the set of objects satisfying the given property", as can be seen in https://github.com/IMO-grand-challenge/formal-encoding/blob/master/design/determine.lean
- A contestant must submit a computer program that will take a problem's description as input and output a solution and its proof in Lean.
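For a sense of what such a formalization looks like, here is a toy statement with its proof term in Lean 3 syntax (my own illustrative example, vastly simpler than any IMO problem):

```lean
-- A toy formal statement and proof in Lean 3:
-- addition of natural numbers is commutative.
theorem toy_example (a b : ℕ) : a + b = b + a :=
nat.add_comm a b
```

A contestant's program would have to emit statements and proof terms like this, but for problems where no library lemma does the work for you.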

I would guess that this is partly a way to promote Lean. I think it would be interesting to pose questions about this on Metaculus.

Any guesses at the difficulty? My first impression was that this is not going to be solved any time soon. I just don't think current techniques are good enough to write flawless Lean code given a difficult objective.

I think grand challenges in AI are sometimes useful, but when they are at this level I am a bit pessimistic. I don't think this is necessarily AI-complete, but it's perhaps close.

Can you quantify "soon"? :) For example, I'd be willing to bet at 1/3 odds that this will be solved in the next 10 years, conditional on a certain amount of effort being put in, and more like 1/1 odds for the next 20 years. It's hard to quantify the conditional piece, but I'd cash it out as: researchers put the same amount of effort into this that they put into NLP/image-recognition benchmarks. I don't think that will happen, so this is purely a counterfactual claim, but maybe it will help ground any subsequent discussion in a concrete claim?

By soon I mean 5 years. Interestingly, I have a slightly higher probability that it will be solved within 20 years, which highlights the difficulty of saying ambiguous things like "soon."

That is interesting! I should be clear that my odds ratios are pretty tentative given the uncertainty around the challenge. For example, I literally woke up this morning and thought that my 1/3 odds might be too conservative given recent progress on 8th grade science tests and theorem proving.

I created three PredictionBook predictions to track this if anyone's interested (5 years, 10 years, 20 years).

In my MSc courses the lecturer gives proofs of important theorems, while unimportant problems are assigned as homework. This is bad for me, because it makes me spend my effort figuring out the less important material. I think it works like this because the course instructors want to (a) make the student do at least something, and (b) check whether the student has learned the course material.

Ideally I would like to study using interactive textbooks where everything is a problem to solve on my own. Such a textbook wouldn't show an important theorem's proof right away. Instead it would show me the theorem's statement and ask me to prove it. There should be hints available and, obviously, I should be able to see the author's proof when I want to see it.

Also, for textbooks about Turing machines, recursive functions, and the like, having a built-in interpreter of Turing machines would be very nice. (Googling "Turing machine interpreter" and using whatever you find is a bad idea, because the interpreters out there all implement different flavors of Turing machine.)
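For illustration, a minimal Turing machine interpreter is short enough to write yourself. Here is a sketch in Python; the transition-table format and blank symbol are my own choices, i.e. one more "flavor" among the many:

```python
# A minimal single-tape Turing machine interpreter.
# delta maps (state, symbol) -> (new_state, new_symbol, move),
# where move is -1 (left) or +1 (right). The blank symbol is "_".

def run_tm(delta, start, accept, tape, max_steps=10_000):
    tape = dict(enumerate(tape))  # sparse tape: position -> symbol
    state, head = start, 0
    for _ in range(max_steps):
        if state == accept:
            break
        symbol = tape.get(head, "_")
        if (state, symbol) not in delta:
            break  # halt: no applicable transition
        state, write, move = delta[(state, symbol)]
        tape[head] = write
        head += move
    cells = [tape[i] for i in sorted(tape)]
    return state, "".join(cells).strip("_")

# Example machine: flip every bit, accept at the first blank.
flip = {
    ("scan", "0"): ("scan", "1", +1),
    ("scan", "1"): ("scan", "0", +1),
    ("scan", "_"): ("done", "_", +1),
}
state, result = run_tm(flip, "scan", "done", "1011")
print(state, result)  # -> done 0100
```

A textbook could ship an interpreter like this alongside its exercises, so that the reader's machines can actually be run.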

I found this to vary by field. When I studied topology and combinatorics we proved all the big important things as homework. When I studied automata theory and measure theory, we did what your teacher is doing.

In my understanding, here are the main features of deep convolutional neural networks (DCNNs) that make them work really well. (Disclaimer: I am not a specialist in CNNs; I have done one master's-level deep learning course, and I have worked on accelerating DCNNs for 3 months.) For each feature, I give my probability that having this feature is an important component of DCNN success, compared to having it only to the extent that an average non-DCNN machine learning model does (e.g. a DCNN has weight sharing, an average model doesn't).

1. DCNNs heavily use transformations which are the same for each window of the input - 95%
2. For any set of pixels of the input, large distances between the pixels in the set make the DCNN model interactions between those pixels less accurately - 90% (perhaps the use of dilated convolutions in some DCNNs is a counterargument to this)
3. Large depth (together with the use of activation functions) lets us model complicated features, interactions, and logic - 82%
4. Having a lot of parameters lets us model complicated features, interactions, and logic - 60%
5. Given 3 and 4, SGD-like optimization works unexpectedly fast for some reason - 40%
6. Given 3 and 4, SGD-like optimization with early stopping doesn't overfit too much for some reason - 87% (I am not sure whether the S in SGD is important, or how important early stopping is)
7. Given 3 and 4, ReLU-like activation functions work really well (compared to, for example, sigmoid).
8. Modern deep neural network libraries are easy to use compared to the baseline of not having specific well-developed libraries - 60%
9. Deep neural networks run really fast when using modern deep neural network libraries and modern hardware - 33%
10. DCNNs find features in photos which are invisible to the human eye and to most ML algorithms - 20%
11. Dropout helps reduce overfitting a lot - 25%
12. Batch normalization improves the quality of the model a lot for some reason - 15%
13. Batch normalization makes the optimization much faster - 32%
14. Skip connections (or residual connections; I am not sure whether there's a difference) help a lot - 20%

Let me make it clearer how I assigned the probabilities and why I created this list. I am trying to come up with a tensor-network-based machine learning model which would have the main advantages of DCNNs, but which would not itself be a deep ReLU neural network. So I made this list to see which of the important components my model has.
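The weight-sharing point at the top of the list can be made concrete by counting parameters. A sketch in plain Python (the layer sizes here are arbitrary assumptions, not taken from any particular architecture):

```python
# Parameter counts for a 3x3 convolution vs. a fully connected layer
# on a 32x32 RGB image, illustrating how weight sharing shrinks models.

in_ch, out_ch, k = 3, 16, 3                     # conv: 16 filters of shape 3x3x3
conv_params = out_ch * in_ch * k * k + out_ch   # shared weights + biases

h = w = 32
dense_out = out_ch * h * w                      # a dense layer producing the
in_features = in_ch * h * w                     # same number of output values
dense_params = in_features * dense_out + dense_out

print(conv_params, dense_params)  # -> 448 50348032
```

The convolution gets by with roughly 100,000x fewer parameters here, precisely because the same small filter is reused at every window of the input.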

How to download the documentation of a programming library for offline use.

- On the documentation website, look for a "downloads" section. Preferably choose the HTML format, because then it will be nicely searchable - I can even create a krunner web shortcut for searching it. Example: NumPy - find "HTML+zip".
- If you need pytorch, torchvision, or sklearn - simply download https://github.com/unknownue/PyTorch.docs.
- If you need the documentation hosted on https://readthedocs.io: in the bottom left, press "Read the Docs" and choose a download type from "Downloads". The search field won't work in the HTML version, so feel free to download whatever format you like. Example: Elpy. Warning: for some libraries (e.g. more-itertools) the downloaded version is basically broken, so you should check whether what you've downloaded is complete.
- In some weird cases the ReadTheDocs documentation for the latest version of a library might be unlisted in the downloads section of ReadTheDocs. For example, if you click the readthedocs icon in the bottom right of https://click.palletsprojects.com/en/master/, you won't find a download link for version 8.0. In this case copy the hyperlink https://media.readthedocs.org/pdf/pallets-click/latest/pallets-click.pdf or https://media.readthedocs.org/pdf/pallets-click/stable/pallets-click.pdf and replace *pallets-click* with the name of the project you want. It doesn't work for all projects, but it works for some.
- Use httrack to mirror the documentation website. In my experience it doesn't take long. Do it like `$ httrack https://click.palletsprojects.com/en/7.x/`. This will download everything hosted under https://click.palletsprojects.com/en/7.x/ and will not go outside of that server directory. In this case the search field won't work.

It turns out that PyTorch's pseudorandom number generator generates different numbers on different GPUs even if I set the same random seed. Consider the following file *do_different_gpus_randn_the_same.py*:

```
import torch

seed = 0
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Draw a 500x500 tensor of standard normal samples directly on the GPU.
foo = torch.randn(500, 500, device="cuda")
print(f"{foo.min():.30f}")
print(f"{foo.max():.30f}")
print(f"{foo.min() / foo.max()=:.30f}")
```

On my system, I get the following for two runs on two different GPUs:

```
$ CUDA_VISIBLE_DEVICES=0 python do_different_gpus_randn_the_same.py
-4.230118274688720703125000000000
4.457311630249023437500000000000
foo.min() / foo.max()=-0.949029088020324707031250000000
$ CUDA_VISIBLE_DEVICES=1 python do_different_gpus_randn_the_same.py
-4.230118751525878906250000000000
4.377007007598876953125000000000
foo.min() / foo.max()=-0.966440916061401367187500000000
```

Due to this, I am going to generate all pseudorandom numbers on my CPU and then transfer them to the GPU for reproducibility's sake, like `foo = torch.randn(500, 500, device="cpu").to("cuda")`.

You're going to need to do more than that if you want full reproducibility, because GPUs aren't even deterministic in the first place, and that is big enough to affect DRL/DL results.
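The property the CPU-then-transfer trick relies on - that a seeded generator on a fixed platform replays exactly the same stream - can be checked cheaply in plain Python (a sketch; `random.Random` stands in for a seeded CPU generator here, not for PyTorch's):

```python
import random

# A dedicated, seeded generator instance is reproducible on a fixed
# platform: re-seeding with the same value replays the identical stream.
def seeded_draws(seed, n):
    rng = random.Random(seed)  # independent generator, like torch.Generator
    return [rng.random() for _ in range(n)]

first = seeded_draws(0, 5)
second = seeded_draws(0, 5)
print(first == second)  # -> True
```

Of course this says nothing about nondeterminism inside GPU kernels themselves, which is a separate problem.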

A new piece of math notation I've invented which I plan to use whenever I am writing proofs for myself (rather than for other people).

Sometimes when writing a proof, for some long property P(x) I want to write:

*It follows from foo, that there exists x such that P(x). Let x be such that P(x). Then ...*

I don't like that I need to write P(x) twice here. And the whole construction is too long for my liking, especially when the reason foo why such x exists is obvious. And if I omit the first sentence *It follows from foo, that there exists x such that P(x).* and just write

*Let x be such that P(x). Then ...*

then it's not clear what I mean. It could be that I want to show that such x exists and that from its existence some statement of interest follows. Or it could be that I want to prove some statement of form

*For each x such that P(x), it holds that Q(x)*.

Or it could even be that I want to show that something follows from existence of such x, but I am not asserting that such x exists.

The new notation I came up with is to write *L∃t* in cases when I want to assert that such an x exists and to bind the variable x in the same place. An example (an excerpt, not a complete proof):

- Suppose is a countably infinite set, suppose is a set of subsets of , suppose .
- L∃t be a bijection from onto .
- L∃t such that , if is in , otherwise .
- By recursion theorem, L∃t be such that , .
- L∃t .
- ...
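If writing such proofs in LaTeX, the symbol could be defined once as a macro (a sketch; the macro name is my own invention):

```latex
% A possible definition of the "L∃t" notation for use in proofs.
\newcommand{\Lexists}{\mathrm{L}\exists\mathrm{t}\ }
% Usage: $\Lexists f$ be a bijection from $\mathbb{N}$ onto $A$.
```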

Many biohacking guides suggest using melatonin. Does liquid melatonin spoil under high temperature if put in tea (95 degrees Celsius)?

More general question: how do I even find answers to questions like this one?

When I had a quick go-ogle search I started with:

"melatonin stability temperature"

then

"N-Acetyl-5-methoxytryptamine"

From a quick flick through a few abstracts, I can't see anything involving temperatures higher than 37 C, i.e. body temperature.

Melatonin is not actually a protein but a small molecule (an indoleamine derived from tryptophan), so the temperatures at which proteins denature (often above 41 C) don't directly apply; small molecules can still degrade when heated, though.

My* (jumped to)* conclusion:

No specific data found.

Melatonin may not be stable at high temperatures, so avoid putting it in hot tea.