# The Extraordinary Link Between Deep Neural Networks and the Nature of the Universe

post by morganism · 2016-09-10T19:13:14.182Z · LW · GW · Legacy · 23 comments

## Contents

  Why does deep and cheap learning work so well?


"The answer is that the universe is governed by a tiny subset of all possible functions. In other words, when the laws of physics are written down mathematically, they can all be described by functions that have a remarkable set of simple properties."

"“For reasons that are still not fully understood, our universe can be accurately described by polynomial Hamiltonians of low order.” These properties mean that neural networks do not need to approximate an infinitude of possible mathematical functions but only a tiny subset of the simplest ones."
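A rough sketch of why low-order polynomials are cheap for a network: a single product x*y can be approximated by four smooth neurons and a linear readout, using the Taylor expansion of the activation around zero. The choice of softplus, the function name `product_via_four_neurons`, and the `scale` value are assumptions made for illustration, not something stated in the quoted article.

```python
import math

def softplus(z):
    # Smooth activation; its second derivative at 0 is 1/4 (nonzero).
    return math.log1p(math.exp(z))

def product_via_four_neurons(x, y, scale=0.01):
    """Approximate x*y with four softplus units and one linear readout.

    `scale` shrinks the inputs so the Taylor expansion around 0 is accurate;
    the readout divides it back out.
    """
    u, v = scale * x, scale * y
    hidden = softplus(u + v) + softplus(-u - v) - softplus(u - v) - softplus(-u + v)
    # hidden is about 4 * softplus''(0) * u * v = u * v, so undo the input scaling.
    return hidden / (scale * scale)

approx = product_via_four_neurons(1.5, -2.0)
exact = 1.5 * -2.0
print(approx, exact)
```

The even-order terms of the activation cancel everything except the cross term, and the leftover error shrinks as `scale` gets smaller (until floating-point rounding takes over).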

Interesting article, and just diving into the paper now, but it looks like this is a big boost to the simulation argument. If the universe is built like a game engine, with stacked sets like Mandelbrots, then the simplicity itself becomes a driver in a fabricated reality.

# Why does deep and cheap learning work so well?

http://arxiv.org/abs/1608.08225

comment by Manfred · 2016-09-10T23:48:43.651Z · LW(p) · GW(p)

I'd blame the MIT press release organ for being clickbait, but the paper isn't much better. It's almost entirely flash with very little substance. This is not to say there's no math - the math just doesn't much apply to the real world. For example, the idea that deep neural networks work well because they recreate the hierarchical generative process for the data is a common misconception.

And then from this starting point you want to start speculating?

comment by Raiden · 2016-09-14T15:48:04.153Z · LW(p) · GW(p)

Can you explain why that's a misconception? Or at least point me to a source that explains it?

I've started working with neural networks lately and I don't know too much yet, but the idea that they recreate the generative process behind a system, at least implicitly, seems almost obvious. If I train a neural network on a simple linear function, the weights on the network will probably change to reflect the coefficients of that function. Does this not generalize?
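A minimal sketch of the linear case, assuming a single linear neuron trained by plain gradient descent on mean squared error. The target `y = 3x + 2`, the learning rate, and the step count are made-up values for illustration.

```python
# Hypothetical target: y = 3x + 2 (coefficients chosen for illustration).
xs = [-1.0 + 0.1 * i for i in range(21)]
ys = [3.0 * x + 2.0 for x in xs]

w, b = 0.0, 0.0   # single linear neuron: prediction = w * x + b
lr = 0.1          # learning rate (assumed, not tuned)

for _ in range(2000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)
```

Here the weight and bias do end up matching the coefficients, but only because the model has exactly the same form as the target. In a deeper network many different weight settings compute the same function, so the coefficients need not show up in any single weight.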

comment by Manfred · 2016-09-14T18:15:19.251Z · LW(p) · GW(p)

Well, consider a neural net for distinguishing dogs from cats. This neural network might develop features that look like "dog-like eyes" and "cat-like eyes," which are pattern-matched across the image. Images with more activation on the first feature are claimed to be dogs and images with more activation on the second feature are claimed to be cats, along with input from many other features. This is fairly typical-sounding.

Now imagine how bonkers a neural net would have to be in order to reproduce the generative process behind the images! Leaving aside simulations of the early universe, our neural network should still have a solid understanding of the biology of dogs and cats, the different grooming and adornment practices, the macroscopic physics and physiology that lead to poses, and the preferences of people taking and storing photographs.

comment by Tyrin · 2016-09-25T23:50:16.487Z · LW(p) · GW(p)

Isn't the idea more that the neural network just learns rough subgraphs of the underlying DAG that captures the causal structure up to quantum detail? Whole-part relationships are such subgraphs: a person being present causes a face to be present, which causes eyes to be present etc.

comment by skeptical_lurker · 2016-09-11T13:42:04.572Z · LW(p) · GW(p)

This might make some sense if DNNs were being used to further our understanding of theoretical physics, but afaik they're not. They're being used to classify cat pics. Since when do you use polynomial Hamiltonians to recognise cats?

"These properties mean that neural networks do not need to approximate an infinitude of possible mathematical functions but only a tiny subset of the simplest ones."

No finite DNN can approximate sin(x) over the entire real numbers, unless you cheat by having a sin(x) activation function.
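A small sketch of the ReLU case, under the assumption of a one-input, one-hidden-layer network. Such a net is piecewise linear with finitely many kinks, so beyond the last kink it is a straight line and cannot track an oscillating function. The helper names `relu_interpolant` and `max_error` are hypothetical.

```python
import math

def relu(z):
    return max(0.0, z)

def relu_interpolant(f, lo, hi, n_knots):
    """One-hidden-layer ReLU net that linearly interpolates f on [lo, hi]."""
    knots = [lo + (hi - lo) * i / (n_knots - 1) for i in range(n_knots)]
    values = [f(k) for k in knots]
    slopes = [(values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
              for i in range(n_knots - 1)]
    # Each hidden unit's output weight is the change in slope at its knot.
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, len(slopes))]
    bias = values[0]

    def net(x):
        return bias + sum(w * relu(x - k) for w, k in zip(weights, knots[:-1]))
    return net

def max_error(net, lo, hi, samples=2001):
    # Largest absolute gap between the net and sin on an evenly spaced grid.
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(net(x) - math.sin(x)) for x in xs)

net = relu_interpolant(math.sin, -math.pi, math.pi, n_knots=64)
inside_err = max_error(net, -math.pi, math.pi)
outside_err = max_error(net, -4 * math.pi, 4 * math.pi)
print(inside_err, outside_err)
```

Inside the fitted interval the error is small; outside it the net is constant on the left and a straight line on the right, so the error is at least 1 no matter how many units are added.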

comment by Houshalter · 2016-09-11T13:41:51.020Z · LW(p) · GW(p)

I have another theory on how Deep Learning works: http://lesswrong.com/lw/m9p/approximating_solomonoff_induction/

The idea is that neural networks are a (somewhat crude) approximation of Solomonoff induction.

comment by The_Jaded_One · 2016-09-12T09:40:05.022Z · LW(p) · GW(p)

Basically every learning algorithm can be seen as a crude approximation of Solomonoff induction. What makes one approximation better than the others?

comment by Houshalter · 2016-09-12T11:41:32.004Z · LW(p) · GW(p)

Well I try to demonstrate you can derive neural networks from first principles, starting with SI. I don't think you can derive decision trees or other ML algorithms in a similar way.

Further, NNs are completely general. In theory recurrent neural nets can learn to simulate any computer program, or at least logical circuits. With certain modifications they can even be given a memory "tape" like a Turing machine and become Turing complete. Most machine learning methods do not have this property or anything like it. They can only learn "shallow" functions and can't handle recurrence.
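A toy sketch of the logical-circuit part, assuming hard-threshold neurons with hand-set weights. It shows that such neurons can represent a circuit (NAND is universal), not that training would find these weights.

```python
def step(z):
    # Hard-threshold activation: fires when the weighted input is positive.
    return 1 if z > 0 else 0

def nand(a, b):
    # One neuron with weights (-1, -1) and bias 1.5.
    return step(1.5 - a - b)

def xor(a, b):
    # Four NAND neurons wired as the textbook NAND-only XOR circuit.
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))

truth_table = {(a, b): xor(a, b) for a in (0, 1) for b in (0, 1)}
print(truth_table)
```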

comment by Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2020-09-04T09:11:08.491Z · LW(p) · GW(p)

Aren't you just talking about implied priors? AFAIK no one has calculated the implied prior of a neural network.

comment by Daniel_Burfoot · 2016-09-11T15:54:56.640Z · LW(p) · GW(p)

"How can neural networks approximate functions well in practice, when the set of possible functions is exponentially larger than the set of practically possible networks?"

This question answers itself. If neural networks could really approximate every possible function, they could never generalize. That is the whole point of statistical learning theory: you get a Probably Approximately Correct (PAC) generalization bound when 1) your learning machine gets good empirical accuracy and 2) the number of possible functions expressible by the machine is small in some sense compared to the volume of training data.
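To make the trade-off concrete, here is a sketch of the simplest such bound, assuming a finite hypothesis class and a loss bounded in [0, 1] (Hoeffding's inequality plus a union bound). The function name and the example class sizes are made up for illustration.

```python
import math

def generalization_gap(log_num_hypotheses, n_samples, delta=0.05):
    """Hoeffding plus union bound for a finite hypothesis class.

    With probability at least 1 - delta, every hypothesis satisfies
    true_error <= empirical_error + gap.
    Takes ln|H| directly so huge classes do not overflow.
    """
    return math.sqrt((log_num_hypotheses + math.log(1.0 / delta)) / (2.0 * n_samples))

n = 10_000
gap_small = generalization_gap(10 * math.log(2), n)       # |H| = 2**10
gap_large = generalization_gap(10_000 * math.log(2), n)   # |H| = 2**10000
print(gap_small, gap_large)
```

With the same amount of data, the small class gets a gap of a few percent while the large one gets a gap above one half, which says almost nothing about a binary classifier.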

comment by morganism · 2016-09-10T20:08:35.990Z · LW(p) · GW(p)

This also reminds me of the scale of the universe claims that everything from subatomics to galaxy clusters scale at 10x9.

animations: http://scaleofuniverse.com/

http://apod.nasa.gov/apod/ap120312.html

comment by James_Miller · 2016-09-10T19:20:21.551Z · LW(p) · GW(p)

"it looks like this is a big boost to the simulation argument."

It could be that you only get civilizations in universes "governed by a tiny subset of all possible functions" because otherwise either evolution can't "discover" how to create intelligent life, or evolved intelligent life can't figure out science.

comment by Luke_A_Somers · 2016-09-11T19:49:39.871Z · LW(p) · GW(p)

That reminds me of a fantasy novel I began and abandoned - in it, there's a civilization that can do astonishing things, and even though they have math beyond ours, they still have no idea how just about any of it works, because the rules are so much more complicated that they have a hard time pulling off balls-rolling-down-ramps kinds of experiments (the ramp would remember balls rolling and, depending on the details of the ramp, make it happen slower or faster; and if you made a new ramp each time, the pattern of your interaction with ramps would develop the same sort of reaction). One of them was kicked out to a place where magic was weaker, allowing her to figure it all out; she ended up stronger than any of them.

comment by morganism · 2016-09-10T22:21:11.341Z · LW(p) · GW(p)

Or it could also be that all the matter in the universe has already been converted to "smart matter" and is running basic algorithms and rulesets.....

comment by Good_Burning_Plastic · 2016-09-12T10:24:38.082Z · LW(p) · GW(p)

That's basically the Unsong universe

comment by morganism · 2016-11-16T09:49:43.345Z · LW(p) · GW(p)

More patterns, and set limiters found.

https://www.quantamagazine.org/20161115-strange-numbers-found-in-particle-collisions/

“It seems so that the periods which nature wants are a smaller set than the periods mathematics can define, but we cannot define very cleanly what this subset really is.”

Brown is looking to prove that there’s a kind of mathematical group — a Galois group — acting on the set of periods that come from Feynman diagrams. “The answer seems to be yes in every single case that’s ever been computed,” he said, but proof that the relationship holds categorically is still in the distance. “If it were true that there were a group acting on the numbers coming from physics, that means you’re finding a huge class of symmetries,” Brown said. “If that’s true, then the next step is to ask why there’s this big symmetry group and what possible physics meaning could it have.”

comment by The_Jaded_One · 2016-09-12T09:50:04.859Z · LW(p) · GW(p)

I am interested in this line of research, though I feel it needs a lot more work than one paper.

A key question is whether we can dig down into the relationship between environments and learning agents. Are there low-complexity environments that neural networks do badly in?

What is really essential about our laws of physics to create a world that neural networks do relatively well in?

comment by morganism · 2016-09-13T22:48:49.327Z · LW(p) · GW(p)

And since you can't "look inside" a NN, you can't even see problems developing.

"If there hadn’t been an interpretable model, Malioutov cautions, “you could accidentally kill people.”

This is why so many are reluctant to gamble on the mysteries of neural networks."

http://nautil.us/issue/40/learning/is-artificial-intelligence-permanently-inscrutable

comment by Houshalter · 2016-09-16T12:07:49.826Z · LW(p) · GW(p)

You triple posted.

comment by morganism · 2016-09-17T21:16:31.915Z · LW(p) · GW(p)

Thought it was another view towards having an "explainer" module in your AI.

Sorry if multiposting, I typically have the "loading" bars rolling for 3-4 min before it posts, and a lot of the time I actually have to hit cancel after 5 min or so.

I don't see much in edit mode on a previous post, and NoScript doesn't like all the page re-directs here at all. vigilink and websiteoptimizer never work, and have to be re-authorized with each page.

comment by Houshalter · 2016-09-19T09:34:06.409Z · LW(p) · GW(p)

Adblock plus removes the stupid vigilinks for me (or just block vigilink.com or whatever the source site is.) Though noscript should probably do that to begin with.
