Science in a High-Dimensional World

post by johnswentworth · 2021-01-08T17:52:02.261Z · LW · GW · 33 comments

Contents

  The Dimensionality Problem
  Determinism
  Replication
  The Scientific Method In A High-Dimensional World
  Everything Is Connected To Everything Else (But Not Directly)
  Summary

Claim: the usual explanation of the Scientific Method is missing some key pieces about how to make science work well in a high-dimensional world (e.g. our world). Updating our picture of science to account for the challenges of dimensionality gives a different model for how to do science and how to recognize high-value research. This post will sketch out that model, and explain what problems it solves.

The Dimensionality Problem

Imagine that we are early scientists, investigating the mechanics of a sled sliding down a slope. What determines how fast the sled goes? Any number of factors could conceivably matter: angle of the hill, weight and shape and material of the sled, blessings or curses laid upon the sled or the hill, the weather, wetness, phase of the moon, latitude and/or longitude and/or altitude, etc. For all the early scientists know, there may be some deep mathematical structure to the world which links the sled’s speed to the astrological motions of stars and planets, or the flaps of the wings of butterflies across the ocean, or vibrations from the feet of foxes running through the woods.

Takeaway: there are literally billions of variables which could influence the speed of a sled on a hill, as far as an early scientist knows.

So, the early scientists try to control as much as they can. They use a standardized sled, with standardized weights, on a flat smooth piece of wood treated in a standardized manner, at a standardized angle. Playing around, they find that they need to carefully control a dozen different variables to get reproducible results. With those dozen pieces carefully kept the same every time… the sled consistently reaches the same speed (within reasonable precision).

At first glance, this does not sound very useful. They had to exercise unrealistic levels of standardization and control over a dozen different variables. Presumably their results will not generalize to real sleds on real hills in the wild.

But stop for a moment to consider the implications of the result. A consistent sled-speed can be achieved while controlling only a dozen variables. Out of literally billions. Planetary motions? Irrelevant, after controlling for those dozen variables. Flaps of butterfly wings on the other side of the ocean? Irrelevant, after controlling for those dozen variables. Vibrations from foxes’ feet? Irrelevant, after controlling for those dozen variables.

The amazing power of achieving a consistent sled-speed is not that other sleds on other hills will reach the same predictable speed. Rather, it’s knowing which variables are needed to predict the sled’s speed. Hopefully, those same variables will be sufficient to determine the speeds of other sleds on other hills - even if some experimentation is required to find the speed for any particular variable-combination.
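The sled story can be sketched as a toy simulation. Everything in it is an illustrative assumption rather than real physics: the number of candidate variables, the three "relevant" indices, and the made-up formula inside `sled_speed` are all hypothetical. The sketch only shows that once the relevant variables are held fixed, the outcome stops varying, no matter what the other variables do.

```python
import random

N_VARIABLES = 1000          # stand-in for the "billions" of candidate variables
RELEVANT = (3, 141, 592)    # hypothetical: the only variables this toy world uses


def sled_speed(world):
    """Toy 'law of nature' (made up): speed depends only on RELEVANT entries."""
    angle, weight, friction = (world[i] for i in RELEVANT)
    return 10.0 * angle + 2.0 * weight - 5.0 * friction


def speed_spread(controlled, runs=200, seed=0):
    """Repeat the experiment, holding `controlled` variables at 0.5 and letting
    every other variable vary at random. Return max speed minus min speed."""
    rng = random.Random(seed)
    speeds = []
    for _ in range(runs):
        world = [rng.random() for _ in range(N_VARIABLES)]
        for i in controlled:
            world[i] = 0.5
        speeds.append(sled_speed(world))
    return max(speeds) - min(speeds)


spread_uncontrolled = speed_spread(controlled=())
spread_partial = speed_spread(controlled=(3, 141))    # friction left free
spread_full = speed_spread(controlled=RELEVANT)       # all relevant ones fixed
print(spread_uncontrolled, spread_partial, spread_full)
```

In this toy world the spread drops to zero only when all three relevant variables are controlled. The other 997 variables keep changing on every run and make no difference, which is the informative part of the result.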

Determinism

How can we know that all other variables in the universe are irrelevant after controlling for a handful? Couldn’t there always be some other variable which is relevant, no matter what empirical results we see?

The key to answering that question is determinism. If the system’s behavior can be predicted perfectly, then there is no mystery left to explain, no information left which some unknown variable could provide. Mathematically, information theorists use the mutual information $I(X; Y)$ to measure the information which $X$ contains about $Y$. If $Y$ is deterministic - i.e. we can predict $Y$ perfectly - then $I(X; Y)$ is zero no matter what variable $X$ we look at. Or, in terms of correlations: a deterministic variable always has zero correlation with everything else. If we can perfectly predict $Y$, then there is no further information to gain about it.

In this case, we’re saying that sled speed is deterministic given some set of variables (sled, weight, surface, angle, etc). So, given those variables, everything else in the universe is irrelevant.
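As a rough illustration of the mutual-information claim, here is a minimal sketch that estimates the mutual information between a candidate variable X and an outcome Y from discrete samples. The helper `mutual_information` and the example data (a "moon phase" variable and a sled speed of 3.5) are hypothetical. "Deterministic" is modeled here as Y taking the same value on every run once the control variables are fixed.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information I(X; Y), in bits, from (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Hypothetical data: X is the moon phase (0..3) on each of 100 runs.
moon_phase = [run % 4 for run in range(100)]

# Case 1: with the controls fixed, the sled speed is the same on every run.
controlled_speed = [(x, 3.5) for x in moon_phase]
mi_deterministic = mutual_information(controlled_speed)

# Case 2: an outcome that still varies with the moon phase, for contrast.
uncontrolled_outcome = [(x, x % 2) for x in moon_phase]
mi_dependent = mutual_information(uncontrolled_outcome)

print(mi_deterministic, mi_dependent)
```

The first case is an instance of the general bound that mutual information cannot exceed the entropy of Y: an outcome with no remaining uncertainty has nothing left for any X to explain.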

Of course, we can’t always perfectly predict things in the real world. There’s always some noise - certainly at the quantum scale, and usually at larger scales too. So how do we science?

The first thing to note is that “perfect predictability implies zero mutual information” plays well with approximation: approximately perfect predictability implies approximately zero mutual information. If we can predict the sled’s speed to within 1% error, then any other variables in the universe can only influence that remaining 1% error. Similarly, if we can predict the sled’s speed 99% of the time, then any other variables can only matter 1% of the time. And we can combine those: if 99% of the time we can predict the sled’s speed to within 1% error, then any other variables can only influence the 1% error except for the 1% of sled-runs when they might have a larger effect.

More generally, if we can perfectly predict any specific variable, then everything else in the universe is irrelevant to that variable - even if we can’t perfectly predict all aspects of the system’s trajectory. For instance, if we can perfectly predict the first two digits of the sled’s speed (but not the less-significant digits), then we know that nothing else in the universe is relevant to those first two digits (although all sorts of things could influence the less-significant digits).

As a special case of this, we can also handle noise using repeated experiments. If I roll a die, I can’t predict the outcome perfectly, so I can’t rule out influences from all the billions of variables in the universe. But if I roll a die a few thousand times, then I can approximately-perfectly predict the distribution of die-rolls (including the mean, variance, etc). So, even though I don’t know what influences any one particular die roll, I do know that nothing else in the universe is relevant to the overall distribution of repeated rolls (at least to within some small error margin).
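A quick simulation makes the die example concrete. This is only a sketch: the helper name `roll_frequencies`, the choice of 6000 rolls, and the two seeds are arbitrary assumptions, and a pseudorandom generator stands in for a physical die. Two independent batches of rolls are compared to see how closely their distributions agree.

```python
import random
from collections import Counter


def roll_frequencies(n_rolls, seed):
    """Roll a simulated fair die n_rolls times; return the frequency of each face."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}


# Two separate "experiments" whose individual rolls are unrelated.
freqs_a = roll_frequencies(6000, seed=1)
freqs_b = roll_frequencies(6000, seed=2)

# Largest disagreement between the two empirical distributions.
max_gap = max(abs(freqs_a[face] - freqs_b[face]) for face in range(1, 7))
mean_a = sum(face * p for face, p in freqs_a.items())
print(freqs_a, freqs_b, max_gap, mean_a)
```

No single roll in either batch is predictable, but the two frequency tables should agree closely, and the mean should sit near 3.5. The distribution is the variable we can predict, so the distribution is the thing for which other influences are (approximately) ruled out.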

Replication

This does still leave one tricky problem: what if we accidentally control some variable? Maybe air pressure influences sled speed, but it never occurred to us to test the sled in a vacuum or high-pressure chamber, so the air pressure was roughly the same for all of our experiments. We are able to deterministically predict sled speed, but only because we accidentally keep air pressure the same every time.

This is a thing which actually does happen! Sometimes we test something in conditions never before tested, and find that the usual rules no longer apply.

Ideally, replication attempts catch this sort of thing. Someone runs the same experiment in a different place and time, a different environment, and hopefully whatever things were accidentally kept constant will vary. (You’d be amazed what varies by location - I once had quite a surprise double-checking the pH of deionized water in Los Angeles.)

Of course, like air pressure, some things may happen to be the same even across replication attempts.

On the other hand, if a variable is accidentally controlled across multiple replication attempts, then it will likely be accidentally controlled outside the lab too. If every lab tests sled-speed at atmospheric pressure, and nobody ever accidentally tries a different air pressure, then that’s probably because sleds are almost always used at atmospheric pressure. When somebody goes to predict a sled’s speed in space, some useful new scientific knowledge will be gained, but until then the results will generally work in practice.

The Scientific Method In A High-Dimensional World

Scenario 1: a biologist hypothesizes that adding hydroxyhypotheticol to their yeast culture will make the cells live longer, and the cell population will grow faster as a result. To test this hypothesis, they prepare one batch of cultures with the compound and one without, then measure the increase in cell density after 24 hours. They statistically compare the final cell density in the two batches to see whether the compound had a significant effect.

This is the prototypical Scientific Method: formulate a hypothesis, test it experimentally. Control group, p-values, all that jazz.

Scenario 2: a biologist observes that some of their clonal yeast cultures flourish, while others grow slowly or die out altogether, despite seemingly-identical preparation. What causes this different behavior? They search for differences, measuring and controlling for everything they can think of: position of the dishes in the incubator, order in which samples were prepared, mutations, phages, age of the initial cell, signalling chemicals in the cultures, combinations of all those… Eventually, they find that using initial cells of the same replicative age eliminates most of the randomness.

This looks less like the prototypical Scientific Method. There’s probably some hypothesis formation and testing steps in the middle, but it’s less about hypothesize-test-iterate, and more about figuring out which variables are relevant.

In a high-dimensional world, effective science looks like scenario 2. This isn’t mutually exclusive with the Scientific-Method-as-taught-in-high-school: there’s still some hypothesizing and testing, but there’s a new piece and a different focus. The main goal is to hunt down sources of randomness, figure out exactly what needs to be controlled in order to get predictable results, and thereby establish which of the billions of variables in the universe are actually relevant.

Based on personal experience and reading lots of papers, this matches my impression of which scientific research offers lots of long-term value in practice. The one-shot black-box hypothesis tests usually aren’t that valuable in the long run, compared to research which hunts down the variables relevant to some previously confusing (a.k.a. unpredictable) phenomenon.

Everything Is Connected To Everything Else (But Not Directly)

What if there is no small set of variables which determines the outcome of our experiment? What if there really are billions of variables, all of which matter?

We sometimes see a claim like this made about biological systems. As the story goes, you can perform all sorts of interventions on a biological system - knock out a gene, add a drug, adjust diet or stimulus, etc - and any such intervention will change the level of most of the tens-of-thousands of proteins or metabolites or signalling molecules in the organism. It won’t necessarily be a large change, but it will be measurable. Everything is connected to everything else; any change impacts everything.

Note that this is not at all incompatible with a small set of variables determining the outcome! The problem of science-in-a-high-dimensional-world is not to enumerate all variables which have any influence. The problem is to find a set of variables which determine the outcome, so that no other variables have any influence after controlling for those.

Suppose sled speed is determined by the sled, slope material, and angle. There may still be billions of other variables in the world which impact the sled, the slope material, and the angle! But none of those billions of variables are relevant after controlling for the sled, slope material, and angle; other variables influence the speed only through those three. Those three variables mediate the influence of all the billions of other variables.

In general, the goal of science in a high dimensional world is to find sets of variables which mediate the influence of all other variables on some outcome.
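Mediation can be sketched with another toy simulation. The setup is entirely hypothetical: weather plays the role of an "upstream" variable, the friction values per weather type are invented, and the speed formula is made up. Weather changes the speed, but only by changing friction, so once angle and friction are known, weather adds nothing.

```python
import random
from collections import defaultdict

# Hypothetical toy world: weather affects speed only through slope friction.
FRICTION_BY_WEATHER = {
    "dry": [0.3, 0.5],
    "rain": [0.2, 0.3],
    "snow": [0.1, 0.2],
}


def simulate_run(rng):
    weather = rng.choice(sorted(FRICTION_BY_WEATHER))
    friction = rng.choice(FRICTION_BY_WEATHER[weather])
    angle = rng.choice([10, 20, 30])
    speed = angle * (1.0 - friction)   # made-up law: sees only the mediators
    return weather, angle, friction, speed


rng = random.Random(0)
runs = [simulate_run(rng) for _ in range(5000)]

speeds_by_weather = defaultdict(list)
speeds_by_mediators = defaultdict(set)
weathers_by_mediators = defaultdict(set)
for weather, angle, friction, speed in runs:
    speeds_by_weather[weather].append(speed)
    speeds_by_mediators[(angle, friction)].add(speed)
    weathers_by_mediators[(angle, friction)].add(weather)

# Unconditionally, weather clearly matters: average speed differs by weather.
mean_speed = {w: sum(v) / len(v) for w, v in speeds_by_weather.items()}

# Conditional on (angle, friction), every run has exactly one speed,
# even in groups that contain runs from more than one kind of weather.
mediated = all(len(s) == 1 for s in speeds_by_mediators.values())
mixed_weather_groups = sum(len(w) > 1 for w in weathers_by_mediators.values())
print(mean_speed, mediated, mixed_weather_groups)
```

The contrast between the two summaries is the point: weather is "connected" to speed, yet it is irrelevant after controlling for the mediators.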

In some sense, the central empirical finding of All Of Science is that, in practice, we can generally find small sets of variables which mediate the influence of all other variables. Our universe is “local” - things only interact directly with nearby things, and only so many things can be nearby at once. Furthermore, our universe abstracts well: even indirect interactions over long distances can usually be summarized by a small set of variables. Interactions between stars across galactic distances mostly just depend on the total mass of each star, not on all the details of the plasma roiling inside.

Even in biology, every protein interacts with every other protein in the network, but the vast majority of proteins do not interact directly - the graph of biochemical interactions is connected, but extremely sparse. The interesting problem is to figure out the structure of that graph - i.e. which variables interact directly with which other variables. If we pick one particular “outcome” variable, then the question is which variables are its neighbors in the graph - i.e. which variables mediate the influence of all the other variables.

Summary

Let’s put it all together.

In a high-dimensional world like ours, there are billions of variables which could influence an outcome. The great challenge is to figure out which variables are directly relevant - i.e. which variables mediate the influence of everything else. In practice, this looks like finding mediators and hunting down sources of randomness. Once we have a set of control variables which is sufficient to (approximately) determine the outcome, we can (approximately) rule out the relevance of any other variables in the rest of the universe, given the control variables.

A remarkable empirical finding across many scientific fields, at many different scales and levels of abstraction, is that a small set of control variables usually suffices. Most of the universe is not directly relevant to most outcomes most of the time.

Ultimately, this is a picture of “gears-level science”: look for mediation, hunt down sources of randomness, rule out the influence of all the other variables in the universe. This sort of research requires a lot of work compared to one-shot hypothesis tests, but it provides a lot more long-run value: because all the other variables in the universe are irrelevant, we only need to measure/control the control variables each time we want to reuse the model.

33 comments

comment by qbolec · 2021-01-08T23:07:04.479Z · LW(p) · GW(p)

Our universe is “local” - things only interact directly with nearby things, and only so many things can be nearby at once. 

After reading this sentence, I had a short moment of illumination that this is actually backwards: perhaps what our brains perceive as locality is the property of "being influenced by/related to". Perhaps a child's brain learns which "pixels" of the retina are near each other by observing that they often have correlated colors, and similarly which places in space are nearby because you can move things or yourself between them, etc. So, whatever high-dimensional structure the real universe had, we would still evolve to notice which nodes in the graph are connected and declare them "local". This doesn't mean that the observation from the quoted sentence is a tautology: it wouldn't be true in a universe with much higher connectivity - we're lucky to live in a universe with a low [Treewidth](https://en.wikipedia.org/wiki/Treewidth), and thus can hope to grasp it.

Replies from: johnswentworth, Lblack, ESRogs
comment by johnswentworth · 2021-01-08T23:21:06.034Z · LW(p) · GW(p)

I believe this is exactly correct. Good explanation, too.

comment by Lblack · 2021-01-09T14:13:36.667Z · LW(p) · GW(p)

I don't know enough about neurology to make a statement on whether this is something human children learn, or whether it comes evolutionarily preprogrammed, so to speak. But in a universe where physics wasn't at least approximately local, I would expect there'd indeed be little point in holding the notion that points in space and time have given "distances" from one another.

comment by ESRogs · 2021-01-12T23:07:40.382Z · LW(p) · GW(p)

I'm not sure whether it's the standard view in physics, but Sean Carroll has suggested that we should think of locality in space as deriving from entanglement. (With space itself as basically an emergent phenomenon.) And I believe he considers this a driving principle in his quantum gravity work.

Replies from: ESRogs
comment by ESRogs · 2021-01-12T23:08:24.784Z · LW(p) · GW(p)

https://www.preposterousuniverse.com/blog/2016/07/18/space-emerging-from-quantum-mechanics/

Replies from: Viliam
comment by Viliam · 2021-01-17T22:56:52.410Z · LW(p) · GW(p)

When we zoom out, does the graph take on the geometry of a smooth, flat space with a fixed number of dimensions? (Answer: yes, when we put in the right kind of state to start with.)

I don't understand the article enough to decode what "the right kind of state" means, but this feels like a circular explanation. The three-dimensional space can "emerge" from a graph, but only assuming it is the right kind of graph. Okay, so what caused the graph to be exactly the kind of graph that generates a three-dimensional space?

comment by adamzerner · 2021-01-09T22:45:57.679Z · LW(p) · GW(p)

I was expecting the central idea of this post to be more similar to/an extension of Everyday Lessons from High-Dimensional Optimization: that in a high-dimensional world, a good scientist can't afford to waste time testing implausible hypotheses. Doing so will get you the right answer eventually, but it is far too slow. In a high-dimensional world, there are just too many variables to tweak. Relevant excerpt from My Wild and Reckless Youth:

The way Traditional Rationality is designed, it would have been acceptable for me to spend thirty years on my silly idea, so long as I succeeded in falsifying it eventually, and was honest with myself about what my theory predicted, and accepted the disproof when it arrived, et cetera. This is enough to let the Ratchet of Science click forward, but it’s a little harsh on the people who waste thirty years of their lives. Traditional Rationality is a walk, not a dance. It’s designed to get you to the truth eventually, and gives you all too much time to smell the flowers along the way.

To what extent is this post making these points?

Replies from: johnswentworth
comment by johnswentworth · 2021-01-10T01:29:44.436Z · LW(p) · GW(p)

Great question. This post is completely ignoring those points, and it's really not something which should be ignored.

In the context of this post, the question is: ok, we're trying to hunt down sources of randomness, trying to figure out which of the billions of variables actually matter, but how do we do that? We can't just guess and check all those variables.

comment by Raven · 2021-01-09T04:11:51.697Z · LW(p) · GW(p)

Your description of the second type of science where you repeatedly control variables to isolate one reminds me a lot of debugging a complex program.

Replies from: johnswentworth
comment by johnswentworth · 2021-01-09T15:59:30.925Z · LW(p) · GW(p)

Great point! It's a very similar problem, with a very similar solution. We have some complicated system with a large number of lines/variables which could influence the outcome (i.e. the bug), and the main problem is to figure out which lines/variables mediate the influence of everything else. The first step is to reproduce the bug - i.e. hunt down all the sources of "randomness", until we can make the bug happen consistently. After that, the next step is to look for mediation - i.e. find lines/variables which are "in between" our original reproduction-inputs and the bug itself, and which are themselves sufficient to reproduce the problem.

comment by ryan_b · 2021-01-12T16:42:48.477Z · LW(p) · GW(p)

I detect the ghost of Jaynes in this!

I am not sure exactly why, but this and the optimization post both call to mind the current of thought suggesting we segregate the hypothesis and experimental steps explicitly. I have encountered this in three places:

  • An unfinished textbook on arXiv (which, to my frustration, I cannot now locate) that described treating machine learning as a science; it proposed gathering data first, and then measuring the goodness of a machine learning algorithm by compression.
  • The Report likelihoods, not p-values article on Arbital.
  • This is basically how astronomy works by default: no one has a hypothesis for how pulsars interact and then gets a grant from their university department to launch a satellite network to look for pulsars; instead they identify phenomena on which they have little data, and pool resources to build a telescope or satellite or underground neutrino detector to gather the data, and then the publications test their hypotheses against the data gathered from one or more such projects.

I have a vague intuition that dividing up scientific practice in this way chunks the dimensions more tractably, or at least allows for it. Allowing optimization of data gathering and hypothesis formulation independently seems like a clear win for similar reasons.

Maybe the appeal is that it allows hypotheses to come from multiple directions in dimension space. The dimensionality of a body of data is fixed, but if it is generated as a tuple with a single hypothesis then it can only be approached from the perspective of that single hypothesis; if it is independent, then any hypothesis concerned with any of the dimensions of the data can be applied. By analogy, consider convergent evolution: two different paths in phase space arrive at essentially the same thing. Segregating the data step radically compresses this by allowing hypotheses from any other chain of development to be tested against it directly.

Replies from: johnswentworth
comment by johnswentworth · 2021-01-12T16:51:46.564Z · LW(p) · GW(p)

I detect the ghost of Jaynes in this!

In particular, the view in this post is extremely similar to the view in Macroscopic Prediction. As there, reproducible phenomena are the key puzzle piece.

comment by G Gordon Worley III (gworley) · 2021-01-10T18:21:14.944Z · LW(p) · GW(p)

This post gave me an idea about how you might approach magic in fiction while keeping it grounded in reality: something like magic users are people who learn to pick out relevant variables from the noise to consistently nudge reality in ways that otherwise seem not possible.

Basically placebomancy from Unsong.

Replies from: johnswentworth
comment by johnswentworth · 2021-01-12T19:23:39.266Z · LW(p) · GW(p)

I've wanted for a while to see a game along these lines. It would have some sort of 1-v-1 fighting, but dominated by "random" behavior from environmental features and/or unaligned combatants. The centerpiece of the game would be experimenting with the "random" components to figure out how they work, in order to later leverage them in a fight.

comment by ChristianKl · 2021-01-10T14:45:50.635Z · LW(p) · GW(p)

Ultimately, this is a picture of “gears-level science”: look for mediation, hunt down sources of randomness, rule out the influence of all the other variables in the universe. 

I'm very doubtful that hunting down sources of randomness is a good way to go about doing science where there's a big solution space. 

There's a lot of human pattern matching involved in coming up with good hypotheses to test.

Replies from: johnswentworth
comment by johnswentworth · 2021-01-10T17:19:41.697Z · LW(p) · GW(p)

I think you're pointing to the same issue which Adam Zerner was pointing to. Hunting down sources of randomness is a good goal when doing science, but that doesn't tell us much about how to go about the hunt when the solution space is very large.

Replies from: ryan_b, ChristianKl
comment by ryan_b · 2021-01-12T14:35:45.335Z · LW(p) · GW(p)

It sort of feels like switching the perspectives back and forth between searching for what works at all and searching for things to rule out is analogous to research and development. Iterating between them feels like how knowledge would be refined.

Also: imagining science as "optimizing from zero" is aesthetically pleasing to me.

Replies from: johnswentworth
comment by johnswentworth · 2021-01-12T17:21:35.635Z · LW(p) · GW(p)

Fleshing this out a bit more, within the framework of this comment: when we can consistently predict some outcomes using only a handful of variables, we've learned a (low-dimensional) constraint on the behavior of the world. For instance, the gas law PV = nRT is a constraint on the relationship between variables in a low-dimensional summary of a high-dimensional gas. (More precisely, it's a template for generating low-dimensional constraints on the summary variables of many different high-dimensional gases.)

When we flip perspective to problems of design (e.g. engineering), those constraints provide the structure of our problem - analogous to the walls in a maze. We look for "paths in the maze" - i.e. designs - which satisfy the constraints. Duality says that those designs act as constraints when searching for new constraints (i.e. doing science). If engineers build some gadget that works, then that lets us rule out some constraints: any constraints which would prevent the gadget from working must be wrong.

Data serves a similar role (echoing your comment here). If we observe some behavior, then that provides a constraint when searching for new constraints. Data and working gadgets live "in the same space" - the space of "paths": things which definitely do work in the world and therefore cannot be ruled out by constraints.

Replies from: ryan_b
comment by ryan_b · 2021-01-12T18:42:23.025Z · LW(p) · GW(p)

You know, I had never explicitly considered that data and devices would be in the same abstract space, but as soon as I read the words it was obvious. Thank you for that!

comment by ChristianKl · 2021-01-13T16:10:22.777Z · LW(p) · GW(p)

In the realm of biology, I think hunting for patterns, and especially those you care about, is a better way than hunting for randomness.

Many times randomness is the result of complex interactions that can't easily be reduced. 

comment by Mary Chernyshenko (mary-chernyshenko) · 2021-01-09T18:01:40.255Z · LW(p) · GW(p)

There's a parallel need to review the actual purpose for which you are doing all of that. It can be mutable.

For example, suppose you culture some unicellular algae, and you notice the cells can be more or less rounded in the same dish. You shrug and discard the dishes with too elongated cells to keep the line pure and strong. You learn what parameters to keep constant to make it easier.

And then someone shows that in point of fact, cell shape for this group of species can vary somewhat even in culture so we have been wrong about the diversity in the wild this whole time. And you read it and hope in your heart that some very motivated people might one day deviate from the beaten path and finally find out what's going on there, despite this looking entirely unfundable.

comment by AspiringRationalist · 2021-03-10T00:19:35.906Z · LW(p) · GW(p)

A remarkable empirical finding across many scientific fields, at many different scales and levels of abstraction, is that a small set of control variables usually suffices.

I'm skeptical that this is true for most things we care about. It's true in the scientific fields where we have the most accurate models, such as physics, but that's likely because there are so few relevant variables in those fields.

Most new drugs that go into clinical trials fail. Essentially, a pharmaceutical company identifies a variable that appears to be the mediator of a medical outcome, they create a drug that tweaks that variable, and then it turns out not to produce the outcome that they thought it would. There are too many other relevant variables that are poorly understood.

The other thing that makes me skeptical is the effectiveness of machine learning models that use a large number of inputs. It's possible that there's a simple underlying structure to what they're predicting that we just haven't figured out yet, but based on what exists now, it sure looks like there are a large number of relevant variables.

Replies from: johnswentworth
comment by johnswentworth · 2021-03-10T00:59:08.666Z · LW(p) · GW(p)

Most new drugs that go into clinical trials fail. Essentially, a pharmaceutical company identifies a variable that appears to be the mediator of a medical outcome, they create a drug that tweaks that variable, and then it turns out not to produce the outcome that they thought it would. There are too many other relevant variables that are poorly understood.

I love this example in particular, because as I understand it, this is exactly what pharma companies do not do. What they actually do is target some variable which is correlated with the medical outcome, but is often not causal and is rarely a mediator.

Case in point: amyloid beta plaques in Alzheimer's.

Decades ago, people noticed that if you look at the brains of old people with dementia, they usually have lots of plaques, and these plaques are made of a particular protein fragment called amyloid beta. Therefore clearly amyloid beta causes dementia. Pretty soon people were using amyloid beta plaques to diagnose dementia, which made it really easy to show that the plaques cause dementia: when the plaques are how we diagnose “dementia”, then by golly removing the plaques makes the “dementia” (as diagnosed by plaques) go away.

As far as I can tell, there has never at any point in time been compelling evidence that amyloid beta plaques cause age-related memory problems. Conversely, I have seen at least a few studies suggesting the plaques are not causal.

Meanwhile, according to Wikipedia, 244 Alzheimer’s drugs were tested in clinical trials from 2002-2012, mostly targeting the amyloid plaques. Of those, only 1 drug made it through.

I think someone familiar with both causality/mediation and the Alzheimer's literature could probably have told you in 2000 that those trials were unlikely to pass. But it turns out correct reasoning about causality/mediation is remarkably rare; remember that Pearl & co's work is still very recent by academic standards, and most people in the sciences still don't know about it. Pharma execs don't have the technical skills for it. Some scientists do this sort of reasoning intuitively, but saying "no" to lots of stupid drug tests is not the sort of thing which makes one a "team player" at a big pharma company. (And besides, if the problem is hard enough, you can probably get more drugs to market by throwing lots of shit at the wall and hoping one passes by random chance; I wouldn't put my money on that one drug which passed out of 244 actually being very effective.)

comment by Srdjan Miletic (srdjan-miletic) · 2021-01-11T07:19:53.194Z · LW(p) · GW(p)

I don't quite think you've solved the problem of induction.

I think there's a fairly serious issue with your claim that being able to predict something accurately means you necessarily fully understand the variables which cause it, because of determinism.

The first thing to note is that “perfect predictability implies zero mutual information” plays well with approximation: approximately perfect predictability implies approximately zero mutual information. If we can predict the sled’s speed to within 1% error, then any other variables in the universe can only influence that remaining 1% error. Similarly, if we can predict the sled’s speed 99% of the time, then any other variables can only matter 1% of the time. And we can combine those: if 99% of the time we can predict the sled’s speed to within 1% error, then any other variables can only influence the 1% error except for the 1% of sled-runs when they might have a larger effect.

That's not really the case. E.g.: let's say that ice cream melts twice as fast in galaxies without a supermassive black hole at the center. You do experiments to see how fast ice cream melts. After controlling for type of ice cream, temperature, initial temp of the ice cream, airflow and air humidity, you find that you can predict how ice cream melts. You triumphantly claim that you know which things cause ice cream to melt at different rates, having completely missed the black hole's effects.

Essentially, controlling for A & B but not C won't tell you whether C has a causal influence on the thing you're measuring unless

  • you intentionally change C between experiments (not practical given googolplexes of potential causal factors)
  • C happens to naturally vary quite a bit and so makes your experimental results different, cluing you in to the fact that you're missing something.
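
A minimal sketch of the ice cream scenario, assuming a made-up melt formula (the function `melt_time` and all of its constants are hypothetical, chosen only for illustration). Every experiment is run with the black hole present, so a lookup table built from those runs predicts perfectly here and is still off by a factor of two wherever C takes its other value:

```python
import itertools

def melt_time(temp_c, airflow, black_hole):
    """Hypothetical melt time in seconds; the formula is invented for illustration."""
    base = 600.0 / (1.0 + 0.05 * temp_c + 0.1 * airflow)
    # Hidden factor C: melting is twice as fast without a central black hole.
    return base if black_hole else base / 2.0

temps = [0, 10, 20, 30]
airflows = [0, 1, 2]

# All experiments happen in our own galaxy, so black_hole never varies.
table = {
    (t, a): melt_time(t, a, black_hole=True)
    for t, a in itertools.product(temps, airflows)
}

def predict(temp_c, airflow):
    """Predict melt time from the controlled variables only."""
    return table[(temp_c, airflow)]

# Prediction error where C keeps its usual value.
errors_here = [abs(predict(t, a) - melt_time(t, a, True)) for t, a in table]

# Relative prediction error in a galaxy where C differs.
rel_errors_elsewhere = [
    abs(predict(t, a) - melt_time(t, a, False)) / melt_time(t, a, False)
    for t, a in table
]

print(max(errors_here))
print(min(rel_errors_elsewhere), max(rel_errors_elsewhere))
```

Nothing in the in-galaxy data distinguishes this model from one without the black hole term, which is the point of the two bullets above.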
comment by algon33 · 2021-01-09T15:45:53.241Z · LW(p) · GW(p)

I am kind of surprised you didn't reference causal inference here to just gesture at the task in which we "figure out which variables are directly relevant - i.e. which variables mediate the influence of everything else". Are you pointing to a different sort of idea/do you not feel causal inference is adequate for describing this task?

Also, scenarios 1 and 2 seem fairly close to the "linear" and "non-linear" models of innovation Jason Crawford described in his talk "The Non-Linear Model of Innovation." To be honest, I preferred his description of the models, though he didn't cover how miraculous it is that the model can work at all: that, to a good approximation, the universe is simple and local.

Replies from: johnswentworth
comment by johnswentworth · 2021-01-09T16:48:54.644Z · LW(p) · GW(p)

Causal inference (or more precisely learning causal structure) is exactly the sort of thing I have in mind here. There's actually a few places in the post where I should distinguish between variables which control an outcome in an information sense (i.e. sufficient to perfectly predict the outcome) vs in a causal sense (i.e. sufficient to cause the outcome under interventions). The main reason I didn't talk about it directly is because I would have had to explain that distinction, and decided that would be too much of a distraction from the main point.

I think the takeaway of Jason's talk, as it relates to this post, is that a large chunk of the "science" of achieving consistent outcomes happens in inventors' workshops rather than scientists' labs. The problem is still largely similar, regardless of the label applied, but scientists aren't the only ones doing science.

comment by CoafOS · 2021-01-09T18:08:37.584Z · LW(p) · GW(p)

I think the prototypical scientific method is still valuable in the long term.

In any experiment, there are lots of naturally varying parameters (current phase of the Moon, air pressure, amount of snow on the slope), and there are lots of naturally constant parameters (strength of gravity, room temperature, amount of hydroxyhypotethicol in the solution). There are base and derived parameters. The distances from the sun and the orbital periods vary between the planets, but (distance)^3/(orbital period)^2 is constant.
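
As a rough check of the derived-parameter example, here is a short sketch using rounded textbook values for semi-major axis (in AU) and orbital period (in years). The figures are approximate, and the units are chosen so that the ratio for Earth is 1 by construction:

```python
# Rounded semi-major axis (AU) and orbital period (years); approximate values.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}

# The base parameters vary a lot, but distance^3 / period^2 barely moves.
ratios = {name: a**3 / t**2 for name, (a, t) in planets.items()}

for name, ratio in ratios.items():
    print(f"{name:8s} {ratio:.3f}")
```

Distance spans a factor of roughly 25 and period a factor of over 100 across these six planets, while the derived quantity stays within about one percent of 1.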

In the experiment, you measure X and Y. If X varies but Y is constant, then they probably have no relation. Suppose that we want to find out whether X is related to B or C. We make B vary and hold C constant. If X varies, then the variation is not coming from C; if X stays constant, then it is unrelated to B.

In the second scenario, you try to find the minimal set of base parameters that are related to X (growth rate). After some testing, we found that (growth rate) ~~ (initial age). After we found that connection, we can rule out the uncontrolled varying parameters, but there may be a connection between X and an uncontrolled constant parameter. It is possible that (growth rate) ~~ (initial age) times (1 + (amount of hydroxyhypotethicol)), and the first scenario will test these kinds of connections.

It is not enough to find which parameters won't affect the experiment. It is also important to find out which parameters could affect the experiment.

comment by Lblack · 2021-01-09T14:03:08.245Z · LW(p) · GW(p)

I think this certainly describes a type of gears level work scientists engage in, but not the only type, nor necessarily the most common one in a given field. There's also model building, for example.

Even once you've figured out which dozen variables you need to control to get a sled to move at the same speed every time, you still can't predict what that speed would be if you set these dozen variables to different values. You've got to figure out Newton's laws of motion and friction before you can do that.

Finding out which variables are relevant to a phenomenon in the first place is usually a required initial step for building a predictive model, but it's not the only step, nor necessarily the hardest one.

Another type of widespread scientific work I can think of is facilitating efficient calculation. Even if you have a deterministic model that you're pretty sure could theoretically predict a class of phenomena perfectly, that doesn't mean you have the computing power necessary to actually use it. 

Lattice Quantum Chromodynamics should theoretically be able to predict all of nuclear physics, but employing it in practice requires coming up with all sorts of ingenious tricks and effective theories to reduce the computing power required for a given calculation. It's enough to have kept a whole scientific field busy for over fifty years, and we're still not close to actually being able to freely simulate every interaction of nucleons at the quark level from scratch.

Replies from: johnswentworth
comment by johnswentworth · 2021-01-09T16:25:09.804Z · LW(p) · GW(p)

Even once you've figured out which dozen variables you need to control to get a sled to move at the same speed every time, you still can't predict what that speed would be if you set these dozen variables to different values. You've got to figure out Newton's laws of motion and friction before you can do that.

Finding out which variables are relevant to a phenomenon in the first place is usually a required initial step for building a predictive model...

Exactly correct.

Part of the implicit argument of the post is that the "figure out the dozen or so relevant variables" is the "hard" step in a big-O sense, when the number of variables in the universe is large. This is for largely similar reasons to those in Everyday Lessons From High-Dimensional Optimization [LW · GW]: in low dimensions, brute force-ish methods are tractable. Thus we get things like e.g. tables of reaction rate constants. Before we had the law of mass action, there were too many variables potentially relevant to reaction rates to predict via brute force. But once we have mass action, there are few enough degrees of freedom that we can just try them out and make these tables of reaction constants.

Now, that still leaves the step of going from "temperature and concentrations are the relevant variables" to the law of mass action, but again, that's the sort of thing where brute-force-ish exploration works pretty well. There is an insight step involved there, but it can largely be done by guess-and-check. And even before that insight is found, there's few enough variables involved that "make a giant table" is largely tractable.
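
To put rough numbers on the big-O point, here is a toy count of how many experiments a full "giant table" needs. The helper `grid_experiments` and the choice of ten levels per variable are illustrative assumptions, not figures from any real experiment:

```python
def grid_experiments(num_variables, levels_per_variable=10):
    """Experiments needed to try every combination of levels on a full grid."""
    return levels_per_variable ** num_variables

# A few known-relevant variables, e.g. temperature plus two concentrations.
after_narrowing = grid_experiments(3)

# A dozen candidate variables, before knowing which ones matter.
dozen_candidates = grid_experiments(12)

# Fifty candidates, still tiny compared to "every variable in the universe".
fifty_candidates = grid_experiments(50)

print(after_narrowing)
print(dozen_candidates)
print(fifty_candidates)
```

The table is cheap once the relevant variables are known and hopeless before that, which is why the narrowing step dominates the cost.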

Another type of widespread scientific work I can think of is facilitating efficient calculation...

Good example.

Replies from: Lblack
comment by Lblack · 2021-03-09T21:42:53.023Z · LW(p) · GW(p)

To clarify, my point was that at least in my experience, this isn't always the hard step. I can easily see that being the case in a "top-down" field, like a lot of engineering, medicine, parts of material science, biology and similar things. There, my impression is that once you've figured out what a phenomenon is all about, it often really is as simple as fitting some polynomial of your dozen variables to the data.

But in some areas, like fundamental physics, which I'm involved in, building your model isn't that easy or straightforward. For example, we've been looking for a theory of quantum gravity for ages. We know roughly what sort of variables it should involve. We know what data we want it to explain. But still, actually formulating that theory has proven hellishly difficult. We've been on it for over fifty years now and we're still not anywhere close to real success.

comment by TAG · 2021-01-10T19:06:16.156Z · LW(p) · GW(p)

The key to answering that question is determinism. If the system’s behavior can be predicted perfectly, then there is no mystery left to explain, no information left which some unknown variable could provide

  1. What matters is local determinism. You need to show that behaviour is predictable from factors under your control. If local determinism fails, it is hard to tell whether locality or determinism failed individually.

  2. And showing that a system's behaviour is predictable when N factors are held constant by the experimenter doesn't show that those are the only ones it is conditionally dependent on. Its behaviour might counterfactually depend on factors which the experimenter did not vary and which did not naturally change over the course of the experiment. In general, you can't exclude mysterious extra variables.

Replies from: johnswentworth
comment by johnswentworth · 2021-01-10T19:09:42.438Z · LW(p) · GW(p)

Its behaviour might counterfactually depend on factors which the experimenter did not vary and which did not naturally change over the course of the experiment.

Keep reading, the post gets to that.

Replies from: TAG
comment by TAG · 2021-01-10T19:13:20.410Z · LW(p) · GW(p)

until then the results will generally work in practice.

Doesn't really contradict what I am saying. In theory, I am saying, you can't exclude mysterious extra variables...but in practice that often doesn't matter, as you are saying.